Before we Validate Performance

Author

Uriah Finkel

Ideally we would like to keep Performance Validation as agnostic as possible. However, the structure of the validation set (probs, reals and times) implies the nature of the related assumptions and the required use case.

So before we validate performance, let us consider the underlying process.

✍️ The User Inputs
🪛 Internal Function

✍️ Declare reference groups

The dimentions of the probs and the real dictionaries imply the nature of the use case:

TODO: copy from rtichoke r README.

One Model, One Population:

Just one reference group: “model”.

Several Models, One Population:

Compare between different candidate models. - Each model stand as a reference groups such as “thin” model, or a “full” model.

Several Models, Several Populations

Compare performance over different sub-populations. - Internal Validation: “test”, “val” and “train”. - External Validation: “Framingham”, “Australia”. - Fairness: “Male”, “Female”.

✍️ Declare how to stratify predictions ✂️

The stratified_by argument is designed for the user to choose how to stratify predictions for decision-making, each method implies different problem:

Probability Threshold

By choosing Probability Threshold as a cutoff the implied assumption is that you are concerned with individual harm or benefit.

Baseline Strategy: Treat None

graph LR
    subgraph trt[Treatment Decision]
        linkStyle default stroke:#000
        A("😷") -->|"Treatment 💊"|B("<B>Predicted<br>Positive</B><br>💊<br>😷")
        A -->|"No Treatment"|C("<B>Predicted<br>Negative</B><br>😷")    
    end

    subgraph ut[Utility of the Decision]
        subgraph pred[Prediction Model]
            B -->|"Disease 🤢"| D["<B>TP</B><br>💊<br>🤢"]
            B -->|"No Disease 🤨"| E["<B>FP</B><br>💊<br>🤨"]
            C -->|"Disease 🤢"| F["<B>FN</B><br>🤢"]
            C -->|"No Disease 🤨"| G["<B>TN</B><br>🤨"]
        end
        subgraph baselinestrategy[Baseline Strategy: Treat None]
            Dnone["<B>FN</B><br>🤢"]
            Enone["<B>TN</B><br>🤨"]
            Fnone["<B>FN</B><br>🤢"]
            Gnone["<B>TN</B><br>🤨"]
        
            D---Dnone
            E---Enone
            F---Fnone
            G---Gnone
        end
        subgraph nb[Net Benefit]
            Dnb[1]
            Enb["pt / (1-pt)"]
            Fnb[0]
            Gnb[0]
        Dnone---Dnb
        Enone---Enb
        Fnone---Fnb
        Gnone---Gnb
        end
    end



    style A fill:#E8F4FF, stroke:black,color:black
    style B fill:#E8F4FF, stroke:black,color:black
    style C fill:#E8F4FF, stroke:black,color:black
    style D fill:#C0FFC0,stroke:black,color:black
    style Dnone fill:#FFCCE0,stroke:black,color:black
    style Dnb fill: #C0FFC0,stroke:black,color:black
    style E fill: #FFCCE0,stroke:black,color:black
    style Enone fill: #C0FFC0,stroke:black,color:black
    style Enb fill: #FFCCE0,stroke:black,color:black
    style F fill:#FFCCE0,stroke:black,color:black
    style Fnone fill: #FFCCE0,stroke:black,color:black
    style Fnb fill: #E8F4FF,stroke:black,color:black
    style G fill: #C0FFC0,stroke:black,color:black
    style Gnone fill: #C0FFC0,stroke:black,color:black
    style Gnb fill: #E8F4FF,stroke:black,color:black
    style nb fill: #E8F4FF,stroke:black,color:black 
    style pred fill: #E8F4FF,stroke:black,color:black
    style baselinestrategy fill: #E8F4FF,stroke:black,color:black

    classDef subgraphStyle fill:#FAF6EC,stroke:#333,stroke-width:1px
    class trt,ut subgraphStyle

Baseline Strategy: Treat All

graph LR
    subgraph trt[Treatment Decision]
        linkStyle default stroke:#000
        A("😷") -->|"Treatment 💊"|B("<B>Predicted<br>Positive</B><br>💊<br>😷")
        A -->|"No Treatment"|C("<B>Predicted<br>Negative</B><br>😷")    
    end

    subgraph ut[Utility of the Decision]
        subgraph pred[Prediction Model]
            B -->|"Disease 🤢"| D["<B>TP</B><br>💊<br>🤢"]
            B -->|"No Disease 🤨"| E["<B>FP</B><br>💊<br>🤨"]
            C -->|"Disease 🤢"| F["<B>FN</B><br>🤢"]
            C -->|"No Disease 🤨"| G["<B>TN</B><br>🤨"]
        end
        subgraph baselinestrategy[Baseline Strategy: Treat All]
            Dall["<B>TP</B><br>💊<br>🤢"]
            Eall["<B>FP</B><br>💊<br>🤨"]
            Fall["<B>TP</B><br>💊<br>🤢"]
            Gall["<B>FP</B><br>💊<br>🤨"]
        
            D---Dall
            E---Eall
            F---Fall
            G---Gall
        end
        subgraph nb[Net Benefit]
            Dnb[0]
            Enb[0]
            Fnb[1]
            Gnb["(1-pt) / pt"]
        Dall---Dnb
        Eall---Enb
        Fall---Fnb
        Gall---Gnb
        end
    end



    style A fill:#E8F4FF, stroke:black,color:black
    style B fill:#E8F4FF, stroke:black,color:black
    style C fill:#E8F4FF, stroke:black,color:black
    style D fill:#C0FFC0,stroke:black,color:black
    style Dall fill:#C0FFC0,stroke:black,color:black
    style Dnb fill:#E8F4FF,stroke:black,color:black
    style E fill:#FFCCE0,stroke:black,color:black
    style Eall fill:#FFCCE0,stroke:black,color:black
    style Enb fill:#E8F4FF,stroke:black,color:black
    style F fill:#FFCCE0,stroke:black,color:black
    style Fall fill:#C0FFC0,stroke:black,color:black
    style Fnb fill:#C0FFC0,stroke:black,color:black
    style G fill:#C0FFC0,stroke:black,color:black
    style Gall fill:#FFCCE0,stroke:black,color:black
    style Gnb fill:#FFCCE0,stroke:black,color:black
    style nb fill: #E8F4FF,stroke:black,color:black 
    style pred fill: #E8F4FF,stroke:black,color:black
    style baselinestrategy fill: #E8F4FF,stroke:black,color:black

    classDef subgraphStyle fill:#FAF6EC,stroke:#333,stroke-width:1px
    class trt,ut subgraphStyle

Regardless of ranking each prediction is categorised to a bin: 0.32 -> [0.3, 0.4).

Categorise Absolute Risk: 0.32 -> [0.3, 0.4)

References: Pauker SG, Kassirer JP. Therapeutic decision making: a cost-benefit analysis. N Engl J Med. 1975;293(5):229-234. doi:10.1056/NEJM197507312930505

PPCR

graph LR
    subgraph trt[Treatment Allocation Decision]
        linkStyle default stroke:#000
        A("😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷") -->|"Treatment 💊💊💊💊"|B("<B>Σ Predicted<br>Positives</B><br>💊💊💊💊<br>😷😷😷😷")
        A -->|"No Treatment"|C("<B>Σ Predicted<br>Negatives</B><br>😷😷😷😷😷😷")    
    end

    subgraph ut[Utility of the Decision]
        B -->|"Disease 🤢🤢🤢"| D["<B>Σ TP</B><br>💊💊💊<br>🤢🤢🤢"]
        B -->|"No Disease 🤨"| E["<B>Σ FP</B><br>💊<br>🤨"]
        C -->|"Disease 🤢"| F["<B>Σ FN</B><br>🤢"]
        C -->|"No Disease 🤨🤨🤨🤨🤨"| G["<B>Σ TN</B><br>🤨🤨🤨🤨🤨"]  
    end



    style A fill:#E8F4FF, stroke:black,color:black
    style B fill:#E8F4FF, stroke:black,color:black
    style C fill:#E8F4FF, stroke:black,color:black
    style D fill:#C0FFC0,stroke:black,color:black
    style E fill:#FFCCE0,stroke:black,color:black
    style F fill:#FFCCE0,stroke:black,color:black
    style G fill:#C0FFC0,stroke:black,color:black

    classDef subgraphStyle fill:#FAF6EC,stroke:#333,stroke-width:1px
    class trt,ut subgraphStyle

By choosing PPCR as a cutoff the implied assumption is that you are concerned with resource constraint and assume no individual treatment harm.

Regarding the ranking each prediction is categorised to a bin: if the absolute probability 0.32 is the 18th highest predictions out of 100, it will be categorised to the second decile -> 0.18.

Calculate Risk-Quantile from Absolute Risk: 0.32 -> 0.18

References: https://en.wikipedia.org/wiki/Precision_and_recall

✍️ Declare Fixed Time Horizons 🌅 (📅🤬)

The fixed_time_horizons argument is designed for the user to choose the set of time horizons to follow.

Different followups contain different distributions of observed outcomes: Declare fixed time horizons for the prediction model, such as [5, 10] years of prediction for CVD evet.

🪛 Update Administrative Censorng

For cases with observed time-to-event is shorter than the prediction time horizon, the outcomes might change:

Real Positives 🤢 should be considered as Real Negatives 🤨, the outcome of interest did not happen yet.
Always included and Encoded as 0.
Real Neagtives 🤨 should be considered as Real Censored 🤬, the event of interest could have happened in the gap between the observed time and the fixed time horizon.
If adjusted: encoded as 0.
If excluded: counted with crude estimate.

data = [
  { time: 1, real: "🤨", id: 1, time_horizon: 1 },
  { time: 3, real: "🤨", id: 1, time_horizon: 3 },
  { time: 5, real: "🤨", id: 1, time_horizon: 5 },
  { time: 1, real: "🤨", id: 2, time_horizon: 1 },
  { time: 3, real: "🤨", id: 2, time_horizon: 3 },
  { time: 5, real: "🤨", id: 2, time_horizon: 5 },
  { time: 1, real: "🤨", id: 3, time_horizon: 1 },
  { time: 3, real: "🤨", id: 3, time_horizon: 3 },
  { time: 4.6, real: "🤬", id: 3, time_horizon: 5 },
  { time: 1, real: "🤨", id: 4, time_horizon: 1 },
  { time: 3, real: "🤨", id: 4, time_horizon: 3 },
  { time: 5, real: "🤨", id: 4, time_horizon: 5 },
  { time: 1, real: "🤨", id: 5, time_horizon: 1 },
  { time: 2.4, real: "🤢", id: 5, time_horizon: 3 },
  { time: 2.4, real: "🤢", id: 5, time_horizon: 5 },
  { time: 1, real: "🤨", id: 6, time_horizon: 1 },
  { time: 3, real: "🤨", id: 6, time_horizon: 3 },
  { time: 4.4, real: "🤬", id: 6, time_horizon: 5 },
  { time: 0.4, real: "🤢", id: 7, time_horizon: 1 },
  { time: 0.4, real: "🤢", id: 7, time_horizon: 3 },
  { time: 0.4, real: "🤢", id: 7, time_horizon: 5 },
  { time: 1, real: "🤨", id: 8, time_horizon: 1 },
  { time: 1.6, real: "💀", id: 8, time_horizon: 3 },
  { time: 1.6, real: "💀", id: 8, time_horizon: 5 },
  { time: 0.8, real: "🤢", id: 9, time_horizon: 1 },
  { time: 0.8, real: "🤢", id: 9, time_horizon: 3 },
  { time: 0.8, real: "🤢", id: 9, time_horizon: 5 },
  { time: 1, real: "🤨", id: 10, time_horizon: 1 },
  { time: 2.9, real: "🤢", id: 10, time_horizon: 3 },
  { time: 2.9, real: "🤢", id: 10, time_horizon: 5 }
]

filteredData = data.filter((d) => d.time_horizon == timeHorizon)

viewof timeHorizon = Inputs.range([1, 5], {
  step: 2,
  value: 5,
  label: "Time Horizon"
})

Plot.plot({
  x: {
    domain: [0, 5]
  },
  y: {
    domain: [0, 11],
    axis: false
  },
  marks: [
    Plot.ruleX([timeHorizon], {
      stroke: "#D9E8A3",
      strokeWidth: 6,
      strokeDasharray: "5,5",
      y1: 0,
      y2: 10 // Should match the y-domain max
    }),
    Plot.ruleY(filteredData, {
      x: "time",
      y: "id",
      strokeWidth: 1.5
    }),
    Plot.text(filteredData, {
      x: "time",
      y: "id",
      text: "real",
      tip: true,
      fontSize: 30
    })
  ]
})

Declare Heuristics Regarding ambigious `reals`

✍️ Declare Heuristics Regarding Censored Events 📅🤬

graph LR
    S0["Non Event<br>0 🤨 / 🤬"] -->|"?"|S1["Primary Event<br>1 🤢"]
    S0-->|"?"|S2["Competing Event<br>2 💀"]

    
    classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
    classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
    classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
    
    class S0 nonEvent
    class S1 primaryEvent
    class S2 competingEvent
    class S3 censoredEvent

    linkStyle 0 stroke:#333,background:yellow

The censored_heuristic argument is designed for the user to choose how interpret censored events.

Performance Validation in the face of censored observations require assumptions regarding the unobserved followup.

TODO: add link to nan-van-geloven article

Exclude Censored Events

graph LR
    S0["Non Event<br>0 🤨"] -->S1["Primary Event<br>1 🤢"]
    S0-->S2["Competing Event<br>2 💀"]

     S3["Censored<br>0 🤬"]

    
    classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
    classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
    classDef censoredEvent fill:#E3F09B,stroke:#333,stroke-width:1px,color:black
    classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
    
    class S0 nonEvent
    class S1 primaryEvent
    class S2 competingEvent
    class S3 censoredEvent

    linkStyle 0 stroke:#333,background:yellow

All censored events to be excluded.

Underlying Assumption: Small amount of censored events. Violation of the assumption leads to: Overestimation of the observed outcomes.

Adjust Censored as partially seen Non-Event

Observed outcomes for each strata are estimated using the AJ-estimate (equivalent to CIF and KM): Each censored observation is assumed to be similar to the ones who weren’t censored.

TODO: Link to article

Underlying Assumption: Independent Censoring. Violation of the assumption leads to: Biased estimate for observed outcomes.

✍️ Declare Heuristics Regarding Competing Events 📅💀

The competing_heuristic argument is designed for the user to choose how interpret censored events.

Performance Validation in the face of competing observations require assumptions regarding the unobserved followup.

TODO: add link to nan-van-geloven article

Exclude Competing Events

graph LR
    subgraph adj[Adjusted for Censoring]
    S0["Non Event<br>0 🤨 / 🤬"] -->S1["Primary Event<br>1 🤢"]    
    end
    S0 -->S2["Competing Event<br>2 💀"]

    
    classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
    classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
    classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
    
    class S0 nonEvent
    class S1 primaryEvent
    class S2 competingEvent

    linkStyle 0 stroke:#333
    
    style adj fill:#E3F09B,color:black

All competing events to be excluded.

Underlying Assumption: Small amount of competing events. Violation of the assumption leads to: Overestimation of the observed outcomes. A competing event means that the primary event cannot happen.

Adjust Competing Events as Censored (partially seen Non-Event)

Check

graph LR
    subgraph adj[Adjusted for Censoring]
    direction LR
    S0["Non Event<br>0 🤨 / 🤬<br><br> Competing Event<br>2 💀"] -->S1["Primary Event<br>1 🤢"]    
    end

    
    classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
    classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
    classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
    
    class S0 nonEvent
    class S1 primaryEvent
    class S2 competingEvent

    style adj fill:#E3F09B,color:black


    linkStyle 0 stroke:#333

All competing events to be treated as censored.

Underlying Assumption: We consider a patient experiencing a competing event equivalent to independent censoring. Violation of the assumption leads to: Overestimation of the observed outcomes. A competing event means that the primary event cannot happen.

Adjust Competing Events as Competing

All competing events to be treated as Competing event to the primary event-of-interest.

In a way, a patient experiencing a competing event is “more” of a “real-negative” than a conventional “real-negative”.

This is derived from the assumed state-covention

Beyond the horizon time the following transition is possible: Real Neagtives 🤨 => Real Positives 🤢 💀 2

graph LR
subgraph adj[Adjusted for Censoring]
    direction LR
    S0["Non Event<br>0 🤨"] -->S1["Primary Event<br>1 🤢"]    
    S0 -->S2["Competing Event<br>2 💀"]
    end

    
    classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
    classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
    classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black

    class S0 nonEvent
    class S1 primaryEvent
    class S2 competingEvent

    linkStyle 0 stroke:#333
    style adj fill:#E3F09B,color:black

Underlying Assumption: We consider a patient experiencing a competing event as a definite non-event. Violation of the assumption leads to Underestimation of the observed outcomes if a competing event can be considered as a different form of the primary event.

What rtichoke from now on?

Render Predictions Histogram

Extract AJ Estimate by Assumptions

For each requried combination of reference_group x predictions_strata x fixed_time_horizons x censored_heuristic x competing_heuristic a separate AJ estimated is calculated for the adjusted reals and a Crude estimate is calculated for the excluded reals.

The sum of the AJ estimates for each predictions_strata is equal to the overal AJ estimate.