```mermaid
graph LR
  subgraph trt[Treatment Decision]
    linkStyle default stroke:#000
    A("😷") -->|"Treatment 💊"|B("<B>Predicted<br>Positive</B><br>💊<br>😷")
    A -->|"No Treatment"|C("<B>Predicted<br>Negative</B><br>😷")
  end
  subgraph ut[Utility of the Decision]
    subgraph pred[Prediction Model]
      B -->|"Disease 🤢"| D["<B>TP</B><br>💊<br>🤢"]
      B -->|"No Disease 🤨"| E["<B>FP</B><br>💊<br>🤨"]
      C -->|"Disease 🤢"| F["<B>FN</B><br>🤢"]
      C -->|"No Disease 🤨"| G["<B>TN</B><br>🤨"]
    end
    subgraph baselinestrategy[Baseline Strategy: Treat None]
      Dnone["<B>FN</B><br>🤢"]
      Enone["<B>TN</B><br>🤨"]
      Fnone["<B>FN</B><br>🤢"]
      Gnone["<B>TN</B><br>🤨"]
      D---Dnone
      E---Enone
      F---Fnone
      G---Gnone
    end
    subgraph nb[Net Benefit]
      Dnb[1]
      Enb["pt / (1-pt)"]
      Fnb[0]
      Gnb[0]
      Dnone---Dnb
      Enone---Enb
      Fnone---Fnb
      Gnone---Gnb
    end
  end
  style A fill:#E8F4FF, stroke:black,color:black
  style B fill:#E8F4FF, stroke:black,color:black
  style C fill:#E8F4FF, stroke:black,color:black
  style D fill:#C0FFC0,stroke:black,color:black
  style Dnone fill:#FFCCE0,stroke:black,color:black
  style Dnb fill: #C0FFC0,stroke:black,color:black
  style E fill: #FFCCE0,stroke:black,color:black
  style Enone fill: #C0FFC0,stroke:black,color:black
  style Enb fill: #FFCCE0,stroke:black,color:black
  style F fill:#FFCCE0,stroke:black,color:black
  style Fnone fill: #FFCCE0,stroke:black,color:black
  style Fnb fill: #E8F4FF,stroke:black,color:black
  style G fill: #C0FFC0,stroke:black,color:black
  style Gnone fill: #C0FFC0,stroke:black,color:black
  style Gnb fill: #E8F4FF,stroke:black,color:black
  style nb fill: #E8F4FF,stroke:black,color:black
  style pred fill: #E8F4FF,stroke:black,color:black
  style baselinestrategy fill: #E8F4FF,stroke:black,color:black
  classDef subgraphStyle fill:#FAF6EC,stroke:#333,stroke-width:1px
  class trt,ut subgraphStyle
```
Before we Validate Performance
Ideally, we would like to keep performance validation as agnostic as possible. However, the structure of the validation set (probs, reals and times) implies the nature of the related assumptions and the required use case.
So before we validate performance, let us consider the underlying process.
✍️ The User Inputs
🪛 Internal Function
✍️ Declare reference groups
The dimensions of the probs and reals dictionaries imply the nature of the use case:
One Model, One Population:
- Just one reference group: “model”.
Several Models, One Population:
Compare different candidate models.
- Each model stands as a reference group, such as a “thin” model or a “full” model.
Several Models, Several Populations:
Compare performance over different sub-populations.
- Internal Validation: “test”, “val” and “train”.
- External Validation: “Framingham”, “Australia”.
- Fairness: “Male”, “Female”.
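The sketch below illustrates the three use cases. For illustration only, it assumes that probs maps each reference group to a list of predicted probabilities and that reals maps each population to a list of observed outcomes. The helper infer_use_case is hypothetical and is not part of rtichoke's API.

```python
# Hypothetical sketch: these dictionary layouts are assumptions for illustration,
# not rtichoke's documented input format.


def infer_use_case(probs: dict, reals: dict) -> str:
    """Infer the use case from the number of keys in probs and reals."""
    if len(probs) == 1 and len(reals) == 1:
        return "One Model, One Population"
    if len(reals) == 1:
        # Several reference groups (models) are scored against the same outcomes.
        return "Several Models, One Population"
    # One set of predictions and one set of outcomes per sub-population.
    return "Several Models, Several Populations"


reals_one_population = {"population": [1, 0, 0, 1]}

probs_one_model = {"model": [0.82, 0.11, 0.35, 0.64]}

probs_several_models = {
    "thin": [0.70, 0.20, 0.40, 0.55],
    "full": [0.82, 0.11, 0.35, 0.64],
}

probs_by_population = {"train": [0.90, 0.20], "test": [0.60, 0.30, 0.70]}
reals_by_population = {"train": [1, 0], "test": [1, 0, 1]}

print(infer_use_case(probs_one_model, reals_one_population))
print(infer_use_case(probs_several_models, reals_one_population))
print(infer_use_case(probs_by_population, reals_by_population))
```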
✍️ Declare how to stratify predictions ✂️
The stratified_by argument lets the user choose how to stratify predictions for decision-making; each method implies a different problem:
Probability Threshold
By choosing a probability threshold as the cutoff, the implied assumption is that you are concerned with individual harm or benefit.
Baseline Strategy: Treat None
Baseline Strategy: Treat All
```mermaid
graph LR
  subgraph trt[Treatment Decision]
    linkStyle default stroke:#000
    A("😷") -->|"Treatment 💊"|B("<B>Predicted<br>Positive</B><br>💊<br>😷")
    A -->|"No Treatment"|C("<B>Predicted<br>Negative</B><br>😷")
  end
  subgraph ut[Utility of the Decision]
    subgraph pred[Prediction Model]
      B -->|"Disease 🤢"| D["<B>TP</B><br>💊<br>🤢"]
      B -->|"No Disease 🤨"| E["<B>FP</B><br>💊<br>🤨"]
      C -->|"Disease 🤢"| F["<B>FN</B><br>🤢"]
      C -->|"No Disease 🤨"| G["<B>TN</B><br>🤨"]
    end
    subgraph baselinestrategy[Baseline Strategy: Treat All]
      Dall["<B>TP</B><br>💊<br>🤢"]
      Eall["<B>FP</B><br>💊<br>🤨"]
      Fall["<B>TP</B><br>💊<br>🤢"]
      Gall["<B>FP</B><br>💊<br>🤨"]
      D---Dall
      E---Eall
      F---Fall
      G---Gall
    end
    subgraph nb[Net Benefit]
      Dnb[0]
      Enb[0]
      Fnb[1]
      Gnb["pt / (1-pt)"]
      Dall---Dnb
      Eall---Enb
      Fall---Fnb
      Gall---Gnb
    end
  end
  style A fill:#E8F4FF, stroke:black,color:black
  style B fill:#E8F4FF, stroke:black,color:black
  style C fill:#E8F4FF, stroke:black,color:black
  style D fill:#C0FFC0,stroke:black,color:black
  style Dall fill:#C0FFC0,stroke:black,color:black
  style Dnb fill:#E8F4FF,stroke:black,color:black
  style E fill:#FFCCE0,stroke:black,color:black
  style Eall fill:#FFCCE0,stroke:black,color:black
  style Enb fill:#E8F4FF,stroke:black,color:black
  style F fill:#FFCCE0,stroke:black,color:black
  style Fall fill:#C0FFC0,stroke:black,color:black
  style Fnb fill:#FFCCE0,stroke:black,color:black
  style G fill:#C0FFC0,stroke:black,color:black
  style Gall fill:#FFCCE0,stroke:black,color:black
  style Gnb fill:#C0FFC0,stroke:black,color:black
  style nb fill: #E8F4FF,stroke:black,color:black
  style pred fill: #E8F4FF,stroke:black,color:black
  style baselinestrategy fill: #E8F4FF,stroke:black,color:black
  classDef subgraphStyle fill:#FAF6EC,stroke:#333,stroke-width:1px
  class trt,ut subgraphStyle
```
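The Treat None diagram gives a true positive a weight of 1 and a false positive a penalty of pt / (1-pt). These are the weights of the usual decision-curve net benefit, NB = TP/n - FP/n * pt / (1 - pt). The minimal sketch below assumes that a patient is treated when the predicted probability is at least pt. The function names are hypothetical. Treat None always scores 0, and Treat All is the same formula with every patient treated.

```python
def net_benefit(probs, reals, pt):
    """Net benefit of treating every patient whose predicted risk is >= pt."""
    n = len(reals)
    treated = [p >= pt for p in probs]
    tp = sum(1 for t, y in zip(treated, reals) if t and y == 1)
    fp = sum(1 for t, y in zip(treated, reals) if t and y == 0)
    # Each false positive costs pt / (1 - pt) of a true positive (requires pt < 1).
    return tp / n - (fp / n) * (pt / (1 - pt))


def net_benefit_treat_all(reals, pt):
    """Baseline strategy Treat All: every patient is a predicted positive."""
    return net_benefit([1.0] * len(reals), reals, pt)


# Baseline strategy Treat None: nobody is treated, so TP = FP = 0 and NB = 0.
NET_BENEFIT_TREAT_NONE = 0.0

probs = [0.32, 0.81, 0.12, 0.55, 0.05]
reals = [1, 1, 0, 0, 0]
pt = 0.3

print(round(net_benefit(probs, reals, pt), 4))     # 0.3143
print(round(net_benefit_treat_all(reals, pt), 4))  # 0.1429
```

In this toy example, the model (0.3143) beats both baselines at pt = 0.3: Treat All scores 0.1429 and Treat None scores 0.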
Regardless of ranking, each prediction is categorised into a bin: 0.32 -> [0.3, 0.4).
- Categorise Absolute Risk: 0.32 -> [0.3, 0.4)
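The binning step can be sketched in a few lines. This sketch assumes equal-width bins of 0.1 and places 1.0 in the last bin. probability_bin is a hypothetical helper, not rtichoke's internal function.

```python
import math


def probability_bin(prob: float, n_bins: int = 10) -> tuple:
    """Map an absolute risk to its [lower, upper) bin; 1.0 falls in the last bin."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must be within [0, 1]")
    # Multiplying by the integer bin count avoids the rounding error of prob / 0.1
    # (in floating point, 0.3 / 0.1 is slightly below 3).
    k = min(math.floor(prob * n_bins), n_bins - 1)
    return (k / n_bins, (k + 1) / n_bins)


print(probability_bin(0.32))  # (0.3, 0.4)
print(probability_bin(1.0))   # (0.9, 1.0)
```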
References: Pauker SG, Kassirer JP. Therapeutic decision making: a cost-benefit analysis. N Engl J Med. 1975;293(5):229-234. doi:10.1056/NEJM197507312930505
PPCR
```mermaid
graph LR
  subgraph trt[Treatment Allocation Decision]
    linkStyle default stroke:#000
    A("😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷") -->|"Treatment 💊💊💊💊"|B("<B>Σ Predicted<br>Positives</B><br>💊💊💊💊<br>😷😷😷😷")
    A -->|"No Treatment"|C("<B>Σ Predicted<br>Negatives</B><br>😷😷😷😷😷😷")
  end
  subgraph ut[Utility of the Decision]
    B -->|"Disease 🤢🤢🤢"| D["<B>Σ TP</B><br>💊💊💊<br>🤢🤢🤢"]
    B -->|"No Disease 🤨"| E["<B>Σ FP</B><br>💊<br>🤨"]
    C -->|"Disease 🤢"| F["<B>Σ FN</B><br>🤢"]
    C -->|"No Disease 🤨🤨🤨🤨🤨"| G["<B>Σ TN</B><br>🤨🤨🤨🤨🤨"]
  end
  style A fill:#E8F4FF, stroke:black,color:black
  style B fill:#E8F4FF, stroke:black,color:black
  style C fill:#E8F4FF, stroke:black,color:black
  style D fill:#C0FFC0,stroke:black,color:black
  style E fill:#FFCCE0,stroke:black,color:black
  style F fill:#FFCCE0,stroke:black,color:black
  style G fill:#C0FFC0,stroke:black,color:black
  classDef subgraphStyle fill:#FAF6EC,stroke:#333,stroke-width:1px
  class trt,ut subgraphStyle
```
By choosing PPCR as the cutoff, the implied assumption is that you are concerned with a resource constraint and assume no individual treatment harm.
Based on the ranking, each prediction is categorised into a bin: if the absolute probability 0.32 is the 18th highest prediction out of 100, it is categorised into the second decile -> 0.18.
- Calculate Risk-Quantile from Absolute Risk: 0.32 -> 0.18
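The ranking step can be sketched as follows. PPCR is computed as rank / n, with the highest risk first, and the decile is the integer ceiling of rank * 10 / n. ppcr_and_decile is a hypothetical helper. Breaking ties by input order is an assumption of this sketch, not necessarily rtichoke's behaviour.

```python
def ppcr_and_decile(probs):
    """For each prediction, return its PPCR (rank / n, highest risk first) and decile."""
    n = len(probs)
    # Indices ordered from the highest to the lowest predicted probability.
    # Ties keep their input order because sorted() is stable.
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    ppcr = [0.0] * n
    decile = [0] * n
    for rank, i in enumerate(order, start=1):
        ppcr[i] = rank / n
        # Integer ceiling of rank * 10 / n, avoiding floating-point drift.
        decile[i] = -(-rank * 10 // n)
    return ppcr, decile


# 17 predictions above 0.32, then 0.32 itself, then 82 predictions below it.
probs = [0.50 + i / 100 for i in range(17)] + [0.32] + [i / 1000 for i in range(82)]
ppcr, decile = ppcr_and_decile(probs)
print(ppcr[17], decile[17])  # 0.18 2
```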
References: https://en.wikipedia.org/wiki/Precision_and_recall
✍️ Declare Fixed Time Horizons 🌅 (📅🤬)
The fixed_time_horizons argument lets the user choose the set of time horizons to follow.
Different follow-up periods contain different distributions of observed outcomes: declare fixed time horizons for the prediction model, such as [5, 10] years of prediction for a CVD event.
🪛 Update Administrative Censoring
When the observed time and the fixed time horizon differ, the outcomes might change:
- Real Positives 🤢 whose event occurs after the fixed time horizon should be considered as Real Negatives 🤨: the outcome of interest has not happened yet at the horizon. Always included and encoded as 0.
- Real Negatives 🤨 whose observed follow-up is shorter than the fixed time horizon should be considered as Real Censored 🤬: the event of interest could have happened in the gap between the observed time and the fixed time horizon. If adjusted: encoded as 0. If excluded: counted with the crude estimate.
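The two update rules can be sketched for several fixed time horizons. This sketch assumes the codes 0 = no event observed, 1 = primary event and 2 = competing event, and it uses None as a hypothetical marker for "censored before the horizon". reals_at_horizon is an illustrative helper, not rtichoke's implementation.

```python
CENSORED = None  # hypothetical marker for "censored before the horizon"


def reals_at_horizon(reals, times, horizon):
    """
    Re-encode outcomes at a fixed time horizon.
    Input codes (assumed): 0 = no event observed, 1 = primary event, 2 = competing event.
    """
    updated = []
    for real, time in zip(reals, times):
        if time > horizon:
            # Nothing had happened yet at the horizon, whatever happened later.
            updated.append(0)
        elif real == 0 and time < horizon:
            # Follow-up ended before the horizon: the event could still occur in the gap.
            updated.append(CENSORED)
        else:
            updated.append(real)
    return updated


reals = [1, 1, 0, 0, 2]
times = [3, 7, 2, 12, 4]
for horizon in [5, 10]:
    print(horizon, reals_at_horizon(reals, times, horizon))
# 5 [1, 0, None, 0, 2]
# 10 [1, 1, None, 0, 2]
```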
Declare Heuristics Regarding Ambiguous Reals
✍️ Declare Heuristics Regarding Censored Events 📅🤬
```mermaid
graph LR
  S0["Non Event<br>0 🤨 / 🤬"] -->|"?"|S1["Primary Event<br>1 🤢"]
  S0-->|"?"|S2["Competing Event<br>2 💀"]
  classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
  classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
  classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
  class S0 nonEvent
  class S1 primaryEvent
  class S2 competingEvent
  linkStyle 0 stroke:#333,background:yellow
```
The censored_heuristic argument lets the user choose how to interpret censored events.
Performance validation in the face of censored observations requires assumptions regarding the unobserved follow-up.
Exclude Censored Events
```mermaid
graph LR
  S0["Non Event<br>0 🤨"] -->S1["Primary Event<br>1 🤢"]
  S0-->S2["Competing Event<br>2 💀"]
  S3["Censored<br>0 🤬"]
  classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
  classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
  classDef censoredEvent fill:#E3F09B,stroke:#333,stroke-width:1px,color:black
  classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
  class S0 nonEvent
  class S1 primaryEvent
  class S2 competingEvent
  class S3 censoredEvent
  linkStyle 0 stroke:#333,background:yellow
```
All censored events to be excluded.
Underlying Assumption: A small number of censored events. Violation of the assumption leads to: Overestimation of the observed outcomes.
Adjust Censored as partially seen Non-Event
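Observed outcomes for each stratum are estimated using the Aalen-Johansen (AJ) estimator. It estimates the cumulative incidence function (CIF) and reduces to 1 - Kaplan-Meier (KM) when there are no competing events. Each censored observation is assumed to be similar to the ones that were not censored.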
Underlying Assumption: Independent censoring. Violation of the assumption leads to: Biased estimates of the observed outcomes.
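The toy comparison below applies both censoring heuristics to the same data, assuming there are no competing events. Excluding censored observations gives a crude proportion. Adjusting for them gives the Kaplan-Meier cumulative incidence (1 - KM), which is what the AJ estimator reduces to in this setting. The data and helper names are hypothetical.

```python
def crude_incidence_excluding_censored(times, events, horizon):
    """Share of events by `horizon` among observations not censored before it."""
    kept = [(t, e) for t, e in zip(times, events) if not (e == 0 and t < horizon)]
    n_events = sum(1 for t, e in kept if e == 1 and t <= horizon)
    return n_events / len(kept)


def km_cumulative_incidence(times, events, horizon):
    """1 - Kaplan-Meier survival at `horizon` (events: 1 = event, 0 = censored)."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    survival = 1.0
    for u in event_times:
        at_risk = sum(1 for t in times if t >= u)  # still under observation at u
        n_events = sum(1 for t, e in zip(times, events) if t == u and e == 1)
        survival *= 1 - n_events / at_risk
    return 1 - survival


times = [1, 2, 3, 4, 5, 6]
events = [1, 0, 1, 0, 0, 1]
horizon = 5
print(crude_incidence_excluding_censored(times, events, horizon))  # 0.5
print(round(km_cumulative_incidence(times, events, horizon), 3))   # 0.375
```

In this toy example, the crude estimate (0.5) is higher than the KM-based estimate (0.375). This matches the overestimation described for excluding censored events.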
✍️ Declare Heuristics Regarding Competing Events 📅💀
The competing_heuristic argument lets the user choose how to interpret competing events.
Performance validation in the face of competing observations requires assumptions regarding the unobserved follow-up.
Exclude Competing Events
```mermaid
graph LR
  subgraph adj[Adjusted for Censoring]
    S0["Non Event<br>0 🤨 / 🤬"] -->S1["Primary Event<br>1 🤢"]
  end
  S0 -->S2["Competing Event<br>2 💀"]
  classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
  classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
  classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
  class S0 nonEvent
  class S1 primaryEvent
  class S2 competingEvent
  linkStyle 0 stroke:#333
  style adj fill:#E3F09B,color:black
```
All competing events to be excluded.
Underlying Assumption: A small number of competing events. Violation of the assumption leads to: Overestimation of the observed outcomes. A competing event means that the primary event cannot happen.
Adjust Competing Events as Censored (partially seen Non-Event)
```mermaid
graph LR
  subgraph adj[Adjusted for Censoring]
    direction LR
    S0["Non Event<br>0 🤨 / 🤬<br><br> Competing Event<br>2 💀"] -->S1["Primary Event<br>1 🤢"]
  end
  classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
  classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
  classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
  class S0 nonEvent
  class S1 primaryEvent
  style adj fill:#E3F09B,color:black
  linkStyle 0 stroke:#333
```
All competing events to be treated as censored.
Underlying Assumption: We consider a patient experiencing a competing event as equivalent to independent censoring. Violation of the assumption leads to: Overestimation of the observed outcomes. A competing event means that the primary event cannot happen.
Adjust Competing Events as Competing
All competing events to be treated as events that compete with the primary event of interest.
In a way, a patient experiencing a competing event is “more” of a “real-negative” than a conventional “real-negative”.
This is derived from the assumed state convention: beyond the horizon time, the transition Real Negatives 🤨 => Real Positives 🤢 is still possible, whereas a patient with a competing event 💀 (encoded as 2) can never transition to the primary event.
```mermaid
graph LR
  subgraph adj[Adjusted for Censoring]
    direction LR
    S0["Non Event<br>0 🤨"] -->S1["Primary Event<br>1 🤢"]
    S0 -->S2["Competing Event<br>2 💀"]
  end
  classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
  classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
  classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
  class S0 nonEvent
  class S1 primaryEvent
  class S2 competingEvent
  linkStyle 0 stroke:#333
  style adj fill:#E3F09B,color:black
```
Underlying Assumption: We consider a patient experiencing a competing event as a definite non-event. Violation of the assumption leads to: Underestimation of the observed outcomes if a competing event can be considered as a different form of the primary event.
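The toy comparison below applies the three competing-event heuristics to the same data, with censoring adjusted in all three. It assumes the codes 0 = censored, 1 = primary event and 2 = competing event. The helpers are hypothetical standard-library sketches, not rtichoke internals. "As competing" uses an Aalen-Johansen cumulative incidence. "As censored" recodes 2 to 0 and uses 1 - KM. "Excluded" drops competing events and then uses 1 - KM.

```python
def km_cumulative_incidence(times, events, horizon):
    """1 - Kaplan-Meier survival at `horizon`; events: 1 = event, anything else = censored."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1 and t <= horizon})
    survival = 1.0
    for u in event_times:
        at_risk = sum(1 for t in times if t >= u)
        n_events = sum(1 for t, e in zip(times, events) if t == u and e == 1)
        survival *= 1 - n_events / at_risk
    return 1 - survival


def aalen_johansen_cif(times, events, horizon, cause=1):
    """Cumulative incidence of `cause` at `horizon`; events: 0 = censored, 1/2 = causes."""
    any_event_times = sorted({t for t, e in zip(times, events) if e != 0 and t <= horizon})
    survival = 1.0  # overall event-free survival just before the current time
    cif = 0.0
    for u in any_event_times:
        at_risk = sum(1 for t in times if t >= u)
        d_cause = sum(1 for t, e in zip(times, events) if t == u and e == cause)
        d_any = sum(1 for t, e in zip(times, events) if t == u and e != 0)
        cif += survival * d_cause / at_risk
        survival *= 1 - d_any / at_risk
    return cif


times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 2, 1, 0, 2, 1, 0, 1]  # 0 = censored, 1 = primary, 2 = competing
horizon = 6

# Adjust competing events as competing: Aalen-Johansen cumulative incidence.
as_competing = aalen_johansen_cif(times, events, horizon)
# Adjust competing events as censored: recode 2 -> 0, then 1 - KM.
as_censored = km_cumulative_incidence(times, [0 if e == 2 else e for e in events], horizon)
# Exclude competing events: drop them, then 1 - KM on the remaining observations.
kept = [(t, e) for t, e in zip(times, events) if e != 2]
excluded = km_cumulative_incidence([t for t, _ in kept], [e for _, e in kept], horizon)

print(round(as_competing, 3), round(as_censored, 3), round(excluded, 3))
# 0.406 0.514 0.556
```

In this example, treating competing events as competing gives the lowest estimate. Treating them as censored, or excluding them, gives higher estimates. This is in line with the overestimation noted for those two heuristics.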
What does rtichoke do from now on?
Render Predictions Histogram
Extract AJ Estimate by Assumptions
For each required combination of reference_group x predictions_strata x fixed_time_horizons x censored_heuristic x competing_heuristic, a separate AJ estimate is calculated for the adjusted reals and a crude estimate is calculated for the excluded reals.
The sum of the AJ estimates for each predictions_strata is equal to the overall AJ estimate.
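The number of separate estimates grows multiplicatively with each declared option. The minimal sketch below builds that grid with itertools.product. The labels are hypothetical: two reference groups, ten probability-threshold strata, horizons of 5 and 10 years, and heuristic names that paraphrase the options above rather than rtichoke's exact argument values.

```python
from itertools import product

# Hypothetical labels for illustration; rtichoke's own argument values may differ.
reference_groups = ["thin", "full"]
predictions_strata = [f"[{k / 10:.1f}, {(k + 1) / 10:.1f})" for k in range(10)]
fixed_time_horizons = [5, 10]
censored_heuristics = ["excluded", "adjusted"]
competing_heuristics = ["excluded", "adjusted_as_censored", "adjusted_as_competing"]

# One entry per required estimate: reference group x stratum x horizon x heuristics.
combinations = list(
    product(
        reference_groups,
        predictions_strata,
        fixed_time_horizons,
        censored_heuristics,
        competing_heuristics,
    )
)
print(len(combinations))  # 240
```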