graph LR
subgraph trt[Treatment Decision]
linkStyle default stroke:#000
A("😷") -->|"Treatment 💊"|B("<B>Predicted<br>Positive</B><br>💊<br>😷")
A -->|"No Treatment"|C("<B>Predicted<br>Negative</B><br>😷")
end
subgraph ut[Utility of the Decision]
subgraph pred[Prediction Model]
B -->|"Disease 🤢"| D["<B>TP</B><br>💊<br>🤢"]
B -->|"No Disease 🤨"| E["<B>FP</B><br>💊<br>🤨"]
C -->|"Disease 🤢"| F["<B>FN</B><br>🤢"]
C -->|"No Disease 🤨"| G["<B>TN</B><br>🤨"]
end
subgraph baselinestrategy[Baseline Strategy: Treat None]
Dnone["<B>FN</B><br>🤢"]
Enone["<B>TN</B><br>🤨"]
Fnone["<B>FN</B><br>🤢"]
Gnone["<B>TN</B><br>🤨"]
D---Dnone
E---Enone
F---Fnone
G---Gnone
end
subgraph nb[Net Benefit]
Dnb[1]
Enb["pt / (1-pt)"]
Fnb[0]
Gnb[0]
Dnone---Dnb
Enone---Enb
Fnone---Fnb
Gnone---Gnb
end
end
style A fill:#E8F4FF, stroke:black,color:black
style B fill:#E8F4FF, stroke:black,color:black
style C fill:#E8F4FF, stroke:black,color:black
style D fill:#C0FFC0,stroke:black,color:black
style Dnone fill:#FFCCE0,stroke:black,color:black
style Dnb fill: #C0FFC0,stroke:black,color:black
style E fill: #FFCCE0,stroke:black,color:black
style Enone fill: #C0FFC0,stroke:black,color:black
style Enb fill: #FFCCE0,stroke:black,color:black
style F fill:#FFCCE0,stroke:black,color:black
style Fnone fill: #FFCCE0,stroke:black,color:black
style Fnb fill: #E8F4FF,stroke:black,color:black
style G fill: #C0FFC0,stroke:black,color:black
style Gnone fill: #C0FFC0,stroke:black,color:black
style Gnb fill: #E8F4FF,stroke:black,color:black
style nb fill: #E8F4FF,stroke:black,color:black
style pred fill: #E8F4FF,stroke:black,color:black
style baselinestrategy fill: #E8F4FF,stroke:black,color:black
classDef subgraphStyle fill:#FAF6EC,stroke:#333,stroke-width:1px
class trt,ut subgraphStyle
Before we Validate Performance
Ideally we would like to keep Performance Validation as agnostic as possible. However, the structure of the validation set (probs, reals and times) implies the nature of the related assumptions and the required use case.
So before we validate performance, let us consider the underlying process.
✍️ The User Inputs
🪛 Internal Function
✍️ Declare reference groups
The dimensions of the probs and reals dictionaries imply the nature of the use case:
TODO: copy from rtichoke r README.
One Model, One Population:
- Just one reference group: “model”.
Several Models, One Population:
Compare different candidate models.
- Each model stands as a reference group, such as a “thin” model or a “full” model.
Several Models, Several Populations
Compare performance over different sub-populations.
- Internal Validation: “test”, “val” and “train”.
- External Validation: “Framingham”, “Australia”.
- Fairness: “Male”, “Female”.
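As a rough illustration of how the structure of the inputs could signal the use case, here is a minimal Python sketch. The dictionary keys, the example numbers and the helper `describe_use_case` are hypothetical and are not part of rtichoke's API.

```python
# Hypothetical example data: the names and numbers are illustrative only.
one_model = {"model": [0.10, 0.32, 0.75, 0.05]}
several_models = {
    "thin": [0.10, 0.32, 0.75, 0.05],
    "full": [0.08, 0.41, 0.81, 0.03],
}
several_populations = {
    "train": [0.10, 0.32, 0.75],
    "test": [0.20, 0.64],
}
shared_reals = [0, 0, 1, 0]  # one population: a single outcome vector
reals_by_population = {"train": [0, 0, 1], "test": [0, 1]}


def describe_use_case(probs, reals):
    """Guess the use case from the structure of probs and reals (assumed convention)."""
    if isinstance(reals, dict):
        # Several populations: every reference group needs its own outcomes.
        if set(probs) != set(reals):
            raise ValueError("probs and reals must share the same reference groups")
        return "several models, several populations"
    if len(probs) == 1:
        return "one model, one population"
    return "several models, one population"


print(describe_use_case(one_model, shared_reals))
print(describe_use_case(several_models, shared_reals))
print(describe_use_case(several_populations, reals_by_population))
```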
✍️ Declare how to stratify predictions ✂️
The stratified_by argument is designed for the user to choose how to stratify predictions for decision-making. Each method implies a different problem:
Probability Threshold
By choosing Probability Threshold as a cutoff the implied assumption is that you are concerned with individual harm or benefit.
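The weights shown in the Net Benefit diagrams can be read as a formula. The sketch below assumes the standard decision-curve definition of net benefit relative to treating no one: each true positive counts as 1 and each false positive is penalised by pt / (1 - pt). The function name and the counts are hypothetical.

```python
def net_benefit(tp, fp, n, pt):
    """Net benefit relative to "treat none" at probability threshold pt.

    tp, fp: counts of true and false positives among n patients.
    pt: probability threshold, strictly between 0 and 1.
    """
    if not 0 < pt < 1:
        raise ValueError("pt must be strictly between 0 and 1")
    harm_weight = pt / (1 - pt)  # the FP weight shown in the diagram
    return tp / n - (fp / n) * harm_weight


# 30 true positives and 20 false positives out of 100 patients, pt = 0.2:
# 0.30 - 0.20 * 0.25, which is approximately 0.25
print(net_benefit(tp=30, fp=20, n=100, pt=0.2))
```

A higher threshold makes each false positive more costly, so the same confusion matrix yields a lower net benefit.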
Baseline Strategy: Treat None
Baseline Strategy: Treat All
graph LR
subgraph trt[Treatment Decision]
linkStyle default stroke:#000
A("😷") -->|"Treatment 💊"|B("<B>Predicted<br>Positive</B><br>💊<br>😷")
A -->|"No Treatment"|C("<B>Predicted<br>Negative</B><br>😷")
end
subgraph ut[Utility of the Decision]
subgraph pred[Prediction Model]
B -->|"Disease 🤢"| D["<B>TP</B><br>💊<br>🤢"]
B -->|"No Disease 🤨"| E["<B>FP</B><br>💊<br>🤨"]
C -->|"Disease 🤢"| F["<B>FN</B><br>🤢"]
C -->|"No Disease 🤨"| G["<B>TN</B><br>🤨"]
end
subgraph baselinestrategy[Baseline Strategy: Treat All]
Dall["<B>TP</B><br>💊<br>🤢"]
Eall["<B>FP</B><br>💊<br>🤨"]
Fall["<B>TP</B><br>💊<br>🤢"]
Gall["<B>FP</B><br>💊<br>🤨"]
D---Dall
E---Eall
F---Fall
G---Gall
end
subgraph nb[Net Benefit]
Dnb[0]
Enb[0]
Fnb[1]
Gnb["(1-pt) / pt"]
Dall---Dnb
Eall---Enb
Fall---Fnb
Gall---Gnb
end
end
style A fill:#E8F4FF, stroke:black,color:black
style B fill:#E8F4FF, stroke:black,color:black
style C fill:#E8F4FF, stroke:black,color:black
style D fill:#C0FFC0,stroke:black,color:black
style Dall fill:#C0FFC0,stroke:black,color:black
style Dnb fill:#E8F4FF,stroke:black,color:black
style E fill:#FFCCE0,stroke:black,color:black
style Eall fill:#FFCCE0,stroke:black,color:black
style Enb fill:#E8F4FF,stroke:black,color:black
style F fill:#FFCCE0,stroke:black,color:black
style Fall fill:#C0FFC0,stroke:black,color:black
style Fnb fill:#C0FFC0,stroke:black,color:black
style G fill:#C0FFC0,stroke:black,color:black
style Gall fill:#FFCCE0,stroke:black,color:black
style Gnb fill:#FFCCE0,stroke:black,color:black
style nb fill: #E8F4FF,stroke:black,color:black
style pred fill: #E8F4FF,stroke:black,color:black
style baselinestrategy fill: #E8F4FF,stroke:black,color:black
classDef subgraphStyle fill:#FAF6EC,stroke:#333,stroke-width:1px
class trt,ut subgraphStyle
Regardless of ranking, each prediction is categorised into a bin: 0.32 -> [0.3, 0.4).
- Categorise Absolute Risk: 0.32 -> [0.3, 0.4)
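The binning of absolute risk can be sketched in a few lines of Python. The function name and the default of ten equal-width bins are assumptions made for illustration, not rtichoke's actual implementation.

```python
def probability_bin(prob, n_bins=10):
    """Return the (lower, upper) bounds of the equal-width bin containing prob.

    Bins are half-open, [lower, upper), except that prob == 1.0 is placed
    in the last bin so that every probability gets a bin.
    """
    if not 0 <= prob <= 1:
        raise ValueError("prob must be between 0 and 1")
    index = min(int(prob * n_bins), n_bins - 1)
    return index / n_bins, (index + 1) / n_bins


print(probability_bin(0.32))  # (0.3, 0.4)
```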
References: Pauker SG, Kassirer JP. Therapeutic decision making: a cost-benefit analysis. N Engl J Med. 1975;293(5):229-234. doi:10.1056/NEJM197507312930505
PPCR
graph LR
subgraph trt[Treatment Allocation Decision]
linkStyle default stroke:#000
A("😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷<br>😷") -->|"Treatment 💊💊💊💊"|B("<B>Σ Predicted<br>Positives</B><br>💊💊💊💊<br>😷😷😷😷")
A -->|"No Treatment"|C("<B>Σ Predicted<br>Negatives</B><br>😷😷😷😷😷😷")
end
subgraph ut[Utility of the Decision]
B -->|"Disease 🤢🤢🤢"| D["<B>Σ TP</B><br>💊💊💊<br>🤢🤢🤢"]
B -->|"No Disease 🤨"| E["<B>Σ FP</B><br>💊<br>🤨"]
C -->|"Disease 🤢"| F["<B>Σ FN</B><br>🤢"]
C -->|"No Disease 🤨🤨🤨🤨🤨"| G["<B>Σ TN</B><br>🤨🤨🤨🤨🤨"]
end
style A fill:#E8F4FF, stroke:black,color:black
style B fill:#E8F4FF, stroke:black,color:black
style C fill:#E8F4FF, stroke:black,color:black
style D fill:#C0FFC0,stroke:black,color:black
style E fill:#FFCCE0,stroke:black,color:black
style F fill:#FFCCE0,stroke:black,color:black
style G fill:#C0FFC0,stroke:black,color:black
classDef subgraphStyle fill:#FAF6EC,stroke:#333,stroke-width:1px
class trt,ut subgraphStyle
By choosing PPCR as a cutoff the implied assumption is that you are concerned with resource constraints and assume no individual treatment harm.
Each prediction is categorised into a bin according to its rank: if the absolute probability 0.32 is the 18th highest prediction out of 100, it will be categorised into the second decile -> 0.18.
- Calculate Risk-Quantile from Absolute Risk: 0.32 -> 0.18
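The conversion from absolute risk to a risk quantile can be sketched as follows. The helper names are hypothetical, and ties are broken by input order, which is an assumption rather than documented rtichoke behaviour.

```python
def risk_quantiles(probs):
    """Map each prediction to rank / n, where rank 1 is the highest prediction."""
    n = len(probs)
    # Indices sorted from the highest to the lowest prediction (stable for ties).
    order = sorted(range(n), key=lambda i: probs[i], reverse=True)
    quantiles = [0.0] * n
    for rank, i in enumerate(order, start=1):
        quantiles[i] = rank / n
    return quantiles


def decile_of_rank(rank, n):
    """Decile (1 to 10) of a 1-based rank out of n, using integer ceiling."""
    return (rank * 10 + n - 1) // n


print(risk_quantiles([0.9, 0.1, 0.5, 0.3]))  # [0.25, 1.0, 0.5, 0.75]
print(decile_of_rank(18, 100))               # 2
```

Working with integer ranks for the decile avoids floating-point surprises at bin edges.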
References: https://en.wikipedia.org/wiki/Precision_and_recall
✍️ Declare Fixed Time Horizons 🌅 (📅🤬)
The fixed_time_horizons argument is designed for the user to choose the set of time horizons to follow.
Different followups contain different distributions of observed outcomes: Declare fixed time horizons for the prediction model, such as [5, 10] years of prediction for a CVD event.
🪛 Update Administrative Censoring
Depending on how the observed time compares with the fixed time horizon, the outcomes might change:
- Real Positives 🤢 whose event occurred after the time horizon should be considered as Real Negatives 🤨: the outcome of interest did not happen yet. Always included and encoded as 0.
- Real Negatives 🤨 whose observed time is shorter than the time horizon should be considered as Real Censored 🤬: the event of interest could have happened in the gap between the observed time and the fixed time horizon.
  - If adjusted: encoded as 0.
  - If excluded: counted with crude estimate.
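The update for a single observation can be sketched as below. The function name, the returned `(real, is_censored)` pair and the handling of the boundary case (an event exactly at the horizon counts as an event) are assumptions made for illustration. Outcomes use the encoding from this page: 0 non-event, 1 primary event, 2 competing event.

```python
def update_administrative_censoring(real, time, horizon):
    """Return (real_at_horizon, is_censored) for one observation.

    real: 0 non-event, 1 primary event, 2 competing event.
    time: observed time; horizon: fixed time horizon (same units).
    """
    if real in (1, 2) and time > horizon:
        # The event happened after the horizon: a non-event at the horizon.
        return 0, False
    if real == 0 and time < horizon:
        # Followup ended before the horizon: the outcome is unknown.
        return 0, True
    return real, False


print(update_administrative_censoring(real=1, time=7, horizon=5))  # (0, False)
print(update_administrative_censoring(real=0, time=3, horizon=5))  # (0, True)
print(update_administrative_censoring(real=1, time=3, horizon=5))  # (1, False)
```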
Declare Heuristics Regarding ambiguous reals
✍️ Declare Heuristics Regarding Censored Events 📅🤬
graph LR
S0["Non Event<br>0 🤨 / 🤬"] -->|"?"|S1["Primary Event<br>1 🤢"]
S0-->|"?"|S2["Competing Event<br>2 💀"]
classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
class S0 nonEvent
class S1 primaryEvent
class S2 competingEvent
class S3 censoredEvent
linkStyle 0 stroke:#333,background:yellow
The censored_heuristic argument is designed for the user to choose how to interpret censored events.
Performance Validation in the face of censored observations requires assumptions regarding the unobserved followup.
TODO: add link to nan-van-geloven article
Exclude Censored Events
graph LR
S0["Non Event<br>0 🤨"] -->S1["Primary Event<br>1 🤢"]
S0-->S2["Competing Event<br>2 💀"]
S3["Censored<br>0 🤬"]
classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
classDef censoredEvent fill:#E3F09B,stroke:#333,stroke-width:1px,color:black
classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
class S0 nonEvent
class S1 primaryEvent
class S2 competingEvent
class S3 censoredEvent
linkStyle 0 stroke:#333,background:yellow
All censored events to be excluded.
Underlying Assumption: A small number of censored events. Violation of the assumption leads to: Overestimation of the observed outcomes.
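A small sketch of the crude estimate with censored observations excluded shows where the overestimation comes from. The function name and the toy data are hypothetical.

```python
def crude_estimate(reals, censored):
    """Proportion of primary events (1) among the non-censored observations."""
    kept = [r for r, c in zip(reals, censored) if not c]
    if not kept:
        raise ValueError("all observations are censored")
    return sum(1 for r in kept if r == 1) / len(kept)


# Toy data: 3 events, 4 observed non-events, 3 censored observations.
reals = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
censored = [False] * 7 + [True] * 3

print(crude_estimate(reals, censored))  # 3 / 7, roughly 0.43
```

Events are never dropped, while censored observations only leave the denominator. If the three censored patients were in fact non-events, the true proportion would be 3 / 10, so the more censoring there is, the larger the potential overestimation.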
Adjust Censored as partially seen Non-Event
Observed outcomes for each stratum are estimated using the AJ estimate (the cumulative incidence function, which reduces to the complement of the KM estimate when there are no competing events): each censored observation is assumed to be similar to those that weren’t censored.
TODO: Link to article
Underlying Assumption: Independent Censoring. Violation of the assumption leads to: Biased estimate for observed outcomes.
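To make the adjustment concrete, here is a minimal pure-Python sketch of the Aalen-Johansen estimate of the primary event's cumulative incidence at a fixed horizon. It is an illustration written for this page, not rtichoke's implementation; the function name is hypothetical and outcomes use the encoding 0 censored, 1 primary event, 2 competing event.

```python
def aalen_johansen(times, reals, horizon):
    """Cumulative incidence of the primary event (1) up to and including horizon."""
    data = sorted(zip(times, reals))
    at_risk = len(data)
    event_free = 1.0  # probability of no event of any type so far
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        n_primary = n_competing = n_censored = 0
        # Count everything that happens at this distinct time point.
        while i < len(data) and data[i][0] == t:
            status = data[i][1]
            if status == 1:
                n_primary += 1
            elif status == 2:
                n_competing += 1
            else:
                n_censored += 1
            i += 1
        cif += event_free * n_primary / at_risk
        event_free *= 1 - (n_primary + n_competing) / at_risk
        at_risk -= n_primary + n_competing + n_censored
    return cif


# One censored observation at time 2 shifts weight to the later event.
print(aalen_johansen(times=[1, 2, 3, 4], reals=[1, 0, 1, 0], horizon=4))  # 0.625
```

Without censoring before the horizon the result equals the simple proportion of primary events, which is a quick sanity check for the sketch.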
✍️ Declare Heuristics Regarding Competing Events 📅💀
The competing_heuristic argument is designed for the user to choose how to interpret competing events.
Performance Validation in the face of competing observations requires assumptions regarding the unobserved followup.
TODO: add link to nan-van-geloven article
Exclude Competing Events
graph LR
subgraph adj[Adjusted for Censoring]
S0["Non Event<br>0 🤨 / 🤬"] -->S1["Primary Event<br>1 🤢"]
end
S0 -->S2["Competing Event<br>2 💀"]
classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
class S0 nonEvent
class S1 primaryEvent
class S2 competingEvent
linkStyle 0 stroke:#333
style adj fill:#E3F09B,color:black
All competing events to be excluded.
Underlying Assumption: A small number of competing events. Violation of the assumption leads to: Overestimation of the observed outcomes. A competing event means that the primary event cannot happen.
Adjust Competing Events as Censored (partially seen Non-Event)
Check
graph LR
subgraph adj[Adjusted for Censoring]
direction LR
S0["Non Event<br>0 🤨 / 🤬<br><br> Competing Event<br>2 💀"] -->S1["Primary Event<br>1 🤢"]
end
classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
class S0 nonEvent
class S1 primaryEvent
class S2 competingEvent
style adj fill:#E3F09B,color:black
linkStyle 0 stroke:#333
All competing events to be treated as censored.
Underlying Assumption: We consider a patient experiencing a competing event as equivalent to an independently censored patient. Violation of the assumption leads to: Overestimation of the observed outcomes. A competing event means that the primary event cannot happen.
Adjust Competing Events as Competing
All competing events to be treated as events that compete with the primary event of interest.
In a way, a patient experiencing a competing event is “more” of a “real-negative” than a conventional “real-negative”.
This is derived from the assumed state convention.
Beyond the horizon time the following transition is possible: Real Negatives 🤨 => Real Positives 🤢 💀 2
graph LR
subgraph adj[Adjusted for Censoring]
direction LR
S0["Non Event<br>0 🤨"] -->S1["Primary Event<br>1 🤢"]
S0 -->S2["Competing Event<br>2 💀"]
end
classDef nonEvent fill:#E0E0E0,stroke:#333,stroke-width:1px,color:black
classDef primaryEvent fill:#808080,stroke:#333,stroke-width:1px,color:white
classDef competingEvent fill:#9DB4C0,stroke:#333,stroke-width:1px,color:black
class S0 nonEvent
class S1 primaryEvent
class S2 competingEvent
linkStyle 0 stroke:#333
style adj fill:#E3F09B,color:black
Underlying Assumption: We consider a patient experiencing a competing event as a definite non-event. Violation of the assumption leads to Underestimation of the observed outcomes if a competing event can be considered as a different form of the primary event.
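The three options for competing events amount to different recodings of the outcomes before estimation. A minimal sketch, in which the heuristic labels and the function name are hypothetical stand-ins rather than the exact values accepted by competing_heuristic:

```python
def apply_competing_heuristic(reals, heuristic):
    """Recode outcomes (0 non-event or censored, 1 primary, 2 competing)."""
    if heuristic == "excluded":
        # Drop every observation with a competing event.
        return [r for r in reals if r != 2]
    if heuristic == "adjusted_as_censored":
        # Treat the competing event as censoring at its observed time.
        return [0 if r == 2 else r for r in reals]
    if heuristic == "adjusted_as_competing":
        # Keep the competing event as its own state.
        return list(reals)
    raise ValueError(f"unknown heuristic: {heuristic!r}")


reals = [0, 1, 2, 0, 2]
print(apply_competing_heuristic(reals, "excluded"))               # [0, 1, 0]
print(apply_competing_heuristic(reals, "adjusted_as_censored"))   # [0, 1, 0, 0, 0]
print(apply_competing_heuristic(reals, "adjusted_as_competing"))  # [0, 1, 2, 0, 2]
```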
What does rtichoke do from now on?
Render Predictions Histogram
Extract AJ Estimate by Assumptions
For each required combination of reference_group x predictions_strata x fixed_time_horizons x censored_heuristic x competing_heuristic a separate AJ estimate is calculated for the adjusted reals and a Crude estimate is calculated for the excluded reals.
The sum of the AJ estimates for each predictions_strata is equal to the overall AJ estimate.
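The grid of combinations can be enumerated with `itertools.product`. The values below are hypothetical placeholders chosen only to show how quickly the number of separate estimates grows.

```python
from itertools import product

# Hypothetical values for each dimension of the grid.
reference_groups = ["thin", "full"]
predictions_strata = ["[0.0, 0.5)", "[0.5, 1.0]"]
fixed_time_horizons = [5, 10]
censored_heuristics = ["excluded", "adjusted"]
competing_heuristics = ["excluded", "adjusted_as_censored", "adjusted_as_competing"]

combinations = list(
    product(
        reference_groups,
        predictions_strata,
        fixed_time_horizons,
        censored_heuristics,
        competing_heuristics,
    )
)

print(len(combinations))  # 2 * 2 * 2 * 2 * 3 = 48
print(combinations[0])    # ('thin', '[0.0, 0.5)', 5, 'excluded', 'excluded')
```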