calculate_pseudostates
calculate_pseudostates(times_and_reals, fixed_time_horizon)

Compute Aalen-Johansen pseudo-observations at a fixed time horizon.
This function computes jackknife pseudo-observations for state occupation probabilities (or related Aalen-Johansen estimands) evaluated at a fixed time horizon. Each pseudo-observation is constructed with a leave-one-out scheme:
.. math::

   \text{pseudo}_i = n \cdot \theta_{\text{full}} - (n - 1) \cdot \theta_{(-i)}

where :math:`n` is the number of individuals, :math:`\theta_{\text{full}}` is the Aalen-Johansen estimate based on the full sample, and :math:`\theta_{(-i)}` is the estimate obtained after removing individual :math:`i`.
The resulting pseudo-observations can be used as individual-level outcomes in regression models (e.g., GEE or GLM) to assess covariate effects on cumulative incidence or state occupation probabilities.
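The leave-one-out formula above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: `jackknife_pseudo_observations` and `proportion` are hypothetical helpers, and a simple proportion stands in for the Aalen-Johansen estimate (which is what it reduces to when there is no censoring).

```python
from typing import Callable, Sequence


def jackknife_pseudo_observations(
    sample: Sequence[float],
    estimator: Callable[[Sequence[float]], float],
) -> list[float]:
    """Return n * theta_full - (n - 1) * theta_minus_i for every index i."""
    n = len(sample)
    theta_full = estimator(sample)
    pseudo = []
    for i in range(n):
        # Leave observation i out and re-estimate on the remaining n - 1 values.
        reduced = [x for j, x in enumerate(sample) if j != i]
        theta_minus_i = estimator(reduced)
        pseudo.append(n * theta_full - (n - 1) * theta_minus_i)
    return pseudo


def proportion(values: Sequence[float]) -> float:
    """Hypothetical stand-in estimator: share of individuals in the state."""
    return sum(values) / len(values)


# Assumed toy data: 1 = individual occupies the state at the horizon, 0 = not.
occupied = [1, 0, 0, 1, 1]
pseudo = jackknife_pseudo_observations(occupied, proportion)
mean_pseudo = sum(pseudo) / len(pseudo)
```

For this uncensored toy sample each pseudo-observation equals the individual's own 0/1 indicator (up to floating-point error), and their mean equals the full-sample proportion. Values of this kind are what would be passed as outcomes to a GEE or GLM.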
Parameters
| Name | Type | Description | Default |
|---|---|---|---|
| times_and_reals | pl.DataFrame | Input data containing individual event times and realized states. Must be compatible with prepare_event_table and predict_aj_estimates. Each row corresponds to a single individual. | required |
| fixed_time_horizon | int | Time point at which the Aalen Johansen estimates are evaluated. | required |
Returns
| Name | Type | Description |
|---|---|---|
| | pl.DataFrame | A Polars DataFrame containing pseudo-observations for each individual. The output includes: (1) identifier columns (e.g., state, time, horizon) copied from the full-sample Aalen-Johansen estimate; (2) numeric columns containing the pseudo-observations; (3) a row_id column indicating the index of the left-out observation. For each numeric column, the mean of the pseudo-observations equals the corresponding full-sample Aalen-Johansen estimate. |
See Also
- polarstate.prepare_event_table : Convert raw data into an event table.
- polarstate.predict_aj_estimates : Predict Aalen-Johansen estimates.
Notes
- This implementation uses an explicit leave-one-out loop and therefore has time complexity O(n²). It is intended for methodological work, simulations, or moderate sample sizes.
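- In the absence of censoring, each pseudo-observation reduces to the individual's own empirical contribution; for a state occupation probability, this is the 0/1 indicator that the individual occupies the state at the time horizon.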
- Non-numeric columns (e.g., state labels) are treated as identifiers and are not transformed.
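The first two notes can be illustrated with a toy, pure-Python Aalen-Johansen cumulative incidence estimator wrapped in an explicit leave-one-out loop. Everything here is an assumption made for illustration: `aj_cumulative_incidence` and `leave_one_out_pseudo` are hypothetical helpers, the coding `0 = censored` is chosen for this sketch only, and none of it reflects how the library computes its estimates.

```python
def aj_cumulative_incidence(times, states, horizon, cause):
    """Toy Aalen-Johansen cumulative incidence of `cause` at `horizon`.

    Assumed coding for this sketch: state 0 = censored, 1, 2, ... = event types.
    """
    event_times = sorted(
        {t for t, s in zip(times, states) if s != 0 and t <= horizon}
    )
    surv = 1.0  # all-cause survival just before the current event time
    cif = 0.0
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        d_cause = sum(1 for u, s in zip(times, states) if u == t and s == cause)
        d_any = sum(1 for u, s in zip(times, states) if u == t and s != 0)
        cif += surv * d_cause / at_risk
        surv *= 1.0 - d_any / at_risk
    return cif


def leave_one_out_pseudo(times, states, horizon, cause):
    """Explicit leave-one-out loop: one full re-estimation per individual."""
    n = len(times)
    full = aj_cumulative_incidence(times, states, horizon, cause)
    pseudo = []
    for i in range(n):
        t_rest = times[:i] + times[i + 1:]
        s_rest = states[:i] + states[i + 1:]
        loo = aj_cumulative_incidence(t_rest, s_rest, horizon, cause)
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo


times = [1, 2, 3, 4, 6]

# No censoring: pseudo-observations reduce to 0/1 indicators of
# "event of type 1 at or before the horizon".
uncensored_states = [1, 2, 1, 1, 2]
pseudo_uncensored = leave_one_out_pseudo(times, uncensored_states, 5, 1)

# With censoring (state 0): values are no longer restricted to [0, 1].
censored_states = [1, 0, 1, 2, 0]
pseudo_censored = leave_one_out_pseudo(times, censored_states, 5, 1)
full_censored = aj_cumulative_incidence(times, censored_states, 5, 1)
```

On the uncensored toy data the result matches the indicators `[1, 0, 1, 1, 0]` up to floating-point error. On the censored toy data some pseudo-observations fall below 0 or above 1, while their mean still matches the full-sample estimate. Each call to `leave_one_out_pseudo` re-runs the estimator once per individual, which is why the cost grows at least quadratically with the sample size.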
References
.. [1] Andersen, P. K., & Pohar Perme, M. (2010). Pseudo-observations in survival analysis. Statistical Methods in Medical Research, 19(1), 71–99.