calculate_pseudostates

calculate_pseudostates(times_and_reals, fixed_time_horizon)

Compute Aalen-Johansen pseudo-observations at a fixed time horizon.

This function computes jackknife pseudo-observations for state occupation probabilities (or related Aalen-Johansen estimands) evaluated at a fixed time horizon. Each pseudo-observation is built by a leave-one-out scheme that compares the full-sample estimate with the estimate obtained when one individual is removed:

.. math::

   \text{pseudo}_i = n \cdot \theta_{\text{full}} - (n - 1) \cdot \theta_{(-i)}

where :math:`n` is the number of individuals, :math:`\theta_{\text{full}}` is the Aalen-Johansen estimate based on the full sample, and :math:`\theta_{(-i)}` is the estimate obtained after removing individual :math:`i`.
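The leave-one-out formula can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: the helper ``jackknife_pseudo_observations``, the toy event times, and the horizon are all hypothetical, and a simple empirical proportion on fully observed data stands in for the Aalen-Johansen estimator.

```python
from statistics import fmean


def jackknife_pseudo_observations(sample, estimator):
    """Return n * theta_full - (n - 1) * theta_(-i) for every index i."""
    n = len(sample)
    theta_full = estimator(sample)
    pseudo = []
    for i in range(n):
        # Leave individual i out and re-estimate on the remaining n - 1.
        reduced = sample[:i] + sample[i + 1:]
        pseudo.append(n * theta_full - (n - 1) * estimator(reduced))
    return pseudo


# Hypothetical, fully observed event times (no censoring).
event_times = [2, 5, 7, 11, 13]
horizon = 6  # hypothetical fixed time horizon


def proportion_by_horizon(times):
    # Stand-in estimator: empirical probability of an event by the horizon.
    return fmean(1.0 if t <= horizon else 0.0 for t in times)


pseudo = jackknife_pseudo_observations(event_times, proportion_by_horizon)
theta_full = proportion_by_horizon(event_times)

# For this stand-in estimator the pseudo-observations average back to
# the full-sample estimate.
print(pseudo, fmean(pseudo), theta_full)
```

Passing the estimator as a function keeps the jackknife step separate from the quantity being estimated, so the same loop would apply to any estimator evaluated at a fixed horizon.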

The resulting pseudo-observations can be used as individual-level outcomes in regression models (e.g., GEE or GLM) to assess covariate effects on cumulative incidence or state occupation probabilities.

Parameters

.. list-table::
   :header-rows: 1

   * - Name
     - Type
     - Description
     - Default
   * - times_and_reals
     - pl.DataFrame
     - Input data containing individual event times and realized states. Must be compatible with ``prepare_event_table`` and ``predict_aj_estimates``. Each row corresponds to a single individual.
     - required
   * - fixed_time_horizon
     - int
     - Time point at which the Aalen-Johansen estimates are evaluated.
     - required

Returns

.. list-table::
   :header-rows: 1

   * - Name
     - Type
     - Description
   * -
     - pl.DataFrame
     - A Polars DataFrame containing pseudo-observations for each individual. The output includes:

       - Identifier columns (e.g., state, time, horizon) copied from the full-sample Aalen-Johansen estimate.
       - Numeric columns containing the pseudo-observations.
       - A ``row_id`` column indicating the index of the left-out observation.

       For each numeric column, the mean of the pseudo-observations equals the corresponding full-sample Aalen-Johansen estimate.

See Also

polarstate.prepare_event_table : Convert raw data into an event table.
polarstate.predict_aj_estimates : Predict Aalen-Johansen estimates.

Notes

  • This implementation uses an explicit leave-one-out loop: the estimator is recomputed once per individual, so the running time grows roughly as O(n²) in the sample size n. It is intended for methodological work, simulations, or moderate sample sizes.
  • In the absence of censoring, the pseudo-observations reduce to the empirical individual contributions, that is, the indicator (0 or 1) of whether the individual occupies the state at the time horizon.
  • Non-numeric columns (e.g., state labels) are treated as identifiers and are not transformed.
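The first two notes can be checked on a toy competing-risks sample. The sketch below is a self-contained, standard-library illustration and does not use or reproduce the ``polarstate`` code: ``aalen_johansen_cif`` and ``pseudo_observations`` are hypothetical helpers, records are ``(time, state)`` pairs, and coding state ``0`` as censored is an assumption made for this example.

```python
from itertools import groupby


def aalen_johansen_cif(records, horizon, cause):
    """Aalen-Johansen cumulative incidence of `cause` at `horizon`.

    records: list of (time, state); state 0 = censored (assumed coding),
    positive integers = competing event types.
    """
    at_risk = len(records)
    event_free = 1.0  # probability of no event of any type just before t
    cif = 0.0
    for time, group in groupby(sorted(records), key=lambda r: r[0]):
        if time > horizon:
            break
        states = [state for _, state in group]
        d_cause = sum(1 for s in states if s == cause)
        d_any = sum(1 for s in states if s != 0)
        cif += event_free * d_cause / at_risk
        event_free *= 1.0 - d_any / at_risk
        at_risk -= len(states)  # events and censorings both leave the risk set
    return cif


def pseudo_observations(records, horizon, cause):
    # One full re-estimation per left-out individual: n refits in total.
    n = len(records)
    full = aalen_johansen_cif(records, horizon, cause)
    return [
        n * full
        - (n - 1) * aalen_johansen_cif(records[:i] + records[i + 1:], horizon, cause)
        for i in range(n)
    ]


# Without censoring, each pseudo-observation matches the 0/1 indicator of
# having experienced cause 1 by the horizon (up to floating-point error).
uncensored = [(2, 1), (3, 2), (5, 1), (8, 2), (9, 1)]
pseudo_uncensored = pseudo_observations(uncensored, horizon=6, cause=1)
indicators = [1.0 if (t <= 6 and s == 1) else 0.0 for t, s in uncensored]

# With censoring, pseudo-observations need not stay inside [0, 1].
censored = [(2, 1), (3, 0), (5, 1), (8, 2), (9, 0)]
pseudo_censored = pseudo_observations(censored, horizon=6, cause=1)
print(pseudo_uncensored, indicators, pseudo_censored)
```

Because every left-out sample triggers a complete pass over the data, the cost of this sketch also grows quadratically, which is why a loop of this kind suits moderate sample sizes.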

References

.. [1] Andersen, P. K., & Pohar Perme, M. (2010). Pseudo-observations in survival analysis. Statistical Methods in Medical Research, 19(1), 71–99.