2 changes: 2 additions & 0 deletions misc/.gitignore
@@ -0,0 +1,2 @@
*_files/
*.html
129 changes: 129 additions & 0 deletions misc/design_other_outcomes.qmd
@@ -0,0 +1,129 @@
---
bibliography: references.bib
---

# Design doc for extending to other outcomes (count, survival, binary)

Currently, `rbmi` only supports continuous endpoints. However, multiple imputation is often also needed for other commonly used endpoints, in particular count, survival and binary endpoints. Both the name of the `rbmi` package and the structure of its workflow would also fit (reference-based) multiple imputation methods for these other endpoints.

## Methodology overview

Here is a first high-level overview of important references for the non-continuous endpoints:

- Count:
  - @keene2014 is the basic proposal for Bayesian multiple imputation for negative binomial count data, covering two time periods: before and after discontinuing the trial and treatment at the same time.
  - @roger2019 extends this to more than two time periods, in particular a third time period where patients are still continuing the trial, but have stopped treatment already. So this is the more general setting.
- Survival:
  - TODO
- Binary:
  - TODO

### Outcome structure

For the current longitudinal continuous outcome, the structure is that of multiple visits (i.e. preplanned time-points) with a continuous outcome at each visit. This is the same for binary outcomes (e.g. response, relapse, etc.), which are often also measured at the same time-points as the continuous outcomes.
Missing data occurs because of intercurrent events or missed visits, and the missingness is defined for each visit.

For count outcomes, the structure is different: In the best case where a patient starts treatment and continues it until the end of the trial, we have a single count at the end of the trial. If we have treatment discontinuation and trial drop-out, we can have a maximum of three time periods: (1) before discontinuing the trial and treatment, (2) after discontinuing the treatment but still in the trial, and (3) after discontinuing the trial and treatment.
After discontinuing the trial, we cannot observe the outcome any longer, and therefore we have missing data for the count outcome in period (3).

For survival outcomes, the structure is similar to that for count outcomes: Ideally we have a single survival time at the end of the trial (either the patient had an event before, or the patient is censored at the end of the trial). If we have treatment discontinuation and trial drop-out, we can again have a maximum of three time periods, as for count outcomes. At the time of trial discontinuation, the patient is immediately censored.

### Available imputation methods

For count outcomes, we focus on Bayesian multiple imputation for negative multinomial count data.

TODO survival and binary outcomes.

In the Bayesian multiple imputation methods, we can always use the `rbmi` workflow steps:

- Draw: Fit the initial imputation model via MCMC and draw parameters from the posterior distribution.
- Impute: Use the drawn parameters and the assumed model to impute the missing data for each parameter draw.
- Analyse: Analyse each of the imputed datasets with a (fast) analysis model, and save relevant estimators.
- Pool: Pool the estimators across the imputed datasets to get a single point estimate and confidence interval.

For `rbmi` this means that the other imputation methods, such as conditional mean imputation, are (for now) not relevant and therefore not supported.

## User interface generalization

### Data

- For count and survival outcomes, the same `expand_locf()` function can be used: Instead of the visit variable, we use a "period" or "phase" variable, which indicates whether the patient is in period (1), (2) or (3) as described above.
- For count outcomes, we additionally need a "count" variable, capturing the count for each of the three periods, as well as a "duration" variable capturing the duration of each period (e.g. in years).
- For survival outcomes, we additionally need two variables:
  - The "event" variable, a binary variable indicating whether the patient had an event in the time period or continued without an event (i.e. is censored if the next time period is not observable).
  - The "time" variable, which captures either the time to event in the period where there was an event, or the duration of the period if there was no event (i.e. censored).
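
To make the proposed layout concrete, here is a minimal sketch of how such long-format data could look before any expansion. All column names (`id`, `period`, `count`, `duration`, `event`, `time`) and all values are illustrative assumptions, not a fixed interface. The sketch also assumes that `time` is measured from the start of each period.

```{r}
#| eval: false
# Hypothetical count data: one row per patient and period.
# P1 stays on treatment until the end of the trial (1 year),
# P2 stops treatment after 0.4 years but stays in the trial,
# P3 stops treatment after 0.3 years and leaves the trial after 0.5 years.
count_data <- data.frame(
  id       = c("P1", "P2", "P2", "P3", "P3", "P3"),
  period   = c(1, 1, 2, 1, 2, 3),
  count    = c(3, 1, 2, 0, 1, NA),            # period 3 is never observed
  duration = c(1.0, 0.4, 0.6, 0.3, 0.2, 0.5)  # in years
)

# Hypothetical survival data with the same period structure.
# P2 has an event 0.25 years into period 2; P3 is censored at trial drop-out.
surv_data <- data.frame(
  id     = c("P1", "P2", "P2", "P3", "P3", "P3"),
  period = c(1, 1, 2, 1, 2, 3),
  event  = c(0, 0, 1, 0, 0, NA),
  time   = c(1.0, 0.4, 0.25, 0.3, 0.2, NA)
)
```

In this sketch the missing values to be imputed are exactly the period (3) rows.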

### Draws

The `draws()` function needs to take the following inputs:

- `outcome`: The outcome type, which defaults to "continuous" (current implementation), but can now also be "count", "survival" or "binary".
- `data`: The whole data as described above
- `data_ice`: The data with intercurrent events (ICE) time points (that lead to discontinuation of follow up) and imputation strategies, which include:
  - Missing At Random: "MAR"
  - Jump to Reference: "JR"
  - Copy Reference: "CR"
  - Copy Increments from Reference: "CIR"
  - Last Mean Carried Forward: "LMCF"
- `data_treatment_ice`: Only needed for count and survival outcomes, the data with treatment discontinuation time points and corresponding off-treatment imputation strategies.
  - TODO: We might need to be able to specify different strategies here
- `vars`: In addition to the existing variables, where we use `outcome` to capture counts and binary outcomes, we also need `event` and `time` for survival outcomes, and `duration` for count outcomes.
- `method`: For now we will only support `method_bayes()`, but could later support additional methods via corresponding `method_*()` constructors.
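
As a rough illustration of how the new `outcome` argument could be validated, and how it determines which extra variables are needed, consider the following sketch. The helpers `check_outcome()` and `extra_vars()` are hypothetical names introduced only for this example and are not part of `rbmi`.

```{r}
#| eval: false
# Hypothetical helper: validate the outcome type, defaulting to "continuous".
check_outcome <- function(outcome = c("continuous", "count", "survival", "binary")) {
  match.arg(outcome)
}

# Hypothetical helper: variables needed in addition to the existing ones.
extra_vars <- function(outcome) {
  switch(
    check_outcome(outcome),
    count = "duration",
    survival = c("event", "time"),
    character(0) # continuous and binary need no extra variables
  )
}
```

Because `match.arg()` fails for unknown values, an unsupported outcome type would be rejected early, before any model fitting starts.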

### Impute

The `impute()` function takes the `draws` object and the reference group mapping, and internally produces the imputed data sets.
For this to work correctly with different outcome types, the `draws` object needs to have a second (or a more specific) class indicating the outcome type, e.g. `draws_continuous`, `draws_count`, `draws_survival` and `draws_binary`. The `impute()` function can then dispatch on this second class to use the correct imputation method for the different outcome types.

Similarly, the result of the `impute()` call needs to have a second class indicating the outcome type, e.g. `imputation_continuous`, `imputation_count`, `imputation_survival` and `imputation_binary`. This allows the `analyse()` function to dispatch on the correct analysis method for the different outcome types.
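
The dispatch idea can be sketched with plain S3 classes. The names `new_draws()` and `impute_sketch()` are hypothetical and chosen so that they do not clash with the real `rbmi` functions. The sketch only shows the mechanism, not the actual imputation logic.

```{r}
#| eval: false
# Hypothetical constructor: put the outcome-specific class in front of "draws".
new_draws <- function(x, outcome) {
  structure(x, class = c(paste0("draws_", outcome), "draws"))
}

# Hypothetical generic standing in for the outcome-specific part of impute().
impute_sketch <- function(draws, ...) UseMethod("impute_sketch")

impute_sketch.draws_continuous <- function(draws, ...) "continuous imputation"
impute_sketch.draws_count <- function(draws, ...) "count imputation"

# Fallback for outcome types without a dedicated method.
impute_sketch.draws <- function(draws, ...) {
  stop("no imputation method for class ", class(draws)[1])
}
```

The same pattern would apply to the `imputation_*` classes used by `analyse()`.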

### Analyse

In the last step of the workflow, the user-chosen analysis model is applied to each of the imputed datasets, and the relevant estimators are saved for pooling. Because we can dispatch on the outcome type, the supported arguments might differ slightly between outcome types.
For example, we will need to check whether a delta adjustment makes sense for the non-normal outcomes.

In addition, `rbmi` can provide the standard analysis functions for the different outcome types. For example, for count outcomes we can provide a `neg_bin_regression()` function, for survival outcomes a `cox_regression()` function, and for binary outcomes a `logistic_regression()` function.

### Pool

Here we just want to use Rubin's rules to pool the Bayesian multiple imputation results across the imputed datasets.
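
For reference, Rubin's rules for a scalar estimate can be sketched as follows. The function name `pool_rubin()` and its arguments are assumptions made for this illustration. It is not the existing `rbmi` pooling code and leaves out the degrees of freedom.

```{r}
#| eval: false
# Sketch of Rubin's rules for a scalar parameter.
# `est` and `se` hold the estimates and standard errors from the M imputed datasets.
pool_rubin <- function(est, se) {
  stopifnot(length(est) == length(se), length(est) > 1)
  m <- length(est)
  within_var <- mean(se^2) # average within-imputation variance
  between_var <- var(est)  # between-imputation variance
  total_var <- within_var + (1 + 1 / m) * between_var
  list(
    est = mean(est),
    se = sqrt(total_var),
    within_var = within_var,
    between_var = between_var
  )
}
```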

We might want to add an argument such as `df` to define how the degrees of freedom are calculated, because SAS uses a different method and we might want to be able to reproduce that with `rbmi` if needed.

## Internal structure adaptation

### Draws

The `draws()` function needs to be adapted as described above for the user interface. Internally, this means that both the generic function and the `draws.bayes()` method need to be changed, while the other methods can stay almost as is (they should just assert that the outcome type is "continuous").

Internally, `draws.bayes()` can then dispatch on the outcome type for the correct code path, keeping the current behavior as is for the "continuous" outcome type. In particular, the `fit_mcmc()` function also needs to dispatch on the different outcome types, to fit the correct imputation model and draw parameters from the posterior distribution: Instead of calling `fit_mmrm()` for the continuous outcome type, we can, for example, call `fit_neg_multinom()` for the count outcome type. However, many of the logistics around the MCMC fitting and parameter drawing can be kept the same across the different outcome types, so we can reuse a lot of code here.

The new fitting functions would use new Stan code from `inst/stan/` for the different outcome types, which would be added to the package.

The post-processing also differs slightly depending on the outcome type. Finally, the `as_draws()` constructor now also attaches the additional class for the outcome type, as described above.

### Impute

For the `impute()` function, we primarily need to change the `impute.random()` method, which needs to dispatch on the outcome type for the correct code path. This means changing `impute_internal()` and the workhorse `impute_data_individual()`, which currently calls `impute_outcome()`, which in turn generates multivariate normal samples. In contrast, for the count outcome type, for example, we would need to generate negative binomial samples at the end.
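
As a minimal sketch of that last sampling step, assume the usual gamma-Poisson representation of the negative binomial, where a patient-level gamma frailty with shape and rate both equal to `shape` multiplies the expected count. Conditional on the observed count, the count in the unobserved period is then again negative binomial. The function `impute_count()` and its arguments are hypothetical. How `mu_mis` is derived from the drawn parameters and the chosen imputation strategy is deliberately left open here.

```{r}
#| eval: false
# Hypothetical sketch: draw the count for the unobserved period.
# y_obs:  observed count in the observed period(s)
# mu_obs: expected count in the observed period(s) under the drawn parameters
# mu_mis: expected count in the unobserved period under the imputation strategy
# shape:  gamma frailty shape (inverse of the negative binomial dispersion)
impute_count <- function(y_obs, mu_obs, mu_mis, shape) {
  # Frailty posterior given y_obs is Gamma(shape + y_obs, rate = shape + mu_obs).
  size_post <- shape + y_obs
  mu_post <- mu_mis * size_post / (shape + mu_obs)
  rnbinom(length(y_obs), size = size_post, mu = mu_post)
}
```

Patients with more observed events than expected get a higher imputed mean, which is the main difference from sampling the unobserved period independently.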

For both the `draws()` and `impute()` steps, we will need to make sure we use the two data sets for intercurrent events and treatment discontinuation correctly for count and survival outcomes.

### Analyse

Looking at the `analyse()` function's stack [here](https://github.com/openpharma/rbmi/blob/main/R/analyse.R), it seems that most of the code will need little adaptation. The dispatching on outcome type will matter only for small things like the initial input validation. The most important additions will be the outcome type specific analysis functions, supplementing the existing [ANCOVA](https://github.com/openpharma/rbmi/blob/main/R/ancova.R) function.

### Pool

There is not much to change here.
As mentioned above we might want to have a `df` argument to be able to optionally choose the simpler degrees of freedom calculation, along these lines (compare [here](https://documentation.sas.com/doc/en/statug/latest/statug_mianalyze_details08.htm)):

```{r}
#| eval: false
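# Assumed inputs: number of imputations `M`, within- and between-imputation
# variances `within_var` and `between_var`, pooled estimate `coef_hat` with
# standard error `coef_se`, and significance level `alpha`.
df_sas <- (M - 1) * (1 + within_var / ((1 + 1/M) * between_var))^2
df_sas
# Two-sided confidence interval based on the t distribution with `df_sas`.
t_crit_sas <- qt(1 - alpha/2, df_sas)
ci_lower_sas <- coef_hat - t_crit_sas * coef_se
ci_upper_sas <- coef_hat + t_crit_sas * coef_se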
```

## References
23 changes: 23 additions & 0 deletions misc/references.bib
@@ -0,0 +1,23 @@
@article{roger2019,
author = {Roger, James H. and Bratton, Daniel J. and Mayer, Bhabita and Abellan, Juan J. and Keene, Oliver N.},
title = {Treatment policy estimands for recurrent event data using data collected after cessation of randomised treatment},
journal = {Pharmaceutical Statistics},
volume = {18},
number = {1},
pages = {85-95},
keywords = {recurrent event, estimand, treatment policy, imputation, missing data},
doi = {10.1002/pst.1910},
year = {2019}
}

@article{keene2014,
author = {Keene, Oliver N. and Roger, James H. and Hartley, Benjamin F. and Kenward, Michael G.},
title = {Missing data sensitivity analysis for recurrent event data using controlled imputation},
journal = {Pharmaceutical Statistics},
volume = {13},
number = {4},
pages = {258-264},
keywords = {missing, sensitivity, recurrent event, exacerbation, multiple imputation, MNAR},
doi = {10.1002/pst.1624},
year = {2014}
}