openpharma · danielinteractive · May 25, 2026 · May 25, 2026
diff --git a/misc/.gitignore b/misc/.gitignore
@@ -0,0 +1,2 @@
+*_files/
+*.html
diff --git a/misc/design_other_outcomes.qmd b/misc/design_other_outcomes.qmd
@@ -0,0 +1,129 @@
+---
+bibliography: references.bib
+---
+
+# Design doc for extending to other outcomes (count, survival, binary)
+
+Currently, `rbmi` only supports continuous endpoints. However, multiple imputation is often also needed for other commonly used endpoints, in particular count, survival and binary endpoints. Both the name of the `rbmi` package and the structure of its workflow would also fit (reference-based) multiple imputation methods for these other endpoints.
+
+## Methodology overview
+
+Here is a first high-level overview of important references for the non-continuous endpoints:
+
+- Count:
+  - @keene2014 is the basic proposal for Bayesian multiple imputation for negative binomial count data, covering two time periods: before and after discontinuing the trial and treatment at the same time.
+  - @roger2019 extends this to more than two time periods, in particular a third time period where patients are still continuing the trial, but have stopped treatment already. So this is the more general setting.
+- Survival:
+  - TODO  
+- Binary: 
+  - TODO
+
+### Outcome structure
+
+For the current longitudinal continuous outcome, the structure is that of multiple visits (i.e. preplanned time-points) with a continuous outcome at each visit. This is the same for binary outcomes (e.g. response, relapse, etc.), which are often also measured at the same time-points as the continuous outcomes.
+Missing data occurs because of intercurrent events or missed visits, and the missingness is defined for each visit.
+
+For count outcomes, the structure is different: In the best case where a patient starts treatment and continues it until the end of the trial, we have a single count at the end of the trial. If we have treatment discontinuation and trial drop-out, we can have a maximum of three time periods: (1) before discontinuing the trial and treatment, (2) after discontinuing the treatment but still in the trial, and (3) after discontinuing the trial and treatment.
+After discontinuing the trial, we cannot observe the outcome any longer, and therefore we have missing data for the count outcome in period (3).
+
+For survival outcomes, the structure is similar to that for count outcomes: Ideally we have a single survival time at the end of the trial (either the patient had an event before, or the patient is censored at the end of the trial). If we have treatment discontinuation and trial drop-out, we can again have a maximum of three time periods as for count outcomes. At the time of trial discontinuation, the patient is censored immediately.
+
+### Available imputation methods
+
+For count outcomes, we focus on Bayesian multiple imputation for negative multinomial count data.
+
+TODO survival and binary outcomes.
+
+In the Bayesian multiple imputation methods, we can always use the `rbmi` workflow steps:
+
+- Draw: Fit the initial imputation model via MCMC and draw parameters from the posterior distribution.
+- Impute: Use the drawn parameters and the assumed model to impute the missing data for each parameter draw.
+- Analyse: Analyse each of the imputed datasets with a (fast) analysis model, and save relevant estimators.
+- Pool: Pool the estimators across the imputed datasets to get a single point estimate and confidence interval.
+
+For `rbmi` this means that the other imputation methods, such as conditional mean imputation, are (for now) not relevant and therefore not supported.
+
+## User interface generalization
+
+### Data
+
+- For count and survival outcomes, the same `expand_locf()` function can be used: Instead of the visit variable, we use a "period" or "phase" variable, which indicates whether the patient is in period (1), (2) or (3) as described above. 
+- For count outcomes, we then additionally have a "count" variable, capturing the counts for each of the three periods, as well as a "duration" variable capturing the duration of each period (e.g. in years).
+- For survival outcomes, we need two additional variables:
+  - The "event" variable, which is a binary variable indicating whether the patient had an event in the time period or continued without an event (i.e. censored if the next time period is not observable).
+  - The "time" variable, which captures either the time to event in the period where there was an event, or the duration of the period if there was no event (i.e. censored).
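+
+As a rough illustration of the count data layout described above, the following sketch builds a small long-format data set with one row per patient and period. The column names (`id`, `period`, `count`, `duration`) and all values are hypothetical and only serve to show the shape of the data; the count in period (3) is missing because the patient has left the trial.
+
+```{r}
+#| eval: false
+# Hypothetical long-format count data: one row per patient and period.
+dat_count <- data.frame(
+  id = factor(c("pt1", "pt2", "pt2", "pt2")),
+  period = factor(c(1, 1, 2, 3), levels = 1:3),
+  count = c(4L, 1L, 2L, NA),       # period (3) is unobserved
+  duration = c(1.0, 0.4, 0.3, 0.3) # in years; periods sum to 1 per patient
+)
+# Observed event rate per year in each observed period.
+dat_count$rate <- dat_count$count / dat_count$duration
+dat_count
+```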
+
+### Draws
+
+The `draws()` function needs to take the following inputs:
+
+- `outcome`: The outcome type, which defaults to "continuous" (current implementation), but can now also be "count", "survival" or "binary". 
+- `data`: The whole data as described above
+- `data_ice`: The data with intercurrent events (ICE) time points (that lead to discontinuation of follow up) and imputation strategies, which include:
+  - Missing At Random: "MAR"
+  - Jump to Reference: "JR"
+  - Copy Reference: "CR"
+  - Copy Increments from Reference: "CIR"
+  - Last Mean Carried Forward: "LMCF"
+- `data_treatment_ice`: Only needed for count and survival outcomes, the data with treatment discontinuation time points and corresponding off-treatment imputation strategies.
+  - TODO: We might need to be able to specify different strategies here
+- `vars`: In addition to the existing variables, where we use `outcome` to capture counts and binary outcomes, we also need `event` and `time` for survival outcomes, and `duration` for count outcomes.
+- `method`: For now we will only support `method_bayes()`, but could later support additional methods via corresponding `method_*()` constructors.
+
+### Impute
+
+The `impute()` function takes the `draws` object and the reference group mapping, and internally produces the imputed data sets.
+For this to work correctly with different outcome types, the `draws` object needs to have a second (or a more specific) class indicating the outcome type, e.g. `draws_continuous`, `draws_count`, `draws_survival` and `draws_binary`. The `impute()` function can then dispatch on this second class to use the correct imputation method for the different outcome types.
+
+Similarly, the result of the `impute()` call needs to have a second class indicating the outcome type, e.g. `imputation_continuous`, `imputation_count`, `imputation_survival` and `imputation_binary`. This allows the `analyse()` function to dispatch on the correct analysis method for the different outcome types.
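+
+The dispatch on the outcome-specific class could be sketched as follows. All names here (`new_draws_sketch()`, `impute_sketch()`) are hypothetical stand-ins and not part of the `rbmi` API; the sketch only illustrates how an outcome-specific class placed in front of the general `draws` class selects the method.
+
+```{r}
+#| eval: false
+# Hypothetical constructor: outcome-specific class first, general class second.
+new_draws_sketch <- function(samples, outcome = "continuous") {
+  structure(
+    list(samples = samples),
+    class = c(paste0("draws_", outcome), "draws")
+  )
+}
+
+# Hypothetical generic with one method per outcome type.
+impute_sketch <- function(draws, ...) UseMethod("impute_sketch")
+impute_sketch.draws_continuous <- function(draws, ...) "multivariate normal imputation"
+impute_sketch.draws_count <- function(draws, ...) "negative binomial imputation"
+# Fallback for outcome types without a dedicated method.
+impute_sketch.draws <- function(draws, ...) stop("outcome type not supported")
+
+impute_sketch(new_draws_sketch(list(), outcome = "count"))
+```
+
+The same pattern would apply to the `imputation_*` classes that `analyse()` dispatches on.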
+
+### Analyse
+
+In the last step of the workflow, the user-chosen analysis model is applied to each of the imputed datasets, and the relevant estimators are saved for pooling. Because we can dispatch on the outcome type, we might support slightly different arguments here.
+For example, we will need to check whether a delta adjustment makes sense for the non-normal outcomes.
+
+In addition, `rbmi` can provide the standard analysis functions for the different outcome types. For example, for count outcomes we can provide a `neg_bin_regression()` function, for survival outcomes a `cox_regression()` function, and for binary outcomes a `logistic_regression()` function.
+
+### Pool
+
+Here we just want to use Rubin's rules to pool the Bayesian multiple imputation results across the imputed datasets. 
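+
+For reference, with $M$ imputed datasets, point estimates $\hat{Q}_m$ and variance estimates $U_m$ for $m = 1, \dots, M$, Rubin's rules combine them as follows (the symbols are notation chosen here, not taken from the `rbmi` code):
+
+$$
+\bar{Q} = \frac{1}{M} \sum_{m=1}^{M} \hat{Q}_m, \quad
+W = \frac{1}{M} \sum_{m=1}^{M} U_m, \quad
+B = \frac{1}{M - 1} \sum_{m=1}^{M} (\hat{Q}_m - \bar{Q})^2, \quad
+T = W + \left(1 + \frac{1}{M}\right) B
+$$
+
+Here $\bar{Q}$ is the pooled point estimate, $W$ and $B$ are the within- and between-imputation variances, and $T$ is the total variance used for the confidence interval.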
+
+We might want to add an argument such as `df` to define how the degrees of freedom are calculated, because SAS uses a different method and we might want to be able to reproduce that with `rbmi` if needed.
+
+## Internal structure adaptation
+
+### Draws
+
+The `draws()` function needs to be adapted as described above for the user interface. Internally, this means both the generic function as well as the `draws.bayes()` method need to be changed, while the other methods can almost stay as is (they should just assert that the outcome type is "continuous").
+
+Internally, `draws.bayes()` can then dispatch on the outcome type for the correct code path, keeping the current behavior as is for the "continuous" outcome type. In particular, the `fit_mcmc()` function also needs to dispatch on the different outcome types, to fit the correct imputation model and draw parameters from the posterior distribution: Instead of calling `fit_mmrm()` for the continuous outcome type, we can for example call `fit_neg_multinom()` for the count outcome type. However, many of the logistics around the MCMC fitting and parameter drawing can be kept the same across the different outcome types, so we can reuse a lot of code here.
+
+The new fitting functions would use new Stan code from `inst/stan/` for the different outcome types, which would be added to the package. 
+
+Also the post-processing is slightly different depending on outcome. Finally, the `as_draws()` constructor now also attaches the additional class for the outcome type, as described above.
+
+### Impute
+
+For the `impute()` function, we primarily need to change the `impute.random()` method, which needs to dispatch on the outcome type for the correct code path. This means changing `impute_internal()` and the workhorse `impute_data_individual()`, which currently calls `impute_outcome()`, which in turn generates multivariate normal samples. In contrast, for the count outcome type, for example, we would need to generate negative binomial samples at the end.
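+
+To illustrate the last point, here is a minimal sketch of drawing the count for an unobserved period given the count observed earlier. It assumes a Poisson model with a patient-level gamma frailty shared across periods, which is one common way to obtain correlated negative binomial counts and may differ from the model finally implemented. The function name and arguments (`impute_count_sketch()`, `y_obs`, `mu_obs`, `mu_mis`, `k`) are hypothetical.
+
+```{r}
+#| eval: false
+# Hypothetical helper: draw the count for an unobserved period given the observed count.
+# Assumes y | g ~ Poisson(g * mu) in each period, with a shared frailty
+# g ~ Gamma(shape = 1 / k, rate = 1 / k), where k is the dispersion parameter.
+impute_count_sketch <- function(n, y_obs, mu_obs, mu_mis, k) {
+  # Posterior of the frailty given y_obs is Gamma(shape_post, rate_post).
+  shape_post <- 1 / k + y_obs
+  rate_post <- 1 / k + mu_obs
+  # Mixing Poisson(g * mu_mis) over this gamma gives a negative binomial.
+  rnbinom(n, size = shape_post, mu = mu_mis * shape_post / rate_post)
+}
+
+set.seed(123)
+# mu_obs and mu_mis are expected counts, i.e. rate times period duration.
+y_mis <- impute_count_sketch(n = 5, y_obs = 3, mu_obs = 1.2 * 0.5, mu_mis = 0.9 * 0.5, k = 0.8)
+y_mis
+```
+
+Under such a model, a reference-based strategy would enter through `mu_mis`, e.g. by basing it on the reference arm's rate for the given parameter draw.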
+
+For both the `draws()` and `impute()` steps, we will need to make sure we use the two data sets for intercurrent events and treatment discontinuation correctly for count and survival outcomes. 
+
+### Analyse
+
+Looking at the `analyse()` function's stack [here](https://github.com/openpharma/rbmi/blob/main/R/analyse.R), it seems that most of the code will need little adaptation. The dispatching on outcome type will only matter for small things like the initial input validation. The most important additions will be the outcome-type-specific analysis functions, supplementing the existing [ANCOVA](https://github.com/openpharma/rbmi/blob/main/R/ancova.R) function.
+
+### Pool
+
+There is not much to change here.
+As mentioned above we might want to have a `df` argument to be able to optionally choose the simpler degrees of freedom calculation, along these lines (compare [here](https://documentation.sas.com/doc/en/statug/latest/statug_mianalyze_details08.htm)):
+
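+```{r}
+#| eval: false
+# M: number of imputations; within_var, between_var: Rubin's rules variance components.
+df_sas <- (M - 1) * (1 + within_var / ((1 + 1 / M) * between_var))^2
+df_sas
+# coef_hat, coef_se: pooled estimate and standard error; alpha: two-sided significance level.
+t_crit_sas <- qt(1 - alpha / 2, df_sas)
+ci_lower_sas <- coef_hat - t_crit_sas * coef_se
+ci_upper_sas <- coef_hat + t_crit_sas * coef_se
+```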
+
+## References
diff --git a/misc/references.bib b/misc/references.bib
@@ -0,0 +1,23 @@
+@article{roger2019,
+author = {Roger, James H. and Bratton, Daniel J. and Mayer, Bhabita and Abellan, Juan J. and Keene, Oliver N.},
+title = {Treatment policy estimands for recurrent event data using data collected after cessation of randomised treatment},
+journal = {Pharmaceutical Statistics},
+volume = {18},
+number = {1},
+pages = {85-95},
+keywords = {recurrent event, estimand, treatment policy, imputation, missing data},
+doi = {10.1002/pst.1910},
+year = {2019}
+}
+
+@article{keene2014,
+author = {Keene, Oliver N. and Roger, James H. and Hartley, Benjamin F. and Kenward, Michael G.},
+title = {Missing data sensitivity analysis for recurrent event data using controlled imputation},
+journal = {Pharmaceutical Statistics},
+volume = {13},
+number = {4},
+pages = {258-264},
+keywords = {missing, sensitivity, recurrent event, exacerbation, multiple imputation, MNAR},
+doi = {10.1002/pst.1624},
+year = {2014}
+}