Transporting treatment effects with missing attributes


The simultaneous availability of experimental and observational data to estimate a treatment effect is both an opportunity and a statistical challenge. Combining the information gathered from both data sources is a promising way to build upon the internal validity of randomized controlled trials (RCTs) and the greater external validity of observational data, but it raises methodological issues, especially because different sampling designs induce distributional shifts. We focus on transporting a causal effect estimated on an RCT onto a target population described by a set of covariates. Available methods such as inverse propensity weighting are not designed to handle missing values, which are nevertheless common in both data sources. Beyond coupling the assumptions for causal identifiability with those for the missing-values mechanism and defining appropriate strategies, one has to account for the specific structure of the data: two sources, with treatment and outcome available only in the RCT. We study different approaches and their underlying assumptions on the data-generating processes and the distribution of missing values, and we suggest several adapted methods, in particular multiple imputation strategies. These methods are assessed in an extensive simulation study, and practical guidelines are provided for different scenarios. This work is motivated by the analysis of a large registry of over 20,000 major trauma patients and a multicenter RCT studying the effect of tranexamic acid administration on mortality. The analysis illustrates how the handling of missing values can affect the conclusion about the effect transported from the RCT to the target population.

Imke Mayer
PhD, Research Scientist in Statistics and Causal Inference