Selection Modelling Methods


Ackerman, B., Lesko, C. R., Siddique, J., Susukida, R., & Stuart, E. A. (2021). Generalizing randomized trial findings to a target population using complex survey population data. Statistics in Medicine, 40(5), 1101–1120.

  • Randomized trials are considered the gold standard for estimating causal effects. Trial findings are often used to inform policy and programming efforts, yet their results may not generalize well to a relevant target population due to potential differences in effect moderators between the trial and population. Statistical methods have been developed to improve generalizability by combining trial and population data, and weighting the trial to resemble the population on baseline covariates. Large-scale surveys in fields such as health and education with complex survey designs are a logical source for population data; however, there is currently no best practice for incorporating survey weights when generalizing trial findings to a population represented by a complex survey sample. We propose and investigate ways to incorporate survey weights in this context.
  • We examine the performance of our proposed estimator through simulations in comparison to estimators that ignore the complex survey design. We then apply the methods to generalize findings from two trials—a lifestyle intervention for blood pressure reduction and a web-based intervention to treat substance use disorders—to their respective target populations using population data from complex surveys.
  • The work highlights the importance of properly accounting for the complex survey design when generalizing trial findings to a population represented by a complex survey sample; a simple weighting sketch in this spirit follows below.
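
As a minimal sketch of the general strategy (not the paper's exact estimator), one might fit a trial-participation model on the stacked trial and survey data, carrying the survey's design weights into the fit, and then weight the trial arms by the resulting inverse odds of participation. The column names (survey_weight, treatment, y) and the simple weighted difference-in-means step below are our assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def survey_weighted_generalized_ate(trial, survey, covariates):
    """Weight trial units so their covariate mix resembles the survey-represented population."""
    X = pd.concat([trial[covariates], survey[covariates]], ignore_index=True).to_numpy()
    S = np.r_[np.ones(len(trial)), np.zeros(len(survey))]        # 1 = trial, 0 = survey
    # Trial rows get weight 1; survey rows keep their design weights in the fit.
    fit_w = np.r_[np.ones(len(trial)), survey["survey_weight"].to_numpy()]
    model = LogisticRegression(max_iter=1000).fit(X, S, sample_weight=fit_w)
    p = model.predict_proba(trial[covariates].to_numpy())[:, 1]
    w = (1.0 - p) / p                                             # inverse odds of participation
    y, a = trial["y"].to_numpy(), trial["treatment"].to_numpy()
    return (np.average(y[a == 1], weights=w[a == 1])
            - np.average(y[a == 0], weights=w[a == 0]))
```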

Chan, W. (2018). Applications of small area estimation to generalization with subclassification by propensity scores. Journal of Educational and Behavioral Statistics, 43(2), 182–224.

  • Policymakers have grown increasingly interested in how experimental results may generalize to a larger population. However, recently developed propensity score-based methods are limited by small sample sizes, where the experimental study is generalized to a population that is at least 20 times larger. This is particularly problematic for methods such as subclassification by propensity score, where limited sample sizes lead to sparse strata.
  • This article explores the potential of small area estimation methods to improve the precision of estimators in sparse strata, using population data as a source of auxiliary information to borrow strength. Results from simulation studies identify the conditions under which small area estimators outperform conventional estimators and the limitations of this application to causal generalization studies.
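
A toy illustration of the "borrowing strength" idea in a sparse propensity-score stratum: a composite, small-area-style estimate that shrinks the noisy direct (within-stratum) effect toward a synthetic, model-based estimate built from population auxiliary data. The precision-weighted shrinkage rule shown here is a generic device, not the article's specific estimators.

```python
def composite_stratum_estimate(direct_est, direct_var, synthetic_est, synthetic_var):
    """Precision-weighted compromise between a direct and a synthetic stratum estimate."""
    # The noisier the direct estimate, the more weight shifts to the synthetic one.
    phi = synthetic_var / (direct_var + synthetic_var)   # weight on the direct estimate
    return phi * direct_est + (1.0 - phi) * synthetic_est
```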

Cole, S. R., & Stuart, E. A. (2010). Generalizing evidence from randomized clinical trials to target populations: The ACTG 320 trial. American Journal of Epidemiology, 172(1), 107–115.

  • Properly planned and conducted randomized clinical trials remain susceptible to a lack of external validity. The authors illustrate a model-based method to standardize observed trial results to a specified target population using a seminal human immunodeficiency virus (HIV) treatment trial, and they provide Monte Carlo simulation evidence supporting the method.
  • The example trial enrolled 1,156 HIV-infected adult men and women in the United States in 1996, randomly assigned 577 to a highly active antiretroviral therapy and 579 to a largely ineffective combination therapy, and followed participants for 52 weeks. The target population was US people infected with HIV in 2006, as estimated by the Centers for Disease Control and Prevention.
  • Results from the trial apply, albeit muted by 12%, to the target population, under the assumption that the authors have measured and correctly modeled the determinants of selection that reflect heterogeneity in the treatment effect. In simulations with a heterogeneous treatment effect, a conventional intent-to-treat estimate was biased with poor confidence limit coverage, but the proposed estimate was largely unbiased with appropriate confidence limit coverage. The proposed method standardizes observed trial results to a specified target population and thereby provides information regarding the generalizability of trial results.
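
One standard way to write such a standardization uses inverse-probability-of-selection weights. In generic notation (ours, not necessarily the paper's), with S = 1 indicating trial participation, A the randomized treatment, Y the outcome, and X the measured determinants of selection:

```latex
\[
  \widehat{W}_i = \frac{1}{\widehat{\Pr}(S_i = 1 \mid X_i)}, \qquad
  \widehat{\mu}_a = \frac{\sum_{i:\,S_i = 1} \mathbb{1}\{A_i = a\}\,\widehat{W}_i\,Y_i}
                         {\sum_{i:\,S_i = 1} \mathbb{1}\{A_i = a\}\,\widehat{W}_i}, \qquad
  \widehat{\Delta} = \widehat{\mu}_1 - \widehat{\mu}_0 .
\]
```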

Dahabreh, I. J., Robertson, S. E., Steingrimsson, J. A., Stuart, E. A., & Hernán, M. A. (2020). Extending inferences from a randomized trial to a new target population. Statistics in Medicine, 39(14), 1999–2014.

  • When treatment effect modifiers influence the decision to participate in a randomized trial, the average treatment effect in the population represented by the randomized individuals will differ from the effect in other populations.
  • In this tutorial, we consider methods for extending causal inferences about time-fixed treatments from a trial to a new target population of nonparticipants, using data from a completed randomized trial and baseline covariate data from a sample from the target population. We examine methods based on modeling the expectation of the outcome, the probability of participation, or both (doubly robust). We compare the methods in a simulation study and show how they can be implemented in software. We apply the methods to a randomized trial nested within a cohort of trial-eligible patients to compare coronary artery surgery plus medical therapy versus medical therapy alone for patients with chronic coronary artery disease. We conclude by discussing issues that arise when using the methods in applied analyses.
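
A compact sketch of the three estimator families the tutorial compares (outcome modeling, weighting by the probability of participation, and doubly robust), under our own simplifying assumptions: linear and logistic working models, a binary treatment, and a non-nested design in which X_target is a covariate sample from the target population of non-participants.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

def transported_ate(X_trial, a, y, X_target):
    """Return (outcome-model, weighting, doubly robust) ATE estimates for the target population."""
    # Outcome models fit within each trial arm, projected onto the target covariates.
    m1 = LinearRegression().fit(X_trial[a == 1], y[a == 1])
    m0 = LinearRegression().fit(X_trial[a == 0], y[a == 0])
    om = np.mean(m1.predict(X_target) - m0.predict(X_target))
    # Participation model: trial (S = 1) versus target sample (S = 0).
    X_all = np.vstack([X_trial, X_target])
    S = np.r_[np.ones(len(X_trial)), np.zeros(len(X_target))]
    p = LogisticRegression(max_iter=1000).fit(X_all, S).predict_proba(X_trial)[:, 1]
    w = (1 - p) / p                                   # inverse odds of participation
    ipw = (np.average(y[a == 1], weights=w[a == 1])
           - np.average(y[a == 0], weights=w[a == 0]))
    # Doubly robust: outcome-model term plus weighted residual corrections.
    corr1 = np.average(y[a == 1] - m1.predict(X_trial[a == 1]), weights=w[a == 1])
    corr0 = np.average(y[a == 0] - m0.predict(X_trial[a == 0]), weights=w[a == 0])
    return om, ipw, om + corr1 - corr0
```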

Dong, N., Stuart, E. A., Lenis, D., & Quynh Nguyen, T. (2020). Using propensity score analysis of survey data to estimate population average treatment effects: A case study comparing different methods. Evaluation Review, 44(1), 84–108.

  • Background: Many studies in psychological and educational research aim to estimate population average treatment effects (PATE) using data from large complex survey samples, and many of these studies use propensity score methods. Recent advances have investigated how to incorporate survey weights with propensity score methods. However, to this point, that work had not been well summarized, and it was not clear how much difference the different PATE estimation methods would make empirically. Purpose: The purpose of this study is to systematically summarize the appropriate use of survey weights in propensity score analysis of complex survey data and use a case study to empirically compare the PATE estimates using multiple analysis methods that include ordinary least squares regression, weighted least squares regression, and various propensity score applications.
  • Methods: We first summarize various propensity score methods that handle survey weights. We then demonstrate the performance of various analysis methods using a nationally representative data set, the Early Childhood Longitudinal Study–Kindergarten, to estimate the effects of preschool on children’s academic achievement. The correspondence of the results was evaluated using multiple criteria.
  • Results and Conclusions: It is important for researchers to think carefully about their estimand of interest and use methods appropriate for that estimand. If interest is in drawing inferences to the survey target population, it is important to take the survey weights into account, particularly in the outcome analysis stage for estimating the PATE. The case study, however, shows little difference among the various analysis methods in this applied example.

Kern, H. L., Stuart, E. A., Hill, J., & Green, D. P. (2016). Assessing methods for generalizing experimental impact estimates to target populations. Journal of Research on Educational Effectiveness, 9(1), 103–127.

  • Randomized experiments are considered the gold standard for causal inference because they can provide unbiased estimates of treatment effects for the experimental participants. However, researchers and policymakers are often interested in using a specific experiment to inform decisions about other target populations. In education research, increasing attention is being paid to the potential lack of generalizability of randomized experiments because the experimental participants may be unrepresentative of the target population of interest.
  • This article examines whether generalization may be assisted by statistical methods that adjust for observed differences between the experimental participants and members of a target population. The methods examined include approaches that reweight the experimental data so that participants more closely resemble the target population and methods that utilize models of the outcome. Two simulation studies and one empirical analysis investigate and compare the methods’ performance. One simulation uses purely simulated data while the other utilizes data from an evaluation of a school-based dropout prevention program.
  • Our simulations suggest that machine learning methods outperform regression-based methods when the required structural (ignorability) assumptions are satisfied. When these assumptions are violated, all of the methods examined perform poorly. Our empirical analysis uses data from a multisite experiment to assess how well results from a given site predict impacts in other sites. Using a variety of extrapolation methods, predicted effects for each site are compared to actual benchmarks. Flexible modeling approaches perform best, although linear regression is not far behind.
  • Taken together, these results suggest that flexible modeling techniques can aid generalization while underscoring the fact that even state-of-the-art statistical techniques still rely on strong assumptions.

O’Muircheartaigh, C., & Hedges, L. V. (2014). Generalizing from unrepresentative experiments: A stratified propensity score approach. Journal of the Royal Statistical Society: Series C (Applied Statistics), 63(2), 195–210.

  • The paper addresses means of generalizing from an experiment based on a non-probability sample to a population of interest and to subpopulations of interest, where information is available about relevant covariates in the whole population.
  • Using stratification based on propensity score matching with an external population-wide data set, an estimator of the population average treatment effect is constructed. An example is presented in which the applicability of a major education intervention in a non-probability sample of schools in Texas, USA, is assessed for the state as a whole and for its constituent counties.
  • The implications of the results are discussed for two important situations: how to use this methodology to establish where future experiments should be conducted to improve this generalization and how to construct a priori a strategy for experimentation which will maximize both the initial inferential power and the final inferential basis for a series of experiments.

Robertson, S. E., Steingrimsson, J. A., Joyce, N. R., Stuart, E. A., & Dahabreh, I. J. (2021). Estimating subgroup effects in generalizability and transportability analyses. arXiv preprint arXiv:2109.14075.

  • Methods for extending – generalizing or transporting – inferences from a randomized trial to a target population involve conditioning on a large set of covariates that is sufficient for rendering the randomized and non-randomized groups exchangeable. Yet, decision-makers are often interested in examining treatment effects in subgroups of the target population defined in terms of only a few discrete covariates.
  • Here, we propose methods for estimating subgroup-specific potential outcome means and average treatment effects in generalizability and transportability analyses, using outcome model-based (g-formula), weighting, and augmented weighting estimators.
  • We consider estimating subgroup-specific average treatment effects in the target population and its non-randomized subset, and provide methods that are appropriate both for nested and non-nested trial designs. As an illustration, we apply the methods to data from the Coronary Artery Surgery Study to compare the effect of surgery plus medical therapy versus medical therapy alone for chronic coronary artery disease in subgroups defined by history of myocardial infarction.
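
A minimal weighting-only sketch of the subgroup idea (the paper also develops g-formula and augmented weighting versions): reuse participation weights already constructed for the trial sample, but take weighted arm contrasts within levels of a discrete covariate. The column names (w, treatment, y, mi_history) are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def subgroup_transported_effects(trial: pd.DataFrame, group_col="mi_history", weight_col="w"):
    """Weighted difference in arm means within each level of a discrete covariate."""
    effects = {}
    for level, df in trial.groupby(group_col):
        w, y, a = df[weight_col], df["y"], df["treatment"]
        effects[level] = (np.average(y[a == 1], weights=w[a == 1])
                          - np.average(y[a == 0], weights=w[a == 0]))
    return effects
```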

Rudolph, K. E., Díaz, I., Rosenblum, M., & Stuart, E. A. (2014). Estimating population treatment effects from a survey subsample. American Journal of Epidemiology, 180(7), 737–748.

  • We considered the problem of estimating an average treatment effect for a target population using a survey subsample. Our motivation was to generalize a treatment effect that was estimated in a subsample of the National Comorbidity Survey Replication Adolescent Supplement (2001-2004) to the population of US adolescents. To address this problem, we evaluated easy-to-implement methods that account for both nonrandom treatment assignment and a nonrandom 2-stage selection mechanism.
  • We compared the performance of a Horvitz-Thompson estimator using inverse probability weighting and 2 doubly robust estimators in a variety of scenarios. We demonstrated that the 2 doubly robust estimators generally outperformed inverse probability weighting in terms of mean-squared error even under misspecification of one of the treatment, selection, or outcome models.
  • Moreover, the doubly robust estimators are easy to implement and provide an attractive alternative to inverse probability weighting for applied epidemiologic researchers. We demonstrated how to apply these estimators to our motivating example.
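
A rough sketch of the weighting idea the paper starts from: combine a treatment model and a selection model into a single weight per analyzed unit. The normalized (Hájek-style) form, the single logistic selection stage (which collapses the paper's two-stage selection mechanism), and the omission of the recommended doubly robust estimators are all simplifications of ours.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_population_ate(X_sub, a, y, X_full, selected):
    """Normalized IPW estimate of a population ATE from a nonrandomly selected subsample.

    X_full / selected describe the full sample and who entered the subsample;
    X_sub, a, y are covariates, treatment, and outcome for the selected units."""
    p_sel = LogisticRegression(max_iter=1000).fit(X_full, selected).predict_proba(X_sub)[:, 1]
    p_trt = LogisticRegression(max_iter=1000).fit(X_sub, a).predict_proba(X_sub)[:, 1]
    w1 = (a == 1) / (p_trt * p_sel)          # joint treatment-and-selection weights
    w0 = (a == 0) / ((1 - p_trt) * p_sel)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)
```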

Tipton, E. (2013). Improving generalizations from experiments using propensity score subclassification: Assumptions, properties, and contexts. Journal of Educational and Behavioral Statistics, 38(3), 239–266.

  • As a result of the use of random assignment to treatment, randomized experiments typically have high internal validity. However, units are very rarely randomly selected from a well-defined population of interest into an experiment; this results in low external validity. Under nonrandom sampling, this means that the estimate of the sample average treatment effect calculated in the experiment can be a biased estimate of the population average treatment effect.
  • This article explores the use of the propensity score subclassification estimator as a means for improving generalizations from experiments. It first lays out the assumptions necessary for generalizations, then investigates the amount of bias reduction and average variance inflation that is likely when compared to a conventional estimator. It concludes with a discussion of issues that arise when the population of interest is not well represented by the experiment, and an example.
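
A bare-bones sketch of a propensity score subclassification estimator of the population average treatment effect in this spirit: stratify on quantiles of the estimated probability of being in the experiment, estimate the effect within each stratum from the experimental units, and combine using population stratum shares. The column names (treatment, y), the logistic propensity model, and the use of five quantile strata are our assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def subclassification_pate(exp_df, pop_df, covariates, n_strata=5):
    """Combine within-stratum experimental effects using population stratum shares."""
    combined = pd.concat([exp_df, pop_df], ignore_index=True)
    in_exp = np.r_[np.ones(len(exp_df)), np.zeros(len(pop_df))]
    ps = LogisticRegression(max_iter=1000).fit(combined[covariates], in_exp)
    combined["score"] = ps.predict_proba(combined[covariates])[:, 1]
    # Strata cut at quantiles of the score among population units.
    cuts = np.quantile(combined.loc[in_exp == 0, "score"], np.linspace(0, 1, n_strata + 1))
    cuts[0], cuts[-1] = -np.inf, np.inf
    combined["stratum"] = pd.cut(combined["score"], bins=cuts, labels=False, duplicates="drop")
    exp_rows, pop_rows = combined[in_exp == 1], combined[in_exp == 0]
    pate = 0.0
    for s, pop_s in pop_rows.groupby("stratum"):
        share = len(pop_s) / len(pop_rows)                       # population share of the stratum
        es = exp_rows[exp_rows["stratum"] == s]
        effect = (es.loc[es["treatment"] == 1, "y"].mean()
                  - es.loc[es["treatment"] == 0, "y"].mean())    # unstable if the stratum is sparse
        pate += share * effect
    return pate
```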

Wang, L., Graubard, B. I., Katki, H. A., & Li, Y. (2020). Improving external validity of epidemiologic cohort analyses: A kernel weighting approach. Journal of the Royal Statistical Society: Series A (Statistics in Society), 183(3), 1293–1311.

  • For various reasons, cohort studies generally forgo the probability sampling required to obtain population-representative samples. As a result, such cohorts lack population representativeness, which invalidates estimates of population prevalences for novel health factors that are only available in cohorts. To improve the external validity of estimates from cohorts, we propose a kernel weighting (KW) approach that uses survey data as a reference to create pseudoweights for cohorts.
  • A jackknife variance is proposed for the KW estimates. In simulations, the KW method outperformed two existing propensity-score-based weighting methods in mean-squared error while maintaining confidence interval coverage. We applied all methods to estimating US population mortality and prevalences of various diseases from the non-representative US National Institutes of Health–American Association of Retired Persons cohort, using the sample from the US-representative National Health Interview Survey as the reference. Assuming that the survey estimates are correct, the KW approach yielded generally less biased estimates compared with the existing propensity-score-based weighting methods (a simplified pseudoweighting sketch follows below).
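
As we read it, the core of the pseudoweighting step is to share each survey unit's design weight across cohort units with a similar estimated propensity of cohort membership via a kernel. The Gaussian kernel, fixed bandwidth, and logistic propensity model below are illustrative choices rather than the paper's exact specification.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def kw_pseudoweights(X_cohort, X_survey, survey_weights, bandwidth=0.05):
    """Kernel-weighting pseudoweights for a non-probability cohort, using a reference survey."""
    X = np.vstack([X_cohort, X_survey])
    in_cohort = np.r_[np.ones(len(X_cohort)), np.zeros(len(X_survey))]
    model = LogisticRegression(max_iter=1000).fit(X, in_cohort)
    p_c = model.predict_proba(X_cohort)[:, 1]            # cohort propensity scores
    p_s = model.predict_proba(X_survey)[:, 1]            # survey propensity scores
    # Kernel similarity between every cohort unit (rows) and survey unit (columns).
    K = np.exp(-0.5 * ((p_c[:, None] - p_s[None, :]) / bandwidth) ** 2)
    K /= K.sum(axis=0, keepdims=True)                    # each survey weight is fully shared out
    return K @ survey_weights                            # pseudoweight per cohort unit
```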

Westreich, D., Edwards, J. K., Lesko, C. R., Stuart, E., & Cole, S. R. (2017). Transportability of trial results using inverse odds of sampling weights. American Journal of Epidemiology, 186(8), 1010–1014.

  • Increasingly, the statistical and epidemiologic literature is focusing beyond issues of internal validity and turning its attention to questions of external validity. Here, we discuss some of the challenges of transporting a causal effect from a randomized trial to a specific target population.
  • We present an inverse odds weighting approach that can easily operationalize transportability. We derive these weights in closed form and illustrate their use with a simple numerical example. We discuss how the conditions required for the identification of internally valid causal effects are translated to apply to the identification of externally valid causal effects. Estimating effects in target populations is an important goal, especially for policy or clinical decisions.
  • Researchers and policy-makers should therefore consider use of statistical techniques such as inverse odds of sampling weights, which under careful assumptions can transport effect estimates from study samples to target populations.
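
In generic notation (with S = 1 for trial participants, S = 0 for the target population, and X the covariates needed for transportability), the inverse odds of sampling weights applied to trial participants can be written as:

```latex
\[
  W_i \;=\; \frac{\Pr(S_i = 0 \mid X_i)}{\Pr(S_i = 1 \mid X_i)}
      \;=\; \frac{1 - \Pr(S_i = 1 \mid X_i)}{\Pr(S_i = 1 \mid X_i)},
  \qquad \text{for participants with } S_i = 1.
\]
```

The conditional probabilities are typically estimated from a model (for example, logistic regression) fit to the combined trial and target-population data.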