This page provides information about publications related to the generalizability of causal research. For non-technical overviews, researchers may want to first read one or both of the following publications:
- Enhancing the Generalizability of Impact Studies in Education, a guide to support educational researchers through all stages of their impact studies, from defining the target population to reporting the impact findings.
- Implementing Statistical Methods for Generalizing Randomized Trial Findings to a Target Population, a non-technical discussion of the methods for making statistical adjustments to an existing sample for improved generalizability.
Click here for a full annotated bibliography of the articles in each category. Click here for a description of how we identified relevant articles.
All publications have been categorized based on their content, and some articles fall into two or more categories. Each category is described below, followed by a few exemplar articles.
Overviews or Conceptual Frameworks - These papers develop conceptual frameworks for generalizability, use those frameworks to derive measures of generalizability, and/or summarize a range of methods for improving a study’s generalizability. Some examples include:
- Tipton, E., & Olsen, R. B. (2018). A review of statistical methods for generalizing from evaluations of educational interventions. Educational Researcher, 47(8), 516-524.
- Stuart, E. A., Cole, S. R., Bradshaw, C. P., & Leaf, P. J. (2011). The use of propensity scores to assess the generalizability of results from randomized trials. Journal of the Royal Statistical Society: Series A (Statistics in Society), 174(2), 369-386.
- Degtiar, I., & Rose, S. (2022). A review of generalizability and transportability. Annual Review of Statistics and Its Application, 10, 501-524.
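Many of these frameworks summarize how closely a sample resembles its population with a single number. The sketch below is a minimal version of one such measure. It assumes each unit has already been assigned to a bin of an estimated selection (propensity) score. It then computes the Bhattacharyya coefficient between the binned sample and population distributions, which is the form of Tipton's generalizability index. The function name `overlap_index` and the binning step are illustrative assumptions, not code from the papers listed.

```python
import math
from collections import Counter


def overlap_index(sample_bins, population_bins):
    """Bhattacharyya coefficient between two binned distributions.

    Each argument is a sequence of bin labels, e.g. the propensity-score
    stratum each unit falls into (the binning itself is assumed done).
    Returns a value in [0, 1]: 1 when the binned distributions are
    identical, 0 when they share no bins.
    """
    s_counts = Counter(sample_bins)
    p_counts = Counter(population_bins)
    n_s = sum(s_counts.values())
    n_p = sum(p_counts.values())
    # Counter returns 0 for bins that appear in only one of the two groups.
    bins = set(s_counts) | set(p_counts)
    return sum(math.sqrt(s_counts[b] / n_s * p_counts[b] / n_p) for b in bins)


# Hypothetical example: every sample unit sits in bin 1, while the
# population is split evenly between bins 1 and 2.
print(overlap_index([1, 1, 1, 1], [1, 1, 2, 2]))  # about 0.707
```

Values near 1 indicate that the sample covers the population's score distribution well. Values well below 1 flag parts of the population that the sample barely represents.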
Empirical Evidence on Generalizability - These papers assess the generalizability of one or more studies by comparing the characteristics of, or the average impact in, the study sample with those of a target population. Papers that focus on sample characteristics include:
- Stuart, E. A., Bell, S. H., Ebnesajjad, C., Olsen, R. B., & Orr, L. L. (2017). Characteristics of school districts that participate in rigorous national educational evaluations. Journal of Research on Educational Effectiveness, 10(1), 168-206.
- Susukida, R., Crum, R. M., Stuart, E. A., Ebnesajjad, C., & Mojtabai, R. (2016). Assessing sample representativeness in randomized controlled trials: application to the National Institute of Drug Abuse Clinical Trials Network. Addiction, 111(7), 1226-1234.
Papers that examine impacts directly include:
- Bell, S. H., Olsen, R. B., Orr, L. L., & Stuart, E. A. (2016). Estimates of external validity bias when impact evaluations select sites non-randomly. Educational Evaluation and Policy Analysis, 38(2), 318-335.
- Hotz, V. J., Imbens, G. W., & Mortimer, J. H. (2005). Predicting the efficacy of future training programs using past experiences at other locations. Journal of Econometrics, 125(1-2), 241-270.
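Comparisons of sample and population characteristics, like those in the first two papers above, can be summarized covariate by covariate with a standardized mean difference. The sketch below is minimal and rests on three assumptions:

- Each unit is a dictionary of numeric covariates.
- The difference is divided by the population standard deviation, which is only one convention; some analyses pool the two standard deviations instead.
- The function names and the example districts are hypothetical.

```python
from statistics import fmean, pstdev


def standardized_mean_difference(sample, population):
    """(sample mean - population mean) / population standard deviation."""
    return (fmean(sample) - fmean(population)) / pstdev(population)


def smd_table(sample_rows, population_rows, covariates):
    """Standardized mean difference for each named covariate."""
    return {
        c: standardized_mean_difference(
            [row[c] for row in sample_rows],
            [row[c] for row in population_rows],
        )
        for c in covariates
    }


# Hypothetical districts: share of students in poverty and enrollment
# in thousands.
sample = [{"poverty": 0.25, "size": 3}, {"poverty": 0.75, "size": 5}]
population = [{"poverty": 0.0, "size": 0}, {"poverty": 0.5, "size": 4}]
print(smd_table(sample, population, ["poverty", "size"]))
# {'poverty': 1.0, 'size': 1.0}
```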
Sample Selection Methods - These papers develop or test the performance of sampling methods (e.g., random sampling, balanced sampling) for obtaining samples that represent the population of interest for impact studies. Examples include:
- Tipton, E. (2013). Stratified sampling using cluster analysis: A sample selection strategy for improved generalizations from experiments. Evaluation Review, 37(2), 109-139.
- Tipton, E., Hedges, L., Vaden-Kiernan, M., Borman, G., Sullivan, K., & Caverly, S. (2014). Sample selection in randomized experiments: A new method using propensity score stratified sampling. Journal of Research on Educational Effectiveness, 7(1), 114-135.
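Stratified designs like these first partition the population into strata and then decide how many units to recruit from each. For example, Tipton (2013) forms the strata with cluster analysis. The sketch below covers only the second step and assumes the strata are already formed. It uses proportional allocation with largest-remainder rounding, followed by simple random selection within strata. The papers above describe their own within-stratum recruitment rules, so random selection here is a simplifying assumption. The function names and stratum sizes are hypothetical.

```python
import math
import random


def proportional_allocation(stratum_sizes, n):
    """Split a total sample size n across strata in proportion to each
    stratum's population size. Largest-remainder rounding keeps the
    allocations summing exactly to n."""
    total = sum(stratum_sizes.values())
    quotas = {s: n * size / total for s, size in stratum_sizes.items()}
    alloc = {s: math.floor(q) for s, q in quotas.items()}
    leftover = n - sum(alloc.values())
    # Hand leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


def select_within_strata(units_by_stratum, alloc, seed=0):
    """Simple random selection within each stratum (a simplifying
    assumption; recruitment order in practice may differ)."""
    rng = random.Random(seed)
    return {s: rng.sample(units_by_stratum[s], k) for s, k in alloc.items()}


# Hypothetical population of 100 schools already grouped into three strata.
sizes = {"A": 50, "B": 30, "C": 20}
print(proportional_allocation(sizes, 7))  # {'A': 4, 'B': 2, 'C': 1}
```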
Selection Modelling Methods - These papers develop or test the performance of methods (e.g., propensity score methods) for modelling selection into the study sample and (often) reweighting a study sample to resemble the study’s target population. Examples include:
- Tipton, E. (2013). Improving generalizations from experiments using propensity score subclassification: Assumptions, properties, and contexts. Journal of Educational and Behavioral Statistics, 38(3), 239-266.
- Westreich, D., Edwards, J. K., Lesko, C. R., Stuart, E., & Cole, S. R. (2017). Transportability of trial results using inverse odds of sampling weights. American Journal of Epidemiology, 186(8), 1010-1014.
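The reweighting idea can be illustrated with inverse odds of sampling weights. Each trial member receives the weight P(S=0 | X) / P(S=1 | X), where S=1 marks the trial and S=0 the target sample. The trial's weighted difference in means then targets the effect in the target sample.

The sketch below is minimal and assumes a single categorical covariate, which allows the odds to be estimated by cell counts. Real analyses typically fit a selection model, such as logistic regression, on the stacked data instead. The data and function names are hypothetical.

```python
from collections import Counter


def inverse_odds_weights(trial_x, target_x):
    """Inverse odds of sampling weights P(S=0 | x) / P(S=1 | x) for each
    trial member. With one categorical covariate, the saturated
    (cell-proportion) estimate of these odds is n_target(x) / n_trial(x)."""
    n_trial = Counter(trial_x)
    n_target = Counter(target_x)
    return [n_target[x] / n_trial[x] for x in trial_x]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def iosw_effect(trial, target_x):
    """trial: list of (x, a, y) tuples with treatment a in {0, 1}."""
    w = inverse_odds_weights([x for x, _, _ in trial], target_x)
    arms = {0: ([], []), 1: ([], [])}  # arm -> (outcomes, weights)
    for (x, a, y), wi in zip(trial, w):
        arms[a][0].append(y)
        arms[a][1].append(wi)
    return weighted_mean(*arms[1]) - weighted_mean(*arms[0])


# Hypothetical data: the effect is 2 for "young" and 4 for "old" units.
trial = [("young", 1, 3.0), ("young", 0, 1.0), ("old", 1, 6.0), ("old", 0, 2.0)]
target_x = ["young", "young", "young", "old"]
print(iosw_effect(trial, target_x))  # 2.5; the unweighted trial estimate is 3.0
```

The weights only work if every covariate pattern in the target also appears in the trial (positivity). A target cell with no trial members cannot be reweighted.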
Outcome Modelling Methods - These papers develop or test the performance of regression methods to model outcomes and use the model to predict the population average treatment effect. Examples include:
- Kern, H. L., Stuart, E. A., Hill, J., & Green, D. P. (2016). Assessing methods for generalizing experimental impact estimates to target populations. Journal of Research on Educational Effectiveness, 9(1), 103-127. (Also covers selection modelling methods.)
- Verde, P. E., Ohmann, C., Morbach, S., & Icks, A. (2016). Bayesian evidence synthesis for exploring generalizability of treatment effects: a case study of combining randomized and non‐randomized results in diabetes. Statistics in Medicine, 35(10), 1654-1675.
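The outcome-modelling approach works in two steps:

1. Fit a model of the outcome given covariates and treatment in the trial.
2. Predict each target member's outcome under treatment and under control, then average the predicted differences.

The sketch below is minimal and assumes one categorical covariate. A saturated cell-means model stands in for the regression (or, in the papers above, more flexible and Bayesian) models. The data and function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def fit_cell_means(trial):
    """Saturated outcome model: mean outcome in each (covariate, arm) cell,
    standing in for a regression with treatment-by-covariate interactions."""
    cells = defaultdict(list)
    for x, a, y in trial:
        cells[(x, a)].append(y)
    return {cell: fmean(ys) for cell, ys in cells.items()}


def outcome_model_effect(trial, target_x):
    """Predict each target member's outcome under treatment and under
    control, then average the predicted differences over the target sample."""
    g = fit_cell_means(trial)
    return fmean(g[(x, 1)] - g[(x, 0)] for x in target_x)


trial = [("young", 1, 3.0), ("young", 0, 1.0), ("old", 1, 6.0), ("old", 0, 2.0)]
target_x = ["young", "young", "young", "old"]
print(outcome_model_effect(trial, target_x))  # 2.5
```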
Doubly Robust Methods - These papers develop or test the performance of methods that combine selection modelling and outcome modelling to make estimates of the population effect more robust to violations of model assumptions. Examples include:
- Dahabreh, I. J., & Hernán, M. A. (2019). Extending inferences from a randomized trial to a target population. European Journal of Epidemiology, 34(8), 719-722.
- Pressler, T. R., & Kaizar, E. E. (2013). The use of propensity scores and observational data to estimate randomized controlled trial generalizability bias. Statistics in Medicine, 32(20), 3552-3568.
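A doubly robust estimator adds an inverse-weighted residual correction to an outcome-model prediction. As a result, it is consistent if either the selection model or the outcome model is correctly specified.

The sketch below is a minimal augmented inverse probability weighting (AIPW) example for transporting a trial effect to a target sample. It assumes one categorical covariate, with selection and treatment probabilities estimated by cell proportions. It illustrates the general technique and is not necessarily the exact estimator in either paper above. The function names, `zero_model` and the data are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import fmean


def aipw_arm_mean(trial, target_x, arm, outcome_model):
    """Doubly robust (AIPW) estimate of the target-sample mean outcome under
    `arm`. trial: list of (x, a, y); target_x: covariates of the target
    sample (S=0); outcome_model(x, arm) returns a predicted outcome."""
    n_target = Counter(target_x)
    n_arm = Counter(x for x, a, _ in trial if a == arm)
    # Outcome-model term: predictions averaged over the target sample.
    total = sum(outcome_model(x, arm) for x in target_x)
    # Residual correction on trial members in this arm, weighted by
    # (1 - p(x)) / (p(x) * e(x)), with p(x) = P(S=1 | x) and
    # e(x) = P(A=arm | x, S=1). Using cell proportions for both, the weight
    # simplifies to n_target(x) / n_arm(x).
    for x, a, y in trial:
        if a == arm:
            total += n_target[x] / n_arm[x] * (y - outcome_model(x, arm))
    return total / len(target_x)


def aipw_effect(trial, target_x, outcome_model):
    return (aipw_arm_mean(trial, target_x, 1, outcome_model)
            - aipw_arm_mean(trial, target_x, 0, outcome_model))


def cell_mean_model(trial):
    """Correctly specified outcome model for the toy data."""
    cells = defaultdict(list)
    for x, a, y in trial:
        cells[(x, a)].append(y)
    means = {cell: fmean(ys) for cell, ys in cells.items()}
    return lambda x, arm: means[(x, arm)]


def zero_model(x, arm):
    """Deliberately wrong outcome model: predicts 0 for everyone."""
    return 0.0


trial = [("young", 1, 3.0), ("young", 0, 1.0), ("old", 1, 6.0), ("old", 0, 2.0)]
target_x = ["young", "young", "young", "old"]
print(aipw_effect(trial, target_x, cell_mean_model(trial)))  # 2.5
print(aipw_effect(trial, target_x, zero_model))  # 2.5
```

Because the weights here are correct, the misspecified `zero_model` still recovers the same answer; the residual term absorbs its error. Showing the other half of the property would require a wrong selection model paired with a correct outcome model.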
Sensitivity Analysis Methods - These papers develop or test the performance of methods for testing the assumptions needed for generalizability, assessing the sensitivity of the findings to those assumptions, or estimating bounds on the population average effect under weaker assumptions. Examples include:
- Chan, W. (2017). Partially identified treatment effects for generalizability. Journal of Research on Educational Effectiveness, 10(3), 646-669.
- Nguyen, T. Q., Ebnesajjad, C., Cole, S. R., & Stuart, E. A. (2017). Sensitivity analysis for an unobserved moderator in RCT-to-target-population generalization of treatment effects. The Annals of Applied Statistics, 11(1), 225-247.
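Two simple calculations convey the flavor of this work. Both are minimal sketches under stated assumptions, not the methods of the papers above.

- `moderator_sensitivity` assumes a linear model in which the treatment effect shifts by `delta` per unit of an unobserved moderator V. It reports the population effect implied by a grid of `delta` values. This is a simplified version of the unobserved-moderator idea; Nguyen et al. (2017) treat that idea more carefully.
- `worst_case_bounds` is a generic Manski-style bound. Only a known fraction of the population is assumed to resemble the trial, and the remainder's effect is limited only by the outcome range. It is not necessarily the partial-identification approach in Chan (2017).

All names and inputs are hypothetical.

```python
def moderator_sensitivity(adjusted_effect, mean_v_trial, mean_v_target, deltas):
    """Population effect implied by each assumed strength of effect
    modification, under an assumed linear model in which the treatment
    effect shifts by delta per unit of an unobserved moderator V."""
    gap = mean_v_target - mean_v_trial
    return {d: adjusted_effect + d * gap for d in deltas}


def worst_case_bounds(covered_effect, covered_fraction, y_min, y_max):
    """Bounds on the population effect when only a fraction of the population
    resembles the trial. Nothing is assumed about the rest except that
    outcomes lie in [y_min, y_max], so its effect lies in
    [-(y_max - y_min), y_max - y_min]."""
    spread = y_max - y_min
    rest = 1 - covered_fraction
    base = covered_fraction * covered_effect
    return (base - rest * spread, base + rest * spread)


# Hypothetical inputs: V is a binary moderator present in 25% of the trial
# and 75% of the target population.
print(moderator_sensitivity(2.5, 0.25, 0.75, [-2, 0, 2]))  # {-2: 1.5, 0: 2.5, 2: 3.5}
# Binary outcome, 75% of the population covered, covered-part effect 0.5.
print(worst_case_bounds(0.5, 0.75, 0, 1))  # (0.125, 0.625)
```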
Transparency in Reporting - These papers either recommend the information that impact studies should report, or assess the extent to which studies report the information needed to judge their generalizability. Examples include:
- Braslow, J. T., Duan, N., Starks, S. L., Polo, A., Bromley, E., & Wells, K. B. (2005). Generalizability of studies on mental health treatment and outcomes, 1981 to 1996. Psychiatric Services, 56(10), 1261-1268.
- Rothwell, P. M. (2005). External validity of randomised controlled trials: “To whom do the results of this trial apply?”. The Lancet, 365(9453), 82-93.
Disclaimer: This list of publications did not result from a systematic review. We compiled papers that we knew, conducted some searches in various databases to identify additional papers, and screened those papers for relevance. See here for more details on how we assembled these publications and classified them based on what they offer.
Have questions about the literature, or suggestions for papers we should add? Please contact Rob Olsen (robolsen@gwu.edu) or Elizabeth Stuart (estuart@jhu.edu).