Data for Generalizability


Najafzadeh, M., & Schneeweiss, S. (2017). From trial to target populations—Calibrating real-world data. New England Journal of Medicine, 376(13), 1203–1205.

Olsen, R. B. (2022). Using Survey Data to Obtain More Representative Site Samples for Impact Studies. *arXiv preprint* arXiv:2201.05221.

  • To improve the generalizability of impact evaluations, recent research has examined statistical methods for selecting representative samples of sites. However, these methods rely on having rich data on impact moderators for all sites in the target population.
  • This paper offers a new approach to selecting sites for impact studies when rich data on impact moderators are available only for a representative survey sample of the study’s target population, not for every site in it. The survey data are used to (1) estimate the proportion of sites in the population with certain characteristics and (2) set limits on the number of sites with each characteristic that the sample can include. The Principal Investigator enforces these limits so that no type of site is overrepresented in the final sample, and the limits can be layered on top of existing site selection and recruitment approaches to improve the representativeness of the sample (see the sketch below).
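
A minimal sketch of how survey-estimated proportions could be turned into recruitment caps, assuming a pandas workflow; the stratifying characteristics, the 20-site target, and the `try_recruit` helper are illustrative assumptions rather than details from the paper.

```python
import math
import pandas as pd

# Hypothetical survey of a representative sample of sites in the target population;
# the stratifying characteristics below are illustrative, not taken from the paper.
survey = pd.DataFrame({
    "urbanicity": ["urban", "urban", "rural", "rural", "suburban", "urban"],
    "size":       ["large", "small", "small", "small", "large", "large"],
})

n_sites_to_recruit = 20  # assumed size of the impact study's site sample

# (1) Estimate the proportion of population sites in each stratum from the survey.
proportions = survey.groupby(["urbanicity", "size"]).size() / len(survey)

# (2) Turn proportions into caps on how many recruited sites each stratum may contribute.
caps = {stratum: math.ceil(p * n_sites_to_recruit) for stratum, p in proportions.items()}

# Enforce the caps during recruitment: once a stratum hits its cap, stop adding sites from it.
recruited = {stratum: 0 for stratum in caps}

def try_recruit(stratum):
    """Return True and count the site if its stratum is still under its cap."""
    if recruited.get(stratum, 0) < caps.get(stratum, 0):
        recruited[stratum] += 1
        return True
    return False

print(try_recruit(("urban", "large")))  # True until the ("urban", "large") cap is reached
```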

Stuart, E. A., & Rhodes, A. (2017). Generalizing treatment effect estimates from sample to population: A case study in the difficulties of finding sufficient data. Evaluation Review, 41(4), 357–388.

  • Background: Given increasing concerns about the relevance of research to policy and practice, there is growing interest in assessing and enhancing the external validity of randomized trials: determining how useful a given randomized trial is for informing a policy question for a specific target population.
  • Objectives: This article highlights recent advances in assessing and enhancing external validity, with a focus on the data needed to make ex post statistical adjustments to enhance the applicability of experimental findings to populations potentially different from their study sample (one such adjustment is sketched after this list).
  • Research design: We use a case study to illustrate how to generalize treatment effect estimates from a randomized trial sample to a target population, in particular comparing the sample of children in a randomized trial of a supplemental program for Head Start centers (the Research-Based, Developmentally Informed study) to the national population of children eligible for Head Start, as represented in the Head Start Impact Study.
  • Results: For this case study, common data elements between the trial sample and the population were limited, making reliable generalization from the trial sample to the population challenging.
  • Conclusions: To answer important questions about external validity, more publicly available data are needed. In addition, future studies should make an effort to collect measures comparable to those in other data sets. Measure comparability between population data sets and randomized trials that use convenience samples will greatly enhance the range of research- and policy-relevant questions that can be answered.
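
A minimal sketch of one common ex post adjustment in this literature, inverse-probability-of-participation weighting, assuming a stacked data set of trial and population records with comparably measured covariates; the covariate names, logistic model, and column layout are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical stacked data set: trial participants (in_trial=1) plus a representative
# population sample (in_trial=0), with covariates measured comparably in both sources
# (the comparability the article argues is often missing in practice).
covariates = ["child_age", "parent_education", "household_income"]  # illustrative names

def ipsw_ate(stacked: pd.DataFrame) -> float:
    """Reweight trial units by the inverse of their estimated probability of trial
    membership so the trial sample resembles the target population, then take a
    weighted difference in mean outcomes between treated and control units."""
    model = LogisticRegression(max_iter=1000)
    model.fit(stacked[covariates], stacked["in_trial"])
    p = model.predict_proba(stacked[covariates])[:, 1]

    trial = stacked.loc[stacked["in_trial"] == 1].copy()
    trial["w"] = 1.0 / p[(stacked["in_trial"] == 1).to_numpy()]

    treated = trial[trial["treated"] == 1]
    control = trial[trial["treated"] == 0]
    return (np.average(treated["outcome"], weights=treated["w"])
            - np.average(control["outcome"], weights=control["w"]))
```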

White, M. C., Rowan, B., Hansen, B., & Lycurgus, T. (2019). Combining Archival Data and Program-Generated Electronic Records to Improve the Usefulness of Efficacy Trials in Education: General Considerations and an Empirical Example. Journal of Research on Educational Effectiveness, 12(4), 659–684.

  • There is growing pressure to make efficacy experiments more useful. This requires attending to the twin goals of generalizing experimental results to those schools that will use the results and testing the intervention’s theory of action.
  • We show how electronic records, created naturally during the daily operation of technology-based interventions, contain the information needed to pursue these twin goals. These records allow researchers to define the population of schools considering adoption of an intervention and to plan an experiment that generalizes to those schools. They also allow researchers to identify schools likely to implement the intervention fully, so that the theory of action can be properly tested. Designing experiments to address these goals involves many tradeoffs and requires prioritizing among the different purposes of the planned experiment. We discuss these challenges, linking experimental purposes to design decisions (a minimal sketch of this use of program records follows).
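
A minimal sketch of how program-generated records might define a target population and flag likely full implementers; the file name, record fields, and usage thresholds are hypothetical, not drawn from the authors' data.

```python
import pandas as pd

# Hypothetical program-generated records, one row per school that has interacted
# with the technology-based intervention (e.g., created an account or piloted it).
records = pd.read_csv("program_records.csv")  # assumed columns: school_id, weekly_logins, modules_completed

# Define the target population: schools whose records indicate they are considering adoption.
target_population = records[records["modules_completed"] > 0]

# Flag schools likely to implement fully, so the theory of action can be tested as intended;
# the thresholds below are placeholders a research team would set from prior usage data.
likely_implementers = target_population[
    (target_population["weekly_logins"] >= 3)
    & (target_population["modules_completed"] >= 5)
]

print(f"{len(target_population)} schools in the target population, "
      f"{len(likely_implementers)} flagged as likely full implementers")
```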