When ‘Business-as-Usual’ Isn’t
The Use of Randomised Controlled Trials in Outcome-Based Financing Education Programmes

Georges Poquillon
Evaluation Manager, the Education Outcomes Fund
At first glance, outcome-based financing (OBF) and randomised controlled trials (RCTs) appear to be a natural pair. OBF is anchored in the promise that payments are made for actual results. Meanwhile, RCTs, still viewed as the 'gold standard' of evaluation, provide a credible estimate of the impact that can be attributed to a given intervention. That sounds like a perfect match! However, there is a catch: RCTs are only as good as their integrity, which must be maintained throughout the study.
Drawing on EOF's experience using RCTs alongside other impact evaluation methods to verify education outcomes in OBF programmes, this note reflects on how well OBF and RCTs actually fit together in the education sector. It argues that the concept of 'business-as-usual' (BAU) needs to be rethought when multi-year interventions are delivered in fast-changing contexts, as is often the case in the education sector in many low- and middle-income countries.
RCT validity
RCTs involve randomly assigning units from a given population, for instance public schools, to two groups, so that the groups share similar characteristics on average (e.g. the same number of pupils per class or similar performance levels on national exams). One group, usually referred to as the treatment group, receives the intervention, while the other, the control group, continues to receive the usual level of support, often referred to as business-as-usual (BAU). Since the two groups were initially 'identical', any later difference in outcomes observed between them can be attributed to the intervention.
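For readers less familiar with the mechanics, here is a minimal sketch of one common way to do this, matched-pair randomisation (school names and baseline scores are invented; real trials typically use dedicated statistical software and many more units):

```python
import random

# Hypothetical data: school names and baseline exam scores are invented.
schools = [
    {"name": "School A", "baseline": 52},
    {"name": "School B", "baseline": 48},
    {"name": "School C", "baseline": 61},
    {"name": "School D", "baseline": 58},
    {"name": "School E", "baseline": 55},
    {"name": "School F", "baseline": 50},
]

random.seed(1)  # fixed seed so the assignment is reproducible

# Matched-pair randomisation: sort schools by baseline score, pair
# adjacent schools, then flip a coin within each pair. This keeps the
# two groups balanced on baseline performance by construction.
schools.sort(key=lambda s: s["baseline"])
treatment, control = [], []
for a, b in zip(schools[::2], schools[1::2]):
    if random.random() < 0.5:
        treatment.append(a)
        control.append(b)
    else:
        treatment.append(b)
        control.append(a)

print("Treatment:", [s["name"] for s in treatment])
print("Control:  ", [s["name"] for s in control])
```

Sorting and pairing before the coin flip is what guarantees the baseline balance described above.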
The validity of RCTs is grounded in the assumption that the control group constitutes a credible counterfactual, that is, what would have happened in the absence of the intervention. This typically requires: (i) no interference or spillovers between groups, (ii) implementation fidelity and low attrition over time, and (iii) a steady BAU environment in control schools, i.e. the level of support schools are expected to receive under normal circumstances (what the Abdul Latif Jameel Poverty Action Lab [J-PAL] refers to as the "status quo"[i]).
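For readers who want the formal version, this is the standard potential-outcomes identity from the evaluation literature (a general framing, not an EOF-specific formulation). Under randomisation, the average treatment effect (ATE) can be recovered from the simple difference in group means:

$$\text{ATE} = E[Y(1) - Y(0)] = E[Y \mid T = 1] - E[Y \mid T = 0]$$

where $Y(1)$ and $Y(0)$ are a school's outcomes with and without the intervention, and $T$ indicates assignment to treatment. The second equality holds only if the control group genuinely reflects BAU: if control schools start receiving other support, $E[Y \mid T = 0]$ shifts and the difference in means no longer measures the intervention against the intended counterfactual.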
Importantly, BAU should not be interpreted as no support at all. If control schools are shielded from the support they would normally receive, the RCT estimates an idealised effect that is unlikely to hold at scale, contributing to 'voltage drop' and underscoring the need to clearly define BAU from the outset. The limitations of highly 'controlled' RCTs when it comes to scaling have been widely documented (see, for instance, John List[ii] on the voltage effect and the What Works Hub for Global Education[iii], amongst others).
In that context, what does BAU mean in today's rapidly evolving education sector? According to the Global Schools Forum, nearly 24 million children were reached by more than 130 organisations in 2024[iv]. Across many countries, multiple actors are competing to test new solutions and explore models leveraging AI or other digital tools, while governments are under pressure to seize those opportunities and deliver results. In such a fast-changing environment, the concept of BAU becomes a moving target. What constitutes BAU at baseline may no longer hold a couple of months or years down the line.
Imagine a four-year RCT of a literacy intervention launched in Kenyan primary schools in early 2020. By the endline in 2024, multiple other national or multi-county education initiatives would have been rolled out or dramatically expanded, such as the GPE-funded Learning Continuity in Basic Education Project, Safaricom and Eneza's nationwide zero-rating of the Shupavu291 mobile learning platform, and the Keep Kenya Learning caregiver-support campaign. An RCT that initially compared schools benefiting from the literacy intervention with schools receiving no support would end up comparing one intervention to many others (more like a multi-arm A/B test).
Uncertainty around the stability of the education landscape over the study period is reinforced when interventions are delivered at scale and over multi-year cycles, as is the case for the majority of OBF programmes. Nationwide coverage increases exposure to concomitant interventions, yet withholding or delaying the implementation of new initiatives for the sake of research is hardly tenable from a political and ethical perspective (let alone in the face of classic political-cycle factors such as elections or leadership changes). As governments legitimately want strong evidence and may therefore push for the adoption of RCTs in OBF programmes, spelling out the implications of adopting this method at an early stage of design is paramount to ensure that the stakes are well understood and properly assessed.
Why contamination matters for OBF payments
Contamination can have significant detrimental effects on OBF evaluations. The changing nature of the BAU environment (via the proliferation of new programmes being delivered) can be interpreted as a contamination issue, which we can refer to as 'external contamination through co-intervention'. From an evaluation perspective, contamination typically dilutes the intervention's impact, resulting in lower and less precise impact estimates. If a substantial share of control schools receive support from external actors (e.g. government, non-state actors) during the implementation of the intervention, the evaluation effectively ends up measuring the impact of the original intervention against an evolving bundle of other programmes, rather than against a clearly defined reference group (resulting in a multi-arm A/B test rather than an RCT).
While there is some tolerance for this issue in the academic literature, it poses a major challenge for OBF evaluations. First, in an OBF setting, underestimating the true impact of the intervention has direct consequences for provider payments. For example, suppose a provider truly achieves a 0.20 SD increase in learning relative to a control group. If 20% of control schools receive support from another organisation that generates a 0.10 SD impact on learning, the observed impact of the main provider will be 0.18 SD, 10% lower than its true value. Under an OBF mechanism, that would translate into a comparable reduction in payment (if payments are proportional to the measured effect). This not only weakens OBF as a payment mechanism, but can also erode service providers' trust in the evaluation's ability to capture the true impact achieved.
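The arithmetic behind this example, sketched below under the simplifying assumption that the co-intervention's effect is additive and fully realised in the affected control schools (all figures are illustrative):

```python
# Illustrative contamination arithmetic (all figures hypothetical).
true_effect = 0.20             # provider's true impact, in standard deviations (SD)
contaminated_share = 0.20      # share of control schools receiving other support
co_intervention_effect = 0.10  # learning impact of that other support, in SD

# The co-intervention raises the control-group mean, so the measured
# treatment-control difference understates the provider's true impact.
control_uplift = contaminated_share * co_intervention_effect  # 0.02 SD
observed_effect = true_effect - control_uplift                # 0.18 SD

shortfall = (true_effect - observed_effect) / true_effect     # 0.10, i.e. 10%
print(f"Observed effect: {observed_effect:.2f} SD "
      f"({shortfall:.0%} below the true effect)")
```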
Second, addressing contamination often requires spending additional resources that were not planned for during the evaluation design. Whether through additional data collection to identify new control units, extended monitoring surveys, or additional sensitivity analyses and robustness checks, any interference with an RCT design can inflate the overall evaluation costs and 'eat' into stakeholder teams' time and attention.
Recommendations and conclusion
How, then, can evaluations of OBF programmes move forward with RCTs, given the considerations outlined above? The following recommendations draw on EOF's experience designing and implementing multiple OBF programmes across diverse contexts, informed by evidence from a range of evaluation approaches.
Recommendations on the use of RCTs for OBF
1. Time-box randomisation: Use an RCT in Year 1 (or for the first cohort) to calibrate impact and establish a benchmark for target setting. Thereafter, shift to lighter methods, such as pre-post designs, to measure progress against the established benchmark.
2. Plan the analysis to protect credibility: Build contamination risk into the design/analysis plan. For example, use an ANCOVA approach (record baseline and endline, and use the baseline score to adjust the endline comparison), which can then be repurposed into a pre-post design if the control group is no longer valid; in that case, one would compare the baseline and endline values of the outcomes in the treatment group only (see the sketch after this list).
3. Document BAU and co-interventions: Add a simple co-intervention log at baseline, midline and endline: who delivered what, when, where, and to whom. This is standard good practice in education trials and becomes essential when evaluation results drive payments.
4. BAU review in target setting: When targets are set based on the impacts of similar interventions, analyse how BAU was defined and maintained in those studies, and how it compares with the OBF programme context. If higher levels of contamination are more likely than in those studies, consider treating the observed impacts as an upper bound.
5. Choose the evaluation design that fits the purpose: Early in the programme design phase, discuss with governments and stakeholders what evidence they want to generate and for what purpose (learning, accountability, payment, or all three), and think creatively about alternatives to RCTs (or to comparison-based impact evaluation methods more broadly) where BAU is unlikely to remain stable.
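To make recommendation 2 concrete, here is a minimal sketch on simulated data (column names, sample size, and effect sizes are all invented) showing both the baseline-adjusted ANCOVA comparison and the pre-post fallback:

```python
# Minimal sketch of the ANCOVA and pre-post fallback from recommendation 2.
# All data are simulated; a real analysis would use the programme's dataset.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),   # 1 = treated school, 0 = control
    "baseline": rng.normal(50, 10, n),    # baseline test score
})
# Simulate endline scores with a true treatment effect of 3 points.
df["endline"] = df["baseline"] + 3 * df["treatment"] + rng.normal(0, 5, n)

# ANCOVA: regress endline on treatment, adjusting for the baseline score.
ancova = smf.ols("endline ~ treatment + baseline", data=df).fit()
print("ANCOVA treatment effect:", round(ancova.params["treatment"], 2))

# Pre-post fallback if the control group is compromised: compare
# baseline and endline within the treatment group only.
treated = df[df["treatment"] == 1]
print("Pre-post change (treated only):",
      round((treated["endline"] - treated["baseline"]).mean(), 2))
```

In practice, the ANCOVA estimate would inform payments for as long as the control group remains valid, with the within-treatment pre-post change serving as the degraded but still informative fallback.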
The point of this post is not to argue that we should no longer use RCTs as a verification tool in education OBF programmes. Rather, it is to emphasise the importance of carefully analysing the context in which RCTs would be applied and critically assessing whether they are appropriate, particularly for multi-year programmes in fast-changing systems. In particular, managing stakeholder expectations and clearly highlighting the implications of conducting an RCT over several years is critical to ensure that decisions are made with a clear view of the trade-offs and risks.
[i] Abdul Latif Jameel Poverty Action Lab (J-PAL) (2023) 'Lecture: Why Randomize'. Cambridge, MA: J-PAL.
[ii] List, J.A. (2024) 'Optimally generate policy-based evidence before scaling', Nature, 626(7999), pp. 491–499.
[iii] Angrist, N., Benveniste, L., Bevan, N. and Herbertson, J. (2025) 'Investing in implementation science, so "what works" actually works in practice'. What Works Hub for Global Education blog, 2025/022.
[iv] Global Schools Forum (2024) Annual Impact Report 2023–24. Global Schools Forum.
Image: UNICEF/UNI405809/Dejongh
