Introduction

The Intergovernmental Panel on Climate Change (IPCC) estimates that if greenhouse gas (GHG) emissions continue to rise at the current business-as-usual rate, then by the end of the twenty-first century the average temperature will increase by 2.6 to 4.8 °C and sea levels will rise by 0.45 to 0.82 m (Intergovernmental Panel on Climate Change 2018). The international community’s response to such climate threats is not sufficient to meet the goals of the Paris Agreement. Although climate finance has risen considerably over the past years, it is still deemed too low compared to the level required for a 1.5 °C scenario (Climate Policy Initiative 2019). For example, investments in low-carbon technologies fall short of what is required to meet the mitigation target, according to a report by the International Energy Agency (2019a, b). Adaptation finance is similarly insufficient (Global Commission on Adaptation 2019) and does not meet the needs expressed in nationally determined contributions (Neufeldt et al. 2018). To meet such targets, climate investments need to be increased and the impact per dollar spent needs to be considerably higher. Furthermore, the longer current mitigation and adaptation measures fall short, the higher the overall impact required of future interventions. It is therefore imperative to usher in interventions that have (1) a large impact that (2) is sustained over time and (3) is achieved at a large scale.

This article considers such changes in outcomes to be transformational, not least because these three elements are common to how major multinational agencies operationalize transformational change.Footnote 1 For example, a recent publication from the Climate Investment Funds (2019) identified the dimensions of relevance, scale, systemic change and sustainability as integral to transformation. This work posits that all four dimensions must be in place (to a greater or lesser extent) for transformational change to be considered both real and lasting. As shown by recent interest, transformational change has become the Holy Grail in climate change and development assistance. While there is anecdotal evidence in brochures and there are examples of highly successful interventions in the academic literature, many interventions do not replicate when scaled up, or work well in one context but fail elsewhere (Banerjee et al. 2017; Deaton 2010; Madrian 2014; Muralidharan and Niehaus 2017). In addition, there appears to be a lack of causal evidence for climate interventions, as rigorous evidence has only recently started to grow (Prowse and Snilstveit 2010; Puri 2019; Ferraro 2009).

This article summarises one step that two agencies (the Green Climate Fund’s Independent Evaluation Unit and the World Bank’s Climate Investment Funds) are taking to contribute to this urgent goal. It does so by outlining our approach for an evidence review of causal evidence of transformational change and its drivers. On the one hand, the article approaches this directly by systematically reviewing the experimental and quasi-experimental literature with the potential to document transformational change across a broad set of interventions and outcomes. Here, the article focuses on the energy sector in low- and middle-income countries (LMICs) because of its key role in climate mitigation. On the other hand, the article approaches this learning exercise indirectly by reviewing the evidence on behavioural change in public health. The public health literature has the longest tradition of long-term causal studies on behavioural change and the thickest evidence base using causal designs (see Nisa et al. 2019). Lessons from this sector can offer insights into how to overcome the last-mile problem, which so often stands in the way of realizing transformational change. The goal behind the evidence review is to assess how lessons about transformational change in energy and behavioural change in public health (in terms of interventions that led to large and sustained change at scale) may inform broader mitigation and adaptation investments. This review therefore combines, in a novel way, two different reviews into one learning exercise on transformational change with the aim of systematically mapping and meta-analysing multi-sector evidence. The primary research question guiding this review is: what are the attributes, determinants and contributors of transformational change in the energy and public health sectors? This article has six further sections. The second section offers some background for the evidence review. The "Interventions and Outcomes" section outlines the intervention and outcome framework used in the review. The fourth section summarises the approach taken by the review and the methods used. The "Search Strategy" section highlights the search strategy, whilst the "Screening of Studies" section details the screening of studies. The conclusion rounds off by suggesting that such cross-sector learning could contribute to understanding how we can meet one of the largest challenges in the coming decades.

Background

Due to the urgency of the climate challenge and the scale of climate finance required to limit temperature increases to close to 1.5 °C, climate finance needs to precipitate system-wide and long-term changes across sectors and societies. Yet most interventions supporting adaptation and mitigation actions focus on what can be described as incremental changes (Termeer et al. 2017). For example, incremental interventions are often deployed in adaptation in an ex post manner and are framed as a process of adjustment instead of proactive preparation for the scale of change that is required (Bassett and Fogelman 2013). In mitigation, a wider range of interventions is supporting the transition to lower-carbon societies (Markard et al. 2012), and the scale, longevity and degree of innovation within such interventions are more closely aligned with transformational actions (Wienges et al. 2017). In this review, through our synthesis, we highlight robust and causal evidence across individual studies of transformational change in energy (which shows a direct connection to climate interventions) and behavioural change in the public health sector (from which lessons can be drawn to inform climate interventions). We now describe each of these two intervention areas in some detail.

Studying the energy sector in LMICs is key to future climate change mitigation efforts. Carbon dioxide, methane, nitrous oxide and fluorinated gases are the key greenhouse gases emitted by human activities, with carbon dioxide (CO2) emissions alone constituting 76% of these. Of overall emissions, fossil fuels and industrial processes account for 86%.Footnote 2 In terms of economic sectors, energy is the largest contributor to GHG emissions. It accounts for around 35%, including emissions that occur in the middle stages of energy production, e.g. fuel extraction, refining, processing and transportation (Pachuari and Meyer 2014). Energy contributes to the lagged, cumulative effect of greenhouse gas emissions. Such gases stay in the atmosphere for up to a century, such that on a per capita, historical basis, industrialised countries (that is, Annex 1 countries that are parties to the UNFCCC) still bear the majority of the responsibility for such pollutants. That said, nearly all of the growth in energy demand, and consequently in fossil-fuel use and GHG emissions, is predicted to come from LMICs (Wolfram et al. 2012). Part of this increase may in itself be driven by climate change. With rising temperatures, LMICs, for example, are expected to increase demand for residential air conditioning from 500 TWh in 2000 to around 4000 TWh in 2050 (World Energy Council 2015). The reliance of LMICs on fossil fuels for energy production means the projected increase in energy demand will, without strong countermeasures, result in even higher emissions (Ebinger and Vergara 2011a). For the period from 1994 to 2014, Falconí et al. (2019) already found considerably higher growth rates of per-capita CO2 emissions in middle-income countries than in high-income countries (HICs), with −0.2% for the latter compared to 2.8% for upper and 1.4% for lower middle-income countries. 
Similarly, the per-capita energy-use growth rates of upper and lower middle-income countries are nearly 24 times and 9 times, respectively, that of HICs. The contrast between the responsibility of Annex 1 countries for historical emissions and the responsibility of non-Annex 1 countries for future emissions is why climate change is such an intractable problem. It also shows why the energy sector in LMICs is set to play such a key role for climate mitigation measures. At the same time, the energy sector itself is vulnerable to climate change. Changing precipitation and weather patterns directly affect renewable energy plants, which depend on natural processes: hydropower plants can suffer from drying rivers, wind power plants produce less energy during prolonged windless periods and solar panels suffer from higher precipitation and cloud cover (Ebinger and Vergara 2011a). Overall, LMICs are predicted to suffer the greatest impacts from climate change and often display limited adaptive capacity. That said, LMICs also have opportunities to implement effective mitigation and adaptation strategies. For example, according to the International Energy Agency (2019a, b), sub-Saharan Africa could achieve significant industrialisation and economic growth while keeping emissions relatively low by increasing the share of renewable energy in the energy mix. To achieve this, the IEA calls for investments in grid expansion, reinforcement and maintenance as well as in renewable energy generating capacity, in particular solar PV. As outlined in a report by the Global Commission on Adaptation (2019), investments in climate change adaptation could generate high rates of return and pay out a “triple dividend” of avoided losses, economic benefits (e.g. through reduced climate risk) and social and environmental co-benefits. Our review will illustrate which interventions show robust and causal evidence, across individual studies, of transformational change in energy.

This review also summarises indirect evidence on transformational change by focusing on behavioural change in public health. The long tradition of causal methods in this sector allows us to investigate the interventions that may produce large and sustained behavioural change. This review uses this tradition to highlight the key interventions that elicited sustained behaviour change in individuals within five areas: nutritional (and dietary) habits, physical activity, substance abuse, hygiene practices and utilization of health care services. These areas were chosen because of the widespread use of behavioural science interventions within them. So far, the relevant literature in public health, including systematic reviews, analyses either large-scale interventions or long-term effects and rarely considers behaviour change outcomes. Instead, it focuses on changes in health outcomes, such as morbidity or weight (Loveman et al. 2011; Reiner et al. 2013; Zubala et al. 2017). Since health outcomes depend on many factors besides behaviour, they are poor proxies for health behaviour. Behavioural interventions may be particularly affected by scalability or sustainability issues: a behaviour change may either not persist over long periods of time or not take hold within large swaths of the population targeted by the intervention (List 2022; Nisa et al. 2019). This issue becomes even more relevant given the shorter lifespan of interventions in low- and middle-income countries, where evidence is currently scarce. Consequently, our review intends to focus on studies that can help answer the question: what types of intervention yield long-term and large-scale effects on behaviour change outcomes in low- and middle-income countries?

We now describe the interventions and outcomes that are covered in this review and how they are categorized within two broad theories of change (see Figs. 1, 2). These theories of change simultaneously structure and define the scope of this study.

Fig. 1 Theory of change for energy. Source: Authors

Fig. 2 Theory of change for behavioural change in public health. Source: Authors

Interventions and Outcomes

Transformational change, as such, is difficult to find directly for two reasons. First, since transformational change consists of several elements and still lacks an established definition, it is not the outcome measured in empirical studies. Instead, evidence for transformational change may be found across a wide range of possible outcomes. Second, restricting our search to studies that document transformational change, i.e. large effects, at scale and sustained over time, risks finding statistical outliers rather than an unbiased reflection of the available evidence. We therefore search for evidence across a wide range of interventions and outcomes in studies that have the potential to document transformational change—regardless of whether the individual study indeed found large effects over time. In a subsequent step, we synthesize the evidence across studies to identify those interventions that can produce transformational change.

For the energy sector, we cover a broad set of interventions that either target or could have effects on climate change mitigation and adaptation. These take place at the level of institutional and market systems, through incentives and standards, through “soft” interventions (nudges), or in the form of investments into infrastructure. Outcomes under the purview of this review capture climate change mitigation, adaptation (resilience of energy systems), or labour-market co-benefits of investments in or transition to renewable energy. These are described in more detail in the next section.

For the public health sector, and as detailed above, we include interventions targeting behavioural change in five broad areas (nutritional and dietary habits, physical activity, substance abuse, hygiene practices and utilization of health care services) due to the use of behavioural science interventions within these areas. The scope of the targeted areas for the interventions in the health sector is shown in the inclusion and exclusion criteria (“Appendix 1”). As is common practice in systematic reviews, these already defined criteria may be further refined during the screening process in case they turn out not to be sufficiently precise for efficient and consistent coding. These interventions will then be coded following the behavioural change framework provided by Michie et al. (2011), which includes nine different intervention functions, as listed in the following section. The behavioural change framework marks the intersection between the two sectors and enables comparison and cross-sectoral learning on behavioural change outcomes.

The Theories of Change also include the moderators and the assumptions which influence the overall relation between the interventions and their potential outcomes. Therefore, the existing institutions, the political and ideological framework, the economic structures, the available resources, the environmental and technological constraints, and finally the characteristics of the intervention population are all important variables that might moderate the effect or the nature of an intervention. These are therefore included within both Theories of Change. Specific assumptions on which causal chains between interventions and long-term goals rest are as follows:

  • Individuals are responsive, receive the intervention as envisioned, and take up the intervention,

  • Interventions are relevant for the context or have been contextualized appropriately,

  • Institutions at all levels support the implementation of the interventions.

These assumptions are neither necessary nor sufficient conditions for large long-term impacts to occur. Rather, they are a set of ideal conditions that facilitate long-term effects. If one or several of these assumptions are not fulfilled, the causal chain may break down. As one example, think of an intervention in the form of an information campaign that promotes an effective vaccine. The first assumption stipulates that individuals are responsive to the information provided. However, if many individuals do not trust the content or the source of the information, then this assumption breaks down and the intervention is unlikely to be effective.Footnote 3

Finally, the Theories of Change map out the outcomes and long-term goals that are targeted by the type of interventions listed above. The sector-specific Theories of Change are now described one by one.

Public Health

The purpose of including the public health sector is to learn indirectly about transformational change. Hence the goal of the public health systematic review is at a higher level than any individual intervention or context in which the intervention is tested. Instead, we examine the theoretical mechanisms within a mid-level theory of behaviour change. Learning about theoretical mechanisms (why an intervention works) makes policy makers and development organizations much more flexible than focusing on the “what works” question, in particular when, as in our case, the learning goal targets different sectors. Knowing the active ingredients of interventions allows these to be applied to different but theoretically similar problems, i.e. where the same obstacle to behaviour is present. The critical element for learning across concrete interventions is an appropriate mid-level theory of change, which is both sufficiently general to be transferable and detailed enough to tackle conceptually different problems. Mid-level theories rest between high-level theories, which are too abstract to have empirical application, and project-level theories, which apply to a specific context (see Cartwright 2020; Cartwright et al. 2020; White 2023). Mid-level theory is an approach which helps assess the transferability of study findings from one setting to another.Footnote 4

We choose the Behaviour Change Wheel (BCW; Michie et al. 2011) as the mid-level theory underlying the Theory of Change and to categorize interventions. Michie et al. (2011) rely on expert consultation as well as a review of a range of other behavioural frameworks to define a framework categorizing interventions and policies that encompasses all previous frameworks.Footnote 5 This framework, which they call the “Behaviour Change Wheel”, groups interventions along nine intervention functions.Footnote 6 Behind these functions lie three essential sources of behaviour change: capability, opportunity and motivation, or the COM-B system. These sources are effectively the drivers of behaviour change: without one (or more) of them being targeted, behaviour change is not possible. Therefore, interventions targeting behaviour change in the long term and on a large scale should look towards explicitly including these sources within their design. Each of these sources is further broken into two categories. Within capability, we find psychological and physical capability, which allow the individual to engage in the activity promoted or inhibited by the intervention. Similarly, without social and physical opportunity, which lie outside of the control of the individual, behaviour change might not be possible. Both capability and opportunity also provide the necessary stimulus to the cognitive processes that motivate behaviour change, either by reflection or automatically. All these sources inform the design of the intervention, as depicted within the causal chain. Nine categories of intervention functions are included within the Behaviour Change Wheel. They are meant to contribute towards long-term change in health behaviours, which are the targeted outcomes within the public health sector.

First are interventions under the category of education, such as awareness and knowledge campaigns, used to increase knowledge or understanding, not only to inspire a particular behaviour but also to provide knowledge about competing behaviours. The second category of interventions falls under persuasion, whereby through various methods of communication, such as reminders or warnings via phone or other ICTs, positive or negative feelings are induced to stimulate action. Incentivization in the form of monetary and in-kind rewards is the third category of interventions, meant to create reward expectations for following a particular behaviour or abstaining from it. The fourth category of interventions is coercion, the opposite of incentivization, which creates an expectation of punishment, such as by raising prices or increasing taxes. The fifth type of intervention is training, where individuals are imparted skills to encourage particular activities. Restriction, which prohibits engagement in target behaviour with the use of rules such as bans or regulated uses, is the sixth category of interventions. By discouraging competing behaviours, these can also be used to encourage a particular behaviour. Another set of interventions falls under the category of environmental restructuring, where, by modifying the physical context around an individual, such as improving infrastructure or technologies related to the targeted behaviour, behaviour change can be encouraged or discouraged. Another subset of interventions under this category captures the modification of the social context around the targeted behaviour, such as prompts that guide behaviour change. The penultimate category of interventions is modelling, where behavioural change is stimulated by depicting what model behaviour should be. This is the method of leading by example, by showcasing the model behaviour. 
Finally, under the category of enablement, any type of support that increases the means, reduces the barriers, or increases the capability to act on the targeted behaviour (such as surgeries or prosthetics to increase physical activity) will be included. Within our review, we further divide the intervention function of environmental restructuring into its two categories, physical restructuring and social restructuring, giving us ten intervention functions in total.

Each intervention function affects one or more source functions, and thereby leads to the required modification of health behaviours, attitudes and practices, as depicted in the concrete outcomes in our Theory of Change. These interventions aim at changing behaviour in the outcome categories, which also define the scope of this sector. For the purpose of this review, we are not per se interested in all possible health outcomes, but rather in what we can learn from these health behaviours for behaviours related to climate change mitigation and adaptation. We therefore propose to define the scope of the health outcomes along the following dimensions: action/health-seeking behaviours and purchasing/consumption behaviour. These two dimensions can have a private benefit (quitting smoking), or might alternatively also affect health outcomes for other individuals (because of less exposure to passive smoke). Overall, these outcomes (and interventions) will lead us to observe sustained improvements in health behaviour, infrastructure and practices.

Energy

Many of the interventions in climate mitigation can be found in the energy sector. Due to the implementation of the Paris Agreement, 197 countries are required to have national GHG-emission reduction policies and plans for their post-2020 agenda (World Resources Institute 2018). Fostering low-carbon technologies is therefore projected to be a major issue for governments (Bouye et al. 2018). The long-term goal of the Theory of Change in the energy sector is that energy production and consumption are sustainable and resilient and do not contribute to climate change. Moreover, an increase in energy supply and demand also aims to contribute to higher employment, which is on the one hand a social co-benefit of energy investments but on the other hand in potential conflict with the goal of climate change mitigation. In this sector, we base the Theory of Change mainly on different assessment reports and systematic reviews concerning climate change mitigation and adaptation, especially the IPCC’s Synthesis Report on Climate Change (Pachuari and Meyer 2014), the 3ie scoping report by Robalino et al. (2014), frameworks and reports by the World Bank Group, the International Energy Agency and the European Union Energy Initiative Partnership Dialogue Facility (Ebinger and Vergara 2011b), as well as on extensive discussions with the Climate Investment Funds’ Evaluation and Learning Initiative and the Green Climate Fund’s Independent Evaluation Unit.

In the spirit of Arnott et al. (2014), energy sector interventions are coded as “behavioural” or “structural”. To support cross-sector learning, behavioural interventions are those that directly target behaviour change of individuals or households and measure a behavioural outcome. These will be classified according to the Behaviour Change Wheel (BCW; Michie et al. 2011). Structural interventions are those that do not lead to individual or household-level behavioural change, or do so only indirectly.

We group interventions into four broad categories, which, according to the ToC, will jointly contribute to achieving the long-term goals. The first category is institutional and market systems (structural), i.e. interventions that change the institutional structure of energy systems or markets. The sub-categories are public-administration reforms, industry coordination and industry self-regulation, privatization, liberalization and introduction of market-based mechanisms as well as de-privatization and de-liberalization. The second category is incentives and standards (behavioural). This category consists of three sub-categories that directly link to the behavioural framework from Michie et al. (2011), as described in the public health sector above: incentivization (such as transfers), coercion (such as taxes and fees) and restrictions (such as bans and limits). The third category is “soft” interventions, which do not change incentives (behavioural). The sub-categories therein are again taken from Michie et al. (2011): education, persuasion, training, environmental restructuring (such as social norms), modelling (such as presenting model behaviour in TV shows) and enablement (such as defaults). Lastly, the fourth category includes investments into energy infrastructure, equipment and technologies (structural). Sub-categories are investments into energy transmission, distribution and storage of electric energy systems as well as investments into renewable energy generating equipment.
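The four-category grouping above can be written down as a small lookup table for coding studies. The following Python sketch is illustrative only: the names `Kind`, `ENERGY_INTERVENTIONS` and `classify` are hypothetical, and the sub-category labels are shortened from the text rather than taken from the review's actual codebook.

```python
from enum import Enum


class Kind(Enum):
    STRUCTURAL = "structural"
    BEHAVIOURAL = "behavioural"


# Category -> (kind, sub-categories), transcribed from the paragraph above.
# Labels are abbreviated for illustration; they are not an official codebook.
ENERGY_INTERVENTIONS = {
    "institutional and market systems": (Kind.STRUCTURAL, [
        "public-administration reforms",
        "industry coordination and self-regulation",
        "privatization, liberalization and market-based mechanisms",
        "de-privatization and de-liberalization",
    ]),
    "incentives and standards": (Kind.BEHAVIOURAL, [
        "incentivization", "coercion", "restrictions",
    ]),
    "soft interventions": (Kind.BEHAVIOURAL, [
        "education", "persuasion", "training",
        "environmental restructuring", "modelling", "enablement",
    ]),
    "energy infrastructure investments": (Kind.STRUCTURAL, [
        "transmission, distribution and storage",
        "renewable energy generating equipment",
    ]),
}


def classify(sub_category):
    """Return (category, kind) for a sub-category label, or raise KeyError."""
    for category, (kind, subs) in ENERGY_INTERVENTIONS.items():
        if sub_category in subs:
            return category, kind
    raise KeyError(f"unknown sub-category: {sub_category}")


# Example: a tax or fee is coded as coercion, a behavioural intervention.
print(classify("coercion"))
```

Keeping the behavioural sub-categories identical to the Behaviour Change Wheel labels is what would let the same codes be compared with the public health studies.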

These interventions may lead to outcomes grouped into seven categories. First, mainly through investments into energy infrastructure, such as grid-extension, access to energy and the supply of (renewable) energy may increase. Second, energy market development may be spurred through institutional and market-systems interventions (International Finance Corporation 2019). Third, energy consumption and demand (differentiated between renewable, non-renewable and on-grid electricity) and fourth, adoption of more energy-efficient technologies (including the transition to renewables), may change due to targeted interventions in all intervention categories. Fifth, the resilience of energy systems to climate change may increase due to investments into energy systems, such as smart grids and energy storage capacities (Ebinger and Vergara 2011a; Stuart and Escudero 2017). Sixth, as a result of incentives and standards (such as energy-efficiency standards), as well as cleaner energy supply and demand and adoption of more energy-efficient technologies (such as improved cookstoves), GHG emissions and indoor air pollution may decrease. Lastly, as a labour-market co-benefit from investments into renewable energy, formal employment may increase.

In order to facilitate cross-sector learnings from the public health sector, all behavioural outcomes within these seven outcomes will be coded in terms of whether they are action behaviours or purchasing/consumption behaviours.

Approach and Methods

To our knowledge, there appears to be an absence of systematic evidence on the causal drivers of transformational change in general, and in particular in relation to climate change mitigation and adaptation. The study that is closest to our review is Lee et al. (2013), who systematically review the literature on organizational transformation, mainly in health care. Their definition of transformational change is, however, focussed on organizational practices, whereas we look at a broad range of outcomes. Furthermore, most included studies in their review are qualitative and thereby not able to show causal drivers of transformational change. This review will reduce this gap within the literature in order to inform governments, donors and other policy makers on the available evidence on a broad set of interventions and their effects on climate change mitigation and adaptation outcomes in the energy sector. We contribute to the literature on the drivers of transformational change in the following seven ways.

First, we discuss attributes of transformational change by offering a precise definition of transformational change (see below), which will form the basis of this review. Second, in order to learn about causal evidence on transformational change, we select only quantitative studies with an experimental or quasi-experimental study design. Furthermore, our inclusion criteria are based on a precise definition of transformational change. More specifically, we only include studies that have the potential to document transformational change according to these criteria. For instance, we intend to include only studies where data collection took place at least 1 year after the intervention. Whether transformational change indeed happened is the empirical question to be answered through our meta-analysis. Third, while our review is broad in scope, we have, as illustrated above, a precise but extensive list of interventions and outcomes within each sector, organized in clearly structured categories. This allows us to search for evidence of transformational change across fields of study while at the same time keeping the scope of the review manageable. Fourth, as a first step we provide a framework of reviewed evidence in the form of an evidence gap map (EGM) of interventions in the specified sectors. EGMs are a convenient and simple-to-use tool for policymakers to quickly inform themselves about existing evidence. This exercise will highlight where research is comprehensive and where there appears to be a lack of evidence. Moreover, it enables policymakers and practitioners to make informed decisions about project prioritization and further research activities. Fifth, we then conduct meta-analyses with the data extracted from the selected quantitative studies for sufficiently populated cells of the EGMs (i.e. at least 10 studies for the same intervention and outcome combination). 
This is another exercise that has not been found to be common in the literature on transformational change. Sixth, the results of the meta-analysis are important to determine where robust evidence exists, i.e. across individual studies and contexts, for transformational change. Doing so will minimize the risk that large effects of interventions are simply statistical outliers. By using the results from this meta-analysis, we intend to produce “transformational change maps” (TCMs), i.e. infographics that only show those intervention and outcome combinations where evidence for large effects at scale and over time exists. The TCMs will show the determinants of transformational change.
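The step from evidence gap map to meta-analysis can be illustrated with a short Python sketch. It assumes each included study has already been coded with one intervention and one outcome label; the function names and the example records are hypothetical and do not represent the review's data or software.

```python
from collections import Counter

MIN_STUDIES = 10  # threshold for a "sufficiently populated" cell, from the text


def evidence_gap_map(studies):
    """Count included studies per (intervention, outcome) cell."""
    return Counter((s["intervention"], s["outcome"]) for s in studies)


def cells_for_meta_analysis(egm, min_studies=MIN_STUDIES):
    """Return the cells with enough studies to be meta-analysed."""
    return sorted(cell for cell, n in egm.items() if n >= min_studies)


# Hypothetical coded studies (not real data): 12 in one cell, 3 in another.
studies = (
    [{"intervention": "incentivization", "outcome": "energy consumption"}] * 12
    + [{"intervention": "education", "outcome": "GHG emissions"}] * 3
)

egm = evidence_gap_map(studies)
print(cells_for_meta_analysis(egm))
```

In this toy example only the first cell would proceed to meta-analysis, while the second would appear in the EGM as a sparsely populated cell.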

And seventh, in order to identify contributors of transformational change, we will search for common characteristics between populated cells in the TCMs, i.e. those intervention and outcome combinations where we find evidence for transformational change. We will also run, where applicable, meta-regressions across these cells in the TCMs and across sectors in order to explain heterogeneity in study results. This way we might learn which characteristics of interventions contribute to transformational change.

To do so, we have to operationalize transformational change into clearly measurable criteria. The first of these is a large depth of change: transformational change requires a sizeable change. This is measured in terms of a large effect size that an intervention produces on the outcomes. To define what a large impact is, we rely on previous literature that has attempted to standardize these definitions. Sawilowsky (2009) defines rule-of-thumb effect sizes for Cohen’s d as large if d = 0.8 (very large if d = 1.2 and huge if d = 2), based on a review of the literature and a contextualization of effect sizes (Cohen 1988). For relative risk, common in the medical literature, the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) approach classifies relative risks of at least 2 as large (and those greater than 5 as very large; Guyatt et al. 2011). We will use these two definitions to define large impacts based on effect sizes in outcomes. Second, a large scale of change: even with a large effect size, an intervention only becomes an important contributor of transformational change if it has sufficiently large scale, i.e. targeting many beneficiaries or covering large areas. 
Given the variety of interventions within the two sectors, we consider interventions as large scale if there are at least 1000 individual beneficiaries (the effect being measured here is the treatment effect on the treated) or if they target an entire administrative area larger than a village (e.g. district, region, state). Third, sustained change. For a change to be transformational, it has to persist over time. The definition of sustained varies considerably across the literature we reviewed (between 6 months and several years). In order to maintain coherence across the results, we consider an effect sustained if it persists at least 1 year after first full implementation of the intervention. Note that this is a lower bound, so that impacts that arrive later than 1 year also pass this threshold. The question of whether impacts are likely to wane or increase over time may be answered differently in the two sectors. In public health, the time required to form a habit may be relatively short (Lally et al. 2010), while dis-adoption of behaviours is a strong concern. Therefore, behaviour change is unlikely to happen if it is not already present after 1 year. By contrast, many energy interventions may take a long time to demonstrate impacts. We therefore acknowledge that the absence of large impacts in the energy sector after 1 year does not imply that large impacts will not arrive later. This is a concern for studies that measure outcomes only until 1 year after the onset of the intervention. While studying time frames longer than 1 year would certainly be useful, setting the threshold higher would risk leaving out many studies.
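The three thresholds can be written down as a simple check. The following is a minimal sketch only: the class and function names are hypothetical, and treating the sign of Cohen's d as irrelevant is an assumption that the protocol does not state. In the review itself, depth of change is judged on pooled estimates per intervention and outcome combination, not on single studies.

```python
from dataclasses import dataclass
from typing import Optional

LARGE_COHENS_D = 0.8        # "large" per Sawilowsky (2009)
LARGE_RELATIVE_RISK = 2.0   # "large" per GRADE (Guyatt et al. 2011)
MIN_BENEFICIARIES = 1000    # scale threshold
MIN_FOLLOW_UP_YEARS = 1.0   # sustained-change threshold


@dataclass
class CellEvidence:
    """Hypothetical summary of one intervention-outcome combination."""
    cohens_d: Optional[float] = None
    relative_risk: Optional[float] = None
    beneficiaries: Optional[int] = None
    area_larger_than_village: bool = False
    follow_up_years: float = 0.0


def large_depth(e):
    # Assumption: the sign of d is ignored, so a large reduction also counts.
    if e.cohens_d is not None and abs(e.cohens_d) >= LARGE_COHENS_D:
        return True
    return e.relative_risk is not None and e.relative_risk >= LARGE_RELATIVE_RISK


def large_scale(e):
    enough_people = (e.beneficiaries is not None
                     and e.beneficiaries >= MIN_BENEFICIARIES)
    return enough_people or e.area_larger_than_village


def sustained(e):
    return e.follow_up_years >= MIN_FOLLOW_UP_YEARS


def meets_all_criteria(e):
    return large_depth(e) and large_scale(e) and sustained(e)
```

A combination with d = 0.85, 1200 beneficiaries and 18 months of follow-up would pass all three checks; the same combination measured after 6 months would not.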

The three criteria above only help to find studies that have the potential to document transformational change. Crucially, individual studies will also be selected for inclusion in the review if the evaluated intervention did not lead to large effects over time. If we only included studies with large effects over time, we would run a strong risk of picking statistical outliers instead of obtaining an unbiased picture of the available evidence. Study results will only become a selection criterion after meta-analysis and therefore always at the level of a group of closely related studies (with the same intervention and outcome combination). We will describe in the analysis subsection how the fulfilment of the three criteria of transformational change is reflected in the analysis that leads to the TCMs.

Inclusion/Exclusion Criteria

Following Petticrew and Roberts (2006), we use the PICOS model to precisely describe the inclusion and exclusion criteria. The tables including the summary of the inclusion and exclusion criteria for both sectors can be found in “Appendix 1”. Pilot screening may lead to adjustments in these tables to make sure that the categorization and coding of studies are sufficiently clear.

Population

We include interventions rolled out in LMICs, as defined by the current World Bank categorization (financial year 2020).Footnote 7 Thus, we exclude studies of interventions in high-income countries or that include LMICs but do not separately report results for those. In the energy sector, we exclude interventions targeted at children (below the age of 12) because generally they are not main agents of climate change mitigation and adaptation. In the health sector, interventions that target behavioural change of adolescents or children (below the age of 18) are excluded because we study long-lasting behavioural change, about which we may learn more general lessons from adults, who have more solidified personalities than adolescents. While it would be interesting to compare adults and adolescents, this would be beyond the scope of this review.

Interventions

The types of interventions we study are informed by the sector-specific Theories of Change described above and in Figs. 1 and 2. We focus on studies which seek to evaluate the causal effect of an intervention that was purposefully implemented. We focus only on interventions that are sufficiently large in scale in order to draw meaningful conclusions. Following Muralidharan and Niehaus (2017), results need to be representative of a large-scale intervention in two ways. First, the scale of the intervention: there need to be at least 1000 treated beneficiaries (automatically fulfilled if there are more than 1000 treated individuals in the study sample). If the number of beneficiaries is not given, or if the intervention is disseminated through radio and other media, it needs to target an entire administrative area larger than a village (e.g. district, region, state). Second, the scale of the population represented: the sample of treated individuals must be representative of a sampling frame of at least 1000 treated individuals or of an administrative area larger than a village. While truly large scale (to reduce the familiar upward “bias” of small-scale interventions and studies) would mean a higher threshold than 1000 beneficiaries, the threshold is purposefully set low initially in order not to risk the exclusion of too many studies. Depending on the number of studies passing this threshold, it might therefore still be raised later.

Comparison

We consider only quantitative studies that aim to evaluate the causal effect of an intervention on the outcome, i.e. experimental or quasi-experimental studies further defined later. We include studies that have a clearly defined comparison group for evaluation of the treatment effect. The nature of the comparison group depends on the type of research design used in the study and can include both active and passive comparison groups.

Outcomes

Since our major outcome, transformational change, cannot be directly measured, we look at a range of outcomes that could reflect transformative processes in the two sectors and measure change therein. Our list of outcomes is described above and in Figs. 1 and 2. When baseline values are used to identify the treatment effect, the time between baseline and endline needs to be at least 1 year. While collecting data 1 year after the intervention is in many cases not sufficiently long to be certain of a sustained change, e.g. one that endures through changes in political or administrative leadership, a higher threshold may lead to the exclusion of too many studies. This threshold may be adjusted depending on the number of studies we find, possibly differentiating between types of outcomes. As an example, the 1-year threshold is likely to be too short for key behavioural outcomes in public health, such as smoking and alcohol consumption.

Study Design

Based on the research design, we categorize the studies into two major groups: (a) Experimental designs: this type of study uses random assignment of the intervention to the treatment group and evaluates the effect by comparing the outcome with that of the control group, using an appropriate methodology; (b) Non-experimental designs: when the assignment of treatment is not random, various quasi-experimental designs are used to evaluate the treatment effects. These methods include, and will be restricted to, regression discontinuity design (RDD), instrumental variables (IV), difference-in-differences (DID) and propensity score matching (PSM). For the health sector, in addition to these methods, we will also include interrupted time series (ITS) and controlled before-and-after (CBA) designs, given their relevance in the health literature. The finalized studies of both design types will be critically appraised in order to identify their strengths and weaknesses. We will explore the possibility of using RobotReviewer for a (semi-)automated risk of bias assessment.

Exclusion Criteria

We will exclude studies that were conducted outside the time frame of 1990–2020 (and, in the public health sector, studies conducted before 2000) or that do not include a separate sample from LMICs. We also exclude studies that do not attempt to evaluate causal effects of the intervention on the outcome, in particular those that do not follow the methods explained under study design. As mentioned above, we will exclude studies that are not sufficiently large-scale or long-term (as defined before). In addition, we will exclude all studies whose interventions fall outside our list, even if they measure relevant outcomes, and, vice versa, studies that capture relevant interventions but do not measure relevant outcomes.
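To make the interplay of these criteria concrete, the sketch below collects exclusion reasons for a single coded study. It is illustrative only: the dictionary keys and reason labels are hypothetical, the label "RCT" stands in for any experimental design, and reading the time frame as 1990–2020 for energy and 2000–2020 for public health is an interpretation of the criterion above rather than a confirmed rule.

```python
# Designs named in the study design subsection.
COMMON_DESIGNS = {"RCT", "RDD", "IV", "DID", "PSM"}
HEALTH_ONLY_DESIGNS = {"ITS", "CBA"}

# Assumed reading of the time-frame criterion (see lead-in).
FIRST_YEAR = {"energy": 1990, "public_health": 2000}
LAST_YEAR = 2020


def exclusion_reasons(study):
    """Return a list of reasons for exclusion; an empty list means 'include'."""
    sector = study["sector"]
    reasons = []
    if not FIRST_YEAR[sector] <= study["year"] <= LAST_YEAR:
        reasons.append("outside time frame")
    if not study["separate_lmic_sample"]:
        reasons.append("no separate LMIC sample")
    allowed = set(COMMON_DESIGNS)
    if sector == "public_health":
        allowed |= HEALTH_ONLY_DESIGNS
    if study["design"] not in allowed:
        reasons.append("design not eligible")
    if not study["large_scale"]:
        reasons.append("not large-scale")
    if study["follow_up_years"] < 1:
        reasons.append("follow-up shorter than 1 year")
    if not (study["relevant_intervention"] and study["relevant_outcome"]):
        reasons.append("intervention or outcome out of scope")
    return reasons
```

Recording every reason, rather than stopping at the first, would make it easier to report reasons for exclusion at the full-text stage.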

Search Strategy

The search strategy aims to find both published and unpublished studies. A three-stage search strategy will be utilized in this review. In the first stage, studies will be searched using text in the title, abstract and the keywords (see “Appendix 2” for a full list of databases). For searching the databases, we decided on search terms for each sector as described in the following subsection on combinations of search terms.Footnote 8 In addition to these databases, the Cochrane, Campbell Collaboration, Collaboration for Environmental Evidence and 3ie libraries will be searched for impact evaluation studies and systematic reviews in the area of the above sectors. Further searches for grey literature in the energy sector will be conducted on institutional websites. If there are fewer than 12,000 search results in the public health sector, a further search in the Epistemonikos database will be conducted for impact evaluation studies and systematic reviews in the public health area. A record will be maintained describing the databases searched, the keywords used, and the search results from each search engine. As the final (optional) step, if fewer than 50 studies (in any of the sectors) are selected for data extraction from the full-text screening, we will do backward snowballing of the studies that have been selected from the full-text screening, as a follow-up on the initial search.Footnote 9 We are planning to run searches on the most appropriate databases for published literature, and on websites of agencies and research institutes for grey literature. The choice of databases was guided by the relevance and comprehensiveness of their coverage of the sectoral literature as well as the technical possibility of applying advanced search filters that increase search precision. 
We also validated the selection by running preliminary searches following the advice of our search expert.Footnote 10 Given the scope of the review in terms of the range of topics as well as the time period covered, we will not perform hand searches of key journals. Instead, we will run a database search in the Web of Science platform with a simplified set of search terms in the three relevant energy journals with the highest impact factors.Footnote 11 We will not hand search specific journals in public health, as the relevant studies for LMICs are dispersed across a large number of journals, and we expect to capture a large number of studies already through the database searches.

Search Terms

The search terms are organized in six different categories that reflect the inclusion criteria and the sector-specific theories of change. The search terms within each category are combined with the OR operator, whereas the AND operator is used to combine the different categories of search terms.

  (1) Long-term or large-scale: this category encompasses terms used to describe studies carried out over a longer time span or at a large scale.

  (2) Methodology: these terms capture the experimental and quasi-experimental methods (for more details see the inclusion/exclusion tables in “Appendix 1”).

  (3) Countries: all lower- and middle-income countries as well as general terms describing LMICs are listed here.

  (4) Interventions: terms are based on sector-specific ToCs.

  (5) Outcomes: terms are based on sector-specific ToCs.

  (6) Sector-specifying terms: this category contains terms used to describe the respective sectors.

The following combinations of categories are used:

  • Energy: the five categories (2–6 above) are combined through the AND operator. If the total number of studies to be screened exceeds 7500, the long-term or large-scale terms (category 1 above) will be applied with the AND operator.

  • Public Health: the four categories (2–5 above) are combined with the AND operator. If the total number of studies to be screened exceeds 15,000, the long-term or large-scale terms (category 1 above) will be applied with the AND operator.
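The combination logic can be sketched in a few lines. This is an illustration under stated assumptions: the function names are hypothetical, the example terms in the usage note are placeholders (the actual terms are those listed in “Appendix 4”), and real databases differ in their quoting and field-tag syntax.

```python
# Categories combined per sector, and the result count above which
# category 1 (long-term or large-scale) is added, as described above.
SECTOR_RULES = {
    "energy": {"categories": [2, 3, 4, 5, 6], "max_results": 7500},
    "public_health": {"categories": [2, 3, 4, 5], "max_results": 15000},
}


def or_group(terms):
    """Join the terms of one category with OR, quoting multi-word phrases."""
    quoted = ['"' + t + '"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def categories_to_combine(sector, n_results):
    """Return the category numbers to AND together for a sector."""
    rule = SECTOR_RULES[sector]
    categories = list(rule["categories"])
    if n_results > rule["max_results"]:
        categories.insert(0, 1)  # narrow the search with category 1
    return categories


def build_query(terms_by_category, category_ids):
    """AND together the OR-groups of the selected categories."""
    return " AND ".join(or_group(terms_by_category[c]) for c in category_ids)
```

For instance, with placeholder terms `{1: ["long-term", "large-scale"], 2: ["randomized controlled trial"]}`, `build_query(terms, [1, 2])` yields one OR-group per category joined by AND.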

The search strategies are tested against a set of benchmark studies in each sector. If more than two thirds of the benchmark papers can be retrieved through the database searches, the search strategy is deemed satisfactory. This threshold has already been passed in the energy sector with the total number of studies within the target of 7500 (for the list of benchmark papers, see “Appendix 3”).

In public health, the list of benchmark papers has already been determined (also see “Appendix 3”), while the search strategy is being finalized by testing over two databases: PubMed and Web of Science. Currently, 68–90% of the benchmark studies for health (9 out of 10 in the Web of Science database and 11 out of 16 in the PubMed database) are part of the search results. The number of studies found for screening in these two databases stands at 30,120 papers (12,448 in the Web of Science database and 17,672 in the PubMed database). Given that this is much larger than the targeted 15,000 papers, our current strategy is to streamline the search so as not to lose benchmark papers while removing most of the non-relevant papers from the search results. This requires several iterations over various permutations and combinations of the search categories. The current list of search terms (in Boolean format) is provided in “Appendix 4”.
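The benchmark check amounts to simple arithmetic, sketched below with the figures reported above. The function names are hypothetical; exact fractions are used so that the comparison with the two-thirds threshold is not affected by rounding.

```python
from fractions import Fraction

RECALL_THRESHOLD = Fraction(2, 3)  # "more than two thirds" of benchmark papers
TARGET_RECORDS = 15000             # screening target for public health


def recall(retrieved, total):
    """Share of benchmark papers retrieved by a database search."""
    return Fraction(retrieved, total)


def passes(retrieved, total):
    return recall(retrieved, total) > RECALL_THRESHOLD


# Figures as reported in the text.
benchmarks = {"Web of Science": (9, 10), "PubMed": (11, 16)}
records = {"Web of Science": 12448, "PubMed": 17672}

for name, (found, total) in benchmarks.items():
    share = float(recall(found, total))
    print(f"{name}: {share:.1%} retrieved, passes threshold: {passes(found, total)}")

total_records = sum(records.values())
print("records to screen:", total_records,
      "| above target:", total_records > TARGET_RECORDS)
```

Both databases clear the two-thirds threshold, with PubMed only narrowly so (11 of 16), while the combined record count is roughly twice the target. This is why the streamlining has to remove records without losing benchmark papers.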

Screening of Studies

The screening process of the two populations of studies, which we found through the literature searches described in the previous section, will be carried out in several steps. Note that these steps will be done for each sector such that there are two separate screening processes. First, pilot screening will make sure that the coding tools are well understood or revised. Two independent screeners will each screen 200 studies. The results of pilot screening are considered satisfactory when the overlap between the inclusion decisions of both screeners after reconciliation is above 80%. Second, titles, abstracts and keywords will be screened to exclude any irrelevant studies. In order to save time given the wide scope of the literature search, this stage will be assisted by the machine-learning algorithm embedded in EPPI Reviewer 4. We propose the following procedure to achieve both speed and quality of screening. The machine-learning algorithm will be fed the results of the pilot screening of 200 studies. The software will then sort the entire population of studies according to relevance. The first 50% of studies, sorted by relevance, will be screened by two independent screeners. The next 25% of studies will be screened by a single screener only, and the last 25% will be directly excluded from the review. The screening process will stop earlier if 100 consecutive studies, sorted by priority, are all excluded. Third, we will apply the specified inclusion/exclusion criteria to the full text and determine whether the study should be included for analysis. We will record all search results, including the reasons for exclusion at the full-text screening stage. These results will be presented in the PRISMA diagram. At least 20% of studies will be double-screened by a second reviewer. Disagreement will be resolved through discussion and third-member involvement.
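The prioritized screening procedure can be illustrated as follows. This sketch does not reproduce the EPPI Reviewer algorithm: it assumes the studies have already been ranked by relevance, the function names are hypothetical, and the rounding of the 50% and 75% cut-offs (integer division) is an assumption.

```python
def percent_agreement(decisions_a, decisions_b):
    """Share of studies on which two screeners made the same decision."""
    matches = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return matches / len(decisions_a)


def screening_modes(n_studies):
    """Assign a screening mode to each rank position (0 = most relevant)."""
    double_end = n_studies * 50 // 100
    single_end = n_studies * 75 // 100
    modes = []
    for rank in range(n_studies):
        if rank < double_end:
            modes.append("double")   # two independent screeners
        elif rank < single_end:
            modes.append("single")   # one screener
        else:
            modes.append("exclude")  # excluded without screening
    return modes


def screen_in_priority_order(ranked_ids, decide, stop_after=100):
    """Screen studies in ranked order, stopping early after `stop_after`
    consecutive exclusions. `decide(study_id)` returns True for inclusion."""
    included, screened, run = [], 0, 0
    for study_id in ranked_ids:
        screened += 1
        if decide(study_id):
            included.append(study_id)
            run = 0
        else:
            run += 1
            if run >= stop_after:
                break
    return included, screened
```

In practice the early-stopping rule would be applied only to the positions that `screening_modes` marks for screening; the two functions are kept separate here for readability.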

Systematic reviews will be screened on the basis of their inclusion criteria. If the inclusion criteria of a systematic review meet all of our inclusion criteria, it passes on to data extraction to be shown in our own EGMs.

Depending on the number of studies found, either one or two people working independently will extract information from each study included in the review. In this step, data will be extracted and summarized using a pre-piloted extraction form by two people. Disagreements in coding will be resolved through discussion and third-member involvement.

The goal of the analysis is to document evidence for transformational change. The analysis will proceed in several steps for each sector as described below, with technical details following in later paragraphs.

  (1) We will use simple EGMs, with interventions listed along the Y-axis and outcomes along the X-axis, to document evidence and gaps within the scope of each sector.

  (2) We will then concentrate on the sufficiently populated cells (at least 10 individual studies) within the map to run meta-analyses on the available evidence and estimate average effect sizes.

  (3) We will then map only those combinations of interventions and outcomes where evidence of transformational change is found. That is to say, we will only show those combinations of interventions and outcomes where there is a large effect size at least 1 year after the intervention, following the thresholds defined before. It is this step where the results of the studies, i.e. depth of change and sustained change, are used as selection criteria. However, selection is not done at the level of the individual study but rather at the level of intervention–outcome combinations (cells in the EGM). Based on the simulation results of our meta-analysis expert (Frank Renkewitz), 10 studies are a lower bound to test for heterogeneity and therefore to assess the generalizability of the results. The results of this exercise will be shown in “transformational change maps” (TCMs) and discussed.

  (4) In order to identify contributors of transformational change, we will search for common characteristics between populated cells in the TCMs, i.e. those intervention and outcome combinations where we find evidence for transformational change. We will also run, where applicable, meta-regressions across these cells in the TCMs and across sectors in order to explain heterogeneity in study results. This way we might learn which characteristics of interventions contribute to transformational change.

In order to draw the EGMs, the following procedure will be applied. For the categorization of studies in the EGMs, we intend to follow Rankin et al. (2016). If several different interventions are grouped together, each intervention will be coded separately in order to show all available evidence related to a particular intervention. For example, a study may look at the effects of a programme that includes a cash transfer intervention and an awareness intervention on two different outcomes. In this case, the two associated outcomes will be coded separately for each intervention. In some studies, only some elements of the programme or evaluation may be relevant to the EGM (e.g. a specific intervention or outcome); only these aspects will then be extracted and coded. Systematic reviews will be coded based on the PICOS of the review. If a systematic review covers more than one intervention and outcome, it will appear in each cell that applies. The outcomes related to each sector are presented on the X-axis of the map (one per column), each indicating a cluster of multiple studies. These categories will be generated on the basis of the outcomes as described in the sector-specific theories of change. The Y-axis (one row each) of the EGM lists all the specific interventions that were found as part of the review. These will be listed under the nature of the intervention. For instance, if the aim is to reduce CO2 emissions via carbon taxation, then incentivization would be the category for that particular row. All impact evaluations/systematic reviews that use carbon taxation as an intervention would be included within that row. 
To rank the systematic reviews by quality, following the categorization of each systematic review, we will sort it according to the confidence with which one can attribute the particular outcomes to the given intervention. This ranking code can be based on the SURE (2011) ranking, which was used in Snilstveit et al. (2013). The checklist ranks systematic reviews on the basis of the methods that were used to identify, include and critically appraise studies in the systematic review, as well as the methods used to analyse the findings.Footnote 12 To assess the risk of bias in primary studies, we will use the Cochrane tool by Higgins et al. (2011).
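The coding rule described above, under which a study appears in every intervention and outcome cell it informs, can be sketched as a small tally. The record layout and function names are hypothetical, and the category labels in the usage note are placeholders rather than the review's actual coding frame.

```python
from collections import defaultdict
from itertools import product

MIN_STUDIES_PER_CELL = 10  # threshold for running a meta-analysis on a cell


def build_egm(studies):
    """Map each (intervention, outcome) cell to the set of study IDs in it.

    Each study is a dict with keys "id", "interventions" and "outcomes";
    it is coded once for every intervention-outcome pair it covers.
    """
    cells = defaultdict(set)
    for study in studies:
        for cell in product(study["interventions"], study["outcomes"]):
            cells[cell].add(study["id"])
    return cells


def sufficiently_populated(cells, minimum=MIN_STUDIES_PER_CELL):
    """Return the cells holding at least `minimum` distinct studies."""
    return sorted(cell for cell, ids in cells.items() if len(ids) >= minimum)
```

A programme combining a cash transfer and an awareness component, evaluated on two outcomes, would thus be counted in four cells. Using sets of study IDs keeps a study from being counted twice within the same cell.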

We will attempt to do a meta-analysis for the studies with comparable variables and coefficients. We will make the studies comparable by calculating the same standardized effect sizes across the studies. We will also attempt to detect publication bias and subsequently run sensitivity analyses of the distribution of the effects, after comparing the outcomes of different correction methods. We exclude studies that do not provide sufficient information to do this or that are not exclusively based on experimental or quasi-experimental methods. The synthesis of the evidence from the included studies will be presented through narrative and statistical analysis of comparable effect sizes using meta-analysis. Meta-analysis is useful in synthesizing quantitative evidence as it takes into account the statistical power of the estimated effect. Standardized mean differences and risk ratios are appropriate for similar types of treatment effects; hence they can be widely used for randomized controlled trials. However, in the case of quasi-experimental studies, the treatment effects may not be strictly comparable. For instance, studies that use a regression discontinuity design or an instrumental variable method typically estimate a local average treatment effect (LATE), while those using propensity score matching estimate the average treatment effect on the treated (ATT). Therefore, we shall conduct meta-analysis where it is possible to convert the treatment effects into comparable measures (Duvendack et al. 2012). Specifically, we shall carry out meta-analysis if the following conditions are met:

  (a) the interventions are sufficiently similar to be comparable,

  (b) the effect sizes can be computed for comparison,

  (c) the outcome measures are sufficiently similar,

  (d) there are at least 10 different studies available that meet these criteria for the same intervention and outcome combination.
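For condition (b), the two effect-size measures mentioned earlier can be computed from summary statistics as sketched below. These are the textbook formulas for a two-group comparison; the function names are hypothetical, and the protocol does not state which variance approximations or small-sample corrections will be applied.

```python
import math


def cohens_d(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Standardized mean difference (Cohen's d) and its approximate variance."""
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    d = (mean_t - mean_c) / math.sqrt(pooled_var)
    var_d = (n_t + n_c) / (n_t * n_c) + d ** 2 / (2 * (n_t + n_c))
    return d, var_d


def log_risk_ratio(events_t, n_t, events_c, n_c):
    """Log of the risk ratio and its approximate variance.

    Requires at least one event in each group; zero cells would need a
    continuity correction, which is not handled in this sketch.
    """
    risk_ratio = (events_t / n_t) / (events_c / n_c)
    var_log_rr = 1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c
    return math.log(risk_ratio), var_log_rr
```

With treatment and control means of 10 and 8, a common standard deviation of 2.5 and 50 observations per arm, `cohens_d` returns d = 0.8, exactly the threshold for a large effect used in this review.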

These results will be presented using conventional methods such as forest plots. In terms of software, we will use Stata or R for this purpose. If fewer than 25 studies enter meta-analysis in each sector, we will at that point explore the possibilities of combining intervention (sub-)categories (combining cells), re-arranging cells so as to conduct further meta-analysis, or using alternative evidence aggregation methods that can provide suggestive evidence on transformational change. When meta-analysis is possible (see above), we shall test for heterogeneity across studies and assess the amount of heterogeneity by the tau statistic as well as the I² statistic. Tau denotes the standard deviation of true effect sizes in the original units, whereas I² measures the percentage of variability across studies that is not due to sampling error but rather to differences in study population, intervention and implementation. Thus, tau indicates the stability of an average true effect size across studies, while I² allows for a rough categorization of heterogeneity (Borenstein et al. 2011). We will follow the corresponding rule of thumb that an I² of 75% indicates high heterogeneity, 50% moderate heterogeneity and 25% low heterogeneity. We can also use the Q-statistic to test for statistical heterogeneity in the outcome variables. If high heterogeneity is present, we shall investigate what factors explain it by conducting moderator analysis, including sub-group meta-analysis and meta-regression, if possible. For sufficient statistical power in meta-regressions, we follow Borenstein et al. (2011), who recommend at least ten studies for each covariate (the coding of studies). Where studies are sufficiently similar to be comparable, we will run meta-regressions across sufficiently populated cells in both EGMs. 
This method will enable us to examine which factors contribute to transformational change. To check whether the results are sensitive to the quality of data and approaches to analysis, we shall report results at the sub-group level, particularly by study design, assuming at least 10 studies per sub-group. We shall use funnel plots and corresponding regression methods (Stanley and Doucouliagos 2014) and sub-group analysis comparing published versus unpublished studies to assess potential publication bias.
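The heterogeneity statistics discussed above can be illustrated with a compact random-effects calculation. This is a sketch under assumptions: it uses the DerSimonian-Laird estimator of the between-study variance, which the protocol does not name, and the analysis itself is planned in Stata or R. The function names and the label returned for an I² below 25% are illustrative choices.

```python
import math


def random_effects(effects, variances):
    """Random-effects pooling with Q, tau and I² (DerSimonian-Laird, assumed)."""
    weights = [1 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    re_weights = [1 / (v + tau2) for v in variances]
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum(re_weights)
    se = math.sqrt(1 / sum(re_weights))
    return {"pooled": pooled, "se": se, "q": q, "tau": math.sqrt(tau2), "i2": i2}


def heterogeneity_label(i2):
    """Rule-of-thumb categories for I² (in percent) as described above."""
    if i2 >= 75:
        return "high"
    if i2 >= 50:
        return "moderate"
    if i2 >= 25:
        return "low"
    return "below low threshold"
```

The function accepts any number of studies, but in this review it would only be applied to cells with at least 10 studies. The effects and variances would come from the standardized effect sizes computed for each study.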

Conclusion

This review combines two different reviews into one learning exercise on transformational change. As part of this, we will discuss what can be learned from transformational change in the public health sector for climate change mitigation and adaptation in the energy sector. Given the focus of the health sector, those lessons will concentrate on which types of interventions may lead to, predominantly individual, behavioural change. The categorization and coding frameworks in both sectors are designed to facilitate these lessons by making intervention and outcome categorization as similar as possible given the different natures of the two sectors. For instance, the intervention framework by Michie et al. (2011) is applied both in the health sector and to three intervention categories in the energy sector.

Cross-sector learning will not be a statistical exercise per se, but a discussion informed by the data synthesis. We propose the following steps. First, we will discuss which determinants and contributors of transformational change identified in our review are similar between the two sectors. Second, we will discuss potential reasons for areas of conflicting evidence between the sectors. Third, assuming a larger body of evidence in the health sector and consequently gaps in evidence in the energy sector, we will discuss which determinants of transformational change in the health sector could also apply to the energy sector. This step will be guided by thinking about the theoretical mechanisms behind long-term behavioural change in the health sector (as one example: commitment devices that can narrow the divide between an intention to exercise and actually exercising). Then, we will ask to which outcomes in the energy sector the same theoretical mechanisms may also apply and, lastly, which interventions are therefore promising for achieving the same transformational change in the energy sector.

Climate change is one of the most pressing global priorities of the twenty first century. To achieve the necessary mitigation and adaptation activities, transformational changes are needed across systems and individual behaviour. This joint evidence review by the Green Climate Fund—Independent Evaluation Unit and the Climate Investment Funds, completed by the Center for Evaluation and Development with the assistance of the Africa Centre for Systematic Reviews and Knowledge Translation and with the advice of the Campbell Collaboration, will map out the landscape of evidence on transformational change in two sectors (for the distribution of roles, see “Appendix 5”). The lessons from that landscape could contribute to making the globe a more habitable planet in the twenty first century and beyond.