Conservation cost-effectiveness: a review of the evidence base

Prioritizing conservation interventions based on their cost-effectiveness may enhance global conservation impact. To do this prioritization, conservation decision-makers need evidence of what works where and how much it costs. Yet, the size, representativeness, and strength of the cost-effectiveness evidence base are unknown. We reviewed conservation cost-effectiveness studies, exploring the representation of different types of conservation interventions, habitats and locations, and the methods used. Studies were included if they were published in conservation science or related fields before 2017; were peer-reviewed; reported costs and conservation-effectiveness or ratios; and were based on empirical data. From an initial search of 13,184 articles, 91 were considered eligible. We found that the number of cost-effectiveness studies was growing but remained small. Many common conservation interventions were poorly represented, and there were large geographical biases, with few studies in the world's more biodiverse regions. This sparse and patchy evidence may result from challenges faced when conducting cost-effectiveness analysis. However, some of these challenges are not unique to cost-effectiveness studies, and others could be overcome through the use of standardized reporting methods. The reward for overcoming these challenges, and strengthening the evidence base, could be a significant and much-needed improvement in global conservation.


| INTRODUCTION
Conservation does not have enough resources to achieve all its goals. Recognition of this has prompted multiple calls for conservation to optimize its impact within resource constraints by prioritizing efficient interventions (Balmford et al. 2000; Ferraro & Pattanayak 2006; Wilson et al. 2006; Wilson et al. 2011; Butchart et al. 2015). Whereas effectiveness indicates success towards a desired outcome, cost-effectiveness is the ratio of inputs to outcomes. Cost-effective conservation interventions are, therefore, those that have a high ratio of outcomes to costs relative to other interventions. Within the hypothetical example of species population recovery interventions presented in Table 1, intervention "A" is more effective than intervention "B" or "C". However, intervention "A" is also more costly per unit of outcome (percentage (%) recovery in the population of the species) than "B" and "C" and so is less cost-effective. With a limited budget of $100, a conservation manager could invest their entire budget into intervention "A" and achieve 50% of their goal (i.e., 50% effectiveness). Alternatively, by dividing the budget equally between interventions "B" and "C," they would achieve a greater percentage of their goal (i.e., 60% effectiveness). Although this example is hypothetical, conservationists regularly make such choices about which interventions to pursue. When these choices are informed by knowledge of effectiveness, but not cost, then less cost-effective interventions may be chosen.
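The arithmetic behind this comparison can be sketched in a few lines of Python. The values below are illustrative assumptions rather than the contents of Table 1: the text only states that $100 spent on "A" achieves 50% recovery and that $100 divided between "B" and "C" achieves 60%, so the even 30%/30% split between "B" and "C" and the function names are hypothetical.

```python
# Hypothetical inputs: cost in USD and outcome as % population recovery.
# Only A's 50% and the combined 60% for B + C come from the text;
# the 30/30 split between B and C is an assumption for illustration.
INTERVENTIONS = {
    "A": {"cost": 100.0, "recovery_pct": 50.0},
    "B": {"cost": 50.0, "recovery_pct": 30.0},
    "C": {"cost": 50.0, "recovery_pct": 30.0},
}


def cost_per_unit(cost, outcome):
    """Cost per unit of outcome (lower means more cost-effective)."""
    return cost / outcome


def portfolio_outcome(names, budget):
    """Total outcome of a set of interventions, if it fits the budget."""
    total_cost = sum(INTERVENTIONS[n]["cost"] for n in names)
    if total_cost > budget:
        raise ValueError("portfolio exceeds the available budget")
    return sum(INTERVENTIONS[n]["recovery_pct"] for n in names)


# Rank interventions from most to least cost-effective.
ranking = sorted(
    INTERVENTIONS,
    key=lambda n: cost_per_unit(
        INTERVENTIONS[n]["cost"], INTERVENTIONS[n]["recovery_pct"]
    ),
)
```

Under these assumed values, "A" costs $2 per percentage point of recovery while "B" and "C" each cost about $1.67, which is why the two-intervention portfolio achieves more with the same $100.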
Conversely, prioritizing cost-effective interventions is expected to maximize conservation impact across a range of activities (Pullin & Knight 2001; Cook et al. 2017). In addition to directly helping practitioners decide which interventions to choose, and donors decide what activities to fund, a well-developed evidence base offers further opportunities. For example, synthesizing across studies may reveal causes of variation, uncertainty, and risk associated with cost-effectiveness estimates (Cook et al. 2017; Iacona et al. 2018). For instance, in the example above, interventions of the same type as "A" might have highly variable cost-effectiveness ratios, meaning a funder's return on investment in intervention "A" might be uncertain. Furthermore, exploring causes of variability may indicate leverage points for improving cost-effectiveness. For instance, interventions of the same type as "A" might require good project planning, and so management training, for example, might be a leverage point for enhancing cost-effectiveness.
Moreover, evaluating ensembles of conservation interventions may tell us how costly it is to meet broader conservation goals, such as the recovery of a species described by the IUCN Green List of Species (Akcakaya et al. 2018). For instance, interventions "A," "B," and "C" might all be required for the long-term recovery of a species, and so consolidating the cost-effectiveness estimates from each intervention illustrates the total cost of meeting that conservation goal. Similarly, evaluating sets of interventions operating in sequence or parallel might reveal cost-effective combinations of activities. For instance, intervention "A" may involve protected area enforcement, "B" may be invasive species control, and "C" might be an alternative livelihood initiative constructed around products derived from the invasive species. Interventions "B" and "C" may, therefore, be a cost-effective combination. The cost-effectiveness of sets of interventions might be compared to find the cheapest ways of meeting a specific target, such as ensuring no net loss of biodiversity associated with infrastructure developments (Arlidge et al. 2018).
Given the potential benefits of cost-effectiveness based prioritization, how has the conservation sector responded to repeated calls for its adoption? Progress has been made in understanding effectiveness (one side of the cost-effectiveness equation), with a move towards impact evaluation and evidence synthesis (Ferraro & Pattanayak 2006; Sutherland et al. 2019). For instance, the Conservation Evidence project collates evidence of what works in conservation (Conservation Evidence 2020). Similarly, the Collaboration for Environmental Evidence facilitates systematic reviews and maps of evidence for and against the effectiveness of conservation interventions and maintains the journal Environmental Evidence (Collaboration for Environmental Evidence 2018). One example of a systematic map explored the environmental, economic, governance, and social impacts of the Marine Stewardship Council seafood ecolabelling program (Arton et al. 2018). Such synthesis can be valuable for practitioners deciding how successful interventions are likely to be.
Yet, determining whether an intervention is effective is only one component of deciding whether it is the best course of action. Another key component is the cost, the other side of the cost-effectiveness equation (Cook et al. 2017). These costs can include labor, capital assets, consumables, overheads, and other expenses, often accruing at the intervention, program, and organizational levels, and over varying time horizons (Iacona et al. 2018). The systematic reporting of costs could help reveal how much funding is required to deliver a given conservation outcome, how and why costs vary between contexts, as well as guiding prioritization (Iacona et al. 2018).

TABLE 1 A hypothetical example illustrating the difference between effectiveness and cost-effectiveness in the recovery of the population of a hypothetical species

This study seeks to determine whether there is sufficient, representative, and robust cost-effectiveness evidence to facilitate the prioritization of efficient interventions by providing the first review of the number and distribution of cost-effectiveness studies in conservation. The review had five components. First, to estimate the number of conservation cost-effectiveness studies published before 2017. Second, to evaluate the rate of growth in the evidence base by assessing trends in the number of articles published on the topic over time. Assessing these trends may indicate the expected future growth of the evidence base. Third, to explore the distribution of cost-effectiveness studies across intervention types. This component may indicate underrepresented intervention types that require further cost-effectiveness evaluation. Fourth, to map the distribution of studies across geographic locations and habitat types. Conservation cost-effectiveness ratios vary greatly in different parts of the world; failing to account for this variation can distort conservation prioritization (Balmford et al. 2003; Wilson et al. 2006). Accounting for these differences requires a geographically representative selection of cost-effectiveness studies. Fifth, to document the methods used by cost-effectiveness studies in order to assess the degree to which they provide a robust assessment of conservation outcomes. Collectively, this review explores whether there is sufficient, representative, and robust cost-effectiveness evidence to facilitate the prioritization of efficient conservation interventions, and suggests where future efforts should be directed.

| METHODS
We reviewed the peer-reviewed literature, following steps informed by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) protocol (www.prisma-statement.org). PRISMA has been considered the gold standard for conducting systematic reviews in medicine for over a decade. The exact methods used in this review are discussed below, outlining the scope, inclusion criteria, and methods used to generate the review.

| Scope
The scope of the review was all studies of the cost-effectiveness of nature conservation interventions globally. Interventions are defined as actions or activities conducted with a specific conservation goal (see below for details). This scope was judged by TP and LRC to be appropriate for answering the research aims.

| Inclusion criteria
Studies were considered eligible if they: i) were published in conservation, environmental science, ecology, biology, or related fields; ii) were peer-reviewed articles; iii) reported costs and associated conservation outcomes (i.e., effectiveness) or cost-effectiveness ratios; iv) were conducted at a sub-national scale or had a clearly defined geographical scope; v) were based on empirical data (direct measurement or empirical modeling) of costs and outcomes; vi) reported a conservation intervention; vii) reported a location; viii) were published before 2017; and ix) were published in English. While its exclusion was recognized as a limitation, grey literature was not included because of the difficulty in systematically identifying and evaluating the quality of these studies (Adams et al. 2016). Ex-ante, optimization-based prioritization studies (where costs and effectiveness data are derived from forecasts rather than real data), simulations based on dummy data, and opinion pieces that were not based on primary empirical data were excluded.

| Key search terms
An initial set of candidate search strings followed the "patient, population or problem," "intervention or exposure," "comparison," and "outcome" (PICO) formula, as advised in the PRISMA protocol. However, trialing these search strings revealed that the a priori selection and inclusion of "comparison" and "outcome" terms significantly restricted the scope of the review, and so these terms were excluded. Limiting the search to relevant "subject areas" substantially reduced the number of articles returned that were outside the review scope. The broad "subject areas" were chosen by inspecting the search results returned under different choices of subject area.
This process produced ten candidate search strings composed of terms capturing the "population," "exposure," and "subject area" (Table 2, Table S1). The population search terms aimed to capture all peer-reviewed articles associated with biodiversity conservation interventions, without being prescriptive about the type of intervention. These terms broadly related to conservation areas, endangered nature, or conservation efforts. The exposure search term aimed to capture all cost-effectiveness studies, using terms considered most likely to be included in such studies. All of the candidate search strings included the subject area search terms to help exclude irrelevant articles. Truncation (using the wildcard "*" in most databases) was used to capture alternative word endings and plurals in search terms. Wildcard operators (using "$" and "?" in most databases) were used to manage differences in spelling. "AND" and "OR" were used as Boolean operators. Candidate search strings were evaluated in Web of Science and Scopus according to the proportion of relevant to non-relevant articles, and the most effective combination was used as the final search string (Table 2). This comparison involved inspecting the first 50 titles returned from each of the ten search strings and qualitatively comparing them to the total number of articles returned. As a further step, the final search string was validated using a training set of 13 articles known to be relevant to the objectives of the review. All 13 articles identified a priori were found in the list of returned results, providing confidence that the final search string was appropriate.
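The structure of such a search string, and the training-set check, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' tooling: the function names are hypothetical, and any terms passed in are placeholders rather than the actual terms of Table 2 or Table S1.

```python
def or_group(terms):
    """Join alternative terms with OR and wrap them in parentheses."""
    return "(" + " OR ".join(terms) + ")"


def build_search_string(population, exposure, subject_areas):
    """Combine the three term groups with AND, as in a PICO-style query
    reduced to population, exposure, and subject area."""
    groups = (population, exposure, subject_areas)
    return " AND ".join(or_group(group) for group in groups)


def training_set_recall(returned_ids, known_relevant_ids):
    """Fraction of known-relevant articles found among the returned results."""
    known = set(known_relevant_ids)
    if not known:
        raise ValueError("training set must not be empty")
    return len(known & set(returned_ids)) / len(known)


# Placeholder terms for illustration only (not the review's search terms).
example = build_search_string(
    ["conservation area*", "endangered species"],
    ["cost-effective*"],
    ["Zoology", "Forestry"],
)
```

A recall of 1.0 from `training_set_recall` corresponds to the outcome reported above, where all 13 training articles appeared in the returned results.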

| Publication sources
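Searches were conducted in multiple databases, including Scopus, Web of Science, EconLit, and PubMed (see Table S1 for the search string used in each database). References from each publication source were stored in separate Endnote™ libraries. They were then merged, and duplicates were excluded.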
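The merge-and-deduplicate step was carried out in Endnote™; the logic can nevertheless be sketched in code. The following is an illustrative assumption about how duplicates might be matched (by DOI where available, otherwise by normalized title), not a description of Endnote's own matching rules. The record fields `doi` and `title` are hypothetical.

```python
import re


def record_key(record):
    """Build a matching key: the DOI if present, else a normalized title."""
    doi = (record.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = (record.get("title") or "").lower()
    # Collapse punctuation and whitespace so minor formatting differences
    # between databases do not hide duplicates.
    return "title:" + re.sub(r"[^a-z0-9]+", " ", title).strip()


def merge_and_deduplicate(*libraries):
    """Merge several reference lists, keeping the first copy of each record."""
    seen = set()
    unique = []
    for library in libraries:
        for record in library:
            key = record_key(record)
            if key not in seen:
                seen.add(key)
                unique.append(record)
    return unique
```

A limitation of this sketch is that a record exported with a DOI from one database and without one from another would not be matched, so manual checking of near-duplicates would still be needed.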

| Screening
Article screening was conducted in three stages by TP. The first stage involved reviewing the titles and then abstracts of all papers in Endnote™, excluding those that explicitly violated one or more criteria (described in Table S2 and Table S3, respectively). The second stage involved verifying the title, year, abstract, URL, and DOI of the remaining articles. Discrepancies were manually corrected, and full-text articles were retrieved. The third screening stage involved a full-text review against the inclusion criteria (Table S4). Justifications for exclusion were recorded. All papers that passed the three screening stages were included in the review.

| Data extraction
TP extracted data from the papers following the coding criteria (Table S5). Variables extracted included: the starting year of study; the conservation intervention type; the country, region, and habitat where the intervention was conducted; and the methods used to assess effectiveness. The type of conservation intervention was classified and defined according to an adapted version of the standard lexicon for conservation actions (Salafsky et al. 2008), merging closely related categories (Table S6). All activities were considered interventions if they could be classified within the adapted standard lexicon, were costed, and reported conservation outcomes. In some cases, more than one intervention or habitat was reported. In these cases, each intervention or habitat type was recorded (up to three intervention or habitat types). We also recorded if the study design used controls, comparators, or matched samples; random sampling or assignment of treatments; and before-and-after or other longitudinal methods (such as stepped-wedge design interventions). Habitat types were classified according to a simplified adaptation of the IUCN Habitats classification (Table S7, IUCN 2018). This included merging several categories, such as combining subarctic, sub-Antarctic, and temperate grassland into a single group. This simplification was necessary because many studies reported general habitat types rather than the exact IUCN Habitats classification.

TABLE 2 The search terms used to identify relevant literature within the review

Subject area: ((Biodiversity & Conservation) OR (Environmental Sciences & Ecology) OR (Zoology) OR (Geography) OR (Marine & Freshwater Biology) OR (Forestry))
Note: Nonexact phrases (where quotes did not enclose search terms) were used for most phrases to retrieve a broad sample of articles. The format of the subject area search strings varied by the database used, since each database had slightly varying pre-defined subject areas. In the case of SCOPUS and Web of Science, these were pre-defined subject areas, as shown in the table (e.g., "Biodiversity & Conservation"). In the case of EconLit, these were codes that corresponded to the subject areas (e.g., "Q57," which corresponds to "biodiversity conservation" and related fields). Similarly, in PubMed, these were pre-defined subject areas, which also corresponded to those shown in the table (e.g., "conservation, natural resources"). See Table S1 for the search string for each database.

| RESULTS
A total of 13,184 studies were returned by the searches, with 10,892 unique studies identified once duplicates were removed. Following screening based on titles, 9,469 studies were excluded, mostly because they focused on biological processes, engineering, agrotechnology, or pollution remediation. The abstracts of the remaining 1,423 studies were read, and a further 929 studies were excluded at this stage, largely because no cost-effectiveness analysis had been conducted within the study. A total of 494 studies passed to the final round of screening. At the full-text review, 403 studies were excluded (Table S4), mostly because they did not report costs and effectiveness or because they predicted rather than measured outcomes. Only 91 studies met all inclusion criteria and were included in the review (Table S5, Figure S8). Many of the 91 articles included more than one intervention or interventions in more than one habitat type, so the relevant number of observations is reported for each section of the results, indicated by "n."
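The screening counts reported above can be checked with a short sketch that subtracts the exclusions at each stage. The counts are taken from the text; the variable and function names are illustrative.

```python
RETURNED = 13184   # articles returned by all searches
UNIQUE = 10892     # articles remaining after duplicates were removed

# (stage, number excluded at that stage), in screening order.
EXCLUSIONS = [
    ("title screening", 9469),
    ("abstract screening", 929),
    ("full-text review", 403),
]


def screening_flow(start, exclusions):
    """Return (stage, excluded, remaining) for each screening stage."""
    remaining = start
    flow = []
    for stage, excluded in exclusions:
        remaining -= excluded
        flow.append((stage, excluded, remaining))
    return flow


duplicates_removed = RETURNED - UNIQUE
flow = screening_flow(UNIQUE, EXCLUSIONS)
for stage, excluded, remaining in flow:
    print(f"{stage}: excluded {excluded}, remaining {remaining}")
```

The stage totals reproduce the figures in the text (1,423 after titles, 494 after abstracts, and 91 after full-text review), with 2,292 duplicates removed before screening.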

| Temporal trends
There has been an apparent increase in the number of cost-effectiveness studies, although the absolute numbers remain small (Figure 1). On average, studies were initiated 7.5 years before their publication.

| The distribution between conservation interventions
In total, 117 interventions representing 12 intervention types were reported in the 91 studies; 18 studies assessed two interventions, and six studies assessed more than two interventions. The most common conservation interventions assessed were invasive species control (n = 32 studies), and habitat and natural process restoration (n = 23 studies; Figure 2). "Enterprise" interventions, including payments for ecosystem services and alternative livelihoods projects, were the third-largest group (n = 17 studies). Many of the conservation action categories were poorly represented (e.g., legislation and policy, awareness, and communications), and no studies described the conservation cost-effectiveness of education/training and standards.

| The distribution between habitats and locations
A total of 158 habitats, representing 24 habitat types, were reported in the 91 studies. Tropical forests were the most studied habitat type (n = 22 studies, Figure 3(a)), followed by agricultural land (n = 18 studies), and temperate forest (n = 16 studies). Conversely, some habitat types were poorly represented, with deserts; swamp forest; estuaries, seagrass, and other marine habitats (which we refer to as "ocean/sea"); and tundra or snowy areas only having one observation each. All continents apart from Antarctica were represented in the 91 studies (Figure 3(b)). The greatest number of observations was from North America (n = 24 studies), followed by Europe (n = 18 studies), and Oceania (n = 17 studies). Africa, Asia, and South America were less well represented (n = 15 studies, n = 13 studies, and n = 6 studies, respectively). The primary author's first-listed institution was in a country where English was a national language in 68.1% of the articles. This included 22 articles from the United States, 11 from the United Kingdom, and eight from Australia. In many cases, a given intervention might target multiple habitat types. For example, a single study in South Africa described a management intervention that covered an area of savanna, tropical shrubland, and tropical grassland. Furthermore, multiple interventions were sometimes implemented in the same habitat. For example, both enterprise and legislation and policy interventions were implemented in agricultural habitats in another study. Enterprise interventions in agricultural habitats, habitat and natural process restoration interventions in tropical forests, and invasive species control in tropical shrub habitats were the most common combinations of intervention and habitat type (see Figure S9). Species management, resource and habitat protection, and invasive species control in North America were well represented, along with restoration in Asia and invasive species control in Oceania (see Figure S9).

FIGURE 1 The number of conservation cost-effectiveness studies published before 2017

FIGURE 2 The number of observations of each intervention type
There appeared to be a relatively poor overlap between the number of studies and species richness of mammals, amphibians, and birds (Figure 4). For example, large parts of the tropics were poorly represented, such as most biodiversity hotspots of the Amazon Basin and parts of the Congo Basin. Conversely, there were many studies in the less species-rich areas of North America and Europe. Overall, large parts of the world were completely devoid of studies, such as across North Africa, the Middle East, parts of Eurasia, and Antarctica.

| Methods used to assess effectiveness
The majority of studies reported some form of experimental control, comparative analysis, or matching to determine effectiveness (75.3% of the studies). Similarly, around two-thirds of studies reported some form of before-and-after or longitudinal analysis (63.4%). Around one third reported random sampling and/or assignment of treatments (34.4%).

| DISCUSSION
Our results suggest that the conservation cost-effectiveness evidence base may be growing but remains small relative to the number needed to aid prioritization of management interventions. Consequently, we remain far from the vision of cost-effectiveness based conservation prioritization. Moreover, there was a significant bias in the distribution of evidence across locations, habitats, and intervention types. Invasive species control and habitat and natural process restoration had the highest representation, but many commonly used interventions were poorly studied. Similarly, some habitats, such as tropical and temperate forests, were well represented, whereas many others, most notably marine habitats, were poorly represented. Significantly, there was an absence of studies in highly biodiverse parts of South America and Africa.

| The total number of studies and temporal trends
Significant progress in evaluating and synthesizing evidence of what works in conservation has been made over the last decade. The Conservation Evidence project alone maintains a database of over 6,700 summaries of the effectiveness of conservation interventions (Conservation Evidence 2020). Yet, effectiveness studies alone cannot be used to prioritize interventions on the basis of their cost-effectiveness, and therefore do not allow us to make the most efficient use of limited conservation resources.
Although the number of studies that include the keyword "effectiveness" has grown almost exponentially since 1995, the number with the keyword "cost-effectiveness" has lagged (Cook et al. 2017). Our results suggest that, while some progress has been made, the repeated calls to integrate cost-effectiveness analysis into conservation prioritization have not yet been answered. From a statistical perspective, a robust evidence base requires multiple replicate studies to control for the diverse contextual factors likely to influence cost-effectiveness ratios for a given intervention type. Within our review, invasive species control was the most studied intervention, with 32 observations. Even this number of studies may not be sufficient to derive robust predictions of cost-effectiveness in different conservation contexts. Furthermore, the absence of estimates of cost-effectiveness for many intervention types in many habitats precludes empirically-grounded cost-effectiveness prioritization in those areas.
The limited size of the cost-effectiveness evidence base could be the result of multiple factors. One challenge may be the difficulty in comparing conservation outcomes that have different subjective values (Armsworth et al. 2017). For instance, comparing the cost-effectiveness of different interventions for the same conservation outcome may be relatively tractable, such as alternative mowing and burning regimes for invasive species control (e.g., Musil et al. 2005). However, prioritizing investment across interventions with different outcomes, such as comparing coral reef restoration (e.g., Haisfield et al. 2010) with wire netting to reduce tree damage by elephants (e.g., Derham et al. 2016), requires a common unit of conservation value. Yet any comparison of interventions with different conservation outcomes would face this challenge; the difficulty of comparing "apples and oranges" is not unique to cost-effectiveness based approaches. This challenge has parallels with cost-effectiveness based prioritization in public health, where standard measures such as Disability-Adjusted Life-Years have been developed and are used to compare candidate interventions (Baltussen et al. 2003). Although common units of conservation value are likely to be imperfect and contested, their practical value for decision-making could outweigh their limitations.
Another potentially significant challenge facing cost-effectiveness analysis is the difficulty of providing a full accounting for costs, enabling consistent estimation and comparison of costs (Armsworth 2014). Costs can include labor (such as ranger training and patrols), capital assets (such as tools and equipment), consumables (such as herbicides used in invasive species control), and overheads (such as the salaries of office staff). In many cases, these costs are not discrete and may accrue at the intervention, program, organizational, and cross-organizational levels. Similarly, these costs may be expended over different time horizons and multiple interventions. These varying scales may be challenging to consolidate into a single cost-effectiveness estimate in a way that is comparable across intervention types. For instance, protected area management studies may include overhead costs, whereas invasive species control studies may exclude them. Similarly, costs may depend on the scale of the intervention, such as the economies of scale experienced during protected area acquisition (Kim et al. 2014). Alternatively, some studies may only consider marginal costs, excluding fixed costs, or spreading start-up costs over different time horizons. Other studies may deduct co-benefits from costs or include socially externalized costs. This review did not attempt to extract and categorize costs systematically. Nevertheless, we observed large differences in the types and detail of costs recorded, which have been identified as key challenges for building the conservation cost-effectiveness evidence base (e.g., Iacona et al. 2018). In particular, we observed that in many cases the organizational level and time horizon of the impacts of an intervention were not explicitly stated. Similarly, studies often provided relatively little context on whether an intervention was conducted in parallel or in sequence with other interventions. This lack of standardized reporting presents multiple challenges for integrating costs with effectiveness analysis.
The need for a standardized approach to reporting costs has been addressed in a recent study by Iacona et al. (2018). This study provides a blueprint for reporting on the conservation objectives, context, methods, outcomes, cost categories, and scale of costed interventions. Such standardized reporting tools could be integrated into existing fundraising activities, where budgets and expected outcomes are presented in grant applications. They could also be integrated into accountancy practices, where expenditure should already be attributed to organizational activities. Doing so may help organizations reflect on the profile of their expenditure, support project evaluation and reporting to funders, develop well-costed budgets for planning and future grant-seeking, and enhance transparency. Nevertheless, implementing standardized reporting will require investment and expertise. Thus, reporting requirements may need to be scaled to the capacity of the organization.
Improvements in the reporting of costs would complement the increasingly well-developed conservation effectiveness evidence base. For instance, a database similar to that maintained by the Conservation Evidence project could be a repository of both effectiveness and disaggregated cost estimates (see further discussion of this in the Study considerations section below). Similarly, cost-effectiveness meta-analyses could be encouraged and published in journals that house articles consolidating effectiveness studies, such as Environmental Evidence.

| The distribution of studies across conservation interventions
Multiple factors may account for why some intervention types are better represented than others. Firstly, some intervention types may be easier to evaluate than others, resulting in fewer studies of interventions that are challenging to assess. For instance, one study assessed the cost-effectiveness of South Africa's invasive alien plant control program Working for Water over seven years (McConnachie et al. 2012). This study evaluated changes in invasive species cover, which was directly measured over clearly defined spatial extents in standardized units. In contrast, another study assessed the cost-effectiveness of an awareness and communications intervention for the conservation of the Philippine crocodile, Crocodylus mindorensis (van der Ploeg et al. 2011). This study involved attributing conservation outcomes through a complex theory of change over a large spatial and temporal extent. Comparing these examples, it appears easier to assess the cost-effectiveness of interventions with more tangible and directly measurable outcomes and for species with traits more amenable to consistent measurement, regardless of their respective conservation value.
Secondly, there may be traditions that influence the collection of cost and effectiveness data for some intervention types. For instance, invasive species control has historically integrated economic theory and may draw on established agricultural research practices concerned with weed management (Wiles 2004; Epanchin-Niell & Hastings 2010). This integration may be because the costs of invasive species to agriculture, industry, or infrastructure were relatively apparent, and so cost-effectiveness approaches may have helped meet business and economic targets. In contrast, site-based protection that involves a wide range of interventions may not involve the routine collection of cost-effectiveness data and can be complicated by accounting methods that aggregate costs at the site or intervention level (Sutherland et al. 2004; Coad et al. 2015). As a result, the distribution of studies across intervention types may reflect historical and structural biases.
Finally, conservation organizations face multiple constraints and incentives. There can be limited incentives and active disincentives for rigorous analysis of the cost-effectiveness of interventions. For example, conservation campaigns may have the dual objectives of raising awareness of a given environmental issue and promoting an organization's brand, which may advance institutional objectives even if not directly enhancing conservation impact (Wright et al. 2015). Moreover, some organizations may not wish to evaluate interventions believed to be inefficient or to report the results of evaluations that show poor outcomes. This has wider implications for conservation prioritization, since organizations may evaluate a wide range of additional factors beyond simply maximizing direct conservation returns when prioritizing efforts.

| The distribution between habitats and locations
Many international organizations prioritize conservation efforts based on biodiversity values, ignoring relative conservation cost-effectiveness (Wilson et al. 2006). One factor limiting an organization's ability to efficiently prioritize actions across locations may be the availability of cost-effectiveness data. Our results show high variability in the distribution of cost-effectiveness evidence, with few studies in some of the world's most biodiverse areas. Southeast Brazil, central Africa, and the Amazon combined include around 50% of all species globally (Jenkins et al. 2013). Yet, only 14% of cost-effectiveness studies were conducted in South America and central Africa. This result reflects a wider pattern where areas of greatest biodiversity are significantly underrepresented in conservation cost-effectiveness research, often as a result of greater investment in research funding in some countries (Wilson et al. 2016). Consequently, it is likely that the number of cost-effectiveness studies published in a given country is strongly influenced by nationally available research funding. For example, one study looking at conservation articles published between 2011 and 2015 found that the majority of published papers focused on the world's least biodiverse areas, neglecting areas that would most benefit from such research (Di Marco et al. 2017). Whatever its cause, this geographical bias may limit the ability of funders to prioritize interventions at a global level. Furthermore, this bias might also obscure locally important causes of variation in cost-effectiveness, limiting our understanding of the factors that influence it.
We used an aggregated version of the IUCN Habitat classification to classify the habitats where cost-effectiveness studies were conducted because many habitat categories were not represented in our dataset (IUCN 2018). Most notable was the limited number of studies in marine environments (itself a highly aggregated category), especially when excluding coastal and coral reef habitats. There exists a general research bias towards terrestrial over marine environments (Levin & Kochin 2004). This bias has been attributed to lower levels of funding and the higher comparative cost of conducting research in marine environments, which potentially accounts for the bias in the distribution of cost-effectiveness studies between marine and nonmarine environments (Levin & Kochin 2004;Parsons et al. 2014).

| Methods used to assess effectiveness
Although the research focus on conservation effectiveness evaluation has grown, it has been argued that robust methods for measuring impacts are not routinely applied (Baylis et al. 2016). For example, one systematic review of the effects of decentralized forest management on poverty and deforestation found that no interventions employed a randomized treatment, and few studies used suitable comparators (Samii et al. 2014). Broadly, some study designs are thought to provide more accurate estimates of changes in environmental status than others. For instance, one simulation study explored the accuracy of methods used to evaluate an environmental impact on a population's density. This simulation suggested that "before-after control-impact" (BACI) designs performed better than "randomized controlled trials" (RCT), followed by "before-after," "control-impact," and "after" designs. Although RCTs are generally considered robust, they require the ability to allocate treatments randomly, which may be impractical for some types of conservation intervention. Given this, cost-effectiveness studies using BACI designs, which can account for differences between control and treatment sites and for background time effects experienced across sites, should be given greater weight as evidence than studies using simpler designs. However, BACI study designs are not always practical, such as where interventions occur in unique habitats or large protected areas.
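To make the contrast between these designs concrete, the following is a minimal sketch, not taken from any of the studies cited, of how the simplest point estimates differ. It assumes one mean outcome per site and period, treats the BACI estimate as a plain difference-in-differences, and uses hypothetical function names and made-up numbers; real analyses would normally fit a statistical model with uncertainty estimates.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def before_after(impact_before, impact_after):
    """Change at impact sites only; ignores background trends."""
    return mean(impact_after) - mean(impact_before)


def control_impact(control_after, impact_after):
    """Post-intervention gap between impact and control sites;
    ignores pre-existing differences between the two groups."""
    return mean(impact_after) - mean(control_after)


def baci(control_before, control_after, impact_before, impact_after):
    """Difference-in-differences: change at impact sites minus the
    change at control sites over the same period."""
    impact_change = mean(impact_after) - mean(impact_before)
    control_change = mean(control_after) - mean(control_before)
    return impact_change - control_change


# Hypothetical population densities (individuals per hectare).
control_before, control_after = [10, 12], [8, 10]   # background decline
impact_before, impact_after = [14, 16], [15, 17]    # denser to begin with

print(before_after(impact_before, impact_after))    # understates the effect
print(control_impact(control_after, impact_after))  # overstates the effect
print(baci(control_before, control_after, impact_before, impact_after))
```

In this illustrative case the impact sites started with higher densities and all sites experienced a background decline, so the before-after estimate is lower and the control-impact estimate is higher than the BACI estimate.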
Since the optimal choice of methods can be context-specific, we could not fairly assess how the use of different methods might affect the robustness of the study conclusions. This was further complicated by the fact that articles often did not articulate their methods in sufficient detail. As such, it may be more appropriate to examine the quality of cost-effectiveness assessments by intervention type, as was done in one study looking at the quality of cost-effectiveness assessments within terrestrial protected area acquisition (Armsworth 2014), rather than across types.

| Study considerations
Several points should be considered when interpreting the results of this study. First, the exclusion of grey literature from the review may have omitted an important source of information about cost-effectiveness, especially if produced by practitioners with access to real cost data. However, the challenges with accessing, evaluating, and using grey literature in these types of reviews have been widely discussed in the evidence synthesis literature (Mahood et al. 2014;Adams et al. 2016). Nevertheless, future studies should consider how these challenges could be resolved to ensure valuable cost-effectiveness information contained within the grey literature can be added to the broader cost-effectiveness evidence base. This reporting could be facilitated by a database of cost-effectiveness evidence, as suggested above, that could be a repository for both peer-reviewed and grey literature. While recognizing the variable quality of grey literature studies, such a database could provide multiple benefits, such as facilitating transparent reporting to conservation funders. Equally, grey literature, which can include value-based arguments that are sometimes discouraged in academic research, may be well placed to help decision-makers evaluate normative trade-offs (Davidson 2017). In other words, grey literature may help guide the comparison of "apples and oranges" during conservation prioritization in ways that are sometimes restricted in peer-reviewed research.
Second, the requirement that studies explicitly mentioned costs may have excluded some relevant studies that used alternative terminology. The introduction of clear reporting standards in cost-effectiveness studies could aid future reviews of this type. This is particularly important given that the lack of a standard lexicon may also make it challenging for decision-makers to access all relevant evidence relating to conservation cost-effectiveness. This challenge thus emphasizes the need for routine and standardized reporting of cost-effectiveness data (Cook et al. 2017;Iacona et al. 2018).
Third, our review excluded ex-ante studies, which were commonly encountered at the screening stage. Although these are useful for conservation planning, they omit unexpected costs and benefits and are dependent on a researcher's understanding and assumptions about a given system. As such, it was deemed that their inclusion may have unduly exaggerated the size of the reliable evidence base.
Fourth, the search string was only in English. Consequently, potentially relevant articles in other languages were excluded from the analysis. This is a challenge for evidence synthesis in conservation more broadly, where the majority of reviews are limited to studies published in English, but over a third of studies are published in other languages (Amano et al. 2016). However, nearly a third of the articles included in our study had lead authors in institutions in non-English-speaking countries, suggesting our sample captured a broad assessment of studies conducted globally. Furthermore, the geographical distribution of studies was similar to that found in conservation research generally (Di Marco et al. 2017). This observation suggests that the distribution we observed may be accounted for by broader geographical publishing biases, rather than being an artifact of using English-only articles.
Finally, one individual performed the screening and data extraction steps. Established guidelines recommend that these steps be performed by multiple people independently, with agreement between them tested statistically, to reduce the bias associated with individuals' interpretations (e.g., Collaboration for Environmental Evidence 2018). However, study resource constraints meant that only one person was able to perform these steps, so the results may partly reflect TP's interpretations.
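For readers unfamiliar with how agreement between independent screeners can be tested, the sketch below computes Cohen's kappa for two reviewers' include/exclude decisions. This is offered only as an illustration: kappa is one commonly used agreement statistic and is an assumption here, not a method applied in this review, and the function name and the example decisions are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    proportion of agreement and p_e is the agreement expected by
    chance given each rater's label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / n**2
    if expected == 1:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical screening decisions for four articles.
reviewer_1 = ["include", "include", "exclude", "exclude"]
reviewer_2 = ["include", "exclude", "exclude", "exclude"]
print(cohens_kappa(reviewer_1, reviewer_2))
```

Values near 1 indicate agreement well beyond chance, while values near 0 indicate agreement no better than chance; the threshold treated as acceptable is a matter of convention and is not specified here.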

| CONCLUSION
Cost-effectiveness-based prioritization offers multiple potential benefits. At a project level, it may help practitioners decide which interventions to choose, and donors to pick which activities to fund to make the best use of scarce resources (Cook et al. 2017;Iacona et al. 2018). More generally, this evidence could be used to understand the causes of variation in cost-effectiveness, flag potential investment risks, and suggest opportunities to reduce costs and improve conservation outcomes. Furthermore, combining evidence across an ensemble of interventions might indicate how cost-effective conservation is at progressing towards broader conservation goals, such as halting species decline or contributing to their recovery. Yet, despite repeated calls, the evidence required for such prioritization remains sparse and patchy, with few studies of common interventions in areas of high conservation value. This paucity of evidence may reflect the many barriers and limited direct incentives for publishing robust cost-effectiveness studies, particularly for practitioner organizations. However, some of these challenges are also faced during effectiveness-based prioritization, and so are not unique to cost-effectiveness approaches. Other challenges could be tackled by adopting standardized reporting techniques. Overcoming these challenges would contribute to improvements in conservation impact, helping turn the tide of global biodiversity loss.

ACKNOWLEDGMENTS
This project was funded by a Tier 2 grant from the Ministry of Education of Singapore MOE2015-T2-2-121, which supported the salary of Thomas Pienkowski and Luis Roman Carrasco. Thomas Pienkowski was also supported by the Natural Environment Research Council (grant number NE/L002612/1) at the University of Oxford. Carly Cook is supported by an Australian Research Council Discovery Early Career Researcher Award fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. The authors have declared that no competing interests exist.

CONFLICT OF INTEREST
The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the supplementary material of this article.

ETHICS STATEMENT
This research adhered to the National University of Singapore Institutional Review Board standards. Following these standards, no formal ethical review was sought as there was no interaction with human or nonhuman subjects.