Residential retrofit in the climate emergency: the role of metrics

This paper examines whether current residential retrofit metrics are fit for purpose and if they can help deliver swift and significant cuts in carbon emissions. Information is presented on metrics used for a variety of UK and European Union building and building retrofit standards and evaluation and assessment tools. An analytical approach is developed that offers a simplified set of four key aspects of metrics: scope, headline measurement, normalisation factor and timescale. This helps to unpack the complexity of metric design. However, choice of metrics is not simply a technocratic issue, because their design is not value free. Two examples where metrics form the basis for policy-making for retrofit and energy use in buildings are described: UK Energy Performance Certificates and the Energiesprong approach to deep retrofit. Use of multiple metrics improves their fitness for purpose and is already established practice in some standards and policy. Metrics in common use omit many aspects of energy use in buildings. New metrics are required that can take account of the whole life of a building, the time profile of retrofit, or the ability of the building to be flexible as to when energy is used.

Policy relevance
• Existing and new metrics can contribute to the transformation of the building stock. They have real-world impacts on buildings, those retrofitting them and their occupants.
• Retrofit metrics embody values and views about how retrofit should be undertaken.
• Unpacking metric design and considering scope, headline measures, normalisation factors and timescale separately can help inform better policy decisions.
• There is no one ideal metric for building retrofit: many policies and standards use multiple metrics.
• A focus on carbon metrics only for retrofit can lead to missed opportunities for high-quality building fabric. Energy metrics remain important.


Introduction
In 2018, the Intergovernmental Panel on Climate Change (IPCC) released its report on the impacts of global warming of 1.5°C (IPCC 2018). It called for 'rapid, far-reaching and unprecedented changes in all aspects of society' to reduce the risks of increasing climate change. In response, the European Union (EU), individual countries and parliaments, and governments at many local and regional levels have declared 'climate emergencies' (Climate Emergency Declaration 2019). However, very few countries are assessed as having policies in place consistent with 1.5°C of warming, and no EU country currently meets this standard (Climate Tracker 2019). The UK's independent Committee on Climate Change (CCC) has judged that the country is not currently on track to meet intermediate targets leading up to its 2050 target for greenhouse gas (GHG) emissions reductions (CCC 2018). The European Environment Agency (EEA) judges that EU countries and the UK need to make a significant increase in efforts over the next decade if GHG emissions reduction […] do justice to all this multi-scalar complexity, but outlines below what it means by 'residential retrofit in the climate emergency'.

Fawcett, T., & Topouzi, M. (2020). Residential retrofit in the climate emergency: the role of metrics. Buildings and Cities, 1(1), pp. 475–490.
Meeting zero or net-zero carbon emissions targets in the residential sector is likely to mean the following for retrofit:
• Energy:
  • Current fossil fuel energy systems, e.g. gas and oil boilers, will be phased out.
  • Many homes will add building-level renewable energy generation.
  • There will be different local responses, targets and combinations of measures depending on building stock, ownership, geography, supply of low/zero-carbon energy sources, socioeconomic factors and others.
• Scale and ambition of retrofit:
  • The scale of retrofit, in terms of both the number of buildings treated per year and the standards to be achieved, will increase hugely.
  • Almost all existing homes will need retrofitting to reduce their demand for heat (space and hot water), cooling and lighting.
  • There will be increased focus on quality to reduce the 'design/energy performance gap' (Sharpe 2019; Gram-Hanssen & Georg 2018) and ensure the necessary energy and carbon emissions reductions are delivered.
  • If retrofit requirements are to be avoided in the new buildings built between now and 2050, these will have to be constructed to considerably higher standards than at present (to meet the EU's nearly zero-energy building standards; Ipsos & Navigant 2019).
• Timing:
  • Some retrofits may achieve the necessary standard in one intervention, but many homes are likely to be retrofitted over time in a staged process.
  • Not all retrofitted buildings will achieve a zero-carbon emissions target by 2050; some will meet the target earlier.

Current residential retrofit metrics: literature review and analysis
The literature on metrics in general, and on metrics for energy, buildings and building retrofit, is now reviewed. In addition, detailed information is presented on the metrics used for a variety of UK building and building retrofit standards and evaluation and assessment tools. The aim is to identify key elements of metrics and current debates around metric choice, and to inform thinking about whether the climate emergency changes what is required of metrics.

What is a metric?
Metrics can be defined as systems or standards of measurement which can be used to assess the performance, progress or quality of a plan, process or product (based on OED 2019; Business Dictionary 2019). O'Brien et al. (2017) note that different authors use different language, with some making a distinction between 'performance metrics' and 'simple metrics'. Here, the term 'metrics' is used for all types of building retrofit metric. Metrics are usually quantitative, and their objectivity, reproducibility and transparency make them attractive evaluation criteria. They allow comparison across projects and countries and across spatial and temporal scales (Pringle 2011). O'Brien et al. (2017), building on earlier work, identify the characteristics of a good building performance metric. They suggest it should be: fit for purpose, reproducible, easy to obtain, comparable, quantitative, accessible (i.e. easy to interpret), actionable and unbiased. Many of these characteristics are uncontroversial and unlikely to change in response to the climate emergency. However, some are worthy of further interrogation, either because they are disputed in the literature, in the case of 'unbiased', or because their interpretation may change in light of changing retrofit practices.
Unbiased is defined as 'a good performance metric offers a neutral indication of a building's performance and does not intentionally or unintentionally mislead a metric's users' (O'Brien et al. 2017: 377). This issue of neutrality is disputed. Hitchin (2018) offers a very thorough discussion of the variety of options for reporting primary energy use in buildings, which illustrates the complexity involved. As he notes, energy metrics 'contain a mixture of technical, political and economic dimensions' (p. 198), and as such different national or organisational conventions in reporting should not be surprising. Fairey & Goldstein (2016) examine building energy-efficiency metrics in the US context. In particular, they look at the metrics for different fuels used in buildings (e.g. primary or delivered energy, carbon emissions per kWh), and conclude that there is no 'value free' choice of metrics, which is why these metrics have been 'very controversial' in the US over the past 40 years. They suggest it is important to recognise what metrics are being used for, which values they incorporate and whose values they reflect, whether that be cost or carbon reduction, energy-efficiency improvement or other outcomes. Similarly, Estrella Guillén et al. (2019) argue that those using building benchmarks should first define their motivation, and then carefully choose the comparison metrics. Thus, rather than assuming that metrics are or can be unbiased, it would be better to analyse and recognise the biases implicit in any choice of metric. While metrics should be objective, that is, based on observable phenomena, this is not the same as being neutral.
Metric characteristics whose interpretation may change in the climate emergency are 'easy to obtain' and 'accessible' (where accessible means readily understood). Both are relative terms. To date, residential low-energy retrofit has been the preserve of a relatively small group of building professionals, particularly in the case of deep retrofit, whose rates have been very low (CCC 2019; Fawcett & Topouzi 2019). Building-level metrics may require considerable training and knowledge to interpret. As building retrofit becomes a necessary part of the everyday work of the whole repair, maintenance and improvement sector (Killip 2013), consideration needs to be given to whether metrics meet these requirements for their new audiences. Since nearly all housing will be retrofitted to some degree (at the very least to remove fossil fuel heating systems), metrics used in policies, programmes and projects have to be accessible and comprehensible to many actors, not just to a specialised few.

Metrics for energy use in buildings
Buildings are complex, both in the variety and variability of the services they deliver to people and organisations and in how their energy use and environmental impact can be understood. Because buildings have many measurable characteristics and deliver many important services, combining building qualities with energy measurement results in a vast choice of metrics. More complex buildings may have hundreds of performance objectives (Costa et al. 2013), and several hundred building performance metrics are available in the scientific literature (O'Brien et al. 2017). A wide range of individual indicators, and combinations of indicators, is available to judge buildings' sustainability or environmental impact (Lützkendorf 2018). Ade & Rehm (2020) give a good account of the decision-making process informing the choice of categories, metrics and weightings that created key structural elements of leading building rating tools. Given this complexity and variety, which exists for good reason, there are few calls for the adoption of a single metric for building performance or retrofit. Rather, authors have suggested that multiple metrics are needed, whether at the building level (Fairey & Goldstein 2016; Stevenson 2019) or for the whole energy system (Kraan et al. 2019). There are also calls for the development and reporting of additional metrics at the national level (IEA & IPEEC 2015).
Residential buildings are less complex than commercial buildings; for example, they have simpler heating, ventilation and air-conditioning systems and controls. When metrics are used to define and deliver performance for residential buildings, fewer are typically used than for other building types. However, retrofit has additional complexity compared with new build, because a single metric does not always fit the purpose of tailored retrofit solutions.
To understand the choices that can be made about metrics, in particular metrics for ambitious residential retrofit, the focus is on setting out key choices for four aspects of metrics:
• Scope: which stages of a building life cycle are included, and which uses of energy.
• Headline measurement or calculation: typically an energy or carbon measure.
• Normalisation factor: the building, occupant, environmental or other factors (e.g. per m², per occupant, per heating degree-day) by which the headline measurement may be normalised.
• Timescale: the period to which the metric applies, and the point at which it is applied.
There are other relevant characteristics, e.g. whether a metric is measured or modelled (Mallaburn et al. 2019), which are not considered in detail here.
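To make the four-aspect framework concrete, the sketch below records a metric as a small data structure with one group of fields per aspect. It is a hypothetical illustration, not a schema used by any standard or tool: the class name `RetrofitMetric`, its field names and the example values are all assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass

# Life-cycle stages as listed in life cycle assessment standards
ALL_STAGES = ("product", "construction process", "use", "end of life")


@dataclass(frozen=True)
class RetrofitMetric:
    """Hypothetical description of a metric by the four aspects discussed above."""

    # Scope: which life-cycle stages and which energy end uses are counted
    life_stages: tuple[str, ...]
    end_uses: tuple[str, ...]
    # Headline measurement, e.g. "delivered energy (kWh)" or "GHG emissions (kgCO2e)"
    headline: str
    # Normalisation factor, e.g. "m2 floor area"; None means an absolute figure
    normalisation: str | None
    # Timescale: period covered, and the point at which the metric is applied
    period: str
    applied_at: str

    def unit(self) -> str:
        """Readable unit combining headline, normalisation and period."""
        per_factor = f" per {self.normalisation}" if self.normalisation else ""
        return f"{self.headline}{per_factor} per {self.period}"

    def is_whole_life(self) -> bool:
        """True if every life-cycle stage falls within the metric's scope."""
        return set(ALL_STAGES) <= set(self.life_stages)


# Illustrative operational, annual, floor-area-normalised metric
operational = RetrofitMetric(
    life_stages=("use",),
    end_uses=("space heating", "hot water", "fixed lighting"),
    headline="delivered energy (kWh)",
    normalisation="m2 floor area",
    period="year",
    applied_at="project completion",
)
print(operational.unit())
print(operational.is_whole_life())
```

Writing a metric down this way makes each of the four choices explicit, so two metrics that share a headline measure but differ in scope or normalisation are visibly different objects.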

Scope
Scope here means which aspects of a building's energy-related GHG emissions are included within the metric. Two key decisions have to be made: which stages of a building's life and which energy end uses are included.
In standards that use a life cycle assessment method, the life stages of a building are described as: product stage, construction process stage, use stage and end-of-life stage (Gervasio & Dimova 2018). Energy used in stages other than the use stage is often referred to as embodied energy. As buildings become more energy efficient, there is increased focus on the significance of embodied energy and GHG emissions in buildings and building retrofit (Lützkendorf et al. 2015; Parkin, Herrera, & Coley 2019; Schwartz, Raslan, & Mumovic 2018). Röck et al. (2020) show that there has been a global escalation of the contribution of embodied GHG emissions in both residential and office buildings: from approximately 20% to about 50% in new advanced buildings, surpassing 90% in extreme cases. This relative increase in embodied GHG emissions is mainly because operational GHG emissions have dropped in the transition from existing buildings to buildings built to new and advanced standards. In terms of retrofit, the life cycle carbon footprint and similar whole-life approaches are being used to explore whether replacement or refurbishment of buildings is environmentally preferable (Schwartz et al. 2018). There is considerable debate about whether and how embodied energy should be included in metrics, standards and policies.
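The arithmetic behind the relative rise of embodied emissions can be sketched simply: if embodied emissions stay fixed while annual operational emissions fall, the embodied share of whole-life emissions rises. The figures below are hypothetical round numbers chosen for illustration, not values from Röck et al. (2020), and the 50-year reference period is an assumption.

```python
def embodied_share(embodied: float, operational_per_year: float, years: int) -> float:
    """Share of whole-life GHG emissions that is embodied.

    embodied: one-off emissions from product, construction and end-of-life stages
    operational_per_year: annual use-stage emissions, in the same units
    years: assumed reference study period
    """
    operational_total = operational_per_year * years
    return embodied / (embodied + operational_total)


# Hypothetical values in kgCO2e/m2: embodied held constant, operational cut by 75%
print(embodied_share(400, 32, 50))  # 400 / (400 + 1600)
print(embodied_share(400, 8, 50))   # 400 / (400 + 400)
```

The embodied share rises from 0.2 to 0.5 without any change in embodied emissions, which is why a metric scoped only to the use stage becomes less informative as operational performance improves.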
There are also choices to be made about which portion of 'in-use' energy should be included in metrics, and for what purpose. For example, metrics may cover only the energy uses that are tied to the building rather than to the occupant, e.g. fixed heating, cooling and lighting, and exclude other electrical equipment.

Headline measure
The headline measure relating to energy use may be a measurement of the energy sources used in the building, of the carbon or GHG emissions generated as a result of energy use, or a hybrid metric including one of these measures (e.g. energy cost, as used in UK EPCs; see below).
If an energy measure is chosen, there are further choices to be made, as there is a range of energy metrics which serve different purposes and offer different perspectives. For example, the UK government supplies figures on three different bases in its main statistical series: primary fuel input basis, final consumption (energy supplied) basis, and final consumption (useful energy) basis (BEIS 2019a). Each approach to energy accounting also includes multiple options; as noted above, Hitchin (2018) discusses the many options for reporting primary energy use in buildings in detail.
There is a variety of views on which headline measure is preferable. Williams et al. (2016) argue that, to make progress with actually building zero-carbon/energy buildings, energy should be favoured over carbon in standards, and that many life-cycle issues should be put to one side. Eyre (2019) makes the point that energy remains an important metric in the energy transition and that a focus on carbon alone will not be sufficient. Kraan et al. (2019) argue that the changing energy supply system makes primary energy a less relevant element of a metric than delivered energy. In the context of revisions of the Building Regulations in England and Wales, the choice of carbon and primary energy as performance metrics is strongly disputed by the London Energy Transformation Initiative (LETI), a network of over 1000 built-environment professionals. LETI's view is that 'carbon and primary energy metrics do not result in low energy homes' (LETI 2019). Its concern is that targets for new residential buildings can be met via the UK's falling carbon intensity of electricity, rather than through improvements to the fabric and reduced energy consumption per m². These authors are writing from different scales and perspectives, and are thinking about metrics for different purposes, so direct comparison of their positions can be misleading.
Interpreting these metrics involves acknowledging that the relationships between delivered (site) and primary (source) energy, and between energy and carbon, are changing as electricity generation and the wider energy system change (and this differs between countries, and between regions in larger countries). The increasing electrification of heating will change the efficiency with which heat is provided (Eyre 2019), and this too can change the meaning of energy-related metrics. The headline measure or measures need to be chosen to reflect the energy system context, to fit the purpose for which they are used, and to be kept under review as the energy system changes. There are no universally 'right' or 'wrong' metrics; each gives different insights. As demonstrated below, a combination of metrics can often give greater clarity than a single metric.
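The concern raised by LETI, and the point that energy–carbon relationships shift with the energy system, can be illustrated by holding a home's delivered electricity constant while grid carbon intensity and the primary energy factor fall. The carbon headline halves although nothing about the building has changed. The conversion factors below are hypothetical placeholders, not official UK values.

```python
from __future__ import annotations


def headline_measures(delivered_kwh: float, carbon_kg_per_kwh: float,
                      primary_energy_factor: float) -> dict[str, float]:
    """Three alternative headline measures for the same delivered energy."""
    return {
        "delivered_kwh": delivered_kwh,
        "primary_kwh": delivered_kwh * primary_energy_factor,
        "carbon_kg": delivered_kwh * carbon_kg_per_kwh,
    }


# Hypothetical all-electric home using 5000 kWh/year in both years
before = headline_measures(5000, carbon_kg_per_kwh=0.4, primary_energy_factor=2.5)
after = headline_measures(5000, carbon_kg_per_kwh=0.2, primary_energy_factor=1.8)

for key in before:
    print(f"{key}: {before[key]:.0f} -> {after[key]:.0f}")
```

Only the delivered-energy figure stays tied to the building itself; the other two also track changes in the supply system, which is why reporting them side by side can be clearer than any one alone.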

Normalisation
Each headline measure may be expressed for the building as a whole, per m², per m² of conditioned space, per heating or cooling degree-day, per occupant, per occupant-day and so on. Many different normalisation factors can be found in metrics and indicators (Nikolaou, Kolokotsa, & Stavrakakis 2011), and each has its own definitional and measurement challenges. When comparing actual and predicted energy use, understanding the normalisation factors and methods used is critical. In a study of net-zero-energy buildings that tested both static and dynamic methods of normalising energy use, the variation between predicted and measured energy use highlighted the importance of the number and detail of the parameters considered (Berggren & Wall 2017).
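A single annual headline figure can be expressed against several of the normalisation factors listed above. The sketch shows per-m², per-occupant and degree-day-adjusted versions of one measurement. The function name, reference degree-day value and example numbers are assumptions, and the weather adjustment is a crude proportional scaling; more careful methods separate weather-dependent heating from base load.

```python
from __future__ import annotations


def normalise(annual_kwh: float, floor_area_m2: float, occupants: int,
              heating_degree_days: float, reference_hdd: float) -> dict[str, float]:
    """Express one annual energy figure against several normalisation factors."""
    return {
        "absolute_kwh": annual_kwh,
        "kwh_per_m2": annual_kwh / floor_area_m2,
        "kwh_per_occupant": annual_kwh / occupants,
        # Proportional weather correction to a reference year's degree-days
        "kwh_weather_adjusted": annual_kwh * reference_hdd / heating_degree_days,
    }


# Hypothetical dwelling: 12,000 kWh in a year with 2,400 HDD; reference year 2,000 HDD
for name, value in normalise(12000, 80, 3, 2400, 2000).items():
    print(f"{name}: {value:.1f}")
```

Each output answers a different question (how big, how intensive, how much per person, how it would look in a typical year), so the choice of factor should follow from the decision the metric is meant to inform.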
The need for large cuts in carbon emissions calls into question metrics that normalise energy or carbon emissions per m² or per unit of economic activity, because such intensity measures can improve even while absolute emissions stay flat or rise (for example, as floor area grows). This is not to say these metrics no longer have value, but if they are the sole metrics used, questions should be asked about their suitability. The value of normalisation rests partly on the quality of data available: poor-quality data will lead to misleading results. The bigger question of whether normalisation is a useful element of metrics can only be answered in relation to the scale and type of decision the metrics are designed to inform. For example, normalisation is not helpful for measuring progress towards national and international climate commitments, but remains important for individual household projects and comparisons (Figure 1 and Table 1).
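Why intensity metrics can mislead at national scale is a matter of simple arithmetic: a per-m² figure can improve while the absolute total, which is what a national GHG target counts, rises. The stock figures below are invented for illustration.

```python
def stock_emissions(floor_area_m2: float, kg_per_m2: float) -> float:
    """Absolute annual emissions of a building stock (kg) from area and intensity."""
    return floor_area_m2 * kg_per_m2


# Hypothetical stock: intensity improves by 10% while total floor area grows by 20%
base = stock_emissions(1_000_000, 30.0)
later = stock_emissions(1_200_000, 27.0)

print(f"intensity change: {27.0 / 30.0 - 1:+.0%}")
print(f"absolute change:  {later / base - 1:+.0%}")
```

Here the intensity metric reports a 10% improvement while absolute emissions rise by 8%, so an intensity target alone would show progress that a national absolute target would not.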

Timescales
There are different aspects of the timescale to which a metric can apply, and choices to be made. The first is how much of the building life cycle the metric applies to (as discussed under Scope above): whether the metric focuses on annual energy in use or on total lifetime energy. This also raises the issue of expected building lifetime (estimated lifespan). There is also increasing government and energy company interest in peak electricity demand in terms of time of day, and in the capacity for temporal flexibility across timescales from seconds to seasons (BEIS 2017). This fits into a broader discussion of what flexibility is and how it emerges from socio-technical systems (Torriti & Green 2019). There is currently no consensus on how to quantify building energy flexibility (Johra et al. 2019). The question for buildings is how much of that flexibility can or should be provided by this sector, and whether metrics can be developed to encourage greater flexibility. Ozkan et al. (2019) suggest it is important to consider how long buildings can passively provide thermally comfortable habitable space if they lose their electricity and energy supplies. Finally, current retrofit metrics generally consider the end point of the process, at the delivery stage of the project. As much retrofit occurs over an extended period of time, there are arguments that policy support for staged retrofit is important (Fawcett 2014; Fawcett & Topouzi 2019). The development of the Building Renovation Passport in Europe, which records retrofit changes over an extended period, shows support for this idea (EuroAce 2018; Fabbri 2017).
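One way to put a number on the passive-habitability idea raised by Ozkan et al. (2019) is a single-zone lumped-capacitance cooling model. After heating is lost, indoor temperature is assumed to decay exponentially towards a constant outdoor temperature, T(t) = T_out + (T_start − T_out)·e^(−t/τ), with a time constant τ set by thermal mass and heat loss. Solving for the time to reach a comfort floor gives t = τ·ln((T_start − T_out)/(T_min − T_out)). This textbook simplification is chosen here for illustration and is not the method of Ozkan et al.; the temperatures and τ values are hypothetical.

```python
import math


def hours_until_threshold(t_start: float, t_out: float, t_min: float, tau_hours: float) -> float:
    """Hours for indoor temperature to fall from t_start to t_min after heating is lost.

    Assumes a single thermal zone, constant outdoor temperature and no internal
    or solar gains, so that T(t) = t_out + (t_start - t_out) * exp(-t / tau).
    """
    if not t_out < t_min <= t_start:
        raise ValueError("expects t_out < t_min <= t_start")
    return tau_hours * math.log((t_start - t_out) / (t_min - t_out))


# Hypothetical winter case: 21 C indoors, 1 C outdoors, 16 C comfort floor
for label, tau in [("unretrofitted", 15.0), ("deep retrofit", 60.0)]:
    print(f"{label}: {hours_until_threshold(21, 1, 16, tau):.1f} h")
```

Because the habitable period scales linearly with τ in this model, a metric of this kind would reward exactly the fabric improvements (insulation, airtightness, accessible thermal mass) that lengthen the time constant.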
Most current metrics focus on annual energy use, rather than the other time periods listed above (Table 1). This suggests that new metrics may be needed as buildings are expected to play different roles in the energy system (no longer just as sources of demand) and as retrofit happens in a greater variety of time patterns.
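A metric taken only at the end point of retrofit cannot distinguish between pathways that reach the same final performance at different speeds. Summing annual emissions along each pathway shows the difference. The helper names, pathways, years and emission levels below are hypothetical.

```python
from __future__ import annotations


def pathway(start: float, steps: dict[int, float], years: int) -> list[float]:
    """Annual emissions for each year; `steps` maps a year index to a new annual level."""
    levels: list[float] = []
    current = start
    for year in range(years):
        current = steps.get(year, current)  # apply a retrofit stage if one falls this year
        levels.append(current)
    return levels


def cumulative_emissions(annual: list[float]) -> float:
    """Total emissions over the period covered by a pathway."""
    return sum(annual)


# Hypothetical home emitting 4.0 tCO2e/year, followed over 10 years
one_off = pathway(4.0, {1: 0.5}, 10)                 # single deep retrofit in year 1
staged = pathway(4.0, {3: 2.5, 6: 1.5, 9: 0.5}, 10)  # three stages over the decade

print(one_off[-1] == staged[-1])  # same end-point performance
print(cumulative_emissions(one_off), cumulative_emissions(staged))
```

Both pathways score identically on an end-point annual metric, yet in this example the staged pathway emits roughly three times as much over the decade; capturing that difference requires a metric that looks at the time profile of retrofit.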

Metrics and scale
Metrics that relate to building energy use have been developed for a wide range of purposes: as the basis for building and building component standards, to guide investment choices, and to deliver various sorts of change at a range of scales. The variety of metrics and combinations of metrics in use also reflects the many different stakeholders involved, whether governments, commercial building owners or householders. In general, the closer to actual retrofit delivery, the more metrics are needed. For example, a post-occupancy evaluation of a renovated building requires many detailed metrics to be effective (Stevenson 2019), whereas a national target does not. Figure 1 summarises this understanding of the changing granularity of metrics relevant at different scales.
National commitments or legal requirements to meet GHG reduction targets are simple to state: they are generally absolute emissions reduction commitments by a certain date. Buildings in different economic sectors or of different types require more detailed and differentiated metrics to recognise their different purposes, technical characteristics, energy end uses, occupancy and ownership patterns, and so on. For individual buildings, many metrics can be used to set standards and judge the performance of the retrofit; the number and sophistication of those chosen depend on a host of factors, not least whether the metrics are used for policy purposes or for delivering building performance. Exemplary buildings, where exceptional performance is required, must meet more exacting standards for many different elements of retrofit design, construction and operation, which requires additional metrics. As the number and specificity of metrics increase, so too does the information requirement. Table 1 presents summary information about the metrics used by several EU and UK regulations, standards and evaluation/assessment tools. These all apply at the scale of the individual (or exemplary individual) building and to annual energy use. The scope of these metrics is operational energy use, either for all energy or for particular end uses or groups of end uses. These metrics are normalised per m². Some of these regulations/tools include additional metrics that relate to energy use, e.g. the Passivhaus standard includes requirements for building pressurisation tests (air flow) and overheating modelling (Passivhaus Trust 2020). Table 1 demonstrates that in commonly used standards and evaluation methodologies, multiple metrics are used: none relies on a single energy-related metric. It also demonstrates that several different metrics are currently in use in policy-making and for voluntary standards.

Current metrics and the climate emergency
This brief literature review and analysis has demonstrated that there is no one metric, or set of metrics, around which consensus has developed in relation to ambitious retrofit of housing at the scale of individual dwellings. Numerous issues are in dispute, and these link to disputes about values and priorities, and about which important services buildings deliver, and to whom. Disputes are also founded on ideas about how retrofit should be carried out, and on different views about the future of the energy system: how quickly it will decarbonise, and therefore by how much and by when building energy demand should be reduced. In addition to this complex set of issues, the climate emergency poses additional challenges to metrics: to be relevant to the whole housing stock and to staged retrofit, to be accessible, to be flexible enough to adapt to the changing energy system, to address a range of timescales, and to help close the energy performance gap. The next section looks at two examples of how existing metrics are being used in policy and projects, and the issues that arise.

Examples of metrics in policy and practice
To explore further the themes introduced in the previous section, two examples are provided of building-level metrics and their use in policies, initiatives and projects. The first, more extensive, case describes the debates around the expansion of EPCs into more policy areas, focusing on the UK. The second looks at an innovative approach to retrofit being used in several European countries, and the place of metrics in its delivery. Both examples use energy performance and cost metrics, the former based on estimated energy use and the latter on measured energy use.

Energy Performance Certificates (EPCs) in UK policy
The EPC, and its underlying cost-energy metric, is playing an increasingly important role in UK policy on residential retrofit. It is a metric that can be used at different scales, from national level to individual building (Figure 1). However, there are concerns that it may not be fit for its enhanced role in policy; the evidence and debates are summarised below.
EPCs were introduced by the EU in the Energy Performance of Buildings Directive (EPBD) in 2002 (Directive 2002/91/EC), and their legal status was enhanced in the EPBD recast in 2010 (Directive 2010/31/EU). The main aim of EPCs is to serve as an information tool for building owners, occupiers and property actors when a building or building unit is sold or rented (BPIE 2015). They can also identify ways in which the energy consumption of buildings and associated costs can be reduced, leading to improved energy performance of buildings (DCLG 2011). There is evidence that EPCs do influence purchase, rental and renovation decisions (e.g. Charalambides et al. 2019). EPCs are in place across EU member states, and national algorithms are used to place properties in bands A (or A*) to G, with A being the most efficient and G the least (Hardy & Glew 2019). The UK EPC is an 'asset rating', that is, it is concerned with the construction of a building, its levels of insulation, its installed heating and hot water systems and their controls, and its fixed lighting, irrespective of the occupants or their behaviour. It does not measure the actual energy use of a building. The EPBD allows national authorities to choose between an asset rating and measured energy consumption.

UK implementation of EPCs
In the UK, EPCs for residential property were made compulsory in 2008 and are needed whenever a property is built, sold or rented and are valid for 10 years (DCLG 2017). However, the building owner or landlord is under no obligation to act on the recommendations for energy improvements to the building.

An EPC contains:
• information about a property's energy use and typical energy costs; and
• recommendations about how to reduce energy use and save money.
The focus in this paper is on the first element of EPCs (there are separate debates about how to improve the recommendations and communication elements of the label; e.g. Taranau & Verbeek 2018). UK EPCs contain three metrics: the energy-efficiency rating, the environmental impact rating and primary energy per m². The energy-efficiency rating is the basis of the A-G property ranking, and the most influential element of the EPC. It is the metric of focus here.
The algorithm used to calculate energy performance for residential buildings is known as the Standard Assessment Procedure (SAP) for new dwellings (BRE 2014) and the Reduced Data Standard Assessment Procedure (RdSAP) for existing buildings (BRE 2019). It is an energy cost index: it combines energy consumption, energy efficiency and fuel prices into a single number, namely the cost of achieving a specific space heating regime and providing adequate hot water and sufficient lighting, divided by the dwelling's total floor area (i.e. £/m²). It was devised to give potential purchasers or tenants the ability to compare the cost of running dissimilar homes, and thus incorporates information on both the environmental impact and the affordability of energy within homes. This metric choice, rather than, say, energy/m², reflects the importance of fuel poverty in UK residential energy-efficiency policy (Boardman 2010; Rosenow, Platt, & Flanagan 2013). Debates about the strengths and weaknesses of this metric are both longstanding and ongoing (e.g. Boardman 2007; Scottish Government 2019). Elmhurst Energy (2020) suggest that many of these debates could be resolved by using the EPC in combination with both a measure reflecting households' expected use of the property and metered energy data.
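Following the description above, the core of the metric is a modelled annual cost for a standardised heating, hot water and lighting regime, divided by floor area. The sketch implements only that simplified £/m² index; the actual SAP/RdSAP procedures contain many more steps (including the conversion of a cost index into the published rating), none of which is reproduced here. The function name, fuel prices and energy figures are hypothetical.

```python
from __future__ import annotations


def energy_cost_index(modelled_kwh_by_fuel: dict[str, float],
                      price_per_kwh: dict[str, float],
                      total_floor_area_m2: float) -> float:
    """Modelled annual energy cost per m2 (GBP/m2) under a standard occupancy regime.

    modelled_kwh_by_fuel: annual delivered energy for the standard heating, hot water
        and lighting regime, keyed by fuel (modelled, not metered)
    price_per_kwh: assumed fuel prices in GBP/kWh, keyed by the same fuel names
    """
    annual_cost = sum(kwh * price_per_kwh[fuel] for fuel, kwh in modelled_kwh_by_fuel.items())
    return annual_cost / total_floor_area_m2


# Hypothetical gas-heated 90 m2 home
gas_home = energy_cost_index({"gas": 12000, "electricity": 2000},
                             {"gas": 0.04, "electricity": 0.16}, 90)
print(f"{gas_home:.2f} GBP/m2")
```

Because fuel prices sit inside the calculation, two homes with identical modelled energy use can receive different scores if they use different fuels, which is the property that makes the metric relevant to fuel poverty and also a source of the debates cited above.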
Different choices about underlying metrics, as well as how information and advice is generated and displayed, have been made in other European countries (BPIE 2015). For example, the Republic of Ireland's Building Energy Performance label is based on the calculated total primary energy requirement for heating, hot water (minus energy supplied by any solar water heating system), lighting and heating system pumps and fans (SEAI 2012). Similarly, in the Netherlands, the label is based on theoretical building-related energy usage, which is the sum of total primary energy for heating, domestic hot water, pumps/fans and lighting in common areas minus the energy gained from solar panels and cogeneration (van den Brom, Meijer, & Visscher 2018).
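The Dutch definition quoted above amounts to a sum over a fixed set of end uses, minus credits for on-site generation. A minimal sketch, assuming all inputs are already expressed as primary energy; the end-use keys, function name and values are hypothetical, and the real Dutch calculation method involves far more detail.

```python
from __future__ import annotations

# End uses counted in the Dutch-style label as described in the text
INCLUDED_END_USES = ("heating", "domestic hot water", "pumps and fans", "common-area lighting")


def building_related_primary_energy(primary_kwh: dict[str, float],
                                    generation_credit_kwh: dict[str, float]) -> float:
    """Sum primary energy of included end uses, minus on-site generation credits.

    End uses not in INCLUDED_END_USES (e.g. household appliances) are ignored,
    which is how the metric's scope excludes occupant-related energy.
    """
    demand = sum(primary_kwh.get(use, 0.0) for use in INCLUDED_END_USES)
    credit = sum(generation_credit_kwh.values())  # e.g. solar panels, cogeneration
    return demand - credit


# Hypothetical apartment, primary energy in kWh/year
uses = {"heating": 9000, "domestic hot water": 3000, "pumps and fans": 400,
        "common-area lighting": 100, "appliances": 5000}
print(building_related_primary_energy(uses, {"solar panels": 2500}))
```

The appliance load does not affect the result at all, which illustrates the scope choice discussed earlier: a building-related metric can stay stable even when occupant-related consumption changes.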

Expanding the use of EPCs
The EPC has seen its purpose extended so that it is used to:
• set standards for social housing landlords in Scotland (Alembic Research, Energy Action Scotland, & Waterfield 2019);
• assess eligibility for energy company obligation programmes and calculate savings for some of these utility schemes (Ofgem 2015);
• determine eligibility for the domestic renewable heat incentive and calculate payments for some renewable installations (Ofgem 2018); and
• set minimum energy-efficiency standards for the private rented sector in England and Wales, prohibiting, from 1 April 2018, new leases on properties with an energy performance rating of F or G, and extending this to all private rentals from 1 April 2020 (BEIS 2019b).
There are plans to extend its reach still further. The UK government's Clean Growth Strategy uses EPC banding as the metric of stock performance, aspiring for all fuel-poor homes to be upgraded to EPC Band C by 2030 and for 'as many homes as possible to be EPC Band C by 2035 where practical, cost-effective and affordable' (BEIS 2017). The Scottish government is consulting on proposals to set a standard for energy efficiency and make it legally binding on homeowners from 2024 onwards, with a minimum standard of EPC Band C (Scottish Government 2019). Expansion of the role of EPCs is also occurring in other European countries, including the Netherlands, where, for example, social housing landlords are setting improvement targets based on EPCs (van den Brom et al. 2018).
There are concerns about expanding the use of EPCs to make them a major force in public policy. Inaccuracy is a key issue, despite the UK government's quality standard, which requires that 95% of a sample of assessments yield EPC ratings within 5 EPC points of the 'truth' (DCLG 2011). Hardy & Glew's (2019) analysis of residential EPCs suggests that errors identified in EPCs cause an approximate difference in energy-efficiency rating of 4 points, which would result in 30% of homes being placed in the wrong EPC band. They conclude that the volume of errors present in the data suggests much greater care should be taken when using EPC data. Crawley et al. (2019) undertook a novel statistical analysis using repeated EPC assessments of 1.6 million existing dwellings in England and Wales in order to quantify the uncertainty in the process of generating EPC ratings. They concluded that uncertainty was generally greater than that in UK government guidance, and that it decreased with increasing building energy efficiency. Their analysis predicted that 24% of E dwellings may achieve a D by chance, and 15% of D dwellings may achieve a C by chance, highlighting the potential for misidentification of properties. Jenkins, Simpson, & Peacock's (2017) smaller-scale study also demonstrates that the quality and outputs of a standardised EPC energy assessment can be variable. Concerns about using the EPC band for an increasing range of policy purposes, given its current (un)reliability, are shared more widely (Alembic Research et al. 2019; Pasichnyi et al. 2019; Scottish Government 2019).
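The sensitivity of banding to small rating errors can be seen with a simple sketch. The band boundaries are the standard A to G thresholds on the SAP rating scale (A 92 and above, B 81 to 91, C 69 to 80, D 55 to 68, E 39 to 54, F 21 to 38, G 1 to 20). The fixed four-point shift, the cap of 100 on band A and the assumption that ratings are spread evenly over a band's integer values are illustrative simplifications, not a re-analysis of Hardy & Glew (2019) or Crawley et al. (2019).

```python
# Lower bound of each EPC band on the SAP rating scale, best band first
BAND_FLOORS = [("A", 92), ("B", 81), ("C", 69), ("D", 55), ("E", 39), ("F", 21), ("G", 1)]


def epc_band(rating: int) -> str:
    """Map an integer SAP rating to its EPC band (ratings below 1 treated as G)."""
    for band, floor in BAND_FLOORS:
        if rating >= floor:
            return band
    return "G"


def band_range(band: str) -> range:
    """Integer ratings in a band; A is capped at 100 purely for this sketch."""
    names = [name for name, _ in BAND_FLOORS]
    i = names.index(band)
    low = BAND_FLOORS[i][1]
    high = 100 if i == 0 else BAND_FLOORS[i - 1][1] - 1
    return range(low, high + 1)


def share_changing_band(band: str, error_points: int) -> float:
    """Share of ratings in `band` that change band when shifted by a fixed error.

    Assumes ratings are spread evenly over the band's integer values.
    """
    ratings = band_range(band)
    moved = sum(1 for r in ratings if epc_band(r + error_points) != band)
    return moved / len(ratings)


print(epc_band(70), epc_band(70 - 4))          # C D
print(f"{share_changing_band('C', -4):.0%}")   # 33%
```

The point is the mechanism rather than the number: any dwelling whose true rating lies within a few points of a boundary can change band, so the share affected depends on how ratings are distributed near band boundaries.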
Despite all these evidence-based concerns, moving away from the existing system of measurement would be a difficult choice. The existing system has generated the best energy-efficiency data set available on UK property (almost 19 million EPCs have been issued; MHCLG 2020), has thousands of trained assessors, and is already integrated into legislation and social landlords' strategies. The list of properties of ideal metrics by O'Brien et al. (2017) discussed above suggests that both 'reproducibility' and 'easy-to-obtain' are important qualities; in the real world, however, these may need to be traded off against each other. Combining better-quality EPC measurement based on the existing metric with additional metrics reflecting energy in use and other issues of concern would be a constructive way ahead.

Energiesprong
As mentioned above, there is a lack of ambitious or 'deep' retrofit throughout Europe. In the context of the EU Building Stock Observatory, 'deep' retrofit is defined as retrofit achieving primary energy savings of over 60% (European Commission 2019b). However, this definition is not universally used, and deep retrofit is defined variously in different contexts, programmes and projects, using different metrics (Fawcett 2014). Examples include the international Passivhaus refurbishment standard (Table 1) and Retrofit for the Future, a UK innovation project whose performance targets were based on carbon and primary energy per m² metrics (Retrofit for the Future 2011). This need to define project-level metrics and targets continues today with the Energiesprong approach to retrofit.
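Because the definition depends on the metric chosen, the same project can count as 'deep' on one basis and not on another. The sketch below applies the 60% primary energy threshold cited above. The primary energy factors, the dwelling and its energy use are all hypothetical, and the delivered-energy comparison is included only to show how the classification shifts with the headline measurement.

```python
# Hypothetical primary energy factors (kWh primary per kWh delivered).
# Real factors differ by country and change over time; these are for illustration only.
PRIMARY_ENERGY_FACTORS = {"gas": 1.1, "electricity": 2.5}
DEEP_RETROFIT_THRESHOLD = 0.60  # savings must exceed 60%


def savings_share(before_kwh: float, after_kwh: float) -> float:
    """Fractional reduction relative to the pre-retrofit value."""
    if before_kwh <= 0:
        raise ValueError("pre-retrofit energy use must be positive")
    return (before_kwh - after_kwh) / before_kwh


def primary_energy(delivered_kwh: dict, factors: dict = PRIMARY_ENERGY_FACTORS) -> float:
    """Convert delivered energy by fuel into total primary energy."""
    return sum(kwh * factors[fuel] for fuel, kwh in delivered_kwh.items())


def is_deep_retrofit(before: dict, after: dict, basis: str = "primary") -> bool:
    """Classify a retrofit as 'deep' on either a primary or a delivered energy basis."""
    if basis == "primary":
        b, a = primary_energy(before), primary_energy(after)
    elif basis == "delivered":
        b, a = sum(before.values()), sum(after.values())
    else:
        raise ValueError(f"unknown basis: {basis}")
    return savings_share(b, a) > DEEP_RETROFIT_THRESHOLD


# Hypothetical dwelling: gas boiler replaced by a heat pump alongside fabric upgrades.
before = {"gas": 15_000, "electricity": 3_000}  # kWh delivered per year
after = {"electricity": 5_000}

print(is_deep_retrofit(before, after, basis="delivered"))  # True  (about 72% saving)
print(is_deep_retrofit(before, after, basis="primary"))    # False (about 48% saving)
```

The switch from gas to electricity cuts delivered energy sharply, but with a high (assumed) primary energy factor for electricity the primary energy saving falls short of the threshold.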
Energiesprong (meaning 'energy leap' in Dutch) is an innovative approach to whole house retrofit, first piloted in the Netherlands in 2013. The approach uses a set of standards that together deliver net-zero housing retrofit solutions with performance guaranteed for 30 years (Friedler & Kumar 2019). The programme aims to facilitate a self-sustaining market for net-zero-energy homes, delivered by a market intermediary (Fawcett & Topouzi 2019), and to reduce the time taken for retrofit to under one week using off-site manufacture and modularisation (Brown, Kivimaa & Sorrell 2019), limiting disruption to occupants during works. A comprehensive, whole house retrofit is funded with a whole-life net-zero financing model, in which the cost is covered by energy savings and reduced home maintenance costs (Friedler & Kumar 2019). The Energiesprong business model involves: a net-zero-energy performance contract based on annual energy balance; an integrated and industrialised supply chain; a single customer interface; a financial model based on the performance contract; and coordinated governance of these elements, aided by the market development intermediary (Brown et al. 2019).
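A performance contract based on annual energy balance can be sketched as a simple check over twelve monthly readings. This sketch assumes, for illustration, that the balance compares a home's total metered consumption with its on-site generation over a year. The class and function names and the data are hypothetical, and the actual contract terms are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class MonthlyReading:
    consumed_kwh: float   # energy used by the home in the month
    generated_kwh: float  # on-site renewable generation in the month (e.g. rooftop PV)


def annual_net_energy(readings: list) -> float:
    """Net annual energy: consumption minus on-site generation over 12 months."""
    if len(readings) != 12:
        raise ValueError("expected 12 monthly readings")
    return sum(r.consumed_kwh - r.generated_kwh for r in readings)


def is_net_zero_energy(readings: list) -> bool:
    """Net zero on an annual-balance basis: generation at least matches consumption."""
    return annual_net_energy(readings) <= 0


def months_in_deficit(readings: list) -> int:
    """Number of months in which the home used more energy than it generated."""
    return sum(1 for r in readings if r.consumed_kwh > r.generated_kwh)


# Hypothetical year: constant 400 kWh/month use, seasonal PV output.
generation = [150, 200, 350, 500, 650, 700, 700, 600, 450, 300, 150, 100]
year = [MonthlyReading(400, g) for g in generation]

print(annual_net_energy(year))   # -50
print(is_net_zero_energy(year))  # True
print(months_in_deficit(year))   # 6
```

The home is in deficit for half the year but still meets an annual-balance test, which is why the timescale chosen for a metric matters as much as its headline measurement.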
Although Energiesprong falls within the 'exemplary' buildings category, its approach to retrofitting and finance makes it more readily suited to scaling up, especially in the social housing sector, where buildings of similar typology allow landlords to accumulate energy and maintenance savings over 30 years. Increasing the number of retrofitted homes can enable the cost of an Energiesprong retrofit to fall to £50,000, the point at which social landlords should be able to self-finance these retrofits, enabling further scaling and cost reductions (Friedler & Kumar 2019). Within Europe, Energiesprong is currently renovating, or has plans to renovate, homes in the Netherlands, France, the UK, Germany and Italy, with 5000 homes completed in the Netherlands (Energiesprong 2020).
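The financing logic, in which energy and maintenance savings over 30 years repay the upfront cost, can be sketched as a simple discounted cash-flow check. Only the £50,000 cost and the 30-year term come from the text; the discount rates are hypothetical, the function names are illustrative, and the sketch ignores inflation, financing fees and performance risk.

```python
def annuity_factor(rate: float, years: int) -> float:
    """Present value of 1 unit received at the end of each year for `years` years."""
    if rate == 0:
        return float(years)
    return (1 - (1 + rate) ** -years) / rate


def break_even_annual_saving(upfront_cost: float, rate: float, years: int = 30) -> float:
    """Constant annual saving (energy plus maintenance) whose present value equals the upfront cost."""
    return upfront_cost / annuity_factor(rate, years)


def self_finances(upfront_cost, annual_energy_saving, annual_maintenance_saving, rate, years=30):
    """True if discounted energy and maintenance savings cover the upfront cost."""
    present_value = (annual_energy_saving + annual_maintenance_saving) * annuity_factor(rate, years)
    return present_value >= upfront_cost


cost = 50_000  # the cost level cited above, in GBP
for rate in (0.0, 0.02, 0.04):  # hypothetical discount rates
    print(f"{rate:.0%}: £{break_even_annual_saving(cost, rate):,.0f} per year")
```

The higher the assumed discount rate, the larger the constant annual saving required to repay the same upfront cost over the 30-year term.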
The long-term performance guarantee is novel for residential retrofit. It differs from more familiar energy performance contracts, which enable energy-efficiency upgrades to be funded from reductions in running costs (European Commission 2020b), in that it explicitly promotes net-zero solutions and lasts much longer. To meet the necessary standards, additional metrics are used to ensure that high-quality performance is delivered. For example, following problems with early projects, air tightness tests were introduced as a standard part of delivery (Energiesprong 2019). The approach thus involves metrics for the performance guarantee based on a technical set of performance standards, and cost-based metrics for in-use energy. The performance guarantee metric has an extended timescale based on real energy use, which is quite unusual in the residential sector.

Overall contribution
The overall research question is whether current building retrofit metrics are fit for purpose in the climate emergency. This is in a context of clear evidence that retrofit is not currently delivering energy and carbon savings from residential buildings at anything like the rate required to meet national (or international) goals. It is clear that metrics are an element of a set of social, technical, economic and policy arrangements within a complex system of regulations, standards and assessment tools that fail to deliver large-scale retrofit.
The aim of this paper is to consider what the climate emergency might mean for retrofit and how this interacts with today's metrics and elements of metric design. It attempts to simplify the complexity inherent in metrics research by identifying key characteristics and debates, showing how these vary by scale and purpose, and discussing their relevance to the huge increase in the scale and ambition of residential retrofit that the climate emergency demands.
This analytical approach has been used to discuss the role of metrics in two real examples of metric use in policy and practice. There is more to be done, and this paper claims only to be exploratory. However, the analytical approach developed here has helped identify several ways of improving both metrics and the debates around the choice of metrics. This should support increased, high-quality retrofit in the energy transition, based on metrics that are more fit for purpose.

Making explicit the values embedded in metric choices
Metrics can explicitly or implicitly embody a specific view about the future or about overall governmental or social goals. Disputes and debates about the choice of metrics, and combinations of metrics, have been going on for decades (Fairey & Goldstein 2016) and continue today, because important principles and outcomes are at stake. However, the values embodied within metric design may not be transparent. In this paper, key choices about metrics are classified into four groups: scope, headline measurement, normalisation factor and timescale. All these choices affect the understanding and insights that a metric offers. Different choices emphasise, and may favour, different retrofit approaches, such as fabric first, or measure by measure (Topouzi et al. 2019), which specifies the performance of individual measures rather than taking a whole house approach. Structuring the analysis of the effects of various choices should help to elucidate what is actually under discussion and increase clarity about the trade-offs involved. Trade-offs cannot be avoided in the real world, but transparent analysis should help to improve the quality of decision-making.
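One way to make the four-part classification concrete is to record each metric as a small data structure and compare designs aspect by aspect. The class and function names are hypothetical, and the characterisations of the UK EPC and the Energiesprong guarantee below are simplified readings for illustration, not formal definitions of either scheme.

```python
from dataclasses import dataclass, replace

ASPECTS = ("scope", "headline_measurement", "normalisation_factor", "timescale")


@dataclass(frozen=True)
class MetricDesign:
    """A metric described by the four design aspects used in this paper."""
    name: str
    scope: str
    headline_measurement: str
    normalisation_factor: str
    timescale: str


def differing_aspects(a: MetricDesign, b: MetricDesign) -> list:
    """List the design aspects on which two metrics differ."""
    return [aspect for aspect in ASPECTS if getattr(a, aspect) != getattr(b, aspect)]


# Illustrative characterisations only; they simplify the real schemes.
uk_epc = MetricDesign(
    name="UK EPC rating",
    scope="regulated energy uses, standard occupancy",
    headline_measurement="modelled energy cost",
    normalisation_factor="floor area",
    timescale="one modelled year",
)
energiesprong = MetricDesign(
    name="Energiesprong guarantee",
    scope="whole-house in-use energy",
    headline_measurement="net annual energy balance",
    normalisation_factor="per dwelling",
    timescale="30-year guarantee",
)
# Hypothetical variant: the EPC design with carbon as the headline measurement.
epc_carbon_variant = replace(uk_epc, name="EPC carbon variant",
                             headline_measurement="modelled CO2 emissions")

print(differing_aspects(uk_epc, energiesprong))
# ['scope', 'headline_measurement', 'normalisation_factor', 'timescale']
print(differing_aspects(uk_epc, epc_carbon_variant))
# ['headline_measurement']
```

Separating the aspects in this way makes it explicit when a proposed change (for example, from cost to carbon) alters only one design choice and when two metrics differ on every dimension.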

Scale, purpose and audience for metrics
Metrics operate at a whole range of scales, from international down to individual energy end uses in buildings (Figure 1). Metrics for different scales not only require different amounts of data but are also designed for different audiences and skills. Some are for policy-makers, others for building professionals or householders, which places very different requirements on the availability and accessibility of metrics. Purposes range from underpinning targets to meet international climate agreements to ensuring that an individual retrofit is of sufficiently high quality to meet energy performance guarantees. For individual buildings, the granularity of metrics is important for a range of purposes: evaluating existing condition, planning goals and design requirements, and assessing the quality of construction and operational use throughout a building's lifetime. The cost-energy metric underpinning UK EPCs is now being used to deliver information and change at a wide range of scales, so it should perhaps not be surprising that its suitability is disputed.

Combinations of metrics
Many reliable and replicable individual metrics are available at the building level, but it is the combination of metrics that is key to delivering urgent change. There is no one ideal metric. As Table 1 demonstrates, multiple metrics are in common use, and meeting more than one set of targets is not necessarily problematic. At the building level, clients' and professionals' choice of metrics can vary from the initial appraisal of a building's condition through to the design and construction stages (Topouzi, Killip, & Owen 2017), helping them define and deliver the performance required for best practice (assuming that protocols for these are available).
For policy-making, the question arises as to what degree of complexity policy can cope with, not just at the design stage but also in implementation, monitoring and evaluation. Combinations of metrics will inevitably meet some goals, and suit some situations and actors, better than others. If retrofit metrics in public policy cover only the asset rating, then additional metrics and policies will be required to take account of other uses of energy and of occupants' interactions with buildings.

Choice of metrics
While this paper argues that there is no one ideal metric, it is important to recognise that, although the overall aim of retrofit is to reduce carbon emissions, energy remains an important metric. Reaching a zero-carbon society without considerable reductions in energy demand is impossible (Eyre & Killip 2019). When the carbon intensity of important energy sources is changing rapidly, carbon alone will not be a sufficient metric for most purposes.
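A minimal worked example shows why. With hypothetical carbon intensities (not official UK emission factors), a home whose energy use does not change at all still records a large carbon reduction as grid electricity decarbonises, so a carbon-only metric would credit an improvement that no retrofit delivered. The function names are illustrative.

```python
def annual_carbon_kg(energy_kwh_by_fuel: dict, intensity_kg_per_kwh: dict) -> float:
    """Annual emissions: delivered energy by fuel times that fuel's carbon intensity."""
    return sum(kwh * intensity_kg_per_kwh[fuel] for fuel, kwh in energy_kwh_by_fuel.items())


def relative_change(before: float, after: float) -> float:
    """Fractional change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before


# Hypothetical all-electric home with no retrofit: energy use is identical in both years.
home = {"electricity": 8_000}        # kWh per year
grid_year_1 = {"electricity": 0.25}  # hypothetical kg CO2 per kWh
grid_year_2 = {"electricity": 0.15}  # after further grid decarbonisation (hypothetical)

energy_change = relative_change(sum(home.values()), sum(home.values()))
carbon_change = relative_change(annual_carbon_kg(home, grid_year_1),
                                annual_carbon_kg(home, grid_year_2))
print(f"energy: {energy_change:+.0%}, carbon: {carbon_change:+.0%}")  # energy: +0%, carbon: -40%
```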
Notably, different European countries have chosen different metrics to underpin the EU-wide policy tool of EPCs.

Change and development of metrics
There are widespread calls for accelerated rates of building retrofit, and acknowledgement that this acceleration will be guided by government policy (CCC 2019; Eyre & Killip 2019). Speeding up policy development might suggest using existing metrics, such as those in the EPC, rather than designing new or better ones. Can the UK EPC and its quality control system be improved and reformed, or might it be better to change the metric on which the EPC and its energy bands are based? This decision can be better taken by considering what qualities a metric needs (O'Brien et al. 2017) and by using the analytical framings of scale and of the elements of a metric outlined in this paper. As the Energiesprong example shows, meeting exemplary retrofit standards requires additional metrics in delivery, not only to ensure building quality but also to communicate the energy performance promise to householders. The cost metric in this case differs from the EPC approach: it sets a benchmark for occupants' energy use practices and also guarantees performance through the repair and maintenance stages, over a timescale of 30 years after the project's delivery.
Buildings that offer new services to the energy system, such as load flexibility at peak times (Mallaburn et al. 2019), will need new metrics. Developing new metrics can be a very time-consuming process, and given the urgency of action, this may need to be speeded up. Taking a learning approach to policy implementation (Janda & Topouzi 2015) may help to reduce the risks of faster metrics development and faster metrics-based policy-making.

Metrics and policy-making
Metrics form the basis of most energy demand-related policy-making (Rosenow et al. 2016): there cannot be meaningful minimum efficiency or energy consumption standards without metrics (although metrics can exist without standards). However, the ambition level of standards is not determined by the metrics on which they are based, and it is important to distinguish critiques of standards from critiques of the underlying metrics. This can be difficult, and the authors recognise the difficulty, at national level, of distinguishing metrics from policy, as metrics quantify the impact of governments' policies, the 'hero story' (Janda & Topouzi 2015).

Further research
There are several areas where further research is needed:
• Investigating how metrics for embodied and operational energy use, and operational energy use beyond the asset rating, could best be combined or integrated to deliver useful information for decision-making.
• Paying particular attention to the time profile of energy use and carbon emissions over a building's whole life cycle, building on work done by, for example, Röck et al. (2020), and developing suitable metrics to communicate this information.
• Researching how metric development processes, and those of the standards and policies based on them, could be accelerated, and what the trade-offs between speed and quality might be.
• Considering which new metrics and combinations of metrics for retrofit are needed in the energy transition: for example, metrics that involve users and operational energy use (a social element in metrics), and quality of performance over a timescale that extends beyond the delivery of the retrofit project.
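To illustrate the time-profile question in the second point, the sketch below tracks cumulative net carbon for a single retrofit: embodied emissions are incurred up front, and operational savings accrue year by year. All quantities, including the rate at which savings decline, are hypothetical, the function names are illustrative, and the simple carbon-payback framing is one possible metric rather than one proposed in this paper.

```python
from itertools import accumulate


def cumulative_net_carbon(embodied_kg: float, annual_savings_kg: list) -> list:
    """Net carbon after each year: upfront embodied emissions minus operational savings to date."""
    return [embodied_kg - saved for saved in accumulate(annual_savings_kg)]


def carbon_payback_year(embodied_kg: float, annual_savings_kg: list):
    """First year in which cumulative operational savings offset embodied carbon, or None."""
    for year, net in enumerate(cumulative_net_carbon(embodied_kg, annual_savings_kg), start=1):
        if net <= 0:
            return year
    return None


embodied = 6_000  # hypothetical kg CO2e embodied in retrofit materials and works
constant = [2_000] * 30                            # saving unchanged over 30 years
declining = [2_000 * 0.9 ** t for t in range(30)]  # saving shrinks as energy supply decarbonises

print(carbon_payback_year(embodied, constant))   # 3
print(carbon_payback_year(embodied, declining))  # 4
```

The same retrofit pays back its embodied carbon later when the carbon value of each saved kWh falls over time, which a single annual snapshot metric cannot show.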

Limitations of this research
This research is based on a literature review, presentation of evidence from current policies and standards, and the development of a simple analytical framework, which is then used to gain insights into two examples of metrics in policy and practice. The number of examples considered was limited, and alternative analytical frameworks, including whether these would be more appropriate to the research question, were not considered. The paper's value lies primarily in the quality of the argument developed rather than in the comprehensiveness of the literature review or in new empirical evidence.

Conclusions
This paper explores whether residential retrofit metrics are fit for purpose in the climate emergency. To address this question, it brings together evidence and current debates from the literature, evidence from current policies and standards, and examples of metrics in policy and practice. Metrics embody compromises between goals, values and desirable characteristics, such as accessibility and reproducibility, which may be in tension. Metrics can be better or worse suited to the purpose for which they are employed, and may be used for purposes for which they were not originally designed. There is no perfect metric, and this paper is not in search of perfection. Rather, it offers insights and new analytical frameworks that enable choices about metrics to be made with more clarity, recognising their different purposes, the interests of their users and the trade-offs inherent in their design and implementation. In this way, metrics can better fit their key purpose in the climate emergency: to help deliver high-quality, high-ambition, widespread residential retrofit. The analytical approach developed offers a simplified set of four key aspects: scope, headline measurement, normalisation factor and timescale. This helps to unpack the complexity of metric design. However, the choice of metrics is not simply a technocratic issue, because their design is not value free. Choices about whether metrics are based on energy or carbon, or about the boundaries used for definitions, affect their meaning and impact in the world. Another important facet of design is the level of granularity required at different scales: national, building sector or individual building. Use of multiple metrics improves their fitness for purpose and is already established practice in some standards and policy. This approach could usefully be expanded, particularly when established metrics are used for new purposes, with the inevitable compromises that entails.
Metrics in common use omit many aspects of energy use in buildings, particularly embodied energy and the interactions of people with buildings, as well as larger scale issues around provision of low-carbon energy infrastructures. Responses to the climate emergency will require new metrics which, for example, take account of the whole life of a building, the time profile of retrofit or the ability of the building to be flexible as to when energy is required. There is more to do to make the best of existing metrics and develop new metrics, in a timely way, to contribute to the coming transformation of the building stock.