Creating Domestic Building Thermal Performance Ratings Using Smart Meter Data

Energy performance certificates (EPCs) are ratings of domestic building energy performance mandated across the European Union. Their aim is to provide a reliable assessment of a building’s energy performance whilst accounting for non-building effects such as weather and occupancy. Current rating methods, based on theoretical calculations, can introduce significant error because they cannot estimate real building performance. Other methods that use real energy data cannot isolate building performance from other effects because of the low resolution of the data. The installation of smart meters in a large proportion of the housing stock in European Union member states presents an opportunity: harnessing high-resolution energy data can create or inform building energy performance ratings with reduced error and at scale. This critical review explores the challenges and opportunities of using smart meter data in building energy ratings, focusing primarily on quantifying the thermal performance of the building and heating system. The research gaps in this emerging field are identified, including: demonstrating that the rating is truly independent of the behaviour of specific occupants; identifying the additional data inputs that add most value in combination with smart meter data; and reducing uncertainty whilst limiting the complexity of the measurement and calculation.
Practice Relevance
Increasing evidence shows that current EPCs are unreliable. This unreliability can reduce their usefulness to householders and weaken the evidence they provide for policy decisions. The incorporation of metrics constructed from smart meter data can provide a rating of building thermal performance that better reflects the actual performance of a dwelling. Incorporating smart meter data could improve the reliability of building energy ratings and quantify the rating uncertainty on a per-dwelling basis, which would be useful for risk assessment to inform finance and retrofit decisions.
The technical challenges of incorporating smart meter data are identified and explained. In summary, they are: ensuring that ratings remain independent of occupant behaviours/practices; and identifying which additional data inputs increase reliability and enable more informed retrofit decision-making whilst keeping the rating cost low and the calculation complexity tractable.


Introduction
Domestic buildings are estimated to contribute around 22% of global energy consumption (IEA 2018) and 12% of greenhouse gas emissions (IPCC 2014). Governments around the world have made ambitious commitments to reduce emissions (UNFCCC 2015), and the requirements to reduce energy use in buildings are considerable (Grove-Smith et al. 2018). Alongside this is a concern to keep energy affordable (European Commission 2017), especially for businesses and vulnerable households, and to address the risks of energy poverty (Thomson and Snell 2013), for example, its health impacts (Liddell and Morris 2010).
Improving energy efficiency in domestic buildings has a role to play in addressing these issues. In the European Union (EU), an important policy instrument has been mandatory energy ratings for buildings, ascribed through energy performance certificates (EPCs). The stated purpose of EPCs is twofold: to give correct information to the buyer or tenant about the energy performance of the building, and to provide recommendations for its cost-effective improvement (European Parliament 2010). Both purposes are intended to enable informed decision-making: the former regarding purchasing and letting; the latter regarding physical upgrades to the property. Furthermore, in some member states, EPCs have taken on a secondary purpose: providing information for policymaking. EPC-type data are used in the analysis of trends in the energy performance of the stock (Laurent et al. 2013), for retrofit policy analysis (Cayre, Allibe and Laurent 2011), or for fuel poverty and energy efficiency targets (HM Government 2017).
There is evidence that these primary aims and secondary uses of EPCs are not being met in all states. A 2011 study collecting data from several European countries found that household responses varied on whether the EPC was a useful source of information on the energy costs of their homes (responses from the Netherlands and the UK were negative, whilst Danish and German homeowners were more positive), as did the level of trust in the EPC document (Adjei, Hamilton and Roys 2011; Christensen et al. 2014). Concerning the EPC's role in giving practical advice, a British survey showed that 34% of respondents who were not considering, or were unsure about, implementing any of the recommendations on their EPC did not agree with the recommendations (NHER 2009).
In terms of the appropriated secondary purpose of EPCs in policy-making, two British studies recently showed that trustworthiness or reliability issues could also be important at this level. Previous work (Crawley et al. 2019) estimated that one-quarter of 'band D' dwellings are misclassified as band C; furthermore, Hardy and Glew (2019) predicted that 36-62% of certificates contained an identifiable error.
The present paper brings together literature on smart metering and data analysis methods to assess whether the incorporation of metrics constructed from smart meter data can help improve the reliability of building energy ratings by measuring actual building energy performance. Smart meters are being installed across the EU with an estimated 2020 penetration of 43% (European Commission 2019). The paper will highlight the challenges that must be overcome in order to use these data effectively to provide more realistic ratings.
The paper is structured as follows. Section 2 outlines the current national methods of creating building ratings, reviewing their advantages and disadvantages. Section 3 reviews the current readiness of smart meter data for large-scale application in ratings. Section 4 critically reviews existing methods for constructing performance ratings from smart meter data. Section 5 then draws out the research gaps and remaining challenges.

Building energy ratings in EU: current approaches, focus, and challenges
European Directive 2010/31/EU (European Parliament 2010) has largely shaped building rating methodologies in most EU countries, although countries such as the UK had national rating systems predating the directive (Williamson et al. 2006). Member states are required to construct a method that outputs annual energy use under normative conditions (i.e. independent of variations in occupant behaviour or weather), focusing primarily on the building's shell and its technical systems (Zirngibl, Visier, and Arkesteijn 2009). This must then be compared with a reference figure to ascribe a rating, that is, a judgment of the energy performance of a building, so that buildings can be compared with each other. Member states are also required to provide recommendations for the householder as to how to upgrade the energy performance of their home cost-effectively.
Arguably, these two aims could be best served by different types of metrics. Incorporating the 'normative' aspect, which implies a typical occupant rather than any particular household, focuses the rating on the performance of the building and its thermal systems and allows one building to be compared effectively with another. However, when providing advice to the occupants, it is less intuitive to remove the effect of the particular household occupying the dwelling, and a metric incorporating the preferences and behaviours of the occupants could be more useful.
Two methodologies for determining a building's energy use are permitted: calculated and measured (European Parliament 2010). Within these broad methodologies, there is flexibility for countries to implement their own method. Figure 1 shows the range of EPC methods as a scale. The left-hand end of the scale denotes an approach based purely on calculations, with no real measured data. The right-hand side denotes pure measurement, with no interpretation. Every point in between denotes the combination of calculations (models) and data in different proportions. Note that the two ends of the scale are used for different purposes. The metric on the left-hand side can be termed an asset rating: a design rating that can be calculated before buildings are constructed, for regulatory and sale purposes. The metric on the right-hand side uses measured data and can be termed an operational rating, only suitable for use in existing buildings. A considerable body of evidence highlights the difference between design and as-built energy performance, sometimes referred to as a performance gap (Rafols 2015; Zero Carbon Hub 2010). Figure 1 will now be used to outline the rating approaches used in different countries.

Calculation-based approaches
Most EU member states have opted for the calculated approach to EPC production (Buildings Performance Institute Europe 2014), depicted on the left-hand side of Figure 1, in which energy use is predicted from tabulated input data about the (thermo)physical characteristics of the building, the climate and normative occupant behaviour. As a building progresses from design to construction and into operation, it is theoretically possible to feed more realistic 'as-built' and 'as-used' data into the calculation. As-built data at their simplest rely on a survey, which is a common method of creating EPCs for new and existing dwellings throughout Europe (Buildings Performance Institute Europe 2014). Member states each use their own calculation method; however, the general principle is a bottom-up approach in which components are modelled individually and approximations are made for whole-building attributes, such as airtightness and thermal bridges. The survey can be carried out by a qualified assessor (European Parliament 2010) or by a householder who enters the information online (as is the case in Germany and the Netherlands). The challenge is that although a survey should be able to identify what is built, in reality most energy efficiency measures are not easily visible (e.g. the type of insulation installed within a cavity wall).
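The bottom-up principle can be illustrated with a minimal Python sketch, assuming a single steady-state heat loss coefficient built from element U-values, a whole-building thermal bridging allowance and a ventilation term. The element areas, U-values, the y-value style bridging allowance and the air change rate are hypothetical placeholders, not values or formulae taken from any national methodology.

```python
from dataclasses import dataclass

# Volumetric heat capacity of air: ~1.2 kg/m3 * 1005 J/(kg K) / 3600 s/h ~= 0.33 Wh/(m3 K).
AIR_HEAT_CAPACITY_WH_PER_M3K = 0.33


@dataclass
class Element:
    """One fabric element (wall, roof, floor, window) with its area and U-value."""
    name: str
    area_m2: float
    u_value_w_per_m2k: float


def heat_transfer_coefficient(elements, air_changes_per_hour, volume_m3, bridging_y=0.0):
    """Bottom-up steady-state heat loss coefficient in W/K.

    fabric:      sum of U * A over the individually modelled elements
    bridging:    simplified whole-building allowance, y * total element area (assumption)
    ventilation: 0.33 * n * V, with n in air changes per hour and V in m3
    """
    fabric = sum(e.area_m2 * e.u_value_w_per_m2k for e in elements)
    bridging = bridging_y * sum(e.area_m2 for e in elements)
    ventilation = AIR_HEAT_CAPACITY_WH_PER_M3K * air_changes_per_hour * volume_m3
    return fabric + bridging + ventilation


# Hypothetical dwelling; every number here is illustrative.
example_elements = [
    Element("walls", 100.0, 0.5),
    Element("roof", 50.0, 0.25),
    Element("floor", 50.0, 0.7),
    Element("windows", 20.0, 2.8),
]
example_htc = heat_transfer_coefficient(example_elements, air_changes_per_hour=0.7,
                                        volume_m3=250.0, bridging_y=0.15)
print(f"Calculated heat loss coefficient: {example_htc:.1f} W/K")
```

If the wall U-value is wrong because the insulation inside a cavity cannot be seen, that error flows directly into the total, which is the weakness of survey-based inputs described above.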
The calculated approach is widely assumed to introduce significant error into the rating; however, there has not been a systematic study of error across the EU. A report by the Buildings Performance Institute Europe (2010) described an estimated 20% overall inaccuracy arising from combining 30%, 5% and 10% errors in repeatability, calculation and use of default values, respectively. However, it provided no published source for these numbers, which are likely to vary between countries and their respective assessment methods. Few member states have assessed the quality of EPC data collected under their own national methodology. The two British studies mentioned above both analysed repeated measurements: Hardy and Glew (2019) focused on the different identifiable sources of error and found disagreements on floor type, wall type and built form for the same dwelling, as well as an apparent decrease in loft insulation over time. Crawley et al. (2019) focused on the resulting error in the rating, estimated to increase from 3% for dwellings in band B to 18% for those in band E, and to exceed 18% in bands F and G.
Errors in energy performance ratings are typically attributed to the data-acquisition process, error quantification and propagation, and use of default values (Chapman 1991). An example of the sensitivity of calculations to changes in input parameters is demonstrated by Hughes et al. (2015). It is not trivial to decrease these errors, as collecting more data does not necessarily make the end result more accurate and can in fact increase the error, as argued by Chapman (1991). Furthermore, implementations that are too complicated and labour intensive attract criticism (Gratzl-Michlmair, Graf and Goerth 2010).
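As an illustration of error propagation (not of any national method), the Python sketch below assumes independent Gaussian uncertainties on hypothetical U-values and on the air change rate, and shows how survey-level input uncertainty carries through to the calculated heat loss coefficient. Because this toy model is linear, the Monte Carlo spread should match the quadrature sum of the individual contributions; sampling remains usable where that analytical shortcut does not hold exactly.

```python
import math
import random
import statistics

AIR_HEAT_CAPACITY_WH_PER_M3K = 0.33

# Hypothetical inputs: (area in m2, nominal U-value in W/(m2 K), standard uncertainty of U).
ELEMENTS = [
    (100.0, 0.5, 0.1),    # walls: insulation hard to confirm visually
    (50.0, 0.25, 0.05),   # roof
    (50.0, 0.7, 0.14),    # floor
    (20.0, 2.8, 0.3),     # windows
]
ACH_NOMINAL, ACH_SIGMA = 0.7, 0.15  # air changes per hour and its standard uncertainty
VOLUME_M3 = 250.0


def htc(u_values, ach):
    """Linear bottom-up heat loss coefficient (W/K) for one set of inputs."""
    fabric = sum(area * u for (area, _, _), u in zip(ELEMENTS, u_values))
    return fabric + AIR_HEAT_CAPACITY_WH_PER_M3K * ach * VOLUME_M3


def monte_carlo_htc(n_samples=20_000, seed=1):
    """Propagate independent Gaussian input uncertainties by random sampling."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        u_draw = [rng.gauss(u, sd) for _, u, sd in ELEMENTS]
        ach_draw = rng.gauss(ACH_NOMINAL, ACH_SIGMA)
        samples.append(htc(u_draw, ach_draw))
    return statistics.fmean(samples), statistics.stdev(samples)


def analytical_sigma():
    """Quadrature sum of independent contributions (exact here because htc is linear)."""
    variance = sum((area * sd) ** 2 for area, _, sd in ELEMENTS)
    variance += (AIR_HEAT_CAPACITY_WH_PER_M3K * VOLUME_M3 * ACH_SIGMA) ** 2
    return math.sqrt(variance)


nominal = htc([u for _, u, _ in ELEMENTS], ACH_NOMINAL)
mc_mean, mc_sigma = monte_carlo_htc()
print(f"nominal {nominal:.1f} W/K, Monte Carlo {mc_mean:.1f} +/- {mc_sigma:.1f} W/K, "
      f"analytical sigma {analytical_sigma():.1f} W/K")
```

With these placeholder numbers the ventilation term contributes the largest single share of the variance, illustrating that some inputs dominate the rating uncertainty far more than others.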
A further step towards incorporating actual data is to supplement the survey with physical tests of the dwelling's thermal performance (Figure 1). For example, France, Denmark and England require airtightness, measured through a pressure test, to be included in their National Calculation Methodology, which forms the basis of the calculations for the EPC of a new dwelling (DCLG 2016; DKCESB 2018). Overall, however, this step is not a common feature of EPCs in Europe. Adding physical testing to design data and/or surveys increases the time and cost of creating an EPC. Nonetheless, the approach is indicative of an overarching trend of trying to improve the calculated methods by incorporating measured aspects of actual performance.
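To show how a measured quantity can replace a default in a calculated rating, the Python sketch below converts a hypothetical pressure-test result into an infiltration heat loss term. It assumes the commonly quoted "divide n50 by 20" rule of thumb, which is not the conversion used in the French, Danish or English methodologies; it only stands in for whichever conversion a given method specifies.

```python
AIR_HEAT_CAPACITY_WH_PER_M3K = 0.33  # ~ rho * c_p of air, in Wh/(m3 K)


def n50_from_pressure_test(flow_at_50pa_m3_per_h, volume_m3):
    """Air changes per hour at a 50 Pa pressure difference (n50) from a blower-door result."""
    return flow_at_50pa_m3_per_h / volume_m3


def average_infiltration_ach(n50, divisor=20.0):
    """Rule-of-thumb conversion from n50 to average natural infiltration.

    The divisor of 20 is an assumption; national methods apply their own
    corrections for shelter, storeys and other factors.
    """
    return n50 / divisor


def infiltration_heat_loss_w_per_k(ach, volume_m3):
    """Ventilation heat loss coefficient, 0.33 * n * V, in W/K."""
    return AIR_HEAT_CAPACITY_WH_PER_M3K * ach * volume_m3


# Hypothetical test result: 1250 m3/h at 50 Pa for a 250 m3 dwelling.
n50 = n50_from_pressure_test(1250.0, 250.0)
infiltration = average_infiltration_ach(n50)
loss = infiltration_heat_loss_w_per_k(infiltration, 250.0)
print(f"n50 = {n50:.1f} 1/h, infiltration ~ {infiltration:.2f} 1/h, loss ~ {loss:.1f} W/K")
```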

Measurement-based approaches
Consider the set of measured methodologies, shown on the right-hand side of Figure 1. At the very right of the scale are methods based purely on energy data and floor area, reporting, for example, kWh per m² of heated floor area, but with no model applied to the data to normalise for typical use or location. This is the case, for example, for pre-1948 French buildings, for which the energy use is simply an average of the previous three years of metered data (Republique Française 2012).
It is difficult to compare one building with another using metered energy data only as the label describes the building in its specific location with its current occupants. Therefore, we now describe the ways in which member states use methods that account for this limitation by adding degrees of modelling and more input data to interpret and manipulate energy demand.
One step to the left of the data-only methods in Figure 1 are methods that apply modelling to the measured annual energy use, with the aim of standardising it for one or more variables. The most common adjustment made is to external temperature (Leipziger 2013). For example, German EPCs for apartment blocks use three or more years of heating and hot water energy data adjusted by a location-specific climatic factor (Deutsche Energie-Agentur 2018). Similarly, Swedish EPCs use metered annual energy use disaggregated into end uses and corrected for external temperature (Heincke, Jagemar and Nilsson 2011). The uncertainties introduced by the disaggregation process are described by Mangold, Österbring and Wallbaum (2015). These approaches correct for weather but do not separate out the effect of different occupants, and as such can be described as operational ratings.
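The simplest form of such a weather adjustment is a degree-day ratio. The Python sketch below assumes that space-heating energy is proportional to heating degree days and uses a 15.5 °C base temperature chosen purely for illustration; the German climatic factor and the Swedish correction each follow their own national procedures, which this does not reproduce.

```python
def heating_degree_days(daily_mean_temps_c, base_temp_c=15.5):
    """Sum of daily positive differences between a base temperature and the daily mean."""
    return sum(max(0.0, base_temp_c - t) for t in daily_mean_temps_c)


def weather_normalise(heating_kwh, actual_hdd, reference_hdd):
    """Scale metered space-heating energy to a reference climate.

    Assumes space-heating energy is proportional to heating degree days; any
    hot water share should be separated out first and left unscaled.
    """
    if actual_hdd <= 0:
        raise ValueError("actual_hdd must be positive")
    return heating_kwh * reference_hdd / actual_hdd


# Hypothetical example: a mild year (2000 degree days) against a 2200 degree-day reference.
normalised = weather_normalise(heating_kwh=12000.0, actual_hdd=2000.0, reference_hdd=2200.0)
print(f"Weather-normalised space heating: {normalised:.0f} kWh")
```

The adjustment changes nothing about who lives in the dwelling: a household that heats to 23 °C still produces a higher normalised figure than one that heats to 18 °C, which is why these remain operational ratings.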
The Swedish system also permits two more data-intensive options. The first falls within the current category of adjusted energy use and is therefore mentioned here: it aims to represent newer buildings better by taking wind and solar effects into account, and it produces the rating by comparing monthly energy data from the specific building with a modelled archetype similar to the building in question (SMHI 2018).
The next step to the left in Figure 1 is termed 'derived thermal performance', since it allows the thermal performance of the building and its systems to be distilled from the influence of the weather and occupants. The gradient of monthly energy data as a function of external temperature is used to form an 'energy signature' of a building. In theory, this gradient is invariant across different sets of occupants, since the method calculates the change in power requirement for every extra degree drop in external temperature. Thus, it is a measure of the thermal performance of the building and its systems, rather than a rating of the building operated by particular occupants.
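The energy signature reduces to a straight-line fit of mean heating power against external temperature. The minimal Python sketch below assumes that only heating-season months are supplied and uses synthetic data with no free gains, so that the true gradient is known; the function names are illustrative and not taken from any published tool.

```python
def ols_fit(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("external temperature must vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def energy_signature(monthly_kwh, days_in_month, monthly_mean_ext_temp_c):
    """Return (intercept in W, gradient in W/K) of mean heating power against external temperature."""
    power_w = [kwh * 1000.0 / (24.0 * d) for kwh, d in zip(monthly_kwh, days_in_month)]
    intercept, slope = ols_fit(monthly_mean_ext_temp_c, power_w)
    return intercept, -slope  # power falls as it gets warmer, so report -slope as a loss rate


# Synthetic heating-season months built from an assumed 200 W/K loss and an 18 C balance point.
temps = [4.0, 5.0, 7.0, 9.5, 12.0, 3.0]
days = [31, 28, 31, 30, 31, 31]
kwh = [200.0 * (18.0 - t) * 24.0 * d / 1000.0 for t, d in zip(temps, days)]
intercept_w, gradient_w_per_k = energy_signature(kwh, days, temps)
print(f"Energy signature gradient: {gradient_w_per_k:.1f} W/K")
```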
This method has been shown to be associated with several technical problems. These include susceptibility to solar gain (Chambers 2017; Heincke et al. 2011) and the need to assume that internal temperature remains a constant function of external temperature, to avoid introducing systematic error (Chambers 2017).
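Both problems follow from the steady-state balance P = H (T_in - T_ext) - G. If internal temperature drifts with external temperature as T_in = c + b T_ext, the fitted gradient becomes H (1 - b) rather than H; if free gains grow with external temperature as G = g0 + s T_ext, it becomes H + s. The short Python sketch below uses synthetic data in which the drift coefficient b = 0.1 and the gain slope s = 15 W/K are assumptions chosen for illustration.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


H_TRUE = 200.0                                  # W/K, assumed true loss coefficient
ext_temps = [float(t) for t in range(-2, 13)]   # synthetic heating-season daily means, C

# Scenario A: no free gains, but internal temperature rises with external temperature.
power_a = [H_TRUE * ((19.0 + 0.1 * t) - t) for t in ext_temps]

# Scenario B: constant 20 C inside, but free gains that rise with external temperature.
power_b = [H_TRUE * (20.0 - t) - (300.0 + 15.0 * t) for t in ext_temps]

gradient_a = -ols_slope(ext_temps, power_a)   # analytically H * (1 - 0.1) = 180 W/K
gradient_b = -ols_slope(ext_temps, power_b)   # analytically H + 15 = 215 W/K
print(f"true {H_TRUE:.0f} W/K, scenario A {gradient_a:.1f} W/K, scenario B {gradient_b:.1f} W/K")
```

Neither bias is visible from the fit itself, since both synthetic datasets lie exactly on a straight line; the error can only be removed with internal temperature data or a defensible assumption about it.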
The energy signature method is permitted in Sweden (Heincke et al. 2011); nevertheless, it has rarely been used in practice owing to the difficulty of accessing monthly energy data in the absence of smart meters (M. Österbring, personal communication, 2018).
In summary, this section has discussed a variety of rating methods currently implemented in EU member states under two broad methodologies and has highlighted their main issues. Methods based on measured energy use must separate out building performance from other factors, which is a non-trivial task. Calculated methods have the benefit of being able to control the variability of these other factors but have the problem that what is built can differ from what was designed or surveyed. Moreover, the two aims of allowing one building to be compared with another and providing recommendations to the occupants may each be better suited to different metrics, excluding and including the specific influence of the occupants, respectively. The focus from here on is on the first aim; the remainder of the paper therefore concentrates on rating the thermal performance of the building and its heating system only. We return to and critique this choice of focus below.

Smart meter requirements
Given the challenges above, the potential for smart meter data to play a role in an improved rating system is the focus of the remainder of the paper. First, three basic data requirements are examined in turn: whether smart meters are present at scale in residential buildings; whether they record sufficient data to produce an EPC-like rating; and how accessible these data are to the party creating the rating.

Presence at scale
The EU had aimed for member states to replace at least 80% of electricity meters with smart meters by 2020 wherever it was cost-effective to do so (European Commission 2014a). The actual 2020 penetration is estimated to be 43% (European Commission 2019), with considerable variability among member states in terms of both penetration rate and the likely year of reaching 80%. Italy, Sweden, Finland, Spain and Estonia had penetration rates above 80% by 2017 (ACER 2018), while others, such as Ireland, Greece and Germany, had not introduced smart meters. The target date for Britain's (i.e. England, Scotland and Wales, but not Northern Ireland) rollout (which includes gas and electricity meters) has been pushed back from 2020 to 2024 (BEIS 2019). A few member states (such as Belgium and Latvia) have set no legal target date for introducing smart meters owing to a negative cost-benefit analysis of a rollout (European Commission 2014b). Clearly, the challenges of delivering large-scale national infrastructure projects that require visiting and replacing meters in every house are not to be underestimated (Cuijpers and Koops 2013; Zhou and Brown 2017).
Some uncertainties remain regarding which customers will have smart meters, for example, whether flats or multifamily houses which are currently metered at the block level are to receive dwelling-level smart meters. Gas meters are another uncertainty, with only some member states such as the UK and France deciding to roll out smart gas meters.

Data sufficiency
All smart meters are required to measure energy usage at a time resolution equal to that of member states' national electricity markets (European Parliament 2016b). For most member states, smart meters record data at 15-min resolution, with Britain, Ireland and France recording at 30-min resolution, and Sweden, Finland and Estonia recording at 1-h resolution (ACER 2018). The primary feature of smart meters that is relevant for energy performance ratings is, therefore, that they measure and record electricity and/or gas consumption at high temporal resolution.
First, this means that the data captured by smart meters are sufficient for the accurate measurement of energy usage in buildings. Before smart meters, energy usage had to be determined from infrequent and irregular manual meter readings and 'corrected' to allow comparison over a standardised period, typically a year. With smart meters, this process of 'correcting' data is avoided, removing the errors introduced by this extra modelling.
Second, the data captured by smart meters are sufficiently high resolution to be matched to publicly available open data about local weather conditions (external temperature, irradiance, etc.). This is a requirement of the methods for producing the operational energy ratings described below.
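Matching smart meter data to weather data mostly comes down to aggregating interval readings to the same time step as the weather record. The minimal Python sketch below assumes half-hourly readings stamped at the start of each interval, in a time zone without clock changes, and a daily mean external temperature already obtained from a nearby weather station; the function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_mean_power_w(readings, intervals_per_day=48):
    """Aggregate interval energy readings to daily mean power.

    readings: iterable of (interval_start: datetime, energy_kwh: float).
    Days without exactly `intervals_per_day` readings are dropped rather than
    gap-filled (48 assumes half-hourly data and ignores clock-change days).
    """
    energy_kwh = defaultdict(float)
    counts = defaultdict(int)
    for start, kwh in readings:
        energy_kwh[start.date()] += kwh
        counts[start.date()] += 1
    return {
        day: total * 1000.0 / 24.0          # kWh per day -> mean power in W
        for day, total in energy_kwh.items()
        if counts[day] == intervals_per_day
    }


def join_with_weather(daily_power_w, daily_mean_temp_c):
    """Pair each complete day with its mean external temperature: [(day, W, C), ...]."""
    return [(day, daily_power_w[day], daily_mean_temp_c[day])
            for day in sorted(daily_power_w) if day in daily_mean_temp_c]


# Hypothetical data: one complete day at a constant 1 kW and one incomplete day.
readings = [(datetime(2021, 1, 1) + timedelta(minutes=30 * i), 0.5) for i in range(48)]
readings += [(datetime(2021, 1, 2) + timedelta(minutes=30 * i), 0.5) for i in range(10)]
weather = {date(2021, 1, 1): 3.5, date(2021, 1, 2): 4.0}
paired = join_with_weather(daily_mean_power_w(readings), weather)
print(paired)
```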

Data accessibility
Smart meter data are personal data under the General Data Protection Regulation (GDPR) (European Parliament 2016a). This means that individuals have the right to access these data where they are being collected (Art. 15) and, under certain conditions, to have the data ported to a third party of their choice (Art. 20). Following this, the EU Electricity Directive was updated in 2019 to include several provisions for smart meter data: they must be processed in accordance with member state data protection laws, consumers should have access to their smart meter data, and the data should be made available to 'eligible parties' according to the member state's legal framework. While smart meter data in themselves are not new in many member states, this directive means that data which previously might have been held solely by utilities are now available for third parties to access (with the consent of the customer). It also means that smart meter data can be lawfully used to produce an energy rating for an individual dwelling, and that the data controller cannot hinder the porting of these data to the EPC provider.
Smart meter data are therefore accessible provided consent from the household has been obtained. It should be straightforward to obtain consent from owner-occupiers at the time they commission an EPC or equivalent performance rating. However, it becomes more challenging where the individual commissioning the rating is not the occupier of the dwelling, e.g. a landlord renting out a property in which they do not live. The smart meter data belong to the occupier of the property (e.g. the tenant). A landlord wanting to commission an EPC that required smart meter data would therefore need the tenant's consent to access these data. This creates a barrier to using smart meter data to rate rental properties.
Under the GDPR, consent is not the only lawful basis for processing personal data; however, the lawful bases for processing smart meter data do not mirror those in the GDPR. In the UK, consent is the only lawful basis for processing smart meter data (Smart Energy Code Company 2013), unless the processor is an energy supplier or distribution network operator who can process certain smart meter data for regulated duties (equivalent to a 'legal obligation' under the GDPR). To overcome this barrier, member states would need to open up the processing of smart meter data to further lawful bases. For example, 'public task' could be an appropriate lawful basis for a trusted processor to access smart meter data for rental properties without the consent of the present occupiers, for the purpose of producing building performance ratings, provided such ratings were deemed to be a public good. It should be noted that, owing to the personal nature of the smart meter data underlying any rating system built on them, the smart meter data themselves would not be made publicly available.
Finally, beyond the data protection considerations, a technical means to access the data must be available. In most member states, smart meters are the responsibility of distribution network operators, in which case data-sharing agreements would be needed between these and the processor(s), which could be non-trivial to implement without government intervention. To facilitate access to smart meter data by other parties, several member states such as Estonia and the UK have opted for independent central data hubs to handle the processing of smart meter data (Data Communications Company 2020; Elering 2020).

Technical methods
This section critically reviews the methods in the literature that use smart meter data to derive the energy performance of domestic buildings.

Rating metrics
As introduced in Section 2, this review focuses on ascertaining thermal performance, including the efficiency of provision of heat by the heating system and the efficiency of retention of heat by the building itself. This can include the efficiency of the provision of hot water, but not hot water consumption itself, which is occupant dependent.
Two relevant metrics are found in the literature. The first is the heating power loss coefficient (HPLC), introduced by Chambers and Oreszczyn (2018). This is the fuel power input required to maintain a given temperature difference between the inside and outside of a building, per unit of that temperature difference. It characterises the thermal losses of both the heating system and the building's fabric in one metric and makes use of whole-dwelling metered fuel consumption data. Where the heating system is located within the heated volume, it also includes within its system boundary the interaction between the heating system and the building, e.g. the thermal losses from the heating system into the dwelling.
Other research uses an alternative metric focusing on the building itself, without its heating system. This metric is known as the heat transfer coefficient (HTC), defined as the heat flow rate divided by the temperature difference between two environments (ISO 2017). In a building context, it represents the heat required to maintain a given temperature difference between the inside and outside of the building (Jack et al. 2018); it differs from the HPLC in that its numerator is delivered heat, not fuel. This second metric may be more challenging to calculate unless delivered heat is directly measured, as in the case of a district heating system (Gianniou et al. 2018), or is provided by a direct electric system; in all other cases, the in-situ conversion efficiency from fuel to heat must be ascertained. This may become possible in future as heating systems increasingly measure the variables necessary to calculate efficiency (return temperature, flow temperature and flow rate) and gain the functionality to transmit these data over the internet (Bennett, Elwell and Oreszczyn 2018).
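The relationship between the two metrics, and the measurement that would allow a move from fuel to delivered heat, can be sketched in Python under deliberately simple assumptions: a single averaging period long enough for steady state, no free gains, and a wet heating circuit whose flow temperature, return temperature and flow rate are known. The constant water properties are an approximation, and all input values are hypothetical.

```python
WATER_SPECIFIC_HEAT_J_PER_KGK = 4186.0  # approximate c_p of water
WATER_DENSITY_KG_PER_L = 1.0            # simplification; ~0.98 kg/L at typical flow temperatures


def delivered_heat_w(flow_temp_c, return_temp_c, flow_rate_l_per_s):
    """Heat delivered by a wet heating circuit: m_dot * c_p * (T_flow - T_return)."""
    mass_flow_kg_per_s = flow_rate_l_per_s * WATER_DENSITY_KG_PER_L
    return mass_flow_kg_per_s * WATER_SPECIFIC_HEAT_J_PER_KGK * (flow_temp_c - return_temp_c)


def loss_coefficients(fuel_power_w, heat_power_w, internal_temp_c, external_temp_c):
    """Return (HPLC, HTC, in-situ efficiency) for one steady-state averaging period.

    HPLC uses fuel power and HTC uses delivered heat over the same temperature
    difference, so HTC = efficiency * HPLC. Free gains and heat storage are ignored.
    """
    delta_t = internal_temp_c - external_temp_c
    if delta_t <= 0:
        raise ValueError("internal temperature must exceed external temperature")
    return fuel_power_w / delta_t, heat_power_w / delta_t, heat_power_w / fuel_power_w


# Hypothetical period averages: 60/50 C flow/return at 0.1 L/s, 4650 W of gas, 20 C in, 0 C out.
heat_w = delivered_heat_w(60.0, 50.0, 0.1)
hplc, htc, efficiency = loss_coefficients(4650.0, heat_w, 20.0, 0.0)
print(f"HPLC {hplc:.1f} W/K, HTC {htc:.1f} W/K, efficiency {efficiency:.2f}")
```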
Several methods have been proposed in the literature to determine the HPLC and HTC empirically in occupied dwellings from monitored data. Depending on the mathematical model used to describe the physical process of interest and the assumptions introduced, these methods can be separated into static and dynamic methods. Static methods neglect thermal storage effects, analysing time series long enough to minimise the influence of dynamic variations in the input data, whereas dynamic methods aim to model and characterise thermal mass effects explicitly.

Static methods
Current static methods are set out here as a starting point. Under the category of rating approaches in Figure 1, labelled as derived thermal performance, the energy signature method (elsewhere known as PRISM, Fels 1986; or power temperature gradient, Summerfield et al. 2015) was introduced. In this method, a simple model is applied to quantify the steady-state response of energy consumption to each drop in external temperature. In its simplest form (Fels 1986), the method estimates the HPLC as the gradient of the line of regression of fuel power data on external temperature data. Additionally, the HTC can be estimated if delivered heat data are available or the efficiency of the heating plant is known.
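PRISM's core idea, a baseload plus a heating term proportional to heating degree days at a base temperature chosen to give the best fit, can be sketched in Python with daily smart meter data. This is a simplified illustration under stated assumptions (daily periods, a grid search over candidate base temperatures, no free-gain or wind terms, synthetic noise-free data), not Fels's original software. Converting the slope from kWh per degree-day to W/K yields an HPLC-type coefficient when the meter records fuel.

```python
def fit_line(x, y):
    """Least-squares y = a + b*x; returns (a, b, r_squared), or None if x or y is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or ss_tot == 0:
        return None
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, 1.0 - ss_res / ss_tot


def prism_variable_base(daily_kwh, daily_mean_temp_c, candidate_bases_c):
    """PRISM-style fit: daily_kwh = baseload + slope * HDD(base), keeping the base with best R^2.

    Returns (base_c, baseload_kwh_per_day, slope_kwh_per_degree_day, loss_w_per_k).
    """
    best = None
    for base in candidate_bases_c:
        hdd = [max(0.0, base - t) for t in daily_mean_temp_c]
        fit = fit_line(hdd, daily_kwh)
        if fit is None:
            continue
        a, b, r2 = fit
        if best is None or r2 > best[3]:
            best = (base, a, b, r2)
    if best is None:
        raise ValueError("no candidate base temperature produced heating degree days")
    base, a, b, _ = best
    return base, a, b, b * 1000.0 / 24.0  # kWh per degree-day -> W/K


# Synthetic year: 8 kWh/day baseload, 200 W/K (4.8 kWh per degree-day), 15 C base.
temps = [float(t) for t in range(0, 26)]
kwh = [8.0 + 4.8 * max(0.0, 15.0 - t) for t in temps]
bases = [10.0 + 0.5 * k for k in range(25)]  # 10.0 ... 22.0 C
base, baseload, slope, loss = prism_variable_base(kwh, temps, bases)
print(f"base {base} C, baseload {baseload:.1f} kWh/day, loss {loss:.1f} W/K")
```

The fitted base temperature absorbs the average effect of internal temperature and free gains together, which is why a single regression of this kind cannot separate them.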
The availability of daily smart meter data makes PRISM-type methods convenient to apply. However, the estimates of HPLC and HTC obtained are not robust, as the method does not characterise the building's sensitivity to free gains (Bauer and Scartezzini 1998), including solar, appliance and metabolic gains. Refinements to the PRISM method are therefore required to eliminate the systematic errors introduced when these gains are omitted. A body of research has aimed to improve the original PRISM implementation by accounting for free gains over the survey period instead of treating them as constant (Bauer and Scartezzini 1998; Ghiaus 2006; Rabl and Rialhe 1992), or by including other environmental effects such as wind velocity (Bauer and Scartezzini 1998; Favre et al. 1983). The difficulty of incorporating free gains is illustrated below using the example of solar gain. First, data availability may limit accurate characterisation of solar gain. Although many weather stations record solar radiation, far fewer do so than record temperature, so local representative data on incoming solar radiation may not be available. Second, solar gains inside the dwelling are not simply a function of incident solar radiation but also a complex function of the orientation of solar-transmitting openings relative to the diffuse and direct components of solar radiation, and of any local shading. Building models often account for this complexity by using an effective solar aperture (Baker 2015; Stamp, Altamirano-Medina and Lowe 2017). However, for the use case of building ratings, sufficient data to calculate the aperture of a given building from first principles are unlikely to be available, leaving a systematic error in the HPLC or the HTC.
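One common refinement, adding measured irradiance as a second regressor so that the fit returns both a loss coefficient and an effective solar aperture, can be sketched in Python with ordinary least squares. The assumptions are strong: constant internal temperature, a constant internal gain absorbed into the intercept, irradiance on a single reference plane, and synthetic noise-free data. If the power is fuel rather than delivered heat, the fitted coefficients are HPLC and an aperture expressed in fuel terms (divided by the heating system efficiency).

```python
def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def least_squares(rows, y):
    """Solve min ||X beta - y|| through the normal equations X^T X beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def fit_with_solar(power_w, ext_temp_c, irradiance_w_per_m2):
    """Fit power = b0 + b1*T_ext + b2*I; return (loss coefficient, solar coefficient, intercept).

    Under P = H*(T_in - T_ext) - A*I - G_int with constant T_in and G_int,
    b1 = -H and b2 = -A, while H*T_in - G_int is absorbed into the intercept b0.
    """
    rows = [[1.0, t, i] for t, i in zip(ext_temp_c, irradiance_w_per_m2)]
    b0, b_temp, b_solar = least_squares(rows, power_w)
    return -b_temp, -b_solar, b0


H_TRUE, APERTURE_TRUE = 200.0, 3.0  # W/K and m2, assumed for the synthetic data
temps = [0.0, 2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0, 9.0]
irradiance = [20.0, 60.0, 30.0, 80.0, 40.0, 90.0, 10.0, 70.0, 50.0, 100.0]  # daily mean W/m2
power = [H_TRUE * (20.0 - t) - APERTURE_TRUE * i - 400.0 for t, i in zip(temps, irradiance)]

h_est, aperture_est, _ = fit_with_solar(power, temps, irradiance)
naive_h = -least_squares([[1.0, t] for t in temps], power)[1]  # temperature-only fit
print(f"with solar term: H {h_est:.1f} W/K, A {aperture_est:.2f} m2; "
      f"temperature only: {naive_h:.1f} W/K")
```

In this synthetic example irradiance tends to rise with temperature, so the temperature-only fit overestimates the loss coefficient; with real data the direction and size of the bias depend on the site and season.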
Beyond the incorporation of free heat gains, there are outstanding research challenges in accounting for situations where measured fuel consumption does not fully correspond to the delivered energy: for example, the impact of non-metered energy (e.g. solid-fuel room heaters), metered energy that does not manifest as increased internal temperature (e.g. heat lost through hot water drainage, or from a heating system located outside the heated space) (Li, Allinson and Lomas 2019), and heat transfer to adjacent properties.
With smart meter data soon to be available from potentially millions of dwellings, several large-scale applications of PRISM-type methods are being developed. The 'Deconstruct' method by Chambers and Oreszczyn (2018) allows the heating power loss coefficient to be estimated from smart meter and external temperature data. Daily averaged data have been shown to be the optimal temporal resolution for satisfying the steady-state assumption in this application (Chambers 2017), which fits well with the data frequency available from smart meters. However, several challenges remain. The assumptions are only satisfied in periods of low solar gain, and a stock-level assumption about the relation between internal and external temperature must be made, which cannot be validated without internal temperature data from the property in question.
A second method developed for large-scale use has recently been proposed by Gianniou et al. (2018). Their method uses hourly data collected from heat meters installed in each dwelling on a district heating network to characterise the HTC of occupied dwellings. Their steady-state approach (i.e. not explicitly accounting for thermal mass effects) combines a heat balance with linear regression, predicting the indoor volume-averaged temperature and the HTC of dwellings at building stock level. However, large indoor-to-external temperature differences and low solar gains are required for the simple steady-state model to be realistic, in addition to the assumptions that each dwelling is a single thermal zone and that indoor air is well mixed.
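The general idea of combining a heat balance with linear regression can be sketched in Python: regressing delivered heat on external temperature gives the HTC from the slope, and the intercept divided by the HTC gives a temperature. This is a minimal illustration, not Gianniou et al.'s exact formulation; it assumes a single zone, constant indoor temperature and synthetic data, and it shows that the recovered temperature equals the indoor temperature only when free gains are negligible, otherwise it is a balance temperature.

```python
def ols_fit(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("external temperature must vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def htc_and_indoor_temperature(heat_w, ext_temp_c):
    """Steady state Q = HTC * (T_in - T_ext): returns (HTC, intercept / HTC).

    With free gains G, Q = HTC * (T_in - T_ext) - G, and the second value is the
    balance temperature T_in - G / HTC rather than T_in itself.
    """
    intercept, slope = ols_fit(ext_temp_c, heat_w)
    htc = -slope
    if htc <= 0:
        raise ValueError("heat demand should fall as external temperature rises")
    return htc, intercept / htc


ext = [float(t) for t in range(-3, 9)]
no_gains = [150.0 * (21.0 - t) for t in ext]              # assumed HTC 150 W/K, 21 C inside
with_gains = [150.0 * (21.0 - t) - 300.0 for t in ext]    # same dwelling, 300 W of free gains
htc_a, tin_a = htc_and_indoor_temperature(no_gains, ext)
htc_b, tbal_b = htc_and_indoor_temperature(with_gains, ext)
print(f"no gains: HTC {htc_a:.1f} W/K, T {tin_a:.1f} C; with gains: HTC {htc_b:.1f} W/K, T {tbal_b:.1f} C")
```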

Dynamic methods
Overcoming the limitations described above may be possible through the use of dynamic methods, which aim to model heat transfer and storage in the building explicitly. Whilst several dynamic frameworks have been developed to characterise the thermophysical performance of unoccupied buildings (Bauwens and Roels 2014; Mangematin, Pandraud and Roux 2012; Palmer et al. 2011; Subbarao et al. 1988; Thébault and Bouchié 2018), little is currently available in the literature on dynamic methods for characterising occupied dwellings (Fonti et al. 2017; Harb et al. 2016; Hollick, Gori and Elwell 2020). As reported by Harb et al. (2016), robust models that separate out the unpredictable influence of occupants on the internal environment (e.g. free gains due to both internal and solar gains, or air-change rates due to occupants' use of windows) are still uncommon. However, rapid progress may be expected in the near future, as this subject is currently receiving intense interest, for example via the international Energy in Buildings and Communities (EBC) programme Annex 71: Building Energy Performance Assessment Based on In-Situ Measurements, operated by the International Energy Agency (IEA-EBC 2019).
Simplified yet robust frameworks requiring a small number of data inputs collected in occupied dwellings would be very valuable for characterising whole-building thermal performance in energy performance ratings. Harb et al. (2016) and Hollick et al. (2020) incorporate a limited amount of additional information beyond smart meter data, such as internal temperature and geographical location (from which, in turn, solar radiation and external temperature can be retrieved). Notably, Hollick et al. (2020) developed several lumped capacitance models of occupied dwellings, of varying complexity, that explicitly include gains from solar radiation. By means of an inverse grey-box framework, these models allow the HTC or the HPLC and the solar aperture to be estimated from short time series collected at any time of year (including summer).
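A minimal sketch of the inverse grey-box idea follows. It assumes a single-zone lumped capacitance (1R1C) model, C·dT_in/dt = Q + A·I - HTC·(T_in - T_out), where C is the effective heat capacity, A the solar aperture and I the solar irradiance. The model is discretised with forward Euler, synthetic data are generated from known parameters, and the parameters are then recovered by ordinary least squares. Published frameworks such as Hollick et al. (2020) use richer models and statistical estimation, so the function names, heating schedule and parameter values here are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve the small dense system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def simulate_1r1c(c, h, a, hours, dt_s=3600.0, t0=18.0):
    """Forward-Euler simulation of C dTi/dt = Q + A*I - H*(Ti - Te) with a made-up
    heating schedule, outdoor temperature and solar irradiance."""
    q, te, irr, ti = [], [], [], [t0]
    for k in range(hours):
        hour = k % 24
        q.append(3000.0 if 6 <= hour < 9 or 17 <= hour < 23 else 0.0)  # W
        te.append(5.0 + 3.0 * math.sin(2 * math.pi * k / 24)
                  + 2.0 * math.sin(2 * math.pi * k / 168))  # deg C
        irr.append(max(0.0, 500.0 * math.sin(math.pi * (hour - 6) / 12)))  # W/m2
        ti.append(ti[-1] + dt_s / c * (q[-1] + a * irr[-1] - h * (ti[-1] - te[-1])))
    return q, te, irr, ti


def fit_1r1c(q, te, irr, ti, dt_s=3600.0):
    """Inverse step: ordinary least squares on the discretised heat balance
    Q[k] = (C/dt)*(Ti[k+1] - Ti[k]) + H*(Ti[k] - Te[k]) - A*I[k]."""
    rows = [[ti[k + 1] - ti[k], ti[k] - te[k], -irr[k]] for k in range(len(q))]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * qk for r, qk in zip(rows, q)) for i in range(3)]
    c_over_dt, htc, aperture = solve_linear(xtx, xty)
    return {"C_J_per_K": c_over_dt * dt_s, "HTC_W_per_K": htc, "aperture_m2": aperture}


# Two weeks of hourly synthetic data from known parameters, then recover them.
data = simulate_1r1c(c=2.0e7, h=150.0, a=2.0, hours=14 * 24)
est = fit_1r1c(*data)
print(f"C = {est['C_J_per_K'] / 1e6:.1f} MJ/K, HTC = {est['HTC_W_per_K']:.1f} W/K, "
      f"aperture = {est['aperture_m2']:.2f} m2")
```

The data here are noise-free, so the parameters are recovered almost exactly. With measured temperatures, least squares on the differenced equation would typically be biased, which is one reason grey-box tools estimate such models in a state-space form instead.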
Dynamic methods capture the delay between heat input and internal temperature increase through their characterisation of thermal mass. However, other mechanisms that delay or decouple power input from internal temperature increase present a challenge for dynamic methods. Two examples are the aforementioned heat loss through hot water drainage and heat storage in hot water tanks. Further research is required to determine the best set of additional variables to capture heat storage within, and heat losses out of, the dwelling.
Increasing levels of sophistication in the above methods tend to require additional data inputs, in particular internal temperature. Accessing such data for the purposes of energy ratings will be challenging in the near term. First, the sensors needed to record the data must be installed in homes. Unlike smart meters, such internet-enabled sensors will be installed gradually and not universally. For example, a survey of a representative sample of English households revealed that only 16.8% of households had replaced their central heating thermostat in the past five years (National Statistics 2019). Second, the data need to be remotely accessible by the party producing the rating, e.g. via the internet. Third, the additional perceived privacy implications of adding temperature monitoring to smart metering have not specifically been tested. Temperature data are, of course, collected by smart thermostats, but their primary users are generally found to be unrepresentative of the general population (Smith 2016; Yang and Newman 2013).
Assuming such barriers can be overcome, homes with these sensors may be able to gain more reliable estimates of the HPLC or HTC. Furthermore, if room temperature is available in multiple zones, some of the methods described above may be able to incorporate this information, too (Sakuma and Nishi 2019), overcoming the single-zone assumption and allowing for partial heating of properties.

Contextualising a building's performance
The methods described above produce an estimate of HTC (describing the building) or HPLC (describing the building and heating system). This number on its own is not useful for rating purposes; it must be interpreted on a scale from most to least efficient, for example, the A-G scale currently used throughout the EU. Two methods in the literature are discussed below: benchmarking to other buildings, and comparing with theory.
An operational rating for dwellings proposed by Lomas et al. (2019) incorporates a benchmarking system in which a dwelling's weather-corrected, floor area-normalised energy use is compared with the national average. This method is relatively transparent and straightforward to implement, as the national average can be looked up from published tables. The authors point out that it also requires careful consideration to balance updating the benchmarking system as energy demand, fuel mix and other factors change against preventing the system from becoming overly complex or unstable.
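The benchmarking step can be sketched as a degree-day correction followed by floor-area normalisation and a ratio to the benchmark. The scaling rule, the split into heating and non-heating energy, the function names and the benchmark value of 150 kWh/m² are all assumptions for illustration, not necessarily the procedure of Lomas et al. (2019).

```python
def weather_corrected_intensity(heating_kwh, other_kwh, actual_hdd, reference_hdd, floor_area_m2):
    """Scale the heating part of annual energy use to a reference climate by the ratio of
    heating degree days, leave non-heating use unscaled, then divide by floor area."""
    corrected_kwh = heating_kwh * reference_hdd / actual_hdd + other_kwh
    return corrected_kwh / floor_area_m2


def benchmark_ratio(intensity_kwh_per_m2, national_average_kwh_per_m2):
    """A value above 1 means the household-dwelling combination uses more than the benchmark."""
    return intensity_kwh_per_m2 / national_average_kwh_per_m2


intensity = weather_corrected_intensity(12000, 3000, actual_hdd=2000, reference_hdd=2200,
                                        floor_area_m2=90)
ratio = benchmark_ratio(intensity, national_average_kwh_per_m2=150)  # hypothetical benchmark
print(f"{intensity:.0f} kWh/m2, {ratio:.2f} x benchmark")  # 180 kWh/m2, 1.20 x benchmark
```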
Benchmarking by comparing energy use with that of other dwellings (after weather correction) can indicate whether a given household-dwelling combination uses a typical amount of energy, or relatively less or more than similar properties. It does not separate the influence of the occupants from that of the building or heating system. Because it is data driven, it also does not incorporate any comparison with an expected performance defined by physical theory.
An alternative interpretation method is comparison with a theoretical expectation of performance, as in the Swedish method of comparing real energy data with those of a modelled archetype of similar location and use (SMHI 2018). This could allow a degree of physical interpretation of the result by comparing it with what it should be in the absence of a performance gap. Conversely, it could make the rating system more complex by requiring a set of models, with their associated assumptions and errors, against which the real energy performance is compared. It is also unclear how any performance gap could be attributed to building factors rather than occupant factors.
This leads to the question of which interpretation system is most suitable for an HPLC- or HTC-type metric. These metrics are already weather corrected and, in theory, independent of occupant behaviour. A reasonable further correction is normalisation by floor area, as in Lomas et al. (2019), to obtain the final version of the metric. However, once this is carried out, it is unclear whether a data-driven comparison or a physically based, modelled comparison is the most useful way to place the HTC or the HPLC on a meaningful rating scale. Rather than choosing between these options, it may be possible to combine the two by retaining a set of very well-characterised real dwellings for comparison purposes; however, the authors know of no such approaches in the literature.
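As an illustration of the data-driven option, the sketch below normalises an HTC by floor area and places it within a reference distribution to assign an A-G band, with lower heat loss per square metre giving a better band. The percentile cut-offs, the reference stock and the function names are invented for this example. A physically based alternative would replace the reference list with values from modelled archetypes.

```python
from bisect import bisect_left, bisect_right

BANDS = "ABCDEFG"
# Hypothetical cumulative-percentile upper limits for bands A-F; anything above is G.
PERCENTILE_LIMITS = [0.05, 0.15, 0.35, 0.60, 0.80, 0.92]


def percentile_rank(value, reference):
    """Fraction of reference dwellings whose normalised metric is <= value."""
    ordered = sorted(reference)
    return bisect_right(ordered, value) / len(ordered)


def rating_band(htc_w_per_k, floor_area_m2, reference_w_per_m2k):
    """Normalise the HTC by floor area, then place it within the reference distribution."""
    normalised = htc_w_per_k / floor_area_m2
    p = percentile_rank(normalised, reference_w_per_m2k)
    return BANDS[bisect_left(PERCENTILE_LIMITS, p)]


reference = [i / 10 for i in range(5, 25)]  # made-up stock of 20 dwellings, 0.5-2.4 W/(m2 K)
print(rating_band(150.0, 100.0, reference))  # D
```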

Providing recommendations
A building rating created from smart meter data and set in the context of other buildings or theoretical expectations, as above, could be useful for purchasing decisions, but it does not contain detail on specific building elements or systems that the householder may wish to modify (European Parliament 2010). Thus, there is likely to be a role for retaining either existing survey-driven rating systems or at least certain pieces of surveyed input data. Indeed, when discussing the possibility of introducing an operational rating in the UK, Lomas et al. (2019) posited that it should sit harmoniously alongside the existing well-established rating system. Given the known presence of survey errors (Delghust et al. 2015; Jenkins, Simpson and Peacock 2017), to what extent is it possible to combine survey inputs with an HPLC- or HTC-type thermal performance input to create useful recommendations?
Research is beginning to bridge the gap between survey data and real performance. Gonzalez-Caceres and Vik (2019) describe a method to incorporate infrared photographs into EPCs to improve the estimation of wall conductivity. They argue that this does not increase the cost of the certificate; however, this assertion was not tested and is open to challenge. Alternatively, Mathew et al. (2015) describe how predicted energy savings can be made more realistic by using a data-driven approach to estimate the effects of energy efficiency measures. This could readily be applied to a rating system based on HPLC or HTC metrics, but it does not solve the problem that if, for example, the wall type is incorrectly recorded in the survey data for a given dwelling, then even a realistic prediction of the typical energy savings from wall insulation will be inapplicable to that dwelling.
There is considerable scope for research on how to combine survey data and metrics such as the HPLC or HTC cost-effectively to provide recommendations that are most likely to be accurate, that is, that best represent both the starting condition of the building and the predicted energy saving.

Discussion: remaining challenges
Three main challenges arise from the above review.

The challenge of demonstrating increased reliability
A major driver for creating building ratings using smart meter data is the premise that they can be more reliable than conventional ratings. Here this assumption is explored.
The simplest method discussed in Section 4 (Chambers and Oreszczyn 2018) has been shown to achieve a mean uncertainty of 15% on the HPLC estimate across the subset of dwellings for which the method worked (70% of dwellings in the authors' data set). This uncertainty is not an obvious improvement on the predicted 3-18% error of the calculated method (Crawley et al. 2019); however, the two methods rate very different constructs, so the comparison should be treated with caution. Incorporating more data streams may reduce uncertainty. Hollick et al. (2020) achieved a yearly average uncertainty of around 6% using extra data inputs consisting of several internal temperatures, solar radiation from a nearby weather station and dwelling orientation. Note that the literature is not yet clear on exactly how uncertainty on a rating is defined, for example, whether it consists of the fitting error of the algorithm or a wider definition that includes the inherent variability of the rating itself over the year (Hollick et al. 2020).
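The two candidate definitions of rating uncertainty can be contrasted with a short sketch. The first is the fitting error, taken as the standard error of the fitted slope relative to the slope within each month. The second is the variability of the fitted slope itself across months. The synthetic data, the noise level, the free-gain offset and the assumed seasonal drift in the metric are all made up for illustration.

```python
import math
import random
import statistics


def ols_slope_with_se(x, y):
    """Simple linear regression of y on x; returns (slope, standard error of the slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(ssr / (n - 2) / sxx)


rng = random.Random(42)  # fixed seed so the synthetic data are reproducible
fits = []
for month in range(6):  # six heating-season months of synthetic daily means
    true_hplc = 150.0 + 10.0 * math.sin(month)  # assumed seasonal drift, W/K
    delta_t = [rng.uniform(8.0, 20.0) for _ in range(30)]  # indoor-outdoor difference, K
    power = [true_hplc * d - 300.0 + rng.gauss(0.0, 150.0) for d in delta_t]  # W
    fits.append(ols_slope_with_se(delta_t, power))

slopes = [s for s, _ in fits]
# Definition 1: mean relative fitting error of each monthly fit
fitting_error = statistics.mean(se / s for s, se in fits)
# Definition 2: variability of the fitted metric itself across the season
seasonal_spread = statistics.stdev(slopes) / statistics.mean(slopes)
print(f"fitting error {fitting_error:.1%}, month-to-month spread {seasonal_spread:.1%}")
```

The two numbers answer different questions. A rating could have a small fitting error in every month and still vary noticeably over the year, so a reported uncertainty should state which definition it uses.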
One potential advantage of empirical ratings, such as those that incorporate smart meter data, is the opportunity to quantify the uncertainty on the rating on a per dwelling basis. This is not currently required by the EU, which states that the prospective buyer or tenant should receive correct information on the energy performance of the building (European Parliament 2010), which is surely an impossible endeavour. Ratings presented with their associated uncertainty could be valuable for applications that involve an assessment of risk, for example, green mortgages. This is important because the EU encourages member states to link the information on EPCs to investment opportunities (European Parliament 2010). Whether this uncertainty should be presented to the householder, however, is open to debate.
Finally, the point must be highlighted that a more accurate EPC is, of course, only part of the solution to the wider challenge of improving the impact of EPCs on purchase decision-making and renovation. There are many reasons for householders not making use of EPCs in these areas, including apathy (Murphy 2014), perception of the EPC not being useful (Christensen et al. 2014) and multiple other priorities in property selection (Pascuas, Paoletti and Lollini 2017).

The challenge of low cost
Prices for conventional EPCs vary across Europe from a few tens to a few hundreds of euros (Buildings Performance Institute Europe 2014). In theory, data-driven ratings could cost less to produce than an EPC created according to the calculated method if all the data already exist. However, the above review has highlighted that extra measurements beyond smart meter data are required to gain accuracy above that of the simplest methods. The total cost once these further measurements are incorporated will affect the viability of any attempt to create a data-driven rating system. The incorporation of extra data streams also adds to the methodological complexity. Ideally, each data stream (such as internal temperature, survey inputs and weather data) should be quality assured, as should the output of the rating algorithm. One interesting finding from Chambers and Oreszczyn (2018) was that the empirical method used failed for a proportion of dwellings. Although this could be regarded as a disadvantage for the applicability of empirical ratings, it could also be interpreted as a useful quality assurance check: if the method fails for a particular dwelling, this indicates that the dwelling does not behave according to the assumed physics. A calculated rating system offers no way to flag this on a per dwelling basis.
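One way such a quality assurance check might look is sketched below. A linear fit of heating power against the indoor-outdoor temperature difference is rejected when the slope is non-positive or the coefficient of determination (R²) is low, and the dwelling is flagged instead of rated. The function name, the 0.5 threshold and the criteria themselves are hypothetical, not the checks used by Chambers and Oreszczyn (2018).

```python
def fit_with_quality_check(delta_t, power_w, min_r2=0.5):
    """Fit power = slope * delta_t + intercept and return the slope (W/K), or None when
    the data do not follow the assumed linear heat-loss physics."""
    n = len(delta_t)
    mx, my = sum(delta_t) / n, sum(power_w) / n
    sxx = sum((x - mx) ** 2 for x in delta_t)
    syy = sum((y - my) ** 2 for y in power_w)
    sxy = sum((x - mx) * (y - my) for x, y in zip(delta_t, power_w))
    if sxx == 0 or syy == 0:
        return None  # no variation to fit
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy)
    if slope <= 0 or r2 < min_r2:
        return None  # flag the dwelling instead of issuing a rating
    return slope


print(fit_with_quality_check([10, 12, 14, 16, 18], [1500, 1810, 2090, 2400, 2700]))  # 149.5
print(fit_with_quality_check([10, 12, 14, 16, 18], [2000, 1700, 2100, 1800, 2050]))  # None
```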

The challenge of normative occupancy
The building and heating system-focused HTC and HPLC metrics are intended to provide the same rating with different sets of occupants. However, no studies have thoroughly tested this assumption. Here we examine the viability of truly decoupling occupant behaviour from the physical system. The example of window opening is used, since this alters the building fabric and, therefore, removes the distinction between occupant effects and the physical system we have been attempting to isolate in this paper.
First, a definitional question is raised: Should the performance rating refer to a building with no occupants (windows closed) or 'typical' occupants (windows open according to what is most common)? The effect of window opening on the HTC and HPLC depends on which of these conventions is adopted; if the former, it will lead to a systematic upward error; if the latter, the nature of the error is less clear. Second, window opening may be more common at certain times of the year (Fabi et al. 2012;Sharpe et al. 2015), causing the HPLC and HTC to vary in time even when the same occupants are resident.
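The direction of the error under the "windows closed" convention follows from the standard ventilation heat loss term, H_vent = ρ·c_p·n·V / 3600. Here ρ·c_p is the volumetric heat capacity of air, n the air change rate (h⁻¹) and V the heated volume (m³). Any extra air changes from window opening add directly to the apparent HTC and, via the heating system, to the HPLC. The dwelling volume, the background HTC and the extra 0.5 air changes per hour below are hypothetical values.

```python
AIR_DENSITY_KG_M3 = 1.2
AIR_CP_J_PER_KG_K = 1005.0


def ventilation_htc(air_changes_per_hour, volume_m3):
    """Ventilation heat transfer coefficient in W/K: rho * cp * n * V / 3600."""
    return AIR_DENSITY_KG_M3 * AIR_CP_J_PER_KG_K * air_changes_per_hour * volume_m3 / 3600.0


closed_htc = 150.0  # W/K, hypothetical "windows closed" HTC
extra = ventilation_htc(0.5, 250.0)  # hypothetical extra 0.5 ach from window opening, 250 m3
print(f"apparent HTC rises from {closed_htc:.0f} to {closed_htc + extra:.0f} W/K")
```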
Further work may make it possible to use smart meter data to characterise the effect of the specific occupants, allowing it to be subtracted from the rating metric and perhaps replaced with a normative assumption. However, this is challenging. Identifying periods of window opening, for example, in order to understand their energy effect is becoming technically possible using high-frequency monitored indoor environmental variables (Pereira and Ramos 2018), but the acceptability of using this information is clearly a major consideration. An alternative and less intrusive characterisation of window opening is to assume that it causes all observed seasonal variation in the HPLC or HTC. However, this is not a robust assumption, as shading and solar gain also affect these metrics differently throughout the year (Hollick et al. 2020).
Finally, the point made in Section 4 is reiterated that for the purposes of providing recommendations to the occupants, removing their characteristics from the metric may decrease the relevance of the advice given to them. Another opportunity arising from smart meter data, not in the scope of this paper but equally important, is in the characterisation of occupant behaviour, combined with or potentially even isolated from building performance.

Conclusions
EPCs are intended to be a reliable source of information about building thermal performance to householders and, as a secondary purpose, to policy-makers. Recent research has begun to quantify the uncertainty on current building thermal performance rating methods; the resulting issues with reliability are shown to be a contributing factor to lack of trust in the rating system. The focus of this paper is on whether smart meter data can be used to create more accurate building energy ratings. Smart meter data have the property of high resolution, giving the opportunity to characterise a building based on its rate of change of energy use with respect to time or external temperature. This is desirable since, in theory, it is possible to remove the effect of the occupants and produce metrics that specifically and accurately represent the thermal performance of the building and/or its heating system. The paper reviewed different methods of using smart meter data to estimate these metrics with increasing levels of complexity. The main technical challenges emerging from the review include how to remove occupant influence in practice, and how to balance the incorporation of extra data streams to improve accuracy with the need to keep the cost low and the complexity tractable. Key challenges that arise are how to turn metrics into meaningful ratings through different methods of contextualisation, and the possible continued role of survey data to help inform householder recommendations, both of which yield significant research gaps.
Having drawn out and clarified the research gaps, further work by the research community is needed to develop solutions to the technical problems and to test the practical effectiveness of thermal performance ratings created using smart meter data.

Note
1 The term 'domestic buildings' refers to a variety of building types (e.g. houses, flats, hotels, prisons and boarding schools). Non-domestic buildings include commercial, public sector, industrial and agricultural buildings. However, for the purpose of this paper, the scope of domestic buildings under consideration comprises houses and flats.