To give visibility of the status of Ofgem, BEIS and Innovate UK work relating to their expectations on the use of data
Who is responsible for this page/section
Richard Dobson for this page and any child pages, unless stated otherwise
Page/section development Plan
This plan also applies to its child pages.
- Publish the latest version of data best practice guidance
- Showcase best in class examples of the principles of the guidance being put into practice
- Create a public archive detailing how companies have designed governance frameworks to implement the guidance
- Create a public archive of companies' experiences assessing data within their governance framework
The Energy Data Taskforce, led by Laura Sandys and Energy Systems Catapult, was tasked with investigating how the use of data could be transformed across our energy system. In June 2019, the Energy Data Taskforce set out five key recommendations for modernising the UK energy system via an integrated data and digital strategy. The report highlighted that the move to a modern, digitalised energy system was being hindered by often poor quality, inaccurate or missing data, while valuable data is often hard to find. As a follow up to the Taskforce’s findings, the Department for Business, Energy and Industrial Strategy (BEIS), Ofgem and Innovate UK have commissioned the Energy Systems Catapult to develop Data Best Practice Guidance to help organisations understand how they can manage and work with data in a way that delivers the vision outlined by the Energy Data Taskforce.
This guidance describes a number of key actions that, taken together, are deemed to be 'data best practice'. Each description is accompanied by more detailed guidance that describes how the desired outcome can be achieved. In some areas the guidance is very specific, presenting a solution which can be implemented easily. In other areas the guidance is less prescriptive; this may be because there are many possible 'best practice' solutions (e.g. understanding user needs) or because there is a disadvantage to providing prescriptive guidance (e.g. cyber security). Where this is the case, the guidance provides organisations with useful information that can be used to inform the implementation of a solution.
1. Identify the roles of stakeholders of the data
2. Use common terms within Data, Metadata and supporting information
3. Describe data accurately using industry standard metadata
4. Enable potential users to understand the data by providing supporting information
5. Make datasets discoverable for potential users
6. Learn and understand the needs of their current and prospective data users
7. Ensure data quality improvement is prioritised by user needs
8. Ensure that data is interoperable with other data and digital services
9. Protect data and systems in accordance with Security, Privacy and Resilience best practice
10. Store, archive and provide access to data in ways that maximise sustaining value
11. Ensure that data relating to common assets is Presumed Open
12. Conduct Open Data Triage for Presumed Open data
The guidance touches on many issues which have existing regulation or authoritative guidance, such as personal data protection and security. In these areas this guidance should be seen as complementary rather than competitive. The guidance includes references to many existing resources and includes key extracts of the content within the guidance where licensing allows.
This guidance has been designed to help organisations implement the vision of a Modern, Digitalised Energy System which is described in the Energy Data Taskforce report including for those looking to implement 'Presumed Open'. However, the guidance has been designed as far as possible to be sector agnostic so it can have wider value to organisations beyond the energy sector and as such we have included examples from different sectors to illustrate the points made in the guidance. We expect there to be particularly strong read across for other organisations managing infrastructure and other regulated sectors.
The Data Best Practice Guidance is a living resource and will be regularly updated to reflect the changing technology and regulatory landscape. If you have a suggestion or comment, then we would like to hear from you so please comment on this site or contact firstname.lastname@example.org
1. Identify the roles of stakeholders of the data
There are two fundamental roles which will be referred to throughout this guidance, the Data Custodian and the Data User.
Data Custodian: An organisation or individual that holds data which it has a legal right to process and publish
The data custodian is responsible for managing data and therefore for implementing data best practice guidance. It is strongly advised that organisational data custodians appoint a specific senior data leader who is responsible for data strategy, management and implementation of data best practice within the organisation.
Data User: An organisation or individual which utilises data held by a data custodian for any reason
The data user (or potential data user) seeks to achieve an outcome by utilising data which is made available by a data custodian.
In addition to the roles defined above, there are a number of well understood definitions relating to personal data provided by the Information Commissioner's Office (ICO).
Data Subject: The identified or identifiable living individual to whom personal data relates.
[Data] Controller: A person, public authority, agency or other body which, alone or jointly with others, determines the purposes and means of the processing of personal data.
Data Processor: A person, public authority, agency or other body which processes personal data on behalf of the controller.
"To determine whether you are a controller or processor, you will need to consider your role and responsibilities in relation to your data processing activities. If you exercise overall control of the purpose and means of the processing of personal data – i.e., you decide what data to process and why – you are a controller. If you don’t have any purpose of your own for processing the data and you only act on a client’s instructions, you are likely to be a processor – even if you make some technical decisions about how you process the data." - ICO
2. Use common terms within Data, Metadata and supporting information
It is critically important for data users to be able to search for and utilise similar datasets across organisations. A key enabler of this is finding a common way to describe the subject of data including in formal metadata; this requires a common glossary of terms.
There has been a proliferation of glossaries within the energy sector, with each new document or data store providing a definitive set of definitions for the avoidance of doubt. The list below is provided as an example.
Equally, the same has been occurring across other sectors and domains; the following sources are all data related.
There are currently efforts to standardise the naming conventions used across a range of infrastructure domains by the Digital Framework Task Group as part of the National Digital Twin programme of work. The long term goal is to define an ontology which enables different sectors to use a common language which in turn enables effective cross sector data sharing.
In the near term, it is unhelpful to create yet another glossary so we propose a two staged approach.
Organisations label data with keywords and the authoritative source of their definition e.g. Term [Glossary Reference]
An industry wide Data Catalogue should be implemented with an authoritative glossary based on existing sources which can be expanded or adapted with user feedback and challenge
Implementing a standard referencing protocol enables organisations to understand terms when they are used and slowly converge on a unified subset of terms where there is common ground. Implementing an industry wide data catalogue with an agreed glossary and mechanism for feedback enables the convergence to be accelerated and discoverability of data to be revolutionised.
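The `Term [Glossary Reference]` labelling convention described above lends itself to simple machine processing. The following is a minimal sketch, assuming labels are written as a comma-separated string with the glossary reference in square brackets; the function name `parse_labels` and the exact string layout are illustrative assumptions rather than part of the guidance.

```python
import re

# Matches one "term [glossary reference]" label, for example
# "energy cost [http://www.electropedia.org/]".
LABEL_PATTERN = re.compile(r"\s*([^\[\],]+?)\s*\[([^\[\]]+)\]\s*")


def parse_labels(subject: str) -> list[tuple[str, str]]:
    """Split a comma-separated subject string into (term, glossary) pairs."""
    labels = []
    position = 0
    while position < len(subject):
        match = LABEL_PATTERN.match(subject, position)
        if match is None:
            # A term without a glossary reference cannot be resolved by users.
            raise ValueError(
                f"Label without glossary reference near: {subject[position:]!r}"
            )
        labels.append((match.group(1), match.group(2).strip()))
        position = match.end()
        # Skip the comma that separates one label from the next.
        if position < len(subject) and subject[position] == ",":
            position += 1
    return labels
```

Rejecting terms that have no glossary reference is a design choice in this sketch: it makes unlabelled keywords visible early, which is one way an organisation could check its own data before contributing to a shared catalogue.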
3. Describe data accurately using industry standard metadata
To realise the maximum value creation from data within an organisation, across an industry or across the economy actors need to be able to understand basic information that describes each dataset. To make this information accessible, the descriptive information should be structured in an accepted format and it should be possible to make that descriptive information available independently from the underlying dataset.
Metadata is a dataset that describes and gives information about another dataset.
The Energy Data Taskforce recommended that the Dublin Core 'Core Elements' metadata standard (Dublin Core), ISO 15836-1:2017, should be adopted for metadata across the energy sector. Dublin Core is a well-established standard for describing datasets and has many active users across a number of domains, including energy sector users such as the UKERC Energy Data Centre. It has a small number of key fields which provide a minimum level of description that can be built upon and expanded as required.
There are 15 'core elements' as part of the Dublin Core standard which are described as follows:
Title: Name given to the resource
Creator: Entity primarily responsible for making the resource
Subject: Topic of the resource (e.g. keywords from an agreed vocabulary)
Description: Account of the resource
Publisher: Entity responsible for making the resource available
Contributor: Entity responsible for making contributions to the resource
Date: Point or period of time associated with an event in the lifecycle of the resource
Type: Nature or genre of the resource
Format: File format, physical medium, or dimensions of the resource
Identifier: Compact sequence of characters that establishes the identity of a resource, institution or person alone or in combination with other elements, e.g. Uniform Resource Identifier (URI) or Digital Object Identifier (DOI)
Source: Related resource from which the described resource is derived (e.g. source URI or DOI)
Language: Language of the resource
Relation: Related resource (e.g. related item URI or DOI)
Coverage: Spatial or temporal topic of the resource, spatial applicability of the resource, or jurisdiction under which the resource is relevant
Rights: Information about rights held in and over the resource
Many of the fields are straightforward to populate, either manually or through automated processes that ensure metadata is up to date and accurate. However, other fields are more open to interpretation. In the table below we have listed best practice tips to help provide consistency across organisations.
Title: This should be a short but descriptive name for the resource.
Creator: Identify the creator(s) of the resource, individuals or organisations.
Subject: Identify the key themes of the resource.
Description: Provide a description of the resource which can be read and understood by the range of potential users.
Publisher: Identify the organisation or individual responsible for publishing the data. This is usually the same as the metadata author.
Contributor: Identify the contributor(s) of the resource, individuals or organisations.
Date: Date is used in a number of different ways (start of development/collection, end of development/collection, creation of resource, publication of resource, date range of data etc.), so the usage of a date field should be explained in the resource description. The nature and potential use cases of data will dictate the most valuable use of this field. However, where data collection is concerned, providing the time interval during which the data has been collected is likely to be more informative than the publication date.
Type: Identify the type of the resource from the DCMI type vocabulary.
Format: Identify the format of the resource; in the case of data this is the file or encoding format, e.g. CSV, JSON, API, etc.
Identifier: Provide a unique identifier for the resource.
Source: Identify the source material(s) of the derived resource.
Language: Identify the language of the resource.
Relation: Identify other resources related to the resource.
Coverage: Identify the spatial or temporal remit of the resource.
Rights: Specify under which licence conditions the resource is controlled. It should be clear if the resource is open (available to all with no restrictions), public (available to all with some conditions e.g. no commercial use), shared (available to a specific group possibly with conditions e.g. commercial data product) or closed (not available outside of the data custodian organisation).
The Dublin Core metadata should be stored in an independent file from the original data and in a machine readable format, such as JSON, YAML or XML that can easily be presented in a human readable format using free text editors, for example Notepad++. This approach ensures that metadata can be shared independently from the dataset, that it is commonly accessible and not restricted by software compatibility. The DCMI have provided schemas for representing Dublin Core in XML and RDF which may be of help.
Where a dataset is updated or extended the custodian needs to ensure that the metadata reflects this such that potential users can easily identify the additions or changes. Where the data represents a new version of the dataset (i.e. a batch update or modification of existing data) then it is sensible to produce a new version of the dataset with a new metadata file. Where the dataset has been incrementally added to (e.g. time series data) it may be best to update the metadata (e.g. data range) and retain the same dataset source.
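As an illustration of storing Dublin Core metadata in an independent, machine-readable file, the following is a minimal sketch in Python. The function names, the lower-case element keys and the `.metadata.json` file naming are assumptions made for this example; they are not prescribed by the guidance or by the Dublin Core standard.

```python
import json
from pathlib import Path

# The 15 Dublin Core 'core elements', written here as lower-case keys.
DUBLIN_CORE_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def missing_elements(metadata: dict) -> list[str]:
    """Return the core elements that are absent or empty in a metadata record."""
    return [name for name in DUBLIN_CORE_ELEMENTS if not metadata.get(name)]


def write_metadata(metadata: dict, dataset_path: str) -> Path:
    """Write metadata to a JSON file that sits beside, but separate from, the dataset."""
    unknown = sorted(set(metadata) - set(DUBLIN_CORE_ELEMENTS))
    if unknown:
        raise ValueError(f"Not Dublin Core core elements: {unknown}")
    # Hypothetical naming convention: data.csv -> data.metadata.json
    target = Path(dataset_path).with_suffix(".metadata.json")
    target.write_text(
        json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return target
```

A custodian could run `missing_elements` as part of a publication check to see which fields still need to be populated, and call `write_metadata` again whenever the dataset is updated so that the metadata file stays in step with the data.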
The Department for Business, Energy and Industrial Strategy (BEIS) has published a dataset relating to the installed cost per kW of Solar PV for installations which have been verified by the Microgeneration Certification Scheme. Note that this data does not have a URI or DOI, so a platform ID has been used as the identifier; this is not ideal as it may not be globally unique.
{
  "title": "Solar PV cost data",
  "creator": "Department for Business, Energy and Industrial Strategy",
  "subject": "solar power station [http://www.electropedia.org/], energy cost [http://www.electropedia.org/]",
  "description": "Experimental statistics. Dataset contains information on the cost per kW of solar PV installed by month by financial year. Data is extracted from the Microgeneration Certification Scheme - MCS Installation Database.",
  "publisher": "Department for Business, Energy and Industrial Strategy",
  "contributor": "Microgeneration Certification Scheme",
  "coverage": "Great Britain (GB)",
  "rights": "Open Government Licence v3.0"
}
4. Enable potential users to understand the data by providing supporting information
When data is published openly, made publicly available or shared with a specific group it is critical that the data has any supporting information that is required to make the data useful for potential users. There is a need to differentiate between Core Supporting Information, without which the data could not be understood by anyone, and Additional Supporting Information that makes understanding the data easier. As a rule of thumb, if the original custodian of the dataset were to stop working with it and then come back 10 years later with the same level of domain expertise, but without the advantage of having worked with the data on a regular basis, the Core Supporting Information is that which they need to make the dataset intelligible.
Data Custodians should make Core Supporting Information available with the dataset.
Data custodians should consider the following topics for areas where Core Supporting Information may be required:
Data collection methodology
If data is aggregated or processed the methodology should be included e.g. for half hourly time-series data, does the data point represent the mean, start, midpoint or end value?
Data structure description (e.g. data schema)
Granularity (spatial, temporal, etc.) of the data
Units of measurement
Version number of any reference data that has been used (e.g. the Postcode lookup reference data)
References to raw source data (within metadata)
Protocols that have been used to process the data
It may be possible to minimise the required Core Supporting Information if the dataset uses standard methodologies and structures (e.g. ISO 8601 timestamps with UTC offset are strongly recommended) but it should not be assumed that an externally hosted reference dataset or document will be enduring unless it has been archived by an authoritative, sustainable body (e.g. ISO, BSI, UK Data Archive, UKERC Energy Data Centre, etc.). If there is doubt that a key reference data source or document will be available in perpetuity this should be archived by the publisher and, where possible, made available as supporting information.
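To illustrate the recommendation to use ISO 8601 timestamps with a UTC offset, the following is a minimal sketch using the Python standard library. The function names and the choice to label each half-hourly period by its start time are assumptions made for this example; whichever convention is used should be stated in the Core Supporting Information.

```python
from datetime import datetime, timedelta, timezone


def to_iso8601(moment: datetime) -> str:
    """Format a timezone-aware datetime as an ISO 8601 string with UTC offset."""
    if moment.tzinfo is None or moment.utcoffset() is None:
        # Without an offset the timestamp is ambiguous for external users.
        raise ValueError("Naive datetime: the UTC offset is unknown")
    return moment.isoformat(timespec="seconds")


def half_hour_starts(first: datetime, count: int) -> list[str]:
    """Label consecutive half-hourly periods by their start time."""
    return [to_iso8601(first + timedelta(minutes=30 * i)) for i in range(count)]
```

Refusing timestamps that lack an offset is deliberate in this sketch: a value such as `2019-06-13T23:30:00` cannot be interpreted reliably by a user who does not know which local time convention the custodian applied.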
Whilst Data Custodians will attempt to provide all of the core supporting information required it is likely that there will be cases where essential core supporting information is missed or where there is a need for supporting information to be clarified. The data custodian should provide a Data Contact Point for potential data users to raise queries or request additional core supporting information where required.
A Data Contact Point should be made available to respond to data queries and requests relating to datasets and their core supporting information
Depending on the goals of the organisation sharing data, it may be prudent to also include additional supporting information. Reasons to make additional supporting information available could include:
Maximising user engagement with the dataset
Addressing a particular user need
Reducing the number of subsequent queries about the dataset
Highlighting a particular issue or challenge which the data publisher would like to drive innovators towards
Core Supporting Information
The UK Energy Research Centre (UKERC) are the hosts of the Energy Data Centre, which was set up "to create a hub for information on all publicly funded energy research happening in the UK". The centre hosts and catalogues a large amount of data which is collected by or made available to aid energy researchers; however, much of the data is made available using an open licence (e.g. Creative Commons Attribution 4.0 International License). This licence enables data use for a wide range of purposes, including research. The centre aims to archive data for future researchers and, as such, the administrators have embedded a range of data best practice principles including the provision of core supporting information that is required to understand the stored dataset. The custodians of the Energy Data Centre provided the '10 year' rule of thumb, described above.
The Energy Data Centre hosts the Local Authority Engagement in UK Energy Systems data and associated reports. The data is accompanied by rich metadata and a suite of core supporting information that enables users to understand the data. In this case, the core information includes:
Core Supporting Information
Description of the source datasets
Description of the fields and their units
ReadMe.txt files with a high level introduction to the associated project
Additional Supporting Information
A list of academic reports about the data and associated findings
The list of academic reports contains some core supporting information (e.g. the detailed collection methodology), but much of the content is additional supporting information.
Additional Supporting Information
The Data Custodian may have an interest in maximising engagement with datasets to solve a particular problem or drive innovation. In these cases, the custodian may choose to provide additional supporting information to help their potential users. Good examples of this are data science competitions, which are commonly used in industry to solve particular problems by drawing on a large number of experts.
A recent Kaggle competition asked participants to utilise sensor data to identify faults in power lines. To maximise engagement, the hosts provided high level overviews of the problem domain, more detailed explanations of datasets and offered additional advice as required through question and answer sessions that were widely published. This information was not strictly essential for an expert to understand the data, but the underlying goal of the project was to attract new talent to the area. If the potential participants could not easily understand the data they would likely move on to another lucrative project.
Core Supporting Information
Descriptions of datasets and fields (metadata)
Core and Additional Supporting Information
Description of the problem in non-technical language
Description of common problems and how to identify them
Question and Answer feeds
5. Make datasets discoverable for potential users
The value of data can only be realised when it is possible for potential users to identify what datasets exist and understand how they could utilise them effectively. Data custodians should implement a strategy which makes their data inclusively discoverable by a wide range of stakeholders within and outside of their organisation. There may be instances where it is not advisable to make it known that a dataset exists but this is expected to be exceptionally rare e.g. in cases of national security.
Discoverable: The ability of a dataset to be easily found by potential users
Please note, there is a difference between a dataset being discoverable and being accessible. Section 12 discusses the Open Data Triage process which should be used to identify the appropriate level of openness. In some cases a dataset may be too sensitive to be made available, but the description of the dataset (metadata) can almost always be made available without any sensitivity issues. This visibility can provide significant value by increasing awareness of what data exists. For example, an advertising agency's dataset that describes personal details about individuals cannot be made openly available without the explicit consent of each subject in the data, but many of the advanced advertising products that provide customers with more relevant products and services would be significantly less effective if the advertising platform could not explain to potential advertisers that they can use particular data features to target their advertisements.
Metadata can be used to describe the contents and properties of a dataset. It is possible to make metadata open (i.e. published with no access or usage restrictions) in all but the rarest of cases without creating security, privacy, commercial or consumer impact issues due to it not actually involving the underlying data.
Metadata can be published by individual organisation initiatives or by collaborative industry services. Individual organisations may choose to host their own catalogue of metadata and/or participate with industry initiatives.
Where data is made available via services such as a website (open, public or shared), organisations can choose to mark up datasets to make them visible to data-centric search engines and data harvesting tools. The Schema.org vocabulary is becoming an increasingly popular way to structure embedded data (such as recipes*) but it can also be used to describe formal datasets. Structured markup has similarities to formal metadata but should not be seen as a replacement for standardised metadata.
One of the most common uses of Schema.org is to formalise the data that is held within online recipes. This is semi-structured information which commonly includes a list of required ingredients, a list of equipment and step-by-step instructions. Structuring this data has enabled cross-recipe website search and useful tools such as 'add to shopping basket' for popular recipe websites, and it enables greater analysis of the underlying data. (How long should the perfect boiled egg be cooked for?)
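As an illustration of marking up a formal dataset with the Schema.org vocabulary, the following is a minimal sketch that builds `Dataset` markup as JSON-LD. The function names and the selection of properties are assumptions made for this example, and any URLs passed in would be the custodian's own; the property names themselves (`name`, `description`, `url`, `license`, `publisher`, `keywords`) are part of the Schema.org vocabulary.

```python
import json


def dataset_jsonld(name: str, description: str, url: str, license_url: str,
                   publisher: str, keywords: list[str]) -> str:
    """Build Schema.org Dataset markup as a JSON-LD string."""
    markup = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "publisher": {"@type": "Organization", "name": publisher},
        "keywords": keywords,
    }
    return json.dumps(markup, indent=2)


def script_tag(jsonld: str) -> str:
    """Wrap JSON-LD in the script element that is embedded in a web page."""
    return f'<script type="application/ld+json">\n{jsonld}\n</script>'
```

The resulting script element would be placed in the HTML of the page that describes the dataset. As noted above, this complements standardised metadata such as Dublin Core rather than replacing it.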
Search Engine Optimisation
Search engines are likely to be the way in which most users will discover datasets. It is therefore the responsibility of the data custodian to make sure that data is presented in a way that search engines can find and index. Most major search engines provide guides explaining how to ensure that the correct pages and content appear in 'organic searches' (e.g. Microsoft, Google) and a range of organisations provide Search Engine Optimisation (SEO) services. The Geospatial Commission have developed a guide to help organisations optimise their websites to enable search engines to identify and surface geospatial data more effectively.
Direct stakeholder engagement is a powerful tool to drive interest in new or underutilised datasets. Additionally, this technique may be used when there is a specific use case or challenge which the data custodian is seeking to address.
Data Contact Point
The data custodian should provide a Data Contact Point for potential data users to contact the data custodian where it has not been possible to establish if data exists or is available
The Department for Business, Energy and Industrial Strategy (BEIS) has published a dataset relating to the installed cost per kW of Solar PV for installations which have been verified by the Microgeneration Certification Scheme. This dataset is published under an Open Government Licence so the data can be accessed by all. The data custodians have registered the dataset with the UK Government open data portal, Data.gov.uk; this makes the metadata publicly available (albeit only via the API) in JSON, a machine readable format that can easily be presented in a human readable form. It additionally provides search engine optimisation to surface the results in organic search and uses webpage markup to make the data visible to dataset specific search engines.
Note, the discoverability actions taken above are related to a dataset which is publicly available but could also be used for a metadata stub entry in a data catalogue.
6. Learn and understand the needs of current and prospective data users
Digital connectivity and data are enabling a wealth of new products and services across the economy and creating new data users outside of the traditional sector silos. In order to maximise the value of data it is vital custodians develop a deep understanding of the spectrum of their users and their differing needs such that datasets can be designed to realise the maximum value for customers.
Data custodians should develop a deep understanding of a range of topics:
Their current and potential data users
The goals of each current and potential data user
The user needs of each current and potential data user
How this relates to the goals of the data holders
How this delivers benefit for customers
How the data holder can deliver this in an appropriate format within reasonable timescales
There are a range of methods which organisations can use to elicit the needs of current and potential users.
Interviews - detailed research with representative users
Workshops - broad research with groups of users (or potential users)
Usability Testing - feedback from service users
Monitoring - tracking usage of a live service
User initiated feedback or requests - feedback forms, email contact, etc.
Innovation Projects - generating new user needs through novel work
Knowledge Sharing - collaboration between organisations with similar user types
Direct Requests - prospective user needs
The Government Digital Service have published advice on the topic of user research within their service manual.
Note, it is important to recognise that it will not be possible to identify all potential users and potential use cases of a dataset before it is made available to innovators. Therefore, data without a clear user or use case should not be ignored as there may be unexpected value which can be created.
One approach that can be used to gain input from potential users is to convene a workshop where individuals from different backgrounds can come together to discuss the challenges they face and the needs that this creates. The needs of users can be formulated as structured user stories, which can subsequently be used to identify trends:
As a Role given that Situation I need Requirement so that Outcome
Role: The type of stakeholder
Situation: The motivating or complicating factor
Requirement: The user need
Outcome: The desired outcome
As a homeowner given that home heating accounts for 31% of carbon emissions I need to have access to data that evidences the impact of energy efficiency measures so that I can prioritise upgrades and developments and reach net zero carbon as quickly as possible.
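The user story structure above can be captured as a small data structure so that stories gathered in workshops can be stored and compared consistently. The following is a minimal sketch; the class name `UserStory` and the helper `stories_per_role` are hypothetical, and counting stories by role is only one simple way to begin looking for trends.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UserStory:
    role: str          # the type of stakeholder
    situation: str     # the motivating or complicating factor
    requirement: str   # the user need
    outcome: str       # the desired outcome

    def render(self) -> str:
        """Return the story in the template form used in the guidance."""
        return (f"As a {self.role} given that {self.situation} "
                f"I need {self.requirement} so that {self.outcome}")


def stories_per_role(stories: list[UserStory]) -> Counter:
    """Count stories by role, as a first step towards identifying trends."""
    return Counter(story.role for story in stories)
```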
7. Ensure data quality improvement is prioritised by user needs
Data quality is subjective. A dataset may be perfectly acceptable for one use case but entirely inadequate for another. Data accuracy can be more objective but there remain many instances where the required precision differs across use cases.
Data is not perfect; even the most diligent organisation that makes the greatest effort to collect and disseminate the highest quality data cannot guarantee enduring accuracy or foresee all the potential future needs of data users. Data custodians are therefore faced with an ongoing task to identify quality and accuracy limitations which can be improved over time. This is a particular challenge for the owners and operators of infrastructure who have the task of deploying assets which will be in situ for many years (often decades). The data needs of potential users are almost certain to change dramatically within the lifespan of the asset and the ability to recollect information is often challenging (especially for buried assets).
Note: Organisations should not see data quality as a barrier to opening datasets. Potential users may find the quality acceptable for their use, find ways to handle the quality issues or develop ways to solve issues which can improve the quality of the underlying data.
Data is most useful when it is accurate and trustworthy. In many cases, data will be used to store information which can objectively be categorised as right or wrong e.g. customer addresses, asset serial numbers, etc.
Data custodians should make reasonable efforts to ensure that data is accurate and rectify issues quickly when they are identified
This is in line with the GDPR accuracy requirements for personal data. Organisations should utilise Master Data Management (MDM) techniques to validate inputs, monitor consistency across systems and rectify issues quickly when they are identified by internal or external stakeholders.
Beyond accuracy, data custodians should consider how they can iteratively improve the quality of data.
Data custodians should seek to improve data quality in a way that responds to the needs of users
Not all data will be of sufficient quality for all users and in some cases significant action may be required to rectify shortcomings. Many issues may be resolved through incremental changes to business process or data processing but some may have a more fundamental issue. Data controllers should consider if the insufficient quality is due to lack of quality in the underlying data source (e.g. the sensor data is not precise or frequent enough), significant processing of the data (e.g. aggregation, rounding, etc.) or technical choices (e.g. deleting data due to storage constraints).
An organisation collects and holds data about companies which operate in a sector in order to help potential innovators find suppliers or collaboration partners. The data that is held is made available on a website and can be queried by users. A potential user searches for an organisation and finds that a company has been incorrectly categorised. The data publishers have provided a contact form which enables the user to submit the correction which is subsequently verified by the data publisher before the dataset is updated.
A data user wishes to develop an application which enables public transport users to select options based on their impact on air pollution in a city. The public transport provider makes a range of data available about the various modes of transport including routes travelled, vehicle type, average emissions and passenger numbers. However, the data user has identified that the emissions data is not of a sufficient quality for their use case for the following reasons:
The average emissions data field is not consistently populated
There is likely to be a variation between the average emission output and the real impact on local areas
The issues highlighted above are quite different in nature and have different potential solutions.
The public transport provider could address the first point by:
manually checking the dataset for missing values and populating them based on the vehicle type and known specification
using data science techniques such as interpolation to populate missing data
The public transport provider could address the second point by:
deploying a real emission monitoring solution
utilising an existing, alternative data source and data processing to provide 'proxy' data, e.g. static air quality monitoring sites
Having proposed a quick solution to point 1 and an alternative solution for point 2 the data user can continue with their use case.
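The first fix above (populating missing emissions values from the vehicle type) can be sketched as follows. This is a minimal illustration rather than a recommended method: the field names, the sample values and the choice of a per-vehicle-type mean (instead of interpolation or a manufacturer specification) are all assumptions made for the example.

```python
from statistics import mean


def fill_missing_emissions(records):
    """Return copies of the records with missing 'avg_emissions' filled in.

    Missing values are replaced by the mean of the known values for the same
    vehicle type. Filled rows are flagged so users can see what was imputed.
    """
    known = {}
    for r in records:
        if r["avg_emissions"] is not None:
            known.setdefault(r["vehicle_type"], []).append(r["avg_emissions"])
    type_means = {t: mean(values) for t, values in known.items()}

    filled = []
    for r in records:
        row = dict(r)  # copy so the source data is left untouched
        row["imputed"] = False
        if row["avg_emissions"] is None and row["vehicle_type"] in type_means:
            row["avg_emissions"] = type_means[row["vehicle_type"]]
            row["imputed"] = True
        filled.append(row)
    return filled


# Hypothetical sample data
routes = [
    {"route": "1", "vehicle_type": "diesel_bus", "avg_emissions": 820.0},
    {"route": "2", "vehicle_type": "diesel_bus", "avg_emissions": None},
    {"route": "3", "vehicle_type": "diesel_bus", "avg_emissions": 780.0},
    {"route": "4", "vehicle_type": "tram", "avg_emissions": None},
]
filled_routes = fill_missing_emissions(routes)
```

Route 4 stays empty because no tram value is known. Flagging imputed rows lets users decide whether the filled values are good enough for their use case.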
8. Ensure that data is interoperable with other data and digital services
Data is most useful when it can be shared, linked and combined with ease.
Interoperability (Data): enabling data to be shared and ported with ease between different systems, organisations and individuals
Data custodians should, directed by user needs, ensure that their data is made available in a way that minimises friction between systems through the use of standard interfaces, standard data structures, reference data matching or other methods as appropriate. Wherever possible, the use of cross sector and international standards is advised.
Standard Data Structures
Data structure standardisation is a common method of aligning data across organisations and enabling seamless portability of data between systems. This method provides robust interoperability between systems and, if the standard has been correctly adhered to, enables entire data structures to be ported between systems as required. However, standardisation of data structures can be expensive and time consuming, and can require significant industry, regulatory or government effort to develop the standard if one does not already exist.
Examples of Implementation
Telecoms: Many of the underlying functions in telecoms have been standardised at a data structure level by the GSM Association (GSMA). This enables network companies to deploy devices from different manufacturers with limited burden, which drives down costs by reducing the risk of vendor lock-in.
Energy: The Common Information Model (CIM) is a set of standardised data models which can be used to represent electricity networks, their assets and functionality. CIM is being deployed in a number of network areas to aid the portability between systems, enable innovation and lower costs.
Data interfaces can also be standardised. This means that the formal channels of communication are structured in a standard way which enables systems to 'talk' to one another with ease. This approach has the advantage of being very quick to implement: interfaces can be implemented as required and, providing there is robust documentation, a single organisation can define a 'standard' interface for its users without the need for cross sector agreement, as interfaces can be easily developed and evolved over time. However, this approach is limited in that a new interface needs to be developed or deployed for each type of data that needs to be shared. Additionally, in sectors where there are a few powerful actors, they can use interface standardisation to create siloed ecosystems, which reduces portability.
Examples of Implementation
Reference Data Matching
Matching datasets back to reference data spines can be a useful method to enable non standardised data to be joined with relative ease. This approach can provide a minimum level of interoperability and linkability across datasets without the need for full standardisation. However, it does require the users to learn about each new dataset rather than being able to understand the data from the outset with standard data structures.
Examples of Implementation
National Statistics: Datasets are matched back to a core set of 'data spines' to aid cross referencing and the production of insightful statistics
Standard Data Structures
Electricity network data is essential for a number of emerging energy system innovations including the successful integration of a highly distributed, renewables dominated grid. However, the network is broken into a number of areas which are operated by different organisations that have implemented different data structures to manage their network data (power flow model, GIS and asset inventory). The Common Information Model (CIM) is the common name for a series of IEC standards (IEC 61970 and IEC 61968) which standardise the data structure for electricity network data. The deployment of the CIM standards enables network operators to provide third parties with access to their data in a standard form which enables innovation to be rolled out across network areas with relative ease.
Reference Data Matching
The Office for National Statistics (ONS) has the responsibility to "collect, analyse and disseminate statistics about the UK’s economy, society and population." In order to undertake their duties, the ONS collects large amounts of data from disparate sources which needs to be cross referenced in order to produce statistical outputs. To this end they utilise a series of data spines which provide a number of key identifiers which span the economy such as company reference number, individual identifiers and property identifiers. As part of the data ingest pipeline they match data sources to one or more of the spines. This enables data which have no common fields to be cross referenced and used to create meaningful statistics.
In the case above, the ONS have performed the matching but it is equally possible for individual organisations to match their own data to key reference datasets such as street identifiers or network location. This provides third parties that wish to utilise the data with a useful anchor which can enable the easy linking back to external datasets.
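As a rough sketch of matching a dataset to a reference spine, the example below attaches a property identifier to meter readings that only carry an address, which then allows them to be linked to a second dataset keyed on that identifier. The spine contents, field names and the very simple address normalisation are assumptions for illustration; real matching usually needs more careful address handling.

```python
def normalise(address):
    """Crude normalisation: lower-case and collapse repeated whitespace."""
    return " ".join(address.lower().split())


def match_to_spine(rows, spine, field="address"):
    """Add the spine's 'property_id' to each row (None where no match is found)."""
    lookup = {normalise(s[field]): s["property_id"] for s in spine}
    return [{**row, "property_id": lookup.get(normalise(row[field]))} for row in rows]


# Hypothetical reference spine and two datasets with no common field
spine = [
    {"property_id": "P001", "address": "1 High Street"},
    {"property_id": "P002", "address": "2 High Street"},
]
meter_readings = [
    {"address": "1  high street", "annual_kwh": 3100},
    {"address": "9 Low Road", "annual_kwh": 2800},
]
floor_areas = {"P001": 84, "P002": 120}  # keyed on property_id

matched = match_to_spine(meter_readings, spine)
linked = [
    {**m, "floor_area_m2": floor_areas[m["property_id"]]}
    for m in matched
    if m["property_id"] in floor_areas
]
```

Unmatched rows are kept with an empty identifier rather than dropped, so the match rate can be reported alongside the data.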
9. Protect data and systems in accordance with Security, Privacy and Resilience best practice
Ensure data and systems are protected appropriately and comply with all relevant data policies, legislation, and Security, Privacy and Resilience (SPaR) best practice principles.
Data Custodians should consider:
How to protect data appropriately
Data in transit
How the release of data could impact the security of systems
What is the value of data to potential attackers or hostile actors?
What impact could a data breach cause?
How the systems which are used to release data are made secure
How to ensure compliance with related policy, legislation and regulation
Note, data custodians should utilise modern, agile approaches which seek to balance risk and reward rather than those which take a strict closed approach. A number of frameworks, standards and regulations exist which provide organisations with implementable guidance on the topic of SPaR such as:
Cyber Assessment Framework (CAF)
In addition, there is a wealth of advice available to organisations through the following organisations:
10. Store, archive and provide access to data in ways that maximise sustaining value
Organisations should consider the way in which data is stored or archived in order to ensure that potential future value is not unnecessarily limited. This includes ensuring that the storage solution is specified in such a way that the risk of data being lost due to technical difficulties is minimal. Technical storage solutions which offer component and geographical redundancy are commonplace, with many cloud providers offering data resilience as standard. In addition to technical resilience the data custodian should ensure that data is not unduly aggregated or curtailed in a way that limits future value. Where possible the most granular version of the data should be stored for future analysis but there may be cases where the raw data is too large to store indefinitely. Where this occurs the data custodian should consider how their proposed solution (aggregation, limiting retention window, etc.) would impact future analysis opportunities.
Where data does not have value to the original data custodian the information could be archived with a trusted third party e.g. UKERC Energy Data Centre, UK Data Archive, etc. to ensure that it continues to have sustaining value.
Access to data should also be considered as part of the storage and archiving strategy. Data custodians should seek to make data available in formats which are appropriate for the type of data being presented and respond to the needs of the user. Whilst visualisations and data presentation interfaces are useful to some they restrict the way in which potential users can interact with the data and should be considered as Supporting Information rather than data itself.
Application Programming Interfaces (APIs) are the de facto standard for the delivery of live data. Many organisations from Twitter to Elexon have developed and deployed successful APIs which respond to the needs of their data users.
Machine readable files (such as CSV or JSON) are preferable for large historic data. This negates the need for a large number of API calls, which reduces the load on both the sender's and the receiver's systems and provides the end user with a useful dataset which can be manipulated as required. The BEIS National Energy Efficiency Data is a good example of providing well documented bulk data. The Met Office also publish a variety of useful data in bulk formats.
The UK Smart Metering Implementation Programme is responsible for rolling out digitally connected meter points to all UK premises with the goal of providing an efficient, accurate means of measuring consumption alongside a range of other technical metrics. Electricity distribution networks can request access to this data in order to inform their planning and optimise operation. The personal nature of smart meter data means that organisations have to protect individual consumer privacy and are therefore choosing to aggregate the data at feeder level before it is stored. This approach provides the network with actionable insight for the current configuration of the network.
However, network structure is not immutable. As demand patterns change and constraints appear, network operators may need to upgrade or reconfigure their network to mitigate problems. The decision to aggregate the data means that it is not possible to use the granular data to simulate the impact of splitting the feeder in different ways to more effectively balance demand across the new network structure. In addition, it means that the historic data cannot be used for modelling and forecasting going forwards. Finally, the data ingest and aggregation processes need to be updated to ensure that any future data is of value.
Northern Powergrid (a GB Electricity Distribution Network) have recently proposed to store the smart meter data that they collect in a non-aggregated format but strictly enforce that data can only be extracted and viewed in an aggregated format. This approach is novel in that it protects the privacy of the consumer whilst retaining the flexibility and value of the data.
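The idea of storing granular data while only permitting aggregated extraction can be sketched as below. This is an illustrative toy, not a description of any network operator's system: the class, its method names and the minimum group size of three meters are all hypothetical.

```python
from collections import defaultdict


class GranularStore:
    """Holds per-meter readings but only releases group totals."""

    def __init__(self, min_meters=3):
        # Hypothetical disclosure threshold: smaller groups are withheld.
        self._min_meters = min_meters
        self._readings = []  # (meter_id, kwh) tuples, never returned directly

    def add(self, meter_id, kwh):
        self._readings.append((meter_id, kwh))

    def totals(self, grouping):
        """Sum consumption per group; grouping maps meter_id -> group label."""
        totals = defaultdict(float)
        meters = defaultdict(set)
        for meter_id, kwh in self._readings:
            group = grouping.get(meter_id)
            if group is None:
                continue
            totals[group] += kwh
            meters[group].add(meter_id)
        return {g: t for g, t in totals.items() if len(meters[g]) >= self._min_meters}


store = GranularStore(min_meters=3)
for i in range(1, 7):
    store.add(f"m{i}", float(i))

# Today's network: all six meters on one feeder.
current = store.totals({f"m{i}": "feeder_1" for i in range(1, 7)})
# Proposed reconfiguration: the same history re-aggregated for a split feeder.
proposed = store.totals(
    {"m1": "1a", "m2": "1a", "m3": "1a", "m4": "1b", "m5": "1b", "m6": "1b"}
)
```

Because the granular history is retained, the same data can be re-aggregated for a new network layout, which is not possible when only feeder-level totals were stored.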
11. Ensure that data relating to common assets is Presumed Open
Presumed Open is the principle that data should be as open as possible. Where the raw data cannot be entirely open, the data custodian should provide objective justification for this.
Open Data is made available for all to use, modify and distribute with no restrictions
Data relating to common assets should be open unless there are legitimate issues which would prevent this. Legitimate issues include Privacy, Security, Negative Consumer Impact, Commercial and Legislation and Regulation. It is the responsibility of the data controller to ensure that issues are effectively identified and mitigated where appropriate. It is recommended that organisations implement a robust Open Data Triage process.
Common Assets are defined as a resource (physical or digital) that is essential to or forms part of common shared infrastructure
In cases where data processing has been applied to raw data (e.g. issue mitigation, data cleaning, etc.) it is considered best practice for the processing methodology or scripts to be made available as core supporting information in order to maximise the utility of the data to users.
Data relating to common assets should be open unless there are legitimate issues
12. Conduct Open Data Triage for Presumed Open data
The triage process considers themes such as privacy, security, commercial and consumer impact issues. Where the decision is for the raw data not to be made open, the data controller will share the rationale for this and consider sensitivity mitigation options (data modification or reduced openness) that maximise usefulness of the data. Where a mitigation option is implemented the protocol should be made publicly available with reference to the desensitised version of the data. In cases where no data can be made available, the rationale should be documented and made available for review and challenge.
Users of the data should have reasonable opportunity to challenge decisions and have a point of escalation where agreement between data users and data controllers cannot be reached.
The diagram below is a high level representation of the proposed process. More detail about the steps is provided in the following subsections.
Identification of Discrete Datasets
The goal of open data triage is to identify where issues exist that would prevent the open publication of data in its most granular format, and address them in a way that maintains as much value as possible. To make this process manageable for the data controller, the first step in this process is:
Identify thematic, usable datasets that can be joined if required rather than general data dumps
In this context, we define a 'thematic, usable dataset' as a discrete collection of data which relates to a focused, coherent topic but provides enough information to be of practical use. Data custodians should consider:
Data source (device, person, system)
Subject of data (technical, operational, personal, commercial)
Time and granularity (collection period, frequency of data collection, inherent aggregation)
Location (country, region, public/private area)
Other logical categorisations (project, industry, etc.)
For example, an infrastructure construction company may have data about the construction and operation of various building projects across a number of countries. It is sensible to split the data into operational and construction data and then group by type of construction (public space, office building, residential building, etc.), geographic region and year of construction.
The approach described above minimises the risk that the size and complexity of datasets results in issues that are not correctly identified. It also reduces the risk that an issue in one part of the dataset results in the whole dataset being made less open or granular therefore maximising the amount of useful data that is openly available in its most granular form. For example, providing complete output from a data warehouse in one data dump could contain information about customers, employees, financial performance, company Key Performance Indicators (KPIs), etc. all of which would present issues that would mean the data needs to be modified or the openness reduced. Whereas extracting tables (or parts of tables) from the data warehouse would provide a more granular level of control which enables individual issues to be identified and addressed accordingly which would in turn maximise the data which is made openly available.
Identification of Issues
Once a thematic, usable dataset has been identified the data controller should assess the dataset to identify if there are any issues which would prevent the open publication of the data in its most granular format.
Identify the potential issues which might limit the openness or granularity of dataset
In the table below, we outline a range of issue categories which should be carefully considered. Some of these categories will directly relate to triage processes which already exist in organisations but others may require the adaptation of existing processes or creation of new processes to provide a comprehensive solution.
Privacy
Data that relates to a natural person who can be identified directly from the information in question or can be indirectly identified from the information in combination with other information.
This should be a familiar process as GDPR introduced a range of requirements for organisations to identify personal data and conduct Data Protection Impact Assessments (DPIAs). The ICO has a wealth of advice and guidance on these topics, including definitions of personal data and DPIA templates.
It is important that Open Data Triage is used to effectively identify privacy issues and ensure that any data which is released has been appropriately processed to remove private information and retain customer confidence in the product, service or system.
Security
Data that creates incremental or exacerbates existing security issues which cannot be mitigated via sensible security protocols such as personnel vetting, physical site security or robust cyber security.
Companies and organisations that own and operate infrastructure should already have a risk identification and mitigation program to support the protection of Critical National Infrastructure (CNI). The Centre for the Protection of National Infrastructure (CPNI) have advice and guidance for organisations involved in the operation and protection of CNI.
Outside of CNI, organisations should assess the incremental security risks that could be created through the publication of data. Organisations should consider personnel, physical and cyber security when identifying issues and identify if the issue primarily impacts the publishing organisation or if it has wider impacts. Issue identification should take into account the existing security protocols that exist within an organisation and flag areas where the residual risk (after mitigation) is unacceptably high.
Note, where information contained within a dataset is already publicly available via existing means (such as publicly available satellite imagery) the security issue assessment should consider the incremental risk of data publication using the existing situation as the baseline.
Negative Consumer Impact
Data that is likely to drive actions, intentional or otherwise, which will negatively impact the consumer
Organisations should consider how the dataset could be used to drive outcomes that would negatively impact customers by enabling manipulation of markets, embedding bias into products or services, incentivising actions which are detrimental to decarbonisation of the system, etc.
Commercial
Data that relates to the private administration of a business or data which was not collected as part of an obligation / by a regulated monopoly and would not have been originated or captured without the activity of the organisation
Commercial data relating to the private administration of a business (HR, payroll, employee performance, etc.) is deemed to be private information and is a legitimate reason for data to be closed, although organisations may choose to publish it for their own purposes such as reporting or corporate social responsibility (CSR).
Data which does not relate to the administration of the business but has been collected or generated through actions which are outside of the organisation's legislative or regulatory core obligations and funded through private investment may also have legitimate reason to be closed. This description may include the data generated through innovation projects but consideration should be given to the source of funding and any data publication or sharing requirements this might create.
Where an organisation is a regulated monopoly, special consideration should also be given to the privileged position of the organisation and the duty to enable and facilitate competition within their domain.
Where datasets contain Intellectual Property (IP) belonging to other organisations or where the data has been obtained with terms and conditions or a data licence which would restrict onward publishing this should also be identified. Note, the expectation is that organisations should be migrating away from restrictive licences / terms and conditions that restrict onward data publishing and sharing where possible.
Legislation and Regulation
Specific legislation or regulation exists which prohibits the publication of data.
Organisations should have legal and regulatory compliance processes which are able to identify and drive compliance with any obligations the company has.
Consideration should include:
Consider the impact of related or adjacent datasets
When assessing the sensitivity of data, thought should be given to the other datasets which are already publicly available and the issues which may arise from the combination of datasets. Organisations should consider where there are datasets outside of their control which, if published, could create issues which would need to be mitigated. Special consideration should be given to datasets which share a common key or identifier. This includes but is not limited to:
subject reference (e.g. Passport Number),
technical reference (e.g. Serial Number),
time (e.g. Coordinated Universal Time or UTC),
space (e.g. Postcode or Property Identifier)
As new datasets are made available, markets develop and public attitudes change there may be a need to revise the original assessment. For example, a dataset which was initially deemed too sensitive to be released openly in its most granular form could be rendered less sensitive due to changes to market structure or change in regulatory obligations. Equally, a dataset which was published openly could become more sensitive due to the publication of a related dataset or technology development. At a minimum, data custodians should aim to review and verify their open data triage assessments on an annual basis.
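One simple way to support this assessment is to check which linking fields a candidate dataset shares with datasets that are already published. The sketch below assumes column names have been harmonised in advance; the list of linking fields and the dataset names are hypothetical.

```python
# Hypothetical set of fields that commonly act as keys between datasets
LINKING_FIELDS = {
    "passport_number",
    "serial_number",
    "timestamp_utc",
    "postcode",
    "property_id",
}


def shared_keys(candidate_columns, published):
    """Return {dataset name: shared linking fields} for every non-empty overlap.

    'published' maps a dataset name to the list of its column names.
    """
    overlaps = {}
    for name, columns in published.items():
        common = LINKING_FIELDS & set(candidate_columns) & set(columns)
        if common:
            overlaps[name] = sorted(common)
    return overlaps


published_datasets = {
    "building_ratings": ["postcode", "rating"],
    "weather_stations": ["station", "temperature"],
}
risks = shared_keys(["property_id", "annual_kwh", "postcode"], published_datasets)
```

A non-empty result does not mean the data cannot be published, only that the combination deserves closer review. Re-running the check as part of the annual review would pick up newly published datasets.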
Mitigation of Issues
Where the assessment process identifies an issue, the aim should be to mitigate the issue through modification of data or reduced openness whilst maximising the value of the dataset for a range of stakeholders.
Mitigate issues through modification of data or reduced openness whilst addressing user needs
Open data with some redactions may be preferable to shared data without, but if redactions render the data useless then public or shared data may be better. In some cases, the objectives of the prospective data users might create requirements which cannot be resolved by a single solution, so it may be necessary to provide different variations or levels of access, for example providing open access to a desensitised version of the data for general consumption alongside shared access to the unadulterated data for a subset of known users.
Modification of Data
Modification of data can serve to reduce the sensitivity whilst enabling the data to be open. There are a wide variety of possible modifications of data which can be used to address different types of sensitivity.
Anonymisation
Removing or altering identifying features
An organisation has a licence condition to collect certain data about individual usage of national infrastructure. The data is collected about individual usage on a daily basis and could reveal information about individuals if it was to be released openly.
By removing identifying features such as granular location and individual reference it could be possible to successfully anonymise the data such that individuals cannot be re-identified so the data could be made openly available.
Simple anonymisation can be very effective at protecting personal data but it needs to be undertaken with care to minimise the risk of re-identification. Anonymisation techniques can be combined with other mitigation techniques to minimise this risk.
The UK ICO have provided an anonymisation code of practice which should be adhered to.
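A minimal sketch of removing identifying features is shown below. The field names are hypothetical and the approach (dropping direct identifiers and coarsening the postcode to its outward part) is only an illustration. It does not by itself guarantee that individuals cannot be re-identified.

```python
# Hypothetical direct identifiers to strip before publication
DIRECT_IDENTIFIERS = {"name", "customer_ref"}


def anonymise(record):
    """Drop direct identifiers and replace the full postcode with its outward part."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "postcode" in out:
        # "SW1A 1AA" -> "SW1A": keeps a coarse location, removes the granular one
        out["postcode_area"] = out.pop("postcode").split()[0]
    return out


published = anonymise(
    {
        "name": "A. Smith",
        "customer_ref": "C-1029",
        "postcode": "SW1A 1AA",
        "daily_usage": 12.5,
    }
)
```

Whether the remaining fields are safe to release depends on what else is publicly available, so the output should still go through the triage assessment.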
Pseudonymisation
Replacing identifying features with a unique identifier that retains the reference to an individual whilst breaking the link with the 'real world' identity.
An organisation (with permission) collects data about how customers use a web service. This data is used to diagnose problems where there are issues with the website operation.
Replacing the customer name and address with a random unique identifier allows the behaviour that led to an issue to be analysed whilst protecting the identity of an individual user.
Pseudonymisation is distinct from anonymisation as it is possible to consistently identify individuals but not link this to a specific, named person.
Pseudonymisation should be used carefully as it is often possible to utilise external datasets and data analysis to match identifiers and trends such that the individual can be re-identified. For example, it may be possible to analyse the website usage patterns (times, locations, device type, etc.) and cross reference with other personally identifiable datasets (social media posts, mobile positioning data, work schedules, etc.) to identify an individual with a sufficient level of confidence. Again the ODI and ICO provide useful guidance in this area.
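The sketch below shows one possible way to produce consistent pseudonyms. Note that it uses a keyed hash (HMAC-SHA256 from the Python standard library) rather than the random identifier described above; this is a substitution made for the example so that no lookup table is needed. The key, field names and identifier length are assumptions.

```python
import hashlib
import hmac

# Hypothetical key: in practice it would be generated and stored securely,
# separately from the published data.
SECRET_KEY = b"replace-with-a-securely-stored-key"


def pseudonym(name, address):
    """Derive a stable identifier that cannot be reversed without the key."""
    message = f"{name}|{address}".encode("utf-8")
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()[:16]


def pseudonymise(event):
    """Replace name and address with a pseudonymous user_id."""
    out = {k: v for k, v in event.items() if k not in ("name", "address")}
    out["user_id"] = pseudonym(event["name"], event["address"])
    return out


first = pseudonymise(
    {"name": "A. Smith", "address": "1 High Street", "page": "/checkout", "error": 500}
)
second = pseudonymise(
    {"name": "A. Smith", "address": "1 High Street", "page": "/basket", "error": 404}
)
other = pseudonymise(
    {"name": "B. Jones", "address": "2 High Street", "page": "/checkout", "error": 500}
)
```

The same person always maps to the same identifier, so a sequence of events can still be followed, which is also why the re-identification caveat above continues to apply.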
Noise
Combining the original dataset with meaningless data
An organisation collects information about how individuals use a privately built product or service (e.g. a travel planner). This data could be of great use for the purposes of planning adjacent systems (e.g. energy system or road network) but releasing the anonymised, granular data would give competitors a commercial advantage.
By introducing seemingly random noise into the dataset in a way that ensures that the data remains statistically representative but the detail of individuals is subtly altered, the data can be made available whilst reducing the commercial risk.
Introducing noise to data in a way that successfully obfuscates sensitive information whilst retaining the statistical integrity of the dataset is a challenging task that requires specialist data and statistics skills. Consideration needs to be given to the required distribution, which features the noise will be applied to and the consistency of application.
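As a simple illustration, the sketch below adds zero-mean Gaussian noise to a numeric field, so individual values change while the overall mean stays close to the original. The choice of a Gaussian distribution, the standard deviation and the sample values are assumptions; selecting these for a real dataset is the specialist task described above.

```python
import random
from statistics import mean


def add_noise(values, sigma, seed=None):
    """Return the values with independent zero-mean Gaussian noise added.

    sigma is the standard deviation of the noise. A seed makes the result
    repeatable for testing; omit it for real use.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]


# Hypothetical journey durations in minutes
values = [float(i % 50) for i in range(2000)]
noisy = add_noise(values, sigma=1.0, seed=42)

# Difference between the means of the original and noisy data
drift = abs(mean(noisy) - mean(values))
```

Checking summary statistics before and after, as `drift` does here for the mean, is one way to confirm that the noisy data is still representative.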
Delay
Deferring publication of data for a defined period
An organisation operates a network of technical assets some of which fail on occasion. If the data related to those assets was made available innovators could help to identify patterns which predict outages before they occur and improve the network stability. However, the data could also be used to target an attack on the network at a point which is already actively under strain and cause maximum impact.
By introducing a sufficient delay between the data being generated and published the organisation can mitigate the risk of the data being used to attack the network whilst benefiting from innovation.
Delaying the release of data is a simple but effective method of enabling detailed information to be released whilst mitigating many types of negative impact. However, it may be necessary to combine this with other mitigation techniques to completely mitigate more complicated risks.
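A delay can be applied as a simple filter at publication time, as in the sketch below. The 30 day embargo and the record structure are hypothetical; the appropriate delay depends on how long the data remains sensitive.

```python
from datetime import datetime, timedelta, timezone


def publishable(records, now, embargo=timedelta(days=30)):
    """Return only the records recorded on or before the embargo cut-off."""
    cutoff = now - embargo
    return [r for r in records if r["recorded_at"] <= cutoff]


# Hypothetical asset fault records
faults = [
    {"asset": "TX-17", "recorded_at": datetime(2020, 1, 15, tzinfo=timezone.utc)},
    {"asset": "TX-42", "recorded_at": datetime(2020, 2, 20, tzinfo=timezone.utc)},
]
released = publishable(faults, now=datetime(2020, 3, 1, tzinfo=timezone.utc))
```

Only the January record is released here. The February record becomes publishable once it is more than 30 days old.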
Differential Privacy
An algorithm or model which obscures the original data to limit re-identification
An organisation collects rich data from customers which is highly valuable but sensitive (e.g. email content). The sensitivity of this data is very high but the potential for learning is also very high.
Differential privacy enables large amounts of data to be collected from many individuals whilst retaining privacy. Noise is added to individuals' data which is then ingested by a model. As large amounts of data are combined, the noise averages out and patterns can emerge. It is possible to design this process such that the results cannot be linked back to an individual user and privacy is preserved.
Differential privacy is an advanced technique but can be very effective. It is used by top technology firms to provide the benefits of machine learning without the privacy impact that would usually be incurred.
Sharing a model can be a highly effective way of enabling parties to access the benefit of highly sensitive, granular data without providing direct access to the raw information. However, this is an emerging area so carries some complexity and risk.
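The way noise on individual records 'averages out' can be illustrated with randomised response, a classic technique that satisfies local differential privacy for a yes/no attribute. This is a teaching sketch under stated assumptions (a single binary attribute, a privacy parameter epsilon of 1.0 and simulated data), not a production design.

```python
import math
import random


def randomised_response(true_bit, epsilon, rng):
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return true_bit if rng.random() < p_truth else 1 - true_bit


def estimate_rate(reports, epsilon):
    """Unbiased estimate of the true proportion of 1s from the noisy reports."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    observed = sum(reports) / len(reports)
    # E[observed] = (1 - p_truth) + rate * (2 * p_truth - 1); solve for rate
    return (observed - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)


rng = random.Random(0)
truth = [1] * 6000 + [0] * 14000  # simulated population, true rate 0.3
reports = [randomised_response(bit, 1.0, rng) for bit in truth]
estimate = estimate_rate(reports, 1.0)
```

No single report reveals an individual's true answer with certainty, yet the population-level estimate lands close to the true rate of 0.3 once enough reports are combined.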
Redaction
Removing or overwriting selected features
Security / Legislation and Regulation
An organisation maintains data about a large number of buildings across the country and their usage. Within the dataset there are a number of buildings which are identified as Critical National Infrastructure (CNI) sites which are at particular risk of targeted attack if they are known.
In this case it is possible to simply redact the data for the CNI sites and release the rest of the dataset (assuming there is no other sensitivity). Note, this approach works here because the dataset is not complete and therefore it is not possible to draw a conclusion about a site which is missing from the data as it may simply have not been included.
Redaction is commonplace when publishing data as it is a very effective method of reducing risk. However, care needs to be taken to ensure that it is not possible to deduce something by the lack of data. In general, if the scope and completeness of data is sufficient that the lack of data is noteworthy then redaction may not be appropriate, e.g. an authoritative map which has a conspicuous blank area indicates that the site is likely to be of some interest or importance.
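For the building example, redaction could be as simple as the sketch below, which drops flagged records and also removes the flag itself from what is published. The `is_cni` field and the sample records are hypothetical.

```python
def redact(buildings, flag="is_cni"):
    """Drop flagged records and remove the flag column from the remainder."""
    published = []
    for building in buildings:
        if building.get(flag):
            continue  # sensitive site: omit the whole record
        published.append({k: v for k, v in building.items() if k != flag})
    return published


# Hypothetical building records
buildings = [
    {"id": "B1", "usage": "office", "is_cni": False},
    {"id": "B2", "usage": "control centre", "is_cni": True},
    {"id": "B3", "usage": "retail", "is_cni": False},
]
open_data = redact(buildings)
```

Removing the flag column matters as much as removing the rows, since publishing it would confirm that sensitive sites exist in the source data.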
Aggregation
Combining data to reduce granularity of resolution, time, space or individuals
An organisation collects information about the performance of their private assets which form part of a wider system (e.g. energy generation output). This data could be of great use to the other actors within the system but releasing the data in its raw format may breach commercial agreements or provide competitors with an unfair advantage.
By aggregating the data (by technology, time, location or other dimension) the sensitivity can be reduced whilst maintaining some of the value of the data.
Aggregation is effective at reducing sensitivity but can significantly reduce the value of the data. It may be worth providing multiple aggregated views of the data to address the needs of a range of stakeholders.
Where aggregation is the only effective mechanism to reduce sensitivity organisations may want to consider providing access to aggregated data openly alongside more granular data that can be shared with restricted conditions.
Note, aggregating data which is of a low level of accuracy or quality can provide a misleading picture to potential users. Custodians should consider this when thinking about the use of data aggregation and make potential users aware of potential quality issues.
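Providing multiple aggregated views can be done with one small helper, sketched below, which totals a value over whichever dimensions are requested. The field names and generation figures are hypothetical.

```python
from collections import defaultdict


def aggregate(rows, keys, value):
    """Total 'value' for each combination of the given key fields."""
    totals = defaultdict(float)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[value]
    return dict(totals)


# Hypothetical per-asset generation output
output = [
    {"technology": "wind", "region": "north", "mwh": 10.0},
    {"technology": "wind", "region": "south", "mwh": 5.0},
    {"technology": "solar", "region": "north", "mwh": 2.5},
]
by_technology = aggregate(output, ["technology"], "mwh")
by_region = aggregate(output, ["region"], "mwh")
```

Custodians should check that the published views cannot be combined to recover the underlying per-asset values, particularly where a group contains only one or two assets.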
Shift / Rotate
Altering the position or orientation of spatial or time series data
An organisation collects data on how their customers use a mobile product including when and where. This movement data could be of value to other organisations in order to plan infrastructure investment but the data reveals the patterns of individuals which cannot be openly published.
The initial step is to remove any identifying features (e.g. device IDs) and break the movement data into small blocks. Each block of movement can then be shifted in time and space such that they cannot be reassembled to identify the movement patterns of individuals. This means that realistic, granular data can be shared but the privacy of individuals can be protected.
Shifting or rotating data can be useful to desensitise spatial or temporal data. However, it is important to recognise context to ensure that the data makes sense and cannot be easily reconstructed. For example, car journey data will almost always take place on roads and therefore rotation can make the data nonsensical and it can be pattern matched to the underlying road network with relative ease.
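The block-and-shift step can be sketched as follows, assuming each point is a `(time_s, x_m, y_m)` tuple in a projected coordinate system and that identifying features have already been removed. The block size and maximum shifts are hypothetical, and the sketch ignores the context problem noted above (for example, shifted car journeys no longer aligning with roads).

```python
import random


def shift_blocks(points, block_size, max_shift_s, max_shift_m, seed=None):
    """Split a trace into blocks and shift each block by its own random offset.

    The shape of movement within a block is preserved; the link between
    blocks, and to the original time and place, is broken.
    """
    rng = random.Random(seed)
    blocks = []
    for start in range(0, len(points), block_size):
        dt = rng.uniform(-max_shift_s, max_shift_s)
        dx = rng.uniform(-max_shift_m, max_shift_m)
        dy = rng.uniform(-max_shift_m, max_shift_m)
        block = points[start:start + block_size]
        blocks.append([(t + dt, x + dx, y + dy) for t, x, y in block])
    rng.shuffle(blocks)  # remove the original ordering of the blocks
    return blocks


# Hypothetical trace: one point every 10 seconds, moving 5 metres each step
trace = [(float(i * 10), float(i * 5), 0.0) for i in range(6)]
shifted = shift_blocks(trace, block_size=3, max_shift_s=3600, max_shift_m=1000, seed=1)
```

Smaller blocks and larger shifts give more protection at the cost of realism, so the parameters need to be chosen with the intended users in mind.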
Making arbitrary changes to the data
An organisation may be the custodian of infrastructure data relating to a number of sensitive locations such as police stations or Ministry of Defence (MoD) buildings. The data itself is of use for a range of purposes but making it openly available could result in security impacts.
Randomising the data (generating arbitrary values) relating to the sensitive locations (rather than redaction) could reduce the sensitivity such that it can be open.
Randomisation can be very effective at reducing sensitivity, but it is destructive and so degrades the quality of the underlying data.
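A minimal Python sketch of randomisation, as opposed to redaction, is shown below. It assumes a hypothetical `demand_kw` field and a privately held set of sensitive site IDs; values for sensitive sites are replaced with arbitrary values drawn from the range seen at non-sensitive sites so that they do not stand out.

```python
import random


def randomise_sensitive(records, field, is_sensitive, rng):
    """Replace `field` on sensitive records with arbitrary values drawn
    from the observed range of the non-sensitive records."""
    normal = [r[field] for r in records if not is_sensitive(r)]
    low, high = min(normal), max(normal)
    published = []
    for r in records:
        r = dict(r)  # copy, so the raw records are left untouched
        if is_sensitive(r):
            r[field] = round(rng.uniform(low, high), 1)
        published.append(r)
    return published


# Hypothetical infrastructure records (illustrative values only).
sites = [
    {"site_id": "S1", "demand_kw": 120.0},
    {"site_id": "S2", "demand_kw": 480.0},
    {"site_id": "S3", "demand_kw": 95.0},
]
sensitive_ids = {"S2"}  # held privately, never published
published = randomise_sensitive(
    sites,
    "demand_kw",
    lambda r: r["site_id"] in sensitive_ids,
    random.SystemRandom(),
)
```

Because the true value is overwritten, any analysis that includes the randomised records will be affected, which is the loss of quality described above.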
Modifying data to reduce the difference between individual subjects
Negative Consumer Impact
An organisation provides and collects data about usage of a product in order to diagnose problems and optimise performance. The data has wider use beyond the core purpose, but the associated demographic data could result in bias towards certain groups.
By normalising the data (reducing variance and the ability to discriminate between points) it is possible to reduce the ability for certain factors to be used to differentiate between subjects and hence reduce types of bias.
Normalisation is a statistical technique that requires specialist skills to apply correctly. It may not be enough on its own to address all sensitivities so a multifaceted approach may be required.
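The guidance does not prescribe a particular normalisation method. One simple interpretation, assumed here purely for illustration, is to pull each value toward the overall mean so that the variance between subjects is reduced while the average is preserved. The function name and the `factor` parameter are hypothetical.

```python
from statistics import mean, pvariance


def shrink_toward_mean(values, factor):
    """Pull each value toward the overall mean.

    factor=1 leaves the data unchanged; factor=0 collapses every value
    to the mean. Variance scales by factor squared.
    """
    centre = mean(values)
    return [centre + factor * (v - centre) for v in values]


# Hypothetical per-subject usage figures (illustrative values only).
usage = [10.0, 20.0, 30.0, 40.0]
normalised = shrink_toward_mean(usage, factor=0.5)

variance_before = pvariance(usage)
variance_after = pvariance(normalised)
```

A real application would need statistical review to confirm that the chosen method removes the bias of concern without distorting the data for its intended use.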
Data Custodians should consider the impact of data modification on the usefulness of the data and seek to use techniques which retain the greatest fidelity of data whilst mitigating the identified issue. Given the diversity of data and variety of use cases, it is not possible to define a definitive hierarchy of preferred data modification techniques. However, techniques which have limited impact on the overall scope and accuracy of the data should be prioritised over those which make substantial, global changes. For example, redaction of sensitive fields for <1% of records is likely to be better than dataset-wide aggregation.
Level of Access
When the mitigation techniques have been applied as appropriate the data custodian should consider how open the resulting data can be. Where the mitigation has been successful the data can be published openly for all to use. However, if the nature of the data means that it is only valuable in its most granular form it may be necessary to reduce openness but keep granularity.
Level of Access
Open: Data is made available for all to use, modify and distribute with no restrictions
Public: Data is made publicly available but with some restrictions on usage
Shared: Data is made available to a limited group of participants, possibly with some restrictions on usage
Closed: Data is only available within a single organisation
The above table is based on the ODI data spectrum.
Balancing Openness, Modification and User Needs
A key factor to consider is the needs of the potential data users. Initially, there may be value in providing aggregated summaries of data which can be made entirely open. As new use cases and user needs emerge, access to more granular data may be required, which necessitates either a more sophisticated mitigation technique or a more granular version of the data which is shared less openly. In some cases, it may be prudent to make multiple versions of a dataset available to serve the needs of a range of users.
The goal of presumed open is to make data as accessible as possible. This enables innovators to opportunistically explore data and identify opportunities that are not obvious. Wherever possible, the data custodian should seek to identify a mitigation approach which addresses all issues whilst maximising access to the most granular data.
In some cases, the number and diversity of issues may be so great that the mitigation or reduction of openness required to address all of the issues simultaneously is deemed too detrimental to the overall value of the data. In these cases, the data custodian could consider the user needs and individual use cases to help guide the mitigation strategies. For example, it may be possible to provide aggregated data openly for the purposes of statistical reporting but more granular data to a set of known participants via a secure data environment for another use case.
Issues that are identified should be clearly documented. Where issues have been mitigated through reduced openness or data modification, the mitigation technique should also be clearly documented.
Description of Issue
A description of the data and identification of features which contain personal data with additional flag for sensitive personal data.
Where possible, a description of the data and features which cannot be published. There may be some cases where acknowledging the data exists may represent a security risk, in which case the relevant regulator or government department should be consulted.
Negative Consumer Impact
A description of the data, sensitive fields and overview of the likely negative impact. Whilst it might not be possible to describe the likely negative impact in detail there should be some indication of the cause of the sensitivity e.g. market manipulation.
A description of the data along with an outline of how this impacts commercial interest, e.g. business administration data or justifiable private investment.
Legislation and Regulation
A description of the data and reference to the specific articles of legislation or regulation which prohibit publication.
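One way to keep this documentation consistent is to record each issue and its mitigation in a machine-readable form that can be published with the data. The Python sketch below is an assumption about how that might look; the class name, field names and JSON layout are hypothetical and are not defined by the guidance.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class TriageIssue:
    category: str  # e.g. "Privacy", "Security", "Legislation and Regulation"
    description: str
    mitigation: Optional[str] = None       # None if nothing was needed
    legal_reference: Optional[str] = None  # article or clause, if relevant


# Illustrative entries only.
issues = [
    TriageIssue(
        category="Security",
        description="Live network status could be used to inform an attack.",
        mitigation="Data delayed by 24 hours before publication.",
    ),
    TriageIssue(
        category="Negative Consumer Impact",
        description="Some fields could enable market manipulation.",
        mitigation="Sensitive fields redacted.",
    ),
]

# Serialise so the record can sit alongside the dataset and its metadata.
issue_log = json.dumps([asdict(issue) for issue in issues], indent=2)
```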
Proactive Peer Review
Risk management -
After the triage process has taken place the resulting data should be made available in line with the triage outcome accompanied by any supporting information and metadata. Where issues have been identified and mitigated this should be documented and made available ideally with the processing methodology or script such that potential users can understand the modifications to the data. This provides transparency, promotes consistency across the sector and enables challenge.
Publish descriptions of issues and mitigation actions alongside data
When releasing protocols and scripts that have been used to mitigate issues, data custodians should ensure that these do not enable users to reverse engineer the raw data or sensitive information. For example, if noise is being added then the exact noise pattern should not be included.
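As a sketch of the noise example, the Python code below adds Gaussian noise from an unseeded system source, so the script can be published without revealing the exact noise pattern. The choice of Gaussian noise, the `scale` value and the idea that the distribution and scale are safe to disclose are assumptions for illustration; a custodian would need to confirm what can be disclosed for their data.

```python
import random


def add_noise(values, scale, rng=None):
    """Add zero-mean Gaussian noise to each value.

    By default an unseeded system source is used, so there is no seed
    that could be published or leaked to reproduce the noise.
    """
    rng = rng or random.SystemRandom()
    return [v + rng.gauss(0.0, scale) for v in values]


# Hypothetical raw values (illustrative only).
raw = [12.0, 15.5, 9.8]
noisy = add_noise(raw, scale=0.5)

# Published methodology: describes the technique but contains neither
# a seed nor the noise values themselves.
methodology = {"technique": "additive Gaussian noise", "scale": 0.5}
```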
Challenge and Review
Once a dataset and accompanying material have been published, the data custodian should ensure that they have a process to regularly review the published datasets to confirm that they are correct and relevant and that no new issues have arisen. At a minimum, data custodians should aim to review and verify their open data triage assessments on an annual basis.
There should also be a process for those outside of the publishing organisation to challenge the granularity, format and level of openness of any data which is published or publicised.
Assessment of issues should be an ongoing activity which can be interrogated and challenged
Substation Connection Capacity
Identification of Discrete Datasets
The dataset that has been identified is substation connection capacity. This represents the likely connection headroom at every substation within an electricity distribution network's area. Monitoring data (substation and smart meters) is used to measure the current demand and network data is used to provide the maximum capacity.
Note - this is a scenario for the purposes of providing an example and does not represent an operational network monitoring solution.
Identification of Issues
Identification of Issues
Mitigation of Issues
Substation connection capacity is not personal data. However, if the dataset includes total capacity and used or available capacity then in the rare case where a substation serves a single customer this could be an individual's private data.
Analysis will take place before the data is released to identify any substations which serve a single premises. If these cases materialise the sensitive fields will be redacted and the data will be displayed as available capacity only to avoid personal data issues.
If the data represents the live status of the network then a bad actor could use this information to inform an attack.
Data can be delayed by 24 hours such that it is not possible to determine live status of the network.
Negative Consumer Impact
If an actor used the data to opportunistically request connections that utilise all of the available capacity it could drive up costs for future users, including for new housing, a cost which could be passed on to the consumer.
Ensure there is a process in place to stop actors stockpiling capacity and distribute costs fairly.
None - the network is a monopoly player with the duty to deliver an efficient, competitive system.
Legislation and Regulation
GDPR (if personal data is involved)
See privacy mitigation.
Due to the ability to mitigate all of the issues above the data can be open.
The metadata description will include details about how the dataset is generated, and the processing which has taken place to remove private data; the delay will be clearly stated. If the privacy issue is more prominent and a data pipeline is required then the methodology will be documented and made available alongside the data.
Data will be published as a CSV file with core supporting information which describes the fields and provides units. The data will be hosted on an open data access platform on the company website and metadata will be registered with open data catalogues.
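The two mitigations in this example (redaction for single-premises substations and a 24 hour delay) could be applied in a small publication step such as the Python sketch below. The column names, the `premises_served` field and the sample rows are hypothetical, and the script is illustrative rather than an operational pipeline.

```python
import csv
import io
from datetime import datetime, timedelta

DELAY = timedelta(hours=24)
FIELDS = ["substation_id", "measured_at", "available_capacity_kva",
          "total_capacity_kva"]


def prepare_for_publication(rows, now):
    """Hold back recent rows and redact single-premises substations."""
    published = []
    for row in rows:
        measured_at = datetime.fromisoformat(row["measured_at"])
        if now - measured_at < DELAY:
            continue  # too recent: would reveal near-live network status
        single = row["premises_served"] == 1
        published.append({
            "substation_id": row["substation_id"],
            "measured_at": row["measured_at"],
            "available_capacity_kva": row["available_capacity_kva"],
            # Available capacity only where one premises is served.
            "total_capacity_kva": "" if single else row["total_capacity_kva"],
        })
    return published


def to_csv(published):
    """Render the published rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(published)
    return buffer.getvalue()


# Hypothetical rows (illustrative values only).
rows = [
    {"substation_id": "S1", "measured_at": "2020-01-02T11:00:00",
     "premises_served": 120, "available_capacity_kva": 200,
     "total_capacity_kva": 500},
    {"substation_id": "S2", "measured_at": "2020-01-02T11:00:00",
     "premises_served": 1, "available_capacity_kva": 30,
     "total_capacity_kva": 100},
    {"substation_id": "S3", "measured_at": "2020-01-03T11:30:00",
     "premises_served": 40, "available_capacity_kva": 80,
     "total_capacity_kva": 315},
]
published = prepare_for_publication(rows, now=datetime(2020, 1, 3, 12, 0))
csv_text = to_csv(published)
```

The `premises_served` count is used only to decide on redaction and is not written to the published file.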
Challenge and Review
A feedback form will be provided on the website to enable challenge.
An innovator has indicated that for their specific use case they need to be able to access the data in near real time. In order to make this possible the decision has been made to provide a shared access data dashboard and Application Programming Interface (API) such that verified actors can gain access to the data in near real time. This has wider value for the technicians and engineers within the company as it provides high quality, low latency data which they can use in their roles.