This article offers an overview of the data quality process involved in the production of HES data extracts.
[Diagram: simplified view of how HES data is extracted]
The diagram above is a simplified version of how HES is extracted. It can be broken down as follows:
The HES Data Quality team are responsible for cleaning the data, enabling the HES Output Universe to be created from the HES Input Universe.
In previous years, extracts were submitted quarterly (provisional data), with an additional extract, called the annual refresh (published data), at the end of quarter four.
Since April 2008 (the start of the 2008-09 year), these extracts have been taken from SUS on a monthly basis. Each extract is cumulative, containing all data submitted for the year so far: Month 1 contains only the data submitted for April, but Month 6 contains data submitted from April to September. One reason for this is that data may need to be added to an episode from earlier in the year, eg an episode may run for several months or an amendment may need to be made.
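The cumulative nature of the monthly extracts can be illustrated with a short Python sketch. It assumes a year of twelve extract months running from April (Month 1) to March (Month 12); the function name `months_covered` is hypothetical and is not part of any HES or SUS tooling.

```python
import calendar

# Assumption: the year runs April (Month 1) to March (Month 12).
YEAR_MONTHS = [4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3]


def months_covered(extract_month: int) -> list[str]:
    """Return the calendar months included in a cumulative monthly extract."""
    if not 1 <= extract_month <= 12:
        raise ValueError("extract_month must be between 1 and 12")
    # Each extract contains everything submitted for the year so far.
    return [calendar.month_name[m] for m in YEAR_MONTHS[:extract_month]]


print(months_covered(1))
print(months_covered(6))
```

Under these assumptions, the Month 1 extract covers April only, while the Month 6 extract covers April to September, so a late amendment to an April episode is picked up by any later extract.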
You can find details of the current HES submission deadlines on the Submission Deadlines page.
Within the HES Input Universe, cleaning is broken down into four stages:
1. Provider mapping
During this stage, any old or invalid provider codes are changed to, or merged into, new valid provider codes (using reference data based on information from the ODS website).
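As a rough illustration of provider mapping, the Python sketch below replaces old codes using a lookup table. The codes, the `provider_code` field name and the `map_provider` function are all invented for this example; the real mappings come from the reference data described above.

```python
# Hypothetical reference data: old or invalid provider code -> current valid code.
# These codes are made up for illustration only.
PROVIDER_MAP = {
    "OLD01": "NEW01",
    "OLD02": "NEW01",  # two old codes merged into one new code
}


def map_provider(code: str, mapping: dict[str, str] = PROVIDER_MAP) -> str:
    """Return the mapped provider code, or the original code if no mapping exists."""
    return mapping.get(code, code)


records = [
    {"provider_code": "OLD01"},
    {"provider_code": "OLD02"},
    {"provider_code": "NEW09"},  # already valid, so left unchanged
]

# Build new records rather than editing the submitted ones in place.
mapped = [{**r, "provider_code": map_provider(r["provider_code"])} for r in records]
print(mapped)
```

Codes without an entry in the lookup table pass through unchanged, which keeps the mapping step safe to run on data that is already valid.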
2. Automatic cleaning
During this stage a pre-defined list of cleaning rules that remove or correct common errors is worked through, including:
The introduction of XML validation is helping to improve data quality from the initial submission stage.
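The idea of working through a pre-defined list of cleaning rules can be sketched in Python as follows. The two rules shown (trimming whitespace and turning empty values into missing values) and the field names are hypothetical examples chosen for illustration; they are not taken from the actual HES cleaning rules.

```python
from typing import Callable, Optional

Record = dict[str, Optional[str]]
Rule = Callable[[Record], Record]


def strip_whitespace(record: Record) -> Record:
    """Hypothetical rule: trim leading and trailing spaces from every value."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def blank_to_none(record: Record) -> Record:
    """Hypothetical rule: treat empty strings as missing values."""
    return {k: (None if v == "" else v) for k, v in record.items()}


# The rules are applied in a fixed, pre-defined order.
CLEANING_RULES: list[Rule] = [strip_whitespace, blank_to_none]


def auto_clean(record: Record, rules: list[Rule] = CLEANING_RULES) -> Record:
    """Apply each cleaning rule in turn and return the cleaned record."""
    for rule in rules:
        record = rule(record)
    return record


print(auto_clean({"field_a": " 1 ", "field_b": "   "}))
```

Keeping each rule as a small function makes the order of application explicit, which matters here because the second rule relies on the first having already trimmed the values.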
3. Manual cleaning
During this stage:
4. Derivation
During this stage the following information is derived:
Full details of the cleaning process can be found in the Data Cleaning section.
Feedback
An important part of the cleaning process is feedback. The Data Quality team liaise with providers on a range of subjects, including:
They also:
The HES data dictionaries, cleaning rules files and the Data Quality Dashboard are good sources of information about HES fields and data quality.
Aims of the Data Quality team
The HES Data Quality team aims to:
Further information
If you have any questions about HES data quality or anything covered in this article, please contact the HES Data Quality team.