
Automatic data cleaning and derivation rules
Cleaning and derivation rules for HES have been developed over time and continue to evolve to improve the dataset. The rules are described in the Related documents section of this page and are also available in the HES data dictionaries.
These rules have two main purposes:
To clean common and obvious data quality errors
For example, Rule #0150 looks for evidence that a Birth Episode (CDS type 120) has been incorrectly submitted to SUS as a General Episode (CDS type 130). If such evidence is found, the Episode Type of the record is altered to reflect this.
Without this clean, the number of birth records (Episode Type 3) in HES would tend to be lower than the actual number of births taking place. Correcting these records improves the value of HES as a statistical data source.
To derive additional data items to populate the HES dataset
For example, Rule #1200 uses the postcode from each submitted CDS record to derive additional geographical data items relating to the episode of care. HES uses reference data from the ONS Postcode Directory to derive data items such as the Parliamentary Constituency or Strategic Health Authority of the patient's residence. This allows record-level data to be aggregated easily for spatial analysis.
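The two kinds of rule described above can be illustrated with a minimal Python sketch. It is not the HES implementation: the Episode record, its field names, the evidence_flag field, the has_birth_evidence check and the directory lookup table are all hypothetical, and the real criteria for each rule are those set out in the rule documentation. Only the CDS type codes (120, 130) and Episode Type 3 are taken from the text above.

```python
from dataclasses import dataclass, replace
from typing import Callable, Mapping, Optional

# Codes quoted in the text above.
CDS_BIRTH_EPISODE = "120"
CDS_GENERAL_EPISODE = "130"
EPISODE_TYPE_BIRTH = "3"


@dataclass(frozen=True)
class Episode:
    """Hypothetical, heavily simplified record; not the real HES layout."""
    cds_type: str
    episode_type: str
    postcode: str
    # Stand-in for whatever the real Rule #0150 inspects (assumption).
    evidence_flag: bool = False
    # Derived items, left empty until a derivation rule fills them in.
    parliamentary_constituency: Optional[str] = None
    strategic_health_authority: Optional[str] = None


def clean_birth_episode(
    ep: Episode, has_birth_evidence: Callable[[Episode], bool]
) -> Episode:
    """Cleaning rule in the style of Rule #0150.

    If a record submitted as a General Episode shows evidence of being a
    birth, return a copy with the Episode Type set to birth. The evidence
    test is supplied by the caller because it is not specified here.
    """
    if ep.cds_type == CDS_GENERAL_EPISODE and has_birth_evidence(ep):
        return replace(ep, episode_type=EPISODE_TYPE_BIRTH)
    return ep


def normalise_postcode(postcode: str) -> str:
    """Remove whitespace and upper-case so lookups use a consistent key."""
    return "".join(postcode.split()).upper()


def derive_geography(
    ep: Episode, directory: Mapping[str, Mapping[str, str]]
) -> Episode:
    """Derivation rule in the style of Rule #1200.

    `directory` is a hypothetical extract of postcode reference data keyed
    by normalised postcode. Unmatched postcodes leave the record unchanged.
    """
    entry = directory.get(normalise_postcode(ep.postcode))
    if entry is None:
        return ep
    return replace(
        ep,
        parliamentary_constituency=entry.get("parliamentary_constituency"),
        strategic_health_authority=entry.get("strategic_health_authority"),
    )


if __name__ == "__main__":
    # Made-up reference data and record, for illustration only.
    directory = {
        "XX11XX": {
            "parliamentary_constituency": "Constituency A",
            "strategic_health_authority": "SHA X",
        }
    }
    # Episode type "1" for a general episode is an assumed value.
    submitted = Episode(
        cds_type="130", episode_type="1", postcode="xx1 1xx", evidence_flag=True
    )
    cleaned = clean_birth_episode(submitted, lambda e: e.evidence_flag)
    print(derive_geography(cleaned, directory))
```

Both functions return a new record rather than modifying the submitted one, so the original submission stays available for comparison. Because the geography is attached to each record, counting episodes by constituency or health authority becomes a simple group-by over the derived fields.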