OR/13/042 Summary of best practice
Hughes, A G, Harpham, Q K, Riddick, A T, Royse, K R, and Singh, A. 2013. Meta-model: ensuring the widespread access to metadata and data for environmental models - scoping report. British Geological Survey External Report, OR/13/042.
Current metadata standards
Since datasets form the boundary condition inputs and resultant outputs of modelling studies, the authors consider that a study of metadata for models should also include that of the metadata for the supporting datasets. Indeed, it is expected that the two will be very similar and derived from the same base standards.
We begin by looking at the metadata elements associated with various forms of typical environmental monitoring data divided into categories based around the geospatial structure of the dataset. This allows modelling data to be included easily alongside that of its measured equivalents. Early versions of the Climate Science Modelling Language (CSML) identified 13 geospatial data feature types for describing measured and modelled datasets. CSML version 3 offers 10 (OGC, 2011[1]), with an additional ‘observation’ feature type. Version 3 feature types are specialisations of the O&M model (ISO19156) with the exception of ‘observation’ which is a direct usage. The 10 feature types are as follows (OGC, 2011[1]):
- Point — A single observation at a point e.g. a single raingauge measurement
- PointSeries — A time series of single datum observations at a fixed location e.g. a stream of measurements of a single parameter from a tide gauge, buoy or weather station
- Profile — An observation of a parameter along a vertical line in space e.g. a wind sounding or radiosonde
- ProfileSeries — A time-series of profiles on fixed vertical levels at a fixed location e.g. vertical radar timeseries
- Grid — Single time-snapshot of a gridded field
- GridSeries — Time-series of gridded parameter fields e.g. a numerical weather prediction model output
- Trajectory — An observation along a discrete path in time and space e.g. aerosol measurements along an aircraft’s flight path
- Section — A series of profiles from a trajectory in time and space e.g. marine Conductivity, Temperature and Depth (CTD) measurements along a ship’s track
- Swath — Two-dimensional grid of data along a satellite ground path e.g. AVHRR satellite imagery
- ScanningRadar — Backscatter profiles along a look direction at fixed elevation but rotating in azimuth e.g. a weather radar output
Data produced according to each of these feature types are typically accompanied either by a separate metadata file or by metadata incorporated into the data file itself. This metadata can be divided into a number of categories, some more closely related to information required to find the dataset (‘discovery metadata’) and some more closely related to information required in order to use the dataset once it has been obtained (‘use metadata’). Of course, some information is useful for both locating and using the dataset. Table 2 gives examples of the metadata elements given by three environmental datasets (one set of model results and two sets of sensor data); together they represent six of the ten geospatial feature types given in CSML version 3:
- NetCDF CF: Meteorological model results stored as the Climate and Forecasting version of NetCDF (NetCDF CF) as part of the DRIHM project [www.unidata.ucar.edu/software/netcdf/]
- WaterML 2.0: Data served via a web service from the HydroServer instance at SDSC in San Diego [www.opengeospatial.org/standards/waterml]
- Satellite: Altimeter data from the Envisat mission stored as part of the GlobWave dataset, stored as NetCDF [www.unidata.ucar.edu/software/netcdf/]
MetaData | NetCDF CF DRIHM Implementation (CSML Grid and GridSeries) | WaterML 2.0 (CSML Point and PointSeries) | Satellite NetCDF Envisat Altimeter Data (CSML Swath) |
Ownership and Contact Details | :email, :institution | gmd:organisationName, gmd:pointOfContact, gmd:individualName, gmd:role, gmd:onlineResource, gmd:address, gmd:phone, gmd:electronicMailAddress | :institution, :contact, :processing_center, :source_provider |
Title and Abstract | :title, :comment, :filename | gmd:title, gmd:abstract, gmd:citation | :title |
Provenance | :projectinfo, :algorithm, :history, :source, :model_name, :model_description | wml2:generationSystem, om:featureOfInterest, wml2:ObservationProcess, wml2:processing, wml2:processType | :source, :project, :history, :mission_name, :source_name |
Reference Dates and Times | :calendar, :time, :time_bounds, :time_date, :datestart, :dateend, :filedate, :julday, :julyear, :GMT, :dt, :units | wml2:generationDate, om:phenomenonTime, om:resultTime, gml:TimePeriod, gml:TimeInstant, gml:timePosition, gml:beginPosition, gml:endPosition, wml2:time, wml2:temporalExtent, wml2:aggregationDuration | :start_date, :stop_date, :calendar |
Spatial Extent, Geometry, SRS | :coordinates, :grid_mapping_name, :dx, :dy, :griddim_bottomtop, :griddim_southnorth, :griddim_westeast, :units, :epsg_code, :bounding_box, :inverse_flattening, :semi_major_axis, :longitude_of_prime_meridian, :grid_mapping_name | wml2:samplingFeatureMember, sams:shape, gml:Point, gml:pos srsName, wml2:MonitoringPoint, wml/siteProperty/elevation_m | :comment, :coordinates, :scale_factor, :add_offset |
Phenomenon/Parameters | :standard_name, :long_name | om:observedProperty, wml2:parameter, wml2:qualifier, wml2:processReference, wml2:sampledMedium | :altimeter_sensor_name, :radiometer_sensor_name, :long_name, :standard_name, :calibration_formula, :calibration_reference |
Units of Measurement | :units | wml2:uom | :units |
Technical | :cell_method, :_Netcdf4Dimid, :_FillValue | wml2:interpolationType, wml2:source, wml2:cumulative, name xlink:title="noDataValue", wml2:aggregationDuration | :software_version, :source_software, :source_version, :acq_station_name, :cycle_number, :pass_number, :equator_crossing_time, :equator_crossing_longitude, :product_version, :_FillValue, :flag_masks, :flag_meanings, :valid_min, :valid_max |
Quality Measures | | wml2:quality | :quality_flag |
Licence and IP | | | |
Standards Definitions | :references | gmd:language, gmd:CI_RoleCode, gml:Dictionary, gml:dictionaryEntry | :references |
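The grouping used in Table 2 can be expressed as a lookup from a format's attribute names to the common metadata categories. The following is a minimal Python sketch using only a handful of attributes from the Satellite column of Table 2. The names `SATELLITE_CATEGORIES` and `group_by_category`, the `Uncategorised` bucket and all sample values are hypothetical and are not part of NetCDF or any of the standards discussed here.

```python
from collections import defaultdict

# Partial, illustrative mapping of attribute names (from the Satellite column
# of Table 2) to the common metadata categories. Not an official crosswalk.
SATELLITE_CATEGORIES = {
    "institution": "Ownership and Contact Details",
    "contact": "Ownership and Contact Details",
    "title": "Title and Abstract",
    "mission_name": "Provenance",
    "history": "Provenance",
    "start_date": "Reference Dates and Times",
    "stop_date": "Reference Dates and Times",
    "cycle_number": "Technical",
    "references": "Standards Definitions",
}


def group_by_category(attributes, mapping):
    """Group a flat attribute dictionary under the common categories.

    Attributes with no entry in the mapping are collected under
    'Uncategorised' so that nothing is silently dropped.
    """
    grouped = defaultdict(dict)
    for name, value in attributes.items():
        # Table 2 writes attributes with a leading colon; strip it for lookup.
        category = mapping.get(name.lstrip(":"), "Uncategorised")
        grouped[category][name] = value
    return dict(grouped)


# Hypothetical attribute values, for illustration only.
sample = {
    ":title": "Example altimeter product",
    ":institution": "Example agency",
    ":start_date": "2010-01-01",
    ":cycle_number": 85,
    ":custom_note": "not listed in Table 2",
}
grouped = group_by_category(sample, SATELLITE_CATEGORIES)
```

Keeping unmapped attributes in their own bucket makes gaps in the mapping visible, which is useful when the same categories are later compared against a standard such as ISO19115.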
Table 3 compares these common metadata categories, established by looking at reasonably mature implementations of modelled and measured data, with the core metadata for geographic datasets found in ISO19115[2] [OGC, 2003; Table 4]. This core metadata from ISO19115[2] comprises mandatory elements (M), recommended but optional elements (O), and elements that are mandatory under certain conditions (C).
MetaData | ISO19115[2] |
Ownership and Contact Details | Dataset responsible party (O) (MD_Metadata > MD_DataIdentification.pointOfContact > CI_ResponsibleParty) |
Title and Abstract | Dataset title (M) (MD_Metadata > MD_DataIdentification.citation > CI_Citation.title); Dataset topic category (M) (MD_Metadata > MD_DataIdentification.topicCategory); Abstract describing the dataset (M) (MD_Metadata > MD_DataIdentification.abstract) |
Provenance | Lineage (O) (MD_Metadata > DQ_DataQuality.lineage > LI_Lineage) |
Reference Dates and Times | Dataset reference date (M) (MD_Metadata > MD_DataIdentification.citation > CI_Citation.date); Additional extent information for the dataset (temporal) (O) (MD_Metadata > MD_DataIdentification.extent > EX_Extent > EX_TemporalExtent or EX_VerticalExtent); Reference system (O) (MD_Metadata > MD_ReferenceSystem) |
Spatial Extent, Geometry, SRS | Geographic location of the dataset (by four coordinates or by geographic identifier) (C) (MD_Metadata > MD_DataIdentification.extent > EX_Extent > EX_GeographicExtent > EX_GeographicBoundingBox or EX_GeographicDescription); Spatial resolution of the dataset (O) (MD_Metadata > MD_DataIdentification.spatialResolution > MD_Resolution.equivalentScale or MD_Resolution.distance); Additional extent information for the dataset (vertical) (O) (MD_Metadata > MD_DataIdentification.extent > EX_Extent > EX_TemporalExtent or EX_VerticalExtent); Spatial representation type (O) (MD_Metadata > MD_DataIdentification.spatialRepresentationType); Reference system (O) (MD_Metadata > MD_ReferenceSystem) |
Phenomenon/Parameters | |
Units of Measurement | |
Technical | Dataset character set (C) (MD_Metadata > MD_DataIdentification.characterSet); Distribution format (O) (MD_Metadata > MD_Distribution > MD_Format.name and MD_Format.version); On-line resource (O) (MD_Metadata > MD_Distribution > MD_DigitalTransferOption.onLine > CI_OnlineResource); Metadata file identifier (O) (MD_Metadata.fileIdentifier); Dataset language (M) (MD_Metadata > MD_DataIdentification.language) |
Quality Measures | |
Licence and IP | |
Standards Definitions | Metadata standard name (O) (MD_Metadata.metadataStandardName); Metadata standard version (O) (MD_Metadata.metadataStandardVersion); Metadata language (C) (MD_Metadata.language); Metadata character set (C) (MD_Metadata.characterSet); Metadata point of contact (M) (MD_Metadata.contact > CI_ResponsibleParty); Metadata date stamp (M) (MD_Metadata.dateStamp) |
It can be seen that all but four of the common metadata categories are covered by what is considered ‘core’ ISO19115[2]. The four exceptions are Phenomenon/Parameters, Units of Measurement, Quality Measures, and Licence and IP. Of these, Quality Measures can be covered by the optional element DQ_DataQuality, and Licence and IP by the optional constraints element MD_Constraints.
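The elements marked mandatory (M) in Table 3 lend themselves to a simple completeness check on a metadata record. The sketch below is illustrative only: the short labels used as dictionary keys, the function `missing_mandatory` and the sample record are hypothetical, while the element paths are copied from Table 3. It does not validate against the ISO19115 schema itself.

```python
# Core elements marked mandatory (M) in Table 3. Keys are hypothetical short
# labels; values are the ISO19115 paths as given in the table.
MANDATORY_CORE = {
    "dataset_title": "MD_Metadata > MD_DataIdentification.citation > CI_Citation.title",
    "dataset_topic_category": "MD_Metadata > MD_DataIdentification.topicCategory",
    "abstract": "MD_Metadata > MD_DataIdentification.abstract",
    "dataset_reference_date": "MD_Metadata > MD_DataIdentification.citation > CI_Citation.date",
    "dataset_language": "MD_Metadata > MD_DataIdentification.language",
    "metadata_point_of_contact": "MD_Metadata.contact > CI_ResponsibleParty",
    "metadata_date_stamp": "MD_Metadata.dateStamp",
}


def missing_mandatory(record):
    """Return the labels of mandatory core elements that are absent or empty."""
    return sorted(label for label in MANDATORY_CORE if not record.get(label))


# Hypothetical, deliberately incomplete record.
record = {
    "dataset_title": "Example groundwater model output",
    "abstract": "Hypothetical abstract text.",
    "dataset_language": "eng",
    "dataset_topic_category": "",  # present but empty, so reported as missing
    "metadata_date_stamp": "2013-01-01",
}
missing = missing_mandatory(record)
for label in missing:
    print(f"Missing {label}: {MANDATORY_CORE[label]}")
```

Conditional (C) elements are left out because whether they apply depends on the dataset, which a flat check of this kind cannot decide.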
Handling physical, chemical or biological parameters and their units of measurement is best achieved through the use of phenomenon and unit dictionaries such as the Climate and Forecasting (CF) standard names [OGC, 2011[1]] (see Table 4) or the BODC parameter code and units definitions [www.bodc.ac.uk/] (see Table 5).
Entry ID | Canonical Units | Description |
wave_frequency | s-1 | Frequency is the number of oscillations of a wave per unit time |
BODC Parameter Code | Units | Definition | Minimum Permissible Value | Maximum Permissible Value | Absent Data Value |
CTMPZZ01 | Degrees Celsius | Temperature of the atmosphere | -100 | 60 | -999 |
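A dictionary entry of the kind shown in Table 5 carries enough information to screen incoming values. The following Python sketch assumes the single CTMPZZ01 row from Table 5; the class `ParameterDefinition`, the function `check_value` and the three status strings are hypothetical names introduced for illustration and are not part of the BODC vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterDefinition:
    """One row of a parameter dictionary, following the columns of Table 5."""
    code: str
    units: str
    definition: str
    minimum: float
    maximum: float
    absent_value: float


# Values copied from the CTMPZZ01 row of Table 5.
CTMPZZ01 = ParameterDefinition(
    code="CTMPZZ01",
    units="Degrees Celsius",
    definition="Temperature of the atmosphere",
    minimum=-100.0,
    maximum=60.0,
    absent_value=-999.0,
)


def check_value(value, definition):
    """Classify a reading as 'absent', 'valid' or 'out_of_range'."""
    # The absent-data value is tested first because it lies outside the
    # permissible range and would otherwise be reported as out of range.
    if value == definition.absent_value:
        return "absent"
    if definition.minimum <= value <= definition.maximum:
        return "valid"
    return "out_of_range"


# Hypothetical readings, for illustration only.
statuses = [check_value(v, CTMPZZ01) for v in (12.5, -999, 75.0)]
```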
Current usage
There is a significant number of tools, initiatives and technologies related to metadata. A literature search is summarised in Appendix 2 - Summary of current approaches under the following headings:
- Metadata tools
- Repository technologies
- Storage technologies
- Data preservation technologies — summary and main trends
- Data discovery and access
- Technologies and frameworks for processing data
The main findings from this search include:
- Availability of tools for producing metadata in XML format, and the use of the open-source GeoNetwork, which is used within the FluidEarth Model Catalogue to display the location of model instances
- Availability of systems to store digital objects, such as FEDORA
- NERC has put significant resources into a data store under the JASMIN project
- Use of Open Geospatial Consortium standards to define catalogue services
- There are a significant number of portals including the INSPIRE (see below) Geoportal to display spatial data
- Once models become more readily available, there is the potential for them to be linked using frameworks such as ESMF and standards such as OpenMI
INSPIRE
The INSPIRE Directive aims at creating an infrastructure for geographical information interoperability in Europe. In this context data holders should publish their geographic datasets through a range of Network Services. INSPIRE Transformation services provide a means to transform a given dataset by invoking a service that implements a standardised procedure on a remote machine. Typical examples of transformation services are the schema transformation, which transforms the structure of the input dataset, and the Coordinate Reference System (CRS) transformation, which can be used to bring together datasets based on different CRSs.
NERC initiatives
Data Catalogue Service (DCS)
The DCS (see www.data-search.nerc.ac.uk/) allows you to search a catalogue of metadata (information describing data) to discover and gain access to NERC's data holdings and information products. The metadata are prepared to a common NERC Metadata Standard and are provided to the catalogue by the NERC Data Centres.
Data providers create metadata documents describing data resources. These are published by each data provider to make them available for others to access. An automatic process gathers, or harvests, these documents from each data provider and ingests them into a database where they are stored alongside those from other data providers. Data providers have control over their publishing tool via the Data Providers Admin Interface. A web service carries out searches of this database in response to search requests received from a search interface, possibly hosted by a third party as part of a web portal. The web service returns results to the search interface, which presents them to the user. Search tools included in the search interface help the user construct search requests based on time periods, geographic areas and text terms from controlled vocabularies, provided by a vocab server.
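The harvest-and-search workflow described above can be sketched as a small in-memory catalogue. This is a simplified Python illustration and not the DCS implementation: the classes `MetadataRecord` and `Catalogue`, their fields and methods, and the sample records are all hypothetical, and a real service would add persistence, a network interface and a controlled vocabulary server.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class MetadataRecord:
    """A discovery metadata document published by one data provider."""
    identifier: str
    provider: str
    title: str
    keywords: frozenset
    start: date
    end: date
    bbox: tuple  # (west, south, east, north) in decimal degrees


def _overlaps(a, b):
    """True if two (west, south, east, north) boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Catalogue:
    """Central store holding records harvested from all providers."""

    def __init__(self):
        self._records = {}

    def harvest(self, published):
        """Ingest a provider's published records, replacing older copies."""
        for record in published:
            self._records[record.identifier] = record

    def search(self, term=None, period=None, area=None):
        """Filter by keyword, (start, end) period and bounding box."""
        hits = []
        for record in self._records.values():
            if term is not None and term.lower() not in {k.lower() for k in record.keywords}:
                continue
            if period is not None and (record.end < period[0] or record.start > period[1]):
                continue
            if area is not None and not _overlaps(record.bbox, area):
                continue
            hits.append(record)
        return sorted(hits, key=lambda r: r.identifier)


# Hypothetical records from two providers, for illustration only.
catalogue = Catalogue()
catalogue.harvest([MetadataRecord("a-001", "Provider A", "River flow gauging",
                                  frozenset({"hydrology", "river flow"}),
                                  date(2000, 1, 1), date(2005, 12, 31),
                                  (-2.0, 51.0, -1.0, 52.0))])
catalogue.harvest([MetadataRecord("b-001", "Provider B", "Borehole logs",
                                  frozenset({"geology"}),
                                  date(1990, 1, 1), date(1995, 12, 31),
                                  (-4.0, 55.0, -3.0, 56.0))])

hits = catalogue.search(term="Hydrology",
                        period=(date(2004, 1, 1), date(2006, 1, 1)),
                        area=(-1.5, 51.5, 0.0, 53.0))
```

Keying the store on the record identifier means that re-harvesting a provider updates its records in place rather than duplicating them.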

Lowland Catchment Research (LOCAR) Data Management
The LOCAR Data Centre was set up to manage the scientific data produced by the NERC LOCAR Thematic Programme, which finished in March 2006. The aim of the Data Centre was to create an integrated, quality controlled, quality assured database readily accessible to LOCAR scientists, and to the wider scientific community.

To create the database the Data Centre was responsible for specifying the procedures, formats and media in which data would be received from the field and disseminated to users, setting up a data management policy, and ensuring that data were held securely. The Data Centre actively sought out existing NERC and third party datasets, and was responsible for disseminating field data as they became available, and for storage and dissemination of the datasets created by LOCAR researchers.
Earth Science Academic Archive
The Earth Science Academic Archive (ESAA) has been set up as part of the National Geoscience Data Centre (NGDC) as a place to deposit the results of relevant Earth Science research. The ESAA accepts results from NERC research and any other similar research projects to ensure their long-term safekeeping and future use.
The Earth Science Academic Archive is responsible for:
- liaising with principal investigators and other NERC grant holders to ensure that appropriate data are offered to the NGDC
- selection of data for inclusion in the NGDC in liaison with BGS scientists and other stakeholders
- long-term curation and preservation of analogue and digital data (including samples)
- publicising the holdings and making available information on the web
Examples of types of data submitted to the ESAA:
- research reports
- photographs
- spreadsheet data
- figures and diagrams
- 3D models
It is essential that all data gatherers/generators provide appropriate metadata to their Data Centre, in line with current metadata standards, such as the ‘working standards’ provided by Holmes et al, 1999[3]. These ‘working standards’ are in turn derived from more comprehensive National Geospatial framework Archive and ISO standards.
References
- [1] OGC, 2011. Climate Science Modelling Language (CSML): Sampling Coverage Observations for the met/ocean domain, Open Geospatial Consortium, OGC 11–021.
- [2] ISO19115, 2003. Geographic information — Metadata. International Organization for Standardization, ref: ISO19115:2003(E).
- [3] Holmes, K A, Dobinson, A, Giles, J R A, Johnson, C C, Lawrence, D J D, and McInnes, J L. 1999. BGSgeoIDS Metadata — Issues Document. British Geological Survey Technical Report, WO/99/01R.