OR/13/042 Summary of findings and proposed work

Hughes, A G, Harpham, Q K, Riddick, A T, Royse, K R, and Singh, A. 2013. Meta-model: ensuring the widespread access to metadata and data for environmental models - scoping report. British Geological Survey External Report, OR/13/042.

Summary of findings

The following summarises the main findings of the work, the gaps that have been identified and how they could be tackled.

What does exist?

There are a significant number of standards for both discovery and technical metadata. There is also a range of services by which metadata can be recorded and the data stored alongside it. NERC itself puts a significant amount of effort into storing data and model results and making the metadata available. For example, there are seven Data Centres, and the Data Catalogue Service (DCS) can be used to search metadata for datasets stored in the NERC data centres.

What is used?

Whilst a significant amount of time and effort has been put into standards, their use is variable. There are a number of different standards, which are mainly related to ISO standards, WaterML, GEMINI and MEDIN, and climate-based standards, as well as bespoke standards for data, but there is a lack of formal standards for model metadata. Storage of data and its associated metadata is facilitated via the NERC data centres, with a reasonable uptake.

What gaps are there?

Whilst the standards and approaches for discovery and technical metadata for data are well advanced and, in theory, well used, there are a number of issues:

  • Recognition of what the user wants rather than what the data manager feels is required.
  • Consolidation of discovery metadata schema based on ISO19115[1]
  • Recording of different file formats, and tools to ease transfer between file formats
  • Retrospective capture of metadata for data and models
  • Incorporation of time based information into metadata

However, for model metadata the situation is less well advanced. There is no internationally recognised standard for model metadata, which should include, but not necessarily be limited to:

  • Model code and version
  • Code Guardian: who they are and their contact details
  • Links to further information (URLs to papers, manuals, etc.)
  • Details on how to run the models, etc.
  • Spatial extent of the model instance
  • Information on mixing data of different types (observed and modelled).
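
As a minimal sketch of how the items listed above might be held in a single record, the following Python dataclass could be used. All class, field and method names are hypothetical and are not drawn from any existing standard, and the example values are placeholders.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional, Tuple


@dataclass
class ModelMetadata:
    """Illustrative model metadata record; every field name is an assumption."""
    code_name: str            # model code
    code_version: str         # version of the model code
    guardian_name: str        # Code Guardian: who they are
    guardian_contact: str     # Code Guardian: contact details
    links: List[str] = field(default_factory=list)  # URLs to papers, manuals, etc.
    run_instructions: str = ""                       # details on how to run the model
    # Spatial extent of the model instance as (west, south, east, north)
    spatial_extent: Optional[Tuple[float, float, float, float]] = None
    # Types of data mixed in the model, e.g. "observed" and "modelled"
    data_types: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are empty or unset."""
        return [
            name
            for name, value in asdict(self).items()
            if value in ("", None, [])
        ]


# Placeholder values for illustration only
record = ModelMetadata(
    code_name="ExampleFlow",
    code_version="1.0",
    guardian_name="A. Modeller",
    guardian_contact="modeller@example.org",
    spatial_extent=(-2.0, 51.0, 0.5, 52.5),
    data_types=["observed", "modelled"],
)
print(record.missing_fields())
```

A check such as `missing_fields` could support retrospective capture by showing which items still need to be filled in for an existing model.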

Other considerations include data quality and uncertainty, an assessment of which needs to be recorded to enable model uncertainty to be quantified, and the storage of the models themselves. The latter could be either the model code (via standard repositories) or the executable.

How could they be filled?

To assist the successful uptake of the storage and discovery of data and models, the following activities are required:

  • Development of a metadata standard for models based on ISO19115[1]
  • Creation or extension of a tool to record metadata including for time based data
    • Use of NERC’s DCS to store and serve discovery metadata for models
  • File conversion tools made more readily available
  • Provide for storage of models in an accessible form: code and executable

As well as this, there are a number of initiatives that could provide tools and techniques to fill these gaps.

Details of activities

There is a need for a system that allows the storage and interrogation of metadata for model codes and their instances. A project is envisaged that would build on existing metadata standards (i.e. ISO19115[1]) to provide a suitable standard that could be used in conjunction with existing tools to provide a system to store model metadata. The development of this system should be undertaken in conjunction with suitable project partners, for example the Environment Agency (EA) and water companies. The work should be undertaken in conjunction with the NERC SIS ‘Model Code’ project. Whilst this initiative is using climate models as its example, it could be extended to include models of the terrestrial water environment, i.e. hydrologic and hydraulic models. The likely activities in this project are outlined below.

Activity 1 — Metadata standards for model discovery and use

The initial task will be to determine the metadata standard to be used for data along with the metadata standard to be used for models (as an extension of the standard used for data). It is likely to be related to ISO standards and be INSPIRE compliant, so recommendations for extensions to the ISO19115[1] discovery metadata standard will be developed in prototype form. This could build on, for example, the user consultation information gained during this scoping study. A suitable schema for technical/descriptive metadata will also be developed to support the use of models (as distinct from their discovery).
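
The idea of a model standard that extends the data standard can be sketched as a two-tier check: every record needs the discovery elements, and model records need an additional set. The element names below are hypothetical placeholders and are not taken from ISO19115 or any other standard.

```python
from typing import Dict, List

# Hypothetical element names, NOT taken from ISO19115 or any other standard
DISCOVERY_ELEMENTS = ["title", "abstract", "contact", "spatial_extent", "temporal_extent"]
MODEL_EXTENSION_ELEMENTS = ["model_code", "model_version", "run_instructions"]


def check_record(record: Dict[str, object], is_model: bool) -> List[str]:
    """List the required elements that are absent or empty in a metadata record.

    Data records need only the discovery elements; model records need the
    discovery elements plus the model extension.
    """
    required = list(DISCOVERY_ELEMENTS)
    if is_model:
        required += MODEL_EXTENSION_ELEMENTS
    return [name for name in required if not record.get(name)]


# Placeholder record for illustration only
example = {
    "title": "Example catchment model",
    "abstract": "Placeholder abstract.",
    "contact": "modeller@example.org",
    "spatial_extent": (-2.0, 51.0, 0.5, 52.5),
    "model_code": "ExampleFlow",
}
print(check_record(example, is_model=False))
print(check_record(example, is_model=True))
```

Keeping the model elements as a separate list reflects the intention that the model standard extends, rather than replaces, the standard used for data.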

If the ISO standard is used, there is a need to investigate at an early stage how to progress the addition of extensions with the relevant ISO committees.

Activity 2 — Stakeholder workshop and initial engagement

Establish discipline-specific and cross-discipline stakeholder focus groups to review the proposed schemes on a short timescale. These could include, for example, commercial users and relevant NERC data centre staff. In association with the stakeholder focus groups, develop user requirements for capturing and accessing model metadata (what applications are needed).

Liaise with NERC’s Model Code project via the Science and Information Strategy Board.

Activity 3 — Investigate feasibility of approach

Identify test-bed models and datasets to test the proposed scheme and applications. Develop a prototype application to allow searching of selected models, with input from NERC data centres such as NGDC and EIDC. Develop tools to aid metadata capture, based on user requirements, possibly focusing on a couple of common modelling environments.

Activity 4 — Cataloguing technology for model metadata

Building on existing technology such as the FluidEarth Model Catalogue and NERC’s DCS, a model catalogue will be developed. This will use a mix of input form and map based searches to enable the user to find model codes and their instances.
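
A minimal sketch of how a form-based (text) search and a map-based (bounding box) search might be combined is given below. It is not based on the FluidEarth Model Catalogue or the DCS; the `CatalogueEntry` class, the `search` function and the catalogue contents are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north) in degrees


@dataclass
class CatalogueEntry:
    """Hypothetical catalogue entry for one model instance."""
    title: str
    model_code: str
    extent: BBox


def boxes_intersect(a: BBox, b: BBox) -> bool:
    """True if two bounding boxes overlap or touch (no date-line handling)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(
    entries: List[CatalogueEntry],
    text: Optional[str] = None,
    area: Optional[BBox] = None,
) -> List[CatalogueEntry]:
    """Return entries matching the form text and/or the map search area."""
    results = []
    for entry in entries:
        searchable = (entry.title + " " + entry.model_code).lower()
        if text and text.lower() not in searchable:
            continue  # fails the form-based (text) criterion
        if area and not boxes_intersect(entry.extent, area):
            continue  # fails the map-based (spatial) criterion
        results.append(entry)
    return results


# Placeholder catalogue for illustration only
catalogue = [
    CatalogueEntry("Upper catchment groundwater model", "CodeA", (-2.5, 51.0, -1.0, 52.0)),
    CatalogueEntry("Coastal flood model", "CodeB", (0.5, 50.5, 1.5, 51.5)),
]
hits = search(catalogue, text="groundwater", area=(-2.0, 51.5, 0.0, 53.0))
print([entry.title for entry in hits])
```

Either criterion can be omitted, so the same function serves a form-only search, a map-only search, or both together.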

Alongside this, finalise the metadata scheme and liaise with ISO committees to get the standard extended or adopted.

Activity 5 — Investigate commercial feasibility

Application testing and release of appropriate model metadata capture applications. Create demonstration examples of how these datasets can be used to create commercial products, either by reprocessing and adding value, or by running further models against the data.

Activity 6 — Dissemination and stakeholder feedback

Dissemination of activities will be through the project partners' usual channels, e.g. FluidEarth, OpenMI, OGC, BGS webpages, EA standard processes, etc. This will be supplemented by using the stakeholder group developed in Activity 2. Alongside this, one or two showcase examples of the data in action will be created and used as exemplars to show the utility of the approach. To complement this, a set of quotations from opinion formers will be used to build the impact case.

Set up a user group to assist project direction, which will include project staff, the Environment Agency, water companies such as Thames Water, and a representative of NERC’s Model Code project. This will help steer the project.

Activity 7 — Project management

The project will be managed using the internal procedures of each organisation, for example PRINCE2. A project lead will be identified who will have overall responsibility for delivery, liaise with the funding body and ensure proper communication with the project partners.

References

  1. ISO19115, 2003. Geographic information — Metadata. International Standards Organisation, ref: ISO19115:2003(E).