Metadata repository
A metadata repository is a data store for metadata. It aggregates a wide range of metadata from across your agency.
Bringing your individual repositories together into a central metadata repository can be beneficial. It allows consumers to search across all of your agency's available information from a single point.
TIP: Technologies such as extract, transform and load (ETL) tools often maintain their own metadata repositories. Don't forget to integrate these when creating a central repository.
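As a minimal sketch of the idea, the Python below loads records from two source repositories (for example, a data catalogue and an ETL tool) into one store keyed by identifier, so that a single search covers both. The `MetadataRecord` and `CentralRepository` names and their fields are hypothetical; a real central repository would also need a shared schema, conflict handling and persistence.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    identifier: str
    title: str
    source: str  # hypothetical field: which repository the record came from


class CentralRepository:
    """Hypothetical central store that aggregates records from many repositories."""

    def __init__(self):
        self._records = {}  # identifier -> MetadataRecord

    def load(self, records):
        for record in records:
            # A later load replaces an earlier record with the same identifier
            self._records[record.identifier] = record

    def search(self, term):
        """Case-insensitive title search across every loaded source."""
        term = term.lower()
        return [r for r in self._records.values() if term in r.title.lower()]


repo = CentralRepository()
repo.load([MetadataRecord("ds-1", "Road crash statistics", "data catalogue")])
repo.load([MetadataRecord("etl-7", "Road crash nightly load", "ETL tool")])
print([r.identifier for r in repo.search("road crash")])
```

Both records are returned by one query even though they originated in different repositories, which is the "one point" of access described above.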
Metadata harvesting
Metadata harvesting tools and protocols can assist agencies in indexing large batches of metadata records.
Metadata harvesting uses automated tools to collect metadata descriptions from diverse sources such as catalogues, websites and other repositories. The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an example of a protocol, or Application Programming Interface (API), for harvesting metadata. It supports aggregating metadata from multiple sources into one collection.
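As a minimal sketch, assuming a repository exposes an OAI-PMH interface at the hypothetical base URL `https://example.org/oai`, a harvester can issue `ListRecords` requests in the `oai_dc` (simple Dublin Core) format, which OAI-PMH repositories must support, and follow `resumptionToken` values until the list is complete. The function names `list_records_url`, `fetch_url` and `harvest` are illustrative only; a production harvester would also need to handle OAI-PMH error responses, deleted records and network failures.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: only the verb accompanies it
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urllib.parse.urlencode(params)


def fetch_url(url):
    """Download one response page over HTTP."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(base_url, fetch=fetch_url):
    """Yield (identifier, title) for every record, page by page."""
    token = None
    while True:
        root = ET.fromstring(fetch(list_records_url(base_url, token)))
        for record in root.iterfind(".//oai:record", NS):
            identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
            title = record.findtext(".//dc:title", namespaces=NS)
            yield identifier, title
        token = root.findtext(".//oai:resumptionToken", namespaces=NS)
        if not token:  # an absent or empty token means the list is complete
            break
```

Passing a different `fetch` function makes it possible to test the paging logic without a live repository.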
Metadata for publication and exchange
Your agency can enhance metadata management to support publication and exchange of data by:
- implementing centralised metadata repositories
- updating metadata files to align with standards used at other agencies.
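Aligning metadata with standards used at other agencies often comes down to a crosswalk: a mapping from local field names to the elements of a shared standard. The sketch below assumes Dublin Core as the target standard; the local field names in `CROSSWALK` and the function `align_to_standard` are hypothetical. Fields with no mapping are set aside for review rather than silently dropped.

```python
# Hypothetical crosswalk from an agency's local field names to Dublin Core elements
CROSSWALK = {
    "dataset_name": "title",
    "owner": "creator",
    "summary": "description",
    "keywords": "subject",
    "last_updated": "date",
}


def align_to_standard(local_record, crosswalk=CROSSWALK):
    """Rename fields using the crosswalk; return (aligned, unmapped)."""
    aligned, unmapped = {}, {}
    for field_name, value in local_record.items():
        target = crosswalk.get(field_name)
        if target is None:
            unmapped[field_name] = value  # keep for manual review
        else:
            aligned[target] = value
    return aligned, unmapped


aligned, unmapped = align_to_standard(
    {"dataset_name": "Road crashes", "owner": "Transport", "internal_code": "X1"}
)
print(aligned)
print(unmapped)
```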
There are many approaches to designing and implementing metadata exchange across your agency, and your metadata architecture can be planned to facilitate this exchange. Consult your internal specialists to determine which solutions work best for your agency.
Common architectural approaches include:
- Centralised metadata architecture
  This involves copying metadata from other applications and replicating it in a centralised repository. Users can perform global searches through a single application.
  - Strengths:
    - metadata is accessed from one point
    - opportunity to improve metadata quality by aggregating and transforming metadata sources into one standard
    - fast query retrieval
    - manual metadata entry is possible.
  - Limitations:
    - complex maintenance and version control
    - tasks such as rapid replication or synchronisation of metadata are challenging
    - custom code may be required to integrate metadata into the centralised repository's schema.
- Distributed metadata architecture
  This consists of an intermediary application that retrieves metadata from the source repositories in real time, when a user requests information. There is no centralised repository. The intermediary application uses source catalogues to determine which repository to request information from.
  - Strengths:
    - no separate maintenance or version control of copied metadata is required, as the metadata comes directly from the source
    - processing is reduced, as there is no metadata replication and queries are distributed among different sources.
  - Limitations:
    - metadata sources may not adhere to the same standards, so custom code may be required to retrieve the different metadata structures
    - capturing additional metadata from external repositories can be difficult.
- Hybrid metadata architecture
  This combines the centralised and distributed metadata architectures. It provides real-time access to source metadata and also allows manual entry of metadata.
  - Strengths:
    - metadata is accessed from one point in real time
    - metadata can be added to the repository
    - manual metadata entry is possible
    - version control can be applied to the metadata you add
    - users can improve metadata quality.
  - Limitations:
    - dependent on source metadata repositories being available.
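The distributed and hybrid approaches described above can be contrasted in a short sketch. It assumes that each source repository can be modelled as a function answering a search in real time, and that the intermediary's source catalogue is a simple mapping from subject area to repository name. All class, method and field names here are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical model: a source repository answers a search query in real time
Source = Callable[[str], List[dict]]


class DistributedCatalogue:
    """Intermediary that stores no copy of the metadata."""

    def __init__(self, sources: Dict[str, Source], routing: Dict[str, str]):
        self.sources = sources  # repository name -> query function
        self.routing = routing  # hypothetical source catalogue: subject -> repository

    def search(self, subject: str, term: str) -> List[dict]:
        repo = self.routing.get(subject)
        if repo is None:
            return []
        # Query the source at request time, so results are always current
        return self.sources[repo](term)


class HybridCatalogue(DistributedCatalogue):
    """Adds a local store for manually entered metadata, kept as versions."""

    def __init__(self, sources: Dict[str, Source], routing: Dict[str, str]):
        super().__init__(sources, routing)
        self.local: Dict[str, List[dict]] = {}  # identifier -> list of versions

    def annotate(self, identifier: str, extra: dict) -> int:
        versions = self.local.setdefault(identifier, [])
        versions.append(extra)
        return len(versions)  # version number of the new entry

    def search(self, subject: str, term: str) -> List[dict]:
        results = []
        for record in super().search(subject, term):
            versions = self.local.get(record["identifier"])
            # Overlay the latest locally added metadata on the live source record
            results.append({**record, **versions[-1]} if versions else record)
        return results


def transport(term: str) -> List[dict]:
    # Stand-in for a live source repository
    return [{"identifier": "t1", "title": "Road crashes"}] if "road" in term.lower() else []


catalogue = HybridCatalogue({"transport": transport}, {"roads": "transport"})
catalogue.annotate("t1", {"steward": "Data team"})
print(catalogue.search("roads", "road"))
```

Because every search calls the source directly, an unavailable source makes the search fail, which reflects the main limitation of both approaches.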