Metadata for the web

Metadata is used to support the creation, management and evaluation of web content, and the delivery of content to different audiences. It is key to ensuring your online resources are discoverable and useable. Metadata can also improve transparency and trust in content by providing information about its provenance. Your approach to web metadata should be part of your agency’s wider metadata framework and based on approved terminology and goals.

AGLS Metadata Standard no longer in operation

Updated December 2023.

National Archives decommissioned the AGLS Metadata Standard in December 2023. Australian Government agencies are not required to use it to describe their web-based resources.

The AGLS Metadata Standard was originally developed to improve the visibility, availability and interoperability of online information. Its use was mandated for Australian Government agencies between 2000 and 2023. National Archives was responsible for overseeing the development and ongoing maintenance of the standard.

In 2022, National Archives decided to retire the AGLS Metadata Standard as it was no longer meeting its intended purpose and had low uptake. This was based on recommendations from a review that included consultation with a number of stakeholders.

The standard was published by Standards Australia in 2002 as AS 5044-2002 and in 2010 as AS 5044-2010 (reconfirmed in 2020). Standards Australia continues to manage AS 5044. The role of the National Archives as the maintenance agency has ended.

This advice provides technical information directed at agencies and other users of the AGLS. We do not advocate for any specific replacement of the AGLS.

Arrangements from 2024

Any agency that has identified a business value in using AGLS metadata can continue to use it at its own discretion. The AGLS website has been archived by the National Library of Australia and the last version will still be accessible through this platform. Alternatively, agencies can revert to using the AGLS’s DCMI metadata terms only.

How web metadata is used

Search engines

Metadata helps search engines understand web content. The major search engines, such as Google and Bing, use HTML elements and Schema.org markup to inform their search results. The Australian Government Style Manual provides advice on meta tags, keywords and structure to make content discoverable through the major search engines. Your agency should evaluate whether there is any additional value in implementing Schema.org markup.

You can also use metadata to improve search facilities on your agency’s website. Using metadata to describe web pages and digital assets improves user experience. It improves the precision of search results, allows filtering and sorting of results based on a range of criteria, provides enhanced descriptions of resources, and supports customisation of content. Metadata allows more precision and encourages standardisation across assets.

Changes to search technology

The way people find and interact with information online is changing every day. We encourage agencies to keep up to date with developments. Improvements in the way artificial intelligence (AI) is used to generate human-like text are one example of a technological change that is likely to have a profound effect on how audiences interact with information, and on their expectations of what search engines will deliver.

Social media

Metadata schemas such as the Open Graph protocol are used by social media platforms to include external content in social graphs. Social graphs model the connections between people and things. The things people interact with, and which are recommended to others through social media, can include your agency’s web content. To ensure content is included, described and visually represented in a way that is consistent with an agency’s requirements, use the metadata recommended by that platform.

The best social media metadata schema(s) for an agency will depend on which platforms it uses and for what purpose.

RSS

RSS, or Really Simple Syndication, is a content distribution method that allows users to get the latest content from the websites of their choice via an aggregator. A number of Australian Government agencies continue to publish RSS feeds, with some even offering a range of feeds for targeted audiences.

RSS feeds are published as XML documents that include 3 required metadata elements for describing both your feed (“channel”) and the items within it: title, link and description. There is also a range of optional elements you can use to provide more detailed information about your content, such as language, image, category or author.

RSS feeds can be created and updated manually but most content management systems will automate the process for you.
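
As a rough illustration of the structure described above, the following Python sketch builds a minimal RSS 2.0 feed using only the standard library. The feed title, URLs and text are hypothetical placeholders, and the function names are illustrative rather than part of any particular content management system.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("title", "link", "description")


def add_required(parent, values):
    """Append title, link and description; a missing key raises KeyError."""
    for name in REQUIRED:
        ET.SubElement(parent, name).text = values[name]


def build_rss(channel, items):
    """Return a minimal RSS 2.0 document as a string."""
    rss = ET.Element("rss", version="2.0")
    channel_el = ET.SubElement(rss, "channel")
    add_required(channel_el, channel)
    # Optional channel element, included only when supplied.
    if "language" in channel:
        ET.SubElement(channel_el, "language").text = channel["language"]
    for item in items:
        add_required(ET.SubElement(channel_el, "item"), item)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical feed content for illustration only.
feed_xml = build_rss(
    {
        "title": "Example agency news",
        "link": "https://www.example.gov.au/news",
        "description": "Latest news from an example agency",
        "language": "en-AU",
    },
    [
        {
            "title": "New guidance published",
            "link": "https://www.example.gov.au/news/new-guidance",
            "description": "A short summary of the new guidance.",
        }
    ],
)
print(feed_xml)
```

In practice a content management system would normally generate this document for you; the sketch is only intended to show where the required and optional elements sit.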

Linked Data

Web content is often mistakenly thought of as ‘unstructured data’. However, adding appropriate metadata to your content introduces structure and can integrate it with other online data in what is known as the Semantic Web. Using the Resource Description Framework (RDF) to mark-up content allows computers to compare metadata across the web to determine if it is referring to the same thing. Additional information can be retrieved from other linked web resources. The Semantic Web is like a large and constantly growing global database.

Linked Data relies on accepted web technologies such as URIs and HTTP(S), as well as metadata and encoding standards.

The Australian Government Linked Data Working Group can assist agencies with Linked Data-related advice.

Internal metadata use

Metadata is critical to many internal tasks relating to the creation, management and tracking of content.

Metadata can help with:

finding existing resources to repurpose
managing rights
applying appropriate security controls
tracking workflow actions, such as approvals
rendering content dynamically based on established criteria
personalising content
analysing content to identify which topics or formats have high visitor numbers
increasing transparency by disclosing details such as date last updated or author
managing content review cycles and versioning.

Metadata schemas for the web

Metadata schemas usually have a specific focus so you need to determine which one matches your requirements. It is common practice to use more than one type of metadata on web pages. Wherever possible you should limit the number of schemas in use and adopt external schemas with wide uptake and publicly available documentation.

HTML elements and meta tags

Everything in an HTML document could be described as “metadata” because HTML is a markup language made up of elements that are marked with tags, such as <p> for paragraph or <h1> for heading 1. Despite this, the word metadata is generally reserved for descriptive elements and attributes within a web page’s <head> element. Attributes within the nested <meta> element are commonly called meta tags.

Properties from metadata schemas like the DCMI Metadata Terms can be added using meta tags, but there are also a number of native HTML elements and meta tags, some of which are used by the major search engines. The On-page optimisation section of the Australian Government Style Manual recommends using the <title> element and the description meta tag (<meta name="description">), both of which are used by the major search engines to populate search results.
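
The Python sketch below shows one way a <head> fragment containing a <title> element and a description meta tag might be generated. The function name and sample text are hypothetical; the point is simply where each value ends up and that values are escaped before being written into HTML.

```python
from html import escape


def render_head(title, description):
    """Return a <head> fragment with a title and a description meta tag."""
    return (
        "<head>\n"
        f"  <title>{escape(title)}</title>\n"
        # escape() also converts quotation marks, so the attribute stays valid.
        f'  <meta name="description" content="{escape(description)}">\n'
        "</head>"
    )


# Hypothetical page details for illustration only.
head_html = render_head(
    "Metadata for the web",
    'Advice on "web metadata" for agencies',
)
print(head_html)
```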

Schema.org

Schema.org is a metadata standard developed and used by the major search engines, including Google, Microsoft and Yahoo. Referred to as a vocabulary, Schema.org is made up of many schemas that describe entities called “types”, such as Article, Event or GovernmentOrganization. Types are arranged hierarchically. For example, the OpinionNewsArticle type has its own properties as well as properties inherited from the types above it in the hierarchy, including NewsArticle, Article and CreativeWork. A single web page can include descriptions of multiple types and show the relationship between those types.

Schema.org is commonly referred to as ‘structured data markup’. As with other forms of metadata, most content management systems offer plug-ins to add Schema.org structured data markup to content. Schema.org uses the W3C’s Web Ontology Language (OWL) data model, which means use of Schema.org is aligned with Linked Data and the Semantic Web.

One of the goals of Schema.org is to structure data so it can be displayed outside the context of where it is originally published. This can include the “rich snippets” displayed alongside search engine results. For example, by publishing the details of an event with Schema.org markup, search engines may include your event at the top of their search results and allow users to get directions or book a ticket directly from the search results. Schema.org Speakable markup can also be used by virtual assistants to determine the parts of web pages that can be spoken aloud.

Search engines may only use a subset of Schema.org types and using Schema.org markup does not guarantee web content will be included in rich results. Schema.org also has applications beyond search engine optimisation. It is a good idea to develop specific use cases and weigh up the benefits against the resourcing required.
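
To make the event example more concrete, here is a minimal Python sketch that produces Schema.org Event markup in JSON-LD, one of the serialisation formats search engines accept. The event details and the function name are hypothetical, and only a handful of Event properties are shown; check the Schema.org documentation and each search engine's own guidance before relying on a particular set of properties.

```python
import json


def event_jsonld(name, start_date, venue, url):
    """Return a JSON-LD script element describing a Schema.org Event."""
    data = {
        "@context": "https://schema.org",
        "@type": "Event",
        "name": name,
        "startDate": start_date,  # ISO 8601 date or date-time
        "location": {"@type": "Place", "name": venue},
        "url": url,
    }
    # Escape "</" so a value can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Hypothetical event for illustration only.
snippet = event_jsonld(
    "Example public lecture",
    "2024-05-01T18:00:00+10:00",
    "Hypothetical Hall",
    "https://www.example.gov.au/events/public-lecture",
)
print(snippet)
```

The resulting script element would normally be placed in the page's HTML. As noted above, publishing the markup does not guarantee that a search engine will use it.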

The US Government used Schema.org SpecialAnnouncement markup to make COVID-19 information easier to find in search results. The UK Government has also trialled a small number of types, including Article, Breadcrumb, Dataset, FAQPage, GovernmentOrganization, HowTo and NewsArticle. There is no government-wide application of Schema.org in Australia but a number of Australian Government agencies are using it (see case studies).

DCMI Metadata Terms

Dublin Core is a well-known metadata standard originally developed in 1995 to describe all kinds of web resources. The fifteen core properties are part of a larger set of terms now called the DCMI Metadata Terms, DCMI being the Dublin Core Metadata Initiative, which oversees the standard and its ongoing development.

While the DCMI Metadata Terms were once commonly used within HTML meta tags to describe resources to facilitate search engine discovery, this has become less common as search engine technology has evolved. The DCMI Metadata Terms are increasingly used to enable Linked Data, and are also useful for the internal management of web content. The standard is also frequently used in Digital Asset Management Systems to describe images and other media assets.
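
For agencies that still embed DCMI Metadata Terms in web pages, the Python sketch below shows the general shape of the markup, following the DCMI convention of declaring the namespace with a link element and prefixing each property name with DCTERMS. The property values and the function name are hypothetical examples.

```python
from html import escape


def dcterms_meta(properties):
    """Return HTML meta tags for a mapping of DCMI term names to values."""
    # The link element declares which namespace the DCTERMS prefix refers to.
    lines = ['<link rel="schema.DCTERMS" href="http://purl.org/dc/terms/">']
    for term, value in properties.items():
        lines.append(f'<meta name="DCTERMS.{term}" content="{escape(value)}">')
    return "\n".join(lines)


# Hypothetical values for illustration only.
dc_html = dcterms_meta(
    {
        "title": "Metadata for the web",
        "creator": "Example Agency",
        "modified": "2023-12-01",
    }
)
print(dc_html)
```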

Open Graph Protocol

The Open Graph Protocol (OGP) was developed by Facebook and is used by various social media platforms including Facebook, LinkedIn and Pinterest. It is a lightweight schema used to make any web content item an “object” in a social graph.

Apart from social media, OGP is also used by some search engines for rich search results, but its scope in this regard is limited compared to Schema.org.
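
The Open Graph protocol specification lists four properties as required for every page: og:title, og:type, og:image and og:url. The Python sketch below generates the corresponding meta tags and refuses to continue if one is missing. The page details and the function name are hypothetical, and individual platforms may recommend additional properties.

```python
from html import escape

REQUIRED = ("title", "type", "image", "url")


def open_graph_meta(values):
    """Return Open Graph meta tags, checking the four required properties."""
    missing = [key for key in REQUIRED if key not in values]
    if missing:
        raise ValueError(f"missing Open Graph properties: {missing}")
    return "\n".join(
        f'<meta property="og:{key}" content="{escape(value)}">'
        for key, value in values.items()
    )


# Hypothetical page details for illustration only.
og_html = open_graph_meta(
    {
        "title": "Metadata for the web",
        "type": "article",
        "image": "https://www.example.gov.au/images/metadata.png",
        "url": "https://www.example.gov.au/metadata-for-the-web",
    }
)
print(og_html)
```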

Metadata for digital assets

Metadata for digital assets, such as images, videos, audio and documents, makes them discoverable, manageable and reusable. The metadata can be separate to the object, such as Schema.org or Dublin Core properties in a digital asset management system, and/or embedded in the file itself.

The Extensible Metadata Platform (XMP) is a standard which provides guidance on embedding metadata into common formats, such as JPEG, MP4 and PDF. The core XMP properties are extensible and the data model can accommodate any metadata schema.

Some distribution channels, such as YouTube or SoundCloud, require users to enter proprietary metadata that facilitates the functionality of the platform. If your agency is going to use these platforms frequently, it is a good idea to integrate the properties into embedded metadata or your digital asset management system.

Encoding standards

Metadata schemas work in tandem with encoding standards to enable data exchange. As with metadata schemas, consider web encoding standards that meet your particular requirements. Standards that are public and widely used improve interoperability. Most metadata schemas will mandate, recommend or provide specifications for one or more encoding standards.

Data serialisation format

Data serialisation formats, also known as markup languages, make metadata usable by different computer systems. The following are examples of serialisation formats commonly used to mark up web resources with metadata: HTML, XML, Microdata, RDFa and JSON-LD. JSON-LD is based on the established JSON format and provides a method of encoding Linked Data. It is the format favoured by the major search engines for adding Schema.org structured data markup (at the time of writing).

Data models

A data model is an abstract set of rules for representing the meaning of information. The Resource Description Framework (RDF) is a World Wide Web Consortium (W3C) standard used to describe and exchange graph data, and is central to Linked Data. It is like a grammar. In fact, it is expressed in “triples” which are named using the grammatical terms subject, predicate and object. RDF can be expressed in various data serialisation formats, including XML, RDFa, microdata and JSON-LD.
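
The subject, predicate and object structure can be sketched in a few lines of Python. The example below represents two triples about a hypothetical web page and writes them out in N-Triples, a simple line-based RDF syntax. It uses no RDF library, handles only basic escaping, and the page URIs are placeholders; a real project would more likely use an established RDF toolkit.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str              # URI of the thing being described
    predicate: str            # URI of the property
    obj: str                  # URI or literal value
    obj_is_uri: bool = False


def to_ntriples(triple):
    """Serialise one triple as a line of N-Triples (basic escaping only)."""
    if triple.obj_is_uri:
        obj = f"<{triple.obj}>"
    else:
        escaped = triple.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    return f"<{triple.subject}> <{triple.predicate}> {obj} ."


# Hypothetical page described with DCMI Metadata Terms properties.
page = "https://www.example.gov.au/metadata-for-the-web"
triples = [
    Triple(page, "http://purl.org/dc/terms/title", "Metadata for the web"),
    Triple(page, "http://purl.org/dc/terms/publisher",
           "https://www.example.gov.au/about", obj_is_uri=True),
]
for triple in triples:
    print(to_ntriples(triple))
```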

Another W3C data model is the Web Ontology Language (OWL), which is a language for authoring ontologies (see Data value below).

Data type and format

If you want to share, merge or make sense of metadata, it is important that the data is consistent and understandable. It is best practice to use the same metadata standard and data serialisation format to allow for comparisons to be made, promote re-use of metadata and support interoperability.

The data type and format for each property will usually be built into a metadata standard. Some properties might be free text, but others will be specific data types such as a date, which has to conform to a particular format such as ISO 8601 (YYYY-MM-DD).
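
As an example of enforcing a format at the point of entry, the Python sketch below checks that a value is a real calendar date written as YYYY-MM-DD. The function name is illustrative. The pattern check is included because some Python versions accept other ISO 8601 layouts, such as 20231201, in date.fromisoformat.

```python
import re
from datetime import date

DATE_PATTERN = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def is_iso_date(value):
    """Return True only for a valid calendar date written as YYYY-MM-DD."""
    if not DATE_PATTERN.fullmatch(value):
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 30 February
    except ValueError:
        return False
    return True


for candidate in ("2023-12-01", "01/12/2023", "2023-02-30"):
    print(candidate, is_iso_date(candidate))
```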

Other formatting rules might include the order of values (e.g. for personal names), punctuation, symbols, and abbreviations. Ideally a content management system will be designed to only allow the input of valid or consistent data types and formats.

Data value

Limiting the number of permissible values for some properties will make your metadata more useful and interoperable. Schema.org, with a few exceptions, does not provide guidance on how to fill in text properties, but this does not mean you shouldn’t use a standard vocabulary.

Vocabularies include:

taxonomies, in which terms are hierarchically arranged with broader and narrower terms
thesauri, which are taxonomies that also specify related, preferred and non-preferred terms
ontologies, which group together items based on common properties in a conceptual web for a subject area or domain.
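
The difference between a taxonomy and a thesaurus can be illustrated with a small Python sketch. The terms below are invented for the example and are not drawn from any published vocabulary. The BROADER mapping holds the hierarchy (the taxonomy part), while USE_INSTEAD maps non-preferred terms to preferred ones (the thesaurus part).

```python
# Hypothetical mini-vocabulary: each preferred term maps to its broader term.
BROADER = {
    "web metadata": "metadata",
    "metadata": "information management",
    "information management": None,  # top of the hierarchy
}

# Non-preferred terms and the preferred term to use instead.
USE_INSTEAD = {
    "meta data": "metadata",
    "recordkeeping": "information management",
}


def preferred_term(term):
    """Normalise a term and map it to its preferred form, or raise ValueError."""
    key = term.strip().lower()
    key = USE_INSTEAD.get(key, key)
    if key not in BROADER:
        raise ValueError(f"{term!r} is not in the vocabulary")
    return key


def broader_chain(term):
    """List the broader terms above a term, nearest first."""
    chain = []
    current = BROADER[preferred_term(term)]
    while current is not None:
        chain.append(current)
        current = BROADER[current]
    return chain


print(preferred_term(" Meta Data "))
print(broader_chain("web metadata"))
```

Restricting a property to the values that preferred_term accepts is one simple way of limiting permissible values, as described above.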

The Australian Governments’ Interactive Functions Thesaurus (AGIFT) is an example of a controlled vocabulary that covers terminology related to Australian Government functions and activities. Similarly, the Department of Finance maintains a list of functional categories for expense reporting.

Using published vocabularies harmonises terminology across agencies and greatly improves the discoverability of resources. Research Vocabularies Australia is a service provided by the Australian Research Data Commons. It facilitates search and reuse of a variety of vocabularies. A number of Australian Government agencies maintain vocabularies, like the Australian Bureau of Statistics’ Classifications, the AIATSIS Thesauri, or Healthdirect Australia's Australian Health Thesaurus (AHT).

Information and data governance

Agency web content is Australian Government information and should be managed properly. National Archives has published a lot of advice online for agencies, including advice on how to get started with information management.

Don’t forget to factor your agency’s approach to web metadata into your wider information governance considerations. Our advice on building data interoperability should also be considered.

FAIR data principles

FAIR stands for Findable, Accessible, Interoperable and Reusable. We recommend agencies consult the FAIR principles when seeking to improve the discoverability and useability of their web content in the same way they might for structured data assets. For instance, applying FAIR principle F1 (assigning a globally unique and persistent identifier) will improve the discoverability of key web pages and open them up for reuse, for example in Linked Data applications.

The Australian Research Data Commons provides a number of FAIR data resources, including a self-assessment tool and training materials.

Case studies

Please get in touch with us via the Agency Service Centre if your agency has a web metadata case study that can be shared online. Here is a link to a Geoscience Australia case study that may be useful.

Geoscience Australia’s use of metadata for online resources

More information

Agency Service Centre – ask us a question
Australian Government Style Manual
Australian Government Linked Data Working Group
Standards Australia AS 5044: AGLS Metadata Standard
Schema.org
DCMI Metadata Terms (Dublin Core)
FAIR data principles
Requirements for Australian Government websites (Digital Transformation Agency)
Australian Cyber Security Centre – provides advice and alerts to improve cyber security in Australia