NASA’s Earth observation data can be used in many ways, from forecasting and responding to severe storms and wildfires to maximizing agricultural production. The key is continually making the data easier for people to find and use.
Recently, NASA took another giant leap forward in expanding access to its data by creating metadata pipelines for agency datasets included in the Data.nasa.gov, Data.gov, and Geoplatform.gov open science data portals, as well as the National Geospatial Data Assets list. These federal sites are hubs where the public, scientists, and decision makers can search for and download open science data to support industry, innovation, and research.
The first part of the metadata project focuses specifically on taking the metadata paired with NASA's publicly available data collections, which number more than 10,000 and aggregate more than 1.8 billion science records, and tailoring the metadata for ingestion into Data.nasa.gov and Data.gov. Metadata describe a dataset, providing essential details on its content, formatting, and retrieval. They are what the portals read to locate files and to populate search results and webpages.
Sometimes metadata are written, labeled, and organized in ways that are specific to the instrument or to the home archive where the files are stored. This can create compatibility problems for outside portals that try to pull information from them, so NASA needed to develop a method for translating its metadata to meet the requirements of federal portal systems and webpages.
This new metadata pipeline is being created by a team of metadata experts from NASA's Earth Science Data Systems (ESDS) program. Doug Newman, science systems data lead for the Earth Science Data and Information System (ESDIS) project, is a key member of the team.
“In NASA Earth science we do have our own online catalog, called the Common Metadata Repository (CMR), that is particularly geared towards our NASA user community,” said Newman. “CMR works great in this case, but people outside of our immediate community might not have the familiarity and specific knowledge required to get the data they need. More general portals, such as Data.gov, are a natural place for them to go for government data, so it’s important that we have a presence there.”
The job of engineering a system that could wrangle NASA’s specialized metadata into versions compatible with the portals is being led by Kaylin Bugbee, a data manager for the Office of the Chief Science Data Officer. Bugbee works with NASA’s Science Discovery Engine (SDE), a search system for the agency’s open science data, software, and resources. Bugbee and the SDE team have been using their expertise to create a workflow that gathers NASA's metadata and then maps its sometimes unique terms and structures to the equivalents used by Data.nasa.gov and Data.gov.
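To make the mapping step concrete, here is a minimal Python sketch of a metadata crosswalk. It assumes a simplified input shaped like CMR's UMM-C collection records (fields such as `EntryTitle`, `Abstract`, `ScienceKeywords`, and `TemporalExtents`) and emits a few DCAT-US-style fields (`title`, `description`, `keyword`, `temporal`, `accessLevel`) of the kind Data.gov catalogs use. The function name `to_dcat_us`, the identifier format, and the choice of fields are illustrative assumptions, not the SDE team's actual pipeline.

```python
from typing import Any

# Hypothetical crosswalk: a few UMM-C-style collection fields (as stored in CMR)
# mapped to DCAT-US-style field names. Illustrative only; not NASA's actual mapping.
DIRECT_FIELDS = {
    "EntryTitle": "title",
    "Abstract": "description",
}


def to_dcat_us(umm: dict[str, Any]) -> dict[str, Any]:
    """Translate one simplified UMM-C-style record into a DCAT-US-style dict."""
    # Copy fields that map one-to-one, skipping any that are absent.
    record: dict[str, Any] = {
        target: umm[source]
        for source, target in DIRECT_FIELDS.items()
        if source in umm
    }

    # Assumed identifier scheme: collection short name plus version.
    record["identifier"] = f"{umm['ShortName']}_{umm['Version']}"

    # Flatten hierarchical science keywords into a de-duplicated flat list.
    keywords: list[str] = []
    for kw in umm.get("ScienceKeywords", []):
        for level in ("Category", "Topic", "Term"):
            term = kw.get(level)
            if term and term not in keywords:
                keywords.append(term)
    record["keyword"] = keywords

    # Use the first closed time range as an ISO 8601 "start/end" interval.
    for extent in umm.get("TemporalExtents", []):
        for rng in extent.get("RangeDateTimes", []):
            begin = rng.get("BeginningDateTime")
            end = rng.get("EndingDateTime")
            if begin and end:
                record["temporal"] = f"{begin}/{end}"
                break
        if "temporal" in record:
            break

    # Assumption: every collection passed in is openly accessible.
    record["accessLevel"] = "public"
    return record
```

A production crosswalk would cover many more fields (publisher, contact point, distribution links, spatial extent) and would need rules for records that lack required values; the sketch only shows the general pattern of renaming fields, flattening nested structures, and reformatting values.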
"We're in the process of testing out each step of the way and continuing to improve the metadata mapping so that it works well with the portals," said Bugbee.
The metadata project is also working with geospatial datasets. The project team has been identifying datasets that contain geospatial data and packaging their metadata to work specifically with the more specialized Geoplatform.gov portal. A subset of this work involves NASA datasets designated as National Geospatial Data Assets (NGDA).
NGDAs are prominent geospatial datasets that multiple federal agencies use for critical work, arranged into 15 themes ranging from postal addresses to transportation. Bugbee and her team have been developing a special metadata workflow that can provide Geoplatform.gov visitors with an Earthdata Search link that takes them directly from the portal to NASA’s hundreds of NGDA-designated datasets.
The beauty of the new metadata workflow is that it not only draws on the preexisting metadata created for NASA archives, which saves the time and labor of manually preparing data for the portals, but also makes clear what information to include when new, future, or updated datasets are to be listed in the federal portals.
Initially, the team is focusing on preparing metadata for the hundreds of Moderate Resolution Imaging Spectroradiometer (MODIS) and Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) products from the popular Terra platform while it refines the workflow and the delivery process to the federal portals. After that, the team will expand to NASA’s many other instruments and openly available products. Before long, all of NASA’s open data will be flowing through the new metadata pipeline to the portals and, from there, to people and projects everywhere.