Accelerating Science Using Virtualized Data at PO.DAAC

This Jupyter Notebook workflow image shows an 11 TB analysis-ready virtualized sea surface temperature dataset from NASA’s PO.DAAC loaded with Xarray. A quick contour plot of the first timestep is shown on the left and a regional mean time series plot for waters off the U.S. West Coast on the right.

As Earth science data archives and data density continue to grow, traditional science workflows of data download, conditioning, and analysis become increasingly unwieldy. Network bandwidth, local storage, and computer performance all impose cost and time constraints that an investigator must account for before science and hypothesis testing can begin.

Virtualized datasets offer a way around these issues: these lightweight reference files can be used to access an entire data record using Python packages like Xarray. From there, users can quickly subset to their region and timespan of interest, eliminating the need to download and subset thousands of files and terabytes of data. This opens a new pathway for both streamlined data access and improved science workflows in which a user can easily iterate over datasets, change space and time bounds, and quickly compare complementary datasets.
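To make the idea of a lightweight reference file more concrete, here is a minimal sketch using only the Python standard library. The announcement does not describe the file layout, so everything here is an assumption: the layout is a simplified one loosely modeled on byte-range reference formats such as Kerchunk, and the file names, chunk keys, and the `read_chunk` helper are hypothetical. The point is only that a small index of byte ranges lets a reader fetch one piece of a large file without reading the rest.

```python
import json
import os
import tempfile

# Write a small binary "granule" standing in for an archived data file.
workdir = tempfile.mkdtemp()
granule_path = os.path.join(workdir, "granule_0001.bin")
with open(granule_path, "wb") as f:
    f.write(b"HEADER--" + b"chunk-A!" + b"chunk-B!")

# Hypothetical reference file: each chunk key maps to
# [path, byte offset, byte length] inside the original file.
references = {
    "sst/0.0.0": [granule_path, 8, 8],
    "sst/1.0.0": [granule_path, 16, 8],
}
ref_path = os.path.join(workdir, "references.json")
with open(ref_path, "w") as f:
    json.dump(references, f)


def read_chunk(ref_file, key):
    """Fetch one chunk by seeking to the byte range recorded for its key."""
    with open(ref_file) as f:
        path, offset, length = json.load(f)[key]
    with open(path, "rb") as data:
        data.seek(offset)
        return data.read(length)


chunk = read_chunk(ref_path, "sst/1.0.0")
print(chunk)
```

In a real virtualized dataset the paths would typically be cloud object URLs and a library would resolve the references, but the lookup-then-range-read pattern is the same in spirit.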

NASA’s Physical Oceanography Distributed Active Archive Center (PO.DAAC) has created 10 virtualized datasets covering ocean currents, winds, bottom pressure, sea surface height, salinity, and temperature from satellite observations and ocean models. In this webinar, we will briefly describe the fundamentals of the technology and demonstrate how to use it in Python scripts and notebooks. We also present performance metrics from computing a regional mean time series of satellite records 25 to 40 years in length, showing a full order-of-magnitude improvement in compute time compared to traditional access and methods.
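The regional mean time series mentioned above can be sketched with Xarray as follows. This is an illustration under stated assumptions, not the webinar's code: the dataset is a small synthetic array built in memory rather than a PO.DAAC virtual dataset, the coordinate names `time`, `lat`, and `lon` and the helper `regional_mean_series` are hypothetical, and the cosine-of-latitude weighting is one common choice for area weighting on a regular grid.

```python
import numpy as np
import pandas as pd
import xarray as xr


def regional_mean_series(da, lat_bounds, lon_bounds, time_bounds):
    """Subset a (time, lat, lon) DataArray and return its weighted mean per timestep.

    Assumes lat and lon coordinates are sorted in ascending order.
    """
    region = da.sel(
        time=slice(*time_bounds),
        lat=slice(*lat_bounds),
        lon=slice(*lon_bounds),
    )
    # Weight by cos(latitude) so high-latitude cells do not count too much.
    weights = np.cos(np.deg2rad(region["lat"]))
    return region.weighted(weights).mean(dim=("lat", "lon"))


# Synthetic stand-in for a virtualized dataset (made-up values, not PO.DAAC data).
time = pd.date_range("2020-01-01", periods=6, freq="D")
lat = np.arange(30.0, 50.0, 1.0)
lon = np.arange(-135.0, -115.0, 1.0)
# The field equals the day index everywhere, so each daily mean is easy to check.
values = np.broadcast_to(
    np.arange(6.0)[:, None, None], (6, lat.size, lon.size)
).copy()
sst = xr.DataArray(
    values,
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

series = regional_mean_series(
    sst,
    lat_bounds=(32.0, 48.0),
    lon_bounds=(-130.0, -120.0),
    time_bounds=("2020-01-02", "2020-01-05"),
)
print(series.values)
```

With a real virtual dataset, the DataArray would come from opening the reference file with Xarray instead of being built in memory. Changing the space and time bounds is then a matter of changing the three tuples passed to the helper.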

Lastly, we will present examples of using virtual datasets to conduct real-world science investigations, including interdisciplinary relationships between wind and ocean response during upwelling events, Indian Ocean Dipole surface characteristics, and the ocean response to El Niño events.

Documents

Access slides, guide documents, and other resources related to this event.

Name	Description	File Type	Published Date
Accelerating Science Using Virtualized Data at PO.DAAC	This presentation explains how to easily and rapidly access the entire NASA Physical Oceanography Distributed Active Archive Center (PO.DAAC) collections from a Jupyter notebook via virtual datasets (VDSs). It provides a brief overview of the new technology, demonstrates basic usage and access, and links to resources for working with VDSs.		Aug. 21, 2025

Details

Last Updated: Aug. 21, 2025
Published: Aug. 12, 2025
Data Archive: PO.DAAC
