Building an Exabyte Data Archive to Advance Climate Science Research at DKRZ
Monday, June 28, 2021 1:15 PM to 1:30 PM · 15 min. (Africa/Abidjan)
Exascale Systems · HPC Workflows
Information
The German Climate Computing Center (DKRZ) is a central service center for German climate and earth system research and provides infrastructure for simulation-based climate science.
This presentation from DKRZ storage administrator Carsten Schmitt describes DKRZ's strategy for creating an Exabyte Data Archive, which modernises its large tiered storage architecture to manage future data growth. Of key importance is the ability to transition over 150PB of existing archived data from proprietary systems to a vendor-neutral, open-standard environment that will enable access to, and growth of, research data for years to come.
Scientists performing climate science research need to access and store many Petabytes of data, but a common problem is that data is often scattered over multiple storage resources. This is not only a technical problem but also a problem of storage organization and data management.
In this case, DKRZ had to ensure continual access to over 150PB of active archive data and accommodate at least 120PB per year of data growth, while transitioning from proprietary data formats to open standards. The migration also had to be seamless and transparent to users.
The DKRZ Exabyte Data Archive solution is built with commercially available metadata-driven tools that modernize its tiered architecture so data can remain open and not bound to one storage vendor. The same principles and technologies used by DKRZ apply to other use cases and to storage types from any vendor, and may therefore benefit other environments in a similar way.