Coupled Storage System for Efficient Management of Self-Describing Data Formats
Information
Contributors:
- Michael Kuhn (Otto von Guericke University Magdeburg)
- Kira Duwe (Otto von Guericke University Magdeburg)
Abstract:
The goal of the project is to explore the benefits of a coupled storage system for self-describing data formats. It will introduce a novel hybrid approach that combines storage technologies from high-performance computing and database systems, using each technology according to its respective strengths and weaknesses. Because the storage system is tightly coupled with self-describing data formats, it can use their structural information to select appropriate storage technologies and tiers. Such information is currently not available to storage systems, which therefore have to rely on heuristics that often lead to suboptimal performance as well as unnecessary and expensive data movement. Moreover, the storage system will support adaptable I/O semantics so that its performance can be tuned to application and data format requirements. Together, these features will enable completely new data management methods and provide significant performance improvements. Existing workflows of scientific users will be supported through a dedicated data analysis interface. All changes will be thoroughly tested to ensure backwards compatibility with existing applications and interfaces. Consequently, no modifications will be necessary to run applications on top of CoSEMoS, which helps preserve past investments in scientific software development.