Data Lakes
Tuesday, May 31, 2022 9:00 AM to 6:30 PM · 9 hr. 30 min. (Europe/Berlin)
Foyer 3 + H - Ground Floor
Information
In recent years, classic HPC users have shown ever-increasing interest in using the public cloud as part of traditional HPC workflows. There are many reasons for this; for example, special hardware components such as TPUs or particular GPUs become available in the cloud earlier than in a local data center.
In addition, users need to store data for analysis with AI methods in different data silos and to access it flexibly from both HPC and cloud systems. Flexible data migration and provisioning in the data lake play a central role in data analytics workflows. For this purpose, highly scalable object storage, mostly accessed via an S3 interface, has long been established in the cloud.
From the user's point of view, another advantage of a consistent data management strategy, as offered by a data lake, is the uniform view it provides across the individual data silos.
In this project, methods are developed to provide an effective data lake, i.e. one in which data can be stored, searched flexibly, and processed.
Contributors:
- Hendrik Nolte (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)
- Christian Boehme (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)
- Julian Kunkel (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)
- Simon Hernan Sarmiento Sabater (Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen)
Format
On-site
Registered attendees
Sven Willner
Scientist, Max-Planck-Institut für Meteorologie