Leiden Data Store
Welcome to our Leiden Data Store (LDS) space.
Data-intensive science offers new opportunities for innovation and discovery, provided that large datasets can be handled efficiently. Managing data for such applications is challenging: it requires support for complex data life cycles, coordination across multiple sites, fault tolerance, and the scalability to support tens of sites and petabytes of data. This calls for a fundamentally different approach than the current ad hoc, task-centric one.
This new space gives an overview of the projects we are currently working on. Where a project has its own dedicated Confluence space, we link to it directly. The space is new and still in development.
Roadmap
Extra copy for unique unstructured data objects.
This is a temporary solution for storing data from media that are not yet backed up.
Automated Ingest - Landing Zone
After an instrument or another source has written data to disk, an ingest job can be run on that directory. Once data is ingested, it is moved out of the landing zone so that later ingest runs stay fast.
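The landing-zone flow can be sketched as follows. This is a minimal illustration, not the LDS implementation: the function name `ingest_landing_zone`, the `register` callback, and the separate `ingested` directory are all assumptions made for the example.

```python
import shutil
from pathlib import Path
from typing import Callable, List


def ingest_landing_zone(
    landing: Path,
    ingested: Path,
    register: Callable[[Path], None],
) -> List[Path]:
    """Register every file under `landing`, then move it to `ingested`.

    `register` is a hypothetical hook (for example, a call that catalogues
    the file). `ingested` must live outside `landing`, otherwise moved
    files would be picked up again on the next run.
    """
    ingested.mkdir(parents=True, exist_ok=True)
    moved: List[Path] = []
    # Materialise the listing first so that moving files does not
    # interfere with the directory walk.
    for src in sorted(p for p in landing.rglob("*") if p.is_file()):
        register(src)
        dest = ingested / src.relative_to(landing)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Because ingested files leave the landing zone, each run only has to walk the files that arrived since the previous run.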
Off site replication
A copy of specific data will be stored at the SURF Scale-out storage tape facility.
Data to Compute
Take data to where it is processed.
Automated Ingest - File system Scanning
Periodically scan a source directory and register new data in place, or update the system metadata for changed files.
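One scan pass could look roughly like the sketch below. It assumes an in-memory catalogue that maps each path to its size and modification time; the name `scan_in_place` and the catalogue layout are illustrative only, and a real deployment would persist this state and run the scan on a schedule.

```python
import os
from typing import Dict, List, Tuple

# Hypothetical catalogue: path -> (size in bytes, mtime in nanoseconds)
Catalogue = Dict[str, Tuple[int, int]]


def scan_in_place(source: str, catalogue: Catalogue) -> Tuple[List[str], List[str]]:
    """Walk `source` once and bring `catalogue` up to date.

    Returns (new_paths, changed_paths). Files are never copied or moved;
    only the catalogue entry is created or refreshed.
    """
    new: List[str] = []
    changed: List[str] = []
    for root, _dirs, files in os.walk(source):
        for name in files:
            path = os.path.join(root, name)
            st = os.stat(path)
            fingerprint = (st.st_size, st.st_mtime_ns)
            if path not in catalogue:
                new.append(path)       # first time seen: register in place
            elif catalogue[path] != fingerprint:
                changed.append(path)   # size or mtime differs: update metadata
            catalogue[path] = fingerprint
    return new, changed
```

Comparing size and modification time is cheap but can miss edits that preserve both; a checksum comparison would be more thorough at a higher I/O cost.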
Custom application integration
RSpace and Omero. The iRODS consortium is looking at a solution together with the Omero consortium.
Storage Tiering
A policy framework providing a scalable solution for data movement between storage resources
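To make the idea of a policy framework concrete, here is a small sketch in which each rule pairs a condition with a target tier and the first matching rule decides where an object belongs. The class names, the `idle_days` field, and the tier names used in the usage note are assumptions for illustration; they do not describe the actual LDS policies.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DataObject:
    path: str
    size_bytes: int
    idle_days: int   # days since last access (assumed to be tracked elsewhere)
    tier: str        # tier the object currently lives on


@dataclass
class Rule:
    name: str
    matches: Callable[[DataObject], bool]
    target_tier: str


def plan_moves(objects: List[DataObject], rules: List[Rule]) -> List[Tuple[str, str, str]]:
    """Return (path, current_tier, target_tier) for every object that must move.

    Rules are evaluated in order and the first match wins. Objects already
    on their target tier, or matching no rule, produce no move.
    """
    moves: List[Tuple[str, str, str]] = []
    for obj in objects:
        for rule in rules:
            if rule.matches(obj):
                if obj.tier != rule.target_tier:
                    moves.append((obj.path, obj.tier, rule.target_tier))
                break
    return moves
```

A rule set such as `[Rule("cold", lambda o: o.idle_days >= 365, "tape"), Rule("warm", lambda o: o.idle_days >= 30, "disk")]` would send long-idle objects to tape and moderately idle ones to disk. Separating planning from the actual data movement keeps the policy easy to test and lets the moves be scheduled in batches.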
Project-specific storage
A project wants to store its data in the LDS.
● 100 TB of data
● replicas stored locally at multiple institutions
Compute to Data
Take compute to where the data is (virtual machines, containers, Lambda technology).
Optimization for accessing large sets of small files
● thousands of kilobyte-sized files
● interactive file browsing
● files are loaded as needed
● must be responsive, i.e., a user request cannot take 20 seconds
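One way to approach these requirements is to list a directory without opening any file and to read a file's content only when it is requested, caching the result. The sketch below uses only the Python standard library; the function names and the cache size are illustrative assumptions, not a description of how the LDS handles small files.

```python
import os
from functools import lru_cache
from typing import List, Tuple


def list_entries(directory: str) -> List[Tuple[str, int]]:
    """Return (name, size) for each regular file in `directory`.

    os.scandir yields directory entries without opening the files,
    so browsing does not read any file content.
    """
    with os.scandir(directory) as entries:
        return sorted((e.name, e.stat().st_size) for e in entries if e.is_file())


@lru_cache(maxsize=4096)  # cache size is an arbitrary example value
def read_file(path: str) -> bytes:
    """Load a file only when it is asked for and keep it cached.

    Note: the cache does not notice later changes on disk.
    """
    with open(path, "rb") as fh:
        return fh.read()
```

With kilobyte-sized files, a cache of a few thousand entries costs only a few megabytes of memory while sparing repeated round trips to storage.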
Data event publishing
Report on, or raise alarm on, data written or changed.
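The publish/subscribe pattern behind this item can be sketched in a few lines. The example is in-process only and every name in it (`DataEvent`, `EventBus`, the event kinds "written" and "changed") is an assumption for illustration; a production setup would more likely publish to a message broker.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class DataEvent:
    kind: str   # e.g. "written" or "changed" (illustrative names)
    path: str


Handler = Callable[[DataEvent], None]


class EventBus:
    """Minimal in-process publish/subscribe hub."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Handler) -> None:
        """Call `handler` for every future event of the given kind."""
        self._handlers[kind].append(handler)

    def publish(self, event: DataEvent) -> int:
        """Deliver `event` to its subscribers; return how many were notified."""
        handlers = self._handlers.get(event.kind, [])
        for handler in handlers:
            handler(event)
        return len(handlers)
```

A reporting handler could subscribe to "written" events while an alarm handler subscribes to "changed" events, so each consumer sees only what it cares about.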
Bring your Own Infrastructure
Integrate storage and compute
Continuous analysis
The culmination of Automated Ingest, Data to Compute or Compute to Data, and Data event publishing.
Projects
Ready for use
Extra copy for unique unstructured data objects.
This is a temporary solution for storing data from media that are not yet backed up.
How to contact us?
If you have any questions, you can e-mail us at ricc@issc.leidenuniv.nl or submit a ticket through the ISSC helpdesk.