Leiden Data Store

Welcome to our Leiden Data Store (LDS) space.

Data-intensive science offers new opportunities for innovation and discovery, provided that large datasets can be handled efficiently. Managing data for such applications is challenging: it requires support for complex data life cycles, coordination across multiple sites, fault tolerance, and scalability to tens of sites and petabytes of data. This calls for a fundamentally different approach from the current ad hoc, task-centric one.

This space provides an overview of the projects we are currently working on. Where a project has a dedicated Confluence space, we link to it directly. The space is new and still in development.


Roadmap

Extra copy for unique unstructured data objects.

This is a temporary solution for storing data from media that are not yet backed up.

Automated Ingest - Landing Zone

Data is written to disk by an instrument or another source, and an ingest job can then be run on that directory. Once data is ingested, it is moved out of the way to improve ingest performance.
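As an illustration only, the landing-zone pattern can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the LDS implementation: the function name `ingest_landing_zone`, the `processed` directory, and the `register` callback (standing in for whatever catalogue or data management system performs the actual ingest) are all hypothetical.

```python
import shutil
from pathlib import Path


def ingest_landing_zone(landing: Path, processed: Path, register) -> list[str]:
    """Ingest every file in `landing`, then move it out of the way.

    `register` is a hypothetical callback that performs the actual ingest
    (for example, adding the file to a catalogue). It is an assumption
    made for this sketch.
    """
    processed.mkdir(parents=True, exist_ok=True)
    ingested = []
    # Sort so that the processing order is predictable.
    for path in sorted(landing.iterdir()):
        if not path.is_file():
            continue  # this sketch ignores subdirectories
        register(path)
        # Moving ingested files keeps the landing zone small, so the
        # next ingest run only has to look at new data.
        shutil.move(str(path), str(processed / path.name))
        ingested.append(path.name)
    return ingested
```

A call such as `ingest_landing_zone(Path("landing"), Path("processed"), print)` would print each file path and leave the landing directory empty of regular files.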

Off site replication

A copy of specific data will be stored at the SURF Scale-out storage tape facility.

Data to Compute

Take data to where it is processed.

Automated Ingest - File system Scanning

Periodically scan a source directory and register new data in place, or update system metadata for changed files.
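The scanning approach can be sketched as follows. This is a hedged illustration, not the LDS implementation: the `scan` function and the in-memory `catalogue` dictionary are hypothetical stand-ins for the real metadata store, and a file counts as changed here when its size or modification time differs from the recorded values.

```python
import os
from pathlib import Path


def scan(source: Path, catalogue: dict[str, tuple[int, int]]) -> dict[str, list[str]]:
    """Compare files under `source` with `catalogue` and update it in place.

    `catalogue` (hypothetical) maps a path relative to `source` to a
    (size in bytes, modification time in nanoseconds) pair.
    """
    report = {"registered": [], "updated": []}
    for root, _dirs, files in os.walk(source):
        for name in sorted(files):
            path = Path(root) / name
            stat = path.stat()
            key = path.relative_to(source).as_posix()
            current = (stat.st_size, stat.st_mtime_ns)
            if key not in catalogue:
                report["registered"].append(key)  # new file, registered in place
            elif catalogue[key] != current:
                report["updated"].append(key)  # known file whose metadata changed
            catalogue[key] = current
    return report
```

Running `scan` on a schedule (for example from cron) gives the periodic behaviour; files are never moved, only recorded.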

Custom application integration

RSpace and Omero. The iRODS consortium is looking at a solution with the Omero consortium.

Storage Tiering

A policy framework providing a scalable solution for data movement between storage resources.

Project-specific storage

A project wants to store its data in the LDS.
● 100 TB of data
● replicas stored locally at multiple institutions

Compute to Data

Take compute to where the data is (virtual machines, containers, Lambda technology).

Optimization for accessing large sets of small files

● thousands of kilobyte-sized files
● interactive file browsing
  • load files as needed
  • must be responsive, i.e., a user request cannot take 20 seconds

Data event publishing

Report on, or raise an alarm for, data that is written or changed.

Bring your Own Infrastructure

Integrate storage and compute.

Continuous analysis

The culmination of Automated Ingest, Data to Compute or Compute to Data, and Data event publishing.

 

Projects

 

 

Ready for use

Extra copy for unique unstructured data objects.

This is a temporary solution for storing data from media that are not yet backed up.


How to contact us?

If you have any questions, you can e-mail us at ricc@issc.leidenuniv.nl or submit a ticket through the ISSC helpdesk.
