Large-scale Object Store for File Sharing

Quick Summary

The GBSC offers object storage for sharing files with collaborators and with the public. All Stanford faculty can share data from this server at no charge.

What is the Large-scale Object Store?

The GBSC Object Store is a file server run by the Genetics Bioinformatics Service Center (GBSC). It gives the Stanford community a storage location for sharing files easily with collaborators, with the public, or both.

Data sharing is essential for the expedited translation of research results into knowledge, products, and procedures to improve human health, so researchers are under pressure from a variety of institutions to make their data available. Many scientific journals require that authors make available the data included in their papers as a condition of publication. Also, most NIH grant applications require that the investigator include a data-sharing plan, which encompasses all data from funded research that can be shared without compromising individual subjects’ rights and privacy, regardless of whether the data have been used in a publication.

The Genetics Bioinformatics Service Center manages an object store server that was acquired through an NIH S10 Shared Instrumentation Grant for data-sharing applications. The server is available to the Stanford community for that purpose at no cost.

How does an object store work?

Object storage manages data as distinct units called objects. These objects are kept in a single flat repository, each identified by its own key, rather than being nested as files inside folders. This flat address space is what lets object storage scale to very large capacities.
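To make the flat address space concrete, here is a minimal Python sketch of a toy in-memory store. The class name FlatObjectStore and the example keys are invented for illustration and do not describe how the GBSC server is implemented. The point is that a key such as "labA/run1/reads.fastq.gz" is a single name in one flat mapping, and a "folder" is only a shared key prefix.

```python
class FlatObjectStore:
    """Toy in-memory object store: one flat mapping from key to bytes."""

    def __init__(self):
        self._objects = {}  # key (str) -> data (bytes); no nested folders

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are only a naming convention: listing filters keys by prefix.
        return sorted(k for k in self._objects if k.startswith(prefix))


# Hypothetical keys and contents, for illustration only.
store = FlatObjectStore()
store.put("labA/run1/reads.fastq.gz", b"reads")
store.put("labA/run1/README.txt", b"notes")
store.put("labB/variants.vcf", b"variants")

print(store.list_keys("labA/"))
```

Because every object is reached directly by its key, adding more objects does not deepen any directory tree. That is one reason this layout tends to scale well.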

Object storage also bundles the data that make up a file together with its relevant metadata in a single object, so all the information about a data set can be accessed in one place. A real-life example of the value of metadata is how hospitals store and process patients’ X-ray images. An ordinary X-ray file has limited metadata associated with it, such as creation date, owner, location, and size. Any other metadata about the file has to be stored somewhere else (e.g., another file or a database) and somehow linked to the image file. An X-ray object, on the other hand, can carry a rich variety of metadata, including patient name, date of birth, injury details, and the area of the body that was X-rayed, in addition to the same tags that the file had. This metadata makes it much easier for researchers to search for the X-rays that they need for a particular study.
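The X-ray example can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the object store's actual data model: the StoredObject class, the find_by_metadata helper, the metadata field names, and the records themselves are all made up.

```python
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    key: str
    data: bytes
    metadata: dict = field(default_factory=dict)  # travels with the object


def find_by_metadata(objects, **criteria):
    """Return keys of objects whose metadata matches every given field."""
    return [
        obj.key
        for obj in objects
        if all(obj.metadata.get(name) == value for name, value in criteria.items())
    ]


# Made-up records for illustration only.
xrays = [
    StoredObject("xray-0001", b"...", {"body_area": "wrist", "injury": "fracture"}),
    StoredObject("xray-0002", b"...", {"body_area": "chest", "injury": "none"}),
    StoredObject("xray-0003", b"...", {"body_area": "wrist", "injury": "sprain"}),
]

wrist_keys = find_by_metadata(xrays, body_area="wrist")
print(wrist_keys)
```

Since the metadata is stored with each object, the search needs no separate database or sidecar file to link an image to its description.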

Objects are shared via two main protocols: HTTP (the protocol used for web pages) and S3 (the standard for cloud storage created by Amazon). With HTTP sharing, objects can be downloaded in a browser, and URLs pointing to the object store can be embedded in any website. S3 sharing lets the object store work with cloud computing environments and other S3-compatible tools.
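As a rough sketch of the two access paths, the Python below builds a download URL for HTTP access and shows how the same object might be fetched through the S3 API. Everything specific here is an assumption: the endpoint objectstore.example.edu, the bucket and key names, and the path-style URL layout (endpoint/bucket/key) are placeholders, not the real GBSC addresses. The S3 helper relies on the third-party boto3 library and on credentials that you would receive separately.

```python
import shutil
import urllib.request
from urllib.parse import quote

ENDPOINT = "https://objectstore.example.edu"  # hypothetical, not the real server


def object_url(endpoint, bucket, key):
    """Build a path-style URL (endpoint/bucket/key); the layout is an assumption."""
    return f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"


def download_via_http(url, dest_path):
    """Fetch a publicly readable object over plain HTTP(S)."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)


def download_via_s3(endpoint, bucket, key, dest_path, access_key, secret_key):
    """Fetch the same object through the S3 API using boto3 (third-party)."""
    import boto3  # imported here so the HTTP helpers work without boto3 installed

    client = boto3.client(
        "s3",
        endpoint_url=endpoint,  # point the client at a non-Amazon S3 service
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    client.download_file(bucket, key, dest_path)


# Placeholder bucket and key; nothing is downloaded here.
url = object_url(ENDPOINT, "public-data", "study 1/reads.fastq.gz")
print(url)
```

The URL form suits links on a website or in a paper, while the S3 form suits scripts and cloud pipelines that already work with S3 buckets.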

How can I share my data on the object store?

If you are interested in sharing your data on the object store, free of charge, you can fill out this form, and we will get back to you regarding next steps. If you have further questions, please email scg-action@lists.stanford.edu.

What equipment does it use?

The object store is an EMC Elastic Cloud Storage (ECS) U1600/U400T.

This object store features:

  • 1.65 Petabytes of usable storage
    • 280 x 8 Terabyte SAS Drives
    • Empty slots available to expand to a total of 5.6 Petabytes
  • 16 file servers for high throughput
  • Multiple 10 Gb Ethernet interfaces for parallel access
  • ECS Software supporting Amazon S3 and HTTP protocols
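The drive count and the usable capacity listed above can be related with a little arithmetic. This sketch assumes decimal units (1 Petabyte = 1000 Terabytes). It does not say where the difference between raw and usable space goes, since the source does not state it.

```python
DRIVES = 280          # from the list above
TB_PER_DRIVE = 8      # from the list above
USABLE_PB = 1.65      # from the list above

raw_tb = DRIVES * TB_PER_DRIVE      # total raw capacity in Terabytes
raw_pb = raw_tb / 1000              # decimal units assumed
usable_fraction = USABLE_PB / raw_pb

print(f"raw: {raw_pb:.2f} PB, usable share: {usable_fraction:.0%}")
```

Under these assumptions the drives provide 2.24 Petabytes of raw space, of which roughly 74% is usable.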

This object store is integrated into GBSC’s SCG computing cluster and meets both moderate-risk and dbGaP compliance requirements.

How is this object store supported?

This object store was purchased with an S10 Instrumentation grant from the NIH Office of Research Infrastructure Programs (1 S10 OD025082). The S10 grants awarded by ORIP support purchases of state-of-the-art, commercially available instruments to enhance the research of NIH-funded investigators.

Its operation is funded by the GBSC and the Department of Genetics.