SLAC Shared Science Data Facility (S3DF)
The SLAC Shared Science Data Facility (S3DF) is a heterogeneous CPU- and GPU-accelerated computing resource with high-performance disk, flash, and archival storage. S3DF is designed to support the data analytics needed for massive-throughput observational and experimental data. S3DF provides resources for several key areas: real-time data reduction and feedback, real-time access to HPC facilities for data analysis, and new ML techniques to interpret data faster, enabling efficient operation of next-generation facilities and improving accelerator, detector, and facility operations.
S3DF augments the Department of Energy’s Integrated Research Infrastructure (IRI) ecosystem with robust connections—networking, workflow tools, and support—to external exascale compute resources. S3DF is the hub for the large BES and HEP experiments at SLAC, such as LCLS and LSST-Rubin, providing key capabilities including:
- High-performance and high-throughput computing with robust platform services that enable scientific workflows and pipelines within SLAC and across the DOE ecosystem. For example, automated data processing with job scheduling interfaces, workflows that span SLAC and NERSC, and support for hybrid cloud-local workflows such as that used by the Rubin Observatory’s Legacy Survey of Space and Time (LSST).
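As a concrete illustration of the job scheduling interface, submission on a facility like this typically goes through a batch scheduler. The sketch below assumes a Slurm-style scheduler; the job name, partition, account, and script path are placeholders, not actual S3DF configuration.

```shell
#!/bin/bash
#SBATCH --job-name=reduce-run42     # placeholder job name
#SBATCH --partition=batch           # placeholder partition name
#SBATCH --account=my_experiment     # placeholder account
#SBATCH --ntasks=16                 # number of parallel tasks
#SBATCH --time=01:00:00             # wall-clock limit
#SBATCH --output=reduce-%j.log      # per-job log file

# Launch a (hypothetical) processing step across the allocated tasks
srun ./process_data --input /path/to/run42
```

Submitted with `sbatch reduce.sh`, a script like this lets automated pipelines queue processing steps without manual intervention.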
- A multi-petabyte filesystem used by SLAC’s research facilities and experiments for data-intensive workflows and computing pipelines. This is enabled by a broad range of storage technologies, including an all-flash Weka-based filesystem, a tiered Ceph object storage system, Lustre- and GPFS-based filesystems, and tape-based archival systems.
- Data management capabilities and hardware infrastructure that provide an end-to-end data processing system: sensors and front-end electronics in the experimental area are integrated with an on-the-fly data monitoring and data reduction fabric, and with complex, configurable processing capabilities in the data center, both onsite and remote. Examples include the LCLS Data Reduction Pipeline and tools such as LCLStream.
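To make the idea of on-the-fly data reduction concrete, the sketch below shows a generic veto-and-accumulate step of the kind such a fabric performs: low-signal detector frames are dropped before they reach storage, while being folded into a running background estimate. The frame format, threshold, and function name are hypothetical illustrations, not the actual LCLS Data Reduction Pipeline API.

```python
def reduce_stream(frames, threshold):
    """Veto low-signal frames and keep a running background estimate.

    frames: iterable of lists of pixel values (one list per detector frame)
    threshold: minimum integrated intensity for a frame to be kept
    """
    kept = []               # frames forwarded downstream
    background_sum = None   # pixel-wise sum of vetoed frames
    n_vetoed = 0
    for frame in frames:
        intensity = sum(frame)
        if intensity >= threshold:
            kept.append(frame)  # signal frame: pass it on
        else:
            # dark frame: fold into the running background estimate
            n_vetoed += 1
            if background_sum is None:
                background_sum = list(frame)
            else:
                background_sum = [a + b for a, b in zip(background_sum, frame)]
    background = ([s / n_vetoed for s in background_sum]
                  if n_vetoed else None)
    return kept, background

# Example: three 4-pixel frames; only the bright one survives the veto.
frames = [[0, 1, 0, 1], [10, 12, 9, 11], [1, 0, 1, 0]]
kept, background = reduce_stream(frames, threshold=20)
# kept == [[10, 12, 9, 11]]; background == [0.5, 0.5, 0.5, 0.5]
```

In a real facility this kind of step runs close to the detector, so that only reduced data crosses into the data center.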