
When

May 20-24, 2019

Where

Paul L. Locatelli, S.J. Student Activities Center

500 El Camino Real

Santa Clara, CA 95053

What

MSST 2019


35th International Conference
on Massive Storage Systems
and Technology (MSST 2019)
May 20 — 24, 2019

Meet. Discuss. Learn

MSST 2019 will focus on distributed storage system technologies, including persistent memory, new memory technologies, long-term data retention (tape, optical disks...), solid state storage (flash, MRAM, RRAM...), software-defined storage, OS and file-system technologies, cloud storage, big data, and data centers (private and public). The conference will address current challenges and future trends in storage technologies.

Agenda

May 20-24, 2019

May 20

Tutorials

May 21

Keynote and talks

May 22

Keynote and talks

May 23

Research Track

May 24

Research Track

Registration and Breakfast

7:30 - 9:00

Introduction
Sean Roberts, Tutorial Chair

9:00 - 9:05

IME Storage System (10:30 — 11:00 Break)

Paul Nowoczynski, DDN

DDN’s IME (the "Infinite Memory Engine") is an all-NAND-flash storage system that acts as a high-performance storage tier in a user’s overall storage environment. IME has been built from the ground up as a highly available, clustered storage technology that provides millions of IOPS to applications along with best-case media endurance properties. IME’s top-rated capabilities deliver the highest overall performance for the most demanding data workloads, as recorded by the independent IO500 organization.

The tutorial will focus on IME’s tiering ability along with performance demonstrations in difficult workload scenarios. Configuration, usage, and monitoring of IME will all be covered on a live cluster, and attendees can expect to come away with a reasonable sense of an IME environment’s look and feel.

9:05 - 12:30

Lunch

12:30 - 1:30

Expanding the World of Heterogeneous Memory Hierarchies: The Evolving Non-Volatile Memory Story (3:00 — 3:30 Break)

Bill Gervasi, Nantero

Emerging technologies and storage options are challenging the traditional system architecture hierarchies and giving designers new variables to consider. Existing options include module-level solutions such as 3D XPoint and NVDIMMs, which bring data persistence onto the memory channel, each with a variety of tradeoffs in terms of cost, performance, and mechanical considerations. Emerging options include new non-volatile memory technologies capable of addressing the limitations of the current solutions with lower latency and predictable time to data persistence, a critical factor for high-reliability data processing applications. Meanwhile, an increasing number of systems are moving towards distributed fabric-based backbones with heterogeneous computing elements, including not only artificial intelligence and deep learning but also in-memory computing and non-von Neumann processing.

This tutorial is targeted at system architects who can appreciate the complexity of a confusing number of options, and would like some insights about managing the complexity to solve real world problems. Some of the standards in process are new, such as NVDIMM-P, DDR5 NVRAM, or Gen-Z specifications, so this is an opportunity to learn about future developments as well. The tutorial will allocate time for attendees to share their system integration stories as well, making it a joint learning experience for all.

1:30 - 5:00

Registration and Breakfast

7:30 - 8:30

Keynote: Mark Kryder - A Perspective on the Past and Future of Magnetic Hard Drives

A Perspective on the Past and Future of Magnetic Hard Drives

Dr. Mark Kryder, Carnegie Mellon University

The market for data storage technology is expanding at a rapid pace. IDC predicts that total worldwide data will increase from 33 Zbytes in 2018 to 175 Zbytes in 2025. Moreover, the capacity of hard drives has increased from 5 Mbytes in 1956 to 15 Tbytes today, an increase of 3 million fold, while their weight has dropped from over a ton to 1.5 lbs. Historically, HDDs have been as critical to advances in computing as semiconductors, and they continue to be. The 5.25 inch disk drive, first introduced by Seagate in 1978, enabled the IBM PC, and the 2.5 and 1.8 inch disk drives enabled laptop computers. Although the growth rate of hard drive areal density has slowed significantly, the industry is still well over an order of magnitude away from the theoretical limits to the density that may be achieved, and new technologies such as two-dimensional magnetic recording (TDMR), heat assisted magnetic recording (HAMR), microwave assisted magnetic recording (MAMR) and bit patterned media recording (BPMR) promise to renew areal density growth in the future. Although the capacity of data stored on an HDD has increased at a pace equal to that of Moore’s law for semiconductors, the improvement in performance of HDDs has lagged. This has contributed to making solid state drives (SSDs) based on flash technology attractive in both mobile devices and high performance applications, where either a limited amount of storage is needed or fast read access time is critical. However, flash is more expensive than storage on hard drives, and this has limited its use in applications requiring large volumes of data, such as the cloud, which today is being enabled by HDDs. Recently, to improve performance, the industry has announced that it will introduce multiple actuators on its drives. This presentation will describe the evolution of high-density data storage technology and project what might be expected in the future.

8:30 - 9:30

Break

9:30 - 10:00

Storage in the Age of AI

Tiering and Life Cycle Management with AI/ML Workloads

Jacob Farmer, Cambridge Computer, Starfish Storage

This talk takes a quick look at the pressures that machine learning workloads put on traditional HPC storage systems and proposes that organizations that embrace machine learning will want to up their game when it comes to data life cycle management. The talk then explores common approaches to namespace management, data movement, and data life cycle policy enforcement.

Machine Learning and Algorithmic Privacy at Humu

Dr. Aleatha Parker-Wood, Humu

 

Storage and Data Challenges for Production Machine Learning

Dr. Nisha Talagala, ParallelM

Machine Learning and Advanced Analytics are some of the most exciting and promising uses of the masses of data accumulated and stored over the last decade. However, as industries race to monetize the insights hidden in their data stores, new challenges emerge for storage and data management. The performance needs of AI workloads have been known for some time, and flash, for example, has been successfully applied to mitigate some of these challenges. As AI usage grows and becomes more dynamic and distributed (such as at the edge), these performance requirements continue to expand and are coupled with other needs such as power efficiency. Secondly, as AI moves to production, other concerns are emerging, such as regulatory requirements and businesses’ need to demonstrate AI trustworthiness while managing risk. These requirements generate new data challenges, from security to provenance and governance. This talk will describe recent trends and focus areas in AI (such as productization, trust and distributed execution) and how they create challenges and opportunities for storage and data management systems. The talk will also cover how storage systems are used in production AI workflows and how innovations in storage and data management can impact and improve the production AI lifecycle.

 

I/O for Deep Learning at Scale

Quincey Koziol, National Energy Research Scientific Computing Center (NERSC)

Deep Learning is revolutionizing the fields of computer vision, speech recognition and control systems. In recent years, a number of scientific domains (climate, high-energy physics, nuclear physics, astronomy, cosmology, etc.) have explored applications of Deep Learning to tackle a range of data analytics problems. As one attempts to scale Deep Learning to analyze massive scientific datasets on HPC systems, data management becomes a key bottleneck. This talk will explore leading scientific use cases of Deep Learning in climate, cosmology, and high-energy physics on NERSC and OLCF platforms, enumerate I/O challenges, and speculate about potential solutions.

10:00 - 12:00

Lunch

12:00 - 1:15

Computational Memory and Storage for AI

 

Changing Storage Architecture will require new Standards

Mark Carlson, Toshiba Memory Corporation

 

How NVM Express and Computational Storage can make your AI Applications Shine!

Dr. Stephen Bates, Eideticom

Artificial Intelligence and Machine Learning are becoming dominant workloads both in data centers and on the edge. However, AI/ML workloads require large amounts of input data for both training and inference. In this talk we will discuss how NVM Express and Computational Storage can vastly improve the performance and efficiency of AI/ML systems. We will give an introduction to both technologies and show how they can be deployed and what benefits to expect from them.

 

Storage in the New Age of AI/ML

Young Paik, Samsung

One of the hottest topics today is Artificial Intelligence/Machine Learning. Most of the attention has been on the enormous increases in computational power now possible with GPU/ASIC servers. Much less time has been spent on what is arguably just as important: the storage of the data that feeds these hungry beasts. There are many technologies that may be used (e.g. PCIe Gen4, erasure coding, smart storage, SmartNICs). However, in designing new storage architectures it is important to realize where these may not work well together. Young will describe some of the characteristics of machine learning systems, the methods to process the data to feed them, and what considerations should go into designing the storage for them.

1:15 - 2:45

Break

2:45 - 3:00

User Requirements of Storage at Scale

 

Storage systems requirements for massive throughput detectors at light sources

Dr. Amedeo Perazzo, SLAC National Accelerator Laboratory

This presentation describes the storage systems requirements for the upgrade of the Linac Coherent Light Source, LCLS-II, which will start operations at SLAC in 2021. These systems face formidable challenges due to the extremely high data throughput generated by the detectors and to the intensive computational demand for data processing and scientific interpretation.

NWSC Storage: A look at what users need

Chris Hoffman, National Center for Atmospheric Research

The NCAR HPC Storage Team manages large-scale storage systems at the NCAR-Wyoming Supercomputing Center (NWSC). This talk will discuss the current storage environment and the factors driving it. Chris will then discuss the requirements of the future storage environment.

 

Understanding Storage System Challenges for Parallel Scientific Simulations

Dr. Bradley Settlemyer, Los Alamos National Laboratory

Computer-based simulation is critical to the study of physical phenomena that are difficult or impossible to physically observe. Examples include asteroid collisions, chaotic interactions in climate models, and massless particle interactions. Long-running simulations, such as those running on Los Alamos National Laboratory's Trinity supercomputer, generate many thousands of snapshots of the simulation state that are written to stable storage for fault tolerance and visualization/analysis. For extreme-scale simulation codes, such as the Vector Particle-in-Cell code (VPIC), improving the efficiency of storage system access is critical to accelerating scientific insight and discovery. In this talk we will discuss the structure of the VPIC software architecture, several storage system use cases associated with the VPIC simulation code, and the challenges associated with parallel access to the underlying storage system. We will not present solutions but will instead focus on the underlying requirements of the scientific use cases, including fault tolerance and emerging data analysis workloads that directly accelerate scientific discovery.

 

The Art of Storage

Eric Bermender, Pixar Animation Studios

Storage systems are designed to adapt to new looks and new plot lines in service to the stories our studio tells. "The Art of Storage" is a story about how creative decision making influences our storage architecture decisions. This overview will cover many of the storage technologies and considerations currently used within our production pipelines and what future technologies interest us in helping to push the boundaries of animation.

3:00 - 4:30

Break followed by Lightning talks
Sign-up board will be available all day

4:40 - 5:30

Cocktail Reception Sponsored by Aeon Computing

5:00 - 6:00

Registration and Breakfast

7:30 - 8:30

Keynote: Margo Seltzer - Importance of Provenance

More than Storage

Dr. Margo Seltzer, University of British Columbia

The incredible growth and success that our field has experienced over the past half a century has had the side effect of transforming systems into a constellation of siloed fields; storage is one of them. I'm going to make the case that we should return to a broad interpretation of systems, undertake bolder, higher-risk projects, and be intentional about how we interact with other fields. I'll support the case with examples of several research projects that embody this approach.

8:30 - 9:30

Break

9:30 - 10:00

Resilience at Scale

 

Session Chair: TBD

Rethinking End-to-end Reliability in Distributed Cloud Storage System

 

Dr. Asaf Cidon, Barracuda Networks

A Storage Architecture for Resilient Assured Data

 

Paul D. Manno, Georgia Tech

Rightscaling: Varying data safety techniques with scale

 

Lance Evans, Cray, Inc.

Practical erasure codes tradeoffs for scalable distributed storage systems

 

Cyril Guyot, Western Digital

Extreme-scale Data Resilience Trade-offs at Experimental Facilities

 

Dr. Sadaf R. Alam, Swiss National Supercomputing Centre

Large-scale experimental facilities such as the Swiss Light Source and the free-electron X-ray laser SwissFEL at the Paul Scherrer Institute (PSI), and the particle accelerators and detectors at CERN, are experiencing unprecedented data generation growth rates. Consequently, the management, processing and storage requirements of data are increasing rapidly. The Swiss National Supercomputing Centre, CSCS, provides computing and storage capabilities for PSI, including a dedicated archiving system for scientific data. This talk gives an overview of performance and cost-efficiency trade-offs for managing data at rest as well as data in motion for PSI workflows. This co-design approach is needed to address resiliency challenges at extreme scales, in particular considering the unique data generation capabilities at experimental facilities.

10:00 - 12:00

Lunch

12:00 - 1:15

Next Generation Storage Software

(Session Chair: Meghan McClelland)

 

How are new algorithms and storage technologies addressing the new requirements of AI and Big Science? How are virtual file systems bridging the gap between big repositories and usability?

 

CERN's Virtual File System for Global-Scale Software Delivery

Dr. Jakob Blomer, CERN

Delivering complex software across a worldwide distributed system is a major challenge in high-throughput scientific computing. Copying the entire software stack everywhere it’s needed isn’t practical—it can be very large, new versions of the software stack are produced on a regular basis, and any given job only needs a small fraction of the total software. To address application delivery in high-energy physics, the global scale virtual file system CernVM-FS distributes software to hundreds of thousands of machines around the world. It uses HTTP for data transport and it provides POSIX access to more than a billion files of application software stacks and operating system containers to end user devices, university clusters, clouds, and supercomputers. This presentation discusses key design choices and trade-offs in the file system architecture as well as practical experience of operating the infrastructure.

 

Disaggregated, Shared-Everything Infrastructure to Break Long-Standing Storage Tradeoffs

Renen Hallak, VAST Data

Storage architects have traditionally had to trade the various virtues of a storage system off against one another, sacrificing performance for capacity, scale for simplicity, or resilience for cost, among others. VAST’s Disaggregated, Shared-Everything Architecture (DASE) leverages the latest storage technologies, including 3D XPoint, NVMe over Fabrics and QLC flash, to break these tradeoffs. This session will describe the DASE architecture and how it empowers VAST’s Universal Storage system to deliver all-flash performance at petabyte to exabyte scale and at a cost low enough for archival use cases. Customers using Universal Storage can therefore eliminate the islands of storage common in today’s datacenter and expand their data mining to all their data.

Zach Brown, Versity Software

ScoutFS is an open source clustered POSIX file system built to support archiving of extremely large file sets. This talk will summarize the challenges faced by sites that are managing large archives and delve into the solution Versity is developing. We'll explore the technical details of how POSIX can scale and how we index file system metadata concurrently across a cluster while operating at high bandwidth.

 

Grand Unified File Index: A Development, Deployment, and Performance Update

Dominic Manno, Los Alamos National Laboratory

Compute clusters are growing, and along with them the amount of data being generated is increasing. It is becoming more important for end-users and storage administrators to manage the data, especially when moving between tiers. The Grand Unified File Indexing (GUFI) system is a hybrid indexing capability designed to assist storage admins and users in managing their data. GUFI utilizes trees and embedded databases to securely provide very fast access to an indexed version of their metadata. In this talk we will provide an update on GUFI development, early performance results, and deployment strategies.

1:15 - 3:15

Break

3:15 - 3:30

Future Storage Systems

Moore's Law coming to an end has parallels in the storage industry. What comes next? What lies beyond 10 years with respect to new nonvolatile media? What software approaches can help stem the tide in achieving peak performance and density?

 

Ultra-dense data storage and extreme parallelism with electronic-molecular systems

Dr. Karin Strauss, Microsoft Research

In this talk, I will explain how molecules, specifically synthetic DNA, can store digital data and perform certain types of special-purpose computation by leveraging tools already developed by the biotechnology industry.

 

The Future of Storage Systems – a Dangerous Opportunity

Rob Peglar, Advanced Computation and Storage, LLC

We are at a critical point concerning storage systems, in particular, how these systems are integrated into the larger whole of compute and network elements which comprise HPC infrastructure. The good news is, we have a plethora of technologies from which to choose – recording media, device design, subsystem construction, transports, filesystems, access methods, etc. The bad news is, we have lots of choices. This talk will explore the past, present and future of storage systems (emphasis on "systems", not just storage) and the dangerous opportunity we have to significantly improve the state of the art. Fair warning: this may involve the throwing down of gauntlets and the abandoning of long-held beliefs. Remember, as Einstein said in 1946, we cannot solve problems by using the same thinking that created them originally.

 

Nantero NRAM carbon nanotube memory changes the foundation for next generation storage concepts

Bill Gervasi, Nantero, Inc.

Nantero NRAM defines a new class of memory class storage (MCS) devices with the performance of a DRAM, using carbon nanotubes for centuries-long data persistence. How does the introduction of MCS change how we think of data storage hierarchies? When main memory acts as a self-serving storage layer, traditional concepts of checkpointing to slower media or maintaining energy stores for backup mechanisms evaporate, and a new model for data integrity emerges. MCS clearly can replace the volatile caches used for acceleration in the mass storage media as well, and when the cache size is decoupled from considerations like the capacity of energy stores, designers are able to rethink their assumptions about cost versus performance calculations. With a growing trend towards fabric-based system bus architectures including Gen-Z, OpenCAPI, CCIX, etc., the timing is right for the introduction of new paradigms for data distribution that take advantage of data persistence. This talk describes Nantero NRAM’s technical details, touches on the new JEDEC standards effort for MCS devices, and explores use cases for MCS in massive storage systems.

3:30 - 5:00

Lightning Talks
Sign-up board will be available all day.

5:00 - 6:00

Preliminary Program

XORInc: Optimizing Data Repair and Update for Erasure-Coded Systems with XOR-Based In-Network Computation

Yingjie Tang, Fang Wang, Yanwen Xie and Xuehai Tang

Efficient Encoding and Reconstruction of HPC Datasets for Checkpoint/Restart

Seung Woo Son

CDAC: Content-Driven Deduplication-Aware Storage Cache

Yujuan Tan, Jing Xie, Congcong Xu, Zhichao Yan, Hong Jiang, Yajun Zhao, Min Fu, Xianzhang Chen, Duo Liu and Wen Xia

Scalable QoS for Distributed Storage Clusters using Dynamic Token Allocation

Yuhan Peng, Qingyue Liu and Peter Varman

Tiered-ReRAM: A Low Latency and Energy Efficient TLC Crossbar ReRAM Architecture

Yang Zhang, Dan Feng, Wei Tong, Jingning Liu, Chengning Wang and Jie Xu

CeSR: A Cell State Remapping Strategy to Reduce Raw Bit Error Rate of MLC NAND Flash

Yutong Zhao, Wei Tong, Jingning Liu, Dan Feng and Hongwei Qin

vPFS+: Managing I/O Performance for Diverse HPC Applications

Ming Zhao and Yiqi Xu

Parity-Only Caching for Robust Straggler Tolerance

Mi Zhang, Qiuping Wang, Zhirong Shen and Patrick P. C. Lee

Mitigate HDD Fail-Slow by Pro-actively Utilizing System-level Data Redundancy with Enhanced HDD Controllability and Observability

Jingpeng Hao, Yin Li, Xubin Chen and Tong Zhang

Accelerating Relative-error Bounded Lossy Compression for HPC datasets with Precomputation-Based Mechanisms

Xiangyu Zou, Tao Lu, Wen Xia, Xuan Wang, Weizhe Zhang, Sheng Di, Dingwen Tao and Franck Cappello

Towards Virtual Machine Image Management for Persistent Memory

Jiachen Zhang, Lixiao Cui, Peng Li, Xiaoguang Liu and Gang Wang

vNVML: An Efficient Shared Library for Virtualizing and Sharing Non-volatile Memories

Chih Chieh Chou, Jaemin Jung, Narasimha Reddy, Paul Gratz and Doug Voigt

Preliminary Program

DFPE: Explaining Predictive Models for Disk Failure Prediction

Yanwen Xie, Dan Feng, Fang Wang, Xuehai Tang, Jizhong Han and Xinyan Zhang

Efficient Encoding and Reconstruction of HPC Datasets for Checkpoint/Restart

Seung Woo Son

Metadedup: Deduplicating Metadata in Encrypted Deduplication via Indirection

Jingwei Li, Patrick P. C. Lee, Yanjing Ren and Xiaosong Zhang

FastBuild: Accelerating Docker Image Building for Efficient Development and Deployment of Containers

Zhuo Huang, Song Wu, Song Jiang and Hai Jin

Wear-aware Memory Management Scheme for Balancing Lifetime and Performance of Multiple NVM Slots

Chunhua Xiao, Linfeng Cheng, Lei Zhang, Duo Liu and Weichen Liu

Adjustable flat layouts for Two-Failure Tolerant Storage Systems

Thomas Schwarz

LIPA: a Learning-based Indexing and Prefetching Approach for data deduplication

Guangping Xu, Chi Wan Sung, Quan Yu, Hongli Lu and Bo Tang

AZ-Code: An Efficient Availability Zone Level Erasure Code to Provide High Fault Tolerance in Cloud Storage Systems

Xin Xie, Chentao Wu, Junqing Gu, Han Qiu, Jie Li, Minyi Guo, Xubin He, Yuanyuan Dong and Yafei Zhao

BFO: Batch-File Operations on Massive Files for Consistent Performance Improvement

Yang Yang, Qiang Cao and Hong Jiang

Crab-tree: Crash Recoverable ARMv8-oriented B+-tree for Persistent Memory

Chundong Wang, Sudipta Chattopadhyay and Gunavaran Brihadiswarn

A Performance Study of Lustre File System Checker: Bottlenecks and Potentials

Dong Dai, Om Rameshwar Gatla and Mai Zheng

ES-Dedup: a Case for Low-Cost ECC-based SSD Deduplication

Zhichao Yan, Hong Jiang and Yujuan Tan

Fighting with Unknowns: Estimating the Performance of Scalable Distributed Storage Systems with Minimal Measurement Data

Moo-Ryong Ra and Hee Won Lee

Pattern-based Write Scheduling and Read Balance-oriented Wear-leveling for Solid State Drivers

Jun Li, Xiaofei Xu, Xiaoning Peng and Jianwei Liao

When NVMe over Fabrics Meets Arm: Performance and Implications

Yichen Jia, Eric Anger and Feng Chen

Long-Term JPEG Data Protection and Recovery for NAND Flash-Based Solid-State Storage

Yu-Chun Kuo, Ruei-Fong Chiu and Ren-Shuo Liu

Economics of Information Storage: The Value in Storing the Long Tail

James Hughes