Ambrose Carr
Stephan Balmer
Ryan Abernathey

Zarr - Scalable Storage of Tensor Data for Use in Parallel and Distributed Computing

Many scientific problems involve computing over large N-dimensional typed arrays of data, and reading and writing that data is often the main bottleneck limiting speed or scalability. The Zarr project is developing a simple, scalable approach to storing such data that is compatible with a range of approaches to distributed and parallel computing. We describe the Zarr protocol and data storage format, and the current state of implementations for several programming languages, including Python. We also describe current uses of Zarr in malaria genomics, the Human Cell Atlas, and the Pangeo project.

Ryan Abernathey, Stephan Balmer, Ambrose Carr, Tim Crone, Martin Durant, Jan Funke, Darren Gallagher, Fabian Gans, Shikhar Goenka, Joe Hamman, Stephan Hoyer, Jerome Kelleher, John Kirkham, Alistair Miles, Josh Moore, Charles Noyes, Tarik Onalan, Constantin Pape, Zain Patel, Matthew Rocklin, Stephan Saalfeld, Vincent Schut, Justin Swaney and Ryan Williams
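A minimal sketch of why a chunked storage format suits parallel and distributed computing, assuming the Zarr version 2 conventions of a regular chunk grid in which each chunk is stored under its own key formed by joining the chunk indices with "." (for example "1.2"). The helper names `chunk_grid`, `chunk_key` and `chunks_for_region` are hypothetical illustrations, not part of the zarr-python API. Because every region of the array maps to a known set of independent keys, workers that write non-overlapping, chunk-aligned regions never touch the same stored object.

```python
import itertools
import math


def chunk_grid(shape, chunks):
    """Number of chunks along each dimension; the last chunk in a dimension may be partial."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def chunk_key(index, sep="."):
    """Zarr v2-style chunk key, e.g. (1, 2) -> '1.2' with the default '.' separator."""
    return sep.join(str(i) for i in index)


def chunks_for_region(region, chunks):
    """Keys of every chunk overlapping a region given as (start, stop) pairs per dimension.

    Assumes stop > start in each dimension (hypothetical helper, illustration only).
    """
    ranges = [
        range(start // c, (stop - 1) // c + 1)
        for (start, stop), c in zip(region, chunks)
    ]
    return [chunk_key(idx) for idx in itertools.product(*ranges)]


# A 10000 x 10000 array stored in 1000 x 1000 chunks forms a 10 x 10 grid of keys.
print(chunk_grid((10000, 10000), (1000, 1000)))

# Two workers writing different row blocks touch disjoint sets of chunk keys,
# so their writes can proceed independently without coordination.
worker_a = chunks_for_region([(0, 1000), (0, 10000)], (1000, 1000))
worker_b = chunks_for_region([(1000, 2000), (0, 10000)], (1000, 1000))
print(set(worker_a).isdisjoint(worker_b))
```

Writes that are not aligned to chunk boundaries would share edge chunks, which is why chunk shapes are usually chosen to match the expected access pattern.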