Over the past week or so, I've been prototyping components of the end-to-end DRC pipeline based on interesting discussions about the KPF and UHE pipelines.
My goal with this prototype was to create a sort of straw man with a specific set of technologies and architectural choices to ground our planning/design of the production PANOSETI DRC pipeline.
I'll briefly describe this prototype system for reference:
**3-Server Cluster in the RAL @ Berkeley**

- Wei, Ben, and I set up a cluster of 3 Dell 1U R410 servers running Ubuntu 24.04, connected to a 1GbE switch, and installed in the RAL server room (Berkeley).
- Two of the servers each have three 2TB HDDs in a RAIDZ1 configuration; the third server has a 500GB SSD for metadata caching plus a RAID0 array consisting of a 3TB HDD and a 2TB HDD. (The disks come from Dan's previous projects.)
- This hardware is about 14 years old, so we should expect much better performance on a more modern system.
**BeeGFS Distributed Filesystem**
- The combined 11.6TiB of HDD and SSD capacity is presented to clients as a single data volume mounted at /mnt/beegfs and managed by the BeeGFS parallel file system.
- BeeGFS can stripe files across HDDs and SSDs, enabling high I/O throughput.
- BeeGFS also supports storage pools, which let us build a tiered hierarchy: a small, fast SSD pool in front of a large, slow HDD pool for secondary storage (see the small inspection sketch after this list).
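As a rough illustration, here is a minimal Python sketch of how pipeline tooling could inspect the stripe layout (and, in recent BeeGFS versions, the storage pool) of a path on the BeeGFS volume. It assumes the standard beegfs-ctl client utility is installed; the path is just an example.

```python
# Hedged sketch: query BeeGFS entry info (stripe pattern, chunk size, targets,
# storage pool) for a path on /mnt/beegfs using the beegfs-ctl client tool.
import subprocess

def beegfs_entry_info(path: str) -> str:
    """Return the raw output of `beegfs-ctl --getentryinfo` for `path`."""
    result = subprocess.run(
        ["beegfs-ctl", "--getentryinfo", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(beegfs_entry_info("/mnt/beegfs/data"))
```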
**Unified Data Directory: /mnt/beegfs/data**

- Like the Keck pipeline, our raw data will be archived in a level-0 data product directory (/mnt/beegfs/data/L0) by rsync processes belonging to a special "l0-copiers" group, which has exclusive write access to this directory. Other groups can read this data but cannot write to it, which protects the raw data from buggy file I/O code (a sketch of this directory convention follows after this list).
- Note: BeeGFS allows us to set access control lists (ACLs) on directories and any new files/subdirectories created within them, so we could enforce these rules automatically.
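To make the directory convention concrete, here is a minimal sketch of how pipeline code could map an L0 input to its L1 output location. The helper, file names, and extensions are hypothetical, not part of the prototype; the point is that pipeline code only ever reads from L0 and writes under L1.

```python
from pathlib import Path

# Hypothetical layout constants for the unified data directory described above.
BEEGFS_DATA = Path("/mnt/beegfs/data")
L0_DIR = BEEGFS_DATA / "L0"
L1_DIR = BEEGFS_DATA / "L1"

def l1_path_for(l0_product: Path, suffix: str = ".zarr") -> Path:
    """Map a raw L0 product to the corresponding L1 output path.

    Pipeline stages never write under L0 (enforced by group permissions/ACLs);
    derived products go under L1 with a mirrored directory structure.
    """
    relative = l0_product.relative_to(L0_DIR)
    return (L1_DIR / relative).with_suffix(suffix)

# Example (hypothetical file name):
#   /mnt/beegfs/data/L0/obs_20250101/module_3.pff
#   -> /mnt/beegfs/data/L1/obs_20250101/module_3.zarr
print(l1_path_for(L0_DIR / "obs_20250101" / "module_3.pff"))
```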
**Dask + Zarr --> mini-HPC Cluster**
- The 3-server cluster plus one additional server provides a pool of 64 cores to a Dask SSHCluster, whose scheduler distributes pipeline jobs across the workers.
- Each server runs a beegfs-client service, allowing them to directly read input data products from /mnt/beegfs/data/L0 and write output data products to /mnt/beegfs/data/L1+.
- The pipeline works entirely with Zarr-formatted files so that Dask can build computation graphs with a high degree of parallelism over disjoint chunks of data (see the sketch after this list).
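Here is a minimal sketch of what a stage running on this mini-HPC cluster could look like. The host names, thread counts, variable names, and file paths are assumptions for illustration, not the actual prototype configuration.

```python
from dask.distributed import SSHCluster, Client
import xarray as xr

# First host runs the scheduler; the rest run workers. Every host mounts
# /mnt/beegfs via beegfs-client, so workers all see the same data paths.
cluster = SSHCluster(
    ["drc0", "drc0", "drc1", "drc2", "drc3"],  # scheduler + 4 worker hosts (hypothetical names)
    worker_options={"nthreads": 16},           # ~64 cores total across workers
    scheduler_options={"dashboard_address": ":8787"},
)
client = Client(cluster)

# Open an L0 Zarr store lazily; Dask builds a task graph over its chunks.
ds = xr.open_zarr("/mnt/beegfs/data/L0/obs_20250101/module_3.zarr")

# Example stage: per-pixel baseline subtraction, computed chunk-by-chunk in
# parallel on the workers, then written out as an L1 product.
baseline = ds["images"].mean(dim="time")
l1 = (ds["images"] - baseline).to_dataset(name="images_baseline_subtracted")
l1.to_zarr("/mnt/beegfs/data/L1/obs_20250101/module_3.zarr", mode="w")
```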

**Zarr + xarray data model**
- Pipeline stages could take an xarray object backed by a Zarr store as one of their inputs. This xarray object could provide an object-oriented, NumPy-like interface to all of our observational data.
- Zarr group/array hierarchies can be traversed with xarray using Python's dot notation.
- We can define the hierarchy, so this data model would be extensible for new data products. For further customization, we could subclass xarray classes and add custom methods.
- xarray automatically creates labeled N-D arrays from Zarr files, which we could use to attach coordinates such as timestamps and temperatures to our image data (see the sketch below).
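To show what this data model could look like in practice, here is a minimal sketch that builds a labeled dataset, writes it to a chunked Zarr store on the BeeGFS volume, and reopens it with label-based selection. The variable names, coordinates, and attributes are placeholders, not a finalized schema.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2025-01-01T06:00:00", periods=100, freq="25ms")

ds = xr.Dataset(
    data_vars={
        # Labeled image cube with dimensions (time, y, x)
        "images": (("time", "y", "x"), np.zeros((100, 32, 32), dtype=np.uint16)),
        # Per-frame housekeeping stored alongside the images
        "detector_temp": (("time",), np.full(100, 25.0)),
    },
    coords={"time": times},
    attrs={"module_id": 3},
)

# Write a chunked Zarr store on the BeeGFS volume (path is an example).
ds.chunk({"time": 20}).to_zarr("/mnt/beegfs/data/L1/example.zarr", mode="w")

# Reopen lazily; variables are reachable with dot notation, and label-based
# selection picks out, e.g., all frames in a one-second window.
ds2 = xr.open_zarr("/mnt/beegfs/data/L1/example.zarr")
subset = ds2.images.sel(time=slice("2025-01-01T06:00:00", "2025-01-01T06:00:01"))
```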
**Simple End-to-end Tests**