December 9, 2024

ORNL’s New Data Transfer Tool Aims to Simplify HPC Storage Migration

Dec. 9, 2024 — A new data transfer tool created at the Oak Ridge Leadership Computing Facility could be available to facilities nationwide after making its debut at the Department of Energy’s Oak Ridge National Laboratory (ORNL). The tool, called hsi_xfer, was created by HPC storage systems engineer Jake Wynne to help transfer large quantities of data from the facility’s High Performance Storage System, or HPSS, to the newly deployed nearline storage system, Kronos.


After decades of service, HPSS is set to be decommissioned in early 2025, and users are working to migrate their data before the Jan. 31 deadline. As the HPC Storage Systems team prepared for the mass exodus earlier this year, Wynne recognized the need for a new tool to facilitate the process.

“Reading from tape is slow. We only have so many tape drives and hundreds of users. We had to figure out how to coordinate that to prevent the influx of users putting too much strain on the aging hardware,” Wynne said.

Wynne developed an idea for the tool with his colleague, Gregg Gawinski, another HPC storage systems engineer at ORNL. First, they disabled Globus, a command-line tool and web interface commonly used for seamless data transfer. This enabled Wynne to focus solely on hsi, the remaining tool available for facilitating data transfer from HPSS.

The hsi_xfer tool takes its name from hsi, the command-line interface for HPSS. It builds on the capabilities of hsi by offering a more efficient and streamlined way to transfer data, and the team coined the name hsi_xfer to distinguish the enhanced version from the original.

Kronos is a 134 PB, multiprogrammatic nearline storage system that also provides tape-based backups for all data as a disaster-recovery measure.

“HPSS was designed to be a ‘cold-storage’ solution — a write-once, read-in-the-far-future solution,” Wynne said. “This doesn’t lend well to a mass retrieval of data — potentially over 100 petabytes — within the time frame of a few months.”

Wynne wrote the script to optimize the management of HPSS and data transfer node resources during this period of large-scale data retrieval. The goal was to prevent overloading the tape library’s robotic tape-retrieval mechanisms and provide users with a simpler, more efficient way to access their data.

“By batching all requested files from a single tape and streaming them together, the script minimizes robotic movement, reduces tape loading and seek times, and helps extend the lifespan of both the tapes and the hardware,” Wynne said.
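The batching Wynne describes can be illustrated with a short Python sketch. It is not the hsi_xfer implementation, which the article does not show. The FileRecord type, its tape_id and position fields, and the sample paths are all hypothetical, chosen only to show how grouping requests by tape and ordering them by position lets each tape be mounted once and read in a single forward pass.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    # All three fields are hypothetical; the article does not describe
    # how hsi_xfer represents file locations.
    path: str
    tape_id: str
    position: int  # offset of the file on its tape


def batch_by_tape(records):
    """Group requested files by tape and order each group by tape position."""
    batches = defaultdict(list)
    for record in records:
        batches[record.tape_id].append(record)
    # Sorting by position lets each tape be read in one forward pass.
    return {
        tape_id: sorted(group, key=lambda r: r.position)
        for tape_id, group in sorted(batches.items())
    }


requests = [
    FileRecord("/proj/a.dat", "T002", 40),
    FileRecord("/proj/b.dat", "T001", 10),
    FileRecord("/proj/c.dat", "T002", 5),
]
for tape_id, group in batch_by_tape(requests).items():
    print(tape_id, [r.path for r in group])
```

In this sketch the two files on tape T002 end up in one batch, so that tape would be loaded once rather than twice.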

The hsi_xfer tool is essentially a wrapper around hsi: it uses the existing hsi tool but adds features such as concurrent transfer threads, checksumming, and checkpointing while offering a more user-friendly interface. The enhanced functionality, proven efficiency and ease of use have attracted interest from other computing facilities.
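As a rough illustration of two of the features named above, concurrent transfer threads and checksumming, the following Python sketch runs several transfers on a bounded thread pool and verifies each copy. It is a sketch under stated assumptions, not the tool's code: shutil.copyfile stands in for the real transfer step that hsi_xfer delegates to hsi, SHA-256 is an assumed checksum algorithm, and the function names are invented for the example.

```python
import hashlib
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_one(src, dst_dir):
    """Copy one file and confirm the copy matches the source.

    shutil.copyfile is a stand-in for the real transfer step; the
    checksum algorithm is an assumption made for this example.
    """
    dst = Path(dst_dir) / Path(src).name
    shutil.copyfile(src, dst)
    if sha256_of(src) != sha256_of(dst):
        raise IOError(f"checksum mismatch for {src}")
    return str(dst)


def transfer_all(sources, dst_dir, workers=4):
    """Run transfers on a bounded pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: transfer_one(s, dst_dir), sources))


with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
    sources = []
    for i in range(3):
        path = Path(src_dir) / f"file{i}.dat"
        path.write_bytes(f"payload {i}".encode())
        sources.append(str(path))
    copied = transfer_all(sources, dst_dir, workers=2)
    print(len(copied), "files transferred and verified")
```

Capping the number of workers is what keeps a wrapper like this from overwhelming shared hardware: the pool size, not the number of requested files, bounds how many transfers run at once.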

The script has demonstrated superior transfer performance compared to other tools while also offering data-integrity features typically found in tools such as Globus — features that are not available in the standard hsi. Like Globus, the script includes a checkpointing feature that allows users to quickly recover from interrupted transfers, resuming exactly where they left off with minimal overhead.
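The checkpointing idea can be sketched in a few lines of Python. This is a minimal illustration, assuming a JSON file that lists completed transfers. The article does not describe the checkpoint format hsi_xfer uses, and the function names, file names and the simulated interruption are hypothetical.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the set of files already recorded as transferred."""
    if not os.path.exists(path):
        return set()
    with open(path) as handle:
        return set(json.load(handle))


def save_checkpoint(path, done):
    """Write the checkpoint atomically so an interruption cannot corrupt it."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump(sorted(done), handle)
    os.replace(tmp_path, path)


def run_transfer(files, checkpoint_path, transfer, fail_on=None):
    """Transfer files not yet in the checkpoint, recording each success."""
    done = load_checkpoint(checkpoint_path)
    for name in files:
        if name in done:
            continue  # finished in an earlier run
        if name == fail_on:
            raise RuntimeError(f"simulated interruption at {name}")
        transfer(name)
        done.add(name)
        save_checkpoint(checkpoint_path, done)
    return done


files = ["a.dat", "b.dat", "c.dat"]
moved = []
with tempfile.TemporaryDirectory() as workdir:
    checkpoint = os.path.join(workdir, "checkpoint.json")
    try:
        run_transfer(files, checkpoint, moved.append, fail_on="b.dat")
    except RuntimeError:
        pass  # the first run is interrupted after a.dat completes
    run_transfer(files, checkpoint, moved.append)  # resumes at b.dat

print(moved)
```

The second run skips a.dat because the checkpoint already lists it, so each file is transferred exactly once even though the first run was cut short.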

Wynne is currently stewarding the script through ORNL’s open-sourcing process, and he hopes it will soon appear in the OLCF’s public GitHub repository.

Wynne started at ORNL as an intern in 2013, writing tutorials and guides for the User Assistance and Outreach group, and worked in several departments before joining the HPC Storage Systems group. This is the second user-facing tool that he has developed at ORNL.

“This tool has changed how users approach the transfer of data from HPSS to other resources, making the process smoother, faster and far less complex,” Wynne said. “Not only has it significantly improved the user experience, but it has also streamlined workflows, thereby reducing the strain on our resources and minimizing potential bottlenecks. As the deadline draws nearer, seeing the tool in action and witnessing the tangible benefits for our users is a huge win for us — it’s exactly what we needed to ensure a successful transition.”


Source: Angela Gosnell, OLCF
