This paper represents original work by Vincent Cordrey, Doug Freyburger, Jordan Schwartz and Liza Weissler. It is copyright to those authors.

The USENIX Association has been granted exclusive rights to publish and disseminate this paper in printed form, electronically, and in audio formats.

Original publication of this paper by the USENIX Association is in the "Proceedings of the LISA XIII Conference".

This paper may not be published or otherwise disseminated except by the USENIX Association for the duration of the Grant of Rights to Publish. During that time, the Grant of Rights permits this paper to be posted on the Author's web site(s) for electronic access.

RAND

Non-HSM Home Directories: Forward Relocation by Request

All UNIX account home directories (over a thousand) resided on non-Y2K compliant servers, which had to be upgraded as a part of a standard technology refresh program. The replacement servers were separate fileservers with large RAID boxes at each campus.

mh is used pervasively at RAND, so home directories hold actively changing mail. Because resynchronization of the source and destination file systems was not built into Version 2 of the Forward Relocation algorithm, that algorithm was not used to move UNIX account home directories to the new servers.

However, users could request an early relocation of their home directories by contacting their help desk. As part of this standard help desk procedure, Full On-line Forward Relocation was used to move the home directory to one of the new servers.

HSM systems: Enhanced Just Plain Copy

This HSM exit was accomplished by Weissler. It is included for completeness to demonstrate that highly successful File System Relocations do not require Forward or Reverse Relocation.

RAND acquired two Epoch™ optical hierarchical storage management systems in 1989-1990. The initial systems were Sun 4/75 workstations with a proprietary Epoch™ operating system based upon SunOS 4.0.3. Epoch™ used Ingres as the supporting relational database, with Hewlett-Packard and Hitachi optical jukeboxes populated with WORM [write once, read many] media. A series of upgrades brought the systems up to Sun Sparc 20 workstations running SunOS 4.1.3 with erasable optical media.

By 1996, it was clear that the systems would have to be replaced. Backups had become increasingly difficult as the amount of data increased: it was common for a full backup to run several days, which made its integrity questionable. Staff turnover left RAND with little expertise in HSM, which in turn led to the deterioration of administration. Ongoing garbage collection efforts decreased with the staff turnover, leaving many of the 1200+ optical media under 50% utilized. The vendor stopped supporting the non-Y2K compliant hardware, which made relocation of the data mandatory.

The system was running and healthy, but old and slow. Because it was healthy, a PR campaign was necessary to persuade users to move off it. Some users were convinced to buy their own disks, some wanted the "higher performance" of not having to wait for stage-ins, and others had to be shown the lower overall support cost of newer-technology storage systems.

Analysis was conducted to find usage patterns. Three patterns emerged, and data that followed each pattern was copied to a different target server. However, because the replacement servers did not arrive on site at the same time, a systematic draining of each optical medium was not possible: only a portion of each medium could be relocated to a given server. When the first server arrived, a portion of the data was moved and the process was paused. Relocation resumed temporarily after the second server was delivered. When the third server arrived, the last of the data was evacuated, some three months after the process began.

The system was placed in a Read-Only or Semi On-line state during the relocation. The Epoch™ utility, epls, worked rapidly enough to allow media preparation in the form of pre-load lists per directory. To avoid thrashing, all files were staged out to optical storage, leaving the file systems largely empty. Files in the pre-load lists were then staged in in bulk using epbsi. Data was then manually copied using tar in relatively small chunks. These improvements on the Just Plain Copy technique virtually eliminated jukebox thrashing, making this relocation much faster than a brute force Just Plain Copy approach.
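
The planning step described above (one pre-load list per directory, then copying in relatively small chunks) can be sketched as follows. This is only an illustrative sketch, not the procedure actually used at RAND. The input is assumed to be a list of (path, size) records already extracted from a file listing. The function names preload_lists and chunk_by_size and the byte budget are hypothetical. No epls, epbsi, or tar invocation is shown, since the paper does not give their options.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def preload_lists(files):
    """Group (path, size_bytes) records into one pre-load list per directory."""
    lists = defaultdict(list)
    for path, size in files:
        lists[str(PurePosixPath(path).parent)].append((path, size))
    return dict(lists)


def chunk_by_size(entries, budget_bytes):
    """Split one pre-load list into consecutive chunks within a byte budget.

    A single file larger than the budget still gets a chunk of its own.
    """
    chunk, used = [], 0
    for path, size in entries:
        # Close the current chunk before it would exceed the budget.
        if chunk and used + size > budget_bytes:
            yield chunk
            chunk, used = [], 0
        chunk.append(path)
        used += size
    if chunk:
        yield chunk


# Hypothetical catalog; real input would come from a listing of the HSM file systems.
catalog = [
    ("/proj/a/one.dat", 400),
    ("/proj/a/two.dat", 500),
    ("/proj/a/three.dat", 300),
    ("/proj/b/big.dat", 2000),
]

for directory, entries in sorted(preload_lists(catalog).items()):
    for chunk in chunk_by_size(entries, budget_bytes=1000):
        # Each chunk would be staged in as a unit and then copied.
        print(directory, chunk)
```

Keeping each chunk within one directory and bounded in size reflects the intent stated above: stage in a known set of files together, copy them, and move on, rather than letting a single large copy pull files from many media in arbitrary order.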