This paper represents original work by Vincent Cordrey, Doug Freyburger, Jordan Schwartz and Liza Weissler. It is copyright to those authors. The USENIX Association has been granted exclusive rights to publish and disseminate this paper in printed form, electronically, and in audio formats. Original publication of this paper by the USENIX Association is in the "Proceedings of the LISA XIII Conference". This paper may not be published or otherwise disseminated except by the USENIX Association for the duration of the Grant of Rights to Publish. During that time, the Grant of Rights permits this paper to be posted on the Author's web site(s) for electronic access.
The Northrop Grumman case was the genesis of Full On-line Relocation techniques. On the system there, the multi-tiered HSM software had deadlocked at the first HSM level. This level had filled to capacity and was unable to migrate files to deeper levels and, lacking space, was unable to retrieve files from the next deeper tier. With the backing store locked up, continued file creation on the primary media caused them to fill to capacity and refuse new data. With the primary media full and the first-tier storage deadlocked, no retrievals of staged-out files could be completed. Any client system that referenced staged-out files or tried to create new files would suffer permanent NFS timeouts (the clients used hard mounts for robustness) and would eventually hang.
A replacement fileserver large enough to accommodate all of the data was already in place, but there seemed to be no way to relocate the data to it. At that time, over a hundred thousand of the half million files managed by HSM were inaccessible in this way. On a multi-vendor UNIX LAN of around 200 regular users, at least 20% of the workstations had to be restarted each day to circumvent HSM NFS hangs. Something had to yield.
After a month of trying to repair the system, Freyburger and Cordrey were seeking a way to avoid abandoning all of the inaccessible data. That was when they hit on the idea of relocating the files on-line by replacing them with symbolic links. At the time, any concern about losing small amounts of data to race conditions was secondary to recovering as much inaccessible data as possible and resolving client system hangs. All resident files were relocated one at a time to the new server and replaced with symbolic links, in the hope that some of the inaccessible data could be retrieved once space became available on the old storage.
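The per-file step described above (copy the file to the new server, then leave a symbolic link in its place) can be illustrated with a minimal Python sketch. This is not the authors' software, which is described as using cpio; the function name `relocate_with_symlink`, the `.reloc-tmp` suffix, and the use of `shutil.copy2` are assumptions made for illustration, and `copy2` does not preserve ownership the way an archiver run as root would.

```python
import os
import shutil


def relocate_with_symlink(path, old_root, new_root):
    """Copy one regular file into the new tree and swap a symlink into its place.

    Hypothetical helper for illustration only. Returns the new location,
    or None if the path was skipped.
    """
    # Skip anything that is already a symlink (for example, a file that was
    # relocated earlier) or that is not a regular file.
    if os.path.islink(path) or not os.path.isfile(path):
        return None

    # Mirror the file's position relative to the old root under the new root.
    rel = os.path.relpath(path, old_root)
    target = os.path.join(new_root, rel)
    os.makedirs(os.path.dirname(target), exist_ok=True)

    # Copy data plus mode and timestamps (ownership is NOT preserved here).
    shutil.copy2(path, target)

    # Create the link under a temporary name in the same directory, then
    # rename it over the original. The rename replaces the old name in one
    # step, so clients never see the name missing. Writes that land on the
    # old file between the copy and the rename are lost, which is the race
    # the text says was accepted.
    tmp_link = path + ".reloc-tmp"  # assumed-unused temporary name
    os.symlink(target, tmp_link)
    os.replace(tmp_link, path)
    return target
```

Calling `relocate_with_symlink("/old/proj/a.txt", "/old", "/new")` would leave `/old/proj/a.txt` as a symbolic link to `/new/proj/a.txt`; the paths here are placeholders, not ones taken from the site.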
In the first phase, find and the vendor-supplied version of ls were used to identify resident files, with cpio being used to relocate them. As it turned out, freeing space on the primary media was enough to relieve the pressure on the HSM system. In the process, more and more files that had previously been inaccessible became available again. It was also necessary to manually recreate the HSM databases several times as the filesystems were evacuated, but that was a well documented process, already in the manuals supplied by the HSM vendor. This phase alone was sufficient to completely evacuate one of the filesystems.
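The selection step of the first phase (walk the tree and keep only resident files) might look roughly like the following Python sketch. The original work relied on the vendor-supplied ls to tell resident files from staged-out ones, and the source does not say how that ls reported the difference. The default predicate `looks_resident` below is therefore purely an assumption: it guesses that a staged-out file occupies fewer disk blocks than its size implies, which can misclassify sparse, compressed, or inline-stored files. Both function names are hypothetical.

```python
import os
import stat


def looks_resident(st):
    """ASSUMED heuristic, not the vendor's test: treat a file as resident
    when its allocated 512-byte blocks cover its full size."""
    return st.st_blocks * 512 >= st.st_size


def find_resident(root, is_resident=looks_resident):
    """Yield paths of regular files under root that the predicate accepts.

    Symbolic links are never yielded, so files that were already replaced
    by links are not selected a second time.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: examine the link itself, not its target
            if stat.S_ISREG(st.st_mode) and is_resident(st):
                yield path
```

Passing the predicate as a parameter keeps the traversal separate from the site-specific residency test, so a check based on real HSM status information could be substituted without touching the walk.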
In the second phase, a script was written to iterate through the database for each filesystem and force the relocation of those staged-out files that were local to the first-tier HSM storage. Since only a few dozen files remained when this phase was completed, "fingerprinting" techniques were used to locate those last few files for recovery by hand.
No third phase was needed to recover files from the second-tier HSM storage because the evacuation was complete. This was despite the fact that the robot was 80% full: all of the data in the robot was stale, representing deleted and prior versions of current files, because garbage collection had never been done.
No data was lost to race conditions! Some users even ran make and similar programs in their directories while those directories were being swept clear of files. Since the relocation involved about a half million files in active use by two hundred developers, this came as a pleasant surprise to Freyburger and Cordrey.
One unsurprising anomaly was encountered: some executables (web server daemons in this case) exited when their binaries were relocated.
During the development of the software to do Full On-line Forward Relocation, two main problems were encountered: Double Relocation and Pathological Filenames, both of which are discussed in the previous section.
There were two filesystems on the old servers that were not HSM managed or had never had files staged-out. Since the servers were slated for decommissioning, that data had to be relocated as well. One of those two contained several web sites which supported the entire corporation, so it had to be available at all times with no outage.
Having used Full On-line Forward Relocation on filesystems under HSM management, the authors applied their software to those normal and critical filesystems. The relocations completed in a single pass, during the production day. Further, because the binaries for the http daemons were stored locally on the front-end web server, which served its content from the NFS-mounted volume, the daemons continued serving without interruption: their binaries were never moved out from under them.