[Revised June 10, 2004]
I do not want to start yet another thread on this subject, which has been discussed extensively here and at the Apple Discussion Forums. The most complete summary of what I have to say about optimization follows:
MicroMat strongly recommends that you always leave at least 15% of any HFS+ volume as free space. If an HFS+ volume is more than 85% full and is heavily fragmented, any further data added to the volume can result in irreparable damage to the disk directory.
The first time I heard about the 85% full guideline for HFS+ volumes was shortly after the developer of TechTool Pro determined that there was such a problem and found that it could be reproduced. The corresponding guideline for HFS volumes on hard drives is 70%.
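The two guidelines above can be expressed as a simple check. This is a minimal sketch in Python, not a MicroMat tool: the function names are hypothetical, and the only figures taken from the text are the 85% and 70% limits. It looks only at how full the volume is, not at how fragmented it is.

```python
import shutil

HFS_PLUS_MAX_FULL = 0.85  # guideline quoted in the text for HFS+ volumes
HFS_MAX_FULL = 0.70       # guideline quoted in the text for HFS volumes


def fraction_used(total_bytes, free_bytes):
    """Return the fraction of the volume that is in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return (total_bytes - free_bytes) / total_bytes


def exceeds_guideline(total_bytes, free_bytes, max_full=HFS_PLUS_MAX_FULL):
    """True if the volume is fuller than the recommended limit."""
    return fraction_used(total_bytes, free_bytes) > max_full


def check_mounted_volume(path="/", max_full=HFS_PLUS_MAX_FULL):
    """Apply the guideline to a mounted volume, using the totals the OS reports."""
    usage = shutil.disk_usage(path)
    return exceeds_guideline(usage.total, usage.free, max_full)
```

For example, exceeds_guideline(100, 10) is True (90% full), while exceeds_guideline(100, 20) is False for HFS+ but True when max_full=HFS_MAX_FULL is passed.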
Many of you are familiar with the fact that Macintosh files can have both a data fork and a resource fork. The locations of the first eight pieces (called “extents”) of each fork of a Macintosh file are recorded in a part of the disk directory called the Catalog B-Tree. Any additional pieces have their locations recorded in the Extents B-Tree (known within Apple as the Extents Overflow File). These two files make up the major part of the disk directory, and are created when the disk is formatted or initialized. Because they are an inherent part of the volume, these files are also described as logical structures or volume structures. The data that records the location of pieces of files is stored in data structures called nodes within the Catalog B-Tree and Extents B-Tree.
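To illustrate how the extents of one fork are divided between the two directory files, here is a minimal sketch in Python. It is a model of the rule described above, not a parser for real HFS+ structures; the function name and the (start block, block count) tuples are assumptions made for illustration.

```python
CATALOG_EXTENTS_PER_FORK = 8  # per the text: the first eight extents of each fork


def split_extents(extents):
    """Split one fork's extent list into (catalog part, overflow part).

    `extents` is a list of (start_block, block_count) tuples in file order.
    The first eight stay with the file's Catalog B-Tree record; any further
    extents would be recorded in the Extents B-Tree.
    """
    in_catalog = extents[:CATALOG_EXTENTS_PER_FORK]
    in_overflow = extents[CATALOG_EXTENTS_PER_FORK:]
    return in_catalog, in_overflow


# A hypothetical data fork that has been broken into 11 pieces.
fork = [(1000 + 50 * i, 10) for i in range(11)]
catalog_part, overflow_part = split_extents(fork)
print(len(catalog_part), "extents in the Catalog B-Tree record")
print(len(overflow_part), "extents in the Extents B-Tree")
```

A fork with eight or fewer extents uses no space in the Extents B-Tree at all, which is why a freshly optimized volume leaves that file's nodes free.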
When a disk is formatted, its sectors are organized into allocation blocks, groups of consecutive sectors. One of the advantages of HFS+ disks is that the allocation blocks can be as small as 4K. This advantage results in less waste of disk space on disks that contain many small files, because each file, no matter how little useful data it contains, occupies at least one allocation block.
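The effect of allocation block size on wasted space can be shown with a little arithmetic. This is a minimal sketch in Python with hypothetical function names; it considers a single non-empty fork and ignores everything else a real volume stores. The 32K block size in the example is an arbitrary larger value chosen for comparison, not a figure from the text.

```python
def blocks_needed(file_bytes, block_bytes=4096):
    """Number of allocation blocks occupied by a non-empty file."""
    if file_bytes < 1:
        raise ValueError("this sketch only covers non-empty files")
    # Ceiling division: a partly filled last block still counts as a whole block.
    return -(-file_bytes // block_bytes)


def wasted_bytes(file_bytes, block_bytes=4096):
    """Bytes allocated to the file but not holding any of its data."""
    return blocks_needed(file_bytes, block_bytes) * block_bytes - file_bytes


# A 100-byte file on 4K blocks versus 32K blocks.
print(wasted_bytes(100, 4096))
print(wasted_bytes(100, 32768))
```

With 4K blocks the 100-byte file wastes 3,996 bytes; with 32K blocks it wastes 32,668 bytes.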
Any ordinary file can grow by an increment as small as one allocation block, but the Catalog B-Tree and Extents B-Tree cannot. If all of their nodes are full, and new entries need to be made in these directory files, then a new piece of Catalog B-Tree or Extents B-Tree must be added to the volume. If these logical structures could grow by one allocation block, they would spend much of the new space keeping track of their own new pieces. The solution to this problem is to require that the Catalog B-Tree and Extents B-Tree grow by an amount of disk space equal to the “clump size” for these files. In order for the new piece of Catalog B-Tree or Extents B-Tree to work efficiently, it is required to be written to disk space that is not only free, but contiguous (in one piece). If the amount of free contiguous disk space is less than the clump size, and a new piece of Catalog B-Tree or (more likely) Extents B-Tree must be added to the disk, an older piece of Catalog B-Tree or Extents B-Tree is overwritten. The resulting disaster cannot be repaired by any utility.
While the new piece of the Catalog B-Tree or Extents B-Tree must be written in a single extent, the new extent does not have to be contiguous with the last piece of the file. In fact, MicroMat has seen some disks that shipped from the factory with the Catalog B-Tree in seven extents, with the last one not near the first ones. This is not a problem as long as the disk directory properly records the locations of its own pieces. However, a commercial disk repair utility, when it writes a new disk directory, may require or strongly encourage the user to provide enough free contiguous disk space so that the new Catalog B-Tree and Extents B-Tree can each be in one piece. This seems to be a prudent choice, to create a new directory that is as simple as possible.
I do not know the relationship between allocation block size and clump size, but on a 2 GB HFS+ volume that has 4K allocation blocks, the clump size is 4 MB, or 1024 allocation blocks. (In this case, the clump size is about 1/500 of the capacity of the disk. For a 100 GB disk, that would be 200 MB, if the relationship is linear.) Therefore, if this disk needs to add a new piece of Catalog B-Tree or Extents B-Tree, and the amount of contiguous free disk space is less than 4 MB, irreparable damage results. I suppose one could contrive to make a disk that was less than 85% full, yet had so many small scattered files that the amount of contiguous free disk space was less than the “clump size”, but I have not heard of that happening. It would probably require considerable effort to contrive.
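The arithmetic in the paragraph above can be checked with a short sketch in Python. The function names are hypothetical, the units are binary (1 MB = 1024 x 1024 bytes), and the scaling function simply encodes the "if the relationship is linear" assumption, which is not confirmed. In binary units the ratio for the 2 GB example works out to 1/512, which is close to the 1/500 quoted.

```python
MB = 1024 * 1024
GB = 1024 * MB


def clump_size_in_blocks(clump_bytes, block_bytes):
    """Express a clump size as a number of allocation blocks."""
    return clump_bytes // block_bytes


def capacity_to_clump_ratio(volume_bytes, clump_bytes):
    """How many clumps would fit in the whole volume."""
    return volume_bytes // clump_bytes


def scaled_clump_bytes(volume_bytes, ratio=512):
    """Clump size for another volume, ASSUMING it scales linearly with capacity."""
    return volume_bytes // ratio


def can_grow_directory(free_run_blocks, clump_blocks):
    """True if the largest contiguous free run can hold one clump.

    `free_run_blocks` is a list of the lengths (in allocation blocks) of each
    piece of free space on the volume.
    """
    return max(free_run_blocks, default=0) >= clump_blocks


print(clump_size_in_blocks(4 * MB, 4096))        # blocks per clump
print(capacity_to_clump_ratio(2 * GB, 4 * MB))   # clumps per volume
print(scaled_clump_bytes(100 * GB) // MB, "MB")  # linear guess for 100 GB
```

Note that can_grow_directory([10, 500, 1023], 1024) is False even though the three free pieces add up to more than 1024 blocks. That is the whole point: total free space is not what matters, only the largest single piece.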
Disk optimizers began on the Macintosh in an effort to improve the performance of early hard drives. With today’s high-speed drives, the amount of time required to open a file that is in 60 pieces appears to the user to be only slightly greater than the amount of time required to open it if it is one piece. However, the optimizer is now seen to have a purpose more important than performance.
In addition to ensuring that HFS+ volumes have sufficient free contiguous disk space for the disk directory to grow, disk optimizers are useful because they simplify the disk directory, causing all of the nodes in the Extents B-Tree to be free rather than used. A simplified disk directory is easier to repair or rebuild. One symptom of an excessively complex disk directory is an error message from Disk First Aid that the “hash table is full.” The hash table is created in RAM by Disk First Aid as it attempts to rebuild the disk directory. It is not a file on the disk itself.
Should you ever require the services of a data recovery firm, please be advised that your bill will be proportional to how badly fragmented your disk is. File recovery is greatly simplified when the pieces (extents) of a file do not have to be searched for individually by a person.
Always make and test a backup before running any disk optimizer. It is prudent to check the volume structures (disk directory) of the disk before running the optimizer, and to perform a surface scan to check for bad blocks before the optimizer begins to move around large amounts of data. A UPS device to ensure a steady supply of electricity for models other than iBooks and PowerBooks is highly recommended.
The claim that installations of Mac OS X on HFS+ volumes do not fragment is a myth believed by people who do not have disk optimizers that allow them to see how much fragmentation their disks have. It is an example of ignorance that is not able to be removed by any amount of evidence. I think theologians call that “invincible ignorance.” It is now a widespread form of the pollution of information space.
I decided to erase a damaged 9.1 GB HFS+ volume named Cube_Part_3, and to install Mac OS X 10.2.0 on it. After the installer was finished, I repaired the permissions, then restarted under Mac OS 9 so I could use TechTool Pro 3.0.9 to see how many fragmented files there were, and in how many pieces the free space was. There were 146 fragmented files, and the free space was in 519 pieces. Fortunately, one of them made up most of the free space on the disk.
After running the 10.2.6 Combo Updater, I repaired the permissions again, then restarted under Mac OS 9 to check the disk. There were 149 fragmented files, and the free space was in 2,454 pieces. Of course, many of these small free space fragments will be able to accommodate the writing of small files in a single extent later, but these figures do show that it might be a good idea to optimize the volume before and after adding the third-party applications. Having the disk completely optimized before you start adding, modifying, and deleting your own documents is worth the small amount of effort it takes.
I installed Mac OS X 10.3.0 on an external FireWire drive, a LaCie Data Bank. The result was 115 fragmented files, 153 file fragments (file pieces, or extents, beyond the first one), and free space in 249 pieces.
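For readers who want to see how figures like these are defined, here is a minimal sketch in Python. It is not how TechTool Pro works internally, which I do not know; the function names and the simple list-of-booleans model of the allocation bitmap are assumptions made for illustration.

```python
def fragmentation_stats(extent_counts):
    """Return (fragmented files, file fragments).

    `extent_counts` holds the number of extents for each file. A file is
    fragmented if it has more than one extent, and its fragments are the
    extents beyond the first one.
    """
    fragmented_files = sum(1 for n in extent_counts if n > 1)
    file_fragments = sum(n - 1 for n in extent_counts if n > 1)
    return fragmented_files, file_fragments


def free_space_runs(bitmap):
    """Return the length of each contiguous piece of free space.

    `bitmap` is a sequence of booleans, one per allocation block,
    where True means the block is in use.
    """
    runs = []
    current = 0
    for in_use in bitmap:
        if in_use:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:
        runs.append(current)
    return runs


# Four hypothetical files with 1, 1, 3 and 2 extents.
print(fragmentation_stats([1, 1, 3, 2]))
# A tiny hypothetical volume: used, free, free, used, free.
print(free_space_runs([True, False, False, True, False]))
```

The number of pieces of free space is len(free_space_runs(...)), and the largest piece is the max() of the same list.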
Mac OS X Panther adds some automatic file optimization and file relocation features, but they are quite limited, and the optimization affects only files smaller than 20 MB. The most detailed description I have seen of this new feature is the one written by David Badinovac at
http://www.macintouch.com/panreader18.html .
UFS disks attempt to reduce fragmentation by putting all the pieces of a file (still called “extents”) on the same cylinder. If the cylinder becomes full and new pieces must be added to the files on it, the files fragment. UFS disks use structures called indirect and double-indirect inodes to keep track of the locations of pieces of fragmented files.
For an inode diagram that includes single and double indirect blocks on UFS disks:
http://e2fsprogs.sourceforge.net/ext2intro.html
For a good text description of a UFS disk, see
http://www.hicom.net/~shchuang/Unix/unix4.html
and
http://www.isu.edu/departments/comcom/unix/workshop/fstour.html .
Apple does not recommend using UFS disks unless you are a developer working on software for UNIX platforms. Apple’s implementation of UFS is said to be relatively slow, and if fsck cannot fix a disk directory error, there is no alternative to reformatting the volume and restoring the files from a backup.
In addition to providing sufficient free contiguous disk space for the growth of the disk directory, there are two other reasons why you should maintain your HFS+ volumes so that they are never more than 85% full:
1. Swapfiles must be written to disk space that is both free and contiguous.
In Jaguar, each swapfile is 80 MB (80,000,000 bytes).
In Panther, the first two swapfiles are 64 MB, the next is 128 MB, and the next is 256 MB. The one after that is named “buy more RAM”.
You can see the swapfiles currently in use by using the Go to Folder menu choice of the Finder’s Go menu, and typing in the dialog box:
/private/var/vm
Swapfiles are created at boot time, and are removed at restart or shutdown.
2. The journal file, which has a default size of 8192K, must be in a single piece.
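As an alternative to the Finder's Go to Folder command mentioned in item 1, the swapfiles can be listed with a few lines of Python. This is a minimal sketch with hypothetical function names; the default folder and the Panther swapfile sizes are the ones given in the text, and the running total shows only how much disk space the first four swapfiles would occupy together.

```python
import os

PANTHER_SWAPFILE_MB = [64, 64, 128, 256]  # sizes quoted in the text for Panther


def list_swapfiles(vm_dir="/private/var/vm"):
    """Return (name, size in bytes) for each regular file in vm_dir, sorted by name."""
    entries = []
    for name in sorted(os.listdir(vm_dir)):
        path = os.path.join(vm_dir, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    return entries


def running_total(sizes_mb):
    """Cumulative disk space used as each successive swapfile is created."""
    totals = []
    total = 0
    for size in sizes_mb:
        total += size
        totals.append(total)
    return totals


print(running_total(PANTHER_SWAPFILE_MB))

if os.path.isdir("/private/var/vm"):
    for name, size in list_swapfiles():
        print(name, size, "bytes")
```

Each new swapfile needs a single free piece at least as large as itself, so the fourth Panther swapfile needs 256 MB of contiguous free space, by which point the swapfiles occupy 512 MB in total.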
Revised June 10, 2004
revision notes:
1. Changed number of extents recorded in the Catalog B-Tree and Extents B-Tree from three to eight for each fork of a Macintosh file. (Sorry about the error, the source of which I can no longer reconstruct. It predates my registration here at MacFixIt.) A detailed description of the HFS+ volume format is at
http://developer.apple.com/technotes/tn/tn1150.html .
2. Added information about swapfiles requiring free contiguous disk space.
3. Added information about the journal file requiring free contiguous disk space.
4. Added detail about free contiguous disk space required or strongly encouraged by commercial disk repair utilities when a rebuilt disk directory is written from RAM to the volume.
5. Added other minor details and repetitions in the hope that misinterpretation will be minimized.
6. Added guideline of 70% for HFS volumes.