Disk formatting is the process of preparing a hard disk or other storage medium for use, including setting up an empty file system. A variety of utilities and programs exist for this task, among them the iconic FORMAT.COM of MS-DOS and PC-DOS.
Large disks can be partitioned, divided into logical sections that are formatted with their own file systems. This is normally only done on hard disks because of the small sizes of other disk types, as well as compatibility issues.
A corrupted operating system can be reverted to a clean state by formatting the disk and reinstalling the OS, as a drastic way of combating a software problem or malware infection. Important files should be backed up beforehand.
Formatting a disk involves two quite different processes, known as low-level and high-level formatting. The former prepares the disk surfaces and writes structures such as sector numbers that are visible to, and used by, the disk controller hardware, while the latter deals with the specific information written by the operating system.
The low-level format of floppy disks (and early hard disks) is performed by the disk drive hardware.
The process is most easily described with a standard 1.44 MB floppy disk in mind. Low-level formatting of the floppy normally writes 18 sectors of 512 bytes each on each of 160 tracks (80 on each side) of the floppy disk, providing 1,474,560 bytes of storage on the disk.
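The capacity figure quoted above follows directly from the geometry. A minimal sketch of the arithmetic is shown below; the function and constant names are illustrative only and do not belong to any formatting tool.

```python
# Geometry of a standard 1.44 MB floppy, as described in the text.
SECTORS_PER_TRACK = 18
BYTES_PER_SECTOR = 512
TRACKS_PER_SIDE = 80
SIDES = 2


def formatted_capacity(sectors_per_track, bytes_per_sector, tracks_per_side, sides):
    """Return the usable data bytes, excluding per-sector overhead such as IDs and CRCs."""
    return sectors_per_track * bytes_per_sector * tracks_per_side * sides


capacity = formatted_capacity(
    SECTORS_PER_TRACK, BYTES_PER_SECTOR, TRACKS_PER_SIDE, SIDES
)
print(capacity)         # 1474560 bytes
print(capacity / 1024)  # 1440.0 KiB
```

The nominal "1.44 MB" label corresponds to 1440 KiB, that is, the byte count divided by 1024 and then by 1000.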
Sectors are actually physically larger than 512 bytes as they include sector numbers, CRC bytes, and other information required in order to identify and verify the sector during reading and writing. These additional bytes are not included in the quoted figure for overall storage capacity of the disk.
To complicate matters, different low-level formats can be used on the same media; for example, large records can be used to cut down on interrecord gap size.
Several freeware, shareware and free software programs (e.g. GParted, FDFORMAT, NFORMAT and 2M) allowed considerably more control over formatting, making it possible to format high-density 3½-inch disks with a capacity of up to 2 MB.
Techniques used include:
User-initiated low-level formatting (LLF) of hard disks was common in the 1980s. Typically this involved setting up the MFM pattern on the disk so that sectors of bytes could be successfully written to it. With the advent of RLL encoding, low-level formatting grew increasingly uncommon. Most modern hard disks are embedded systems that are low-level formatted at the factory according to their physical geometry, and are thus not subject to user intervention.
Early hard disks were quite similar to floppies, but low-level formatting was generally done by the BIOS rather than by the operating system. This process involved using the MS-DOS debug program to transfer control to a routine hidden at different addresses in different BIOSes.
Early hard disks often had imprecise head-movement mechanisms based on stepper motor technology, which located tracks by advancing the stepper a specific number of steps, on the assumption that the correct track would then appear under the head. A drive formatted horizontally would often not function in a vertical orientation, because gravity pulled on the mechanism and moved the heads slightly out of alignment with the tracks written in the horizontal position. It was usually necessary to LLF a drive in the orientation in which it was meant to be used.
Early hard drives also tended to use a magnetic storage material with a low resistance to demagnetization (coercivity). An MFM/RLL drive containing data that was rarely rewritten would eventually develop data errors by itself, because the opposing magnetic domains that define data bits gradually weakened and neutralized each other. Although the data became unreadable, this was not due to a media defect. The low-level format process can erase these weakened sectors and lay down sharp new boundaries, allowing the drive to perform again as if it were brand new for a while longer. Some older drive utilities such as SpinRite included a sector-refreshing function that read and rewrote all sectors to restore their magnetic domains.
Rather than face ever-escalating difficulties with BIOS versioning, disk vendors started doing low-level formatting at the factory. Today, in most cases, an end user should never perform a low-level format of an IDE or ATA hard drive; in fact, it is often not possible to do so on modern hard drives outside the factory.
The primary reason low-level formatting cannot be done is that modern drives do not use stepper motors to locate tracks, and hence there is no way to determine where tracks should be recreated on the media. Instead, in modern drives the heads are positioned using a stepless analog servomotor, often referred to as a voice coil because it operates almost exactly like an analog audio speaker.
Modern drives locate tracks based on special servo control data permanently written to the drive platters at the factory by the hard drive manufacturer, using highly specialized equipment. Early servo-controlled drives used an entire separate disk platter to store this read-only servo data, but this was inefficient. Modern drives store the servo data directly embedded among the regular tracks and sectors, and operate in a manner so that servo data is absolutely never overwritten for any reason. Loss of servo data results in a loss of the ability to locate the data tracks.
Servo data is also why modern drives, unlike early MFM and RLL drives, can operate in any position. Head positioning is based on data embedded directly within the media itself, so the drive always knows exactly where the heads should be positioned. The servo can immediately compensate for any jarring motion that, on an MFM drive, would misalign the heads and get the stepper out of sync with the tracks, requiring a seek to track zero to resynchronize the stepper.
The present ambiguity in the term "low-level format" seems to be due both to inconsistent documentation on web sites and to the belief of many users that any process below a "high-level (file system) format" must be called a low-level format. Instead of correcting this mistaken idea (by clearly stating that such a process cannot be performed on specific drives), various drive manufacturers have described reinitialization software as LLF utilities on their web sites. Users generally have no way to tell a true LLF from a reinitialization (they simply observe that running the software results in a hard disk that must be partitioned and "high-level formatted"), so both misinformed users and mixed signals from drive manufacturers have perpetuated this error. Whatever misuse of these terms may exist, many manufacturers' sites do make such reinitialization utilities available (possibly as bootable floppy diskette or CD image files), both to overwrite every byte and to check for damaged sectors on the hard disk.
One popular method for performing only the "zero-fill" operation on a hard disk is by writing zero-bytes to the drive using the Unix dd utility (available under Linux as well) with the "/dev/zero" stream as the input file (if=) and the drive itself (either the whole disk, or a specific partition) as the output file (of=).
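The same zero-fill idea can be sketched in Python. This is a minimal illustration, not a replacement for dd: the function name `zero_fill` and its `block_size` parameter are assumptions made for this example, and the demonstration deliberately targets a temporary file standing in for a disk image rather than a real device.

```python
import os
import tempfile


def zero_fill(path, block_size=1024 * 1024):
    """Overwrite every byte of an existing file (or block device) with zeros.

    Returns the number of bytes written. The target is opened in place,
    so its size is preserved.
    """
    zeros = bytes(block_size)
    written = 0
    with open(path, "r+b") as target:
        size = target.seek(0, os.SEEK_END)  # total size of the target in bytes
        target.seek(0)
        while written < size:
            chunk = min(block_size, size - written)
            target.write(zeros[:chunk])
            written += chunk
        target.flush()
        os.fsync(target.fileno())  # ask the OS to push the data to the medium
    return written


# Demonstration on a throwaway file that stands in for a disk image.
with tempfile.NamedTemporaryFile(delete=False) as image:
    image.write(b"old contents " * 1000)
    image_path = image.name

count = zero_fill(image_path)
with open(image_path, "rb") as image:
    print(image.read() == bytes(count))  # True: only zero bytes remain
os.remove(image_path)
```

Pointing such a routine at a real device path would irreversibly destroy its contents and normally requires administrator privileges.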
High-level formatting is the process of setting up an empty file system on the disk, and installing a boot sector. This alone takes little time, and is sometimes referred to as a "quick format".
In addition, the entire disk may optionally be scanned for defects, which takes considerably longer: up to several hours on larger hard disks.
In the case of floppy disks, both high- and low-level formatting are customarily done in one pass by the software. In recent years, most floppies have shipped preformatted from the factory as DOS FAT12 floppies. It is possible to format them again to other formats, if necessary.
Under MS-DOS, PC-DOS and Microsoft Windows, disk formatting can be performed by the FORMAT program. FORMAT usually asks for confirmation beforehand to prevent accidental removal of data, but some versions of DOS have an undocumented /AUTOTEST option; if used, the usual confirmation is skipped and the format begins right away. The WM/FormatC macro virus uses this command to format the C: drive as soon as a document is opened.
There is also the undocumented /U parameter that performs an unconditional format which overwrites the entire partition, preventing the recovery of data through software (but see below).
As with regular deletion, data on a disk is not fully destroyed during a high-level format. Instead, the area on the disk containing the data is merely marked as available (in whatever file system structure the format uses), and retains the old data until it is overwritten. If the reformatting is done with a different file system than previously existed in the partition, some data may be overwritten that would not be if the same file system had been used. However, under some file systems (e.g., NTFS, but not FAT), the file indexes (such as the $MFT under NTFS, "inodes" under ext2/3, etc.) may not be written to exactly the same locations. And if the partition size is increased, even FAT file systems will overwrite more data at the beginning of that new partition.
To prevent sensitive data from being recovered with recovery tools, either the data must be completely overwritten (every sector) with random data before the format, or the format program itself must perform this overwriting, as the DOS FORMAT command did with floppy diskettes by filling every data sector with the byte value F6 in hex.
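The difference between marking space as available and actually overwriting it can be illustrated with a toy model. The `ToyVolume` class below is entirely hypothetical and far simpler than FAT or NTFS; it only assumes a directory, a free list and raw clusters, and uses the F6 fill byte mentioned above for the overwriting case.

```python
class ToyVolume:
    """A deliberately simplified volume: bookkeeping plus raw data clusters."""

    def __init__(self, cluster_count, cluster_size=512):
        self.cluster_size = cluster_size
        self.clusters = [bytes(cluster_size) for _ in range(cluster_count)]
        self.directory = {}                   # file name -> list of cluster indexes
        self.free = set(range(cluster_count))

    def write_file(self, name, data):
        size = self.cluster_size
        needed = max(1, -(-len(data) // size))  # ceiling division
        if needed > len(self.free):
            raise OSError("volume full")
        chain = sorted(self.free)[:needed]
        for i, index in enumerate(chain):
            piece = data[i * size:(i + 1) * size]
            self.clusters[index] = piece.ljust(size, b"\x00")
            self.free.discard(index)
        self.directory[name] = chain

    def quick_format(self):
        """Reset only the bookkeeping; the data clusters are left untouched."""
        self.directory.clear()
        self.free = set(range(len(self.clusters)))

    def overwrite_format(self, fill=0xF6):
        """Reset the bookkeeping and fill every cluster with one byte value."""
        self.quick_format()
        self.clusters = [bytes([fill]) * self.cluster_size for _ in self.clusters]

    def raw_scan(self, needle):
        """Search raw clusters directly, ignoring the directory, as a recovery tool might."""
        return [i for i, cluster in enumerate(self.clusters) if needle in cluster]


volume = ToyVolume(cluster_count=8)
volume.write_file("notes.txt", b"account number 12345")

volume.quick_format()
print(volume.directory)           # {}  (the file is no longer listed)
print(volume.raw_scan(b"12345"))  # [0] (but its bytes are still in cluster 0)

volume.overwrite_format()
print(volume.raw_scan(b"12345"))  # []  (nothing left to recover)
```

In this model the quick format empties the directory yet leaves the old bytes in place, while the overwriting format removes them, which mirrors the distinction drawn in the text.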