The ZIP file format is a data compression and archival format. A ZIP file contains one or more files that have either been compressed, to reduce their size, or stored as-is. A number of compression algorithms are permitted in ZIP files, but as of 2008 only DEFLATE is widely used and supported.
The format was originally developed by Phil Katz for PKZIP, evolving from the earlier ARC compression format by Thom Henderson. However, many software utilities other than PKZIP itself are now available to create, modify, or open (unzip, decompress) ZIP files, notably WinZip, BOMArchiveHelper, KGB Archiver, PicoZip, Info-ZIP, WinRAR, IZArc, 7-Zip, ALZip, TUGZip, PeaZip, Universal Extractor and Zip Genius. Microsoft has included built-in ZIP support (under the name "compressed folders") in later versions of its Windows operating system. Apple has included built-in ZIP support in Mac OS X 10.3 and later via the BOMArchiveHelper utility. The zip, zipcloak, zipnote and zipsplit tools are widely used on Unix-like systems.
ZIP files generally use the file extensions ".zip" or ".ZIP" and the MIME media type application/zip. Some software uses the ZIP file format as a wrapper for a large number of small items in a specific structure. Generally, when this is done, a different file extension is used. Examples of this usage are Java JAR files, Python .egg files, id Software .pk3/.pk4 files, package files for StepMania and Winamp/Windows Media Player skins, XPInstall, as well as the OpenDocument and Office Open XML office formats. Both OpenDocument and Office Open XML use ZIP-based packaging internally, so their files can easily be uncompressed and compressed using tools for ZIP files. Google Earth makes use of KMZ files, which are just KML files in ZIP format. Mozilla Firefox add-ons are ZIP files with the extension "xpi". Nokia's mobile phone themes are zipped with the extension "nth". Sony Ericsson mobile phone themes are zipped with the extension "thm".
During the mid-1980s, System Enhancement Associates (SEA), a small company run by Thom Henderson, created a file archiving format called ARC, and a corresponding archiver (also called ARC) that could compress and decompress files into this format. This program was released as shareware for a number of platforms, with the source code included. The file format quickly became a de facto standard. Phil Katz released a file-compatible software package for the IBM PC DOS platform, known as PKXARC. It used hand-optimized 8088 assembly language and was considerably faster than SEA's original cross-platform implementation in C.
The competition from Katz did not please SEA, which sued Katz for trademark and copyright infringement, alleging that Katz had plagiarized sections of the code. Katz lost the lawsuit and was forced to pay $62,500 to SEA to cover its legal fees. It was found during the court case that Katz had used SEA's ARC source code for the majority of the application and had only made code optimizations to increase speed. Primarily, he changed the word length used by the algorithm from 12 bits to 13 bits, resulting in higher compression for typical binary files. As a result of the lawsuit, Katz changed the names of his utilities to PKPAK and PKUNPAK.
Katz then went on to create his own file format, which is known worldwide now as the ZIP format (commonly called a "ZIP file"). The ZIP format was more resistant to data loss than the ARC format because of redundant catalog storage; it also was more flexible than ARC, providing room for additional optional compression algorithms and future expansion. Along with the new format, PKZIP included at least one compression algorithm more efficient than any supported by ARC. Once PKZIP was released, many users abandoned ARC because of its slower speed and less effective compression, and because SEA alienated many by seeming to suddenly assert proprietary legal rights over the ARC file format after it had become widely used among the on-line community (similar in this respect to the later GIF patents controversy).
Katz publicly released technical documentation on the ZIP file format, making it an open format, along with the first version of his PKZIP archiver, in January 1989. Originally bundled only with registered versions of PKZIP, the APPNOTE.TXT documentation file, titled .ZIP File Format Specification, was later made available on the PKWARE website.
The name zip (meaning speed) was suggested by Katz's friend Robert Mahoney. They wanted to imply that their product would be faster than ARC and other compression formats of the time.
In the late 1990s, various file managers started integrating support for the ZIP format into their user interfaces. Even earlier, Norton Commander and its clones, such as Volkov Commander, had started that trend in DOS, and it remains the norm for "Commander-like" or orthodox file managers such as Midnight Commander for Linux and UNIX-like systems and Total Commander (previously Windows Commander) for Windows. The KDE file manager (kfm) supported the ZIP format very early. ZIP support was first added to Windows Explorer with the Plus! enhancement package for Windows 98 and was later included in Windows Me and Windows XP. ZIP format support is also built into the Mac OS Finder (as of Mac OS X, via the BOMArchiveHelper utility), the Nautilus file manager used by GNOME and the Konqueror file manager of newer versions of KDE. By 2002, all major desktop environments included ZIP file support in their file managers: a ZIP file is typically presented as a directory or folder, so that files are copied into and out of it in the same manner as any other folder, and the compression is handled in a way largely transparent to the user. This has eliminated the need to learn a specialized tool and interface for file archival and compression.
tar.gz archive, which consists of a TAR archive compressed using gzip).
The specification for ZIP indicates that files can be stored either uncompressed or using a variety of compression algorithms. However, in practice, ZIP is almost always used with Katz's DEFLATE algorithm, except when files being added are already compressed or are resistant to compression.
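The difference between stored and DEFLATE-compressed entries can be illustrated with a minimal sketch using Python's standard `zipfile` module. The helper name `build_archive` and the sample data are assumptions made for illustration; the example only shows that repetitive data shrinks under DEFLATE, while random data is better left stored.

```python
import io
import os
import zipfile

def build_archive(entries):
    """Write (name, data, method) tuples into an in-memory ZIP; return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data, method in entries:
            zf.writestr(name, data, compress_type=method)
    return buf.getvalue()

text = b"the quick brown fox " * 500   # repetitive data, compresses well
noise = os.urandom(10_000)             # random bytes, resistant to compression

archive = build_archive([
    ("text.txt", text, zipfile.ZIP_DEFLATED),   # method 8 (DEFLATE)
    ("noise.bin", noise, zipfile.ZIP_STORED),   # method 0 (stored as-is)
])

# Report the method and the uncompressed/compressed sizes of each entry.
with zipfile.ZipFile(io.BytesIO(archive)) as zf:
    for info in zf.infolist():
        print(info.filename, info.compress_type, info.file_size, info.compress_size)
```

For the stored entry the compressed and uncompressed sizes are identical; for the DEFLATE entry the compressed size should be far smaller.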
ZIP supports a simple password-based symmetric encryption system which is known to be seriously flawed. In particular it is vulnerable to known-plaintext attacks which are in some cases made worse by poor implementations of random number generators. It also supports spreading archives across multiple removable disks (generally floppy disks, but it could also be used with other removable media).
New features, including new compression and encryption (e.g. AES) methods, have been documented in the .ZIP File Format Specification since version 5.2. The AES-based standard developed by WinZip is also used by 7-Zip, but some vendors use other formats. PKWARE SecureZIP also supports the RC2, RC4, DES and 3DES encryption methods, digital certificate (X.509)-based encryption and authentication, and archive header encryption.
The original ZIP format limited the uncompressed size of a file, the compressed size of a file and the total size of the archive to 4 GB each. In version 4.5 of the specification, PKWARE introduced the "ZIP64" format extensions to get around these limitations.
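The 4 GB limits follow from the 32-bit unsigned fields that the classic headers use for sizes and offsets. The sketch below shows the arithmetic; the function name `needs_zip64` is hypothetical, and it assumes that the all-ones value 0xFFFFFFFF is reserved as a marker meaning "the real value is stored in the ZIP64 data".

```python
# Classic ZIP headers store sizes and offsets in 32-bit unsigned fields.
# Assumption: 0xFFFFFFFF itself is reserved as the "see ZIP64 data" marker,
# so any value at or above it needs the ZIP64 extensions.
UINT32_MAX = 0xFFFFFFFF  # 4,294,967,295 bytes, just under 4 GiB

def needs_zip64(uncompressed_size, compressed_size, header_offset):
    """Return True if any value no longer fits the classic 32-bit fields."""
    values = (uncompressed_size, compressed_size, header_offset)
    return any(value >= UINT32_MAX for value in values)

print(needs_zip64(700 * 1024**2, 650 * 1024**2, 0))  # False: a 700 MiB file fits
print(needs_zip64(5 * 1024**3, 4 * 1024**3, 0))      # True: a 5 GiB file does not
```

Python's standard `zipfile` module exposes the same choice through the `allowZip64` argument of `zipfile.ZipFile`, although its internal thresholds may be more conservative than the one sketched here.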
The FAT filesystem of DOS only has a timestamp resolution of two seconds; ZIP file records mimic this. As a result, the built-in timestamp resolution of files in a ZIP archive is only two seconds, though extra fields can be used to store more accurate timestamps.
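The two-second resolution comes from the way the MS-DOS format packs a timestamp into two 16-bit values, leaving only five bits for the seconds. The following sketch assumes the standard MS-DOS packed date/time layout, which the text above does not spell out; the function names `to_dos` and `from_dos` are illustrative.

```python
from datetime import datetime

def to_dos(dt):
    """Pack a datetime into 16-bit MS-DOS date and time words."""
    dos_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    # Only 5 bits remain for seconds, so they are stored divided by two.
    dos_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return dos_date, dos_time

def from_dos(dos_date, dos_time):
    """Unpack 16-bit MS-DOS date and time words into a datetime."""
    return datetime(
        1980 + (dos_date >> 9),
        (dos_date >> 5) & 0x0F,
        dos_date & 0x1F,
        dos_time >> 11,
        (dos_time >> 5) & 0x3F,
        (dos_time & 0x1F) * 2,
    )

original = datetime(2008, 5, 17, 13, 45, 59)
restored = from_dos(*to_dos(original))
print(original, "->", restored)  # the odd second 59 comes back as 58
```

Odd seconds are rounded down to the previous even second, and the format cannot represent dates before 1980.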
Since September 2007, the ZIP specification (APPNOTE.TXT) contains a provision to store file names using UTF-8, finally adding Unicode compatibility to ZIP.
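As an illustration, UTF-8 file names are signalled by bit 11 (0x800) of each entry's general purpose flags. The sketch below uses Python's standard `zipfile` module, which sets this bit when a name cannot be encoded as ASCII; the constant name `UTF8_FLAG` and the sample file names are assumptions for the example.

```python
import io
import zipfile

UTF8_FLAG = 0x800  # general purpose bit 11: file name is UTF-8 encoded

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("plain.txt", b"ASCII-only name")
    zf.writestr("r\u00e9sum\u00e9.txt", b"name with non-ASCII characters")

# Read the archive back and check the flag recorded for each entry.
with zipfile.ZipFile(buf) as zf:
    flags = {info.filename: bool(info.flag_bits & UTF8_FLAG) for info in zf.infolist()}

print(flags)
```

Only the entry with the non-ASCII name should carry the flag; readers that ignore the flag may decode such names with a legacy code page and display them incorrectly.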
The Info-ZIP implementations of the ZIP format add support for Unix filesystem features, such as user and group IDs, file permissions, and symbolic links. The Apache Ant implementation is aware of them to the extent that it can create files with predefined Unix permissions.
The Info-ZIP Windows tools also support NTFS filesystem permissions, and will attempt to translate from NTFS permissions to Unix permissions or vice versa when extracting files. This can result in unintended combinations, e.g. .exe files being created on NTFS volumes with executable permission denied.
Each file entry consists of the file data preceded by a local header with information about the file, such as the file size and file name. The central directory consists of file headers holding further metadata, such as the file comment, and the relative offset of the local header for each file.
Because file entries may be stored in arbitrary order, and the order of the entries may differ from the order of the corresponding headers in the central directory, the format is non-sequential.
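The relationship between the central directory and the local headers can be sketched with Python's standard `zipfile` and `struct` modules. The helper `read_local_names` is hypothetical; it assumes the usual local header layout, with the 4-byte signature "PK\x03\x04" at the start, the file name length at byte offset 26 and the file name starting at byte offset 30.

```python
import io
import struct
import zipfile

LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"

def read_local_names(raw):
    """Follow each central directory offset to its local header; return (offset, name) pairs."""
    results = []
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        for info in zf.infolist():          # entries in central directory order
            off = info.header_offset        # relative offset of the local header
            if raw[off:off + 4] != LOCAL_HEADER_SIGNATURE:
                raise ValueError("no local header at offset %d" % off)
            (name_len,) = struct.unpack("<H", raw[off + 26:off + 28])
            name = raw[off + 30:off + 30 + name_len].decode("utf-8")
            results.append((off, name))
    return results

# Build a small two-entry archive in memory to inspect.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", b"first entry")
    zf.writestr("b.txt", b"second entry")

entries = read_local_names(buf.getvalue())
print(entries)
```

A reader that relies on the central directory in this way never needs to scan the archive from the start, which is why the physical order of the entries does not matter.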
According to the PKWARE Inc. specification, the ZIP file format supports large archives (more than 65,535 entries, and entries larger than 4 GB). File encryption and the ability to span a ZIP archive over multiple files are supported as well.
However, not all of these features are implemented by every development library.
In another controversial move, PKWARE applied for a patent on 2003-07-16 describing a method for combining .ZIP and strong encryption to create a secure .ZIP file.
In the end, PKWARE and WinZip agreed to support each other's products. On 2004-01-21, PKWARE announced support for the WinZip-based AES encryption format. A later beta version of WinZip was able to support SES-based ZIP files. PKWARE eventually released version 5.2 of the .ZIP File Format Specification to the public, which documented SES.
Patent No. 7,624,132 Issued on Nov. 24, Assigned to Sun Microsystems for Streamed Zip File Processing Method (California Inventors)
Nov 25, 2009; ALEXANDRIA, Va., Nov. 26 -- Paul A. Lovvik of Boulder Creek, Calif., and Junaid A. Saiyed of Sunnyvale, Calif., have developed a...