SUMMARY
The Setup program for the applications listed above uses the
following technologies that previous versions of the Setup program do
not use:
- Diamond: A lossless data compression tool
- Quantum: A new core compressor
- DMF (Distribution Media Format): A new read-only format for 3.5-inch
floppy disks
The following information describes each of these technologies.
DIAMOND
Diamond is a lossless data compression tool that can be used for a
wide variety of purposes. Although it was originally designed for use
by Setup programs, it can also be used in almost any situation where
lossless data compression is required and slower compression (in
exchange for better compression) is acceptable.
Diamond has three key features: (1) storing multiple files together in
a single cabinet file, (2) compressing across file boundaries, and (3)
permitting files to span across cabinets. Existing products such as
PKZIP, LHARC, and ARJ support some of these features, but combining
all of these features does not seem to be a common practice.
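
The benefit of compressing across file boundaries can be illustrated
with a minimal sketch. It uses Python's zlib module purely as a
stand-in compressor (an assumption; it is not the compressor Diamond
uses), and the file names and contents are hypothetical sample data.

```python
import zlib

# Hypothetical sample data: 20 small files that share most of their content.
files = {
    f"FILE{i:02d}.TXT": (
        f"Header for file {i}\n" + "shared setup text line\n" * 8
    ).encode("ascii")
    for i in range(20)
}

# Each file compressed on its own: the compressor starts from scratch
# for every file and cannot reuse patterns seen in earlier files.
separate_size = sum(len(zlib.compress(data, 9)) for data in files.values())

# All files compressed as one stream: later files can refer back to
# data from earlier files.
combined_size = len(zlib.compress(b"".join(files.values()), 9))

print("separate:", separate_size, "combined:", combined_size)
```

With data like this, the combined stream comes out smaller than the
sum of the individually compressed files, because the shared content
is encoded only once.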
Depending on how many files are to be compressed, and what kind of
access patterns are expected (sequential versus random access; most
of the files will be read versus only a small number of files), you
will make different choices about how you tell Diamond to build your
cabinet files. One key concept in Diamond is the folder. A
folder is a collection of one or more files that are compressed
together as a single entity. The most important property of a folder
is that to access a particular file in the folder, any preceding
files in the folder must be read and decompressed. For example, if
you have 100 files in a folder that compress from 3 MB down to 1 MB,
and you want to extract the last file in the folder, you must read
and decompress the entire folder to do so.
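
The sequential-access property of a folder can be sketched as follows.
This is an illustration only: zlib stands in for the real compressor,
and build_folder, extract, the directory layout, and the sample file
names are all hypothetical, not part of Diamond.

```python
import random
import zlib

def build_folder(files):
    """Compress files together; return (compressed bytes, directory).

    The directory is a list of (name, offset, size) entries that locate
    each file within the uncompressed stream.
    """
    directory, offset = [], 0
    for name, data in files:
        directory.append((name, offset, len(data)))
        offset += len(data)
    stream = b"".join(data for _, data in files)
    return zlib.compress(stream), directory

def extract(folder, directory, wanted, chunk=256):
    """Return (file bytes, compressed bytes read) for one file."""
    offset, size = next((o, s) for n, o, s in directory if n == wanted)
    decomp = zlib.decompressobj()
    out = bytearray()
    consumed = 0
    # Decompress from the start of the folder until the wanted file's
    # last byte has been produced; there is no way to skip ahead.
    while len(out) < offset + size and consumed < len(folder):
        piece = folder[consumed:consumed + chunk]
        out += decomp.decompress(piece)
        consumed += len(piece)
    return bytes(out[offset:offset + size]), consumed

rng = random.Random(0)
files = [
    (f"FILE{i}.TXT",
     "".join(rng.choice("0123456789abcdef") for _ in range(2000)).encode("ascii"))
    for i in range(10)
]
folder, directory = build_folder(files)
first_data, first_read = extract(folder, directory, "FILE0.TXT")
last_data, last_read = extract(folder, directory, "FILE9.TXT")
print("first file read", first_read, "bytes; last file read", last_read)
```

Extracting the first file stops early, while extracting the last file
has to read through nearly the whole compressed folder.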
Diamond Concepts
The key feature of Diamond is that it takes a set of files and
produces a disk layout while at the same time attempting to minimize
the number of disks required. To understand how Diamond does this,
you need to understand the following terms: cabinet, folder, and
file. Essentially, Diamond takes all of your files, lays the bytes
down as one continuous byte stream, compresses the entire stream,
chops it into folders as appropriate, and then fills one or more
cabinets with the folders.
Cabinet: A normal file that contains pieces of one or more files, usually
compressed.
Folder: A decompression boundary. Large folders enable higher
compression, because the compressor can refer back to more data
in finding patterns. However, to retrieve a file at the end of a
folder, the entire folder must be decompressed. So there is a
tradeoff between the compression achieved and the speed of random
access to individual files.
File: A file to be placed in the layout.
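
The relationship between files, folders, and cabinets can be sketched
roughly as below. This is a simplified model under stated assumptions:
zlib stands in for the compressor, cabinet headers and directory data
are ignored, and plan_layout, folder_limit, and cabinet_capacity are
hypothetical names, not Diamond directives.

```python
import random
import zlib

def plan_layout(files, folder_limit, cabinet_capacity):
    """Group files into folders, compress each folder, fill cabinets.

    files is a list of (name, bytes). Returns a list of cabinets; each
    cabinet is a list of (folder_index, compressed_byte_count) pieces.
    """
    # 1. Cut the file sequence into folders of about folder_limit bytes.
    folders, current, current_size = [], [], 0
    for name, data in files:
        if current and current_size + len(data) > folder_limit:
            folders.append(current)
            current, current_size = [], 0
        current.append((name, data))
        current_size += len(data)
    if current:
        folders.append(current)

    # 2. Compress each folder as a single stream.
    compressed = [zlib.compress(b"".join(d for _, d in f), 9) for f in folders]

    # 3. Fill cabinets; a folder that does not fit spans into the next one.
    cabinets, pieces, free = [], [], cabinet_capacity
    for index, blob in enumerate(compressed):
        remaining = len(blob)
        while remaining:
            take = min(remaining, free)
            pieces.append((index, take))
            remaining -= take
            free -= take
            if free == 0:
                cabinets.append(pieces)
                pieces, free = [], cabinet_capacity
    if pieces:
        cabinets.append(pieces)
    return cabinets

# Hypothetical sample: six 3000-byte files of incompressible data.
rng = random.Random(0)
sample = [
    (f"FILE{i}.DAT", bytes(rng.getrandbits(8) for _ in range(3000)))
    for i in range(6)
]
cabinets = plan_layout(sample, folder_limit=8000, cabinet_capacity=5000)
for number, cabinet in enumerate(cabinets, start=1):
    print(f"cabinet {number}: {cabinet}")
```

Raising folder_limit in this model gives the compressor more data to
work with per folder, at the cost of more decompression work to reach
a file near the end of a folder.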
Diamond Application Disk Layout
The distribution disks that Diamond produces for a typical
application, such as Microsoft Word for Windows, appear as follows:
Disk1 -- WORD1.CAB
         SETUP.EXE
         WDREADME.HLP
         ...
Disk2 -- WORD2.CAB
Disk3 -- WORD3.CAB
QUANTUM
Quantum is a new compression technology for which Microsoft obtained
an unrestricted license in early May 1994. It achieves compressed
file sizes 10-15% smaller than MSZIP, and Quantum will be the
preferred compressor (possibly the only one) supported by Diamond. In
order to achieve these impressive results, Quantum can require a fair
amount of memory (up to 12 MB) at compress time, and even at decompress
time (configurable from 1K to 2 MB), and Quantum gets its best results
on large data streams. For this reason, cabinet files and Quantum are
a great fit, because cabinet files with large folders ensure that
Quantum is always compressing big blocks of data. The decompression
memory requirements for Quantum are tunable in the Diamond directive
file.
DISTRIBUTION MEDIA FORMAT (DMF)
DMF is a special read-only format for 3.5-inch floppy disks that permits
storing 1.7 MB of data (a 17.7% increase over the standard 1.44 MB
format). This is achieved by reducing the inter-sector gap, and
adding 3 sectors per track. This does not affect the ability of
arbitrary floppy drives to read the disk, because we have not
changed the magnetic recording density. With this reduced
inter-sector gap, however, there is not enough room between sectors
to allow a floppy drive to reliably write to a DMF disk. There
are tools to create DMF disk images, and we have verified that the
disk duplicating machines (Trace and Rimage) used by Microsoft and
our key duplicators will correctly and efficiently duplicate these
disks.
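
The effect of the three extra sectors per track can be worked through
with a short calculation. The geometry used here (80 tracks per side,
2 sides, 18 sectors per track, 512 bytes per sector for the standard
1.44 MB format) is an assumption drawn from the common 3.5-inch
high-density layout; it is not stated in this article.

```python
BYTES_PER_SECTOR = 512   # assumed standard sector size
TRACKS_PER_SIDE = 80     # assumed 3.5-inch high-density geometry
SIDES = 2

def raw_capacity(sectors_per_track):
    """Raw bytes on the disk, before any file system overhead."""
    return sectors_per_track * TRACKS_PER_SIDE * SIDES * BYTES_PER_SECTOR

standard = raw_capacity(18)       # standard "1.44 MB" format
dmf = raw_capacity(18 + 3)        # DMF: 3 more sectors per track
increase = (dmf - standard) / standard

print(standard, dmf, f"{increase:.1%}")
```

Under these assumptions the raw capacity rises from 1,474,560 to
1,720,320 bytes, or about 16.7 percent. The percentage quoted above
presumably rests on slightly different rounding or on usable rather
than raw space.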
One limitation of the DMF format is that the root directory holds
only 16 entries, and the cluster size is 2K. For this reason, using
cabinet files on DMF is ideal, since the root directory size will not
be exceeded, and with only one cabinet file per DMF disk, the 2K
cluster allocation granularity wastes at most part of one cluster.
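
The cluster-granularity point can be illustrated with a small
calculation. The file sizes below are hypothetical, and compression
is ignored so that only the allocation overhead is compared.

```python
CLUSTER = 2048  # 2K cluster size of the DMF format

def allocated(size, cluster=CLUSTER):
    """Bytes consumed on disk: size rounded up to whole clusters."""
    return -(-size // cluster) * cluster

def slack(sizes):
    """Total bytes lost to partially filled final clusters."""
    return sum(allocated(s) - s for s in sizes)

loose_files = [300, 1500, 2100, 5000, 700]   # hypothetical file sizes
one_cabinet = [sum(loose_files)]             # same bytes in one cabinet

print("loose files waste:", slack(loose_files))   # 6784 bytes
print("one cabinet wastes:", slack(one_cabinet))  # 640 bytes
```

Every separate file can leave its last cluster partly empty, while a
single cabinet file leaves at most one partial cluster per disk and
uses only one root directory entry.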
The combination of Diamond, Quantum, and DMF should yield a 20-30%
reduction in the number of disks in a product, compared to previous
Setup programs that used Microsoft Setup version 1.0 and MSZIP.
Actual results may vary, but in measurements using Microsoft Office
version 4.2 for Windows, the 25 3.5-inch disks required using Microsoft
Setup version 1.0 and MSZIP were reduced to 18 3.5-inch disks using
Diamond, Quantum, and DMF, a 28% savings.
For additional information, please see the following article in the
Microsoft Knowledge Base:
120006
XL5C: How to Copy Files from Cabinets on DMF-Formatted Disks