


 Network Working Group                                         P. Deutsch
 Request for Comments: 1952                           Aladdin Enterprises
 Category: Informational                                         May 1996


                GZIP file format specification version 4.3

 Status of This Memo

    This memo provides information for the Internet community.  This memo
    does not specify an Internet standard of any kind.  Distribution of
    this memo is unlimited.

 IESG Note:

    The IESG takes no position on the validity of any Intellectual
    Property Rights statements contained in this document.

 Notices

    Copyright (c) 1996 L. Peter Deutsch

    Permission is granted to copy and distribute this document for any
    purpose and without charge, including translations into other
    languages and incorporation into compilations, provided that the
    copyright notice and this notice are preserved, and that any
    substantive changes or deletions from the original are clearly
    marked.

    A pointer to the latest version of this and related documentation in
    HTML format can be found at the URL
    <ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

 Abstract

    This specification defines a lossless compressed data format that is
    compatible with the widely used GZIP utility.  The format includes a
    cyclic redundancy check value for detecting data corruption.  The
    format presently uses the DEFLATE method of compression but can be
    easily extended to use other compression methods.  The format can be
    implemented readily in a manner not covered by patents.


 Deutsch                      Informational                      [Page 1]

 RFC 1952             GZIP File Format Specification             May 1996


 Table of Contents

    1. Introduction ................................................... 2
       1.1. Purpose ................................................... 2
       1.2. Intended audience ......................................... 3
       1.3. Scope ..................................................... 3
       1.4. Compliance ................................................ 3
       1.5. Definitions of terms and conventions used ................. 3
       1.6. Changes from previous versions ............................ 3
    2. Detailed specification ......................................... 4
       2.1. Overall conventions ....................................... 4
       2.2. File format ............................................... 5
       2.3. Member format ............................................. 5
           2.3.1. Member header and trailer ........................... 6
               2.3.1.1. Extra field ................................... 8
               2.3.1.2. Compliance .................................... 9
    3. References ..................................................... 9
    4. Security Considerations ....................................... 10
    5. Acknowledgements .............................................. 10
    6. Author's Address .............................................. 10
    7. Appendix: Jean-Loup Gailly's gzip utility ..................... 11
    8. Appendix: Sample CRC Code ..................................... 11

 1. Introduction

    1.1. Purpose

       The purpose of this specification is to define a lossless
       compressed data format that:

           * Is independent of CPU type, operating system, file system,
             and character set, and hence can be used for interchange;
           * Can compress or decompress a data stream (as opposed to a
             randomly accessible file) to produce another data stream,
             using only an a priori bounded amount of intermediate
             storage, and hence can be used in data communications or
             similar structures such as Unix filters;
           * Compresses data with efficiency comparable to the best
             currently available general-purpose compression methods,
             and in particular considerably better than the "compress"
             program;
           * Can be implemented readily in a manner not covered by
             patents, and hence can be practiced freely;
           * Is compatible with the file format produced by the current
             widely used gzip utility, in that conforming decompressors
             will be able to read data produced by the existing gzip
             compressor.


       The data format defined by this specification does not attempt to:

           * Provide random access to compressed data;
           * Compress specialized data (e.g., raster graphics) as well as
             the best currently available specialized algorithms.

    1.2. Intended audience

       This specification is intended for use by implementors of software
       to compress data into gzip format and/or decompress data from gzip
       format.

       The text of the specification assumes a basic background in
       programming at the level of bits and other primitive data
       representations.

    1.3. Scope

       The specification specifies a compression method and a file format
       (the latter assuming only that a file can store a sequence of
       arbitrary bytes).  It does not specify any particular interface to
       a file system or anything about character sets or encodings
       (except for file names and comments, which are optional).

    1.4. Compliance

       Unless otherwise indicated below, a compliant decompressor must be
       able to accept and decompress any file that conforms to all the
       specifications presented here; a compliant compressor must produce
       files that conform to all the specifications presented here.  The
       material in the appendices is not part of the specification per se
       and is not relevant to compliance.

    1.5. Definitions of terms and conventions used

       byte: 8 bits stored or transmitted as a unit (same as an octet).
       (For this specification, a byte is exactly 8 bits, even on
       machines which store a character on a number of bits different
       from 8.)  See below for the numbering of bits within a byte.

    1.6. Changes from previous versions

       There have been no technical changes to the gzip format since
       version 4.1 of this specification.  In version 4.2, some
       terminology was changed, and the sample CRC code was rewritten for
       clarity and to eliminate the requirement for the caller to do pre-
       and post-conditioning.  Version 4.3 is a conversion of the
       specification to RFC style.


 2. Detailed specification

    2.1. Overall conventions

       In the diagrams below, a box like this:

          +---+
          |   | <-- the vertical bars might be missing
          +---+

       represents one byte; a box like this:

          +==============+
          |              |
          +==============+

       represents a variable number of bytes.

       Bytes stored within a computer do not have a "bit order", since
       they are always treated as a unit.  However, a byte considered as
       an integer between 0 and 255 does have a most- and least-
       significant bit, and since we write numbers with the most-
       significant digit on the left, we also write bytes with the most-
       significant bit on the left.  In the diagrams below, we number the
       bits of a byte so that bit 0 is the least-significant bit, i.e.,
       the bits are numbered:

          +--------+
          |76543210|
          +--------+

       This document does not address the issue of the order in which
       bits of a byte are transmitted on a bit-sequential medium, since
       the data format described here is byte- rather than bit-oriented.

       Within a computer, a number may occupy multiple bytes.  All
       multi-byte numbers in the format described here are stored with
       the least-significant byte first (at the lower memory address).
       For example, the decimal number 520 is stored as:

              0        1
          +--------+--------+
          |00001000|00000010|
          +--------+--------+
           ^        ^
           |        |
           |        + more significant byte = 2 x 256
           + less significant byte = 8
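
       The byte order described above can be sketched in C as follows.
       This is only an illustration and not part of the format
       definition; the function names read_le16 and read_le32 are
       hypothetical.

```c
#include <stdint.h>

/* Read a 16-bit number stored least-significant byte first. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit number stored least-significant byte first. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

       Applied to the two bytes in the diagram (8 followed by 2),
       read_le16 yields 8 + 2 x 256 = 520.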


    2.2. File format

       A gzip file consists of a series of "members" (compressed data
       sets).  The format of each member is specified in the following
       section.  The members simply appear one after another in the file,
       with no additional information before, between, or after them.

    2.3. Member format

       Each member has the following structure:

          +---+---+---+---+---+---+---+---+---+---+
          |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
          +---+---+---+---+---+---+---+---+---+---+

       (if FLG.FEXTRA set)

          +---+---+=================================+
          | XLEN  |...XLEN bytes of "extra field"...| (more-->)
          +---+---+=================================+

       (if FLG.FNAME set)

          +=========================================+
          |...original file name, zero-terminated...| (more-->)
          +=========================================+

       (if FLG.FCOMMENT set)

          +===================================+
          |...file comment, zero-terminated...| (more-->)
          +===================================+

       (if FLG.FHCRC set)

          +---+---+
          | CRC16 |
          +---+---+

          +=======================+
          |...compressed blocks...| (more-->)
          +=======================+

            0   1   2   3   4   5   6   7
          +---+---+---+---+---+---+---+---+
          |     CRC32     |     ISIZE     |
          +---+---+---+---+---+---+---+---+
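
       As a rough sketch, the ten fixed bytes at the start of a member
       could be decoded as shown below.  The structure and function
       names (gz_fixed_header, gz_read_fixed_header) are hypothetical
       and chosen only for this illustration; the field meanings are
       given in the next section.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the 10 fixed header bytes. */
struct gz_fixed_header {
    uint8_t  id1, id2;  /* identification bytes */
    uint8_t  cm;        /* compression method */
    uint8_t  flg;       /* flag bits */
    uint32_t mtime;     /* modification time, stored LSB first */
    uint8_t  xfl;       /* extra flags */
    uint8_t  os;        /* operating system / file system type */
};

/* Decode the fixed part of a member header from buf[0..len-1].
   Returns 0 on success, -1 if fewer than 10 bytes are available.
   No field values are validated here. */
int gz_read_fixed_header(const unsigned char *buf, size_t len,
                         struct gz_fixed_header *h)
{
    if (len < 10)
        return -1;
    h->id1 = buf[0];
    h->id2 = buf[1];
    h->cm  = buf[2];
    h->flg = buf[3];
    h->mtime = (uint32_t)buf[4]
             | ((uint32_t)buf[5] << 8)
             | ((uint32_t)buf[6] << 16)
             | ((uint32_t)buf[7] << 24);
    h->xfl = buf[8];
    h->os  = buf[9];
    return 0;
}
```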


       2.3.1. Member header and trailer

          ID1 (IDentification 1)
          ID2 (IDentification 2)
             These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
             (0x8b, \213), to identify the file as being in gzip format.

          CM (Compression Method)
             This identifies the compression method used in the file.  CM
             = 0-7 are reserved.  CM = 8 denotes the "deflate"
             compression method, which is the one customarily used by
             gzip and which is documented elsewhere.

          FLG (FLaGs)
             This flag byte is divided into individual bits as follows:

                bit 0   FTEXT
                bit 1   FHCRC
                bit 2   FEXTRA
                bit 3   FNAME
                bit 4   FCOMMENT
                bit 5   reserved
                bit 6   reserved
                bit 7   reserved

             If FTEXT is set, the file is probably ASCII text.  This is
             an optional indication, which the compressor may set by
             checking a small amount of the input data to see whether any
             non-ASCII characters are present.  In case of doubt, FTEXT
             is cleared, indicating binary data. For systems which have
             different file formats for ASCII text and binary data, the
             decompressor can use FTEXT to choose the appropriate format.
             We deliberately do not specify the algorithm used to set
             this bit, since a compressor always has the option of
             leaving it cleared and a decompressor always has the option
             of ignoring it and letting some other program handle issues
             of data conversion.

             If FHCRC is set, a CRC16 for the gzip header is present,
             immediately before the compressed data. The CRC16 consists
             of the two least significant bytes of the CRC32 for all
             bytes of the gzip header up to but not including the CRC16.
             [The FHCRC bit was never set by versions of gzip up to
             1.2.4, even though it was documented with a different
             meaning in gzip 1.2.4.]
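
             The relationship between the CRC16 and the CRC32 can be
             sketched as follows.  This is an illustration only: the
             names crc32_bytes and gz_header_crc16 are hypothetical,
             and the CRC-32 is computed bit by bit for brevity rather
             than with the table used in the sample code in the
             appendix.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the same polynomial constant as the sample
   code in the appendix, written without a lookup table. */
uint32_t crc32_bytes(const unsigned char *buf, size_t len)
{
    uint32_t c = 0xffffffffu;
    for (size_t n = 0; n < len; n++) {
        c ^= buf[n];
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xedb88320u ^ (c >> 1) : c >> 1;
    }
    return c ^ 0xffffffffu;
}

/* Header check value: the two least significant bytes of the CRC-32
   of all header bytes that precede the CRC16 field. */
uint16_t gz_header_crc16(const unsigned char *hdr, size_t len_before_crc16)
{
    return (uint16_t)(crc32_bytes(hdr, len_before_crc16) & 0xffffu);
}
```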

             If FEXTRA is set, optional extra fields are present, as
             described in a following section.


             If FNAME is set, an original file name is present,
             terminated by a zero byte.  The name must consist of ISO
             8859-1 (LATIN-1) characters; on operating systems using
             EBCDIC or any other character set for file names, the name
             must be translated to the ISO LATIN-1 character set.  This
             is the original name of the file being compressed, with any
             directory components removed, and, if the file being
             compressed is on a file system with case insensitive names,
             forced to lower case. There is no original file name if the
             data was compressed from a source other than a named file;
             for example, if the source was stdin on a Unix system, there
             is no file name.

             If FCOMMENT is set, a zero-terminated file comment is
             present.  This comment is not interpreted; it is only
             intended for human consumption.  The comment must consist of
             ISO 8859-1 (LATIN-1) characters.  Line breaks should be
             denoted by a single line feed character (10 decimal).

             Reserved FLG bits must be zero.
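
             A minimal sketch of the FLG bit assignments as C
             constants, with a check for the reserved bits.  The
             identifiers (GZ_FTEXT and so on, gz_flags_valid) are
             hypothetical names introduced for this example.

```c
/* FLG bit masks, following the bit numbering given above. */
enum {
    GZ_FTEXT    = 0x01,  /* bit 0 */
    GZ_FHCRC    = 0x02,  /* bit 1 */
    GZ_FEXTRA   = 0x04,  /* bit 2 */
    GZ_FNAME    = 0x08,  /* bit 3 */
    GZ_FCOMMENT = 0x10,  /* bit 4 */
    GZ_RESERVED = 0xe0   /* bits 5-7 */
};

/* Return 1 if no reserved bit is set in flg, 0 otherwise. */
int gz_flags_valid(unsigned flg)
{
    return (flg & GZ_RESERVED) == 0;
}
```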

          MTIME (Modification TIME)
             This gives the most recent modification time of the original
             file being compressed.  The time is in Unix format, i.e.,
             seconds since 00:00:00 GMT, Jan.  1, 1970.  (Note that this
             may cause problems for MS-DOS and other systems that use
             local rather than Universal time.)  If the compressed data
             did not come from a file, MTIME is set to the time at which
             compression started.  MTIME = 0 means no time stamp is
             available.
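
             As an illustration, MTIME could be turned into a readable
             Universal time string as sketched below.  The function
             name gz_format_mtime is hypothetical, and the sketch
             assumes a C library whose time_t counts seconds since
             00:00:00 GMT, Jan. 1, 1970 (as on POSIX systems).

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Format MTIME as "YYYY-MM-DD HH:MM:SS" in Universal time.
   Returns the number of characters written, or 0 if MTIME is 0
   (no time stamp available) or the conversion fails.
   Assumption: time_t is seconds since the Unix epoch. */
size_t gz_format_mtime(uint32_t mtime, char *out, size_t outlen)
{
    time_t t = (time_t)mtime;
    struct tm *utc;

    if (mtime == 0)
        return 0;
    utc = gmtime(&t);
    if (utc == NULL)
        return 0;
    return strftime(out, outlen, "%Y-%m-%d %H:%M:%S", utc);
}
```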

          XFL (eXtra FLags)
             These flags are available for use by specific compression
             methods.  The "deflate" method (CM = 8) sets these flags as
             follows:

                XFL = 2 - compressor used maximum compression,
                          slowest algorithm
                XFL = 4 - compressor used fastest algorithm

          OS (Operating System)
             This identifies the type of file system on which compression
             took place.  This may be useful in determining end-of-line
             convention for text files.  The currently defined values are
             as follows:


                  0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
                  1 - Amiga
                  2 - VMS (or OpenVMS)
                  3 - Unix
                  4 - VM/CMS
                  5 - Atari TOS
                  6 - HPFS filesystem (OS/2, NT)
                  7 - Macintosh
                  8 - Z-System
                  9 - CP/M
                 10 - TOPS-20
                 11 - NTFS filesystem (NT)
                 12 - QDOS
                 13 - Acorn RISCOS
                255 - unknown

          XLEN (eXtra LENgth)
             If FLG.FEXTRA is set, this gives the length of the optional
             extra field.  See below for details.

          CRC32 (CRC-32)
             This contains a Cyclic Redundancy Check value of the
             uncompressed data computed according to the CRC-32 algorithm
             used in the ISO 3309 standard and in section 8.1.1.6.2 of
             ITU-T recommendation V.42.  (See http://www.iso.ch for
             ordering ISO documents. See gopher://info.itu.ch for an
             online version of ITU-T V.42.)

          ISIZE (Input SIZE)
             This contains the size of the original (uncompressed) input
             data modulo 2^32.
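
             A small sketch of reading the 8-byte trailer and of the
             modulo 2^32 rule for ISIZE.  The names gz_trailer,
             gz_read_trailer and gz_isize are hypothetical.

```c
#include <stdint.h>

/* Hypothetical in-memory view of the member trailer. */
struct gz_trailer {
    uint32_t crc32;  /* CRC-32 of the uncompressed data */
    uint32_t isize;  /* uncompressed size modulo 2^32 */
};

/* Decode the 8 trailer bytes; both fields are stored LSB first. */
void gz_read_trailer(const unsigned char t[8], struct gz_trailer *out)
{
    out->crc32 = (uint32_t)t[0] | ((uint32_t)t[1] << 8)
               | ((uint32_t)t[2] << 16) | ((uint32_t)t[3] << 24);
    out->isize = (uint32_t)t[4] | ((uint32_t)t[5] << 8)
               | ((uint32_t)t[6] << 16) | ((uint32_t)t[7] << 24);
}

/* ISIZE value for a given uncompressed length. */
uint32_t gz_isize(uint64_t uncompressed_len)
{
    return (uint32_t)(uncompressed_len & 0xffffffffu);
}
```

             Because only the low 32 bits are stored, ISIZE alone
             cannot distinguish an input of n bytes from one of
             n + 2^32 bytes.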

       2.3.1.1. Extra field

          If the FLG.FEXTRA bit is set, an "extra field" is present in
          the header, with total length XLEN bytes.  It consists of a
          series of subfields, each of the form:

             +---+---+---+---+==================================+
             |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
             +---+---+---+---+==================================+

          SI1 and SI2 provide a subfield ID, typically two ASCII letters
          with some mnemonic value.  Jean-Loup Gailly
          <gzip@prep.ai.mit.edu> is maintaining a registry of subfield
          IDs; please send him any subfield ID you wish to use.  Subfield
          IDs with SI2 = 0 are reserved for future use.  The following
          IDs are currently defined:


             SI1         SI2         Data
             ----------  ----------  ----
             0x41 ('A')  0x70 ('p')  Apollo file type information

          LEN gives the length of the subfield data, excluding the 4
          initial bytes.
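
          Walking the subfields of an extra field might look like the
          sketch below.  The function name gz_find_subfield and its
          return convention are hypothetical; LEN is read as a two-byte
          number stored least-significant byte first, following the
          overall conventions.

```c
#include <stddef.h>

/* Search the extra field extra[0..xlen-1] for the subfield with ID
   (si1, si2).  On success, store the subfield data length in *len
   and return a pointer to the data.  Return NULL if the subfield is
   absent or a subfield would run past the end of the extra field. */
const unsigned char *gz_find_subfield(const unsigned char *extra,
                                      size_t xlen,
                                      unsigned char si1,
                                      unsigned char si2,
                                      size_t *len)
{
    size_t pos = 0;

    while (xlen - pos >= 4) {
        size_t sublen = (size_t)extra[pos + 2]
                      | ((size_t)extra[pos + 3] << 8);
        if (sublen > xlen - pos - 4)
            return NULL;            /* malformed: data overruns XLEN */
        if (extra[pos] == si1 && extra[pos + 1] == si2) {
            *len = sublen;
            return extra + pos + 4;
        }
        pos += 4 + sublen;          /* skip SI1, SI2, LEN and the data */
    }
    return NULL;
}
```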

       2.3.1.2. Compliance

          A compliant compressor must produce files with correct ID1,
          ID2, CM, CRC32, and ISIZE, but may set all the other fields in
          the fixed-length part of the header to default values (255 for
          OS, 0 for all others).  The compressor must set all reserved
          bits to zero.

          A compliant decompressor must check ID1, ID2, and CM, and
          provide an error indication if any of these have incorrect
          values.  It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
          at least so it can skip over the optional fields if they are
          present.  It need not examine any other part of the header or
          trailer; in particular, a decompressor may ignore FTEXT and OS
          and always produce binary output, and still be compliant.  A
          compliant decompressor must give an error indication if any
          reserved bit is non-zero, since such a bit could indicate the
          presence of a new field that would cause subsequent data to be
          interpreted incorrectly.
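
          The minimum header processing described above could be
          sketched as follows.  The function name gz_header_length and
          its return convention are hypothetical, and the sketch treats
          any CM other than 8 as incorrect, which is an assumption
          based on "deflate" being the only method defined here.

```c
#include <stddef.h>

/* Check the required header fields of the member in buf[0..len-1]
   and skip the optional ones.  Returns the offset of the first byte
   of compressed data, or 0 if the header is truncated, has wrong
   ID1/ID2/CM values, or has a reserved FLG bit set. */
size_t gz_header_length(const unsigned char *buf, size_t len)
{
    size_t pos = 10;
    unsigned flg;

    if (len < 10)
        return 0;
    if (buf[0] != 0x1f || buf[1] != 0x8b)   /* ID1, ID2 */
        return 0;
    if (buf[2] != 8)                        /* CM: deflate only */
        return 0;
    flg = buf[3];
    if (flg & 0xe0)                         /* reserved bits 5-7 */
        return 0;

    if (flg & 0x04) {                       /* FEXTRA: XLEN + data */
        size_t xlen;
        if (len - pos < 2)
            return 0;
        xlen = (size_t)buf[pos] | ((size_t)buf[pos + 1] << 8);
        pos += 2;
        if (len - pos < xlen)
            return 0;
        pos += xlen;
    }
    if (flg & 0x08) {                       /* FNAME: zero-terminated */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos == len)
            return 0;
        pos++;
    }
    if (flg & 0x10) {                       /* FCOMMENT: zero-terminated */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos == len)
            return 0;
        pos++;
    }
    if (flg & 0x02) {                       /* FHCRC: 2-byte CRC16 */
        if (len - pos < 2)
            return 0;
        pos += 2;
    }
    return pos;
}
```

          This sketch skips the CRC16 without verifying it, which the
          text above permits; a stricter decompressor could also check
          it.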

 3. References

    [1] "Information Processing - 8-bit single-byte coded graphic
        character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
        The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
        ASCII. Files defining this character set are available as
        iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/

    [2] ISO 3309

    [3] ITU-T recommendation V.42

    [4] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
        available in ftp://ftp.uu.net/pub/archiving/zip/doc/

    [5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
        ftp://prep.ai.mit.edu/pub/gnu/

    [6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
        Look-Up", Communications of the ACM, 31(8), pp.1008-1013.


    [7] Schwaderer, W.D., "CRC Calculation", PC Tech Journal, April 1985,
        pp.118-133.

    [8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
        describing the CRC concept.

 4. Security Considerations

    Any data compression method involves the reduction of redundancy in
    the data.  Consequently, any corruption of the data is likely to have
    severe effects and be difficult to correct.  Uncompressed text, on
    the other hand, will probably still be readable despite the presence
    of some corrupted bytes.

    It is recommended that systems using this data format provide some
    means of validating the integrity of the compressed data, such as by
    setting and checking the CRC-32 check value.

 5. Acknowledgements

    Trademarks cited in this document are the property of their
    respective owners.

    Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
    the related software described in this specification.  Glenn
    Randers-Pehrson converted this document to RFC and HTML format.

 6. Author's Address

    L. Peter Deutsch
    Aladdin Enterprises
    203 Santa Margarita Ave.
    Menlo Park, CA 94025

    Phone: (415) 322-0103 (AM only)
    FAX:   (415) 322-1734
    EMail: <ghost@aladdin.com>

    Questions about the technical content of this specification can be
    sent by email to:

    Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
    Mark Adler <madler@alumni.caltech.edu>

    Editorial comments on this specification can be sent by email to:

    L. Peter Deutsch <ghost@aladdin.com> and
    Glenn Randers-Pehrson <randeg@alumni.rpi.edu>


 7. Appendix: Jean-Loup Gailly's gzip utility

    The most widely used implementation of gzip compression, and the
    original documentation on which this specification is based, were
    created by Jean-Loup Gailly <gzip@prep.ai.mit.edu>.  Since this
    implementation is a de facto standard, we mention some more of its
    features here.  Again, the material in this section is not part of
    the specification per se, and implementations need not follow it to
    be compliant.

    When compressing or decompressing a file, gzip preserves the
    protection, ownership, and modification time attributes on the local
    file system, since there is no provision for representing protection
    attributes in the gzip file format itself.  Since the file format
    includes a modification time, the gzip decompressor provides a
    command line switch that assigns the modification time from the file,
    rather than the local modification time of the compressed input, to
    the decompressed output.

 8. Appendix: Sample CRC Code

    The following sample code represents a practical implementation of
    the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
    for a formal specification.)

    The sample code is in the ANSI C programming language. Non-C users
    may find it easier to read with these hints:

       &      Bitwise AND operator.
       ^      Bitwise exclusive-OR operator.
       >>     Bitwise right shift operator. When applied to an
              unsigned quantity, as here, right shift inserts zero
              bit(s) at the left.
       !      Logical NOT operator.
       ++     "n++" increments the variable n.
       0xNNN  0x introduces a hexadecimal (base 16) constant.
              Suffix L indicates a long value (at least 32 bits).

       /* Table of CRCs of all 8-bit messages. */
       unsigned long crc_table[256];

       /* Flag: has the table been computed? Initially false. */
       int crc_table_computed = 0;

       /* Make the table for a fast CRC. */
       void make_crc_table(void)
       {
         unsigned long c;
         int n, k;
         for (n = 0; n < 256; n++) {
           c = (unsigned long) n;
           for (k = 0; k < 8; k++) {
             if (c & 1) {
               c = 0xedb88320L ^ (c >> 1);
             } else {
               c = c >> 1;
             }
           }
           crc_table[n] = c;
         }
         crc_table_computed = 1;
       }

       /*
          Update a running crc with the bytes buf[0..len-1] and return
          the updated crc. The crc should be initialized to zero. Pre- and
          post-conditioning (one's complement) is performed within this
          function so it shouldn't be done by the caller. Usage example:

          unsigned long crc = 0L;

          while (read_buffer(buffer, length) != EOF) {
            crc = update_crc(crc, buffer, length);
          }
          if (crc != original_crc) error();
       */
       unsigned long update_crc(unsigned long crc,
                       unsigned char *buf, int len)
       {
         unsigned long c = crc ^ 0xffffffffL;
         int n;

         if (!crc_table_computed)
           make_crc_table();
         for (n = 0; n < len; n++) {
           c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
         }
         return c ^ 0xffffffffL;
       }

       /* Return the CRC of the bytes buf[0..len-1]. */
       unsigned long crc(unsigned char *buf, int len)
       {
         return update_crc(0L, buf, len);
       }








	Network Working Group P. Deutsch
	Request for Comments: 1952 Aladdin Enterprises
	Category: Informational May 1996


	GZIP file format specification version 4.3

	Status of This Memo

	This memo provides information for the Internet community. This memo
	does not specify an Internet standard of any kind. Distribution of
	this memo is unlimited.

	IESG Note:

	The IESG takes no position on the validity of any Intellectual
	Property Rights statements contained in this document.

	Notices

	Copyright (c) 1996 L. Peter Deutsch

	Permission is granted to copy and distribute this document for any
	purpose and without charge, including translations into other
	languages and incorporation into compilations, provided that the
	copyright notice and this notice are preserved, and that any
	substantive changes or deletions from the original are clearly
	marked.

	A pointer to the latest version of this and related documentation in
	HTML format can be found at the URL
	<ftp://ftp.uu.net/graphics/png/documents/zlib/zdoc-index.html>.

	Abstract

	This specification defines a lossless compressed data format that is
	compatible with the widely used GZIP utility. The format includes a
	cyclic redundancy check value for detecting data corruption. The
	format presently uses the DEFLATE method of compression but can be
	easily extended to use other compression methods. The format can be
	implemented readily in a manner not covered by patents.










	Deutsch Informational [Page 1]

	RFC 1952 GZIP File Format Specification May 1996


	Table of Contents

	1. Introduction ................................................... 2
	1.1. Purpose ................................................... 2
	1.2. Intended audience ......................................... 3
	1.3. Scope ..................................................... 3
	1.4. Compliance ................................................ 3
	1.5. Definitions of terms and conventions used ................. 3
	1.6. Changes from previous versions ............................ 3
	2. Detailed specification ......................................... 4
	2.1. Overall conventions ....................................... 4
	2.2. File format ............................................... 5
	2.3. Member format ............................................. 5
	2.3.1. Member header and trailer ........................... 6
	2.3.1.1. Extra field ................................... 8
	2.3.1.2. Compliance .................................... 9
	3. References .................................................. 9
	4. Security Considerations .................................... 10
	5. Acknowledgements ........................................... 10
	6. Author's Address ........................................... 10
	7. Appendix: Jean-Loup Gailly's gzip utility .................. 11
	8. Appendix: Sample CRC Code .................................. 11

	1. Introduction

	1.1. Purpose

	The purpose of this specification is to define a lossless
	compressed data format that:

	* Is independent of CPU type, operating system, file system,
	and character set, and hence can be used for interchange;
	* Can compress or decompress a data stream (as opposed to a
	randomly accessible file) to produce another data stream,
	using only an a priori bounded amount of intermediate
	storage, and hence can be used in data communications or
	similar structures such as Unix filters;
	* Compresses data with efficiency comparable to the best
	currently available general-purpose compression methods,
	and in particular considerably better than the "compress"
	program;
	* Can be implemented readily in a manner not covered by
	patents, and hence can be practiced freely;
	* Is compatible with the file format produced by the current
	widely used gzip utility, in that conforming decompressors
	will be able to read data produced by the existing gzip
	compressor.




	Deutsch Informational [Page 2]

	RFC 1952 GZIP File Format Specification May 1996


	The data format defined by this specification does not attempt to:

	* Provide random access to compressed data;
	* Compress specialized data (e.g., raster graphics) as well as
	the best currently available specialized algorithms.

	1.2. Intended audience

	This specification is intended for use by implementors of software
	to compress data into gzip format and/or decompress data from gzip
	format.

	The text of the specification assumes a basic background in
	programming at the level of bits and other primitive data
	representations.

	1.3. Scope

	The specification specifies a compression method and a file format
	(the latter assuming only that a file can store a sequence of
	arbitrary bytes). It does not specify any particular interface to
	a file system or anything about character sets or encodings
	(except for file names and comments, which are optional).

	1.4. Compliance

	Unless otherwise indicated below, a compliant decompressor must be
	able to accept and decompress any file that conforms to all the
	specifications presented here; a compliant compressor must produce
	files that conform to all the specifications presented here. The
	material in the appendices is not part of the specification per se
	and is not relevant to compliance.

	1.5. Definitions of terms and conventions used

	byte: 8 bits stored or transmitted as a unit (same as an octet).
	(For this specification, a byte is exactly 8 bits, even on
	machines which store a character on a number of bits different
	from 8.) See below for the numbering of bits within a byte.

	1.6. Changes from previous versions

	There have been no technical changes to the gzip format since
	version 4.1 of this specification. In version 4.2, some
	terminology was changed, and the sample CRC code was rewritten for
	clarity and to eliminate the requirement for the caller to do pre-
	and post-conditioning. Version 4.3 is a conversion of the
	specification to RFC style.



	Deutsch Informational [Page 3]

	RFC 1952 GZIP File Format Specification May 1996


	2. Detailed specification

	2.1. Overall conventions

	In the diagrams below, a box like this:

	+---+
	\| \| <-- the vertical bars might be missing
	+---+

	represents one byte; a box like this:

	+==============+
	\| \|
	+==============+

	represents a variable number of bytes.

	Bytes stored within a computer do not have a "bit order", since
	they are always treated as a unit. However, a byte considered as
	an integer between 0 and 255 does have a most- and least-
	significant bit, and since we write numbers with the most-
	significant digit on the left, we also write bytes with the most-
	significant bit on the left. In the diagrams below, we number the
	bits of a byte so that bit 0 is the least-significant bit, i.e.,
	the bits are numbered:

	   +--------+
	   |76543210|
	   +--------+

	This document does not address the issue of the order in which
	bits of a byte are transmitted on a bit-sequential medium, since
	the data format described here is byte- rather than bit-oriented.

	Within a computer, a number may occupy multiple bytes. All
	multi-byte numbers in the format described here are stored with
	the least-significant byte first (at the lower memory address).
	For example, the decimal number 520 is stored as:

	       0        1
	   +--------+--------+
	   |00001000|00000010|
	   +--------+--------+
	    ^        ^
	    |        |
	    |        + more significant byte = 2 x 256
	    + less significant byte = 8
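
	The byte order described above can be illustrated with a short C
	sketch. The helper names read_le16, read_le32, and write_le16 are
	assumptions made for this example and are not defined by this
	specification. Each value is assembled from individual bytes, so
	the result should not depend on the byte order of the host.

```c
#include <stdint.h>

/* Read a 16-bit unsigned value stored least-significant byte first. */
uint16_t read_le16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit unsigned value stored least-significant byte first. */
uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Store a 16-bit unsigned value least-significant byte first. */
void write_le16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)(v >> 8);
}
```

	Given the two bytes 0x08 0x02 from the diagram, read_le16 yields
	8 + 2 x 256 = 520.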



	2.2. File format

	A gzip file consists of a series of "members" (compressed data
	sets). The format of each member is specified in the following
	section. The members simply appear one after another in the file,
	with no additional information before, between, or after them.

	2.3. Member format

	Each member has the following structure:

	   +---+---+---+---+---+---+---+---+---+---+
	   |ID1|ID2|CM |FLG|     MTIME     |XFL|OS | (more-->)
	   +---+---+---+---+---+---+---+---+---+---+

	(if FLG.FEXTRA set)

	   +---+---+=================================+
	   | XLEN  |...XLEN bytes of "extra field"...| (more-->)
	   +---+---+=================================+

	(if FLG.FNAME set)

	   +=========================================+
	   |...original file name, zero-terminated...| (more-->)
	   +=========================================+

	(if FLG.FCOMMENT set)

	   +===================================+
	   |...file comment, zero-terminated...| (more-->)
	   +===================================+

	(if FLG.FHCRC set)

	   +---+---+
	   | CRC16 |
	   +---+---+

	   +=======================+
	   |...compressed blocks...| (more-->)
	   +=======================+

	     0   1   2   3   4   5   6   7
	   +---+---+---+---+---+---+---+---+
	   |     CRC32     |     ISIZE     |
	   +---+---+---+---+---+---+---+---+
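
	As an illustration of the first diagram, the fixed 10-byte part
	of a member header could be decoded as sketched below. The
	structure gz_fixed_header and the function gz_read_fixed_header
	are hypothetical names introduced for this example only. The
	sketch copies the fields out and performs no validation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory view of the fixed part of a member header. */
struct gz_fixed_header {
    uint8_t  id1, id2;  /* identification bytes */
    uint8_t  cm;        /* compression method */
    uint8_t  flg;       /* flag bits */
    uint32_t mtime;     /* modification time, stored LSB first */
    uint8_t  xfl;       /* extra flags */
    uint8_t  os;        /* operating system / file system type */
};

/* Decode the first 10 bytes of a member.
   Returns 0 on success, -1 if fewer than 10 bytes are available. */
int gz_read_fixed_header(const unsigned char *buf, size_t len,
                         struct gz_fixed_header *h)
{
    if (len < 10)
        return -1;
    h->id1   = buf[0];
    h->id2   = buf[1];
    h->cm    = buf[2];
    h->flg   = buf[3];
    h->mtime = (uint32_t)buf[4]
             | ((uint32_t)buf[5] << 8)
             | ((uint32_t)buf[6] << 16)
             | ((uint32_t)buf[7] << 24);
    h->xfl   = buf[8];
    h->os    = buf[9];
    return 0;
}
```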




	2.3.1. Member header and trailer

	ID1 (IDentification 1)
	ID2 (IDentification 2)
	These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
	(0x8b, \213), to identify the file as being in gzip format.

	CM (Compression Method)
	This identifies the compression method used in the file. CM
	= 0-7 are reserved. CM = 8 denotes the "deflate"
	compression method, which is the one customarily used by
	gzip and which is documented elsewhere.

	FLG (FLaGs)
	This flag byte is divided into individual bits as follows:

	   bit 0   FTEXT
	   bit 1   FHCRC
	   bit 2   FEXTRA
	   bit 3   FNAME
	   bit 4   FCOMMENT
	   bit 5   reserved
	   bit 6   reserved
	   bit 7   reserved

	If FTEXT is set, the file is probably ASCII text. This is
	an optional indication, which the compressor may set by
	checking a small amount of the input data to see whether any
	non-ASCII characters are present. In case of doubt, FTEXT
	is cleared, indicating binary data. For systems which have
	different file formats for ASCII text and binary data, the
	decompressor can use FTEXT to choose the appropriate format.
	We deliberately do not specify the algorithm used to set
	this bit, since a compressor always has the option of
	leaving it cleared and a decompressor always has the option
	of ignoring it and letting some other program handle issues
	of data conversion.

	If FHCRC is set, a CRC16 for the gzip header is present,
	immediately before the compressed data. The CRC16 consists
	of the two least significant bytes of the CRC32 for all
	bytes of the gzip header up to and not including the CRC16.
	[The FHCRC bit was never set by versions of gzip up to
	1.2.4, even though it was documented with a different
	meaning in gzip 1.2.4.]
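
	The CRC16 rule above can be sketched in C as follows. The names
	crc32_bitwise and gz_header_crc16 are assumptions made for this
	example. For brevity the sketch computes the CRC-32 one bit at a
	time rather than with a lookup table; it is meant to illustrate
	the rule, not to be fast.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over buf[0..len-1], including the one's-complement
   pre- and post-conditioning. */
uint32_t crc32_bitwise(const unsigned char *buf, size_t len)
{
    uint32_t c = 0xffffffffu;
    size_t i;
    int k;

    for (i = 0; i < len; i++) {
        c ^= buf[i];
        for (k = 0; k < 8; k++)
            c = (c & 1) ? 0xedb88320u ^ (c >> 1) : c >> 1;
    }
    return c ^ 0xffffffffu;
}

/* CRC16 of a header: the two least significant bytes of the CRC-32
   of all header bytes up to, and not including, the CRC16 itself. */
uint16_t gz_header_crc16(const unsigned char *hdr, size_t len)
{
    return (uint16_t)(crc32_bitwise(hdr, len) & 0xffffu);
}
```

	The resulting 16-bit value is stored like every other multi-byte
	number in the format, least-significant byte first.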

	If FEXTRA is set, optional extra fields are present, as
	described in a following section.



	If FNAME is set, an original file name is present,
	terminated by a zero byte. The name must consist of ISO
	8859-1 (LATIN-1) characters; on operating systems using
	EBCDIC or any other character set for file names, the name
	must be translated to the ISO LATIN-1 character set. This
	is the original name of the file being compressed, with any
	directory components removed, and, if the file being
	compressed is on a file system with case insensitive names,
	forced to lower case. There is no original file name if the
	data was compressed from a source other than a named file;
	for example, if the source was stdin on a Unix system, there
	is no file name.

	If FCOMMENT is set, a zero-terminated file comment is
	present. This comment is not interpreted; it is only
	intended for human consumption. The comment must consist of
	ISO 8859-1 (LATIN-1) characters. Line breaks should be
	denoted by a single line feed character (10 decimal).

	Reserved FLG bits must be zero.
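
	One possible way to express the FLG bit assignments in C is shown
	below. The GZ_ constant names and the function gz_flags_valid are
	hypothetical and chosen only for this sketch.

```c
/* Bit masks for the FLG byte, following the bit numbers listed above. */
enum {
    GZ_FTEXT     = 1 << 0,
    GZ_FHCRC     = 1 << 1,
    GZ_FEXTRA    = 1 << 2,
    GZ_FNAME     = 1 << 3,
    GZ_FCOMMENT  = 1 << 4,
    GZ_FRESERVED = 0xe0     /* bits 5, 6 and 7 */
};

/* Return 1 if no reserved bit is set in flg, 0 otherwise. */
int gz_flags_valid(unsigned flg)
{
    return (flg & GZ_FRESERVED) == 0;
}
```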

	MTIME (Modification TIME)
	This gives the most recent modification time of the original
	file being compressed. The time is in Unix format, i.e.,
	seconds since 00:00:00 GMT, Jan. 1, 1970. (Note that this
	may cause problems for MS-DOS and other systems that use
	local rather than Universal time.) If the compressed data
	did not come from a file, MTIME is set to the time at which
	compression started. MTIME = 0 means no time stamp is
	available.
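
	A decompressor that wants to display MTIME might convert it as
	sketched below. The function name gz_format_mtime is an
	assumption for this example, and the sketch further assumes that
	the host's time_t counts seconds from the same 1970 epoch, as on
	Unix systems. Note that gmtime uses static storage and is not
	safe to call from several threads at once.

```c
#include <stdint.h>
#include <time.h>

/* Format MTIME as "YYYY-MM-DD HH:MM:SS" in GMT.
   Returns 0 on success, -1 if MTIME is 0 (no time stamp available)
   or if the conversion fails. */
int gz_format_mtime(uint32_t mtime, char *out, size_t outlen)
{
    time_t t = (time_t)mtime;
    struct tm *tm;

    if (mtime == 0)
        return -1;
    tm = gmtime(&t);
    if (tm == NULL)
        return -1;
    return strftime(out, outlen, "%Y-%m-%d %H:%M:%S", tm) ? 0 : -1;
}
```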

	XFL (eXtra FLags)
	These flags are available for use by specific compression
	methods. The "deflate" method (CM = 8) sets these flags as
	follows:

	   XFL = 2 - compressor used maximum compression,
	             slowest algorithm
	   XFL = 4 - compressor used fastest algorithm

	OS (Operating System)
	This identifies the type of file system on which compression
	took place. This may be useful in determining end-of-line
	convention for text files. The currently defined values are
	as follows:

	     0 - FAT filesystem (MS-DOS, OS/2, NT/Win32)
	     1 - Amiga
	     2 - VMS (or OpenVMS)
	     3 - Unix
	     4 - VM/CMS
	     5 - Atari TOS
	     6 - HPFS filesystem (OS/2, NT)
	     7 - Macintosh
	     8 - Z-System
	     9 - CP/M
	    10 - TOPS-20
	    11 - NTFS filesystem (NT)
	    12 - QDOS
	    13 - Acorn RISCOS
	   255 - unknown

	XLEN (eXtra LENgth)
	If FLG.FEXTRA is set, this gives the length of the optional
	extra field. See below for details.

	CRC32 (CRC-32)
	This contains a Cyclic Redundancy Check value of the
	uncompressed data computed according to the CRC-32 algorithm
	used in the ISO 3309 standard and in section 8.1.1.6.2 of
	ITU-T recommendation V.42. (See http://www.iso.ch for
	ordering ISO documents. See gopher://info.itu.ch for an
	online version of ITU-T V.42.)

	ISIZE (Input SIZE)
	This contains the size of the original (uncompressed) input
	data modulo 2^32.
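
	The 8-byte trailer and the "modulo 2^32" rule for ISIZE can be
	sketched as follows. The names gz_trailer, gz_read_trailer, and
	gz_isize_matches are hypothetical. The point of the sketch is
	that a decompressor counting output bytes in a wider integer
	should compare only the low 32 bits against ISIZE.

```c
#include <stdint.h>

/* Hypothetical decoded form of the member trailer. */
struct gz_trailer {
    uint32_t crc32;   /* CRC-32 of the uncompressed data */
    uint32_t isize;   /* uncompressed size modulo 2^32 */
};

static uint32_t le32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 8 bytes that follow the compressed blocks. */
void gz_read_trailer(const unsigned char *p, struct gz_trailer *t)
{
    t->crc32 = le32(p);
    t->isize = le32(p + 4);
}

/* Compare a running output byte count with ISIZE.
   Returns 1 when the low 32 bits of total equal isize. */
int gz_isize_matches(uint64_t total, uint32_t isize)
{
    return (uint32_t)(total & 0xffffffffu) == isize;
}
```

	Because of the reduction modulo 2^32, ISIZE alone cannot
	distinguish an input of 9 bytes from one of 2^32 + 9 bytes.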

	2.3.1.1. Extra field

	If the FLG.FEXTRA bit is set, an "extra field" is present in
	the header, with total length XLEN bytes. It consists of a
	series of subfields, each of the form:

	   +---+---+---+---+==================================+
	   |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
	   +---+---+---+---+==================================+

	SI1 and SI2 provide a subfield ID, typically two ASCII letters
	with some mnemonic value. Jean-Loup Gailly
	<gzip@prep.ai.mit.edu> is maintaining a registry of subfield
	IDs; please send him any subfield ID you wish to use. Subfield
	IDs with SI2 = 0 are reserved for future use. The following
	IDs are currently defined:

	   SI1         SI2         Data
	   ----------  ----------  ----
	   0x41 ('A')  0x70 ('P')  Apollo file type information

	LEN gives the length of the subfield data, excluding the 4
	initial bytes.
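
	A reader that needs one particular subfield could walk the extra
	field as in the following sketch. The function gz_find_subfield
	is a hypothetical helper introduced here. It assumes the caller
	has already read XLEN and holds the XLEN bytes of the extra field
	in memory, and it treats a subfield that runs past the end of the
	extra field as malformed.

```c
#include <stddef.h>

/* Search the extra field for the subfield with ID (si1, si2).
   On success, stores the data length in *len_out and returns a
   pointer to the subfield data.  Returns NULL if the subfield is
   absent or the extra field is malformed. */
const unsigned char *gz_find_subfield(const unsigned char *extra,
                                      size_t xlen,
                                      unsigned char si1,
                                      unsigned char si2,
                                      size_t *len_out)
{
    size_t pos = 0;

    while (xlen - pos >= 4) {
        /* LEN is stored least-significant byte first. */
        size_t len = (size_t)extra[pos + 2]
                   | ((size_t)extra[pos + 3] << 8);

        if (len > xlen - pos - 4)
            return NULL;        /* subfield overruns the extra field */
        if (extra[pos] == si1 && extra[pos + 1] == si2) {
            *len_out = len;
            return extra + pos + 4;
        }
        pos += 4 + len;         /* skip SI1, SI2, LEN and the data */
    }
    return NULL;
}
```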

	2.3.1.2. Compliance

	A compliant compressor must produce files with correct ID1,
	ID2, CM, CRC32, and ISIZE, but may set all the other fields in
	the fixed-length part of the header to default values (255 for
	OS, 0 for all others). The compressor must set all reserved
	bits to zero.

	A compliant decompressor must check ID1, ID2, and CM, and
	provide an error indication if any of these have incorrect
	values. It must examine FEXTRA/XLEN, FNAME, FCOMMENT and FHCRC
	at least so it can skip over the optional fields if they are
	present. It need not examine any other part of the header or
	trailer; in particular, a decompressor may ignore FTEXT and OS
	and always produce binary output, and still be compliant. A
	compliant decompressor must give an error indication if any
	reserved bit is non-zero, since such a bit could indicate the
	presence of a new field that would cause subsequent data to be
	interpreted incorrectly.
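
	The minimum header checks for a decompressor, as described above,
	might look like the following sketch. The function name
	gz_header_length is hypothetical. The sketch assumes a
	decompressor that supports only the "deflate" method (CM = 8) and
	that has the start of the member in a memory buffer; it reports
	every failure, including a truncated buffer, by returning 0.

```c
#include <stddef.h>

/* Validate a member header and return the offset of the first byte
   of compressed data, or 0 if the header is invalid or incomplete. */
size_t gz_header_length(const unsigned char *buf, size_t len)
{
    size_t pos = 10;
    unsigned flg;

    if (len < 10 || buf[0] != 0x1f || buf[1] != 0x8b || buf[2] != 8)
        return 0;                   /* wrong ID1, ID2 or CM */
    flg = buf[3];
    if (flg & 0xe0)
        return 0;                   /* a reserved bit is set */
    if (flg & 0x04) {               /* FEXTRA: XLEN, then XLEN bytes */
        size_t xlen;
        if (len - pos < 2)
            return 0;
        xlen = (size_t)buf[pos] | ((size_t)buf[pos + 1] << 8);
        pos += 2;
        if (len - pos < xlen)
            return 0;
        pos += xlen;
    }
    if (flg & 0x08) {               /* FNAME: zero-terminated */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos == len)
            return 0;
        pos++;
    }
    if (flg & 0x10) {               /* FCOMMENT: zero-terminated */
        while (pos < len && buf[pos] != 0)
            pos++;
        if (pos == len)
            return 0;
        pos++;
    }
    if (flg & 0x02) {               /* FHCRC: two-byte CRC16 */
        if (len - pos < 2)
            return 0;
        pos += 2;
    }
    return pos;
}
```

	The optional fields are skipped in the order in which they
	appear in the member format diagram: extra field, file name,
	comment, then header CRC.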

	3. References

	[1] "Information Processing - 8-bit single-byte coded graphic
	character sets - Part 1: Latin alphabet No.1" (ISO 8859-1:1987).
	The ISO 8859-1 (Latin-1) character set is a superset of 7-bit
	ASCII. Files defining this character set are available as
	iso_8859-1.* in ftp://ftp.uu.net/graphics/png/documents/

	[2] ISO 3309

	[3] ITU-T recommendation V.42

	[4] Deutsch, L.P., "DEFLATE Compressed Data Format Specification",
	available in ftp://ftp.uu.net/pub/archiving/zip/doc/

	[5] Gailly, J.-L., GZIP documentation, available as gzip-*.tar in
	ftp://prep.ai.mit.edu/pub/gnu/

	[6] Sarwate, D.V., "Computation of Cyclic Redundancy Checks via Table
	Look-Up", Communications of the ACM, 31(8), pp.1008-1013.

	[7] Schwaderer, W.D., "CRC Calculation", April 85 PC Tech Journal,
	pp.118-133.

	[8] ftp://ftp.adelaide.edu.au/pub/rocksoft/papers/crc_v3.txt,
	describing the CRC concept.

	4. Security Considerations

	Any data compression method involves the reduction of redundancy in
	the data. Consequently, any corruption of the data is likely to have
	severe effects and be difficult to correct. Uncompressed text, on
	the other hand, will probably still be readable despite the presence
	of some corrupted bytes.

	It is recommended that systems using this data format provide some
	means of validating the integrity of the compressed data, such as by
	setting and checking the CRC-32 check value.

	5. Acknowledgements

	Trademarks cited in this document are the property of their
	respective owners.

	Jean-Loup Gailly designed the gzip format and wrote, with Mark Adler,
	the related software described in this specification. Glenn
	Randers-Pehrson converted this document to RFC and HTML format.

	6. Author's Address

	L. Peter Deutsch
	Aladdin Enterprises
	203 Santa Margarita Ave.
	Menlo Park, CA 94025

	Phone: (415) 322-0103 (AM only)
	FAX: (415) 322-1734
	EMail: <ghost@aladdin.com>

	Questions about the technical content of this specification can be
	sent by email to:

	Jean-Loup Gailly <gzip@prep.ai.mit.edu> and
	Mark Adler <madler@alumni.caltech.edu>

	Editorial comments on this specification can be sent by email to:

	L. Peter Deutsch <ghost@aladdin.com> and
	Glenn Randers-Pehrson <randeg@alumni.rpi.edu>



	7. Appendix: Jean-Loup Gailly's gzip utility

	The most widely used implementation of gzip compression, and the
	original documentation on which this specification is based, were
	created by Jean-Loup Gailly <gzip@prep.ai.mit.edu>. Since this
	implementation is a de facto standard, we mention some more of its
	features here. Again, the material in this section is not part of
	the specification per se, and implementations need not follow it to
	be compliant.

	When compressing or decompressing a file, gzip preserves the
	protection, ownership, and modification time attributes on the local
	file system, since there is no provision for representing protection
	attributes in the gzip file format itself. Since the file format
	includes a modification time, the gzip decompressor provides a
	command line switch that assigns the modification time from the file,
	rather than the local modification time of the compressed input, to
	the decompressed output.

	8. Appendix: Sample CRC Code

	The following sample code represents a practical implementation of
	the CRC (Cyclic Redundancy Check). (See also ISO 3309 and ITU-T V.42
	for a formal specification.)

	The sample code is in the ANSI C programming language. Non-C users
	may find it easier to read with these hints:

	&      Bitwise AND operator.
	^      Bitwise exclusive-OR operator.
	>>     Bitwise right shift operator. When applied to an
	       unsigned quantity, as here, right shift inserts zero
	       bit(s) at the left.
	!      Logical NOT operator.
	++     "n++" increments the variable n.
	0xNNN  0x introduces a hexadecimal (base 16) constant.
	       Suffix L indicates a long value (at least 32 bits).

	/* Table of CRCs of all 8-bit messages. */
	unsigned long crc_table[256];

	/* Flag: has the table been computed? Initially false. */
	int crc_table_computed = 0;

	/* Make the table for a fast CRC. */
	void make_crc_table(void)
	{
	  unsigned long c;
	  int n, k;

	  for (n = 0; n < 256; n++) {
	    c = (unsigned long) n;
	    for (k = 0; k < 8; k++) {
	      if (c & 1) {
	        c = 0xedb88320L ^ (c >> 1);
	      } else {
	        c = c >> 1;
	      }
	    }
	    crc_table[n] = c;
	  }
	  crc_table_computed = 1;
	}

	/*
	   Update a running crc with the bytes buf[0..len-1] and return
	   the updated crc. The crc should be initialized to zero. Pre- and
	   post-conditioning (one's complement) is performed within this
	   function so it shouldn't be done by the caller. Usage example:

	     unsigned long crc = 0L;

	     while (read_buffer(buffer, length) != EOF) {
	       crc = update_crc(crc, buffer, length);
	     }
	     if (crc != original_crc) error();
	*/
	unsigned long update_crc(unsigned long crc,
	                         unsigned char *buf, int len)
	{
	  unsigned long c = crc ^ 0xffffffffL;
	  int n;

	  if (!crc_table_computed)
	    make_crc_table();
	  for (n = 0; n < len; n++) {
	    c = crc_table[(c ^ buf[n]) & 0xff] ^ (c >> 8);
	  }
	  return c ^ 0xffffffffL;
	}

	/* Return the CRC of the bytes buf[0..len-1]. */
	unsigned long crc(unsigned char *buf, int len)
	{
	  return update_crc(0L, buf, len);
	}



