Gzip
Documentation: #include <cryptopp/gzip.h>
Gzip is a lossless compression format standardized in RFC 1952, GZIP file format specification. Gzip is really a file format that wraps a compressed stream in additional metadata, such as the original filename, the file modification time and comments. The compression itself uses DEFLATE from RFC 1951. Crypto++ provides GZIP compression through the Gzip class, and decompression through the Gunzip class.
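GZIP wraps a DEFLATE stream in a small header and trailer, so the fixed part of the RFC 1952 header can be inspected without any compression library. The sketch below is a standalone, standard-library illustration and is not part of Crypto++. The names `GzipFixedHeader` and `ParseGzipFixedHeader` are hypothetical. It reads only the 10 fixed bytes (ID1, ID2, CM, FLG, MTIME, XFL, OS) and ignores the optional fields.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Flag bits in the FLG byte, as defined by RFC 1952 section 2.3.1.
constexpr std::uint8_t FTEXT = 0x01, FHCRC = 0x02, FEXTRA = 0x04,
                       FNAME = 0x08, FCOMMENT = 0x10;

// The fixed 10-byte part of a GZIP member header (hypothetical type).
struct GzipFixedHeader {
    std::uint8_t  method;  // CM: 8 means DEFLATE (RFC 1951)
    std::uint8_t  flags;   // FLG: which optional fields follow the fixed part
    std::uint32_t mtime;   // MTIME: modification time in Unix seconds, 0 if unset
    std::uint8_t  xfl;     // XFL: extra flags
    std::uint8_t  os;      // OS: operating system that wrote the file
};

// Parses the fixed header; throws std::runtime_error on malformed input.
GzipFixedHeader ParseGzipFixedHeader(const std::vector<std::uint8_t>& in) {
    if (in.size() < 10)
        throw std::runtime_error("truncated gzip header");
    if (in[0] != 0x1f || in[1] != 0x8b)  // ID1 and ID2 magic bytes
        throw std::runtime_error("not a gzip stream");

    GzipFixedHeader h{};
    h.method = in[2];
    h.flags  = in[3];
    // MTIME is a 32-bit little-endian integer.
    h.mtime = static_cast<std::uint32_t>(in[4])
            | (static_cast<std::uint32_t>(in[5]) << 8)
            | (static_cast<std::uint32_t>(in[6]) << 16)
            | (static_cast<std::uint32_t>(in[7]) << 24);
    h.xfl = in[8];
    h.os  = in[9];
    return h;
}
```

For example, the hypothetical header bytes `1f 8b 08 08 bb a0 a8 54 00 03` decode to CM 8 (DEFLATE), the FNAME flag set, and an MTIME of 1420337339. The FNAME and FCOMMENT bits tell a reader that a zero-terminated filename or comment follows the fixed header. Each member ends with an 8-byte trailer holding a CRC-32 and the uncompressed size modulo 2^32. This sketch does not read the trailer.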
The Gzip compressor takes a pointer to a BufferedTransformation. Because it takes a pointer, the Gzip object owns the attached transformation and will destroy it. See ownership for more details.
The Crypto++ implementation supports filenames, comments and filetimes as of Crypto++ 6.0. The support was added under Issue 420, Add Gzip Filename, Filetime and Comment support. For Crypto++ 5.6.5 and below, you will have to patch the library to get filename, comment and filetime support.
The Gzip class inherits from the Deflator class (which provides the RFC 1951 implementation), so many of the constants used by Gzip are provided by Deflator in zdeflate.h.
Construction
```cpp
Gzip (BufferedTransformation *attachment=NULL,
      unsigned int deflateLevel=DEFAULT_DEFLATE_LEVEL,
      unsigned int log2WindowSize=DEFAULT_LOG2_WINDOW_SIZE,
      bool detectUncompressible=true)

Gzip (const NameValuePairs &parameters, BufferedTransformation *attachment=NULL)
```
attachment is a BufferedTransformation, such as another filter or sink. If attachment is NULL, then the Gzip object will internally accumulate the output byte stream.
deflateLevel is the deflation level. The value should be between 0 and 9. 0 provides minimum compression and executes quickly, while 9 provides maximum compression and executes most slowly. zdeflate.h provides constants for the deflateLevel: MIN_DEFLATE_LEVEL is 0, DEFAULT_DEFLATE_LEVEL is 6, and MAX_DEFLATE_LEVEL is 9.
log2WindowSize controls the table size used for compression. The value should be between 9 and 15, meaning the table size will be between 2^9 (512) and 2^15 (32,768). 9 provides the smallest table size, while 15 provides the largest table size. zdeflate.h provides constants for the log2WindowSize: MIN_LOG2_WINDOW_SIZE is 9, DEFAULT_LOG2_WINDOW_SIZE is 15, and MAX_LOG2_WINDOW_SIZE is 15.
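The size is simply 2 raised to log2WindowSize. The helper below is a hypothetical, standard-library illustration of that arithmetic and of the documented 9 through 15 range. It is not a Crypto++ function. For reference, 32,768 is also the largest back-reference distance that DEFLATE (RFC 1951) permits.

```cpp
#include <stdexcept>
#include <string>

// Documented range for log2WindowSize (MIN/MAX_LOG2_WINDOW_SIZE in zdeflate.h).
constexpr unsigned int kMinLog2Window = 9;
constexpr unsigned int kMaxLog2Window = 15;

// Returns 2^log2WindowSize, rejecting values outside the documented range.
unsigned int WindowSizeFromLog2(unsigned int log2WindowSize) {
    if (log2WindowSize < kMinLog2Window || log2WindowSize > kMaxLog2Window)
        throw std::out_of_range("log2WindowSize must be between 9 and 15, got " +
                                std::to_string(log2WindowSize));
    return 1u << log2WindowSize;
}
```

With this helper, `WindowSizeFromLog2(9)` is 512, `WindowSizeFromLog2(15)` is 32768, and 16 is rejected.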
detectUncompressible tells the library to try to detect whether the input is uncompressible. According to zdeflate.h, detectUncompressible makes it faster to process uncompressible files, but if a file has both compressible and uncompressible parts, it may fail to compress some of the compressible parts.
parameters are NameValuePairs used in the alternate constructor. The names recognized are Log2WindowSize, DeflateLevel and DetectUncompressible.
Sample Programs
The following is a small collection of sample programs that demonstrate using the Gzip compressor.
In-memory String
```cpp
string data = "abcdefghijklmnopqrstuvwxyz";
string compressed;

Gzip zipper(new StringSink(compressed));
zipper.Put((const byte*) data.data(), data.size());
zipper.MessageEnd();
```
On-disk File
```cpp
string filename("test.txt.gz");
string data = "abcdefghijklmnopqrstuvwxyz";

// The second FileSink argument (true) opens the file in binary mode
Gzip zipper(new FileSink(filename.c_str(), true));
zipper.Put((const byte*) data.data(), data.size());
zipper.MessageEnd();
```
String using Pipeline
```cpp
string data = "abcdefghijklmnopqrstuvwxyz";
string compressed;

StringSource ss(data, true, new Gzip(
    new StringSink(compressed)
));
```
File using Pipeline
```cpp
string filename("test.txt.gz");
string data = "abcdefghijklmnopqrstuvwxyz";

StringSource ss(data, true, new Gzip(
    new FileSink(filename.c_str(), true)
));
```
String using Put/Get
```cpp
// No attachment, so Gzip accumulates the compressed stream internally
Gzip zipper;
zipper.Put((const byte*) data.data(), data.size());
zipper.MessageEnd();

word64 avail = zipper.MaxRetrievable();
if(avail)
{
    string compressed;
    compressed.resize(avail);

    zipper.Get((byte*)&compressed[0], compressed.size());
}
```
Array using Put/Get
```cpp
// No attachment, so Gzip accumulates the compressed stream internally
Gzip zipper;
zipper.Put((const byte*) data.data(), data.size());
zipper.MessageEnd();

word64 avail = zipper.MaxRetrievable();
if(avail)
{
    vector<byte> compressed;
    compressed.resize(avail);

    zipper.Get(&compressed[0], compressed.size());
}
```
Patch
The patch adds the ability to read and write the original filename, the modification filetime and comments for an archive. Crypto++ 6.0 and later include this support. The sample program below shows how it could be used.
```cpp
try
{
    string filename("test.txt.gz"), s1, s2;
    string data = "abcdefghijklmnopqrstuvwxyz";

    // Create a compressor, save stream to memory via 's1'
    Gzip zipper(new StringSink(s1));

    // Add some Gzip specific fields
    zipper.SetFilename(filename);
    zipper.SetFiletime((word32)time(0));
    zipper.SetComment("This is a test of filenames, filetimes and comments");

    // Write the data to the stream
    zipper.Put((const byte*) data.data(), data.size());
    zipper.MessageEnd();

    // Save the compressed data to a file
    FileSink fs(filename.c_str(), true);
    fs.Put((const byte*) s1.data(), s1.size());
    fs.MessageEnd();

    // Create a decompressor, save stream to memory via 's2'
    Gunzip unzipper(new StringSink(s2));

    // Add the compressed data to it
    unzipper.Put((const byte*) s1.data(), s1.size());
    unzipper.MessageEnd();

    // Print the Gzip specific data
    cout << "Filename: " << unzipper.GetFilename() << endl;
    cout << "Filetime: " << unzipper.GetFiletime() << endl;
    cout << "Comment: " << unzipper.GetComment() << endl;

    // Print the uncompressed stream
    cout << "Data: " << s2 << endl;
}
catch(CryptoPP::Exception& ex)
{
    cerr << ex.what() << endl;
}
```
A typical run of the program is shown below.
```
$ ./cryptopp-test.exe
Filename: test.txt.gz
Filetime: 1420337339
Comment: This is a test of filenames, filetimes and comments
Data: abcdefghijklmnopqrstuvwxyz
```
Saving to the original filename with a Crypto++ pipeline can be tricky because the original filename is not available when the FileSink is created. Here's one way to do it:
```cpp
// Create a decompressor, save stream to a ByteQueue
ByteQueue queue;
Gunzip unzipper(new Redirector(queue));

// Add the compressed data to it
unzipper.Put((const byte*) compressed.data(), compressed.size());
unzipper.MessageEnd();

// The header has been parsed, so the original filename is now known
FileSink fs(unzipper.GetFilename().c_str(), true);
queue.TransferTo(fs);
fs.MessageEnd();
```
To unpack the archive using the original filename from the command line, use gunzip -N. You can test it by renaming test.txt.gz to something else, like test.gz.
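gunzip -N takes the name from the zero-terminated FNAME field that follows the fixed 10-byte header (after the optional FEXTRA block). The hypothetical helper below is a standard-library sketch of reading that field per RFC 1952. It is not a Crypto++ API, and it is handy for checking which name was actually embedded in an archive.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the original filename stored in a GZIP header (the FNAME field),
// or an empty string if the FNAME flag is not set.
std::string GzipOriginalFilename(const std::vector<std::uint8_t>& in) {
    const std::uint8_t FEXTRA = 0x04, FNAME = 0x08;  // RFC 1952 FLG bits
    if (in.size() < 10 || in[0] != 0x1f || in[1] != 0x8b)
        throw std::runtime_error("not a gzip header");

    const std::uint8_t flags = in[3];
    std::size_t pos = 10;  // skip the fixed header
    if (flags & FEXTRA) {
        // XLEN is a 2-byte little-endian length, followed by XLEN bytes.
        if (pos + 2 > in.size())
            throw std::runtime_error("truncated FEXTRA");
        std::size_t xlen = static_cast<std::size_t>(in[pos]) |
                           (static_cast<std::size_t>(in[pos + 1]) << 8);
        pos += 2 + xlen;
    }
    if (!(flags & FNAME))
        return std::string();

    // FNAME is zero-terminated (ISO 8859-1 according to the RFC).
    std::string name;
    while (pos < in.size() && in[pos] != 0)
        name.push_back(static_cast<char>(in[pos++]));
    if (pos >= in.size())
        throw std::runtime_error("unterminated FNAME");
    return name;
}
```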
[Figure: a view of the archive under The Archive Browser]
Note: The Archive Browser on OS X displays the implicit filename (the archive name without the gz extension), not the original filename embedded in the header. Also see Issue 802: The Archive Browser does not honor original filename field in a GZIP header.
Downloads
gzip.diff.zip - patch that adds the ability to set and retrieve the original filename, the modification filetime and comments on a GZIP archive. The ZIP includes the diff of changes to gzip.h and gunzip.h, and the modified gzip.h and gunzip.h files themselves.