Unzip Nested Zip Files While Streaming

updated 9 years ago

I recently encountered a scenario where I needed to unzip every file in a zip file, including any files inside nested zip files. The source data streams in through an HTTP POST via IIS into BizTalk. The zip files can be large (up to 200 MB), and multiple posts can arrive at the same time, so this is too much data to hold in memory. I also needed to avoid unnecessary network traffic, so temporary files were not a good option. Therefore, I needed a forward-only streaming solution.

To accomplish this, I turned to #ziplib (SharpZipLib). Its ZipInputStream class looked like a good fit for this situation. Here is an example of how to use this class:
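A minimal sketch of that usage, assuming the ICSharpCode.SharpZipLib.Zip namespace. The class name UnzipExample, the Unzip helper, and the targetDir parameter are illustrative names only, not part of the library:

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

public static class UnzipExample
{
    // Hypothetical helper: extracts every file entry of a zip stream into targetDir.
    // Entry names are not validated here; production code should reject names
    // containing ".." before combining paths.
    public static void Unzip(Stream source, string targetDir)
    {
        byte[] buffer = new byte[4096];
        using (ZipInputStream zis = new ZipInputStream(source))
        {
            ZipEntry entry;
            // GetNextEntry() advances to the next entry, or returns null when none remain.
            while ((entry = zis.GetNextEntry()) != null)
            {
                if (!entry.IsFile) continue; // skip directory entries

                string path = Path.Combine(targetDir, entry.Name);
                Directory.CreateDirectory(Path.GetDirectoryName(path));

                using (FileStream output = File.Create(path))
                {
                    int read;
                    // Reading from the ZipInputStream yields the unzipped bytes
                    // of the current entry only.
                    while ((read = zis.Read(buffer, 0, buffer.Length)) > 0)
                    {
                        output.Write(buffer, 0, read);
                    }
                }
            }
        }
    }
}
```

Only one 4 KB buffer is held at a time, so memory use stays flat regardless of the archive size.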

As the raw data is streamed through the ZipInputStream, it gets unzipped.  The GetNextEntry() method sets the position to the beginning of the next file.  Then we just read from the ZipInputStream to get the unzipped file data.  So to unzip nested zip files, I came up with a function I could call recursively:
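A sketch of what such a recursive function might look like. The signature NestedUnzip(Stream, string) matches the call shown later in this post, but the body is an assumption: it treats any entry whose name ends in ".zip" as a nested archive and extracts nested contents into the same folder as everything else. The class name NestedUnzipExample is illustrative:

```csharp
using System;
using System.IO;
using ICSharpCode.SharpZipLib.Zip;

public static class NestedUnzipExample
{
    public static void NestedUnzip(Stream source, string unzipLocation)
    {
        byte[] buffer = new byte[4096];
        // Deliberately not wrapped in a using block: disposing zis would also
        // close source, which for a nested zip is the parent ZipInputStream.
        // The outermost caller is expected to close the original stream.
        ZipInputStream zis = new ZipInputStream(source);
        ZipEntry entry;
        while ((entry = zis.GetNextEntry()) != null)
        {
            if (!entry.IsFile) continue; // skip directory entries

            // Assumption: nested archives are recognized by file extension.
            if (entry.Name.EndsWith(".zip", StringComparison.OrdinalIgnoreCase))
            {
                // zis now yields the unzipped bytes of the inner archive,
                // so it can serve as the source stream for the recursive call.
                NestedUnzip(zis, unzipLocation);
                continue;
            }

            string path = Path.Combine(unzipLocation, entry.Name);
            Directory.CreateDirectory(Path.GetDirectoryName(path));
            using (FileStream output = File.Create(path))
            {
                int read;
                while ((read = zis.Read(buffer, 0, buffer.Length)) > 0)
                {
                    output.Write(buffer, 0, read);
                }
            }
        }
    }
}
```

Each level of nesting adds one more ZipInputStream layered on top of its parent, so the data is still read exactly once, front to back.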

Now this would work great for my needs, as it processes the data as a forward-only, read-only stream. However, whenever a nested zip runs out of entries (i.e. GetNextEntry() == null), the ZipInputStream calls Close() on the underlying stream. For a nested zip, the underlying stream is the parent ZipInputStream, so the whole unzip process ends prematurely.

To fix this, I commented out the Close() call within the GetNextEntry() method of the ZipInputStream class:

if (header == ZipConstants.CentralHeaderSignature ||
  header == ZipConstants.EndOfCentralDirectorySignature ||
  header == ZipConstants.CentralHeaderDigitalSignature ||
  header == ZipConstants.ArchiveExtraDataSignature ||
  header == ZipConstants.Zip64CentralFileHeaderSignature) {
  // No more individual entries exist
  // -jv- 11-Jun-2009 Removed close so it can support nested zips
  //Close();
  return null;
}

Of course, the calling method should properly close the source stream so this is a safe change to make. For example:

using (Stream s = inmsg.BodyPart.GetOriginalDataStream()) {
  NestedUnzip(s, unzipLocation);
}

The result is a perfect streaming solution with low memory usage and no need for temporary files.