Day 16 - Archiving and compressing

INTRO

As a system administrator, you need to be able to confidently work with compressed “archives” of files. In particular, two of your key responsibilities, installing new software and managing backups, often require this.

On other operating systems, applications like WinZip, and pkzip before it, have long been used to gather a series of files and folders into one compressed file - with a .zip extension.

Archiving and compressing, however, are actually distinct processes that are frequently used together. Archiving consolidates multiple files into one “box” while preserving their directory structure and attributes (such as permissions and timestamps), but it does not reduce the total data size. Compression uses algorithms that encode data more efficiently, shrinking a file so that it takes up less storage space.
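To see the difference for yourself, here is a minimal sketch that archives a folder of made-up sample files and then compresses the archive. It assumes bash, GNU tar and gzip. The .tar should come out slightly larger than the files it contains, because tar adds a header for each entry and pads to fixed-size blocks. Only the compression step actually shrinks anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir"

# Made-up sample data: three small text files in one folder
mkdir sample
seq 1 20000 > sample/numbers.txt
seq 20001 40000 > sample/more-numbers.txt
printf 'hello\n' > sample/note.txt

data_size=$(cat sample/* | wc -c)   # total bytes in the original files

tar -cf sample.tar sample/          # archive only: one "box", nothing shrinks
tar_size=$(wc -c < sample.tar)

gzip -k sample.tar                  # compress the archive, keeping sample.tar
gz_size=$(wc -c < sample.tar.gz)

echo "files: $data_size bytes"
echo "tar:   $tar_size bytes"
echo "gzip:  $gz_size bytes"
```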

YOUR TASKS TODAY

  • Compress a file and compare sizes
  • Archive the contents of a folder
  • Create a compressed “tarball”
  • Extract files from a tarball

COMPRESSING FILES

The goal of compression is to reduce file size. Smaller files use less bandwidth and can be transmitted over networks faster. Different tools use different algorithms to shrink the data.

You can compress a file with gzip like this:

gzip my-big-file

…which creates my-big-file.gz and deletes the original file. If you want to keep the uncompressed file as well, try:

gzip -vk my-big-file

This uses -v to make the command “verbose” and -k to keep the original file.

When it’s time to decompress, do it with the -d switch, like this:

gzip -d my-big-file.gz
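A round trip is a quick way to confirm that compression is lossless. You compress with -k, remove the original, decompress, and compare the result against a reference copy. This sketch assumes bash, gzip and cmp. Here, my-big-file is just a generated stand-in.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

seq 1 100000 > my-big-file          # generated stand-in for a real large file
cp my-big-file reference-copy       # kept only to compare against at the end

gzip -vk my-big-file                # creates my-big-file.gz, keeps my-big-file
rm my-big-file                      # pretend we only have the .gz now

gzip -d my-big-file.gz              # recreates my-big-file and removes the .gz
cmp my-big-file reference-copy && echo "round trip OK: files are identical"
```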

Popular Tools:

  • gzip: Very common and fast.
  • bzip2: Slower than gzip, but generally creates smaller files.
  • xz: Usually produces the smallest files of the three, at the cost of more time and memory when compressing.
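To compare the tools on the same input, this sketch compresses one generated text file with each of them and prints the resulting sizes. It uses -c, which writes to standard output, so the original file is left alone. The sketch assumes bash and skips any tool that isn't installed. The exact numbers depend on your data and tool versions.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

seq 1 200000 > sample.txt           # generated text to compress
printf '%-8s %9d bytes\n' "original" "$(wc -c < sample.txt)"

# tool:extension pairs; -c writes the compressed data to stdout
for pair in gzip:gz bzip2:bz2 xz:xz; do
  tool=${pair%%:*}
  ext=${pair##*:}
  if command -v "$tool" > /dev/null 2>&1; then
    "$tool" -c sample.txt > "sample.txt.$ext"
    printf '%-8s %9d bytes\n' "$tool" "$(wc -c < "sample.txt.$ext")"
  else
    echo "$tool is not installed, skipping"
  fi
done
```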

Text files usually compress well regardless of the tool. JPEG and MP4 files will barely shrink, because those formats are already compressed.
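You can see this effect without hunting for media files. The sketch below uses random bytes from /dev/urandom as a stand-in for already-compressed data such as JPEG or MP4, and compares them with repetitive text. Treating random bytes as a stand-in is an assumption: real media files are not literally random, but they behave similarly here. The sketch assumes bash and gzip.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Repetitive text: printf reuses its format for every number seq produces
printf 'line %d: the quick brown fox jumps over the lazy dog\n' $(seq 1 20000) > text.txt
# Random bytes: a stand-in for data that is already compressed
head -c 500000 /dev/urandom > random.bin

for f in text.txt random.bin; do
  gzip -k "$f"                      # creates $f.gz, keeps $f
  printf '%-11s %8d -> %8d bytes\n' "$f" "$(wc -c < "$f")" "$(wc -c < "$f.gz")"
done
```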

CREATING ARCHIVES

You could create a “snapshot” of the current files in your /etc/init.d folder like this:

tar -cvf myinits.tar /etc/init.d/

This creates myinits.tar in your current directory.

Note 1: The -f switch means “the archive file name follows”, so the order of the switches matters. VERY IMPORTANT: tar treats whatever comes immediately after -f as the name of the archive. When creating an archive, always put f last in the group of switches (-cvf, not -cfv).
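The sketch below demonstrates the pitfall in a throwaway directory. It assumes bash and GNU tar, and sample/ is a made-up folder. With -cfv, tar takes the v as the archive name, so it creates a file literally called v. It then complains that demo.tar, which it now treats as a file to add, does not exist.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
mkdir sample                        # made-up folder to archive
echo "hello" > sample/hello.txt

# Wrong: "f" is not last, so tar uses "v" as the archive name and
# treats demo.tar as a (missing) file to add; tar exits non-zero here
tar -cfv demo.tar sample/ 2> /dev/null || echo "tar reported an error"
ls                                  # a file named "v" exists, but no demo.tar

# Right: "f" is last, so demo.tar is the archive name
tar -cvf demo.tar sample/
ls                                  # demo.tar now exists too
```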

Note 2: The -v switch (verbose) is included to give some feedback - traditionally many utilities provide no feedback unless they fail.

(The cryptic “tar” name? - originally short for “tape archive”)

You could then compress this file with gzip (GNU zip) like this:

gzip myinits.tar

…which replaces it with myinits.tar.gz. A compressed tar archive like this is known as a “tarball”. You will also sometimes see tarballs with a .tgz extension. At the Linux command line the extension has no special meaning to the system; it is simply a convention that helps humans.
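Here is the same two-step process as a runnable sketch. It assumes bash, tar and gzip. Instead of /etc/init.d/ it uses a small made-up folder, so it works on any system. The final listing shows that renaming the tarball to .tgz changes nothing about its contents.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
mkdir myinits                       # made-up stand-in for /etc/init.d/
seq 1 1000 > myinits/script-a
seq 1001 2000 > myinits/script-b

tar -cvf myinits.tar myinits/       # step 1: archive
gzip myinits.tar                    # step 2: compress; myinits.tar becomes myinits.tar.gz
ls                                  # shows myinits and myinits.tar.gz (no myinits.tar)

cp myinits.tar.gz myinits.tgz       # only the name differs
tar -tzf myinits.tgz                # lists the same entries as myinits.tar.gz
```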

In practice you can do the two steps in one with the -z switch, like this:

tar -cvzf myinits.tgz /etc/init.d/

This uses the -c switch to say that we’re creating an archive; -v to make the command “verbose”; -z to compress the result with gzip; and -f to specify the output file.
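This sketch checks that the one-step command really produces a gzip-compressed archive. It creates the archive with -z, tests the compressed stream with gzip -t, and compares its listing with a plain, uncompressed archive of the same folder. It assumes bash, GNU tar and gzip, and the sample data is made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
mkdir myinits                       # made-up sample folder
seq 1 1000 > myinits/script-a
seq 1001 2000 > myinits/script-b

tar -cvzf myinits.tgz myinits/      # create (-c), verbose (-v), gzip (-z), to file (-f)
gzip -t myinits.tgz && echo "valid gzip data"

tar -cf plain.tar myinits/          # same folder, no compression
if [ "$(tar -tzf myinits.tgz)" = "$(tar -tf plain.tar)" ]; then
  echo "same entries in both archives"
fi
```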

EXTRACTING ARCHIVES

To “explode” an archive and retrieve your files, use the -x (extract) flag:

tar -xvf archive.tar.gz

Safety First: Before extracting, it is a good practice to preview the contents using the -t (list) flag:

tar -tf archive.tar.gz

That gives you an idea of how the file structure will look after extracting. If the list shows many files without a common top-level folder (like folder/file), you have found a “tarbomb”. Extracting it may spew files directly into your current directory, potentially overwriting existing files.
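If you check archives often, you could script this. The hypothetical function below counts the distinct top-level names in a gzip-compressed tarball's listing, and more than one suggests a tarbomb. The function name is made up, and the sketch assumes bash and GNU tar. It is a heuristic, not a guarantee.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: report how many top-level entries a .tar.gz would create
check_tarball() {
  local archive=$1 count
  # Strip a leading "./", keep the first path component, count unique ones
  count=$(tar -tzf "$archive" | sed 's|^\./||' | awk -F/ '$1 != "" {print $1}' | sort -u | wc -l)
  if [ "$count" -gt 1 ]; then
    echo "$archive: $count top-level entries, possible tarbomb (extract with -C)"
  else
    echo "$archive: single top-level entry, looks tidy"
  fi
}

# Build one tidy archive and one "tarbomb" to try it on
workdir=$(mktemp -d)
cd "$workdir"
mkdir project
touch project/a.txt project/b.txt
tar -czf tidy.tar.gz project/                # entries start with project/
tar -czf bomb.tar.gz -C project a.txt b.txt  # entries are bare a.txt and b.txt

check_tarball tidy.tar.gz
check_tarball bomb.tar.gz
```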

You can use the -C flag to extract files into a specified target directory to keep things tidy.

tar -xvf archive.tar.gz -C target_dir/
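GNU tar won't create the -C directory for you, so make it first. The sketch below assumes bash and GNU tar, which detects the gzip compression on its own when reading. The file and folder names are made up.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"

# Make a small archive to practise on (made-up names)
mkdir docs
echo "first" > docs/one.txt
echo "second" > docs/two.txt
tar -czf archive.tar.gz docs/

mkdir -p target_dir                 # -C needs an existing directory
tar -xvf archive.tar.gz -C target_dir/
ls -R target_dir                    # lists target_dir/docs with one.txt and two.txt
```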

EXTENSION

You might notice that some tutorials write tar cvf rather than tar -cvf, leaving out the leading dash. Do you know why? Hint: It’s related to old “tape archive” styles and modern compatibility.

A note about compression levels: most tools let you adjust the balance between speed and size using a numeric scale (1–9):

  • Level 1: Fast, but provides low compression.
  • Level 6: The default for gzip and xz; a balance of speed and size. (bzip2 defaults to level 9.)
  • Level 9: Best compression (smallest file size) but takes the longest time.

However, when tar runs the compression tool for you (as with -z), it doesn't pass a level, so the tool's own default is used: level 6 for gzip.
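One way to pick a level anyway is to let tar write the archive to standard output and pipe it through the compressor yourself. An archive name of - means stdout. The sketch assumes bash, GNU tar and gzip, and uses generated sample data.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir=$(mktemp -d)
cd "$workdir"
mkdir sample
seq 1 200000 > sample/numbers.txt   # generated data to compress

# "-f -" sends the archive to stdout; gzip reads it from the pipe
tar -cf - sample/ | gzip -1 > fast.tgz   # level 1: quickest, larger file
tar -cf - sample/ | gzip -9 > best.tgz   # level 9: slowest, smallest file

wc -c fast.tgz best.tgz             # compare the sizes
tar -tzf best.tgz                   # still an ordinary tarball
```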

Some rights reserved.
