Overview
By the end of this article you should be familiar with the following tar options and commands:
- c: creating tar files
- x: extracting tar files
- t: displaying table of contents of a tar file
$ tar -xzvf testfile.tar.gz
$ tar -cvzf name.tar.gz /tmp/testfolder
$ gzip testfolder.tar # this will rename this file to testfolder.tar.gz
$ gunzip /tmp/testfolder.tar.gz
$ bzip2 /tmp/testfile1.txt # this will rename file to testfile1.txt.bz2
$ bunzip2 /tmp/testfile1.txt.bz2
$ tar -tf /tmp/testfolder.tar.gz
$ star -czv -f=testfolder.tar.gz testfolder
$ yum install star
$ star -xzv -f=testfolder.tar.gz
$ star -tv -f=testfolder.tar.gz
Compressing files
There are two main commands that you can use for compressing files: gzip and bzip2.
Using the gzip file compression utility
gzip is the more popular of the two, and it is really easy to use. First let's create a file that we want to compress:
$ echo "hello world" > testfile1.txt
$ file testfile1.txt
testfile1.txt: ASCII text
Here we use the file command to confirm the type of file this is. We added the ".txt" suffix for our own benefit, as an indicator of what type of file this is. It is best practice to always add a meaningful suffix to the files you create, e.g. if you create a shell script, give it a suffix such as ".sh". However, filename suffixes like ".txt" and ".sh" don't have any special meaning to Linux. Instead, a file's type is determined by analyzing its content, which is what the file command does. Since a suffix is something that we assign to files ourselves, a filename's suffix can be wrong or inaccurate due to human error. Therefore, if a file doesn't have a suffix, or we suspect that the suffix is wrong, we can check the file's type using the file command.
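To illustrate that the content rather than the suffix decides a file's type, here is a minimal sketch. It assumes a scratch directory created with mktemp and uses hypothetical file names (notes.txt, misleading.txt). It relies on the fact that gzip data starts with the two bytes 1f 8b.

```shell
# Sketch: a suffix is only a label; the bytes inside the file decide its type.
workdir=$(mktemp -d)
cd "$workdir"

echo "hello world" > notes.txt
gzip notes.txt                    # produces notes.txt.gz
mv notes.txt.gz misleading.txt    # deliberately give it the wrong suffix

# gzip data starts with the two bytes 1f 8b, whatever the file is called.
magic=$(head -c 2 misleading.txt | od -An -tx1 | tr -d ' \n')
echo "first two bytes: $magic"

# The file command, if installed, reaches its verdict from the content too.
if command -v file >/dev/null 2>&1; then
    file misleading.txt
fi
```

Despite the ".txt" suffix, the first two bytes identify the file as gzip data.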
Now here's how to compress this file using gzip:
$ gzip testfile1.txt
$ ls -l
total 4
-rw-r--r--. 1 root root 46 Oct 23 19:00 testfile1.txt.gz
$ file testfile1.txt.gz
testfile1.txt.gz: gzip compressed data, was "testfile1.txt", from Unix, last modified: Fri Oct 23 19:00:14 2015
As you can see, the gzip command not only compressed the file, but also added its own suffix, ".gz", to the resulting compressed file. This again is mainly for our benefit and has no special meaning to Linux, although gunzip does expect a recognised suffix such as ".gz" when it decompresses a file in place.
Now to unzip, we can use the gzip command's -d (decompress) option, or we can use the gunzip command:
$ gunzip testfile1.txt.gz
$ ls -l
total 4
-rw-r--r--. 1 root root 12 Oct 23 19:00 testfile1.txt
$ file testfile1.txt
testfile1.txt: ASCII text
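Both gzip and gunzip replace the file they are given. If you would rather keep the original, one option is to write to standard output with -c. The following is a minimal sketch in a scratch directory; roundtrip.txt is a hypothetical name used only to check the result.

```shell
# Sketch: compress and decompress without losing the original file.
workdir=$(mktemp -d)
cd "$workdir"
echo "hello world" > testfile1.txt

# -c writes the compressed data to stdout, so testfile1.txt stays in place.
gzip -c testfile1.txt > testfile1.txt.gz

# gzip -d does the same job as gunzip; -c again sends the result to stdout.
gzip -dc testfile1.txt.gz > roundtrip.txt

# cmp exits with status 0 only when the two files are byte-for-byte identical.
cmp testfile1.txt roundtrip.txt && echo "round trip OK"
```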
Using the bzip2 file compression utility
Now here's how we compress a file using bzip2:
$ bzip2 testfile1.txt
$ ls -l
total 4
-rw-r--r--. 1 root root 52 Oct 23 19:00 testfile1.txt.bz2
$ file testfile1.txt.bz2
testfile1.txt.bz2: bzip2 compressed data, block size = 900k
Once again bzip2 has added the ".bz2" suffix for our benefit. Now to uncompress, we can use the bzip2 command's -d option, or we can use the bunzip2 command:
$ bunzip2 testfile1.txt.bz2
$ ls -l
total 4
-rw-r--r--. 1 root root 12 Oct 23 19:00 testfile1.txt
$ file testfile1.txt
testfile1.txt: ASCII text
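In the listings above the 12-byte test file actually grew when compressed (46 bytes with gzip, 52 with bzip2), because each format adds a fixed header. The sketch below compares a tiny file with a larger, repetitive one. The file names and the 2000-line sample are assumptions made for illustration, and the bzip2 figures are only printed if bzip2 is installed.

```shell
# Sketch: compression only pays off once a file is big enough.
workdir=$(mktemp -d)
cd "$workdir"

echo "hello world" > small.txt

# Build a larger, repetitive sample file (hypothetical test data).
i=0
while [ "$i" -lt 2000 ]; do
    echo "hello world"
    i=$((i + 1))
done > large.txt

for f in small.txt large.txt; do
    orig_size=$(wc -c < "$f")
    gz_size=$(gzip -c "$f" | wc -c)
    echo "$f: original=$orig_size gzip=$gz_size"
    if command -v bzip2 >/dev/null 2>&1; then
        echo "$f: bzip2=$(bzip2 -c "$f" | wc -c)"
    fi
done
```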
Compressing whole directories using tar
gzip and bzip2 on their own can't compress a directory; they can only compress individual files. However, the tar command can convert (i.e. archive) a directory into a single file, which is referred to as a "tar" file. For example, let's say we have the following folder, called "testfolder":
$ pwd
/home/codingbee/testfolder
$ ls -l
total 12
drwxr-xr-x. 5 root root 91 Oct 23 19:58 testfolder
$ file testfolder
testfolder: directory
$ tree
.
└── testfolder
    ├── file2
    ├── file3
    ├── folder1
    │   ├── testfile2.txt
    │   └── testfile3.txt
    ├── folder2
    ├── folder3
    │   └── testfile4.txt
    └── testfile.txt

4 directories, 6 files
Now here's how we create a tar file from this folder:
$ tar -cvf testfolder.tar testfolder
testfolder/
testfolder/file2
testfolder/file3
testfolder/folder1/
testfolder/folder1/testfile2.txt
testfolder/folder1/testfile3.txt
testfolder/folder2/
testfolder/folder3/
testfolder/folder3/testfile4.txt
testfolder/testfile.txt
$ ls -l
total 12
drwxr-xr-x. 5 root root 91 Oct 23 19:58 testfolder
-rw-r--r--. 1 root root 10240 Oct 23 20:03 testfolder.tar
$ file testfolder.tar
testfolder.tar: POSIX tar archive (GNU)
You should read this tar command as:
Create verbosely, a tar archive with filename of 'testfolder.tar' using contents from 'testfolder'.
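Note, don't use the full path to testfolder. Instead cd into its parent directory and then run the tar command. tar stores each member under the path you give it, so a full path would recreate the whole directory hierarchy inside the archive rather than just testfolder/.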
Note that tar keeps the original directory intact, and just generates a tar equivalent for it. Also note that the tar command doesn't automatically add any suffixes to the resulting tar file, e.g. a .tar suffix. That's why we manually added this as shown above.
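An alternative to running cd first is tar's -C option, which makes tar change directory before it reads the names that follow. The sketch below builds a small stand-in for testfolder in a scratch directory (the files inside it are assumptions for illustration) and shows that the member names come out relative.

```shell
# Sketch: keep member names relative by letting tar change directory itself.
workdir=$(mktemp -d)
mkdir -p "$workdir/testfolder/folder1"
echo "hello world" > "$workdir/testfolder/testfile.txt"
echo "more text" > "$workdir/testfolder/folder1/testfile2.txt"

archive="$workdir/testfolder.tar"

# -C makes tar switch to $workdir first, so members are stored as testfolder/...
tar -cf "$archive" -C "$workdir" testfolder

# List the archive; every entry should start with testfolder/
tar -tf "$archive"
first_member=$(tar -tf "$archive" | head -n 1)
```

The original directory is left untouched, exactly as with the cd approach.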
However if you want to also compress a directory and not just create a tar file, then you can run the gzip command on the tar file:
$ gzip testfolder.tar
$ ls -l
total 4
drwxr-xr-x. 5 root root 91 Oct 23 19:58 testfolder
-rw-r--r--. 1 root root 442 Oct 23 20:09 testfolder.tar.gz
$ file testfolder.tar.gz
testfolder.tar.gz: gzip compressed data, was "testfolder.tar", from Unix, last modified: Fri Oct 23 20:09:17 2015
So to compress a folder, we first had to run the tar command, followed by the gzip command. However, a faster way to achieve the same result is to instruct the tar command to gzip the tar file as part of creating it. This is done using the tar command's -z option:
$ ls -l
total 0
drwxr-xr-x. 5 root root 91 Oct 23 19:58 testfolder
$ tar -cvzf testfolder.tar.gz testfolder
tar: Removing leading `/' from member names
testfolder/
testfolder/file2
testfolder/file3
testfolder/folder1/
testfolder/folder1/testfile2.txt
testfolder/folder1/testfile3.txt
testfolder/folder2/
testfolder/folder3/
testfolder/folder3/testfile4.txt
testfolder/testfile.txt
$ ls -l
total 4
drwxr-xr-x. 5 root root 91 Oct 23 19:58 testfolder
-rw-r--r--. 1 root root 437 Oct 23 20:27 testfolder.tar.gz
$ file testfolder.tar.gz
testfolder.tar.gz: gzip compressed data, from Unix, last modified: Fri Oct 23 20:27:09 2015
Note, if you want tar to use bzip2 instead of gzip, then simply use the tar command's "j" option instead of the "z" option.
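As a rough check that the two-step and one-step approaches produce equivalent archives, the sketch below builds both from a small stand-in folder and compares their listings. The names two-step.tar.gz and one-step.tar.gz are hypothetical, and the -j part only runs if bzip2 is installed.

```shell
# Sketch: tar followed by gzip versus tar -z, plus the -j variant.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p testfolder/folder1
echo "hello world" > testfolder/testfile.txt
echo "more text" > testfolder/folder1/testfile2.txt

# Two steps: archive, then compress (gzip replaces testfolder.tar).
tar -cf testfolder.tar testfolder
gzip testfolder.tar
mv testfolder.tar.gz two-step.tar.gz

# One step: -z asks tar to gzip the archive as it is written.
tar -czf one-step.tar.gz testfolder

# Both archives should list the same members.
tar -tzf two-step.tar.gz > two-step.list
tar -tzf one-step.tar.gz > one-step.list
cmp two-step.list one-step.list && echo "same contents"

# -j swaps gzip for bzip2, provided bzip2 is available.
if command -v bzip2 >/dev/null 2>&1; then
    tar -cjf testfolder.tar.bz2 testfolder
    tar -tjf testfolder.tar.bz2
fi
```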
Uncompressing tar files back into directories
$ ls -l
total 4
-rw-r--r--. 1 root root 427 Oct 23 20:51 testfolder.tar.gz
$ tar -xvzf testfolder.tar.gz
testfolder/
testfolder/file2
testfolder/file3
testfolder/folder1/
testfolder/folder1/testfile2.txt
testfolder/folder1/testfile3.txt
testfolder/folder2/
testfolder/folder3/
testfolder/folder3/testfile4.txt
testfolder/testfile.txt
You should read this command as:
Extract verbosely the gzipped tar archive with the filename 'testfolder.tar.gz'.
Now let's check this has worked:
$ ls -l
total 4
drwxr-xr-x. 5 root root 91 Oct 23 19:58 testfolder
-rw-r--r--. 1 root root 427 Oct 23 20:51 testfolder.tar.gz
$ tree testfolder
testfolder
├── file2
├── file3
├── folder1
│   ├── testfile2.txt
│   └── testfile3.txt
├── folder2
├── folder3
│   └── testfile4.txt
└── testfile.txt

3 directories, 6 files
One thing to remember is that an extract like this overwrites any existing files of the same name in the destination. Here's an example:
$ cat testfolder/testfile.txt
hello wooooorld
$ tar -xvf testfolder.tar
testfolder/
testfolder/file2
testfolder/file3
testfolder/folder1/
testfolder/folder1/testfile2.txt
testfolder/folder1/testfile3.txt
testfolder/folder2/
testfolder/folder3/
testfolder/folder3/testfile4.txt
testfolder/testfile.txt
$ cat testfolder/testfile.txt
apple pie apple cake banofee cake orange soda brown bread cottage pie chocolate cake white bread upside down pineapple cake bread and butter pudding
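One way to avoid accidentally overwriting files is to use the star utility, covered below, which skips files that are newer on disk than in the archive.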
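With tar itself, two common ways to protect existing files are the -k (--keep-old-files) option and extracting into a separate directory with -C. The following is only a sketch under those assumptions: the directory name restore is hypothetical, and tar implementations differ in how loudly they report files that -k skipped, which is why errors are silenced here.

```shell
# Sketch: two ways to stop an extract from clobbering local changes.
workdir=$(mktemp -d)
cd "$workdir"
mkdir testfolder
echo "archived version" > testfolder/testfile.txt
tar -cf testfolder.tar testfolder

# Change the file on disk after the archive was made.
echo "local edits" > testfolder/testfile.txt

# Option 1: -k tells tar not to replace files that already exist.
# Some tar versions report each skipped file as an error, hence "|| true".
tar -xkf testfolder.tar 2>/dev/null || true
cat testfolder/testfile.txt

# Option 2: extract somewhere else and compare at leisure.
mkdir restore
tar -xf testfolder.tar -C restore
cat restore/testfolder/testfile.txt
```

The first cat should still show the local edits, while the copy under restore holds the archived version.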
Looking inside a tar file
You can look inside a tar file using the -t option:
$ tar -tf testfolder.tar.gz
testfolder/
testfolder/file2
testfolder/file3
testfolder/folder1/
testfolder/folder1/testfile2.txt
testfolder/folder1/testfile3.txt
testfolder/folder2/
testfolder/folder3/
testfolder/folder3/testfile4.txt
testfolder/testfile.txt
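Once you know the member names, you can also ask tar for just one of them by naming it after the archive. The sketch below uses a small stand-in folder in a scratch directory; the files inside it are assumptions for illustration.

```shell
# Sketch: inspect an archive in detail, then extract a single member.
workdir=$(mktemp -d)
cd "$workdir"
mkdir -p testfolder/folder1
echo "hello world" > testfolder/testfile.txt
echo "more text" > testfolder/folder1/testfile2.txt
tar -czf testfolder.tar.gz testfolder
rm -r testfolder

# Adding -v to -t also shows permissions, owner, size and timestamp.
tar -tvzf testfolder.tar.gz

# Naming a member after the archive extracts only that member.
tar -xzf testfolder.tar.gz testfolder/testfile.txt
ls -R testfolder
```

Only testfolder/testfile.txt is restored; folder1 stays inside the archive.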
Compressing using the star utility
Standard Tape ARchiver, aka star, is an alternative to tar that offers some extra features. For example, during extraction it avoids overwriting existing files/folders that are newer than the archived copies. It isn't installed by default, so to use this utility you first have to install it:
$ yum install star
After that, to create an archive you do:
$ ls -l
total 0
drwxr-xr-x. 5 root root 91 Oct 23 19:58 testfolder
$ star -cz -f=testfolder.tar.gz testfolder/
star: 1 blocks + 0 bytes (total of 10240 bytes = 10.00k).
$ ls -l
total 4
drwxr-xr-x. 5 root root 91 Oct 23 19:58 testfolder
-rw-r--r--. 1 root root 458 Oct 23 21:26 testfolder.tar.gz
$ file testfolder.tar.gz
testfolder.tar.gz: gzip compressed data, from Unix, last modified: Fri Oct 23 21:26:32 2015
Note: we stick with the same ".tar.gz" suffix convention to keep this simple. Like tar, star doesn't add a suffix automatically.
To extract, you do:
$ star -x -z -f=testfolder.tar.gz
star: current 'testfolder/' newer.
star: current 'testfolder/file2' newer.
star: current 'testfolder/file3' newer.
star: current 'testfolder/folder1/' newer.
star: current 'testfolder/folder1/testfile2.txt' newer.
star: current 'testfolder/folder1/testfile3.txt' newer.
star: current 'testfolder/folder2/' newer.
star: current 'testfolder/folder3/' newer.
star: current 'testfolder/folder3/testfile4.txt' newer.
star: current 'testfolder/testfile.txt' newer.
star: 1 blocks + 0 bytes (total of 10240 bytes = 10.00k).
Note, this process will ONLY overwrite a file if the archived copy is newer than the one on disk. If the copy on disk is newer, then star skips extracting the older archived file.
Archives created with star can also be extracted using tar.
You can also use star to extract just a single file:
$ tree
.
├── testfolder
│   ├── file2
│   ├── file3
│   ├── folder1
│   │   ├── testfile2.txt
│   │   └── testfile3.txt
│   ├── folder2
│   ├── folder3
│   │   └── testfile4.txt
│   └── testfile.txt          # lets first delete this file
└── testfolder.tar.gz

4 directories, 7 files
$ rm testfolder/testfile.txt
rm: remove regular file ‘testfolder/testfile.txt’? y
$ tree
.
├── testfolder
│   ├── file2
│   ├── file3
│   ├── folder1
│   │   ├── testfile2.txt
│   │   └── testfile3.txt
│   ├── folder2
│   └── folder3
│       └── testfile4.txt
└── testfolder.tar.gz

4 directories, 6 files
$ star -t -f=testfolder.tar.gz
star: WARNING: Archive is 'gzip' compressed, trying to use the -z option.
testfolder/
testfolder/folder1/
testfolder/folder1/testfile2.txt
testfolder/folder1/testfile3.txt
testfolder/folder2/
testfolder/folder3/
testfolder/folder3/testfile4.txt
testfolder/file2
testfolder/file3
testfolder/testfile.txt       # we want to just extract this file
star: 1 blocks + 0 bytes (total of 10240 bytes = 10.00k).
$ star -x -z -f=testfolder.tar.gz testfolder/testfile.txt
star: 1 blocks + 0 bytes (total of 10240 bytes = 10.00k).
$ tree
.
├── testfolder
│   ├── file2
│   ├── file3
│   ├── folder1
│   │   ├── testfile2.txt
│   │   └── testfile3.txt
│   ├── folder2
│   ├── folder3
│   │   └── testfile4.txt
│   └── testfile.txt
└── testfolder.tar.gz

4 directories, 7 files