Recently, while doing an experiment with my blog’s rendered output with a VPS instance, I needed to transfer it to the server over SSH. While doing that, I experimented with archiving the folder a bit, so I’m putting the outcome of that experience here, should I need it again in the future.
All notes below assume GNU tar v1.26
. More specifically, the output of tar --version | head -1
gives:
tar (GNU tar) 1.26
I’m only listing the arguments and use-cases that I think are most frequently used (at least by me)
and the ones I’m most likely to need in the future. Please complement this with a healthy serving of
man tar
to keep your sanity.
Check out this neat little tool to help generate often-used tar
commands:
cligen.sharats.me. Thanks!
Creating Archives¶
The -c
(or --create
) command is used to create archives.
The -
in front of the c
can be omitted, but I find that ugly and prefer to include it. That way
it’s consistent with most other such GNU commands.
Additional options after -c
:
-
v
– Enable verbose output. Adding this will print each file as it is being added to the archive. -
z
orj
– Specify the compression format, if needed. Usez
forgz
archive orj
forbz2
archive. This can also bea
to infer the compression format from the file name, but only if thef
(explained in the next point) is also given. Other compression formats like--xz
,--lzip
etc. can also be used. -
f
– Use the next argument as the file name of the archive. If this argument is not provided, the archive content is written to the standard out. -
--remove-files
– Remove files after adding them to the archive. Be careful with this.
To illustrate the examples, I’ll clone one of my public repositories and play around with creating archives of it.
$ git clone git@github.com:sharat87/just-a-calendar.git
$ du -sh just-a-calendar
248K just-a-calendar/
Create a .tar.bz2
Archive¶
To create a bz2
archive of a folder:
$ tar -cjf package.tar.bz2 just-a-calendar
$ file package.tar.bz2
package.tar.bz2: bzip2 compressed data, block size = 900k
$ du -sh package.tar.bz2
76K package.tar.bz2
Since we are specifying the file name here, which includes the .bz2
part at the end, we can tell
tar
to just figure out the compression we want to use. Instead of the j
argument specifying the
compression, we’d put in a
to indicate this.
$ tar -caf package.tar.bz2 just-a-calendar
$ file package.tar.bz2
package.tar.bz2: bzip2 compressed data, block size = 900k
$ du -sh package.tar.bz2
76K package.tar.bz2
Exclude .git
Directory¶
Now, the archive also contains the .git
directory that was present in our clone. We probably don’t
what that. The tar
command provides --exclude*
family of arguments to deal with this. For
example, as in our case, to ignore the folder .git
, we could do:
$ tar -caf package.tar.bz2 --exclude=.git just-a-calendar
$ du -sh package.tar.bz2
12K package.tar.bz2
This package doesn’t contain the .git
folder (and consequently is much smaller). However, for
this particular problem, there’s perhaps an even better solution, the --exclude-vcs
argument. This
argument will ignore any VCS directories automatically and it knows about .git
. So our command
becomes:
$ tar -caf package.tar.bz2 --exclude-vcs just-a-calendar
Another similar useful argument is the --eclude-backups
, which will exclude backup and lock
files which also is usually what we want.
Set Initial Directory¶
The -C
(or --directory
) argument sets the initial working directory before creating the archive.
This will influence the paths with which the files inside the archive are saved with. This is
normally only useful if for some reason you can’t cd
or pushd
to that directory yourself, which
is not very often.
Inspecting Archives¶
The -t
(or --list
) can be used to list the contents of an archive without extracting it.
Additional options after -t
:
-
v
– Verbose listing. The affect of adding this option is like adding-l
to thels
command. That is, it will show each file’s permissions, size, last modified etc. details. -
f
– Treat next argument as the archive file name. This argument is usually always needed with the-t
command (unless the archive is being piped in to thetar -t
command).
Let’s run this on our package archive created in the previous section.
$ tar -tf package.tar.bz2 | wc -l
6
Single vs Multiple Top Levels¶
There’s one thing about extracting archives that’s extremely annoying. If it contains multiple files at top level, it’ll pollute the current directory with several objects. To combat this, if we make it a habit to create a new folder and extract inside it, it might turn out that the archive itself contains a top level directory, so now we end up one useless directory in the tree.
This situation is actually handled very well by the aunpack
command from the atool script.
This command takes an archive (of any of several different formats) and extracts it. If it contains
a single top level entry, it is extracted to your working directory. If it contains several top
level entries, a new directory is created and the extraction happens inside that new directory. This
command is extremely convenient, for this and several other reasons.
To find out if an archive has a single top-level entry or multiple, the following snippet can be used:
tar -tf package.tar.bz2 | cut -d/ -f1 | sort -u
This will print out one top-level entry per line. If there’s only one line in the output, then
there’s only one top-level. How this works is that first, the cut
command splits the listing with
/
character, the file separator and only prints the first entry, which will be the top level
entry. Then, the sort
command will sort the top-levels and only print the unique entries
(that’s what the -u
is for). We could further pipe this to wc -l
and check if it results in 1
.
Extracting Archives¶
The -x
(or --extract
) command is used to extract the contents of archives.
This command takes the following arguments:
-
v
– Verbose logging. Prints each file path as it is being extracted. -
z
orj
– Specify the compression format, if needed. Similar in working as with the-c
command. -
f
– Reads the next argument as the archive file name. This is almost always used with this command to specify the archive to extract. If this is not provided, the archive content is expected to be available from standard input. -
k
(or--keep-old-files
) – Fail if any existing files will be overwritten by extracting. This is useful if you don’t want any of your existing files to be overwritten.
So, to extract our archive (in a separate location, of course):
$ mkdir spike && cd spike
$ tar -xaf ../package.tar.bz2
Extracting to Different Directory¶
The extract command also supports the -C
(or --directory
) argument that sets the initial working
directory before extracting. This can be used to change the location where the extracted
files/folder will be saved.
Transferring Archives / Directories¶
In this section, I’ll show a couple of quick examples where we need to transfer a folder tree between current local system and a remote system reachable by SSH.
Local to Remote¶
We could create a tar
file of the folder (and any other files as well), transfer the file to the
remote system, login to the remote system and unpack it there.
There’s a couple of problems with this approach:
- Since we are creating an archive of the folder on our local disk, we need to have the necessary free space for that archive. This may be less the size of the folder, but can still be significant if the folder is large. The same problem will also appear on the remote system.
- We need write permissions on the local disk. If we want to just take a folder to a remote system, we should only need write permission on the remote disk, not on the local disk.
To avoid the above two problems, we can transfer the archive directly as a stream, without saving it
to the local disk. Notice that if we don’t provide a filename for the create (-c
) command, the
archive will be written to standard out. Similarly, if we don’t provide a filename for the extract
(-x
) command, it will read the archive from standard input. Our solution below will leverage these
two facts.
tar -cj just-a-calendar | ssh remote tar -xj
The first command (tar -cj just-a-calendar
) creates a bzip2
-compressed archive (we could’ve used
z
here to use gz
compression instead) and writes it to the standard out. This becomes the
standard input for the ssh
command which will connect to the remote host, invoke the tar -xj
command, and forwards it’s own standard input to that tar -xj
command. The tar -xj
command
extracts the archive from it’s standard input, using bzip2
for decompressing and writes the
extracted contents to the remote user’s home directory.
For added measure, we could use the -C
(or --directory
) argument to tar -xj
to set the
directory where the extracted files would be saved.
This method is extremely handy since the archive is not written to the disk anywhere, not on local, not on remote. It’s only processed as a stream of bytes.
The -j
argument to the tar
commands is not strictly necessary. The whole thing will work even
without it. But since the archive is being transferred over network, it pays to spend a little
processor time into compressing it so as to minimize network usage (and consequently, speed up the
operation).
We could’ve added the -v
argument to one (or both!?) tar
commands to show the files as they are
being archived/extracted.
Remote to Local¶
This follows a similar method as in the previous section, but in the other way around. We run the
archiver tar
command on the remote host, and the extractor tar
command on the local machine.
ssh remote tar -cj just-a-calendar | tar -xj
This will recreate the just-a-calendar
folder on the remote host, onto the local disk. We could
use the -C
argument to either tar
command to set it’s initial working directory.
Of course, if wanted to just save the archive on the local disk, not extract it, we could just redirect the stream to a file.
ssh remote tar -cj just-a-calendar > package.tar.bz2
Conclusion¶
The tar
command, in all it’s variations, is irreplaceable in it’s utility for these kind of
purposes. The handiest resource for getting help while working with it is, of course, the man page.
But when we’re in the mood to just copy-pasta (yes, pasta) a command to serve the purpose, I hope
this article will be helpful.