Recently, while experimenting with my blog’s rendered output on a VPS instance, I needed to transfer it to the server over SSH. Along the way I experimented a bit with archiving the folder, so I’m writing down what I learned here, should I need it again in the future.
All notes below assume GNU tar v1.26. More specifically:
$ tar --version | head -1
tar (GNU tar) 1.26
I’m only listing the arguments and use cases that I think are most frequently used (at least by me) and the ones I’m most likely to need in the future. Please complement this with a healthy serving of man tar to keep your sanity.
Check out this neat little tool to help generate often-used tar commands.
The -c (--create) command is used to create archives. The - in front of the c can be omitted, but I find that ugly and prefer to include it. That way it’s consistent with most other GNU commands.
Additional options after c:
- v – Enable verbose output. Adding this will print each file as it is being added to the archive.
- j – Specify the compression format, if needed. Use j to create a bz2 archive. This can also be a, to infer the compression format from the file name, but only if f (explained in the next point) is also given. Other compression formats, like z for gzip or --lzip, can also be used.
- f – Use the next argument as the file name of the archive. If this argument is not provided, the archive content is written to standard output.
- --remove-files – Remove files after adding them to the archive. Be careful with this.
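Of the options above, --remove-files is the one that can bite. Here is a minimal sketch of what it does, assuming GNU tar and gzip are installed. The demo directory and its files are made-up names, and z (gzip) is used instead of j only to keep dependencies small:

```shell
#!/bin/sh
# Sketch: -c/-z/-f together with --remove-files (GNU tar assumed).
# "demo", a.txt and b.txt are hypothetical names, not from the article.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir demo
echo "hello" > demo/a.txt
echo "world" > demo/b.txt

# c = create, z = gzip, f = the next argument names the archive.
# --remove-files deletes each file (and then the directory) once archived.
tar -czf demo.tar.gz --remove-files demo

# The source tree is gone; its contents survive only inside the archive.
[ -e demo ] || echo "demo/ was removed"
tar -tzf demo.tar.gz
```

If the archive ends up somewhere you can’t read it back from, the originals are gone too. That’s why the flag deserves the warning.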
To illustrate the examples, I’ll clone one of my public repositories and play around with creating archives of it.
$ git clone firstname.lastname@example.org:sharat87/just-a-calendar.git
$ du -sh just-a-calendar
248K just-a-calendar/
To create a bz2 archive of a folder:
$ tar -cjf package.tar.bz2 just-a-calendar
$ file package.tar.bz2
package.tar.bz2: bzip2 compressed data, block size = 900k
$ du -sh package.tar.bz2
76K package.tar.bz2
Since we are specifying the file name here, and it includes the .bz2 part at the end, we can tell tar to just figure out the compression we want to use. Instead of the j argument specifying the compression, we’d put in a to indicate this.
$ tar -caf package.tar.bz2 just-a-calendar
$ file package.tar.bz2
package.tar.bz2: bzip2 compressed data, block size = 900k
$ du -sh package.tar.bz2
76K package.tar.bz2
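To check that a really keys off the suffix, here is a small sketch, assuming GNU tar and gzip. The proj directory is a hypothetical stand-in for the repository, and gzip -t is used only as a convenient way to test whether a file holds gzip data:

```shell
#!/bin/sh
# Sketch: with -a, GNU tar picks the compressor from the archive's suffix.
# "proj" is a hypothetical directory, not from the article.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir proj
echo "data" > proj/file.txt

tar -caf proj.tar.gz proj   # ".gz" suffix -> gzip compression
tar -caf proj.tar proj      # plain ".tar" -> no compression

# gzip -t succeeds only on valid gzip data.
if gzip -t proj.tar.gz; then echo "proj.tar.gz: gzip"; fi
if ! gzip -t proj.tar 2>/dev/null; then echo "proj.tar: uncompressed"; fi
```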
Now, the archive also contains the .git directory that was present in our clone. We probably don’t want that. The tar command provides the --exclude* family of arguments to deal with this. For example, to ignore the .git folder in our case, we could do:
$ tar -caf package.tar.bz2 --exclude=.git just-a-calendar
$ du -sh package.tar.bz2
12K package.tar.bz2
This package doesn’t contain the .git folder (and is consequently much smaller). However, for this particular problem there’s perhaps an even better solution: the --exclude-vcs argument. This argument automatically ignores the directories of common version control systems, and it knows about .git. So our command becomes:
$ tar -caf package.tar.bz2 --exclude-vcs just-a-calendar
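The two approaches can be combined, and --exclude can be repeated with shell-style globs. Here is a small sketch, assuming GNU tar. The proj layout, including the stray debug.log, is hypothetical:

```shell
#!/bin/sh
# Sketch: --exclude-vcs plus a repeatable --exclude glob (GNU tar assumed).
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

# Hypothetical project: a fake .git directory, real content and a log file.
mkdir -p proj/.git proj/src
echo "ref: refs/heads/master" > proj/.git/HEAD
echo "main" > proj/src/main.txt
echo "debug" > proj/debug.log

# Exclusion options are placed before the folder name.
tar -czf proj.tar.gz --exclude-vcs --exclude='*.log' proj
tar -tzf proj.tar.gz
```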
Another similarly useful argument is --exclude-backups, which excludes backup and lock files; that’s usually what we want too.
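According to the GNU tar manual, --exclude-backups skips names matching the globs .#*, *~ and #*#, which covers the backup, autosave and lock files that editors like Emacs leave behind. Here is a small sketch, assuming GNU tar. The file names are made up:

```shell
#!/bin/sh
# Sketch: --exclude-backups drops editor backup, autosave and lock files.
# All file names here are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir proj
echo "notes"     > proj/notes.txt
echo "old notes" > 'proj/notes.txt~'   # backup file   (*~)
echo "autosave"  > 'proj/#notes.txt#'  # autosave file (#*#)
touch 'proj/.#notes.txt'               # lock file     (.#*)

tar -czf proj.tar.gz --exclude-backups proj
tar -tzf proj.tar.gz   # only proj/ and proj/notes.txt remain
```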
Set Initial Directory¶
The -C (--directory) argument sets the initial working directory before creating the archive. This influences the paths under which files are stored inside the archive. It is normally only useful if for some reason you can’t pushd to that directory yourself, which is not very often.
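Here is a small sketch of how -C changes the stored paths, assuming GNU tar. site/public is a hypothetical stand-in for a blog’s rendered output. Note that -C applies to the file names that come after it:

```shell
#!/bin/sh
# Sketch: how -C changes the member names stored in the archive.
# "site/public" is a hypothetical directory, not from the article.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

mkdir -p "$work/site/public"
echo "<h1>hi</h1>" > "$work/site/public/index.html"

# Without -C, the stored names mirror the path given on the command line
# (GNU tar strips the leading "/" and says so on stderr).
tar -cf "$work/full.tar" "$work/site/public" 2>/dev/null

# With -C, tar first changes into $work/site, so names start at "public/".
tar -cf "$work/short.tar" -C "$work/site" public

tar -tf "$work/full.tar"
tar -tf "$work/short.tar"
```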
The -t (--list) command can be used to list the contents of an archive without extracting it.
Additional options after t:
- v – Verbose listing. The effect of adding this option is like adding -l to the ls command. That is, it will show each file’s permissions, size, last-modified time and similar details.
- f – Treat the next argument as the archive file name. This argument is almost always needed with the -t command (unless the archive is being piped in on standard input).
Let’s run this on our package archive created in the previous section.
$ tar -tf package.tar.bz2 | wc -l
6
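Here is a quick sketch of plain versus verbose listing, assuming GNU tar. The proj directory is hypothetical. GNU tar detects the compression on its own when reading an archive file, which is why no z is needed here (and no j in the listing above):

```shell
#!/bin/sh
# Sketch: -t lists names; -tv adds ls -l style details.
# "proj" and its files are hypothetical.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir proj
echo "one" > proj/one.txt
echo "two" > proj/two.txt
tar -czf proj.tar.gz proj

# No z needed: GNU tar recognises the gzip data when reading a file.
tar -tf proj.tar.gz          # names only
tar -tvf proj.tar.gz         # permissions, owner, size, mtime, name
```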
Single vs Multiple Top Levels¶
There’s one thing about extracting archives that’s extremely annoying. If an archive contains multiple files at the top level, it’ll pollute the current directory with several objects. To combat this, we could make it a habit to create a new folder and extract inside it. But then the archive may turn out to contain a top-level directory itself, and we end up with one useless extra directory in the tree.
This situation is actually handled very well by the aunpack command from the atool script. This command takes an archive (in any of several different formats) and extracts it. If it contains a single top-level entry, it is extracted to your working directory. If it contains several top-level entries, a new directory is created and the extraction happens inside that new directory. This command is extremely convenient, for this and several other reasons.
To find out if an archive has a single top-level entry or multiple, the following snippet can be used:
tar -tf package.tar.bz2 | cut -d/ -f1 | sort -u
This will print out one top-level entry per line. If there’s only one line in the output, then there’s only one top-level entry. Here’s how it works. First, the cut command splits each path in the listing on the / character (the path separator) and only prints the first field, which is the top-level entry. Then, the sort command sorts the top-level entries and only prints the unique ones (that’s what the -u is for). We could further pipe this to wc -l and check if it results in 1.
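The snippet and the wc -l step can be wrapped in a small helper. Here is a sketch, assuming GNU tar; count_top_levels and the sample archives are hypothetical names. One caveat: an archive built with tar -C dir . stores names like ./a, so it reports a single top level (.) even though extracting it scatters files into the current directory.

```shell
#!/bin/sh
# Sketch: count distinct top-level entries in an archive.
set -eu

# Hypothetical helper: prints how many top-level entries an archive has.
count_top_levels() {
    tar -tf "$1" | cut -d/ -f1 | sort -u | wc -l
}

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir proj && echo "x" > proj/x.txt
echo "a" > a.txt && echo "b" > b.txt
tar -cf single.tar proj           # everything under proj/
tar -cf multi.tar a.txt b.txt     # two loose files at the top

for archive in single.tar multi.tar; do
    if [ "$(count_top_levels "$archive")" -eq 1 ]; then
        echo "$archive: one top-level entry, safe to extract here"
    else
        echo "$archive: several top-level entries, extract into a new directory"
    fi
done
```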
The -x (--extract) command is used to extract the contents of archives.
This command takes the following arguments:
- v – Verbose logging. Prints each file path as it is being extracted.
- j – Specify the compression format, if needed. Works the same way as with the -c command.
- f – Reads the next argument as the archive file name. This is almost always used with this command to specify the archive to extract. If this is not provided, the archive content is expected on standard input.
- k (--keep-old-files) – Refuse to overwrite existing files, reporting each one as an error. This is useful if you don’t want any of your existing files to be overwritten.
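Here is a sketch of -k protecting a locally edited file, assuming GNU tar; proj and config.txt are hypothetical. The GNU tar manual describes -k as treating existing files as errors, so the extraction should exit with a non-zero status while leaving the file alone:

```shell
#!/bin/sh
# Sketch: -k (--keep-old-files) refuses to overwrite existing files.
# "proj" and "config.txt" are hypothetical names.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir proj
echo "original" > proj/config.txt
tar -cf proj.tar proj

# Change the file after archiving it.
echo "local edit" > proj/config.txt

# Existing files count as errors, so capture the exit status instead of
# letting "set -e" abort the script.
status=0
tar -xkf proj.tar 2>/dev/null || status=$?

echo "exit status: $status"
cat proj/config.txt   # still the local edit
```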
So, to extract our archive (in a separate location, of course):
$ mkdir spike && cd spike
$ tar -xaf ../package.tar.bz2
Extracting to Different Directory¶
The extract command also supports the -C (--directory) argument, which sets the initial working directory before extracting. This can be used to change the location where the extracted files and folders are saved.
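Here is a sketch of extracting into another directory with -C, assuming GNU tar; proj and restore/here are hypothetical. The target directory has to exist already, since tar changes into it rather than creating it:

```shell
#!/bin/sh
# Sketch: extract somewhere other than the current directory with -C.
# "proj" and "restore/here" are hypothetical names.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir proj
echo "hi" > proj/hello.txt
tar -czf proj.tar.gz proj

# tar -C does not create the directory, so make it first.
mkdir -p restore/here
tar -xzf proj.tar.gz -C restore/here

cat restore/here/proj/hello.txt
```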
Transferring Archives / Directories¶
In this section, I’ll show a couple of quick examples of transferring a folder tree between the local system and a remote system reachable over SSH.
Local to Remote¶
We could create a tar file of the folder (and any other files as well), transfer the file to the remote system, log in to the remote system and unpack it there.
There are a couple of problems with this approach:
- Since we are creating an archive of the folder on our local disk, we need enough free space for that archive. It may be smaller than the folder, but can still be significant if the folder is large. The same problem also appears on the remote system.
- We need write permissions on the local disk. If we want to just take a folder to a remote system, we should only need write permission on the remote disk, not on the local disk.
To avoid the above two problems, we can transfer the archive directly as a stream, without saving it to the local disk. Notice that if we don’t provide a file name for the create (-c) command, the archive is written to standard output. Similarly, if we don’t provide a file name for the extract (-x) command, it reads the archive from standard input. Our solution below leverages these two facts:
tar -cj just-a-calendar | ssh remote tar -xj
The first command (tar -cj just-a-calendar) creates a bzip2-compressed archive (we could’ve used z here to get gz compression instead) and writes it to standard output. This becomes the standard input for the ssh command, which connects to the remote host, invokes the tar -xj command there, and forwards its own standard input to that command. The tar -xj command extracts the archive from its standard input, using bzip2 for decompression, and writes the extracted contents to the remote user’s home directory.
For added measure, we could pass the -C (--directory) argument to tar -xj to set the directory where the extracted files are saved.
This method is extremely handy since the archive is not written to the disk anywhere, not on local, not on remote. It’s only processed as a stream of bytes.
The -j argument to the tar commands is not strictly necessary; the whole thing works without it, as long as it is dropped from both commands. But since the archive is being transferred over the network, it pays to spend a little processor time compressing it, so as to minimize network usage (and, consequently, speed up the transfer).
We could’ve added the -v argument to one (or both!?) tar commands to show the files as they are archived or extracted.
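The streaming idea can be tried without a server by letting a second local tar play the remote side. Here is a sketch, assuming GNU tar and gzip; local, remote_home and /some/dir are hypothetical. It spells out f - to name standard output or standard input explicitly, so the result doesn’t depend on the build’s default archive or the TAPE variable:

```shell
#!/bin/sh
# Sketch: stream an archive through a pipe, never writing it to disk.
# "local", "remote_home" and "/some/dir" are hypothetical names.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p local/just-a-calendar remote_home
echo "<html></html>" > local/just-a-calendar/index.html

# Over SSH this would be (not run here, "remote" is a placeholder host):
#   tar -cz -f - -C local just-a-calendar | ssh remote tar -xz -f - -C /some/dir
# Locally, the second tar stands in for the remote one:
tar -cz -f - -C local just-a-calendar | tar -xz -f - -C remote_home

ls remote_home/just-a-calendar
```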
Remote to Local¶
This follows the same method as in the previous section, but the other way around. We run the archiving tar command on the remote host, and the extracting tar command on the local machine.
ssh remote tar -cj just-a-calendar | tar -xj
This will recreate the just-a-calendar folder from the remote host on the local disk. We could add the -C argument to either tar command to set its initial working directory.
Of course, if we wanted to just save the archive on the local disk rather than extract it, we could redirect the stream to a file.
ssh remote tar -cj just-a-calendar > package.tar.bz2
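Here is a local sketch of the save-to-file variant, assuming GNU tar and gzip. A subshell inside the hypothetical remote_home directory stands in for the login shell that ssh would start on the remote host. Listing the saved file afterwards is a cheap sanity check that a complete archive arrived:

```shell
#!/bin/sh
# Sketch: capture an archive stream into a local file.
# "remote_home" is a hypothetical stand-in for the remote home directory.
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work"

mkdir -p remote_home/just-a-calendar
echo "<html></html>" > remote_home/just-a-calendar/index.html

# Real use (not run here):
#   ssh remote tar -cj just-a-calendar > package.tar.bz2
# Stand-in: the subshell plays the remote side and writes to stdout.
(cd remote_home && tar -cz -f - just-a-calendar) > package.tar.gz

# Reading the whole file back checks that a complete archive was written.
tar -tzf package.tar.gz
```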
The tar command, in all its variations, is irreplaceable for these kinds of purposes. The handiest resource for getting help while working with it is, of course, the man page. But when we’re in the mood to just copy-pasta (yes, pasta) a command to serve the purpose, I hope this article will be helpful.