An extensive article on the subprocess module in Python's standard library, filled with examples and scenarios from my experience, along with explanations and rationales for the solutions.
Python’s subprocess module is one of my favourite modules in the standard library. If you have
ever done a decent amount of coding in Python, you might have encountered it. This module is used
for dealing with external commands and is intended as a replacement for the old os.system
and the like.
The most trivial use might be to get the output of a small shell command like ls or ps. Not that
this is the best way to get a list of files in a directory (think os.listdir), but you
get the point.
I am going to put my notes and experiences about this module here. Please note that I wrote this with Python 2.7 in mind; things are slightly different in other versions (even 2.6). If you find any errors or have suggestions, please let me know.
A simple usage¶
For the sake of providing context, let's run the ls command from subprocess and get its output:
import subprocess
ls_output = subprocess.check_output(['ls'])
I’ll cover getting output from a command in detail later. To give more command line arguments,
subprocess.check_output(['ls', '-l'])
The first item in the list is the executable and the rest are its command line arguments (the argv
equivalent). No quirky shell quoting and complex nested quote rules to digest. Just a plain Python
list.
However, not having shell quoting also means you don't get the shell niceties, like piping for one. The following won't work the way one would expect it to:
subprocess.check_output(['ls', '|', 'wc', '-l'])
Here, the ls command gets | as its first argument, and I have no idea what ls would do with it.
Perhaps complain that no such file exists. So, instead, we have to use the shell boolean argument.
More on that later in the article.
Popen class¶
If there's just one thing in the subprocess module that you should be concerned with, it's the
Popen class. The other functions, like call, check_output, and
check_call, use Popen internally. Here's the signature from the docs:
class subprocess.Popen(args, bufsize=0, executable=None, stdin=None,
stdout=None, stderr=None, preexec_fn=None, close_fds=False, shell=False,
cwd=None, env=None, universal_newlines=False, startupinfo=None,
creationflags=0)
I suggest you read the docs for this class. As with all Python docs, it's really good.
Running via the shell¶
Subprocess can also run command-line instructions via a shell program. This is usually dash/bash
on Linux and cmd.exe on Windows.
subprocess.call('ls | wc -l', shell=True)
Notice that in this case we pass a string, not a list. This is because we want the shell to interpret the whole of our command. You can even use shell style quoting if you like. It is up to the shell to decide how to best split the command line into executable and command line arguments.
On Windows, if you pass a list for args, it will be turned into a string using the same rules as the MS C runtime. See the docstring of
subprocess.list2cmdline for more on this. Whereas on Unix-like systems, even if you pass a string, it's turned into a list of one item :).
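To see those quoting rules in action, you can call list2cmdline directly. A quick REPL check (note how the argument containing a space gets quoted):
>>> subprocess.list2cmdline(['ls', '-l', 'my file'])
'ls -l "my file"'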
The behaviour of the shell argument can sometimes be confusing, so I'll try to clear it up a bit here.
This is something I wish I had when I first encountered this module.
First, let's consider the case where shell is set to False, the default. In this case, if
args is a string, it is assumed to be the name of the executable file, even if it contains spaces.
Consider the following:
subprocess.call('ls -l')
This won't work, because subprocess is looking for an executable file called ls -l, which obviously
doesn't exist. However, if args is a list, then the first item in the list is considered the
executable and the rest of the items are passed as command line arguments to the
program.
subprocess.call(['ls', '-l'])
does what you think it will.
Second case: with shell set to True, the program that actually gets executed is the OS default
shell, /bin/sh on Linux and cmd.exe on Windows. This can be changed with the executable
argument.
When using the shell, args is usually a string, something that will be parsed by the shell
program. The args string is passed as a command line argument to the shell (with a -c option on
Linux) such that the shell will interpret it as a shell command sequence and process it accordingly.
This means you can use all the shell builtins and goodies that your shell offers.
subprocess.call('ls -l', shell=True)
is similar to
$ /bin/sh -c 'ls -l'
In the same vein, if you pass a list as args with shell set to True, all items in the list are
passed as command line arguments to the shell.
subprocess.call(['ls', '-l'], shell=True)
is similar to
$ /bin/sh -c ls -l
which is the same as
$ /bin/sh -c ls
since /bin/sh takes just the argument next to -c as the command line to execute.
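You can actually observe where the remaining list items go: /bin/sh treats them as its positional parameters ($0, $1, and so on), not as part of the command. A quick REPL demonstration:
>>> subprocess.call(['echo $0 $1', 'hello', 'world'], shell=True)
hello world
0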
Getting the return code (aka exit status)¶
If you want to run an external command and its return code is all you’re concerned with, the
call and check_call functions are what you’re looking for. They both
return the return code after running the command. The difference is, check_call raises a
CalledProcessError if the return code is non-zero.
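Here's a quick sketch of the difference (nonexistent-dir is a made-up name, just to force a failure):

import subprocess

# call returns the exit status; you inspect it yourself.
ret = subprocess.call(['ls', 'nonexistent-dir'])
if ret != 0:
    print 'ls failed with', ret

# check_call raises instead, so failures are harder to ignore.
try:
    subprocess.check_call(['ls', 'nonexistent-dir'])
except subprocess.CalledProcessError as e:
    print 'ls failed with', e.returncode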
If you've read the docs for these functions, you'll see that it's not recommended to use
stdout=PIPE or stderr=PIPE with them. And if you don't, the stdout and stderr of the command are just
redirected to the parent's (the Python VM's, in this case) streams.
If that is not what you want, you have to use the Popen class.
proc = Popen('ls')
The moment the Popen class is instantiated, the command starts running. You can wait for it, and
after it's done, access the return code via the returncode attribute.
proc.wait()
print proc.returncode
If you are trying this out in a Python REPL, you might not see a need to call .wait(), since
you can just wait yourself in the REPL till the command is finished and then access
returncode. Surprise!
>>> proc = Popen('ls')
>>> file1 file2
>>> print proc.returncode
None
>>> # wat?
The command is definitely finished. Why don’t we have a return code?
>>> proc.wait()
0
>>> print proc.returncode
0
The reason for this is that returncode is not set automatically when a process ends. You have to
call .wait or .poll to find out whether the program is done; those calls are also what set the returncode
attribute.
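The difference between the two: .wait blocks until the process finishes, while .poll checks and returns immediately, giving None if the process is still running. So a non-blocking sketch looks like this:

import time
from subprocess import Popen

proc = Popen('ls')
while proc.poll() is None:
    # Still running; do something else in the meantime.
    time.sleep(0.1)
print proc.returncode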
IO Streams¶
The simplest way to get the output of a command, as seen previously, is to use the
check_output function.
output = subprocess.check_output('ls')
Notice the check_ prefix in the function name? Ring any bells? That's right, this function will
raise a CalledProcessError if the return code is non-zero.
This may not always be the best way to get the output of a command, though. If you do get a
CalledProcessError from this call, you probably have little idea what went wrong, because you'll
be missing what was written to the command's stderr.
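If you do want to stick with check_output, you can at least catch the exception; it carries the return code and whatever the command wrote to stdout (but not stderr, unless you redirected it). A sketch, again with a made-up directory name to force a failure:

import subprocess

try:
    output = subprocess.check_output(['ls', 'nonexistent-dir'])
except subprocess.CalledProcessError as e:
    # e.output holds the command's stdout; stderr is lost here.
    print 'failed with', e.returncode, 'stdout was:', repr(e.output)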
Reading error stream¶
There are two ways to get the error output. First is redirecting stderr to stdout and only being
concerned with stdout. This can be done by setting the stderr argument to
subprocess.STDOUT.
Second is to create a Popen object with stderr set to subprocess.PIPE (optionally
along with the stdout argument) and read from its stderr attribute, which is a readable file-like
object. There is also a convenience method on the Popen class, called .communicate, which optionally
takes a string to be sent to the process's stdin and returns a tuple of (stdout_content, stderr_content).
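Roughly, the two approaches look like this (the failing ls is just to have something show up on stderr):

import subprocess

# First way: fold stderr into stdout and deal with a single stream.
proc = subprocess.Popen(['ls', 'nonexistent-dir'],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT)
out, _ = proc.communicate()  # out now holds both streams' content

# Second way: keep the streams separate.
proc = subprocess.Popen(['ls', 'nonexistent-dir'],
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE)
out, err = proc.communicate()  # err holds only stderr's content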
Watching both stdout and stderr¶
However, all of these assume that the command runs for some time, prints out a couple of lines of output and exits, so you can get the output(s) in strings. This is sometimes not the case. If you want to run a network intensive command like an svn checkout, which prints each file as and when downloaded, you need something better.
The initial solution one can think of is this.
proc = Popen(['svn', 'co', 'svn+ssh://myrepo'], stdout=PIPE)
for line in proc.stdout:
    print line
This works, for the most part. But, again, if there is an error, you'll want to read stderr too.
It would be nice to read stdout and stderr simultaneously, just like a shell seems to be doing.
Alas, this remains a not-so-straightforward problem even today, at least on non-Linux systems.
On Linux (and wherever it's supported), you can use the select module to keep an eye on
multiple file-like stream objects, but that isn't available for pipes on Windows. A more
platform-independent solution that I found works well is to use threads and a Queue, shown right after the select sketch below.
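For the curious, the select-based approach looks roughly like this on a POSIX system (a sketch, assuming proc was created with both stdout=PIPE and stderr=PIPE):

import select

reads = [proc.stdout, proc.stderr]
while reads:
    # Block until at least one of the streams has data.
    readable, _, _ = select.select(reads, [], [])
    for stream in readable:
        line = stream.readline()
        if line:
            print line,
        else:
            # EOF: the process closed this stream.
            reads.remove(stream)

And here is the thread-based version that works on Windows too: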
from subprocess import Popen, PIPE
from threading import Thread
from Queue import Queue, Empty

io_q = Queue()

def stream_watcher(identifier, stream):
    # iter(readline, '') avoids the file iterator's read-ahead
    # buffering, so lines show up as soon as they are written.
    for line in iter(stream.readline, b''):
        io_q.put((identifier, line))
    if not stream.closed:
        stream.close()

proc = Popen(['svn', 'co', 'svn+ssh://myrepo'], stdout=PIPE, stderr=PIPE)

Thread(target=stream_watcher, name='stdout-watcher',
       args=('STDOUT', proc.stdout)).start()
Thread(target=stream_watcher, name='stderr-watcher',
       args=('STDERR', proc.stderr)).start()

def printer():
    while True:
        try:
            # Block for 1 second.
            item = io_q.get(True, 1)
        except Empty:
            # No output in either stream for a second. Are we done?
            if proc.poll() is not None:
                break
        else:
            identifier, line = item
            print identifier + ':', line

Thread(target=printer, name='printer').start()
That's a fair bit of code, but it's a typical producer-consumer setup: two threads produce lines of output
(one each from stdout and stderr) and push them onto a queue, while one thread watches the queue
and prints the lines until the process itself finishes.
Passing an environment¶
The env argument to Popen (and others) lets you customize the environment of the command being
run. If it is not set, or is set to None, the current process’s environment is used, just as
documented.
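For example, to run a command with a minimal, explicit environment (using the standard Unix env utility, which prints its environment, so you can see exactly what the child gets):

import subprocess

# The child sees MY_PROP and nothing else.
subprocess.call('/usr/bin/env', env={'MY_PROP': 'value'})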
You might not agree with me, but I feel there are some subtleties with this argument that should have been mentioned in the documentation.
Merge with current environment¶
One is that if you provide a mapping to env, whatever is in this mapping is all that's available
to the command being run. For example, if you don't put TOP_ARG in the env mapping, the
command won't see TOP_ARG in its environment, even if it's set in the parent's. So, I frequently find myself doing this:
p = Popen('command', env=dict(os.environ, my_env_prop='value'))
This makes sense once you realize it, but I wish it were at least hinted at in the documentation.
Unicode¶
Another one, is to do with Unicode (Surprise surprise!). And windows. If you use unicodes in the
env mapping, you get an error saying you can only use strings in the environment mapping. The
worst part about this error is that it only seems to happen on windows and not on Linux. If its an
error to use unicodes in this place, I wish it break on both platforms.
This issue is very painful if you’re like me and use unicode all the time.
from __future__ import unicode_literals
That line is present in all my Python source files. The error message doesn't even bother to mention
that you have unicode strings in your env, so it's very hard to understand what's going wrong.
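The workaround I've used is to coerce everything in the mapping down to byte strings just before handing it to Popen. A sketch (command is a placeholder, and I'm assuming the values are plain ASCII; anything else needs an explicit .encode):

import os
import subprocess

env = dict(os.environ, MY_PROP='value')
# With unicode_literals in effect, 'value' above is a unicode
# object, which breaks subprocess on Windows. Coerce keys and
# values down to str.
env = dict((str(k), str(v)) for k, v in env.items())
subprocess.call('command', env=env)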
Execute in a different working directory¶
This is handled by the cwd argument. You set it to the directory you want as the
working directory of the program you are launching.
The docs do mention that the working directory is changed before the command even starts running,
but also that you can't specify the program's path relative to cwd. In reality, I found that you
can do exactly that.
Either I'm missing something here or the docs really are inaccurate. Anyway, this works:
subprocess.call('./ls', cwd='/bin')
It prints out all the files in /bin. Of course, the following doesn't work when the working directory
is not /bin:
subprocess.call('./ls')
So, if you are giving something explicitly to cwd and are using a relative path for the
executable, this is something to keep in mind.
Killing and dying¶
A simple
proc.terminate()
Or for some dramatic umphh!
proc.kill()
will do the trick to end the process. As noted in the documentation, the former sends a SIGTERM
and the latter a SIGKILL on Unix, but both do some native Windows-y thing on Windows.
Auto-kill on death¶
The processes you start from your Python program stay running even after your program exits. This is
usually what you want, but when you want all your subprocesses killed automatically when your
program exits (with Ctrl+C or the like), you have to use the atexit module.
import atexit

procs = []

@atexit.register
def kill_subprocesses():
    for proc in procs:
        # Guard against processes that have already exited.
        if proc.poll() is None:
            proc.kill()
Then add every Popen object you create to the procs list. Of the solutions I tried, this is the one
that worked best.
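So every launch ends up looking something like this (the command itself is a placeholder):

proc = Popen(['some-long-running-command'])
procs.append(proc)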
Launch commands in a terminal emulator¶
On one occasion, I had to write a script that would launch multiple svn checkouts and then run many ant builds (~20-35) on the checked out projects. In my opinion, the best and easiest way to do this is to fire up multiple terminal emulator windows each running an individual checkout/ant-build. This allows us to monitor each process and even cancel any of them by simply closing the corresponding terminal emulator window.
Linux¶
This is pretty trivial actually. On Linux, you can use xterm for this.
Popen(['xterm', '-e', 'sleep 3s'])
Windows¶
On Windows, it's not as straightforward. The first solution for this would be
Popen(['cmd', '/K', 'command'])
The /K option tells cmd to run the command and keep the command window from closing. You may use /C instead to close the command window after the command finishes.
As simple as it looks, it has some weird behavior. I don't completely understand it, but I'll try to
explain what I have. When you run a Python script containing the above Popen call from a command
window, like this
python main.py
you don't see a new command window pop up. Instead, the sub command runs in the same command window. I have no idea what happens when you run multiple sub commands this way. (I have only limited access to Windows.)
If instead you run it from something like an IDE or IDLE (F5), a new command window opens up. I believe one for each command you run this way. Just the way you'd expect.
But I gave up on cmd.exe for this purpose and learnt to use the mintty utility that
comes with cygwin (1.7+, I think). mintty is awesome. Really. It's been a while since I felt
that way about a command line utility on Windows.
Popen(['mintty', '--hold', 'error', '--exec', 'command'])
This. A new mintty console window opens up running the command, and it closes automatically if
the command exits with zero status (that's what --hold error does). Otherwise, it stays open. Very
useful.
Conclusion¶
The subprocess module is a very useful thing. Spend some time understanding it better. This is my attempt at helping people with it, and it turned out to be way longer than I'd expected. If there are any inaccuracies in this, or if you have anything to add, please leave a comment.