py.execnet: elastic distributed programming

execnet helps you to:

One of it's unique features is that it uses a zero-install technique: no manual installation steps are required on remote places, only a basic working Python interpreter and some input/output connection to it.

There is a EuroPython2009 talk from July 2009 with examples and some pictures.

Gateways: immediately spawn local or remote process

In order to send code to a remote place or a subprocess you need to instantiate a so-called Gateway object. There are currently three Gateway classes:

  • py.execnet.PopenGateway to open a subprocess on the local machine. Useful for making use of multiple processors to to contain code execution in a separated environment.
  • py.execnet.SshGateway to connect to a remote ssh server and distribute execution to it.
  • py.execnet.SocketGateway a way to connect to a remote Socket based server. Note that this method requires a manually started :source:py/execnet/script/socketserver.py script. You can run this "server script" without having the py lib installed on the remote system and you can setup it up as permanent service.

remote_exec: execute source code remotely

All gateways offer remote code execution via this high level function:

def remote_exec(source):
    """return channel object for communicating with the asynchronously
    executing 'source' code which will have a corresponding 'channel'
    object in its executing namespace."""

With remote_exec you send source code to the other side and get both a local and a remote Channel object, which you can use to have the local and remote site communicate data in a structured way. Here is an example for reading the PID:

>>> import py
>>> gw = py.execnet.PopenGateway()
>>> channel = gw.remote_exec("""
...     import os
...     channel.send(os.getpid())
... """)
>>> remote_pid = channel.receive()
>>> remote_pid != py.std.os.getpid()
True

Channels: bidirectionally exchange data between hosts

A channel object allows to send and receive data between two asynchronously running programs. When calling remote_exec you will get a channel object back and the code fragment running on the other side will see a channel object in its global namespace.

Here is the interface of channel objects:

#
# API for sending and receiving anonymous values
#
channel.send(item):
    sends the given item to the other side of the channel,
    possibly blocking if the sender queue is full.
    Note that items need to be marshallable (all basic
    python types are).

channel.receive():
    receives an item that was sent from the other side,
    possibly blocking if there is none.
    Note that exceptions from the other side will be
    reraised as gateway.RemoteError exceptions containing
    a textual representation of the remote traceback.

channel.waitclose(timeout=None):
    wait until this channel is closed.  Note that a closed
    channel may still hold items that will be received or
    send. Note that exceptions from the other side will be
    reraised as gateway.RemoteError exceptions containing
    a textual representation of the remote traceback.

channel.close():
    close this channel on both the local and the remote side.
    A remote side blocking on receive() on this channel
    will get woken up and see an EOFError exception.

XSpec: string specification for gateway type and configuration

py.execnet supports a simple extensible format for specifying and configuring Gateways for remote execution. You can use a string specification to instantiate a new gateway, for example a new SshGateway:

gateway = py.execnet.makegateway("ssh=myhost")

Let's look at some examples for valid specifications. Specification for an ssh connection to wyvern, running on python2.4 in the (newly created) 'mycache' subdirectory:

ssh=wyvern//python=python2.4//chdir=mycache

Specification of a python2.5 subprocess; with a low CPU priority ("nice" level). Current dir will be the current dir of the instantiator (that's true for all 'popen' specifications unless they specify 'chdir'):

popen//python=2.5//nice=20

Specification of a Python Socket server process that listens on 192.168.1.4:8888; current dir will be the 'pyexecnet-cache' sub directory which is used a default for all remote processes:

socket=192.168.1.4:8888

More generally, a specification string has this general format:

key1=value1//key2=value2//key3=value3

If you omit a value, a boolean true value is assumed. Currently the following key/values are supported:

  • popen for a PopenGateway
  • ssh=host for a SshGateway
  • socket=address:port for a SocketGateway
  • python=executable for specifying Python executables
  • chdir=path change remote working dir to given relative or absolute path
  • nice=value decrease remote nice level if platforms supports it

Examples of py.execnet usage

Compare cwd() of Popen Gateways

A PopenGateway has the same working directory as the instantiatior:

>>> import py, os
>>> gw = py.execnet.PopenGateway()
>>> ch = gw.remote_exec("import os; channel.send(os.getcwd())")
>>> res = ch.receive()
>>> assert res == os.getcwd()
>>> gw.exit()

Synchronously receive results from two sub processes

Use MultiChannels for receiving multiple results from remote code:

>>> import py
>>> ch1 = py.execnet.PopenGateway().remote_exec("channel.send(1)")
>>> ch2 = py.execnet.PopenGateway().remote_exec("channel.send(2)")
>>> mch = py.execnet.MultiChannel([ch1, ch2])
>>> l = mch.receive_each()
>>> assert len(l) == 2
>>> assert 1 in l
>>> assert 2 in l

Asynchronously receive results from two sub processes

Use MultiChannel.make_receive_queue() for asynchronously receiving multiple results from remote code. This standard Queue provides (channel, result) tuples which allows to determine where a result comes from:

>>> import py
>>> ch1 = py.execnet.PopenGateway().remote_exec("channel.send(1)")
>>> ch2 = py.execnet.PopenGateway().remote_exec("channel.send(2)")
>>> mch = py.execnet.MultiChannel([ch1, ch2])
>>> queue = mch.make_receive_queue()
>>> chan1, res1 = queue.get()  # you may also specify a timeout
>>> chan2, res2 = queue.get()
>>> res1 + res2
3
>>> assert chan1 in (ch1, ch2)
>>> assert chan2 in (ch1, ch2)
>>> assert chan1 != chan2

Receive file contents from remote SSH account

Here is a small program that you can use to retrieve contents of remote files:

import py
# open a gateway to a fresh child process
gw = py.execnet.SshGateway('codespeak.net')
channel = gw.remote_exec("""
        for fn in channel:
            f = open(fn, 'rb')
            channel.send(f.read())
            f.close()
""")

for fn in somefilelist:
    channel.send(fn)
    content = channel.receive()
    # process content

# later you can exit / close down the gateway
gw.exit()

Instantiate a socket server in a new subprocess

The following example opens a PopenGateway, i.e. a python child process, and starts a socket server within that process and then opens a second gateway to the freshly started socketserver:

import py

popengw = py.execnet.PopenGateway()
socketgw = py.execnet.SocketGateway.new_remote(popengw, ("127.0.0.1", 0))

print socketgw._rinfo() # print some info about the remote environment

Sending a module / checking if run through remote_exec

You can pass a module object to remote_exec in which case its source code will be sent. No dependencies will be transferred so the module must be self-contained or only use modules that are installed on the "other" side. Module code can detect if it is running in a remote_exec situation by checking for the special __name__ attribute like this:

if __name__ == '__channelexec__':
    # ... call module functions ...