Scripting 2005

Scripting 2005

BiRC / Courses / Scripting / Lecture Notes / Remote Method Invocation

Remote Method Invocation (RMI)

In this lecture we cover remote method invocation (RMI) — a general technique for executing code on objects distributed across a network.

Motivation

In the previous lecture we saw how we can set up socket communication between two processes, and while it requires no great leap of faith to see that this can be used to distribute work loads across a set of computers, we did not actually cover how this could be done.

In this lecture we will see a general framework for executing code on remote hosts; a framework that will let us extend our distributed system as needed, in a simple yet powerful way, while making the remote execution almost transparent to the client code.

Using this framework we will then, in the next lecture, build a simple module for distributing jobs, based on the worker pools model.

Remote Method Invocation

Remote method invocation, or RMI, is a general approach to executing some computations remotely. It is an object-oriented version of remote procedure call, an approach that is slightly simpler, so we will consider that first.

Remote Procedure Call (RPC)

Remote procedure call (RPC) is a communication protocol with a client/server architecture. The idea behind RPC is to call code remotely as if we were just calling a procedure (or function, in Python lingo).

From the client's point of view, calling a service on the server should look just like calling a local function, and the RPC framework is the responsible for serializing parameters to the service, making sure the correct service is run, and fetching the result back to the client.

At the server side, we should be able to provide a service to the system just by registering a function under a public name; clients should then be able to call this function to invoke the service.

A protocol for calling functions could look like this: The client connects to the server, sends the name of the function to call, then the list of parameters and a dictionary of keyword parameters, the server then executes the service and returns the result after which both ends closes the connection. All data is send as pickled objects.

The server code for this protocol could look like this:

HOST = ''     # local host
PORT = 50000
SERVER_ADDRESS = HOST, PORT

# set up server socket
s = socket.socket()
s.bind(SERVER_ADDRESS)
s.listen(1)

conn, addr = s.accept()
connFile = conn.makefile()

name = cPickle.load(connFile)
args = cPickle.load(connFile)
kwargs = cPickle.load(connFile)

res = _exportedMethods[name](*args,**kwargs)

cPickle.dump(res,connFile) ; connFile.flush() 

conn.close()

where we assume that _exportedMethods is a dictionary of methods. On the server side, adding a new method would simply consist of inserting it into this dictionary.

The strange syntax function(*args,**kwargs) is just a way of calling function with a list of arguments, args and a dictionary of keyword arguments, kwargs; without the star and double-star we would be calling function with two arguments, a list and a dictionary, but with the star and double star the list is provided to the position-determined arguments of the function and the dictionary is provided to the function as keyword arguments.

EXERCISE RMI.1: Experiment with the star and double star syntax. Write function that takes variable parameters and call them both using the usual syntax and using the star and double-star syntax.

The client-side of the protocol could look like this:

import socket
import cPickle

HOST = 'serverhost'
PORT = 50000
SERVER_ADDRESS = HOST, PORT

s = socket.socket()
s.connect(SERVER_ADDRESS)
f = s.makefile()

cPickle.dump(name,f)
cPickle.dump(args,f)
cPickle.dump(kwargs,f)
f.flush()
res = cPickle.load(f) 

Here, the remoteness of the call is not exactly transparent to the client, but the point is, of course, that we can hide the protocol away:

class RemoteFunction(object):
    def __init__(self,serverAddress,name):
        self.serverAddress = serverAddress
        self.name = name
    def __call__(self,*args,**kwargs):
        s = socket.socket()
        s.connect(self.serverAddress)
        f = s.makefile()
        cPickle.dump(self.name,f)
        cPickle.dump(args,f)     
        cPickle.dump(kwargs,f)   
        f.flush()
        res = cPickle.load(f)
        s.close()
        return res 

The RemoteFunction class wraps a remote function and translates calls (the __call__ method is used when an instance of this class is invoked as a function) into remote calls.

A client using this wrapper could look like this:

import socket
import cPickle

HOST = 'serverhost'
PORT = 50000
SERVER_ADDRESS = HOST, PORT


class RemoteFunction(object):
    def __init__(self,serverAddress,name):
        self.serverAddress = serverAddress
        self.name = name
    def __call__(self,*args,**kwargs):
        s = socket.socket()
        s.connect(self.serverAddress)
        f = s.makefile()
        cPickle.dump(self.name,f)
        cPickle.dump(args,f)     
        cPickle.dump(kwargs,f)   
        f.flush()
        res = cPickle.load(f)
        s.close()
        return res

f = RemoteFunction(SERVER_ADDRESS,'f')
g = RemoteFunction(SERVER_ADDRESS,'g') 

print f()
print g('bar') 

with this as the corresponding server:

import socket, cPickle

def f():
    print 'f'
    return 'x'

def g(x):
    print 'g(', x, ')'
    return 42

_exportedMethods = {
    'f': f,
    'g': g,
    }


HOST = ''     # local host
PORT = 50000
SERVER_ADDRESS = HOST, PORT

# set up server socket
s = socket.socket()
s.bind(SERVER_ADDRESS)
s.listen(1)

while True:
    conn, addr = s.accept()
    connFile = conn.makefile()

    name = cPickle.load(connFile)
    args = cPickle.load(connFile)
    kwargs = cPickle.load(connFile)

    res = _exportedMethods[name](*args,**kwargs)

    cPickle.dump(res,connFile) ; connFile.flush()

    conn.close() 

When f() and g() are called, at the bottom of the client script, RemoteFunction's __call__() method is invoked, and the call is forwarded to the server that dispatches it according to _exportedMethods and then returns the result.

The RemoteFunction objects thus function as proxies for the real functions located at the server; the client code can tread them just as local functions, but the actual execution will be dispatched to the server.

EXERCISE RMI.2: Extend the protocol to handle exceptions (a different way of returning the results of a remote call). Extend the RemoteFunction class to handle this extended protocol, so it translates normal results to normal returned object and exceptions into raised exceptions.

Hint: One possible solution is to use a "tag" class to mark exceptions, that is, use a special class to denote that a value returned is actually an exception. If a value is returned that is an instance of that class — you can use the isinstance() method to test this — it should be interpreted as an exception, if it is not an instance of this class it is just a value that is the result of the method call.

#...server code...
    try:
        res = _exportedMethods[name](*args,**kwargs)
        cPickle.dump(res,connFile) ; connFile.flush()
    except Exception, e:
        cPickle.dump(_RMIException(e),connFile) ; connFile.flush()

#...client code...
        res = cPickle.load(f)
        if isinstance(res,_RMIException):
            raise res.wrappedException
        else:
            return res 

A consequence of this solution is, of course, that no instances of this class, or any subclass, can be returned as anything but exceptions.

Remote Method Invocation

The only real difference between RPC and RMI is that there is objects involved in RMI: instead of invoking functions through a proxy function, we invoke methods through a proxy.

What this means in practice is that we now want the client to hold references to remote objects that it can invoke methods on. These references should behave just like local objects, but when invoked dispatch the method invocation to the remote object.

Because we need to refer to both an object and a method now — not just a function — we extend the protocol so it first sends an object id across the socket, then the method and then the arguments. The server then dispatches the method to the remote object based on id and method name.

The dispatch could look like this:

ID   = cPickle.load(connFile)
name = cPickle.load(connFile)
args = cPickle.load(connFile)
kwargs = cPickle.load(connFile)

obj = _exportedObjects[ID]
res = getattr(obj,name)(*args,**kwargs)

cPickle.dump(res,connFile) ; connFile.flush() 

where the function getattr() is used to look up (and bind) the method of the object, based on the name of the method, and _exportedObjects is assumed to be a dictionary mapping ids to objects.

A complete server could then look like this:

import socket, cPickle

class A(object):
    def __init__(self):
        self.count = 0

    def f(self):
        print 'A.f()'
        print self.count
        self.count += 1
        return 'x'

    def g(self, x):
        print 'A.g(', x, ')'
        print self.count
        self.count += 1
        return 42

class B(A):
    def __init__(self):
        A.__init__(self)
        
    def f(self):
        print 'B.f()'
        print self.count
        self.count += 1
        return 'y'

class C(A):
    def __init__(self):
        A.__init__(self)
        
    def g(self,x):
        print 'C.g(', x, ')'
        print self.count
        self.count += 1
        return 24

_exportedObjects = {
    1: A(), 2: B(), 3: C(),
    }


HOST = ''     # local host
PORT = 50000
SERVER_ADDRESS = HOST, PORT

# set up server socket
s = socket.socket()
s.bind(SERVER_ADDRESS)
s.listen(1)

while True:
    conn, addr = s.accept()
    connFile = conn.makefile()

    ID   = cPickle.load(connFile)
    name = cPickle.load(connFile)
    args = cPickle.load(connFile)
    kwargs = cPickle.load(connFile)

    obj = _exportedObjects[ID]
    res = getattr(obj,name)(*args,**kwargs)

    cPickle.dump(res,connFile) ; connFile.flush()

    conn.close() 

Here we add three different objects to the exported objects dictionary and then start dispatching methods to them.

On the client side we can write a proxy class for wrapping remote objects. When an attribute is accessed on the proxy it should dispatch it to the server. We can catch the access of attributes using the special method __getattribute__() and return a proxy method — similar to RemoteFunction earlier — like this:

class RemoteMethod(object):
    '''A callable object that calls a remote method over a socket file.'''
    def __init__(self,name,serverAddress,remoteObjectId):
        self.name = name
        self.serverAddress  = serverAddress
        self.remoteObjectId = remoteObjectId

    def __call__(self,*args,**kwargs):
        s = socket.socket()
        s.connect(self.serverAddress)
        f = s.makefile()
        cPickle.dump(self.remoteObjectId,f)
        cPickle.dump(self.name,f)
        cPickle.dump(args, f)    
        cPickle.dump(kwargs, f)  
        f.flush()
        res = cPickle.load(f)
        s.close()
        return res

class ClientProxy(object):
    '''Class functioning as the client-side proxy for a server-side object.'''

    def __init__(self,serverAddress,remoteObjectId):
        self._serverAddress  = serverAddress
        self._remoteObjectId = remoteObjectId
          
    def __getattribute__(self,attr):
        if attr[0] != '_':
            # handle all non-special and non-private cases by proxy
            return RemoteMethod(attr,self._serverAddress,self._remoteObjectId)
        else:
            # if it is a special or private, just use the
            # usual attribute access
            return object.__getattribute__(self, attr)  

Aside from using __getattribute__() and sending the object id in the new protocol, there is very little difference from the RPC solution above.

When a method is invoked on a proxy object, the result of the method lookup is a RemoteMethod that, when called, dispatches to the server just as the RemoteFunction from earlier.

This code can be used, together with the server above, as this:

HOST = 'serverhost'
PORT = 50000
SERVER_ADDRESS = HOST, PORT

a = ClientProxy(SERVER_ADDRESS,1)
b = ClientProxy(SERVER_ADDRESS,2)
c = ClientProxy(SERVER_ADDRESS,3)

print 'A:', a.f(), a.g('bar')
print 'B:', b.f(), b.g('bar')
print 'C:', c.f(), c.g('bar') 

Extending the RMI System

The code above defines a basic RMI system, but there is room for some improvements. First:

EXERCISE RMI.3: Refactor the RMI code into a module, rmi, containing the proxy code and server object table plus a function for dispatching calls at the server, and a function wrapping the server event loop (the loop where it waits for calls on a socket and then dispatches them).

For the server side, what is a good interface for the functions? For the client side, can you think of any functionality to add to the module?

With the current design, we can get a proxy object if we know the address of a server and an exported object id on that server. This suffices if we only wish to access a few well known services, but not if we wish to allow remote objects to be dynamically added to the system and accessed by clients.

To get around the problem of the client having to know the id of a client, we could add a service that lets clients look up objects, based, for instance, on the class of a service object.

Calling

remoteObject = rmi.lookup('ServiceClass')

could, for instance, result in remoteObject being assigned a proxy for an instance of class ServiceClass, if such an instance is exported, or None otherwise.

But where does the lookup() function find this instance?

One possibility is to have a service on the distributed system providing this lookup — with this service at a known address and known id, to avoid bootstrapping problems — and let the lookup() function dispatch to it.

EXERCISE RMI.4: Write a service for looking up remote objects based on their class. On the server side, you can use something along these lines:

def _getClass(moduleName,className):
    module = __import__(moduleName,{},{},[])
    return getattr(module,className) 

for getting a class based on its name, and something like this

def _lookup(className):

    import exceptions
    try:
         # We are expecting a module not nested in a packet
         # and not in the global module for this to work
        moduleName, className = className.split('.')
        clazz = _getClass(moduleName,className)

        for id,obj in _exportedObjects.items():
            if isinstance(obj,clazz):
                return id
        return None

    except ValueError:
        print className, 'of the wrong form'
        raise
    except exceptions.ImportError:
        print 'module', moduleName, 'was not found'
        raise    
    except exceptions.AttributeError:
        print className, 'was not found in', moduleName
        raise

for looking up an instance in the exported objects. (The exceptions thrown here should probably be propagated back to the client using your solution to exercise RMI.2).

If you register the service at a fixed address — or perhaps just a fixed port and object id — the client should always be able to find it, and you can write a lookup() function for the rmi module that uses this service.

EXERCISE RMI.5: Extend the lookup service with functionality for registering exported objects, i.e. a way for the server to add an object to the table of exported objects.

EXERCISE RMI.6: Extend the lookup service with functionality for removing a registered object.

With RMI.5 and RMI.6, it is possible for the server to add new remote objects, but what about clients? If we have a number of hosts, each connected to this RMI system and accessing the lookup service, it seems a shame that only one host, the one containing the lookup service, can actually export services.

Of course, we could just add a lookup service to each host, but then the clients would need to search through several hosts when looking for an instance of a given class. It is much better to have a single lookup service — but preferably one that can find objects on several different hosts!

EXERCISE RMI.7*: Extend the service to allow clients to export objects, on their own hosts, through the lookup service.

This will require RMI server functionality on all hosts, since all hosts now need to be able to respond to remote calls. It might also require an extension of the lookup service, since it is no longer sufficient for the lookup function to return an object id, it also needs to return the server address of the object.

You might want to have a look at the threading module for solving this exercise — or have a look at an upcoming lecture.

When a remote method returns a value, that value is pickled and transmitted back to the caller. This is, of course, in most cases what we want, but what if we want the method to return a reference to a remote object?

When we want to return a reference to a remote object — that is, we want the caller to receive a reference to a remote object rather than a pickled object back — we need to explicitly return the object id, and the client then needs to explicitly translate that into a proxy object to use for accessing the remote object.

Surely we can do something smarter than that!

There are two aspects of this problem: on the receiver (client) side we need to recognize when we are receiving an object reference that should be wrapped in a proxy object, and on the sender (server) side we need a way of specifying when an object should be sent as a reference and when an object should just be pickled.

EXERCISE RMI.8: Extend the protocol so the client can recognize objects to wrap in a proxy. Then make sure, in the client proxy call, to automatically wrap such objects when they are received.

A solution similar to that of RMI.2 is probably the simplest solution.

At the server end we need to tag, somehow, the objects that are to be returned as references. Preferably in a way that makes this handling of two kinds of objects as transparent as possible; we don't want to explicitly determine the kind of return-handling in each method.

One possible solution is to use a tag class — as in exercise RMI.2 — to mark the objects that should be send as references; objects of this class, or its subclasses, are not pickled but returned as references, any objects not of this class are just serialized.

Thus, in the code below,

class RemoteObject(object):
    pass

class AsValue(object):
    def hello(self):
        print 'here I am'

class AsReference(RemoteObject):
    def hello(self):
        print 'here I am'


class Test(object):
    def getValue(self):
        return AsValue()
    def getReference(self):
        return AsReference() 

a (remote) call to Test.getValue() will return a pickled object while a call to Test.getReference() will return a reference to a remote object.

If the client looks like this:

val = test.getValue()
ref = test.getReference()

val.hello()
ref.hello()

the call to val.hello() will print the output at the client, while the call to ref.hello() will print the output at the server.

EXERCISE RMI.9: Implement this reference tag and protocol.

EXERCISE RMI.10*: If you did exercise RMI.7, consider this: extend the protocol to allow the parameters of a method call to be remote objects as well. This means that not only a return value of the tag class should be send as references rather than values, but also parameters.

How should a ClientProxy object be treated?

Summary

We have built a framework for communication between distributed objects, much superior to the simple socket communication we used in the last lecture.

With this RMI system, we can use services implemented on other hosts — running on a distributed system — almost as we would use objects located on the same host.

In the next lecture we will then, finally, build a system for distributing computation tasks on a network.