New plugins

Pydap can understand data formats through a simple plugin architecture that makes adding support for new formats as easy as possible (but not necessarily easy). Here I walk through building a plugin to serve simple binary data files created with numpy's .tofile method, like this:

>>> from numpy import *
>>> data = arange(10, dtype=float64)
>>> data.tofile('test.dat')

The plugin will serve data stored in files like test.dat as datasets containing a simple DAP array. We assume that our files have the extension .dat, and that each one holds only a single variable. Note that .tofile writes the raw values with no header, so nothing in the file records its dtype and the plugin has to know it in advance. We write float64 values, which is also the default dtype of numpy.fromfile.
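Because the file carries no type information, reading it with the wrong dtype silently produces meaningless numbers rather than an error. The sketch below illustrates this with Python's standard array module instead of numpy (a substitution made only to keep the example dependency-free; array.tofile likewise writes raw machine values with no header). It writes ten 64-bit integers, which is what a plain arange(10) produces on typical 64-bit Linux systems, and reads the same bytes back both as integers and as float64.

```python
import os
import tempfile
from array import array

# Write ten 64-bit signed integers as raw bytes with no header,
# much like ndarray.tofile does for an int64 array.
path = os.path.join(tempfile.mkdtemp(), "test.dat")
with open(path, "wb") as f:
    array("q", range(10)).tofile(f)


def read_as(typecode, count):
    """Read `count` raw values of the given array typecode from `path`."""
    values = array(typecode)
    with open(path, "rb") as f:
        values.fromfile(f, count)
    return values.tolist()


as_int = read_as("q", 10)    # same type as the writer: the original values
as_float = read_as("d", 10)  # wrong type: same bytes, meaningless values

print(os.path.getsize(path))  # 80: ten 8-byte values and nothing else
print(as_int)
print(as_float[:3])
```

The file size shows there is no room for metadata, which is why the dtype has to be agreed on by whoever writes the file and the plugin that reads it.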

Templates

Pydap 2.2.3 introduced templates for creating plugins. These templates give you a starting point for writing a new plugin, with some skeleton code that should be extended according to your plugin's needs. The example in this tutorial is based on the plugin layout introduced in 2.2.4, so you should update to the latest version if you want to follow along.

To create a new plugin, simply type the following command:

$ paster create -t dap_plugin binary

After running the command you'll have the following directory structure (don't worry if the egg-info part is a little different):

binary
binary/dap
binary/dap/__init__.py
binary/dap/__init__.pyc
binary/dap/plugins
binary/dap/plugins/__init__.py
binary/dap/plugins/__init__.pyc
binary/dap/plugins/binary
binary/dap/plugins/binary/__init__.py
binary/dap.plugins.binary.egg-info
binary/dap.plugins.binary.egg-info/PKG-INFO
binary/dap.plugins.binary.egg-info/SOURCES.txt
binary/dap.plugins.binary.egg-info/dependency_links.txt
binary/dap.plugins.binary.egg-info/entry_points.txt
binary/dap.plugins.binary.egg-info/namespace_packages.txt
binary/dap.plugins.binary.egg-info/paster_plugins.txt
binary/dap.plugins.binary.egg-info/requires.txt
binary/dap.plugins.binary.egg-info/top_level.txt
binary/dap.plugins.binary.egg-info/zip-safe
binary/setup.cfg
binary/setup.py

These files make up our plugin. Fortunately we only need to edit a single one, binary/dap/plugins/binary/__init__.py, to write the plugin. This file is responsible for reading the data and building the response.

Open the file in your favorite text editor. You'll notice that it already contains some code: besides a few imports, it has a module-level variable called extensions and a class called Handler. Let's look at what each of them does.

Which files will the plugin handle?

The extensions variable is simply a case-sensitive regular expression matching the files our plugin will support; if we want it to handle all files ending with .dat we should change this to:

extensions = r"""^.*\.dat$"""

The regular expression can be as complicated as you want and doesn't need to match the file extension. If you want to handle all files in a directory called my-data, for example:

extensions = r"""^.*/my-data/"""
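Before relying on a pattern, it can help to try it against a few sample paths with Python's re module. This sketch assumes the server tests the pattern against the full path of each file; since both patterns start with ^, re.match and re.search give the same answer. The helper name handles and the sample paths are made up for illustration.

```python
import re

# The two patterns from the text: files ending in .dat, and anything under my-data/.
DAT_FILES = r"""^.*\.dat$"""
MY_DATA_DIR = r"""^.*/my-data/"""


def handles(pattern, filepath):
    """Return True if `pattern` matches `filepath` (matching is case-sensitive)."""
    return re.match(pattern, filepath) is not None


for path in ["/path/to/test.dat", "/path/to/test.DAT",
             "/path/to/test.dat.bak", "/srv/my-data/readings.bin"]:
    print(path, handles(DAT_FILES, path), handles(MY_DATA_DIR, path))
```

Case-sensitivity is why test.DAT is rejected; in Python's re, putting (?i) at the start of a pattern makes it case-insensitive, if that is what you want.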

Processing the request

So far, so good. Now we need to look at the Handler class. The class will be called by the server when a file handled by our plugin is accessed. Suppose a client accesses the following URL:

http://server/test.dat.dods?a[0:1:4]

This URL requests the first 5 values of the variable a (DAP hyperslabs are written [start:stride:stop], and the stop index is inclusive). When the URL is accessed, the server sees that our plugin is responsible for .dat files and handles the request through the Handler class from our plugin, calling its methods in order.
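A rough sketch of that first step: splitting the request URL into the path of the requested file, the response type, and the constraint expression. The function split_dap_url is hypothetical, not part of pydap, and a real server also maps the URL path onto a directory on disk, which is omitted here.

```python
import os
from urllib.parse import unquote, urlsplit


def split_dap_url(url):
    """Hypothetical helper: split a DAP URL into (path, response, constraint).

    The last extension (.dods, .das, .dds, ...) names the response type;
    everything before it is the path of the requested file. Clients may
    percent-encode the brackets, so the query string is unquoted.
    """
    parts = urlsplit(url)
    path, response = os.path.splitext(parts.path)
    return path, response.lstrip("."), unquote(parts.query)


path, response, ce = split_dap_url("http://server/test.dat.dods?a[0:1:4]")
print(path, response, ce)  # /test.dat dods a[0:1:4]
```

Roughly speaking, the file path is what the extensions pattern is tested against, and the constraint expression is the string that later reaches the handler.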

Initializing

The Handler class has 3 methods defined: __init__, _parseconstraints and close. The first one is called by the server with the following signature when a file is first accessed by a client:

__init__(self, filepath, environ)

Here, filepath is a string holding the complete path of the requested file (something like /path/to/test.dat). We can use it to read the data from the binary file with numpy.fromfile. We'll change the method to store the data as an attribute of the handler:

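def __init__(self, filepath, environ):
    # Keep the bare file name; it is used later as the dataset name.
    self.filename = os.path.basename(filepath)
    # .tofile writes raw values with no header, so the dtype must be
    # stated here and must match the one used to write the file.
    self.data = numpy.fromfile(filepath, dtype=numpy.float64)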

The environ variable holds the WSGI environment, which we won't need here. It could be used to debug the plugin, for example with code like this:

import dap.lib
if dap.lib.VERBOSE:
    message = 'Reading data from %s' % filepath
    environ['wsgi.errors'].write('%s\n' % message)
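The same idea as a self-contained sketch: a hypothetical log helper that writes to the environ's wsgi.errors stream only when a flag is set. A plain verbose argument stands in for dap.lib.VERBOSE, and io.StringIO stands in for the error stream a real WSGI server would provide.

```python
import io


def log(environ, message, verbose=True):
    """Hypothetical helper: write `message` to the WSGI error stream if `verbose`.

    PEP 3333 requires every WSGI environ to carry a 'wsgi.errors'
    file-like object, which servers typically route to their error log.
    """
    if verbose:
        environ["wsgi.errors"].write("%s\n" % message)


# A fake environ for trying the helper outside a server.
environ = {"wsgi.errors": io.StringIO()}
log(environ, "opening /path/to/test.dat")
log(environ, "this line is suppressed", verbose=False)
print(environ["wsgi.errors"].getvalue())
```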

Parsing the request

The second method invoked by the server is _parseconstraints. This method is responsible for parsing the request and returning the corresponding dataset. It has a simple call signature:

_parseconstraints(self, constraints=None)

Here, constraints is a string holding the DAP constraint expression. If this string is empty, the method should return the complete, unrestricted dataset. In this example, the string is a[0:1:4], and the method should return a dataset with a single variable a with only the first 5 values.

Fortunately, pydap has a few functions that make this processing easy. The first thing we should do is split the constraint expression (CE) into projections and selections:

from dap.helper import parse_querystring
fields, queries = parse_querystring(constraints)

Now we have a dictionary fields with the requested variables as keys, and the associated hyperslabs (if any) as values; in our example:

>>> from dap.helper import parse_querystring
>>> f, q = parse_querystring('a[0:1:4]')
>>> print(f)
{'a': (slice(0, 5, 1),)}

In this case, fields contains the variable a, mapped to a tuple with one slice object per dimension; here the single slice selects the first 5 elements. We will use it later to return the binary data requested by the user.
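To make the mapping from hyperslab to slice concrete, here is a simplified, hypothetical parser for projections such as a[0:1:4]. It is not pydap's parse_querystring, which also deals with selections and more complex names; it only shows why a[0:1:4] becomes slice(0, 5, 1): a DAP hyperslab is written [start:stride:stop] (or [start:stop], or [index]) with an inclusive stop, while a Python slice excludes its stop.

```python
import re

# One [start:stride:stop], [start:stop] or [index] group per dimension.
HYPERSLAB = re.compile(r"\[(\d+)(?::(\d+))?(?::(\d+))?\]")


def hyperslab_to_slice(start, second, third):
    """Convert the numbers of one DAP hyperslab into a Python slice."""
    start = int(start)
    if third is not None:      # [start:stride:stop]
        stride, stop = int(second), int(third)
    elif second is not None:   # [start:stop]
        stride, stop = 1, int(second)
    else:                      # [index]
        stride, stop = 1, start
    return slice(start, stop + 1, stride)  # DAP's stop is inclusive


def parse_projection(projection):
    """Parse e.g. 'a[0:1:4],b' into {'a': (slice(0, 5, 1),), 'b': ()}."""
    fields = {}
    for var in filter(None, projection.split(",")):
        name = var.split("[", 1)[0]
        fields[name] = tuple(hyperslab_to_slice(*m.groups())
                             for m in HYPERSLAB.finditer(var))
    return fields


print(parse_projection("a[0:1:4]"))  # {'a': (slice(0, 5, 1),)}
```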

The variable queries is a list of selections used for filtering sequences, so this plugin won't use it. If you need selections, take a look at the csv plugin for an example of how to use them.

Building the dataset

To return a dataset, we first instantiate the DatasetType object that our method will return. This code is already in the generated __init__.py file:

dataset = dtypes.DatasetType(name=self.filename)

Now we must look at fields and populate the dataset with the requested variables. If fields is empty, we should return the whole dataset, which consists of a single array holding the data stored in the file we read. Since our files have no metadata, we can give this array any name we like; we'll call it a. Here's our code:

if 'a' in fields or not fields:
    slice_ = fields.get('a', slice(None))
    data = self.data[slice_]
    dataset['a'] = dtypes.ArrayType(name='a',
                                    data=data,
                                    type=data.dtype.char,
                                    shape=data.shape)

Here we check whether the array a was requested, or whether no specific variables were requested at all. In either case we retrieve the corresponding hyperslab (defaulting to slice(None), which selects everything) and apply it to our data. We then add an ArrayType object to the dataset with the appropriate data, shape and type. All that remains is to return the dataset:

return dataset
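The selection logic can be tried in isolation with plain Python objects. In this hypothetical sketch a list stands in for the numpy array and a dict stands in for DatasetType; because a list cannot be indexed with a tuple the way a numpy array can, the single slice for our one-dimensional data is unpacked explicitly.

```python
def select_variables(data, fields):
    """Hypothetical stand-in for the plugin's selection logic.

    `data` is a flat list standing in for the numpy array, and `fields`
    has the shape shown above: {name: (slice, ...)}. Returns
    {name: selected values}, standing in for the DatasetType.
    """
    dataset = {}
    if "a" in fields or not fields:
        # No entry, None or an empty tuple all mean "no hyperslab": take everything.
        slices = fields.get("a") or (slice(None),)
        dataset["a"] = data[slices[0]]  # one-dimensional data: one slice
    return dataset


data = list(range(10))
print(select_variables(data, {"a": (slice(0, 5, 1),)}))  # {'a': [0, 1, 2, 3, 4]}
print(select_variables(data, {"b": ()}))                 # {}
```

The last case is worth noting: with the plugin code above, a request for a variable the file does not contain yields an empty dataset rather than an error.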

The Handler class has a final method, called close. As the name suggests, it is called after the request and can be used to close open files or database connections. Since numpy.fromfile reads the whole file into memory when the handler is created and leaves no file open, we can simply omit close in this example. The final version of our plugin looks like this:

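import os.path

import numpy

from dap import dtypes
from dap.server import BaseHandler
from dap.helper import parse_querystring

# Serve every file whose name ends in .dat (case-sensitive).
extensions = r"""^.*\.dat$"""

class Handler(BaseHandler):
    def __init__(self, filepath, environ):
        # The bare file name doubles as the dataset name.
        self.filename = os.path.basename(filepath)
        # Raw .tofile output carries no dtype; it must match the writer.
        self.data = numpy.fromfile(filepath, dtype=numpy.float64)

    def _parseconstraints(self, constraints=None):
        dataset = dtypes.DatasetType(name=self.filename)
        # fields maps names to hyperslabs; queries (selections) are unused here.
        fields, queries = parse_querystring(constraints)

        # Add "a" if it was requested, or if no variables were requested.
        if 'a' in fields or not fields:
            slice_ = fields.get('a', slice(None))
            data = self.data[slice_]
            dataset['a'] = dtypes.ArrayType(name='a',
                                            data=data,
                                            type=data.dtype.char,
                                            shape=data.shape)

        return dataset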
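The final plugin has no close method because nothing stays open after __init__. If a plugin did keep a resource open, for instance a file handle used to read values on demand, close is where it would be released. The class below is a hypothetical sketch of that pattern: its name and read method are made up, it is deliberately not a pydap BaseHandler subclass so that it runs on its own, and the standard array module stands in for numpy.

```python
import os
import tempfile
from array import array


class LazyBinaryReader:
    """Hypothetical handler-like class that keeps its data file open.

    Values are read on demand instead of all at once, so the file
    handle must stay open until close() is called after the request.
    """

    def __init__(self, filepath, typecode="d"):
        self.typecode = typecode
        self.itemsize = array(typecode).itemsize
        self.file = open(filepath, "rb")

    def read(self, start, stop):
        """Read values start..stop-1 without loading the whole file."""
        self.file.seek(start * self.itemsize)
        values = array(self.typecode)
        values.fromfile(self.file, stop - start)
        return values.tolist()

    def close(self):
        # Release the file handle once the response has been produced.
        self.file.close()


# Write ten float64 values, then read two ranges from the open file.
path = os.path.join(tempfile.mkdtemp(), "test.dat")
with open(path, "wb") as f:
    array("d", range(10)).tofile(f)

reader = LazyBinaryReader(path)
first_five = reader.read(0, 5)  # [0.0, 1.0, 2.0, 3.0, 4.0]
tail = reader.read(7, 10)       # [7.0, 8.0, 9.0]
reader.close()
```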

Testing, installing and distributing

To test your plugin, simply run:

$ python setup.py develop

This will make your plugin available to your DAP server without needing to install it. Make sure to restart any running servers so they can find your new plugin and so that changes in your code are applied. After you're done with development, you can install your module with the command:

$ python setup.py install

You can make your plugin available to other people by registering it on the Python Package Index (PyPI, formerly nicknamed the Cheese Shop):

$ python setup.py register
$ python setup.py sdist bdist_egg upload

The last line will upload a copy of your plugin so it can be installed with EasyInstall. If you write a new plugin, please send me an email so I can add it to the list.