Running a (complete) DAP server

This is a step–by–step HOWTO explaining how to install, configure and deploy a complete DAP server using pydap. The HOWTO covers everything you need to know to have a lightweight data server with everything pydap offers.

Table of contents

  1. Requirements
    1. Installing Python
    2. Installing numpy
    3. Installing EasyInstall
  2. Installing pydap
  3. Plugins for additional formats
    1. Matlab files
    2. netCDF files
    3. GrADS/GRIB files
    4. HDF5 files
    5. Geospatial data supported by GDAL
    6. Serving data from databases
    7. Serving Dapper–compliant data
  4. Different responses
    1. Downloading data without a client
    2. Serving data using JSON
    3. Visualizing data in Google Earth with a Web Map Server
  5. Generating a THREDDS catalog
  6. Deploying an AJAX interface
  7. Running a production server

1. Requirements

Pydap is a library written in Python that relies heavily on free software written by other people; this means it has a good number of dependencies, aside from the Python interpreter itself. Luckily, a considerable number of these dependencies can be installed automatically for us. Here I cover the dependencies that have to be installed manually.

1.1 Installing Python

Python is a free and open source programming language created by Guido van Rossum and maintained by the Python Software Foundation. I assume you already know enough about Python, or at least that you know you should read the documentation to learn more. A nice introduction is the free book How to think like a computer scientist: learning with Python.

Pydap requires a recent version of Python to run. It should work with Python 2.3 or later, but I recommend using Python 2.4 since 2.5 is still in its infancy. Recent Unices (including Linux and Mac OS X) come with Python installed. Python is also available for Windows and many other operating systems.

Even though you might have Python already installed on your machine, you might want to make a separate install just for pydap. Perhaps you want to use a different version than the one installed; perhaps you don't want to break your installation when upgrading your system. Whatever the reason, it's a good idea to install Python separately.

(If you want to use your pre–installed Python, just skip this section.)

Installing Python on Unix is as simple as compiling any other program. Here's an example of installing Python 2.4.4 to my home directory:

$ wget http://www.python.org/ftp/python/2.4.4/Python-2.4.4.tgz
$ tar zxvf Python-2.4.4.tgz
$ cd Python-2.4.4
$ ./configure --prefix=$HOME/Python-2.4.4
$ make
$ make install

You will need some development tools (gcc, make, autoconf, etc.) to compile Python. After installation, we need to add the Python interpreter to our PATH. This is done by editing the .bashrc file in your home directory and adding the line:

export PATH=${HOME}/Python-2.4.4/bin:$PATH

This is for the bash shell; for other shells the syntax will be slightly different. Check your shell documentation if you use a shell other than bash.

Try invoking the command python after setting the environment variable to see if the proper executable is being called. Check the version and date of compilation in the banner, which should look similar to this (the version number should match the one you installed):

Python 2.4.3 (#1, Jul 21 2006, 18:02:52) 
[GCC 4.0.0 20041026 (Apple Computer, Inc. build 4061)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
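
The same check can be made from inside the interpreter. This is only a minimal sketch using the standard library; the helper name meets_minimum is made up for this example, and the (2, 3) floor is taken from the version requirement stated above.

```python
import sys

def meets_minimum(minimum=(2, 3)):
    """Return True if the running interpreter is at least `minimum`."""
    return tuple(sys.version_info[:2]) >= tuple(minimum)

# sys.executable is the full path of the interpreter that is running,
# which shows whether PATH picked up the separate install.
print(sys.executable)
print("%d.%d.%d" % tuple(sys.version_info[:3]))
print(meets_minimum())
```

If the path printed first does not point inside the directory you gave to --prefix, the PATH change has not taken effect in the current shell.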

For Windows users I recommend installing a pre–compiled Python. ActivePython has a nice pre–packed Python, although it's not open source. Once your Python installation is ready, we should proceed to install numpy.

1.2 Installing numpy

Numpy is a numerical library for Python, said to be the fundamental package needed for scientific computing with Python. Pydap depends on the powerful N-dimensional array object that comes with numpy for representing the DAP Arrays.

Although numpy can in theory be installed automatically when pydap is installed, it's better to install it manually. Installing Python modules is trivial and always follows the same routine (you may want to use another Sourceforge mirror):

$ wget http://ufpr.dl.sourceforge.net/sourceforge/numpy/numpy-1.0.tar.gz
$ tar zxvf numpy-1.0.tar.gz
$ cd numpy-1.0
$ python setup.py install

If you made a separate install of Python, make sure that you're calling the right executable on the last line. This should install numpy on your system without any complications. If you're using Windows I recommend downloading the binaries from the website.
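
To confirm that numpy ended up in the interpreter you intend to use for pydap, something like the following can be run with that interpreter. It is a sketch, not part of pydap; the function name numpy_status is invented here, and it simply reports a missing install instead of failing.

```python
def numpy_status():
    """Describe the numpy install visible to this interpreter."""
    try:
        import numpy
    except ImportError:
        return "numpy is not installed for this interpreter"
    # Build a small 2x3 array to confirm the N-dimensional array type works.
    grid = numpy.arange(6).reshape(2, 3)
    return "numpy %s, test array shape %s" % (numpy.__version__, grid.shape)

print(numpy_status())
```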

1.3 Installing EasyInstall

EasyInstall is a framework for automatically downloading, building and installing Python modules, while resolving dependencies. To install it you just have to download and run a single Python script:

$ wget http://peak.telecommunity.com/dist/ez_setup.py
$ python ez_setup.py

You may have to run the script as root (or with sudo), depending on where you installed Python and with which permissions. On Windows, just double–click the file and you're set. As you'll soon see, EasyInstall makes it trivial to install pydap and the optional plugins and responses.

2. Installing pydap

Remember EasyInstall? Thanks to it we can now install a pydap server with a single command:

$ easy_install "dap[server]"

This will install the latest pydap with quite a few dependencies: httplib2, used by the client to download data, together with Python Paste, Paste Deploy, Paste Script and the Cheetah template engine. Pydap stands on the shoulders of giants.

To test the installation we are going to deploy a little test server; later we can turn it into a production server, but for now we're going to set everything up using a simple server without any concurrency. We create a server with the following command:

$ paster create -t dap_server howto

Here the paster command comes from Paste Script. In this line we are creating a "project" called "howto" (you can choose any other name) using the dap_server template. This will give us the following files:

howto
howto/data
howto/data/sample.csv
howto/howto.egg-info
howto/howto.egg-info/not-zip-safe
howto/howto.egg-info/paster_plugins.txt
howto/server.ini
howto/template
howto/template/index.tmpl

To check if everything is working, simply run the command

$ paster serve howto/server.ini

and point your browser to http://localhost:8080. You should see a directory listing with a single file called sample.csv.

This is the default template for the pydap server — we'll learn how to change it later. For now you can see that we have a single file being served. We can check the DDS and the DAS responses by clicking on the links between brackets. They should look like this, respectively:

Dataset {
    Sequence {
        String a;
        Int32 b;
        Float64 c;
    } sample;
} sample%2Ecsv;

Attributes {
    String filename "sample.csv";
    sample {
        a {
        }
        b {
        }
        c {
        }
    }
}
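
The links in the directory listing simply append a response suffix to the dataset URL, so the same responses can be requested from a script. The sketch below assumes the test server from this section is running on localhost:8080; the helper names response_url and fetch are made up for the example, and fetch is defined but not called so that the block also runs without a server.

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib2 import urlopen  # Python 2

def response_url(dataset_url, response):
    """Append a DAP response suffix such as 'dds' or 'das' to a dataset URL."""
    return "%s.%s" % (dataset_url.rstrip("/"), response)

def fetch(dataset_url, response, timeout=10):
    """Download one response as text; needs the test server to be running."""
    reply = urlopen(response_url(dataset_url, response), timeout=timeout)
    try:
        return reply.read().decode("utf-8")
    finally:
        reply.close()

base = "http://localhost:8080/sample.csv"
print(response_url(base, "dds"))
print(response_url(base, "das"))
```

With the server up, fetch(base, "dds") should return the same text the browser shows for the DDS link.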

We can do one additional check by verifying the ASCII response for the dataset. Open the URL http://localhost:8080/sample.csv.asc and you should see this response:

Dataset {
    Sequence {
        String a;
        Int32 b;
        Float64 c;
    } sample;
} sample%2Ecsv;
---------------------------------------------
sample.a, sample.b, sample.c
"row1", 1, 1
"row2", 2, 10

Just to be sure that the server is working, we're going to access it using the pydap client. Open the Python interpreter and type the following session; you should get the same response:

>>> from dap.client import open
>>> dataset = open('http://localhost:8080/sample.csv')
>>> for values in dataset.sample: print values.data
... 
['row1', 1, 1.0]
['row2', 2, 10.0]

If you got the same result as shown in the above session everything is working fine!
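
The ASCII response shown earlier can also be read without the pydap client, which is handy on machines where only a plain Python is available. The following sketch assumes the layout seen above (a DDS, a line of dashes, a header line, then one row per line) and handles only the three base types that appear in sample.csv; parse_ascii_response and CONVERTERS are names invented for this example.

```python
import csv
import re

ASCII_RESPONSE = '''Dataset {
    Sequence {
        String a;
        Int32 b;
        Float64 c;
    } sample;
} sample%2Ecsv;
---------------------------------------------
sample.a, sample.b, sample.c
"row1", 1, 1
"row2", 2, 10
'''

# Only the base types used by sample.csv are covered here.
CONVERTERS = {"String": str, "Int32": int, "Float64": float}

def parse_ascii_response(text):
    """Split an ASCII response into (header, rows) with typed values."""
    lines = text.splitlines()
    # The DDS and the data are separated by a line made only of dashes.
    split_at = next(i for i, line in enumerate(lines)
                    if line.strip() and set(line.strip()) == {"-"})
    dds, body = lines[:split_at], lines[split_at + 1:]
    # Collect the declared type of each variable, in declaration order.
    matches = (re.match(r"\s*(String|Int32|Float64)\s+\w+;", line) for line in dds)
    types = [m.group(1) for m in matches if m]
    # skipinitialspace drops the blank that follows each comma.
    parsed = list(csv.reader([l for l in body if l.strip()], skipinitialspace=True))
    header, records = parsed[0], parsed[1:]
    rows = [[CONVERTERS[t](v) for t, v in zip(types, record)] for record in records]
    return header, rows

header, rows = parse_ascii_response(ASCII_RESPONSE)
print(header)
for row in rows:
    print(row)
```

Converting by the type declared in the DDS, rather than guessing from the text, is what turns the value 1 in column c into a float, matching what the client returned.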

3. Plugins for additional formats

One thing you should know is that, out of the box, pydap supports only CSV files like the sample.csv that we saw before. This is done on purpose to avoid bloating the code and requiring lots of dependencies. Fortunately, additional formats can be easily supported by installing plugins.

A plugin is a simple Python module that acts as an abstraction layer between pydap and the data. Currently, pydap has plugins for netCDF 3, GrADS/GRIB, Matlab 4 and 5 and HDF5 files, together with formats supported by the GDAL library; it also has a plugin for serving data from a SQL database. Writing new plugins is relatively easy, but requires a bit of knowledge of Python and of the DAP.
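
To make the idea of an abstraction layer concrete, here is a toy illustration of a server choosing a reader for each file. It is not pydap's real plugin mechanism, which this HOWTO does not show; the registry, the choice of file extensions as the lookup key and the function name handler_for are all assumptions made for the example. Only the package names come from the install commands in this section.

```python
import os

# Hypothetical registry: file extension -> name of the handler that reads it.
# pydap's actual plugin lookup may work differently.
HANDLERS = {
    ".csv": "built-in CSV handler",
    ".mat": "dap.plugins.matlab",
    ".nc": "dap.plugins.netcdf",
}

def handler_for(filename, handlers=HANDLERS):
    """Return the handler registered for this file's extension, or None."""
    extension = os.path.splitext(filename)[1].lower()
    return handlers.get(extension)

for name in ("sample.csv", "data.mat", "coads.nc", "notes.txt"):
    print(name, "->", handler_for(name))
```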

3.1 Matlab files

The Matlab plugin is a simple plugin that should understand binary files created with Matlab 4 and 5. It exists mostly for historical reasons, but we're going to install it since it's so simple. Just type the command:

$ easy_install dap.plugins.matlab

And we're all set. Download a sample file from http://test.pydap.org/data.mat, put it in the howto/data directory and restart your server. You should now see the file in the directory listing.

You should check that the DDS and DAS are being generated correctly; they should be:

Dataset {
    Float64 y[254][1];
    Byte fish[254][1];
    Float64 xo[254][512];
    Float64 pos[254][2];
} data%2Emat;

Attributes {
    String header "MATLAB 5.0 MAT-file, Platform: PCWIN, Created on: Thu May 15 12:12:47 2003";
    String filename "data.mat";
    y {
    }
    fish {
    }
    xo {
    }
    pos {
    }
}
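
The DDS is enough to work out how much data each variable holds before requesting anything. The sketch below parses the array declarations from the DDS above and multiplies out the dimensions; it counts the raw values only (8 bytes per Float64, 1 byte per Byte), not the size of the encoded response on the wire. The names ITEM_SIZE and array_sizes are invented for this example.

```python
import re

DDS = """Dataset {
    Float64 y[254][1];
    Byte fish[254][1];
    Float64 xo[254][512];
    Float64 pos[254][2];
} data%2Emat;"""

# Bytes per value for the two base types that appear in this DDS.
ITEM_SIZE = {"Float64": 8, "Byte": 1}

def array_sizes(dds):
    """Map each array name to (shape, size in bytes of its raw values)."""
    sizes = {}
    pattern = r"(\w+)\s+(\w+)((?:\[\d+\])+);"
    for base_type, name, dims in re.findall(pattern, dds):
        shape = tuple(int(n) for n in re.findall(r"\d+", dims))
        count = 1
        for n in shape:
            count *= n
        sizes[name] = (shape, count * ITEM_SIZE[base_type])
    return sizes

sizes = array_sizes(DDS)
for name in sorted(sizes):
    print(name, sizes[name])
print("total bytes:", sum(size for _, size in sizes.values()))
```

Almost all of the data is in xo (254 x 512 x 8 = 1,040,384 bytes), so a client interested only in y can save a lot by requesting just that variable, as the session below does.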

Again, we can test the plugin by connecting to our server using pydap as a client:

>>> from dap.client import open
>>> dataset = open('http://localhost:8080/data.mat')
>>> print dataset.y[:10]
[[ 80.89]
 [ 80.7 ]
 [ 80.57]
 [ 80.52]
 [ 80.44]
 [ 80.38]
 [ 80.95]
 [ 80.33]
 [ 80.38]
 [ 80.71]]

(Note that the plugin sometimes fails on 64-bit machines, resulting in strange values in the arrays.)

3.2 netCDF files

NetCDF is a self-describing, machine-independent file format for scientific data; it is by far the format best supported by pydap, since it's what I use for my daily research. The plugin for pydap even comes with my own implementation of a netCDF library, a very fast netCDF reader written in pure Python called pupynere. All you have to do is install the plugin:

$ easy_install dap.plugins.netcdf

And you'll be able to serve netCDF files. You can test your server by downloading the sample file http://test.pydap.org/coads.nc, adding it to the howto/data directory and restarting the server. The file should now appear in the directory listing.

The speed of pupynere is on par with other Python bindings to the netCDF library: in some benchmarks it is up to 40% faster, but in others it is twice as slow. If you want, you can install the pynetcdf module, which depends only on numpy and the netCDF library from Unidata. If pynetcdf is installed, the plugin will use it in preference to pupynere.
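
To see which of the two readers is available in your interpreter, you can try importing them in order of preference. This is only a sketch of the usual try-in-order pattern; how the plugin itself makes the choice is not shown in this HOWTO, and the helper name first_available is invented here.

```python
import importlib

def first_available(module_names):
    """Return the name of the first module that can be imported, or None."""
    for name in module_names:
        try:
            importlib.import_module(name)
        except ImportError:
            continue
        return name
    return None

# pynetcdf is listed first, so it is reported whenever it is installed.
print(first_available(["pynetcdf", "pupynere"]))
```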

We can test our plugin by connecting to the server using a DAP client like Ferret or GrADS. Here's a simple session with GrADS:

ga-> sdfopen http://localhost:8080/coads.nc
ga-> set lat -50 0 
ga-> set lon -70 20
ga-> set t 1 
ga-> define SST = SST
ga-> d SST

3.3 GrADS/GRIB files

The GrADS/GRIB plugin is a handler for GRIB binary data with a control file that holds the metadata. Three files are required in order to read the data: the GRIB data itself, a GrADS control file and an index file.
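
Since a dataset is only readable when all of its files are present, it can be worth checking the data directory before restarting the server. The sketch below assumes the three files share a base name and use the extensions of the sample files mentioned later in this section (model.ctl, model.grb and model.gmp); other datasets may use different extensions, and the names REQUIRED and missing_files are made up for the example.

```python
import os

# Extensions taken from the sample files used later in this section.
REQUIRED = (".ctl", ".grb", ".gmp")

def missing_files(directory, stem, extensions=REQUIRED):
    """Return the required files for dataset `stem` that are absent."""
    wanted = [stem + ext for ext in extensions]
    return [name for name in wanted
            if not os.path.isfile(os.path.join(directory, name))]

# An empty list means the control, data and index files are all in place.
print(missing_files("howto/data", "model"))
```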

In order to read these files, the plugin requires the cdms module, available from CDAT. The CDAT installation usually installs a separate version of Python; we can install it using the same Python as pydap with the following steps:

$ wget http://ufpr.dl.sourceforge.net/sourceforge/cdat/CDAT-4.1.2-everything.tar.gz
$ tar zxf CDAT-4.1.2-everything.tar.gz
$ cd CDAT-4.1.2
$ cd exsrc
$ ./install_script --enable-ioapi
$ cd ..
$ python installation/install.py

After installing CDAT, proceed to install the plugin with the command:

$ easy_install dap.plugins.grads

To test the installation, download the 3 sample files (model.ctl, model.grb and model.gmp) from this tutorial. Place the files in your howto/data directory, restart the server and check that the dataset appears in the directory listing.