LSD

Section: LSD (1)
Updated: $Date: 2009/08/30 13:33:26 $
 

NAME

lsd - License Shadowing Daemon for the PBS scheduler  

SYNOPSIS

@INSTALLDIR@/bin/lsdinit start
@INSTALLDIR@/bin/lsdinit stop  

DESCRIPTION

lsd provides a mechanism for modelling the allocation of software licenses by jobs under a batch queuing system, and allows the batch scheduler to interact with this model when it is making scheduling decisions.

Under PBS, jobs that run code that uses software licenses declare this fact by means of the 'software' argument in the job submission, e.g.

  #!/bin/sh
  #PBS -lsoftware=foo:bar
  foo foofile.in foo.out
  bar foo.out bar.out
We use this declaration to keep track of the license use by jobs under PBS.

There are several reasons why this modelling mechanism is desirable, but the most compelling relates to the way jobs use licenses. Since a job might not acquire its licenses as soon as it starts, nor hand them back as soon as it finishes, we can't rely on the information returned by license server queries when making scheduling decisions.

The basic idea is that you have one lsd daemon running on one machine. Each PBS scheduler makes calls to the lsd API functions (see @INSTALLDIR@/include/lsdapi.h) to communicate with the single lsd instance.

At the start of a scheduling cycle, the scheduler sends lsd a snapshot of the running and suspended jobs on its system. During the cycle it selects various candidate jobs to run from the queues. For each candidate that has a software argument, it uses the lsdReserve() API call to see if it is allowed to start that job.

lsd keeps track of what all the schedulers have told it, and builds a model of the license economy (based on the jobs' software arguments). Thus the model is made up of "shadow licenses" that (hopefully) model the eventual license use of all the jobs. When an attempt is made to reserve a shadow license that would exceed the pool of available licenses, the reservation is rejected and the scheduler uses this information to avoid starting the job in question.
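
The reservation rule described above can be illustrated with a minimal sketch. This is not lsd's actual code: the class name ShadowPool, its attributes and the all-or-nothing check are assumptions made for illustration only.

```python
class ShadowPool(object):
    """Hypothetical model of a pool of shadow licenses (not lsd's real class)."""

    def __init__(self, available):
        self.available = dict(available)  # feature -> licenses in the pool
        self.reserved = {}                # feature -> shadow licenses held by jobs

    def reserve(self, consumption):
        """Reserve every requested feature, or none. Returns True on success."""
        # First pass: reject if any feature would exceed the pool.
        for feature, count in consumption.items():
            if self.reserved.get(feature, 0) + count > self.available.get(feature, 0):
                return False
        # Second pass: only now increment the shadow license counts.
        for feature, count in consumption.items():
            self.reserved[feature] = self.reserved.get(feature, 0) + count
        return True
```

Checking every feature before changing any count means a rejected reservation leaves the model untouched, so the scheduler can simply move on to the next candidate job.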

lsd also kills (or alerts you to kill) rogue jobs. These are jobs that are either using licenses but haven't declared their intent to do so in the software request, or are using more licenses than they should. This mechanism is not perfect, due to the limited information available to lsd from the license server queries, but it is conservative so that a job will only be killed if it can be shown that no other job or process could have been the offender.

Rogue jobs are killed by running lsdkilljob which is a shell script that you can tailor to your needs. This script could just email you the job details so you can kill it yourself, or, if lsd is running as a user who can manage jobs on all the machines, it can qdel the jobs itself (and mail the user too).  

SYNTAX OF THE SOFTWARE ARGUMENT

The software argument supplied with the job submission takes the form:

software:software:...
where software can take the following forms:
softwareName
softwareName/count
softwareName/featureName=count/featureName=count/...
softwareName is the name of a software package, as known to lsd. featureName is the name of a feature within the software package. count is the number of licenses (per feature) that the job will use. For example:
abaqus:matlab
abaqus/20:matlab
abaqus/abaqus=20/standard=16:matlab
In most cases the simplest of these three forms is enough to describe the license use by a job: lsd will compute the license consumption pattern based on the number of CPUs that the job will run on. In some cases, however, a job will use software in a way that causes it to consume a different number of licenses. The job might start several instances of a piece of software, or may in some other way "overload" the CPU allocation. For these cases, the more complex forms must be used.

The softwareName/count form indicates that the job will use count licenses of each feature within the software. The softwareName/featureName=count form allows you to specify that the job will use a different number of licenses for each feature within a software package. If you omit a feature name in this form, lsd will use the feature count computed from the number of CPUs that the job will run on.  
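
To make the three forms concrete, here is a small sketch of a parser for the software argument. It is only an illustration of the syntax described above, not the parser lsd uses; the function name parse_software_argument and the shape of its return value are assumptions.

```python
def parse_software_argument(arg):
    """Parse 'software:software:...' into a list of (name, spec) pairs.

    spec is None for the bare softwareName form, an int for
    softwareName/count, and a dict for softwareName/featureName=count/...
    """
    requests = []
    for item in arg.split(":"):
        parts = item.split("/")
        name, parts = parts[0], parts[1:]
        if not name:
            raise ValueError("empty software name in %r" % arg)
        if not parts:
            spec = None                      # count derived from the CPUs
        elif len(parts) == 1 and "=" not in parts[0]:
            spec = int(parts[0])             # same count for every feature
        else:
            spec = {}                        # explicit per-feature counts
            for part in parts:
                feature, sep, count = part.partition("=")
                if not sep or not feature:
                    raise ValueError("bad feature spec %r" % part)
                spec[feature] = int(count)
        requests.append((name, spec))
    return requests
```

For the third example above, this returns [("abaqus", {"abaqus": 20, "standard": 16}), ("matlab", None)].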

CHOOSING A HOST

lsd can run on any host that can establish TCP connections to all your PBS schedulers and query all the license servers. If you move lsd from one host to another you will probably have to modify the license query commands in the plugins (see below) and modify lsdkilljob to run on the new host.  

CLIENT-SIDE ENVIRONMENT

Set LSDCONFIG to specify a non-default configuration file.

Set LSDAPIRWTIMEOUT to specify the read and write timeout (in seconds) for communication with the daemon.

 

DAEMON ENVIRONMENT

Set LSDCONFIG to specify a non-default configuration file.

Set LSDVERBOSE to any string of characters from the set 'pfc' to control logging verbosity in the daemon.

Set LSDKILLJOB on the server side to the absolute pathname of an executable to override the default lsdkilljob script.

 

CONFIGURATION

The default configuration file is @INSTALLDIR@/lsd.conf. This has to be present on both the server and client machines. See the comments in this file for more details.  

STARTING AND STOPPING LSD

Use
@INSTALLDIR@/bin/lsdinit start
@INSTALLDIR@/bin/lsdinit stop
lsd can be run as any user, but it needs to be able to read its configuration file. lsd can generally be stopped and started with little effect on the running schedulers. The schedulers will reconnect to lsd when lsd starts up again. However, if lsd is not running, the PBS schedulers will not start any jobs that have software arguments. lsd will also prevent the schedulers from starting jobs with software arguments until all of the schedulers have connected to lsd. If one or more of your systems is down for any length of time, it's probably worth commenting it out in the lsd config file and restarting lsd.  

MONITORING LSD

The internal state of lsd can be monitored via HTTP. It uses the port number in the lsd config file plus one as its monitoring port. To monitor lsd, point your browser at http://hostname:port/lsdstatus.html
e.g.
http://sc.apac.edu.au:7107/lsdstatus.html  
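
As a small worked example of the port arithmetic, the helper below builds the monitoring URL from the port in the config file. The function name monitor_url is made up for illustration, and the configured port 7106 is only inferred from the example URL above.

```python
def monitor_url(hostname, config_port):
    """Return the status page URL; the monitoring port is the config port plus one."""
    return "http://%s:%d/lsdstatus.html" % (hostname, config_port + 1)


# Assuming the lsd config file specifies port 7106 on this host:
print(monitor_url("sc.apac.edu.au", 7106))
# prints http://sc.apac.edu.au:7107/lsdstatus.html
```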

PLUGINS

All the idiosyncrasies of the various licensing mechanisms used by the various software packages are handled by plugins. A plugin is basically just a Python module with some mandatory definitions, placed in a particular directory where lsd can find it (see below).

A plugin performs two functions:

1. Emulates the license consumption pattern of the package. Often this is something simple like 'one license per process', but it can be much more complicated than that.

2. Interprets the output from license server queries to determine current license availability and use.  
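
The second function can be sketched as follows. The example assumes the query output contains FlexLM lmstat-style summary lines of the form shown in the comment; the function name parse_license_summary is hypothetical, and the real plugins (see baseflexlmplugin.py) may interpret the output quite differently.

```python
import re

# Assumed input format, one summary line per feature, e.g.
#   Users of MATLAB:  (Total of 50 licenses issued;  Total of 12 licenses in use)
_USERS_OF = re.compile(
    r"Users of (?P<feature>[^:\n]+):\s+"
    r"\(Total of (?P<issued>\d+) licenses? issued;\s+"
    r"Total of (?P<in_use>\d+) licenses? in use\)"
)


def parse_license_summary(query_output):
    """Return {feature: (issued, in_use)} extracted from the query output."""
    summary = {}
    for match in _USERS_OF.finditer(query_output):
        summary[match.group("feature")] = (
            int(match.group("issued")),
            int(match.group("in_use")),
        )
    return summary
```

Note that this only does string matching and integer conversion, which is in keeping with the requirement that plugin methods be fast and never block.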

WRITING, MODIFYING AND INSTALLING PLUGINS

A plugin must be named *.py and be installed in the lsd plugins directory (@INSTALLDIR@/plugins/ for the live instance of lsd).

It must have a global-scope function called createInstances() that returns either an instance of BasePlugin or a sequence of instances of BasePlugin (instances of children of BasePlugin are also instances of BasePlugin). The sequence can be anything that supports iteration, such as a tuple, list, or (most usefully) a generator. BasePlugin can be found in @INSTALLDIR@/bin/baseplugin.py.
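
A minimal sketch of the shape of such a module is shown below. So that it runs on its own, it defines a stand-in BasePlugin; a real plugin would import the class from baseplugin.py instead. The constructor argument, the softwareName attribute and the ExamplePlugin class are assumptions, not the real interface.

```python
class BasePlugin(object):
    """Stand-in for the real BasePlugin in baseplugin.py (hypothetical constructor)."""

    def __init__(self, software_name):
        self.softwareName = software_name


class ExamplePlugin(BasePlugin):
    """A child of BasePlugin, so its instances are also BasePlugin instances."""


def createInstances():
    """Global-scope entry point: yield one plugin instance per software package."""
    # A generator is one of the accepted kinds of sequence.
    for name in ("foo", "bar"):
        yield ExamplePlugin(name)
```

Using a generator keeps the module short when one plugin class serves several packages that differ only in their parameters.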

Writing a new plugin usually involves subclassing BaseFlexlmPlugin to suit the appropriate license manager (even plugins for software not controlled by FlexLM are best done by subclassing BaseFlexlmPlugin). For examples of handling the more perverse license consumption behaviours, have a look at flexlmfluent.py and flexlmcfx.py (all in @INSTALLDIR@/plugins/).

See the comments in @INSTALLDIR@/bin/baseplugin.py and @INSTALLDIR@/plugins/baseflexlmplugin.py for more details.  

STEP-BY-STEP GUIDE TO MODIFYING PLUGINS

1. 'cvs co pbs_lsd' or 'cvs update pbs_lsd' to get the latest copy of lsd. You may need to set $CVSROOT and $CVS_RSH first.

2. cd pbs_lsd/plugins

3. Read comments in ../bin/baseplugin.py, baseflexlmplugin.py and see *.py for more examples.

4. Edit or create your plugin. Note that the methods of a plugin must never block, and should all execute as quickly as possible. In general, the plugin methods should only do things like extract info from the string returned by the license server query, and do some simple arithmetic or tests. Also bear in mind that the plugin methods may be called much more or much less frequently than you might think. In other words, avoid side effects, and don't fork shells or anything like that.

When lsd is considering a reservation request for a job, it considers each software package mentioned in the job's software request in turn. For each software package it first calls the getRunnability() method of the software's plugin. If that method returns a value indicating the job is potentially runnable (e.g. some software can never run on more than a fixed number of cpus), it then calls the plugin's getFeatureConsumption() method to calculate the number of software features that will be consumed by the job when it runs.

If there are enough shadow licenses to satisfy all of the software features requested by the job, the shadow license counts are incremented and the job scheduler is told that this job can be run. If not, the job scheduler is told that this job can't be run at the moment.

All plugin classes define a queryCommandString member. The command defined here is allowed to block or be slow to execute, as it gets run asynchronously. It may be run less often than you might think, due to caching of its output. Plugins that share exactly the same string value for queryCommandString will usually result in those plugins' instances sharing the output from just one common spawn of the queryCommandString.

5. Test it.

        cd wherever/pbs_lsd
        gmake tests
That should run the basic tests that ensure that nothing crashes and the plugin methods are reasonably sane. Check the output for errors and warnings. To test the business logic of the plugin, you'll have to exercise it manually. In one window start...
        LSDCONFIG=lsdtest.conf python bin/lsd.py
In another window run...
        LSDCONFIG=lsdtest.conf src/test
Use the s, r and c commands to exercise your changes. Before testing reservations you will first need to use s to send a snapshot. You can test reservations of software with 'r software ncpus'. See src/test.c for more details. The stdout from lsd.py should show the test client's commands being run, as well as debugging output from the plugins. To monitor the internal state of lsd, use your web browser, but bear in mind it will be talking on a different port due to the test config. e.g.
http://sc.apac.edu.au:50007/lsdstatus.html

6. Once it looks OK, su root and

        @INSTALLDIR@/bin/lsdinit stop
        gmake install-plugins
        @INSTALLDIR@/bin/lsdinit start

7. As you, cvs commit -m 'changes to XXX plugin'  
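
The order in which lsd calls the plugin methods described in step 4 can be sketched as follows. The method names getRunnability() and getFeatureConsumption() come from the text above, but their argument lists and return types here are assumptions (check baseplugin.py for the real ones), and ExamplePlugin and consumption_for_job are made-up names.

```python
class ExamplePlugin(object):
    """Hypothetical plugin: one license of each feature per CPU, up to max_cpus."""

    def __init__(self, features, max_cpus):
        self.features = features
        self.max_cpus = max_cpus

    def getRunnability(self, ncpus):
        # Pure arithmetic only: plugin methods must never block.
        return ncpus <= self.max_cpus

    def getFeatureConsumption(self, ncpus):
        return {feature: ncpus for feature in self.features}


def consumption_for_job(packages, ncpus, plugins):
    """Return {(package, feature): count}, or None if any package cannot run."""
    wanted = {}
    for package in packages:
        plugin = plugins[package]
        # Runnability is checked first; consumption is only computed
        # for packages that could run at all.
        if not plugin.getRunnability(ncpus):
            return None
        for feature, count in plugin.getFeatureConsumption(ncpus).items():
            wanted[(package, feature)] = count
    return wanted
```

The resulting counts are what would then be compared against the available shadow licenses to decide whether the scheduler may start the job.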

ADDING LSD FUNCTIONALITY TO THE PBS SCHEDULER

See @INSTALLDIR@/include/lsdapi.h for details. Under ANUPBS you can indicate that you want to include lsd functionality in the scheduler by configuring with --with-lsd=/opt/lsd  

TESTING

Use
        gmake tests
to exercise all the plugins.

If you modify lsd, or need to test lsd's behaviour, you can enable the dummy plugin by running

        LSDCONFIG=./lsdtest.conf LSDDUMMYQUERY=`pwd`/tests/lsd-dummy-query  bin/lsdinit start
or
        LSDCONFIG=./lsdtest.conf LSDDUMMYQUERY=`pwd`/tests/lsd-dummy-query python bin/lsd.py
With the dummy plugin enabled you can use the test client (src/test) to send lsd dummy jobs and see how it responds. e.g.

[djh900@sc0 pbs_lsd]$ LSDCONFIG=lsdtest.conf src/test
Commands are:
s - send snapshot
r feature ncpus - reserve ncpus of feature
c - cancel previous reservation (prompts for jobid)
q -quit
s, r feature ncpus, c, q: s

Enter EXECHOSTSTRING SOFTWARE NCPUS. Blank line to end.
h1 dummy:matlab 3
job.id=758.xx0
h2 dummy:matlab 1
job.id=113.xx0

s, r feature ncpus, c, q: r dummy 2
job.id=212.xx0
Exit code 3: Reservation successful..

 

FILES

@INSTALLDIR@/bin/lsdinit
Start/stop script
@INSTALLDIR@/bin/lsd.py
Main program
@INSTALLDIR@/bin/lsdkilljob
Shell script to kill/warn rogue jobs.
@INSTALLDIR@/lsd.conf
Default configuration file
@INSTALLDIR@/plugins/*.py
Plugin modules
/var/spool/lsd/lsd.log*
Log files - a new one every day. Delete the old ones with a cron job.
@INSTALLDIR@/lib/liblsd.a
Client-side API for scheduler.
@INSTALLDIR@/include/lsdapi.h
Client-side API header file.
 

SEE ALSO

qstat(1), qsub(1), lmstat(1), python(1)


 

Index

NAME
SYNOPSIS
DESCRIPTION
SYNTAX OF THE SOFTWARE ARGUMENT
CHOOSING A HOST
CLIENT-SIDE ENVIRONMENT
DAEMON ENVIRONMENT
CONFIGURATION
STARTING AND STOPPING LSD
MONITORING LSD
PLUGINS
WRITING, MODIFYING AND INSTALLING PLUGINS
STEP-BY-STEP GUIDE TO MODIFYING PLUGINS
ADDING LSD FUNCTIONALITY TO THE PBS SCHEDULER
TESTING
FILES
SEE ALSO
