FINAL (hope so :-) SUMMARY: One Batch Queue feeding several Hosts

From: Jochen Bern (bern@kleopatra.Uni-Trier.DE)
Date: Wed Sep 08 1993 - 04:19:16 CDT


Original Question:
> We are currently crunching a Number of heavy Jobs batch(1)ed by Shell
> Scripts. I investigated the Possibility of tuning Batch Queue Defs,
> but what I'd really like is for the available Hosts (currently five)
> to fetch their Jobs from a single Queue, instead of five Queues with
> manual Load Balancing.
> This is for SunOS 4.1.2, one SS2 and several ELCs.

Answers:

1) Use a Package from the DQS/NQS Family (feldt@phyast.nhn.uoknor.edu,
   will@banjo.mit.edu, stuart@mtb.phil.mop.com)

   It seems that this encompasses four Packages: DQS (Distributed Queueing
   System), NQS (Network ...), MDQS (Motif DQS?) and MNQS (Motif NQS?).
   I tried DQS 2.1: nice X11 Front-End, but seemingly quite buggy. Currently,
   I'm running NQS 3.34. markus@octavia.anu.edu.au warned me that I'll have
   some real Work when (if?) NQS crashes, but so far it runs fine.
   More Details below.

2) Use the Condor Package (adam%bwnmr4@harvard.harvard.edu)

   This is not for Batching, but for parallel Programs; it replaces the
   C Runtime Library. I didn't investigate this further. I only heard from
   rfinch@caldwr.water.ca.gov that besides making your (new!) Executables
   unrunnable on a plain Vanilla System, you may encounter Difficulties
   in File Access.

3) Use DJM (Distributed Job Manager) from the Minnesota Supercomputer Center
   (markus@octavia.anu.edu.au)

   Written for Connection Machines. Porting to SUNs is considered "relatively
   easy". But guess what, it seems that nobody has done it yet ... ;-)

4) Use Scalable Technologies' pshell (Parallel Shell) (TRANLE@INTELLICORP.COM)

   The first and only commercial Product mentioned. As long as PD Software
   does the Job, I won't investigate this further.

5) If you have installed /usr/5bin, hunt down the Script&Source Collection in
   galilei.fy.chalmers.se:/pub/que/que-1.23.tar.Z. BEWARE: In wipe_que.sed,
   change "rm -f $QUEHOME/queue/running/*.$JOBNBR*" to "... *.$JOBNBR.*"!
   (urban@fy.chalmers.se)

   Sounds good, but as long as NQS ... you guessed it.
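
   To see why the changed Pattern in 5) matters, here is a small Sketch in
   Python. The File Names are made up for Illustration (I am only assuming
   that Files in queue/running carry ".<Job Number>." somewhere in their
   Name, as the corrected Pattern suggests); fnmatch stands in for the Shell's
   Globbing. With the Pattern as shipped, wiping Job 12 would also hit the
   Files of Jobs 123 and 124.

```python
from fnmatch import fnmatch

# Hypothetical file names in $QUEHOME/queue/running; the real naming
# scheme of que-1.23 may differ, only the ".<job number>." part matters.
running = ["job.12.sh", "job.12.out", "job.123.sh", "job.124.out"]

jobnbr = "12"
loose = "*." + jobnbr + "*"    # pattern as shipped in wipe_que.sed
strict = "*." + jobnbr + ".*"  # corrected pattern

# Files each pattern would hand to "rm -f"
loose_hits = [f for f in running if fnmatch(f, loose)]
strict_hits = [f for f in running if fnmatch(f, strict)]

print("loose :", loose_hits)
print("strict:", strict_hits)
```

   The loose Pattern selects all four Names, the strict one only the two
   Files that really belong to Job 12.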

Thanx a F4240h to all who answered. If you're interested in NQS, read the
second Part below.
                                                                        J. Bern
-------------------------------------------------------------------------------
Installing NQS 3.34 on SunOS
============================

NQS is a Package which implements Queues on its own instead of delivering Jobs
to the existing Unix Queues. The Interface is quite VMS-ish (which is fine for
me). Except for the little Caveats listed below, I compiled it straight out of
the Box. I'm currently running three Queues on each Client (the Batch Queue,
a Pass-through Queue from the Scheduler to it, and a Pass-through Queue from
the (local) User to the Scheduler; on the Scheduler itself, this latter Queue
dispatches the Jobs to the Clients) and tested it with some simple Scripts.
Everything's fine so far. One major Difference from batch(1) is that, as with
DQS, you get stdout and stderr in Files instead of by Mail. My Startup Scripts
moan a little Bit when run by a Batch Job, but that is no serious Problem. I
configured the Directories so that everything except NQS_SPOOL is NFS-mounted
throughout the Cluster, and of course I used NIS to add some services.

What to hack
------------

In .../proto/make_include, you may want to create Subtargets of "directories"
which leave those Locations already mounted alone. I defined a Target "spools",
kicking out everything not related to NQS_SPOOL. This saves you from tinkering
with -root= in /etc/exports during Installation.

In .../src/qpr.c, add "#include <fcntl.h>".

Edit the Makefiles in and below .../msgd; they come with SGI Settings enabled.

The Rest is in the INSTALL File.

What to note
------------

In INSTALL, Item 5) "repeat for each Node" means the last TWO Steps.

inetd.conf is in /etc on SUNs.

Be sure to read the Docs of the msgd Subpackage. You're actually opening big
Gates with its Installation ...

If you have completed the Installation on a Host that serves NFS and NIS, like
mine, only the following Steps are left to do on every Client:

1) Make the Spool Directories and Files
2) Edit inetd.conf and kill -HUP inetd
3) Start ${NQS_LIBEXE}/nqsdaemon (install this in /etc/rc.local if you like)
4) Run qmgr (the local qmgr doesn't know anything about other Hosts' qmgr
   Setup, so you have to do the whole Setup)

Fast Queue Setup
----------------

(for those who don't want to figure it out themselves :-)
Suppose you want to create Batch Queues on Client1, ... Clientn with a central
Queue feeding them on Host BatHost. Then enter the following in qmgr:

On BatHost:
cre p schedule dest=(feed1@client1,...,feedn@clientn)
ena q schedule
sta q schedule
set lb_out schedule
set pipeo schedule

On the Clients (which might include BatHost; Commands for Client 1):
cre b exec1
ena q exec1
sta q exec1
cre p feed1 dest=exec1
ena q feed1
sta q feed1
cre p def_batch dest=schedule@bathost
ena q def_batch
sta q def_batch
set lb_in feed1
set pipeo feed1
set pipeo exec1
set def b q def_batch
set sched bathost
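
If you have more than a Handful of Clients, typing the above n Times gets
boring. The following is a minimal Sketch in Python that just prints the
Command Lists shown above for any Number of Clients. It assumes that Client i
gets the Queues feed<i> and exec<i>, exactly as in the Client 1 Example; the
Function Names and the Host Names in the Example Call are made up. It does not
talk to qmgr itself, you still paste the Output into qmgr on each Host.

```python
def scheduler_commands(clients):
    """qmgr commands for the central host (BatHost in the text above)."""
    # One destination feed<i>@<client> per client, numbered from 1
    dests = ",".join("feed%d@%s" % (i, c) for i, c in enumerate(clients, 1))
    return [
        "cre p schedule dest=(%s)" % dests,
        "ena q schedule",
        "sta q schedule",
        "set lb_out schedule",
        "set pipeo schedule",
    ]


def client_commands(i, bathost):
    """qmgr commands for client number i (1-based), copied from the example."""
    return [
        "cre b exec%d" % i,
        "ena q exec%d" % i,
        "sta q exec%d" % i,
        "cre p feed%d dest=exec%d" % (i, i),
        "ena q feed%d" % i,
        "sta q feed%d" % i,
        "cre p def_batch dest=schedule@%s" % bathost,
        "ena q def_batch",
        "sta q def_batch",
        "set lb_in feed%d" % i,
        "set pipeo feed%d" % i,
        "set pipeo exec%d" % i,
        "set def b q def_batch",
        "set sched %s" % bathost,
    ]


if __name__ == "__main__":
    # Hypothetical host names, replace with your own
    clients = ["client1", "client2", "client3"]
    bathost = "bathost"

    print("On %s:" % bathost)
    print("\n".join(scheduler_commands(clients)))
    for i, name in enumerate(clients, 1):
        print("\nOn %s:" % name)
        print("\n".join(client_commands(i, bathost)))
```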

Good Luck,
                                                                        J. Bern


