Desc: Network UPS Tools design document
File: design.txt
Date: 30 April 2003
Auth: Russell Kroll <rkroll@exploits.org>

Here is the general layout for the software.  It has a layered scheme,
in which clients, the server, and the drivers are separated.  Layers 
communicate with each other using an agreed-upon protocol.

The layering
============

  CLIENTS: upsmon, upsc, upsrw, upsstats, upsset, etc. (via upsclient)
  
           < network: TCP sockets, typically on port 3493 >

   SERVER: upsd

           < Unix domain sockets with text-based messages >

  DRIVERS: apcsmart, bestups, fentonups, powercom, etc.

           < serial communications, SNMP, USB, etc. >

EQUIPMENT: Smart-UPS 700, Fenton PowerPal 660, etc. (actual UPS hardware)

How information gets around
===========================

From the equipment
------------------

DRIVERS talk to the EQUIPMENT and receive updates.  For most hardware this
is polled (DRIVER asks EQUIPMENT about a variable), but forced updates are
also possible.  The exact method is not important, as it is abstracted
by the driver.

From the driver
---------------

The core of all DRIVERS maintains internal storage for every variable
that is known, and all of the auxiliary data for those variables.  It
sends updates to this data to any process which connects to the Unix
domain socket.

The DRIVERS will also provide a full copy of their internal knowledge
upon receiving the "DUMPALL" command on the socket.  The dump is in the
same format as updates, and is followed by "DUMPDONE".  When "DUMPDONE"
has been received, the view is complete.

The SERVER will connect to the socket of each DRIVER and will request a
dump at that time.  It retains this data in local storage for later use.
It continues to listen on the socket for additional updates.

This protocol is documented in sock-protocol.txt.
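
As a rough illustration of the dump handshake described above, the
following Python sketch reads lines from a driver connection until
"DUMPDONE" arrives.  Only "DUMPALL" and "DUMPDONE" come from this
document; the SETINFO line format, the read_dump() helper and the sample
values are assumptions made for the example, and sock-protocol.txt
remains the authoritative reference.

```python
import io


def read_dump(stream):
    """Collect variables from a driver dump until DUMPDONE is seen.

    Returns (variables, complete).  'complete' is False if the stream
    ended before DUMPDONE, meaning the view cannot be trusted yet.
    """
    variables = {}
    for raw in stream:
        line = raw.strip()
        if not line:
            continue
        if line == "DUMPDONE":
            return variables, True
        # Assumed update format for this sketch: SETINFO <name> "<value>"
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "SETINFO":
            variables[parts[1]] = parts[2].strip('"')
    return variables, False


# Stand-in for what a driver might send back after "DUMPALL".
reply = io.StringIO('SETINFO STATUS "OL"\nSETINFO MODEL "Example"\nDUMPDONE\n')
snapshot, complete = read_dump(reply)
```

Anything that arrives after "DUMPDONE" is an ordinary update, so a real
reader would keep parsing the same stream rather than stop.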

From the server
---------------

The SERVER's internal storage maintains a complete copy of the data
which is in the DRIVER, so it is capable of answering any request
immediately.  When a request for data arrives from a CLIENT, the SERVER
looks through the internal storage for that UPS and returns the
requested data if it is available.

The format for requests from the CLIENT is documented in protocol.txt.
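
The idea of answering from local storage can be sketched as a small
cache keyed by UPS name.  The UpsCache class and its method names are
hypothetical and only illustrate the lookup; they are not taken from
upsd.

```python
class UpsCache:
    """Local copy of driver data, as the SERVER might keep it."""

    def __init__(self):
        self._data = {}  # UPS name -> {variable name: value}

    def apply_update(self, ups, name, value):
        # Called whenever a driver pushes a new value over its socket.
        self._data.setdefault(ups, {})[name] = value

    def lookup(self, ups, name):
        # Answered immediately from memory; the driver is never queried.
        # Returns None when the UPS or the variable is unknown.
        return self._data.get(ups, {}).get(name)


cache = UpsCache()
cache.apply_update("myups", "STATUS", "OL")
status = cache.lookup("myups", "STATUS")
```

Because a lookup never blocks on the driver, a slow or busy driver does
not delay the answer to a CLIENT.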

Instant commands
================

"Instant commands" is the term for actions that make something happen
on the UPS.  Common examples are BTEST1, which initiates a battery test,
and FPTEST, which tests the front panel of the UPS.

They are passed to the SERVER from a CLIENT using an authenticated
network connection.  The SERVER first checks to make sure that the instant
command is valid for the DRIVER.  If it's supported, a message is sent 
via a socket to the DRIVER containing the command and any auxiliary
information.

At this point, the SERVER receives no confirmation that the command was
executed.  This is planned for a future release.  It has been delayed
because returning a response involves some potentially interesting
timing issues.  Remember that upsd services clients in a round-robin
fashion, so all queries must be lightweight and speedy.
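
A minimal sketch of the check-then-forward step, under stated
assumptions: handle_instcmd() is a hypothetical helper, the set of
supported commands is assumed to have been learned from the driver, and
the "INSTCMD <name>" message text is an assumption (see
sock-protocol.txt for the actual format).

```python
def handle_instcmd(supported_cmds, cmd, send_to_driver):
    """Forward cmd to the driver only if the driver supports it.

    Returns True if the command was forwarded.  That says nothing about
    whether the UPS carried it out, since no confirmation comes back.
    """
    if cmd not in supported_cmds:
        return False
    send_to_driver("INSTCMD " + cmd + "\n")  # assumed message format
    return True


sent = []  # stands in for the driver socket in this sketch
ok = handle_instcmd({"BTEST1", "FPTEST"}, "BTEST1", sent.append)
rejected = handle_instcmd({"BTEST1", "FPTEST"}, "NOSUCHCMD", sent.append)
```

The call returns as soon as the message is handed off, which keeps the
SERVER free to service its other clients.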

Setting variables
=================

Some variables in the DRIVER can be changed, and carry the FLAG_RW flag.
Upon receiving a SET command from the CLIENT, the SERVER first verifies
that it is valid for that DRIVER in terms of writability and data type.
If those checks pass, it then sends the SET command through the socket,
much like the instant command design.

The DRIVER is expected to commit the value to the EQUIPMENT and update
its internal representation of that variable.

Like the instant commands, there is currently no acknowledgement of the
command's completion from the DRIVER.  This, too, is planned for a future
release.
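
The writability and type checks might look roughly like the sketch
below.  FLAG_RW is named in this document, but its numeric value here,
the VarInfo structure, and the two kinds of type check (a list of
allowed values or a maximum string length) are assumptions made for
illustration, as is the "SET" message text.

```python
from dataclasses import dataclass

FLAG_RW = 0x0001  # hypothetical bit value, for this sketch only


@dataclass
class VarInfo:
    value: str
    flags: int = 0
    enum: tuple = ()  # assumed: allowed values, empty means unrestricted
    maxlen: int = 0   # assumed: maximum string length, 0 means no limit


def handle_set(variables, name, value, send_to_driver):
    """Validate a SET request, then pass it on to the driver."""
    info = variables.get(name)
    if info is None or not (info.flags & FLAG_RW):
        return False  # unknown or read-only variable
    if info.enum and value not in info.enum:
        return False  # not one of the permitted values
    if info.maxlen and len(value) > info.maxlen:
        return False  # too long for this variable
    send_to_driver('SET %s "%s"\n' % (name, value))  # assumed format
    return True


# Hypothetical variable names, chosen only for the example.
known = {
    "STATUS": VarInfo("OL"),
    "UPSIDENT": VarInfo("UPS_IDEN", flags=FLAG_RW, maxlen=8),
}
sent = []
accepted = handle_set(known, "UPSIDENT", "rack1", sent.append)
```

Note that the sketch does not touch the stored value.  The SERVER's copy
changes only when the DRIVER pushes its own update after committing the
value to the EQUIPMENT.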

Example data path
=================

Here's the path a piece of data might take through this architecture.
The event is a UPS going on battery, and the final result is a pager
delivering the alpha message to the admin.

1. EQUIPMENT reports on battery by setting a flag in its status register.

2. DRIVER notices this flag and stores it in the status variable as
   OB.  This update gets pushed out to any listeners via the sockets.

3. SERVER upsd sees activity on the socket, reads it, parses it, and 
   commits the new data to its local version of the status variable. 

4. CLIENT upsmon does a routine poll of SERVER for "STATUS" and gets "OB".

5. CLIENT upsmon then invokes its NOTIFYCMD, which is upssched.

6. upssched starts up a daemon to handle a timer which will expire about
   30 seconds into the future.

7. 30 seconds later, the UPS is still on battery, so the timer expires
   and upssched calls the CMDSCRIPT upssched-cmd.

8. upssched-cmd parses the args and calls sendmail.

9. Avian carriers, smoke signals, SMTP, and some magic result in the
   message getting from the pager company's gateway to a transmitter
   and then to the admin's pager.

This scenario requires some configuration, obviously:

1. There's a UPS driver running.
   (Whatever applies for the hardware)

2. upsd has a valid UPS entry in ups.conf for this UPS.

	[myups]
		driver = upsdriver
		port = /dev/port

3. upsd has a valid user for upsmon in upsd.users.

	[monuser]
		password = somepass
		allowfrom = somewhere
		upsmon master

4. upsmon is set to monitor this UPS in upsmon.conf.
   
	MONITOR myups@localhost 1 monuser somepass master

5. upsmon is set to EXEC the NOTIFYCMD for the ONBATT condition in
   upsmon.conf.

	NOTIFYFLAG ONBATT EXEC

6. upsmon calls upssched as the NOTIFYCMD in upsmon.conf.

	NOTIFYCMD /path/to/upssched

7. upssched has a 30 second timer for ONBATT in upssched.conf.

	AT ONBATT * START-TIMER upsonbatt 30

8. upssched calls upssched-cmd as the CMDSCRIPT in upssched.conf.

	CMDSCRIPT /path/to/upssched-cmd

9. upssched-cmd knows what to do with "upsonbatt" as its first argument.
   (A quick case..esac construct; see the examples.)

==============================================================================

History
=======

Very old versions of this software only supported the notion of state
files.  Status was written into a static array ("the info array") by
drivers, and that array was stored on disk.  upsd would periodically
read that file into a local copy of that array.

Shared memory mode was added a bit later, and that removed some of the
lag from the status updates.  Unfortunately, it didn't have any locking
originally, and the possibility for corruption due to races existed.

mmap() support was added at some point after that, and became the
default.  The drivers and upsd would mmap() the file into memory and
read or write from it.  Locking was done using the state file as the
token, so contention problems were avoided.  This method was relatively
quick, but it involved at least 3 copies of the data (driver, disk/mmap,
server) and a whole lot of locking and unlocking.  It could occasionally
delay the driver or server when waiting for a lock.

In April 2003, the entire state management subsystem was removed and
replaced with a single local socket.  The drivers listen for
connections and push updates asynchronously to any listeners.  They also
recognize a few commands.  Drivers also dampen updates, and only push
them out when something actually changes.

As a result, upsd no longer has to poll any files on the disk, and can
just select() all of its fds and wait for activity.  When one of them is
active, it reads the fd and parses the results.  Updates from the
hardware now get to upsd about as fast as they possibly can.
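
The select()-based wait can be sketched in Python as follows.  This is
not upsd's code: poll_once(), the buffer handling and the sample message
are assumptions, and a socketpair stands in for a driver connection.

```python
import select
import socket


def poll_once(conns, buffers, timeout=1.0):
    """Wait for activity on any driver socket; return complete lines.

    conns maps each socket to a UPS name.  buffers holds any partial
    line left over from an earlier read on that socket.
    """
    ready, _, _ = select.select(list(conns), [], [], timeout)
    lines = []
    for sock in ready:
        data = sock.recv(4096)
        if not data:
            continue  # peer closed; a real server would drop the socket
        pending = buffers.get(sock, b"") + data
        # Everything before the last newline is complete; keep the rest.
        *complete, buffers[sock] = pending.split(b"\n")
        lines.extend((conns[sock], item.decode()) for item in complete)
    return lines


server_end, driver_end = socket.socketpair()
conns = {server_end: "myups"}
buffers = {}
driver_end.sendall(b'SETINFO STATUS "OB"\n')
updates = poll_once(conns, buffers)
```

Nothing is read from disk and nothing is polled on a timer; the call
sleeps until some driver has data to deliver.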

Before this change, drivers called setinfo() to change the local array,
and then called writeinfo() to push the array onto the disk, or into the
mmap/shared memory space.  This introduced a lag since many drivers poll
quite a few variables during an update.
