NAME
lamshrink - Shrink a LAM universe.
SYNTAX
- lamshrink [-dhv] [-w <delay>] <nodeid>
OPTIONS
- -d
- Print detailed debugging information.
- -h
- Print useful information on this command.
- -v
- Be verbose.
- <nodeid>
- Remove the LAM node with this ID.
- -w <delay>
- Notify processes on the doomed node and pause for <delay>
seconds before proceeding.
DESCRIPTION
An existing LAM session, initiated by lamboot(1), can be shrunk to
include fewer nodes with lamshrink. One node is removed for each
invocation. At a minimum, the node ID must be given on the command
line. Once lamshrink completes, the node ID is invalid across the
remaining nodes (as can be seen by running lamnodes(1)).
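Because each invocation removes exactly one node, removing several nodes takes one lamshrink call per node. The following is a minimal sketch of that loop, not part of LAM: the function name shrink_nodes and the LAMSHRINK override variable are hypothetical, and the sketch assumes that lamshrink exits with a non-zero status on failure, which this page does not state.

```shell
#!/bin/sh
# Hypothetical helper (not part of LAM): remove several nodes by calling
# lamshrink once per node, since each invocation removes exactly one node.
# LAMSHRINK is an illustrative override for the command name.
shrink_nodes() {
    delay="$1"
    shift
    for node in "$@"; do
        # -w <delay> warns processes on the node, then pauses before removal.
        # Assumption: lamshrink exits non-zero on failure; stop at the
        # first node that cannot be removed.
        "${LAMSHRINK:-lamshrink}" -w "$delay" "$node" || return 1
    done
}
```

With this sketch, shrink_nodes 10 n3 n4 would warn and remove n3, then n4, pausing 10 seconds before each removal.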
Existing application processes on the target node can be warned
of impending shutdown with the -w option. A LAM signal (SIGFUSE)
will be sent to these processes and lamshrink will then
pause for the given number of seconds before proceeding with
removing the node. By default, SIGFUSE is ignored. A different
handler can be installed with ksignal(2).
All application processes on all remaining nodes are always
informed of the death of a node. This is also done with a signal
(SIGSHRINK), which by default causes a process's runtime route
cache to be flushed (to remove any cached information on the dead
node). If this signal is re-vectored for the purpose of fault
tolerance, the old handler should be called at the beginning of the
new handler. The signal does not, by itself, give the process
information on which node has been removed. One technique for
getting this information is to query the router for information on
all relevant nodes using getroute(2). The query for the dead node
returns an error.
FAULT TOLERANCE
If enabled with lamboot(1), LAM will watch for nodes that fail. The
procedure for removing a node that has failed is the same as the one
lamshrink follows after the warning step. In particular, the
SIGSHRINK signal is delivered.
EXAMPLES
- lamshrink -v n1
- Remove LAM on n1. Report about important steps as they are done.
- lamshrink n30 -w 10
- Inform all processes on LAM node 30 that the node will be dead
in 10 seconds. Wait 10 seconds and remove the node. Operate
silently.
SEE ALSO
lamboot(1),
lamnodes(1),
ksignal(2),
getroute(2)