NAME
nettee - a network "tee" program
SYNOPSIS
nettee [options]
DESCRIPTION
nettee passes a data stream to one or more child nodes
using a daisychain method. On each node nettee may also
direct the stream to a file or pipe. nettee allows large
amounts of data to be quickly distributed to multiple nodes on a
network at a rate limited only by the network bandwidth. The
distribution chain is typically linear for each network switch but
may branch when nodes utilize multiple switches. For maximum
throughput only one instance of nettee should utilize each
network interface.
When nettee starts it waits for a connection from the
upstream node before attempting to connect to its downstream nodes.
Consequently nettee may be started on the nodes in any order
(by a script, rsh, ssh, and so forth). Typically only the node that
reads the data stream from stdin or a file will be set to log
messages, so that the progress of the transfer may be monitored.
Transmission errors are detected by comparing the total number of
bytes read by each child node with the number of bytes transmitted
to that child.
Error Handling
By default severe errors cause the entire
chain to abort. With the -conwf and -colwf
options nettee may be instructed to do its best to continue
processing in the event of certain write failures of the data
stream. Note that failures which occur while the distribution chain
is forming are still fatal events. To allow the program to continue
with a truncated or alternate chain when chain formation errors are
encountered, use the -connf option, and optionally
specify alternate targets in each hostlist. If the node above the
failed node is allowed to emit messages and errors (for instance:
-v 5), messages similar to these will be sent to the log
destination (-log):
Failures detected in child 0 [node34]: NWF
Failures detected in child 1 [node35]: NONE
Failures detected in chain: NWF
The first type of message describes the failures that were
detected in a named child node, that is, one of those named in the
-next option. The second type of message describes failures that
were detected anywhere further on in the chain. The error codes
currently defined are: NONE (no errors), NWF (network
write failure), LWF (local write failure), BBC (child
returned incorrect byte count), BSTAT (child returned unknown
or bad status), and NNF (could not connect to one or more
downstream chain nodes).
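The messages above are regular enough to be scanned by a script. The following is a minimal sketch in Python, not part of nettee, that assumes log lines have exactly the shape of the sample messages shown above and that each line carries a single error code. The names parse_failures, CHILD_RE, and CHAIN_RE are hypothetical.

```python
import re

# Error codes and meanings as listed in the text above.
ERROR_CODES = {
    "NONE": "no errors",
    "NWF": "network write failure",
    "LWF": "local write failure",
    "BBC": "child returned incorrect byte count",
    "BSTAT": "child returned unknown or bad status",
    "NNF": "could not connect to one or more downstream chain nodes",
}

# Assumption: line shapes are copied from the sample messages; real
# output may carry prefixes (for example a node name), so search()
# is used rather than match().
CHILD_RE = re.compile(r"Failures detected in child (\d+) \[([^\]]+)\]: (\S+)")
CHAIN_RE = re.compile(r"Failures detected in chain: (\S+)")


def parse_failures(log_text):
    """Return (children, chain): children maps node name to its code,
    chain is the code reported for the rest of the chain (or None)."""
    children = {}
    chain = None
    for line in log_text.splitlines():
        m = CHILD_RE.search(line)
        if m:
            children[m.group(2)] = m.group(3)
            continue
        m = CHAIN_RE.search(line)
        if m:
            chain = m.group(1)
    return children, chain


sample = """Failures detected in child 0 [node34]: NWF
Failures detected in child 1 [node35]: NONE
Failures detected in chain: NWF
"""

children, chain = parse_failures(sample)
for name, code in children.items():
    print(name, code, ERROR_CODES.get(code, "unrecognized code"))
print("chain", chain)
```

Codes that are not in the table are reported as unrecognized rather than rejected, since the list of codes is described only as "currently defined".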
Exit Status
nettee will normally emit an
EXIT_SUCCESS status (0 on Unix). This is true even if
errors were detected and handled in the node itself or in a child
node. nettee will emit an EXIT_FAILURE status if it
was forced to close by an unhandled event such as a timeout, write
failure, or unexpected socket closure.
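Because a zero exit status does not rule out handled failures, a wrapper script would normally check both the exit status and the log. The sketch below, in Python, shows only the exit status half. The function name run_and_classify is hypothetical, and a Python one-liner stands in for the nettee command so that the sketch runs without nettee installed.

```python
import subprocess
import sys


def run_and_classify(argv):
    """Run a command and classify its exit status.

    0 (EXIT_SUCCESS) means the process ran to completion, which per the
    text above may still include handled errors; anything else means it
    was forced to close.
    """
    result = subprocess.run(argv)
    return "completed" if result.returncode == 0 else "forced to close"


# Stand-in commands; replace with a real nettee command line.
ok = run_and_classify([sys.executable, "-c", "raise SystemExit(0)"])
bad = run_and_classify([sys.executable, "-c", "raise SystemExit(1)"])
print(ok)
print(bad)
```

A "completed" result should still be followed by a look at the -log destination for "Failures detected" messages.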
OPTIONS
- -h
- Print help information.
- -hexamples
- Print examples.
- -herrors
- Print error status codes.
- -i
- Print version, license, and copyright information.
- -in <SRC>
- Reads data from <SRC> which may have one of four
values: nettee reads from the upstream node; - reads
from stdin; socket reads the output of a command from a
socket; filename reads from a file. If no -in option
is present the program reads data from the upstream node.
- -out <DST>
- Writes data locally to <DST> which may have one of
four values: none writes nothing locally; - writes
to stdout; socket writes the datastream to a command through
a socket; filename writes to a file. If no -out
option is present the program writes data to stdout.
- -next <HOSTLISTS>
- Writes data to downstream destination[s]
hostlist1(,hostlist2(,hostlist3(...))) where the hostlist
entries are separated by commas or spaces. A hostlist consists of
either a single hostname, or a comma separated list of hostnames
enclosed in square brackets. Example:
node1,[node2,node3],[node4,node5,node6],node7. The bracketed
form allows for automatic failover if unreachable nodes are
encountered and if -connf is specified. The first hostname
in the list is tried, then the next, and so on. There may be 1-8
hostlists. The number of hostlists controls the topology of the
distribution chain. Use a linear distribution chain (a single
hostlist) when all nodes share a single network switch. Use a
forked distribution chain (multiple hostlists) when nodes are
connected to two or more network switches. The End of Chain
condition (no downstream write) is indicated by a
<HOSTLISTS> value of . , , or _EOC_. An End
of Chain condition is also indicated by the absence of a
-next option. If End of Chain is indicated there may not be
any other hostlists specified.
- -cmd <COMMAND>
- Specifies the command to use in conjunction with an -in
socket or -out socket option. Since only a single
<COMMAND> may be specified, socket may not be
applied to both -in and -out at the same time. When
-cmd is used with -in socket a child process running
<COMMAND> reads data from a disk or other device and
writes the resulting data stream to stdout. When -cmd is
used with -out socket a child process running
<COMMAND> reads the datastream from stdin and writes
the processed data to a disk or other device. Typically the
<COMMAND> string invokes tar or some other
archiving program. In some instances using sockets and -cmd
will be faster than using the same command in a pipe due to the
larger buffer size used for the socket. Run nettee
-hexamples to see a usage example.
- -stm <EOS>
- Stream text through a nettee chain until the string
<EOS> is encountered, then exit. This allows short
text messages to traverse the chain without waiting for a buffer to
fill. Since the text message can very rapidly traverse the
nettee chain it can be piped into execinput (or any
other program that will execute its stdin as commands) to produce
essentially simultaneous execution on all target nodes. The
<EOS> string is not passed through the data chain and
its length is ignored. When used to start further nettee
processes on the target nodes <PORT> values must be
chosen to avoid interference. While this mode may be convenient for
setting up Beowulf nodes it is exceedingly dangerous for general
use since any command introduced into the command stream will
execute on all chain nodes as if submitted by the owner of the
nettee process on that node. Run nettee -hexamples to see a
usage example.
- -name <STRING>
- Specify the node name used in messages (<=127 characters).
If not supplied, the values of the environment variables
MYHOSTNAME and HOSTNAME are first checked, and if
those are not defined, the result of a gethostname() call is
used.
- -log <LDST>
- Errors and messages are written to <LDST> which
may have one of two values: - writes to stderr or
filename writes to a file. If no -log option is
present the program writes messages to stderr.
- -p,-port <PORT>
- First of two consecutive ports used for communication. If no
-port option is present the program uses the default value
of 9997.
- -v <VERBOSE>
- <VERBOSE> is a bit mask which controls the types
of warning and error messages which are sent to the -log
destination. Bit values indicate: 1 show error messages;
2 show command line settings; 4 show messages;
8 show periodic status messages during transfer; 16
prepend nodename to all messages. Use a <VERBOSE>
value of 0 to eliminate all messages. If no -v is present
the program uses a default <VERBOSE> value of 1.
- -q
- Suppress "ignored signal" messages.
- -t <WAIT>
- Wait up to <WAIT> seconds for a connection from
upstream in the chain to form or data to be received. If neither of
these events occurs, exit with an error. A value of 0 waits
forever and will only exit on an end of data condition. If no
-t is present the program uses a default <WAIT>
value of 0. The -connf <WAIT> and -w
options control timeouts for downstream connections.
- -w
- Wait for the next node to boot or attach to the network. If not
specified and the next node is not reachable, nettee will
exit with an error no matter what the -t <WAIT>
and -connf <WAIT> timeout values are.
- -colwf
- Continue on Local Write Failure. Normally the failure of a
write of the data stream to the local output will be fatal and the
entire distribution chain will collapse immediately. (Typically
this happens when data is written to disk and a partition fills or
there is an ownership problem. A complete disk failure may
initially present this way but often goes on to crash the node,
resulting also in a network write failure.) When -colwf is
set and a local write failure occurs on a node that node will
continue to relay data down the chain. The node that failed will
not have correctly processed the data stream locally but all other
nodes will be unaffected by this failure. The top node will emit an
error message when this occurs so that a subsequent analysis with
other tools may locate the node(s) which failed. This option may
only be employed on a node that reads data from an upstream
node.
- -conwf
- Continue on Network Write Failure. Normally the failure of a
write of the data stream to the next node will be fatal and the
entire distribution chain will collapse immediately. (Typically
this happens when a node crashes while nettee is running.) When
-conwf is set and a network write failure occurs on a node
(indicating that the next node has failed) the node will continue
to process the data stream locally but will make no further
attempts to transfer data to the next node in the chain. This
allows the data transfer to complete on a chain down to the node
above a failed node. The top node will emit an error message when
this occurs so that a subsequent analysis with other tools may
locate the node(s) which failed. This option may only be employed
on a node that reads data from an upstream node.
- -connf <WAIT>
- Continue on Next Node Failure. Give the first node in a hostlist
<WAIT> seconds to join the chain. If it does not join, each
successive host in the hostlist is given <WAIT>
seconds in turn, and if none succeeds, no data will be sent to any
of those hosts. If -connf is not specified or the wait time
is set to zero seconds, the program will wait forever for a
connection to the first node in each hostlist.
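The hostlist syntax described under -next, together with the failover order used when -connf is set, can be illustrated with a short Python sketch. This is not nettee's own parser. The names parse_hostlists and first_reachable are hypothetical, malformed brackets are not diagnosed, End of Chain values are not handled, and reachability is simulated with a plain set of names.

```python
import re

MAX_HOSTLISTS = 8  # the -next description allows 1-8 hostlists


def parse_hostlists(spec):
    """Split a -next style value into a list of hostlists.

    Each hostlist is a list of hostnames: a bare hostname becomes a
    one-element list, a bracketed group becomes a multi-element list.
    """
    # A token is either a bracketed group or a bare hostname; commas
    # and spaces between tokens are separators.
    tokens = re.findall(r"\[[^\]]*\]|[^\s,\[\]]+", spec)
    hostlists = []
    for tok in tokens:
        if tok.startswith("["):
            names = [n for n in re.split(r"[,\s]+", tok[1:-1]) if n]
        else:
            names = [tok]
        if not names:
            raise ValueError("empty hostlist")
        hostlists.append(names)
    if not 1 <= len(hostlists) <= MAX_HOSTLISTS:
        raise ValueError("expected 1-8 hostlists, got %d" % len(hostlists))
    return hostlists


def first_reachable(hostlist, reachable):
    """Failover order: try each name in turn and return the first one
    found in the (simulated) set of reachable hosts, or None."""
    for name in hostlist:
        if name in reachable:
            return name
    return None


hostlists = parse_hostlists("node1,[node2,node3],[node4,node5,node6],node7")
for hl in hostlists:
    print(hl)

# Simulate node2 and node4 being down.
up = {"node1", "node3", "node5", "node6", "node7"}
targets = [first_reachable(hl, up) for hl in hostlists]
print(targets)
```

A result of None for a hostlist corresponds to the case where no host in that list joins and no data is sent to any of them.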
RELATED PROGRAMS
netcat(1).
nettee is derived from Felix Rauch's dolly which
is available here:
The nettee home page is:
COPYRIGHTS
Copyright: 2007 David Mathog and Caltech.
Copyright: Felix Rauch and ETH Zurich
LICENSE
Freely distributed under version 2 of the GNU General
Public License (GPL 2).
AUTHOR
David Mathog
Biology Division, Caltech