NAME
pdcp - copy files to groups of hosts in parallel
SYNOPSIS
pdcp [options]... src [src2...] dest
DESCRIPTION
pdcp is a variant of the rcp(1) command.
Unlike rcp(1), which
copies files to a single remote host, pdcp can copy files to
multiple remote hosts in parallel. However, pdcp does not
recognize files in the format ``rname@rhost:path'', so all source
files must be on the local host machine. Destination nodes must be
listed on the pdcp command line using a suitable target
nodelist option (see the OPTIONS section below). Each
destination node listed must have pdcp installed for the
copy to succeed.
When pdcp receives SIGINT (ctrl-C), it lists the status
of current threads. A second SIGINT within one second terminates
the program. Pending threads may be canceled by issuing ctrl-Z
within one second of ctrl-C. Pending threads are those that have
not yet been initiated, or are still in the process of connecting
to the remote host.
Like pdsh(1), the
functionality of pdcp may be supplemented by dynamically
loadable modules. In pdcp, the modules may provide a new
connect protocol (replacing the standard rsh(1) protocol),
filtering options (e.g. excluding hosts that are down), and/or host
selection options (e.g. -a selects all nodes from a local
config file). By default, pdcp requires at least one "rcmd"
module to be loaded (to provide the channel for remote copy).
RCMD MODULES
The method by which pdcp connects to
remote hosts may be selected at runtime using the -R option
(see OPTIONS below). This functionality is ultimately
implemented via dynamically loadable modules, and so the list of
available options may be different from installation to
installation. A list of currently available rcmd modules is printed
when using any of the -h, -V, or -L options.
The default rcmd module will also be displayed with the -h
and -V options.
A list of rcmd modules currently distributed with
pdcp follows.
- rsh
- Uses an internal, thread-safe implementation of BSD rcmd(3) to run
commands using the standard rsh(1) protocol.
- ssh
- Uses a variant of popen(3) to run
multiple copies of the ssh(1) command.
- mrsh
- This module uses the mrsh(1) protocol
to execute jobs on remote hosts. The mrsh protocol uses
credential-based authentication, forgoing the need to allocate
reserved ports. In other respects, it acts just like rsh.
- krb4
- The krb4 module allows users to execute remote commands after
authenticating with Kerberos. The remote rshd daemons
must be kerberized for this to work.
- xcpu
- The xcpu module uses the xcpu service to execute remote
commands.
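As a rough sketch of how a module might be chosen in practice, the
shell fragment below selects a module for a single copy with -R and
sets a session-wide default through the PDSH_RCMD_TYPE environment
variable. The helper name copy_via and the host range foo[1-3] are
illustrative assumptions, not part of pdcp, and the ssh module must
actually be available in the local installation.

```shell
# copy_via is a hypothetical helper, not something pdcp ships.
# Usage: copy_via MODULE PDCP_ARGS...
copy_via() {
    module=$1
    shift
    # -R picks the rcmd module for this invocation only.
    pdcp -R "$module" "$@"
}

# Alternatively, make ssh the default for every later pdcp call
# started from this shell.
PDSH_RCMD_TYPE=ssh
export PDSH_RCMD_TYPE

# Example call (assumed host names):
#   copy_via ssh -w 'foo[1-3]' /etc/hosts /etc
```

Running pdcp -L beforehand shows which modules are loaded, which is a
quick way to check that the chosen name is valid on a given system.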
OPTIONS
The list of available pdcp options is
determined at runtime by supplementing the list of standard
pdcp options with any options provided by loaded rcmd
and misc modules. In some cases, options provided by modules
may conflict with each other. In these cases, the modules are
incompatible and the first module loaded wins.
Standard target nodelist options
- -w host,host,...
- Target the specified list of hosts. Do not use with any other
node selection options (e.g. -a, -g if they are
available). No spaces are allowed in the comma-separated list. A
list consisting of a single `-' character causes the target hosts
to be read from stdin, one per line. The host list may contain
hostlist expressions of the form ``host[1-5,7]''. For more
information about the hostlist format, see the HOSTLIST
EXPRESSIONS section below.
- -x host,host,...
- Exclude the specified hosts. May be specified in conjunction
with other target node list options such as -a and -g
(when available). Hostlists may also be specified to the -x
option (see the HOSTLIST EXPRESSIONS section
below).
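The stdin form of -w described above can be combined with ordinary
text tools. The following is a minimal sketch, assuming a host file
with one hostname per line; the helper name push_to_listed_hosts and
the convention of skipping blank lines and '#' comments are
assumptions made for this example, not pdcp behavior.

```shell
# push_to_listed_hosts is a hypothetical helper, not part of pdcp.
# Usage: push_to_listed_hosts HOSTFILE SRC DEST
push_to_listed_hosts() {
    hostfile=$1
    src=$2
    dest=$3
    # Drop comment and empty lines ourselves, then hand the remaining
    # hostnames to pdcp on stdin; "-w -" tells pdcp to read them there.
    grep -v -e '^#' -e '^$' "$hostfile" | pdcp -w - "$src" "$dest"
}

# Example call (assumed file name):
#   push_to_listed_hosts hosts.txt /etc/hosts /etc
```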
Standard pdcp options
- -h
- Output usage menu and quit. A list of available rcmd modules
will be printed at the end of the usage message.
- -q
- List option values and the target nodelist and exit without
action.
- -b
- Batch mode. Disable the ctrl-C status feature so that a single
ctrl-C kills the parallel copy.
- -r
- Copy directories recursively.
- -p
- Preserve modification time and modes.
- -l user
- This option may be used to copy files as another user, subject
to authorization. For BSD rcmd, this means the invoking user and
system must be listed in the user's .rhosts file (even for root).
- -t seconds
- Set the connect timeout. Default is 10 seconds.
- -f number
- Set the maximum number of simultaneous remote copies to
number. The default is 32.
- -R name
- Set rcmd module to name. This option may also be set via
the PDSH_RCMD_TYPE environment variable. A list of available rcmd
modules may be obtained via either the -h or -L
options.
- -L
- List info on all loaded pdcp modules and quit.
- -d
- Include more complete thread status when SIGINT is received,
and display connect and command time statistics on stderr when
done.
- -V
- Output pdcp version information, along with a list of
currently loaded modules, and exit.
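To show how several of the standard options fit together, here is a
small sketch of two wrapper functions. Both names (preview_copy and
sync_tree) are hypothetical, and the fanout of 16 and the timeout of
30 seconds are arbitrary values chosen for illustration.

```shell
# Both helpers are hypothetical wrappers, not part of pdcp.

# Show the option values and target nodelist without copying (-q).
# Usage: preview_copy HOSTLIST PDCP_ARGS...
preview_copy() {
    hosts=$1
    shift
    pdcp -q -w "$hosts" "$@"
}

# Recursive copy (-r) that preserves modification time and modes (-p),
# runs at most 16 simultaneous copies (-f) and waits up to 30 seconds
# for each connection (-t).
# Usage: sync_tree HOSTLIST SRC DEST
sync_tree() {
    hosts=$1
    src=$2
    dest=$3
    pdcp -w "$hosts" -r -p -f 16 -t 30 "$src" "$dest"
}

# Example calls (assumed host names and paths):
#   preview_copy 'foo[01-05]' -r /etc/skel /etc
#   sync_tree 'foo[01-05]' /etc/skel /etc
```

Previewing with -q first is a cheap way to confirm that a hostlist
expands to the intended nodes before any files are written.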
HOSTLIST EXPRESSIONS
As noted in sections above,
pdcp accepts ranges of hostnames in the general form:
prefix[n-m,l-k,...], where n < m and l < k, etc., as an
alternative to explicit lists of hosts. This form should not be
confused with regular expression character classes (also denoted by
``[]''). For example, foo[19] does not represent foo1 or foo9, but
rather represents a degenerate range: foo19.
This range syntax is meant only as a convenience on clusters
with a prefixNN naming convention and specification of ranges
should not be considered necessary -- the list foo1,foo9 could be
specified as such, or by the range foo[1,9].
Some examples of range usage follow:
Copy /etc/hosts to foo01,foo02,...,foo05:
pdcp -w foo[01-05] /etc/hosts /etc
Copy /etc/hosts to foo7,foo9,foo10:
pdcp -w foo[7,9-10] /etc/hosts /etc
Copy /etc/hosts to foo0,foo4,foo5:
pdcp -w foo[0-5] -x foo[1-3] /etc/hosts /etc
As a reminder to the reader, some shells will interpret brackets
('[' and ']') for pattern matching. Depending on your shell, it may
be necessary to enclose ranged lists within quotes. For example, in
tcsh, the first example above should be executed as:
pdcp -w "foo[01-05]" /etc/hosts /etc
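To make the range semantics concrete, here is a minimal sketch of an
expander for the simple prefix[...] form described in this section.
It is not pdcp's own parser: expand_hostlist is a hypothetical helper
that handles a single trailing bracket group, assumes each range is
written low-high, and pads numbers to the width of the lower bound of
the range.

```shell
# expand_hostlist is a hypothetical helper, not part of pdcp.
# It prints one hostname per line for an expression such as
# 'foo[01-05]' or 'foo[7,9-10]'. Names without a trailing bracket
# group are printed unchanged.
expand_hostlist() {
    printf '%s\n' "$1" | awk '
    {
        # Look for a bracket group at the end of the name.
        if (match($0, /\[[0-9,-]+\]$/) == 0) { print; next }
        prefix = substr($0, 1, RSTART - 1)
        body = substr($0, RSTART + 1, RLENGTH - 2)
        n = split(body, parts, ",")
        for (i = 1; i <= n; i++) {
            # A part is either a single number or a low-high range.
            m = split(parts[i], ends, "-")
            lo = ends[1]
            hi = (m > 1) ? ends[2] : ends[1]
            # Keep leading zeros by padding to the width of the low end.
            fmt = "%s%0" length(lo) "d\n"
            for (k = lo + 0; k <= hi + 0; k++)
                printf fmt, prefix, k
        }
    }'
}
```

For instance, expand_hostlist 'foo[7,9-10]' prints foo7, foo9 and
foo10 on separate lines, and expand_hostlist 'foo[19]' prints only
foo19. The single quotes keep the shell from treating the brackets as
a pattern.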
ORIGIN
Pdsh/pdcp was originally a rewrite of IBM dsh(1) by Jim
Garlick <garlick@llnl.gov> on LLNL's ASCI
Blue-Pacific IBM SP system. It is now also used on Linux clusters
at LLNL.
LIMITATIONS
When using ssh for remote execution,
expect the stderr of ssh to be folded in with that of the remote
command. When invoked by pdcp, ssh cannot prompt for
confirmation if a host key changes, prompt for passwords if RSA
keys are not configured properly, and so on. Finally, the connect
timeout is not adjustable when ssh is used.
Host range parsing assumes the numerical part of the hostname is at
the end; for example, specifying foo[0-5]bar will not work.
SEE ALSO
pdsh(1)