NAME
nget - retrieve files from NNTP (usenet news) hosts
SYNOPSIS
nget [...]
DESCRIPTION
nget retrieves messages matching a
regular expression, and decodes any files contained within.
Multipart messages are automatically pieced together. Parts from
multiple servers will be combined if needed.
OPTIONS
The order options are specified is significant. In general, an
option will only affect options that come after it on the command
line.
- -q/--quiet
- When specified once, will disable printing of auto-updating
text to allow the output to be redirected/logged without garbage in
it. When specified twice, will disable printing of merely
informative messages. Errors will still be printed.
- -h/--host host
- Force only the given host to be used for subsequent commands.
(Must be configured in .ngetrc.) Can reset to standard
auto-choosing method with -h ""
- -a/--available
- Update the list of available newsgroups. Subsequent -r/-R
commands can be use to search for newsgroups.
- -A/--quickavailable
- Like -a/--available, but does not update the list, only makes
it available for searching.
- -X/--xavailable
- Search the group list, but without loading cache file or
retrieving full group list. Instead, the search will be done on the
server. Compared to -a/-A this has the advantage of not requiring
any disk space for cache files, and not requiring the initial
retrieval of the full group list. The disadvantages are not all
servers supporting the required NNTP extensions, the inability to
use complex regexs due to the need to convert it to the simpler
wildmat format, and the possibility that the commands can be quite
slow if the server is overloaded (you may need to increase the
timeout value in some cases).
- -g/--group group(s)
- Update the list of available files in group(s). Multiple groups
can be specified by seperating them with commas. All cached groups
can be selected with "*". If a host has been specified before with
-h, it will retrieve headers only from that host. Otherwise it will
retrieve headers for all hosts above _glevel (see configuration
section for more info on priorities.) Subsequent -r/-R commands can
be used to retrieve files.
- -G/--quickgroup group(s)
- Like --group, but does not retrieve new headers.
- -x/--xgroup group(s)
- Use group(s) for subsequent -r commands, but without loading
cache file or retrieving full header list. Instead, the XPAT
command will used to retrieve only the matching headers. Compared
to -g/-G this has the advantage of not requiring any disk space for
cache files, and not requiring the initial retrieval of the full
header list. The disadvantages are not all servers supporting XPAT,
the inability to use complex regexs due to the need to convert it
to the simpler wildmat format, and the possibility that the xpat
command can be quite slow if the server is overloaded (you may need
to increase the timeout value in some cases).
- -F/--flushserver host
- Following -g/-G: Flush all headers for server from current
group(s).
Following -a/-A: Flush all groups/descriptions for server from
grouplist.
- -r/--retrieve regex
- Following -g/-G/-x: Matches regex against subjects of
previously selected group(s), and retrieves ones that match.
Following -a/-A: Matches regex against newsgroup names and
descriptions and lists ones that match. (-T required)
- -R/--expretrieve expression
- Like -r, but matches expression instead of merely a regexp.
(see EXPRETRIEVE EXPRESSIONS section for more info.) Expression is
a postfix expression that can contain these keywords:
Following -g/-G: subject, author, lines, bytes, have, req, date,
age, update, updateage, messageid(or mid), references. Note that
the --limit argument does not affect the option, if you want to
limit based on number of lines, add it as part of the
expression.
Following -a/-A: group, desc.
- -@/--list LISTFILE
- Specify a file to load a list of command line args from. Looks
in ~/.nget5/lists/ dir by default. A # char in a listfile that is
the first character on a line or is preceeded by whitespace and not
quoted starts a comment which lasts until the end of the line.
- -p/--path DIRECTORY
- Path to store subsequent retrieves. Also sets -P, and clears
previously specified dupepaths. Relative to path which nget was
started in. (Except in the case of inside a -@, which will be
relative to the cwd at the time of the -@.)
- -P/--temppath DIRECTORY
- Store temporary files in path instead of the current dir.
- --dupepath DIRECTORY
- Check for dupe files from specified path in addition to normal
path. Can be specified multiple times.
- -m/--makedirs no,yes,ask,<max # of directory levels to
create>
- Make dirs specified by -p and -P. Default is no. If yes, will
make dirs automatically. If #, if the number of directories that
would need to be created is greater than the number given, the
answer will be interpreted as no. If ask, nget will prompt the user
when trying to change to a dir that does not exist. Valid responses
to the prompt are y[es], n[o], and a max number of directory levels
to create. (This means that if you get in the habit of answering
"1" rather than "y", and one day typo the first portion of a path
you won't accidentally create a bunch of dirs in the wrong place.)
- -T/--testmode
- Causes --retrieve to merely print out all matching files.
- --text ignore,files,mbox[:filename]
- Specifies how to handle text posts. The default is files. OPT
can be ignore to save only binaries, "files" to save each text post
in a different file, and "mbox" to save each text post as a message
in a mbox format mailbox. The name of the mbox file to save in can
be specified with mbox:filename, the default is nget.mbox. If the
filename ends in .gz, it will automatically be gzipped. Unless the
filename has an absolute path, it is interpreted as relative to the
retrieve path.
- --save-binary-info yes,no
- Specifies whether to save text messages for posts that
contained only binary data. (If you want to see the headers.)
- --test-multiserver OPT
- Causes testmode to display which servers have parts of each
file. OPT may be no to disable(default), long for a verbose output,
and short for a more condensed form. (In short mode, the shortname
of each server is printed with no seperating space, and it is
upper-cased if that server does not have all the parts. If the
server has no shortname specified, it defaults to the first char of
the server alias.)
- --fullxover OPT
- Override the fullxover settings of the config file. The default
is -1, which doesn't override.
- -M/--mark
- Mark matched files as retrieved.
- -U/--unmark
- Unmark matched files as retrieved. (Automatically sets -dI)
- -t/--tries int
- Set maximum number of retries. -1 will retry indefinatly
(probably not a good idea).
- -l/--limit int
- Set the minimum number of lines a message (or total number of
lines for a multi-part message) must have to be considered for
retrieval.
- -L/--maxlines int
- Set the maximum number of lines a message must have to be
considered for retrieval. (-1 for unlimited)
- -s/--delay int
- Set the number of seconds to wait between retry attempts.
- --timeout int
- Set the number of seconds to wait for a reply from the nntp
server before giving up.
- -i/--incomplete
- Retrieve files with missing parts.
- -I/--complete
- Retrieve only files with all parts.
- --decode
- Decode and delete temp files (default)
- -k/--keep
- Decode and keep temp files.
- -K/--no-decode
- Keep temp files, and don't try to decode them.
- -c/--case
- Match case sensitively.
- -C/--nocase
- Match case insensitively.
- --autopar
- Enable automatic parfile handling. (default) Only download as
many par files as needed to replace missing or corrupt files.
- --no-autopar
- Disable automatic parfile handling. All parfiles that match the
expression will be downloaded.
- -d/--dupecheck FLAGS
- Check to make sure you don't already have files. This is done
in two ways. The first ("f") is by compiling a list of all files in
the current directory, then checking against all messages to be
retrieved to see if one of the filenames shows up in the subject.
This works reasonably well, though sometimes the filename isn't in
the subject. It can also cause problems if you happen to have files
in the directory named silly things like "a", in which case all
messages with the word "a" in them will be skipped. However, it is
still smart enough not to skip messages that merely have a word
containing "a".
The second method ("i") is by setting a flag in the header cache
that will prevent it from being retrieved again. You can use combos
such as -dfi to check both, -dFi to only check the flag, -dfI to
only check files, etc.
The third ("m") will cause files that are found by the dupe file
check ("f") to be marked as retrieved in the cache. (Useful for
handling crossposted binaries and/or binaries saved with another
newsreader.)
- -D/--nodupecheck
- Don't check either of the --dupecheck methods, retrieve any
messages that match.
- -N/--noconnect
- Do not connect to any server for retrieving articles. Useful
for trying to decode as much as you have. (if you got stuff with -K
or ngetlite.)
- -w/--writelite LITEFILE
- Write a list of parts to retrieve with ngetlite.
- --help
- Show help.
EXPRETRIEVE EXPRESSIONS
Expressions are in postfix order.
For the int, date, and age types, standard int comparisons are
allowed (==, !=, <, <=, >, >=). For regex types,
==(=~), !=(!~) are allowed. Thus a comparison would take the
following form:
Infix: <keyword> <operator> <value> Postfix:
<keyword> <value> <operator> Comparisons can be
joined with &&(and), ||(or).
Infix: <comparison> && <comparison> Postfix:
<comparison> <comparison> &&
-g/-G keywords
- subject (regex)
- Matches the Subject: header.
- author (regex)
- Matches the From: header.
- lines (int)
- Matches the Lines: header.
- bytes (int)
- Matches the length of the message in bytes
- have (int)
- Matches the number of parts of a multipart file that we have.
- req (int)
- Matches the total number of parts of a multipart file.
- date (date)
- Matches the Date: header. All the standard formats are
accepted.
- age (age)
- Matches the time since the Date: header.
Format: [X y[ears]] [X mo[nths]] [X w[eeks]] [X d[ays]] [X h[ours]]
[X m[inutes]] [X s[econds]]
Ex.: "6 months 7 hours 8 minutes"
Ex.: "6mo7h8m"
- update (date)
- Matches the "update time" of the cache item. That is, the most
recent time that a new part of the file has been added. For
example, if part 1 was added one day, and part 2 only appeared on
the server the next day, then the update time would be when part 2
was added on the second day. But if both parts were seen on the
first day, then seen again from a different server on the second
day, the update time would stay at the original value.
- updateage (age)
- Matches the time since the update of the cache item.
- messageid (regex), mid (regex)
- Matches the Message-ID header. (For multi-part posts, it
matches the message-id of the first part.)
- references (regex)
- Matches any of the message's References.
-a/-A keywords
- group (regex)
- Matches the newsgroup name.
- desc (regex)
- Matches the newsgroup description.
CONFIGURATION
Upon startup, nget will read ~/.nget5/.ngetrc
for default configuration values and host/group aliases. An example
.ngetrc should have been included with nget. nget will also check
~/_nget5/ and _ngetrc if needed, to handle OS and filesystems that
can't (or won't) handle files starting with a period. Options are
specified one per line in the form:
- key=value
Values may be strings(any sequence of
characters ending in a newline, not quoted), integers(whole
numbers), floats(decimal numbers), boolean(0=false/1=true).
Subsections are specified in the form:
- {section_name
- data
}
where data is any number of
options.
Global Configuration Options
- limit (int, default=0)
- Default value for -l/--limit
- tries (int, default=20)
- Default value for -t/--tries
- delay (int, default=1)
- Default value for -s/--delay
- usegz (int, default=-1)
- Default gzip compression level to use for cache/midinfo files
(can be overridden on a per-group basis). Acceptable values are
-1=zlib default, 0=uncompressed, and 1-9.
- timeout (int, default=180)
- Seconds to wait for a reply from the nntp server before giving
up.
- maxstreaming (int, default=64)
- Sets how many xover commands will be sent at once, when using
fullxover. maxstreaming=0 will disable streaming. Note that setting
maxstreaming too high can cause your connection to deadlock if the
write buffer is filled up and the write command blocks, but the
server will never read more commands since it is waiting for us to
read what it has already sent us.
- maxconnections (int, default=-1)
- Maximum number of connections to open at once, -1 to allow
unlimited open connections. When reached, the servers used least
recently will be disconnected first. (Note that regardless of this
setting, nget never opens more than one connection per server.)
- idletimeout (int, default=300)
- Max seconds to keep an idle connection to a nntp server open.
- curservmult (float, default=2.0)
- Priority multiplier given to servers which are currently
connected. This can be used to avoid excessive server switching.
(Set to 1.0 if you want to disable it.)
- penaltystrikes (int, default=3)
- Number of consecutive connect errors before penalizing a
server, -1 to disable penalization.
- initialpenalty (int, default=180)
- Number of seconds to ignore a penalized server for.
- penaltymultiplier (float, default=2.0)
- Multiplier for penalty time for each time the penalty time runs
out and the server continues to be down.
- case (boolean, default=0)
- Default for regex case sensitivity. (0=-C/--nocase,
1=-c/--case)
- complete (boolean, default=1)
- Default for incomplete file filter. (0=-i/--incomplete,
1=-I/--complete)
- dupeidcheck (boolean, default=1)
- Default for already downloaded file filter. (0=-dI, 1=-di)
- dupefilecheck (boolean, default=1)
- Default for duplicate file filter. (0=-dF, 1=-df)
- autopar (boolean, default=1)
- Default for automatic par handling. (0=--no-autopar,
1=--autopar)
- autopar_optimistic (boolean, default=0)
- One problem with automatic par handling, is that sometimes
people do multi-day posts and post the par files first. If
autopar_optimistic is enabled, it will assume that when there
aren't enough .pxx files, that it must just be a multi-day post and
will not grab any pxx files. If autopar_optimistic is off, it grab
all the pxx files so that if they expire before more are posted, we
will already have them.
- quiet (boolean, default=0)
- Default for quiet option. (0=normal, 1=-q)
- tempshortnames (boolean, default=0)
- 1=Use 8.3 tempfile names (for old dos partitions, etc), 0=Use
17.3 tempfile names
- fatal_user_errors (boolean, default=0)
- Makes user/path errors cause an immediate exit rather than
continuing if possible.
- unequal_line_error (boolean, default=0)
- If set, downloaded articles whose actual number of lines does
not match the expected value will be regarded as an error and
ignored. If 0, a warning will be generated but the article will be
accepted.
- fullxover (int, default=0)
- Controls whether nget will check for articles added or removed
out of order when updating header cache. fullxover=0 will follow
the nntp spec and assume articles are always added and removed in
the correct order. fullxover=1 will assume articles may be added
out of order, but are still removed in order. fullxover=2 handles
articles being added and removed in any order.
- makedirs (special, default=no)
- Create non-existant directories specified by -p/-P?
(yes/no/ask/#)
- test_multiserver (special, default=no)
- Display multiserver file complition info in testmode output?
(no=no, short=show shortname of each server that has parts of the
file, lowercase when complete and uppercase when that server only
has some parts, long=show fullname of each server along with a
count of how many parts it has if it does not have them all.)
- text (special, default=files)
- Default for the --text option (possible values are
ignore,files,mbox[:filename]).
- save_binary_info (boolean, default=0)
- Default for the --save-binary-info option.
- cachedir (string)
- Specifies a different location to store cache files. Could be
used to share a single cache dir between a trusted group of users,
to reduce HD/bandwidth usage, while still allowing each user to
have their own config/midinfo files.)
Host Configuration
Host configuration is done in the halias
section, with a subsection for each host containing its options:
- address (string, required)
- Address of the server, with optional port number seperated by a
colon. To specify a literal IPv6 address with a port number, use
the format "[address]:port".
- id (int, required)
- An identifier for this server. The id uniquely identifies a
certain set of header cache data. You may specify the same id in
more than one host, for example if you have multiple accounts on a
server to avoid to storing the same cache data multiple times. The
id should not be changed after you have used it. Must be greater
than 0 and less than ULONG_MAX. (usually 4294967295).
- shortname (string, default=first character of host alias)
- The shortname to use for this server.
- user (string)
- Username for the server, if it requires authorization.
- pass (string)
- Password for the server, if it requires authorization.
- fullxover (int)
- Override global fullxover setting for this server only.
- maxstreaming (int)
- Override global maxstreaming setting for this server only.
- idletimeout (int)
- Override global idletimeout setting for this server only.
- linelenience (special, default=0)
- The linelenience option may be specified as either a single
int, or two ints seperated by a comma. If only a single int, X is
specified, then it will be interpeted as shorthand for "-X,+X".
These values specify the ammount that the real (recieved) number of
lines (inclusive) for an article may deviate from the values
returned by the server in the header listings. For example, "-1,2"
means that the real number of lines may be one less than, equal to,
one greater than, or two greater than the expected amount. For
example, the following host section defines a single host "host1",
with nntp authentication for user "bob", password "something", and
the fullxover option enabled.
- {halias
- {host1
- addr=news.host1.com
id=3838
user=bob
pass=something
fullxover=1
linelenience=-1,2
}
}
Server Priority Configuration
Multiserver priorities are
defined in the hpriority section. Multiple priority groups can be
made, and different newsgroups can be configured to use their own
priority grouping, or they will default to the "default" group. The
-a option will use the "_grouplist" priority group if it exists,
otherwise it will use the "default" group. The hpriority section
contains a subsection for each priority group, with data items of
server=prio-multiplier, and the special items _level=float and
_glevel=float. _level sets the priority level assigned to any host
not listed in the group, and _glevel sets the required priority
needed for -g and -a to automatically use that host. Both _level
and _glevel default to 1.0 if not specified. The priority group
"trustsizes" also has special meaning, and is used to choose which
servers reporting of article line/byte counts to trust when
reporting to the user. For example, the following section defines
the default priority group and the trustsizes priority group. If
all hosts have a certain article, goodhost will be most likely to
be chosen, and badhost least likely. It also sets the default
priority level to 1.01, meaning any hosts not listed in this group
will have a priority of 1.01. When using -g without first
specifying a host, only those with prios 1.2 or above will be
selected.
- {hpriority
- {default
- _level=1.01
_glevel=1.2
host1=1.9
goodhost=2.0
badhost=0.9
}
{trustsizes
- goodhost=5.0
badhost=0.1
}
}
Newsgroup Alias Configuration
Newsgroup aliases are defined
in the galias section. An alias can be a simple alias=fullname data
item, or a subsection containing group=, prio=, and usegz= items.
The per-group usegz setting will override the global setting. An
alias can also refer to multiple groups (either fullnames or
further aliases). For example, the following galias section defines
an alias of "abpl" for the group "alt.binaries.pictures.linux",
"chocobo" for the group "alt.chocobo", and ospics for both
alt.binaries.pictures.linux and alt.binaries.pictures.freebsd. In
addition, the chocobo group is assigned to use the chocoprios
priority grouping when deciding what server to retrieve from.
- {galias
- abpl=alt.binaries.pictures.linux
{chocobo
- group=alt.chocobo
prio=chocoprios
}
ospics=abpl,alt.binaries.pictures.freebsd
}
EXIT STATUS
On exit, nget will display a summary of the
run. The summary is split into three parts:
- OK
- Lists successful operations.
-
- total
- Total number of "logical messages" retrieved (after joining
parts).
- uu
- Number of uuencoded files.
- base64
- Number of Base64 (Mime) files.
- XX
- Number of xxencoded files.
- binhex
- Number of Binhex encoded files.
- plaintext
- Number of plaintext files saved.
- qp
- Number of Quoted-Printable encoded files.
- yenc
- Number of yEncoded files.
- dupe
- Number of decoded files that were exact dupes of existing
files, and thus deleted.
- skipped
- Number of files that were queued to download but turned out to
be dupes after decoding earlier parts and comparing their filenames
to the subject line. (Same method thats used for the dupe file
check when queueing them up, just that the filename(s) of any
decoded files cannot be known until they are downloaded, so some of
the checking must occur during the run rather than at queue time.)
- group
- Number of groups successfully updated.
- grouplist
- Newsgroup list successfully updated.
- autopar
- Number of parity sets that are complete.
- WARNINGS
-
-
- group
- Updating group info failed for some (but not all) attempted
servers.
- xover
- Weird things happened while updating group info.
- grouplist
- Updating newsgroup list failed for some (but not all) attempted
servers.
- retrieve
- Article retrieval failed for some (but not all) attempted
servers.
- undecoded
- Articles were not decoded (usually because -K was used).
- unequal_line_count
- Some articles retrieved had different line counts than the
server said they should. (And unequal_line_error is set to 0).
- dupe
- Number of decoded files that had the same name as existing
files, but different content.
- autopar
- Weirdness encountered reading par files, such as encountering
unknown par versions, or non-ascii filenames in the pars.
- ERRORS
- Lists errors that occured. In addition, the exit status will be
set to a bitwise OR of the codes of all errors that occured. (Note
that some errors share an exit code, since there are only 8 bits
available.)
-
- decode (exit code 1)
- Number of file decoding errors.
- autopar (exit code 2)
- Number of parity sets that could not be completed.
- path (exit code 4)
- Errors changing to paths specified with -p or -P.
- user (exit code 4)
- User errors, such as trying -r without specifying a group
first.
- retrieve (exit code 8)
- Number of times article retrieval failed for all attempted
servers.
- group (exit code 16)
- Number of times header retrieval failed for all attempted
servers.
- grouplist (exit code 32)
- Number of times newsgroup list retrieval failed for all
attempted servers.
- fatal (exit code 128)
- Error preventing further operation, such as "No space left on
device".
- other (exit code 64)
- Any other kind of error.
EXAMPLES
The simplest possible example. Retrieve and decode
everything from alt.binaries.test that you haven't already gotten
before:
nget -g alt.binaries.test -r "" get listing of all files
matching penguin.*png from alt.binaries.pictures.linux (note this
is a regex, equivilant to standard shell glob of penguin*png.. see
the regex(7) or
grep manpage for more info on regular expressions.)
nget -g alt.binaries.pictures.linux -DTr "penguin.*png"
retrieve all the ones that have more than 50 lines:
nget -g alt.binaries.pictures.linux -l 50 -r "penguin.*png"
equivilant to above, using -R:
nget -g alt.binaries.pictures.linux -R "lines 50 > subject
penguin.*png == &&"
(basically (lines > 50) && (subject == penguin.*png))
flush all headers from host goodhost in group
alt.binaries.pictures.linux:
nget -Galt.binaries.pictures.linux -Fgoodhost
retrieve/update group list, and list all groups with "linux" in the
name or description:
nget -a -Tr linux equivilant to above, using -R:
nget -a -TR "group linux == desc linux == ||" flush all
groups from host goodhost in grouplist:
nget -A -Fgoodhost
NOTES
Running multiple copies of nget at once should be
safe. It uses file locking, so there should be no way for the files
to actually get corrupted. However if you have two ngets doing a -g
on the same group at the same time, it would duplicate the download
for both processes. If you are using -G there is no problem at all.
(Theoretically you might be able to cause some sort of problems by
downloading the same files from the same group in the same
directory at the same time..)
ENVIRONMENT
- HOME
- Where to put .nget5 directory. (put nget files $HOME/.nget5/)
- NGETHOME
- Override HOME var (put nget files in $NGETHOME)
- NGETCACHE
- Override HOME/NGETHOME vars and .ngetrc cachedir option (put
nget cache files in $NGETCACHE)
- NGETRC
- Alternate configuration file to use.
FILES
- ~/.nget5/
- All configuration and cache files are stored here. Changed to
.nget5/ because cache format changed in nget 0.27. (The 5 in the
directory name is for file format version 5, not nget version 5.)
To upgrade a .nget4 directory to .nget5, simply run "mv ~/.nget4
~/.nget5 ; rm ~/.nget5/*,cache*"
- ~/.nget5/.ngetrc
- Configuration file. If you store authentication information
here, be sure to set it readable only by owner.
- ~/_nget5/_ngetrc
- Alternate location, use this if you can't create a dir/file
starting with a period.
- ~/.nget5/lists/
- Default directory for listfiles.
AUTHOR
Matthew Mueller <donut AT dakotacom.net> The
latest version, and other programs I have written, are available
from:
http://www.dakotacom.net/~donut/programs/
ACKNOWLEDGEMENTS
Frank Pilhofer, author of uulib, which
nget depends upon for uudecoding the files once they are
downloaded. http://www.fpx.de/fp/Software/UUDeview/
Peter Brian Clements, author of par2-cmdline, which nget uses a
stripped down version of for its par2 checking. http://parchive.sourceforge.net/
The Unix-socket-faq, which my url for has gone bad, but is
supposedly posted monthly on comp.unix.programmer. Beej's Guide to
Network Programming at http://www.ecst.csuchico.edu/~beej/guide/net/
Jean-loup Gailly and Mark Adler, for the zlib library.
SEE ALSO
ngetlite(1),
(7),
grep(1)