NAME
djvused - Multi-purpose DjVu document editor.
SYNOPSIS
djvused [options] djvufile
DESCRIPTION
Program djvused is a powerful command-line
tool for manipulating multi-page documents, creating or
editing annotation chunks, creating or editing hidden text layers,
pre-computing thumbnail images, and more. The program first reads
the DjVu document djvufile and executes a number of djvused
commands.
Djvused commands can be read from a specific file (when option
-f is specified), read from the command line (when option
-e is specified), or read from the standard input (the
default).
OPTIONS
- -v
- Cause djvused to print a command line prompt before
reading commands and a brief message describing how each command
was executed. This option is very useful for debugging djvused
scripts and also for interactively entering djvused commands on the
standard input.
- -f scriptfile
- Cause djvused to read commands from file
scriptfile.
- -e commands
- Cause djvused to execute the commands specified by the
option argument commands. It is advisable to surround the
djvused commands with single quotes in order to prevent unwanted
shell expansion.
- -s
- Cause djvused to save the file djvufile after
executing the specified commands. This is similar to executing
command save immediately before terminating the program.
- -n
- Cause djvused to disregard save commands. This is useful
for debugging djvused scripts without overwriting files on your
disk.
DJVUSED EXAMPLES
There are many ways to use program
djvused. The following examples illustrate some common uses
of this program.
Obtaining the size of a page
Command size outputs
the width and height of the selected pages using an HTML-friendly syntax. For instance, the following
command prints the size of page 3 of document
myfile.djvu.
- djvused myfile.djvu -e 'select 3;
size'
Extracting the hidden text
Command print-pure-txt
outputs the text associated with a page or a document. For
instance, the following shell command outputs the text for the
entire document. Lines and pages are delimited by the usual control
characters.
- djvused myfile.djvu -e
'print-pure-txt'
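Because pages are separated by form feed characters, this output can be split into one text file per page with standard tools. The following is only a sketch: the function name split_pages and the output file names are assumptions, and the sample input stands in for real djvused output.

```shell
# split_pages PREFIX: read print-pure-txt output on standard input and
# write one file per page (PREFIX0001.txt, PREFIX0002.txt, ...).
# Pages are assumed to be separated by form feed characters.
split_pages() {
    awk -v prefix="$1" 'BEGIN { RS = "\f" }
        {
            file = sprintf("%s%04d.txt", prefix, NR)
            printf "%s", $0 > file
            close(file)
        }'
}

# Invented sample standing in for: djvused myfile.djvu -e 'print-pure-txt'
printf 'first page\nline two\n\fsecond page\n' | split_pages page
```

With a real document, the same function could be fed with djvused myfile.djvu -e 'print-pure-txt' | split_pages page.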
Command print-txt produces a more extensive output
describing the structure and the location of the text components.
The syntax of this output is described later in this man page. For
instance, the following shell command outputs extended text
information for page 3 of document myfile.djvu.
- djvused myfile.djvu -e 'select 3;
print-txt'
Extracting the annotations
Annotation data can be extracted
using command print-ant. The syntax of the annotation data
is described later in this man page. For instance, the following
shell command outputs the annotation data for the first page of
document myfile.djvu.
- djvused myfile.djvu -e 'select 1;
print-ant'
Command print-ant only prints the annotations stored in
the selected component file. Command print-merged-ant also
retrieves annotations from all the component files referenced by
the current page (using INCL chunks) and
prints the merged information.
Dumping/restoring annotations and text
Three commands,
output-txt, output-ant, and output-all,
produce djvused scripts. For instance, the following shell command
produces a djvused script, myfile.dsed, that recreates all
the text and annotation data in document myfile.djvu.
- djvused myfile.djvu -e 'output-all' >
myfile.dsed
Script myfile.dsed is a text file that can be easily
edited. The following shell command then recreates the text and
annotation information in file myfile.djvu.
- djvused myfile.djvu -f myfile.dsed
-s
Extracting a page
Both commands save-page and
save-page-with create a DjVu file representing the selected
component file of a document. The following shell command, for
instance, creates a file p05.djvu containing page 5
of document myfile.djvu.
- djvused myfile.djvu -e 'select 5;
save-page p05.djvu'
Each page of a document might import data from another component
file using the so-called inclusion ( INCL )
chunks. Command save-page then produces a file with
unresolved references to imported data. Such a file should then be
made part of a multi-page document containing the required data in
other component files. On the other hand, command
save-page-with copies all the imported data into the output
file. This file is directly usable. Yet collecting several such
files into a multi-page document might lead to useless data
replication.
Pre-computing thumbnails
Command set-thumbnails
constructs thumbnails that can be later displayed by DjVu viewers.
The following shell command, for instance, computes thumbnails of
size 64x64 pixels for all pages of file
myfile.djvu.
- djvused myfile.djvu -e 'set-thumbnails 64'
-s
DJVUSED COMMANDS
Command lines might contain zero, one, or
more djvused commands and an optional comment. Multiple djvused
commands must be separated by a semicolon character ';'. Comments
are introduced by the '#' character and extend until the end of the
command line.
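As a rough illustration of these rules, the shell fragment below writes a short script file. The file name sizes.dsed and the page numbers are arbitrary choices made for this sketch.

```shell
# Write a small djvused script; the file name sizes.dsed is arbitrary.
cat > sizes.dsed <<'EOF'
# Comments start with '#' and run to the end of the line.
n                  # print the number of pages
select 1; size     # two commands separated by a semicolon
select 3; size
EOF
```

Such a file could then be run with djvused myfile.djvu -f sizes.dsed.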
Selection commands
Multi-page DjVu documents are composed
of a number of component files. Most component files describe a
specific page of a document. Some component files contain
information shared by several pages such as shared image data,
shared annotations or thumbnails. Many djvused commands operate on
selected component files. All component files are initially
selected. The following commands are useful for changing the
selection.
- ls
- List all component files in the document. Each line contains an
optional page number, a letter describing the component file type,
the size of the component file, and the identifier of the component
file. Component file type letters P, I, A, and
T respectively stand for page data, shared image data,
shared annotation data, and thumbnail data. Page numbers are only
listed for component files containing page data.
- select [fileid]
- Select the component file identified by argument fileid.
Argument fileid must be either a page number or a component
file identifier. The select command selects all component
files when the argument fileid is omitted.
- select-shared-ant
- Select a component file containing shared annotations. Only one
such component file is supported by the current DjVu software. This
component file usually contains annotations pertaining to the whole
document as opposed to specific pages. An error message is
displayed if there is no such component file.
- create-shared-ant
- Create and select a component file containing shared
annotations. This command only selects the shared annotation
component file if such a component file already exists. Otherwise
it creates a new shared annotation component file and makes sure
that it is imported by all pages in the document.
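The listing printed by ls lends itself to simple filtering. The sketch below assumes that the columns appear in the order described above and are separated by blanks. The helper name list_pages and the sample lines are invented for illustration, and real output may carry additional fields.

```shell
# list_pages: read the output of the ls command on standard input and
# print "page-number identifier" for each component file of type P.
# Assumes blank-separated columns: [page] type size identifier.
list_pages() {
    awk '$2 == "P" { print $1, $4 }'
}

# Invented sample standing in for: djvused myfile.djvu -e 'ls'
printf '%s\n' \
    '   1 P   45318  p0001.djvu' \
    '     I    9120  shared.iff' \
    '   2 P   40122  p0002.djvu' | list_pages
```

The printed identifiers or page numbers can then be passed to the select command.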
Miscellaneous commands
- help
- Display a help message listing all commands supported by
djvused.
- n
- Print the total number of pages in the document.
- dump
- Display the EA IFF 85 structure of the
document or of the selected component file. A similar capability is
offered by program djvudump.
- size
- Display the width and the height of the selected pages. The
dimensions of each page are displayed using a syntax suitable for
direct insertion into the <EMBED...></EMBED> tags.
Text and annotation commands
- print-pure-txt
- Print the text stored in the hidden text layer of the selected
pages. A similar capability is offered by program djvutxt.
Structural information is sometimes represented by control
characters. Text from different pages is delimited by form feed
characters ("\f"). Lines are delimited by newline characters
("\n"). Columns, regions, and paragraphs are sometimes delimited by
vertical tab ("\013"), group separators ("\035") and unit
separators ("\037") respectively.
- print-txt
- Print extensive hidden text information for the selected
pages. This information describes the structure of the text on the
document page and locates the structural elements in the page
image. The syntax of this output is described later in this man
page.
- remove-txt
- Remove the hidden text information from the selected component
files. For instance, executing commands select and
remove-txt removes all hidden text information from the DjVu
document.
- set-txt [djvusedtxtfile]
- Insert hidden text information into the selected pages. The
optional argument djvusedtxtfile names a file containing the
hidden text information. This file must contain data similar to
what is produced by command print-txt. When the optional
argument is omitted, the program reads the hidden text information
from the djvused script until reaching an end-of-file or a line
containing a single period.
- output-txt
- Print a djvused script that reconstructs the hidden text
information for the selected pages. This script can later be edited
and executed by invoking program djvused with option
-f.
- print-ant
- Print the annotations of the selected component file. The
annotation data is represented using a simple syntax described
later in this document.
- print-merged-ant
- Merge the annotations stored in the selected component files
with the annotations imported from other component files such as
the shared annotation component file. The annotation data is
represented using a simple syntax described later in this document.
- remove-ant
- Remove the annotation information from the selected component
files. For instance, executing commands select and
remove-ant removes all annotation information from the DjVu
document.
- set-ant [djvusedantfile]
- Insert annotations into the selected component file. The
optional argument djvusedantfile names a file containing the
annotation data. This file must contain data similar to what is
produced by command print-ant. When the optional argument is
omitted, the program reads the annotation data from the djvused
script itself until reaching an end-of-file or a line containing a
single period.
- output-ant
- Print a djvused script that reconstructs the annotation
information for the selected pages. This script can later be edited
and executed by invoking program djvused with option
-f.
- print-meta
- Print the meta-data part of the annotations for the selected
component file. This command displays a subset of the information
printed by command print-ant using a different syntax.
Meta-data are organized as key-value pairs. Each printed line
contains the key name, such as author, title, etc.,
followed by a tab character ("\t") and a double-quoted string
representing the UTF-8 encoded meta-data
value.
- set-meta [djvusedmetafile]
- Set the meta-data part of the annotations of the selected
component file. The remaining part of the annotations is left
unchanged. The optional argument djvusedmetafile names a file
containing the meta-data. This file must contain data similar to
what is produced by command print-meta. When the optional
argument is omitted, the program reads the meta-data from the
djvused script itself until reaching an end-of-file or a line
containing a single period.
- output-all
- Print a djvused script that reconstructs both the hidden text
and the annotation information for the selected pages. This script
can later be edited and executed by invoking program djvused
with option -f.
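For the meta-data commands, the line format described above (key, tab, double-quoted value) is easy to generate and to filter from the shell. The sketch below writes a script that sets two entries on page 1 and defines a small filter for print-meta output. The file name meta.dsed, the function name meta_value, and the sample values are all hypothetical.

```shell
# Write a djvused script that sets two meta-data entries on page 1.
# The data follows set-meta and ends with a line holding a single period.
{
    printf 'select 1\n'
    printf 'set-meta\n'
    printf 'author\t"Jane Doe"\n'
    printf 'title\t"A Sample Title"\n'
    printf '.\n'
} > meta.dsed

# meta_value KEY: read print-meta output on standard input and print
# the value stored for KEY, still quoted and escaped.
meta_value() {
    awk -F '\t' -v key="$1" '$1 == key { print $2 }'
}

# The data lines of the script have the same shape as print-meta output.
meta_value title < meta.dsed
```

The script could be applied with djvused myfile.djvu -f meta.dsed -s, and an entry could be read back with djvused myfile.djvu -e 'select 1; print-meta' | meta_value title.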
Outline/bookmarks commands
- print-outline
- Print the outline of the document. Nothing is printed if the
document contains no outline.
- set-outline [djvusedoutlinefile]
- Insert outline information into the document. The optional
argument djvusedoutlinefile names a file containing the
outline information. This file must contain data similar to what is
produced by command print-outline. When the optional
argument is omitted, the program reads the outline information
from the djvused script until reaching an end-of-file or a line
containing a single period.
Thumbnail commands
- set-thumbnails sz
- Compute thumbnails of size sz by sz pixels and
insert them into the document. DjVu viewers can later display these
thumbnails very efficiently without need to download the data for
each page. Typical thumbnail sizes range from 48 to 128 pixels.
- remove-thumbnails
- Remove the pre-computed thumbnails from the DjVu document. New
thumbnails can then be computed using command
set-thumbnails.
Save commands
The above commands only modify the memory
image of the DjVu document. The following commands provide means to
save the modified data into the file system.
- save
- Save the modified DjVu document back into the input file
djvufile specified by the arguments of the program
djvused. Nothing is done if the DjVu file was not modified.
Passing option -s to program djvused is equivalent to
executing command save before exiting the program.
- save-bundled filename
- Save the current DjVu document as a bundled multi-page DjVu
document named filename. A similar capability is offered by
program djvmcvt.
- save-indirect filename
- Save the current DjVu document as an indirect multi-page DjVu
document. The index file of the indirect document will be named
filename. All other files composing the indirect document
will be saved into the same directory as the index file. A similar
capability is offered by program djvmcvt.
- save-page filename
- Save the selected component file into DjVu file
filename. The selected component file might import data from
another component file using the so-called inclusion ( INCL ) chunks. This command then produces a file with
unresolved references to imported data. Such a file should then be
made part of a multi-page document containing the required data in
other component files.
- save-page-with filename
- Save the selected component file into DjVu file
filename. All data imported from other component files is
copied into the output file as well. This command always produces a
usable DjVu file. On the other hand, collecting several such files
into a multi-page document might lead to useless data
replication.
DJVUSED FILE FORMATS
Djvused uses a simple parenthesized syntax to represent both
annotations and hidden text.
- *
- This syntax is the native syntax used by DjVu for storing
annotations. Program djvused simply compresses the
annotation data using the bzz(1)
algorithm.
- *
- This syntax differs from the native syntax used by DjVu for
storing the hidden text. Program djvused performs the
translations between the compact binary representation used by DjVu
and the easily modifiable parenthesized syntax.
General syntax
Djvused files are ASCII text files. The legal characters in djvused files
are the printable ASCII characters and the
space, tab, cr, and nl characters. Using other characters has
undefined results.
Djvused files are composed of a sequence of expressions
separated by blank characters (space, tab, cr, or nl). There are
four kinds of expressions, namely integers, symbols, strings and
lists.
- Integers:
- Integer numbers are represented by one or more digits, with the
usual interpretation.
- Symbols:
- Symbols, or identifiers, are sequences of printable ASCII
characters representing a name or a keyword. Acceptable characters
are the alpha-numeric characters, the underscore "_", the minus
character "-", and the hash character "#". Names should not begin
with a digit or a minus character.
- Strings:
- Strings denote an arbitrary sequence of bytes, usually
interpreted as a sequence of UTF-8 encoded
characters. Strings in djvused files are similar to strings in the
C language. They are surrounded by double quote characters. Certain
sequences of characters starting with a backslash ("\") have a
special meaning. A backslash followed by letter "a", "b", "t", "n",
"v", "f", "r", by "\", or by the double quote character stands for
the ASCII character BEL(007), BS(010), HT(011), LF(012), VT(013),
FF(014), CR(015), BACKSLASH(134), or DOUBLEQUOTE(042) respectively,
where each code is given in octal. A backslash
followed by one to three digits stands for the byte whose octal
code is expressed by the digits. All other backslash sequences are
illegal. All non-printable ASCII characters must be escaped.
- Lists:
- Lists are sequences of expressions separated by blanks and
surrounded by parentheses. All expression types are acceptable
within a list, including sub-lists.
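The string rules above can be applied mechanically. The sketch below defines a hypothetical helper, dsed_quote, that turns raw bytes into a double-quoted string by escaping the double quote and backslash characters and writing every byte outside the printable ASCII range as a three-digit octal escape. It relies only on od and awk and is not part of djvused.

```shell
# dsed_quote: read raw bytes on standard input and print them as a
# double-quoted string that contains printable ASCII characters only.
dsed_quote() {
    od -An -v -t o1 | awk '
        BEGIN { printf "\"" }
        {
            for (i = 1; i <= NF; i++) {
                # Convert the three octal digits printed by od to a number.
                c = substr($i, 1, 1) * 64 + substr($i, 2, 1) * 8 + substr($i, 3, 1)
                if (c == 34 || c == 92)
                    printf "\\%c", c        # double quote or backslash
                else if (c >= 32 && c < 127)
                    printf "%c", c          # printable ASCII, copied as is
                else
                    printf "\\%s", $i       # any other byte, as an octal escape
            }
        }
        END { print "\"" }'
}

# Sample input: a, double quote, b, backslash, c, tab, z
printf 'a"b\\c\tz' | dsed_quote
```

Because bytes of multi-byte UTF-8 characters fall outside the printable ASCII range, this sketch writes them as octal escapes too, which keeps the resulting file pure ASCII.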
Hidden text syntax
The building blocks of the hidden text
syntax are lists representing each structural component of the
hidden text. Structural components have the following form:
- (type xmin ymin xmax
ymax ... )
The symbol type must be one of page,
column, region, para, line,
word, or char, listed here by decreasing order of
importance. The integers xmin, ymin, xmax, and
ymax represent the coordinates of a rectangle indicating the
position of the structural component in the page. Coordinates are
measured in pixels and have their origin at the bottom left corner
of the page. The remaining expressions in the list are either a
single string representing the encoded text associated with this
structural component, or a sequence of structural components
with a lesser type.
The hidden text for each page is simply represented by a single
structural element of type page. Various levels of structural
information are acceptable. For instance, the page level component
might only specify a page level string, or might only provide a
list of lines, or might provide a full hierarchy down to the
individual characters.
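As an illustration, the script below attaches a two-word line to page 1 using a page, line, and word hierarchy. The coordinates, the text, and the file name page1-txt.dsed are made up for this sketch. The data is passed inline after set-txt and ends with a line containing a single period.

```shell
# Write a djvused script that sets the hidden text of page 1.
# All coordinates and strings below are invented sample values.
cat > page1-txt.dsed <<'EOF'
select 1
set-txt
(page 0 0 2550 3300
 (line 300 3000 900 3060
  (word 300 3000 560 3060 "Hello")
  (word 600 3000 900 3060 "world")))
.
EOF
```

The script could be applied with djvused myfile.djvu -f page1-txt.dsed -s, and the result inspected with command print-txt.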
Outline/Bookmark syntax
The outline syntax is a single list
of the form
- (bookmarks ...)
The first element of the list is symbol bookmarks. The
subsequent elements are lists representing the toplevel outline
entries. Each outline entry is represented by a list with the
following form:
- (title url ... )
The string title is the title of the outline entry. The
string url is composed of the hash character ("#") followed
by either the component file identifier or the page number
corresponding to the outline entry. The remaining expressions
describe subentries of this outline entry.
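A small outline written in this syntax might look as follows. The titles, page numbers, and file name outline.dsed are invented. The script passes the outline inline to set-outline and ends it with a line containing a single period.

```shell
# Write a djvused script that installs a three-entry outline.
# "Chapter 1" carries two subentries; all targets are page numbers.
cat > outline.dsed <<'EOF'
set-outline
(bookmarks
 ("Preface" "#1")
 ("Chapter 1" "#5"
  ("Section 1.1" "#6")
  ("Section 1.2" "#9"))
 ("Index" "#120"))
.
EOF
```

The script could be applied with djvused myfile.djvu -f outline.dsed -s and checked afterwards with command print-outline.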
Annotation syntax
Annotations are represented by a sequence
of annotation expressions. The following annotation expressions are
recognized:
- (background color)
- Specify the color of the viewer area surrounding the DjVu
image. Colors are represented with the X11 hexadecimal syntax
#RRGGBB. For instance, #000000 is black and
#FFFFFF is white.
- (zoom zoomvalue)
- Specify the initial zoom factor of the image. Argument
zoomvalue can be one of stretch, one2one,
width, page, or composed of the letter d
followed by a number in range 1 to 999 representing a zoom factor
(such as in d300 or d150 for instance.)
- (mode modevalue)
- Specify the initial display mode of the image. Argument
modevalue is one of color, bw, fore, or
back.
- (align horzalign vertalign)
- Specify how the image should be aligned on the viewer surface.
By default the image is located in the center. Argument
horzalign can be one of left, center, or
right. Argument vertalign can be one of top,
center, or bottom.
- (maparea url comment area
...)
- Define a hyper-link for the specified destination.
Argument url can have one of the following forms:
- href
(url href target)
where href is a string representing the destination and
target is a string representing the target frame for the
hyper-link, as defined by the HTML anchor
tag <A>. The destination string
href can be an arbitrary URL or can
be composed of the hash character ("#") followed by either a
component file identifier or a page number. Page numbers may be
prefixed with an optional sign to represent a page displacement.
For instance the strings #-1 and #+1 can be used to
access the previous page and the next page.
Argument comment is a string that might be displayed by
the viewer when the user moves the mouse over the hyper-link.
Argument area defines the shape and the location of the
hyperlink. The following forms are recognized:
- (rect xmin ymin width
height)
(oval xmin ymin width
height)
(poly x0 y0 x1 y1 ...
)
(text xmin ymin width
height) - Not implemented.
(line x0 y0 x1 y1) - Not
implemented.
All parameters are numbers representing coordinates. Coordinates
are measured in pixels and have their origin at the bottom left
corner of the page.
The remaining expressions in the maparea list represent
the visual effect associated with the hyper-link.
A first set of options defines how borders are drawn for
rect, oval, polygon, or text hyperlink
areas.
- (none)
(xor)
(border color)
(shadow_in [thickness])
(shadow_out [thickness])
(shadow_ein [thickness])
(shadow_eout [thickness])
where parameter color has syntax #RRGGBB as
described above, and parameter thickness is an integer in range 1
to 32. The last four border options are only supported for
rect hyperlink areas. The default border is a simple black
line. Border options do not apply to line areas.
When a border option is specified, the border becomes visible
when the user moves the mouse over the hyperlink. The border may be
made always visible by using the following option:
- (border-avis)
The following two options may be used with rect hyperlink
areas. The complete area will be highlighted using the specified
color at the specified opacity (0-100, default 50).
- (hilite color)
(opacity op) - Not implemented.
This is often used with an empty URL for
simply emphasizing a specific segment of an image.
The following three options may be used with line areas to
specify an optional ending arrow, the line width and color. The
default is a black line with width 1 and without arrow.
- (arrow) - Not implemented.
(width w) - Not implemented.
(lineclr color) - Not implemented.
Finally the following three options can be used with text areas.
The default background color is transparent. The default text color
is black. The pushpin option indicates that the text is
symbolized by a small pushpin icon. Clicking the icon reveals the
text.
- (backclr bkcolor) - Not implemented.
(textclr txtcolor) - Not implemented.
(pushpin) - Not implemented.
- (metadata ... (key value) ... )
- Define meta-data entries. Each entry is identified by a symbol
key representing the nature of the meta data entry. Typical
keys include year, booktitle, editor,
author, etc. It is suggested to use the same key names as
the BibTeX bibliography system. String value represents the
value associated with the corresponding key.
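Putting several of these expressions together, a script that inserts annotations into page 1 could be written as below. All values (colors, coordinates, comment text, meta-data) and the file name page1-ant.dsed are illustrative assumptions.

```shell
# Write a djvused script that inserts annotations into page 1:
# viewer settings, one rectangular hyper-link to the next page with an
# always-visible blue border, and two meta-data entries.
cat > page1-ant.dsed <<'EOF'
select 1
set-ant
(background #FFFFFF)
(zoom page)
(mode color)
(align center top)
(maparea "#+1" "Go to the next page"
 (rect 100 100 400 80)
 (border #0000FF) (border-avis))
(metadata (author "Jane Doe") (year "2004"))
.
EOF
```

The script could be tried without touching the file by running djvused myfile.djvu -f page1-ant.dsed -n -v, and applied for real with option -s.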
LIMITATIONS
The current version of program djvused
only supports selecting one component file or all component files.
There is no way to select only a few component files.
CREDITS
This program was initially written by Léon Bottou
<leonb@users.sourceforge.net>
and was improved by Yann Le Cun <profshadoko@users.sourceforge.net>,
Florin Nicsa, Bill Riemers <docbill@sourceforge.net>
and many others.
SEE ALSO
djvu(1),
djvutxt(1),
djvmcvt(1),
djvudump(1),
bzz(1)