sgmlproc - normalize and process SGML documents
sgmlproc ( [ -v option=value ] [ -- [ -e entity=replacement text or sysid ] [ -o outfile ]] file ) | -- -h | -- -V
sgmlproc reads SGML markup text from file and
outputs SGML conforming to a specified target document
type, or just the base document type of the input
markup, if no particular target document type has
been requested.
The output of sgmlproc is a document in which all
markup minimization features used (such
as tag omission, attribute name and value omission,
attribute quoting, and short references) and other
permitted variant syntax (such as in whitespace or
namecase usage) have been transformed into their respective
canonical form, and references to general entities have been
expanded into their respective replacement text.
sgmlproc parses and validates input markup according to
the markup rules declared in the document's document type
declaration(s), if any. Moreover, sgmlproc applies
transformation and templating as declared in the link
process declaration(s) of the input document, if any,
and if instructed to by requesting a particular target
document type and/or activating a link process via the
target_document_type_name and active_lpd_names
options, respectively.
-o outfile
Write output to the given outfile rather than standard output
-V
Print version and built-in features, and exit
-h
Print a short synopsis, and exit
-v output_format=FMT
where FMT is one of sgml, html (the default), xml, or none:
sgml outputs markup according to the target document type's
declaration, e.g. with omitted end tags for elements
having declared or implied content EMPTY, and namecasing
rules as specified by the applicable SGML declaration
html outputs markup with lowercase
element and attribute names and name tokens
xml outputs end-element tags for all elements, and preserves
the namecase of element, attribute and notation names and name tokens
none suppresses output
-v dtd_handling=VAL
Option causing inclusion or suppression of output of DTD and XML
declarations where VAL is one of
preserve (the default), omit, or force
preserve includes base declaration sets and/or XML declarations
in the output when parsed from source markup or implied
by the result markup document type of an active explicit link process
omit suppresses output of declaration sets and XML declarations
force outputs the fixed string <!doctype html> as DTD
(only used in combination with -v output_format=html)
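As an illustration of combining output_format and dtd_handling as described above, the following sketch assembles an invocation that would emit HTML preceded by the fixed <!doctype html> string. The file name mydoc.sgm is a placeholder, and the command is only run if sgmlproc is found on PATH and the file exists.

```shell
# Placeholder input file (an assumption for this sketch).
doc=mydoc.sgm

# Collect the arguments: HTML output with a forced <!doctype html>.
set -- -v output_format=html -v dtd_handling=force "$doc"

# Show the command line that would be run.
cmdline="sgmlproc $*"
echo "$cmdline"

# Run it only when sgmlproc and the input file are actually available.
if command -v sgmlproc >/dev/null 2>&1 && [ -f "$doc" ]; then
  sgmlproc "$@"
fi
```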
-v forward_link_attributes=VAL
A non-empty, non-zero value causes sgmlproc to produce
link attributes in output content
Normally, sgmlproc uses link attributes only
for determining a template and any template parameters
to apply, if a template is implied in a given
element context of an active link process,
and outputs attributes from template SGML documents
-v suppress_warnings=VAL
A non-empty, non-zero value causes sgmlproc to
not print warnings (on the standard error output stream)
-v treat_recoverable_as_fatal_errors=YES
YES causes sgmlproc to abort processing on
the first error, whereas by default, or when given
any value other than YES, sgmlproc
prints an error message and continues processing,
and aborts only on unrecoverable
errors
-v strict_iso8879_compatibility=YES|NO
Specifying strict_iso8879_compatibility=YES
switches on the following checks mandated by ISO 8879,
but not enforced by default (or when specifying value
strict_iso8879_compatibility=NO):
Link sets with multiple rules declared on the same (source) element must all have link attribute specifications
Declarations of #CURRENT default values for data
attributes (attributes of notations) are rejected
-v system_specific_entity_path=DIR
Specifies the directory in which sgmlproc looks
for files when resolving system-specific entities
By default, sgmlproc looks in the main input
file's directory for files, unless a replacement
value or system identifier for a system-specific
entity has been supplied on the command line
via the -e option
File names within the directory for system-specific
entities are resolved by interpreting the entity name
as file name, honoring the effective settings for
SYNTAX NAMECASE ENTITY
-e ent=replacement text
Sets replacement text as the value of the ent
system-specific entity
-e ent=<literal>replacement text
Sets replacement text as the value of the ent
system-specific entity
This is a variant for the aforementioned option using
the <literal> formal system identifier notation syntax
to represent string literals
-e ent=<osfile>file name
Sets file name as the file to read the replacement
value from for the ent system-specific entity
Note that in addition, the special system-specific entity
sgmlstdin can be used to supply the content of the
<osfd>0 formal system identifier (in preference to
reading content from the standard input for <osfd>0).
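The three -e forms above can be sketched together as follows. The entity names title, note and footer and the file names are hypothetical. Values are quoted because an unquoted < or > would be interpreted by the shell as a redirection, and because replacement text may contain spaces. Following the synopsis, the -e options are placed after the -- marker.

```shell
# Placeholder file names (assumptions for this sketch).
doc=mydoc.sgm
footer=footer.sgm

# One argument per option value; quoting keeps spaces and <...> intact.
set -- -v output_format=html -- \
  -e 'title=My Title' \
  -e 'note=<literal>Draft only' \
  -e "footer=<osfile>$footer" \
  "$doc"

# Print the arguments one per line to show how the shell passes them on.
printf '%s\n' "$@"

# Run only when sgmlproc and both files are actually available.
if command -v sgmlproc >/dev/null 2>&1 && [ -f "$doc" ] && [ -f "$footer" ]; then
  sgmlproc "$@"
fi
```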
-v target_document_type_name=DOCTYPE
Specifies the document type name to produce from the source document;
DOCTYPE must be the name of a document type
definition declared in the source document
-v active_lpd_names=LINKTYPE[,LINKTYPE,...]
Specifies one or more (comma-separated) link process name(s) to activate
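A minimal sketch of passing several link process names follows. The names toc and render and the file mydoc.sgm are hypothetical; the names must match link process declarations available to the document. The loop only builds the comma-separated value expected by active_lpd_names.

```shell
# Placeholder input file (an assumption for this sketch).
doc=mydoc.sgm

# Join the (hypothetical) link process names with commas.
lpds=
for name in toc render; do
  lpds=${lpds:+$lpds,}$name
done

set -- -v "active_lpd_names=$lpds" "$doc"
echo "sgmlproc $*"

# Run only when sgmlproc and the input file are actually available.
if command -v sgmlproc >/dev/null 2>&1 && [ -f "$doc" ]; then
  sgmlproc "$@"
fi
```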
-v system_specific_implied_lpd_names=LINKTYPE[,LINKTYPE,...]
Specifies a single name or a comma-separated list of names of additional link process(es) treated as if declared as system-specific LPDs following actual link process declarations in the document prolog
When giving a name of a link process actually declared in
the document prolog, the respective link process name
parameter value is ignored (a link process declaration in
the document prolog is always used as effective link process
declaration in preference to one specified via
system_specific_implied_lpd_names)
-v system_specific_implied_lpd_source_document_type_names=DOCTYPE[,DOCTYPE,...]
system_specific_implied_lpd_source_document_type_names and
system_specific_implied_lpd_result_document_type_names can
contain (comma- or space-separated) names of the source and
result document type name, resp., of the link processes specified
at the respective position in system_specific_implied_lpd_names
(where all but the last link process must contain
names of explicit link processes)
These parameters are only used internally in nested sgmlproc
invocations for propagating source link processing context and
state to sub processes, and are not supported (nor required)
on basic sgmlproc execution where templates are only executed
in the last, or only, link process of a link process pipeline
Only available for sgmljs.net SGML Pro
-v system_specific_implied_lpd_result_document_type_names=DOCTYPE[,DOCTYPE,...]
See above
-v enable_lax_templates=VAL
A non-empty, non-zero value allows a template document
to declare a document type with an external declaration
set (with the value of the expected_external_dtd_subset_identifier
option as system identifier)
By default, a template document is required to receive
markup declarations from its calling context by specifying
<!DOCTYPE ... SYSTEM> as base document type, and/or as
a target document type
-v expected_external_dtd_subset_identifier=sysid|#IMPLIED
Specifies the system identifier of an external DTD subset that is expected for the main document when "lax" templating is permitted
#IMPLIED indicates that the base DTD is expected to
be <!DOCTYPE ... SYSTEM> (where the doctype
can be #IMPLIED or specified), or that the prolog may
be omitted altogether
-v disable_referential_attributes=VAL
A non-empty, non-zero value causes attributes with
declared value ID, IDREF, IDREFS, ENTITY, ENTITIES,
or NOTATION, or attributes with #CURRENT default value
to be rejected as a recoverable error in content, irrespective
of whether declared in the applicable document type definition
This option is used internally to enforce referential integrity when processing "strict" templates in recursive subcontext invocations
-v disable_data_entity_references=VAL
A non-empty, non-zero value causes parsing to produce a recoverable error on data entity references in content
This option is used internally to enforce referential integrity when processing "strict" templates in recursive subcontext invocations
-v sax_event_tracing=VAL
Specifying sax_event_tracing with any value, including an empty
value, causes sgmlproc to print info about the declaration
set from which the element originates (either a document type
name for parsed elements, or a link process name for produced
result element)
The info is printed in SGML comments in regular output, next to the produced element
Not available in all sgmlproc builds
-v sax_error_context_info_collection=VAL
Specifying sax_error_context_info_collection with any value,
including an empty value, causes sgmlproc to print
the context location (system identifier of document and line
number) of not only the document where an error occurs,
but also of the document(s) and place(s) where the erroneous
document is included as entity in the running processing
context
sax_error_context_info_collection is normally switched off
to avoid processing overhead
Not available in all sgmlproc builds
-v disable_path_relativization=VAL
Specifying disable_path_relativization with any value,
including an empty value, causes sgmlproc to print
file names in error messages as absolute rather than
relative paths
Used to produce location-independent error message output
in internal sgmlproc testing
-v strict_markdown_pl_compatibility=YES|NO
Specifying strict_markdown_pl_compatibility=YES
switches on emulation of Markdown_1.0.1.pl (John Gruber's
original markdown formatter) in producing HTML from markdown
Specifically, two newlines (but not more) at the end of a
code block are collapsed into a single newline (whereas with
strict_markdown_pl_compatibility=NO, any number
of trailing newlines at the end of a code block
is collapsed into a single newline)
Moreover, three newlines are produced from a blank code line
-v keep_trailing_codeblock_newlines=VAL
A non-empty, non-zero value causes parsing to
reproduce blank lines and newline characters at
the end of codeblocks as parsed from source
(unless strict_markdown_pl_compatibility is
set to YES)
-v prune_singleton_html_paras_in_listitems=YES|NO
A value of YES causes sgmlproc to remove HTML p
elements (making their content appear directly as child
content of the parent li or dd element), if that p
element is the sole child of the parent element
p elements specified in markdown HTML blocks are not
pruned
The following options are switched on by default for processing
SGML on a web server or browser using sgmlweb to
prevent markup injection and denial-of-service attacks,
but aren't switched on for sgmlproc command-line SGML
processing.
-v restrict_parameter_entity_expansion=YES|NO
A value of YES causes sgmlproc to abort
an attempt to perform parameter entity expansion in
entity declarations outside replacement text literals with
an unrecoverable error condition, except if the value expands
to (the expansion of) 'SYSTEM "%PATH_TRANSLATED"' or
"SYSTEM '%PATH_TRANSLATED'"
-v disable_referential_attributes=VAL
See description of disable_referential_attributes above
-v disable_data_entity_references=VAL
See description of disable_data_entity_references above
The following options set or override effective SGML declaration properties.
-v sgmldecl_syntax_namecase_general=YES|NO
Sets the effective value of the SYNTAX NAMECASE GENERAL property
-v sgmldecl_syntax_namecase_entity=YES|NO
Sets the effective value of the SYNTAX NAMECASE ENTITY
property
-v sgmldecl_features_minimize_omittag=YES|NO
Sets the effective value of the FEATURES MINIMIZE OMITTAG
property
-v sgmldecl_features_minimize_rank=YES|NO
Sets the effective value of the FEATURES MINIMIZE RANK
property
-v sgmldecl_features_minimize_implydef_doctype=YES|NO
Sets the effective value of the FEATURES MINIMIZE IMPLYDEF DOCTYPE
property
-v sgmldecl_features_minimize_implydef_element=YES|NO
Sets the effective value of the FEATURES MINIMIZE IMPLYDEF ELEMENT
property to either YES or NO
-v sgmldecl_features_minimize_implydef_element_anyother=YES|NO
If specified as YES, and specified in addition to
-v sgmldecl_features_minimize_implydef_element=YES,
this sets the effective value of the
FEATURES MINIMIZE IMPLYDEF ELEMENT property to ANYOTHER
FEATURES MINIMIZE IMPLYDEF ELEMENT ANYOTHER is the default
used by sgmlproc
-v sgmldecl_features_minimize_implydef_attlist=YES|NO
Sets the effective value of the FEATURES MINIMIZE IMPLYDEF ATTLIST
property
-v sgmldecl_features_minimize_implydef_entity=YES|NO
Sets the effective value of the FEATURES MINIMIZE IMPLYDEF ENTITY
property
-v sgmldecl_features_minimize_emptynrm=YES|NO
Sets the effective value of the FEATURES MINIMIZE EMPTYNRM
property
-v sgmldecl_features_minimize_shorttag_attrib_omitname=YES|NO
Sets the effective value of the FEATURES MINIMIZE SHORTTAG ATTRIB OMITNAME
property
-v sgmldecl_features_minimize_shorttag_starttag_empty=YES|NO
Sets the effective value of the FEATURES MINIMIZE SHORTTAG STARTTAG EMPTY
property
-v sgmldecl_features_minimize_shorttag_starttag_netenabl=IMMEDNET
Sets the effective value of the FEATURES MINIMIZE SHORTTAG STARTTAG NETENABL
property to the IMMEDNET value used in WebSGML (the Annex K revision
to ISO 8879:1986) for supporting XML-style empty elements
-v sgmldecl_features_minimize_shorttag_endtag_empty=YES|NO
Sets the effective value of the FEATURES MINIMIZE SHORTTAG ENDTAG EMPTY
property
-v sgmldecl_features_other_validity=TYPE|NOASSERT
Sets the effective value of the FEATURES OTHER VALIDITY property
-v sgmldecl_features_other_formal=YES|NO
Sets the effective value of the FEATURES OTHER FORMAL
property
-v sgmldecl_features_other_urn=YES|NO
Sets the effective value of the FEATURES OTHER URN
property
Only meaningful if -v sgmldecl_features_other_formal=YES
is also specified
sgmlproc leaves an exit status of 0 on successful
completion, a value other than 0 otherwise.
sgmlproc prints error and warning messages, including
details and references to the file and line number of
each error location, to the standard error stream.
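A script can combine the exit status and the standard error stream as sketched below. The file names are placeholders. The sketch assumes only what is stated above: an exit status of 0 means success, and diagnostics are written to standard error.

```shell
# Placeholder file names (assumptions for this sketch).
doc=mydoc.sgm
out=mydoc.html
log=sgmlproc-errors.log

if ! command -v sgmlproc >/dev/null 2>&1; then
  result="sgmlproc not installed"
elif sgmlproc -v target_document_type_name=html -- -o "$out" "$doc" 2>"$log"; then
  # Exit status 0: processing completed successfully.
  result="ok: wrote $out"
else
  # Non-zero exit status: error messages were captured in the log file.
  result="failed: see $log"
fi
echo "$result"
```

Adding -v treat_recoverable_as_fatal_errors=YES before the -- marker would make such a script fail on the first error rather than only on unrecoverable ones.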
Note that the portable sgmlproc program implemented in
the awk programming language may, in some builds,
silently ignore misspelled options. This is an awk
limitation (like the required use of the -- end of
arguments marker described below).
To create canonical markup from mydoc.sgm:
sgmlproc mydoc.sgm
To create XML markup (with end-element tags for
elements declared EMPTY such as HTML's img element)
from mydoc.sgm:
sgmlproc -v output_format=xml mydoc.sgm
To activate a link process pipeline for creating HTML markup
(mydoc.sgm is expected to declare one or more link process
declaration sets with html result markup in its document
prolog), or just normalize input markup if mydoc.sgm already
uses html as base document type:
sgmlproc -v target_document_type_name=html mydoc.sgm
To produce HTML as described before, using the
text "some text" as replacement text for the
myent system-specific entity:
sgmlproc -v target_document_type_name=html -- -e myent='some text' mydoc.sgm
Note that the -- end of arguments marker is required only for compatibility
with the portable sgmlproc program implemented in the awk programming
language. It isn't required, but is recognized and tolerated, by
other sgmlproc implementations such as the ECMAScript implementation
for Node.js.