The Full WHATWG HTML RD2001 DTD, like former versions, is a transcription of the prose of WHATWG's HTML Review Draft specification published January 29th, 2020, into an SGML DTD. The Full DTD covers all elements of HTML, SVG, and MathML, as well as the ARIA attributes. Its construction is described in the reference for the W3C HTML 5 DTD; this document describes only the modifications for the current version.
The Minimal WHATWG HTML RD2001 DTD,
also like former versions, is a compact DTD containing
only essential parsing rules for HTML.
As only HTML's special rules for HTML void elements and
enumerated attributes are included (others being admitted
freely), the usefulness of the Minimal WHATWG HTML DTD
for validation purposes is limited. Instead, the
purpose of the Minimal HTML DTD is to provide a
minimal bundled declaration set for content parsing and
production tasks for modern and idiomatic HTML in sgmljs.net
and other SGML software with support for resolving
declaration sets via catalog resolution (in sgmljs.net,
the Minimal HTML DTD is resolved and accessed by
the about:legacy-compat system identifier).
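As an illustration of the kind of content the Minimal DTD is meant to parse, here is a small document using the about:legacy-compat system identifier. The file names and text are made up, and how the system identifier is mapped to the actual DTD file is left to the catalog configuration (not shown). Since the Minimal DTD includes only void element and enumerated attribute rules, all end tags of non-void elements are spelled out in this sketch.

```sgml
<!DOCTYPE html SYSTEM "about:legacy-compat">
<html>
<head>
<title>Example</title>
<meta charset="utf-8">
</head>
<body>
<!-- br, img, and meta are void elements: they have no end tags -->
<p>First line<br>second line</p>
<p><img src="logo.png" alt="Logo"></p>
</body>
</html>
```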
This DTD is based on HTML Review Draft 200129, published as a W3C Recommendation on January 28, 2021, which is the first (and, so far, only) W3C Recommendation based on a WHATWG HTML Review Draft, and the first W3C HTML Recommendation since 2017.
Apart from the large number of small changes to be expected in the first revision in years (detailed below), Review Draft 200129, accepted as W3C HTML Recommendation, is also the first W3C HTML specification published under the Memorandum of Understanding between WHATWG and W3C, which prevents W3C from editing specification text directly. As such, HTML Review Draft 200129 brings notable changes to two long-standing issues where upstream (WHATWG) specification text that had been explicitly rejected in previous W3C versions was now accepted, despite the lack of any material change to it:
multiple main elements are allowed, reflected by the nav,
article, and aside content models no longer forbidding main
descendants
hgroup was included in W3C HTML for the first time; note that
in WHATWG HTML, hgroup, originally introduced for hiding all but
the highest-ranked of multiple headings from the so-called HTML 5
outlining algorithm to prevent inference of undesired sections, had
been deprecated for many years, even though its content model
specification hadn't changed (which had been the W3C editors' reason
for not including it); hgroup's content model only changes
in the upcoming Review Draft 2023
The changes are detailed in the following sections.
Added the hgroup and slot elements (complementing the template
element already part of previous HTML specifications).
The hgroup, meta, and slot elements were added
to the flow content category and parameter entity; meta
and slot were also added to the phrasing category and
parameter entity, while hgroup was added to the flow_only
and the heading parameter entities.
The menu element has been re-introduced with changed content
rules and semantics; it is now listed under grouping elements
and has been removed from the legacy elements. Note that the menuitem
element, which used to be part of the original menu content
model, is no longer used at all but remains present as a
legacy element since it admits end-tag omission.
The style element has been removed from the flow content
category, reflecting the final abandonment of the scoped CSS concept
in HTML specifications.
img and object, and the legacy keygen element, have
been made members of the interactive content category.
Changed content models or inclusion or exclusion constraints
of the article, nav, aside, header, footer, p, figure,
ruby, legend, and canvas elements.
Note that heading elements as content of legend elements were
valid before Review Draft 200129, and are valid again in current WHATWG
specifications; hence their disallowance in Review Draft 200129
can be considered an error. To use the declaration of, e.g., Review Draft
230116 instead, you can place the following markup declarations into
the internal subset:
<!ENTITY % html.legend.element "IGNORE">
<!ELEMENT legend - - (#PCDATA|%phrasing;|%heading;)* -(main)>
Retained the rb and rtc elements (removed from the specification but
allowing tag omission) as legacy elements.
The address element is now listed under sectioning, whereas it
was formerly listed under grouping content.
Added event handler attributes onformdata, oncopy, oncontextmenu,
oncut, onpaste, onsecuritypolicyviolation,
onslotchange, and onscrollend.
Removed event handler attributes onabort, onloadend,
and onshow.
Added global attributes enterkeyhint, inputmode, is, itemid,
itemprop, itemref, itemscope, itemtype, and slot. Also added
nonce as a global attribute, whereas it used to be declared for specific
elements in previous DTDs.
Added body event handler attribute onmessageerror.
Note that the contenteditable, hidden,
spellcheck, and translate global attributes can have the
empty string as value, even though the HTML spec advises
against specifying the attribute in these cases in the first place.
This is not reflected in the SGML DTD.
The same is true of the Fetch API destination (as) attribute (cf.
section 4.2.4) and the CORS settings (crossorigin) attributes (defined by
the Fetch spec), and of the referrer policy (referrerpolicy)
attribute (defined by the Referrer Policy spec). These
two specifications have no versioning (not even an equivalent of a
public review draft), nor any other formal alignment with the HTML
specification, and also contain a lot of non-normative language;
thus, while their snapshot values at the time of publication
can be conditionally included via parameter entities, they aren't
included in the HTML DTD by default.
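Conditional inclusion of this kind usually follows the SGML marked-section idiom sketched below. The sketch is not taken from the DTD: the entity names html.fetch.values and crossorigin.type are hypothetical, and the attribute is attached to img only for brevity. It relies on the SGML rule that the first declaration of an entity is the one that counts.

```sgml
<!-- Illustrative sketch; entity names are hypothetical -->
<!-- Switch entity: IGNORE by default -->
<!ENTITY % html.fetch.values "IGNORE">

<!-- When the switch is INCLUDE, this declaration is read first and wins -->
<![ %html.fetch.values; [
<!ENTITY % crossorigin.type "(anonymous|use-credentials)">
]]>

<!-- Fallback, used when the marked section above is ignored;
     CDATA also admits the empty string that HTML allows here -->
<!ENTITY % crossorigin.type "CDATA">

<!ATTLIST img crossorigin %crossorigin.type; #IMPLIED>

<!-- A document would switch the values on from its internal subset,
     which is read before the external DTD, e.g.
       <!DOCTYPE html SYSTEM "about:legacy-compat"
         [ <!ENTITY % html.fetch.values "INCLUDE"> ]> -->
```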
Removed the rev attribute on the link and a elements.
Removed the longdesc attribute on the img element.
Removed the typemustmatch attribute on the object element.
Removed the hreflang attribute on the area element.
The autofocus attribute has been formally made applicable to all
HTML elements in WHATWG HTML (section 6.6.7), whereas it was defined only
in the context of form controls in previous revisions; this is
reflected by promoting autofocus to a global attribute.
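Promoting an attribute to a global attribute can be pictured as moving its declaration into a parameter entity shared by all attribute definition lists. The sketch below uses the boolean-attribute idiom known from earlier HTML DTDs (a single-token enumeration); the entity name html.global.attributes is hypothetical and the attribute list is abbreviated.

```sgml
<!-- Illustrative sketch; html.global.attributes is a hypothetical name -->
<!ENTITY % html.global.attributes
  "title     CDATA       #IMPLIED
   autofocus (autofocus) #IMPLIED">

<!-- Each element's attribute list references the shared entity -->
<!ATTLIST button %html.global.attributes;>
<!ATTLIST div    %html.global.attributes;>

<!-- The single-token group allows the minimized form <div autofocus>,
     where the attribute name is omitted and only the value is given
     (this requires SHORTTAG YES in the SGML declaration) -->
```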
Removed the border attribute on the table element.
Removed the charset attribute on the script element.
Added the usemap attribute on the object element; note the
usemap attribute is removed again in the next review draft
(see object-usemap) along with content model changes.
Added the sizes, integrity, imagesrcset, imagesizes, as,
and color attributes on the link element.
Added the ping attribute (as a CDATA attribute) on the a and area elements.
Added the decoding attribute to the img element.
Added the loading attribute to the iframe element.
Added the playsinline attribute to the video element.
Added the rel attribute to the form element.
Added the nomodule attribute to the script element.
The enumerated values for the http-equiv attribute
(section 4.2.5.3) are now represented in the DTD.
Changed the width and height attributes (on the img, iframe,
embed, object, video, and canvas elements and the width attribute
on the input element) to have the declared value NUMBER.
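What the NUMBER declared value means in practice can be sketched with img; the declaration is illustrative and not copied from the DTD. A NUMBER value consists of digits only, so percentage values, which earlier HTML versions allowed for these attributes, are rejected.

```sgml
<!-- Illustrative sketch, not copied from the DTD -->
<!ATTLIST img
  src    CDATA  #IMPLIED
  width  NUMBER #IMPLIED
  height NUMBER #IMPLIED>
<!-- NUMBER accepts digits only:
       <img src="photo.jpg" width="640" height="480">   accepted
       <img src="photo.jpg" width="50%">                rejected -->
```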
The sandbox attribute on the iframe element
allows multiple space-separated values and hence has been remodelled
as having declared value NMTOKENS.
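A minimal sketch of the NMTOKENS declaration and a matching start tag; the declaration is illustrative and not copied from the DTD.

```sgml
<!-- Illustrative sketch, not copied from the DTD -->
<!-- NMTOKENS: one or more name tokens separated by whitespace -->
<!ATTLIST iframe
  src     CDATA    #IMPLIED
  sandbox NMTOKENS #IMPLIED>
<!-- Accepted, with two tokens:
       <iframe src="frame.html" sandbox="allow-forms allow-scripts"> -->
```

Under ISO 8879 rules an NMTOKENS value must contain at least one token, so an empty sandbox value (which HTML allows) can't be expressed with this declared value alone; how the DTD or sgmljs.net treats that case isn't covered by this sketch.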
The enumerated values for the autocomplete attribute
(section 4.10.3) and the type attribute on the input and button
elements are now represented in the DTD.
In previous DTDs, the ARIA role attribute wasn't actually declared
(only attributes for ARIA states and properties were). This has been
fixed, in the W3C HTML 5.2 DTD as well. Note that, unlike role, the
tabindex attribute is, and has always been, declared as part of HTML.
Moreover, the integration of ARIA has been changed such that
declared attribute defaults for ARIA state and property attributes
are customized to become #IMPLIED, i.e., have no material default
value specified. This is in line with the treatment of HTML attribute
defaults where applicable, and is necessary because an SGML
processor supplies declared default values for attributes omitted in
a document, which conflicts with HTML's and ARIA's expectation that
an attribute taking on its default value should be left unspecified.
While this change isn't a fix per se, it has been applied to the
previous HTML DTD (W3C HTML 5.2, but no prior versions) as well.
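The difference between a declared default and #IMPLIED can be illustrated with a single ARIA state. The attribute (aria-disabled), the element (button), and the enumerated values are chosen for illustration only and may not match the DTD's actual declarations.

```sgml
<!-- Illustrative sketch, not copied from the ARIA or HTML DTDs -->
<!-- With a declared default, as in
       aria-disabled (true|false) false
     an SGML processor supplies aria-disabled="false" to every button
     that omits the attribute. With #IMPLIED, an omitted attribute
     stays unspecified, as HTML and ARIA expect. -->
<!ATTLIST button aria-disabled (true|false) #IMPLIED>
```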
In previous versions, exclusion exceptions for the main element
had been placed on div and legend elements when they should only
apply to sectioning elements with explicit exclusion of main such as
article, nav, and aside. Note that main itself doesn't exclude main
descendants in its content model. This fix has also been applied to
the previous HTML DTD (W3C HTML 5.2, but no prior versions).
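Where an exclusion exception is placed matters because it applies to all descendants of the element that carries it, not only to its children; a misplaced -(main) on div would therefore prohibit main inside any div. The sketch below abbreviates the flow parameter entity heavily and is illustrative, not copied from the DTD.

```sgml
<!-- Illustrative sketch; flow is heavily abbreviated here -->
<!ENTITY % flow "#PCDATA|p|div|main|article">

<!-- Exclusion exception: main is prohibited anywhere inside article,
     at any depth, e.g. also inside a div within an article -->
<!ELEMENT article - - (%flow;)* -(main)>

<!-- No exclusion on div or main, so outside of article a div
     may contain main, and main may contain main -->
<!ELEMENT div     - - (%flow;)*>
<!ELEMENT main    - - (%flow;)*>
<!ELEMENT p       - - (#PCDATA)>
```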
The HTML Review Draft specification states that
"User agents that implement SVG must implement the SVG 2 specification, and not any earlier revisions."
The SVG working group at W3C hasn't published a formal grammar for SVG 2 as a language, in the form of a DTD or RELAX NG schema, as was done for previous versions. Moreover, the SVG 2 specification is at Candidate Recommendation stage at this time, and has been since 2018. This reflects uncertainty as to whether Proposed Recommendation or Recommendation status will eventually be reached: browser vendors have voiced interest in supporting only a few conservative SVG 2 additions (such as those streamlining SVG/CSS integration) but haven't committed to the new SVG 2 features as a whole, while the continued existence of the SVG working group under its charter, and even of W3C as its hosting organization, isn't guaranteed.
In keeping with previous HTML 5.x DTDs, the (extremely modular) SVG 1.1 DTD is further extended for SVG 2, but only with those features that are also accepted and implemented for the SVG subset recognized by W3C's nu validator (the SVG RELAX NG grammar used internally by the nu validator is also derived from the SVG 1.1 DTD we're customizing here), taking into account changes made up to May 25th, 2021. Specifically, the following customizations are applied:
add feDropShadow as element and filter primitive
(in line with section 12.2.6.5's listing of feDropShadow among
mapped camel-case element names for SVG; note feDropShadow technically
was defined as part of Filter Effects Module Level 1, hence
as part of SVG 1.* rather than SVG 2)
additional enumerated values for the operator attribute on
feComposite elements, the mode attribute on feBlend
elements, and declaration of the x, y, width, and height
attributes on symbol elements (note the nu validator only adds width
and height)
note the SVG desc element remains unchanged (i.e., it isn't changed
to allow arbitrary child content)
Moreover, the HTML specification makes the following specific requirements:
the content model for the SVG title element inside HTML documents is
phrasing content (this further constrains the requirements given in
SVG 2) (section 4.8.17)
the svg element falls into the embedded content, phrasing content,
flow content [and palpable content] categories for the purposes of
the content models in this specification (section 4.8.17)
when the SVG foreignObject element contains elements from the HTML
namespace, such elements must all be flow content
HTML defines the nonce attribute as applying to SVG and other foreign
elements (section 2.6.6)
These requirements have been applied as well.
Finally, generic XML attributes in need of declaration within an SGML
context (xml:lang, xml:space, and id, including their no-namespace
HTML variants if applicable) are declared (see section 3.2.6.2).
Note XLink attributes are declared by the SVG 1.1 DTD (see also section 12.1.2.3).
Customization of the MathML 3 DTD for embedding into HTML implements the following specific requirements:
When the MathML annotation-xml element contains elements from the HTML namespace, such elements must all be flow content (section 4.8.16)
When the MathML token elements (mi, mo, mn, ms, and mtext) are descendants of HTML elements, they may contain phrasing content elements from the HTML namespace (section 4.8.16)
Finally, as with SVG, the generic XML attributes in need of
declaration, xml:lang and xml:space, are declared along with their
no-namespace HTML variants where applicable.