      3    WHATWG
      4 
      5 HTML 5
      6 
      7 Draft Recommendation, 13 January 2009
      8 
     12     8.2.5 Tree construction
     13 
     14    The input to the tree construction stage is a sequence of tokens from
     15    the tokenization stage. The tree construction stage is associated with
     16    a DOM Document object when a parser is created. The "output" of this
     17    stage consists of dynamically modifying or extending that document's
     18    DOM tree.
     19 
     20    This specification does not define when an interactive user agent has
     21    to render the Document so that it is available to the user, or when it
     22    has to begin accepting user input.
     23 
     24    As each token is emitted from the tokeniser, the user agent must
     25    process the token according to the rules given in the section
     26    corresponding to the current insertion mode.
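
The dispatch described above can be pictured as a table of handlers keyed by insertion mode. The following is a minimal sketch, not the normative algorithm: the token dictionaries, the class name TreeConstructor, and the stand-in "before html" handler are all assumptions made for illustration, and only a simplified slice of the "initial" rules is shown.

```python
# Hypothetical sketch: token shapes, class and method names are illustrative.
WHITESPACE = {"\t", "\n", "\f", " "}


class TreeConstructor:
    def __init__(self):
        self.insertion_mode = "initial"
        self.quirks_mode = False
        self.handled = []  # (mode, token type) pairs, kept for inspection only
        self._handlers = {
            "initial": self._initial,
            "before html": self._before_html,
        }

    def process_token(self, token):
        """Dispatch on whatever the insertion mode is at this moment."""
        self._handlers[self.insertion_mode](token)

    def _initial(self, token):
        kind = token["type"]
        if kind == "character" and token["data"] in WHITESPACE:
            return  # whitespace is ignored in this mode
        if kind in ("comment", "doctype"):
            self.handled.append(("initial", kind))
            if kind == "doctype":
                self.insertion_mode = "before html"
            return
        # Anything else: set quirks mode, switch modes, then reprocess
        # the same token under the new mode's rules.
        self.quirks_mode = True
        self.insertion_mode = "before html"
        self.process_token(token)

    def _before_html(self, token):
        # Stand-in for the real "before html" rules.
        self.handled.append(("before html", token["type"]))
```

Reprocessing is modelled here as a recursive call to process_token after the mode switch, so the same token is seen by the new mode's handler.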
     27 
     28    When the steps below require the UA to insert a character into a node,
     29    if that node has a child immediately before where the character is to
     30    be inserted, and that child is a Text node, and that Text node was the
     31    last node that the parser inserted into the document, then the
     32    character must be appended to that Text node; otherwise, a new Text
     33    node whose data is just that character must be inserted in the
     34    appropriate place.
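
The character-insertion rule above can be sketched as follows. This is an illustrative model only: the Node and Parser classes and the last_inserted field are assumptions of the sketch, not DOM interfaces.

```python
# Hypothetical sketch of the "insert a character" rule; names are illustrative.
class Node:
    def __init__(self, kind, data=""):
        self.kind = kind        # "element" or "text"
        self.data = data        # character data, used by text nodes
        self.children = []


class Parser:
    def __init__(self):
        # The last node this parser inserted into the document.
        self.last_inserted = None

    def insert_character(self, parent, char, index=None):
        """Insert char into parent at index (default: at the end)."""
        if index is None:
            index = len(parent.children)
        previous = parent.children[index - 1] if index > 0 else None
        if (previous is not None
                and previous.kind == "text"
                and previous is self.last_inserted):
            # Coalesce with the Text node the parser just inserted.
            previous.data += char
            return previous
        text = Node("text", char)
        parent.children.insert(index, text)
        self.last_inserted = text
        return text
```

Tracking the last inserted node matters: if anything else was inserted in between, a new Text node is created even when the preceding sibling is a Text node.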
     35 
     36    DOM mutation events must not fire for changes caused by the UA parsing
     37    the document. (Conceptually, the parser is not mutating the DOM; it is
     38    constructing it.) This includes the parsing of any content inserted
     39    using document.write() and document.writeln() calls. [DOM3EVENTS]
     40 
     41    Not all of the tag names mentioned below are conformant tag names in
     42    this specification; many are included to handle legacy content. They
     43    still form part of the algorithm that implementations are required to
     44    implement to claim conformance.
     45 
     46    The algorithm described below places no limit on the depth of the DOM
     47    tree generated, or on the length of tag names, attribute names,
     48    attribute values, text nodes, etc. While implementors are encouraged to
     49    avoid arbitrary limits, it is recognized that practical concerns will
     50    likely force user agents to impose nesting depth limits.
     51 
     52       8.2.5.1 Creating and inserting elements
     53 
     54    When the steps below require the UA to create an element for a token in
     55    a particular namespace, the UA must create a node implementing the
     56    interface appropriate for the element type corresponding to the tag
     57    name of the token in the given namespace (as given in the specification
     58    that defines that element, e.g. for an a element in the HTML namespace,
     59    this specification defines it to be the HTMLAnchorElement interface),
     60    with the tag name being the name of that element, with the node being
     61    in the given namespace, and with the attributes on the node being those
     62    given in the given token.
     63 
     64    The interface appropriate for an element in the HTML namespace that is
     65    not defined in this specification is HTMLElement. The interface
     66    appropriate for an element in another namespace that is not defined by
     67    that namespace's specification is Element.
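
The interface fallback rules above amount to a lookup with two defaults. In this sketch the registry contains only the one mapping the text names (the a element); the namespace URIs are the standard HTML and SVG ones, and the function name interface_for is hypothetical.

```python
# Hypothetical sketch: a tiny registry plus the two fallback rules.
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Only the example given in the text; a real table would be much larger.
KNOWN_INTERFACES = {
    (HTML_NS, "a"): "HTMLAnchorElement",
}


def interface_for(namespace, tag_name):
    """Return the name of the interface an element node should implement."""
    known = KNOWN_INTERFACES.get((namespace, tag_name))
    if known is not None:
        return known
    # Unknown HTML elements fall back to HTMLElement, everything else
    # falls back to Element.
    return "HTMLElement" if namespace == HTML_NS else "Element"
```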
     68 
     69    When a resettable element is created in this manner, its reset
     70    algorithm must be invoked once the attributes are set. (This
     71    initializes the element's value and checkedness based on the element's
     72    attributes.)
     73      __________________________________________________________________
     74 
     75    When the steps below require the UA to insert an HTML element for a
     76    token, the UA must first create an element for the token in the HTML
     77    namespace, and then append this node to the current node, and push it
     78    onto the stack of open elements so that it is the new current node.
     79 
     80    The steps below may also require that the UA insert an HTML element in
     81    a particular place, in which case the UA must follow the same steps
     82    except that it must insert or append the new node in the location
     83    specified instead of appending it to the current node. (This happens in
     84    particular during the parsing of tables with invalid content.)
     85 
     86    If an element created by the insert an HTML element algorithm is a
     87    form-associated element, and the form element pointer is not null, and
     88    the newly created element doesn't have a form attribute, the user agent
     89    must associate the newly created element with the form element pointed
     90    to by the form element pointer before inserting it wherever it is to be
     91    inserted.
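
A minimal sketch of "insert an HTML element", including the form association step, might look like this. The Element and TreeBuilder classes are illustrative, and the FORM_ASSOCIATED set is an assumed, partial list that is not taken from this section.

```python
# Hypothetical sketch; class names and FORM_ASSOCIATED are assumptions.
HTML_NS = "http://www.w3.org/1999/xhtml"
FORM_ASSOCIATED = {"button", "input", "select", "textarea"}  # illustrative subset


class Element:
    def __init__(self, name, namespace, attrs):
        self.name = name
        self.namespace = namespace
        self.attrs = dict(attrs)
        self.children = []
        self.form_owner = None


class TreeBuilder:
    def __init__(self, root):
        self.open_elements = [root]   # the stack of open elements
        self.form_pointer = None      # the form element pointer

    @property
    def current_node(self):
        return self.open_elements[-1]

    def insert_html_element(self, name, attrs=()):
        element = Element(name, HTML_NS, attrs)
        # Associate with the form before the element is inserted anywhere.
        if (name in FORM_ASSOCIATED
                and self.form_pointer is not None
                and "form" not in element.attrs):
            element.form_owner = self.form_pointer
        self.current_node.children.append(element)
        self.open_elements.append(element)   # becomes the new current node
        return element
```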
     92      __________________________________________________________________
     93 
     94    When the steps below require the UA to insert a foreign element for a
     95    token, the UA must first create an element for the token in the given
     96    namespace, and then append this node to the current node, and push it
     97    onto the stack of open elements so that it is the new current node. If
     98    the newly created element has an xmlns attribute in the XMLNS namespace
     99    whose value is not exactly the same as the element's namespace, that is
    100    a parse error.
    101 
    102    When the steps below require the user agent to adjust MathML attributes
    103    for a token, then, if the token has an attribute named definitionurl,
    104    change its name to definitionURL (note the case difference).
    105 
    106    When the steps below require the user agent to adjust foreign
    107    attributes for a token, then, if any of the attributes on the token
    108    match the strings given in the first column of the following table, let
    109    the attribute be a namespaced attribute, with the prefix being the
    110    string given in the corresponding cell in the second column, the local
    111    name being the string given in the corresponding cell in the third
    112    column, and the namespace being the namespace given in the
    113    corresponding cell in the fourth column. (This fixes the use of
    114    namespaced attributes, in particular xml:lang.)
    115 
    116    Attribute name Prefix Local name Namespace
    117    xlink:actuate  xlink  actuate    XLink namespace
    118    xlink:arcrole  xlink  arcrole    XLink namespace
    119    xlink:href     xlink  href       XLink namespace
    120    xlink:role     xlink  role       XLink namespace
    121    xlink:show     xlink  show       XLink namespace
    122    xlink:title    xlink  title      XLink namespace
    123    xlink:type     xlink  type       XLink namespace
    124    xml:base       xml    base       XML namespace
    125    xml:lang       xml    lang       XML namespace
    126    xml:space      xml    space      XML namespace
    127    xmlns          (none) xmlns      XMLNS namespace
    128    xmlns:xlink    xmlns  xlink      XMLNS namespace
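
The two attribute adjustments above can be sketched as table lookups. The table below mirrors the one in the text; the namespace URIs are the standard XLink, XML and XMLNS ones (this section names the namespaces but does not spell out the URIs), and representing an adjusted attribute as a (namespace, prefix, local name) key is a choice made for this sketch.

```python
# Hypothetical sketch; the data layout is an assumption of this example.
XLINK_NS = "http://www.w3.org/1999/xlink"
XML_NS = "http://www.w3.org/XML/1998/namespace"
XMLNS_NS = "http://www.w3.org/2000/xmlns/"

# attribute name -> (prefix, local name, namespace)
FOREIGN_ATTRIBUTES = {
    "xlink:actuate": ("xlink", "actuate", XLINK_NS),
    "xlink:arcrole": ("xlink", "arcrole", XLINK_NS),
    "xlink:href": ("xlink", "href", XLINK_NS),
    "xlink:role": ("xlink", "role", XLINK_NS),
    "xlink:show": ("xlink", "show", XLINK_NS),
    "xlink:title": ("xlink", "title", XLINK_NS),
    "xlink:type": ("xlink", "type", XLINK_NS),
    "xml:base": ("xml", "base", XML_NS),
    "xml:lang": ("xml", "lang", XML_NS),
    "xml:space": ("xml", "space", XML_NS),
    "xmlns": (None, "xmlns", XMLNS_NS),
    "xmlns:xlink": ("xmlns", "xlink", XMLNS_NS),
}


def adjust_mathml_attributes(attrs):
    """Rename definitionurl to definitionURL; leave everything else alone."""
    return {("definitionURL" if name == "definitionurl" else name): value
            for name, value in attrs.items()}


def adjust_foreign_attributes(attrs):
    """Map each attribute to a (namespace, prefix, local name) key."""
    adjusted = {}
    for name, value in attrs.items():
        prefix, local, namespace = FOREIGN_ATTRIBUTES.get(
            name, (None, name, None))   # unlisted names stay un-namespaced
        adjusted[(namespace, prefix, local)] = value
    return adjusted
```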
    129      __________________________________________________________________
    130 
    131    The generic CDATA element parsing algorithm and the generic RCDATA
    132    element parsing algorithm consist of the following steps. These
    133    algorithms are always invoked in response to a start tag token.
    134     1. Insert an HTML element for the token.
    135     2. If the algorithm that was invoked is the generic CDATA element
    136        parsing algorithm, switch the tokeniser's content model flag to the
    137        CDATA state; otherwise the algorithm invoked was the generic RCDATA
    138        element parsing algorithm, switch the tokeniser's content model
    139        flag to the RCDATA state.
    140     3. Let the original insertion mode be the current insertion mode.
    141     4. Then, switch the insertion mode to "in CDATA/RCDATA".
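
The four steps above can be sketched as a single method parameterised by the content model. The ParserState class, its field names, the "PCDATA" starting value, and the stand-in insert_html_element (which only records the tag name) are assumptions of this sketch.

```python
# Hypothetical sketch of the generic CDATA/RCDATA element parsing steps.
class ParserState:
    def __init__(self, insertion_mode="in head"):
        self.content_model = "PCDATA"        # assumed default for the sketch
        self.insertion_mode = insertion_mode
        self.original_insertion_mode = None
        self.open_elements = []

    def insert_html_element(self, tag_name):
        # Stand-in: a real implementation would create and append a node.
        self.open_elements.append(tag_name)

    def generic_element_parse(self, tag_name, content_model):
        if content_model not in ("CDATA", "RCDATA"):
            raise ValueError("content_model must be 'CDATA' or 'RCDATA'")
        self.insert_html_element(tag_name)                   # step 1
        self.content_model = content_model                   # step 2
        self.original_insertion_mode = self.insertion_mode   # step 3
        self.insertion_mode = "in CDATA/RCDATA"              # step 4
```

Saving the original insertion mode before switching is what allows the parser to return to it once the element's content has been consumed.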
    142 
    143       8.2.5.2 Closing elements that have implied end tags
    144 
    145    When the steps below require the UA to generate implied end tags, then,
    146    while the current node is a dd element, a dt element, an li element, an
    147    option element, an optgroup element, a p element, an rp element, or an
    148    rt element, the UA must pop the current node off the stack of open
    149    elements.
    150 
    151    If a step requires the UA to generate implied end tags but lists an
    152    element to exclude from the process, then the UA must perform the above
    153    steps as if that element was not in the above list.
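
Generating implied end tags is a simple loop over the top of the stack. In this sketch the stack of open elements is modelled as a list of tag names, which is a simplification; the function and parameter names are illustrative.

```python
# Hypothetical sketch; the stack holds tag names rather than element nodes.
IMPLIED_END_TAG_ELEMENTS = frozenset(
    {"dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"})


def generate_implied_end_tags(open_elements, exclude=None):
    """Pop current nodes while they are in the list, skipping `exclude`."""
    while (open_elements
           and open_elements[-1] in IMPLIED_END_TAG_ELEMENTS
           and open_elements[-1] != exclude):
        open_elements.pop()
```

For example, with a stack of html, body, ul, li, p, a plain call pops both p and li, whereas passing exclude="li" pops only p and stops at li.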
    154 
    155       8.2.5.3 Foster parenting
    156 
    157    Foster parenting happens when content is misnested in tables.
    158 
    159    When a node (referred to here as node) is to be foster parented, it must be
    160    inserted into the foster parent element, and the current table must be
    161    marked as tainted. (Once the current table has been tainted, whitespace
    162    characters are inserted into the foster parent element instead of the
    163    current node.)
    164 
    165    The foster parent element is the parent element of the last table
    166    element in the stack of open elements, if there is a table element and
    167    it has such a parent element. If there is no table element in the stack
    168    of open elements (fragment case), then the foster parent element is the
    169    first element in the stack of open elements (the html element).
    170    Otherwise, if there is a table element in the stack of open elements,
    171    but the last table element in the stack of open elements has no parent,
    172    or its parent node is not an element, then the foster parent element is
    173    the element before the last table element in the stack of open
    174    elements.
    175 
    176    If the foster parent element is the parent element of the last table
    177    element in the stack of open elements, then node must be inserted
    178    immediately before the last table element in the stack of open elements
    179    in the foster parent element; otherwise, node must be appended to the
    180    foster parent element.
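
The choice of foster parent and insertion position can be sketched as below. The Node class is a toy model rather than a DOM interface, and the sketch assumes that the "current table" to be tainted is the last table element in the stack of open elements.

```python
# Hypothetical sketch of foster parenting over a toy node model.
class Node:
    def __init__(self, name, is_element=True):
        self.name = name
        self.is_element = is_element
        self.parent = None
        self.children = []
        self.tainted = False

    def append(self, child):
        child.parent = self
        self.children.append(child)

    def insert_before(self, child, reference):
        child.parent = self
        self.children.insert(self.children.index(reference), child)


def foster_parent(node, open_elements):
    # Find the last table element in the stack of open elements.
    table_index = next(
        (i for i in range(len(open_elements) - 1, -1, -1)
         if open_elements[i].name == "table"),
        None,
    )
    if table_index is None:
        # Fragment case: the first element in the stack (the html element).
        open_elements[0].append(node)
        return
    table = open_elements[table_index]
    table.tainted = True  # assumption: this is the "current table"
    parent = table.parent
    if parent is not None and parent.is_element:
        # Insert immediately before the table, inside its parent.
        parent.insert_before(node, table)
    else:
        # Table has no parent element: use the element before it in the stack.
        open_elements[table_index - 1].append(node)
```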
    181 
    182       8.2.5.4 The "initial" insertion mode
    183 
    184    When the insertion mode is "initial", tokens must be handled as
    185    follows:
    186 
    187    A character token that is one of U+0009 CHARACTER TABULATION,
    188           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
    189           Ignore the token.
    190 
    191    A comment token
    192           Append a Comment node to the Document object with the data
    193           attribute set to the data given in the comment token.
    194 
    195    A DOCTYPE token
    196           If the DOCTYPE token's name is not a case-sensitive match for
    197           the string "html", or if the token's public identifier is
    198           neither missing nor a case-sensitive match for the string
    199           "XSLT-compat", or if the token's system identifier is not
    200           missing, then there is a parse error (this is the DOCTYPE parse
    201           error). Conformance checkers may, instead of reporting this
    202           error, switch to a conformance checking mode for another
    203           language (e.g. based on the DOCTYPE token a conformance checker
    204           could recognize that the document is an HTML4-era document, and
    205           defer to an HTML4 conformance checker.)
    206 
    207           Append a DocumentType node to the Document node, with the name
    208           attribute set to the name given in the DOCTYPE token; the
    209           publicId attribute set to the public identifier given in the
    210           DOCTYPE token, or the empty string if the public identifier was
    211           missing; the systemId attribute set to the system identifier
    212           given in the DOCTYPE token, or the empty string if the system
    213           identifier was missing; and the other attributes specific to
    214           DocumentType objects set to null and empty lists as appropriate.
    215           Associate the DocumentType node with the Document object so that
    216           it is returned as the value of the doctype attribute of the
    217           Document object.
    218 
    219           Then, if the DOCTYPE token matches one of the conditions in the
    220           following list, then set the document to quirks mode:
    221 
    222           + The force-quirks flag is set to on.
    223           + The name is set to anything other than "HTML".
    224           + The public identifier starts with: "+//Silmaril//dtd html Pro
    225             v0r11 19970101//"
    226           + The public identifier starts with: "-//AdvaSoft Ltd//DTD HTML
    227             3.0 asWedit + extensions//"
    228           + The public identifier starts with: "-//AS//DTD HTML 3.0
    229             asWedit + extensions//"
    230           + The public identifier starts with: "-//IETF//DTD HTML 2.0
    231             Level 1//"
    232           + The public identifier starts with: "-//IETF//DTD HTML 2.0
    233             Level 2//"
    234           + The public identifier starts with: "-//IETF//DTD HTML 2.0
    235             Strict Level 1//"
    236           + The public identifier starts with: "-//IETF//DTD HTML 2.0
    237             Strict Level 2//"
    238           + The public identifier starts with: "-//IETF//DTD HTML 2.0
    239             Strict//"
    240           + The public identifier starts with: "-//IETF//DTD HTML 2.0//"
    241           + The public identifier starts with: "-//IETF//DTD HTML 2.1E//"
    242           + The public identifier starts with: "-//IETF//DTD HTML 3.0//"
    243           + The public identifier starts with: "-//IETF//DTD HTML 3.2
    244             Final//"
    245           + The public identifier starts with: "-//IETF//DTD HTML 3.2//"
    246           + The public identifier starts with: "-//IETF//DTD HTML 3//"
    247           + The public identifier starts with: "-//IETF//DTD HTML Level
    248             0//"
    249           + The public identifier starts with: "-//IETF//DTD HTML Level
    250             1//"
    251           + The public identifier starts with: "-//IETF//DTD HTML Level
    252             2//"
    253           + The public identifier starts with: "-//IETF//DTD HTML Level
    254             3//"
    255           + The public identifier starts with: "-//IETF//DTD HTML Strict
    256             Level 0//"
    257           + The public identifier starts with: "-//IETF//DTD HTML Strict
    258             Level 1//"
    259           + The public identifier starts with: "-//IETF//DTD HTML Strict
    260             Level 2//"
    261           + The public identifier starts with: "-//IETF//DTD HTML Strict
    262             Level 3//"
    263           + The public identifier starts with: "-//IETF//DTD HTML
    264             Strict//"
    265           + The public identifier starts with: "-//IETF//DTD HTML//"
    266           + The public identifier starts with: "-//Metrius//DTD Metrius
    267             Presentational//"
    268           + The public identifier starts with: "-//Microsoft//DTD Internet
    269             Explorer 2.0 HTML Strict//"
    270           + The public identifier starts with: "-//Microsoft//DTD Internet
    271             Explorer 2.0 HTML//"
    272           + The public identifier starts with: "-//Microsoft//DTD Internet
    273             Explorer 2.0 Tables//"
    274           + The public identifier starts with: "-//Microsoft//DTD Internet
    275             Explorer 3.0 HTML Strict//"
    276           + The public identifier starts with: "-//Microsoft//DTD Internet
    277             Explorer 3.0 HTML//"
    278           + The public identifier starts with: "-//Microsoft//DTD Internet
    279             Explorer 3.0 Tables//"
    280           + The public identifier starts with: "-//Netscape Comm.
    281             Corp.//DTD HTML//"
    282           + The public identifier starts with: "-//Netscape Comm.
    283             Corp.//DTD Strict HTML//"
    284           + The public identifier starts with: "-//O'Reilly and
    285             Associates//DTD HTML 2.0//"
    286           + The public identifier starts with: "-//O'Reilly and
    287             Associates//DTD HTML Extended 1.0//"
    288           + The public identifier starts with: "-//O'Reilly and
    289             Associates//DTD HTML Extended Relaxed 1.0//"
    290           + The public identifier starts with: "-//SoftQuad Software//DTD
    291             HoTMetaL PRO 6.0::19990601::extensions to HTML 4.0//"
    292           + The public identifier starts with: "-//SoftQuad//DTD HoTMetaL
    293             PRO 4.0::19971010::extensions to HTML 4.0//"
    294           + The public identifier starts with: "-//Spyglass//DTD HTML 2.0
    295             Extended//"
    296           + The public identifier starts with: "-//SQ//DTD HTML 2.0
    297             HoTMetaL + extensions//"
    298           + The public identifier starts with: "-//Sun Microsystems
    299             Corp.//DTD HotJava HTML//"
    300           + The public identifier starts with: "-//Sun Microsystems
    301             Corp.//DTD HotJava Strict HTML//"
    302           + The public identifier starts with: "-//W3C//DTD HTML 3
    303             1995-03-24//"
    304           + The public identifier starts with: "-//W3C//DTD HTML 3.2
    305             Draft//"
    306           + The public identifier starts with: "-//W3C//DTD HTML 3.2
    307             Final//"
    308           + The public identifier starts with: "-//W3C//DTD HTML 3.2//"
    309           + The public identifier starts with: "-//W3C//DTD HTML 3.2S
    310             Draft//"
    311           + The public identifier starts with: "-//W3C//DTD HTML 4.0
    312             Frameset//"
    313           + The public identifier starts with: "-//W3C//DTD HTML 4.0
    314             Transitional//"
    315           + The public identifier starts with: "-//W3C//DTD HTML
    316             Experimental 19960712//"
    317           + The public identifier starts with: "-//W3C//DTD HTML
    318             Experimental 970421//"
    319           + The public identifier starts with: "-//W3C//DTD W3 HTML//"
    320           + The public identifier starts with: "-//W3O//DTD W3 HTML 3.0//"
    321           + The public identifier is set to: "-//W3O//DTD W3 HTML Strict
    322             3.0//EN//"
    323           + The public identifier starts with: "-//WebTechs//DTD Mozilla
    324             HTML 2.0//"
    325           + The public identifier starts with: "-//WebTechs//DTD Mozilla
    326             HTML//"
    327           + The public identifier is set to: "-/W3C/DTD HTML 4.0
    328             Transitional/EN"
    329           + The public identifier is set to: "HTML"
    330           + The system identifier is set to:
    331             "http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd"
    332           + The system identifier is missing and the public identifier
    333             starts with: "-//W3C//DTD HTML 4.01 Frameset//"
    334           + The system identifier is missing and the public identifier
    335             starts with: "-//W3C//DTD HTML 4.01 Transitional//"
    336 
    337           Otherwise, if the DOCTYPE token matches one of the conditions in
    338           the following list, then set the document to limited quirks
    339           mode:
    340 
    341           + The public identifier starts with: "-//W3C//DTD XHTML 1.0
    342             Frameset//"
    343           + The public identifier starts with: "-//W3C//DTD XHTML 1.0
    344             Transitional//"
    345           + The system identifier is not missing and the public identifier
    346             starts with: "-//W3C//DTD HTML 4.01 Frameset//"
    347           + The system identifier is not missing and the public identifier
    348             starts with: "-//W3C//DTD HTML 4.01 Transitional//"
    349 
    350           The name, system identifier, and public identifier strings must
    351           be compared to the values given in the lists above in an ASCII
    352           case-insensitive manner. A system identifier whose value is the
    353           empty string is not considered missing for the purposes of the
    354           conditions above.
    355 
    356           Then, switch the insertion mode to "before html".
    357 
    358    Anything else
    359           Parse error.
    360 
    361           Set the document to quirks mode.
    362 
    363           Switch the insertion mode to "before html", then reprocess the
    364           current token.
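
The DOCTYPE checks in this mode can be sketched as a single function. This is an abbreviated illustration: only a handful of the public identifier prefixes listed above are reproduced, None stands for a missing identifier, and the "no quirks" label for the fall-through case, like the function and constant names, is an assumption of the sketch.

```python
# Hypothetical sketch of the quirks-mode decision; prefix list is abbreviated.
import string

_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s):
    """Lowercase A-Z only, for ASCII case-insensitive comparison."""
    return s.translate(_TO_LOWER)


QUIRKS_PUBLIC_PREFIXES = tuple(ascii_lower(p) for p in (
    "+//Silmaril//dtd html Pro v0r11 19970101//",
    "-//IETF//DTD HTML 2.0//",
    "-//W3C//DTD HTML 3.2 Final//",
    "-//W3C//DTD HTML 4.0 Transitional//",
))
QUIRKS_PUBLIC_EXACT = {ascii_lower(p) for p in (
    "-//W3O//DTD W3 HTML Strict 3.0//EN//",
    "-/W3C/DTD HTML 4.0 Transitional/EN",
    "HTML",
)}
QUIRKS_SYSTEM_EXACT = ascii_lower(
    "http://www.ibm.com/data/dtd/v11/ibmxhtml1-transitional.dtd")
HTML401_PREFIXES = tuple(ascii_lower(p) for p in (
    "-//W3C//DTD HTML 4.01 Frameset//",
    "-//W3C//DTD HTML 4.01 Transitional//",
))
LIMITED_QUIRKS_PREFIXES = tuple(ascii_lower(p) for p in (
    "-//W3C//DTD XHTML 1.0 Frameset//",
    "-//W3C//DTD XHTML 1.0 Transitional//",
))


def document_mode(name, public_id=None, system_id=None, force_quirks=False):
    """Return 'quirks', 'limited quirks' or 'no quirks' for a DOCTYPE token."""
    name = ascii_lower(name) if name is not None else None
    pub = ascii_lower(public_id) if public_id is not None else None
    sys_id = ascii_lower(system_id) if system_id is not None else None

    if force_quirks or name != "html":
        return "quirks"
    if pub is not None and (pub.startswith(QUIRKS_PUBLIC_PREFIXES)
                            or pub in QUIRKS_PUBLIC_EXACT):
        return "quirks"
    if sys_id == QUIRKS_SYSTEM_EXACT:
        return "quirks"
    if pub is not None and pub.startswith(HTML401_PREFIXES):
        # An empty system identifier is not "missing"; only None is.
        return "quirks" if sys_id is None else "limited quirks"
    if pub is not None and pub.startswith(LIMITED_QUIRKS_PREFIXES):
        return "limited quirks"
    return "no quirks"
```

The HTML 4.01 Frameset and Transitional identifiers show why the system identifier matters: the same public identifier yields quirks mode when the system identifier is missing and limited quirks mode when it is present, even if empty.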
    365 
    366       8.2.5.5 The "before html" insertion mode
    367 
    368    When the insertion mode is "before html", tokens must be handled as
    369    follows:
    370 
    371    A DOCTYPE token
    372           Parse error. Ignore the token.
    373 
    374    A comment token
    375           Append a Comment node to the Document object with the data
    376           attribute set to the data given in the comment token.
    377 
    378    A character token that is one of U+0009 CHARACTER TABULATION,
    379           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
    380           Ignore the token.
    381 
    382    A start tag whose tag name is "html"
    383           Create an element for the token in the HTML namespace. Append it
    384           to the Document object. Put this element in the stack of open
    385           elements.
    386 
    387           If the token has an attribute "manifest", then resolve the value
    388           of that attribute to an absolute URL, and if that is successful,
    389           run the application cache selection algorithm with the resulting
    390           absolute URL. Otherwise, if there is no such attribute or
    391           resolving it fails, run the application cache selection
    392           algorithm with no manifest. The algorithm must be passed the
    393           Document object.
    394 
    395           Switch the insertion mode to "before head".
    396 
    397    Anything else
    398           Create an HTMLElement node with the tag name html, in the HTML
    399           namespace. Append it to the Document object. Put this element in
    400           the stack of open elements.
    401 
    402           Run the application cache selection algorithm with no manifest,
    403           passing it the Document object.
    404 
    405           Switch the insertion mode to "before head", then reprocess the
    406           current token.
    407 
    408           Should probably make end tags be ignored, so that "</head><!--
    409           --><html>" puts the comment before the root node (or should we?)
    410 
    411    The root element can end up being removed from the Document object,
    412    e.g. by scripts; nothing in particular happens in such cases, content
    413    continues being appended to the nodes as described in the next section.
    414 
    415       8.2.5.6 The "before head" insertion mode
    416 
    417    When the insertion mode is "before head", tokens must be handled as
    418    follows:
    419 
    420    A character token that is one of U+0009 CHARACTER TABULATION,
    421           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
    422           Ignore the token.
    423 
    424    A comment token
    425           Append a Comment node to the current node with the data
    426           attribute set to the data given in the comment token.
    427 
    428    A DOCTYPE token
    429           Parse error. Ignore the token.
    430 
    431    A start tag whose tag name is "html"
    432           Process the token using the rules for the "in body" insertion
    433           mode.
    434 
    435    A start tag whose tag name is "head"
    436           Insert an HTML element for the token.
    437 
    438           Set the head element pointer to the newly created head element.
    439 
    440           Switch the insertion mode to "in head".
    441 
    442    An end tag whose tag name is one of: "head", "br"
    443           Act as if a start tag token with the tag name "head" and no
    444           attributes had been seen, then reprocess the current token.
    445 
    446    Any other end tag
    447           Parse error. Ignore the token.
    448 
    449    Anything else
    450           Act as if a start tag token with the tag name "head" and no
    451           attributes had been seen, then reprocess the current token.
    452 
    453           This will result in an empty head element being generated, with
    454           the current token being reprocessed in the "after head"
    455           insertion mode.
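
The "before head" rules above, including the implied head start tag, can be sketched as follows. The class name, the token dictionaries, and the use of tag names in place of element nodes are assumptions; tokens destined for modes the sketch does not implement are only recorded in the forwarded list.

```python
# Hypothetical sketch of the "before head" insertion mode.
WHITESPACE = {"\t", "\n", "\f", " "}


class BeforeHeadSketch:
    def __init__(self):
        self.insertion_mode = "before head"
        self.open_elements = ["html"]   # tag names stand in for nodes
        self.head_pointer = None
        self.parse_errors = 0
        self.forwarded = []             # (mode, token) pairs left to other modes

    def process_token(self, token):
        if self.insertion_mode == "before head":
            self._before_head(token)
        else:
            self.forwarded.append((self.insertion_mode, token))

    def _start_head(self):
        # Equivalent of seeing a "head" start tag with no attributes.
        self.open_elements.append("head")
        self.head_pointer = "head"
        self.insertion_mode = "in head"

    def _before_head(self, token):
        kind, name = token["type"], token.get("name")
        if kind == "character" and token["data"] in WHITESPACE:
            return                      # ignored
        if kind == "comment":
            return                      # would append a Comment node here
        if kind == "doctype":
            self.parse_errors += 1      # parse error, token ignored
            return
        if kind == "start tag" and name == "html":
            self.forwarded.append(("in body", token))
            return
        if kind == "start tag" and name == "head":
            self._start_head()
            return
        if kind == "end tag" and name not in ("head", "br"):
            self.parse_errors += 1      # parse error, token ignored
            return
        # "head"/"br" end tags and anything else: imply <head>, reprocess.
        self._start_head()
        self.process_token(token)
```

In this sketch the reprocessed token is merely handed to "in head"; in the full algorithm that mode's own rules would then close the head element and pass the token on to "after head".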
    456 
    457       8.2.5.7 The "in head" insertion mode
    458 
    459    When the insertion mode is "in head", tokens must be handled as
    460    follows:
    461 
    462    A character token that is one of U+0009 CHARACTER TABULATION,
    463           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
    464           Insert the character into the current node.
    465 
    466    A comment token
    467           Append a Comment node to the current node with the data
    468           attribute set to the data given in the comment token.
    469 
    470    A DOCTYPE token
    471           Parse error. Ignore the token.
    472 
    473    A start tag whose tag name is "html"
    474           Process the token using the rules for the "in body" insertion
    475           mode.
    476 
    477    A start tag whose tag name is one of: "base", "command", "eventsource",
    478           "link"
    479           Insert an HTML element for the token. Immediately pop the
    480           current node off the stack of open elements.
    481 
    482           Acknowledge the token's self-closing flag, if it is set.
    483 
    484    A start tag whose tag name is "meta"
    485           Insert an HTML element for the token. Immediately pop the
    486           current node off the stack of open elements.
    487 
    488           Acknowledge the token's self-closing flag, if it is set.
    489 
    490           If the element has a charset attribute, and its value is a
    491           supported encoding, and the confidence is currently tentative,
    492           then change the encoding to the encoding given by the value of
    493           the charset attribute.
    494 
    495           Otherwise, if the element has a content attribute, and applying
    496           the algorithm for extracting an encoding from a Content-Type to
    497           its value returns a supported encoding (call it encoding), and
    498           the confidence is currently tentative, then change the encoding
    499           to that encoding.
    500 
    501    A start tag whose tag name is "title"
    502           Follow the generic RCDATA element parsing algorithm.
    503 
    504    A start tag whose tag name is "noscript", if the scripting flag is
    505           enabled
    507    A start tag whose tag name is one of: "noframes", "style"
    508           Follow the generic CDATA element parsing algorithm.
    509 
    510    A start tag whose tag name is "noscript", if the scripting flag is
    511           disabled
    512           Insert an HTML element for the token.
    513 
    514           Switch the insertion mode to "in head noscript".
    515 
    516    A start tag whose tag name is "script"
    517 
    518          1. Create an element for the token in the HTML namespace.
    519          2. Mark the element as being "parser-inserted".
    520             This ensures that, if the script is external, any
    521             document.write() calls in the script will execute in-line,
    522             instead of blowing the document away, as would happen in most
    523             other cases. It also prevents the script from executing until
    524             the end tag is seen.
    525          3. If the parser was originally created for the HTML fragment
    526             parsing algorithm, then mark the script element as "already
    527             executed". (fragment case)
    528          4. Append the new element to the current node.
    529          5. Switch the tokeniser's content model flag to the CDATA state.
    530          6. Let the original insertion mode be the current insertion mode.
    531          7. Switch the insertion mode to "in CDATA/RCDATA".
    532 
    533    An end tag whose tag name is "head"
    534           Pop the current node (which will be the head element) off the
    535           stack of open elements.
    536 
    537           Switch the insertion mode to "after head".
    538 
    539    An end tag whose tag name is "br"
    540           Act as described in the "anything else" entry below.
    541 
    542    A start tag whose tag name is "head"
    543    Any other end tag
    544           Parse error. Ignore the token.
    545 
    546    Anything else
    547           Act as if an end tag token with the tag name "head" had been
    548           seen, and reprocess the current token.
    549 
    550           In certain UAs, some elements don't trigger the "in body" mode
    551           straight away, but instead get put into the head. Do we want to
    552           copy that?
    553 
    554       8.2.5.8 The "in head noscript" insertion mode
    555 
    556    When the insertion mode is "in head noscript", tokens must be handled
    557    as follows:
    558 
    559    A DOCTYPE token
    560           Parse error. Ignore the token.
    561 
    562    A start tag whose tag name is "html"
    563           Process the token using the rules for the "in body" insertion
    564           mode.
    565 
    566    An end tag whose tag name is "noscript"
    567           Pop the current node (which will be a noscript element) from the
    568           stack of open elements; the new current node will be a head
    569           element.
    570 
    571           Switch the insertion mode to "in head".
    572 
    573    A character token that is one of U+0009 CHARACTER TABULATION,
    574           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
    576    A comment token
    577    A start tag whose tag name is one of: "link", "meta", "noframes",
    578           "style"
    579           Process the token using the rules for the "in head" insertion
    580           mode.
    581 
    582    An end tag whose tag name is "br"
    583           Act as described in the "anything else" entry below.
    584 
    585    A start tag whose tag name is one of: "head", "noscript"
    586    Any other end tag
    587           Parse error. Ignore the token.
    588 
    589    Anything else
    590           Parse error. Act as if an end tag with the tag name "noscript"
    591           had been seen and reprocess the current token.
    592 
    593       8.2.5.9 The "after head" insertion mode
    594 
    595    When the insertion mode is "after head", tokens must be handled as
    596    follows:
    597 
    598    A character token that is one of U+0009 CHARACTER TABULATION,
    599           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
    600           Insert the character into the current node.
    601 
    602    A comment token
    603           Append a Comment node to the current node with the data
    604           attribute set to the data given in the comment token.
    605 
    606    A DOCTYPE token
    607           Parse error. Ignore the token.
    608 
    609    A start tag whose tag name is "html"
    610           Process the token using the rules for the "in body" insertion
    611           mode.
    612 
    613    A start tag whose tag name is "body"
    614           Insert an HTML element for the token.
    615 
    616           Switch the insertion mode to "in body".
    617 
    618    A start tag whose tag name is "frameset"
    619           Insert an HTML element for the token.
    620 
    621           Switch the insertion mode to "in frameset".
    622 
    623    A start tag token whose tag name is one of: "base", "link", "meta",
    624           "noframes", "script", "style", "title"
    625           Parse error.
    626 
    627           Push the node pointed to by the head element pointer onto the
    628           stack of open elements.
    629 
    630           Process the token using the rules for the "in head" insertion
    631           mode.
    632 
    633           Remove the node pointed to by the head element pointer from the
    634           stack of open elements.
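
The push/process/remove sequence above can be illustrated with a short, non-normative Python sketch. The names `Element`, `process_late_head_token`, and the `process_in_head` callback are hypothetical stand-ins, not part of this specification. The sketch shows why the last step is a removal rather than a pop: the "in head" rules may leave another element on top of the head element.

```python
class Element:
    """Minimal stand-in for a DOM element (hypothetical, for illustration)."""
    def __init__(self, name):
        self.name = name


def process_late_head_token(stack, head_element, process_in_head):
    """Handle a head-only start tag seen in the "after head" mode.

    `process_in_head` is an assumed callback standing in for the
    "in head" rules; it may push further elements onto `stack`.
    """
    stack.append(head_element)      # push the head element back on
    process_in_head(stack)          # reuse the "in head" rules
    # Remove the head element wherever it now sits. It is not
    # necessarily the current node any more (for example, a title
    # element may have been pushed on top of it).
    for index in range(len(stack) - 1, -1, -1):
        if stack[index] is head_element:
            del stack[index]
            break
```

With a callback that pushes a title element, the stack ends up holding the html element and the title element, with the head element gone from between them.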
    635 
    636    An end tag whose tag name is "br"
    637           Act as described in the "anything else" entry below.
    638 
    639    A start tag whose tag name is "head"
    640    Any other end tag
    641           Parse error. Ignore the token.
    642 
    643    Anything else
    644           Act as if a start tag token with the tag name "body" and no
    645           attributes had been seen, and then reprocess the current token.
    646 
    647       8.2.5.10 The "in body" insertion mode
    648 
    649    When the insertion mode is "in body", tokens must be handled as
    650    follows:
    651 
    652    A character token
    653           Reconstruct the active formatting elements, if any.
    654 
    655           Insert the token's character into the current node.
    656 
    657    A comment token
    658           Append a Comment node to the current node with the data
    659           attribute set to the data given in the comment token.
    660 
    661    A DOCTYPE token
    662           Parse error. Ignore the token.
    663 
    664    A start tag whose tag name is "html"
    665           Parse error. For each attribute on the token, check to see if
    666           the attribute is already present on the top element of the stack
    667           of open elements. If it is not, add the attribute and its
    668           corresponding value to that element.
    669 
    670    A start tag token whose tag name is one of: "base", "command",
    671           "eventsource", "link", "meta", "noframes", "script", "style",
    672           "title"
    673           Process the token using the rules for the "in head" insertion
    674           mode.
    675 
    676    A start tag whose tag name is "body"
    677           Parse error.
    678 
    679           If the second element on the stack of open elements is not a
    680           body element, or, if the stack of open elements has only one
    681           node on it, then ignore the token. (fragment case)
    682 
    683           Otherwise, for each attribute on the token, check to see if the
    684           attribute is already present on the body element (the second
    685           element) on the stack of open elements. If it is not, add the
    686           attribute and its corresponding value to that element.
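
The attribute handling for stray "html" and "body" start tags follows the same pattern: attributes already on the element win, and only missing ones are copied from the token. A minimal, non-normative Python sketch, assuming attributes are modelled as plain dictionaries (the function name is hypothetical):

```python
def merge_missing_attributes(element_attrs, token_attrs):
    """Copy attributes from the token onto the element, never overwriting.

    Both arguments are assumed to be dicts mapping attribute names
    to values; `element_attrs` is modified in place.
    """
    for name, value in token_attrs.items():
        if name not in element_attrs:
            element_attrs[name] = value
    return element_attrs
```

For example, merging a token carrying class="b" and id="x" into an element that already has class="a" leaves the class untouched and adds only the id.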
    687 
    688    An end-of-file token
    689           If there is a node in the stack of open elements that is not
    690           either a dd element, a dt element, an li element, a p element, a
    691           tbody element, a td element, a tfoot element, a th element, a
    692           thead element, a tr element, the body element, or the html
    693           element, then this is a parse error.
    694 
    695           Stop parsing.
    696 
    697    An end tag whose tag name is "body"
    698           If the stack of open elements does not have a body element in
    699           scope, this is a parse error; ignore the token.
    700 
    701           Otherwise, if there is a node in the stack of open elements that
    702           is not either a dd element, a dt element, an li element, a p
    703           element, a tbody element, a td element, a tfoot element, a th
    704           element, a thead element, a tr element, the body element, or the
    705           html element, then this is a parse error.
    706 
    707           Switch the insertion mode to "after body".
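
The end-of-file entry and the "body" end tag entry above apply the same check to the stack of open elements. As a non-normative Python sketch, assuming the stack is modelled as a list of tag names (the names below are hypothetical):

```python
# Elements that may still be open at the end of the body without
# this being a parse error, as listed in the two entries above.
ALLOWED_OPEN_AT_END = frozenset({
    "dd", "dt", "li", "p", "tbody", "td", "tfoot", "th", "thead",
    "tr", "body", "html",
})


def has_unexpected_open_element(open_tag_names):
    """Return True if any open element makes this a parse error."""
    return any(name not in ALLOWED_OPEN_AT_END for name in open_tag_names)
```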
    708 
    709    An end tag whose tag name is "html"
    710           Act as if an end tag with tag name "body" had been seen, then,
    711           if that token wasn't ignored, reprocess the current token.
    712 
    713           The fake end tag token here can only be ignored in the fragment
    714           case.
    715 
    716    A start tag whose tag name is one of: "address", "article", "aside",
    717           "blockquote", "center", "datagrid", "details", "dialog", "dir",
    718           "div", "dl", "fieldset", "figure", "footer", "header", "menu",
    719           "nav", "ol", "p", "section", "ul"
    720           If the stack of open elements has a p element in scope, then act
    721           as if an end tag with the tag name "p" had been seen.
    722 
    723           Insert an HTML element for the token.
    724 
    725    A start tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5",
    726           "h6"
    727           If the stack of open elements has a p element in scope, then act
    728           as if an end tag with the tag name "p" had been seen.
    729 
    730           If the current node is an element whose tag name is one of "h1",
    731           "h2", "h3", "h4", "h5", or "h6", then this is a parse error; pop
    732           the current node off the stack of open elements.
    733 
    734           Insert an HTML element for the token.
    735 
    736    A start tag whose tag name is one of: "pre", "listing"
    737           If the stack of open elements has a p element in scope, then act
    738           as if an end tag with the tag name "p" had been seen.
    739 
    740           Insert an HTML element for the token.
    741 
    742           If the next token is a U+000A LINE FEED (LF) character token,
    743           then ignore that token and move on to the next one. (Newlines at
    744           the start of pre blocks are ignored as an authoring
    745           convenience.)
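
The newline rule above amounts to a one-token lookahead. A non-normative Python sketch follows, assuming character tokens are modelled as ("character", text) tuples; that representation and the function name are illustrative assumptions only.

```python
_MISSING = object()  # sentinel meaning "the token stream was empty"


def drop_leading_line_feed(tokens):
    """Yield `tokens`, skipping the first one if it is a U+000A character."""
    iterator = iter(tokens)
    first = next(iterator, _MISSING)
    if first is not _MISSING and first != ("character", "\n"):
        yield first
    yield from iterator
```

Only the token immediately after the start tag is examined; a line feed anywhere later in the stream is passed through unchanged.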
    746 
    747    A start tag whose tag name is "form"
    748           If the form element pointer is not null, then this is a parse
    749           error; ignore the token.
    750 
    751           Otherwise:
    752 
    753           If the stack of open elements has a p element in scope, then act
    754           as if an end tag with the tag name "p" had been seen.
    755 
    756           Insert an HTML element for the token, and set the form element
    757           pointer to point to the element created.
    758 
    759    A start tag whose tag name is "li"
    760           Run the following algorithm:
    761 
    762          1. Initialize node to be the current node (the bottommost node of
    763             the stack).
    764          2. If node is an li element, then act as if an end tag with the
    765             tag name "li" had been seen, then jump to the last step.
    766          3. If node is not in the formatting category, and is not in the
    767             phrasing category, and is not an address, div, or p element,
    768             then jump to the last step.
    769          4. Otherwise, set node to the previous entry in the stack of open
    770             elements and return to step 2.
    771          5. This is the last step.
    772             If the stack of open elements has a p element in scope, then
    773             act as if an end tag with the tag name "p" had been seen.
    774             Finally, insert an HTML element for the token.
    775 
    776    A start tag whose tag name is one of: "dd", "dt"
    777           Run the following algorithm:
    778 
    779          1. Initialize node to be the current node (the bottommost node of
    780             the stack).
    781          2. If node is a dd or dt element, then act as if an end tag with
    782             the same tag name as node had been seen, then jump to the last
    783             step.
    784          3. If node is not in the formatting category, and is not in the
    785             phrasing category, and is not an address, div, or p element,
    786             then jump to the last step.
    787          4. Otherwise, set node to the previous entry in the stack of open
    788             elements and return to step 2.
    789          5. This is the last step.
    790             If the stack of open elements has a p element in scope, then
    791             act as if an end tag with the tag name "p" had been seen.
    792             Finally, insert an HTML element for the token.
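
Steps 1 to 4 of the "li" and "dd"/"dt" algorithms above are the same walk up the stack with a different set of closing names. A non-normative Python sketch, assuming the stack is a list of tag names whose last entry is the current node. The category test is passed in as a hypothetical predicate because the formatting and phrasing categories are defined elsewhere.

```python
# Elements that step 3 never stops at, even though they are
# neither formatting nor phrasing elements.
ALWAYS_SKIPPED = frozenset({"address", "div", "p"})


def find_item_to_close(stack, closing_names, is_formatting_or_phrasing):
    """Return the tag name needing a fake end tag, or None.

    `closing_names` is {"li"} for an "li" start tag and
    {"dd", "dt"} for a "dd" or "dt" start tag.
    """
    for name in reversed(stack):       # steps 1 and 4: walk up from the current node
        if name in closing_names:      # step 2: an open item was found
            return name
        if not is_formatting_or_phrasing(name) and name not in ALWAYS_SKIPPED:
            return None                # step 3: a blocking element was reached
    return None
```

With an open stack of html, body, ul, li, div, b and a predicate that accepts "b", the walk passes over b and div and reports "li". With html, body, ul, li, ol it stops at ol and reports nothing, so a nested list does not close the outer item.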
    793 
    794    A start tag whose tag name is "plaintext"
    795           If the stack of open elements has a p element in scope, then act
    796           as if an end tag with the tag name "p" had been seen.
    797 
    798           Insert an HTML element for the token.
    799 
    800           Switch the content model flag to the PLAINTEXT state.
    801 
    802           Once a start tag with the tag name "plaintext" has been seen,
    803           that will be the last token ever seen other than character
    804           tokens (and the end-of-file token), because there is no way to
    805           switch the content model flag out of the PLAINTEXT state.
    806 
    807    An end tag whose tag name is one of: "address", "article", "aside",
    808           "blockquote", "center", "datagrid", "details", "dialog", "dir",
    809           "div", "dl", "fieldset", "figure", "footer", "header",
    810           "listing", "menu", "nav", "ol", "pre", "section", "ul"
    811           If the stack of open elements does not have an element in scope
    812           with the same tag name as that of the token, then this is a
    813           parse error; ignore the token.
    814 
    815           Otherwise, run these steps:
    816 
    817          1. Generate implied end tags.
    818          2. If the current node is not an element with the same tag name
    819             as that of the token, then this is a parse error.
    820          3. Pop elements from the stack of open elements until an element
    821             with the same tag name as the token has been popped from the
    822             stack.
    823 
    824    An end tag whose tag name is "form"
    825           Let node be the element that the form element pointer is set to.
    826 
    827           Set the form element pointer to null.
    828 
    829           If node is null or the stack of open elements does not have node
    830           in scope, then this is a parse error; ignore the token.
    831 
    832           Otherwise, run these steps:
    833 
    834          1. Generate implied end tags.
    835          2. If the current node is not node, then this is a parse error.
    836          3. Remove node from the stack of open elements.
    837 
    838    An end tag whose tag name is "p"
    839           If the stack of open elements does not have an element in scope
    840           with the same tag name as that of the token, then this is a
    841           parse error; act as if a start tag with the tag name "p" had been
    842           seen, then reprocess the current token.
    843 
    844           Otherwise, run these steps:
    845 
    846          1. Generate implied end tags, except for elements with the same
    847             tag name as the token.
    848          2. If the current node is not an element with the same tag name
    849             as that of the token, then this is a parse error.
    850          3. Pop elements from the stack of open elements until an element
    851             with the same tag name as the token has been popped from the
    852             stack.
    853 
    854    An end tag whose tag name is one of: "dd", "dt", "li"
    855           If the stack of open elements does not have an element in scope
    856           with the same tag name as that of the token, then this is a
    857           parse error; ignore the token.
    858 
    859           Otherwise, run these steps:
    860 
    861          1. Generate implied end tags, except for elements with the same
    862             tag name as the token.
    863          2. If the current node is not an element with the same tag name
    864             as that of the token, then this is a parse error.
    865          3. Pop elements from the stack of open elements until an element
    866             with the same tag name as the token has been popped from the
    867             stack.
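
The end tag entries above for the block-level elements, for "p", and for "dd", "dt" and "li" share the same three steps once the scope check has passed. A non-normative Python sketch follows. It assumes the stack is a list of tag names, that the caller has already performed the scope check, and that the set of elements with implied end tags is passed in. The set used in the comments is an illustrative subset, not the full list given where "generate implied end tags" is defined.

```python
def close_element(stack, tag_name, implied, exclude_self=False):
    """Run steps 1 to 3; return True if step 2 reports a parse error.

    `implied` is the assumed set of tag names whose end tags may be
    implied, for example {"dd", "dt", "li", "p"}. Set `exclude_self`
    for the "p", "dd", "dt" and "li" entries.
    """
    if tag_name not in stack:
        raise ValueError("caller must check scope before closing")
    # Step 1: generate implied end tags.
    while stack and stack[-1] in implied and not (
        exclude_self and stack[-1] == tag_name
    ):
        stack.pop()
    # Step 2: a mismatched current node is a parse error.
    parse_error = not stack or stack[-1] != tag_name
    # Step 3: pop until the matching element has been popped.
    while stack and stack.pop() != tag_name:
        pass
    return parse_error
```

Closing "div" with html, body, div, p open pops the p silently and reports no error. Closing "li" with html, body, ul, li, b open reports a parse error because b is the current node, then pops both b and li.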
    868 
    869    An end tag whose tag name is one of: "h1", "h2", "h3", "h4", "h5", "h6"
    870           If the stack of open elements does not have an element in scope
    871           whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6",
    872           then this is a parse error; ignore the token.
    873 
    874           Otherwise, run these steps:
    875 
    876          1. Generate implied end tags.
    877          2. If the current node is not an element with the same tag name
    878             as that of the token, then this is a parse error.
    879          3. Pop elements from the stack of open elements until an element
    880             whose tag name is one of "h1", "h2", "h3", "h4", "h5", or "h6"
    881             has been popped from the stack.
    882 
    883    An end tag whose tag name is "sarcasm"
    884           Take a deep breath, then act as described in the "any other end
    885           tag" entry below.
    886 
    887    A start tag whose tag name is "a"
    888           If the list of active formatting elements contains an element
    889           whose tag name is "a" between the end of the list and the last
    890           marker on the list (or the start of the list if there is no
    891           marker on the list), then this is a parse error; act as if an
    892           end tag with the tag name "a" had been seen, then remove that
    893           element from the list of active formatting elements and the
    894           stack of open elements if the end tag didn't already remove it
    895           (it might not have if the element is not in table scope).
    896 
    897           In the non-conforming stream
    898           <a href="a">a<table><a href="b">b</table>x, the first a element
    899           would be closed upon seeing the second one, and the "x"
    900           character would be inside a link to "b", not to "a". This is
    901           despite the fact that the outer a element is not in table scope
    902           (meaning that a regular </a> end tag at the start of the table
    903           wouldn't close the outer a element).
    904 
    905           Reconstruct the active formatting elements, if any.
    906 
    907           Insert an HTML element for the token. Add that element to the
    908           list of active formatting elements.
    909 
    910    A start tag whose tag name is one of: "b", "big", "em", "font", "i",
    911           "s", "small", "strike", "strong", "tt", "u"
    912           Reconstruct the active formatting elements, if any.
    913 
    914           Insert an HTML element for the token. Add that element to the
    915           list of active formatting elements.
    916 
    917    A start tag whose tag name is "nobr"
    918           Reconstruct the active formatting elements, if any.
    919 
    920           If the stack of open elements has a nobr element in scope, then
    921           this is a parse error; act as if an end tag with the tag name
    922           "nobr" had been seen, then once again reconstruct the active
    923           formatting elements, if any.
    924 
    925           Insert an HTML element for the token. Add that element to the
    926           list of active formatting elements.
    927 
    928    An end tag whose tag name is one of: "a", "b", "big", "em", "font",
    929           "i", "nobr", "s", "small", "strike", "strong", "tt", "u"
    930           Follow these steps:
    931 
    932          1. Let the formatting element be the last element in the list of
    933             active formatting elements that:
    934                o is between the end of the list and the last scope marker
    935                  in the list, if any, or the start of the list otherwise,
    936                  and
    937                o has the same tag name as the token.
    938             If there is no such node, or, if that node is also in the
    939             stack of open elements but the element is not in scope, then
    940             this is a parse error; ignore the token, and abort these
    941             steps.
    942             Otherwise, if there is such a node, but that node is not in
    943             the stack of open elements, then this is a parse error; remove
    944             the element from the list, and abort these steps.
    945             Otherwise, there is a formatting element and that element is
    946             in the stack and is in scope. If the element is not the
    947             current node, this is a parse error. In any case, proceed with
    948             the algorithm as written in the following steps.
    949          2. Let the furthest block be the topmost node in the stack of
    950             open elements that is lower in the stack than the formatting
    951             element, and is not an element in the phrasing or formatting
    952             categories. There might not be one.
    953          3. If there is no furthest block, then the UA must skip the
    954             subsequent steps and instead just pop all the nodes from the
    955             bottom of the stack of open elements, from the current node up
    956             to and including the formatting element, and remove the
    957             formatting element from the list of active formatting
    958             elements.
    959          4. Let the common ancestor be the element immediately above the
    960             formatting element in the stack of open elements.
    961          5. If the furthest block has a parent node, then remove the
    962             furthest block from its parent node.
    963          6. Let a bookmark note the position of the formatting element in
    964             the list of active formatting elements relative to the
    965             elements on either side of it in the list.
    966          7. Let node and last node be the furthest block. Follow these
    967             steps:
    968               1. Let node be the element immediately above node in the
    969                  stack of open elements.
    970               2. If node is not in the list of active formatting elements,
    971                  then remove node from the stack of open elements and then
    972                  go back to step 1.
    973               3. Otherwise, if node is the formatting element, then go to
    974                  the next step in the overall algorithm.
    975               4. Otherwise, if last node is the furthest block, then move
    976                  the aforementioned bookmark to be immediately after the
    977                  node in the list of active formatting elements.
    978               5. If node has any children, perform a shallow clone of
    979                  node, replace the entry for node in the list of active
    980                  formatting elements with an entry for the clone, replace
    981                  the entry for node in the stack of open elements with an
    982                  entry for the clone, and let node be the clone.
    983               6. Insert last node into node, first removing it from its
    984                  previous parent node if any.
    985               7. Let last node be node.
    986               8. Return to step 1 of this inner set of steps.
    987          8. If the common ancestor node is a table, tbody, tfoot, thead,
    988             or tr element, then, foster parent whatever last node ended up
    989             being in the previous step.
    990             Otherwise, append whatever last node ended up being in the
    991             previous step to the common ancestor node, first removing it
    992             from its previous parent node if any.
    993          9. Perform a shallow clone of the formatting element.
    994         10. Take all of the child nodes of the furthest block and append
    995             them to the clone created in the last step.
    996         11. Append that clone to the furthest block.
    997         12. Remove the formatting element from the list of active
    998             formatting elements, and insert the clone into the list of
    999             active formatting elements at the position of the
   1000             aforementioned bookmark.
   1001         13. Remove the formatting element from the stack of open elements,
   1002             and insert the clone into the stack of open elements
   1003             immediately below the position of the furthest block in that
   1004             stack.
   1005         14. Jump back to step 1 in this series of steps.
   1006 
   1007           The way these steps are defined, only elements in the formatting
   1008           category ever get cloned by this algorithm.
   1009 
   1010           Because of the way this algorithm causes elements to change
   1011           parents, it has been dubbed the "adoption agency algorithm" (in
   1012           contrast with other possible algorithms for dealing with
   1013           misnested content, which included the "incest algorithm", the
   1014           "secret affair algorithm", and the "Heisenberg algorithm").
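
The opening of the algorithm above (selecting the formatting element, locating the furthest block, and the simple case of step 3) can be sketched in Python. This is a non-normative illustration only: `Element`, `MARKER`, the function names, and the category predicate are hypothetical, the scope and stack-membership checks of step 1 are omitted, and steps 4 to 14 are not covered.

```python
MARKER = object()  # stand-in for a scope marker in the list


class Element:
    """Minimal stand-in for a DOM element (hypothetical)."""
    def __init__(self, name):
        self.name = name


def find_formatting_element(active, tag_name):
    """Step 1, first part: last matching entry after the last marker."""
    for entry in reversed(active):
        if entry is MARKER:
            return None
        if entry.name == tag_name:
            return entry
    return None


def find_furthest_block(stack, formatting_element, is_phrasing_or_formatting):
    """Step 2: first node below the formatting element in neither category."""
    index = stack.index(formatting_element)
    for node in stack[index + 1:]:
        if not is_phrasing_or_formatting(node):
            return node
    return None


def pop_through_formatting_element(stack, active, formatting_element):
    """Step 3: the case where there is no furthest block."""
    while stack.pop() is not formatting_element:
        pass
    active.remove(formatting_element)
```

The stack is modelled as a list whose last entry is the current node, so "lower in the stack" corresponds to a higher list index, and the "topmost" qualifying node is the first one found after the formatting element.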
   1015 
   1016    A start tag whose tag name is "button"
   1017           If the stack of open elements has a button element in scope,
   1018           then this is a parse error; act as if an end tag with the tag
   1019           name "button" had been seen, then reprocess the token.
   1020 
   1021           Otherwise:
   1022 
   1023           Reconstruct the active formatting elements, if any.
   1024 
   1025           Insert an HTML element for the token.
   1026 
   1027           Insert a marker at the end of the list of active formatting
   1028           elements.
   1029 
   1030    A start tag token whose tag name is one of: "applet", "marquee",
   1031           "object"
   1032           Reconstruct the active formatting elements, if any.
   1033 
   1034           Insert an HTML element for the token.
   1035 
   1036           Insert a marker at the end of the list of active formatting
   1037           elements.
   1038 
   1039    An end tag token whose tag name is one of: "applet", "button",
   1040           "marquee", "object"
   1041           If the stack of open elements does not have an element in scope
   1042           with the same tag name as that of the token, then this is a
   1043           parse error; ignore the token.
   1044 
   1045           Otherwise, run these steps:
   1046 
   1047          1. Generate implied end tags.
   1048          2. If the current node is not an element with the same tag name
   1049             as that of the token, then this is a parse error.
   1050          3. Pop elements from the stack of open elements until an element
   1051             with the same tag name as the token has been popped from the
   1052             stack.
   1053          4. Clear the list of active formatting elements up to the last
   1054             marker.
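
Step 4 above relies on the markers inserted by the "button", "applet", "marquee" and "object" start tags. A non-normative Python sketch of clearing the list up to the last marker, assuming the list is a Python list and the marker is a unique sentinel object (both names are hypothetical):

```python
MARKER = object()  # stand-in for a marker entry


def clear_to_last_marker(active):
    """Remove entries from the end of `active` up to and including the last marker."""
    while active:
        if active.pop() is MARKER:
            break
```

If the list contains no marker, this sketch empties it entirely.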
   1055 
   1056    A start tag whose tag name is "xmp"
   1057           Reconstruct the active formatting elements, if any.
   1058 
   1059           Follow the generic CDATA element parsing algorithm.
   1060 
   1061    A start tag whose tag name is "table"
   1062           If the stack of open elements has a p element in scope, then act
   1063           as if an end tag with the tag name "p" had been seen.
   1064 
   1065           Insert an HTML element for the token.
   1066 
   1067           Switch the insertion mode to "in table".
   1068 
   1069    A start tag whose tag name is one of: "area", "basefont", "bgsound",
   1070           "br", "embed", "img", "input", "spacer", "wbr"
   1071           Reconstruct the active formatting elements, if any.
   1072 
   1073           Insert an HTML element for the token. Immediately pop the
   1074           current node off the stack of open elements.
   1075 
   1076           Acknowledge the token's self-closing flag, if it is set.
   1077 
   1078    A start tag whose tag name is one of: "param", "source"
   1079           Insert an HTML element for the token. Immediately pop the
   1080           current node off the stack of open elements.
   1081 
   1082           Acknowledge the token's self-closing flag, if it is set.
   1083 
   1084    A start tag whose tag name is "hr"
   1085           If the stack of open elements has a p element in scope, then act
   1086           as if an end tag with the tag name "p" had been seen.
   1087 
   1088           Insert an HTML element for the token. Immediately pop the
   1089           current node off the stack of open elements.
   1090 
   1091           Acknowledge the token's self-closing flag, if it is set.
   1092 
   1093    A start tag whose tag name is "image"
   1094           Parse error. Change the token's tag name to "img" and reprocess
   1095           it. (Don't ask.)
   1096 
   1097    A start tag whose tag name is "isindex"
   1098           Parse error.
   1099 
   1100           If the form element pointer is not null, then ignore the token.
   1101 
   1102           Otherwise:
   1103 
   1104           Acknowledge the token's self-closing flag, if it is set.
   1105 
   1106           Act as if a start tag token with the tag name "form" had been
   1107           seen.
   1108 
   1109           If the token has an attribute called "action", set the action
   1110           attribute on the resulting form element to the value of the
   1111           "action" attribute of the token.
   1112 
   1113           Act as if a start tag token with the tag name "hr" had been
   1114           seen.
   1115 
   1116           Act as if a start tag token with the tag name "p" had been seen.
   1117 
   1118           Act as if a start tag token with the tag name "label" had been
   1119           seen.
   1120 
   1121           Act as if a stream of character tokens had been seen (see below
   1122           for what they should say).
   1123 
   1124           Act as if a start tag token with the tag name "input" had been
   1125           seen, with all the attributes from the "isindex" token except
   1126           "name", "action", and "prompt". Set the name attribute of the
   1127           resulting input element to the value "isindex".
   1128 
   1129           Act as if a stream of character tokens had been seen (see below
   1130           for what they should say).
   1131 
   1132           Act as if an end tag token with the tag name "label" had been
   1133           seen.
   1134 
   1135           Act as if an end tag token with the tag name "p" had been seen.
   1136 
   1137           Act as if a start tag token with the tag name "hr" had been
   1138           seen.
   1139 
   1140           Act as if an end tag token with the tag name "form" had been
   1141           seen.
   1142 
   1143           If the token has an attribute with the name "prompt", then the
   1144           first stream of characters must be the same string as given in
   1145           that attribute, and the second stream of characters must be
   1146           empty. Otherwise, the two streams of character tokens together
   1147           should, together with the input element, express the equivalent
   1148           of "This is a searchable index. Insert your search keywords
   1149           here: (input field)" in the user's preferred language.
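
The sequence of implied tokens for "isindex" can be laid out as data. The following non-normative Python sketch models tokens as (kind, name-or-text, attributes) tuples, which is an assumption made for illustration. Placing the whole default English sentence in the first character stream is likewise only one possible choice.

```python
DEFAULT_PROMPT = "This is a searchable index. Insert your search keywords here: "


def expand_isindex(attrs, form_pointer_is_null=True):
    """Return the tokens an "isindex" start tag is treated as."""
    if not form_pointer_is_null:
        return []  # the token is ignored
    # The action attribute, if any, ends up on the form element.
    form_attrs = {"action": attrs["action"]} if "action" in attrs else {}
    # The input element gets every other attribute, plus a fixed name.
    input_attrs = {
        key: value for key, value in attrs.items()
        if key not in ("name", "action", "prompt")
    }
    input_attrs["name"] = "isindex"
    if "prompt" in attrs:
        first, second = attrs["prompt"], ""
    else:
        first, second = DEFAULT_PROMPT, ""  # assumption: English default
    return [
        ("start", "form", form_attrs),
        ("start", "hr", {}),
        ("start", "p", {}),
        ("start", "label", {}),
        ("characters", first, {}),
        ("start", "input", input_attrs),
        ("characters", second, {}),
        ("end", "label", {}),
        ("end", "p", {}),
        ("start", "hr", {}),
        ("end", "form", {}),
    ]
```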
   1150 
   1151    A start tag whose tag name is "textarea"
   1152 
   1153          1. Insert an HTML element for the token.
   1154          2. If the next token is a U+000A LINE FEED (LF) character token,
   1155             then ignore that token and move on to the next one. (Newlines
   1156             at the start of textarea elements are ignored as an authoring
   1157             convenience.)
   1158          3. Switch the tokeniser's content model flag to the RCDATA state.
   1159          4. Let the original insertion mode be the current insertion mode.
   1160          5. Switch the insertion mode to "in CDATA/RCDATA".
   1161 
   1162    A start tag whose tag name is one of: "iframe", "noembed"
   1163    A start tag whose tag name is "noscript", if the scripting flag is
   1164           enabled
   1165           Follow the generic CDATA element parsing algorithm.
   1166 
   1167    A start tag whose tag name is "select"
   1168           Reconstruct the active formatting elements, if any.
   1169 
   1170           Insert an HTML element for the token.
   1171 
   1172           If the insertion mode is one of "in table", "in caption", "in
   1173           column group", "in table body", "in row", or "in cell", then
   1174           switch the insertion mode to "in select in table". Otherwise,
   1175           switch the insertion mode to "in select".
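
The mode selection for "select" is a simple membership test. A non-normative Python sketch, assuming insertion modes are represented by their names as strings (the identifiers are hypothetical):

```python
TABLE_RELATED_MODES = frozenset({
    "in table", "in caption", "in column group",
    "in table body", "in row", "in cell",
})


def mode_after_select(current_mode):
    """Return the insertion mode to switch to after a "select" start tag."""
    if current_mode in TABLE_RELATED_MODES:
        return "in select in table"
    return "in select"
```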
   1176 
   1177    A start tag whose tag name is one of: "optgroup", "option"
   1178           If the stack of open elements has an option element in scope,
   1179           then act as if an end tag with the tag name "option" had been
   1180           seen.
   1181 
   1182           Reconstruct the active formatting elements, if any.
   1183 
   1184           Insert an HTML element for the token.
   1185 
   1186    A start tag whose tag name is one of: "rp", "rt"
   1187           If the stack of open elements has a ruby element in scope, then
   1188           generate implied end tags. If the current node is not then a
   1189           ruby element, this is a parse error; pop all the nodes from the
   1190           current node up to the node immediately before the bottommost
   1191           ruby element on the stack of open elements.
   1192 
   1193           Insert an HTML element for the token.
   1194 
   1195    An end tag whose tag name is "br"
   1196           Parse error. Act as if a start tag token with the tag name "br"
   1197           had been seen. Ignore the end tag token.
   1198 
   1199    A start tag whose tag name is "math"
   1200           Reconstruct the active formatting elements, if any.
   1201 
   1202           Adjust MathML attributes for the token. (This fixes the case of
   1203           MathML attributes that are not all lowercase.)
   1204 
   1205           Adjust foreign attributes for the token. (This fixes the use of
   1206           namespaced attributes, in particular XLink.)
   1207 
   1208           Insert a foreign element for the token, in the MathML namespace.
   1209 
   1210           If the token has its self-closing flag set, pop the current node
   1211           off the stack of open elements and acknowledge the token's
   1212           self-closing flag.
   1213 
   1214           Otherwise, let the secondary insertion mode be the current
   1215           insertion mode, and then switch the insertion mode to "in
   1216           foreign content".
   1217 
   1218    A start tag whose tag name is one of: "caption", "col", "colgroup",
   1219           "frame", "frameset", "head", "tbody", "td", "tfoot", "th",
   1220           "thead", "tr"
   1221           Parse error. Ignore the token.
   1222 
   1223    Any other start tag
   1224           Reconstruct the active formatting elements, if any.
   1225 
   1226           Insert an HTML element for the token.
   1227 
   1228           This element will be a phrasing element.
   1229 
   1230    Any other end tag
   1231           Run the following steps:
   1232 
   1233          1. Initialize node to be the current node (the bottommost node of
   1234             the stack).
   1235          2. If node has the same tag name as the end tag token, then:
   1236               1. Generate implied end tags.
   1237               2. If the tag name of the end tag token does not match the
   1238                  tag name of the current node, this is a parse error.
   1239               3. Pop all the nodes from the current node up to node,
   1240                  including node, then stop these steps.
   1241          3. Otherwise, if node is in neither the formatting category nor
   1242             the phrasing category, then this is a parse error; ignore the
   1243             token, and abort these steps.
   1244          4. Set node to the previous entry in the stack of open elements.
   1245          5. Return to step 2.
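
As a non-normative illustration, the "any other end tag" steps above can be sketched in Python. The sketch assumes the stack of open elements is a list of tag names with the current node last. IMPLIED_END_TAGS reproduces the set used by "generate implied end tags", which is defined elsewhere in this specification, and is_formatting_or_phrasing is a hypothetical caller-supplied predicate standing in for the element categories. The final pop truncates by index, so the result stays well defined even if the implied-end-tag step has already removed node.

```python
# Tag names popped by "generate implied end tags" (defined elsewhere in the
# specification; reproduced here as an assumption of this sketch).
IMPLIED_END_TAGS = {"dd", "dt", "li", "option", "optgroup", "p", "rp", "rt"}


def generate_implied_end_tags(stack):
    """Pop current nodes for as long as they are implied-end-tag elements."""
    while stack and stack[-1] in IMPLIED_END_TAGS:
        stack.pop()


def any_other_end_tag(stack, tag_name, is_formatting_or_phrasing):
    """Apply the "any other end tag" steps and return the parse errors raised.

    `stack` is a list of tag names (current node last), modified in place.
    `is_formatting_or_phrasing` is a hypothetical predicate on tag names.
    """
    errors = []
    index = len(stack) - 1                       # step 1: the current node
    while index >= 0:
        node = stack[index]
        if node == tag_name:                     # step 2
            generate_implied_end_tags(stack)     # step 2.1
            if not stack or stack[-1] != tag_name:
                errors.append("parse error")     # step 2.2
            del stack[index:]                    # step 2.3: pop up to node
            return errors
        if not is_formatting_or_phrasing(node):  # step 3
            errors.append("parse error")         # the token is ignored
            return errors
        index -= 1                               # steps 4 and 5
    return errors
```

For example, with the stack html, body, div, span, b and an end tag "div", the walk passes over b and span (both treated as formatting or phrasing here), reports one parse error because the current node is not a div, and leaves html, body on the stack.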
   1246 
   1247       8.2.5.11 The "in CDATA/RCDATA" insertion mode
   1248 
   1249    When the insertion mode is "in CDATA/RCDATA", tokens must be handled as
   1250    follows:
   1251 
   1252    A character token
   1253           Insert the token's character into the current node.
   1254 
   1255    An end-of-file token
   1256           Parse error.
   1257 
   1258           If the current node is a script element, mark the script element
   1259           as "already executed".
   1260 
   1261           Pop the current node off the stack of open elements.
   1262 
   1263           Switch the insertion mode to the original insertion mode and
   1264           reprocess the current token.
   1265 
   1266    An end tag whose tag name is "script"
   1267           Let script be the current node (which will be a script element).
   1268 
   1269           Pop the current node off the stack of open elements.
   1270 
   1271           Switch the insertion mode to the original insertion mode.
   1272 
   1273           Let the old insertion point have the same value as the current
   1274           insertion point. Let the insertion point be just before the next
   1275           input character.
   1276 
   1277           Increment the parser's script nesting level by one.
   1278 
   1279           Run the script. This might cause some script to execute, which
   1280           might cause new characters to be inserted into the tokeniser,
   1281           and might cause the tokeniser to output more tokens, resulting
   1282           in a reentrant invocation of the parser.
   1283 
   1284           Decrement the parser's script nesting level by one. If the
   1285           parser's script nesting level is zero, then set the parser pause
   1286           flag to false.
   1287 
   1288           Let the insertion point have the value of the old insertion
   1289           point. (In other words, restore the insertion point to the value
   1290           it had before the previous paragraph. This value might be the
   1291           "undefined" value.)
   1292 
   1293           At this stage, if there is a pending external script, then:
   1294 
   1295         If the tree construction stage is being called reentrantly, say
   1296                 from a call to document.write():
   1297                 Set the parser pause flag to true, and abort the
   1298                 processing of any nested invocations of the tokeniser,
   1299                 yielding control back to the caller. (Tokenization will
   1300                 resume when the caller returns to the "outer" tree
   1301                 construction stage.)
   1302 
   1303         Otherwise:
   1304                 Follow these steps:
   1305 
   1306               1. Let the script be the pending external script. There is
   1307                  no longer a pending external script.
   1308               2. Pause until the script has completed loading.
   1309               3. Let the insertion point be just before the next input
   1310                  character.
   1311               4. Execute the script.
   1312               5. Let the insertion point be undefined again.
   1313               6. If there is once again a pending external script, then
   1314                  repeat these steps from step 1.
   1315 
   1316    Any other end tag
   1317           Pop the current node off the stack of open elements.
   1318 
   1319           Switch the insertion mode to the original insertion mode.
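
The bookkeeping around running a script (saving and restoring the insertion point, and tracking the script nesting level) can be sketched as follows. This is a non-normative illustration under stated assumptions: the class and attribute names are hypothetical, None models the "undefined" insertion point, input positions are plain integers, and the handling of a pending external script is left out.

```python
class Parser:
    """Only the state touched by the "script" end tag steps (hypothetical names)."""

    def __init__(self):
        self.insertion_point = None       # None models the "undefined" value
        self.script_nesting_level = 0
        self.pause_flag = False

    def handle_script_end_tag(self, run_script, next_input_position):
        # Remember the current insertion point, then move it to just before
        # the next input character.
        old_insertion_point = self.insertion_point
        self.insertion_point = next_input_position

        self.script_nesting_level += 1
        run_script(self)                  # may re-enter this method
        self.script_nesting_level -= 1
        if self.script_nesting_level == 0:
            self.pause_flag = False

        # Restore the insertion point (possibly to "undefined").
        self.insertion_point = old_insertion_point


observed = []


def inner_script(parser):
    observed.append((parser.script_nesting_level, parser.insertion_point))


def outer_script(parser):
    observed.append((parser.script_nesting_level, parser.insertion_point))
    # Models a document.write() call whose output contains another script.
    parser.handle_script_end_tag(inner_script, next_input_position=7)
    observed.append((parser.script_nesting_level, parser.insertion_point))


parser = Parser()
parser.handle_script_end_tag(outer_script, next_input_position=42)
```

After the outer call returns, the nesting level is back to zero and the insertion point is undefined again; the inner script saw a nesting level of two and its own insertion point.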
   1320 
   1321       8.2.5.12 The "in table" insertion mode
   1322 
   1323    When the insertion mode is "in table", tokens must be handled as
   1324    follows:
   1325 
   1326    A character token that is one of U+0009 CHARACTER TABULATION,
   1327           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
   1328           If the current table is tainted, then act as described in the
   1329           "anything else" entry below.
   1330 
   1331           Otherwise, insert the character into the current node.
   1332 
   1333    A comment token
   1334           Append a Comment node to the current node with the data
   1335           attribute set to the data given in the comment token.
   1336 
   1337    A DOCTYPE token
   1338           Parse error. Ignore the token.
   1339 
   1340    A start tag whose tag name is "caption"
   1341           Clear the stack back to a table context. (See below.)
   1342 
   1343           Insert a marker at the end of the list of active formatting
   1344           elements.
   1345 
   1346           Insert an HTML element for the token, then switch the insertion
   1347           mode to "in caption".
   1348 
   1349    A start tag whose tag name is "colgroup"
   1350           Clear the stack back to a table context. (See below.)
   1351 
   1352           Insert an HTML element for the token, then switch the insertion
   1353           mode to "in column group".
   1354 
   1355    A start tag whose tag name is "col"
   1356           Act as if a start tag token with the tag name "colgroup" had
   1357           been seen, then reprocess the current token.
   1358 
   1359    A start tag whose tag name is one of: "tbody", "tfoot", "thead"
   1360           Clear the stack back to a table context. (See below.)
   1361 
   1362           Insert an HTML element for the token, then switch the insertion
   1363           mode to "in table body".
   1364 
   1365    A start tag whose tag name is one of: "td", "th", "tr"
   1366           Act as if a start tag token with the tag name "tbody" had been
   1367           seen, then reprocess the current token.
   1368 
   1369    A start tag whose tag name is "table"
   1370           Parse error. Act as if an end tag token with the tag name
   1371           "table" had been seen, then, if that token wasn't ignored,
   1372           reprocess the current token.
   1373 
   1374           The fake end tag token here can only be ignored in the fragment
   1375           case.
   1376 
   1377    An end tag whose tag name is "table"
   1378           If the stack of open elements does not have an element in table
   1379           scope with the same tag name as the token, this is a parse
   1380           error. Ignore the token. (fragment case)
   1381 
   1382           Otherwise:
   1383 
   1384           Pop elements from this stack until a table element has been
   1385           popped from the stack.
   1386 
   1387           Reset the insertion mode appropriately.
   1388 
   1389    An end tag whose tag name is one of: "body", "caption", "col",
   1390           "colgroup", "html", "tbody", "td", "tfoot", "th", "thead", "tr"
   1391           Parse error. Ignore the token.
   1392 
   1393    A start tag whose tag name is one of: "style", "script"
   1394           If the current table is tainted, then act as described in the
   1395           "anything else" entry below.
   1396 
   1397           Otherwise, process the token using the rules for the "in head"
   1398           insertion mode.
   1399 
   1400    A start tag whose tag name is "input"
   1401           If the token does not have an attribute with the name "type", or
   1402           if it does, but that attribute's value is not an ASCII
   1403           case-insensitive match for the string "hidden", or if the
   1404           current table is tainted, then act as described in the
   1405           "anything else" entry below.
   1406 
   1407           Otherwise:
   1408 
   1409           Parse error.
   1410 
   1411           Insert an HTML element for the token.
   1412 
   1413           Pop that input element off the stack of open elements.
   1414 
   1415    An end-of-file token
   1416           If the current node is not the root html element, then this is a
   1417           parse error.
   1418 
   1419           It can only be the current node in the fragment case.
   1420 
   1421           Stop parsing.
   1422 
   1423    Anything else
   1424           Parse error. Process the token using the rules for the "in body"
   1425           insertion mode, except that if the current node is a table,
   1426           tbody, tfoot, thead, or tr element, then, whenever a node would
   1427           be inserted into the current node, it must instead be foster
   1428           parented.
   1429 
   1430    When the steps above require the UA to clear the stack back to a table
   1431    context, it means that the UA must, while the current node is not a
   1432    table element or an html element, pop elements from the stack of open
   1433    elements.
   1434 
   1435    The current node being an html element after this process is a fragment
   1436    case.
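
As a non-normative sketch, clearing the stack back to a context amounts to popping until the current node is one of a small set of elements. The Python below assumes the stack of open elements is a list of tag names with the current node last; the function and constant names are hypothetical. The table body and table row contexts defined in later sections follow the same pattern with a different set.

```python
TABLE_CONTEXT = {"table", "html"}
TABLE_BODY_CONTEXT = {"tbody", "tfoot", "thead", "html"}
TABLE_ROW_CONTEXT = {"tr", "html"}


def clear_stack_back_to(stack, context):
    """Pop from `stack` until the current node (last item) is in `context`."""
    while stack and stack[-1] not in context:
        stack.pop()
    return stack


stack = ["html", "body", "table", "tbody", "tr", "td", "b"]
clear_stack_back_to(stack, TABLE_CONTEXT)
# stack is now ["html", "body", "table"]

fragment_stack = ["html", "b"]
clear_stack_back_to(fragment_stack, TABLE_CONTEXT)
# fragment_stack is now ["html"], which corresponds to the fragment case
```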
   1437 
   1438       8.2.5.13 The "in caption" insertion mode
   1439 
   1440    When the insertion mode is "in caption", tokens must be handled as
   1441    follows:
   1442 
   1443    An end tag whose tag name is "caption"
   1444           If the stack of open elements does not have an element in table
   1445           scope with the same tag name as the token, this is a parse
   1446           error. Ignore the token. (fragment case)
   1447 
   1448           Otherwise:
   1449 
   1450           Generate implied end tags.
   1451 
   1452           Now, if the current node is not a caption element, then this is
   1453           a parse error.
   1454 
   1455           Pop elements from this stack until a caption element has been
   1456           popped from the stack.
   1457 
   1458           Clear the list of active formatting elements up to the last
   1459           marker.
   1460 
   1461           Switch the insertion mode to "in table".
   1462 
   1463    A start tag whose tag name is one of: "caption", "col", "colgroup",
   1464           "tbody", "td", "tfoot", "th", "thead", "tr"
   1465 
   1466    An end tag whose tag name is "table"
   1467           Parse error. Act as if an end tag with the tag name "caption"
   1468           had been seen, then, if that token wasn't ignored, reprocess the
   1469           current token.
   1470 
   1471           The fake end tag token here can only be ignored in the fragment
   1472           case.
   1473 
   1474    An end tag whose tag name is one of: "body", "col", "colgroup", "html",
   1475           "tbody", "td", "tfoot", "th", "thead", "tr"
   1476           Parse error. Ignore the token.
   1477 
   1478    Anything else
   1479           Process the token using the rules for the "in body" insertion
   1480           mode.
   1481 
   1482       8.2.5.14 The "in column group" insertion mode
   1483 
   1484    When the insertion mode is "in column group", tokens must be handled as
   1485    follows:
   1486 
   1487    A character token that is one of U+0009 CHARACTER TABULATION,
   1488           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
   1489           Insert the character into the current node.
   1490 
   1491    A comment token
   1492           Append a Comment node to the current node with the data
   1493           attribute set to the data given in the comment token.
   1494 
   1495    A DOCTYPE token
   1496           Parse error. Ignore the token.
   1497 
   1498    A start tag whose tag name is "html"
   1499           Process the token using the rules for the "in body" insertion
   1500           mode.
   1501 
   1502    A start tag whose tag name is "col"
   1503           Insert an HTML element for the token. Immediately pop the
   1504           current node off the stack of open elements.
   1505 
   1506           Acknowledge the token's self-closing flag, if it is set.
   1507 
   1508    An end tag whose tag name is "colgroup"
   1509           If the current node is the root html element, then this is a
   1510           parse error; ignore the token. (fragment case)
   1511 
   1512           Otherwise, pop the current node (which will be a colgroup
   1513           element) from the stack of open elements. Switch the insertion
   1514           mode to "in table".
   1515 
   1516    An end tag whose tag name is "col"
   1517           Parse error. Ignore the token.
   1518 
   1519    An end-of-file token
   1520           If the current node is the root html element, then stop parsing.
   1521           (fragment case)
   1522 
   1523           Otherwise, act as described in the "anything else" entry below.
   1524 
   1525    Anything else
   1526           Act as if an end tag with the tag name "colgroup" had been seen,
   1527           and then, if that token wasn't ignored, reprocess the current
   1528           token.
   1529 
   1530           The fake end tag token here can only be ignored in the fragment
   1531           case.
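
The "act as if an end tag had been seen, then reprocess unless it was ignored" pattern used by the "anything else" entry can be sketched as follows. This is a non-normative illustration that covers only the tag-token entries of this mode; whitespace, comment, DOCTYPE and end-of-file tokens are left out. The function names and the (mode, reprocess) return value are assumptions of the sketch, and the stack of open elements is a list of tag names with the current node last.

```python
def end_colgroup(stack):
    """Apply the "colgroup" end tag rule; return False if the token is ignored."""
    if stack[-1] == "html":
        return False             # parse error; ignore the token (fragment case)
    stack.pop()                  # the current node will be a colgroup element
    return True


def in_column_group(stack, kind, name):
    """Process one tag token.

    Returns (insertion_mode, reprocess): the insertion mode after the token,
    and whether the token must be processed again in that mode.
    """
    if kind == "start" and name == "html":
        raise NotImplementedError('uses the "in body" rules, not modelled here')
    if kind == "start" and name == "col":
        stack.append("col")      # insert an HTML element for the token...
        stack.pop()              # ...and immediately pop it again
        return ("in column group", False)
    if kind == "end" and name == "colgroup":
        if end_colgroup(stack):
            return ("in table", False)
        return ("in column group", False)
    if kind == "end" and name == "col":
        return ("in column group", False)   # parse error; ignore the token
    # Anything else: fake a "colgroup" end tag, then reprocess if it applied.
    if end_colgroup(stack):
        return ("in table", True)
    return ("in column group", False)
```

With the stack html, table, colgroup, a "td" start tag closes the colgroup and is handed back for reprocessing in the "in table" mode; in the fragment case, where html is the current node, the same token is dropped.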
   1532 
   1533       8.2.5.15 The "in table body" insertion mode
   1534 
   1535    When the insertion mode is "in table body", tokens must be handled as
   1536    follows:
   1537 
   1538    A start tag whose tag name is "tr"
   1539           Clear the stack back to a table body context. (See below.)
   1540 
   1541           Insert an HTML element for the token, then switch the insertion
   1542           mode to "in row".
   1543 
   1544    A start tag whose tag name is one of: "th", "td"
   1545           Parse error. Act as if a start tag with the tag name "tr" had
   1546           been seen, then reprocess the current token.
   1547 
   1548    An end tag whose tag name is one of: "tbody", "tfoot", "thead"
   1549           If the stack of open elements does not have an element in table
   1550           scope with the same tag name as the token, this is a parse
   1551           error. Ignore the token.
   1552 
   1553           Otherwise:
   1554 
   1555           Clear the stack back to a table body context. (See below.)
   1556 
   1557           Pop the current node from the stack of open elements. Switch the
   1558           insertion mode to "in table".
   1559 
   1560    A start tag whose tag name is one of: "caption", "col", "colgroup",
   1561           "tbody", "tfoot", "thead"
   1562 
   1563    An end tag whose tag name is "table"
   1564           If the stack of open elements does not have a tbody, thead, or
   1565           tfoot element in table scope, this is a parse error. Ignore the
   1566           token. (fragment case)
   1567 
   1568           Otherwise:
   1569 
   1570           Clear the stack back to a table body context. (See below.)
   1571 
   1572           Act as if an end tag with the same tag name as the current node
   1573           ("tbody", "tfoot", or "thead") had been seen, then reprocess the
   1574           current token.
   1575 
   1576    An end tag whose tag name is one of: "body", "caption", "col",
   1577           "colgroup", "html", "td", "th", "tr"
   1578           Parse error. Ignore the token.
   1579 
   1580    Anything else
   1581           Process the token using the rules for the "in table" insertion
   1582           mode.
   1583 
   1584    When the steps above require the UA to clear the stack back to a table
   1585    body context, it means that the UA must, while the current node is not
   1586    a tbody, tfoot, thead, or html element, pop elements from the stack of
   1587    open elements.
   1588 
   1589    The current node being an html element after this process is a fragment
   1590    case.
   1591 
   1592       8.2.5.16 The "in row" insertion mode
   1593 
   1594    When the insertion mode is "in row", tokens must be handled as follows:
   1595 
   1596    A start tag whose tag name is one of: "th", "td"
   1597           Clear the stack back to a table row context. (See below.)
   1598 
   1599           Insert an HTML element for the token, then switch the insertion
   1600           mode to "in cell".
   1601 
   1602           Insert a marker at the end of the list of active formatting
   1603           elements.
   1604 
   1605    An end tag whose tag name is "tr"
   1606           If the stack of open elements does not have an element in table
   1607           scope with the same tag name as the token, this is a parse
   1608           error. Ignore the token. (fragment case)
   1609 
   1610           Otherwise:
   1611 
   1612           Clear the stack back to a table row context. (See below.)
   1613 
   1614           Pop the current node (which will be a tr element) from the stack
   1615           of open elements. Switch the insertion mode to "in table body".
   1616 
   1617    A start tag whose tag name is one of: "caption", "col", "colgroup",
   1618           "tbody", "tfoot", "thead", "tr"
   1619 
   1620    An end tag whose tag name is "table"
   1621           Act as if an end tag with the tag name "tr" had been seen, then,
   1622           if that token wasn't ignored, reprocess the current token.
   1623 
   1624           The fake end tag token here can only be ignored in the fragment
   1625           case.
   1626 
   1627    An end tag whose tag name is one of: "tbody", "tfoot", "thead"
   1628           If the stack of open elements does not have an element in table
   1629           scope with the same tag name as the token, this is a parse
   1630           error. Ignore the token.
   1631 
   1632           Otherwise, act as if an end tag with the tag name "tr" had been
   1633           seen, then reprocess the current token.
   1634 
   1635    An end tag whose tag name is one of: "body", "caption", "col",
   1636           "colgroup", "html", "td", "th"
   1637           Parse error. Ignore the token.
   1638 
   1639    Anything else
   1640           Process the token using the rules for the "in table" insertion
   1641           mode.
   1642 
   1643    When the steps above require the UA to clear the stack back to a table
   1644    row context, it means that the UA must, while the current node is not a
   1645    tr element or an html element, pop elements from the stack of open
   1646    elements.
   1647 
   1648    The current node being an html element after this process is a fragment
   1649    case.
   1650 
   1651       8.2.5.17 The "in cell" insertion mode
   1652 
   1653    When the insertion mode is "in cell", tokens must be handled as
   1654    follows:
   1655 
   1656    An end tag whose tag name is one of: "td", "th"
   1657           If the stack of open elements does not have an element in table
   1658           scope with the same tag name as that of the token, then this is
   1659           a parse error and the token must be ignored.
   1660 
   1661           Otherwise:
   1662 
   1663           Generate implied end tags.
   1664 
   1665           Now, if the current node is not an element with the same tag
   1666           name as the token, then this is a parse error.
   1667 
   1668           Pop elements from this stack until an element with the same tag
   1669           name as the token has been popped from the stack.
   1670 
   1671           Clear the list of active formatting elements up to the last
   1672           marker.
   1673 
   1674           Switch the insertion mode to "in row". (The current node will be
   1675           a tr element at this point.)
   1676 
   1677    A start tag whose tag name is one of: "caption", "col", "colgroup",
   1678           "tbody", "td", "tfoot", "th", "thead", "tr"
   1679           If the stack of open elements does not have a td or th element
   1680           in table scope, then this is a parse error; ignore the token.
   1681           (fragment case)
   1682 
   1683           Otherwise, close the cell (see below) and reprocess the current
   1684           token.
   1685 
   1686    An end tag whose tag name is one of: "body", "caption", "col",
   1687           "colgroup", "html"
   1688           Parse error. Ignore the token.
   1689 
   1690    An end tag whose tag name is one of: "table", "tbody", "tfoot",
   1691           "thead", "tr"
   1692           If the stack of open elements does not have an element in table
   1693           scope with the same tag name as that of the token (which can
   1694           only happen for "tbody", "tfoot" and "thead", or in the
   1695           fragment case), then this is a parse error and the token must be
   1696           ignored.
   1697 
   1698           Otherwise, close the cell (see below) and reprocess the current
   1699           token.
   1700 
   1701    Anything else
   1702           Process the token using the rules for the "in body" insertion
   1703           mode.
   1704 
   1705    Where the steps above say to close the cell, they mean to run the
   1706    following algorithm:
   1707     1. If the stack of open elements has a td element in table scope, then
   1708        act as if an end tag token with the tag name "td" had been seen.
   1709     2. Otherwise, the stack of open elements will have a th element in
   1710        table scope; act as if an end tag token with the tag name "th" had
   1711        been seen.
   1712 
   1713    The stack of open elements cannot have both a td and a th element in
   1714    table scope at the same time, nor can it have neither when the
   1715    insertion mode is "in cell".
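
The "close the cell" algorithm can be sketched in Python as follows (non-normative). The sketch assumes the stack of open elements is a list of tag names with the current node last, and it assumes the definition of "table scope" given elsewhere in this specification: the search walks down from the current node and fails on reaching a table or html element. Both function names are hypothetical.

```python
def has_in_table_scope(stack, target):
    """Return True if `target` is found before a table or html boundary."""
    for name in reversed(stack):         # start at the current node
        if name == target:
            return True
        if name in ("table", "html"):    # scope boundary (assumed definition)
            return False
    return False


def close_cell(stack):
    """Return the tag name of the end tag token to act upon ("td" or "th")."""
    if has_in_table_scope(stack, "td"):
        return "td"
    # In the "in cell" insertion mode one of the two is always in table scope.
    assert has_in_table_scope(stack, "th")
    return "th"
```

In a nested table such as html, body, table, tbody, tr, td, table, tbody, tr, th the result is "th": the outer td lies beyond the inner table element and is therefore not in table scope.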
   1716 
   1717       8.2.5.18 The "in select" insertion mode
   1718 
   1719    When the insertion mode is "in select", tokens must be handled as
   1720    follows:
   1721 
   1722    A character token
   1723           Insert the token's character into the current node.
   1724 
   1725    A comment token
   1726           Append a Comment node to the current node with the data
   1727           attribute set to the data given in the comment token.
   1728 
   1729    A DOCTYPE token
   1730           Parse error. Ignore the token.
   1731 
   1732    A start tag whose tag name is "html"
   1733           Process the token using the rules for the "in body" insertion
   1734           mode.
   1735 
   1736    A start tag whose tag name is "option"
   1737           If the current node is an option element, act as if an end tag
   1738           with the tag name "option" had been seen.
   1739 
   1740           Insert an HTML element for the token.
   1741 
   1742    A start tag whose tag name is "optgroup"
   1743           If the current node is an option element, act as if an end tag
   1744           with the tag name "option" had been seen.
   1745 
   1746           If the current node is an optgroup element, act as if an end tag
   1747           with the tag name "optgroup" had been seen.
   1748 
   1749           Insert an HTML element for the token.
   1750 
   1751    An end tag whose tag name is "optgroup"
   1752           First, if the current node is an option element, and the node
   1753           immediately before it in the stack of open elements is an
   1754           optgroup element, then act as if an end tag with the tag name
   1755           "option" had been seen.
   1756 
   1757           If the current node is an optgroup element, then pop that node
   1758           from the stack of open elements. Otherwise, this is a parse
   1759           error; ignore the token.
   1760 
   1761    An end tag whose tag name is "option"
   1762           If the current node is an option element, then pop that node
   1763           from the stack of open elements. Otherwise, this is a parse
   1764           error; ignore the token.
   1765 
   1766    An end tag whose tag name is "select"
   1767           If the stack of open elements does not have an element in table
   1768           scope with the same tag name as the token, this is a parse
   1769           error. Ignore the token. (fragment case)
   1770 
   1771           Otherwise:
   1772 
   1773           Pop elements from the stack of open elements until a select
   1774           element has been popped from the stack.
   1775 
   1776           Reset the insertion mode appropriately.
   1777 
   1778    A start tag whose tag name is "select"
   1779           Parse error. Act as if the token had been an end tag with the
   1780           tag name "select" instead.
   1781 
   1782    A start tag whose tag name is one of: "input", "textarea"
   1783           Parse error. Act as if an end tag with the tag name "select" had
   1784           been seen, and reprocess the token.
   1785 
   1786    A start tag token whose tag name is "script"
   1787           Process the token using the rules for the "in head" insertion
   1788           mode.
   1789 
   1790    An end-of-file token
   1791           If the current node is not the root html element, then this is a
   1792           parse error.
   1793 
   1794           It can only be the current node in the fragment case.
   1795 
   1796           Stop parsing.
   1797 
   1798    Anything else
   1799           Parse error. Ignore the token.
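
As a non-normative illustration, the option and optgroup entries of this mode can be sketched as stack operations. The Python below covers only those four entries; the function name and the boolean return value (False meaning "parse error; token ignored") are assumptions of the sketch, and the stack of open elements is a list of tag names with the current node last.

```python
def in_select(stack, kind, name):
    """Handle option/optgroup tag tokens; return False on a parse error."""
    if kind == "start" and name == "option":
        if stack[-1] == "option":
            stack.pop()                  # as if an "option" end tag was seen
        stack.append("option")
        return True
    if kind == "start" and name == "optgroup":
        if stack[-1] == "option":
            stack.pop()                  # as if an "option" end tag was seen
        if stack[-1] == "optgroup":
            stack.pop()                  # as if an "optgroup" end tag was seen
        stack.append("optgroup")
        return True
    if kind == "end" and name == "optgroup":
        if len(stack) >= 2 and stack[-1] == "option" and stack[-2] == "optgroup":
            stack.pop()                  # close the open option first
        if stack[-1] == "optgroup":
            stack.pop()
            return True
        return False                     # parse error; ignore the token
    if kind == "end" and name == "option":
        if stack[-1] == "option":
            stack.pop()
            return True
        return False                     # parse error; ignore the token
    raise NotImplementedError("token not covered by this sketch")
```

Feeding the start tags option, option, optgroup, option and then an optgroup end tag to a stack ending in select leaves the stack ending in select again, with each option implicitly closed by the tag that follows it.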
   1800 
   1801       8.2.5.19 The "in select in table" insertion mode
   1802 
   1803    When the insertion mode is "in select in table", tokens must be handled
   1804    as follows:
   1805 
   1806    A start tag whose tag name is one of: "caption", "table", "tbody",
   1807           "tfoot", "thead", "tr", "td", "th"
   1808           Parse error. Act as if an end tag with the tag name "select" had
   1809           been seen, and reprocess the token.
   1810 
   1811    An end tag whose tag name is one of: "caption", "table", "tbody",
   1812           "tfoot", "thead", "tr", "td", "th"
   1813           Parse error.
   1814 
   1815           If the stack of open elements has an element in table scope with
   1816           the same tag name as that of the token, then act as if an end
   1817           tag with the tag name "select" had been seen, and reprocess the
   1818           token. Otherwise, ignore the token.
   1819 
   1820    Anything else
   1821           Process the token using the rules for the "in select" insertion
   1822           mode.
   1823 
   1824       8.2.5.20 The "in foreign content" insertion mode
   1825 
   1826    When the insertion mode is "in foreign content", tokens must be handled
   1827    as follows:
   1828 
   1829    A character token
   1830           Insert the token's character into the current node.
   1831 
   1832    A comment token
   1833           Append a Comment node to the current node with the data
   1834           attribute set to the data given in the comment token.
   1835 
   1836    A DOCTYPE token
   1837           Parse error. Ignore the token.
   1838 
   1839    A start tag whose tag name is neither "mglyph" nor "malignmark", if the
   1840           current node is an mi element in the MathML namespace.
   1841 
   1842    A start tag whose tag name is neither "mglyph" nor "malignmark", if the
   1843           current node is an mo element in the MathML namespace.
   1844 
   1845    A start tag whose tag name is neither "mglyph" nor "malignmark", if the
   1846           current node is an mn element in the MathML namespace.
   1847 
   1848    A start tag whose tag name is neither "mglyph" nor "malignmark", if the
   1849           current node is an ms element in the MathML namespace.
   1850 
   1851    A start tag whose tag name is neither "mglyph" nor "malignmark", if the
   1852           current node is an mtext element in the MathML namespace.
   1853 
   1854    A start tag, if the current node is an element in the HTML namespace.
   1855    An end tag
   1856           Process the token using the rules for the secondary insertion
   1857           mode.
   1858 
   1859           If, after doing so, the insertion mode is still "in foreign
   1860           content", but there is no element in scope that has a namespace
   1861           other than the HTML namespace, switch the insertion mode to the
   1862           secondary insertion mode.
   1863 
   1864    A start tag whose tag name is one of: "b", "big", "blockquote", "body",
   1865           "br", "center", "code", "dd", "div", "dl", "dt", "em", "embed",
   1866           "h1", "h2", "h3", "h4", "h5", "h6", "head", "hr", "i", "img",
   1867           "li", "listing", "menu", "meta", "nobr", "ol", "p", "pre",
   1868           "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
   1869           "table", "tt", "u", "ul", "var"
   1870 
   1871    A start tag whose tag name is "font", if the token has any attributes
   1872           named "color", "face", or "size"
   1873 
   1874    An end-of-file token
   1875           Parse error.
   1876 
   1877           Pop elements from the stack of open elements until the current
   1878           node is in the HTML namespace.
   1879 
   1880           Switch the insertion mode to the secondary insertion mode, and
   1881           reprocess the token.
   1882 
   1883    Any other start tag
   1884           If the current node is an element in the MathML namespace,
   1885           adjust MathML attributes for the token. (This fixes the case of
   1886           MathML attributes that are not all lowercase.)
   1887 
   1888           Adjust foreign attributes for the token. (This fixes the use of
   1889           namespaced attributes, in particular XLink in SVG.)
   1890 
   1891           Insert a foreign element for the token, in the same namespace as
   1892           the current node.
   1893 
   1894           If the token has its self-closing flag set, pop the current node
   1895           off the stack of open elements and acknowledge the token's
   1896           self-closing flag.
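
The start tag entries of this mode fall into three groups, which the following non-normative Python sketch classifies. It assumes the entries are tried in the order listed above. The function name, the result labels and the namespace labels "html", "mathml" and "svg" are hypothetical; attributes are passed as a dict of attribute names to values.

```python
MATHML_TEXT_ELEMENTS = {"mi", "mo", "mn", "ms", "mtext"}

BREAKOUT_TAGS = {
    "b", "big", "blockquote", "body", "br", "center", "code", "dd", "div",
    "dl", "dt", "em", "embed", "h1", "h2", "h3", "h4", "h5", "h6", "head",
    "hr", "i", "img", "li", "listing", "menu", "meta", "nobr", "ol", "p",
    "pre", "ruby", "s", "small", "span", "strong", "strike", "sub", "sup",
    "table", "tt", "u", "ul", "var",
}


def classify_start_tag(tag_name, attributes, current_name, current_namespace):
    """Return "secondary", "break-out" or "foreign" for a start tag token."""
    # Entries that defer to the secondary insertion mode.
    if (current_namespace == "mathml"
            and current_name in MATHML_TEXT_ELEMENTS
            and tag_name not in ("mglyph", "malignmark")):
        return "secondary"
    if current_namespace == "html":
        return "secondary"
    # Entries that are a parse error and pop back to the HTML namespace.
    if tag_name in BREAKOUT_TAGS:
        return "break-out"
    if tag_name == "font" and any(a in attributes for a in ("color", "face", "size")):
        return "break-out"
    # Any other start tag: insert a foreign element in the current namespace.
    return "foreign"
```

For example, a "b" start tag inside an mi element is handled by the secondary insertion mode, while the same tag inside an SVG element breaks out of foreign content; a "font" start tag breaks out only when it carries a color, face or size attribute.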
   1897 
   1898       8.2.5.21 The "after body" insertion mode
   1899 
   1900    When the insertion mode is "after body", tokens must be handled as
   1901    follows:
   1902 
   1903    A character token that is one of U+0009 CHARACTER TABULATION,
   1904           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
   1905           Process the token using the rules for the "in body" insertion
   1906           mode.
   1907 
   1908    A comment token
   1909           Append a Comment node to the first element in the stack of open
   1910           elements (the html element), with the data attribute set to the
   1911           data given in the comment token.
   1912 
   1913    A DOCTYPE token
   1914           Parse error. Ignore the token.
   1915 
   1916    A start tag whose tag name is "html"
   1917           Process the token using the rules for the "in body" insertion
   1918           mode.
   1919 
   1920    An end tag whose tag name is "html"
   1921           If the parser was originally created as part of the HTML
   1922           fragment parsing algorithm, this is a parse error; ignore the
   1923           token. (fragment case)
   1924 
   1925           Otherwise, switch the insertion mode to "after after body".
   1926 
   1927    An end-of-file token
   1928           Stop parsing.
   1929 
   1930    Anything else
   1931           Parse error. Switch the insertion mode to "in body" and
   1932           reprocess the token.
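
The "after body" rules above amount to a small dispatch on the token type. The sketch below is one way to express it, under stated assumptions: tokens are hypothetical dicts with "type", "name" and "data" keys, and the function returns an invented (action, next insertion mode) pair instead of touching a real DOM.

```python
WHITESPACE = {"\t", "\n", "\f", " "}  # U+0009, U+000A, U+000C, U+0020


def after_body(token, fragment_case=False):
    """Return a hypothetical (action, next_mode) pair for one token."""
    kind = token["type"]
    if kind == "character" and token["data"] in WHITESPACE:
        return ("process-using-in-body", "after body")
    if kind == "comment":
        return ("append-comment-to-html-element", "after body")
    if kind == "doctype":
        return ("parse-error-ignore", "after body")
    if kind == "start" and token["name"] == "html":
        return ("process-using-in-body", "after body")
    if kind == "end" and token["name"] == "html":
        if fragment_case:
            return ("parse-error-ignore", "after body")
        return ("switch", "after after body")
    if kind == "eof":
        return ("stop-parsing", "after body")
    # Anything else: parse error, then reprocess the token in "in body".
    return ("parse-error-reprocess", "in body")
```

The action strings are labels for the example only; a real parser would perform the corresponding steps directly.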
   1933 
   1934       8.2.5.22 The "in frameset" insertion mode
   1935 
   1936    When the insertion mode is "in frameset", tokens must be handled as
   1937    follows:
   1938 
   1939    A character token that is one of U+0009 CHARACTER TABULATION,
   1940           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
   1941           Insert the character into the current node.
   1942 
   1943    A comment token
   1944           Append a Comment node to the current node with the data
   1945           attribute set to the data given in the comment token.
   1946 
   1947    A DOCTYPE token
   1948           Parse error. Ignore the token.
   1949 
   1950    A start tag whose tag name is "html"
   1951           Process the token using the rules for the "in body" insertion
   1952           mode.
   1953 
   1954    A start tag whose tag name is "frameset"
   1955           Insert an HTML element for the token.
   1956 
   1957    An end tag whose tag name is "frameset"
   1958           If the current node is the root html element, then this is a
   1959           parse error; ignore the token. (fragment case)
   1960 
   1961           Otherwise, pop the current node from the stack of open elements.
   1962 
   1963           If the parser was not originally created as part of the HTML
   1964           fragment parsing algorithm (fragment case), and the current node
   1965           is no longer a frameset element, then switch the insertion mode
   1966           to "after frameset".
   1967 
   1968    A start tag whose tag name is "frame"
   1969           Insert an HTML element for the token. Immediately pop the
   1970           current node off the stack of open elements.
   1971 
   1972           Acknowledge the token's self-closing flag, if it is set.
   1973 
   1974    A start tag whose tag name is "noframes"
   1975           Process the token using the rules for the "in head" insertion
   1976           mode.
   1977 
   1978    An end-of-file token
   1979           If the current node is not the root html element, then this is a
   1980           parse error.
   1981 
   1982           The root html element can only be the current node in the fragment case.
   1983 
   1984           Stop parsing.
   1985 
   1986    Anything else
   1987           Parse error. Ignore the token.
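
The handling of a "frameset" end tag above has three steps that are easy to get out of order, so a short sketch may help. It assumes the stack of open elements is a Python list of tag names whose first item is the root html element; the function name and the (mode, parse_error) return value are invented for illustration.

```python
def end_tag_frameset(open_elements, mode="in frameset", fragment_case=False):
    """Handle </frameset>; returns a hypothetical (mode, parse_error) pair."""
    # The current node is the root html element only when nothing else is open.
    if len(open_elements) == 1:
        return mode, True  # parse error; the token is ignored
    open_elements.pop()
    # Leave "in frameset" once the outermost frameset has been closed,
    # except in the fragment case.
    if not fragment_case and open_elements[-1] != "frameset":
        mode = "after frameset"
    return mode, False


stack = ["html", "frameset", "frameset"]
first = end_tag_frameset(stack)   # closes the nested frameset
second = end_tag_frameset(stack)  # closes the outermost frameset
```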
   1988 
   1989       8.2.5.23 The "after frameset" insertion mode
   1990 
   1991    When the insertion mode is "after frameset", tokens must be handled as
   1992    follows:
   1993 
   1994    A character token that is one of U+0009 CHARACTER TABULATION,
   1995           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
   1996           Insert the character into the current node.
   1997 
   1998    A comment token
   1999           Append a Comment node to the current node with the data
   2000           attribute set to the data given in the comment token.
   2001 
   2002    A DOCTYPE token
   2003           Parse error. Ignore the token.
   2004 
   2005    A start tag whose tag name is "html"
   2006           Process the token using the rules for the "in body" insertion
   2007           mode.
   2008 
   2009    An end tag whose tag name is "html"
   2010           Switch the insertion mode to "after after frameset".
   2011 
   2012    A start tag whose tag name is "noframes"
   2013           Process the token using the rules for the "in head" insertion
   2014           mode.
   2015 
   2016    An end-of-file token
   2017           Stop parsing.
   2018 
   2019    Anything else
   2020           Parse error. Ignore the token.
   2021 
   2022    This doesn't handle UAs that don't support frames, or that do support
   2023    frames but want to show the NOFRAMES content. Supporting the former is
   2024    easy; supporting the latter is harder.
   2025 
   2026       8.2.5.24 The "after after body" insertion mode
   2027 
   2028    When the insertion mode is "after after body", tokens must be handled
   2029    as follows:
   2030 
   2031    A comment token
   2032           Append a Comment node to the Document object with the data
   2033           attribute set to the data given in the comment token.
   2034 
   2035    A DOCTYPE token
   2036    A character token that is one of U+0009 CHARACTER TABULATION,
   2037           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
   2038 
   2039    A start tag whose tag name is "html"
   2040           Process the token using the rules for the "in body" insertion
   2041           mode.
   2042 
   2043    An end-of-file token
   2044           Stop parsing.
   2045 
   2046    Anything else
   2047           Parse error. Switch the insertion mode to "in body" and
   2048           reprocess the token.
   2049 
   2050       8.2.5.25 The "after after frameset" insertion mode
   2051 
   2052    When the insertion mode is "after after frameset", tokens must be
   2053    handled as follows:
   2054 
   2055    A comment token
   2056           Append a Comment node to the Document object with the data
   2057           attribute set to the data given in the comment token.
   2058 
   2059    A DOCTYPE token
   2060    A character token that is one of U+0009 CHARACTER TABULATION,
   2061           U+000A LINE FEED (LF), U+000C FORM FEED (FF), or U+0020 SPACE
   2062 
   2063    A start tag whose tag name is "html"
   2064           Process the token using the rules for the "in body" insertion
   2065           mode.
   2066 
   2067    An end-of-file token
   2068           Stop parsing.
   2069 
   2070    A start tag whose tag name is "noframes"
   2071           Process the token using the rules for the "in head" insertion
   2072           mode.
   2073 
   2074    Anything else
   2075           Parse error. Ignore the token.
   2076 
   2077     8.2.6 The end
   2078 
   2079    Once the user agent stops parsing the document, the user agent must
   2080    follow the steps in this section.
   2081 
   2082    First, the current document readiness must be set to "interactive".
   2083 
   2084    Then, the rules for when a script completes loading start applying
   2085    (script execution is no longer managed by the parser).
   2086 
   2087    If any of the scripts in the list of scripts that will execute as soon
   2088    as possible have completed loading, or if the list of scripts that will
   2089    execute asynchronously is not empty and the first script in that list
   2090    has completed loading, then the user agent must act as if those scripts
   2091    just completed loading, following the rules given for that in the
   2092    script element definition.
   2093 
   2094    Then, if the list of scripts that will execute when the document has
   2095    finished parsing is not empty, and the first item in this list has
   2096    already completed loading, then the user agent must act as if that
   2097    script just finished loading.
   2098 
   2099    By this point, there will be no scripts that have loaded but have not
   2100    yet been executed.
   2101 
   2102    The user agent must then fire a simple event called DOMContentLoaded at
   2103    the Document.
   2104 
   2105    Once everything that delays the load event has completed, the user
   2106    agent must set the current document readiness to "complete", and then
   2107    fire a load event at the body element.
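
The ordering of the steps above can be illustrated with a deliberately simplified sketch. It makes several assumptions that are not in the text: the three script lists are collapsed into a single list of already-loaded callables, the document is a plain dict, and "everything that delays the load event" is modelled as one blocking callable. It shows only the sequence of state changes and events, not the script scheduling rules.

```python
def run_the_end(document, loaded_scripts, wait_for_load_delays):
    """Return the ordered list of steps taken once parsing stops."""
    steps = []
    document["readiness"] = "interactive"
    steps.append("readiness=interactive")
    # Simplification: every script passed in has loaded but not yet executed.
    for script in loaded_scripts:
        script()
        steps.append("script executed")
    steps.append("DOMContentLoaded fired at Document")
    wait_for_load_delays()  # e.g. outstanding image loads
    document["readiness"] = "complete"
    steps.append("readiness=complete")
    steps.append("load fired at body")
    return steps


doc = {"readiness": "loading"}
ran = []
steps = run_the_end(doc, [lambda: ran.append("a")], lambda: None)
```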
   2108 
   2109    Delaying the load event for things like image loads allows for intranet
   2110    port scans (even without JavaScript!). Should we really encode that
   2111    into the spec?
   2112 
   2113     8.2.7 Coercing an HTML DOM into an infoset
   2114 
   2115    When an application uses an HTML parser in conjunction with an XML
   2116    pipeline, it is possible that the constructed DOM is not compatible
   2117    with the XML tool chain in certain subtle ways. For example, an XML
   2118    toolchain might not be able to represent attributes with the name
   2119    xmlns, since they conflict with the Namespaces in XML syntax. There is
   2120    also some data that the HTML parser generates that isn't included in
   2121    the DOM itself. This section specifies some rules for handling these
   2122    issues.
   2123 
   2124    If the XML API being used doesn't support DOCTYPEs, the tool may drop
   2125    DOCTYPEs altogether.
   2126 
   2127    If the XML API doesn't support attributes in no namespace that are
   2128    named "xmlns", attributes whose names start with "xmlns:", or
   2129    attributes in the XMLNS namespace, then the tool may drop such
   2130    attributes.
   2131 
   2132    The tool may annotate the output with any namespace declarations
   2133    required for proper operation.
   2134 
   2135    If the XML API being used restricts the allowable characters in the
   2136    local names of elements and attributes, then the tool may map all
   2137    element and attribute local names that the API wouldn't support to a
   2138    set of names that are allowed, by replacing any character that isn't
   2139    supported with the uppercase letter U and the five digits of the
   2140    character's Unicode codepoint when expressed in hexadecimal, using
   2141    digits 0-9 and capital letters A-F as the symbols, in increasing
   2142    numeric order.
   2143 
   2144    For example, the element name foo<bar, which can be output by the HTML
   2145    parser, though it is neither a legal HTML element name nor a
   2146    well-formed XML element name, would be converted into fooU0003Cbar,
   2147    which is a well-formed XML element name (though it's still not legal in
   2148    HTML by any means).
   2149 
   2150    As another example, consider the attribute xlink:href. Used on a MathML
   2151    element, it becomes, after being adjusted, an attribute with a prefix
   2152    "xlink" and a local name "href". However, used on an HTML element, it
   2153    becomes an attribute with no prefix and the local name "xlink:href",
   2154    which is not a valid NCName, and thus might not be accepted by an XML
   2155    API. It could thus get converted, becoming "xlinkU0003Ahref".
   2156 
   2157    The resulting names from this conversion conveniently can't clash with
   2158    any attribute generated by the HTML parser, since those are all either
   2159    lowercase or those listed in the adjust foreign attributes algorithm's
   2160    table.
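
The name mapping described above can be sketched in a few lines. The set of characters the XML API supports is an assumption made for the example (ASCII letters, digits, "_", "-" and "."); a real tool would substitute whatever its API actually accepts. The function name is likewise hypothetical.

```python
import re

# Assumption: the target API accepts only these characters in local names.
_UNSUPPORTED = re.compile(r"[^A-Za-z0-9_.-]")


def coerce_local_name(name):
    """Replace each unsupported character with "U" plus its code point
    written as uppercase hexadecimal, zero-padded to five digits."""
    return _UNSUPPORTED.sub(lambda m: "U%05X" % ord(m.group()), name)


element_name = coerce_local_name("foo<bar")
attribute_name = coerce_local_name("xlink:href")
```

With these inputs the sketch reproduces the two names given in the examples above. Code points above U+FFFFF need six hexadecimal digits, a case the five-digit wording does not cover; the sketch simply emits all six.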
   2161 
   2162    If the XML API restricts comments from having two consecutive U+002D
   2163    HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE
   2164    character between any such offending characters.
   2165 
   2166    If the XML API restricts comments from ending in a U+002D HYPHEN-MINUS
   2167    character (-), the tool may insert a single U+0020 SPACE character at
   2168    the end of such comments.
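
The two comment rules above can be combined into one small function. This is a sketch under the assumption that comment data is available as a plain string; the function name is invented for the example.

```python
def coerce_comment(data):
    """Make comment data acceptable to an XML API that forbids "--"
    inside comments and "-" at the end of a comment."""
    # Repeat until no pair remains, since one pass over "---" leaves "- --".
    while "--" in data:
        data = data.replace("--", "- -")
    if data.endswith("-"):
        data += " "
    return data
```

The loop is needed because a single replacement pass can leave a new adjacent pair behind when three or more hyphens occur in a row.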
   2169 
   2170    If the XML API restricts allowed characters in character data, the tool
   2171    may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE
   2172    character, and any other literal non-XML character with a U+FFFD
   2173    REPLACEMENT CHARACTER.
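
A sketch of the character data rule above follows. It assumes that "non-XML character" means any code point outside the Char production of XML 1.0; the function names are hypothetical.

```python
def is_xml_char(cp):
    """True if the code point matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def coerce_character_data(text):
    """Map FORM FEED to SPACE and other non-XML characters to U+FFFD."""
    out = []
    for ch in text:
        if ch == "\f":
            out.append(" ")
        elif is_xml_char(ord(ch)):
            out.append(ch)
        else:
            out.append("\ufffd")
    return "".join(out)
```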
   2174 
   2175    If the tool has no way to convey out-of-band information, then the tool
   2176    may drop the following information:
   2177      * Whether the document is set to no quirks mode, limited quirks mode,
   2178        or quirks mode
   2179      * The association between form controls and forms that aren't their
   2180        nearest form element ancestor (use of the form element pointer in
   2181        the parser)
   2182 
   2183    The mutations allowed by this section apply after the HTML parser's
   2184    rules have been applied. For example, a <a::> start tag will be closed
   2185    by a </a::> end tag, and never by a </aU0003AU0003A> end tag, even if
   2186    the user agent is using the rules above to then generate an actual
   2187    element in the DOM with the name aU0003AU0003A for that start tag.
   2188 
   2189   8.3 Namespaces
   2190 
   2191    The HTML namespace is: http://www.w3.org/1999/xhtml
   2192 
   2193    The MathML namespace is: http://www.w3.org/1998/Math/MathML
   2194 
   2195    The SVG namespace is: http://www.w3.org/2000/svg
   2196 
   2197    The XLink namespace is: http://www.w3.org/1999/xlink
   2198 
   2199    The XML namespace is: http://www.w3.org/XML/1998/namespace
   2200 
   2201    The XMLNS namespace is: http://www.w3.org/2000/xmlns/
   2202