Pure JavaScript HTML5 Parser

All-in-one: XML Serializer, DOM Builder, DOM Document Creator, and a SAX-style API

While this library doesn't cover the full gamut of possible weirdness that HTML provides, it does handle a lot of the most obvious stuff. All of the following are accounted for:

  • Unclosed Tags:
    HTMLtoXML("<p><b>Hello") == '<p><b>Hello</b></p>'
  • Empty Elements:
    HTMLtoXML("<img src=test.jpg>") == '<img src="test.jpg"/>'
  • Block vs. Inline Elements:
    HTMLtoXML("<b>Hello <p>John") == '<b>Hello </b><p>John</p>'
  • Self-closing Elements (a new <p> implicitly closes the one still open):
    HTMLtoXML("<p>Hello<p>World") == '<p>Hello</p><p>World</p>'
  • Attributes Without Values:
    HTMLtoXML("<input disabled>") == '<input disabled="disabled"/>'
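
To make the rules in the list above concrete, here is a minimal toy sketch of how such fix-ups could be applied with a stack of open elements. It is not the library's code: toyHTMLtoXML, toyAttrs, and the four element sets are hypothetical names and small assumed samples invented for illustration, and the sketch emits empty elements in self-closed XML form. It does not escape text content or handle comments, scripts, or doctypes.

```javascript
// Toy sketch only: NOT the library's implementation.
// The element sets are small assumed samples chosen for illustration.
const EMPTY = new Set(["img", "br", "hr", "input", "meta", "link"]);
const BLOCK = new Set(["p", "div", "ul", "ol", "table", "h1", "h2"]);
const INLINE = new Set(["b", "i", "em", "strong", "span", "a"]);
const CLOSE_SELF = new Set(["p", "li", "td", "th", "tr", "option"]);

// Hypothetical helper: serialize a raw attribute string as quoted XML attributes.
function toyAttrs(raw) {
  const attr = /([^\s=\/>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+)))?/g;
  let out = "";
  for (const a of raw.matchAll(attr)) {
    // A bare attribute such as `disabled` takes its own name as its value.
    const value = a[2] ?? a[3] ?? a[4] ?? a[1];
    out += ` ${a[1]}="${value.replace(/"/g, "&quot;")}"`;
  }
  return out;
}

function toyHTMLtoXML(html) {
  const stack = []; // names of the elements that are currently open
  let out = "";
  const top = () => stack[stack.length - 1];
  const closeTop = () => { out += `</${stack.pop()}>`; };
  // Matches a start tag, an end tag, or a run of text.
  const token = /<(\/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>|([^<]+)/g;

  for (const m of html.matchAll(token)) {
    if (m[4] !== undefined) { out += m[4]; continue; } // text passes through as-is
    const name = m[2].toLowerCase();
    if (m[1] === "/") {
      // End tag: close everything opened since the matching start tag.
      const pos = stack.lastIndexOf(name);
      if (pos !== -1) while (stack.length > pos) closeTop();
      continue;
    }
    // Block vs. inline: a block element closes any open inline elements.
    if (BLOCK.has(name)) while (stack.length && INLINE.has(top())) closeTop();
    // Self-closing: a new <p> closes a <p> that is still open.
    if (CLOSE_SELF.has(name) && top() === name) closeTop();
    out += `<${name}${toyAttrs(m[3])}`;
    if (EMPTY.has(name)) out += "/>"; // empty elements are never pushed
    else { out += ">"; stack.push(name); }
  }
  while (stack.length) closeTop(); // unclosed tags: close whatever is left
  return out;
}

console.log(toyHTMLtoXML("<p><b>Hello"));      // <p><b>Hello</b></p>
console.log(toyHTMLtoXML("<b>Hello <p>John")); // <b>Hello </b><p>John</p>
console.log(toyHTMLtoXML("<p>Hello<p>World")); // <p>Hello</p><p>World</p>
```

The stack is what lets one pass cover every case in the list: each rule either closes elements from the top of the stack before a new tag is written, or closes whatever is left once the input runs out.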

Note: It does not take into account where in the document an element should exist. Right now you can put block elements in a head or th inside a p and it'll happily accept them. It's not entirely clear how the logic should work for those, but it's something that I'm open to exploring.