# Simple HTML Tokenizer [![CI](https://github.com/tildeio/simple-html-tokenizer/workflows/CI/badge.svg)](https://github.com/tildeio/simple-html-tokenizer/actions?query=workflow%3ACI)

Simple HTML Tokenizer is a lightweight JavaScript library that can be
used to tokenize the kind of HTML normally found in templates. It can be
used to preprocess templates, for example to change how a template
element behaves depending on whether it appears inside an attribute or
in text content.

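To make the attribute-versus-text distinction concrete, here is a minimal, self-contained sketch of the idea. It does not use Simple HTML Tokenizer: `classifyPlaceholders` is a hypothetical helper that walks a template with a few tokenizer-like states and reports whether each `{{...}}` placeholder appears in text, inside an attribute value, or elsewhere inside a tag. It assumes `{{`/`}}` delimiters and ignores comments, character references and whitespace around `=`.

```javascript
// Hypothetical sketch, NOT simple-html-tokenizer's API: a toy state machine
// that reports whether each {{...}} placeholder in a template sits in text,
// inside an attribute value, or elsewhere inside a tag.
// Assumes {{ }} delimiters; ignores comments, <script>, and spaces around "=".
function classifyPlaceholders(template) {
  const results = [];
  // "data" = text content, "tag" = inside <...> but not in a value;
  // the other three states are the attribute-value styles.
  let state = "data";
  let i = 0;

  while (i < template.length) {
    // A placeholder is recognized in any state and consumed as a unit.
    if (template.startsWith("{{", i)) {
      const end = template.indexOf("}}", i + 2);
      if (end === -1) break; // unterminated placeholder: stop scanning
      const context =
        state === "data" ? "text" : state === "tag" ? "tag" : "attribute";
      results.push({ expression: template.slice(i + 2, end).trim(), context });
      i = end + 2;
      continue;
    }

    const ch = template[i];
    switch (state) {
      case "data":
        if (ch === "<") state = "tag";
        break;
      case "tag":
        if (ch === ">") {
          state = "data";
        } else if (ch === "=") {
          // Peek at the first value character to pick the value state;
          // an opening quote is consumed here.
          const next = template[i + 1];
          if (next === '"') { state = "attrDouble"; i++; }
          else if (next === "'") { state = "attrSingle"; i++; }
          else state = "attrUnquoted";
        }
        break;
      case "attrDouble":
        if (ch === '"') state = "tag";
        break;
      case "attrSingle":
        if (ch === "'") state = "tag";
        break;
      case "attrUnquoted":
        if (ch === ">") state = "data";
        else if (/\s/.test(ch)) state = "tag";
        break;
    }
    i++;
  }
  return results;
}

const found = classifyPlaceholders('<a href="{{url}}" title={{t}}>{{label}}</a>');
console.log(found.map((p) => `${p.expression}:${p.context}`).join(", "));
// → url:attribute, t:attribute, label:text
```

A preprocessor built on this idea could, for instance, escape a placeholder differently depending on the reported context; the real library exposes its own token objects rather than this toy output.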
It is not a full HTML5 tokenizer. It focuses on the kind of HTML used
in templates: content that is designed to be inserted into the `<body>`
and contains no `<script>` tags.

In particular, Simple HTML Tokenizer does not handle many states from
the [HTML5 Tokenizer Specification][1]:

* Any states involving `CDATA` or `RCDATA`
* Any states involving `<script>`
* Any states involving `<!DOCTYPE>`
* The bogus comment state

It also passes character references through unchanged instead of
trying to decode them, because the preprocessed templates will
ultimately be parsed by a real browser context.

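A rough illustration of why passing character references through is safe. `textRuns` is a hypothetical helper, not part of the library: it collects the text between tags and copies `&` like any other character, so references such as `&amp;` and `&lt;` are left undecoded. It assumes there is no `>` inside attribute values and no comments.

```javascript
// Hypothetical helper, not part of simple-html-tokenizer: collect the text
// between tags, copying "&" like any other character so character references
// such as &amp; and &lt; pass through undecoded.
// Assumes no ">" inside attribute values and no comments.
function textRuns(html) {
  const runs = [];
  let buffer = "";
  let inTag = false;
  for (const ch of html) {
    if (!inTag && ch === "<") {
      if (buffer) runs.push(buffer); // flush the text before the tag
      buffer = "";
      inTag = true;
    } else if (inTag) {
      if (ch === ">") inTag = false; // skip everything inside the tag
    } else {
      buffer += ch;
    }
  }
  if (buffer) runs.push(buffer);
  return runs;
}

const [run] = textRuns("<p>a &lt;b&gt; c</p>");
console.log(run); // → a &lt;b&gt; c
```

Re-emitting `run` verbatim preserves its meaning, and the browser decodes it exactly once. Had the text been decoded to `a <b> c` and then written back without re-escaping, it would introduce a real `<b>` element.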
At the moment, some error states specified by the tokenizer spec are
not handled by Simple HTML Tokenizer. Ultimately, I plan to support all
error states, as well as to report tokenizer errors in debug mode.

[1]: http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html

## Usage

You can tokenize HTML:

```js
var tokens = HTML5Tokenizer.tokenize("<div id='foo' href=bar class=\"bat\">");

var token = tokens[0];
token.tagName //=> "div"
token.attributes //=> [["id", "foo"], ["href", "bar"], ["class", "bat"]]
token.selfClosing //=> false
```

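For illustration only, the sketch below shows how a single start tag like the one above can be broken into `tagName`, `attributes` and `selfClosing`, covering the three attribute-value styles the spec distinguishes (double-quoted, single-quoted and unquoted). `parseStartTag` is a hypothetical helper, not Simple HTML Tokenizer's implementation, and it uses regular expressions rather than a character-by-character state machine. It assumes one well-formed start tag as input, does not lowercase names, and does not decode character references.

```javascript
// Hypothetical sketch, not the library's implementation.
// Splits ONE well-formed start tag into { tagName, attributes, selfClosing }.
// Assumptions: no comments or placeholders inside the tag, names are not
// lowercased, and character references in values are left as-is.
function parseStartTag(source) {
  // <name ...rest... optional "/" >
  const tag = /^<([A-Za-z][^\s\/>]*)([\s\S]*?)(\/?)>$/.exec(source.trim());
  if (!tag) throw new Error(`Not a start tag: ${source}`);
  const [, tagName, rest, slash] = tag;

  // A name, optionally followed by "=" and a double-quoted, single-quoted
  // or unquoted value (the three attribute-value styles).
  const attrPattern =
    /([^\s"'=\/>]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>]+)))?/g;

  const attributes = [];
  for (const m of rest.matchAll(attrPattern)) {
    // At most one value group matched; a bare attribute gets "".
    const value = m[2] ?? m[3] ?? m[4] ?? "";
    attributes.push([m[1], value]);
  }
  return { tagName, attributes, selfClosing: slash === "/" };
}

const token = parseStartTag(`<div id='foo' href=bar class="bat">`);
console.log(token.tagName); // → div
console.log(JSON.stringify(token.attributes)); // → [["id","foo"],["href","bar"],["class","bat"]]
console.log(token.selfClosing); // → false
```

Applied to the README's example input, this toy parser produces the same field values listed above; the real tokenizer may attach additional information to its tokens.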
## Building and running the tests

```bash
npm install
npm test
```