# Simple HTML Tokenizer [![Build Status](https://github.com/tildeio/simple-html-tokenizer/workflows/CI/badge.svg)](https://github.com/tildeio/simple-html-tokenizer/actions?query=workflow%3ACI)

Simple HTML Tokenizer is a lightweight JavaScript library that can be
used to tokenize the kind of HTML normally found in templates. It can be
used to preprocess templates so that a template element behaves
differently depending on whether it appears in an attribute or in
text.

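As a rough illustration of that use case, the sketch below walks a hand-written token list and reports whether a `{{...}}` placeholder sits in an attribute value or in text. The `type` and `chars` fields, the `{{name}}` placeholder syntax, and the `findPlaceholders` helper are assumptions made for the example. Only `tagName` and the `[name, value]` attribute pairs follow the token shape shown under Usage.

```js
// Hypothetical helper: not part of Simple HTML Tokenizer.
// Reports each {{placeholder}} together with the context it was found in.
function findPlaceholders(tokens) {
  var placeholder = /\{\{\s*([\w.]+)\s*\}\}/g;
  var found = [];

  tokens.forEach(function (token) {
    // Attribute values: token.attributes is a list of [name, value] pairs.
    (token.attributes || []).forEach(function (pair) {
      String(pair[1]).replace(placeholder, function (match, name) {
        found.push({ name: name, context: "attribute", attribute: pair[0] });
        return match;
      });
    });

    // Text content: assumes text tokens carry their characters in `chars`.
    if (typeof token.chars === "string") {
      token.chars.replace(placeholder, function (match, name) {
        found.push({ name: name, context: "text" });
        return match;
      });
    }
  });

  return found;
}

// Hand-written tokens standing in for tokenizer output (shape partly assumed).
var sampleTokens = [
  { type: "StartTag", tagName: "a", attributes: [["href", "{{url}}"]], selfClosing: false },
  { type: "Chars", chars: "Hello, {{user}}" },
  { type: "EndTag", tagName: "a" }
];

var placeholders = findPlaceholders(sampleTokens);
```

A preprocessor could then branch on `context`, for example escaping a value differently when it lands in an attribute than when it lands in text.
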
It is not a full HTML5 tokenizer. It focuses on the kind of HTML that is
used in templates: content designed to be inserted into the `<body>`
and without `<script>` tags.

In particular, Simple HTML Tokenizer does not handle many states from
the [HTML5 Tokenizer Specification][1]:

* Any states involving `CDATA` or `RCDATA`
* Any states involving `<script>`
* Any states involving `<!DOCTYPE>`
* The bogus comment state

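Given those gaps, it may be worth checking a template before tokenizing it. The following is a minimal sketch of such a pre-check, assuming simple textual patterns are good enough for your templates. `findUnsupportedConstructs` and `UNSUPPORTED_PATTERNS` are hypothetical names, not part of the library, and the bogus comment state is deliberately left out because it is hard to detect with a pattern.

```js
// Hypothetical pre-check: not part of Simple HTML Tokenizer.
// Each pattern is a rough textual heuristic, not a real parser.
var UNSUPPORTED_PATTERNS = [
  { name: "CDATA section", pattern: /<!\[CDATA\[/i },
  // <textarea> and <title> switch a spec-compliant tokenizer into RCDATA.
  { name: "RCDATA element (<textarea> or <title>)", pattern: /<(textarea|title)[\s>\/]/i },
  { name: "<script> element", pattern: /<script[\s>\/]/i },
  { name: "DOCTYPE declaration", pattern: /<!DOCTYPE/i }
];

// Returns the names of all unsupported constructs that appear in `html`.
function findUnsupportedConstructs(html) {
  return UNSUPPORTED_PATTERNS
    .filter(function (entry) { return entry.pattern.test(html); })
    .map(function (entry) { return entry.name; });
}

var problems = findUnsupportedConstructs("<!DOCTYPE html><div><script>x</script></div>");
// problems => ["<script> element", "DOCTYPE declaration"]
```

An empty result does not guarantee the input is supported. It only means none of these particular patterns matched.
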
It also passes through character references, instead of trying to
tokenize and process them, because the preprocessed templates will
ultimately be parsed by a real browser context.

At the moment, there are some error states specified by the tokenizer
spec that are not handled by Simple HTML Tokenizer. Ultimately, I plan
to support all error states, as well as provide information about
tokenizer errors in debug mode.

[1]: http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.html

# Usage

You can tokenize HTML:

```js
var tokens = HTML5Tokenizer.tokenize("<div id='foo' href=bar class=\"bat\">");

var token = tokens[0];
token.tagName //=> "div"
token.attributes //=> [["id", "foo"], ["href", "bar"], ["class", "bat"]]
token.selfClosing //=> false
```

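Because `token.attributes` is a list of `[name, value]` pairs, looking up one attribute means scanning the list. The sketch below shows one way to turn the pairs into a plain object. `attributesToObject` is a hypothetical helper, not part of the library, and the token is a literal copied from the example output above rather than produced by the tokenizer.

```js
// Hypothetical helper: not part of Simple HTML Tokenizer.
// Turns [["id", "foo"], ...] into { id: "foo", ... }.
function attributesToObject(attributes) {
  var result = {};
  attributes.forEach(function (pair) {
    var name = pair[0];
    // Keep the first occurrence of a duplicated attribute name.
    if (!Object.prototype.hasOwnProperty.call(result, name)) {
      result[name] = pair[1];
    }
  });
  return result;
}

// Token literal copied from the example output, not produced by the library here.
var startTag = {
  tagName: "div",
  attributes: [["id", "foo"], ["href", "bar"], ["class", "bat"]],
  selfClosing: false
};

var attrs = attributesToObject(startTag.attributes);
attrs.id;    //=> "foo"
attrs.class; //=> "bat"
```

Keeping the first value for a repeated name is a design choice for this sketch. It matches how HTML parsers treat duplicate attributes, but the pair list remains the faithful record of what was in the source.
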
## Building and running the tests

```bash
npm install
npm test
```