# Parsed HTML Rewriter
A DOM-based implementation of [Cloudflare Workers' `HTMLRewriter`](https://developers.cloudflare.com/workers/runtime-apis/html-rewriter).

***
___UPDATE: While this module works just fine, I've made [a new version](https://github.com/worker-tools/html-rewriter) that is WASM/streaming based for much better performance.___
***

Unlike the original, this implementation parses the entire document into a DOM (provided by [`linkedom`](https://github.com/WebReflection/linkedom))
and runs selectors against this representation. As a result, it is slower, more memory intensive, and can't process streaming data.

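To make the trade-off concrete: a DOM-based rewriter needs the complete document before any selector can run. The sketch below is not taken from this module's source. `bufferBody` is a hypothetical helper that only illustrates the buffering step such an approach implies, using the standard `TextDecoder`.

```ts
// Hypothetical helper, not part of this module's API: collects every chunk
// of a body into a single string before any processing can begin.
async function bufferBody(chunks: AsyncIterable<Uint8Array>): Promise<string> {
  const decoder = new TextDecoder();
  let html = '';
  for await (const chunk of chunks) {
    // `stream: true` keeps multi-byte characters intact across chunk borders.
    html += decoder.decode(chunk, { stream: true });
  }
  // Flush any bytes still held by the decoder.
  return html + decoder.decode();
}
```
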
Note that this approach was chosen to quickly implement the functionality of `HTMLRewriter`, as no JS implementation was available at the time of writing.
A better implementation would replicate the streaming approach of [`lol-html`](https://github.com/cloudflare/lol-html), or even use a WebAssembly version of it. _Update: [Now available here](https://github.com/worker-tools/html-rewriter)_.

However, this implementation should run in most JS contexts (including Web Workers, Service Workers and Deno) without modification and handle many, if not most, use cases of `HTMLRewriter`.
It should be good enough for testing and offline Workers development.

## Usage
This module can be used in two ways.

As a standalone module:

```ts
import { ParsedHTMLRewriter } from '@worker-tools/parsed-html-rewriter'

await new ParsedHTMLRewriter()
  .transform(new Response('<body></body>'))
  .text();
```

Or as a polyfill:

```ts
import '@worker-tools/parsed-html-rewriter/polyfill'

await new HTMLRewriter() // Will use the native version when running in a Worker
  .transform(new Response('<body></body>'))
  .text();
```

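The comment in the polyfill example implies that an existing global `HTMLRewriter` is left untouched. A minimal sketch of that pattern follows. It is an assumption about how such a polyfill could be written, not this module's actual code, and `installIfMissing`, `FallbackRewriter` and `NativeRewriter` are hypothetical names.

```ts
// Hypothetical stand-in for a fallback implementation.
class FallbackRewriter {}

// Installs `impl` under `name` only when the scope has no such property,
// so a native implementation always takes precedence.
function installIfMissing(
  scope: Record<string, unknown>,
  name: string,
  impl: unknown,
): boolean {
  if (name in scope) return false; // keep the existing (native) version
  scope[name] = impl;
  return true;
}

// Outside a Worker there is no global `HTMLRewriter`, so the fallback is used.
const plainScope: Record<string, unknown> = {};
installIfMissing(plainScope, 'HTMLRewriter', FallbackRewriter);

// Inside a Worker the native class already exists and is kept.
class NativeRewriter {}
const workerScope: Record<string, unknown> = { HTMLRewriter: NativeRewriter };
installIfMissing(workerScope, 'HTMLRewriter', FallbackRewriter);
```

In a real polyfill, `scope` would presumably be `globalThis`; plain objects are used here so the sketch runs anywhere.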
### innerHTML
Unlike the current (March 2021) version on CF Workers, this implementation already supports the [proposed `innerHTML` handler](https://github.com/cloudflare/lol-html/issues/40#issuecomment-567126687).
Note that this feature is unstable and will likely change as the real version materializes.

```ts
await new HTMLRewriter()
  .on('body', {
    innerHTML(html) {
      console.log(html) // => '<div id="foo">bar</div>'
    },
  })
  .transform(new Response('<body><div id="foo">bar</div></body>'))
  .text();
```

## Caveats
- Because this version isn't based on streaming data, the order in which handlers are called can differ. Some measures have been taken to simulate the order, but differences may occur.
- Texts never arrive in chunks. There is always just one chunk, followed by an empty one with `lastInTextNode` set to `true`.

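
A text handler that buffers until `lastInTextNode` is `true` should not depend on how many chunks arrive, so it should behave the same here and on a streaming implementation. The sketch below assumes only the `text` and `lastInTextNode` fields of a text chunk. `TextChunkLike` and `createTextCollector` are hypothetical names, and the chunk sequences are simulated by hand rather than produced by a real rewriter.

```ts
// Minimal shape of a text chunk; only the two fields used here are assumed.
interface TextChunkLike {
  text: string;
  lastInTextNode: boolean;
}

// Returns a handler that buffers chunks and reports each completed text node.
function createTextCollector(onText: (fullText: string) => void) {
  let buffer = '';
  return {
    text(chunk: TextChunkLike): void {
      buffer += chunk.text;
      if (chunk.lastInTextNode) {
        onText(buffer);
        buffer = '';
      }
    },
  };
}

const collected: string[] = [];
const handler = createTextCollector((fullText) => {
  collected.push(fullText);
});

// Sequence as described in the caveat: one full chunk, then an empty final one.
handler.text({ text: 'bar', lastInTextNode: false });
handler.text({ text: '', lastInTextNode: true });

// Streaming-style sequence: the same text split across several chunks.
handler.text({ text: 'ba', lastInTextNode: false });
handler.text({ text: 'r', lastInTextNode: false });
handler.text({ text: '', lastInTextNode: true });
```

Both sequences yield the same completed text, `'bar'`. The object returned by `createTextCollector` has a `text` method like an element handler, so it could plausibly be passed as the second argument to `.on()`.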