# Parsed HTML Rewriter
A DOM-based implementation of [Cloudflare Workers' `HTMLRewriter`](https://developers.cloudflare.com/workers/runtime-apis/html-rewriter).

***
___UPDATE: While this module works just fine, I've made [a new version](https://github.com/worker-tools/html-rewriter) that is WASM/streaming based for much better performance.___
***

Unlike the original, this implementation parses the entire DOM (provided by [`linkedom`](https://github.com/WebReflection/linkedom))
and runs selectors against this representation. As a result, it is slower, more memory intensive, and can't process streaming data.

Note that this approach was chosen to quickly implement the functionality of `HTMLRewriter`, as there is currently no JS implementation available.
A better implementation would replicate the streaming approach of [`lol-html`](https://github.com/cloudflare/lol-html), or even use a WebAssembly version of it. _Update: [Now available here](https://github.com/worker-tools/html-rewriter)_.

However, this implementation should run in most JS contexts (including Web Workers, Service Workers and Deno) without modification and handle many, if not most, use cases of `HTMLRewriter`.
It should be good enough for testing and offline Workers development.

## Usage
This module can be used in two ways.

As a standalone module:

```ts
import { ParsedHTMLRewriter } from '@worker-tools/parsed-html-rewriter'

await new ParsedHTMLRewriter()
  .transform(new Response('<body></body>'))
  .text();
```

Or as a polyfill:

```ts
import '@worker-tools/parsed-html-rewriter/polyfill'

await new HTMLRewriter() // Will use the native version when running in a Worker
  .transform(new Response('<body></body>'))
  .text();
```

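The "use the native version when available" behavior of a polyfill can be illustrated with a small feature-detection sketch. This is not the package's actual source: `installIfMissing` and `FallbackRewriter` are hypothetical names, and plain objects stand in for `globalThis` so the snippet runs anywhere.

```ts
// Hypothetical stand-in for a rewriter class; not the package's real export.
class FallbackRewriter {
  readonly kind = 'fallback';
}

// Install `impl` under `name` on `target` only if nothing is defined there yet.
// Returns true when the fallback was installed, false when an existing
// (for example native) implementation was left in place.
function installIfMissing(
  target: Record<string, unknown>,
  name: string,
  impl: unknown,
): boolean {
  if (typeof target[name] !== 'undefined') return false; // existing version wins
  target[name] = impl;
  return true;
}

// Plain objects standing in for the global scope (an assumption for illustration).
const workerLikeScope: Record<string, unknown> = { HTMLRewriter: class Native {} };
const plainScope: Record<string, unknown> = {};

installIfMissing(workerLikeScope, 'HTMLRewriter', FallbackRewriter); // leaves Native in place
installIfMissing(plainScope, 'HTMLRewriter', FallbackRewriter); // installs the fallback
```

Keeping an existing implementation means the same code can run unchanged inside a Worker and in other JS contexts.
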
### innerHTML
Unlike the current (March 2021) version on CF Workers, this implementation already supports the [proposed `innerHTML` handler](https://github.com/cloudflare/lol-html/issues/40#issuecomment-567126687).
Note that this feature is unstable and will likely change as the real version materializes.

```ts
await new HTMLRewriter()
  .on('body', {
    innerHTML(html) {
      console.log(html) // => '<div id="foo">bar</div>'
    },
  })
  .transform(new Response('<body><div id="foo">bar</div></body>'))
  .text();
```

## Caveats
- Because this version isn't based on streaming data, the order in which handlers are called can differ from the original. Some measures have been taken to simulate the order, but differences may occur.
- Text never arrives in multiple chunks. Each text node is delivered as a single chunk, followed by an empty one with `lastInTextNode` set to `true`.
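
A text handler that buffers until `lastInTextNode` is `true` should behave the same whether text arrives in one chunk (as here) or in several (as with streaming). The following is a minimal sketch: `collectTextNodes` and `TextChunkLike` are hypothetical names, and the chunks are simulated by hand rather than produced by a rewriter.

```ts
// Minimal shape of the text chunks used here; the real text chunk type has more members.
interface TextChunkLike {
  readonly text: string;
  readonly lastInTextNode: boolean;
}

// Builds a handler object whose `text` callback buffers chunks and reports
// each complete text node once `lastInTextNode` is true.
function collectTextNodes(onNode: (fullText: string) => void) {
  let buffer = '';
  return {
    text(chunk: TextChunkLike): void {
      buffer += chunk.text;
      if (chunk.lastInTextNode) {
        onNode(buffer);
        buffer = ''; // reset for the next text node
      }
    },
  };
}

// Simulated delivery as described above: one full chunk, then an empty final one.
const seen: string[] = [];
const handler = collectTextNodes((t) => seen.push(t));
handler.text({ text: 'bar', lastInTextNode: false });
handler.text({ text: '', lastInTextNode: true });
// `seen` now holds the single string 'bar'
```

The returned object has the shape of an element handler, so it could be passed as the second argument to `.on(selector, ...)`.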