# hast-util-sanitize

[![Build][build-badge]][build]
[![Coverage][coverage-badge]][coverage]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]
[![Sponsors][sponsors-badge]][collective]
[![Backers][backers-badge]][collective]
[![Chat][chat-badge]][chat]

[hast][] utility to make trees safe.

## Contents

* [What is this?](#what-is-this)
* [When should I use this?](#when-should-i-use-this)
* [Install](#install)
* [Use](#use)
* [API](#api)
  * [`defaultSchema`](#defaultschema)
  * [`sanitize(tree[, options])`](#sanitizetree-options)
  * [`Schema`](#schema)
* [Types](#types)
* [Compatibility](#compatibility)
* [Security](#security)
* [Related](#related)
* [Contribute](#contribute)
* [License](#license)

## What is this?

This package is a utility that can make a tree that potentially contains
dangerous user content safe for use.
It defaults to what GitHub does to clean unsafe markup, but you can change that.

## When should I use this?

This package is needed whenever you deal with potentially dangerous user
content.

The plugin [`rehype-sanitize`][rehype-sanitize] wraps this utility to also
sanitize HTML at a higher level of abstraction, which is easier to use.

## Install

This package is [ESM only][esm].
In Node.js (version 16+), install with [npm][]:

```sh
npm install hast-util-sanitize
```

In Deno with [`esm.sh`][esmsh]:

```js
import {sanitize} from 'https://esm.sh/hast-util-sanitize@5'
```

In browsers with [`esm.sh`][esmsh]:

```html
<script type="module">
  import {sanitize} from 'https://esm.sh/hast-util-sanitize@5?bundle'
</script>
```

## Use

```js
import {h} from 'hastscript'
import {sanitize} from 'hast-util-sanitize'
import {toHtml} from 'hast-util-to-html'
import {u} from 'unist-builder'

const unsafe = h('div', {onmouseover: 'alert("alpha")'}, [
  h(
    'a',
    {href: 'jAva script:alert("bravo")', onclick: 'alert("charlie")'},
    'delta'
  ),
  u('text', '\n'),
  h('script', 'alert("charlie")'),
  u('text', '\n'),
  h('img', {src: 'x', onerror: 'alert("delta")'}),
  u('text', '\n'),
  h('iframe', {src: 'javascript:alert("echo")'}),
  u('text', '\n'),
  h('math', h('mi', {'xlink:href': 'data:x,<script>alert("foxtrot")</script>'}))
])

const safe = sanitize(unsafe)

console.log(toHtml(unsafe))
console.log(toHtml(safe))
```

Unsafe:

```html
<div onmouseover="alert(&#x22;alpha&#x22;)"><a href="jAva script:alert(&#x22;bravo&#x22;)" onclick="alert(&#x22;charlie&#x22;)">delta</a>
<script>alert("charlie")</script>
<img src="x" onerror="alert(&#x22;delta&#x22;)">
<iframe src="javascript:alert(&#x22;echo&#x22;)"></iframe>
<math><mi xlink:href="data:x,<script>alert(&#x22;foxtrot&#x22;)</script>"></mi></math></div>
```

Safe:

```html
<div><a>delta</a>

<img src="x">

</div>
```

## API

This package exports the identifiers [`defaultSchema`][api-default-schema] and
[`sanitize`][api-sanitize].
There is no default export.

### `defaultSchema`

Default schema ([`Schema`][api-schema]).

Follows [GitHub][] style sanitization.

### `sanitize(tree[, options])`

Sanitize a tree.

###### Parameters

* `tree` ([`Node`][node])
  — unsafe tree
* `options` ([`Schema`][api-schema], default:
  [`defaultSchema`][api-default-schema])
  — configuration

###### Returns

New, safe tree ([`Node`][node]).

### `Schema`

Schema that defines what nodes and properties are allowed.

The default schema is [`defaultSchema`][api-default-schema], which follows how
GitHub cleans HTML.
If any top-level key is missing in the given schema, the corresponding
value of the default schema is used.

To extend the default schema with a few changes, merge them into a deep copy of
`defaultSchema`, for example with `deepmerge`:

```js
import deepmerge from 'deepmerge'
import {h} from 'hastscript'
import {defaultSchema, sanitize} from 'hast-util-sanitize'

// This allows `className` on all elements.
const schema = deepmerge(defaultSchema, {attributes: {'*': ['className']}})

const tree = sanitize(h('div', {className: ['foo']}), schema)

// `tree` still has `className`.
console.log(tree)
// {
//   type: 'element',
//   tagName: 'div',
//   properties: {className: ['foo']},
//   children: []
// }
```

##### Fields

###### `allowComments`

Whether to allow comment nodes (`boolean`, default: `false`).

For example:

```js
allowComments: true
```

###### `allowDoctypes`

Whether to allow doctype nodes (`boolean`, default: `false`).

For example:

```js
allowDoctypes: true
```

###### `ancestors`

Map of tag names to a list of tag names that are required ancestors
(`Record<string, Array<string>>`, default: `defaultSchema.ancestors`).

Elements with these tag names are treated as unsafe if they do not occur
inside one of their allowed ancestors (not necessarily as a direct child).

For example:

```js
ancestors: {
  tbody: ['table'],
  // …
  tr: ['table']
}
```

###### `attributes`

Map of tag names to allowed [property names][name]
(`Record<string, Array<[string, ...Array<RegExp | boolean | number | string>] | string>`,
default: `defaultSchema.attributes`).

The special key `'*'` as a tag name defines property names allowed on all
elements.

The special value `'data*'` as a property name can be used to allow all `data`
properties.

For example:

```js
attributes: {
  a: [
    'ariaDescribedBy', 'ariaLabel', 'ariaLabelledBy', /* … */, 'href'
  ],
  // …
  '*': [
    'abbr',
    'accept',
    'acceptCharset',
    // …
    'vAlign',
    'value',
    'width'
  ]
}
```

Instead of a single string in the array, which allows any property value for
the field, you can use an array whose first item is the property name and whose
other items are the allowed values.
For example, `input: ['type']` allows `type` set to any value on `input`s.
But `input: [['type', 'checkbox', 'radio']]` allows `type` when set to
`'checkbox'` or `'radio'`.

You can use regexes, so for example `span: [['className', /^hljs-/]]` allows
any class that starts with `hljs-` on `span`s.

When comma- or space-separated values are used (such as `className`), each
value is checked individually.
For example, to allow certain classes on `span`s for syntax highlighting, use
`span: [['className', 'number', 'operator', 'token']]`.
This will allow `'number'`, `'operator'`, and `'token'` classes, but drop
others.

###### `clobber`

List of [*property names*][name] that clobber (`Array<string>`, default:
`defaultSchema.clobber`).

The values of these properties are prefixed with `clobberPrefix`, so that user
content cannot clobber (overwrite) global variables or DOM properties.

For example:

```js
clobber: ['ariaDescribedBy', 'ariaLabelledBy', 'id', 'name']
```

###### `clobberPrefix`

Prefix to add to the values of clobbering properties (`string`, default:
`defaultSchema.clobberPrefix`).

For example:

```js
clobberPrefix: 'user-content-'
```

###### `protocols`

Map of [*property names*][name] to allowed protocols
(`Record<string, Array<string>>`, default: `defaultSchema.protocols`).

The properties listed here are always allowed to have local URLs (relative to
the current website, such as `this`, `#this`, `/this`, or `?this`), and only
allowed to have remote URLs (such as `https://example.com`) if they use a known
protocol.

For example:

```js
protocols: {
  cite: ['http', 'https'],
  // …
  src: ['http', 'https']
}
```

###### `required`

Map of tag names to required [*property names*][name] with a default value
(`Record<string, Record<string, unknown>>`, default: `defaultSchema.required`).

This defines properties that must be set.
If such a property does not exist after the element is made safe, it is added
with the given value.

For example:

```js
required: {
  input: {disabled: true, type: 'checkbox'}
}
```

> 👉 **Note**: properties are first checked based on `schema.attributes`,
> then on `schema.required`.
> That means properties could be removed by `attributes` and then added
> again with `required`.

###### `strip`

List of tag names to strip from the tree (`Array<string>`, default:
`defaultSchema.strip`).

By default, unsafe elements (those not in `schema.tagNames`) are replaced by
what they contain.
Unsafe elements whose tag name is listed in `strip` are instead dropped
together with their contents.

For example:

```js
strip: ['script']
```

###### `tagNames`

List of allowed tag names (`Array<string>`, default: `defaultSchema.tagNames`).

For example:

```js
tagNames: [
  'a',
  'b',
  // …
  'ul',
  'var'
]
```

## Types

This package is fully typed with [TypeScript][].
It exports the additional type [`Schema`][api-schema].

## Compatibility

Projects maintained by the unified collective are compatible with maintained
versions of Node.js.

When we cut a new major release, we drop support for unmaintained versions of
Node.
This means we try to keep the current release line, `hast-util-sanitize@^5`,
compatible with Node.js 16.

## Security

By default, `hast-util-sanitize` will make everything safe to use, assuming you
understand that certain attributes (including a limited set of classes) can be
generated by users, and you write your CSS (and JS) accordingly.
When used incorrectly, deviating from the defaults can open you up to a
[cross-site scripting (XSS)][xss] attack.

Use `hast-util-sanitize` after the last unsafe thing: everything after it could
be unsafe (but is fine if you do trust it).

## Related

* [`rehype-sanitize`](https://github.com/rehypejs/rehype-sanitize)
  — rehype plugin

## Contribute

See [`contributing.md`][contributing] in [`syntax-tree/.github`][health] for
ways to get started.
See [`support.md`][support] for ways to get help.

This project has a [code of conduct][coc].
By interacting with this repository, organization, or community you agree to
abide by its terms.

## License

[MIT][license] © [Titus Wormer][author]

[build-badge]: https://github.com/syntax-tree/hast-util-sanitize/workflows/main/badge.svg

[build]: https://github.com/syntax-tree/hast-util-sanitize/actions

[coverage-badge]: https://img.shields.io/codecov/c/github/syntax-tree/hast-util-sanitize.svg

[coverage]: https://codecov.io/github/syntax-tree/hast-util-sanitize

[downloads-badge]: https://img.shields.io/npm/dm/hast-util-sanitize.svg

[downloads]: https://www.npmjs.com/package/hast-util-sanitize

[size-badge]: https://img.shields.io/badge/dynamic/json?label=minzipped%20size&query=$.size.compressedSize&url=https://deno.bundlejs.com/?q=hast-util-sanitize

[size]: https://bundlejs.com/?q=hast-util-sanitize

[sponsors-badge]: https://opencollective.com/unified/sponsors/badge.svg

[backers-badge]: https://opencollective.com/unified/backers/badge.svg

[collective]: https://opencollective.com/unified

[chat-badge]: https://img.shields.io/badge/chat-discussions-success.svg

[chat]: https://github.com/syntax-tree/unist/discussions

[npm]: https://docs.npmjs.com/cli/install

[esm]: https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c

[esmsh]: https://esm.sh

[typescript]: https://www.typescriptlang.org

[license]: license

[author]: https://wooorm.com

[health]: https://github.com/syntax-tree/.github

[contributing]: https://github.com/syntax-tree/.github/blob/main/contributing.md

[support]: https://github.com/syntax-tree/.github/blob/main/support.md

[coc]: https://github.com/syntax-tree/.github/blob/main/code-of-conduct.md

[hast]: https://github.com/syntax-tree/hast

[node]: https://github.com/syntax-tree/hast#nodes

[name]: https://github.com/syntax-tree/hast#propertyname

[github]: https://github.com/gjtorikian/html-pipeline/blob/a2e02ac/lib/html_pipeline/sanitization_filter.rb

[xss]: https://en.wikipedia.org/wiki/Cross-site_scripting

[rehype-sanitize]: https://github.com/rehypejs/rehype-sanitize

[api-default-schema]: #defaultschema

[api-sanitize]: #sanitizetree-options

[api-schema]: #schema