# hast-util-sanitize

[![Build][build-badge]][build]
[![Coverage][coverage-badge]][coverage]
[![Downloads][downloads-badge]][downloads]
[![Size][size-badge]][size]
[![Sponsors][sponsors-badge]][collective]
[![Backers][backers-badge]][collective]
[![Chat][chat-badge]][chat]

[hast][] utility to make trees safe.

## Contents

* [What is this?](#what-is-this)
* [When should I use this?](#when-should-i-use-this)
* [Install](#install)
* [Use](#use)
* [API](#api)
  * [`defaultSchema`](#defaultschema)
  * [`sanitize(tree[, options])`](#sanitizetree-options)
  * [`Schema`](#schema)
* [Types](#types)
* [Compatibility](#compatibility)
* [Security](#security)
* [Related](#related)
* [Contribute](#contribute)
* [License](#license)

## What is this?

This package is a utility that can make a tree that potentially contains
dangerous user content safe for use.
It defaults to what GitHub does to clean unsafe markup, but you can change that.

## When should I use this?

This package is needed whenever you deal with potentially dangerous user
content.

The plugin [`rehype-sanitize`][rehype-sanitize] wraps this utility to also
sanitize HTML at a higher-level (easier) abstraction.

## Install

This package is [ESM only][esm].
In Node.js (version 16+), install with [npm][]:

```sh
npm install hast-util-sanitize
```

In Deno with [`esm.sh`][esmsh]:

```js
import {sanitize} from 'https://esm.sh/hast-util-sanitize@5'
```

In browsers with [`esm.sh`][esmsh]:

```html
<script type="module">
  import {sanitize} from 'https://esm.sh/hast-util-sanitize@5?bundle'
</script>
```

## Use

```js
import {h} from 'hastscript'
import {sanitize} from 'hast-util-sanitize'
import {toHtml} from 'hast-util-to-html'
import {u} from 'unist-builder'

const unsafe = h('div', {onmouseover: 'alert("alpha")'}, [
  h(
    'a',
    {href: 'jAva script:alert("bravo")', onclick: 'alert("charlie")'},
    'delta'
  ),
  u('text', '\n'),
  h('script', 'alert("charlie")'),
  u('text', '\n'),
  h('img', {src: 'x', onerror: 'alert("delta")'}),
  u('text', '\n'),
  h('iframe', {src: 'javascript:alert("echo")'}),
  u('text', '\n'),
  h('math', h('mi', {'xlink:href': 'data:x,<script>alert("foxtrot")</script>'}))
])

const safe = sanitize(unsafe)

console.log(toHtml(unsafe))
console.log(toHtml(safe))
```

Unsafe:

```html
<div onmouseover="alert(&#x22;alpha&#x22;)"><a href="jAva script:alert(&#x22;bravo&#x22;)" onclick="alert(&#x22;charlie&#x22;)">delta</a>
<script>alert("charlie")</script>
<img src="x" onerror="alert(&#x22;delta&#x22;)">
<iframe src="javascript:alert(&#x22;echo&#x22;)"></iframe>
<math><mi xlink:href="data:x,<script>alert(&#x22;foxtrot&#x22;)</script>"></mi></math></div>
```

Safe:

```html
<div><a>delta</a>

<img src="x">

</div>
```

## API

This package exports the identifiers [`defaultSchema`][api-default-schema] and
[`sanitize`][api-sanitize].
There is no default export.

### `defaultSchema`

Default schema ([`Schema`][api-schema]).

Follows [GitHub][]-style sanitization.

### `sanitize(tree[, options])`

Sanitize a tree.

###### Parameters

* `tree` ([`Node`][node])
  — unsafe tree
* `options` ([`Schema`][api-schema], default:
  [`defaultSchema`][api-default-schema])
  — configuration

###### Returns

New, safe tree ([`Node`][node]).

### `Schema`

Schema that defines what nodes and properties are allowed.

The default schema is [`defaultSchema`][api-default-schema], which follows how
GitHub cleans.
If any top-level key is missing in the given schema, the corresponding
value of the default schema is used.

To extend the standard schema with a few changes, clone `defaultSchema`
like so:

```js
import deepmerge from 'deepmerge'
import {h} from 'hastscript'
import {defaultSchema, sanitize} from 'hast-util-sanitize'

// This allows `className` on all elements.
const schema = deepmerge(defaultSchema, {attributes: {'*': ['className']}})

const tree = sanitize(h('div', {className: ['foo']}), schema)

// `tree` still has `className`.
console.log(tree)
// {
//   type: 'element',
//   tagName: 'div',
//   properties: {className: ['foo']},
//   children: []
// }
```

##### Fields

###### `allowComments`

Whether to allow comment nodes (`boolean`, default: `false`).

For example:

```js
allowComments: true
```

###### `allowDoctypes`

Whether to allow doctype nodes (`boolean`, default: `false`).

For example:

```js
allowDoctypes: true
```

###### `ancestors`

Map of tag names to a list of tag names which are required ancestors
(`Record<string, Array<string>>`, default: `defaultSchema.ancestors`).

Elements with these tag names will be ignored if they occur outside of one
of their allowed parents.

For example:

```js
ancestors: {
  tbody: ['table'],
  // …
  tr: ['table']
}
```

###### `attributes`

Map of tag names to allowed [property names][name]
(`Record<string, Array<[string, ...Array<RegExp | boolean | number | string>] | string>>`,
default: `defaultSchema.attributes`).

The special key `'*'` as a tag name defines property names allowed on all
elements.

The special value `'data*'` as a property name can be used to allow all `data`
properties.

For example:

```js
attributes: {
  a: [
    'ariaDescribedBy', 'ariaLabel', 'ariaLabelledBy', /* … */, 'href'
  ],
  // …
  '*': [
    'abbr',
    'accept',
    'acceptCharset',
    // …
    'vAlign',
    'value',
    'width'
  ]
}
```

Instead of a single string in the array, which allows any property value for
the field, you can use an array to allow several values.
For example, `input: ['type']` allows `type` set to any value on `input`s.
But `input: [['type', 'checkbox', 'radio']]` allows `type` when set to
`'checkbox'` or `'radio'`.

You can use regexes, so for example `span: [['className', /^hljs-/]]` allows
any class that starts with `hljs-` on `span`s.

When comma- or space-separated values are used (such as `className`), each
value is checked individually.
For example, to allow certain classes on `span`s for syntax highlighting, use
`span: [['className', 'number', 'operator', 'token']]`.
This will allow `'number'`, `'operator'`, and `'token'` classes, but drop
others.

###### `clobber`

List of [*property names*][name] that clobber (`Array<string>`, default:
`defaultSchema.clobber`).

For example:

```js
clobber: ['ariaDescribedBy', 'ariaLabelledBy', 'id', 'name']
```

###### `clobberPrefix`

Prefix to use before clobbering properties (`string`, default:
`defaultSchema.clobberPrefix`).

For example:

```js
clobberPrefix: 'user-content-'
```

###### `protocols`

Map of [*property names*][name] to allowed protocols
(`Record<string, Array<string>>`, default: `defaultSchema.protocols`).

This defines which URLs are allowed in these properties: local URLs (relative
to the current website, such as `this`, `#this`, `/this`, or `?this`) are
always allowed, and remote URLs (such as `https://example.com`) are only
allowed if they use a known protocol.

For example:

```js
protocols: {
  cite: ['http', 'https'],
  // …
  src: ['http', 'https']
}
```

###### `required`

Map of tag names to required [*property names*][name] with a default value
(`Record<string, Record<string, unknown>>`, default: `defaultSchema.required`).

This defines properties that must be set.
If a property does not exist (after the element was made safe), it is added
with the given value.

For example:

```js
required: {
  input: {disabled: true, type: 'checkbox'}
}
```

> 👉 **Note**: properties are first checked based on `schema.attributes`,
> then on `schema.required`.
> That means properties could be removed by `attributes` and then added
> again with `required`.

###### `strip`

List of tag names to strip from the tree (`Array<string>`, default:
`defaultSchema.strip`).

By default, unsafe elements (those not in `schema.tagNames`) are replaced by
what they contain.
This option can drop their contents.

For example:

```js
strip: ['script']
```

###### `tagNames`

List of allowed tag names (`Array<string>`, default: `defaultSchema.tagNames`).

For example:

```js
tagNames: [
  'a',
  'b',
  // …
  'ul',
  'var'
]
```

## Types

This package is fully typed with [TypeScript][].
It exports the additional type [`Schema`][api-schema].

## Compatibility

Projects maintained by the unified collective are compatible with maintained
versions of Node.js.

When we cut a new major release, we drop support for unmaintained versions of
Node.
This means we try to keep the current release line, `hast-util-sanitize@^5`,
compatible with Node.js 16.

## Security

By default, `hast-util-sanitize` will make everything safe to use, assuming
you understand that certain attributes (including a limited set of classes)
can be generated by users, and you write your CSS (and JS) accordingly.
When used incorrectly, deviating from the defaults can open you up to a
[cross-site scripting (XSS)][xss] attack.

Use `hast-util-sanitize` after the last unsafe thing: everything after it could
be unsafe (but is fine if you do trust it).

## Related

* [`rehype-sanitize`](https://github.com/rehypejs/rehype-sanitize)
  — rehype plugin

## Contribute

See [`contributing.md`][contributing] in [`syntax-tree/.github`][health] for
ways to get started.
See [`support.md`][support] for ways to get help.

This project has a [code of conduct][coc].
By interacting with this repository, organization, or community you agree to
abide by its terms.

## License

[MIT][license] © [Titus Wormer][author]

<!-- Definitions -->

[build-badge]: https://github.com/syntax-tree/hast-util-sanitize/workflows/main/badge.svg

[build]: https://github.com/syntax-tree/hast-util-sanitize/actions

[coverage-badge]: https://img.shields.io/codecov/c/github/syntax-tree/hast-util-sanitize.svg

[coverage]: https://codecov.io/github/syntax-tree/hast-util-sanitize

[downloads-badge]: https://img.shields.io/npm/dm/hast-util-sanitize.svg

[downloads]: https://www.npmjs.com/package/hast-util-sanitize

[size-badge]: https://img.shields.io/badge/dynamic/json?label=minzipped%20size&query=$.size.compressedSize&url=https://deno.bundlejs.com/?q=hast-util-sanitize

[size]: https://bundlejs.com/?q=hast-util-sanitize

[sponsors-badge]: https://opencollective.com/unified/sponsors/badge.svg

[backers-badge]: https://opencollective.com/unified/backers/badge.svg

[collective]: https://opencollective.com/unified

[chat-badge]: https://img.shields.io/badge/chat-discussions-success.svg

[chat]: https://github.com/syntax-tree/unist/discussions

[npm]: https://docs.npmjs.com/cli/install

[esm]: https://gist.github.com/sindresorhus/a39789f98801d908bbc7ff3ecc99d99c

[esmsh]: https://esm.sh

[typescript]: https://www.typescriptlang.org

[license]: license

[author]: https://wooorm.com

[health]: https://github.com/syntax-tree/.github

[contributing]: https://github.com/syntax-tree/.github/blob/main/contributing.md

[support]: https://github.com/syntax-tree/.github/blob/main/support.md

[coc]: https://github.com/syntax-tree/.github/blob/main/code-of-conduct.md

[hast]: https://github.com/syntax-tree/hast

[node]: https://github.com/syntax-tree/hast#nodes

[name]: https://github.com/syntax-tree/hast#propertyname

[github]: https://github.com/gjtorikian/html-pipeline/blob/a2e02ac/lib/html_pipeline/sanitization_filter.rb

[xss]: https://en.wikipedia.org/wiki/Cross-site_scripting

[rehype-sanitize]: https://github.com/rehypejs/rehype-sanitize

[api-default-schema]: #defaultschema

[api-sanitize]: #sanitizetree-options

[api-schema]: #schema