## HTML cleaner and beautifier

Do you have crappy HTML? I do!

```html
<table width="100%" border="0" cellspacing="0" cellpadding="0">
 <tr>
 <td height="31"><b>Currently we have these articles available:</b>

 <blockquote>
 <p><a href="foo.html">The History of Foo</a><br />
 An <span color="red">informative</span> piece of <FONT FACE="ARIAL">information</FONT>.</p>
 <p><a href="bar.html">A Horse Walked Into a Bar</a><br/> The bartender said
 "Why the long face?"</p>
 </blockquote>
 </td>
 </tr>
 </table>
```

Just look at those blank lines and random line breaks, trailing spaces, mixed tabs, deprecated tags - it's outrageous!

Let's clean it up...

```bash
$ npm install clean-html
```

```javascript
var cleaner = require('clean-html'),
    fs = require('fs'),
    file = process.argv[2];

fs.readFile(file, function (err, data) {
    // Stop early if the file could not be read.
    if (err) {
        throw err;
    }

    process.stdout.write(cleaner.clean(data) + '\n');
});
```

Sanity restored!

```html
<table>
  <tr>
    <td>
      <b>Currently we have these articles available:</b>
      <blockquote>
        <p>
          <a href="foo.html">The History of Foo</a><br>
          An <span>informative</span> piece of information.
        </p>
        <p>
          <a href="bar.html">A Horse Walked Into a Bar</a><br>
          The bartender said "Why the long face?"
        </p>
      </blockquote>
    </td>
  </tr>
</table>
```

## Options

### attr-to-remove

Attributes to remove from markup.

Type: Array
Default: `['align', 'valign', 'bgcolor', 'color', 'width', 'height', 'border', 'cellpadding', 'cellspacing']`
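
To make the effect concrete, here is a minimal sketch of attribute removal applied to the `<table>` tag from the example at the top. The helper `removeAttributes` is hypothetical and uses a simple regular expression purely for illustration; it is not taken from clean-html and says nothing about how the library is implemented.

```javascript
// Hypothetical helper, for illustration only: deletes every `name="value"`
// pair whose name appears in `attrs`. It does not check that the match sits
// inside a tag, which a real implementation working on parsed HTML would do.
function removeAttributes(html, attrs) {
    const pattern = new RegExp(
        '\\s+(?:' + attrs.join('|') + ')\\s*=\\s*(?:"[^"]*"|\'[^\']*\'|[^\\s>]+)',
        'gi'
    );
    return html.replace(pattern, '');
}

const tableTag = '<table width="100%" border="0" cellspacing="0" cellpadding="0">';
console.log(removeAttributes(tableTag, ['width', 'border', 'cellspacing', 'cellpadding']));
// <table>
```

Attributes that are not in the list, such as `href`, are left untouched.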

### block-tags

Block-level element tags. Line breaks are added before and after each one, and nested content is indented. Note: this option has no effect unless `pretty` is enabled.

Type: Array
Default: `['div', 'p', 'table', 'tr', 'td', 'blockquote', 'hr']`

### empty-tags

Empty element tags. Trailing slashes are removed.

Type: Array
Default: `['br', 'hr', 'img']`
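
As a rough illustration of what removing the trailing slash means, the sketch below rewrites `<br />` and `<br/>` as `<br>`, matching the before and after examples at the top. The helper `dropTrailingSlash` is hypothetical and regex-based; it is an assumption made for this example, not the library's code.

```javascript
// Hypothetical helper, for illustration only: removes the trailing slash
// (and any whitespace before it) from the tags named in `emptyTags`.
function dropTrailingSlash(html, emptyTags) {
    const pattern = new RegExp('<(' + emptyTags.join('|') + ')\\b([^>]*?)\\s*/>', 'gi');
    return html.replace(pattern, '<$1$2>');
}

console.log(dropTrailingSlash('Foo<br />Bar<br/>', ['br', 'hr', 'img']));
// Foo<br>Bar<br>
```

Attributes on the tag are preserved, so `<img src="a.png" />` becomes `<img src="a.png">`.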

### encoding

The character encoding of the input file. Specify it to ensure the file's contents are properly converted to UTF-8.

Type: String
Default: `utf-8`

### pretty

Pretty prints the output by adding line breaks and indentation.

Type: Boolean
Default: `true`

### remove-comments

Removes HTML comments from the markup.

Type: Boolean
Default: `false`

### tags-to-remove

Tags to remove from markup.

Type: Array
Default: `['font']`
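
In the example at the top, the `<FONT>` tags disappear while the word they wrap stays. A minimal sketch of that behaviour follows; `stripTags` is a hypothetical, regex-based helper written for this illustration and is not how clean-html itself is implemented.

```javascript
// Hypothetical helper, for illustration only: strips the opening and closing
// tags named in `tags` while keeping the content between them.
function stripTags(html, tags) {
    // Matches e.g. <font ...>, </font> and <FONT ...> (case-insensitive).
    const pattern = new RegExp('</?(?:' + tags.join('|') + ')\\b[^>]*>', 'gi');
    return html.replace(pattern, '');
}

const snippet = 'piece of <FONT FACE="ARIAL">information</FONT>.';
console.log(stripTags(snippet, ['font']));
// piece of information.
```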

## Adding values to option lists

These convenience options add values to the option lists above, so you can extend a default list without restating it.

### add-attr-to-remove

Additional attributes to remove from markup.

Type: Array
Default: `null`

### add-block-tags

Additional block level element tags.

Type: Array
Default: `null`

### add-empty-tags

Additional empty element tags.

Type: Array
Default: `null`

### add-tags-to-remove

Additional tags to remove from markup.

Type: Array
Default: `null`
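
One plausible reading of these options, sketched below, is that each `add-*` list is appended to its base list, whether that base is the default or a list you supplied. The helper `resolveList` is hypothetical and the merge order is an assumption; this document does not show the library's actual merge logic.

```javascript
// Hypothetical helper: combines a base option list with its `add-*` companion.
function resolveList(options, name, defaults) {
    const base = options[name] || defaults;      // a user-supplied list replaces the default
    const extra = options['add-' + name] || [];  // `null` (the default) adds nothing
    return base.concat(extra);
}

const defaultTagsToRemove = ['font'];

// Replace the default list entirely.
console.log(resolveList({ 'tags-to-remove': ['center'] }, 'tags-to-remove', defaultTagsToRemove));
// [ 'center' ]

// Keep the default and add to it.
console.log(resolveList({ 'add-tags-to-remove': ['center'] }, 'tags-to-remove', defaultTagsToRemove));
// [ 'font', 'center' ]
```

Under this reading, `add-tags-to-remove` saves you from copying the default list just to add one more tag.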