## HTML cleaner and beautifier

[![NPM Stats](https://nodei.co/npm/clean-html.png?downloads=true&downloadRank=true)](https://npmjs.org/packages/clean-html/)

Do you have crappy HTML? I do!

```html
<table width="100%" border="0" cellspacing="0" cellpadding="0">
 <tr>
 <td height="31"><b>Currently we have these articles available:</b>

 <blockquote>
 <!-- List articles -->
 <p><a href="foo.html">The History of Foo</a><br />
 An <span color="red">informative</span> piece of <FONT FACE="ARIAL">information</FONT>.</p>
 <p><a href="bar.html">A Horse Walked Into a Bar</a><br/> The bartender said
 "Why the long face?"</p>
 </blockquote>
 </td>
 </tr>
 </table>
```

Just look at those blank lines and random line breaks, trailing spaces, mixed tabs, deprecated tags - it's outrageous!

Let's clean it up...

```bash
$ npm install clean-html
```

```javascript
var cleaner = require('clean-html'),
    fs = require('fs'),
    file = process.argv[2];

// Read the file named on the command line and print the cleaned markup.
fs.readFile(file, 'utf-8', function (err, data) {
    if (err) {
        throw err; // e.g. the file is missing or unreadable
    }

    cleaner.clean(data, function (html) {
        console.log(html);
    });
});
```

Sanity restored!

```html
<table>
  <tr>
    <td>
      <b>Currently we have these articles available:</b>
      <blockquote>
        <!-- List articles -->
        <p>
          <a href="foo.html">The History of Foo</a><br>
          An <span>informative</span> piece of information.
        </p>
        <p>
          <a href="bar.html">A Horse Walked Into a Bar</a><br>
          The bartender said "Why the long face?"
        </p>
      </blockquote>
    </td>
  </tr>
</table>
```

## Options

### attr-to-remove

Attributes to remove from markup.

Type: Array
Default: `['align', 'bgcolor', 'border', 'cellpadding', 'cellspacing', 'color', 'disabled', 'height', 'target', 'valign', 'width']`

### block-tags

Block level element tags. Line breaks are added before and after, and nested content is indented.

Type: Array
Default: `['blockquote', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'p', 'table', 'td', 'tr']`

### break-around-comments

Adds line breaks before and after comments.

Type: Boolean
Default: `true`

### break-after-br

Adds line breaks after br tags.

Type: Boolean
Default: `true`

### empty-tags

Empty element tags.

Type: Array
Default: `['br', 'hr', 'img']`

### indent

The string to use for indentation, e.g. a tab character or one or more spaces.

Type: String
Default: `'  '` (two spaces)

### remove-comments

Removes comments.

Type: Boolean
Default: `false`

### remove-empty-paras

Removes empty paragraph tags.

Type: Boolean
Default: `false`

### replace-nbsp

Replaces non-breaking space entities (`&nbsp;`) with regular spaces.

Type: Boolean
Default: `false`

### tags-to-remove

Tags to remove from markup.

Type: Array
Default: `['center', 'font']`

## Adding values to option lists

These options are provided for convenience. Each one adds values to the corresponding list option, so you don't have to repeat the whole default list just to extend it.

### add-attr-to-remove

Additional attributes to remove from markup.

Type: Array
Default: `null`

### add-block-tags

Additional block level element tags.

Type: Array
Default: `null`

### add-empty-tags

Additional empty element tags.

Type: Array
Default: `null`

### add-tags-to-remove

Additional tags to remove from markup.

Type: Array
Default: `null`

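To make the idea concrete, here is a minimal sketch of how an `add-*` list could be combined with its base list. It is an illustration only: `mergeLists` is a hypothetical helper, not part of clean-html's API, and the way the library actually merges options internally may differ. The default values are copied from the Options section.

```javascript
// Default lists, copied from the Options section of this README.
var defaults = {
    'attr-to-remove': ['align', 'bgcolor', 'border', 'cellpadding', 'cellspacing',
        'color', 'disabled', 'height', 'target', 'valign', 'width'],
    'tags-to-remove': ['center', 'font']
};

// Hypothetical helper (an assumption, not clean-html's own code):
// for each list option, start from the user's list or the default,
// then append whatever was passed under the matching "add-" key.
function mergeLists(defaults, options) {
    var merged = {};

    options = options || {};

    Object.keys(defaults).forEach(function (key) {
        var base = options[key] || defaults[key],
            extra = options['add-' + key] || [];

        merged[key] = base.concat(extra);
    });

    return merged;
}

var merged = mergeLists(defaults, { 'add-tags-to-remove': ['b', 'i', 'u'] });

console.log(merged['tags-to-remove']); // [ 'center', 'font', 'b', 'i', 'u' ]
```

Using `concat` leaves the default arrays untouched, so the same defaults can be reused for the next call.
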
## Global installation

All of the options above are available from the command line when the package is installed globally:

```bash
$ clean-html crappy.html clean.html
```

The first argument is the input file and the second is the output file. If no output file is specified, the output is written to STDOUT.

Array options should be separated by commas. These are equivalent:

```bash
$ clean-html crappy.html clean.html --add-tags-to-remove b,i,u
$ clean-html crappy.html clean.html --add-tags-to-remove 'b,i,u'
```
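
The two forms are equivalent because the shell strips the quotes before the program sees the argument, so both arrive as the string `b,i,u`. As a rough sketch of what happens next, a comma-separated value can be turned into an array like this. `parseList` is a hypothetical name used for illustration; it is not taken from clean-html's source.

```javascript
// Hypothetical helper (an assumption, not clean-html's own code):
// split a comma-separated command line value into an array of names,
// ignoring stray whitespace and empty entries.
function parseList(value) {
    return String(value)
        .split(',')
        .map(function (item) {
            return item.trim();
        })
        .filter(function (item) {
            return item.length > 0;
        });
}

console.log(parseList('b,i,u')); // [ 'b', 'i', 'u' ]
```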

Boolean options are treated as `true` when no value follows them. These are equivalent:

```bash
$ clean-html crappy.html clean.html --remove-comments
$ clean-html crappy.html clean.html --remove-comments true
$ clean-html crappy.html clean.html --remove-comments 'true'
```

So are these:

```bash
$ clean-html crappy.html clean.html --break-after-br false
$ clean-html crappy.html clean.html --break-after-br 'false'
```
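
For the curious, the boolean rule described above can be sketched in a few lines. This is only an illustration of the behaviour, under the assumption that anything other than the literal string `false` counts as true; `readBooleanFlag` is a hypothetical helper and the real argument parsing may work differently.

```javascript
// Hypothetical helper (an assumption, not clean-html's own code):
// look up "--flag" in an argument list and decide its boolean value.
function readBooleanFlag(args, flag, fallback) {
    var index = args.indexOf('--' + flag),
        next;

    if (index === -1) {
        return fallback; // flag not given: keep the default
    }

    next = args[index + 1];

    // Nothing after the flag, or another option follows: treat as true.
    if (next === undefined || next.indexOf('--') === 0) {
        return true;
    }

    return next !== 'false';
}

var args = ['crappy.html', 'clean.html', '--remove-comments', '--break-after-br', 'false'];

console.log(readBooleanFlag(args, 'remove-comments', false)); // true
console.log(readBooleanFlag(args, 'break-after-br', true));   // false
```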