## HTML cleaner and beautifier

[![NPM Stats](https://nodei.co/npm/clean-html.png?downloads=true&downloadRank=true)](https://npmjs.org/packages/clean-html/)

Do you have crappy HTML? I do!

```html
<table width="100%" border="0" cellspacing="0" cellpadding="0">
 <tr>
 <td height="31"><b>Currently we have these articles available:</b>

 <blockquote>
 <p><a href="foo.html">The History of Foo</a><br />
 An <span color="red">informative</span> piece of <font face="arial">information</font>.</p>
 <p><A HREF="bar.html">A Horse Walked Into a Bar</A><br/> The bartender said
 "Why the long face?"</p>
 </blockquote>
 </td>
 </tr>
 </table>
```

Just look at those blank lines and random line breaks, trailing spaces, mixed tabs, deprecated tags - it's outrageous!

Let's clean it up:
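
```javascript
const cleaner = require('clean-html');
const fs = require('fs');

// Path of the HTML file to clean, taken from the first command-line argument
const filename = process.argv[2];

// Read as UTF-8 so that data is a string rather than a Buffer
fs.readFile(filename, 'utf8', function (err, data) {
  if (err) {
    throw err;
  }

  cleaner.clean(data, function (html) {
    console.log(html);
  });
});
```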

Running this script on the file above produces the following output:

```html
<table>
  <tr>
    <td>
      <b>Currently we have these articles available:</b>
      <blockquote>
        <p>
          <a href="foo.html">The History of Foo</a>
          <br>
          An <span>informative</span> piece of information.
        </p>
        <p>
          <a href="bar.html">A Horse Walked Into a Bar</a>
          <br>
          The bartender said "Why the long face?"
        </p>
      </blockquote>
    </td>
  </tr>
</table>
```

You can pass additional options to the `clean` function like this:

```javascript
var options = {
  'add-remove-tags': ['table', 'tr', 'td', 'blockquote']
};

cleaner.clean(data, options, function (html) {
  console.log(html);
});
```

In this case, it produces:

```html
<b>Currently we have these articles available:</b>
<p>
  <a href="foo.html">The History of Foo</a>
  <br>
  An <span>informative</span> piece of information.
</p>
<p>
  <a href="bar.html">A Horse Walked Into a Bar</a>
  <br>
  The bartender said "Why the long face?"
</p>
```

Sanity restored!
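If callbacks are inconvenient, the `clean` call can be wrapped in a promise. The following is a minimal sketch and not part of the package: `cleanAsync` is a hypothetical helper that assumes only the two call forms shown above, `clean(html, callback)` and `clean(html, options, callback)`. The cleaner is passed in as an argument so the helper does not depend on how the module is loaded.

```javascript
// Hypothetical helper (not part of clean-html): wraps the callback-style
// clean() call in a Promise that resolves with the cleaned HTML.
function cleanAsync(cleaner, html, options) {
  return new Promise(function (resolve, reject) {
    try {
      if (options === undefined) {
        // Two-argument form: clean(html, callback)
        cleaner.clean(html, resolve);
      } else {
        // Three-argument form: clean(html, options, callback)
        cleaner.clean(html, options, resolve);
      }
    } catch (err) {
      // Surface synchronous errors as a rejected promise
      reject(err);
    }
  });
}

// Usage, assuming `cleaner` was loaded with require('clean-html'):
// cleanAsync(cleaner, data, { 'add-remove-tags': ['table'] }).then(console.log);
```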

## Options

### break-around-comments

Adds line breaks before and after comments.

Type: Boolean
Default: `true`

### break-around-tags

Tags that should have line breaks added before and after.

Type: Array
Default: `['body', 'blockquote', 'br', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'head', 'hr', 'link', 'meta', 'p', 'script', 'style', 'table', 'title', 'td', 'tr']`

### indent

The string to use for indentation, e.g. a tab character or one or more spaces.

Type: String
Default: `'  '` (two spaces)

### remove-attributes

Attributes to remove from markup.

Type: Array
Default: `['align', 'bgcolor', 'border', 'cellpadding', 'cellspacing', 'color', 'height', 'target', 'valign', 'width']`

### remove-comments

Removes comments.

Type: Boolean
Default: `false`

### remove-empty-tags

Tags to remove from markup if empty.

Type: Array
Default: `[]`

### remove-tags

Tags to always remove from markup. Nested content is preserved.

Type: Array
Default: `['center', 'font']`

### replace-nbsp

Replaces non-breaking space entities (`&nbsp;`) with regular spaces.

Type: Boolean
Default: `false`

### wrap

The column number where lines should wrap. Set to 0 to disable line wrapping.

Type: Integer
Default: `120`
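Several of these options can be combined in a single object. The sketch below is illustrative only: the values are arbitrary choices, and `cleanWithOptions` is a hypothetical wrapper that takes the cleaner as a parameter and relies on the `clean(html, options, callback)` call shown earlier.

```javascript
// Example values only; each key is one of the options documented above.
const options = {
  'indent': '\t',                // indent with a tab instead of two spaces
  'wrap': 0,                     // 0 disables line wrapping
  'remove-comments': true,       // strip comments
  'replace-nbsp': true,          // turn &nbsp; into regular spaces
  'remove-empty-tags': ['span']  // drop <span> elements that are empty
};

// Hypothetical wrapper: forwards the options above to cleaner.clean().
function cleanWithOptions(cleaner, html, callback) {
  cleaner.clean(html, options, callback);
}

// Usage, assuming `cleaner` was loaded with require('clean-html'):
// cleanWithOptions(cleaner, data, function (html) { console.log(html); });
```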

## Adding values to option lists

These options exist for your convenience: they add values to a list option, so you do not have to repeat the whole list just to extend it.

### add-break-around-tags

Additional tags to include in `break-around-tags`.

Type: Array
Default: `null`

### add-remove-attributes

Additional attributes to include in `remove-attributes`.

Type: Array
Default: `null`

### add-remove-tags

Additional tags to include in `remove-tags`.

Type: Array
Default: `null`
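To illustrate what the `add-*` options are for, the sketch below shows one way such a merge could work: the additional values are appended to the base list rather than replacing it. `resolveList` is a hypothetical helper written for this example and is not taken from the package's source. The assumption that setting `remove-tags` directly replaces the defaults is also mine.

```javascript
// Defaults for remove-tags, as documented above.
const defaultRemoveTags = ['center', 'font'];

// Hypothetical helper: start from the explicit list if one was given
// (assumed to replace the defaults), otherwise from the defaults,
// then append any additional values.
function resolveList(defaults, explicit, additional) {
  const base = Array.isArray(explicit) ? explicit : defaults;
  return base.concat(additional || []);
}

// Only add-remove-tags set: the defaults are kept and extended.
const extended = resolveList(defaultRemoveTags, undefined, ['table', 'tr']);
// extended is ['center', 'font', 'table', 'tr']

// Only remove-tags set: the given list is used on its own.
const replaced = resolveList(defaultRemoveTags, ['table', 'tr'], null);
// replaced is ['table', 'tr']
```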

## Global installation

If this package is installed globally, it can be used from the command line:

```bash
$ cat crappy.html | clean-html
```

Instead of piping the input from another program, you can supply a filename as the first argument:

```bash
$ clean-html crappy.html
```

You can redirect the output to another file:

```bash
$ clean-html crappy.html > clean.html
```

Or you can edit the file in place:

```bash
$ clean-html crappy.html --in-place
```

All of the options above can be used from the command line. Array option values should be separated by commas:

```bash
$ clean-html crappy.html --add-remove-tags b,i,u
```
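The comma-separated form corresponds to an array in the programmatic API. As a rough sketch of that mapping (`parseListArg` is a hypothetical function for illustration; the package's own argument parsing may differ):

```javascript
// Hypothetical helper: turn a comma-separated CLI value into an array,
// trimming whitespace and skipping empty entries.
function parseListArg(value) {
  return value
    .split(',')
    .map(function (item) { return item.trim(); })
    .filter(function (item) { return item.length > 0; });
}

// "--add-remove-tags b,i,u" would then correspond to this options object:
const cliOptions = { 'add-remove-tags': parseListArg('b,i,u') };
// cliOptions['add-remove-tags'] is ['b', 'i', 'u']
```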

Boolean options can be set to true like this:

```bash
$ clean-html crappy.html --remove-comments
```

Or like this:

```bash
$ clean-html crappy.html --remove-comments true
```

They can be set to false like this:

```bash
$ clean-html crappy.html --remove-comments false
```
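The three forms above can be summarised as a small mapping from the text after the flag to a boolean. The sketch below is only an illustration of that rule: `parseBooleanArg` is a hypothetical function, and rejecting other values is my assumption, not documented behaviour of the command-line tool.

```javascript
// Hypothetical helper: interpret the value that follows a boolean flag.
// A bare flag (no value) counts as true, matching the examples above.
function parseBooleanArg(value) {
  if (value === undefined || value === 'true') {
    return true;
  }
  if (value === 'false') {
    return false;
  }
  // Assumption: anything else is treated as a usage error
  throw new Error('Expected "true" or "false", got "' + value + '"');
}

// --remove-comments        -> parseBooleanArg(undefined) is true
// --remove-comments true   -> parseBooleanArg('true') is true
// --remove-comments false  -> parseBooleanArg('false') is false
```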