## HTML cleaner and beautifier

[![NPM Stats](https://nodei.co/npm/clean-html.png?downloads=true&downloadRank=true)](https://npmjs.org/packages/clean-html/)

Do you have crappy HTML? I do!

```html
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td height="31"><b>Currently we have these articles available:</b>

<blockquote>
<p><a href="foo.html">The History of Foo</a><br />
An <span color="red">informative</span> piece of <FONT FACE="ARIAL">information</FONT>.</p>
<p><a href="bar.html">A Horse Walked Into a Bar</a><br/> The bartender said
"Why the long face?"</p>
</blockquote>
</td>
</tr>
</table>
```

Just look at those blank lines and random line breaks, trailing spaces, mixed tabs, deprecated tags - it's outrageous!

Let's clean it up:

```javascript
var cleaner = require('clean-html'),
    fs = require('fs'),
    filename = process.argv[2];

// Read the file as a UTF-8 string rather than a raw Buffer
fs.readFile(filename, 'utf8', function (err, data) {
    if (err) {
        throw err;
    }

    // The callback receives the cleaned HTML
    cleaner.clean(data, function (html) {
        console.log(html);
    });
});
```

Running this script on the file above produces the following output:

```html
<table>
  <tr>
    <td>
      <b>Currently we have these articles available:</b>
      <blockquote>
        <p>
          <a href="foo.html">The History of Foo</a><br>
          An <span>informative</span> piece of information.
        </p>
        <p>
          <a href="bar.html">A Horse Walked Into a Bar</a><br>
          The bartender said "Why the long face?"
        </p>
      </blockquote>
    </td>
  </tr>
</table>
```

You can pass additional options to the `clean` function like this:

```javascript
var options = {
    'add-remove-tags': ['table', 'tr', 'td', 'blockquote']
};

cleaner.clean(data, options, function (html) {
    console.log(html);
});
```

In this case, it produces:

```html
<b>Currently we have these articles available:</b>
<p>
  <a href="foo.html">The History of Foo</a><br>
  An <span>informative</span> piece of information.
</p>
<p>
  <a href="bar.html">A Horse Walked Into a Bar</a><br>
  The bartender said "Why the long face?"
</p>
```

Sanity restored!

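The examples above use the callback form of `clean`. If the rest of your code is promise-based, a small wrapper can adapt it. This is only a minimal sketch: `cleanAsync` is a hypothetical helper, not part of clean-html, and it takes the cleaner as an argument so that it relies on nothing beyond the `clean(html, options, callback)` call shape shown above.

```javascript
// Hypothetical helper (not part of clean-html): wraps the callback-style
// clean(html, options, callback) call in a Promise.
function cleanAsync(cleaner, html, options) {
    return new Promise(function (resolve) {
        // The callback receives only the cleaned HTML, so resolve with it.
        // An exception thrown here rejects the promise automatically.
        cleaner.clean(html, options || {}, resolve);
    });
}

// Usage, assuming clean-html is installed:
// cleanAsync(require('clean-html'), '<center><b>Hi</b></center>', {'wrap': 80})
//     .then(function (html) { console.log(html); });
```

Passing the cleaner in keeps the helper easy to exercise with a stub object that has a `clean` method.
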
## Options

### break-around-comments

Adds line breaks before and after comments.

Type: Boolean
Default: `true`

### break-around-tags

Tags that should have line breaks added before and after.

Type: Array
Default: `['blockquote', 'br', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'p', 'table', 'td', 'tr']`

### indent

The string to use for indentation, e.g., a tab character or one or more spaces.

Type: String
Default: `'  '` (two spaces)

### remove-attributes

Attributes to remove from markup.

Type: Array
Default: `['align', 'bgcolor', 'border', 'cellpadding', 'cellspacing', 'color', 'disabled', 'height', 'target', 'valign', 'width']`

### remove-comments

Removes comments.

Type: Boolean
Default: `false`

### remove-empty-tags

Tags to remove from markup if empty.

Type: Array
Default: `[]`

### remove-tags

Tags to always remove from markup. Nested content is preserved.

Type: Array
Default: `['center', 'font']`

### replace-nbsp

Replaces non-breaking white space entities (`&nbsp;`) with regular spaces.

Type: Boolean
Default: `false`

### wrap

The column number where lines should wrap.

Type: Int
Default: `120`

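Several of these options can be combined in one object. The values below are illustrative choices rather than recommendations, and every key is one of the options documented above:

```javascript
// Illustrative values only; each key is an option documented above.
var options = {
    'indent': '\t',                      // indent with a tab instead of two spaces
    'wrap': 80,                          // wrap at column 80 instead of 120
    'remove-comments': true,             // strip comments
    'replace-nbsp': true,                // turn &nbsp; entities into regular spaces
    'remove-empty-tags': ['p', 'span']   // drop <p> and <span> tags when empty
};

// Pass it as the second argument, as in the earlier examples:
// cleaner.clean(data, options, function (html) { console.log(html); });
```
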
## Adding values to option lists

These options exist for your convenience: each one adds values to the corresponding list option, so you don't have to restate its defaults.

### add-break-around-tags

Additional tags to include in `break-around-tags`.

Type: Array
Default: `null`

### add-remove-attributes

Additional attributes to include in `remove-attributes`.

Type: Array
Default: `null`

### add-remove-tags

Additional tags to include in `remove-tags`.

Type: Array
Default: `null`

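To make the relationship between a list option and its `add-` counterpart concrete, here is a rough sketch of how the two could be combined. `resolveLists` is a hypothetical helper written for illustration; it is not clean-html's own code, and the assumption that an explicitly supplied list replaces the default is mine.

```javascript
// Default copied from the Options section above.
var defaults = {
    'remove-tags': ['center', 'font']
};

// Hypothetical helper: folds each 'add-<name>' list into the '<name>' list.
// Illustrates the documented behavior; not clean-html's implementation.
function resolveLists(defaults, userOptions) {
    var resolved = {};
    Object.keys(defaults).forEach(function (name) {
        // Assumption: an explicit list replaces the default list.
        var base = userOptions[name] || defaults[name];
        var extra = userOptions['add-' + name] || [];
        resolved[name] = base.concat(extra);
    });
    return resolved;
}

var resolved = resolveLists(defaults, {'add-remove-tags': ['table', 'tr']});
console.log(resolved['remove-tags']); // [ 'center', 'font', 'table', 'tr' ]
```
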
## Global installation

If this package is installed globally, it can be used from the command line:

```bash
$ cat crappy.html | clean-html
```

Instead of piping the input from another program, you can supply a filename as the first argument:

```bash
$ clean-html crappy.html
```

You can redirect the output to another file:

```bash
$ clean-html crappy.html > clean.html
```

Or you can edit the file in place:

```bash
$ clean-html crappy.html --in-place
```

All of the options above can be used from the command line. Array option values should be separated by commas:

```bash
$ clean-html crappy.html --add-remove-tags b,i,u
```

Boolean options can be set to true like this:

```bash
$ clean-html crappy.html --remove-comments
```

Or like this:

```bash
$ clean-html crappy.html --remove-comments true
```

They can be set to false like this:

```bash
$ clean-html crappy.html --remove-comments false
```

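The command-line values map onto the same option values used in JavaScript: `--add-remove-tags b,i,u` corresponds to `{'add-remove-tags': ['b', 'i', 'u']}`, and `--remove-comments` to `{'remove-comments': true}`. The sketch below shows one way such a conversion could work. `parseCliValue` and its `type` argument are hypothetical names used for illustration, not the parser that clean-html actually uses.

```javascript
// Hypothetical helper: converts a raw command-line value into the value the
// equivalent JavaScript option would take. Not clean-html's actual parser.
function parseCliValue(raw, type) {
    if (type === 'boolean') {
        // A bare flag (no value) and the string 'true' both mean true.
        return raw === undefined || raw === 'true';
    }
    if (type === 'array') {
        // Array values are separated by commas.
        return raw.split(',');
    }
    return raw;
}

console.log(parseCliValue('b,i,u', 'array'));     // [ 'b', 'i', 'u' ]
console.log(parseCliValue(undefined, 'boolean')); // true
console.log(parseCliValue('false', 'boolean'));   // false
```
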