## HTML cleaner and beautifier

[![NPM Stats](https://nodei.co/npm/clean-html.png?downloads=true&downloadRank=true)](https://npmjs.org/packages/clean-html/)

Do you have crappy HTML? I do!

```html
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td height="31"><b>Currently we have these articles available:</b>

<blockquote>
<p><a href="foo.html">The History of Foo</a><br />
An <span color="red">informative</span> piece of <FONT FACE="ARIAL">information</FONT>.</p>
<p><a href="bar.html">A Horse Walked Into a Bar</a><br/> The bartender said
"Why the long face?"</p>
</blockquote>
</td>
</tr>
</table>
```

Just look at those blank lines and random line breaks, trailing spaces, mixed tabs, deprecated tags - it's outrageous!

Let's clean it up:

```javascript
var cleaner = require('clean-html'),
    fs = require('fs'),
    filename = process.argv[2];

// Read the file as a UTF-8 string, then hand it to the cleaner
fs.readFile(filename, 'utf8', function (err, data) {
    if (err) {
        throw err;
    }

    cleaner.clean(data, function (html) {
        console.log(html);
    });
});
```

Running this script on the file above produces the following output:

```html
<table>
  <tr>
    <td>
      <b>Currently we have these articles available:</b>
      <blockquote>
        <p>
          <a href="foo.html">The History of Foo</a>
          <br>
          An <span>informative</span> piece of information.
        </p>
        <p>
          <a href="bar.html">A Horse Walked Into a Bar</a>
          <br>
          The bartender said "Why the long face?"
        </p>
      </blockquote>
    </td>
  </tr>
</table>
```

You can pass additional options to the `clean` function like this:

```javascript
var options = {
    'add-remove-tags': ['table', 'tr', 'td', 'blockquote']
};

cleaner.clean(data, options, function (html) {
    console.log(html);
});
```

In this case, it produces:

```html
<b>Currently we have these articles available:</b>
<p>
  <a href="foo.html">The History of Foo</a>
  <br>
  An <span>informative</span> piece of information.
</p>
<p>
  <a href="bar.html">A Horse Walked Into a Bar</a>
  <br>
  The bartender said "Why the long face?"
</p>
```

Sanity restored!

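If you prefer promises to callbacks, the two call forms shown above (`clean(html, callback)` and `clean(html, options, callback)`) can be wrapped in a small helper. This is only a sketch: `cleanAsync` is a hypothetical name, not something the package provides, and it assumes the callback receives the cleaned HTML as its only argument, as in the examples above.

```javascript
// Hypothetical helper, not part of clean-html: wraps the callback-style
// `clean` function in a Promise. The cleaner is passed in as an argument,
// so any object with a compatible `clean` method will do.
function cleanAsync(cleaner, html, options) {
    return new Promise(function (resolve) {
        if (options === undefined) {
            // Two-argument form: clean(html, callback)
            cleaner.clean(html, resolve);
        } else {
            // Three-argument form: clean(html, options, callback)
            cleaner.clean(html, options, resolve);
        }
    });
}

// Possible usage (assumes clean-html is installed):
// cleanAsync(require('clean-html'), '<p>hi</p>').then(console.log);
```
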
## Options

### break-around-comments

Adds line breaks before and after comments.

Type: Boolean
Default: `true`

### break-around-tags

Tags that should have line breaks added before and after.

Type: Array
Default: `['blockquote', 'br', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'p', 'table', 'td', 'tr']`

### indent

The string to use for indentation, e.g., a tab character or one or more spaces.

Type: String
Default: `'  '` (two spaces)

### remove-attributes

Attributes to remove from markup.

Type: Array
Default: `['align', 'bgcolor', 'border', 'cellpadding', 'cellspacing', 'color', 'height', 'target', 'valign', 'width']`

### remove-comments

Removes comments.

Type: Boolean
Default: `false`

### remove-empty-tags

Tags to remove from markup if empty.

Type: Array
Default: `[]`

### remove-tags

Tags to always remove from markup. Nested content is preserved.

Type: Array
Default: `['center', 'font']`

### replace-nbsp

Replaces non-breaking white space entities (`&nbsp;`) with regular spaces.

Type: Boolean
Default: `false`

### wrap

The column number where lines should wrap. Set to 0 to disable line wrapping.

Type: Integer
Default: `120`

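As a quick illustration, several of these options can be combined in one object. The values below are arbitrary examples chosen for this sketch, not recommendations; every key is one of the options documented above.

```javascript
// Illustrative values only; each key is an option documented above.
var options = {
    'indent': '\t',                      // indent with a tab instead of two spaces
    'wrap': 0,                           // 0 disables line wrapping
    'remove-comments': true,             // strip HTML comments
    'replace-nbsp': true,                // turn &nbsp; entities into regular spaces
    'remove-empty-tags': ['p', 'span']   // drop these tags when they are empty
};

// Pass the object as the second argument, as in the earlier example:
// cleaner.clean(data, options, function (html) { console.log(html); });
```
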
## Adding values to option lists

These options exist for your convenience. Each one adds values to the corresponding list option, so you can extend a default list without retyping it.

### add-break-around-tags

Additional tags to include in `break-around-tags`.

Type: Array
Default: `null`

### add-remove-attributes

Additional attributes to include in `remove-attributes`.

Type: Array
Default: `null`

### add-remove-tags

Additional tags to include in `remove-tags`.

Type: Array
Default: `null`

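To make the relationship between a list option and its `add-` counterpart concrete, here is a rough sketch of how the two could be combined. It is not taken from the package's source: `resolveListOptions` is a hypothetical helper, and the assumption that an explicitly passed list replaces the default before the `add-` values are appended is mine.

```javascript
// Hypothetical sketch, not the package's implementation.
// Defaults copied from the Options section above.
var defaults = {
    'remove-attributes': ['align', 'bgcolor', 'border', 'cellpadding', 'cellspacing',
        'color', 'height', 'target', 'valign', 'width'],
    'remove-tags': ['center', 'font']
};

function resolveListOptions(defaults, userOptions) {
    var resolved = {};

    Object.keys(defaults).forEach(function (name) {
        // Assumption: an explicit list replaces the default list
        var base = userOptions[name] || defaults[name];
        // Values under `add-<name>` are appended to whichever list was chosen
        var extra = userOptions['add-' + name] || [];

        // concat returns a new array, so the defaults are never mutated
        resolved[name] = base.concat(extra);
    });

    return resolved;
}

var resolved = resolveListOptions(defaults, {
    'add-remove-tags': ['table', 'tr', 'td', 'blockquote']
});

console.log(resolved['remove-tags'].join(', '));
// center, font, table, tr, td, blockquote
```

Under these assumptions, `font` is still removed in the earlier example because `add-remove-tags` extends the default list instead of replacing it.
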
## Global installation

If this package is installed globally, it can be used from the command line:

```bash
$ cat crappy.html | clean-html
```

Instead of piping the input from another program, you can supply a filename as the first argument:

```bash
$ clean-html crappy.html
```

You can redirect the output to another file:

```bash
$ clean-html crappy.html > clean.html
```

Or you can edit the file in place:

```bash
$ clean-html crappy.html --in-place
```

All of the options above can be used from the command line. Array option values should be separated by commas:

```bash
$ clean-html crappy.html --add-remove-tags b,i,u
```

Boolean options can be set to true like this:

```bash
$ clean-html crappy.html --remove-comments
```

Or like this:

```bash
$ clean-html crappy.html --remove-comments true
```

They can be set to false like this:

```bash
$ clean-html crappy.html --remove-comments false
```

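The command-line conventions above (comma-separated arrays, bare or explicit boolean flags) map onto the options object roughly as sketched below. This is a hypothetical parser written for illustration, not the one the package ships; `parseArgs` and the two lookup lists are names invented here, and treating `--in-place` as a boolean flag is an assumption.

```javascript
// Hypothetical sketch, not the package's actual CLI parser.
var ARRAY_OPTIONS = ['break-around-tags', 'remove-attributes', 'remove-empty-tags',
    'remove-tags', 'add-break-around-tags', 'add-remove-attributes', 'add-remove-tags'];
var BOOLEAN_OPTIONS = ['break-around-comments', 'remove-comments', 'replace-nbsp', 'in-place'];

function parseArgs(args) {
    var result = { files: [], options: {} };

    for (var i = 0; i < args.length; i++) {
        var arg = args[i];

        if (arg.indexOf('--') !== 0) {
            result.files.push(arg); // anything that is not a flag is treated as a filename
            continue;
        }

        var name = arg.slice(2);
        var next = args[i + 1];

        if (BOOLEAN_OPTIONS.indexOf(name) !== -1) {
            if (next === 'true' || next === 'false') {
                result.options[name] = next === 'true'; // explicit value
                i++;
            } else {
                result.options[name] = true; // bare flag means true
            }
        } else if (ARRAY_OPTIONS.indexOf(name) !== -1) {
            result.options[name] = next ? next.split(',') : []; // comma-separated list
            i++;
        } else {
            // Remaining options (indent, wrap) take a single value
            result.options[name] = name === 'wrap' ? parseInt(next, 10) : next;
            i++;
        }
    }

    return result;
}

var parsed = parseArgs(['crappy.html', '--add-remove-tags', 'b,i,u', '--remove-comments', 'false']);
console.log(JSON.stringify(parsed.options));
// {"add-remove-tags":["b","i","u"],"remove-comments":false}
```
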