## HTML cleaner and beautifier

[![NPM Stats](https://nodei.co/npm/clean-html.png?downloads=true&downloadRank=true)](https://npmjs.org/package/clean-html)

Do you have crappy HTML? I do!

```html
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td height="31"><b>Currently we have these articles available:</b>

<blockquote>
<!-- List articles -->
<p><a href="foo.html">The History of Foo</a><br />
An <span color="red">informative</span> piece of <FONT FACE="ARIAL">information</FONT>.</p>
<p><a href="bar.html">A Horse Walked Into a Bar</a><br/> The bartender said
"Why the long face?"</p>
</blockquote>
</td>
</tr>
</table>
```

Just look at those blank lines and random line breaks, trailing spaces, mixed tabs, deprecated tags - it's outrageous!

Let's clean it up...

```bash
$ npm install clean-html
```

```javascript
var cleaner = require('clean-html'),
    fs = require('fs'),
    file = process.argv[2]; // path to the HTML file, given on the command line

fs.readFile(file, 'utf-8', function (err, data) {
    // Stop early if the file could not be read.
    if (err) throw err;

    // Write the cleaned markup to stdout.
    process.stdout.write(cleaner.clean(data) + '\n');
});
```

Sanity restored!

```html
<table>
  <tr>
    <td>
      <b>Currently we have these articles available:</b>
      <blockquote>
        <!-- List articles -->
        <p>
          <a href="foo.html">The History of Foo</a><br>
          An <span>informative</span> piece of information.
        </p>
        <p>
          <a href="bar.html">A Horse Walked Into a Bar</a><br>
          The bartender said "Why the long face?"
        </p>
      </blockquote>
    </td>
  </tr>
</table>
```

If you like, you can even close the empty tags, lose the comments and get rid of that nasty presentational markup:

```javascript
var options = {
    'close-empty-tags': true,
    'remove-comments': true,
    'add-tags-to-remove': ['table', 'tr', 'td', 'blockquote']
};

process.stdout.write(cleaner.clean(data, options) + '\n');
```

Voila!

```html
<b>Currently we have these articles available:</b>
<p>
  <a href="foo.html">The History of Foo</a><br/>
  An <span>informative</span> piece of information.
</p>
<p>
  <a href="bar.html">A Horse Walked Into a Bar</a><br/>
  The bartender said "Why the long face?"
</p>
```

## Options

### attr-to-remove

Attributes to remove from markup.

Type: Array
Default: `['align', 'valign', 'bgcolor', 'color', 'width', 'height', 'border', 'cellpadding', 'cellspacing']`

### block-tags

Block level element tags. Line breaks are added before and after, and nested content is indented. Note: this option has no effect unless `pretty` is set to true.

Type: Array
Default: `['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'p', 'table', 'tr', 'td', 'blockquote', 'hr']`

### break-after-br

Adds line breaks after `br` tags. Note: this option has no effect unless `pretty` is set to true.

Type: Boolean
Default: `true`

### close-empty-tags

If set to true, adds trailing slashes to empty tags. Otherwise, removes any trailing slashes from them.

Type: Boolean
Default: `false`

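To make the two behaviours concrete, here is a minimal sketch of the effect on a single empty tag. `normalizeEmptyTag` is a hypothetical helper written for this illustration; it is not part of clean-html and is not meant to describe how the library implements the option.

```javascript
// Hypothetical helper, not part of clean-html: rewrites one empty tag
// such as <br>, <br/> or <br /> according to a close-empty-tags style flag.
function normalizeEmptyTag(tag, closeEmptyTags) {
    // Drop any existing trailing slash and the whitespace before it.
    var open = tag.replace(/\s*\/?>$/, '');
    return open + (closeEmptyTags ? '/>' : '>');
}

console.log(normalizeEmptyTag('<br />', false)); // <br>
console.log(normalizeEmptyTag('<br>', true));    // <br/>
```
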
### empty-tags

Empty element tags.

Type: Array
Default: `['br', 'hr', 'img']`

### fix-end-tags

Adds end tags where they are missing. For example, this:

```html
<blockquote>Now Scotch is a real drink for a man.
```

becomes this:

```html
<blockquote>Now Scotch is a real drink for a man.</blockquote>
```

Also fixes end tags that are closed in the wrong order. For example, this:

```html
You <b>belong in the <i>circus</b></i>, Spock, not a starship.
```

becomes this:

```html
You <b>belong in the <i>circus</i></b>, Spock, not a starship.
```

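One common way to get both repairs is to track open elements on a stack. The sketch below illustrates that idea only; `fixEndTags` and its `emptyTags` parameter are hypothetical names, and nothing here is taken from clean-html's own source. It handles simple tags and ignores edge cases such as `>` inside attribute values.

```javascript
// Hypothetical sketch, not clean-html's implementation: repairs end tags
// by keeping a stack of the elements that are currently open.
function fixEndTags(html, emptyTags) {
    var open = [];   // names of elements that are still open
    var out = '';
    var last = 0;    // index just past the previous tag
    var tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g;
    var match;

    while ((match = tagPattern.exec(html)) !== null) {
        out += html.slice(last, match.index);   // copy text before the tag
        last = tagPattern.lastIndex;
        var name = match[2].toLowerCase();

        if (match[1] === '') {
            // Start tag: remember it unless it is an empty element.
            if (emptyTags.indexOf(name) === -1) {
                open.push(name);
            }
            out += match[0];
        } else if (open.indexOf(name) !== -1) {
            // End tag: first close everything opened after its start tag.
            var closed;
            do {
                closed = open.pop();
                out += '</' + closed + '>';
            } while (closed !== name);
        }
        // End tags with no matching start tag are dropped.
    }

    out += html.slice(last);
    // Close whatever is still open at the end of the input.
    while (open.length > 0) {
        out += '</' + open.pop() + '>';
    }
    return out;
}

console.log(fixEndTags('<blockquote>Now Scotch is a real drink for a man.', ['br', 'hr', 'img']));
// <blockquote>Now Scotch is a real drink for a man.</blockquote>
console.log(fixEndTags('You <b>belong in the <i>circus</b></i>, Spock, not a starship.', ['br', 'hr', 'img']));
// You <b>belong in the <i>circus</i></b>, Spock, not a starship.
```
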
### indent

The string to use for indentation, e.g. a tab character or one or more spaces. Note: this option has no effect unless `pretty` is set to true.

Type: String
Default: `'  '` (two spaces)

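As a rough illustration of how an indent string is typically applied, the sketch below repeats it once per nesting level. `indentLine` and its `depth` parameter are hypothetical and not part of clean-html.

```javascript
// Hypothetical helper, not part of clean-html: prefixes a line with one
// copy of the indent string per nesting level.
function indentLine(line, depth, indent) {
    return indent.repeat(depth) + line;
}

// JSON.stringify makes the leading whitespace visible.
console.log(JSON.stringify(indentLine('<td>', 2, '  ')));  // "    <td>"
console.log(JSON.stringify(indentLine('<td>', 2, '\t')));  // "\t\t<td>"
```
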
### pretty

Pretty prints the output by adding line breaks and indentation.

Type: Boolean
Default: `true`

### remove-comments

Removes comments.

Type: Boolean
Default: `false`

### remove-empty-paras

Removes empty paragraph tags.

Type: Boolean
Default: `false`

### tags-to-remove

Tags to remove from markup.

Type: Array
Default: `['font']`

## Adding values to option lists

For convenience, these options add values to the corresponding list options above rather than replacing them.

### add-attr-to-remove

Additional attributes to remove from markup.

Type: Array
Default: `null`

### add-block-tags

Additional block level element tags.

Type: Array
Default: `null`

### add-empty-tags

Additional empty element tags.

Type: Array
Default: `null`

### add-tags-to-remove

Additional tags to remove from markup.

Type: Array
Default: `null`

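The difference between setting a list option and using its `add-` counterpart can be sketched as follows. `resolveList` and the `defaults` object are hypothetical names for this illustration, and the merge rule shown (an explicit list replaces the default, an `add-` list is appended) is an assumption based on the descriptions above, not a copy of clean-html's code.

```javascript
// Hypothetical sketch, not clean-html's implementation: contrasts replacing
// a list option with adding to it.
var defaults = {
    'tags-to-remove': ['font']
};

function resolveList(options, name) {
    // Assumption: an explicit list replaces the default entirely.
    var base = options[name] || defaults[name];
    // Assumption: an add-* list is appended to whichever list is in effect.
    var extra = options['add-' + name] || [];
    return base.concat(extra);
}

console.log(resolveList({'tags-to-remove': ['table']}, 'tags-to-remove'));
// [ 'table' ]
console.log(resolveList({'add-tags-to-remove': ['table']}, 'tags-to-remove'));
// [ 'font', 'table' ]
```

Under that assumption, `add-tags-to-remove` is the option to reach for when `font` tags should still be removed along with the extra ones, as in the example near the top of this README.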