## HTML cleaner and beautifier

[![NPM Stats](https://nodei.co/npm/clean-html.png?downloads=true&downloadRank=true)](https://npmjs.org/packages/clean-html/)

Do you have crappy HTML? I do!

```html
<table width="100%" border="0" cellspacing="0" cellpadding="0">
 <tr>
 <td height="31"><b>Currently we have these articles available:</b>

 <blockquote>
 <!-- List articles -->
 <p><a href="foo.html">The History of Foo</a><br />
 An <span color="red">informative</span> piece of <FONT FACE="ARIAL">information</FONT>.</p>
 <p><a href="bar.html">A Horse Walked Into a Bar</a><br/> The bartender said
 "Why the long face?"</p>
 </blockquote>
 </td>
 </tr>
 </table>
```

Just look at those blank lines and random line breaks, trailing spaces, mixed tabs, deprecated tags - it's outrageous!

Let's clean it up...

```bash
$ npm install clean-html
```

```javascript
var cleaner = require('clean-html'),
    fs = require('fs'),
    file = process.argv[2];

// Read the HTML file named on the command line and print the cleaned markup.
fs.readFile(file, 'utf-8', function (err, data) {
    if (err) {
        throw err;
    }
    process.stdout.write(cleaner.clean(data) + '\n');
});
```

Sanity restored!

```html
<table>
  <tr>
    <td>
      <b>Currently we have these articles available:</b>
      <blockquote>
        <!-- List articles -->
        <p>
          <a href="foo.html">The History of Foo</a><br>
          An <span>informative</span> piece of information.
        </p>
        <p>
          <a href="bar.html">A Horse Walked Into a Bar</a><br>
          The bartender said "Why the long face?"
        </p>
      </blockquote>
    </td>
  </tr>
</table>
```

If you like, you can even close the empty tags, lose the comments and get rid of that nasty presentational markup:

```javascript
var options = {
    'close-empty-tags': true,
    'remove-comments': true,
    'add-tags-to-remove': ['table', 'tr', 'td', 'blockquote']
};

process.stdout.write(cleaner.clean(data, options) + '\n');
```

Voila!

```html
<b>Currently we have these articles available:</b>
<p>
  <a href="foo.html">The History of Foo</a><br/>
  An <span>informative</span> piece of information.
</p>
<p>
  <a href="bar.html">A Horse Walked Into a Bar</a><br/>
  The bartender said "Why the long face?"
</p>
```

## Options

### attr-to-remove

Attributes to remove from markup.

Type: Array
Default: `['align', 'valign', 'bgcolor', 'color', 'width', 'height', 'border', 'cellpadding', 'cellspacing']`

### block-tags

Block level element tags. Line breaks are added before and after, and nested content is indented. Note: this option has no effect unless `pretty` is set to `true`.

Type: Array
Default: `['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'div', 'p', 'table', 'tr', 'td', 'blockquote', 'hr']`

### break-after-br

Adds line breaks after br tags. Note: this option has no effect unless `pretty` is set to `true`.

Type: Boolean
Default: `true`

### close-empty-tags

If set to true, adds trailing slashes to empty tags. Otherwise removes trailing slashes.

Type: Boolean
Default: `false`

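
To make the two behaviours concrete, here is a minimal standalone sketch of the transformation this option describes. It is not clean-html's own code: `setEmptyTagStyle` is a hypothetical helper written for illustration, and it assumes a simple regular expression is good enough for well-formed empty tags.

```javascript
// Hypothetical helper, not part of clean-html: rewrites empty tags so they
// either all end in "/>" (close === true) or all end in ">" (close === false).
function setEmptyTagStyle(html, emptyTags, close) {
    // Match e.g. <br>, <br/>, <br /> or <img src="a.png" />, capturing the
    // tag name and any attributes, and discarding an existing trailing slash.
    var pattern = new RegExp('<(' + emptyTags.join('|') + ')\\b([^>]*?)\\s*/?>', 'gi');

    return html.replace(pattern, function (match, name, attrs) {
        return '<' + name + attrs + (close ? '/>' : '>');
    });
}

var emptyTags = ['br', 'hr', 'img']; // the empty-tags default listed in this README

console.log(setEmptyTagStyle('One<br />Two<hr>', emptyTags, true));
console.log(setEmptyTagStyle('One<br />Two<hr/>', emptyTags, false));
```

The first call normalises every empty tag to the closed form and the second strips the slashes again, which mirrors the `<br/>` versus `<br>` difference between the two cleaned outputs shown earlier.
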
### empty-tags

Empty element tags.

Type: Array
Default: `['br', 'hr', 'img']`

### fix-end-tags

Adds end tags where they are missing. For example, this:

```html
<blockquote>Now Scotch is a real drink for a man.
```

becomes this:

```html
<blockquote>Now Scotch is a real drink for a man.</blockquote>
```

Also fixes end tags that are closed in the wrong order.

```html
You <b>belong in the <i>circus</b></i>, Spock, not a starship.
```

becomes this:

```html
You <b>belong in the <i>circus</i></b>, Spock, not a starship.
```

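
One common way to get both repairs is to track open tags on a stack. The sketch below is an illustration of that general technique under simplifying assumptions (no comments containing tags, no script or style content). It is not taken from clean-html, and `fixEndTags` and `EMPTY_TAGS` are names invented for this example.

```javascript
// Illustrative only: a naive stack-based end-tag fixer, not clean-html's code.
var EMPTY_TAGS = ['br', 'hr', 'img']; // same values as the empty-tags default

function fixEndTags(html) {
    var tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;
    var open = [];     // names of the tags that are currently open
    var output = '';
    var lastIndex = 0;
    var match;

    while ((match = tagPattern.exec(html)) !== null) {
        output += html.slice(lastIndex, match.index); // text before this tag
        lastIndex = tagPattern.lastIndex;

        var isEndTag = match[1] === '/';
        var name = match[2].toLowerCase();

        if (!isEndTag) {
            output += match[0];
            if (EMPTY_TAGS.indexOf(name) === -1) {
                open.push(name);
            }
        } else {
            var position = open.lastIndexOf(name);
            if (position !== -1) {
                // Close everything opened since the matching start tag,
                // innermost first, then the matching tag itself.
                while (open.length > position) {
                    output += '</' + open.pop() + '>';
                }
            }
            // An end tag with no matching start tag is dropped.
        }
    }

    output += html.slice(lastIndex);

    // Close whatever is still open at the end of the input.
    while (open.length > 0) {
        output += '</' + open.pop() + '>';
    }
    return output;
}

console.log(fixEndTags('<blockquote>Now Scotch is a real drink for a man.'));
console.log(fixEndTags('You <b>belong in the <i>circus</b></i>, Spock, not a starship.'));
```

When `</b>` arrives while `<i>` is still open, the sketch closes `<i>` first and then discards the stray `</i>` that follows, which yields the reordered markup from the second example.
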
### indent

The string to use for indentation, e.g. a tab character or one or more spaces. Note: this option has no effect unless `pretty` is set to `true`.

Type: String
Default: `'  '` (two spaces)

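
As a small configuration sketch, switching to tab indentation might look like the following. The option names come from this README; the values are only examples.

```javascript
// Option names are taken from this README; the values are only examples.
var options = {
    'pretty': true, // the documented default, shown because indent depends on it
    'indent': '\t'  // one tab per nesting level instead of two spaces
};

// Pass the object as the second argument, as in the earlier example:
// cleaner.clean(data, options)
```
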
### pretty

Pretty prints the output by adding line breaks and indentation.

Type: Boolean
Default: `true`

### remove-comments

Removes comments.

Type: Boolean
Default: `false`

### remove-empty-paras

Removes empty paragraph tags.

Type: Boolean
Default: `false`

### tags-to-remove

Tags to remove from markup.

Type: Array
Default: `['font']`

## Adding values to option lists

These convenience options add values to the corresponding option lists instead of replacing them, so you can extend a list without restating its defaults.

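
The difference between replacing a list and adding to it can be sketched as follows. This assumes the `add-*` values are appended to the base list, which is consistent with the earlier example where `font` tags were still removed. `resolveList` is a hypothetical helper for illustration, not a clean-html function.

```javascript
// Default copied from the tags-to-remove entry in this README.
var DEFAULT_TAGS_TO_REMOVE = ['font'];

// Hypothetical helper (not part of clean-html): works out which list would
// take effect, assuming add-* values are appended to the base list.
function resolveList(name, defaults, options) {
    var base = options[name] || defaults;
    var extra = options['add-' + name] || [];
    return base.concat(extra);
}

// Adding: the default 'font' entry is kept and the new tags follow it.
console.log(resolveList('tags-to-remove', DEFAULT_TAGS_TO_REMOVE, {
    'add-tags-to-remove': ['table', 'tr']
}));

// Replacing: only the tags given here remain, so 'font' is no longer removed.
console.log(resolveList('tags-to-remove', DEFAULT_TAGS_TO_REMOVE, {
    'tags-to-remove': ['table', 'tr']
}));
```
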
### add-attr-to-remove

Additional attributes to remove from markup.

Type: Array
Default: `null`

### add-block-tags

Additional block level element tags.

Type: Array
Default: `null`

### add-empty-tags

Additional empty element tags.

Type: Array
Default: `null`

### add-tags-to-remove

Additional tags to remove from markup.

Type: Array
Default: `null`