## HTML cleaner and beautifier

[![NPM Stats](https://nodei.co/npm/clean-html.png?downloads=true&downloadRank=true)](https://npmjs.org/packages/clean-html/)

Do you have crappy HTML? I do!

```html
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td height="31"><b>Currently we have these articles available:</b>

<blockquote>
<!-- List articles -->
<p><a href="foo.html">The History of Foo</a><br />
An <span color="red">informative</span> piece of <FONT FACE="ARIAL">information</FONT>.</p>
<p><a href="bar.html">A Horse Walked Into a Bar</a><br/> The bartender said
"Why the long face?"</p>
</blockquote>
</td>
</tr>
</table>
```

Just look at those blank lines, random line breaks, trailing spaces, mixed tabs and deprecated tags. It's outrageous!

Let's clean it up...

```bash
$ npm install clean-html
```

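```javascript
var cleaner = require('clean-html'),
    fs = require('fs'),
    file = process.argv[2]; // path of the HTML file to clean

fs.readFile(file, 'utf-8', function (err, data) {
    if (err) {
        // Without this check, a missing or unreadable file would
        // pass `undefined` to the cleaner.
        console.error(err.message);
        process.exit(1);
    }

    // The cleaned markup is passed to the callback.
    cleaner.clean(data, function (html) {
        console.log(html);
    });
});
```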

Sanity restored!

```html
<table>
  <tr>
    <td>
      <b>Currently we have these articles available:</b>
      <blockquote>
        <!-- List articles -->
        <p>
          <a href="foo.html">The History of Foo</a><br>
          An <span>informative</span> piece of information.
        </p>
        <p>
          <a href="bar.html">A Horse Walked Into a Bar</a><br>
          The bartender said "Why the long face?"
        </p>
      </blockquote>
    </td>
  </tr>
</table>
```

## Options

### attr-to-remove

Attributes to remove from markup.

Type: Array

Default: `['align', 'bgcolor', 'border', 'cellpadding', 'cellspacing', 'color', 'disabled', 'height', 'target', 'valign', 'width']`
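
To show what this option does, here's a minimal sketch that filters a parsed element's attributes against the default list. It isn't clean-html's implementation. The plain `{ name: value }` attribute object and the `removeAttributes` helper are assumptions made for the example.

```javascript
// Default value of attr-to-remove, as documented above.
var attrToRemove = ['align', 'bgcolor', 'border', 'cellpadding', 'cellspacing',
    'color', 'disabled', 'height', 'target', 'valign', 'width'];

// Hypothetical helper (not part of clean-html): returns a copy of `attribs`
// without any attribute whose lower-cased name appears in `toRemove`.
function removeAttributes(attribs, toRemove) {
    var kept = {};
    Object.keys(attribs).forEach(function (name) {
        if (toRemove.indexOf(name.toLowerCase()) === -1) {
            kept[name] = attribs[name];
        }
    });
    return kept;
}

// Attributes of the <table> and <a> elements from the example at the top.
console.log(removeAttributes({ width: '100%', border: '0', cellspacing: '0', cellpadding: '0' }, attrToRemove)); // {}
console.log(removeAttributes({ href: 'foo.html' }, attrToRemove)); // { href: 'foo.html' }
```

Working on parsed attributes rather than raw markup sidesteps the usual regular-expression trouble with quoted values.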

### block-tags

Block-level element tags. Line breaks are added before and after these elements, and their nested content is indented.

Type: Array

Default: `['blockquote', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hr', 'p', 'table', 'td', 'tr']`
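
This minimal sketch shows the idea behind block tags. Block elements get their own lines and indent their children, while text and inline elements stay together on one line. It isn't how clean-html works internally. The `{ name, children }` node shape and the `renderBlock` helper are assumptions, and attributes and empty tags are left out to keep it short.

```javascript
// Default value of block-tags, as documented above.
var blockTags = ['blockquote', 'div', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6',
    'hr', 'p', 'table', 'td', 'tr'];

// Hypothetical node shape: a string is a text node, an object is an element
// of the form { name: 'p', children: [...] }.
function isBlock(node) {
    return typeof node === 'object' && blockTags.indexOf(node.name) !== -1;
}

// Text and inline elements are rendered on a single line.
function renderInline(node) {
    if (typeof node === 'string') {
        return node;
    }
    return '<' + node.name + '>' + node.children.map(renderInline).join('') +
        '</' + node.name + '>';
}

// A block element gets its own opening and closing lines. Each run of inline
// content between child blocks becomes one line, indented one level deeper.
function renderBlock(node, depth, indent) {
    var pad = indent.repeat(depth);
    var lines = [pad + '<' + node.name + '>'];
    var inlineRun = '';

    function flushInline() {
        if (inlineRun.trim() !== '') {
            lines.push(pad + indent + inlineRun.trim());
        }
        inlineRun = '';
    }

    node.children.forEach(function (child) {
        if (isBlock(child)) {
            flushInline();
            lines.push.apply(lines, renderBlock(child, depth + 1, indent));
        } else {
            inlineRun += renderInline(child);
        }
    });
    flushInline();

    lines.push(pad + '</' + node.name + '>');
    return lines;
}

var tree = { name: 'td', children: [
    { name: 'b', children: ['Articles:'] },
    { name: 'p', children: ['An ', { name: 'span', children: ['informative'] }, ' piece.'] }
] };

console.log(renderBlock(tree, 0, '  ').join('\n'));
// <td>
//   <b>Articles:</b>
//   <p>
//     An <span>informative</span> piece.
//   </p>
// </td>
```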

### break-around-comments

Adds line breaks before and after comments.

Type: Boolean

Default: `true`

### break-after-br

Adds a line break after each `<br>` tag.

Type: Boolean

Default: `true`

### empty-tags

Empty element tags, i.e. elements such as `<br>` that have no content and no closing tag.

Type: Array

Default: `['br', 'hr', 'img']`
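
Here's a minimal sketch of how a serializer might treat empty tags. It writes them as a single start tag with no content and no closing tag, which matches how `<br />` became `<br>` in the cleaned example. The `serializeElement` helper is hypothetical and not part of clean-html.

```javascript
// Default value of empty-tags, as documented above.
var emptyTags = ['br', 'hr', 'img'];

// Hypothetical serializer for one element (not part of clean-html).
// `attribs` is a plain { name: value } object; `innerHtml` is already serialized.
function serializeElement(name, attribs, innerHtml) {
    var attrs = Object.keys(attribs).map(function (key) {
        // Escape double quotes so the attribute value stays well-formed.
        return ' ' + key + '="' + String(attribs[key]).replace(/"/g, '&quot;') + '"';
    }).join('');

    if (emptyTags.indexOf(name.toLowerCase()) !== -1) {
        // Empty elements have no content and no closing tag.
        return '<' + name + attrs + '>';
    }
    return '<' + name + attrs + '>' + innerHtml + '</' + name + '>';
}

console.log(serializeElement('br', {}, ''));                     // <br>
console.log(serializeElement('a', { href: 'foo.html' }, 'Foo')); // <a href="foo.html">Foo</a>
```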

### indent

The string to use for indentation, e.g. a tab character or one or more spaces.

Type: String

Default: `'  '` (two spaces)

### remove-comments

Removes comments.

Type: Boolean

Default: `false`
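
For a rough picture of what removing comments means, here's a minimal sketch using a regular expression. It's only an illustration. It would also strip `<!--` sequences inside `<script>` blocks or attribute values, so a parser-based approach is more robust. The `stripComments` helper is hypothetical and not part of clean-html.

```javascript
// Hypothetical helper (not part of clean-html): deletes every <!-- ... --> block.
// The lazy [\s\S]*? stops at the first "-->", so comments that span lines are
// handled and two separate comments are never merged into one match.
function stripComments(html) {
    return html.replace(/<!--[\s\S]*?-->/g, '');
}

console.log(stripComments('<blockquote><!-- List articles --><p>Foo</p></blockquote>'));
// <blockquote><p>Foo</p></blockquote>
```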

### remove-empty-paras

Removes empty paragraph tags.

Type: Boolean

Default: `false`
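
Here's a minimal regular-expression sketch of removing empty paragraphs. It assumes a paragraph counts as empty when it holds only whitespace, and clean-html's own definition may differ. The `removeEmptyParas` helper is hypothetical.

```javascript
// Hypothetical helper (not part of clean-html): removes <p> elements, with or
// without attributes, whose content is empty or whitespace only.
// The \b keeps the pattern from matching tags such as <pre> or <param>.
function removeEmptyParas(html) {
    return html.replace(/<p\b[^>]*>\s*<\/p>/gi, '');
}

console.log(removeEmptyParas('<p>Keep me</p><p> </p><p class="x"></p>'));
// <p>Keep me</p>
```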

### replace-nbsp

Replaces non-breaking space entities (`&nbsp;`) with regular spaces.

Type: Boolean

Default: `false`
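
In string terms, this replacement amounts to something like the minimal sketch below. The `replaceNbsp` helper is hypothetical. It only handles the named `&nbsp;` entity, and whether clean-html also rewrites numeric forms such as `&#160;` isn't covered here.

```javascript
// Hypothetical helper (not part of clean-html): swaps each named &nbsp; entity
// for an ordinary space. Numeric forms such as &#160; are left alone here.
function replaceNbsp(html) {
    return html.replace(/&nbsp;/g, ' ');
}

console.log(replaceNbsp('A&nbsp;Horse&nbsp;Walked')); // A Horse Walked
```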

### tags-to-remove

Tags to remove from markup. Only the tags are removed and their content is kept, so `<FONT FACE="ARIAL">information</FONT>` in the example above becomes `information`.

Type: Array

Default: `['center', 'font']`
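
Here's a minimal sketch of removing a tag while keeping its content, working on a simple node tree. The `{ name, children }` node shape and the `unwrapTags` helper are assumptions made for illustration, not clean-html's internals.

```javascript
// Default value of tags-to-remove, as documented above.
var tagsToRemove = ['center', 'font'];

// Hypothetical node shape: a string is a text node, an object is an element
// of the form { name: 'p', children: [...] }. Attributes are omitted.
// Returns a new node list in which each element named in tagsToRemove
// (compared case-insensitively) is replaced by its own cleaned children.
function unwrapTags(nodes) {
    var result = [];
    nodes.forEach(function (node) {
        if (typeof node === 'string') {
            result.push(node);
            return;
        }
        var children = unwrapTags(node.children);
        if (tagsToRemove.indexOf(node.name.toLowerCase()) !== -1) {
            result.push.apply(result, children);
        } else {
            result.push({ name: node.name, children: children });
        }
    });
    return result;
}

var nodes = ['An informative piece of ', { name: 'FONT', children: ['information'] }, '.'];
console.log(unwrapTags(nodes).join('')); // An informative piece of information.
```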

## Adding values to option lists

These options extend the corresponding lists above rather than replacing them, so you can add values without repeating the defaults.

### add-attr-to-remove

Additional attributes to remove from markup.

Type: Array

Default: `null`

### add-block-tags

Additional block-level element tags.

Type: Array

Default: `null`

### add-empty-tags

Additional empty element tags.

Type: Array

Default: `null`

### add-tags-to-remove

Additional tags to remove from markup.

Type: Array

Default: `null`
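
Here's a minimal sketch of how a base list and its `add-` counterpart could combine into the list that is actually used. It assumes an explicitly set base option replaces its default and the `add-` values are appended to it. Check clean-html's source before relying on either behavior. `effectiveList` is a hypothetical helper.

```javascript
// Defaults for two of the list options, as documented above.
var defaults = {
    'empty-tags': ['br', 'hr', 'img'],
    'tags-to-remove': ['center', 'font']
};

// Hypothetical helper (not part of clean-html). Assumptions: a base option set
// by the caller replaces its default, and the matching add-* list is appended.
// A null add-* value (the documented default) falls back to an empty list.
function effectiveList(options, name) {
    var base = options[name] || defaults[name];
    var extra = options['add-' + name] || [];
    return base.concat(extra); // concat returns a new array; defaults stay intact
}

console.log(effectiveList({ 'add-tags-to-remove': ['span'] }, 'tags-to-remove'));
// [ 'center', 'font', 'span' ]
console.log(effectiveList({ 'tags-to-remove': ['u'], 'add-tags-to-remove': ['span'] }, 'tags-to-remove'));
// [ 'u', 'span' ]
```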