## HTML cleaner and beautifier

Do you have crappy HTML? I do!

```html
<table width="100%" border="0" cellspacing="0" cellpadding="0">
<tr>
<td height="31"><b>Currently we have these articles available:</b>

<blockquote>
<p><a href="foo.html">The History of Foo</a><br />
An <span color="red">informative</span> piece of <FONT FACE="ARIAL">information</FONT>.</p>
<p><a href="bar.html">A Horse Walked Into a Bar</a><br/> The bartender said
"Why the long face?"</p>
</blockquote>
</td>
</tr>
</table>
```

Just look at those blank lines, random line breaks, trailing spaces, mixed tabs and deprecated tags. It's outrageous!

Let's clean it up...

```bash
$ npm install clean-html
```

```javascript
var cleaner = require('clean-html'),
    fs = require('fs'),
    file = process.argv[2]; // path to the HTML file to clean

// Read the file as a raw Buffer and print the cleaned markup to stdout.
fs.readFile(file, function (err, data) {
    if (err) {
        throw err; // e.g. the file does not exist
    }
    process.stdout.write(cleaner.clean(data) + '\n');
});
```

Sanity restored!

```html
<table>
    <tr>
        <td>
            <b>Currently we have these articles available:</b>
            <blockquote>
                <p>
                    <a href="foo.html">The History of Foo</a><br>
                    An <span>informative</span> piece of information.
                </p>
                <p>
                    <a href="bar.html">A Horse Walked Into a Bar</a><br>
                    The bartender said "Why the long face?"
                </p>
            </blockquote>
        </td>
    </tr>
</table>
```

## Options

### attr-to-remove

Attributes to remove from markup.

Type: Array
Default: `['align', 'valign', 'bgcolor', 'color', 'width', 'height', 'border', 'cellpadding', 'cellspacing']`

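To make the effect of this option concrete, here is a minimal sketch in plain JavaScript. `removeAttributes` and `DEFAULT_ATTR_TO_REMOVE` are hypothetical names, not part of clean-html, and the sketch matches start tags with regular expressions instead of parsing the document, so it assumes no attribute value contains `>`.

```javascript
// Hypothetical sketch, not clean-html's implementation: drop the listed
// attributes from every start tag. Assumes no attribute value contains ">".
const DEFAULT_ATTR_TO_REMOVE = ['align', 'valign', 'bgcolor', 'color', 'width',
  'height', 'border', 'cellpadding', 'cellspacing'];

function removeAttributes(html, names = DEFAULT_ATTR_TO_REMOVE) {
  const drop = new Set(names.map((name) => name.toLowerCase()));
  // Start tags only: "<" followed by a letter, so end tags and comments are skipped.
  return html.replace(/<([a-zA-Z][\w-]*)([^>]*)>/g, (tag, tagName, attrs) => {
    const selfClosing = /\/\s*$/.test(attrs); // keep "<br />" as "<br />"
    const kept = [];
    // name="value", name='value', name=value or a bare name
    const attrRe = /([^\s=\/>]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s>]+))?/g;
    let match;
    while ((match = attrRe.exec(attrs)) !== null) {
      if (!drop.has(match[1].toLowerCase())) kept.push(match[0]);
    }
    return '<' + tagName + (kept.length ? ' ' + kept.join(' ') : '') +
      (selfClosing ? ' /' : '') + '>';
  });
}

console.log(removeAttributes('<td height="31"><span color="red">informative</span></td>'));
// <td><span>informative</span></td>
```

A real cleaner would work on a parsed document, which also copes with quoted `>` characters and malformed markup that this regex approach gets wrong.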
### block-tags

Block-level element tags. Line breaks are added before and after these tags, and their nested content is indented. Note: this option has no effect unless `pretty` is enabled.

Type: Array
Default: `['div', 'p', 'table', 'tr', 'td', 'blockquote', 'hr']`

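The idea behind block tags can be sketched in a few lines: each block tag goes on its own line, and everything between a block tag's start and end is indented one level. `prettyPrint`, the four-space indent and the regex tokenizer are assumptions for illustration, not clean-html's API or algorithm. Inline content (text and tags such as `<b>` or `<span>`) is kept together on one line with its whitespace collapsed.

```javascript
// Hypothetical sketch of block-tag pretty printing, not clean-html's algorithm.
const DEFAULT_BLOCK_TAGS = ['div', 'p', 'table', 'tr', 'td', 'blockquote', 'hr'];
const VOID_TAGS = new Set(['br', 'hr', 'img']); // never have an end tag

function prettyPrint(html, blockTags = DEFAULT_BLOCK_TAGS, indent = '    ') {
  const blocks = new Set(blockTags);
  const lines = [];
  let depth = 0;
  let inline = ''; // text and inline tags waiting to be written

  // Write pending inline content as one line, collapsing runs of whitespace.
  const flush = () => {
    const text = inline.replace(/\s+/g, ' ').trim();
    if (text) lines.push(indent.repeat(depth) + text);
    inline = '';
  };

  // The capturing group keeps each tag in the split result, between text runs.
  for (const token of html.split(/(<[^>]+>)/)) {
    const tag = /^<(\/?)([a-zA-Z][\w-]*)/.exec(token);
    const name = tag ? tag[2].toLowerCase() : '';
    if (!blocks.has(name)) {
      inline += token;
      continue;
    }
    flush();
    if (tag[1]) {
      // End tag: step back out one level before writing it.
      depth = Math.max(depth - 1, 0);
      lines.push(indent.repeat(depth) + token);
    } else {
      lines.push(indent.repeat(depth) + token);
      if (!VOID_TAGS.has(name)) depth++; // <hr> has no content to indent
    }
  }
  flush();
  return lines.join('\n');
}

console.log(prettyPrint('<table><tr><td><b>Articles:</b></td></tr></table>'));
// Prints:
// <table>
//     <tr>
//         <td>
//             <b>Articles:</b>
//         </td>
//     </tr>
// </table>
```

In this sketch, turning pretty printing off would simply mean never calling `prettyPrint`, which is why the list of block tags alone changes nothing.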
### empty-tags

Empty element tags, i.e. tags that have no content and no end tag. Their trailing slashes are removed, so `<br />` becomes `<br>`.

Type: Array
Default: `['br', 'hr', 'img']`

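A minimal sketch of the trailing-slash cleanup, assuming a regular expression is good enough for the input. `removeTrailingSlashes` is a hypothetical helper, not part of clean-html.

```javascript
// Hypothetical sketch: turn <br />, <br/> and <img src="a.png" /> into
// <br>, <br> and <img src="a.png"> for the listed empty element tags.
const DEFAULT_EMPTY_TAGS = ['br', 'hr', 'img'];

function removeTrailingSlashes(html, emptyTags = DEFAULT_EMPTY_TAGS) {
  // Escape regex metacharacters so tag names are matched literally.
  const names = emptyTags.map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // Tag name, its attributes (matched lazily), optional whitespace, then "/>".
  const re = new RegExp('<(' + names.join('|') + ')\\b([^>]*?)\\s*/>', 'gi');
  return html.replace(re, '<$1$2>');
}

console.log(removeTrailingSlashes('<a href="foo.html">Foo</a><br />'));
// <a href="foo.html">Foo</a><br>
```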
### encoding

The encoding of the input file. Set this when the input is not UTF-8 so that its contents are properly converted to UTF-8.

Type: String
Default: `utf-8`

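For reference, decoding bytes in a legacy encoding into a JavaScript string (which Node writes back out as UTF-8 by default) can be done with the standard `TextDecoder` API. This is a minimal sketch, not necessarily how clean-html does it. `toUtf8String` is a hypothetical helper, and encodings other than UTF-8 assume a Node build with full ICU, which is the default for official builds.

```javascript
// Hypothetical sketch: decode raw bytes in `encoding` into a JS string.
// Writing that string out with Node's default encoding yields UTF-8.
function toUtf8String(buffer, encoding = 'utf-8') {
  // TextDecoder throws a RangeError for encoding labels it does not know.
  return new TextDecoder(encoding).decode(buffer);
}

// "café" as a single-byte windows-1252 file: 0xE9 is "é".
const legacyBytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);
const text = toUtf8String(legacyBytes, 'windows-1252');
console.log(text); // café
console.log(Buffer.from(text, 'utf8')); // <Buffer 63 61 66 c3 a9>
```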
### pretty

Pretty prints the output by adding line breaks and indentation.

Type: Boolean
Default: `true`

### remove-comments

Removes HTML comments (`<!-- ... -->`) from the output.

Type: Boolean
Default: `false`

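A minimal regex-based sketch of comment removal. `removeComments` is hypothetical, not clean-html's API, and unlike a real parser it would also strip comment-like text inside `<script>` or `<textarea>` contents.

```javascript
// Hypothetical sketch: strip every <!-- ... --> comment.
// [\s\S]*? is non-greedy and also matches newlines, so each comment is
// removed on its own and multi-line comments are handled.
function removeComments(html) {
  return html.replace(/<!--[\s\S]*?-->/g, '');
}

console.log(removeComments('<p>Foo<!-- TODO: fix -->bar</p>')); // <p>Foobar</p>
```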
### tags-to-remove

Tags to remove from markup. Only the tags themselves are removed; their content is kept, as with the `font` tag in the example above.

Type: Array
Default: `['font']`

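A minimal regex sketch of removing a tag while keeping what it wraps. `removeTags` is a hypothetical helper, not part of clean-html, and it assumes no attribute value contains `>`.

```javascript
// Hypothetical sketch: remove the listed tags but keep what they wrap,
// e.g. <FONT FACE="ARIAL">information</FONT> becomes "information".
const DEFAULT_TAGS_TO_REMOVE = ['font'];

function removeTags(html, tags = DEFAULT_TAGS_TO_REMOVE) {
  const names = tags.map((t) => t.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  // Matches start and end tags of any listed name, case-insensitively.
  // \b stops "font" from matching a longer name such as <fontx>.
  const re = new RegExp('</?(?:' + names.join('|') + ')\\b[^>]*>', 'gi');
  return html.replace(re, '');
}

console.log(removeTags('An <span>informative</span> piece of <FONT FACE="ARIAL">information</FONT>.'));
// An <span>informative</span> piece of information.
```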
## Adding values to option lists

These options are added for your convenience: each one appends values to the matching list option above instead of replacing its defaults.

### add-attr-to-remove

Additional attributes to remove from markup.

Type: Array
Default: `null`

### add-block-tags

Additional block-level element tags.

Type: Array
Default: `null`

### add-empty-tags

Additional empty element tags.

Type: Array
Default: `null`

### add-tags-to-remove

Additional tags to remove from markup.

Type: Array
Default: `null`

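One way to picture how the `add-*` options combine with the lists above: a base list (the default, or one you set) has the `add-*` values appended to it. The sketch below assumes that setting a list option replaces its default outright; `resolveLists` and `DEFAULT_LISTS` are hypothetical names, not clean-html's API.

```javascript
// Hypothetical sketch: merge list options with their add-* counterparts.
// Assumes an explicit list option replaces the default list.
const DEFAULT_LISTS = {
  'attr-to-remove': ['align', 'valign', 'bgcolor', 'color', 'width', 'height',
    'border', 'cellpadding', 'cellspacing'],
  'block-tags': ['div', 'p', 'table', 'tr', 'td', 'blockquote', 'hr'],
  'empty-tags': ['br', 'hr', 'img'],
  'tags-to-remove': ['font'],
};

function resolveLists(options = {}) {
  const resolved = {};
  for (const [name, defaults] of Object.entries(DEFAULT_LISTS)) {
    const base = options[name] || defaults;
    const extra = options['add-' + name] || []; // null or missing: nothing to add
    // A Set drops duplicates while keeping first-seen order.
    resolved[name] = [...new Set(base.concat(extra))];
  }
  return resolved;
}

const lists = resolveLists({ 'add-tags-to-remove': ['center'], 'add-block-tags': ['ul', 'li'] });
console.log(lists['tags-to-remove']); // [ 'font', 'center' ]
console.log(lists['block-tags'].includes('li')); // true
```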