# node-html-to-text

An advanced converter that parses HTML and returns beautiful text. It was mainly designed to transform HTML e-mail templates into a text representation, so it is currently optimized for table layouts.

### Features:

 * Transform headlines to uppercase text.
 * Convert tables to an appropriate text representation with rows and columns.
 * Word wrapping for paragraphs (default 80 chars).
 * Automatic extraction of href information from links.
 * `<br>` conversion to `\n`.

## Installation

```
npm install html-to-text
```

To use it as a command line interface, install it globally:

```
npm install html-to-text -g
```

## Usage
You can read from a file via:

```
var path = require('path');
var htmlToText = require('html-to-text');

// Convert test.html; only tables matching these selectors are formatted as tables.
htmlToText.fromFile(path.join(__dirname, 'test.html'), {
  tables: ['#invoice', '.address']
}, function(err, text) {
  if (err) return console.error(err);
  console.log(text);
});
```

or directly from a string:

```
var htmlToText = require('html-to-text');

var text = htmlToText.fromString('<h1>Hello World</h1>', {
  wordwrap: 130
});
console.log(text);
```

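To give a feel for what a setting such as `wordwrap: 130` does, here is a minimal sketch of greedy word wrapping. The `wrap` function is a hypothetical helper written for illustration only; it is not part of html-to-text, and the library's actual wrapping may differ in details such as whitespace handling.

```javascript
// Hypothetical helper, NOT part of html-to-text: greedy word wrapping.
// A falsy width (null or false) disables wrapping, as the wordwrap option describes.
function wrap(text, width) {
  if (!width) {
    return text;
  }
  var lines = [];
  var line = '';
  text.split(/\s+/).forEach(function (word) {
    if (!word) {
      return; // skip empty pieces produced by leading or trailing whitespace
    }
    if (line && line.length + 1 + word.length > width) {
      // The word does not fit on the current line, so start a new one.
      lines.push(line);
      line = word;
    } else {
      line = line ? line + ' ' + word : word;
    }
  });
  if (line) {
    lines.push(line);
  }
  return lines.join('\n');
}

console.log(wrap('At vero eos et accusam et justo duo dolores et ea rebum.', 20));
```

A word longer than `width` is placed on a line of its own rather than being split, which is one possible design choice among several.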
### Options:

You can configure the behaviour of html-to-text with the following options:

 * `tables` selects certain tables from the HTML document by their `class` or `id` attribute. This is necessary because the majority of HTML e-mails use a table-based layout. Prefix a selector with `.` for a `class` and with `#` for an `id`. All other tables are treated as layout and are not rendered as tables. Assign `true` to select all tables. Default: `[]`
 * `wordwrap` defines after how many chars a line break should follow in `p` elements. Set to `null` or `false` to disable word-wrapping. Default: `80`
 * `linkHrefBaseUrl` specifies the server host for href attributes that start at the root (`/`). For example, with `linkHrefBaseUrl = 'http://asdf.com'` and `<a href='/dir/subdir'>...</a>`, the link in the text will be `http://asdf.com/dir/subdir`. Keep in mind that `linkHrefBaseUrl` shouldn't end with a `/`.
 * `hideLinkHrefIfSameAsText`: by default, links are translated as follows: `<a href='link'>text</a>` becomes `text [link]`. If this option is set to `true` and `link` and `text` are the same, `[link]` is hidden and only `text` is visible.
 * `ignoreHref` ignores all document links if `true`.
 * `ignoreImage` ignores all document images if `true`.
 * `preserveNewlines`: by default, any newlines `\n` in a block of text are removed. If `true`, these newlines are kept.

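The link-related options above interact with each other, so a small sketch may help. The `formatLink` function below is a hypothetical helper, not part of html-to-text; it only mirrors the rules as described in the list. Comparing the text against the resolved link (after the base URL is applied) is an assumption of this sketch.

```javascript
// Hypothetical helper, NOT part of html-to-text: mirrors the documented link rules.
function formatLink(text, href, options) {
  options = options || {};
  if (options.ignoreHref) {
    return text; // links are dropped entirely, only the text remains
  }
  var link = href;
  // Root-relative hrefs get the base URL prepended (base must not end with '/').
  if (options.linkHrefBaseUrl && href.charAt(0) === '/') {
    link = options.linkHrefBaseUrl + href;
  }
  // Assumption: the comparison uses the resolved link.
  if (options.hideLinkHrefIfSameAsText && link === text) {
    return text;
  }
  return text + ' [' + link + ']';
}

console.log(formatLink('Docs', '/dir/subdir', { linkHrefBaseUrl: 'http://asdf.com' }));
// prints: Docs [http://asdf.com/dir/subdir]
```

Because the base URL is joined by plain concatenation in this sketch, a trailing `/` on `linkHrefBaseUrl` would produce a double slash, which is consistent with the advice above.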
## Command Line Interface

html-to-text can also be used as a command line interface. This allows easy validation of your generated text and integration with other systems that do not run on node.js.

`html-to-text` reads its input from `stdin` and writes its output to `stdout`, so you can use it the following way:

```
cat example/test.html | html-to-text > test.txt
```

All options described above are also available. You can use them like this:

```
cat example/test.html | html-to-text --tables=#invoice,.address --wordwrap=100 > test.txt
```

The `tables` option has to be declared as a comma-separated list without whitespace.

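As an illustration of why the list must not contain whitespace, the following sketch splits a `--tables` value into `id` and `class` selectors. `parseTableSelectors` is a hypothetical function written for this example; how the command line tool actually parses the value is not shown in this document.

```javascript
// Hypothetical helper, NOT part of html-to-text: splits a comma-separated
// selector list into ids ('#' prefix) and classes ('.' prefix).
function parseTableSelectors(value) {
  var selectors = { ids: [], classes: [] };
  value.split(',').forEach(function (selector) {
    if (selector.charAt(0) === '#') {
      selectors.ids.push(selector.slice(1));
    } else if (selector.charAt(0) === '.') {
      selectors.classes.push(selector.slice(1));
    }
    // Anything else (for example ' .address' with a leading space) is skipped.
  });
  return selectors;
}

console.log(parseTableSelectors('#invoice,.address'));
```

In this sketch a value such as `'#invoice, .address'` loses its second selector, because the entry starts with a space instead of `.` or `#`. On a shell, unquoted whitespace would additionally split the argument in two.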
## Example

```
<html>
  <head>
    <meta charset="utf-8">
  </head>

  <body>
    <table cellpadding="0" cellspacing="0" border="0">
      <tr>
        <td>
          <h2>Paragraphs</h2>
          <p class="normal-space">At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. <a href="www.github.com">Github</a>
          </p>
          <p class="normal-space">At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
          </p>
        </td>
        <td></td>
      </tr>
      <tr>
        <td>
          <hr/>
          <h2>Pretty printed table</h2>
          <table id="invoice">
            <thead>
              <tr>
                <th>Article</th>
                <th>Price</th>
                <th>Taxes</th>
                <th>Amount</th>
                <th>Total</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>
                  <p>
                    Product 1<br />
                    <span style="font-size:0.8em">Contains: 1x Product 1</span>
                  </p>
                </td>
                <td align="right" valign="top">6,99&euro;</td>
                <td align="right" valign="top">7%</td>
                <td align="right" valign="top">1</td>
                <td align="right" valign="top">6,99€</td>
              </tr>
              <tr>
                <td>Shipment costs</td>
                <td align="right">3,25€</td>
                <td align="right">7%</td>
                <td align="right">1</td>
                <td align="right">3,25€</td>
              </tr>
            </tbody>
            <tfoot>
              <tr>
                <td>&nbsp;</td>
                <td>&nbsp;</td>
                <td colspan="3">to pay: 10,24€</td>
              </tr>
              <tr>
                <td></td>
                <td></td>
                <td colspan="3">Taxes 7%: 0,72€</td>
              </tr>
            </tfoot>
          </table>

        </td>
        <td></td>
      </tr>
      <tr>
        <td>
          <hr/>
          <h2>Lists</h2>
          <ul>
            <li>At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.</li>
            <li>At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.</li>
          </ul>
          <ol>
            <li>At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.</li>
            <li>At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.</li>
          </ol>
        </td>
      </tr>
      <tr>
        <td>
          <hr />
          <h2>Column Layout with tables</h2>
          <table class="address">
            <tr>
              <th align="left">Invoice Address</th>
              <th align="left">Shipment Address</th>
            </tr>
            <tr>
              <td align="left">
                <p>
                  Mr.<br/>
                  John Doe<br/>
                  Featherstone Street 49<br/>
                  28199 Bremen<br/>
                </p>
              </td>
              <td align="left">
                <p>
                  Mr.<br/>
                  John Doe<br/>
                  Featherstone Street 49<br/>
                  28199 Bremen<br/>
                </p>
              </td>
            </tr>
          </table>
        </td>
        <td></td>
      </tr>
      <tr>
        <td>
          <hr/>
          <h2>Mailto formating</h2>
          <p class="normal-space small">
            Some Company<br />
            Some Street 42<br />
            Somewhere<br />
            E-Mail: <a href="mailto:test@example.com">Click here</a>
          </p>
        </td>
      </tr>
    </table>
  </body>
</html>
```

Gets converted to:

```
PARAGRAPHS
At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd
gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum
dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor
invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos
et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea
takimata sanctus est Lorem ipsum dolor sit amet. Github [www.github.com]

At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd
gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet. Lorem ipsum
dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor
invidunt ut labore et dolore magna aliquyam erat, sed diam voluptua. At vero eos
et accusam et justo duo dolores et ea rebum. Stet clita kasd gubergren, no sea
takimata sanctus est Lorem ipsum dolor sit amet.

--------------------------------------------------------------------------------

PRETTY PRINTED TABLE
ARTICLE                  PRICE   TAXES   AMOUNT   TOTAL
Product 1                6,99€   7%      1        6,99€
Contains: 1x Product 1
Shipment costs           3,25€   7%      1        3,25€
                                 to pay: 10,24€
                                 Taxes 7%: 0,72€

--------------------------------------------------------------------------------

LISTS
 * At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd
   gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
 * At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd
   gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

 1. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd
    gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.
 2. At vero eos et accusam et justo duo dolores et ea rebum. Stet clita kasd
    gubergren, no sea takimata sanctus est Lorem ipsum dolor sit amet.

--------------------------------------------------------------------------------

COLUMN LAYOUT WITH TABLES
INVOICE ADDRESS          SHIPMENT ADDRESS
Mr.                      Mr.
John Doe                 John Doe
Featherstone Street 49   Featherstone Street 49
28199 Bremen             28199 Bremen

--------------------------------------------------------------------------------

MAILTO FORMATING
Some Company
Some Street 42
Somewhere
E-Mail: Click here [test@example.com]
```

## License

(The MIT License)

Copyright (c) 2015 werk85 &lt;legenhausen@werk85.de&gt;

Permission is hereby granted, free of charge, to any person obtaining
a copy of this software and associated documentation files (the
'Software'), to deal in the Software without restriction, including
without limitation the rights to use, copy, modify, merge, publish,
distribute, sublicense, and/or sell copies of the Software, and to
permit persons to whom the Software is furnished to do so, subject to
the following conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

[![githalytics.com alpha](https://cruel-carlota.pagodabox.com/cc03826a7e68e1bb680cf2226276e031 "githalytics.com")](http://githalytics.com/werk85/node-html-to-text)