# Changelog

## Version 8.2.0

Fix for issue [#249](https://github.com/html-to-text/node-html-to-text/issues/249) and possibly other obscure issues where some selector options were ignored. The `options.selectors` array was not fully processed before.

## Version 8.1.1

Bump `minimist` dependency, regenerate `package-lock.json`.

## Version 8.1.0

* Fix for too many newlines in certain cases when the `preserveNewlines` option is used. Addresses [#232](https://github.com/html-to-text/node-html-to-text/issues/232);
* Link and image formatters now have a `linkBrackets` option. It accepts an array of two strings (default: `['[', ']']`) or `false` to remove the brackets. Addresses [#236](https://github.com/html-to-text/node-html-to-text/issues/236);
* `noLinkBrackets` formatters option is now deprecated.
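
As a rough illustration of the `linkBrackets` option listed above, the option objects below sketch how it might be set through the `selectors` array introduced in version 8.0.0. The variable names and the example URL are made up for this sketch.

```javascript
// Hypothetical option objects; only `linkBrackets`, `noLinkBrackets` and
// `selectors` are named in this changelog.

// Wrap link addresses in angle brackets instead of the default ['[', ']'].
const angleBracketLinks = {
  selectors: [
    { selector: 'a', options: { linkBrackets: ['<', '>'] } }
  ]
};

// Remove the brackets entirely for both links and images
// (the replacement for the deprecated `noLinkBrackets` option).
const bareLinks = {
  selectors: [
    { selector: 'a', options: { linkBrackets: false } },
    { selector: 'img', options: { linkBrackets: false } }
  ]
};

// Usage, assuming the package is installed:
//   const { convert } = require('html-to-text');
//   const text = convert('<a href="https://example.com">Example</a>', bareLinks);
```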

All commits: [8.0.0...8.1.0](https://github.com/html-to-text/node-html-to-text/compare/8.0.0...8.1.0)

## Version 8.0.0

All commits: [7.1.1...8.0.0](https://github.com/html-to-text/node-html-to-text/compare/7.1.1...8.0.0)

Version 8 roadmap issue: [#228](https://github.com/html-to-text/node-html-to-text/issues/228)

### Selectors

The main focus of this version. Addresses the most demanded user requests ([#159](https://github.com/html-to-text/node-html-to-text/issues/159), [#179](https://github.com/html-to-text/node-html-to-text/issues/179), partially [#143](https://github.com/html-to-text/node-html-to-text/issues/143)).

It is now possible to specify formatting options or assign custom formatters not only by tag names but by almost any selector.

See the README [Selectors](https://github.com/html-to-text/node-html-to-text#selectors) section for details.

Note: The new `selectors` option is an array, in contrast to the `tags` option introduced in version 6 (and now deprecated). Selectors need a well-defined order, and object properties are not the right tool for that.
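
To make the difference between the two shapes concrete, here is a minimal migration sketch. The helper `tagsToSelectors` is hypothetical (it is not part of the library), and the sample tag options are only for illustration; the README remains the reference for the actual migration steps.

```javascript
// Deprecated shape: an object keyed by tag name, so the order is implicit.
const legacyOptions = {
  tags: {
    h1: { options: { uppercase: false } },
    a: { options: { ignoreHref: true } }
  }
};

// Hypothetical helper: turn each `tags` entry into an item of an ordered
// `selectors` array, keeping its `format` and `options` as they are.
function tagsToSelectors(tags) {
  return Object.entries(tags).map(([tagName, definition]) => ({
    selector: tagName,
    ...definition
  }));
}

const selectorOptions = {
  selectors: [
    ...tagsToSelectors(legacyOptions.tags),
    // Something `tags` could not express: a class selector.
    { selector: 'a.internal', format: 'skip' }
  ]
};
```

Because the result is an array, the position of each entry is explicit.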

Two new packages were created to enable this feature - [parseley](https://github.com/mxxii/parseley) and [selderee](https://github.com/mxxii/selderee).

### Base elements

The same selectors implementation is now used to narrow down the conversion to specific HTML DOM fragments. Addresses [#96](https://github.com/html-to-text/node-html-to-text/issues/96). (The previous implementation had a more limited selector format.)

BREAKING CHANGE: All outermost elements matching the provided selectors will be present in the output (previously it was only the first match for each selector). Addresses [#215](https://github.com/html-to-text/node-html-to-text/issues/215).

`limits.maxBaseElements` can be used when you only need a fixed number of base elements and would like to avoid checking the rest of the source HTML document.

Base elements can be arranged in the output text in the order of matched selectors (default, to keep it closer to the old implementation) or in the order of appearance in the source HTML document.

BREAKING CHANGE: The previous implementation treated id selectors the same way as class selectors (it could match `<foo id="a b">` with the `foo#a` selector). The new implementation is closer to the spec and doesn't expect multiple ids on an element. You can get the old behavior with the `foo[id~=a]` selector in case you rely on it for some poorly formatted documents (note that it has a different specificity though).
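
The following sketch shows how these settings might look together. The `baseElements.selectors` and `baseElements.orderBy` names are taken from the README of this version rather than from this changelog, so treat them as assumptions; the selector strings and the two matcher functions are made up purely to illustrate the id selector change and are not the library's code.

```javascript
// Hypothetical options: convert only the post and the comments, keep them in
// document order, and stop looking after two base elements are found.
const options = {
  baseElements: {
    selectors: ['article.post', 'div#comments'],
    orderBy: 'occurrence' // the default is 'selectors'
  },
  limits: { maxBaseElements: 2 }
};

// `foo#a` in the new implementation: the whole id attribute must equal "a".
const matchesIdSelector = (idAttr, id) => idAttr === id;

// `foo[id~=a]`: the id attribute is treated as a whitespace-separated list.
const matchesIdTokenSelector = (idAttr, token) =>
  idAttr.split(/\s+/).includes(token);
```

With `<foo id="a b">`, only the second matcher accepts `a`, which mirrors the old behavior.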

### Batch processing

Since options preprocessing is getting more involved with selectors compilation, it seemed reasonable to break the single `htmlToText()` function into compilation and conversion steps. It might provide some performance benefits in client code.

* new function `compile(options)` returns a function of a single argument (html string);
* `htmlToText(html, options)` is now an alias to `convert(html, options)` function and works as before.
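
A minimal sketch of the batch pattern this enables is shown below. The helper `convertAll` is hypothetical; `compile` is passed in as an argument so the sketch does not depend on how the package is loaded.

```javascript
// Hypothetical helper: preprocess the options once, then reuse the resulting
// function for every document instead of calling convert(html, options)
// (which repeats the preprocessing) for each one.
function convertAll(compile, options, htmlDocuments) {
  const convertOne = compile(options); // compilation step, done once
  return htmlDocuments.map((html) => convertOne(html)); // conversion step
}

// Usage, assuming the package is installed:
//   const { compile } = require('html-to-text');
//   const texts = convertAll(compile, { wordwrap: 80 }, ['<p>One</p>', '<p>Two</p>']);
```

Whether this is measurably faster depends on how complex the options are and how many documents are converted.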

### Deprecated options

* `baseElement`;
* `returnDomByDefault`;
* `tables`;
* `tags`.

Refer to README for [migration instructions](https://github.com/html-to-text/node-html-to-text#deprecated-or-removed-options).

No previously deprecated stuff is removed in this version. Significant cleanup is planned for version 9 instead.

## Version 7.1.1

Regenerate `package-lock.json`.

## Version 7.1.0

### Dependency updates

* `htmlparser2` updated from 6.0.0 to 6.1.0 ([Release notes](https://github.com/fb55/htmlparser2/releases));
* dev dependencies are bumped.

## Version 7.0.0

### Node version

Required Node version is now >=10.23.2.

### Dependency updates

* `lodash` dependency is removed;
* `htmlparser2` updated from 4.1.0 to 6.0.0 ([Release notes](https://github.com/fb55/htmlparser2/releases), also [domhandler](https://github.com/fb55/domhandler/releases/tag/v4.0.0)). There is a slim chance you can run into some differences in case you're relying on it heavily in your custom formatters;
* dev dependencies are bumped.

### Custom formatters API change

[BlockTextBuilder](https://github.com/html-to-text/node-html-to-text/blob/master/lib/block-text-builder.js) methods now accept option objects for optional arguments. This improves client code readability and makes it easy to introduce extra options. It will see some use in future updates.

Positional arguments introduced in version 6.0.0 are now deprecated. Formatters written for version 6.0.0 should keep working for now, but the compatibility layer is rather inconvenient and will be removed with the next major version.

See the commit [f50f10f](https://github.com/html-to-text/node-html-to-text/commit/f50f10f54cf814efb2f7633d9d377ba7eadeaf1e). Changes in the `lib/formatter.js` file illustrate how to migrate to the new API.
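
For orientation, a custom formatter using the option-object style might look like the sketch below. The formatter name `exclaimBlockFormatter` is made up, and the method and option names (`openBlock`, `closeBlock`, `addInline`, `leadingLineBreaks`, `trailingLineBreaks`) follow the README example of this version; check them against the linked commit before relying on them.

```javascript
// Hypothetical block-level formatter that appends "!" to the element's text.
function exclaimBlockFormatter(elem, walk, builder, formatOptions) {
  // Version 6 style (now deprecated) passed the value positionally:
  //   builder.openBlock(formatOptions.leadingLineBreaks || 1);
  builder.openBlock({ leadingLineBreaks: formatOptions.leadingLineBreaks || 1 });
  walk(elem.children, builder); // format the child nodes as usual
  builder.addInline('!');
  builder.closeBlock({ trailingLineBreaks: formatOptions.trailingLineBreaks || 1 });
}

// Registration, assuming version 7 option names:
//   htmlToText(html, {
//     formatters: { exclaimBlock: exclaimBlockFormatter },
//     tags: { foo: { format: 'exclaimBlock', options: { leadingLineBreaks: 2 } } }
//   });
```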

### And more

* A bunch of documentation and test updates.

All commits: [6.0.0...7.0.0](https://github.com/html-to-text/node-html-to-text/compare/6.0.0...7.0.0)

Version 7 roadmap issue: [#222](https://github.com/html-to-text/node-html-to-text/issues/222)

## Version 6.0.0

This is a major update. No code left untouched. While the goal was to keep as much compatibility as possible, some client-facing changes were unavoidable.

### fromString() is deprecated in favor of htmlToText()

Since the library exports only one function, that function is now self-titled.

### Inline and block-level tags, HTML whitespace

Formatting code was rewritten almost entirely to make it aware of block-level tags and to handle HTML whitespace properly. One of the most popular requests was to support divs, and it is here now, after a lot of effort.

### Options reorganized

Options are reorganized to make room for some extra format options while making everything more structured. Tag-specific options now live within that tag's configuration.

For the majority of changed options there is a compatibility layer that will remain until the next major release. But you are encouraged to explore the new options since they provide a bit more flexibility.

### Custom formatters are different now

Because formatters are an integral part of the formatting code (as the name suggests), it wasn't possible to provide a compatibility layer.

Please refer to the README to see how things are wired now, in case you were using them for anything other than dealing with the lack of block-level tags support.

### Tables support was improved

Cells can make use of extra space with colspan and rowspan attributes. Max column width is defined separately from the global wordwrap limit.

### Limits

Multiple options to cut content in large HTML documents.

By default, any input longer than 16 million characters will be truncated.
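
As a hedged sketch, a `limits` configuration might look like the following. The option names (`maxInputLength`, `maxDepth`, `maxChildNodes`, `ellipsis`) are taken from the README rather than from this changelog, so treat them as assumptions, and the values are arbitrary. The constant assumes the "16 million" default corresponds to `1 << 24`.

```javascript
// Assumed default input limit: 1 << 24 characters (about 16 million).
const ASSUMED_DEFAULT_MAX_INPUT_LENGTH = 1 << 24;

// Hypothetical stricter limits for untrusted or very large documents.
const limitedOptions = {
  limits: {
    maxInputLength: 1 << 20, // cut the input after about 1 million characters
    maxDepth: 10,            // stop descending into deeply nested elements
    maxChildNodes: 100,      // cap the number of children handled per node
    ellipsis: '...'          // marker inserted where content was cut
  }
};

// Usage, assuming the package is installed:
//   const { htmlToText } = require('html-to-text');
//   const text = htmlToText(hugeHtmlString, limitedOptions);
```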

### Node and dependencies

Required Node version is now >=8.10.0.

Dependency versions are bumped.

### Repository is moved to its own organization

[https://github.com/html-to-text/node-html-to-text](https://github.com/html-to-text/node-html-to-text) is the new home.

GitHub should handle all redirects from the old url, so it shouldn't break anything, even if you have a local fork pointing at the old origin. But it is still a good idea to [update](https://docs.github.com/en/free-pro-team@latest/github/using-git/changing-a-remotes-url) the url.

### And more

Version 6 roadmap issue: [#200](https://github.com/html-to-text/node-html-to-text/issues/200)

## Version 5.1.1

* `preserveNewLines` whitespace issue fixed [#162](https://github.com/html-to-text/node-html-to-text/pull/162)

## Version 5.1.0

* Hard-coded CLI options removed [#173](https://github.com/html-to-text/node-html-to-text/pull/173)

## Version 5.0.0

### BREAKING CHANGES

#### fromFile removed

The function `fromFile` is removed. It was the main reason `html-to-text` could not be used in the browser [#164](https://github.com/html-to-text/node-html-to-text/pull/164).

You can get the `fromFile` functionality back by using the following code:

```js
const fs = require('fs');
const { fromString } = require('html-to-text');

// Callback version; `options` may be omitted.
const fromFile = (file, options, callback) => {
  if (!callback) {
    callback = options;
    options = {};
  }
  fs.readFile(file, 'utf8', (err, str) => {
    if (err) return callback(err);
    callback(null, fromString(str, options));
  });
};

// Promise version (named differently here so it can coexist with the callback version)
const fromFileAsync = (file, options) => fs.promises.readFile(file, 'utf8').then(html => fromString(html, options));

// Sync version
const fromFileSync = (file, options) => fromString(fs.readFileSync(file, 'utf8'), options);
```

#### Supported NodeJS Versions

Node versions < 6 are no longer supported.

## Version 4.0.0

* Support dropped for node version < 4.
* New option `unorderedListItemPrefix` added.
* HTML entities in links are not supported.

## Version 3.3.0

* Ability to pass custom formatting via the `format` option #128
* Enhanced support for alpha ordered list types added #123

## Version 3.2.0

* Basic support for alpha ordered list types added #122
  * This includes support for the `ol` type values `1`, `a` and `A`

## Version 3.1.0

* Support for the ordered list start attribute added #117
* Option to format paragraph with single new line #112
* `noLinkBrackets` option added #119

## Version 3.0.0

* Switched from `htmlparser` to `htmlparser2` #113
* Treat non-numeric colspans as zero and handle them gracefully #105

## Version 2.1.1

* Extra space problem fixed. #88

## Version 2.1.0

* New option to disable `uppercaseHeadings` added. #86
* Starting point of html to text conversion can now be defined in the options via the `baseElement` option. #83
* Support for long words added. The behaviour can be configured via the `longWordSplit` option. #83

## Version 2.0.0

* Unicode support added. #81
* New option `decodeOptions` added.
* Dependencies updated.

Breaking Changes:

* Minimum node version increased to >=0.10.0

## Version 1.6.2

* Fixed: correctly handle HTML entities for images #82

## Version 1.6.1

* Fixed: using --tables=true doesn't produce the expected results. #80

## Version 1.6.0

* Preserve newlines in text feature added #75

## Version 1.5.1

* Support for h5 and h6 tags added #74

## Version 1.5.0

* Entity regex is now less greedy #69 #70

## Version 1.4.0

* Uppercase tag processing added. Table center support added. #56
* Unused dependencies removed.

## Version 1.3.2

* Support Node 4 engine #64