# Changelog

## Version 9.0.1

* fixed a broken link in readme: [#262](https://github.com/html-to-text/node-html-to-text/pull/262);
* tested and documented the usage of existing formatters from custom formatters in readme: [#263](https://github.com/html-to-text/node-html-to-text/issues/263);
* fixed jsdoc comment for `BlockTextBuilder.closeTable`: [#264](https://github.com/html-to-text/node-html-to-text/issues/264);
* added missing entry in the 9.0.0 changelog below regarding `BlockTextBuilder.closeTable`.

All commits: [9.0.0...9.0.1](https://github.com/html-to-text/node-html-to-text/compare/9.0.0...9.0.1)

## Version 9.0.0

All commits: [8.2.1...9.0.0](https://github.com/html-to-text/node-html-to-text/compare/8.2.1...9.0.0)

Version 9 roadmap: [#240](https://github.com/html-to-text/node-html-to-text/issues/240)

Request for comments: [#261 \[RFC\] Naming issue](https://github.com/html-to-text/node-html-to-text/discussions/261) - please take a look and share opinions while you're here.

### Node version

Required Node version is now >=14.

### CommonJS and ES Module

The package now provides `cjs` and `mjs` exports.

### CLI is no longer built in

If you use the CLI, install [that package](https://github.com/html-to-text/node-html-to-text/tree/master/packages/html-to-text-cli/) instead.

The new package uses a new argument parser, [aspargvs](https://github.com/mxxii/aspargvs), instead of minimist in order to deal with the vast options space of `html-to-text`.

### Dependency updates

* `htmlparser2` updated from 6.1.0 to 8.0.1 ([Release notes](https://github.com/fb55/htmlparser2/releases));
* `he` dependency is removed. It was needed at the time it was introduced, apparently, but at this point `htmlparser2` seems to do a better job itself.

### Removed features

* Options deprecated in version 6 are now removed;
* `decodeOptions` section removed with `he` dependency;
* `fromString` method removed;
* deprecated positional arguments in `BlockTextBuilder` methods are now removed.

Refer to README for [migration instructions](https://github.com/html-to-text/node-html-to-text/tree/master/packages/html-to-text#deprecated-or-removed-options).

### New options

* `decodeEntities` - controls whether HTML entities found in the input HTML should be decoded or left as is in the output text;
* `encodeCharacters` - a dictionary with characters that should be replaced in the output text and corresponding escape sequences.

### New built-in formatters

New generic formatters `blockString`, `blockTag`, `blockHtml`, `inlineString`, `inlineSurround`, `inlineTag`, `inlineHtml` cover some common usage scenarios such as [#231](https://github.com/html-to-text/node-html-to-text/issues/231).

### Changes to existing built-in formatters

* `anchor` and `image` got `pathRewrite` option;
* `dataTable` formatter allows zero `colSpacing`.

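A small sketch of both changes, assuming `pathRewrite` receives the path as its first argument and the metadata object as its second (see the README for the exact signature). The `/track/` prefix is a hypothetical example.

```js
// Hypothetical rewrite rule: drop a tracking prefix from link and image paths.
const stripTracking = (path, meta) => path.replace(/^\/track\//, '/');

const options = {
  selectors: [
    { selector: 'a', options: { pathRewrite: stripTracking } },
    { selector: 'img', options: { pathRewrite: stripTracking } },
    // Zero spacing between table columns is now accepted.
    { selector: 'table', format: 'dataTable', options: { colSpacing: 0 } },
  ],
};
```
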
### Improvements for writing custom formatters

* Some logic for making lists is moved to `BlockTextBuilder` and can be reused for custom lists (`openList`, `openListItem`, `closeListItem`, `closeList`). Addresses [#238](https://github.com/html-to-text/node-html-to-text/issues/238);
* `startNoWrap`, `stopNoWrap` - allow keeping local inline content on a single line regardless of wrapping options;
* `addLiteral` - it is like `addInline` but circumvents most of the text processing logic. This should be preferred when inserting markup elements;
* It is now possible to provide a metadata object along with the HTML string to convert. The metadata object is available to custom formatters via `builder.metadata`. This allows compiling the converter once while still supplying per-document data. The metadata object is supplied as the last optional argument to the `convert` function and to the function returned by the `compile` function;
* Breaking change for those who dare to write their own table formatter (in case there is anyone) - the `closeTable` function got a required property in the options object, the `tableToString` function, and the previously existing `colSpacing` and `rowSpacing` are removed (they are now a responsibility of the `tableToString` function).

### Other

* Fix deprecated `tags` option support. Addresses [#253](https://github.com/html-to-text/node-html-to-text/issues/253).


----

## Version 8.2.1

No changes in published package. Bumped dev dependencies and regenerated `package-lock.json`.

## Version 8.2.0

Fix for issue [#249](https://github.com/html-to-text/node-html-to-text/issues/249) and possibly other obscure issues where some selector options were ignored. The `options.selectors` array was not fully processed before.

## Version 8.1.1

Bump `minimist` dependency, regenerate `package-lock.json`.

## Version 8.1.0

* Fix for too many newlines in certain cases when `preserveNewlines` option is used. Addresses [#232](https://github.com/html-to-text/node-html-to-text/issues/232);
* Link and image formatters now have a `linkBrackets` option - it accepts an array of two strings (default: `['[', ']']`) or `false` to remove the brackets. Addresses [#236](https://github.com/html-to-text/node-html-to-text/issues/236);
* `noLinkBrackets` formatters option is now deprecated.

All commits: [8.0.0...8.1.0](https://github.com/html-to-text/node-html-to-text/compare/8.0.0...8.1.0)

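For illustration, the `linkBrackets` option from version 8.1.0 might be configured like this. The angle brackets are an arbitrary choice for the example.

```js
// Illustrative options: custom brackets for links, none for images.
const options = {
  selectors: [
    // Wrap link URLs in angle brackets instead of the default square brackets.
    { selector: 'a', options: { linkBrackets: ['<', '>'] } },
    // Print image sources without any brackets.
    { selector: 'img', options: { linkBrackets: false } },
  ],
};
```
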
## Version 8.0.0

All commits: [7.1.1...8.0.0](https://github.com/html-to-text/node-html-to-text/compare/7.1.1...8.0.0)

Version 8 roadmap issue: [#228](https://github.com/html-to-text/node-html-to-text/issues/228)

### Selectors

The main focus of this version. Addresses the most demanded user requests ([#159](https://github.com/html-to-text/node-html-to-text/issues/159), [#179](https://github.com/html-to-text/node-html-to-text/issues/179), partially [#143](https://github.com/html-to-text/node-html-to-text/issues/143)).

It is now possible to specify formatting options or assign custom formatters not only by tag names but by almost any selectors.

See the README [Selectors](https://github.com/html-to-text/node-html-to-text#selectors) section for details.

Note: The new `selectors` option is an array, in contrast to the `tags` option introduced in version 6 (and now deprecated). Selectors need a well-defined order, and object properties are not the right tool for that.

Two new packages were created to enable this feature - [parseley](https://github.com/mxxii/parseley) and [selderee](https://github.com/mxxii/selderee).

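A rough sketch of what a `selectors` array can look like. The class names are hypothetical, and the README remains the reference for the supported selector syntax and per-formatter options.

```js
// Illustrative selectors array; entries are matched by selector, not only by tag name.
const options = {
  selectors: [
    // Keep the original letter case of top-level headings.
    { selector: 'h1', options: { uppercase: false } },
    // Leave out elements with a (hypothetical) footer class.
    { selector: 'div.footer', format: 'skip' },
    // Show only the link text for (hypothetical) button-like links.
    { selector: 'a.button', options: { ignoreHref: true } },
  ],
};
```
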
### Base elements

The same selectors implementation is now used to narrow down the conversion to specific HTML DOM fragments. Addresses [#96](https://github.com/html-to-text/node-html-to-text/issues/96). (The previous implementation had a more limited selector format.)

BREAKING CHANGE: All outermost elements matching provided selectors will be present in the output (previously it was only the first match for each selector). Addresses [#215](https://github.com/html-to-text/node-html-to-text/issues/215).

`limits.maxBaseElements` can be used when you only need a fixed number of base elements and would like to avoid checking the rest of the source HTML document.

Base elements can be arranged in the output text in the order of matched selectors (default, to keep it closer to the old implementation) or in the order of appearance in the source HTML document.

BREAKING CHANGE: the previous implementation treated id selectors in the same way as class selectors (it could match `<foo id="a b">` with the `foo#a` selector). The new implementation is closer to the spec and doesn't expect multiple ids on an element. You can achieve the old behavior with the `foo[id~=a]` selector in case you rely on it for some poorly formatted documents (note that it has a different specificity though).

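The base element settings described above could be combined as in the following sketch. The selectors themselves are hypothetical; `orderBy` is shown with the non-default value.

```js
// Illustrative options: convert only two fragments of the document.
const options = {
  baseElements: {
    // Hypothetical selectors for the fragments of interest.
    selectors: ['article.main', 'div#comments'],
    // Arrange output by position in the source instead of by selector order.
    orderBy: 'occurrence',
    // Fall back to the whole document when nothing matches.
    returnDomByDefault: true,
  },
  // Stop looking for base elements after two matches.
  limits: { maxBaseElements: 2 },
};
```
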
### Batch processing

Since options preprocessing is getting more involved with selectors compilation, it seemed reasonable to break the single `htmlToText()` function into compilation and conversion steps. It might provide some performance benefits in client code.

* new function `compile(options)` returns a function of a single argument (html string);
* `htmlToText(html, options)` is now an alias to `convert(html, options)` function and works as before.

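The compile-once pattern can be sketched as below. The helper name `convertAll` is hypothetical, and `compile` is passed in as a parameter so the snippet does not assume how the package is loaded (for example `const { compile } = require('html-to-text');`).

```js
// Hypothetical helper: preprocess options once, then convert many documents.
function convertAll(compile, options, documents) {
  // Options (including selectors) are compiled a single time here.
  const convertOne = compile(options);
  // The compiled function is reused for every HTML string.
  return documents.map((html) => convertOne(html));
}
```

Calling `convert(html, options)` in a loop would repeat the options preprocessing for each document; the compiled function avoids that.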
### Deprecated options

* `baseElement`;
* `returnDomByDefault`;
* `tables`;
* `tags`.

Refer to README for [migration instructions](https://github.com/html-to-text/node-html-to-text#deprecated-or-removed-options).

No previously deprecated stuff is removed in this version. Significant cleanup is planned for version 9 instead.

----

## Version ~~7.1.2~~ 7.1.3

Bump `minimist` dependency and dev dependencies, regenerate `package-lock.json`.

## Version 7.1.1

Regenerate `package-lock.json`.

## Version 7.1.0

### Dependency updates

* `htmlparser2` updated from 6.0.0 to 6.1.0 ([Release notes](https://github.com/fb55/htmlparser2/releases));
* dev dependencies are bumped.

## Version 7.0.0

### Node version

Required Node version is now >=10.23.2.

### Dependency updates

* `lodash` dependency is removed;
* `htmlparser2` updated from 4.1.0 to 6.0.0 ([Release notes](https://github.com/fb55/htmlparser2/releases), also [domhandler](https://github.com/fb55/domhandler/releases/tag/v4.0.0)). There is a slim chance you can run into some differences in case you're relying on it heavily in your custom formatters;
* dev dependencies are bumped.

### Custom formatters API change

[BlockTextBuilder](https://github.com/html-to-text/node-html-to-text/blob/master/lib/block-text-builder.js) methods now accept option objects for optional arguments. This improves client code readability and makes it easy to introduce extra options. It will see some use in future updates.

Positional arguments introduced in version 6.0.0 are now deprecated. Formatters written for version 6.0.0 should keep working for now, but the compatibility layer is rather inconvenient and will be removed with the next major version.

See the commit [f50f10f](https://github.com/html-to-text/node-html-to-text/commit/f50f10f54cf814efb2f7633d9d377ba7eadeaf1e). Changes in the `lib/formatter.js` file illustrate how to migrate to the new API.

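A formatter written against the option-object style might look like the following sketch. The formatter name and the `> ` line prefix are hypothetical, and the option names (`leadingLineBreaks`, `trailingLineBreaks`, `blockTransform`) follow my understanding of `BlockTextBuilder`; the linked commit is the authoritative reference.

```js
// Hypothetical block formatter using option objects instead of positional arguments.
function quoteFormatter(elem, walk, builder, formatOptions) {
  builder.openBlock({ leadingLineBreaks: formatOptions.leadingLineBreaks || 2 });
  walk(elem.children, builder);
  builder.closeBlock({
    trailingLineBreaks: formatOptions.trailingLineBreaks || 2,
    // Prefix every line of the finished block with "> ".
    blockTransform: (str) => str.split('\n').map((line) => '> ' + line).join('\n'),
  });
}
```
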
### And more

* Bunch of documentation and test updates.

All commits: [6.0.0...7.0.0](https://github.com/html-to-text/node-html-to-text/compare/6.0.0...7.0.0)

Version 7 roadmap issue: [#222](https://github.com/html-to-text/node-html-to-text/issues/222)

----

## Version 6.0.0

This is a major update. No code left untouched. While the goal was to keep as much compatibility as possible, some client-facing changes were unavoidable.

### fromString() is deprecated in favor of htmlToText()

Since the library has only one exported function, it is now self-titled.

### Inline and block-level tags, HTML whitespace

Formatting code was rewritten almost entirely to make it aware of block-level tags and to handle HTML whitespace properly. One of the most popular requests was to support divs, and it is here now, after a lot of effort.

### Options reorganized

Options are reorganized to make room for some extra format options while making everything more structured. Now tag-specific options live within that tag's configuration.

For the majority of changed options there is a compatibility layer that will remain until the next major release. But you are encouraged to explore the new options since they provide a bit more flexibility.

### Custom formatters are different now

Because formatters are an integral part of the formatting code (as the name suggests), it wasn't possible to provide a compatibility layer.

Please refer to the Readme to see how things are wired now, in case you were using them for anything other than dealing with the lack of block-level tags support.

### Tables support was improved

Cells can make use of extra space with colspan and rowspan attributes. Max column width is defined separately from the global wordwrap limit.

### Limits

Multiple options to cut content in large HTML documents.

By default, any input longer than 16 million characters will be truncated.

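As an illustration, the limits might be tightened like this. All numbers are arbitrary example values, not the library defaults.

```js
// Illustrative limits; every value here is an arbitrary example.
const options = {
  limits: {
    // Truncate inputs longer than one million characters.
    maxInputLength: 1000000,
    // Marker inserted where content was cut.
    ellipsis: '...',
    // Process at most this many child nodes of any element.
    maxChildNodes: 500,
    // Stop descending into the DOM beyond this depth.
    maxDepth: 20,
  },
};
```
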
### Node and dependencies

Required Node version is now >=8.10.0.

Dependency versions are bumped.

### Repository is moved to its own organization

[https://github.com/html-to-text/node-html-to-text](https://github.com/html-to-text/node-html-to-text) is the new home.

GitHub should handle all redirects from the old url, so it shouldn't break anything, even if you have a local fork pointing at the old origin. But it is still a good idea to [update](https://docs.github.com/en/free-pro-team@latest/github/using-git/changing-a-remotes-url) the url.

### And more

Version 6 roadmap issue: [#200](https://github.com/html-to-text/node-html-to-text/issues/200)

----

## Version 5.1.1

* `preserveNewLines` whitespace issue fixed [#162](https://github.com/html-to-text/node-html-to-text/pull/162)

## Version 5.1.0

* Hard-coded CLI options removed [#173](https://github.com/html-to-text/node-html-to-text/pull/173)

## Version 5.0.0

### BREAKING CHANGES

#### fromFile removed

The function `fromFile` is removed. It was the main reason `html-to-text` could not be used in the browser [#164](https://github.com/html-to-text/node-html-to-text/pull/164).

You can get the `fromFile` functionality back by using the following code:

```js
const fs = require('fs');
const { fromString } = require('html-to-text');

// Callback version; `options` may be omitted.
const fromFile = (file, options, callback) => {
  if (!callback) {
    callback = options;
    options = {};
  }
  fs.readFile(file, 'utf8', (err, str) => {
    if (err) return callback(err);
    callback(null, fromString(str, options));
  });
};

// Promise version (named differently here so the whole snippet can run as one file)
const fromFileAsync = (file, options) => fs.promises.readFile(file, 'utf8').then(html => fromString(html, options));

// Sync version
const fromFileSync = (file, options) => fromString(fs.readFileSync(file, 'utf8'), options);
```

#### Supported NodeJS Versions

Node versions < 6 are no longer supported.

----

## Version 4.0.0

* Support dropped for node version < 4.
* New option `unorderedListItemPrefix` added.
* HTML entities in links are not supported.

----

## Version 3.3.0

* Ability to pass custom formatting via the `format` option #128
* Enhanced support for alpha ordered list types added #123

## Version 3.2.0

* Basic support for alpha ordered list types added #122
  * This includes support for the `ol` type values `1`, `a` and `A`

## Version 3.1.0

* Support for the ordered list start attribute added #117
* Option to format paragraph with single new line #112
* `noLinkBrackets` option added #119

## Version 3.0.0

* Switched from `htmlparser` to `htmlparser2` #113
* Treat non-numeric colspans as zero and handle them gracefully #105

----

## Version 2.1.1

* Extra space problem fixed. #88

## Version 2.1.0

* New option to disable `uppercaseHeadings` added. #86
* Starting point of html to text conversion can now be defined in the options via the `baseElement` option. #83
* Support for long words added. The behaviour can be configured via the `longWordSplit` option. #83

## Version 2.0.0

* Unicode support added. #81
* New option `decodeOptions` added.
* Dependencies updated.

Breaking Changes:

* Minimum node version increased to >=0.10.0

----

## Version 1.6.2

* Fixed: correctly handle HTML entities for images #82

## Version 1.6.1

* Fixed: using --tables=true doesn't produce the expected results. #80

## Version 1.6.0

* Preserve newlines in text feature added #75

## Version 1.5.1

* Support for h5 and h6 tags added #74

## Version 1.5.0

* Entity regex is now less greedy #69 #70

## Version 1.4.0

* Uppercase tag processing added. Table center support added. #56
* Unused dependencies removed.

## Version 1.3.2

* Support Node 4 engine #64
369## Version 1.3.2
370
371* Support Node 4 engine #64