# Changelog

## Version 9.0.2

* support multi-character code points in `encodeCharacters` option: [#267](https://github.com/html-to-text/node-html-to-text/issues/267).

All commits: [9.0.1...9.0.2](https://github.com/html-to-text/node-html-to-text/compare/9.0.1...9.0.2)

## Version 9.0.1

* fixed a broken link in readme: [#262](https://github.com/html-to-text/node-html-to-text/pull/262);
* tested and documented the usage of existing formatters from custom formatters in readme: [#263](https://github.com/html-to-text/node-html-to-text/issues/263);
* fixed jsdoc comment for `BlockTextBuilder.closeTable`: [#264](https://github.com/html-to-text/node-html-to-text/issues/264);
* added missing entry in the 9.0.0 changelog below regarding `BlockTextBuilder.closeTable`.

All commits: [9.0.0...9.0.1](https://github.com/html-to-text/node-html-to-text/compare/9.0.0...9.0.1)

## Version 9.0.0

All commits: [8.2.1...9.0.0](https://github.com/html-to-text/node-html-to-text/compare/8.2.1...9.0.0)

Version 9 roadmap: [#240](https://github.com/html-to-text/node-html-to-text/issues/240)

Request for comments: [#261 \[RFC\] Naming issue](https://github.com/html-to-text/node-html-to-text/discussions/261) - please take a look and share opinions while you're here

### Node version

Required Node version is now >=14.

### CommonJS and ES Module

Package now provides `cjs` and `mjs` exports.

### CLI is no longer built in

If you use CLI then install [that package](https://github.com/html-to-text/node-html-to-text/tree/master/packages/html-to-text-cli/) instead.

The new package uses new arg parser [aspargvs](https://github.com/mxxii/aspargvs) instead of minimist in order to deal with the vast options space of `html-to-text`.

### Dependency updates

* `htmlparser2` updated from 6.1.0 to 8.0.1 ([Release notes](https://github.com/fb55/htmlparser2/releases));
* `he` dependency is removed. It was needed at the time it was introduced, apparently, but at this point `htmlparser2` seems to do a better job itself.

### Removed features

* Options deprecated in version 6 are now removed;
* `decodeOptions` section removed with `he` dependency;
* `fromString` method removed;
* deprecated positional arguments in `BlockTextBuilder` methods are now removed.

Refer to README for [migration instructions](https://github.com/html-to-text/node-html-to-text/tree/master/packages/html-to-text#deprecated-or-removed-options).

### New options

* `decodeEntities` - controls whether HTML entities found in the input HTML should be decoded or left as is in the output text;
* `encodeCharacters` - a dictionary with characters that should be replaced in the output text and corresponding escape sequences.

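As a minimal sketch, the two new options might be combined as shown here. The `toPlainText` wrapper and the particular escape sequences are assumptions made for illustration; only `decodeEntities`, `encodeCharacters` and `convert` come from the package.

```js
// Options that use the two settings introduced in 9.0.0.
const options = {
  // Leave entities such as `&amp;` in the input as they are.
  decodeEntities: false,
  // Replace these characters in the output text with the given sequences.
  encodeCharacters: {
    '<': '&lt;',
    '>': '&gt;',
  },
};

// Hypothetical helper. The package is required lazily, so the options
// object above can be inspected even where `html-to-text` is not installed.
function toPlainText(html) {
  const { convert } = require('html-to-text');
  return convert(html, options);
}
```

The dictionary keys are the characters to replace and the values are whatever sequences the consumer of the text expects.
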
### New built-in formatters

New generic formatters `blockString`, `blockTag`, `blockHtml`, `inlineString`, `inlineSurround`, `inlineTag`, `inlineHtml` cover some common usage scenarios such as [#231](https://github.com/html-to-text/node-html-to-text/issues/231).

### Changes to existing built-in formatters

* `anchor` and `image` got `pathRewrite` option;
* `dataTable` formatter allows zero `colSpacing`.

### Improvements for writing custom formatters

* Some logic for making lists is moved to BlockTextBuilder and can be reused for custom lists (`openList`, `openListItem`, `closeListItem`, `closeList`). Addresses [#238](https://github.com/html-to-text/node-html-to-text/issues/238);
* `startNoWrap`, `stopNoWrap` - allow keeping local inline content on a single line regardless of wrapping options;
* `addLiteral` - it is like `addInline` but circumvents most of the text processing logic. This should be preferred when inserting markup elements;
* It is now possible to provide a metadata object along with the HTML string to convert. The metadata object is available to custom formatters via `builder.metadata`. This makes it possible to compile the converter once and still supply per-document data. The metadata object is passed as the last optional argument to the `convert` function and to the function returned by the `compile` function;
* Breaking change for those who dare to write their own table formatter (in case there is anyone) - the options object of the `closeTable` function now has a required property, the `tableToString` function, and the previously existing `colSpacing` and `rowSpacing` properties are removed (spacing is now a responsibility of the `tableToString` function).

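The metadata mechanism described above could be used roughly like this. The formatter name `footerFormatter`, the `p.signature` selector and the `footer` metadata field are hypothetical names chosen for this sketch; the builder methods and `builder.metadata` are the ones named in this changelog.

```js
// Custom formatter: renders the element as a block and appends a
// per-document footer taken from `builder.metadata` (if one was supplied).
function footerFormatter(elem, walk, builder, formatOptions) {
  builder.openBlock({ leadingLineBreaks: 2 });
  walk(elem.children, builder);
  const footer = builder.metadata && builder.metadata.footer; // hypothetical field
  if (footer) {
    builder.addInline(' ' + footer);
  }
  builder.closeBlock({ trailingLineBreaks: 2 });
}

// Options that register the formatter and assign it to a selector.
const options = {
  formatters: { footerFormatter },
  selectors: [{ selector: 'p.signature', format: 'footerFormatter' }],
};
```

With the package installed, the converter would presumably be created once with `compile(options)` and then called as `compiledConvert(html, { footer: 'Sent by ACME' })`, passing different metadata for each document.
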
### Other

* Fix deprecated `tags` option support. Addresses [#253](https://github.com/html-to-text/node-html-to-text/issues/253).


----

## Version 8.2.1

No changes in published package. Bumped dev dependencies and regenerated `package-lock.json`.

## Version 8.2.0

Fix for the issue [#249](https://github.com/html-to-text/node-html-to-text/issues/249) and possibly other obscure issues when some selector options are ignored. `options.selectors` array was not fully processed before.

## Version 8.1.1

Bump `minimist` dependency, regenerate `package-lock.json`.

## Version 8.1.0

* Fix for too many newlines in certain cases when `preserveNewlines` option is used. Addresses [#232](https://github.com/html-to-text/node-html-to-text/issues/232);
* Link and image formatters now have a `linkBrackets` option - it accepts an array of two strings (default: `['[', ']']`) or `false` to remove the brackets. Addresses [#236](https://github.com/html-to-text/node-html-to-text/issues/236);
* `noLinkBrackets` formatters option is now deprecated.

All commits: [8.0.0...8.1.0](https://github.com/html-to-text/node-html-to-text/compare/8.0.0...8.1.0)

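For illustration, the `linkBrackets` option introduced in 8.1.0 might be configured as follows. The angle brackets are an arbitrary choice for this sketch, and the object is meant to be passed as the options argument of the conversion function.

```js
const options = {
  selectors: [
    // Wrap link targets in angle brackets instead of the default `[` and `]`.
    { selector: 'a', options: { linkBrackets: ['<', '>'] } },
    // Print image sources without any brackets.
    { selector: 'img', options: { linkBrackets: false } },
  ],
};
```
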
## Version 8.0.0

All commits: [7.1.1...8.0.0](https://github.com/html-to-text/node-html-to-text/compare/7.1.1...8.0.0)

Version 8 roadmap issue: [#228](https://github.com/html-to-text/node-html-to-text/issues/228)

### Selectors

The main focus of this version. Addresses the most demanded user requests ([#159](https://github.com/html-to-text/node-html-to-text/issues/159), [#179](https://github.com/html-to-text/node-html-to-text/issues/179), partially [#143](https://github.com/html-to-text/node-html-to-text/issues/143)).

It is now possible to specify formatting options or assign custom formatters not only by tag names but by almost any selectors.

See the README [Selectors](https://github.com/html-to-text/node-html-to-text#selectors) section for details.

Note: The new `selectors` option is an array, in contrast to the `tags` option introduced in version 6 (and now deprecated). Selectors have to have a well defined order and object properties are not the right tool for that.

Two new packages were created to enable this feature - [parseley](https://github.com/mxxii/parseley) and [selderee](https://github.com/mxxii/selderee).

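To make the difference between the two shapes concrete, here is a hypothetical helper that turns a `tags`-style object into a `selectors`-style array. It is not part of the package, and it assumes that each old tag definition uses the same `format` and `options` keys as a selector definition; check the README migration instructions before relying on it.

```js
// Hypothetical migration helper, not provided by html-to-text.
// Each property of the old `tags` object becomes one array entry,
// with the tag name used as the selector.
function tagsToSelectors(tags) {
  return Object.entries(tags).map(([tagName, definition]) => ({
    selector: tagName,
    ...definition,
  }));
}

// Example input in the deprecated object form.
const selectors = tagsToSelectors({
  a: { options: { ignoreHref: true } },
  h1: { options: { uppercase: false } },
});
```

The resulting array has an explicit order, so more specific entries such as `a.button` can be placed next to the plain tag entries.
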
### Base elements

The same selectors implementation is now used to narrow down the conversion to specific HTML DOM fragments. Addresses [#96](https://github.com/html-to-text/node-html-to-text/issues/96). (The previous implementation had a more limited selector format.)

BREAKING CHANGE: All outermost elements matching provided selectors will be present in the output (previously it was only the first match for each selector). Addresses [#215](https://github.com/html-to-text/node-html-to-text/issues/215).

`limits.maxBaseElements` can be used when you only need a fixed number of base elements and would like to avoid checking the rest of the source HTML document.

Base elements can be arranged in output text in the order of matched selectors (default, to keep it closer to the old implementation) or in the order of appearance in source HTML document.

BREAKING CHANGE: the previous implementation treated id selectors in the same way as class selectors (it could match `<foo id="a b">` with the `foo#a` selector). The new implementation is closer to the spec and doesn't expect multiple ids on an element. You can achieve the old behavior with the `foo[id~=a]` selector in case you rely on it for some poorly formatted documents (note that it has different specificity though).

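A base elements configuration along these lines might look as follows. The element names, the id and the limit value are made up for this sketch, and the `baseElements` property names follow the README for this version as far as I can tell.

```js
const options = {
  baseElements: {
    // `[id~=content]` also matches `<main id="content legacy">`,
    // which a plain `main#content` selector no longer does.
    selectors: ['main[id~=content]', 'article.post'],
    // 'selectors' is the default; 'occurrence' follows the source document order.
    orderBy: 'occurrence',
  },
  limits: {
    // Stop looking for further base elements after the first two matches.
    maxBaseElements: 2,
  },
};
```
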
### Batch processing

Since options preprocessing is getting more involved with selectors compilation, it seemed reasonable to split the single `htmlToText()` function into compilation and conversion steps. This might provide some performance benefits in client code.

* new function `compile(options)` returns a function of a single argument (html string);
* `htmlToText(html, options)` is now an alias to `convert(html, options)` function and works as before.

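The split into two steps suggests a usage pattern like the one sketched here. The `convertAll` helper and its third parameter are hypothetical; the parameter exists only so the function can be exercised with a stand-in for `compile` where the package is not installed.

```js
// Compile the options once and reuse the resulting function for every document.
function convertAll(htmlDocuments, options, compile = require('html-to-text').compile) {
  const compiledConvert = compile(options); // options are preprocessed here, once
  return htmlDocuments.map((html) => compiledConvert(html));
}
```

Calling `convert(html, options)` in a loop would instead repeat the options preprocessing for every document.
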
### Deprecated options

* `baseElement`;
* `returnDomByDefault`;
* `tables`;
* `tags`.

Refer to README for [migration instructions](https://github.com/html-to-text/node-html-to-text#deprecated-or-removed-options).

No previously deprecated stuff is removed in this version. Significant cleanup is planned for version 9 instead.

----

## Version ~~7.1.2~~ 7.1.3

Bump `minimist` dependency and dev dependencies, regenerate `package-lock.json`.

## Version 7.1.1

Regenerate `package-lock.json`.

## Version 7.1.0

### Dependency updates

* `htmlparser2` updated from 6.0.0 to 6.1.0 ([Release notes](https://github.com/fb55/htmlparser2/releases));
* dev dependencies are bumped.

## Version 7.0.0

### Node version

Required Node version is now >=10.23.2.

### Dependency updates

* `lodash` dependency is removed;
* `htmlparser2` updated from 4.1.0 to 6.0.0 ([Release notes](https://github.com/fb55/htmlparser2/releases), also [domhandler](https://github.com/fb55/domhandler/releases/tag/v4.0.0)). There is a slim chance you can run into some differences in case you're relying on it heavily in your custom formatters;
* dev dependencies are bumped.

### Custom formatters API change

[BlockTextBuilder](https://github.com/html-to-text/node-html-to-text/blob/master/lib/block-text-builder.js) methods now accept option objects for optional arguments. This improves client code readability and makes it easy to introduce extra options. It will see some use in future updates.

Positional arguments introduced in version 6.0.0 are now deprecated. Formatters written for version 6.0.0 should keep working for now, but the compatibility layer is rather inconvenient and will be removed with the next major version.

See the commit [f50f10f](https://github.com/html-to-text/node-html-to-text/commit/f50f10f54cf814efb2f7633d9d377ba7eadeaf1e). Changes in the `lib/formatter.js` file illustrate how to migrate to the new API.

### And more

* A bunch of documentation and test updates.

All commits: [6.0.0...7.0.0](https://github.com/html-to-text/node-html-to-text/compare/6.0.0...7.0.0)

Version 7 roadmap issue: [#222](https://github.com/html-to-text/node-html-to-text/issues/222)

----

## Version 6.0.0

This is a major update. No code left untouched. While the goal was to keep as much compatibility as possible, some client-facing changes were unavoidable.

### fromString() is deprecated in favor of htmlToText()

Since the library has only one exported function, that function is now named after the library.

### Inline and block-level tags, HTML whitespace

Formatting code was rewritten almost entirely to make it aware of block-level tags and to handle HTML whitespace properly. One of the most popular requests was to support divs, and it is here now, after a lot of effort.

### Options reorganized

Options are reorganized to make room for some extra format options while making everything more structured. Now tag-specific options live within that tag configuration.

For the majority of changed options there is a compatibility layer that will remain until the next major release. But you are encouraged to explore new options since they provide a bit more flexibility.

### Custom formatters are different now

Because formatters are an integral part of the formatting code (as the name suggests), it wasn't possible to provide a compatibility layer.

Please refer to the Readme to see how things are wired now, in case you were using them for anything other than dealing with the lack of block-level tags support.

### Tables support was improved

Cells can make use of extra space with colspan and rowspan attributes. Max column width is defined separately from global wordwrap limit.

### Limits

Multiple options to cut content in large HTML documents.

By default, any input longer than 16 million characters will be truncated.

### Node and dependencies

Required Node version is now >=8.10.0.

Dependency versions are bumped.

### Repository is moved to its own organization

[https://github.com/html-to-text/node-html-to-text](https://github.com/html-to-text/node-html-to-text) is the new home.

GitHub should handle all redirects from the old URL, so it shouldn't break anything, even if you have a local fork pointing at the old origin. But it is still a good idea to [update](https://docs.github.com/en/free-pro-team@latest/github/using-git/changing-a-remotes-url) the URL.

### And more

Version 6 roadmap issue: [#200](https://github.com/html-to-text/node-html-to-text/issues/200)

----

## Version 5.1.1

* `preserveNewLines` whitespace issue fixed [#162](https://github.com/html-to-text/node-html-to-text/pull/162)

## Version 5.1.0

* Hard-coded CLI options removed [#173](https://github.com/html-to-text/node-html-to-text/pull/173)

## Version 5.0.0

### BREAKING CHANGES

#### fromFile removed

The function `fromFile` is removed. It was the main reason `html-to-text` could not be used in the browser [#164](https://github.com/html-to-text/node-html-to-text/pull/164).

You can get the `fromFile` functionality back by using the following code:

```js
const fs = require('fs');
const { fromString } = require('html-to-text');

// Callback version
const fromFile = (file, options, callback) => {
  // Allow calling with (file, callback) when no options are given.
  if (!callback) {
    callback = options;
    options = {};
  }
  fs.readFile(file, 'utf8', (err, str) => {
    if (err) return callback(err);
    callback(null, fromString(str, options));
  });
};

// Promise version (named differently so it can coexist with the callback version)
const fromFilePromise = (file, options) => fs.promises.readFile(file, 'utf8').then(html => fromString(html, options));

// Sync version
const fromFileSync = (file, options) => fromString(fs.readFileSync(file, 'utf8'), options);
```

#### Supported NodeJS Versions

Node versions < 6 are no longer supported.

----

## Version 4.0.0

* Support dropped for node version < 4.
* New option `unorderedListItemPrefix` added.
* HTML entities in links are not supported.

----

## Version 3.3.0

* Ability to pass custom formatting via the `format` option #128
* Enhanced support for alpha ordered list types added #123

## Version 3.2.0

* Basic support for alpha ordered list types added #122
  * This includes support for the `ol` type values `1`, `a` and `A`

## Version 3.1.0

* Support for the ordered list start attribute added #117
* Option to format paragraph with single new line #112
* `noLinkBrackets` option added #119

## Version 3.0.0

* Switched from `htmlparser` to `htmlparser2` #113
* Treat non-numeric colspans as zero and handle them gracefully #105

----

## Version 2.1.1

* Extra space problem fixed. #88

## Version 2.1.0

* New option to disable `uppercaseHeadings` added. #86
* Starting point of html to text conversion can now be defined in the options via the `baseElement` option. #83
* Support for long words added. The behaviour can be configured via the `longWordSplit` option. #83

## Version 2.0.0

* Unicode support added. #81
* New option `decodeOptions` added.
* Dependencies updated.

Breaking Changes:

* Minimum node version increased to >=0.10.0

----

## Version 1.6.2

* Fixed: correctly handle HTML entities for images #82

## Version 1.6.1

* Fixed: using --tables=true doesn't produce the expected results. #80

## Version 1.6.0

* Preserve newlines in text feature added #75

## Version 1.5.1

* Support for h5 and h6 tags added #74

## Version 1.5.0

* Entity regex is now less greedy #69 #70

## Version 1.4.0

* Uppercase tag processing added. Table center support added. #56
* Unused dependencies removed.

## Version 1.3.2

* Support Node 4 engine #64
375## Version 1.3.2
376
377* Support Node 4 engine #64