# Changelog

## Version 9.0.0

All commits: [8.2.1...9.0.0](https://github.com/html-to-text/node-html-to-text/compare/8.2.1...9.0.0)

Version 9 roadmap: [#240](https://github.com/html-to-text/node-html-to-text/issues/240)

Request for comments: [#261 \[RFC\] Naming issue](https://github.com/html-to-text/node-html-to-text/discussions/261) - please take a look and share your opinion while you're here.

### Node version

Required Node version is now >=14.

### CommonJS and ES Module

The package now provides `cjs` and `mjs` exports.

### CLI is no longer built in

If you use the CLI, install [that package](https://github.com/html-to-text/node-html-to-text/tree/master/packages/html-to-text-cli/) instead.

The new package uses a new argument parser, [aspargvs](https://github.com/mxxii/aspargvs), instead of `minimist` in order to deal with the vast option space of `html-to-text`.

### Dependency updates

* `htmlparser2` updated from 6.1.0 to 8.0.1 ([Release notes](https://github.com/fb55/htmlparser2/releases));
* `he` dependency is removed. It was apparently needed at the time it was introduced, but at this point `htmlparser2` seems to do a better job on its own.

### Removed features

* Options deprecated in version 6 are now removed;
* `decodeOptions` section removed along with the `he` dependency;
* `fromString` method removed;
* deprecated positional arguments in `BlockTextBuilder` methods are now removed.

Refer to the README for [migration instructions](https://github.com/html-to-text/node-html-to-text/tree/master/packages/html-to-text#deprecated-or-removed-options).

### New options

* `decodeEntities` - controls whether HTML entities found in the input HTML should be decoded or left as is in the output text;
* `encodeCharacters` - a dictionary mapping characters that should be replaced in the output text to their escape sequences.
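
As a rough illustration of the `encodeCharacters` semantics, the standalone sketch below applies such a dictionary to text that has already been produced. The helper `applyEncodeCharacters` and the Markdown-escaping dictionary are hypothetical; this is not the library's implementation. With the package itself you would only pass the dictionary in the options, e.g. `{ decodeEntities: true, encodeCharacters: { '<': '&lt;' } }`.

```js
// Hypothetical helper: replace every character listed in `dict`
// with its escape sequence. Illustrates the semantics of an
// `encodeCharacters`-style dictionary; it is not html-to-text code.
function applyEncodeCharacters(text, dict) {
  const keys = Object.keys(dict);
  if (keys.length === 0) return text;
  // Escape regex metacharacters so keys like '*' or '[' are matched literally.
  const escaped = keys.map((k) => k.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'));
  const pattern = new RegExp(escaped.join('|'), 'g');
  return text.replace(pattern, (ch) => dict[ch]);
}

// Example dictionary: escape characters that are special in Markdown.
const markdownEscapes = { '*': '\\*', '_': '\\_', '[': '\\[', ']': '\\]' };

console.log(applyEncodeCharacters('2*3 = [six]', markdownEscapes));
// prints: 2\*3 = \[six\]
```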

### New built-in formatters

New generic formatters `blockString`, `blockTag`, `blockHtml`, `inlineString`, `inlineSurround`, `inlineTag`, `inlineHtml` cover some common usage scenarios such as [#231](https://github.com/html-to-text/node-html-to-text/issues/231).

### Changes to existing built-in formatters

* `anchor` and `image` got a `pathRewrite` option;
* `dataTable` formatter allows zero `colSpacing`.
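
A `pathRewrite` function could, for example, resolve relative links against the URL of the document being converted. The sketch assumes, based on our reading of the version 9 README, that the function receives the original path as its first argument and the metadata object as its second; `resolveAgainstDocument` and the `documentUrl` metadata field are hypothetical names.

```js
// Hypothetical rewriter for a `pathRewrite`-style option. We assume it is
// called with the original href/src and the per-document metadata object.
// It resolves relative paths against a document URL taken from metadata.
function resolveAgainstDocument(path, meta) {
  if (!meta || !meta.documentUrl) return path; // nothing to resolve against
  try {
    return new URL(path, meta.documentUrl).href;
  } catch {
    return path; // leave the path unchanged if URL parsing fails
  }
}

console.log(resolveAgainstDocument('img/logo.png', { documentUrl: 'https://example.com/docs/page.html' }));
// prints: https://example.com/docs/img/logo.png
```

It would be attached per formatter, roughly as `selectors: [{ selector: 'a', options: { pathRewrite: resolveAgainstDocument } }]`, with the document URL supplied in the metadata argument at conversion time.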

### Improvements for writing custom formatters

* Some logic for making lists has moved to `BlockTextBuilder` and can be reused for custom lists (`openList`, `openListItem`, `closeListItem`, `closeList`). Addresses [#238](https://github.com/html-to-text/node-html-to-text/issues/238);
* `startNoWrap`, `stopNoWrap` - keep local inline content on a single line regardless of wrapping options;
* `addLiteral` - like `addInline`, but bypasses most of the text processing logic. Prefer it when inserting markup elements;
* It is now possible to provide a metadata object along with the HTML string to convert. The metadata object is available to custom formatters via `builder.metadata`. This makes it possible to compile the converter once and still supply per-document data. The metadata object is passed as the last, optional argument to the `convert` function and to the function returned by `compile`.
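
A custom formatter can read per-document data through `builder.metadata`. The sketch below follows the `(elem, walk, builder, formatOptions)` formatter shape from the README and uses only `openBlock`, `addInline`, `closeBlock` and `builder.metadata`. The formatter name `signatureFormatter` and the `author` metadata field are hypothetical, and a tiny mock builder stands in for the real `BlockTextBuilder` so the snippet runs without the package.

```js
// Hypothetical block formatter: renders the element's children, then
// appends the author name passed as per-document metadata, if any.
function signatureFormatter(elem, walk, builder, formatOptions) {
  builder.openBlock({ leadingLineBreaks: formatOptions.leadingLineBreaks || 1 });
  walk(elem.children, builder);
  const author = builder.metadata && builder.metadata.author;
  if (author) {
    builder.addInline(` (by ${author})`);
  }
  builder.closeBlock({ trailingLineBreaks: formatOptions.trailingLineBreaks || 1 });
}

// Minimal mock standing in for BlockTextBuilder; it records calls as strings.
function createMockBuilder(metadata) {
  const log = [];
  return {
    metadata,
    log,
    openBlock: () => log.push('open'),
    addInline: (str) => log.push(`inline:${str}`),
    closeBlock: () => log.push('close'),
  };
}

// Mock walk: emits each text child as inline content.
const walk = (children, builder) =>
  children.forEach((child) => builder.addInline(child.data));

const builder = createMockBuilder({ author: 'Ada' });
signatureFormatter({ children: [{ type: 'text', data: 'Hello' }] }, walk, builder, {});
console.log(builder.log);
// prints: [ 'open', 'inline:Hello', 'inline: (by Ada)', 'close' ]
```

With the real package, such a function would be registered under `formatters`, referenced by `format` in a selector, and the metadata object passed as the last argument to `convert` or to the compiled function.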

### Other

* Fix support for the deprecated `tags` option. Addresses [#253](https://github.com/html-to-text/node-html-to-text/issues/253).

----

## Version 8.2.1

No changes in the published package. Bumped dev dependencies and regenerated `package-lock.json`.

## Version 8.2.0

Fix for issue [#249](https://github.com/html-to-text/node-html-to-text/issues/249) and possibly other obscure issues where some selector options were ignored. The `options.selectors` array was not fully processed before.

## Version 8.1.1

Bump `minimist` dependency, regenerate `package-lock.json`.

## Version 8.1.0

* Fix for too many newlines in certain cases when the `preserveNewlines` option is used. Addresses [#232](https://github.com/html-to-text/node-html-to-text/issues/232);
* Link and image formatters now have a `linkBrackets` option - it accepts an array of two strings (default: `['[', ']']`) or `false` to remove the brackets. Addresses [#236](https://github.com/html-to-text/node-html-to-text/issues/236);
* `noLinkBrackets` formatters option is now deprecated.

All commits: [8.0.0...8.1.0](https://github.com/html-to-text/node-html-to-text/compare/8.0.0...8.1.0)
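
The effect of `linkBrackets` can be sketched with a small standalone helper. `formatLink` is hypothetical and only approximates the `text [href]` shape the anchor formatter produces by default; the real formatter has many more options.

```js
// Hypothetical helper mirroring the documented `linkBrackets` semantics:
// an array of two strings wraps the link, `false` drops the brackets.
function formatLink(text, href, linkBrackets = ['[', ']']) {
  const wrapped = linkBrackets ? `${linkBrackets[0]}${href}${linkBrackets[1]}` : href;
  return text ? `${text} ${wrapped}` : wrapped;
}

console.log(formatLink('Home', 'https://example.com'));             // Home [https://example.com]
console.log(formatLink('Home', 'https://example.com', ['<', '>'])); // Home <https://example.com>
console.log(formatLink('Home', 'https://example.com', false));      // Home https://example.com
```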

## Version 8.0.0

All commits: [7.1.1...8.0.0](https://github.com/html-to-text/node-html-to-text/compare/7.1.1...8.0.0)

Version 8 roadmap issue: [#228](https://github.com/html-to-text/node-html-to-text/issues/228)

### Selectors

The main focus of this version. Addresses the most demanded user requests ([#159](https://github.com/html-to-text/node-html-to-text/issues/159), [#179](https://github.com/html-to-text/node-html-to-text/issues/179), partially [#143](https://github.com/html-to-text/node-html-to-text/issues/143)).

It is now possible to specify formatting options or assign custom formatters not only by tag name but by almost any selector.

See the README [Selectors](https://github.com/html-to-text/node-html-to-text#selectors) section for details.

Note: The new `selectors` option is an array, in contrast to the `tags` option introduced in version 6 (and now deprecated). Selectors need a well-defined order, and object properties are not the right tool for that.
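
Migrating from the deprecated `tags` object to the `selectors` array is mostly mechanical: each key becomes the `selector` field of an array entry. The helper below is a hypothetical sketch, not part of the package; `ignoreHref` and the `skip` formatter serve only as sample values.

```js
// Hypothetical migration helper: converts a deprecated `tags` object
// (keyed by tag name) into an ordered `selectors` array.
function tagsToSelectors(tags) {
  return Object.entries(tags).map(([selector, config]) => ({ selector, ...config }));
}

const legacyOptions = {
  tags: {
    a: { options: { ignoreHref: true } },
    img: { format: 'skip' },
  },
};

console.log(tagsToSelectors(legacyOptions.tags));
// [ { selector: 'a', options: { ignoreHref: true } }, { selector: 'img', format: 'skip' } ]
```

The resulting array simply follows the object's key order; once the options live in an array, their order is explicit and can be adjusted deliberately.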

Two new packages were created to enable this feature: [parseley](https://github.com/mxxii/parseley) and [selderee](https://github.com/mxxii/selderee).

### Base elements

The same selectors implementation is now used to narrow down the conversion to specific HTML DOM fragments. Addresses [#96](https://github.com/html-to-text/node-html-to-text/issues/96). (The previous implementation supported a more limited selector format.)

BREAKING CHANGE: All outermost elements matching the provided selectors will be present in the output (previously it was only the first match for each selector). Addresses [#215](https://github.com/html-to-text/node-html-to-text/issues/215).

`limits.maxBaseElements` can be used when you only need a fixed number of base elements and would like to avoid checking the rest of the source HTML document.

Base elements can be arranged in the output text in the order of matched selectors (the default, to stay closer to the old implementation) or in the order of appearance in the source HTML document.
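
An options fragment combining these settings might look like the following. The names `baseElements.selectors`, `baseElements.orderBy` (with the values `'selectors'` and `'occurrence'`) and `limits.maxBaseElements` reflect our understanding of the version 8 README and should be checked against the current documentation; the selectors themselves are made-up examples.

```js
// Options fragment: convert only the matched fragments, in document order,
// and stop searching once two base elements have been found.
const options = {
  baseElements: {
    selectors: ['article.main', 'aside.notes'],
    orderBy: 'occurrence', // 'selectors' (the default) keeps selector order instead
  },
  limits: {
    maxBaseElements: 2,
  },
};
```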

BREAKING CHANGE: the previous implementation treated id selectors the same way as class selectors (it could match `<foo id="a b">` with the `foo#a` selector). The new implementation is closer to the spec and doesn't expect multiple ids on an element. If you rely on the old behavior for some poorly formatted documents, you can achieve it with the `foo[id~=a]` selector (note that it has a different specificity, though).
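
The difference between the two selectors comes from CSS semantics: `#a` compares the whole `id` value, while `[id~=a]` looks for `a` among the whitespace-separated words of the attribute. The helpers below are hypothetical and only model those two rules; the package does its matching through `selderee`, not through code like this.

```js
// Hypothetical helpers modelling two CSS matching rules (not library code).

// `#a`: the whole id attribute value must equal "a".
function matchesIdSelector(idAttr, id) {
  return idAttr === id;
}

// `[id~=a]`: "a" must be one of the whitespace-separated words of the value.
// Per CSS, an empty word or one containing whitespace never matches.
function matchesWordSelector(attrValue, word) {
  if (!word || /\s/.test(word)) return false;
  return attrValue.split(/\s+/).includes(word);
}

const idAttr = 'a b'; // as in <foo id="a b">
console.log(matchesIdSelector(idAttr, 'a'));   // false: `foo#a` no longer matches
console.log(matchesWordSelector(idAttr, 'a')); // true: `foo[id~=a]` still matches
```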

### Batch processing

Since options preprocessing is getting more involved with selectors compilation, it seemed reasonable to break the single `htmlToText()` function into compilation and conversion steps. This might provide some performance benefits in client code.

* new function `compile(options)` returns a function of a single argument (the HTML string);
* `htmlToText(html, options)` is now an alias to the `convert(html, options)` function and works as before.
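
The benefit of the split is that option preprocessing runs once and the resulting function is reused for every document, e.g. `const convert = compile(options); const texts = htmls.map((html) => convert(html));`. The self-contained toy below shows the same compile-once pattern with regular expressions; `compileReplacer` is hypothetical and unrelated to the library's internals.

```js
// Toy illustration of the compile-once pattern (not html-to-text code):
// the rules are turned into RegExp objects a single time in compileReplacer,
// and the returned function reuses them for every input string.
function compileReplacer(rules) {
  const compiled = rules.map(([pattern, replacement]) => [new RegExp(pattern, 'g'), replacement]);
  return (text) => compiled.reduce((acc, [re, replacement]) => acc.replace(re, replacement), text);
}

// Collapse whitespace runs, then drop a single leading/trailing space.
const normalize = compileReplacer([
  ['\\s+', ' '],
  ['^ | $', ''],
]);

const inputs = ['  Hello   world ', 'one\n\ntwo'];
console.log(inputs.map((text) => normalize(text)));
// prints: [ 'Hello world', 'one two' ]
```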

### Deprecated options

* `baseElement`;
* `returnDomByDefault`;
* `tables`;
* `tags`.

Refer to the README for [migration instructions](https://github.com/html-to-text/node-html-to-text#deprecated-or-removed-options).

No previously deprecated features are removed in this version. Significant cleanup is planned for version 9 instead.

----

## Version ~~7.1.2~~ 7.1.3

Bump `minimist` dependency and dev dependencies, regenerate `package-lock.json`.

## Version 7.1.1

Regenerate `package-lock.json`.

## Version 7.1.0

### Dependency updates

* `htmlparser2` updated from 6.0.0 to 6.1.0 ([Release notes](https://github.com/fb55/htmlparser2/releases));
* dev dependencies are bumped.

## Version 7.0.0

### Node version

Required Node version is now >=10.23.2.

### Dependency updates

* `lodash` dependency is removed;
* `htmlparser2` updated from 4.1.0 to 6.0.0 ([Release notes](https://github.com/fb55/htmlparser2/releases), also [domhandler](https://github.com/fb55/domhandler/releases/tag/v4.0.0)). There is a slim chance you will run into some differences if you rely on it heavily in your custom formatters;
* dev dependencies are bumped.

### Custom formatters API change

[BlockTextBuilder](https://github.com/html-to-text/node-html-to-text/blob/master/lib/block-text-builder.js) methods now accept option objects for optional arguments. This improves client code readability and makes it easy to introduce extra options. It will see some use in future updates.

Positional arguments introduced in version 6.0.0 are now deprecated. Formatters written for version 6.0.0 should keep working for now, but the compatibility layer is rather inconvenient and will be removed with the next major version.

See the commit [f50f10f](https://github.com/html-to-text/node-html-to-text/commit/f50f10f54cf814efb2f7633d9d377ba7eadeaf1e). The changes in the `lib/formatter.js` file illustrate how to migrate to the new API.

### And more

* A bunch of documentation and test updates.

All commits: [6.0.0...7.0.0](https://github.com/html-to-text/node-html-to-text/compare/6.0.0...7.0.0)

Version 7 roadmap issue: [#222](https://github.com/html-to-text/node-html-to-text/issues/222)

----

## Version 6.0.0

This is a major update. No code was left untouched. While the goal was to keep as much compatibility as possible, some client-facing changes were unavoidable.

### fromString() is deprecated in favor of htmlToText()

Since the library has only one exported function, it is now self-titled.

### Inline and block-level tags, HTML whitespace

The formatting code was rewritten almost entirely to make it aware of block-level tags and to handle HTML whitespace properly. One of the popular requests was to support divs, and it is here now, after a lot of effort.

### Options reorganized

Options are reorganized to make room for some extra format options while making everything more structured. Tag-specific options now live within that tag's configuration.

For the majority of changed options there is a compatibility layer that will remain until the next major release. But you are encouraged to explore the new options since they provide a bit more flexibility.

### Custom formatters are different now

Because formatters are an integral part of the formatting code (as the name suggests), it wasn't possible to provide a compatibility layer.

Please refer to the README to see how things are wired now, in case you were using them for anything other than dealing with the lack of block-level tags support.

### Tables support was improved

Cells can make use of extra space with colspan and rowspan attributes. Max column width is defined separately from the global wordwrap limit.

### Limits

Multiple options to cut content in large HTML documents.

By default, any input longer than 16 million characters will be truncated.

### Node and dependencies

Required Node version is now >=8.10.0.

Dependency versions are bumped.

### Repository is moved to its own organization

[https://github.com/html-to-text/node-html-to-text](https://github.com/html-to-text/node-html-to-text) is the new home.

GitHub should handle all redirects from the old URL, so it shouldn't break anything, even if you have a local fork pointing at the old origin. But it is still a good idea to [update](https://docs.github.com/en/free-pro-team@latest/github/using-git/changing-a-remotes-url) the URL.

### And more

Version 6 roadmap issue: [#200](https://github.com/html-to-text/node-html-to-text/issues/200)

----

## Version 5.1.1

* `preserveNewLines` whitespace issue fixed [#162](https://github.com/html-to-text/node-html-to-text/pull/162)

## Version 5.1.0

* Hard-coded CLI options removed [#173](https://github.com/html-to-text/node-html-to-text/pull/173)

## Version 5.0.0

### BREAKING CHANGES

#### fromFile removed

The function `fromFile` is removed. It was the main reason `html-to-text` could not be used in the browser [#164](https://github.com/html-to-text/node-html-to-text/pull/164).

You can get the `fromFile` functionality back by using the following code:

```js
const fs = require('fs');
// fromString() is deprecated since version 6 (in favor of htmlToText())
// and removed in version 9.
const { fromString } = require('html-to-text');

// Callback version
const fromFile = (file, options, callback) => {
  // Allow fromFile(file, callback) without options.
  if (!callback) {
    callback = options;
    options = {};
  }
  fs.readFile(file, 'utf8', (err, str) => {
    if (err) return callback(err);
    callback(null, fromString(str, options));
  });
};

// Promise version (named differently so it can coexist with the callback version)
const fromFilePromise = (file, options) =>
  fs.promises.readFile(file, 'utf8').then((html) => fromString(html, options));

// Sync version
const fromFileSync = (file, options) => fromString(fs.readFileSync(file, 'utf8'), options);
```

#### Supported NodeJS Versions

Node versions < 6 are no longer supported.

----

## Version 4.0.0

* Support dropped for Node versions < 4.
* New option `unorderedListItemPrefix` added.
* HTML entities in links are not supported.

----

## Version 3.3.0

* Ability to pass custom formatting via the `format` option #128
* Enhanced support for alpha ordered list types added #123

## Version 3.2.0

* Basic support for alpha ordered list types added #122
  * This includes support for the `ol` type values `1`, `a` and `A`

## Version 3.1.0

* Support for the ordered list start attribute added #117
* Option to format paragraphs with a single newline #112
* `noLinkBrackets` option added #119

## Version 3.0.0

* Switched from `htmlparser` to `htmlparser2` #113
* Treat non-numeric colspans as zero and handle them gracefully #105

----

## Version 2.1.1

* Extra space problem fixed. #88

## Version 2.1.0

* New option to disable `uppercaseHeadings` added. #86
* Starting point of HTML-to-text conversion can now be defined in the options via the `baseElement` option. #83
* Support for long words added. The behaviour can be configured via the `longWordSplit` option. #83

## Version 2.0.0

* Unicode support added. #81
* New option `decodeOptions` added.
* Dependencies updated.

Breaking Changes:

* Minimum node version increased to >=0.10.0

----

## Version 1.6.2

* Fixed: correctly handle HTML entities for images #82

## Version 1.6.1

* Fixed: using `--tables=true` doesn't produce the expected results. #80

## Version 1.6.0

* Preserve newlines in text feature added #75

## Version 1.5.1

* Support for h5 and h6 tags added #74

## Version 1.5.0

* Entity regex is now less greedy #69 #70

## Version 1.4.0

* Uppercase tag processing added. Table center support added. #56
* Unused dependencies removed.

## Version 1.3.2

* Support Node 4 engine #64