# Stream-Editor

[![npm](https://img.shields.io/npm/v/stream-editor?logo=npm)](https://www.npmjs.com/package/stream-editor)
[![install size](https://packagephobia.com/badge?p=stream-editor)](https://packagephobia.com/result?p=stream-editor)
[![codecov](https://codecov.io/gh/edfus/stream-editor/branch/master/graph/badge.svg)](https://codecov.io/gh/edfus/stream-editor)
[![CI](https://github.com/edfus/stream-editor/actions/workflows/node.js.yml/badge.svg?branch=master)](https://github.com/edfus/stream-editor/actions/workflows/node.js.yml)
[![Node.js Version](https://raw.githubusercontent.com/edfus/storage/master/node-lts-badge.svg)](https://nodejs.org/en/about/releases/)

* [Features](#features)
    * [Partial replacement](#partial-replacement)
    * [Substituting texts within files in streaming fashion](#substituting-texts-within-files-in-streaming-fashion)
    * [Setting limits on Regular Expressions' maximum executed times](#setting-limits-on-regular-expressions-maximum-executed-times)
    * [Transcoding streams or files](#transcoding-streams-or-files)
    * [Piping/teeing/confluencing streams with proper error handling & propagation](#pipingteeingconfluencing-streams-with-proper-error-handling--propagation)
    * [No dependency](#no-dependency)
    * [High coverage tests](#high-coverage-tests)
* [API](#api)
    * [Overview](#overview)
    * [Options for replacement](#options-for-replacement)
    * [Options for stream transform](#options-for-stream-transform)
    * [Options for stream input/output](#options-for-stream-inputoutput)
* [Examples](#examples)

## Features

### Partial replacement

A partial replacement replaces only the substring matched by the 1st parenthesized capture group, leaving the rest of the match untouched. This allows a simpler syntax and keeps modifications to a minimum.

Take the following snippet, which converts something like `import x from "../src/x.mjs"` into `import x from "../build/x.mjs"`, as an example:

```js
import { sed as updateFileContent } from "stream-editor";

updateFileContent({
  file: "index.mjs",
  // partial replacement: only the 1st capture group, (src\/(.+?)), is replaced
  search: matchParentFolderImport(/(src\/(.+?))/),
  replacement: "build/$2",
  maxTimes: 2,
  required: true
});

// Builds a RegExp matching `import ... from "../<additionalPattern>"`.
function matchParentFolderImport (additionalPattern) {
  const parts = /import\s+.+\s+from\s*['"]\.\.\/(.+?)['"];?/.source.split("(.+?)");

  return new RegExp([
    parts[0],
    additionalPattern.source,
    parts[1]
  ].join(""));
}
```

Special replacement patterns (parenthesized capture group placeholders such as `$2`) are supported in partial replacements, for both string and function replacements. Everything else follows the behavior of the vanilla `String.prototype.replace` method, with one exception: `$&` (also the 1st argument passed to a replacement function) and `$1` (the 2nd argument) always have the same value, namely the substring matched by the 1st parenthesized capture group (PCG).

You can specify a truthy `isFullReplacement` to perform a full replacement instead.

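To make the semantics concrete, here is a minimal sketch of a partial replacement written in plain JavaScript rather than with stream-editor itself. `replaceFirstGroup` is a hypothetical helper, not part of this package's API, and it assumes a runtime with RegExp match indices (the `d` flag, Node.js 16+). It rewrites only the span of the 1st capture group and passes that substring as both the 1st and the 2nd argument of the replacement function, mirroring the `$&`/`$1` behavior described in this section.

```javascript
// Hypothetical helper for illustration only; not stream-editor's implementation.
function replaceFirstGroup(input, pattern, replacer) {
  // Force the global (g) and match-indices (d) flags without duplicating any.
  const flags = [...new Set(pattern.flags + "gd")].join("");
  const regex = new RegExp(pattern.source, flags);

  let result = "";
  let lastIndex = 0;
  for (const match of input.matchAll(regex)) {
    const span = match.indices[1];
    if (!span) continue; // 1st capture group did not participate in this match
    const [start, end] = span;
    // match[1] is passed twice: once in place of "$&" and once as "$1".
    result += input.slice(lastIndex, start) + replacer(match[1], ...match.slice(1));
    lastIndex = end;
  }
  return result + input.slice(lastIndex);
}

const source = 'import x from "../src/x.mjs";';
const updated = replaceFirstGroup(
  source,
  /import\s+.+\s+from\s*['"]\.\.\/(src)\/.+?['"]/,
  () => "build"
);
console.log(updated); // import x from "../build/x.mjs";
```

Everything outside the capture group (the `import` keyword, the quotes, the file name) is copied through unchanged, which is what keeps the search pattern strict while the modification stays small.
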
### Substituting texts within files in streaming fashion

This package creates a readable stream and a writable stream connected to the same file at the same time, while preventing any write operation from advancing further than the current reading index. This feature is based on [rw-stream](https://github.com/signicode/rw-stream)'s great work.

To accommodate RegEx replacement (which requires intact strings rather than chunks that may begin or end at any position) with streams, stream-editor brings the `separator` (default: `/(?<=\r?\n)/`) and `join` (default: `''`) options into use. You should NOT specify separators that may divide the text structures targeted by your RegEx searches, as doing so results in undefined behavior.

Moreover, as the RegEx replacement part of `options` is optional, stream-editor can also be used to break up streams and reassemble them, like [split2](https://github.com/mcollina/split2) does:

```js
// named export sed is an alias for streamEdit
const { streamEdit } = require("stream-editor");
const { createReadStream } = require("fs");
const { Writable } = require("stream");
const { join } = require("path");

const filepath = join(__dirname, `./file.ndjson`);

// top-level await is unavailable in CommonJS, hence the async wrapper
(async () => {
  /* replace CRLF with LF */
  await streamEdit({
    file: filepath,
    separator: "\r\n",
    join: "\n"
  });

  /* parse ndjson */
  await streamEdit({
    from: createReadStream(filepath),
    to: new Writable({
      objectMode: true,
      write(parsedObj, _enc, cb) {
        // doSomething stands for your own async handler of each parsed object
        return (
          doSomething(parsedObj)
            .then(() => cb())
            .catch(cb)
        );
      }
    }),
    separator: "\n",
    readableObjectMode: true,
    postProcessing: part => JSON.parse(part)
  });
})().catch(console.error);
```

You can specify `null` as the `separator` to completely disable splitting.

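The role of `separator` and `join` can be illustrated with a small, synchronous sketch that does not use stream-editor. `PartSplitter` is a hypothetical class, and it assumes chunks arrive as strings. It shows only the core idea: split each chunk by the separator, hold back the last (possibly incomplete) piece until more data arrives, and append `join` to every complete part.

```javascript
// Hypothetical illustration of separator/join handling; not the package's code.
class PartSplitter {
  constructor(separator, join = "") {
    this.separator = separator;
    this.join = join;
    this.tail = ""; // incomplete trailing part carried over between chunks
  }

  // Returns the complete parts found so far, each followed by `join`.
  push(chunk) {
    const parts = (this.tail + chunk).split(this.separator);
    this.tail = parts.pop(); // may be completed by the next chunk
    return parts.map(part => part + this.join).join("");
  }

  // Call once the source has ended to release the remaining text.
  flush() {
    const rest = this.tail;
    this.tail = "";
    return rest;
  }
}

// CRLF -> LF, even though one "\r\n" is cut in half by a chunk boundary.
const splitter = new PartSplitter("\r\n", "\n");
const output = splitter.push("a\r") + splitter.push("\nb\r\nc") + splitter.flush();
console.log(JSON.stringify(output)); // "a\nb\nc"
```

A search is only ever run against one complete part at a time, so a separator that cuts through the text a search is meant to match would hide that match from it.
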
### Setting limits on Regular Expressions' maximum executed times

This is achieved by converting every `replacement` into a replacement function and wrapping it in layers of proxying that keep track of how many times it has been executed.

```js
const { sed: updateFiles } = require("stream-editor");

/**
 * add "use strict" plus a compatible line ending
 * to the beginning of every commonjs file.
 */

// maxTimes version
updateFiles({
  files: commonjsFiles, // an array of file paths
  match: /^().*(\r?\n)/,
  replacement: `"use strict";$2`,
  maxTimes: 1
});

// limit version
updateFiles({
  files: commonjsFiles,
  replace: [
    {
      match: /^().*(\r?\n)/,
      replacement: `"use strict";$2`,
      /**
       * a local limit,
       * applying restriction on certain match's maximum executed times.
       */
      limit: 1
    }
  ],
  // a global limit, limit the maximum count of every search's executed times.
  limit: 1
});
```

Once the limit specified by the `limit` option is reached, the underlying transform stream becomes a transparent passThrough stream if the `truncate` option is falsy; otherwise, the remaining part is discarded. In contrast, `maxTimes` merely removes that particular search.

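Here is a minimal sketch of the counting idea, assuming nothing about stream-editor's internals. `withMaxTimes` is a hypothetical wrapper that turns a string or function replacement into a function that stops replacing after a given number of calls. Unlike the package, it does not expand placeholders such as `$2` in string replacements.

```javascript
// Hypothetical wrapper for illustration; stream-editor's own proxying differs.
function withMaxTimes(replacement, maxTimes) {
  let executed = 0;
  return (wholeMatch, ...args) => {
    if (executed >= maxTimes) {
      return wholeMatch; // quota used up: leave the text as it was
    }
    executed++;
    return typeof replacement === "function"
      ? replacement(wholeMatch, ...args)
      : replacement; // note: "$1"-style placeholders are not expanded here
  };
}

const limited = "a a a".replace(/a/g, withMaxTimes("b", 2));
console.log(limited); // b b a
```
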
### Transcoding streams or files

```js
const { streamEdit } = require("stream-editor");
const { createReadStream, createWriteStream } = require("fs");

streamEdit({
  from: createReadStream("gbk.txt"),
  to: createWriteStream("hex.txt"),
  decodeBuffers: "gbk",
  encoding: "hex"
});
```

The `decodeBuffers` option is the character encoding (such as utf-8, iso-8859-2, koi8, cp1251 or gbk) used for decoding the raw input buffers. Some encodings are only available when Node.js embeds the entire ICU, but the good news is that full-icu has been the default since v14+ (see <https://github.com/nodejs/node/pull/29522>).

Note that the `decodeBuffers` option only makes sense when no encoding is assigned to the source stream and its data are passed as buffers. Below are some examples of wrong input:

```js
streamEdit({
  from:
    createReadStream("gbk.txt").setEncoding("utf8"),
  to: createWriteStream("hex.txt"),
  decodeBuffers: "gbk",
  encoding: "hex"
});

streamEdit({
  from:
    createReadStream("gbk.txt", "utf8"),
  to: createWriteStream("hex.txt"),
  decodeBuffers: "gbk",
  encoding: "hex"
});
```

The `encoding` option encodes all processed and joined strings into buffers with the given encoding. Node.js supports the following values: `ascii`, `utf8`, `utf-8`, `utf16le`, `ucs2`, `ucs-2`, `base64`, `latin1`, `binary`, `hex`.

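The decoding half can be sketched with the WHATWG `TextDecoder` that Node.js exposes globally. `decodeChunks` is a hypothetical helper, not this package's API, and it assumes a Node.js build with full ICU so that the `gbk` label is available. The `stream: true` flag lets a multi-byte character that is split across two buffers be decoded correctly, which only works when the source delivers raw buffers.

```javascript
// Hypothetical helper for illustration; assumes a Node.js build with full ICU.
function decodeChunks(chunks, encoding) {
  const decoder = new TextDecoder(encoding);
  let text = "";
  for (const chunk of chunks) {
    // stream: true keeps incomplete byte sequences buffered for the next call
    text += decoder.decode(chunk, { stream: true });
  }
  return text + decoder.decode(); // flush whatever is still buffered
}

// "你好" in GBK is C4 E3 BA C3; the chunk boundaries cut both characters in half.
const gbkChunks = [
  Buffer.from([0xc4]),
  Buffer.from([0xe3, 0xba]),
  Buffer.from([0xc3])
];
const decoded = decodeChunks(gbkChunks, "gbk");
console.log(decoded); // 你好
console.log(Buffer.from(decoded, "utf8").toString("hex")); // e4bda0e5a5bd
```
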
### Piping/teeing/confluencing streams with proper error handling & propagation

Confluence:
```js
import { streamEdit } from "stream-editor";
import { createReadStream, createWriteStream, promises as fsp } from "fs";
import { join } from "path";

// folderpath and resultPath stand for your own paths
const yamlFiles = await (
  fsp.readdir(folderpath, { withFileTypes: true })
    .then(dirents =>
      dirents
        .filter(dirent => dirent.isFile() && dirent.name.endsWith(".yaml"))
        .sort((a, b) => a.name.localeCompare(b.name))
        .map(({ name }) => createReadStream(join(folderpath, name)))
    )
);

streamEdit({
  from: yamlFiles,
  to: createWriteStream(resultPath),
  contentJoin: "\n\n" // join streams
  // the encoding of contentJoin respects the `encoding` option
});
```

Teeing:
```js
streamEdit({
  readableStream: new Readable({
    read(size) {
      // ...
    }
  }),
  writableStreams: new Array(6).fill(0).map((_, i) =>
    createWriteStream(join(resultFolderPath, `./test-source${i}`))
  )
});
```

You can have a look at the tests regarding error handling [here](https://github.com/edfus/stream-editor/blob/85665e5a9f53a724dab7a42a2d15301eaafddfc2/test/test.mjs#L578-L846).

### No dependency

stream-editor previously depended on [rw-stream](https://github.com/signicode/rw-stream), but for some historical reasons, I refactored rw-stream and bundled it as a part of this package. See [src/rw-stream](https://github.com/edfus/stream-editor/blob/master/src/rw-stream/index.mjs).

Currently, stream-editor has zero dependencies.

### High coverage tests

See <https://github.com/edfus/stream-editor/tree/master/test>.

```plain text

  Normalize & Replace
    √ can handle sticky regular expressions
    √ can handle string match with special characters
    √ can handle partial replacement with placeholders
    √ can handle non-capture-group parenthesized pattern: Assertions
    √ can handle non-capture-group parenthesized pattern: Round brackets
    √ can handle pattern starts with a capture group
    √ can handle malformed (without capture groups) partial replacement
    √ can await replace partially with function
    √ recognize $\d{1,3} $& $` $' and check validity (throw warnings)

  Edit streams
    √ should check arguments
    √ should warn unknown/unneeded options
    √ should respect FORCE_COLOR, NO_COLOR, NODE_DISABLE_COLORS
    √ should pipe one Readable to multiple dumps (49ms)
    √ should replace CRLF with LF
    √ should have replaced /dum(b)/i to dumpling (while preserving dum's case)
    √ should have global and local limits on replacement amount
    √ should have line buffer maxLength
    √ should edit and combine multiple Readable into one Writable
    √ has readableObjectMode
    √ can signal an unsuccessful substitution using beforeCompletion
    √ can declare a limit below which a substitution is considered failed for a search
    truncation & limitation
      √ truncating the rest when limitations reached
      √ not: self rw-stream
      √ not: piping stream
    transcoding
      √ gbk to utf8 buffer
      √ gbk to hex with HWM
    error handling
      √ destroys streams properly when one of them closed prematurely
      √ destroys streams properly if errors occurred during initialization
      √ multiple-to-one: can correctly propagate errors emitted by readableStreams
      √ multiple-to-one: can handle prematurely destroyed readableStreams
      √ multiple-to-one: can correctly propagate errors emitted by writableStream
      √ multiple-to-one: can handle prematurely ended writableStream
      √ multiple-to-one: can handle prematurely destroyed writableStream
      √ one-to-multiple: can correctly propagate errors emitted by writableStreams
      √ one-to-multiple: can handle prematurely ended writableStreams
      √ one-to-multiple: can handle prematurely destroyed writableStreams
      √ can handle errors thrown from postProcessing
      √ can handle errors thrown from join functions
      √ can handle errors thrown from replacement functions
    corner cases
      √ can handle empty content
      √ can handle non-string in regular expression split result
    try-on
      √ can handle files larger than 64KiB


  42 passing (263ms)

```

## API

### Overview

This package has two named function exports: `streamEdit` and `sed` (an alias for `streamEdit`).

`streamEdit` returns a promise. For files, the promise resolves to `void | void[]`; for streams, it resolves to `Writable[] | Writable`, keeping references to the output streams.

`streamEdit` accepts an options object with one or more of the following options:

### Options for replacement

| name | alias | expect | safe to ignore | default |
| :--: | :-: | :-----: | :-: | :--: |
| search | match | `string` \| `RegExp` | ✔ | none |
| replacement | x | `string` \| `(wholeMatch, ...args) => string` | ✔ | none |
| limit | x | `number` | ✔ | `Infinity` |
| maxTimes | x | `number` | ✔ | `Infinity` |
| minTimes | x | `number` | ✔ | `0` |
| required | x | `boolean` | ✔ | `false` |
| isFullReplacement | x | `boolean` | ✔ | `false` |
| disablePlaceholders | x | `boolean` | ✔ | `false` |
| replace | x | an `Array` of { `search`, `replacement` } | ✔ | none |
| defaultOptions | x | `BasicReplaceOptions` | ✔ | `{}` |
| join | x | `string` \| `(part: string) => string` \| `null` | ✔ | `part => part` |
| postProcessing | x | `(part: string, isLastPart: boolean) => any` | ✔ | none |
| beforeCompletion | x | `() => Promise<void>` \| `void` | ✔ | none |

```ts
type GlobalLimit = number;
type LocalLimit = number;

interface BasicReplaceOptions {
  /**
   * Perform a full replacement or not.
   *
   * A RegExp search without capture groups or a search in string will be
   * treated as a full replacement silently.
   */
  isFullReplacement?: boolean;
  /**
   * Only valid for a string replacement.
   *
   * Disable placeholders in replacement or not. Processed result shall be
   * exactly the same as the string replacement if set to true.
   *
   * Default: false
   */
  disablePlaceholders?: boolean;
  /**
   * Apply restriction on certain search's maximum executed times.
   *
   * Upon reaching the limit, if option `truncate` is falsy (false by default),
   * underlying transform stream will become a transparent passThrough stream.
   *
   * Default: Infinity. 0 is considered as Infinity for this option.
   */
  limit?: LocalLimit;
  /**
   * Observe a certain search's executed times, remove that search right
   * after upper limit reached.
   *
   * Default: Infinity. 0 is considered as Infinity for this option.
   */
  maxTimes?: number;
  /**
   * For the search you specified, add a limit below which the substitution
   * is considered failed.
   */
  minTimes?: number;
  /**
   * Sugar for minTimes = 1
   */
  required?: boolean;
}

interface SearchAndReplaceOptions extends BasicReplaceOptions {
  /**
   * Correspondence: `String.prototype.replaceAll`'s 1st argument.
   *
   * Accepts a literal string or a RegExp object.
   *
   * Will replace all occurrences by converting input into a global RegExp
   * object, which means that the according replacement might be invoked
   * multiple times for each full match to be replaced.
   *
   * Every `search` and `replacement` not arranged in pairs is silently
   * discarded in `options`, while in `options.replace` that will result in
   * an error thrown.
   */
  search?: string | RegExp;

  /**
   * Correspondence: String.prototype.replace's 2nd argument.
   *
   * Replaces the according text for a given match, a string or
   * a function that returns the replacement text can be passed.
   *
   * Special replacement patterns (parenthesized capture group placeholders)
   * are well supported.
   *
   * For a partial replacement, $& (also the 1st supplied value to replace
   * function) and $1 (the 2nd param passed) always have the same value,
   * supplying the matched substring in the parenthesized capture group
   * you specified.
   */
  replacement?: string | ((wholeMatch: string, ...args: string[]) => string);
}

/**
 * Same as SearchAndReplaceOptions, with `match` as an alias for `search`.
 */
interface MatchAndReplaceOptions extends BasicReplaceOptions {
  match?: string | RegExp;
  replacement?: SearchAndReplaceOptions["replacement"];
}

interface MultipleReplacementOptions {
  /**
   * Apply restriction on the maximum count of every search's executed times.
   *
   * Upon reaching the limit, if option `truncate` is falsy (false by default),
   * underlying transform stream will become a transparent passThrough stream.
   *
   * Default: Infinity. 0 is considered as Infinity for this option.
   */
  limit?: GlobalLimit;
  /**
   * Should be an array of { [ "match" | "search" ], "replacement" } pairs.
   *
   * Possible `search|match` and `replacement` pair in `options` scope will be
   * prepended to `options.replace` array, if both exist.
   */
  replace?: Array<SearchAndReplaceOptions | MatchAndReplaceOptions>;
  /**
   * Default: {}
   */
  defaultOptions?: BasicReplaceOptions;
}

// Either group of options may be supplied, or both; every member is optional.
type ReplaceOptions = MultipleReplacementOptions & SearchAndReplaceOptions;

interface BasicOptions extends ReplaceOptions {
  /**
   * Correspondence: Array.prototype.join's 1st argument, though a function
   * is also acceptable.
   *
   * You can specify a literal string or a function that returns the post-processed
   * part.
   *
   * Example function for appending a CRLF: part => part.concat("\r\n");
   *
   * Default: part => part
   */
  join?: string | ((part: string) => string) | null;
  /**
   * A post-processing function that consumes transformed strings and returns a
   * string or a Buffer. This option takes priority over option `join`.
   *
   * If readableObjectMode is enabled, any object accepted by Node.js objectMode
   * streams can be returned.
   */
  postProcessing?: (part: string, isLastPart: boolean) => any;
  /**
   * This optional function will be called before the destination(s) close,
   * delaying the resolution of the promise returned by streamEdit() until
   * beforeCompletion resolves.
   *
   * You can also return a rejected promise or simply raise an error to signal a
   * failure and destroy all streams.
   */
  beforeCompletion?: () => Promise<void> | void;
}
```

### Options for stream transform

| name | alias | expect | safe to ignore | default |
| :--: | :-: | :-----: | :-: | :--: |
| separator | x | `string` \| `RegExp` \| `null` | ✔ | `/(?<=\r?\n)/` |
| encoding | x | `string` \| `null` | ✔ | `null` |
| decodeBuffers | x | `string` | ✔ | `"utf8"` |
| truncate | x | `boolean` | ✔ | `false` |
| maxLength | x | `number` | ✔ | `Infinity` |
| readableObjectMode | x | `boolean` | ✔ | `false` |

Options that are only available under certain contexts:

| name | alias | expect | context | default |
| :--: | :-: | :-----: | :-: | :--: |
| readStart | x | `number` | file\[s\] | `0` |
| writeStart | x | `number` | file\[s\] | `0` |
| contentJoin | x | `string` \| `Buffer` | readableStreams | `""` |

```ts
interface BasicOptions extends ReplaceOptions {
  /**
   * Correspondence: String.prototype.split's 1st argument.
   *
   * Accepts a literal string or a RegExp object.
   *
   * Used by underlying transform stream to split upstream data into separate
   * to-be-processed parts.
   *
   * String.prototype.split will implicitly call `toString` on non-string &
   * non-regex & non-void values.
   *
   * Specify `null` or `undefined` to process upstream data as a whole.
   *
   * Default: /(?<=\r?\n)/, which splits right after each line ending, so
   * every part keeps its own line ending.
   */
  separator?: string | RegExp | null;
  /**
   * Correspondence: encoding of Node.js Buffer.
   *
   * If specified, then processed and joined strings will be encoded to buffers
   * with that encoding.
   *
   * Node.js currently supports the following options:
   * "ascii" | "utf8" | "utf-8" | "utf16le" | "ucs2" | "ucs-2" | "base64" | "latin1" | "binary" | "hex"
   * Default: null.
   */
  encoding?: BufferEncoding | null;
  /**
   * Correspondence: encodings of WHATWG Encoding Standard TextDecoder.
   *
   * Accept a specific character encoding, like utf-8, iso-8859-2, koi8, cp1251,
   * gbk, etc for decoding the input raw buffer.
   *
   * This option only makes sense when no encoding is assigned and stream data are
   * passed as Buffer objects (that is, haven't done something like
   * readable.setEncoding('utf8'));
   *
   * Example: streamEdit({
   *   from: createReadStream("gbk.txt"),
   *   to: createWriteStream("utf8.txt"),
   *   decodeBuffers: "gbk"
   * });
   *
   * Some encodings are only available for Node embedded the entire ICU (full-icu).
   * See https://nodejs.org/api/util.html#util_class_util_textdecoder.
   *
   * Default: "utf8".
   */
  decodeBuffers?: string;
  /**
   * Truncating the rest or not when limits reached.
   *
   * Default: false.
   */
  truncate?: boolean;
  /**
   * The maximum size of the line buffer.
   *
   * A line buffer is the buffer used for buffering the last incomplete substring
   * when dividing chunks (typically 64 KiB) by options.separator.
   *
   * Default: Infinity.
   */
  maxLength?: number;
  /**
   * Correspondence: readableObjectMode option of Node.js stream.Transform
   *
   * Options writableObjectMode and objectMode are not supported.
   *
   * Default: false.
   */
  readableObjectMode?: boolean;
}

interface UpdateFileOptions extends BasicOptions {
  file: string;
  readStart?: number;
  writeStart?: number;
}

interface UpdateFilesOptions extends BasicOptions {
  files: string[];
  readStart?: number;
  writeStart?: number;
}

interface MultipleReadablesToWritableOptions<T> extends BasicOptions {
  from: Array<Readable>;
  to: T;
  /**
   * Concatenate results of transformed Readables with the input value.
   * Accepts a literal string or a Buffer.
   * option.encoding will be passed along with contentJoin to Writable.write
   * Default: ""
   */
  contentJoin?: string | Buffer;
}

interface MultipleReadablesToWritableOptionsAlias<T> extends BasicOptions {
  readableStreams: Array<Readable>;
  writableStream: T;
  contentJoin?: string | Buffer;
}
```

### Options for stream input/output

| name | alias | expect | with | default |
| :--: | :-: | :-----: | :-: | :--: |
| file | x | `string` | self | none |
| files | x | an `Array` of `string` | self | none |
| readableStream | from | `Readable` | writableStream\[s\] | none |
| writableStream | to | `Writable` | readableStream\[s\] | none |
| readableStreams | from | an `Array` of `Readable` | writableStream | none |
| writableStreams | to | an `Array` of `Writable` | readableStream | none |

file:
```ts
interface UpdateFileOptions extends BasicOptions {
  file: string;
  readStart?: number;
  writeStart?: number;
}
function streamEdit(options: UpdateFileOptions): Promise<void>;
```

files:
```ts
interface UpdateFilesOptions extends BasicOptions {
  files: string[];
  readStart?: number;
  writeStart?: number;
}

function streamEdit(options: UpdateFilesOptions): Promise<void[]>;
```

transform Readable:
```ts
// [ a | b ] is shorthand: either key name is accepted
interface TransformReadableOptions<T> extends BasicOptions {
  [ from | readableStream ]: Readable;
  [ to | writableStream ]: T;
}

function streamEdit<T extends Writable>(
  options: TransformReadableOptions<T>
): Promise<T>;
```

readables -> writable:
```ts
interface MultipleReadablesToWritableOptions<T> extends BasicOptions {
  [ from | readableStreams ]: Array<Readable>;
  [ to | writableStream ]: T;
  contentJoin: string | Buffer;
}

function streamEdit<T extends Writable>(
  options: MultipleReadablesToWritableOptions<T>
): Promise<T>;
```

readable -> writables:
```ts
interface ReadableToMultipleWritablesOptions<T> extends BasicOptions {
  [ from | readableStream ]: Readable;
  [ to | writableStreams ]: Array<T>;
}

function streamEdit<T extends Writable>(
  options: ReadableToMultipleWritablesOptions<T>
): Promise<T[]>;
```

For further reading, take a look at [the declaration file](https://github.com/edfus/stream-editor/blob/master/src/index.d.ts).

## Examples

See [./examples](https://github.com/edfus/stream-editor/tree/master/examples) and [esm2cjs](https://github.com/edfus/esm2cjs).