UNPKG

24.3 kBMarkdownView Raw
1# jsdiff
2
3A JavaScript text differencing implementation. Try it out in the **[online demo](https://kpdecker.github.io/jsdiff)**.
4
5Based on the algorithm proposed in
6["An O(ND) Difference Algorithm and its Variations" (Myers, 1986)](http://www.xmailserver.org/diff2.pdf).
7
8## Installation
9```bash
10npm install diff --save
11```
12
13## Usage
14
15Broadly, jsdiff's diff functions all take an old text and a new text and perform three steps:
16
171. Split both texts into arrays of "tokens". What constitutes a token varies; in `diffChars`, each character is a token, while in `diffLines`, each line is a token.
18
192. Find the smallest set of single-token *insertions* and *deletions* needed to transform the first array of tokens into the second.
20
21 This step depends upon having some notion of a token from the old array being "equal" to one from the new array, and this notion of equality affects the results. Usually two tokens are equal if `===` considers them equal, but some of the diff functions use an alternative notion of equality or have options to configure it. For instance, by default `diffChars("Foo", "FOOD")` will require two deletions (`o`, `o`) and three insertions (`O`, `O`, `D`), but `diffChars("Foo", "FOOD", {ignoreCase: true})` will require just one insertion (of a `D`), since `ignoreCase` causes `o` and `O` to be considered equal.
22
233. Return an array representing the transformation computed in the previous step as a series of [change objects](#change-objects). The array is ordered from the start of the input to the end, and each change object represents *inserting* one or more tokens, *deleting* one or more tokens, or *keeping* one or more tokens.
24
25### API
26
27* `Diff.diffChars(oldStr, newStr[, options])` - diffs two blocks of text, treating each character as a token.
28
29 ("Characters" here means Unicode code points - the elements you get when you loop over a string with a `for ... of ...` loop.)
30
31 Returns a list of [change objects](#change-objects).
32
33 Options
34 * `ignoreCase`: If `true`, the uppercase and lowercase forms of a character are considered equal. Defaults to `false`.
35
36* `Diff.diffWords(oldStr, newStr[, options])` - diffs two blocks of text, treating each word and each punctuation mark as a token. Whitespace is ignored when computing the diff (but preserved as far as possible in the final change objects).
37
38 Returns a list of [change objects](#change-objects).
39
40 Options
41 * `ignoreCase`: Same as in `diffChars`. Defaults to false.
42 * `intlSegmenter`: An optional [`Intl.Segmenter`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter) object (which must have a `granularity` of `'word'`) for `diffWords` to use to split the text into words.
43
44 By default, `diffWords` does not use an `Intl.Segmenter`, just some regexes for splitting text into words. This will tend to give worse results than `Intl.Segmenter` would, but ensures the results are consistent across environments; `Intl.Segmenter` behaviour is only loosely specced and the implementations in browsers could in principle change dramatically in future. If you want to use `diffWords` with an `Intl.Segmenter` but ensure it behaves the same whatever environment you run it in, use an `Intl.Segmenter` polyfill instead of the JavaScript engine's native `Intl.Segmenter` implementation.
45
46 Using an `Intl.Segmenter` should allow better word-level diffing of non-English text than the default behaviour. For instance, `Intl.Segmenter`s can generally identify via built-in dictionaries which sequences of adjacent Chinese characters form words, allowing word-level diffing of Chinese. By specifying a language when instantiating the segmenter (e.g. `new Intl.Segmenter('sv', {granularity: 'word'})`) you can also support language-specific rules, like treating Swedish's colon separated contractions (like *k:a* for *kyrka*) as single words; by default this would be seen as two words separated by a colon.
47
48* `Diff.diffWordsWithSpace(oldStr, newStr[, options])` - diffs two blocks of text, treating each word, punctuation mark, newline, or run of (non-newline) whitespace as a token.
49
50* `Diff.diffLines(oldStr, newStr[, options])` - diffs two blocks of text, treating each line as a token.
51
52 Options
53 * `ignoreWhitespace`: `true` to ignore leading and trailing whitespace characters when checking if two lines are equal. Defaults to `false`.
54 * `ignoreNewlineAtEof`: `true` to ignore a missing newline character at the end of the last line when comparing it to other lines. (By default, the line `'b\n'` in text `'a\nb\nc'` is not considered equal to the line `'b'` in text `'a\nb'`; this option makes them be considered equal.) Ignored if `ignoreWhitespace` or `newlineIsToken` are also true.
55 * `stripTrailingCr`: `true` to remove all trailing CR (`\r`) characters before performing the diff. Defaults to `false`.
56 This helps to get a useful diff when diffing UNIX text files against Windows text files.
57 * `newlineIsToken`: `true` to treat the newline character at the end of each line as its own token. This allows for changes to the newline structure to occur independently of the line content and to be treated as such. In general this is the more human friendly form of `diffLines`; the default behavior with this option turned off is better suited for patches and other computer friendly output. Defaults to `false`.
58
59 Note that while using `ignoreWhitespace` in combination with `newlineIsToken` is not an error, results may not be as expected. With `ignoreWhitespace: true` and `newlineIsToken: false`, changing a completely empty line to contain some spaces is treated as a non-change, but with `ignoreWhitespace: true` and `newlineIsToken: true`, it is treated as an insertion. This is because the content of a completely blank line is not a token at all in `newlineIsToken` mode.
60
61 Returns a list of [change objects](#change-objects).
62
63* `Diff.diffSentences(oldStr, newStr[, options])` - diffs two blocks of text, treating each sentence as a token. The characters `.`, `!`, and `?`, when followed by whitespace, are treated as marking the end of a sentence; nothing else is considered to mark a sentence end.
64
65 (For more sophisticated detection of sentence breaks, including support for non-English punctuation, consider instead tokenizing with an [`Intl.Segmenter`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Intl/Segmenter) with `granularity: 'sentence'` and passing the result to `Diff.diffArrays`.)
66
67 Returns a list of [change objects](#change-objects).
68
69* `Diff.diffCss(oldStr, newStr[, options])` - diffs two blocks of text, comparing CSS tokens.
70
71 Returns a list of [change objects](#change-objects).
72
73* `Diff.diffJson(oldObj, newObj[, options])` - diffs two JSON-serializable objects by first serializing them to prettily-formatted JSON and then treating each line of the JSON as a token. Object properties are ordered alphabetically in the serialized JSON, so the order of properties in the objects being compared doesn't affect the result.
74
75 Returns a list of [change objects](#change-objects).
76
77 Options
78 * `stringifyReplacer`: A custom replacer function. Operates similarly to the `replacer` parameter to [`JSON.stringify()`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify#the_replacer_parameter), but must be a function.
79 * `undefinedReplacement`: A value to replace `undefined` with. Ignored if a `stringifyReplacer` is provided.
80
81* `Diff.diffArrays(oldArr, newArr[, options])` - diffs two arrays of tokens, comparing each item for strict equality (===).
82
83 Options
84 * `comparator`: `function(left, right)` for custom equality checks
85
86 Returns a list of [change objects](#change-objects).
87
88* `Diff.createTwoFilesPatch(oldFileName, newFileName, oldStr, newStr[, oldHeader[, newHeader[, options]]])` - creates a unified diff patch by first computing a diff with `diffLines` and then serializing it to unified diff format.
89
90 Parameters:
91 * `oldFileName` : String to be output in the filename section of the patch for the removals
92 * `newFileName` : String to be output in the filename section of the patch for the additions
93 * `oldStr` : Original string value
94 * `newStr` : New string value
95 * `oldHeader` : Optional additional information to include in the old file header. Default: `undefined`.
96 * `newHeader` : Optional additional information to include in the new file header. Default: `undefined`.
97 * `options` : An object with options.
98 - `context` describes how many lines of context should be included. You can set this to `Number.MAX_SAFE_INTEGER` or `Infinity` to include the entire file content in one hunk.
99 - `ignoreWhitespace`: Same as in `diffLines`. Defaults to `false`.
100 - `stripTrailingCr`: Same as in `diffLines`. Defaults to `false`.
101
102* `Diff.createPatch(fileName, oldStr, newStr[, oldHeader[, newHeader[, options]]])` - creates a unified diff patch.
103
104 Just like Diff.createTwoFilesPatch, but with oldFileName being equal to newFileName.
105
106* `Diff.formatPatch(patch)` - creates a unified diff patch.
107
108 `patch` may be either a single structured patch object (as returned by `structuredPatch`) or an array of them (as returned by `parsePatch`).
109
110* `Diff.structuredPatch(oldFileName, newFileName, oldStr, newStr[, oldHeader[, newHeader[, options]]])` - returns an object with an array of hunk objects.
111
112 This method is similar to createTwoFilesPatch, but returns a data structure
113 suitable for further processing. Parameters are the same as createTwoFilesPatch. The data structure returned may look like this:
114
115 ```js
116 {
117 oldFileName: 'oldfile', newFileName: 'newfile',
118 oldHeader: 'header1', newHeader: 'header2',
119 hunks: [{
120 oldStart: 1, oldLines: 3, newStart: 1, newLines: 3,
121 lines: [' line2', ' line3', '-line4', '+line5', '\\ No newline at end of file'],
122 }]
123 }
124 ```
125
126* `Diff.applyPatch(source, patch[, options])` - attempts to apply a unified diff patch.
127
128 Hunks are applied first to last. `applyPatch` first tries to apply the first hunk at the line number specified in the hunk header, and with all context lines matching exactly. If that fails, it tries scanning backwards and forwards, one line at a time, to find a place to apply the hunk where the context lines match exactly. If that still fails, and `fuzzFactor` is greater than zero, it increments the maximum number of mismatches (missing, extra, or changed context lines) that there can be between the hunk context and a region where we are trying to apply the patch such that the hunk will still be considered to match. Regardless of `fuzzFactor`, lines to be deleted in the hunk *must* be present for a hunk to match, and the context lines *immediately* before and after an insertion must match exactly.
129
130 Once a hunk is successfully fitted, the process begins again with the next hunk. Regardless of `fuzzFactor`, later hunks must be applied later in the file than earlier hunks.
131
132 If a hunk cannot be successfully fitted *anywhere* with fewer than `fuzzFactor` mismatches, `applyPatch` fails and returns `false`.
133
134 If a hunk is successfully fitted but not at the line number specified by the hunk header, all subsequent hunks have their target line number adjusted accordingly. (e.g. if the first hunk is applied 10 lines below where the hunk header said it should fit, `applyPatch` will *start* looking for somewhere to apply the second hunk 10 lines below where its hunk header says it goes.)
135
136 If the patch was applied successfully, returns a string containing the patched text. If the patch could not be applied (because some hunks in the patch couldn't be fitted to the text in `source`), `applyPatch` returns false.
137
138 `patch` may be a string diff or the output from the `parsePatch` or `structuredPatch` methods.
139
140 The optional `options` object may have the following keys:
141
142 - `fuzzFactor`: Maximum Levenshtein distance (in lines deleted, added, or subtituted) between the context shown in a patch hunk and the lines found in the file. Defaults to 0.
143 - `autoConvertLineEndings`: If `true`, and if the file to be patched consistently uses different line endings to the patch (i.e. either the file always uses Unix line endings while the patch uses Windows ones, or vice versa), then `applyPatch` will behave as if the line endings in the patch were the same as those in the source file. (If `false`, the patch will usually fail to apply in such circumstances since lines deleted in the patch won't be considered to match those in the source file.) Defaults to `true`.
144 - `compareLine(lineNumber, line, operation, patchContent)`: Callback used to compare to given lines to determine if they should be considered equal when patching. Defaults to strict equality but may be overridden to provide fuzzier comparison. Should return false if the lines should be rejected.
145
146* `Diff.applyPatches(patch, options)` - applies one or more patches.
147
148 `patch` may be either an array of structured patch objects, or a string representing a patch in unified diff format (which may patch one or more files).
149
150 This method will iterate over the contents of the patch and apply to data provided through callbacks. The general flow for each patch index is:
151
152 - `options.loadFile(index, callback)` is called. The caller should then load the contents of the file and then pass that to the `callback(err, data)` callback. Passing an `err` will terminate further patch execution.
153 - `options.patched(index, content, callback)` is called once the patch has been applied. `content` will be the return value from `applyPatch`. When it's ready, the caller should call `callback(err)` callback. Passing an `err` will terminate further patch execution.
154
155 Once all patches have been applied or an error occurs, the `options.complete(err)` callback is made.
156
157* `Diff.parsePatch(diffStr)` - Parses a patch into structured data
158
159 Return a JSON object representation of the a patch, suitable for use with the `applyPatch` method. This parses to the same structure returned by `Diff.structuredPatch`.
160
161* `Diff.reversePatch(patch)` - Returns a new structured patch which when applied will undo the original `patch`.
162
163 `patch` may be either a single structured patch object (as returned by `structuredPatch`) or an array of them (as returned by `parsePatch`).
164
165* `Diff.convertChangesToXML(changes)` - converts a list of change objects to a serialized XML format
166
167* `Diff.convertChangesToDMP(changes)` - converts a list of change objects to the format returned by Google's [diff-match-patch](https://github.com/google/diff-match-patch) library
168
169#### Universal `options`
170
171Certain options can be provided in the `options` object of *any* method that calculates a diff (including `diffChars`, `diffLines` etc. as well as `structuredPatch`, `createPatch`, and `createTwoFilesPatch`):
172
173* `callback`: if provided, the diff will be computed in async mode to avoid blocking the event loop while the diff is calculated. The value of the `callback` option should be a function and will be passed the computed diff or patch as its first argument.
174
175 (Note that if the ONLY option you want to provide is a callback, you can pass the callback function directly as the `options` parameter instead of passing an object with a `callback` property.)
176
177* `maxEditLength`: a number specifying the maximum edit distance to consider between the old and new texts. You can use this to limit the computational cost of diffing large, very different texts by giving up early if the cost will be huge. This option can be passed either to diffing functions (`diffLines`, `diffChars`, etc) or to patch-creation function (`structuredPatch`, `createPatch`, etc), all of which will indicate that the max edit length was reached by returning `undefined` instead of whatever they'd normally return.
178
179* `timeout`: a number of milliseconds after which the diffing algorithm will abort and return `undefined`. Supported by the same functions as `maxEditLength`.
180
181* `oneChangePerToken`: if `true`, the array of change objects returned will contain one change object per token (e.g. one per line if calling `diffLines`), instead of runs of consecutive tokens that are all added / all removed / all conserved being combined into a single change object.
182
183### Defining custom diffing behaviors
184
185If you need behavior a little different to what any of the text diffing functions above offer, you can roll your own by customizing both the tokenization behavior used and the notion of equality used to determine if two tokens are equal.
186
187The simplest way to customize tokenization behavior is to simply tokenize the texts you want to diff yourself, with your own code, then pass the arrays of tokens to `diffArrays`. For instance, if you wanted a semantically-aware diff of some code, you could try tokenizing it using a parser specific to the programming language the code is in, then passing the arrays of tokens to `diffArrays`.
188
189To customize the notion of token equality used, use the `comparator` option to `diffArrays`.
190
191For even more customisation of the diffing behavior, you can create a `new Diff.Diff()` object, overwrite its `castInput`, `tokenize`, `removeEmpty`, `equals`, and `join` properties with your own functions, then call its `diff(oldString, newString[, options])` method. The methods you can overwrite are used as follows:
192
193* `castInput(value, options)`: used to transform the `oldString` and `newString` before any other steps in the diffing algorithm happen. For instance, `diffJson` uses `castInput` to serialize the objects being diffed to JSON. Defaults to a no-op.
194* `tokenize(value, options)`: used to convert each of `oldString` and `newString` (after they've gone through `castInput`) to an array of tokens. Defaults to returning `value.split('')` (returning an array of individual characters).
195* `removeEmpty(array)`: called on the arrays of tokens returned by `tokenize` and can be used to modify them. Defaults to stripping out falsey tokens, such as empty strings. `diffArrays` overrides this to simply return the `array`, which means that falsey values like empty strings can be handled like any other token by `diffArrays`.
196* `equals(left, right, options)`: called to determine if two tokens (one from the old string, one from the new string) should be considered equal. Defaults to comparing them with `===`.
197* `join(tokens)`: gets called with an array of consecutive tokens that have either all been added, all been removed, or are all common. Needs to join them into a single value that can be used as the `value` property of the [change object](#change-objects) for these tokens. Defaults to simply returning `tokens.join('')`.
198* `postProcess(changeObjects)`: gets called at the end of the algorithm with the [change objects](#change-objects) produced, and can do final cleanups on them. Defaults to simply returning `changeObjects` unchanged.
199
200### Change Objects
201Many of the methods above return change objects. These objects consist of the following fields:
202
203* `value`: The concatenated content of all the tokens represented by this change object - i.e. generally the text that is either added, deleted, or common, as a single string. In cases where tokens are considered common but are non-identical (e.g. because an option like `ignoreCase` or a custom `comparator` was used), the value from the *new* string will be provided here.
204* `added`: true if the value was inserted into the new string, otherwise false
205* `removed`: true if the value was removed from the old string, otherwise false
206* `count`: How many tokens (e.g. chars for `diffChars`, lines for `diffLines`) the value in the change object consists of
207
208(Change objects where `added` and `removed` are both false represent content that is common to the old and new strings.)
209
210## Examples
211
212#### Basic example in Node
213
214```js
215require('colors');
216const Diff = require('diff');
217
218const one = 'beep boop';
219const other = 'beep boob blah';
220
221const diff = Diff.diffChars(one, other);
222
223diff.forEach((part) => {
224 // green for additions, red for deletions
225 let text = part.added ? part.value.bgGreen :
226 part.removed ? part.value.bgRed :
227 part.value;
228 process.stderr.write(text);
229});
230
231console.log();
232```
233Running the above program should yield
234
235<img src="images/node_example.png" alt="Node Example">
236
237#### Basic example in a web page
238
239```html
240<pre id="display"></pre>
241<script src="diff.js"></script>
242<script>
243const one = 'beep boop',
244 other = 'beep boob blah',
245 color = '';
246
247let span = null;
248
249const diff = Diff.diffChars(one, other),
250 display = document.getElementById('display'),
251 fragment = document.createDocumentFragment();
252
253diff.forEach((part) => {
254 // green for additions, red for deletions
255 // grey for common parts
256 const color = part.added ? 'green' :
257 part.removed ? 'red' : 'grey';
258 span = document.createElement('span');
259 span.style.color = color;
260 span.appendChild(document
261 .createTextNode(part.value));
262 fragment.appendChild(span);
263});
264
265display.appendChild(fragment);
266</script>
267```
268
269Open the above .html file in a browser and you should see
270
271<img src="images/web_example.png" alt="Node Example">
272
273#### Example of generating a patch from Node
274
275The code below is roughly equivalent to the Unix command `diff -u file1.txt file2.txt > mydiff.patch`:
276
277```
278const Diff = require('diff');
279const file1Contents = fs.readFileSync("file1.txt").toString();
280const file2Contents = fs.readFileSync("file2.txt").toString();
281const patch = Diff.createTwoFilesPatch("file1.txt", "file2.txt", file1Contents, file2Contents);
282fs.writeFileSync("mydiff.patch", patch);
283```
284
285#### Examples of parsing and applying a patch from Node
286
287##### Applying a patch to a specified file
288
289The code below is roughly equivalent to the Unix command `patch file1.txt mydiff.patch`:
290
291```
292const Diff = require('diff');
293const file1Contents = fs.readFileSync("file1.txt").toString();
294const patch = fs.readFileSync("mydiff.patch").toString();
295const patchedFile = Diff.applyPatch(file1Contents, patch);
296fs.writeFileSync("file1.txt", patchedFile);
297```
298
299##### Applying a multi-file patch to the files specified by the patch file itself
300
301The code below is roughly equivalent to the Unix command `patch < mydiff.patch`:
302
303```
304const Diff = require('diff');
305const patch = fs.readFileSync("mydiff.patch").toString();
306Diff.applyPatches(patch, {
307 loadFile: (patch, callback) => {
308 let fileContents;
309 try {
310 fileContents = fs.readFileSync(patch.oldFileName).toString();
311 } catch (e) {
312 callback(`No such file: ${patch.oldFileName}`);
313 return;
314 }
315 callback(undefined, fileContents);
316 },
317 patched: (patch, patchedContent, callback) => {
318 if (patchedContent === false) {
319 callback(`Failed to apply patch to ${patch.oldFileName}`)
320 return;
321 }
322 fs.writeFileSync(patch.oldFileName, patchedContent);
323 callback();
324 },
325 complete: (err) => {
326 if (err) {
327 console.log("Failed with error:", err);
328 }
329 }
330});
331```
332
333## Compatibility
334
335jsdiff supports all ES3 environments with some known issues on IE8 and below. Under these browsers some diff algorithms such as word diff and others may fail due to lack of support for capturing groups in the `split` operation.
336
337## License
338
339See [LICENSE](https://github.com/kpdecker/jsdiff/blob/master/LICENSE).
340
341## Deviations from the published Myers diff algorithm
342
343jsdiff deviates from the published algorithm in a couple of ways that don't affect results but do affect performance:
344
345* jsdiff keeps track of the diff for each diagonal using a linked list of change objects for each diagonal, rather than the historical array of furthest-reaching D-paths on each diagonal contemplated on page 8 of Myers's paper.
346* jsdiff skips considering diagonals where the furthest-reaching D-path would go off the edge of the edit graph. This dramatically reduces the time cost (from quadratic to linear) in cases where the new text just appends or truncates content at the end of the old text.