![jsinspect](http://danielstjules.com/github/jsinspect-logo.png)

Detect copy-pasted and structurally similar code. The inspector identifies
duplicate code, even if modified, as well as common boilerplate or logic that
should be the target of refactoring.

[![Build Status](https://travis-ci.org/danielstjules/jsinspect.svg?branch=master)](https://travis-ci.org/danielstjules/jsinspect)

* [Overview](#overview)
* [Installation](#installation)
* [Usage](#usage)
* [Integration](#integration)
* [Reporters](#reporters)
* [Performance](#performance)

## Overview

We've all had to deal with code smell, and duplicate code is a common source.
While some instances are easy to spot, this type of searching is the perfect
use-case for a helpful CLI tool.

Solutions already exist for this purpose, but they are often token-based and
rely on string searching methods such as the Rabin–Karp algorithm. Why isn't
this always ideal? Those tools may struggle with code that has wildly varying
identifiers, despite having the same structure and behavior.

Copy-pasted code is but one type of code duplication. Common boilerplate
and repeated logic can be identified as well using jsinspect, since it
doesn't work on tokens; it uses the ASTs of the parsed code.

You can specify a threshold that sets the smallest subset of nodes to
analyze. This will identify code with a similar structure, based on the AST
node types, e.g. BlockStatement, VariableDeclaration, ObjectExpression, etc.
For copy-paste oriented detection, you can even limit the search to nodes
with matching identifiers.

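To illustrate why comparing node types catches renamed copies, here is a minimal sketch. It uses a hand-written, simplified ESTree-style object in place of real parser output, and the helpers `walk`, `nodeTypes` and `identifierNames` are hypothetical names chosen for this example. It is not jsinspect's actual implementation.

```javascript
// Hypothetical helper: visit every object that looks like an AST node.
function walk(node, visit) {
  if (Array.isArray(node)) {
    node.forEach((child) => walk(child, visit));
  } else if (node && typeof node === 'object') {
    if (typeof node.type === 'string') visit(node);
    Object.keys(node).forEach((key) => walk(node[key], visit));
  }
}

// Node types in depth-first order: a crude structural fingerprint.
function nodeTypes(ast) {
  const types = [];
  walk(ast, (node) => types.push(node.type));
  return types;
}

// Identifier names in the same order, for stricter copy-paste matching.
function identifierNames(ast) {
  const names = [];
  walk(ast, (node) => {
    if (node.type === 'Identifier') names.push(node.name);
  });
  return names;
}

// Simplified AST for `return <arrayName>.indexOf(n) != -1;`
function returnIndexOf(arrayName) {
  return {
    type: 'ReturnStatement',
    argument: {
      type: 'BinaryExpression',
      operator: '!=',
      left: {
        type: 'CallExpression',
        callee: {
          type: 'MemberExpression',
          object: { type: 'Identifier', name: arrayName },
          property: { type: 'Identifier', name: 'indexOf' },
        },
        arguments: [{ type: 'Identifier', name: 'n' }],
      },
      right: {
        type: 'UnaryExpression',
        operator: '-',
        argument: { type: 'Literal', value: 1 },
      },
    },
  };
}

const typesA = nodeTypes(returnIndexOf('array2'));
const typesB = nodeTypes(returnIndexOf('arrayB'));

// The structure is identical even though the identifiers differ.
const sameStructure = typesA.join(',') === typesB.join(',');
const sameIdentifiers =
  identifierNames(returnIndexOf('array2')).join(',') ===
  identifierNames(returnIndexOf('arrayB')).join(',');

console.log(sameStructure, sameIdentifiers);
```

A token-based comparison would see `array2` and `arrayB` as different, while the type sequence is the same for both. Comparing identifier names as well narrows the match to near-verbatim copies, which is the idea behind the `-i` option.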
The tool accepts a list of paths to parse, and outputs any matches along
with a series of 2-way diffs. Any directories among the paths are walked
recursively, and only `.js` files are analyzed. Any `node_modules` and
`bower_components` directories are ignored. Being built for JavaScript, it
also ignores ES6 module declarations, CommonJS require statements, and AMD
define expressions.

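The path rules above can be sketched as a simple filter. `shouldAnalyze` is a hypothetical helper written for this example, assuming the rules are exactly as described (only `.js` files, nothing under `node_modules` or `bower_components`). It is not part of jsinspect's API.

```javascript
const path = require('path');

// Directories that are skipped wherever they appear in a path.
const ignoredDirs = ['node_modules', 'bower_components'];

// Hypothetical helper: decide whether a file would be analyzed.
function shouldAnalyze(filePath) {
  const segments = filePath.split(/[\\/]/);
  if (segments.some((segment) => ignoredDirs.includes(segment))) {
    return false;
  }
  return path.extname(filePath) === '.js';
}

const candidates = [
  'src/app.js',
  'src/styles/main.css',
  'node_modules/lodash/index.js',
  'src/vendor/bower_components/jquery/jquery.js',
  'lib/util/helpers.js',
];

const analyzed = candidates.filter(shouldAnalyze);
console.log(analyzed);
```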
![screenshot](http://danielstjules.com/github/jsinspect-example.png)

## Installation

It can be installed via `npm` using:

``` bash
npm install -g jsinspect
```

Also available: [grunt-jsinspect](https://github.com/stefanjudis/grunt-jsinspect)
and [gulp-jsinspect](https://github.com/alexeyraspopov/gulp-jsinspect).

## Usage

```
Usage: jsinspect [options] <paths ...>

Duplicate code and structure detection for JavaScript.
Identifier matching is disabled by default. Example use:
jsinspect -t 30 -i --ignore "Test.js" ./path/to/src


Options:

  -h, --help                         output usage information
  -V, --version                      output the version number
  -t, --threshold <number>           number of nodes (default: 15)
  -i, --identifiers                  match identifiers
  -j --jsx                           process jsx files (default off)
  -c, --config                       path to config file (default: .jsinspectrc)
  -r, --reporter [default|json|pmd]  specify the reporter to use
  -s, --suppress <number>            length to suppress diffs (default: 100, off: 0)
  -D, --no-diff                      disable 2-way diffs
  -C, --no-color                     disable colors
  --ignore <pattern>                 ignore paths matching a regex
```

If a `.jsinspectrc` file is located in the project directory, its values will
be used in place of the defaults listed above. For example:

``` javascript
{
  "threshold":   30,
  "identifiers": true,
  "ignore":      "Test.js|Spec.js", // used as RegExp
  "reporter":    "json",
  "suppress":    100
}
```

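As a rough sketch of how such a file could take precedence over the defaults, the snippet below merges the two objects and applies the `ignore` value as a regular expression. The merge via `Object.assign` and the variable names are assumptions made for illustration. The tool's actual loading logic may differ.

```javascript
// Defaults as listed in the usage output.
const defaults = { threshold: 15, identifiers: false, suppress: 100 };

// Values as they might be read from a .jsinspectrc file.
const rc = {
  threshold: 30,
  identifiers: true,
  ignore: 'Test.js|Spec.js',
  reporter: 'json',
  suppress: 100,
};

// Assumed precedence: values from the config file replace the defaults.
const options = Object.assign({}, defaults, rc);

// The ignore value is treated as a regular expression, not a glob.
const ignore = new RegExp(options.ignore);

const paths = [
  'src/user.js',
  'src/userTest.js',
  'src/userSpec.js',
  'src/TestXjs.js',
];
const kept = paths.filter((filePath) => !ignore.test(filePath));

console.log(options.threshold, kept);
```

Because the pattern is a regular expression, the unescaped `.` matches any character, so `Test.js` also matches `TestXjs`. Writing the pattern as `Test\\.js|Spec\\.js` in the JSON file restricts it to a literal dot.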
On first use with a project, you may want to run the tool with the following
options, pointing it explicitly at the lib/src directories rather than the
test/spec directory.

```
jsinspect -t 30 -i ./path/to/src
```

From there, feel free to try incrementally decreasing the threshold and
ignoring identifiers. A threshold of 20 may lead you to discover new areas of
interest for refactoring or cleanup. Each project or library may be different.

## Integration

It's simple to run jsinspect on your library source as part of a build
process. It exits with a code of 0 when no matches are found, resulting in a
passing step, and with a positive exit code otherwise, which fails the step.
For example, with Travis CI, you could add the following entries to your
`.travis.yml`:

``` yaml
before_script:
  - "npm install -g jsinspect"

script:
  - "jsinspect -t 30 ./path/to/src"
```

Note that in the above example, we're using a threshold of 30 for detecting
structurally similar code. A lower threshold may work for your build process,
but ~30 should help detect unnecessary boilerplate, while avoiding excessive
output.

To have jsinspect run with each job, but not block or fail the build, you can
use something like the following:

``` yaml
script:
  - "jsinspect -t 30 ./path/to/src || true"
```

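The same exit-code convention can be used from a Node build script. The sketch below is one possible wrapper: `runStep` is a hypothetical helper, and Node itself stands in for jsinspect so the example runs without the tool installed. The exit code 1 is only a placeholder for "a positive exit code".

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a command and report whether the step passed.
function runStep(command, args) {
  const result = spawnSync(command, args, { stdio: 'inherit' });
  if (result.error) throw result.error; // e.g. the command was not found
  return { status: result.status, passed: result.status === 0 };
}

// Stand-ins for a run without matches (exit 0) and a run with matches (exit 1).
const clean = runStep(process.execPath, ['-e', 'process.exit(0)']);
const withMatches = runStep(process.execPath, ['-e', 'process.exit(1)']);

console.log(clean.passed, withMatches.passed);
```

In a real build script the call would look like `runStep('jsinspect', ['-t', '30', './path/to/src'])`, and ignoring the `passed` flag has the same effect as the `|| true` variant above.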
## Reporters

Aside from the default reporter, both JSON and PMD CPD-style XML reporters are
available. Note that in the JSON example below, indentation and formatting
have been applied.

#### JSON

``` json
[{
  "instances":[
    {"path":"spec/fixtures/intersection.js","lines":[1,5]},
    {"path":"spec/fixtures/intersection.js","lines":[7,11]}
  ],
  "diffs":[
    {
      "-":{"path":"spec/fixtures/intersection.js","lines":[1,5]},
      "+":{"path":"spec/fixtures/intersection.js","lines":[7,11]},
      "diff":"- function intersectionA(array1, array2) {\n- array1.filter(function(n) {\n- return array2.indexOf(n) != -1;\n+ function intersectionB(arrayA, arrayB) {\n+ arrayA.filter(function(n) {\n+ return arrayB.indexOf(n) != -1;\n });\n }\n"
    }
  ]
}]
```

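JSON output of this shape is easy to post-process. The following sketch assumes the structure shown in the example above (an array of matches, each with `instances` carrying a `path` and a `lines` pair). `summarize` is a hypothetical helper, and the input is a trimmed copy of the example rather than real tool output.

```javascript
// Trimmed copy of the example report, serialized as the tool would print it.
const output = JSON.stringify([
  {
    instances: [
      { path: 'spec/fixtures/intersection.js', lines: [1, 5] },
      { path: 'spec/fixtures/intersection.js', lines: [7, 11] },
    ],
    diffs: [],
  },
]);

// Hypothetical helper: one summary line per match.
function summarize(jsonText) {
  return JSON.parse(jsonText).map((match) =>
    match.instances
      .map((instance) => `${instance.path}:${instance.lines[0]}-${instance.lines[1]}`)
      .join(' <-> ')
  );
}

const summary = summarize(output);
summary.forEach((line) => console.log(line));
```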
#### PMD CPD XML

``` xml
<?xml version="1.0" encoding="utf-8"?>
<pmd-cpd>
<duplication lines="10">
<file path="/jsinspect/spec/fixtures/intersection.js" line="1"/>
<file path="/jsinspect/spec/fixtures/intersection.js" line="7"/>
<codefragment>
- spec/fixtures/intersection.js:1,5
+ spec/fixtures/intersection.js:7,11

- function intersectionA(array1, array2) {
- array1.filter(function(n) {
- return array2.indexOf(n) != -1;
+ function intersectionB(arrayA, arrayB) {
+ arrayA.filter(function(n) {
+ return arrayB.indexOf(n) != -1;
});
}
</codefragment>
</duplication>
</pmd-cpd>
```

## Performance

Running on a medium-sized code base, with a 2.4GHz i5 MBP, yielded the
following results:

``` bash
$ find src/ -name '*.js' | xargs wc -l
# ...
44810 total

$ time jsinspect -t 30 src/
# Looking for structural similarities..
41 matches found across 800 files

real 0m1.542s
user 0m1.472s
sys 0m0.071s

$ time jsinspect -i -t 15 src/
# Looking for copy-pasted code..
96 matches found across 800 files

real 0m1.283s
user 0m1.196s
sys 0m0.084s
```

Much of the overhead comes from diff generation, so a greater number of matches
will increase running time.