![jsinspect](http://danielstjules.com/github/jsinspect-logo.png)

Detect copy-pasted and structurally similar code. The inspector identifies
duplicate code, even if modified, as well as common boilerplate or logic that
should be the target of refactoring.

[![Build Status](https://travis-ci.org/danielstjules/jsinspect.svg?branch=master)](https://travis-ci.org/danielstjules/jsinspect)

* [Overview](#overview)
* [Installation](#installation)
* [Usage](#usage)
* [Integration](#integration)
* [Reporters](#reporters)
* [Performance](#performance)

## Overview

We've all had to deal with code smells, and duplicate code is a common source.
While some instances are easy to spot, this type of searching is the perfect
use case for a helpful CLI tool.

Solutions exist for this purpose, but they are often token-based and rely on
string searching methods such as the Rabin–Karp algorithm. Why isn't this
always ideal? Those tools may struggle with code that has wildly varying
identifiers, despite having the same structure and behavior.

Copy-pasted code is but one type of code duplication. Common boilerplate
and repeated logic can be identified as well using jsinspect, since it works
on the ASTs of the parsed code rather than on tokens.

You have the freedom to specify a threshold determining the smallest subset of
nodes to analyze. This will identify code with a similar structure, based
on the AST node types, e.g. BlockStatement, VariableDeclaration,
ObjectExpression, etc. For copy-paste oriented detection, you can even limit
the search to nodes with matching identifiers.
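
To make the difference between structural and identifier matching concrete, the sketch below compares two small hand-written ESTree-style trees. It is an illustration only, not jsinspect's actual algorithm: the `fingerprint` helper and the tree literals are hypothetical, and a real run would obtain its trees from a parser.

```javascript
// Hypothetical helper: flatten a tree into a pre-order list of node types.
// When matchIdentifiers is true, Identifier nodes also carry their name.
function fingerprint(node, matchIdentifiers, out = []) {
  if (Array.isArray(node)) {
    node.forEach((child) => fingerprint(child, matchIdentifiers, out));
  } else if (node && typeof node.type === 'string') {
    const isNamed = matchIdentifiers && node.type === 'Identifier';
    out.push(isNamed ? `Identifier:${node.name}` : node.type);
    for (const key of Object.keys(node)) {
      if (key !== 'type' && key !== 'name') {
        fingerprint(node[key], matchIdentifiers, out);
      }
    }
  }
  return out;
}

const id = (name) => ({ type: 'Identifier', name });

// Hand-built tree for: return <arr>.indexOf(n) != -1;
const makeReturn = (arr) => ({
  type: 'ReturnStatement',
  argument: {
    type: 'BinaryExpression',
    operator: '!=',
    left: {
      type: 'CallExpression',
      callee: { type: 'MemberExpression', object: id(arr), property: id('indexOf') },
      arguments: [id('n')],
    },
    right: { type: 'UnaryExpression', operator: '-', argument: { type: 'Literal', value: 1 } },
  },
});

const a = makeReturn('array2');
const b = makeReturn('arrayB');
const same = (x, y) => x.join() === y.join();

const structuralMatch = same(fingerprint(a, false), fingerprint(b, false));
const identifierMatch = same(fingerprint(a, true), fingerprint(b, true));
const nodeCount = fingerprint(a, false).length;

console.log(structuralMatch, identifierMatch, nodeCount); // true false 9
```

The two statements match structurally but not once identifier names are taken into account. Under this simplified counting the fragment has 9 nodes; how jsinspect itself counts nodes against `--threshold` may differ.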

The tool accepts a list of paths to parse, and outputs any matches along
with a series of 2-way diffs. Any directories among the paths are walked
recursively, and only `.js` files are analyzed. Any `node_modules` and
`bower_components` directories are ignored. Being built for JavaScript, it also
ignores ES6 module declarations, CommonJS require statements, and AMD define
expressions.
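
The path handling described above can be sketched roughly as follows. This is a minimal illustration using Node's `fs` and `path` modules, not jsinspect's own code; `collectJsFiles` and `SKIPPED_DIRS` are hypothetical names, and details such as symlink handling are assumptions.

```javascript
const fs = require('fs');
const path = require('path');

// Directory names skipped during the walk, per the description above.
const SKIPPED_DIRS = new Set(['node_modules', 'bower_components']);

// Hypothetical helper: expand a list of paths into the .js files beneath them.
function collectJsFiles(paths) {
  const files = [];
  for (const p of paths) {
    const stat = fs.statSync(p); // follows symlinks; throws on missing paths
    if (stat.isDirectory()) {
      if (SKIPPED_DIRS.has(path.basename(p))) continue;
      const children = fs.readdirSync(p).map((name) => path.join(p, name));
      files.push(...collectJsFiles(children));
    } else if (stat.isFile() && path.extname(p) === '.js') {
      files.push(p);
    }
  }
  return files;
}
```

Calling `collectJsFiles(['./path/to/src'])` would return every `.js` file under that directory, leaving out anything inside a `node_modules` or `bower_components` folder.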

![screenshot](http://danielstjules.com/github/jsinspect-example.png)

## Installation

It can be installed via `npm` using:

``` bash
npm install -g jsinspect
```

Also available: [grunt-jsinspect](https://github.com/stefanjudis/grunt-jsinspect)
and [gulp-jsinspect](https://github.com/alexeyraspopov/gulp-jsinspect).

## Usage

```
Usage: jsinspect [options] <paths ...>

Duplicate code and structure detection for JavaScript.
Identifier matching is disabled by default. Example use:
jsinspect -t 30 -i --ignore "Test.js" ./path/to/src


Options:

  -h, --help                         output usage information
  -V, --version                      output the version number
  -t, --threshold <number>           number of nodes (default: 15)
  -i, --identifiers                  match identifiers
  -j, --jsx                          process jsx files (default: false)
  -c, --config                       path to config file (default: .jsinspectrc)
  -r, --reporter [default|json|pmd]  specify the reporter to use
  -s, --suppress <number>            length to suppress diffs (default: 100, off: 0)
  -D, --no-diff                      disable 2-way diffs
  -C, --no-color                     disable colors
  --ignore <pattern>                 ignore paths matching a regex
```

If a `.jsinspectrc` file is located in the project directory, its values will
be used in place of the defaults listed above. For example:

``` javascript
{
  "threshold": 30,
  "identifiers": true,
  "ignore": "Test.js|Spec.js", // used as RegExp
  "jsx": true,
  "reporter": "json",
  "suppress": 100
}
```
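
As a rough sketch of how such a file could combine with the defaults, the snippet below overlays already-parsed config values on the defaults from the usage text and compiles `ignore` into a `RegExp`. The `resolveOptions` helper and the sample paths are hypothetical, the `"default"` reporter name is inferred from the option list, and none of this is taken from jsinspect's source.

```javascript
// Defaults taken from the usage text above; reporter "default" is assumed.
const DEFAULTS = {
  threshold: 15,
  identifiers: false,
  jsx: false,
  reporter: 'default',
  suppress: 100,
};

// Hypothetical helper: overlay parsed .jsinspectrc values on the defaults
// and compile the "ignore" string into a RegExp.
function resolveOptions(rc) {
  const options = Object.assign({}, DEFAULTS, rc);
  if (typeof options.ignore === 'string') {
    options.ignore = new RegExp(options.ignore);
  }
  return options;
}

const options = resolveOptions({ threshold: 30, ignore: 'Test.js|Spec.js' });
const paths = ['src/user.js', 'src/userTest.js', 'spec/userSpec.js'];
const kept = paths.filter((p) => !options.ignore.test(p));

console.log(options.threshold, options.suppress, kept); // 30 100 [ 'src/user.js' ]
```

Because the pattern is a regular expression, the unescaped `.` in `Test.js` matches any character. Writing `Test\\.js` in the config string would restrict it to a literal dot.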

On first use with a project, you may want to run the tool with the following
options, pointing it explicitly at the lib/src directories rather than the
test/spec directory.

```
jsinspect -t 30 -i ./path/to/src
```

From there, feel free to try incrementally decreasing the threshold and
ignoring identifiers. A threshold of 20 may lead you to discover new areas of
interest for refactoring or cleanup. Each project or library may be different.

## Integration

It's simple to run jsinspect on your library source as part of a build
process. It exits with a code of 0 when no matches are found, resulting in a
passing step, and with a positive exit code otherwise, failing the step. For
example, with Travis CI, you could add the following entries to your
`.travis.yml`:

``` yaml
before_script:
  - "npm install -g jsinspect"

script:
  - "jsinspect -t 30 ./path/to/src"
```

Note that in the above example, we're using a threshold of 30 for detecting
structurally similar code. A lower threshold may work for your build process,
but ~30 should help detect unnecessary boilerplate, while avoiding excessive
output.

To have jsinspect run with each job, but not block or fail the build, you can
use something like the following:

``` yaml
script:
  - "jsinspect -t 30 ./path/to/src || true"
```
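
Outside of CI configuration, the same exit-code convention can be used from a Node script. The following is a minimal sketch built on `child_process.spawnSync`. The `runCheck` helper is a hypothetical name, and the commented-out call assumes `jsinspect` is available on the `PATH`.

```javascript
const { spawnSync } = require('child_process');

// Hypothetical helper: run a command and classify the result by exit code.
function runCheck(command, args) {
  const result = spawnSync(command, args, { encoding: 'utf8' });
  if (result.error) throw result.error; // e.g. the command is not installed
  return {
    passed: result.status === 0,
    status: result.status,
    output: result.stdout,
  };
}

// Assumes jsinspect is on the PATH, e.g. after `npm install -g jsinspect`:
// const check = runCheck('jsinspect', ['-t', '30', './path/to/src']);
// if (!check.passed) console.warn(check.output);
```

Warning instead of throwing when `passed` is false mirrors the `|| true` approach: the duplication report is still shown, but the surrounding script carries on.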

## Reporters

Aside from the default reporter, both JSON and PMD CPD-style XML reporters are
available. Note that in the JSON example below, indentation and formatting
have been applied.

#### JSON

``` json
[{
  "instances":[
    {"path":"spec/fixtures/intersection.js","lines":[1,5]},
    {"path":"spec/fixtures/intersection.js","lines":[7,11]}
  ],
  "diffs":[
    {
      "-":{"path":"spec/fixtures/intersection.js","lines":[1,5]},
      "+":{"path":"spec/fixtures/intersection.js","lines":[7,11]},
      "diff":"- function intersectionA(array1, array2) {\n- array1.filter(function(n) {\n- return array2.indexOf(n) != -1;\n+ function intersectionB(arrayA, arrayB) {\n+ arrayA.filter(function(n) {\n+ return arrayB.indexOf(n) != -1;\n });\n }\n"
    }
  ]
}]
```
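
The JSON output lends itself to post-processing. As a small sketch, the snippet below counts duplicated instances per file from data shaped like the example above. `countInstancesByFile` is a hypothetical helper, and the inline `report` stands in for parsed reporter output.

```javascript
// Sample shaped like the JSON reporter output shown above (diffs omitted).
const report = [{
  instances: [
    { path: 'spec/fixtures/intersection.js', lines: [1, 5] },
    { path: 'spec/fixtures/intersection.js', lines: [7, 11] },
  ],
}];

// Hypothetical helper: count how many duplicated instances each file contains.
function countInstancesByFile(matches) {
  const counts = {};
  for (const match of matches) {
    for (const instance of match.instances) {
      counts[instance.path] = (counts[instance.path] || 0) + 1;
    }
  }
  return counts;
}

console.log(countInstancesByFile(report));
// { 'spec/fixtures/intersection.js': 2 }
```

In practice the array would come from `JSON.parse` applied to the output of `jsinspect -r json`, assuming the reporter writes its JSON to standard output.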

#### PMD CPD XML

``` xml
<?xml version="1.0" encoding="utf-8"?>
<pmd-cpd>
<duplication lines="10">
<file path="/jsinspect/spec/fixtures/intersection.js" line="1"/>
<file path="/jsinspect/spec/fixtures/intersection.js" line="7"/>
<codefragment>
- spec/fixtures/intersection.js:1,5
+ spec/fixtures/intersection.js:7,11

- function intersectionA(array1, array2) {
- array1.filter(function(n) {
- return array2.indexOf(n) != -1;
+ function intersectionB(arrayA, arrayB) {
+ arrayA.filter(function(n) {
+ return arrayB.indexOf(n) != -1;
});
}
</codefragment>
</duplication>
</pmd-cpd>
```

## Performance

Running on a medium-sized code base, with a 2.4GHz i5 MBP, yielded the
following results:

``` bash
$ find src/ -name '*.js' | xargs wc -l
# ...
44810 total

$ time jsinspect -t 30 src/
# Looking for structural similarities..
41 matches found across 800 files

real 0m1.542s
user 0m1.472s
sys 0m0.071s

$ time jsinspect -i -t 15 src/
# Looking for copy-pasted code..
96 matches found across 800 files

real 0m1.283s
user 0m1.196s
sys 0m0.084s
```

Much of the overhead comes from diff generation, so a greater number of matches
will increase running time.