node-webcrawler ChangeLog
-------------------------

1.2.1
 * [#310](https://github.com/bda-research/node-crawler/issues/310) Upgrade dependencies' versions (@mike442144)
 * [#303](https://github.com/bda-research/node-crawler/issues/303) Update seenreq to v3 (@mike442144)
 * [#304](https://github.com/bda-research/node-crawler/pull/304) Replace istanbul with nyc (@kossidts)
 * [#300](https://github.com/bda-research/node-crawler/pull/300) Add `formData` arg to requestArgs (@humandevmode); see the sketch below
 * [#280](https://github.com/bda-research/node-crawler/pull/280) Update tests with `nock` (@Dong-Gao)

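Since #300 forwards a `formData` argument to the underlying `request` call, a multipart POST can be queued directly. A minimal sketch, assuming a hypothetical upload endpoint:

```js
const fs = require('fs');
const Crawler = require('crawler');

const c = new Crawler({
    jQuery: false, // the response is not HTML, so skip DOM parsing
    callback: (error, res, done) => {
        if (error) console.error(error);
        else console.log(res.statusCode);
        done();
    }
});

// `formData` is forwarded to request as a multipart/form-data body
c.queue({
    uri: 'http://example.com/upload', // hypothetical endpoint
    method: 'POST',
    formData: {
        file: fs.createReadStream('./report.pdf')
    }
});
```
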
1.2.0
 * [#278](https://github.com/bda-research/node-crawler/pull/278) Added the missing file-stream require to the download example (@swosko); see the sketch below
 * Use `nock` to mock HTTP requests in tests instead of calling httpbin
 * Replace jshint with eslint
 * Fix code to pass the eslint rules

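For reference, a download along those lines: set `encoding: null` so `res.body` stays a raw `Buffer`, then write it out with `fs` (the URL and filename below are placeholders):

```js
const fs = require('fs');
const Crawler = require('crawler');

const c = new Crawler({
    encoding: null, // keep res.body as a raw Buffer
    jQuery: false,  // no DOM parsing for binary content
    callback: (error, res, done) => {
        if (error) {
            console.error(error);
        } else {
            fs.createWriteStream('image.png').end(res.body);
        }
        done();
    }
});

c.queue('http://example.com/image.png');
```
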
1.1.4
 * Tolerate an incorrect `Content-Type` header [#270](https://github.com/bda-research/node-crawler/pull/270), [#193](https://github.com/bda-research/node-crawler/issues/193)
 * Added examples [#272](https://github.com/bda-research/node-crawler/pull/272), [#267](https://github.com/bda-research/node-crawler/issues/267)
 * Fixed a bug that made the `skipDuplicates` and `retries` options incompatible [#261](https://github.com/bda-research/node-crawler/issues/261); see the sketch below
 * Fix typo in README [#268](https://github.com/bda-research/node-crawler/pull/268)

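With that fix the two options can be combined again; a minimal sketch:

```js
const Crawler = require('crawler');

const c = new Crawler({
    skipDuplicates: true, // drop requests whose URL was already seen
    retries: 3,           // retry failed requests up to three times
    callback: (error, res, done) => {
        if (error) console.error(error);
        done();
    }
});

c.queue('http://example.com/');
c.queue('http://example.com/'); // skipped as a duplicate
```
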
1.1.3
 * Upgraded `request.js` and `lodash`

1.1.2
 * Recognize all XML MIME types when deciding to inject jQuery [#245](https://github.com/bda-research/node-crawler/pull/245)
 * Allow options to specify the Agent for request [#246](https://github.com/bda-research/node-crawler/pull/246); see the sketch below
 * Added logo

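A sketch of supplying a custom Agent, assuming the `agent` option is forwarded to `request` unchanged (the keep-alive settings are only illustrative):

```js
const http = require('http');
const Crawler = require('crawler');

// A keep-alive agent reuses sockets across queued requests
const keepAliveAgent = new http.Agent({ keepAlive: true, maxSockets: 10 });

const c = new Crawler({
    agent: keepAliveAgent, // forwarded to the underlying request call
    callback: (error, res, done) => {
        if (error) console.error(error);
        done();
    }
});

c.queue('http://example.com/');
```
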
1.1.1
 * Added a way to override the global `options.headers` keys by setting `options.headers` when queuing [#241](https://github.com/bda-research/node-crawler/issues/241); see the sketch below
 * Fixed a bug where the last `jar` object was reused when the current options didn't contain a `jar` option [#240](https://github.com/bda-research/node-crawler/issues/240)
 * Fixed an encoding bug [#233](https://github.com/bda-research/node-crawler/issues/233)
 * Added seenreq options [#208](https://github.com/bda-research/node-crawler/issues/208)
 * Added `preRequest`, `setLimiterProperty`, and direct request functions

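A sketch of the per-task header override, together with the new `preRequest` hook (called before each scheduled request fires) and `direct` (which bypasses the scheduler); the signatures follow the README, but treat the details as illustrative:

```js
const Crawler = require('crawler');

const c = new Crawler({
    headers: { 'Accept-Language': 'en-US' }, // global default
    preRequest: (options, done) => {
        // runs before every scheduled request is issued
        console.log('about to fetch', options.uri);
        done();
    },
    callback: (error, res, done) => {
        if (error) console.error(error);
        done();
    }
});

// Headers given here override the matching global keys for this task only
c.queue({
    uri: 'http://example.com/',
    headers: { 'Accept-Language': 'de-DE' }
});

// `direct` sends a request immediately, skipping the queue and rate limiter
c.direct({
    uri: 'http://example.com/ping',
    callback: (error, response) => {
        if (!error) console.log(response.statusCode);
    }
});
```
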
1.0.5
 * Fix missing debugging messages [#213](https://github.com/bda-research/node-crawler/issues/213)
 * Fix a bug where `drain` was never called [#210](https://github.com/bda-research/node-crawler/issues/210)

1.0.4
 * Fix a charset-detection bug [#203](https://github.com/bda-research/node-crawler/issues/203)
 * Keep the Node version up to date in the Travis scripts

1.0.3
 * Fix a bug where `skipDuplicates` and `rotateUA` did not work even when set to true; see the sketch below

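For reference, both options in use; `rotateUA` cycles through the `userAgent` array per request (a minimal sketch):

```js
const Crawler = require('crawler');

const c = new Crawler({
    skipDuplicates: true,
    rotateUA: true, // cycle through the user agents below
    userAgent: [
        'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15)'
    ],
    callback: (error, res, done) => {
        if (error) console.error(error);
        done();
    }
});

c.queue('http://example.com/');
```
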
1.0.0
 * Upgrade jsdom to 9.6.x
 * Remove Node 0.10 and 0.12 support [#170](https://github.com/bda-research/node-crawler/issues/170)
 * Control dependency versions using ^ and ~ [#169](https://github.com/bda-research/node-crawler/issues/169)
 * Remove node-pool
 * Hold off notifying bottleneck until a task is completed
 * Replace bottleneck with bottleneckp, which supports priorities
 * Change the default log function
 * Use event listeners on `request` and `drain` instead of global functions [#144](https://github.com/bda-research/node-crawler/issues/144)
 * Set `forceUTF8` to true by default
 * Detect `ESOCKETTIMEDOUT` instead of `ETIMEDOUT` on timeout in tests
 * Add a `done` function to the callback to avoid an async trap (see the sketch below)
 * Do not convert the response body to a string if `encoding` is null [#118](https://github.com/bda-research/node-crawler/issues/118)
 * Add a result document [#68](https://github.com/bda-research/node-crawler/issues/68) [#116](https://github.com/bda-research/node-crawler/issues/116)
 * Add a `schedule` event, emitted when a task is added to the scheduler
 * In the callback, move `$` into `res` to clean up the awkward API
 * Rename `rateLimits` to `rateLimit`

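Several of these changes meet in the 1.x surface: the three-argument callback with `done`, `$` living on `res`, the `drain` and `schedule` listeners, and the renamed `rateLimit` option. A brief sketch:

```js
const Crawler = require('crawler');

const c = new Crawler({
    rateLimit: 1000, // minimum gap between requests, in ms (was `rateLimits`)
    forceUTF8: true, // now the default
    callback: (error, res, done) => {
        if (error) {
            console.error(error);
        } else {
            const $ = res.$; // $ now lives on res
            console.log($('title').text());
        }
        done(); // always call done() to release the task
    }
});

c.on('schedule', (options) => console.log('queued', options.uri));
c.on('drain', () => console.log('queue is empty'));

c.queue('http://example.com/');
```
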
0.7.5
 * Delete entity properties in options before copying and assign them back afterwards; `jar` is a typical such property, an entity with functions [#177](https://github.com/bda-research/node-crawler/issues/177)
 * Upgrade `request` to version 2.74.0

0.7.4
 * Move the `debug` option to the instance level instead of `options`
 * Update README.md to detail error handling
 * Call `onDrain` with `this` as its scope
 * Upgrade `seenreq` to version 0.1.7

0.7.0
 * Remove recursion in `queue`
 * Upgrade `request` to v2.67.0

0.6.9
 * Use `bottleneckConcurrent` instead of `maxConnections`, defaulting to `10000`
 * Add debug info

0.6.5
 * Fix a deep, serious bug in Pool initialization that could lead to sequential execution [#2](https://github.com/bda-research/node-webcrawler/issues/2)
 * Print a log of the Pool status

0.6.3
 * `result.options` is available from the callback even when errors occurred [#127](https://github.com/bda-research/node-crawler/issues/127) [#86](https://github.com/bda-research/node-crawler/issues/86)
 * Add a test for `bottleneck`

0.6.0
 * Add `bottleneck` to implement rate limiting; a limit can be set for each connection at the same time (see the sketch below)

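In today's API the same idea is exposed through `rateLimit` plus the `limiter` option, which groups tasks into independently throttled buckets; a sketch (the limiter names are arbitrary):

```js
const Crawler = require('crawler');

const c = new Crawler({
    rateLimit: 2000, // each limiter bucket fires at most one request per 2s
    callback: (error, res, done) => {
        if (error) console.error(error);
        done();
    }
});

// Tasks with different `limiter` keys are rate-limited independently
c.queue({ uri: 'http://example.com/a', limiter: 'siteA' });
c.queue({ uri: 'http://example.org/b', limiter: 'siteB' });
```
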
0.5.2
 * All resources in the pool can be terminated manually once `onDrain` is called, before their timeouts are reached
 * Add a read-only `queueSize` property to the crawler [#148](https://github.com/bda-research/node-crawler/issues/148) [#76](https://github.com/bda-research/node-crawler/issues/76) [#107](https://github.com/bda-research/node-crawler/issues/107); see the sketch below

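A sketch of reading `queueSize` while the crawler works through its queue:

```js
const Crawler = require('crawler');

const c = new Crawler({
    callback: (error, res, done) => {
        if (error) console.error(error);
        console.log('remaining tasks:', c.queueSize); // read-only property
        done();
    }
});

c.queue(['http://example.com/a', 'http://example.com/b']);
```
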
0.5.1
 * Remove the cache feature, as it was useless
 * Add `localAddress`, `time`, `tunnel`, `proxyHeaderWhiteList`, and `proxyHeaderExclusiveList` properties to pass through to `request` [#155](https://github.com/bda-research/node-crawler/issues/155)

0.5.0
 * Parse the charset from the `Content-Type` HTTP header or the HTML meta tag, then convert
 * The big5 charset is available, since `iconv-lite` already supports it
 * Enable gzip in the request headers by default
 * Remove the unzip code in crawler, since `request` handles it
 * Return the body as a Buffer if `encoding` is null, an option passed through to `request`
 * Remove the cache, and skip duplicate requests for `GET`, `POST` (only for type `urlencode`), and `HEAD`
 * Add a log feature: set `logger: winston` to use `winston`, otherwise crawler logs to the console (see the sketch below)
 * Rotate the user-agent in case some sites ban your requests

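A sketch of the logger hook as described in this entry, assuming a `logger` option accepted at construction time (this reflects the 0.5.0-era API; later versions may differ):

```js
const winston = require('winston');
const Crawler = require('crawler');

const c = new Crawler({
    logger: winston, // per this release; console output is the fallback
    callback: (error, res, done) => {
        if (error) console.error(error);
        done();
    }
});

c.queue('http://example.com/');
```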