node-webcrawler ChangeLog
-------------------------

1.2.1

* [#310](https://github.com/bda-research/node-crawler/issues/310) Upgrade dependencies' versions (@mike442144)
* [#303](https://github.com/bda-research/node-crawler/issues/303) Update seenreq to v3 (@mike442144)
* [#304](https://github.com/bda-research/node-crawler/pull/304) Replace istanbul with nyc (@kossidts)
* [#300](https://github.com/bda-research/node-crawler/pull/300) Add `formData` arg to requestArgs (@humandevmode)
* [#280](https://github.com/bda-research/node-crawler/pull/280) Update tests to use nock (@Dong-Gao)
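The `formData` entry above means multipart form options can now be passed through to the underlying `request` call. A minimal sketch of a queued task using it; the URI and field names are placeholders, and the task object shape follows the project's documented options:

```javascript
// Hypothetical task object showing the formData pass-through added in #300.
// formData is forwarded to request as multipart form data.
const task = {
  uri: 'https://example.com/upload',
  method: 'POST',
  formData: { field: 'value' },
  callback: (error, res, done) => {
    if (!error) console.log(res.statusCode);
    done(); // release the limiter slot
  }
};
```

The task would then be passed to `crawler.queue(task)` as with any other request.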
|
1.2.0

* [#278](https://github.com/bda-research/node-crawler/pull/278) Added filestream require to download section (@swosko)
* Use `nock` to mock tests instead of httpbin
* Replace jshint with eslint
* Fix code to pass eslint rules

1.1.4

* Tolerate incorrect `Content-Type` header [#270](https://github.com/bda-research/node-crawler/pull/270), [#193](https://github.com/bda-research/node-crawler/issues/193)
* Added examples [#272](https://github.com/bda-research/node-crawler/pull/272), [#267](https://github.com/bda-research/node-crawler/issues/267)
* Fixed a bug where the `skipDuplicates` and `retries` options were incompatible [#261](https://github.com/bda-research/node-crawler/issues/261)
* Fixed a typo in the README [#268](https://github.com/bda-research/node-crawler/pull/268)

1.1.3

* Upgraded `request.js` and `lodash`

1.1.2

* Recognize all XML MIME types to inject jQuery [#245](https://github.com/bda-research/node-crawler/pull/245)
* Allow options to specify the Agent for Request [#246](https://github.com/bda-research/node-crawler/pull/246)
* Added logo

1.1.1

* Added a way to replace global `options.headers` keys with per-queue `options.headers` [#241](https://github.com/bda-research/node-crawler/issues/241)
* Fixed a bug where the previous `jar` object was reused when the current options did not contain a `jar` option [#240](https://github.com/bda-research/node-crawler/issues/240)
* Fixed an encoding bug [#233](https://github.com/bda-research/node-crawler/issues/233)
* Added seenreq options [#208](https://github.com/bda-research/node-crawler/issues/208)
* Added `preRequest`, `setLimiterProperty`, and direct request functions
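The header override from #241 means keys set on a queued task's `headers` replace the matching global keys rather than being ignored. A sketch of that merge behavior; the merge code below is illustrative, not the library's actual implementation:

```javascript
// Illustrative merge: per-task header keys win over global header keys,
// mirroring the behavior described in #241.
const globalOptions = {
  headers: { 'User-Agent': 'default-agent', 'Accept': 'text/html' }
};

const task = {
  uri: 'https://example.com',
  headers: { 'User-Agent': 'custom-agent' } // overrides only this global key
};

const merged = Object.assign({}, globalOptions.headers, task.headers);
console.log(merged['User-Agent']); // 'custom-agent'
console.log(merged['Accept']);     // 'text/html', kept from the globals
```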
|
1.0.5

* Fixed missing debugging messages [#213](https://github.com/bda-research/node-crawler/issues/213)
* Fixed a bug where `drain` was never called [#210](https://github.com/bda-research/node-crawler/issues/210)

1.0.4

* Fixed a charset-detection bug [#203](https://github.com/bda-research/node-crawler/issues/203)
* Keep Node versions up to date in Travis scripts

1.0.3

* Fixed a bug where `skipDuplicates` and `rotateUA` did not work even when set to true

1.0.0

* Upgraded jsdom to 9.6.x
* Removed Node 0.10 and 0.12 support [#170](https://github.com/bda-research/node-crawler/issues/170)
* Control dependency versions using ^ and ~ [#169](https://github.com/bda-research/node-crawler/issues/169)
* Removed node-pool
* Notify bottleneck only once a task is completed
* Replaced bottleneck with bottleneckp, which supports priorities
* Changed the default log function
* Use event listeners on `request` and `drain` instead of global functions [#144](https://github.com/bda-research/node-crawler/issues/144)
* Set `forceUTF8` to true by default
* Detect `ESOCKETTIMEDOUT` instead of `ETIMEDOUT` on timeout in tests
* Added a `done` function in the callback to avoid async traps
* Do not convert the response body to a string if `encoding` is null [#118](https://github.com/bda-research/node-crawler/issues/118)
* Added result documentation [#68](https://github.com/bda-research/node-crawler/issues/68) [#116](https://github.com/bda-research/node-crawler/issues/116)
* Added a `schedule` event, emitted when a task is added to the scheduler
* Moved `$` into `res` in the callback to avoid a weird API
* Renamed `rateLimits` to `rateLimit`
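Two of the 1.0.0 changes reshape the callback: `$` now lives on `res`, and the new `done` argument must be called so the limiter can schedule the next task. A hedged sketch of the resulting callback shape; the mock invocation at the bottom is illustration only, no real crawl happens:

```javascript
// Callback shape introduced in 1.0.0: (error, res, done).
function callback(error, res, done) {
  if (error) {
    console.error(error);
  } else {
    const $ = res.$;           // jQuery-like selector moved onto res
    console.log(res.statusCode);
  }
  done(); // must be called, or the rate limiter never frees the slot
}

// Mock invocation to show the flow
let finished = false;
callback(null, { statusCode: 200, $: () => {} }, () => { finished = true; });
console.log(finished); // true
```

Forgetting `done()` is the "async trap" the entry above refers to: the queue stalls once all limiter slots are held by completed-but-unreleased tasks.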
|
0.7.5

* Delete entity properties in options before copying and assign them back afterwards; `jar` is a typical property that is an entity with functions [#177](https://github.com/bda-research/node-crawler/issues/177)
* Upgraded `request` to version 2.74.0

0.7.4

* Changed the `debug` option to instance level instead of `options`
* Updated README.md to detail error handling
* Call `onDrain` with `this` as its scope
* Upgraded `seenreq` to version 0.1.7

0.7.0

* Removed recursion in queue
* Upgraded `request` to v2.67.0

0.6.9

* Use `bottleneckConcurrent` instead of `maxConnections`, default `10000`
* Added debug info

0.6.5

* Fixed a deep bug in Pool initialization that could lead to sequential execution [#2](https://github.com/bda-research/node-webcrawler/issues/2)
* Print a log of the Pool status

0.6.3

* `result.options` is now available in the callback even when errors occurred [#127](https://github.com/bda-research/node-crawler/issues/127) [#86](https://github.com/bda-research/node-crawler/issues/86)
* Added tests for `bottleneck`

0.6.0

* Added `bottleneck` to implement rate limiting; a limit can be set for each connection at the same time

0.5.2

* All resources in the pool can be manually terminated when `onDrain` is called, before their timeouts are reached
* Added a read-only `queueSize` property to the crawler [#148](https://github.com/bda-research/node-crawler/issues/148) [#76](https://github.com/bda-research/node-crawler/issues/76) [#107](https://github.com/bda-research/node-crawler/issues/107)

0.5.1

* Removed the cache feature; it was useless
* Added `localAddress`, `time`, `tunnel`, `proxyHeaderWhiteList`, and `proxyHeaderExclusiveList` properties to pass to `request` [#155](https://github.com/bda-research/node-crawler/issues/155)

0.5.0

* Parse the charset from the `content-type` HTTP header or the HTML meta tag, then convert
* The big5 charset is available now that `iconv-lite` supports it
* Enable gzip in request headers by default
* Removed unzip code from the crawler since `request` handles it
* The body is returned as a Buffer if `encoding` is null (an option in `request`)
* Removed cache and skip duplicate `request` for `GET`, `POST` (only for type `urlencode`), and `HEAD`
* Added a log feature: use `winston` by setting `logger:winston`, otherwise the crawler outputs to the console
* Rotate the user agent in case some sites ban your requests
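The logger and user-agent rotation from 0.5.0 are both plain options. A hypothetical configuration sketch; the user-agent strings are placeholders, and the round-robin helper below is one way rotation might behave, not the library's actual code:

```javascript
// Hypothetical options showing UA rotation and a custom logger.
const options = {
  rotateUA: true,
  userAgent: [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15)'
  ],
  logger: console // or a winston instance, e.g. logger: winston
};

// Illustrative round-robin over the userAgent list
let i = 0;
function nextUA(uas) {
  const ua = uas[i % uas.length];
  i += 1;
  return ua;
}

console.log(nextUA(options.userAgent) !== nextUA(options.userAgent)); // true
```

Rotating through a list of agents makes consecutive requests look less uniform, which is the "in case some sites ban your requests" motivation above.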
|
|