# urllib

[![NPM version][npm-image]][npm-url]
[![build status][travis-image]][travis-url]
[![Build Status](https://dev.azure.com/eggjs/egg/_apis/build/status/node-modules.urllib)](https://dev.azure.com/eggjs/egg/_build/latest?definitionId=7)
[![Test coverage][codecov-image]][codecov-url]
[![David deps][david-image]][david-url]
[![Known Vulnerabilities][snyk-image]][snyk-url]
[![npm download][download-image]][download-url]

[npm-image]: https://img.shields.io/npm/v/urllib.svg?style=flat-square
[npm-url]: https://npmjs.org/package/urllib
[travis-image]: https://img.shields.io/travis/node-modules/urllib.svg?style=flat-square
[travis-url]: https://travis-ci.org/node-modules/urllib
[codecov-image]: https://codecov.io/gh/node-modules/urllib/branch/master/graph/badge.svg
[codecov-url]: https://codecov.io/gh/node-modules/urllib
[david-image]: https://img.shields.io/david/node-modules/urllib.svg?style=flat-square
[david-url]: https://david-dm.org/node-modules/urllib
[snyk-image]: https://snyk.io/test/npm/urllib/badge.svg?style=flat-square
[snyk-url]: https://snyk.io/test/npm/urllib
[download-image]: https://img.shields.io/npm/dm/urllib.svg?style=flat-square
[download-url]: https://npmjs.org/package/urllib

Request HTTP URLs in a complex world: basic
and digest authentication, redirections, cookies, timeout and more.

## Install

```bash
$ npm install urllib --save
```

## Usage

### callback

```js
var urllib = require('urllib');

urllib.request('http://cnodejs.org/', function (err, data, res) {
  if (err) {
    throw err; // you need to handle error
  }
  console.log(res.statusCode);
  console.log(res.headers);
  // data is a Buffer instance
  console.log(data.toString());
});
```

### Promise

If you've installed [bluebird][bluebird],
[bluebird][bluebird] will be used.
`urllib` does not install [bluebird][bluebird] for you.

Otherwise, if you're using a Node.js version that has native V8 Promises (v0.11.13+),
those will be used.

Otherwise, this library will crash the process and exit,
so you might as well install [bluebird][bluebird] as a dependency!

```js
var urllib = require('urllib');

urllib.request('http://nodejs.org').then(function (result) {
  // result: {data: buffer, res: response object}
  console.log('status: %s, body size: %d, headers: %j', result.res.statusCode, result.data.length, result.res.headers);
}).catch(function (err) {
  console.error(err);
});
```

### co & generator

If you are using [co](https://github.com/visionmedia/co) or [koa](https://github.com/koajs/koa):

```js
var co = require('co');
var urllib = require('urllib');

co(function* () {
  var result = yield urllib.requestThunk('http://nodejs.org');
  console.log('status: %s, body size: %d, headers: %j',
    result.status, result.data.length, result.headers);
}).catch(function (err) {
  // co v4+ returns a promise; with co v3, invoke the returned thunk instead: co(fn)()
  console.error(err);
});
```

## Global `response` event

You should create a urllib instance first.

```js
var httpclient = require('urllib').create();

httpclient.on('response', function (info) {
  // info is an object shaped like:
  // {
  //   error: err,
  //   ctx: args.ctx,
  //   req: {
  //     url: url,
  //     options: options,
  //     size: requestSize,
  //   },
  //   res: res
  // }
  console.log('%s responded with status %s', info.req.url, info.res.status);
});

httpclient.request('http://nodejs.org', function (err, body) {
  console.log('body size: %d', body.length);
});
```

## API Doc

### Method: `http.request(url[, options][, callback])`

#### Arguments

- **url** String | Object - The URL to request, either a String or an Object returned by [url.parse](http://nodejs.org/api/url.html#url_url_parse_urlstr_parsequerystring_slashesdenotehost).
- ***options*** Object - Optional
  - ***method*** String - Request method, defaults to `GET`. Could be `GET`, `POST`, `DELETE` or `PUT`. Alias `type`.
  - ***data*** Object - Data to be sent. Will be stringified automatically.
  - ***dataAsQueryString*** Boolean - Force converting `data` to a query string.
  - ***content*** String | [Buffer](http://nodejs.org/api/buffer.html) - Manually set the content of the payload. If set, `data` will be ignored.
  - ***stream*** [stream.Readable](http://nodejs.org/api/stream.html#stream_class_stream_readable) - Stream to be piped to the remote. If set, `data` and `content` will be ignored.
  - ***writeStream*** [stream.Writable](http://nodejs.org/api/stream.html#stream_class_stream_writable) - A writable stream that the response stream is piped into. Response data will be written to this stream, and `callback` will be called with `data` set to `null` after writing finishes.
  - ***consumeWriteStream*** [true] - Consume the writeStream and invoke the callback after the writeStream closes.
  - ***contentType*** String - Type of request data. Could be `json`. If it's `json`, the `Content-Type: application/json` header is set automatically.
  - ***nestedQuerystring*** Boolean - By default urllib uses querystring to stringify form data, which does not support nested objects. Set this option to `true` to use [qs](https://github.com/ljharb/qs) instead, which does.
  - ***dataType*** String - Type of response data. Could be `text` or `json`. If it's `text`, the `callback`ed `data` would be a String. If it's `json`, the `data` of callback would be a parsed JSON Object and the `Accept: application/json` header is set automatically. By default the `callback`ed `data` would be a `Buffer`.
  - ***fixJSONCtlChars*** Boolean - Fix the control characters (U+0000 through U+001F) before JSON-parsing the response. Default is `false`.
  - ***headers*** Object - Request headers.
  - ***timeout*** Number | Array - Request timeout in milliseconds for the connecting phase and the response receiving phase. Defaults to `exports.TIMEOUT`; both are 5s. Use `timeout: 5000` to apply the same timeout to both phases, or set them separately, such as `timeout: [3000, 5000]`, which sets the connecting timeout to 3s and the response timeout to 5s.
  - ***auth*** String - `username:password` used in HTTP Basic Authorization.
  - ***digestAuth*** String - `username:password` used in HTTP [Digest Authorization](http://en.wikipedia.org/wiki/Digest_access_authentication).
  - ***agent*** [http.Agent](http://nodejs.org/api/http.html#http_class_http_agent) - HTTP Agent object.
    Set `false` if you do not want to use an agent.
  - ***httpsAgent*** [https.Agent](http://nodejs.org/api/https.html#https_class_https_agent) - HTTPS Agent object.
    Set `false` if you do not want to use an agent.
  - ***ca*** String | Buffer | Array - An array of strings or Buffers of trusted certificates.
    If this is omitted, several well-known "root" CAs will be used, like VeriSign.
    These are used to authorize connections.
    **Notes**: This is necessary only if the server uses a self-signed certificate.
  - ***rejectUnauthorized*** Boolean - If true, the server certificate is verified against the list of supplied CAs.
    An 'error' event is emitted if verification fails. Default: true.
  - ***pfx*** String | Buffer - A string or Buffer containing the private key,
    certificate and CA certs of the server in PFX or PKCS12 format.
  - ***key*** String | Buffer - A string or Buffer containing the private key of the client in PEM format.
    **Notes**: This is necessary only if using client certificate authentication.
  - ***cert*** String | Buffer - A string or Buffer containing the certificate key of the client in PEM format.
    **Notes**: This is necessary only if using client certificate authentication.
  - ***passphrase*** String - A passphrase string for the private key or pfx.
  - ***ciphers*** String - A string describing the ciphers to use or exclude.
  - ***secureProtocol*** String - The SSL method to use, e.g. `SSLv3_method` to force SSL version 3.
  - ***followRedirect*** Boolean - Follow HTTP 3xx responses as redirects. Defaults to `false`.
  - ***maxRedirects*** Number - The maximum number of redirects to follow, defaults to 10.
  - ***formatRedirectUrl*** Function - Format the redirect URL yourself. Default is `url.resolve(from, to)`.
  - ***beforeRequest*** Function - Before-request hook; you can change anything here.
  - ***streaming*** Boolean - Lets you get the `res` object as soon as the request is connected, default `false`. Alias `customResponse`.
  - ***gzip*** Boolean - Accept gzip response content and decode it automatically, default is `false`.
  - ***timing*** Boolean - Enable timing or not, default is `false`.
  - ***enableProxy*** Boolean - Enable proxy request, default is `false`.
  - ***proxy*** String | Object - Proxy agent URI or options, default is `null`.
  - ***lookup*** Function - Custom DNS lookup function, default is `dns.lookup`. Requires node >= 4.0.0 (for the http protocol) and node >= 8 (for the https protocol).
  - ***checkAddress*** Function - Optional. Checks the request address to protect against SSRF and similar attacks. It receives two arguments (`ip` and `family`) and should return `true` or `false` to indicate whether the address is legal. It relies on `lookup` and has the same version requirements.
  - ***trace*** Boolean - Enable capturing a stack that includes the call site of the library entrance, default is `false`.
- ***callback(err, data, res)*** Function - Optional callback.
  - **err** Error - Would be `null` if no error occurred.
  - **data** Buffer | String | Object - The response data. A Buffer by default, a String if `dataType` is set to `text`, or the JSON parsed into an Object if it's set to `json`.
  - **res** [http.IncomingMessage](http://nodejs.org/api/http.html#http_http_incomingmessage) - The response.

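As an illustration of the `checkAddress` option described above, here is a minimal sketch of a function that rejects loopback, private and link-local addresses. The function body and the chosen ranges are assumptions made for this example, not urllib code, and the list is deliberately incomplete (for instance, it does not handle IPv4-mapped IPv6 addresses).

```js
// Hypothetical checkAddress implementation (not part of urllib).
// Receives the resolved `ip` and its `family` (4 or 6) and returns
// true when the address is allowed, false otherwise.
function checkAddress(ip, family) {
  if (family === 6) {
    // Only the IPv6 loopback address is rejected in this sketch.
    return ip !== '::1';
  }
  var parts = ip.split('.').map(Number);
  var a = parts[0];
  var b = parts[1];
  if (a === 0 || a === 10 || a === 127) return false; // "this" network, 10/8, loopback
  if (a === 172 && b >= 16 && b <= 31) return false;  // 172.16/12
  if (a === 192 && b === 168) return false;           // 192.168/16
  if (a === 169 && b === 254) return false;           // link-local
  return true;
}

// Usage (assumes urllib is installed):
// urllib.request('http://example.com/', { checkAddress: checkAddress }, callback);
```

Because the check runs on the resolved address rather than on the hostname, it also covers hostnames that resolve to an internal IP.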
#### Returns

[http.ClientRequest](http://nodejs.org/api/http.html#http_class_http_clientrequest) - The request.

Calling the `.abort()` method of the returned request cancels the request.

#### Options: `options.data`

When making a request:

```js
urllib.request('http://example.com', {
  method: 'GET',
  data: {
    'a': 'hello',
    'b': 'world'
  }
});
```

For `GET` requests, `data` will be stringified into a query string, e.g. `http://example.com/?a=hello&b=world`.

For other methods such as `POST`, `PATCH` or `PUT`,
`data` is by default stringified into `application/x-www-form-urlencoded` format
if the `Content-Type` header is not set.

If `Content-Type` is `application/json`, `data` will be serialized with `JSON.stringify`.

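The rules above can be sketched with Node's built-in `querystring` module. The `encodeData` helper is hypothetical and only mirrors the behavior described in this section; it is not urllib's internal code.

```js
var querystring = require('querystring');

// Hypothetical helper (not urllib code): decides where `data` ends up
// based on the request method and the Content-Type header.
function encodeData(method, contentType, data) {
  if (method === 'GET') {
    // GET: data goes into the query string
    return { query: querystring.stringify(data), body: null };
  }
  if (contentType === 'application/json') {
    // JSON content type: data is serialized with JSON.stringify
    return { query: null, body: JSON.stringify(data) };
  }
  // Default for POST/PUT/PATCH: application/x-www-form-urlencoded
  return { query: null, body: querystring.stringify(data) };
}

var data = { a: 'hello', b: 'world' };
console.log(encodeData('GET', undefined, data).query);           // a=hello&b=world
console.log(encodeData('POST', undefined, data).body);           // a=hello&b=world
console.log(encodeData('POST', 'application/json', data).body);  // {"a":"hello","b":"world"}
```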
#### Options: `options.content`

`options.content` is useful when you wish to construct the request body yourself,
for example when making a `Content-Type: application/json` request.

Note that if you want to send a JSON body, you should stringify it yourself:

```js
urllib.request('http://example.com', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  content: JSON.stringify({
    a: 'hello',
    b: 'world'
  })
});
```

It would make an HTTP request like:

```http
POST / HTTP/1.1
Host: example.com
Content-Type: application/json

{
  "a": "hello",
  "b": "world"
}
```

This example can also use `options.data` with the `application/json` content type:

```js
urllib.request('http://example.com', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json'
  },
  data: {
    a: 'hello',
    b: 'world'
  }
});
```

#### Options: `options.stream`

Uploads a file with [formstream](https://github.com/node-modules/formstream):

```js
var urllib = require('urllib');
var formstream = require('formstream');

var form = formstream();
form.file('file', __filename);
form.field('hello', '你好urllib');

var req = urllib.request('http://my.server.com/upload', {
  method: 'POST',
  headers: form.headers(),
  stream: form
}, function (err, data, res) {
  // upload finished
});
```

### Response Object

The response is a plain object. It contains:

* `status` or `statusCode`: response status code.
  * `-1` means a network error such as `ENOTFOUND`
  * `-2` means ConnectionTimeoutError
* `headers`: response http headers, default is `{}`
* `size`: response size
* `aborted`: whether the response was aborted
* `rt`: total request and response time in ms.
* `timing`: timing object if timing is enabled.
* `remoteAddress`: http server ip address
* `remotePort`: http server port
* `socketHandledRequests`: number of requests the socket has already handled
* `socketHandledResponses`: number of responses the socket has already handled

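Since `status` can carry the special values `-1` and `-2`, callers may want to tell them apart from real HTTP status codes. A minimal sketch, where `describeStatus` is a hypothetical helper name introduced for this example:

```js
// Hypothetical helper (not part of urllib): maps the special status
// values listed above to a readable label.
function describeStatus(res) {
  if (res.status === -1) return 'network error';
  if (res.status === -2) return 'connection timeout';
  return 'http ' + res.status;
}

console.log(describeStatus({ status: -1 }));  // network error
console.log(describeStatus({ status: -2 }));  // connection timeout
console.log(describeStatus({ status: 200 })); // http 200
```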
#### Response: `res.aborted`

If the underlying connection was terminated before `response.end()` was called,
`res.aborted` should be `true`.

```js
var assert = require('assert');
var urllib = require('urllib');

require('http').createServer(function (req, res) {
  req.resume();
  req.on('end', function () {
    res.write('foo haha\n');
    setTimeout(function () {
      res.write('foo haha 2');
      setTimeout(function () {
        // close the socket without ever calling res.end()
        res.socket.end();
      }, 300);
    }, 200);
  });
}).listen(1984);

urllib.request('http://127.0.0.1:1984/socket.end', function (err, data, res) {
  assert.strictEqual(data.toString(), 'foo haha\nfoo haha 2');
  assert.ok(res.aborted);
});
```

### HttpClient2

HttpClient2 is a new client intended for future use. Its `request` method only returns a promise, which is compatible with `async/await` and with generators in co.

#### Options

Options extend those of urllib, with the following additions:

- ***retry*** Number - The retry count. When a request gets an error, it is sent again until the retry count is reached.
- ***retryDelay*** Number - Delay in ms to wait between retries.
- ***isRetry*** Function - Determines whether to retry. It receives the response object as its first argument. By default it retries when status >= 500. Request errors are not covered.

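The interplay of `retry`, `retryDelay` and `isRetry` can be sketched as follows. This is an illustration under assumptions, not HttpClient2's implementation: `requestWithRetry` and `defaultIsRetry` are hypothetical names, and `doRequest` stands for any function returning a promise that resolves with `{ data, res }`.

```js
// Default predicate as described above: retry on status >= 500.
function defaultIsRetry(res) {
  return res.status >= 500;
}

// Hypothetical retry wrapper (not urllib code).
function requestWithRetry(doRequest, options) {
  var retry = options.retry || 0;
  var retryDelay = options.retryDelay || 0;
  var isRetry = options.isRetry || defaultIsRetry;

  function wait(ms) {
    return new Promise(function (resolve) { setTimeout(resolve, ms); });
  }

  function attempt(remaining) {
    return doRequest().then(function (result) {
      // A response arrived: ask isRetry whether it is good enough.
      if (remaining > 0 && isRetry(result.res)) {
        return wait(retryDelay).then(function () { return attempt(remaining - 1); });
      }
      return result;
    }, function (err) {
      // A request error: assumed here to retry without consulting isRetry.
      if (remaining > 0) {
        return wait(retryDelay).then(function () { return attempt(remaining - 1); });
      }
      throw err;
    });
  }

  return attempt(retry);
}
```

With `retry: 2`, a request that keeps failing is attempted three times in total before the last result or error is returned.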
## Proxy

Supports both the `http` and `https` protocols.

**Notice: Only supported on Node.js >= 4.0.0**

### Programming

```js
urllib.request('https://twitter.com/', {
  enableProxy: true,
  proxy: 'http://localhost:8008',
}, (err, data, res) => {
  console.log(res.status, res.headers);
});
```

### System environment variable

- http

```bash
HTTP_PROXY=http://localhost:8008
http_proxy=http://localhost:8008
```

- https

```bash
HTTP_PROXY=http://localhost:8008
http_proxy=http://localhost:8008
HTTPS_PROXY=https://localhost:8008
https_proxy=https://localhost:8008
```

```bash
$ http_proxy=http://localhost:8008 node index.js
```

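One plausible reading of the variable lists above is that https requests prefer the `https_proxy` variables and fall back to the `http_proxy` ones. The sketch below encodes that reading. Both the `pickProxyFromEnv` helper and the precedence between lowercase and uppercase names are assumptions for illustration, not confirmed urllib behavior.

```js
// Hypothetical helper (not urllib code): picks a proxy URL from an
// environment-like object for the given protocol ('http:' or 'https:').
function pickProxyFromEnv(protocol, env) {
  var httpProxy = env.http_proxy || env.HTTP_PROXY;
  if (protocol === 'https:') {
    // Assumed order: https-specific variables first, then the http ones.
    return env.https_proxy || env.HTTPS_PROXY || httpProxy || null;
  }
  return httpProxy || null;
}

console.log(pickProxyFromEnv('http:', { http_proxy: 'http://localhost:8008' }));
console.log(pickProxyFromEnv('https:', { HTTPS_PROXY: 'https://localhost:8008' }));
// In real code you would pass process.env as the second argument.
```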
### Trace

If `trace` is set to `true`, the error stack will contain the full call stack, like:

```
Error: connect ECONNREFUSED 127.0.0.1:11
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1113:14)
    --------------------
    at ~/workspace/urllib/lib/urllib.js:150:13
    at new Promise (<anonymous>)
    at Object.request (~/workspace/urllib/lib/urllib.js:149:10)
    at Context.<anonymous> (~/workspace/urllib/test/urllib_promise.test.js:49:19)
    ....
```

Enabling trace may degrade urllib's performance, so consider carefully before turning it on.

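To show where the extra frames and the overhead come from, here is a minimal sketch of the general technique using V8's `Error.captureStackTrace`. It is an assumption about how such a trace can be built, not urllib's actual code; `captureCallSite`, `appendCallSite` and `entrance` are hypothetical names.

```js
// Capture the caller's stack at the library entrance.
function captureCallSite() {
  var holder = {};
  // Exclude captureCallSite itself from the recorded frames.
  Error.captureStackTrace(holder, captureCallSite);
  // Drop the header line and keep only the "at ..." frames.
  return holder.stack.split('\n').slice(1).join('\n');
}

// Append the saved call site to an error raised later (e.g. by the socket).
function appendCallSite(err, callSite) {
  err.stack += '\n    --------------------\n' + callSite;
  return err;
}

function entrance() {
  // Captured on every call, even when no error happens: this is the cost.
  var callSite = captureCallSite();
  var err = new Error('connect ECONNREFUSED 127.0.0.1:11'); // stand-in error
  return appendCallSite(err, callSite);
}

console.log(entrance().stack);
```

Because the stack has to be captured before the request is sent, every request pays for it, which is consistent with the performance warning above.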
## TODO

* [ ] Support component
* [ ] Browser env use Ajax
* [√] Support Proxy
* [√] Upload file like form upload
* [√] Auto redirect handle
* [√] https & self-signed certificate
* [√] Connection timeout & Response timeout
* [√] Support `Accept-Encoding=gzip` by `options.gzip = true`
* [√] Support [Digest access authentication](http://en.wikipedia.org/wiki/Digest_access_authentication)

## License

[MIT](LICENSE.txt)


[bluebird]: https://github.com/petkaantonov/bluebird