# Feedparser - Robust RSS, Atom, and RDF feed parsing in Node.js

[![Greenkeeper badge](https://badges.greenkeeper.io/danmactough/node-feedparser.svg)](https://greenkeeper.io/)

[![Join the chat at https://gitter.im/danmactough/node-feedparser](https://badges.gitter.im/danmactough/node-feedparser.svg)](https://gitter.im/danmactough/node-feedparser?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

[![Build Status](https://secure.travis-ci.org/danmactough/node-feedparser.png?branch=master)](https://travis-ci.org/danmactough/node-feedparser)

[![NPM](https://nodei.co/npm/feedparser.png?downloads=true&downloadRank=true&stars=true)](https://nodei.co/npm/feedparser/)

Feedparser is for parsing RSS, Atom, and RDF feeds in Node.js.

It has a couple of features you don't usually see in other feed parsers:

1. It resolves relative URLs (such as those seen in Tim Bray's "ongoing" [feed](http://www.tbray.org/ongoing/ongoing.atom)).
2. It properly handles XML namespaces (including those in unusual feeds
   that define a non-default namespace for the main feed elements).

## Installation

```bash
npm install feedparser
```

## Usage

This example briefly demonstrates the basic concepts.

**Please** also review the [compressed example](examples/compressed.js) for a
thorough working example that is a suitable starting point for your app.

```js
var FeedParser = require('feedparser');
var request = require('request'); // for fetching the feed

var options = {}; // optional; see "options" below
var req = request('http://somefeedurl.xml');
var feedparser = new FeedParser(options);

req.on('error', function (error) {
  // handle any request errors
});

req.on('response', function (res) {
  var stream = this; // `this` is `req`, which is a stream

  if (res.statusCode !== 200) {
    this.emit('error', new Error('Bad status code'));
  }
  else {
    stream.pipe(feedparser);
  }
});

feedparser.on('error', function (error) {
  // always handle errors
});

feedparser.on('readable', function () {
  // This is where the action is!
  var stream = this; // `this` is `feedparser`, which is a stream
  var meta = this.meta; // **NOTE** the "meta" is always available in the context of the feedparser instance
  var item;

  // read() returns null once the internal buffer is drained
  while ((item = stream.read()) !== null) {
    console.log(item);
  }
});
```

### options

- `normalize` - Set to `false` to override Feedparser's default behavior,
  which is to parse feeds into an object that contains the generic properties
  patterned after (although not identical to) the RSS 2.0 format, regardless
  of the feed's format.

- `addmeta` - Set to `false` to override Feedparser's default behavior, which
  is to add the feed's `meta` information to each article.

- `feedurl` - The URL (string) of the feed. Feedparser is very good at
  resolving relative URLs in feeds. But some feeds use relative URLs without
  declaring the `xml:base` attribute anywhere in the feed. This is perfectly
  valid, but we don't know the feed's URL before we start parsing the feed
  and trying to resolve those relative URLs. If we discover the feed's URL, we
  will go back and resolve the relative URLs we've already seen, but this takes
  a little time (not much). If you want to be sure we never have to re-resolve
  relative URLs (or if Feedparser is failing to properly resolve relative URLs),
  you should set the `feedurl` option. Otherwise, feel free to ignore this option.

- `resume_saxerror` - Set to `false` to override Feedparser's default behavior, which
  is to emit any `SAXError` on `error` and then automatically resume parsing. In
  my experience, `SAXError`s are not usually fatal, so this is usually helpful
  behavior. If you want total control over handling these errors and optionally
  aborting parsing the feed, set this option to `false`.

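To make the `feedurl` discussion concrete, here is a minimal sketch of what resolving a relative link against a feed URL involves. It uses Node's built-in `URL` class rather than Feedparser's own resolver, and the `resolveLink` helper and the `example.com` URLs are made up for illustration.

```javascript
// Illustration only: this uses Node's built-in URL class, not Feedparser internals.
// `resolveLink` and the example.com URLs are hypothetical.
function resolveLink(link, baseUrl) {
  if (!baseUrl) {
    // Without a base (no xml:base and no known feed URL), a relative
    // link cannot be resolved, so it is returned untouched.
    return link;
  }
  try {
    return new URL(link, baseUrl).href;
  } catch (err) {
    return link; // leave malformed input as-is
  }
}

var feedurl = 'http://example.com/blog/feed.atom';

console.log(resolveLink('2016/05/post.html', feedurl)); // relative to the feed's directory
console.log(resolveLink('/images/logo.png', feedurl));  // relative to the site root
console.log(resolveLink('http://other.example/x', feedurl)); // already absolute
console.log(resolveLink('2016/05/post.html', undefined));    // no base known yet
```

The last call shows the situation the `feedurl` option is meant to avoid: with no base URL available, the relative link stays relative until one is discovered.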
## Examples

See the [`examples`](examples/) directory.

## API

### Transform Stream

Feedparser is a [transform stream](http://nodejs.org/api/stream.html#stream_class_stream_transform) operating in "object mode": XML in -> JavaScript objects out.
Each readable chunk is an object representing an article in the feed.

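If "object mode" is unfamiliar, the sketch below shows the general pattern using only Node's `stream` module: text goes in, plain objects come out. It is not Feedparser's implementation; `createLineParser`, `parseLine`, and the `title | link` line format are invented for the example.

```javascript
// Illustration of an object-mode Transform using only Node's stream module.
// `createLineParser`, `parseLine`, and the "title | link" format are hypothetical.
var stream = require('stream');

// Turn one "title | link" line into an object.
function parseLine(line) {
  var parts = line.split('|');
  return { title: parts[0].trim(), link: (parts[1] || '').trim() || null };
}

function createLineParser() {
  var leftover = '';
  return new stream.Transform({
    readableObjectMode: true, // objects come out; strings or buffers go in
    transform: function (chunk, encoding, done) {
      var lines = (leftover + chunk.toString()).split('\n');
      leftover = lines.pop(); // keep a trailing partial line for the next chunk
      lines.filter(Boolean).forEach(function (line) {
        this.push(parseLine(line));
      }, this);
      done();
    },
    flush: function (done) {
      // Emit whatever is left when the input ends without a newline.
      if (leftover) this.push(parseLine(leftover));
      done();
    }
  });
}

var parser = createLineParser();
parser.on('data', function (item) {
  console.log(item.title, '->', item.link);
});

// Chunk boundaries do not need to line up with record boundaries.
parser.write('First post | http://example.com/1\nSecond ');
parser.end('post | http://example.com/2\n');
```

Because the readable side is in object mode, each `data` event (or each `read()` call) delivers one whole object, which is the same way articles are consumed from a Feedparser instance.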
### Events Emitted

* `meta` - called with feed `meta` when it has been parsed
* `error` - called with `error` whenever there is a Feedparser error of any kind (SAXError, Feedparser error, etc.)

## What is the parsed output produced by feedparser?

Feedparser parses each feed into a `meta` portion (emitted on the `meta` event)
and one or more `articles` (emitted on the `data` event or readable after the
`readable` event is emitted).

Regardless of the format of the feed, the `meta` and each `article` contain a
uniform set of generic properties patterned after (although not identical to)
the RSS 2.0 format, as well as all of the properties originally contained in the
feed. So, for example, an Atom feed may have a `meta.description` property, but
it will also have a `meta['atom:subtitle']` property.

The purpose of the generic properties is to provide the user a uniform interface
for accessing a feed's information without needing to know the feed's format
(i.e., RSS versus Atom) or having to worry about handling the differences
between the formats. However, the original information is also there, in case
you need it. In addition, Feedparser supports some popular namespace extensions
(or portions of them), such as portions of the `itunes`, `media`, `feedburner`
and `pheedo` extensions. So, for example, if a feed article contains either an
`itunes:image` or `media:thumbnail`, the URL for that image will be contained in
the article's `image.url` property.

All generic properties are "pre-initialized" to `null` (or empty arrays or
objects for certain properties). This should save you from having to do a lot of
checking for `undefined`, for example when you are using jade templates.

In addition, all properties (and namespace prefixes) use only lowercase letters,
regardless of how they were capitalized in the original feed. ("xmlUrl" and
"pubDate" also are still used to provide backwards compatibility.) This decision
places ease-of-use over purity -- hopefully, you will never need to think about
whether you should camelCase "pubDate" ever again.

The `title` and `description` properties of `meta` and the `title` property of
each `article` have any HTML stripped if you let feedparser normalize the output.
If you really need the HTML in those elements, there are always the originals:
e.g., `meta['atom:subtitle']['#']`.

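As a rough illustration of how the generic and original properties sit side by side, the sketch below works on a hand-built object rather than real parser output. The values are invented, `subtitleHtml` is a hypothetical helper, and it assumes `image` is one of the properties pre-initialized to an empty object.

```javascript
// A hand-built object imitating the shape described above. The values are
// invented for illustration; a real `meta` comes from the `meta` event.
var meta = {
  title: 'Example Feed',
  description: 'News and notes',   // generic property, HTML stripped
  'atom:subtitle': {               // original element, kept alongside
    '#': 'News <em>and</em> notes'
  },
  image: {},                       // assumed pre-initialized to an empty object
  categories: [],
  author: null
};

// Hypothetical helper: prefer the original HTML when it exists, otherwise
// fall back to the normalized text.
function subtitleHtml(meta) {
  var original = meta['atom:subtitle'];
  return (original && original['#']) || meta.description;
}

console.log(subtitleHtml(meta));            // original markup
console.log(meta.author === null);          // null, not undefined
console.log(meta.image.url || 'no image');  // safe: `image` is always an object here
```

Because the generic properties always exist, code like `meta.image.url` does not need a guard for a missing `image`, while format-specific keys such as `atom:subtitle` still need an existence check.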
### List of meta properties

* title
* description
* link (website link)
* xmlurl (the canonical link to the feed, as specified by the feed)
* date (most recent update)
* pubdate (original published date)
* author
* language
* image (an Object containing `url` and `title` properties)
* favicon (a link to the favicon -- only provided by Atom feeds)
* copyright
* generator
* categories (an Array of Strings)

### List of article properties

* title
* description (frequently, the full article content)
* summary (frequently, an excerpt of the article content)
* link
* origlink (when FeedBurner or Pheedo puts a special tracking URL in the `link` property, `origlink` contains the original link)
* permalink (when an RSS feed has a `guid` field and the `isPermaLink` attribute is not set to `false`, `permalink` contains the value of `guid`)
* date (most recent update)
* pubdate (original published date)
* author
* guid (a unique identifier for the article)
* comments (a link to the article's comments section)
* image (an Object containing `url` and `title` properties)
* categories (an Array of Strings)
* source (an Object containing `url` and `title` properties pointing to the original source for an article; see the [RSS Spec](http://cyber.law.harvard.edu/rss/rss.html#ltsourcegtSubelementOfLtitemgt) for an explanation of this element)
* enclosures (an Array of Objects, each representing a podcast or other enclosure and having a `url` property and possibly `type` and `length` properties)
* meta (an Object containing all the feed meta properties; especially handy when using the EventEmitter interface to listen to `article` emissions)

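The properties above can be combined in small helpers. The following sketch picks a display title, the best link, and any audio enclosures from an article-shaped object. Both `summarizeArticle` and the sample `article` are hypothetical and hand-written for illustration, not output captured from the parser.

```javascript
// Hypothetical helper working on an object shaped like the list above.
function summarizeArticle(article) {
  return {
    title: article.title || '(untitled)',
    // `origlink` holds the real URL when a tracking URL was put in `link`
    url: article.origlink || article.link,
    // `type` is optional on enclosures, so default it before testing
    audio: (article.enclosures || [])
      .filter(function (e) { return /^audio\//.test(e.type || ''); })
      .map(function (e) { return e.url; })
  };
}

// Invented sample data; a real article would come from `feedparser.read()`.
var article = {
  title: null,
  link: 'http://feeds.example.com/~r/abc',
  origlink: 'http://example.com/post',
  enclosures: [
    { url: 'http://example.com/ep1.mp3', type: 'audio/mpeg', length: '12345' },
    { url: 'http://example.com/cover.jpg', type: 'image/jpeg' }
  ]
};

console.log(summarizeArticle(article));
```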
## Help

- Don't be afraid to report an [issue](https://github.com/danmactough/node-feedparser/issues).
- You can drop by [Gitter](https://gitter.im/danmactough/node-feedparser?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge), too.

## Contributors

View all the [contributors](https://github.com/danmactough/node-feedparser/graphs/contributors).

Although `node-feedparser` no longer shares any code with `node-easyrss`, it was
the original inspiration and a starting point.

## License

(The MIT License)

Copyright (c) 2011-2016 Dan MacTough and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the 'Software'), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
