# Feedparser - Robust RSS, Atom, and RDF feed parsing in Node.js

[![Greenkeeper badge](https://badges.greenkeeper.io/danmactough/node-feedparser.svg)](https://greenkeeper.io/)

[![Join the chat at https://gitter.im/danmactough/node-feedparser](https://badges.gitter.im/danmactough/node-feedparser.svg)](https://gitter.im/danmactough/node-feedparser?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge)

[![Build Status](https://secure.travis-ci.org/danmactough/node-feedparser.png?branch=master)](https://travis-ci.org/danmactough/node-feedparser)

[![NPM](https://nodei.co/npm/feedparser.png?downloads=true&downloadRank=true&stars=true)](https://nodei.co/npm/feedparser/)

Feedparser is for parsing RSS, Atom, and RDF feeds in Node.js.

It has a couple of features you don't usually see in other feed parsers:

1. It resolves relative URLs (such as those seen in Tim Bray's "ongoing" [feed](http://www.tbray.org/ongoing/ongoing.atom)).
2. It properly handles XML namespaces (including those in unusual feeds
that define a non-default namespace for the main feed elements).

## Installation

```bash
npm install feedparser
```

## Usage

This example briefly demonstrates the basic concepts.

**Please** also review the [compressed example](examples/compressed.js) for a
thorough working example that is a suitable starting point for your app.

```js
var FeedParser = require('feedparser');
var request = require('request'); // for fetching the feed

var req = request('http://somefeedurl.xml');
var feedparser = new FeedParser(); // optionally pass an options object (see "options")

req.on('error', function (error) {
  // handle any request errors
});

req.on('response', function (res) {
  var stream = this; // `this` is `req`, which is a stream

  if (res.statusCode !== 200) {
    this.emit('error', new Error('Bad status code'));
  }
  else {
    stream.pipe(feedparser);
  }
});

feedparser.on('error', function (error) {
  // always handle errors
});

feedparser.on('readable', function () {
  // This is where the action is!
  var stream = this; // `this` is `feedparser`, which is a stream
  var meta = this.meta; // **NOTE** the "meta" is always available in the context of the feedparser instance
  var item;

  while (item = stream.read()) {
    console.log(item);
  }
});
```

### options

- `normalize` - Set to `false` to override Feedparser's default behavior,
  which is to parse feeds into an object that contains the generic properties
  patterned after (although not identical to) the RSS 2.0 format, regardless
  of the feed's format.

- `addmeta` - Set to `false` to override Feedparser's default behavior, which
  is to add the feed's `meta` information to each article.

- `feedurl` - The url (string) of the feed. FeedParser is very good at
  resolving relative urls in feeds. But some feeds use relative urls without
  declaring the `xml:base` attribute anywhere in the feed. This is perfectly
  valid, but we don't know the feed's url before we start parsing the feed
  and trying to resolve those relative urls. If we discover the feed's url, we
  will go back and resolve the relative urls we've already seen, but this takes
  a little time (not much). If you want to be sure we never have to re-resolve
  relative urls (or if FeedParser is failing to properly resolve relative urls),
  you should set the `feedurl` option. Otherwise, feel free to ignore this option.

- `resume_saxerror` - Set to `false` to override Feedparser's default behavior, which
  is to emit any `SAXError` on `error` and then automatically resume parsing. In
  my experience, `SAXErrors` are not usually fatal, so this is usually helpful
  behavior. If you want total control over handling these errors and optionally
  aborting parsing the feed, use this option.

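To make the `feedurl` discussion concrete, here is a minimal sketch of what resolving a relative url against a feed's url means. It uses Node's built-in `URL` class as a stand-in, not Feedparser's own resolver, and the function name `resolveAgainstFeed` and every url in it are made up for illustration.

```js
// Sketch only: Node's built-in WHATWG URL class stands in for Feedparser's
// internal resolver. The function name and all urls are hypothetical.
function resolveAgainstFeed(href, feedurl) {
  // A relative href is resolved against the feed's url; an absolute href
  // ignores the base and comes back as-is.
  return new URL(href, feedurl).href;
}

var feedurl = 'http://example.com/blog/feed.atom';

console.log(resolveAgainstFeed('2016/01/hello.html', feedurl));
console.log(resolveAgainstFeed('/images/logo.png', feedurl));
console.log(resolveAgainstFeed('http://other.example/post', feedurl));
```

Without a known base there is nothing to resolve a relative url against, which is why passing the feed's address as the `feedurl` option gives the parser that base up front.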
## Examples

See the [`examples`](examples/) directory.

## API

### Transform Stream

Feedparser is a [transform stream](http://nodejs.org/api/stream.html#stream_class_stream_transform) operating in "object mode": XML in -> JavaScript objects out.
Each readable chunk is an object representing an article in the feed.

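If the "text in, objects out" shape is unfamiliar, the following toy may help. It is a sketch built on Node's core `stream.Transform`, not Feedparser's implementation: the hypothetical `createLineParser` turns `title|link` text chunks into objects, where Feedparser would be parsing XML. The helper `collectItems` is also an assumption, included only to show the consuming side.

```js
var Transform = require('stream').Transform;

// Hypothetical toy parser: each incoming text chunk is one "title|link"
// line, and each outgoing chunk is a plain object.
function createLineParser() {
  return new Transform({
    readableObjectMode: true, // text in, objects out
    transform: function (chunk, encoding, callback) {
      var parts = chunk.toString().split('|');
      callback(null, { title: parts[0], link: parts[1] });
    }
  });
}

// Collect every object the parser produces, using the same
// `readable` / `read()` loop as the usage example.
function collectItems(lines, done) {
  var parser = createLineParser();
  var items = [];

  parser.on('error', done);
  parser.on('readable', function () {
    var item;
    while ((item = this.read()) !== null) {
      items.push(item);
    }
  });
  parser.on('end', function () {
    done(null, items);
  });

  lines.forEach(function (line) {
    parser.write(line);
  });
  parser.end();
}

collectItems(['First post|http://example.com/1'], function (err, items) {
  if (err) throw err;
  console.log(items);
});
```

Because the readable side is in object mode, `read()` hands back one whole object per call rather than a slice of bytes, which is what lets the `while` loop treat each chunk as one article.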
### Events Emitted

* `meta` - called with feed `meta` when it has been parsed
* `error` - called with `error` whenever there is a Feedparser error of any kind (SAXError, Feedparser error, etc.)

## What is the parsed output produced by feedparser?

Feedparser parses each feed into a `meta` (emitted on the `meta` event) portion
and one or more `articles` (emitted on the `data` event or readable after the
`readable` event is emitted).

Regardless of the format of the feed, the `meta` and each `article` contain a
uniform set of generic properties patterned after (although not identical to)
the RSS 2.0 format, as well as all of the properties originally contained in the
feed. So, for example, an Atom feed may have a `meta.description` property, but
it will also have a `meta['atom:subtitle']` property.

The purpose of the generic properties is to provide the user a uniform interface
for accessing a feed's information without needing to know the feed's format
(i.e., RSS versus Atom) or having to worry about handling the differences
between the formats. However, the original information is also there, in case
you need it. In addition, Feedparser supports some popular namespace extensions
(or portions of them), such as portions of the `itunes`, `media`, `feedburner`
and `pheedo` extensions. So, for example, if a feed article contains either an
`itunes:image` or `media:thumbnail`, the url for that image will be contained in
the article's `image.url` property.

All generic properties are "pre-initialized" to `null` (or empty arrays or
objects for certain properties). This should save you from having to do a lot of
checking for `undefined`, for example when you are using jade templates.

In addition, all properties (and namespace prefixes) use only lowercase letters,
regardless of how they were capitalized in the original feed. ("xmlUrl" and
"pubDate" also are still used to provide backwards compatibility.) This decision
places ease-of-use over purity -- hopefully, you will never need to think about
whether you should camelCase "pubDate" ever again.

The `title` and `description` properties of `meta` and the `title` property of
each `article` have any HTML stripped if you let feedparser normalize the output.
If you really need the HTML in those elements, there are always the originals:
e.g., `meta['atom:subtitle']['#']`.

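As a rough illustration of reading the generic property and the original side by side, the sketch below uses a hand-written object shaped like the `meta` described above. The sample values and the helper name `rawDescription` are assumptions, not output from a real feed.

```js
// Hand-written sample shaped like the `meta` described above; a real feed
// carries many more properties.
var meta = {
  title: 'Example Feed',
  description: 'Fresh news daily',
  'atom:subtitle': { '#': 'Fresh <b>news</b> daily' }
};

// Hypothetical helper: return the original (possibly HTML) subtitle when the
// feed supplied one, otherwise fall back to the normalized description.
function rawDescription(meta) {
  var original = meta['atom:subtitle'];
  return (original && original['#']) || meta.description;
}

console.log(meta.description);     // normalized text
console.log(rawDescription(meta)); // original markup
```

The fallback matters because only Atom feeds have an `atom:subtitle`; for other formats the helper simply returns the generic `description`.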
### List of meta properties

* title
* description
* link (website link)
* xmlurl (the canonical link to the feed, as specified by the feed)
* date (most recent update)
* pubdate (original published date)
* author
* language
* image (an Object containing `url` and `title` properties)
* favicon (a link to the favicon -- only provided by Atom feeds)
* copyright
* generator
* categories (an Array of Strings)

### List of article properties

* title
* description (frequently, the full article content)
* summary (frequently, an excerpt of the article content)
* link
* origlink (when FeedBurner or Pheedo puts a special tracking url in the `link` property, `origlink` contains the original link)
* permalink (when an RSS feed has a `guid` field and the `isPermaLink` attribute is not set to `false`, `permalink` contains the value of `guid`)
* date (most recent update)
* pubdate (original published date)
* author
* guid (a unique identifier for the article)
* comments (a link to the article's comments section)
* image (an Object containing `url` and `title` properties)
* categories (an Array of Strings)
* source (an Object containing `url` and `title` properties pointing to the original source for an article; see the [RSS Spec](http://cyber.law.harvard.edu/rss/rss.html#ltsourcegtSubelementOfLtitemgt) for an explanation of this element)
* enclosures (an Array of Objects, each representing a podcast or other enclosure and having a `url` property and possibly `type` and `length` properties)
* meta (an Object containing all the feed meta properties; especially handy when using the EventEmitter interface to listen to `article` emissions)

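A couple of the properties above are easier to picture with code. This sketch works on a hand-written object shaped like an article; the sample data and the helper names `bestLink` and `enclosureUrls` are assumptions for illustration only.

```js
// Hand-written sample shaped like the article properties listed above.
var article = {
  title: 'Episode 1',
  link: 'http://feeds.example.com/~r/show/~3/abc',
  origlink: 'http://example.com/episodes/1',
  enclosures: [
    { url: 'http://example.com/episodes/1.mp3', type: 'audio/mpeg', length: '1024' },
    { url: 'http://example.com/episodes/1.pdf', type: 'application/pdf' },
    { url: 'http://example.com/episodes/1.bin' }
  ]
};

// Prefer the original link when a tracking service rewrote `link`.
function bestLink(article) {
  return article.origlink || article.link;
}

// Urls of enclosures whose type starts with the given prefix. `type` is
// optional, so enclosures without one are skipped.
function enclosureUrls(article, typePrefix) {
  return article.enclosures
    .filter(function (enclosure) {
      return typeof enclosure.type === 'string' &&
        enclosure.type.indexOf(typePrefix) === 0;
    })
    .map(function (enclosure) {
      return enclosure.url;
    });
}

console.log(bestLink(article));
console.log(enclosureUrls(article, 'audio/'));
```

Since the generic properties are pre-initialized, `bestLink` falls through to `link` when `origlink` is `null`.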
## Help

- Don't be afraid to report an [issue](https://github.com/danmactough/node-feedparser/issues).
- You can drop by [Gitter](https://gitter.im/danmactough/node-feedparser?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge), too.

## Contributors

View all the [contributors](https://github.com/danmactough/node-feedparser/graphs/contributors).

Although `node-feedparser` no longer shares any code with `node-easyrss`, it was
the original inspiration and a starting point.

## License

(The MIT License)

Copyright (c) 2011-2016 Dan MacTough and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the 'Software'), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.