# article-parser

Extract main article, main image and meta data from URL.

[![NPM](https://badge.fury.io/js/article-parser.svg)](https://badge.fury.io/js/article-parser)
[![Build Status](https://travis-ci.org/ndaidong/article-parser.svg?branch=master)](https://travis-ci.org/ndaidong/article-parser)
[![codecov](https://codecov.io/gh/ndaidong/article-parser/branch/master/graph/badge.svg)](https://codecov.io/gh/ndaidong/article-parser)
[![Dependency Status](https://gemnasium.com/badges/github.com/ndaidong/article-parser.svg)](https://gemnasium.com/github.com/ndaidong/article-parser)
[![NSP Status](https://nodesecurity.io/orgs/techpush/projects/d965e951-5bc6-41d3-90da-81e2a3b7e40f/badge)](https://nodesecurity.io/orgs/techpush/projects/d965e951-5bc6-41d3-90da-81e2a3b7e40f)

### Usage

```
npm install article-parser
```

Then:

```
const { extract } = require('article-parser');

const url = 'https://goo.gl/MV8Tkh';

// extract() returns a Promise that resolves with the parsed article
extract(url).then((article) => {
  console.log(article);
}).catch((err) => {
  console.log(err);
});
```

### APIs

- [configure(Object conf)](#configureobject-conf)
- [extract(String url)](#extractstring-url)
- [parseWithEmbedly(String url [, String EmbedlyKey])](#parsewithembedlystring-url--string-embedlykey)
- [getConfig()](#getconfig)

#### configure(Object conf)

The `conf` object may contain the following properties:

```
{
  fetchOptions: Object,
  wordsPerMinute: Number,
  htmlRules: Object,
  SoundCloudKey: String,
  YouTubeKey: String,
  EmbedlyKey: String
}
```

- fetchOptions: Object, a simplified version of [node-fetch options](https://www.npmjs.com/package/node-fetch#options). Only `headers`, `timeout` and `agent` are available here.
- wordsPerMinute: Number, default 300, used to estimate reading time.
- htmlRules: Object, options used to clean HTML with [sanitize-html](https://www.npmjs.com/package/sanitize-html#what-are-the-default-options).
- SoundCloudKey: String, used to get audio duration. Get it [here](https://developers.soundcloud.com/).
- YouTubeKey: String, used to get video duration. Get it [here](https://console.developers.google.com/).
- EmbedlyKey: String, used to extract with the Embedly API. See the documentation [here](http://docs.embed.ly/docs/extract).

The default configuration should work for most cases.

#### extract(String url)

Extract article data from the specified URL.

```
const { extract } = require('article-parser');

const url = 'https://www.youtube.com/watch?v=tRGJj59G1x4';

extract(url).then((article) => {
  console.log(article);
}).catch((err) => {
  console.log(err);
});
```

The resulting *article* object looks something like this:

```
{
  title: 'Zato ESB - Test demo hosted on company server',
  alias: 'zato-esb-test-demo-hosted-on-company-server-1500021746537-PAQXw8IYcU',
  url: 'https://www.youtube.com/watch?v=tRGJj59G1x4',
  canonicals: [
    'https://www.youtube.com/watch?v=tRGJj59G1x4',
    'https://youtu.be/tRGJj59G1x4',
    'https://www.youtube.com/v/tRGJj59G1x4',
    'https://www.youtube.com/embed/tRGJj59G1x4'
  ],
  description: 'Our sample: https://github.com/greenglobal/zato-demo Zato homepage: https://zato.io Tutorial: "Zato — a powerful Python-based ESB solution for your SOA" http...',
  content: '<iframe src="https://www.youtube.com/embed/tRGJj59G1x4?feature=oembed" frameborder="0" allowfullscreen></iframe>',
  image: 'https://i.ytimg.com/vi/tRGJj59G1x4/hqdefault.jpg',
  author: 'Dong Nguyen',
  source: 'YouTube',
  domain: 'youtube.com',
  publishedTime: '',
  duration: 292
}
```

#### parseWithEmbedly(String url [, String EmbedlyKey])

Extract article data from the specified URL using the [Embedly Extract API](http://embed.ly/extract).

The second parameter is optional. If you have already set your Embedly key via the configure() method, you can omit it here.

```
const { parseWithEmbedly } = require('article-parser');

const url = 'https://goo.gl/MV8Tkh';

parseWithEmbedly(url).then((article) => {
  console.log(article);
}).catch((err) => {
  console.log(err);
});
```

#### getConfig()

Return the current configuration.

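To make the `wordsPerMinute` setting more concrete, here is a minimal sketch of how a reading-time estimate could be derived from a word count. It assumes the configuration object carries the same `wordsPerMinute` key that configure() accepts. `estimateReadingTime` is a hypothetical helper written for illustration, not part of article-parser, and the library's own calculation may differ.

```javascript
// Hypothetical helper, NOT part of article-parser's API.
// Estimates reading time in seconds from plain text.
function estimateReadingTime(text, wordsPerMinute = 300) {
  // Split on runs of whitespace and drop empty strings to count words
  const words = text.trim().split(/\s+/).filter(Boolean);
  // minutes = words / rate; convert to seconds and round up
  return Math.ceil((words.length / wordsPerMinute) * 60);
}

// Assumed stand-in for a configuration object; only the documented
// default for wordsPerMinute is used here.
const config = { wordsPerMinute: 300 };

// Build a 450-word sample text
const sample = new Array(450).fill('word').join(' ');

console.log(estimateReadingTime(sample, config.wordsPerMinute)); // 90
```

With the default of 300 words per minute, 450 words take 1.5 minutes, or 90 seconds. A lower `wordsPerMinute` value yields longer estimates.
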
## Test

```
git clone https://github.com/ndaidong/article-parser.git
cd article-parser
npm install
npm test
```

# License

The MIT License (MIT)