# Keyword Extractor

[![Tests Status](https://github.com/michaeldelorenzo/keyword-extractor/workflows/test/badge.svg)](https://github.com/michaeldelorenzo/keyword-extractor/actions)

A simple [NPM package](https://npmjs.org/package/keyword-extractor) for extracting _keywords_ from a string by
removing stopwords.

## Installation

```sh
$ npm install keyword-extractor
```

## Running tests

To run the test suite, first install the development dependencies by running the following command within the package's
directory.

```sh
$ npm install
```

To execute the package's tests, run:

```sh
$ make test
```

## Usage of the Module

```javascript
// Include the Keyword Extractor
const keyword_extractor = require("keyword-extractor");

// Opening sentence of the NY Times article at
// http://www.nytimes.com/2013/09/10/world/middleeast/surprise-russian-proposal-catches-obama-between-putin-and-house-republicans.html
const sentence =
  "President Obama woke up Monday facing a Congressional defeat that many in both parties believed could hobble his presidency.";

// Extract the keywords
const extraction_result = keyword_extractor.extract(sentence, {
  language: "english",
  remove_digits: true,
  return_changed_case: true,
  remove_duplicates: false
});

/*
  extraction_result is:

  [
    "president",
    "obama",
    "woke",
    "monday",
    "facing",
    "congressional",
    "defeat",
    "parties",
    "believed",
    "hobble",
    "presidency"
  ]
*/
```
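
To make the idea of stopword removal concrete, here is a minimal, self-contained sketch. It is not the package's implementation: the function name `extractKeywords`, its `removeDigits` and `lowerCase` options, and the tiny `STOPWORDS` list are all assumptions made up for this illustration.

```javascript
// Illustrative sketch only: NOT the keyword-extractor implementation.
// STOPWORDS is a tiny hypothetical list chosen for this example.
const STOPWORDS = new Set(["a", "the", "up", "in", "that", "many", "both", "could", "his"]);

// Hypothetical helper: split text into words, clean them, and drop stopwords.
function extractKeywords(text, { removeDigits = true, lowerCase = true } = {}) {
  return text
    .split(/\s+/)                                                    // split on whitespace
    .map((word) => word.replace(/[.,!?;:"]/g, ""))                   // strip basic punctuation
    .map((word) => (removeDigits ? word.replace(/\d/g, "") : word))  // optionally strip ASCII digits
    .filter((word) => word.length > 0)                               // drop tokens that became empty
    .filter((word) => !STOPWORDS.has(word.toLowerCase()))            // drop stopwords, ignoring case
    .map((word) => (lowerCase ? word.toLowerCase() : word));         // optionally lower-case the result
}

console.log(extractKeywords("The 3 parties believed a defeat could hobble his presidency."));
// → ["parties", "believed", "defeat", "hobble", "presidency"]
```

The real package ships much larger, language-specific stopword lists, so its results on the same input may differ from this sketch.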

### Options Parameters

The second argument of the _extract_ method is an Object of configuration/processing settings for the extraction.

Parameter Name | Description | Permitted Values
---------------|-------------|-----------------
language | The stopwords list to use. | _english_, _spanish_, _polish_, _german_, _french_, _italian_, _dutch_, _romanian_, _russian_, _portuguese_, _swedish_, _arabic_, _persian_
remove_digits | Removes all digits from the results if set to _true_ (also handles Arabic and Persian digits). | _true_ or _false_
return_changed_case | The case of the extracted keywords. If _true_, the results are returned in lower case; if _false_, the results keep their original case. | _true_ or _false_
return_chained_words | Instead of returning each word separately, joins the words that were originally together. If _true_, adjacent words are joined; if _false_, each word is returned as a separate array element. | _true_ or _false_
remove_duplicates | Removes duplicate keywords. | _true_, _false_ (defaults to _false_)
return_max_ngrams | Returns keywords that are n-grams of size 0 to _integer_. | _integer_, _false_ (defaults to _false_)
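
As a rough illustration of what the `return_chained_words` and `remove_duplicates` descriptions suggest, the sketch below joins runs of consecutive non-stopwords and drops repeated entries. It is one possible reading of the table, not the package's code: `chainWords`, `removeDuplicates`, and `EXAMPLE_STOPWORDS` are hypothetical names introduced here.

```javascript
// Illustrative sketch only: one possible reading of the option descriptions,
// NOT the keyword-extractor implementation. EXAMPLE_STOPWORDS is hypothetical.
const EXAMPLE_STOPWORDS = new Set(["a", "the", "of", "and", "in"]);

// Join runs of consecutive non-stopwords into a single phrase.
function chainWords(tokens) {
  const phrases = [];
  let current = [];
  for (const token of tokens) {
    if (EXAMPLE_STOPWORDS.has(token.toLowerCase())) {
      // A stopword ends the current run, if there is one.
      if (current.length > 0) phrases.push(current.join(" "));
      current = [];
    } else {
      current.push(token);
    }
  }
  // Flush the final run.
  if (current.length > 0) phrases.push(current.join(" "));
  return phrases;
}

// Drop repeated entries while keeping first-seen order.
function removeDuplicates(words) {
  return [...new Set(words)];
}

const tokens = "the white house and the white house lawn".split(" ");
console.log(chainWords(tokens));
// → ["white house", "white house lawn"]
console.log(removeDuplicates(["white", "house", "white", "house", "lawn"]));
// → ["white", "house", "lawn"]
```

Whether the package treats punctuation or letter case the same way when chaining or de-duplicating is not stated in the table, so check its behavior on real input before relying on this reading.
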


## Credits

The initial stopwords lists are taken from the following sources:

- English [http://jmlr.org/papers/volume5/lewis04a/a11-smart-stop-list/english.stop]
- Spanish [https://stop-words.googlecode.com/svn/trunk/stop-words/stop-words/stop-words-spanish.txt]