# Keyword Extractor

[![Tests Status](https://github.com/michaeldelorenzo/keyword-extractor/workflows/test/badge.svg)](https://github.com/michaeldelorenzo/keyword-extractor/actions)

A simple [NPM package](https://npmjs.org/package/keyword-extractor) for extracting _keywords_ from a string by
removing stopwords.

## Installation

```sh
$ npm install keyword-extractor
```

## Running tests

To run the test suite, first install the development dependencies by running the following command within the package's
directory.

```sh
$ npm install
```

To execute the package's tests, run:

```sh
$ make test
```

## Usage of the Module

```javascript
// Include the Keyword Extractor
const keyword_extractor = require("keyword-extractor");

// Opening sentence of the NY Times article at
/*
http://www.nytimes.com/2013/09/10/world/middleeast/
surprise-russian-proposal-catches-obama-between-putin-and-house-republicans.html
*/
const sentence =
  "President Obama woke up Monday facing a Congressional defeat that many in both parties believed could hobble his presidency.";

// Extract the keywords
const extraction_result = keyword_extractor.extract(sentence, {
  language: "english",
  remove_digits: true,
  return_changed_case: true,
  remove_duplicates: false
});

/*
  extraction result is:

  [
    "president",
    "obama",
    "woke",
    "monday",
    "facing",
    "congressional",
    "defeat",
    "parties",
    "believed",
    "hobble",
    "presidency"
  ]
*/
```

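To make the idea of "removing stopwords" concrete, here is a minimal, self-contained sketch of the technique. It is not the package's actual implementation: the `SAMPLE_STOPWORDS` list and the `extractKeywords` helper are hypothetical names introduced only for illustration, and the real package uses much larger per-language stopword lists.

```javascript
// Tiny, hypothetical stopword list (assumption for illustration only).
const SAMPLE_STOPWORDS = new Set([
  "a", "both", "could", "his", "in", "many", "that", "up",
]);

// Hypothetical helper, not part of keyword-extractor's API.
// Splits on whitespace, trims surrounding punctuation, drops stopwords,
// and optionally lower-cases what remains.
function extractKeywords(text, { lowercase = true } = {}) {
  return text
    .split(/\s+/)
    .map((word) => word.replace(/^[^\p{L}\p{N}]+|[^\p{L}\p{N}]+$/gu, ""))
    .filter((word) => word.length > 0)
    .filter((word) => !SAMPLE_STOPWORDS.has(word.toLowerCase()))
    .map((word) => (lowercase ? word.toLowerCase() : word));
}

const sampleSentence =
  "President Obama woke up Monday facing a Congressional defeat that many in both parties believed could hobble his presidency.";

const keywords = extractKeywords(sampleSentence);
console.log(keywords);
```

With this particular stopword list, the sketch yields the same eleven keywords as the usage example above. A different list would give different results, which is why the choice of `language` matters.
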
### Options Parameters

The second argument of the _extract_ method is an object of configuration and processing settings for the extraction.

Parameter Name | Description | Permitted Values
---------------|-------------|-----------------
language | The stopwords list to use, given as an ISO 639-1 code or a verbose name | _ar_, _cs_, _da_, _de_, _en_, _es_, _fa_, _fr_, _gl_, _it_, _ko_, _nl_, _pl_, _pt_, _ro_, _ru_, _sv_, _tr_, _vi_, _arabic_, _czech_, _danish_, _dutch_, _english_, _french_, _galician_, _german_, _italian_, _korean_, _persian_, _polish_, _portuguese_, _romanian_, _russian_, _spanish_, _swedish_, _turkish_, _vietnam_
remove_digits | Removes all digits from the results if set to _true_ (also handles Arabic and Persian digits) | _true_ or _false_
return_changed_case | The case of the extracted keywords. If _true_, the results are returned lower-cased; if _false_, the results keep their original case. | _true_ or _false_
return_chained_words | Instead of returning each word separately, join the words that were originally together. If _true_, the words are joined; if _false_, each word is returned as a separate array element. | _true_ or _false_
remove_duplicates | Removes duplicate keywords | _true_, _false_ (defaults to _false_)
return_max_ngrams | Returns keywords that are n-grams with size 0-_integer_ | _integer_, _false_ (defaults to _false_)

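The `return_chained_words` description ("join the words that were originally together") can be read as grouping runs of consecutive non-stopwords into a single phrase. The sketch below illustrates that reading only. It is an assumption about the behaviour, not the package's code, and `CHAIN_STOPWORDS` and `chainKeywords` are hypothetical names.

```javascript
// Tiny, hypothetical stopword list (assumption for illustration only).
const CHAIN_STOPWORDS = new Set(["a", "in", "of", "that", "the", "up"]);

// Hypothetical helper: joins each run of consecutive non-stopwords
// into one space-separated phrase; a stopword ends the current run.
function chainKeywords(text) {
  const phrases = [];
  let current = [];
  for (const raw of text.toLowerCase().split(/\s+/)) {
    const word = raw.replace(/[^\p{L}\p{N}]/gu, ""); // strip punctuation
    if (word === "") continue;
    if (CHAIN_STOPWORDS.has(word)) {
      if (current.length > 0) phrases.push(current.join(" "));
      current = [];
    } else {
      current.push(word);
    }
  }
  if (current.length > 0) phrases.push(current.join(" "));
  return phrases;
}

const chained = chainKeywords(
  "President Obama woke up Monday facing a Congressional defeat"
);
console.log(chained);
```

In this sketch the stopwords "up" and "a" act as separators, so the sample input produces three phrases rather than seven single words. Whether the package also treats punctuation as a separator is not stated in this README.
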
## Credits

The initial stopwords lists are taken from the following sources:

- English [http://www.ai.mit.edu/projects/jmlr/papers/volume5/lewis04a/a11-smart-stop-list/english.stop]
- Spanish [https://stop-words.googlecode.com/svn/trunk/stop-words/stop-words/stop-words-spanish.txt]
- Turkish [https://github.com/ahmetax/trstop]
