# Keyword Extractor

A simple [NPM package](https://npmjs.org/package/keyword-extractor) for extracting _keywords_ from a string by removing stopwords.

## Installation

```sh
$ npm install keyword-extractor
```

## Running tests

To run the test suite, first install the development dependencies by running the following command within the package's directory:

```sh
$ npm install
```

To execute the package's tests, run:

```sh
$ make test
```

## Usage of the Module

```javascript
// Include the Keyword Extractor
var keyword_extractor = require("keyword-extractor");

// Opening sentence of the NY Times article at
// http://www.nytimes.com/2013/09/10/world/middleeast/surprise-russian-proposal-catches-obama-between-putin-and-house-republicans.html
var sentence = "President Obama woke up Monday facing a Congressional defeat that many in both parties believed could hobble his presidency.";

// Extract the keywords
var extraction_result = keyword_extractor.extract(sentence, {
    language: "english",
    return_changed_case: true
});

/*
  extraction_result is:

  [
      "president",
      "obama",
      "woke",
      "monday",
      "facing",
      "congressional",
      "defeat",
      "parties",
      "believed",
      "hobble",
      "presidency"
  ]
*/
```

### Options Parameters

The second argument of the _extract_ method is an object of settings that configure how the extraction is processed.

Parameter Name | Description | Permitted Values
---------------|-------------|-----------------
language | The stopwords list to use. | _english_ or _spanish_
return_changed_case | The case of the extracted keywords. If _true_, the results are returned in lower case; if _false_, they keep their original case. | _true_ or _false_

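To make the effect of the two options concrete, here is a minimal, self-contained sketch of stopword filtering. It is not the package's implementation: `extractKeywords`, `STOPWORDS`, and the tiny sample word lists are hypothetical names and data chosen for illustration, and the real stopword lists are far longer.

```javascript
// Illustrative sketch only: NOT the keyword-extractor implementation.
// STOPWORDS and extractKeywords are hypothetical; the lists are tiny samples.
const STOPWORDS = {
    english: new Set(["a", "the", "of", "in", "that", "his", "up", "many", "both", "could"]),
    spanish: new Set(["el", "la", "de", "en", "que", "y", "los", "las", "un", "una"])
};

function extractKeywords(text, options) {
    // Assumed defaults for this sketch; the package's own defaults may differ.
    const settings = Object.assign(
        { language: "english", return_changed_case: true },
        options
    );
    const stopwords = STOPWORDS[settings.language];
    if (!stopwords) {
        throw new Error("Unsupported language: " + settings.language);
    }
    return text
        .split(/\s+/)
        // Trim common punctuation from both ends of each token.
        .map(function (token) {
            return token.replace(/^[.,;:!?"'()]+|[.,;:!?"'()]+$/g, "");
        })
        // Drop empty tokens and stopwords (compared case-insensitively).
        .filter(function (token) {
            return token.length > 0 && !stopwords.has(token.toLowerCase());
        })
        // Lower-case the result only when return_changed_case is true.
        .map(function (token) {
            return settings.return_changed_case ? token.toLowerCase() : token;
        });
}

console.log(extractKeywords("The defeat of the President.", { return_changed_case: false }));
// [ 'defeat', 'President' ]
```

In this sketch the stopword comparison is always case-insensitive, so `return_changed_case` only affects how the surviving words are returned, not which words survive.
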
## Credits

The initial stopwords lists are taken from the following sources:

- [English](http://jmlr.org/papers/volume5/lewis04a/a11-smart-stop-list/english.stop)
- [Spanish](https://stop-words.googlecode.com/svn/trunk/stop-words/stop-words/stop-words-spanish.txt)