# crawler-hbase

A library to interact with the crawler tables stored in HBase.

crawler-hbase exports two modules: a class called Client, which constructs an HBase client, and a module Utils, which is an object of helper functions.

## Class Client

```javascript
var HbaseClient = require("crawler-hbase").Client;
var client = new HbaseClient("0.0.0.0:9090");
```

#### CrawlHbaseClient(dbUrl)
Constructs the client using the provided HBase dbUrl. It is assumed that HBase Thrift is running at the provided dbUrl.

#### storeRawCrawl(crawl)
Stores a raw crawl in the raw_crawls table.

#### getRows(startKey, endKey, limit, descending, tableName, filterString)
The generic get function used by almost all of the more specific get methods.
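As a sketch of how the generic scan might be driven (the raw_crawls table name comes from storeRawCrawl above; the boundary keys, the promise-style return value, and the helper name latestRawCrawls are assumptions for illustration, not part of the documented API):

```javascript
// Hypothetical helper built on the generic getRows scan. Assumes getRows
// returns a promise of rows -- verify against the actual client before use.
function latestRawCrawls(client, limit) {
  // Scan the raw_crawls table newest-first (descending), capped at `limit`
  // rows. The "0" and "z" boundary keys are placeholder assumptions.
  return client.getRows("0", "z", limit, true, "raw_crawls", null);
}
```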

#### getLatestRawCrawl()
Returns the latest raw crawl.

#### getRawCrawlByKey(key)
Gets a raw crawl by key.

#### storeProcessedCrawl(newCrawl, oldCrawl)
Stores newCrawl. oldCrawl is used to calculate the changes that happened between the two crawls.

#### getCrawlInfo(crawlKey)
Gets info about the given crawl.

#### getNodeHistory(pubKey)
Gets the array of all the different versions the given node appeared with across crawls.

#### getCrawlNodeStats(crawlKey)
Gets stats about the nodes in the given crawl.

#### getConnections(crawlKey, pubKey, type)
Gets links between nodes. type is either 'in' or 'out' to get incoming or outgoing connections respectively.
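To fetch both directions for one node, the two calls can be combined. This sketch assumes the client methods return promises (an assumption — the actual API may be callback-based), and the wrapper name nodeConnections is hypothetical:

```javascript
// Hypothetical wrapper: gather incoming and outgoing links for one node.
// Assumes getConnections returns a promise; adapt if the API uses callbacks.
function nodeConnections(client, crawlKey, pubKey) {
  return Promise.all([
    client.getConnections(crawlKey, pubKey, "in"),  // incoming links
    client.getConnections(crawlKey, pubKey, "out")  // outgoing links
  ]).then(function (results) {
    return { incoming: results[0], outgoing: results[1] };
  });
}
```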

#### getAllConnections(crawlKey)
Gets all links for the given crawl.

## Utils

Provides helper methods for working with the HBase tables' keys, which encode a lot of hidden information.

#### keyToStart(key)
Gets the crawl start time from a crawl's key.

#### keyToEnd(key)
Gets the crawl end time from a crawl's key.
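As a purely illustrative sketch, suppose a crawl key were laid out as `<startTime>_<endTime>` — this layout is an assumption, since the real key encoding is not documented here. Helpers in the spirit of keyToStart and keyToEnd could then look like:

```javascript
// Illustrative only: assumes a hypothetical "<startTime>_<endTime>" key
// layout. The real crawler-hbase key encoding may differ.
function keyToStartSketch(key) {
  return parseInt(key.split("_")[0], 10); // first segment: crawl start time
}

function keyToEndSketch(key) {
  return parseInt(key.split("_")[1], 10); // second segment: crawl end time
}
```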