{
  "name": "scrape-it",
  "description": "A Node.js scraper for humans.",
  "keywords": [
    "scrape",
    "it",
    "a",
    "scraping",
    "module",
    "for",
    "humans"
  ],
  "license": "MIT",
  "version": "5.3.0",
  "main": "lib/index.js",
  "types": "lib/index.d.ts",
  "scripts": {
    "test": "node test"
  },
  "author": "Ionică Bizău <bizauionica@gmail.com> (https://ionicabizau.net)",
  "contributors": [
    "ComFreek <comfreek@outlook.com> (https://github.com/ComFreek)",
    "Jim Buck <jim@jimmyboh.com> (https://github.com/JimmyBoh)"
  ],
  "repository": {
    "type": "git",
    "url": "git+ssh://git@github.com/IonicaBizau/scrape-it.git"
  },
  "bugs": {
    "url": "https://github.com/IonicaBizau/scrape-it/issues"
  },
  "homepage": "https://github.com/IonicaBizau/scrape-it#readme",
  "blah": {
    "h_img": "https://i.imgur.com/j3Z0rbN.png",
    "cli": "scrape-it-cli",
36 "description": "Want to save time or not using Node.js? Try our [hosted API](https://scrape-it.saasify.sh).",
37 "installation": [
38 {
39 "h2": "FAQ"
40 },
41 {
42 "p": "Here are some frequent questions and their answers."
43 },
44 {
45 "h3": "1. How to parse scrape pages?"
46 },
47 {
48 "p": "`scrape-it` has only a simple request module for making requests. That means you cannot directly parse ajax pages with it, but in general you will have those scenarios:"
      },
      {
        "ol": [
          "**The ajax response is in JSON format.** In this case, you can make the request directly, without needing a scraping library.",
53 "**The ajax response gives you HTML back.** Instead of calling the main website (e.g. example.com), pass to `scrape-it` the ajax url (e.g. `example.com/api/that-endpoint`) and you will you will be able to parse the response",
54 "**The ajax request is so complicated that you don't want to reverse-engineer it.** In this case, use a headless browser (e.g. Google Chrome, Electron, PhantomJS) to load the content and then use the `.scrapeHTML` method from scrape it once you get the HTML loaded on the page."
        ]
      },
      {
        "h3": "2. Crawling"
      },
      {
        "p": "There is no fancy way to crawl pages with `scrape-it`. For simple scenarios, you can parse the list of URLs from the initial page and then, using Promises, parse each page. Also, you can use a different crawler to download the website and then use the `.scrapeHTML` method to scrape the local files."
      },
      {
        "h3": "3. Local files"
      },
      {
67 "p": "Use the `.scrapeHTML` to parse the HTML read from the local files using `fs.readFile`."
      }
    ]
  },
  "dependencies": {
    "assured": "^1.0.14",
    "cheerio-req": "^1.2.3",
    "scrape-it-core": "^1.0.0",
    "typpy": "^2.3.11"
  },
  "devDependencies": {
    "lien": "^3.3.0",
    "tester": "^1.4.4"
  },
  "files": [
    "bin/",
    "app/",
    "lib/",
    "dist/",
    "src/",
    "scripts/",
    "resources/",
    "menu/",
    "cli.js",
    "index.js",
    "bloggify.js",
    "bloggify.json",
    "bloggify/"
  ]
}
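
The FAQ answers embedded in the `blah.installation` field above describe the first two patterns (requesting an ajax endpoint directly, and simple crawling) in prose only. A minimal sketch of both, assuming scrape-it's promise-based API (`scrapeIt(url, options)` resolving to `{ data, response }`); the URLs and selectors are placeholders:

```js
const scrapeIt = require("scrape-it");

// FAQ 1: if the ajax endpoint returns HTML, point scrape-it at the
// endpoint itself instead of the main page (placeholder URL and selectors).
scrapeIt("https://example.com/api/that-endpoint", {
    items: {
        listItem: ".item",
        data: {
            title: "h2",
            url: { selector: "a", attr: "href" }
        }
    }
}).then(({ data }) => {
    console.log(data.items);
});

// FAQ 2: simple crawling: scrape the list of links from the index page,
// then scrape each linked page using Promises.
scrapeIt("https://example.com", {
    links: {
        listItem: ".article",
        data: {
            url: { selector: "a", attr: "href" }
        }
    }
}).then(({ data }) =>
    Promise.all(data.links.map(({ url }) => scrapeIt(url, { title: "h1" })))
).then(pages => {
    pages.forEach(({ data }) => console.log(data.title));
});
```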
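
For the third FAQ answer (local files), a sketch that reads a saved page with `fs.readFile` and hands the HTML to `.scrapeHTML`; this assumes `.scrapeHTML` accepts an HTML string and returns the scraped data object, and the file path and selectors are placeholders:

```js
const fs = require("fs");
const scrapeIt = require("scrape-it");

// FAQ 3: read a saved page from disk and scrape the HTML string directly.
fs.readFile("./page.html", "utf8", (err, html) => {
    if (err) { throw err; }
    const data = scrapeIt.scrapeHTML(html, {
        title: "h1",
        description: ".description"
    });
    console.log(data);
});
```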