{
  "name": "scrape-it",
  "description": "A Node.js scraper for humans.",
  "keywords": [
    "scrape",
    "it",
    "a",
    "scraping",
    "module",
    "for",
    "humans"
  ],
  "license": "MIT",
  "version": "5.3.0",
  "main": "lib/index.js",
  "types": "lib/index.d.ts",
  "scripts": {
    "test": "node test"
  },
  "author": "Ionică Bizău <bizauionica@gmail.com> (https://ionicabizau.net)",
  "contributors": [
    "ComFreek <comfreek@outlook.com> (https://github.com/ComFreek)",
    "Jim Buck <jim@jimmyboh.com> (https://github.com/JimmyBoh)"
  ],
  "repository": {
    "type": "git",
    "url": "git+ssh://git@github.com/IonicaBizau/scrape-it.git"
  },
  "bugs": {
    "url": "https://github.com/IonicaBizau/scrape-it/issues"
  },
  "homepage": "https://github.com/IonicaBizau/scrape-it#readme",
  "blah": {
    "h_img": "https://i.imgur.com/j3Z0rbN.png",
    "cli": "scrape-it-cli",
36 "description": "Want to save time or not using Node.js? Try our [hosted API](https://scrape-it.saasify.sh).",
37 "installation": [
38 {
39 "h2": "FAQ"
40 },
41 {
42 "p": "Here are some frequent questions and their answers."
43 },
44 {
45 "h3": "1. How to parse scrape pages?"
46 },
47 {
48 "p": "`scrape-it` has only a simple request module for making requests. That means you cannot directly parse ajax pages with it, but in general you will have those scenarios:"
      },
      {
        "ol": [
          "**The ajax response is in JSON format.** In this case, you can make the request directly, without needing a scraping library.",
53 "**The ajax response gives you HTML back.** Instead of calling the main website (e.g. example.com), pass to `scrape-it` the ajax url (e.g. `example.com/api/that-endpoint`) and you will you will be able to parse the response",
54 "**The ajax request is so complicated that you don't want to reverse-engineer it.** In this case, use a headless browser (e.g. Google Chrome, Electron, PhantomJS) to load the content and then use the `.scrapeHTML` method from scrape it once you get the HTML loaded on the page."
        ]
      },
      {
        "h3": "2. Crawling"
      },
      {
        "p": "There is no fancy way to crawl pages with `scrape-it`. For simple scenarios, you can parse the list of URLs from the initial page and then, using Promises, parse each page. Also, you can use a different crawler to download the website and then use the `.scrapeHTML` method to scrape the local files."
      },
      {
        "h3": "3. Local files"
      },
      {
67 "p": "Use the `.scrapeHTML` to parse the HTML read from the local files using `fs.readFile`."
      }
    ]
  },
  "dependencies": {
    "assured": "^1.0.14",
    "cheerio-req": "^1.2.3",
    "scrape-it-core": "^1.0.0",
    "typpy": "^2.3.11"
  },
  "devDependencies": {
    "lien": "^3.3.0",
    "tester": "^1.4.4"
  },
  "files": [
    "bin/",
    "app/",
    "lib/",
    "dist/",
    "src/",
    "scripts/",
    "resources/",
    "menu/",
    "cli.js",
    "index.js",
    "bloggify.js",
    "bloggify.json",
    "bloggify/"
  ]
}
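
The FAQ answers embedded in the `blah.installation` field above describe the first two patterns (requesting an ajax endpoint directly, and simple crawling) in prose only. A minimal sketch of both, assuming scrape-it's promise-based API (`scrapeIt(url, options)` resolving to `{ data, response }`); the URLs and selectors are placeholders:

```js
const scrapeIt = require("scrape-it");

// FAQ 1: if the ajax endpoint returns HTML, point scrape-it at the
// endpoint itself instead of the main page (placeholder URL and selectors).
scrapeIt("https://example.com/api/that-endpoint", {
    items: {
        listItem: ".item",
        data: {
            title: "h2",
            url: { selector: "a", attr: "href" }
        }
    }
}).then(({ data }) => {
    console.log(data.items);
});

// FAQ 2: simple crawling: scrape the list of links from the index page,
// then scrape each linked page using Promises.
scrapeIt("https://example.com", {
    links: {
        listItem: ".article",
        data: {
            url: { selector: "a", attr: "href" }
        }
    }
}).then(({ data }) =>
    Promise.all(data.links.map(({ url }) => scrapeIt(url, { title: "h1" })))
).then(pages => {
    pages.forEach(({ data }) => console.log(data.title));
});
```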
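
For the third FAQ answer (local files), a sketch that reads a saved page with `fs.readFile` and hands the HTML to `.scrapeHTML`; this assumes `.scrapeHTML` accepts an HTML string and returns the scraped data object, and the file path and selectors are placeholders:

```js
const fs = require("fs");
const scrapeIt = require("scrape-it");

// FAQ 3: read a saved page from disk and scrape the HTML string directly.
fs.readFile("./page.html", "utf8", (err, html) => {
    if (err) { throw err; }
    const data = scrapeIt.scrapeHTML(html, {
        title: "h1",
        description: ".description"
    });
    console.log(data);
});
```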