Prerender
===========================

Prerender is a node server that uses Headless Chrome to render HTML, screenshots, PDFs, and HAR files out of any web page. The Prerender server listens for an HTTP request, takes the URL and loads it in Headless Chrome, waits for the page to finish loading by waiting for the network to be idle, and then returns your content.

##### The quickest way to run your own prerender server:

```bash
$ npm install prerender
```
##### server.js
```js
const prerender = require('prerender');
const server = prerender();
server.start();
```
##### run it:
```bash
$ node server.js
```
##### test it:
```bash
curl "http://localhost:3000/render?url=https://www.example.com/"
```

## Use Cases
The Prerender server can be used in conjunction with [our Prerender.io middleware](#middleware) in order to serve the prerendered HTML of your javascript website to search engines (Google, Bing, etc.) and social networks (Facebook, Twitter, etc.) for SEO. We run the Prerender server at scale for SEO needs at [https://prerender.io/](https://prerender.io/).

The Prerender server can also be used on its own to crawl any web page and pull down the content for your own parsing needs. We host the Prerender server for your own crawling needs at [https://prerender.com/](https://prerender.com/).


Prerender differs from Google Puppeteer in that Prerender is a web server that takes in URLs and loads each one in a new tab in Headless Chrome, handling multiple URLs in parallel. Puppeteer is an API for interacting with Chrome, but you still have to write that interaction yourself. With Prerender, you don't have to write any code to launch Chrome, load pages, wait for the page to load, or pull the content off of the page. The Prerender server handles all of that for you so you can focus on more important things!

Below you will find documentation for our Prerender.io service (website SEO) and our Prerender.com service (web crawling).

[Click here to jump to Prerender.io documentation](#prerenderio)

[Click here to jump to Prerender.com documentation](#prerendercom)


### <a id='prerenderio'></a>
# Prerender.io
###### For serving your prerendered HTML to crawlers for SEO

Prerender solves SEO by serving prerendered HTML to Google and other search engines. It's easy:
- Just install the appropriate middleware for your app (or check out the source code and build your own)
- Make sure search engines have a way of discovering your pages (e.g. sitemap.xml and links from other parts of your site or from around the web)
- That's it! Perfect SEO on javascript pages.


### <a id='middleware'></a>
## Middleware

This is a list of middleware available to use with the prerender service:

#### Official middleware

###### Javascript
* [prerender-node](https://github.com/prerender/prerender-node) (Express)

###### Ruby
* [prerender_rails](https://github.com/prerender/prerender_rails) (Rails)

###### Apache
* [.htaccess](https://gist.github.com/thoop/8072354)

###### Nginx
* [nginx.conf](https://gist.github.com/thoop/8165802)

#### Community middleware

###### PHP
* [zfr-prerender](https://github.com/zf-fr/zfr-prerender) (Zend Framework 2)
* [YuccaPrerenderBundle](https://github.com/rjanot/YuccaPrerenderBundle) (Symfony 2)
* [Laravel Prerender](https://github.com/JeroenNoten/Laravel-Prerender) (Laravel)

###### Java
* [prerender-java](https://github.com/greengerong/prerender-java)

###### Go
* [goprerender](https://github.com/tampajohn/goprerender)

###### Grails
* [grails-prerender](https://github.com/tuler/grails-prerender)

###### Nginx
* [Reverse Proxy Example](https://gist.github.com/Stanback/6998085)

###### Apache
* [.htaccess](https://gist.github.com/Stanback/7028309)

Request more middleware for a different framework in this [issue](https://github.com/prerender/prerender/issues/12).



## How it works
This is a simple service that only takes a URL and returns the rendered HTML (with all script tags removed).

Note: you should proxy the request through your server (using middleware) so that any relative links to CSS/images/etc. still work.

`GET https://service.prerender.io/https://www.google.com`

`GET https://service.prerender.io/https://www.google.com/search?q=angular`

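If your own tooling needs to call the service directly, the URL is just the service base followed by the full target URL, as in the two requests above. A minimal sketch, assuming the target URL is appended verbatim (query string included); `buildServiceUrl` is a hypothetical helper, not part of any Prerender package:

```javascript
// Build a Prerender service URL by appending the full target URL
// (including its query string) to the service base URL.
// Assumption: the target is appended verbatim, as in the GET examples.
function buildServiceUrl(serviceBase, targetUrl) {
  // Validate that the target is an absolute http(s) URL; new URL() throws otherwise.
  const parsed = new URL(targetUrl);
  if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  // Ensure exactly one slash between the service base and the target URL.
  const base = serviceBase.endsWith('/') ? serviceBase : serviceBase + '/';
  return base + targetUrl;
}

console.log(buildServiceUrl('https://service.prerender.io', 'https://www.google.com/search?q=angular'));
// https://service.prerender.io/https://www.google.com/search?q=angular
```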

## Running locally
If you are trying to test Prerender with your website on localhost, you'll have to run the Prerender server locally so that Prerender can access your local dev website.

If you are running the prerender service locally, make sure you set your middleware to point to your local Prerender server with:

`export PRERENDER_SERVICE_URL=http://localhost:3000`

```bash
$ git clone https://github.com/prerender/prerender.git
$ cd prerender
$ npm install
$ node server.js
```

Prerender will now be running on http://localhost:3000. If your web app runs on, say, http://localhost:8000, you can now visit the URL http://localhost:3000/http://localhost:8000 to see how your app would render in Prerender.

To test how your website will render through Prerender using the middleware, you'll want to visit the URL http://localhost:8000?_escaped_fragment_=

That should send a request to the Prerender server and display the prerendered page through your website. If you View Source of that page, you should see the HTML with all of the `<script>` tags removed.

Keep in mind you will see 504s for relative URLs when accessing http://localhost:3000/http://localhost:8000 because the actual domain on that request is your prerender server. This isn't really an issue because once you proxy that request through the middleware, the domain will be your website and those requests won't be sent to the prerender server. For instance, if you want to see your relative URLs working, visit `http://localhost:8000?_escaped_fragment_=`

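The middleware's job is to decide, per request, whether to answer from your app or to proxy through Prerender. A minimal sketch of that decision, assuming a request is prerendered when it is a GET for a non-asset path that either carries `_escaped_fragment_` or comes from a crawler user agent. The crawler and extension lists, and the helpers `shouldPrerender` and `prerenderTarget`, are hypothetical illustrations, not the lists or functions any official middleware ships with:

```javascript
// Hypothetical, illustrative lists (not taken from any official middleware).
const CRAWLER_AGENTS = ['googlebot', 'bingbot', 'facebookexternalhit', 'twitterbot'];
const STATIC_EXTENSIONS = ['.js', '.css', '.png', '.jpg', '.gif', '.svg', '.ico'];

// Decide whether an incoming request should be proxied to Prerender.
function shouldPrerender({ method, url, userAgent }) {
  if (method !== 'GET') return false;
  // Parse relative request URLs against a dummy origin.
  const parsed = new URL(url, 'http://placeholder.invalid');
  // Never prerender static assets.
  const path = parsed.pathname.toLowerCase();
  if (STATIC_EXTENSIONS.some((ext) => path.endsWith(ext))) return false;
  // ?_escaped_fragment_= forces a prerender, as in the local test above.
  if (parsed.searchParams.has('_escaped_fragment_')) return true;
  // Otherwise prerender only for known crawler user agents.
  const ua = (userAgent || '').toLowerCase();
  return CRAWLER_AGENTS.some((bot) => ua.includes(bot));
}

// Build the URL the middleware would fetch from PRERENDER_SERVICE_URL.
function prerenderTarget(serviceUrl, origin, requestPath) {
  const base = serviceUrl.replace(/\/$/, '');
  return `${base}/${origin}${requestPath}`;
}
```

With `PRERENDER_SERVICE_URL=http://localhost:3000`, a crawler request for `/about` on your app at http://localhost:8000 would be proxied to http://localhost:3000/http://localhost:8000/about.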

# Customization

You can clone this repo and run `server.js` OR include prerender in your project with `npm install prerender --save` to create an express-like server with custom plugins.


## Options

### chromeLocation
```js
// From a clone of this repo; use require('prerender') when installed from npm
const prerender = require('./lib');

const server = prerender({
  chromeLocation: '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
});

server.start();
```

Uses a Chrome install at a specific location. Prerender does not download Chrome, so make sure Chrome is already installed on your server. The Prerender server checks a few known locations for Chrome, but this option lets you override that.

`Default: null`

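The lookup this option overrides can be pictured as: use `chromeLocation` if it is set, otherwise take the first known install path that exists. A sketch under that assumption; the candidate paths and `resolveChromeLocation` are hypothetical, and the list Prerender actually checks may differ:

```javascript
const fs = require('fs');

// Hypothetical candidate paths, for illustration only.
const CANDIDATE_PATHS = [
  '/usr/bin/google-chrome',
  '/usr/bin/chromium-browser',
  '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
];

// `exists` is injectable so the lookup can be tested without touching disk.
function resolveChromeLocation(chromeLocation, candidates = CANDIDATE_PATHS, exists = fs.existsSync) {
  // An explicit chromeLocation always wins over the probed paths.
  if (chromeLocation) return chromeLocation;
  const found = candidates.find((p) => exists(p));
  if (!found) throw new Error('Chrome not found; set the chromeLocation option');
  return found;
}
```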

### logRequests
```js
const prerender = require('./lib');

const server = prerender({
  logRequests: true
});

server.start();
```

Causes the Prerender server to print every request the page makes, marked with a `+`, and every response it receives, marked with a `-`. This lets you analyze page load times.

`Default: false`

### captureConsoleLog
```js
const prerender = require('./lib');

const server = prerender({
  captureConsoleLog: true
});

server.start();
```

The Prerender server stores all console logs in `pageLoadInfo.logEntries` for further analysis.

`Default: false`

### pageDoneCheckInterval
```js
const prerender = require('./lib');

const server = prerender({
  pageDoneCheckInterval: 1000
});

server.start();
```

Number of milliseconds between checks of whether the page is done loading. You can also set the `PAGE_DONE_CHECK_INTERVAL` environment variable instead of passing in the `pageDoneCheckInterval` parameter.

`Default: 500`

### pageLoadTimeout
```js
const prerender = require('./lib');

const server = prerender({
  pageLoadTimeout: 20 * 1000
});

server.start();
```

Maximum number of milliseconds to wait while downloading the page and waiting for all pending requests/ajax calls to complete before timing out and continuing on. A timeout does not cause an error; Prerender just returns the HTML on the page at that moment. You can also set the `PAGE_LOAD_TIMEOUT` environment variable instead of passing in the `pageLoadTimeout` parameter.

`Default: 20000`

### waitAfterLastRequest
```js
const prerender = require('./lib');

const server = prerender({
  waitAfterLastRequest: 500
});

server.start();
```

Number of milliseconds to wait after the number of requests/ajax calls in flight reaches zero. HTML is pulled off of the page at this point. You can also set the `WAIT_AFTER_LAST_REQUEST` environment variable instead of passing in the `waitAfterLastRequest` parameter.

`Default: 500`

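The three timing options work together. A simplified model, assuming the page is polled every `pageDoneCheckInterval` ms and counts as done once either `pageLoadTimeout` ms have passed since loading started, or no requests are in flight and at least `waitAfterLastRequest` ms have passed since the last one finished. The helper names and state fields are hypothetical; this is not Prerender's source:

```javascript
// Simplified model of how the timing options combine. All times in ms.
const DEFAULTS = { pageDoneCheckInterval: 500, pageLoadTimeout: 20000, waitAfterLastRequest: 500 };

// state: { startedAt, inFlight, lastRequestFinishedAt } (hypothetical fields).
function isPageDone(state, now, opts = DEFAULTS) {
  // Timing out is not an error: the HTML is simply snapshotted as-is.
  if (now - state.startedAt >= opts.pageLoadTimeout) return { done: true, timedOut: true };
  const idleFor = now - state.lastRequestFinishedAt;
  if (state.inFlight === 0 && idleFor >= opts.waitAfterLastRequest) return { done: true, timedOut: false };
  return { done: false, timedOut: false };
}

// Poll every pageDoneCheckInterval ms. Terminates because pageLoadTimeout
// always ends the wait eventually.
function firstDoneCheck(stateAt, opts = DEFAULTS) {
  for (let t = opts.pageDoneCheckInterval; ; t += opts.pageDoneCheckInterval) {
    const check = isPageDone(stateAt(t), t, opts);
    if (check.done) return { at: t, ...check };
  }
}

// Example: two requests stay in flight until t = 1200 ms, then the network is idle.
const result = firstDoneCheck((t) =>
  t < 1200
    ? { startedAt: 0, inFlight: 2, lastRequestFinishedAt: 0 }
    : { startedAt: 0, inFlight: 0, lastRequestFinishedAt: 1200 });
console.log(result); // { at: 2000, done: true, timedOut: false }
```

With the defaults, the 1500 ms check sees only 300 ms of idle time, which is less than `waitAfterLastRequest`, so the snapshot happens at the 2000 ms check.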
### followRedirects
```js
const prerender = require('./lib');

const server = prerender({
  followRedirects: false
});

server.start();
```

Whether Chrome follows a redirect on the first request if a redirect is encountered. Normally, for SEO purposes, you do not want to follow redirects. Instead, you want the Prerender server to return the redirect to the crawlers so they can update their index. Don't set this to `true` unless you know what you are doing. You can also set the `FOLLOW_REDIRECTS` environment variable instead of passing in the `followRedirects` parameter.

`Default: false`

## Plugins

We use a plugin system in the same way that Connect and Express use middleware. Our plugins are a little different and we don't want to confuse the prerender plugins with the [prerender middleware](#middleware), so we opted to call them "plugins".

Plugins are in the `lib/plugins` directory and add functionality to the prerender service.

Each plugin can implement any of the plugin methods:

#### `init()`

#### `requestReceived(req, res, next)`

#### `tabCreated(req, res, next)`

#### `pageLoaded(req, res, next)`

#### `beforeSend(req, res, next)`

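A minimal sketch of a custom plugin. It assumes that the target URL is available as `req.prerender.url`, that `res.send(statusCode)` ends the request, and that plugins are registered in `server.js` with `server.use(...)`; verify these against the plugins in `lib/plugins` before relying on them. The plugin itself, `blockNonHttps`, is hypothetical:

```javascript
// Hypothetical plugin: reject any target URL that is not https.
// Assumptions: req.prerender.url holds the target URL, res.send(code) ends
// the request, and next() hands control to the next plugin.
const blockNonHttps = {
  init() {
    // One-time setup (reading environment variables, etc.) would go here.
  },
  requestReceived(req, res, next) {
    let protocol;
    try {
      protocol = new URL(req.prerender.url).protocol;
    } catch (err) {
      return res.send(400); // unparsable target URL
    }
    if (protocol !== 'https:') return res.send(403);
    next();
  },
};

module.exports = blockNonHttps;
// In server.js (assumed API): server.use(require('./blockNonHttps'));
```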
## Available plugins

You can use any of these plugins by modifying the `server.js` file.

### basicAuth

If you want to only allow access to your Prerender server from authorized parties, enable the basic auth plugin.

You will need to set the `BASIC_AUTH_USERNAME` and `BASIC_AUTH_PASSWORD` environment variables.
```bash
export BASIC_AUTH_USERNAME=prerender
export BASIC_AUTH_PASSWORD=test
```

Then make sure to pass the basic authentication header (`username:password`, base64 encoded).

```bash
curl -u prerender:wrong http://localhost:3000/http://example.com  # -> 401
curl -u prerender:test http://localhost:3000/http://example.com   # -> 200
```

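When calling the server from code instead of curl, build the same header that `curl -u` sends. This is standard HTTP Basic authentication, not anything Prerender-specific; `basicAuthHeader` is a hypothetical helper:

```javascript
// Build the Authorization header value that `curl -u user:pass` sends:
// "Basic " followed by base64("user:pass").
function basicAuthHeader(username, password) {
  const token = Buffer.from(`${username}:${password}`, 'utf8').toString('base64');
  return `Basic ${token}`;
}

console.log(basicAuthHeader('prerender', 'test')); // Basic cHJlcmVuZGVyOnRlc3Q=
```

Pass the result as the `Authorization` header, for example `fetch(url, { headers: { Authorization: basicAuthHeader('prerender', 'test') } })` on Node 18 or later.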
### removeScriptTags

We remove script tags because we don't want any framework-specific routing/rendering to happen on the rendered HTML once it's executed by the crawler. The crawlers may not execute javascript, but we'd rather be safe than have something get screwed up.

For example, if you rendered the HTML of an Angular page but left the Angular scripts in there, your browser would try to execute the Angular routing and possibly end up clearing out the HTML of the page.

This plugin implements the `pageLoaded` function, so make sure any caching plugins run after this plugin to ensure you are caching pages with javascript removed.

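To see what the plugin's output looks like, here is a naive regex sketch of the same transformation. It is an illustration, not the plugin's implementation: a regex is not an HTML parser, and edge cases such as a literal `</script>` inside a script's string are not handled. `stripScripts` is a hypothetical function:

```javascript
// Naive sketch: strip every <script>...</script> element from an HTML string.
function stripScripts(html) {
  // [\s\S]*? matches lazily across newlines up to the first closing tag;
  // the `i` flag also catches <SCRIPT>.
  return html.replace(/<script\b[^>]*>[\s\S]*?<\/script\s*>/gi, '');
}

const input = '<html><head><script src="app.js"></script></head>' +
  '<body><h1>Hi</h1><script>\nrender();\n</script></body></html>';
console.log(stripScripts(input)); // <html><head></head><body><h1>Hi</h1></body></html>
```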
### httpHeaders

If your Javascript routing has a catch-all for things like 404s, you can tell the prerender service to serve a 404 to Google instead of a 200. This way, Google won't index your 404s.

Add these tags in the `<head>` of your page if you want to serve soft HTTP headers. Note: Prerender will still send the HTML of the page. This just modifies the status code and headers being sent.

Example: telling prerender to serve this page as a 404
```html
<meta name="prerender-status-code" content="404">
```

Example: telling prerender to serve this page as a 302 redirect
```html
<meta name="prerender-status-code" content="302">
<meta name="prerender-header" content="Location: https://www.google.com">
```

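A sketch of reading these tags out of rendered HTML, for example to check your pages before deploying. It assumes `name` comes before `content` and both are double-quoted, as in the examples above; a plugin running in Chrome would read the live DOM instead. `readPrerenderMeta` is hypothetical:

```javascript
// Extract prerender-status-code / prerender-header meta tags from HTML.
function readPrerenderMeta(html) {
  const result = { statusCode: null, headers: {} };
  const metaRe = /<meta\s+name="(prerender-status-code|prerender-header)"\s+content="([^"]*)"\s*\/?>/gi;
  for (const [, name, content] of html.matchAll(metaRe)) {
    if (name.toLowerCase() === 'prerender-status-code') {
      result.statusCode = parseInt(content, 10);
    } else {
      // Split "Location: https://..." at the first colon only, so colons
      // inside the value (like "https:") are kept.
      const idx = content.indexOf(':');
      if (idx > 0) result.headers[content.slice(0, idx).trim()] = content.slice(idx + 1).trim();
    }
  }
  return result;
}
```

A `statusCode` of `null` means the page did not ask for a special status.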
### whitelist

If you only want to allow requests to certain domains, use this plugin to return a 404 for any other domain.

You can add the whitelisted domains to the plugin itself, or use the `ALLOWED_DOMAINS` environment variable.

`export ALLOWED_DOMAINS=www.prerender.io,prerender.io`

### blacklist

If you want to disallow requests to certain domains, use this plugin to return a 404 for those domains.

You can add the blacklisted domains to the plugin itself, or use the `BLACKLISTED_DOMAINS` environment variable.

`export BLACKLISTED_DOMAINS=yahoo.com,www.google.com`

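Both plugins boil down to a hostname check against a comma-separated list. A sketch, assuming exact hostname matches (which is why the whitelist example lists both `www.prerender.io` and `prerender.io`) and that an empty whitelist allows everything. `parseDomainList` and `domainCheck` are hypothetical helpers:

```javascript
// Parse "a.com, b.com" into ['a.com', 'b.com']; missing value -> [].
function parseDomainList(value) {
  return (value || '').split(',').map((d) => d.trim().toLowerCase()).filter(Boolean);
}

// Returns the status the server would answer with, or null to continue.
function domainCheck(targetUrl, env) {
  const host = new URL(targetUrl).hostname.toLowerCase();
  const allowed = parseDomainList(env.ALLOWED_DOMAINS);
  const blocked = parseDomainList(env.BLACKLISTED_DOMAINS);
  if (allowed.length > 0 && !allowed.includes(host)) return 404; // not whitelisted
  if (blocked.includes(host)) return 404; // blacklisted
  return null;
}

// Usage with the real environment: domainCheck('https://www.prerender.io/', process.env)
```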
### in-memory-cache

Caches pages in memory. Available at [prerender-memory-cache](https://github.com/prerender/prerender-memory-cache).

### s3-html-cache

Caches pages in S3 ([coming soon](https://github.com/prerender/prerender)).

--------------------

### <a id='prerendercom'></a>
# Prerender.com
###### For doing your own web crawling

When running your Prerender server in the web crawling context, we have a separate, more complex "API" for the server that lets you do things like:
- get HTML from a web page
- get screenshots (viewport or full page) from a web page
- get PDFs from a web page
- get HAR files from a web page
- execute your own javascript and return JSON along with the HTML

If you make an HTTP request to the `/render` endpoint, you can pass any of the following options. You can pass any of these options as query parameters on a GET request or as JSON properties on a POST request. We recommend using a POST request, but we will display GET requests here for brevity. Click here to see [how to send the POST request](#getvspost).

These examples assume you have the server running locally on port 3000, but you can also use our hosted service at [https://prerender.com/](https://prerender.com/).

#### url

The URL you want to load. Returns HTML by default.

```
http://localhost:3000/render?url=https://www.example.com/
```

#### renderType

The type of content you want to pull off the page.

```
http://localhost:3000/render?renderType=html&url=https://www.example.com/
```

Options are `html`, `jpeg`, `png`, `pdf`, `har`.

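The GET examples in this section leave the `url` value unencoded for readability. When building GET URLs in code, percent-encoding each value keeps a target URL that has its own `?` and `&` from being split into separate options. A sketch using the standard `URL` and `URLSearchParams` classes, assuming the server decodes query parameters the way standard query-string parsers do; `buildRenderUrl` is hypothetical and its `renderType` check mirrors the options listed above:

```javascript
const RENDER_TYPES = ['html', 'jpeg', 'png', 'pdf', 'har'];

// Build a GET /render URL with every option value percent-encoded.
function buildRenderUrl(serverBase, options) {
  if (!options.url) throw new Error('url is required');
  if (options.renderType && !RENDER_TYPES.includes(options.renderType)) {
    throw new Error(`Unknown renderType: ${options.renderType}`);
  }
  const endpoint = new URL('/render', serverBase);
  for (const [key, value] of Object.entries(options)) {
    // URLSearchParams encodes ':', '/', '?', '&' and '=' inside values.
    endpoint.searchParams.set(key, String(value));
  }
  return endpoint.toString();
}

const u = buildRenderUrl('http://localhost:3000', {
  renderType: 'png',
  url: 'https://www.example.com/search?q=a&page=2',
});
console.log(u);
// http://localhost:3000/render?renderType=png&url=https%3A%2F%2Fwww.example.com%2Fsearch%3Fq%3Da%26page%3D2
```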
#### userAgent

Send your own custom user agent when Chrome loads the page.

```
http://localhost:3000/render?userAgent=ExampleCrawlerUserAgent&url=https://www.example.com/
```

#### fullpage

Whether you want your screenshot to be the entire height of the document or just the viewport.

```
http://localhost:3000/render?fullpage=true&renderType=png&url=https://www.example.com/
```

Leave out `fullpage` and we'll just screenshot the normal browser viewport. Include `fullpage=true` for a full page screenshot.

#### width

Screen width. Lets you emulate different screen sizes.

```
http://localhost:3000/render?width=990&url=https://www.example.com/
```

Default is `1440`.

#### height

Screen height. Lets you emulate different screen sizes.

```
http://localhost:3000/render?height=100&url=https://www.example.com/
```

Default is `718`.

#### followRedirects

By default, we don't follow 301 redirects on the initial request, so you can be alerted to any changes in URLs and update your crawling data. If you want us to follow redirects instead, you can pass this parameter.

```
http://localhost:3000/render?followRedirects=true&url=https://www.example.com/
```

Default is `false`.

#### javascript

Execute javascript to modify the page before we snapshot your content. If you set `window.prerenderData` to an object, we will pull the object off the page and return it to you. Great for parsing extra data from a page in javascript.

```
http://localhost:3000/render?javascript=window.prerenderData=window.angular.version&url=https://www.example.com/
```

When using this parameter and `window.prerenderData`, the response from Prerender will look like:
```json
{
  "prerenderData": { "example": "data" },
  "content": "<html><body></body></html>"
}
```

If you don't set `window.prerenderData`, the response won't be JSON. The response will just be the normal HTML.

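Because the response shape depends on whether `window.prerenderData` was set, a client has to handle both. A sketch, assuming the JSON form is sent with a `Content-Type` containing `application/json` and everything else is HTML; `parseRenderResponse` is hypothetical:

```javascript
// Normalize a /render response into { prerenderData, content }.
function parseRenderResponse(contentType, body) {
  if ((contentType || '').toLowerCase().includes('application/json')) {
    const parsed = JSON.parse(body);
    return { prerenderData: parsed.prerenderData, content: parsed.content };
  }
  // No window.prerenderData was set: the body is just the HTML.
  return { prerenderData: null, content: body };
}

// Usage with Node 18+ fetch (shown for illustration, not run here):
// const res = await fetch('http://localhost:3000/render?url=https://www.example.com/');
// const { prerenderData, content } = parseRenderResponse(res.headers.get('content-type'), await res.text());
```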
### <a id='getvspost'></a>
### Get vs Post

You can send all options as query parameters on a GET request or as JSON properties on a POST request. We recommend using the POST request when possible to avoid any issues with URL encoding of GET request query strings. Here are a few pseudo examples:

```
POST http://localhost:3000/render
Content-Type: application/json

{
  "renderType": "html",
  "javascript": "window.prerenderData = window.angular.version",
  "url": "https://www.example.com/"
}
```

```
POST http://localhost:3000/render
Content-Type: application/json

{
  "renderType": "jpeg",
  "fullpage": "true",
  "url": "https://www.example.com/"
}
```

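A sketch of sending those POST bodies from Node, assuming the server parses the body as JSON when `Content-Type` is `application/json`. Since the options travel in the body, no query-string encoding is needed. `buildRenderPost` is a hypothetical helper; the commented usage relies on the global `fetch` available in Node 18 and later:

```javascript
// Build fetch() options for a POST /render request with a JSON body.
function buildRenderPost(options) {
  if (!options.url) throw new Error('url is required');
  return {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(options),
  };
}

// Usage, e.g. saving a full-page JPEG (not run here):
// const res = await fetch('http://localhost:3000/render',
//   buildRenderPost({ renderType: 'jpeg', fullpage: 'true', url: 'https://www.example.com/' }));
// require('fs').writeFileSync('example.jpg', Buffer.from(await res.arrayBuffer()));
```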
Check out our [full documentation](https://prerender.com/documentation).


## License

The MIT License (MIT)

Copyright (c) 2013 Todd Hooper &lt;todd@prerender.io&gt;

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.