Prerender [![Stories in Ready](https://badge.waffle.io/prerender/prerender.png?label=ready&title=Ready)](https://waffle.io/prerender/prerender)
===========================

Prerender is a Node server that uses Headless Chrome to render HTML, screenshots, PDFs, and HAR files out of any web page. The Prerender server listens for an HTTP request, takes the URL and loads it in Headless Chrome, waits for the page to finish loading by waiting for the network to be idle, and then returns your content.

Looking for our PhantomJS Prerender server? [Go to our phantomjs branch](https://github.com/prerender/prerender/tree/phantomjs)

##### The quickest way to run your own prerender server:

```bash
$ npm install prerender
```
##### server.js
```js
const prerender = require('prerender');
const server = prerender();
server.start();
```
##### test it:
```bash
curl "http://localhost:3000/render?url=https://www.example.com/"
```

## Use Cases
The Prerender server can be used in conjunction with [our Prerender.io middleware](#middleware) to serve the prerendered HTML of your JavaScript website to search engines (Google, Bing, etc.) and social networks (Facebook, Twitter, etc.) for SEO. We run the Prerender server at scale for SEO needs at [https://prerender.io/](https://prerender.io/).

The Prerender server can be used on its own to crawl any web page and pull down the content for your own parsing needs. We host the Prerender server for your own crawling needs at [https://prerender.com/](https://prerender.com/).


Prerender differs from Google Puppeteer in that Prerender is a web server that takes in URLs and loads them in parallel, each in a new tab in Headless Chrome. Puppeteer is an API for interacting with Chrome, but you still have to write that interaction yourself. With Prerender, you don't have to write any code to launch Chrome, load pages, wait for the page to load, or pull the content off of the page. The Prerender server handles all of that for you so you can focus on more important things!

Below you will find documentation for our Prerender.io service (website SEO) and our Prerender.com service (web crawling).

[Click here to jump to Prerender.io documentation](#prerenderio)

[Click here to jump to Prerender.com documentation](#prerendercom)


### <a id='prerenderio'></a>
# Prerender.io
###### For serving your prerendered HTML to crawlers for SEO

Prerender adheres to Google's `_escaped_fragment_` proposal, which we recommend you use. It's easy:
- Just add `<meta name="fragment" content="!">` to the `<head>` of all of your pages
- If you use hash URLs (#), change them to the hash-bang (#!)
- That's it! Perfect SEO on JavaScript pages.


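To make the hash-bang convention concrete, here is a minimal sketch of how a crawler following the `_escaped_fragment_` proposal might rewrite a `#!` URL before requesting it. The helper name `toEscapedFragmentUrl` is hypothetical, and the escaping is simplified to `encodeURIComponent`, which escapes more characters than the proposal strictly requires.

```javascript
// Hypothetical helper: rewrite a hash-bang URL into its _escaped_fragment_ form.
function toEscapedFragmentUrl(pageUrl) {
  const bangIndex = pageUrl.indexOf('#!');
  if (bangIndex === -1) {
    return pageUrl; // no hash-bang, nothing to rewrite
  }
  const base = pageUrl.slice(0, bangIndex);
  const fragment = pageUrl.slice(bangIndex + 2);
  // Append to an existing query string if there is one.
  const separator = base.includes('?') ? '&' : '?';
  // Simplification: encodeURIComponent escapes more than the proposal requires.
  return base + separator + '_escaped_fragment_=' + encodeURIComponent(fragment);
}

console.log(toEscapedFragmentUrl('https://www.example.com/#!/users/1'));
```
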
### <a id='middleware'></a>
## Middleware

This is a list of middleware available to use with the prerender service:

#### Official middleware

###### JavaScript
* [prerender-node](https://github.com/prerender/prerender-node) (Express)

###### Ruby
* [prerender_rails](https://github.com/prerender/prerender_rails) (Rails)

###### Apache
* [.htaccess](https://gist.github.com/thoop/8072354)

###### Nginx
* [nginx.conf](https://gist.github.com/thoop/8165802)

#### Community middleware

###### PHP
* [zfr-prerender](https://github.com/zf-fr/zfr-prerender) (Zend Framework 2)
* [YuccaPrerenderBundle](https://github.com/rjanot/YuccaPrerenderBundle) (Symfony 2)
* [Laravel Prerender](https://github.com/JeroenNoten/Laravel-Prerender) (Laravel)

###### Java
* [prerender-java](https://github.com/greengerong/prerender-java)

###### Go
* [goprerender](https://github.com/tampajohn/goprerender)

###### Grails
* [grails-prerender](https://github.com/tuler/grails-prerender)

###### Nginx
* [Reverse Proxy Example](https://gist.github.com/Stanback/6998085)

###### Apache
* [.htaccess](https://gist.github.com/Stanback/7028309)

Request more middleware for a different framework in this [issue](https://github.com/prerender/prerender/issues/12).



## How it works
This is a simple service that only takes a URL and returns the rendered HTML (with all script tags removed).

Note: you should proxy the request through your server (using middleware) so that any relative links to CSS/images/etc. still work.

`GET https://service.prerender.io/https://www.google.com`

`GET https://service.prerender.io/https://www.google.com/search?q=angular`


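The proxying described above can be sketched as follows. This is a simplified illustration, not the logic of any of the middleware listed earlier: the helper names `isCrawlerRequest` and `buildPrerenderUrl` are hypothetical, and the only crawler signal checked here is the `_escaped_fragment_` query parameter.

```javascript
// Hypothetical helpers sketching what a Prerender middleware does.
const SERVICE_URL = process.env.PRERENDER_SERVICE_URL || 'https://service.prerender.io';

// Treat a request as coming from a crawler if it carries _escaped_fragment_.
function isCrawlerRequest(requestUrl) {
  const parsed = new URL(requestUrl);
  return parsed.searchParams.has('_escaped_fragment_');
}

// The service takes the full page URL appended to its own URL.
function buildPrerenderUrl(pageUrl, serviceUrl = SERVICE_URL) {
  return serviceUrl.replace(/\/+$/, '') + '/' + pageUrl;
}

console.log(isCrawlerRequest('http://localhost:8000/?_escaped_fragment_='));
console.log(buildPrerenderUrl('https://www.google.com/search?q=angular'));
```

Because your own server makes the request to the Prerender service and returns the result, the crawler still sees your domain, which is why relative links keep working.
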
## Running locally
If you are trying to test Prerender with your website on localhost, you'll have to run the Prerender server locally so that Prerender can access your local dev website.

If you are running the Prerender service locally, make sure you set your middleware to point to your local Prerender server with:

`export PRERENDER_SERVICE_URL=http://localhost:3000`

    $ git clone https://github.com/prerender/prerender.git
    $ cd prerender
    $ npm install
    $ node server.js

Prerender will now be running on http://localhost:3000. If you start a web app on, say, http://localhost:8000, you can visit http://localhost:3000/http://localhost:8000 to see how your app would render in Prerender.

To test how your website will render through Prerender using the middleware, you'll want to visit the URL http://localhost:8000?_escaped_fragment_=

That should send a request to the Prerender server and display the prerendered page through your website. If you View Source of that page, you should see the HTML with all of the `<script>` tags removed.

Keep in mind you will see 504s for relative URLs when accessing http://localhost:3000/http://localhost:8000 because the actual domain on that request is your Prerender server. This isn't really an issue because once you proxy that request through the middleware, the domain will be your website and those requests won't be sent to the Prerender server. For instance, if you want to see your relative URLs working, visit `http://localhost:8000?_escaped_fragment_=`


# Customization

You can clone this repo and run `server.js` OR include prerender in your project with `npm install prerender --save` to create an Express-like server with custom plugins.


## Options

### chromeLocation
```js
const prerender = require('./lib');

const server = prerender({
    chromeLocation: '/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary'
});

server.start();
```

Uses a Chrome install at a specific location. Prerender does not download Chrome, so make sure Chrome is already installed on your server. The Prerender server checks a few known locations for Chrome, but this option lets you override that.

`Default: null`


### logRequests
```js
const prerender = require('./lib');

const server = prerender({
    logRequests: true
});

server.start();
```

Causes the Prerender server to print every request made (represented by a `+`) and every response received (represented by a `-`). Lets you analyze page load times.

`Default: false`

### pageDoneCheckInterval
```js
const prerender = require('./lib');

const server = prerender({
    pageDoneCheckInterval: 1000
});

server.start();
```

Number of milliseconds between checks of whether the page has finished loading. You can also set the `PAGE_DONE_CHECK_INTERVAL` environment variable instead of passing in the `pageDoneCheckInterval` parameter.

`Default: 500`

### pageLoadTimeout
```js
const prerender = require('./lib');

const server = prerender({
    pageLoadTimeout: 20 * 1000
});

server.start();
```

Maximum number of milliseconds to wait while downloading the page and waiting for all pending requests/ajax calls to complete before timing out and continuing on. The timeout does not cause an error; it just returns the HTML on the page at that moment. You can also set the `PAGE_LOAD_TIMEOUT` environment variable instead of passing in the `pageLoadTimeout` parameter.

`Default: 20000`

### waitAfterLastRequest
```js
const prerender = require('./lib');

const server = prerender({
    waitAfterLastRequest: 500
});

server.start();
```

Number of milliseconds to wait after the number of requests/ajax calls in flight reaches zero. HTML is pulled off of the page at this point. You can also set the `WAIT_AFTER_LAST_REQUEST` environment variable instead of passing in the `waitAfterLastRequest` parameter.

`Default: 500`

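One way to picture how `pageDoneCheckInterval`, `pageLoadTimeout`, and `waitAfterLastRequest` interact is the decision made on each check. The sketch below is an assumption based on the option descriptions, not the server's actual code; `checkPage` and the `state` fields are hypothetical names.

```javascript
// Hypothetical model of the check that runs every pageDoneCheckInterval ms.
const options = {
  pageDoneCheckInterval: 500,
  pageLoadTimeout: 20 * 1000,
  waitAfterLastRequest: 500
};

// state.startedAt:         when the page load began (ms)
// state.requestsInFlight:  number of pending requests/ajax calls
// state.lastRequestDoneAt: when the in-flight count last reached zero (ms)
function checkPage(state, now, opts = options) {
  if (now - state.startedAt >= opts.pageLoadTimeout) {
    return 'timeout'; // not an error: return whatever HTML is on the page
  }
  if (state.requestsInFlight === 0 &&
      now - state.lastRequestDoneAt >= opts.waitAfterLastRequest) {
    return 'done'; // the network has been idle long enough
  }
  return 'waiting'; // check again after pageDoneCheckInterval ms
}

console.log(checkPage({ startedAt: 0, requestsInFlight: 2, lastRequestDoneAt: 0 }, 1000));
console.log(checkPage({ startedAt: 0, requestsInFlight: 0, lastRequestDoneAt: 800 }, 1500));
console.log(checkPage({ startedAt: 0, requestsInFlight: 1, lastRequestDoneAt: 0 }, 20000));
```

In this model, lowering `waitAfterLastRequest` returns pages sooner but risks capturing HTML before a late ajax call has fired.
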
### followRedirects
```js
const prerender = require('./lib');

const server = prerender({
    followRedirects: false
});

server.start();
```

Whether Chrome follows a redirect on the first request if a redirect is encountered. Normally, for SEO purposes, you do not want to follow redirects. Instead, you want the Prerender server to return the redirect to the crawlers so they can update their index. Don't set this to `true` unless you know what you are doing. You can also set the `FOLLOW_REDIRECTS` environment variable instead of passing in the `followRedirects` parameter.

`Default: false`

## Plugins

We use a plugin system in the same way that Connect and Express use middleware. Our plugins are a little different and we don't want to confuse the prerender plugins with the [prerender middleware](#middleware), so we opted to call them "plugins".

Plugins are in the `lib/plugins` directory, and add functionality to the prerender service.

Each plugin can implement any of the plugin methods:

#### `init()`

#### `requestReceived(req, res, next)`

#### `tabCreated(req, res, next)`

#### `pageLoaded(req, res, next)`

#### `beforeSend(req, res, next)`

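As an illustration, a plugin can be written as an object that implements one or more of these methods. The sketch below is hedged: the plugin name `blockAdminPaths`, the `req.prerender.url` property, and the `res.send(statusCode)` call are assumptions about the plugin interface, and the stub `req`/`res` objects exist only so the snippet runs on its own.

```javascript
// Hypothetical plugin: refuse to render any URL whose path starts with /admin.
const blockAdminPaths = {
  requestReceived(req, res, next) {
    // Assumption: the URL to render is exposed as req.prerender.url.
    const target = new URL(req.prerender.url);
    if (target.pathname.startsWith('/admin')) {
      return res.send(404); // assumption: res.send(statusCode) ends the request
    }
    next(); // hand control to the next plugin
  }
};

// Stub objects standing in for what the server would pass to the plugin.
function runPlugin(plugin, url) {
  let outcome = null;
  const req = { prerender: { url } };
  const res = { send: (status) => { outcome = 'sent ' + status; } };
  plugin.requestReceived(req, res, () => { outcome = 'next'; });
  return outcome;
}

console.log(runPlugin(blockAdminPaths, 'https://www.example.com/admin/users'));
console.log(runPlugin(blockAdminPaths, 'https://www.example.com/'));
```
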
## Available plugins

You can use any of these plugins by modifying the `server.js` file.

### basicAuth

If you want to only allow access to your Prerender server from authorized parties, enable the basic auth plugin.

You will need to add the `BASIC_AUTH_USERNAME` and `BASIC_AUTH_PASSWORD` environment variables.
```
export BASIC_AUTH_USERNAME=prerender
export BASIC_AUTH_PASSWORD=test
```

Then make sure to pass the basic authentication header (`username:password`, base64 encoded). With curl, the `-u` flag does this for you:

```
curl -u prerender:wrong http://localhost:3000/http://example.com   # -> 401
curl -u prerender:test http://localhost:3000/http://example.com    # -> 200
```

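If your HTTP client does not build the header for you the way `curl -u` does, you can construct it by hand. A minimal sketch, assuming the standard HTTP Basic scheme; `basicAuthHeader` is a hypothetical helper name.

```javascript
// Build an HTTP Basic Authorization header value: "Basic base64(username:password)".
function basicAuthHeader(username, password) {
  const token = Buffer.from(username + ':' + password, 'utf8').toString('base64');
  return 'Basic ' + token;
}

const header = basicAuthHeader(
  process.env.BASIC_AUTH_USERNAME || 'prerender',
  process.env.BASIC_AUTH_PASSWORD || 'test'
);
console.log('Authorization: ' + header);
```
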
### removeScriptTags

We remove script tags because we don't want any framework-specific routing/rendering to happen on the rendered HTML once it's executed by the crawler. The crawlers may not execute JavaScript, but we'd rather be safe than have something get screwed up.

For example, if you rendered the HTML of an Angular page but left the Angular scripts in there, your browser would try to execute the Angular routing and possibly end up clearing out the HTML of the page.

This plugin implements the `pageLoaded` function, so make sure any caching plugins run after this plugin to ensure you are caching pages with JavaScript removed.

### httpHeaders

If your JavaScript routing has a catch-all for things like 404s, you can tell the prerender service to serve a 404 to Google instead of a 200. This way, Google won't index your 404s.

Add these tags in the `<head>` of your page if you want to serve soft HTTP headers. Note: Prerender will still send the HTML of the page. This just modifies the status code and headers being sent.

Example: telling prerender to serve this page as a 404
```html
<meta name="prerender-status-code" content="404">
```

Example: telling prerender to serve this page as a 302 redirect
```html
<meta name="prerender-status-code" content="302">
<meta name="prerender-header" content="Location: https://www.google.com">
```

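To show what has to be extracted from the page, here is a rough sketch that pulls the status code and headers out of rendered HTML. It is not the plugin's actual implementation: `readPrerenderMeta` is a hypothetical name, the default of 200 is an assumption, and the regular expressions assume the attribute order used in the meta tag examples.

```javascript
// Hypothetical parser for the prerender-status-code / prerender-header meta tags.
function readPrerenderMeta(html) {
  const result = { statusCode: 200, headers: {} }; // assumption: 200 when no tag is present

  const status = /<meta\s+name="prerender-status-code"\s+content="(\d{3})"\s*\/?>/i.exec(html);
  if (status) {
    result.statusCode = parseInt(status[1], 10);
  }

  // Each prerender-header tag carries one "Name: value" pair.
  const headerPattern = /<meta\s+name="prerender-header"\s+content="([^":]+):\s*([^"]*)"\s*\/?>/gi;
  let match;
  while ((match = headerPattern.exec(html)) !== null) {
    result.headers[match[1].trim()] = match[2].trim();
  }
  return result;
}

const page = [
  '<meta name="prerender-status-code" content="302">',
  '<meta name="prerender-header" content="Location: https://www.google.com">'
].join('\n');
console.log(readPrerenderMeta(page));
```
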
### whitelist

If you only want to allow requests to a certain domain, use this plugin to cause a 404 for any other domains.

You can add the whitelisted domains to the plugin itself, or use the `ALLOWED_DOMAINS` environment variable.

`export ALLOWED_DOMAINS=www.prerender.io,prerender.io`

### blacklist

If you want to disallow requests to a certain domain, use this plugin to cause a 404 for the domains.

You can add the blacklisted domains to the plugin itself, or use the `BLACKLISTED_DOMAINS` environment variable.

`export BLACKLISTED_DOMAINS=yahoo.com,www.google.com`

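The two filters can be pictured with the sketch below. It combines both rules in one function for brevity, although they are separate plugins, and it assumes exact hostname matching; `parseDomains` and `filterStatus` are hypothetical names.

```javascript
// Split a comma-separated environment variable into a list of domains.
function parseDomains(value) {
  return (value || '').split(',').map((d) => d.trim()).filter(Boolean);
}

// Returns the status a filtered request would get, or null if it is allowed.
// Assumption: domains are compared to the URL's hostname exactly.
function filterStatus(targetUrl, env = process.env) {
  const hostname = new URL(targetUrl).hostname;
  const allowed = parseDomains(env.ALLOWED_DOMAINS);
  const blocked = parseDomains(env.BLACKLISTED_DOMAINS);

  if (allowed.length > 0 && !allowed.includes(hostname)) return 404;
  if (blocked.includes(hostname)) return 404;
  return null;
}

console.log(filterStatus('https://example.com/', { ALLOWED_DOMAINS: 'www.prerender.io,prerender.io' }));
console.log(filterStatus('https://prerender.io/', { ALLOWED_DOMAINS: 'www.prerender.io,prerender.io' }));
```
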
### in-memory-cache

Caches pages in memory. Available at [prerender-memory-cache](https://github.com/prerender/prerender-memory-cache)

### s3-html-cache

Caches pages in S3. Available at [coming soon](https://github.com/prerender/prerender)

--------------------

### <a id='prerendercom'></a>
# Prerender.com
###### For doing your own web crawling

When running your Prerender server in the web crawling context, we have a separate, more complex "API" for the server that lets you do different things like:
- get HTML from a web page
- get screenshots (viewport or full screen) from a web page
- get PDFs from a web page
- get HAR files from a web page
- execute your own JavaScript and return JSON along with the HTML

If you make an HTTP request to the `/render` endpoint, you can pass any of the following options, either as query parameters on a GET request or as JSON properties on a POST request. We recommend using a POST request but we will display GET requests here for brevity. Click here to see [how to send the POST request](#getvspost).

These examples assume you have the server running locally on port 3000 but you can also use our hosted service at [https://prerender.com/](https://prerender.com/).

#### url

The URL you want to load. Returns HTML by default.

```
http://localhost:3000/render?url=https://www.example.com/
```

#### renderType

The type of content you want to pull off the page.

```
http://localhost:3000/render?renderType=html&url=https://www.example.com/
```

Options are `html`, `jpeg`, `png`, `pdf`, `har`.

#### userAgent

Send your own custom user agent when Chrome loads the page.

```
http://localhost:3000/render?userAgent=ExampleCrawlerUserAgent&url=https://www.example.com/
```

#### fullpage

Whether you want your screenshot to be the entire height of the document or just the viewport.

```
http://localhost:3000/render?fullpage=true&renderType=jpeg&url=https://www.example.com/
```

Don't include `fullpage` and we'll just screenshot the normal browser viewport. Include `fullpage=true` for a full page screenshot.

#### width

Screen width. Lets you emulate different screen sizes.

```
http://localhost:3000/render?width=990&url=https://www.example.com/
```

Default is `1440`.

#### height

Screen height. Lets you emulate different screen sizes.

```
http://localhost:3000/render?height=100&url=https://www.example.com/
```

Default is `718`.

#### followRedirects

By default, we don't follow 301 redirects on the initial request so you can be alerted of any changes in URLs to update your crawling data. If you want us to follow redirects instead, you can pass this parameter.

```
http://localhost:3000/render?followRedirects=true&url=https://www.example.com/
```

Default is `false`.

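When you combine several of these options in a GET request, it helps to let a URL library do the encoding. A minimal sketch, assuming a server on `http://localhost:3000`; `buildRenderUrl` is a hypothetical helper, not part of Prerender.

```javascript
// Hypothetical helper: build a /render GET URL from an options object.
function buildRenderUrl(options, base = 'http://localhost:3000') {
  const params = new URLSearchParams();
  for (const [name, value] of Object.entries(options)) {
    params.set(name, String(value)); // URLSearchParams percent-encodes values
  }
  return base + '/render?' + params.toString();
}

console.log(buildRenderUrl({
  renderType: 'png',
  fullpage: true,
  width: 990,
  url: 'https://www.example.com/'
}));
```

Encoding the `url` value matters as soon as the target URL has its own query string, because an unencoded `&` would otherwise be read as the start of another Prerender option.
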
#### javascript

Execute JavaScript to modify the page before we snapshot your content. If you set `window.prerenderData` to an object, we will pull the object off the page and return it to you. Great for parsing extra data from a page in JavaScript.

```
http://localhost:3000/render?javascript=window.prerenderData=window.angular.version&url=https://www.example.com/
```

When using this parameter and `window.prerenderData`, the response from Prerender will look like:
```
{
    "prerenderData": { "example": "data" },
    "content": "<html><body></body></html>"
}
```

If you don't set `window.prerenderData`, the response won't be JSON. The response will just be the normal HTML.

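Because the response is JSON only when `window.prerenderData` was set, a client may want to normalise both cases into one shape. A small sketch, assuming the JSON form is valid JSON with `content` and `prerenderData` keys; `parseRenderResponse` is a hypothetical helper.

```javascript
// Hypothetical helper: normalise a /render response body into one shape.
function parseRenderResponse(body) {
  try {
    const parsed = JSON.parse(body);
    if (parsed && typeof parsed === 'object' && 'content' in parsed) {
      return { html: parsed.content, data: parsed.prerenderData };
    }
  } catch (err) {
    // Not JSON: fall through and treat the body as plain HTML.
  }
  return { html: body, data: undefined };
}

console.log(parseRenderResponse('{"prerenderData":{"example":"data"},"content":"<html><body></body></html>"}'));
console.log(parseRenderResponse('<html><body></body></html>'));
```
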
### <a id='getvspost'></a>
### Get vs Post

You can send all options as query parameters on a GET request or as JSON properties on a POST request. We recommend using the POST request when possible to avoid any issues with URL encoding of GET request query strings. Here are a few pseudo examples:

```
POST http://localhost:3000/render
{
    "renderType": "html",
    "javascript": "window.prerenderData = window.angular.version",
    "url": "https://www.example.com/"
}
```

```
POST http://localhost:3000/render
{
    "renderType": "jpeg",
    "fullpage": "true",
    "url": "https://www.example.com/"
}
```

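The pseudo examples above could be sent from Node roughly as follows. This is a sketch under assumptions: `buildRenderRequest` is a hypothetical helper, the `Content-Type: application/json` header is assumed to be what the server expects for a JSON body, and the commented-out `fetch` call needs Node 18+ and a running server.

```javascript
// Hypothetical helper: describe a POST to /render with the options as JSON.
function buildRenderRequest(options, base = 'http://localhost:3000') {
  return {
    url: base + '/render',
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' }, // assumption: JSON body
      body: JSON.stringify(options)
    }
  };
}

const request = buildRenderRequest({
  renderType: 'jpeg',
  fullpage: 'true',
  url: 'https://www.example.com/'
});
console.log(request.init.body);

// With a server running (Node 18+ has a global fetch):
// fetch(request.url, request.init).then((res) => res.text()).then(console.log);
```
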
Check out our [full documentation](https://prerender.com/documentation)


## License

The MIT License (MIT)

Copyright (c) 2013 Todd Hooper &lt;todd@prerender.io&gt;

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.