# CouchBackup

[![npm (scoped)](https://img.shields.io/npm/v/@cloudant/couchbackup.svg?colorB=0000ff)](https://www.npmjs.com/package/@cloudant/couchbackup)
[![npm (scoped with tag)](https://img.shields.io/npm/v/@cloudant/couchbackup/snapshot.svg?colorB=666699)](https://www.npmjs.com/package/@cloudant/couchbackup)
[![Build Status](https://travis-ci.org/cloudant/couchbackup.svg?branch=master)](https://travis-ci.org/cloudant/couchbackup)
[![Greenkeeper badge](https://badges.greenkeeper.io/cloudant/couchbackup.svg)](https://greenkeeper.io/)

```
 _____                  _    ______            _
/  __ \                | |   | ___ \          | |
| /  \/ ___  _   _  ___| |__ | |_/ / __ _  ___| | ___   _ _ __
| |    / _ \| | | |/ __| '_ \| ___ \/ _` |/ __| |/ / | | | '_ \
| \__/\ (_) | |_| | (__| | | | |_/ / (_| | (__|   <| |_| | |_) |
 \____/\___/ \__,_|\___|_| |_\____/ \__,_|\___|_|\_\\__,_| .__/
                                                         | |
                                                         |_|
```

CouchBackup is a command-line utility that allows a Cloudant or CouchDB database to be backed up to a text file.
It comes with a companion command-line utility that can restore the backed up data.

**N.B.**

* **couchbackup does not do CouchDB replication as such, it simply streams through a database's `_changes` feed, and uses `POST /db/_bulk_get` to fetch the documents, storing the documents it finds on disk.**
* **couchbackup does not support backing up or restoring databases containing documents with attachments. It is recommended to store attachments directly in an object store. DO NOT USE THIS TOOL FOR DATABASES CONTAINING ATTACHMENTS.** [Note](#note-on-attachments)

## Installation

To install the latest released version, use npm:

```sh
npm install -g @cloudant/couchbackup
```

### Requirements

* The minimum required Node.js version is 4.8.2.
* The minimum required CouchDB version is 2.0.0.

### Snapshots

The latest builds of master are published to npm with the `snapshot` tag. Use the `snapshot` tag if you want to experiment with an unreleased fix or new function, but please note that snapshot versions are **unsupported**.

## Usage

Either environment variables or command-line options can be used to specify the URL of the CouchDB or Cloudant instance, and the database to work with.

### The URL

To define the URL of the CouchDB instance, set the `COUCH_URL` environment variable:

```sh
export COUCH_URL=http://localhost:5984
```

or

```sh
export COUCH_URL=https://myusername:mypassword@myhost.cloudant.com
```

Alternatively, we can use the `--url` command-line parameter.

### The Database name

To define the name of the database to back up or restore, set the `COUCH_DATABASE` environment variable:

```sh
export COUCH_DATABASE=animaldb
```

Alternatively, we can use the `--db` command-line parameter.

## Backup

To back up a database to a text file, use the `couchbackup` command, directing the output to a text file:

```sh
couchbackup > backup.txt
```

Another way of backing up is to set the `COUCH_URL` environment variable only and supply the database name on the command-line:

```sh
couchbackup --db animaldb > animaldb.txt
```

## Logging & resuming backups

You may also create a log file which records the progress of the backup with the `--log` parameter e.g.

```sh
couchbackup --db animaldb --log animaldb.log > animaldb.txt
```

This log file can be used to resume backups from where you left off with `--resume true`:

```sh
couchbackup --db animaldb --log animaldb.log --resume true >> animaldb.txt
```

You may also specify the name of the output file, rather than directing the backup data to *stdout*:

```sh
couchbackup --db animaldb --log animaldb.log --resume true --output animaldb.txt
```
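
The first-run and resume invocations above differ only in the `--resume true` flag, so a wrapper script can choose between them by checking whether the log file already exists. The following is a minimal Node.js sketch under that assumption. The helper name `buildBackupArgs` is hypothetical (it is not part of couchbackup), and only the flags shown above are used.

```javascript
const fs = require('fs');

// Hypothetical helper: build the couchbackup argument list for a first run
// or a resumed run, using only the --db, --log, --resume and --output flags.
function buildBackupArgs(db, logFile, outputFile, logExists) {
  // logExists can be injected by the caller; otherwise check the disk.
  const canResume = logExists !== undefined ? logExists : fs.existsSync(logFile);
  const args = ['--db', db, '--log', logFile];
  if (canResume) {
    // Resuming only makes sense when a log from an earlier run is present.
    args.push('--resume', 'true');
  }
  args.push('--output', outputFile);
  return args;
}

console.log(buildBackupArgs('animaldb', 'animaldb.log', 'animaldb.txt', true).join(' '));
// --db animaldb --log animaldb.log --resume true --output animaldb.txt
```

The returned array could then be handed to `child_process.spawn('couchbackup', args, { stdio: 'inherit' })`.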

## Restore

Now that we have our backup text file, we can restore it to an existing database using the `couchrestore` command:

```sh
cat animaldb.txt | couchrestore
```

or specifying the database name on the command-line:

```sh
cat animaldb.txt | couchrestore --db animaldb2
```

## Compressed backups

If we want to compress the backup data before storing to disk, we can pipe the contents through `gzip`:

```sh
couchbackup --db animaldb | gzip > animaldb.txt.gz
```

and restore the file with:

```sh
cat animaldb.txt.gz | gunzip | couchrestore --db animaldb2
```

## Encrypted backups

Similarly to compression it is possible to pipe the backup content through an
encryption or decryption utility. For example with `openssl`:

```sh
couchbackup --db animaldb | openssl aes-128-cbc -pass pass:12345 > encrypted_animal.db
```

```sh
openssl aes-128-cbc -d -in encrypted_animal.db -pass pass:12345 | couchrestore --db animaldb2
```

Note that the content is unencrypted while it is being processed by the
backup tool before it is piped to the encryption utility.

## What's in a backup file?

A backup file is a text file where each line contains a JSON encoded array of up to `buffer-size` objects e.g.

```js
[{"a":1},{"a":2}...]
[{"a":501},{"a":502}...]
```
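
Because every line is an independent JSON array, a backup file can be inspected one line at a time. The sketch below is an illustration only: `readBatches` and `countDocuments` are hypothetical helper names, and the sample text is made up to match the format described above.

```javascript
// Parse backup text: one JSON array of documents per non-empty line.
function readBatches(backupText) {
  return backupText
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line));
}

// Count the documents across all batches.
function countDocuments(backupText) {
  return readBatches(backupText).reduce((total, batch) => total + batch.length, 0);
}

// Made-up sample: two batches holding two and three documents.
const sample = '[{"a":1},{"a":2}]\n[{"a":501},{"a":502},{"a":503}]\n';
console.log(readBatches(sample).map((batch) => batch.length)); // [ 2, 3 ]
console.log(countDocuments(sample)); // 5
```

This version reads the whole text into memory, so for large backups a line-by-line reader such as Node's `readline` module would be a better fit.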

## What's in a log file?

A log file contains a line:

- for every batch of document ids that need to be fetched e.g. `:t batch56 [{"id":"a"},{"id":"b"}]`
- for every batch that has been fetched and stored e.g. `:d batch56`
- to indicate that the changes feed was fully consumed e.g. `:changes_complete`
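
The three line types above are enough to work out how far a backup has progressed. The following sketch assumes the log looks exactly like the examples given (a marker, then a batch name, separated by spaces). `summarizeLog` is a hypothetical helper, not a couchbackup function.

```javascript
// Summarize a couchbackup log: which batches are still outstanding, how many
// are done, and whether the changes feed was fully consumed.
function summarizeLog(logText) {
  const pending = new Set();
  let completedBatches = 0;
  let changesComplete = false;

  for (const line of logText.split('\n')) {
    const [marker, batchId] = line.trim().split(' ');
    if (marker === ':t' && batchId) {
      pending.add(batchId); // batch queued for fetching
    } else if (marker === ':d' && batchId) {
      pending.delete(batchId); // batch fetched and stored
      completedBatches += 1;
    } else if (marker === ':changes_complete') {
      changesComplete = true;
    }
  }
  return { pending: [...pending], completedBatches, changesComplete };
}

const exampleLog = [
  ':t batch0 [{"id":"a"},{"id":"b"}]',
  ':t batch1 [{"id":"c"}]',
  ':d batch0',
  ':changes_complete'
].join('\n');
console.log(summarizeLog(exampleLog));
// { pending: [ 'batch1' ], completedBatches: 1, changesComplete: true }
```

A log with an empty `pending` list and `changesComplete` set to `true` would suggest that the backup it describes ran to completion.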

## What is shallow mode?

When you run `couchbackup` with `--mode shallow` a simpler backup is performed, only backing up the winning revisions
of the database. No revision tokens are saved and any conflicting revisions are ignored. This is a faster, but less
complete backup. Shallow backups cannot be resumed because they do not produce a log file.

## Why use CouchBackup?

The easiest way to back up a CouchDB database is to copy the ".couch" file. This is fine on a single-node instance, but when running multi-node
Cloudant or using CouchDB 2.0 or greater, the ".couch" file only contains a single shard of data. This utility allows simple backups of CouchDB
or Cloudant databases using the HTTP API.

This tool can be used to script the backup of your databases. Move the backup and log files to cheap Object Storage so that you have multiple copies of your precious data.

## Options reference

### Environment variables

* `COUCH_URL` - the URL of the CouchDB/Cloudant server e.g. `http://127.0.0.1:5984`
* `COUCH_DATABASE` - the name of the database to act upon e.g. `mydb` (default `test`)
* `COUCH_PARALLELISM` - the number of HTTP requests to perform in parallel when restoring a backup e.g. `10` (default `5`)
* `COUCH_BUFFER_SIZE` - the number of documents fetched and restored at once e.g. `100` (default `500`)
* `COUCH_LOG` - the file to store logging information during backup
* `COUCH_RESUME` - if `true`, resumes a previous backup from its last known position
* `COUCH_OUTPUT` - the file name to store the backup data (defaults to stdout)
* `COUCH_MODE` - if `shallow`, only a superficial backup is done, ignoring conflicts and revision tokens. Defaults to `full` - a full backup.
* `CLOUDANT_IAM_API_KEY` - optional [IAM API key](https://console.bluemix.net/docs/services/Cloudant/guides/iam.html#ibm-cloud-identity-and-access-management)
  to use to access the Cloudant database instead of user information credentials in the URL. The endpoint used to retrieve the token defaults to
  `https://iam.bluemix.net/identity/token`, but can be overridden if necessary using the `CLOUDANT_IAM_TOKEN_URL` environment variable.
* `DEBUG` - if set to `couchbackup`, all debug messages will be sent to `stderr` during a backup or restore process

### Command-line parameters

* `--url` - same as `COUCH_URL` environment variable
* `--db` - same as `COUCH_DATABASE`
* `--parallelism` - same as `COUCH_PARALLELISM`
* `--buffer-size` - same as `COUCH_BUFFER_SIZE`
* `--log` - same as `COUCH_LOG`
* `--resume` - same as `COUCH_RESUME`
* `--output` - same as `COUCH_OUTPUT`
* `--mode` - same as `COUCH_MODE`
* `--iam-api-key` - same as `CLOUDANT_IAM_API_KEY`
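
As an illustration of how the two lists relate, the sketch below merges environment variables and explicit overrides into one settings object, filling in the defaults documented above (`test`, `5`, `500`, `full`). It is not couchbackup's own option parser: `resolveOptions` is a hypothetical helper, and the rule that an explicit override wins over an environment variable is an assumption.

```javascript
// Hypothetical helper: merge explicit overrides, environment variables and
// the documented defaults. Precedence (override > env > default) is assumed.
function resolveOptions(env, overrides = {}) {
  const pick = (overrideValue, envValue, fallback) => {
    if (overrideValue !== undefined) return overrideValue;
    if (envValue !== undefined) return envValue;
    return fallback;
  };

  return {
    url: pick(overrides.url, env.COUCH_URL, undefined),
    db: pick(overrides.db, env.COUCH_DATABASE, 'test'),
    // Environment variables are strings, so numeric settings are converted.
    parallelism: Number(pick(overrides.parallelism, env.COUCH_PARALLELISM, 5)),
    bufferSize: Number(pick(overrides.bufferSize, env.COUCH_BUFFER_SIZE, 500)),
    log: pick(overrides.log, env.COUCH_LOG, undefined),
    resume: String(pick(overrides.resume, env.COUCH_RESUME, false)) === 'true',
    output: pick(overrides.output, env.COUCH_OUTPUT, undefined),
    mode: pick(overrides.mode, env.COUCH_MODE, 'full')
  };
}

const settings = resolveOptions(
  { COUCH_DATABASE: 'animaldb', COUCH_PARALLELISM: '10' },
  { mode: 'shallow' }
);
console.log(settings.db, settings.parallelism, settings.bufferSize, settings.mode);
// animaldb 10 500 shallow
```

In a real script the first argument would typically be `process.env`.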

## Using programmatically

You can use `couchbackup` programmatically. First install
`couchbackup` into your project with `npm install --save @cloudant/couchbackup`.
Then you can import the library into your code:

```js
const couchbackup = require('@cloudant/couchbackup');
```

The library exports two main functions:

1. `backup` - back up from a database to a writable stream.
2. `restore` - restore from a readable stream to a database.

### Examples

See [the examples folder](./examples) for example scripts showing how to
use the library.

### Backup

The `backup` function takes a source database URL, a stream to write to,
backup options and a callback for completion.

```javascript
backup: function(srcUrl, targetStream, opts, callback) { /* ... */ }
```

The `opts` dictionary can contain values which map to a subset of the
environment variables defined above. Those related to the source and
target locations are not required.

* `parallelism`: see `COUCH_PARALLELISM`.
* `bufferSize`: see `COUCH_BUFFER_SIZE`.
* `log`: see `COUCH_LOG`.
* `resume`: see `COUCH_RESUME`.
* `mode`: see `COUCH_MODE`.
* `iamApiKey`: see `CLOUDANT_IAM_API_KEY`.
* `iamTokenUrl`: may be used with `iamApiKey` to override the default URL for
  retrieving IAM tokens.

The callback has the standard `err, data` parameters and is called when
the backup completes or fails.

The `backup` function returns an event emitter. You can subscribe to:

* `changes` - when a batch of changes has been written to log stream.
* `written` - when a batch of documents has been written to backup stream.
* `finished` - emitted once when all documents are backed up.

Backup data to a stream:

```javascript
couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  process.stdout,
  {parallelism: 2},
  function(err, data) {
    // Status messages go to stderr because stdout carries the backup data.
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```

Or to a file:

```javascript
const fs = require('fs');
const filename = 'animaldb.txt'; // example output path

couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  fs.createWriteStream(filename),
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```
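
The emitter returned by `backup` can also be used to follow progress while the backup runs. The sketch below only counts the `changes`, `written` and `finished` events listed above and makes no assumption about their payloads. `trackBackup` is a hypothetical helper, and a plain `EventEmitter` stands in for a real backup so the example runs without a database.

```javascript
const { EventEmitter } = require('events');

// Attach listeners for the documented backup events and keep simple counters.
function trackBackup(emitter) {
  const progress = { changes: 0, written: 0, finished: false };
  emitter.on('changes', () => { progress.changes += 1; });
  emitter.on('written', () => { progress.written += 1; });
  emitter.on('finished', () => { progress.finished = true; });
  return progress;
}

// Stand-in emitter that simulates one changes batch and two written batches.
const fakeBackup = new EventEmitter();
const progress = trackBackup(fakeBackup);
fakeBackup.emit('changes');
fakeBackup.emit('written');
fakeBackup.emit('written');
fakeBackup.emit('finished');
console.log(progress); // { changes: 1, written: 2, finished: true }
```

With the real library, the same helper would be applied to the return value of `couchbackup.backup(...)`.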

### Restore

The `restore` function takes a readable stream containing the data emitted
by the `backup` function. It uploads that to a Cloudant database, which
should be a **new** database.

```javascript
restore: function(srcStream, targetUrl, opts, callback) { /* ... */ }
```

The `opts` dictionary can contain values which map to a subset of the
environment variables defined above. Those related to the source and
target locations are not required.

* `parallelism`: see `COUCH_PARALLELISM`.
* `bufferSize`: see `COUCH_BUFFER_SIZE`.

The callback has the standard `err, data` parameters and is called when
the restore completes or fails.

The `restore` function returns an event emitter. You can subscribe to:

* `restored` - when a batch of documents is restored.
* `finished` - emitted once when all documents are restored.

The backup file (or `srcStream`) contains lists of document
revisions, where each list is separated by a newline. The list length is
dictated by the `bufferSize` parameter used during the backup.

It's possible a list could be corrupt due to failures in the backup process. A
`BackupFileJsonError` is emitted for each corrupt list found. _These can only be
ignored if the backup that generated the stream did complete successfully_. This
ensures that corrupt lists also have a valid counterpart within the stream.

Restore data from a stream:

```javascript
couchbackup.restore(
  process.stdin,
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```

Or from a file:

```javascript
const fs = require('fs');
const filename = 'animaldb.txt'; // example backup file path

couchbackup.restore(
  fs.createReadStream(filename),
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```
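
Before restoring, it can be useful to know whether a backup file contains any of the corrupt lists described above. The sketch below is an offline approximation, not the library's own detection logic: it treats any non-empty line that does not parse as a JSON array as corrupt. `findCorruptLines` is a hypothetical helper name.

```javascript
// Return the 1-based line numbers of lines that are not a valid JSON array.
function findCorruptLines(backupText) {
  const corrupt = [];
  backupText.split('\n').forEach((line, index) => {
    if (line.trim() === '') return; // blank lines are not lists
    try {
      if (!Array.isArray(JSON.parse(line))) {
        corrupt.push(index + 1);
      }
    } catch (err) {
      corrupt.push(index + 1); // truncated or otherwise malformed JSON
    }
  });
  return corrupt;
}

// Made-up sample: line 2 was cut short, line 3 is its valid counterpart.
const interrupted = '[{"a":1}]\n[{"a":2},{"a"\n[{"a":2},{"a":3}]\n';
console.log(findCorruptLines(interrupted)); // [ 2 ]
```

As the text above notes, a non-empty result is only safe to ignore when the backup that produced the file completed successfully.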

## Error Handling

The `couchbackup` and `couchrestore` processes are designed to be relatively robust over an unreliable network. Work is batched and any failed requests are retried indefinitely. However, certain aspects of the execution will not tolerate failure:

- Spooling changes from the database changes feed. A failure in the changes request during the backup process will result in process termination.
- Validating the existence of a target database during the database restore process.

### API

When using the library programmatically an `Error` will be passed in one of two ways:

* For fatal errors, the callback will be called with the error as its first argument (the standard `err, data` parameters)
* For non-fatal errors, an `error` event will be emitted

### CLI Exit Codes

On fatal errors, `couchbackup` and `couchrestore` will exit with non-zero exit codes. This section
details them.

### Common to both `couchbackup` and `couchrestore`

* `1`: unknown CLI option or generic error.
* `2`: invalid CLI option.
* `11`: unauthorized credentials for the database.
* `12`: incorrect permissions for the database.
* `40`: database returned a fatal HTTP error.

### `couchbackup`

* `20`: resume was specified without a log file.
* `21`: the resume log file does not exist.
* `22`: incomplete changes in log file.
* `30`: error spooling changes from the database.
* `50`: source database does not support `/_bulk_get` endpoint.

### `couchrestore`

* `10`: restore target database does not exist.
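
A script that wraps the CLI can turn these codes into readable messages. The sketch below simply encodes the tables above as lookup objects. `describeExit` is a hypothetical helper, and treating `0` as success follows the usual process convention rather than anything stated here.

```javascript
// Exit codes shared by both tools, as listed above.
const COMMON_CODES = {
  1: 'unknown CLI option or generic error',
  2: 'invalid CLI option',
  11: 'unauthorized credentials for the database',
  12: 'incorrect permissions for the database',
  40: 'database returned a fatal HTTP error'
};

// Codes specific to each tool.
const TOOL_CODES = {
  couchbackup: {
    20: 'resume was specified without a log file',
    21: 'the resume log file does not exist',
    22: 'incomplete changes in log file',
    30: 'error spooling changes from the database',
    50: 'source database does not support /_bulk_get endpoint'
  },
  couchrestore: {
    10: 'restore target database does not exist'
  }
};

// Translate an exit code from `tool` into a short description.
function describeExit(tool, code) {
  if (code === 0) return 'success';
  const table = Object.assign({}, COMMON_CODES, TOOL_CODES[tool] || {});
  return table[code] || 'undocumented exit code ' + code;
}

console.log(describeExit('couchbackup', 21)); // the resume log file does not exist
console.log(describeExit('couchrestore', 10)); // restore target database does not exist
```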

## Note on attachments

TL;DR: If you back up a database that contains attachments you will not be able to restore it.

As documented above, couchbackup does not support backing up or restoring databases containing documents with attachments.
Attempting to back up a database that includes documents with attachments will appear to succeed. However, the attachment
content will not have been downloaded and the backup file will contain attachment metadata. Consequently any attempt to
restore the backup will result in errors because the attachment metadata will reference attachments that are not present
in the restored database.

It is recommended to store attachments directly in an object store with a link in the JSON document instead of using the
native attachment API.
405
406It is recommended to store attachments directly in an object store with a link in the JSON document instead of using the
407native attachment API.