# CouchBackup

[![npm (scoped)](https://img.shields.io/npm/v/@cloudant/couchbackup.svg?colorB=0000ff)](https://www.npmjs.com/package/@cloudant/couchbackup)
[![npm (scoped with tag)](https://img.shields.io/npm/v/@cloudant/couchbackup/snapshot.svg?colorB=666699)](https://www.npmjs.com/package/@cloudant/couchbackup)
[![Build Status](https://travis-ci.org/cloudant/couchbackup.svg?branch=master)](https://travis-ci.org/cloudant/couchbackup)
[![Greenkeeper badge](https://badges.greenkeeper.io/cloudant/couchbackup.svg)](https://greenkeeper.io/)

```
 _____                  _    ______            _
/  __ \                | |   | ___ \          | |
| /  \/ ___  _   _  ___| |__ | |_/ / __ _  ___| | ___   _ _ __
| |    / _ \| | | |/ __| '_ \| ___ \/ _` |/ __| |/ / | | | '_ \
| \__/\ (_) | |_| | (__| | | | |_/ / (_| | (__|   <| |_| | |_) |
 \____/\___/ \__,_|\___|_| |_\____/ \__,_|\___|_|\_\\__,_| .__/
                                                         | |
                                                         |_|
```

CouchBackup is a command-line utility that allows a Cloudant or CouchDB database to be backed up to a text file.
It comes with a companion command-line utility that can restore the backed up data.

## Limitations

CouchBackup has some restrictions on the data it is able to back up:

* **couchbackup does not do CouchDB replication as such; it simply streams through a database's `_changes` feed and uses `POST /db/_bulk_get` to fetch the documents, storing the documents it finds on disk.**
* **couchbackup does not support backing up or restoring databases containing documents with attachments. It is recommended to store attachments directly in an object store. DO NOT USE THIS TOOL FOR DATABASES CONTAINING ATTACHMENTS.** [Note](#note-on-attachments)

## Installation

To install the latest released version use npm:

```sh
npm install -g @cloudant/couchbackup
```

### Requirements
* The minimum required Node.js version is 10.
* The minimum required CouchDB version is 2.0.0.

### Snapshots

The latest builds of master are published to npm with the `snapshot` tag. Use the `snapshot` tag if you want to experiment with an unreleased fix or new function, but please note that snapshot versions are **unsupported**.

## Usage

Either environment variables or command-line options can be used to specify the URL of the CouchDB or Cloudant instance, and the database to work with.

### The URL

To define the URL of the CouchDB instance set the `COUCH_URL` environment variable:

```sh
export COUCH_URL=http://localhost:5984
```
or

```sh
export COUCH_URL=https://myusername:mypassword@myhost.cloudant.com
```

Alternatively we can use the `--url` command-line parameter.

When passing credentials in the user information subcomponent of the URL
they must be [percent encoded](https://tools.ietf.org/html/rfc3986#section-3.2.1).
Specifically, within either the username or password, the characters `: / ? # [ ] @ %`
_MUST_ be percent-encoded; other characters _MAY_ be percent-encoded.

For example, for the username `user123` and password `colon:at@321`:
```
https://user123:colon%3aat%40321@localhost:5984
```

Note that additional care must be taken to escape shell reserved characters when
setting the environment variable or command-line parameter.

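The percent-encoding rule above can be sketched in Node.js. This is a minimal illustration, assuming the built-in `encodeURIComponent` is acceptable for both username and password; it escapes every character in the must-encode set, plus some that are only optionally encoded. The helper name `buildCouchUrl` is hypothetical and not part of couchbackup.

```javascript
// Hypothetical helper (not part of couchbackup): builds a URL whose
// user information subcomponent is percent-encoded.
function buildCouchUrl(protocol, username, password, host) {
  // encodeURIComponent escapes : / ? # [ ] @ % (among other characters).
  const user = encodeURIComponent(username);
  const pass = encodeURIComponent(password);
  return `${protocol}://${user}:${pass}@${host}`;
}

console.log(buildCouchUrl('https', 'user123', 'colon:at@321', 'localhost:5984'));
// prints https://user123:colon%3Aat%40321@localhost:5984
```

The hex digits come out in upper case (`%3A`) rather than the lower case used in the example above; both forms are equivalent under RFC 3986.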
### The Database name

To define the name of the database to back up or restore, set the `COUCH_DATABASE` environment variable:

```sh
export COUCH_DATABASE=animaldb
```

Alternatively we can use the `--db` command-line parameter.

## Backup

To back up a database to a text file, use the `couchbackup` command, directing the output to a text file:

```sh
couchbackup > backup.txt
```

Another way of backing up is to set the `COUCH_URL` environment variable only and supply the database name on the command-line:

```sh
couchbackup --db animaldb > animaldb.txt
```

## Logging & resuming backups

You may also create a log file which records the progress of the backup with the `--log` parameter e.g.

```sh
couchbackup --db animaldb --log animaldb.log > animaldb.txt
```

This log file can be used to resume backups from where you left off with `--resume true`:

```sh
couchbackup --db animaldb --log animaldb.log --resume true >> animaldb.txt
```

The `--resume true` option works for a backup that has finished spooling changes, but has not yet completed downloading all the necessary batches of documents. It does _not_ provide an incremental backup solution.

You may also specify the name of the output file, rather than directing the backup data to *stdout*:

```sh
couchbackup --db animaldb --log animaldb.log --resume true --output animaldb.txt
```

## Restore

Now that we have our backup text file, we can restore it to a new, empty, existing database using the `couchrestore` command:

```sh
cat animaldb.txt | couchrestore
```

or specifying the database name on the command-line:

```sh
cat animaldb.txt | couchrestore --db animaldb2
```

## Compressed backups

If we want to compress the backup data before storing to disk, we can pipe the contents through `gzip`:

```sh
couchbackup --db animaldb | gzip > animaldb.txt.gz
```

and restore the file with:

```sh
cat animaldb.txt.gz | gunzip | couchrestore --db animaldb2
```

## Encrypted backups

Similarly to compression, it is possible to pipe the backup content through an
encryption or decryption utility. For example with `openssl`:

```sh
couchbackup --db animaldb | openssl aes-128-cbc -pass pass:12345 > encrypted_animal.db
```

```sh
openssl aes-128-cbc -d -in encrypted_animal.db -pass pass:12345 | couchrestore --db animaldb2
```

Note that the content is unencrypted while it is being processed by the
backup tool before it is piped to the encryption utility.

## What's in a backup file?

A backup file is a text file where each line contains a JSON encoded array of up to `buffer-size` objects e.g.

```js
    [{"a":1},{"a":2}...]
    [{"a":501},{"a":502}...]
```

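Because each line is an independent JSON array, a backup can be inspected line by line. The following is a minimal sketch that counts batches and documents in backup text already held in memory; the function name `countBackupDocs` is hypothetical, and a real backup file may be large enough that a streaming reader would be a better fit.

```javascript
// Hypothetical helper: counts batches and documents in backup text,
// where each non-empty line is a JSON array of documents.
function countBackupDocs(backupText) {
  let batches = 0;
  let docs = 0;
  for (const line of backupText.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines
    const batch = JSON.parse(line); // throws on a corrupt line
    batches += 1;
    docs += batch.length;
  }
  return { batches, docs };
}

const sample = '[{"a":1},{"a":2}]\n[{"a":501}]\n';
console.log(countBackupDocs(sample)); // { batches: 2, docs: 3 }
```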
## What's in a log file?

A log file contains a line:

- for every batch of document ids that need to be fetched e.g. `:t batch56 [{"id":"a"},{"id":"b"}]`
- for every batch that has been fetched and stored e.g. `:d batch56`
- to indicate that the changes feed was fully consumed e.g. `:changes_complete`

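As an illustration of how these three line types relate, the sketch below works out which batches are still outstanding. It relies only on the line shapes listed above and ignores anything else; the function name `summarizeLog` is hypothetical and this is not how couchbackup itself reads its log.

```javascript
// Hypothetical helper: summarizes log text using only the three
// documented line shapes. Batch labels are treated as opaque strings.
function summarizeLog(logText) {
  const pending = new Set();
  let changesComplete = false;
  for (const line of logText.split('\n')) {
    const parts = line.trim().split(' ');
    if (parts[0] === ':t' && parts.length >= 2) {
      pending.add(parts[1]); // batch that needs to be fetched
    } else if (parts[0] === ':d' && parts.length >= 2) {
      pending.delete(parts[1]); // batch fetched and stored
    } else if (parts[0] === ':changes_complete') {
      changesComplete = true; // changes feed fully consumed
    }
  }
  return { changesComplete, pending: [...pending] };
}

const log = [
  ':t batch0 [{"id":"a"},{"id":"b"}]',
  ':t batch1 [{"id":"c"}]',
  ':changes_complete',
  ':d batch0'
].join('\n');
console.log(summarizeLog(log)); // { changesComplete: true, pending: [ 'batch1' ] }
```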
## What is shallow mode?

When you run `couchbackup` with `--mode shallow` a simpler backup is performed, only backing up the winning revisions
of the database. No revision tokens are saved and any conflicting revisions are ignored. This is a faster, but less
complete backup. Shallow backups cannot be resumed because they do not produce a log file.

NOTE: Parallelism will not be in effect if `--mode shallow` is defined.

## Why use CouchBackup?

The easiest way to back up a CouchDB database is to copy the ".couch" file. This is fine on a single-node instance, but when running multi-node
Cloudant or using CouchDB 2.0 or greater, the ".couch" file only contains a single shard of data. This utility allows simple backups of CouchDB
or Cloudant databases using the HTTP API.

This tool can be used to script the backup of your databases. Move the backup and log files to cheap Object Storage so that you have multiple copies of your precious data.

## Options reference

### Environment variables

* `COUCH_URL` - the URL of the CouchDB/Cloudant server e.g. `http://127.0.0.1:5984`
* `COUCH_DATABASE` - the name of the database to act upon e.g. `mydb` (default `test`)
* `COUCH_PARALLELISM` - the number of HTTP requests to perform in parallel when restoring a backup e.g. `10` (default `5`)
* `COUCH_BUFFER_SIZE` - the number of documents fetched and restored at once e.g. `100` (default `500`). When using CouchBackup with [Cloudant on Transaction Engine](https://www.ibm.com/cloud/blog/announcements/ibm-cloudant-on-transaction-engine) `COUCH_BUFFER_SIZE` must be less than `2000` to avoid bad request errors.
* `COUCH_REQUEST_TIMEOUT` - the number of milliseconds to wait for a response to an HTTP request before retrying the request e.g. `10000` (default `120000`)
* `COUCH_LOG` - the file to store logging information during backup
* `COUCH_RESUME` - if `true`, resumes a previous backup from its last known position
* `COUCH_OUTPUT` - the file name to store the backup data (defaults to stdout)
* `COUCH_MODE` - if `shallow`, only a superficial backup is done, ignoring conflicts and revision tokens. Defaults to `full` - a full backup.
* `CLOUDANT_IAM_API_KEY` - optional [IAM API key](https://console.bluemix.net/docs/services/Cloudant/guides/iam.html#ibm-cloud-identity-and-access-management)
 to use to access the Cloudant database instead of user information credentials in the URL. The endpoint used to retrieve the token defaults to
 `https://iam.cloud.ibm.com/identity/token`, but can be overridden if necessary using the `CLOUDANT_IAM_TOKEN_URL` environment variable.
* `DEBUG` - if set to `couchbackup`, all debug messages will be sent to `stderr` during a backup or restore process

_Note:_ These environment variables can only be used with the CLI. When
[using programmatically](#using-programmatically) the `opts` dictionary must be
used.

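If a script still wants to honour these variables when calling the library, it has to translate them itself. The sketch below shows one way, using the option names listed under [Using programmatically](#using-programmatically). The helper `optsFromEnv` is hypothetical, and the type conversions (numbers for the numeric settings, a boolean for `resume`) are assumptions rather than documented requirements.

```javascript
// Hypothetical helper (not part of couchbackup): maps the CLI environment
// variable names to the programmatic option names. The converters are an
// assumption about the value types each option expects.
const ENV_TO_OPT = {
  COUCH_PARALLELISM: ['parallelism', Number],
  COUCH_BUFFER_SIZE: ['bufferSize', Number],
  COUCH_REQUEST_TIMEOUT: ['requestTimeout', Number],
  COUCH_LOG: ['log', String],
  COUCH_RESUME: ['resume', (v) => v === 'true'],
  COUCH_MODE: ['mode', String],
  CLOUDANT_IAM_API_KEY: ['iamApiKey', String],
  CLOUDANT_IAM_TOKEN_URL: ['iamTokenUrl', String]
};

function optsFromEnv(env) {
  const opts = {};
  for (const [name, [key, convert]] of Object.entries(ENV_TO_OPT)) {
    // Only copy variables that are actually set.
    if (env[name] !== undefined) opts[key] = convert(env[name]);
  }
  return opts;
}

console.log(optsFromEnv({ COUCH_PARALLELISM: '2', COUCH_MODE: 'shallow' }));
// { parallelism: 2, mode: 'shallow' }
```

In real use the argument would be `process.env`. Note that `log`, `resume` and `mode` are listed only for `backup`, so a restore call would want those keys left out.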
### Command-line parameters

* `--url` - same as `COUCH_URL` environment variable
* `--db` - same as `COUCH_DATABASE`
* `--parallelism` - same as `COUCH_PARALLELISM`
* `--buffer-size` - same as `COUCH_BUFFER_SIZE`
* `--request-timeout` - same as `COUCH_REQUEST_TIMEOUT`
* `--log` - same as `COUCH_LOG`
* `--resume` - same as `COUCH_RESUME`
* `--output` - same as `COUCH_OUTPUT`
* `--mode` - same as `COUCH_MODE`
* `--iam-api-key` - same as `CLOUDANT_IAM_API_KEY`

## Using programmatically

You can use `couchbackup` programmatically. First install
`couchbackup` into your project with `npm install --save @cloudant/couchbackup`.
Then you can import the library into your code:

```js
  const couchbackup = require('@cloudant/couchbackup');
```

The library exports two main functions:

1. `backup` - backup from a database to a writable stream.
2. `restore` - restore from a readable stream to an empty database.

### Examples

See [the examples folder](./examples) for example scripts showing how to
use the library.

### Backup

The `backup` function takes a source database URL, a stream to write to,
backup options and a callback for completion.

```javascript
backup: function(srcUrl, targetStream, opts, callback) { /* ... */ }
```

The `opts` dictionary can contain values which map to a subset of the
environment variables defined above. Those related to the source and
target locations are not required.

* `parallelism`: see `COUCH_PARALLELISM`.
* `bufferSize`: see `COUCH_BUFFER_SIZE`.
* `requestTimeout`: see `COUCH_REQUEST_TIMEOUT`.
* `log`: see `COUCH_LOG`.
* `resume`: see `COUCH_RESUME`.
* `mode`: see `COUCH_MODE`.
* `iamApiKey`: see `CLOUDANT_IAM_API_KEY`.
* `iamTokenUrl`: may be used with `iamApiKey` to override the default URL for
 retrieving IAM tokens.

The callback has the standard `err, data` parameters and is called when
the backup completes or fails.

The `backup` function returns an event emitter. You can subscribe to:

* `changes` - when a batch of changes has been written to log stream.
* `written` - when a batch of documents has been written to backup stream.
* `finished` - emitted once when all documents are backed up.

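Subscribing to these events might look like the sketch below, which simply counts how often each one fires. The helper `countBackupEvents` is hypothetical, and a plain Node.js `EventEmitter` stands in for the value returned by `backup` so that the snippet runs on its own. Event payloads are not inspected because their shape is not described here.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: attaches a counter to each of the three backup
// events listed above. Works with any EventEmitter.
function countBackupEvents(emitter) {
  const counts = { changes: 0, written: 0, finished: 0 };
  for (const name of Object.keys(counts)) {
    emitter.on(name, () => { counts[name] += 1; });
  }
  return counts;
}

// Stand-in emitter for demonstration only; in real use, pass the
// emitter returned by couchbackup.backup(...).
const fake = new EventEmitter();
const counts = countBackupEvents(fake);
fake.emit('changes');
fake.emit('written');
fake.emit('written');
fake.emit('finished');
console.log(counts); // { changes: 1, written: 2, finished: 1 }
```
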
Backup data to a stream:

```javascript
couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  process.stdout,
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```

Or to a file:

```javascript
const fs = require('fs');

// `filename` is the path of the backup file to write.
couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  fs.createWriteStream(filename),
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```

### Restore

The `restore` function takes a readable stream containing the data emitted
by the `backup` function and uploads that to a Cloudant database.

_Note:_ A target database must be a **new and empty** database.

```javascript
restore: function(srcStream, targetUrl, opts, callback) { /* ... */ }
```

The `opts` dictionary can contain values which map to a subset of the
environment variables defined above. Those related to the source and
target locations are not required.

* `parallelism`: see `COUCH_PARALLELISM`.
* `bufferSize`: see `COUCH_BUFFER_SIZE`.
* `requestTimeout`: see `COUCH_REQUEST_TIMEOUT`.
* `iamApiKey`: see `CLOUDANT_IAM_API_KEY`.
* `iamTokenUrl`: may be used with `iamApiKey` to override the default URL for
 retrieving IAM tokens.

The callback has the standard `err, data` parameters and is called when
the restore completes or fails.

The `restore` function returns an event emitter. You can subscribe to:

* `restored` - when a batch of documents is restored.
* `finished` - emitted once when all documents are restored.

The backup file (or `srcStream`) contains lists of document
revisions, where each list is separated by a newline. The list length is
dictated by the `bufferSize` parameter used during the backup.

It's possible a list could be corrupt due to failures in the backup process. A
`BackupFileJsonError` is emitted for each corrupt list found. _These can only be
ignored if the backup that generated the stream did complete successfully_. This
ensures that corrupt lists also have a valid counterpart within the stream.

Restore data from a stream:

```javascript
couchbackup.restore(
  process.stdin,
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```

Or from a file:

```javascript
const fs = require('fs');

// `filename` is the path of the backup file to read.
couchbackup.restore(
  fs.createReadStream(filename),
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```

## Error Handling

The `couchbackup` and `couchrestore` processes are designed to be relatively robust over an unreliable network. Work is batched and any failed requests are retried indefinitely. However, certain aspects of the execution will not tolerate failure:
- Spooling changes from the database changes feed. A failure in the changes request during the backup process will result in process termination.
- Validating the existence of a target database during the database restore process.

### API

When using the library programmatically an `Error` will be passed in one of two ways:
* For fatal errors the callback will be called with `null, error` arguments
* For non-fatal errors an `error` event will be emitted

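Non-fatal errors could be collected with a listener along the lines of the sketch below, which tallies `error` events by the error's `name`. The helper `tallyErrors` is hypothetical, and a plain `EventEmitter` with a fabricated error stands in for a real restore so that the snippet runs on its own.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: tallies non-fatal errors by name from any emitter
// that raises `error` events, such as the one returned by backup or restore.
function tallyErrors(emitter) {
  const tally = {};
  emitter.on('error', (err) => {
    const name = (err && err.name) || 'Error';
    tally[name] = (tally[name] || 0) + 1;
  });
  return tally;
}

// Stand-in emitter and a fabricated error, for demonstration only.
const fakeRestore = new EventEmitter();
const tally = tallyErrors(fakeRestore);
const sampleError = new Error('corrupt list');
sampleError.name = 'BackupFileJsonError';
fakeRestore.emit('error', sampleError);
console.log(tally); // { BackupFileJsonError: 1 }
```

Attaching an `error` listener also matters for a general Node.js reason: an `EventEmitter` that emits `error` with no listener registered throws the error.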
### CLI Exit Codes

On fatal errors, `couchbackup` and `couchrestore` will exit with non-zero exit codes. This section
details them.

### common to both `couchbackup` and `couchrestore`

* `1`: unknown CLI option or generic error.
* `2`: invalid CLI option.
* `10`: backup source or restore target database does not exist.
* `11`: unauthorized credentials for the database.
* `12`: incorrect permissions for the database.
* `40`: database returned a fatal HTTP error.

### `couchbackup`

* `20`: resume was specified without a log file.
* `21`: the resume log file does not exist.
* `22`: incomplete changes in log file.
* `30`: error spooling changes from the database.
* `50`: source database does not support `/_bulk_get` endpoint.

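A wrapper script can turn these codes into readable messages. The lookup below is a minimal sketch that copies the codes listed above; the names `EXIT_CODES` and `describeExit` are hypothetical, and treating `0` as success follows the usual process convention rather than anything stated in this section.

```javascript
// Exit codes copied from the lists above (common codes plus couchbackup codes).
const EXIT_CODES = {
  1: 'unknown CLI option or generic error',
  2: 'invalid CLI option',
  10: 'backup source or restore target database does not exist',
  11: 'unauthorized credentials for the database',
  12: 'incorrect permissions for the database',
  20: 'resume was specified without a log file',
  21: 'the resume log file does not exist',
  22: 'incomplete changes in log file',
  30: 'error spooling changes from the database',
  40: 'database returned a fatal HTTP error',
  50: 'source database does not support /_bulk_get endpoint'
};

// Hypothetical helper: describes an exit code for log output.
function describeExit(code) {
  if (code === 0) return 'success';
  return EXIT_CODES[code] || `unrecognized exit code ${code}`;
}

console.log(describeExit(11)); // unauthorized credentials for the database
```

The code to pass in could come, for example, from the `status` field of the result of Node's `child_process.spawnSync`.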
### `couchrestore`

## Note on attachments

TL;DR: If you back up a database that contains attachments you will not be able to restore it.

As documented above couchbackup does not support backing up or restoring databases containing documents with attachments.
Attempting to back up a database that includes documents with attachments will appear to succeed. However, the attachment
content will not have been downloaded and the backup file will contain attachment metadata. Consequently any attempt to
restore the backup will result in errors because the attachment metadata will reference attachments that are not present
in the restored database.

It is recommended to store attachments directly in an object store with a link in the JSON document instead of using the
native attachment API.
435
436It is recommended to store attachments directly in an object store with a link in the JSON document instead of using the
437native attachment API.