# CouchBackup

[![npm (scoped)](https://img.shields.io/npm/v/@cloudant/couchbackup.svg?colorB=0000ff)](https://www.npmjs.com/package/@cloudant/couchbackup)
[![npm (scoped with tag)](https://img.shields.io/npm/v/@cloudant/couchbackup/snapshot.svg?colorB=666699)](https://www.npmjs.com/package/@cloudant/couchbackup)
[![Build Status](https://travis-ci.org/cloudant/couchbackup.svg?branch=master)](https://travis-ci.org/cloudant/couchbackup)
[![Greenkeeper badge](https://badges.greenkeeper.io/cloudant/couchbackup.svg)](https://greenkeeper.io/)

```
 _____                  _    ______            _
/  __ \                | |   | ___ \          | |
| /  \/ ___  _   _  ___| |__ | |_/ / __ _  ___| | ___   _ _ __
| |    / _ \| | | |/ __| '_ \| ___ \/ _` |/ __| |/ / | | | '_ \
| \__/\ (_) | |_| | (__| | | | |_/ / (_| | (__|   <| |_| | |_) |
 \____/\___/ \__,_|\___|_| |_\____/ \__,_|\___|_|\_\\__,_| .__/
                                                         | |
                                                         |_|
```

CouchBackup is a command-line utility that backs up a Cloudant or CouchDB database to a text file.
It comes with a companion command-line utility that can restore the backed-up data.

## Limitations

CouchBackup has some restrictions on the data it is able to back up:

* **couchbackup does not do CouchDB replication as such; it simply streams through a database's `_changes` feed, uses `POST /db/_bulk_get` to fetch the documents, and stores the documents it finds on disk.**
* **couchbackup does not support backing up or restoring databases containing documents with attachments. It is recommended to store attachments directly in an object store. DO NOT USE THIS TOOL FOR DATABASES CONTAINING ATTACHMENTS.** [Note](#note-on-attachments)

## Installation

To install the latest released version, use npm:

```sh
npm install -g @cloudant/couchbackup
```

### Requirements
* The minimum required Node.js version is 12.
* The minimum required CouchDB version is 2.0.0.

### Snapshots

The latest builds of master are published to npm with the `snapshot` tag. Use the `snapshot` tag if you want to experiment with an unreleased fix or new function, but please note that snapshot versions are **unsupported**.

## Usage

Either environment variables or command-line options can be used to specify the URL of the CouchDB or Cloudant instance, and the database to work with.

### The URL

To define the URL of the CouchDB instance, set the `COUCH_URL` environment variable:

```sh
export COUCH_URL=http://localhost:5984
```
or

```sh
export COUCH_URL=https://myusername:mypassword@myhost.cloudant.com
```

Alternatively, we can use the `--url` command-line parameter.

When passing credentials in the user information subcomponent of the URL,
they must be [percent encoded](https://tools.ietf.org/html/rfc3986#section-3.2.1).
Specifically, within either the username or password, the characters `: / ? # [ ] @ %`
_MUST_ be percent encoded; other characters _MAY_ be percent encoded.

For example, for the username `user123` and password `colon:at@321`:
```
https://user123:colon%3aat%40321@localhost:5984
```

Note that additional care must be taken to escape shell reserved characters when
setting the environment variable or command-line parameter.
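
A script that assembles the URL itself can use JavaScript's built-in `encodeURIComponent`. It escapes every character in the `: / ? # [ ] @ %` set above, along with some others, which the _MAY_ clause permits. The sketch below is a hypothetical helper, `buildCouchUrl`, and is not part of CouchBackup. It assumes the base URL has no credentials, query or fragment of its own. It emits uppercase hex digits (`%3A`), which RFC 3986 treats as equivalent to the lowercase `%3a` shown above.

```javascript
// Hypothetical helper, not part of the CouchBackup API.
// Builds scheme://user:password@host[:port][/path] with the credentials
// percent-encoded so they cannot be confused with URL delimiters.
function buildCouchUrl(base, username, password) {
  const url = new URL(base);
  // encodeURIComponent escapes : / ? # [ ] @ % (and other characters too).
  const userinfo = encodeURIComponent(username) + ':' + encodeURIComponent(password);
  // URL reports an empty path as "/", so drop it to keep the form shown above.
  const path = url.pathname === '/' ? '' : url.pathname;
  return `${url.protocol}//${userinfo}@${url.host}${path}`;
}

console.log(buildCouchUrl('https://localhost:5984', 'user123', 'colon:at@321'));
// prints https://user123:colon%3Aat%40321@localhost:5984
```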

### The database name

To define the name of the database to back up or restore, set the `COUCH_DATABASE` environment variable:

```sh
export COUCH_DATABASE=animaldb
```

Alternatively, we can use the `--db` command-line parameter.

## Backup

With both `COUCH_URL` and `COUCH_DATABASE` set, back up a database by running the `couchbackup` command and directing the output to a text file:

```sh
couchbackup > backup.txt
```

Another way of backing up is to set only the `COUCH_URL` environment variable and supply the database name on the command line:

```sh
couchbackup --db animaldb > animaldb.txt
```

## Logging & resuming backups

You may also create a log file that records the progress of the backup with the `--log` parameter, e.g.

```sh
couchbackup --db animaldb --log animaldb.log > animaldb.txt
```

This log file can be used to resume a backup from where you left off with `--resume true`. Note the `>>`, which appends to the existing output file rather than overwriting it:

```sh
couchbackup --db animaldb --log animaldb.log --resume true >> animaldb.txt
```

The `--resume true` option works for a backup that has finished spooling changes but has not yet finished downloading all the necessary batches of documents. It does _not_ provide an incremental backup solution.

You may also specify the name of the output file, rather than directing the backup data to *stdout*:

```sh
couchbackup --db animaldb --log animaldb.log --resume true --output animaldb.txt
```

## Restore

Now that we have our backup text file, we can restore it to a new, empty, existing database using `couchrestore`:

```sh
cat animaldb.txt | couchrestore
```

or, specifying the database name on the command line:

```sh
cat animaldb.txt | couchrestore --db animaldb2
```

## Compressed backups

If we want to compress the backup data before storing it on disk, we can pipe the contents through `gzip`:

```sh
couchbackup --db animaldb | gzip > animaldb.txt.gz
```

and restore the file with:

```sh
cat animaldb.txt.gz | gunzip | couchrestore --db animaldb2
```

## Encrypted backups

As with compression, it is possible to pipe the backup content through an
encryption or decryption utility. For example, with `openssl`:

```sh
couchbackup --db animaldb | openssl aes-128-cbc -pass pass:12345 > encrypted_animal.db
```

```sh
openssl aes-128-cbc -d -in encrypted_animal.db -pass pass:12345 | couchrestore --db animaldb2
```

Note that the content is unencrypted while the backup tool is processing it,
before it is piped to the encryption utility.

## What's in a backup file?

A backup file is a text file in which each line contains a JSON-encoded array of up to `buffer-size` objects, e.g.

```js
[{"a":1},{"a":2}...]
[{"a":501},{"a":502}...]
```
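
Each line is an independent JSON array, so a backup file can be inspected line by line without any CouchBackup code. The sketch below is a hypothetical helper, `readBackupLines`, and is not part of the library. It parses backup data that has already been loaded into a string and flattens the batches into one array of documents. It holds the whole file in memory, so for large backups a line-by-line reader such as Node's `readline` module would be a better fit. `JSON.parse` throws on a corrupt line; see the note on `BackupFileJsonError` in the Restore section.

```javascript
// Hypothetical helper, not part of CouchBackup: flatten backup data
// (one JSON array per line) into a single array of documents.
function readBackupLines(text) {
  const docs = [];
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // skip blank lines, e.g. the final newline
    const batch = JSON.parse(line);   // one line = one batch of documents
    docs.push(...batch);
  }
  return docs;
}

const sample = '[{"a":1},{"a":2}]\n[{"a":501},{"a":502}]\n';
console.log(readBackupLines(sample).length); // 4
```

To inspect a file on disk, pass it the result of `fs.readFileSync('animaldb.txt', 'utf8')`.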

## What's in a log file?

A log file contains a line:

- for every batch of document IDs that need to be fetched, e.g. `:t batch56 [{"id":"a"},{"id":"b"}]`
- for every batch that has been fetched and stored, e.g. `:d batch56`
- to indicate that the changes feed was fully consumed, e.g. `:changes_complete`
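
These three line types are enough to tell whether a backup can be resumed. The changes feed must have been fully consumed, and any `:t` batch without a matching `:d` line still needs to be fetched. The hypothetical `summariseLog` helper below sketches that bookkeeping. It illustrates the format described above and is not how couchbackup itself reads its log.

```javascript
// Hypothetical helper, not part of CouchBackup: summarise a backup log.
function summariseLog(text) {
  const pending = new Map(); // batch name -> array of {id} objects still to fetch
  let changesComplete = false;
  for (const rawLine of text.split('\n')) {
    const line = rawLine.trim();
    if (line.startsWith(':t ')) {
      // ":t <batch> <JSON array of ids>": a batch waiting to be fetched
      const rest = line.slice(3);
      const space = rest.indexOf(' ');
      if (space === -1) continue; // malformed line, skip it
      pending.set(rest.slice(0, space), JSON.parse(rest.slice(space + 1)));
    } else if (line.startsWith(':d ')) {
      // ":d <batch>": the batch has been fetched and stored
      pending.delete(line.slice(3));
    } else if (line === ':changes_complete') {
      changesComplete = true;
    }
  }
  return { changesComplete, pendingBatches: [...pending.keys()] };
}

const log = [
  ':t batch0 [{"id":"a"},{"id":"b"}]',
  ':t batch1 [{"id":"c"}]',
  ':changes_complete',
  ':d batch0',
].join('\n');
console.log(summariseLog(log));
// { changesComplete: true, pendingBatches: [ 'batch1' ] }
```

If `changesComplete` is `false`, the log describes a backup that had not finished spooling changes. Resuming from such a log is expected to fail (compare exit code `22`, "incomplete changes in log file").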

## What is shallow mode?

When you run `couchbackup` with `--mode shallow`, a simpler backup is performed that backs up only the winning revisions
of the database. No revision tokens are saved and any conflicting revisions are ignored. This is a faster but less
complete backup. Shallow backups cannot be resumed because they do not produce a log file.

NOTE: Parallelism will not be in effect if `--mode shallow` is defined.

## Why use CouchBackup?

The easiest way to back up a CouchDB database is to copy the ".couch" file. This is fine on a single-node instance, but when running multi-node
Cloudant or CouchDB 2.0 or greater, each ".couch" file only contains a single shard of data. This utility allows simple backups of CouchDB
or Cloudant databases using the HTTP API.

This tool can be used to script the backup of your databases. Move the backup and log files to cheap object storage so that you have multiple copies of your precious data.

## Options reference

### Environment variables

* `COUCH_URL` - the URL of the CouchDB/Cloudant server, e.g. `http://127.0.0.1:5984`
* `COUCH_DATABASE` - the name of the database to act upon, e.g. `mydb` (default `test`)
* `COUCH_PARALLELISM` - the number of HTTP requests to perform in parallel when performing a backup or restore, e.g. `10` (default `5`)
* `COUCH_BUFFER_SIZE` - the number of documents fetched and restored at once, e.g. `100` (default `500`). When using CouchBackup with [Cloudant on Transaction Engine](https://www.ibm.com/cloud/blog/announcements/ibm-cloudant-on-transaction-engine), `COUCH_BUFFER_SIZE` must be less than `2000` to avoid bad request errors.
* `COUCH_REQUEST_TIMEOUT` - the number of milliseconds to wait for a response to an HTTP request before retrying the request, e.g. `10000` (default `120000`)
* `COUCH_LOG` - the file to store logging information during backup
* `COUCH_RESUME` - if `true`, resumes a previous backup from its last known position
* `COUCH_OUTPUT` - the file name to store the backup data (defaults to stdout)
* `COUCH_MODE` - if `shallow`, only a superficial backup is done, ignoring conflicts and revision tokens. Defaults to `full` (a full backup).
* `COUCH_QUIET` - if `true`, suppresses the individual batch messages to the console during CLI backup and restore
* `CLOUDANT_IAM_API_KEY` - optional [IAM API key](https://console.bluemix.net/docs/services/Cloudant/guides/iam.html#ibm-cloud-identity-and-access-management)
  to use to access the Cloudant database instead of user information credentials in the URL. The endpoint used to retrieve the token defaults to
  `https://iam.cloud.ibm.com/identity/token`, but can be overridden if necessary using the `CLOUDANT_IAM_TOKEN_URL` environment variable.
* `DEBUG` - if set to `couchbackup`, all debug messages will be sent to `stderr` during a backup or restore process

_Note:_ These environment variables can only be used with the CLI. When
[using programmatically](#using-programmatically), the `opts` dictionary must be
used.
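
A script that should honour the same environment variables as the CLI has to translate them into `opts` itself. The hypothetical `optsFromEnv` helper below sketches that mapping for the `opts` keys listed under Using programmatically. `COUCH_URL`, `COUCH_DATABASE` and `COUCH_OUTPUT` are left out because the URL and the stream are separate arguments of `backup` and `restore`. The sketch assumes the numeric options take numbers and `resume` takes a boolean. Unset variables are omitted so that the library's own defaults apply.

```javascript
// Hypothetical helper, not part of CouchBackup: build an `opts` object for
// backup()/restore() from CLI-style environment variables.
const NUMERIC_OPTS = {
  COUCH_PARALLELISM: 'parallelism',
  COUCH_BUFFER_SIZE: 'bufferSize',
  COUCH_REQUEST_TIMEOUT: 'requestTimeout',
};
const STRING_OPTS = {
  COUCH_LOG: 'log',
  COUCH_MODE: 'mode',
  CLOUDANT_IAM_API_KEY: 'iamApiKey',
  CLOUDANT_IAM_TOKEN_URL: 'iamTokenUrl',
};

function optsFromEnv(env = process.env) {
  const opts = {};
  for (const [name, key] of Object.entries(NUMERIC_OPTS)) {
    if (env[name] !== undefined) opts[key] = Number.parseInt(env[name], 10);
  }
  for (const [name, key] of Object.entries(STRING_OPTS)) {
    if (env[name] !== undefined) opts[key] = env[name];
  }
  // Assumption: the library expects a boolean for `resume`.
  if (env.COUCH_RESUME !== undefined) opts.resume = env.COUCH_RESUME === 'true';
  return opts;
}

// e.g. couchbackup.backup(srcUrl, process.stdout, optsFromEnv(), callback);
```

`log`, `resume` and `mode` appear only in the `backup` options list, so drop them when building options for `restore`.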

### Command-line parameters

* `--url` - same as `COUCH_URL` environment variable
* `--db` - same as `COUCH_DATABASE`
* `--parallelism` - same as `COUCH_PARALLELISM`
* `--buffer-size` - same as `COUCH_BUFFER_SIZE`
* `--request-timeout` - same as `COUCH_REQUEST_TIMEOUT`
* `--log` - same as `COUCH_LOG`
* `--resume` - same as `COUCH_RESUME`
* `--output` - same as `COUCH_OUTPUT`
* `--mode` - same as `COUCH_MODE`
* `--iam-api-key` - same as `CLOUDANT_IAM_API_KEY`
* `--quiet` - same as `COUCH_QUIET`

## Using programmatically

You can use `couchbackup` programmatically. First, install
`couchbackup` into your project with `npm install --save @cloudant/couchbackup`.
Then you can import the library into your code:

```js
const couchbackup = require('@cloudant/couchbackup');
```

The library exports two main functions:

1. `backup` - back up from a database to a writable stream.
2. `restore` - restore from a readable stream to an empty database.

### Examples

See [the examples folder](./examples) for example scripts showing how to
use the library.

### Backup

The `backup` function takes a source database URL, a stream to write to,
backup options and a callback for completion.

```javascript
backup: function(srcUrl, targetStream, opts, callback) { /* ... */ }
```

The `opts` dictionary can contain values that map to a subset of the
environment variables defined above. Those related to the source and
target locations are not required.

* `parallelism`: see `COUCH_PARALLELISM`.
* `bufferSize`: see `COUCH_BUFFER_SIZE`.
* `requestTimeout`: see `COUCH_REQUEST_TIMEOUT`.
* `log`: see `COUCH_LOG`.
* `resume`: see `COUCH_RESUME`.
* `mode`: see `COUCH_MODE`.
* `iamApiKey`: see `CLOUDANT_IAM_API_KEY`.
* `iamTokenUrl`: may be used with `iamApiKey` to override the default URL for
  retrieving IAM tokens.

The callback has the standard `err, data` parameters and is called when
the backup completes or fails.

The `backup` function returns an event emitter. You can subscribe to:

* `changes` - when a batch of changes has been written to the log stream.
* `written` - when a batch of documents has been written to the backup stream.
* `finished` - emitted once when all documents are backed up.

Back up data to a stream:

```javascript
couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  process.stdout,
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success!", data);
    }
  });
```

Or to a file:

```javascript
const fs = require('fs');
const filename = 'animaldb.txt'; // example output path

couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  fs.createWriteStream(filename),
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success!", data);
    }
  });
```
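
If the rest of your code uses promises, the callback and the `written` event can be combined in a small wrapper. The sketch below is a hypothetical helper, `backupAsync`, and is not part of the library. It takes the module as a parameter so it can be exercised with a stub. It assumes only the `backup(srcUrl, targetStream, opts, callback)` signature and the event names documented above, and does not rely on the contents of any event payload.

```javascript
// Hypothetical helper, not part of CouchBackup: run backup() as a Promise
// and count `written` events. `couchbackup` is normally
// require('@cloudant/couchbackup'); passing it in also allows a test stub.
function backupAsync(couchbackup, srcUrl, targetStream, opts = {}) {
  return new Promise((resolve, reject) => {
    let batchesWritten = 0;
    const emitter = couchbackup.backup(srcUrl, targetStream, opts, (err, data) => {
      if (err) {
        reject(err); // fatal error: the backup did not complete
      } else {
        resolve({ data, batchesWritten });
      }
    });
    // `written` is emitted for each batch of documents written to targetStream.
    emitter.on('written', () => {
      batchesWritten += 1;
    });
    // Without an `error` listener, Node.js throws when `error` is emitted.
    emitter.on('error', (e) => console.error('Non-fatal error:', e.message));
  });
}

// Example usage (needs a reachable database):
// const couchbackup = require('@cloudant/couchbackup');
// const fs = require('fs');
// backupAsync(couchbackup, 'https://examples.cloudant.com/animaldb',
//     fs.createWriteStream('animaldb.txt'), { parallelism: 2 })
//   .then(({ batchesWritten }) => console.error(`Done, ${batchesWritten} batches`))
//   .catch((err) => console.error('Failed! ' + err));
```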

### Restore

The `restore` function takes a readable stream containing the data emitted
by the `backup` function and uploads it to a CouchDB or Cloudant database.

_Note:_ The target database must be a **new and empty** database.

```javascript
restore: function(srcStream, targetUrl, opts, callback) { /* ... */ }
```

The `opts` dictionary can contain values that map to a subset of the
environment variables defined above. Those related to the source and
target locations are not required.

* `parallelism`: see `COUCH_PARALLELISM`.
* `bufferSize`: see `COUCH_BUFFER_SIZE`.
* `requestTimeout`: see `COUCH_REQUEST_TIMEOUT`.
* `iamApiKey`: see `CLOUDANT_IAM_API_KEY`.
* `iamTokenUrl`: may be used with `iamApiKey` to override the default URL for
  retrieving IAM tokens.

The callback has the standard `err, data` parameters and is called when
the restore completes or fails.

The `restore` function returns an event emitter. You can subscribe to:

* `restored` - when a batch of documents is restored.
* `finished` - emitted once when all documents are restored.

The backup file (or `srcStream`) contains lists of document revisions,
one list per line. The maximum length of each list is set by the
`bufferSize` parameter used during the backup.

A list could be corrupt because of failures in the backup process. A
`BackupFileJsonError` is emitted for each corrupt list found. _These errors can only be
ignored if the backup that generated the stream completed successfully_, because that
ensures every corrupt list also has a valid counterpart within the stream.

Restore data from a stream:

```javascript
couchbackup.restore(
  process.stdin,
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success!", data);
    }
  });
```

Or from a file:

```javascript
const fs = require('fs');
const filename = 'animaldb.txt'; // example input path

couchbackup.restore(
  fs.createReadStream(filename),
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success!", data);
    }
  });
```

## Error Handling

The `couchbackup` and `couchrestore` processes are designed to be relatively robust over an unreliable network. Work is batched and any failed requests are retried indefinitely. However, certain aspects of the execution will not tolerate failure:
- Spooling changes from the database changes feed. A failure in the changes request during the backup process will result in process termination.
- Validating the existence of a target database during the database restore process.

### API

When using the library programmatically, an `Error` will be passed in one of two ways:
* For fatal errors, the callback will be called with the error as its first (`err`) argument.
* For non-fatal errors, an `error` event will be emitted.
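
To handle the second path, register an `error` listener on the emitter returned by `backup` or `restore`. A Node.js `EventEmitter` that emits `error` with no listener throws, which would end the process. The hypothetical `watchNonFatalErrors` helper below tallies non-fatal errors by name. It assumes the error type, such as the `BackupFileJsonError` described under Restore, is exposed as `err.name`; check this against the version you use. The demonstration uses a plain `EventEmitter` as a stand-in for the library's emitter.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper, not part of CouchBackup: count non-fatal errors by
// name instead of letting an unhandled `error` event crash the process.
function watchNonFatalErrors(emitter) {
  const counts = {};
  emitter.on('error', (err) => {
    // Assumption: the error type (e.g. 'BackupFileJsonError') is in err.name.
    const name = (err && err.name) || 'Error';
    counts[name] = (counts[name] || 0) + 1;
    console.error(`Non-fatal ${name}: ${err && err.message}`);
  });
  return counts;
}

// Stand-in for: const emitter = couchbackup.restore(srcStream, targetUrl, opts, callback);
const demo = new EventEmitter();
const counts = watchNonFatalErrors(demo);
const corrupt = new Error('could not parse a line of the backup file');
corrupt.name = 'BackupFileJsonError';
demo.emit('error', corrupt);
console.log(counts); // { BackupFileJsonError: 1 }
```

Fatal errors still arrive through the callback. A non-zero `BackupFileJsonError` count is only safe to ignore if the backup that produced the stream completed successfully.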

### CLI Exit Codes

On fatal errors, `couchbackup` and `couchrestore` will exit with non-zero exit codes. This section
details them.

#### Common to both `couchbackup` and `couchrestore`

* `1`: unknown CLI option or generic error.
* `2`: invalid CLI option.
* `10`: backup source or restore target database does not exist.
* `11`: unauthorized credentials for the database.
* `12`: incorrect permissions for the database.
* `40`: database returned a fatal HTTP error.

#### `couchbackup`

* `20`: resume was specified without a log file.
* `21`: the resume log file does not exist.
* `22`: incomplete changes in log file.
* `30`: error spooling changes from the database.
* `50`: source database does not support `/_bulk_get` endpoint.

#### `couchrestore`

* `13`: restore target database is not new and empty.
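
When the CLI is driven from a script, the codes above can be turned back into messages. The sketch below copies the tables into a lookup. `describeExitCode` and `runCli` are hypothetical helpers, not part of CouchBackup. `runCli` uses Node's `child_process.spawnSync` and assumes `couchbackup`/`couchrestore` are on the `PATH`. On Windows, the npm-installed `.cmd` shims may additionally need `shell: true`.

```javascript
const { spawnSync } = require('child_process');

// Exit codes copied from the lists above.
const COMMON_CODES = {
  1: 'unknown CLI option or generic error',
  2: 'invalid CLI option',
  10: 'backup source or restore target database does not exist',
  11: 'unauthorized credentials for the database',
  12: 'incorrect permissions for the database',
  40: 'database returned a fatal HTTP error',
};
const TOOL_CODES = {
  couchbackup: {
    20: 'resume was specified without a log file',
    21: 'the resume log file does not exist',
    22: 'incomplete changes in log file',
    30: 'error spooling changes from the database',
    50: 'source database does not support /_bulk_get endpoint',
  },
  couchrestore: {
    13: 'restore target database is not new and empty',
  },
};

// Hypothetical helper: turn an exit status into a readable message.
function describeExitCode(tool, code) {
  if (code === 0) return 'success';
  if (code === null) return 'terminated by a signal';
  const specific = TOOL_CODES[tool] || {};
  return specific[code] || COMMON_CODES[code] || `unrecognised exit code ${code}`;
}

// Hypothetical helper: run the CLI, stream its output, report the outcome.
function runCli(tool, args) {
  const result = spawnSync(tool, args, { stdio: 'inherit' });
  if (result.error) throw result.error; // e.g. the command was not found
  console.error(`${tool}: ${describeExitCode(tool, result.status)}`);
  return result.status;
}

// runCli('couchbackup', ['--db', 'animaldb', '--output', 'animaldb.txt']);
```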

## Note on attachments

TL;DR: If you back up a database that contains attachments, you will not be able to restore it.

As documented above, couchbackup does not support backing up or restoring databases containing documents with attachments.
Attempting to back up a database that includes documents with attachments will appear to succeed. However, the attachment
content will not have been downloaded and the backup file will contain attachment metadata. Consequently, any attempt to
restore the backup will result in errors because the attachment metadata will reference attachments that are not present
in the restored database.

It is recommended to store attachments directly in an object store with a link in the JSON document instead of using the
native attachment API.