# CouchBackup

[![npm (scoped)](https://img.shields.io/npm/v/@cloudant/couchbackup.svg?colorB=0000ff)](https://www.npmjs.com/package/@cloudant/couchbackup)
[![npm (scoped with tag)](https://img.shields.io/npm/v/@cloudant/couchbackup/snapshot.svg?colorB=666699)](https://www.npmjs.com/package/@cloudant/couchbackup)
[![Build Status](https://travis-ci.org/cloudant/couchbackup.svg?branch=master)](https://travis-ci.org/cloudant/couchbackup)
[![Greenkeeper badge](https://badges.greenkeeper.io/cloudant/couchbackup.svg)](https://greenkeeper.io/)

```
 _____                  _    ______            _
/  __ \                | |   | ___ \          | |
| /  \/ ___  _   _  ___| |__ | |_/ / __ _  ___| | ___   _ _ __
| |    / _ \| | | |/ __| '_ \| ___ \/ _` |/ __| |/ / | | | '_ \
| \__/\ (_) | |_| | (__| | | | |_/ / (_| | (__|   <| |_| | |_) |
 \____/\___/ \__,_|\___|_| |_\____/ \__,_|\___|_|\_\\__,_| .__/
                                                         | |
                                                         |_|
```

CouchBackup is a command-line utility that allows a Cloudant or CouchDB database to be backed up to a text file.
It comes with a companion command-line utility that can restore the backed-up data.

**N.B.**

* **couchbackup does not do CouchDB replication as such; it simply streams through a database's `_changes` feed and uses `POST /db/_bulk_get` to fetch the documents, storing the documents it finds on disk.**
* **couchbackup does not support backing up or restoring databases containing documents with attachments. It is recommended to store attachments directly in an object store. DO NOT USE THIS TOOL FOR DATABASES CONTAINING ATTACHMENTS.** [Note](#note-on-attachments)

## Installation

To install the latest released version use npm:

```sh
npm install -g @cloudant/couchbackup
```

### Requirements

* The minimum required Node.js version is 4.8.2.
* The minimum required CouchDB version is 2.0.0.

### Snapshots

The latest builds of master are published to npm with the `snapshot` tag. Use the `snapshot` tag if you want to experiment with an unreleased fix or new function, but please note that snapshot versions are **unsupported**.

## Usage

Either environment variables or command-line options can be used to specify the URL of the CouchDB or Cloudant instance, and the database to work with.

### The URL

To define the URL of the CouchDB instance set the `COUCH_URL` environment variable:

```sh
export COUCH_URL=http://localhost:5984
```

or

```sh
export COUCH_URL=https://myusername:mypassword@myhost.cloudant.com
```

Alternatively, we can use the `--url` command-line parameter.

### The Database name

To define the name of the database to back up or restore, set the `COUCH_DATABASE` environment variable:

```sh
export COUCH_DATABASE=animaldb
```

Alternatively, we can use the `--db` command-line parameter.

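The programmatic examples later in this document pass a single URL that already includes the database name. The following is a minimal sketch of combining the two settings above into such a URL, assuming the database name is simply appended as a path segment. `databaseUrl` is a hypothetical helper for illustration, not part of couchbackup.

```javascript
// Hypothetical helper: joins a server URL (as in COUCH_URL) and a database
// name (as in COUCH_DATABASE). Assumes the name is a single path segment.
function databaseUrl(serverUrl, dbName) {
  const base = serverUrl.replace(/\/+$/, ''); // drop any trailing slashes
  return base + '/' + encodeURIComponent(dbName);
}

console.log(databaseUrl('http://localhost:5984/', 'animaldb'));
// http://localhost:5984/animaldb
```
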
## Backup

To back up a database to a text file, use the `couchbackup` command, directing the output to a text file:

```sh
couchbackup > backup.txt
```

Another way of backing up is to set the `COUCH_URL` environment variable only and supply the database name on the command-line:

```sh
couchbackup --db animaldb > animaldb.txt
```

## Logging & resuming backups

You may also create a log file which records the progress of the backup with the `--log` parameter, e.g.

```sh
couchbackup --db animaldb --log animaldb.log > animaldb.txt
```

This log file can be used to resume backups from where you left off with `--resume true`:

```sh
couchbackup --db animaldb --log animaldb.log --resume true >> animaldb.txt
```

You may also specify the name of the output file, rather than directing the backup data to *stdout*:

```sh
couchbackup --db animaldb --log animaldb.log --resume true --output animaldb.txt
```

## Restore

Now that we have our backup text file, we can restore it to an existing database using the `couchrestore` command:

```sh
cat animaldb.txt | couchrestore
```

or specifying the database name on the command-line:

```sh
cat animaldb.txt | couchrestore --db animaldb2
```

## Compressed backups

If we want to compress the backup data before storing to disk, we can pipe the contents through `gzip`:

```sh
couchbackup --db animaldb | gzip > animaldb.txt.gz
```

and restore the file with:

```sh
cat animaldb.txt.gz | gunzip | couchrestore --db animaldb2
```

## Encrypted backups

As with compression, it is possible to pipe the backup content through an
encryption or decryption utility. For example with `openssl`:

```sh
couchbackup --db animaldb | openssl aes-128-cbc -pass pass:12345 > encrypted_animal.db
```

```sh
openssl aes-128-cbc -d -in encrypted_animal.db -pass pass:12345 | couchrestore --db animaldb2
```

Note that the content is unencrypted while it is being processed by the
backup tool before it is piped to the encryption utility.

## What's in a backup file?

A backup file is a text file where each line contains a JSON encoded array of up to `buffer-size` objects e.g.

```js
[{"a":1},{"a":2}...]
[{"a":501},{"a":502}...]
```

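Because every line is an independent JSON array, a backup can be inspected one line at a time. The following is a minimal sketch, not part of couchbackup, that counts the batches and documents in backup text. `summarizeBackup` is a hypothetical name and the sample documents are invented.

```javascript
// Hypothetical helper: counts batches (lines) and documents in backup text.
function summarizeBackup(text) {
  let batches = 0;
  let docs = 0;
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // ignore blank lines, e.g. a trailing newline
    const batch = JSON.parse(line);   // throws if a line is not valid JSON
    batches += 1;
    docs += batch.length;
  }
  return { batches, docs };
}

const sample = '[{"a":1},{"a":2}]\n[{"a":3}]\n';
console.log(summarizeBackup(sample)); // { batches: 2, docs: 3 }
```
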
## What's in a log file?

A log file contains a line:

- for every batch of document ids that need to be fetched e.g. `:t batch56 [{"id":"a"},{"id":"b"}]`
- for every batch that has been fetched and stored e.g. `:d batch56`
- to indicate that the changes feed was fully consumed e.g. `:changes_complete`

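Taken together, these lines record which batches still have to be fetched, which is the information a resumed backup needs. The sketch below assumes the line formats are exactly as illustrated above; it is not couchbackup's own parser, and `pendingBatches` is a hypothetical helper.

```javascript
// Hypothetical helper: reads log text and reports batches that were
// scheduled (":t") but never marked as done (":d").
function pendingBatches(logText) {
  const pending = new Map();
  let changesComplete = false;
  for (const line of logText.split('\n')) {
    const parts = line.trim().split(' ');
    if (parts[0] === ':t') {
      // ":t batchN [ids...]" - rejoin the rest in case the JSON contains spaces
      pending.set(parts[1], JSON.parse(parts.slice(2).join(' ')));
    } else if (parts[0] === ':d') {
      pending.delete(parts[1]);
    } else if (parts[0] === ':changes_complete') {
      changesComplete = true;
    }
  }
  return { changesComplete, pending: [...pending.keys()] };
}

const log = [
  ':t batch0 [{"id":"a"},{"id":"b"}]',
  ':t batch1 [{"id":"c"}]',
  ':d batch0',
  ':changes_complete'
].join('\n');
console.log(pendingBatches(log)); // { changesComplete: true, pending: [ 'batch1' ] }
```
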
## What is shallow mode?

When you run `couchbackup` with `--mode shallow` a simpler backup is performed, only backing up the winning revisions
of the database. No revision tokens are saved and any conflicting revisions are ignored. This is a faster, but less
complete backup. Shallow backups cannot be resumed because they do not produce a log file.

## Why use CouchBackup?

The easiest way to back up a CouchDB database is to copy the ".couch" file. This is fine on a single-node instance, but when running multi-node
Cloudant or using CouchDB 2.0 or greater, the ".couch" file only contains a single shard of data. This utility allows simple backups of CouchDB
or Cloudant databases using the HTTP API.

This tool can be used to script the backup of your databases. Move the backup and log files to cheap Object Storage so that you have multiple copies of your precious data.

## Options reference

### Environment variables

* `COUCH_URL` - the URL of the CouchDB/Cloudant server e.g. `http://127.0.0.1:5984`
* `COUCH_DATABASE` - the name of the database to act upon e.g. `mydb` (default `test`)
* `COUCH_PARALLELISM` - the number of HTTP requests to perform in parallel when restoring a backup e.g. `10` (default `5`)
* `COUCH_BUFFER_SIZE` - the number of documents fetched and restored at once e.g. `100` (default `500`)
* `COUCH_LOG` - the file to store logging information during backup
* `COUCH_RESUME` - if `true`, resumes a previous backup from its last known position
* `COUCH_OUTPUT` - the file name to store the backup data (defaults to stdout)
* `COUCH_MODE` - if `shallow`, only a superficial backup is done, ignoring conflicts and revision tokens. Defaults to `full` - a full backup.
* `CLOUDANT_IAM_API_KEY` - optional [IAM API key](https://console.bluemix.net/docs/services/Cloudant/guides/iam.html#ibm-cloud-identity-and-access-management)
  to use to access the Cloudant database instead of user information credentials in the URL. The endpoint used to retrieve the token defaults to
  `https://iam.bluemix.net/identity/token`, but can be overridden if necessary using the `CLOUDANT_IAM_TOKEN_URL` environment variable.
* `DEBUG` - if set to `couchbackup`, all debug messages will be sent to `stderr` during a backup or restore process

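The defaults listed above can be summarised in code. This is a minimal sketch that reads the variables from an environment object and applies the documented defaults. `readConfig` and the returned property names are illustrative assumptions, not couchbackup's internal representation.

```javascript
// Hypothetical helper: applies the documented defaults to an env object.
function readConfig(env) {
  return {
    url: env.COUCH_URL,                                   // no default documented
    db: env.COUCH_DATABASE || 'test',
    parallelism: parseInt(env.COUCH_PARALLELISM || '5', 10),
    bufferSize: parseInt(env.COUCH_BUFFER_SIZE || '500', 10),
    log: env.COUCH_LOG,
    resume: env.COUCH_RESUME === 'true',
    output: env.COUCH_OUTPUT,                             // undefined means stdout
    mode: env.COUCH_MODE || 'full'
  };
}

const config = readConfig({ COUCH_URL: 'http://127.0.0.1:5984', COUCH_BUFFER_SIZE: '100' });
console.log(config.db, config.parallelism, config.bufferSize, config.mode);
// test 5 100 full
```

In a real script the argument would typically be `process.env`.
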
### Command-line parameters

* `--url` - same as `COUCH_URL` environment variable
* `--db` - same as `COUCH_DATABASE`
* `--parallelism` - same as `COUCH_PARALLELISM`
* `--buffer-size` - same as `COUCH_BUFFER_SIZE`
* `--log` - same as `COUCH_LOG`
* `--resume` - same as `COUCH_RESUME`
* `--output` - same as `COUCH_OUTPUT`
* `--mode` - same as `COUCH_MODE`
* `--iam-api-key` - same as `CLOUDANT_IAM_API_KEY`

## Using programmatically

You can use `couchbackup` programmatically. First install
`couchbackup` into your project with `npm install --save @cloudant/couchbackup`.
Then you can import the library into your code:

```js
const couchbackup = require('@cloudant/couchbackup');
```

The library exports two main functions:

1. `backup` - backup from a database to a writable stream.
2. `restore` - restore from a readable stream to a database.

### Examples

See [the examples folder](./examples) for example scripts showing how to
use the library.

### Backup

The `backup` function takes a source database URL, a stream to write to,
backup options and a callback for completion.

```javascript
backup: function(srcUrl, targetStream, opts, callback) { /* ... */ }
```

The `opts` dictionary can contain values which map to a subset of the
environment variables defined above. Those related to the source and
target locations are not required.

* `parallelism`: see `COUCH_PARALLELISM`.
* `bufferSize`: see `COUCH_BUFFER_SIZE`.
* `log`: see `COUCH_LOG`.
* `resume`: see `COUCH_RESUME`.
* `mode`: see `COUCH_MODE`.
* `iamApiKey`: see `CLOUDANT_IAM_API_KEY`.
* `iamTokenUrl`: may be used with `iamApiKey` to override the default URL for
  retrieving IAM tokens.

The callback has the standard `err, data` parameters and is called when
the backup completes or fails.

The `backup` function returns an event emitter. You can subscribe to:

* `changes` - when a batch of changes has been written to log stream.
* `written` - when a batch of documents has been written to backup stream.
* `finished` - emitted once when all documents are backed up.

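The following sketch shows one way to subscribe to these events to track progress. Because the payload passed to each listener is not described here, the example only counts events. A plain Node.js `EventEmitter` stands in for the object returned by `backup`, and `trackProgress` is a hypothetical helper.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: counts progress events on any emitter.
function trackProgress(emitter) {
  const progress = { changes: 0, written: 0, finished: false };
  emitter.on('changes', () => { progress.changes += 1; });
  emitter.on('written', () => { progress.written += 1; });
  emitter.on('finished', () => { progress.finished = true; });
  return progress;
}

// Stand-in for the emitter returned by backup(); events are emitted by hand.
const fake = new EventEmitter();
const progress = trackProgress(fake);
fake.emit('changes');
fake.emit('written');
fake.emit('written');
fake.emit('finished');
console.log(progress); // { changes: 1, written: 2, finished: true }
```
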
Backup data to a stream:

```javascript
couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  process.stdout,
  {parallelism: 2},
  function(err, data) {
    // Status goes to stderr so it does not mix with the backup data on stdout
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```

Or to a file:

```javascript
const fs = require('fs');
const filename = 'animaldb.txt'; // example output path

couchbackup.backup(
  'https://examples.cloudant.com/animaldb',
  fs.createWriteStream(filename),
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```

### Restore

The `restore` function takes a readable stream containing the data emitted
by the `backup` function. It uploads that to a Cloudant database, which
should be a **new** database.

```javascript
restore: function(srcStream, targetUrl, opts, callback) { /* ... */ }
```

The `opts` dictionary can contain values which map to a subset of the
environment variables defined above. Those related to the source and
target locations are not required.

* `parallelism`: see `COUCH_PARALLELISM`.
* `bufferSize`: see `COUCH_BUFFER_SIZE`.

The callback has the standard `err, data` parameters and is called when
the restore completes or fails.

The `restore` function returns an event emitter. You can subscribe to:

* `restored` - when a batch of documents is restored.
* `finished` - emitted once when all documents are restored.

The backup file (or `srcStream`) contains lists of document
revisions, where each list is separated by a newline. The list length is
dictated by the `bufferSize` parameter used during the backup.

It's possible a list could be corrupt due to failures in the backup process. A
`BackupFileJsonError` is emitted for each corrupt list found. _These can only be
ignored if the backup that generated the stream did complete successfully_. This
ensures that corrupt lists also have a valid counterpart within the stream.

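Before restoring, a backup file can be checked for lines that do not parse. The following is a minimal sketch, separate from the library's own `BackupFileJsonError` handling. `findCorruptLines` is a hypothetical helper and the sample text is invented: its second line is a truncated list whose valid counterpart follows on the third line.

```javascript
// Hypothetical helper: returns the 1-based numbers of lines that are not
// a valid JSON array.
function findCorruptLines(text) {
  const corrupt = [];
  text.split('\n').forEach((line, index) => {
    if (line.trim() === '') return; // skip blank lines
    try {
      if (!Array.isArray(JSON.parse(line))) corrupt.push(index + 1);
    } catch (err) {
      corrupt.push(index + 1); // JSON.parse failed, so the list is corrupt
    }
  });
  return corrupt;
}

const backupText = '[{"_id":"a"}]\n[{"_id":"b"},{"_i\n[{"_id":"b"},{"_id":"c"}]\n';
console.log(findCorruptLines(backupText)); // [ 2 ]
```
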
Restore data from a stream:

```javascript
couchbackup.restore(
  process.stdin,
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```

Or from a file:

```javascript
const fs = require('fs');
const filename = 'animaldb.txt'; // example path to an existing backup file

couchbackup.restore(
  fs.createReadStream(filename),
  'https://examples.cloudant.com/new-animaldb',
  {parallelism: 2},
  function(err, data) {
    if (err) {
      console.error("Failed! " + err);
    } else {
      console.error("Success! " + data);
    }
  });
```

## Error Handling

The `couchbackup` and `couchrestore` processes are designed to be relatively robust over an unreliable network. Work is batched and any failed requests are retried indefinitely. However, certain aspects of the execution will not tolerate failure:
- Spooling changes from the database changes feed. A failure in the changes request during the backup process will result in process termination.
- Validating the existence of a target database during the database restore process.

### API

When using the library programmatically an `Error` will be passed in one of two ways:
* For fatal errors the callback will be called with the error as its first argument, matching the `err, data` signature used in the examples above
* For non-fatal errors an `error` event will be emitted

### CLI Exit Codes

On fatal errors, `couchbackup` and `couchrestore` will exit with non-zero exit codes. This section
details them.

#### Common to both `couchbackup` and `couchrestore`

* `1`: unknown CLI option or generic error.
* `2`: invalid CLI option.
* `11`: unauthorized credentials for the database.
* `12`: incorrect permissions for the database.
* `40`: database returned a fatal HTTP error.

#### `couchbackup`

* `20`: resume was specified without a log file.
* `21`: the resume log file does not exist.
* `22`: incomplete changes in log file.
* `30`: error spooling changes from the database.
* `50`: source database does not support `/_bulk_get` endpoint.

#### `couchrestore`

* `10`: restore target database does not exist.

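A wrapper script might translate these codes into log messages. The sketch below is a plain lookup table built from the lists above. `EXIT_CODES` and `describeExit` are hypothetical names, and treating `0` as success is an assumption based on the usual convention.

```javascript
// Exit codes as listed above, keyed by tool. 'common' applies to both tools.
const EXIT_CODES = {
  common: {
    1: 'unknown CLI option or generic error',
    2: 'invalid CLI option',
    11: 'unauthorized credentials for the database',
    12: 'incorrect permissions for the database',
    40: 'database returned a fatal HTTP error'
  },
  couchbackup: {
    20: 'resume was specified without a log file',
    21: 'the resume log file does not exist',
    22: 'incomplete changes in log file',
    30: 'error spooling changes from the database',
    50: 'source database does not support /_bulk_get endpoint'
  },
  couchrestore: {
    10: 'restore target database does not exist'
  }
};

// Hypothetical helper: describes an exit code for 'couchbackup' or 'couchrestore'.
function describeExit(tool, code) {
  if (code === 0) return 'success'; // assumption: 0 means success
  const specific = EXIT_CODES[tool] || {};
  return specific[code] || EXIT_CODES.common[code] || 'unrecognised exit code';
}

console.log(describeExit('couchbackup', 22));  // incomplete changes in log file
console.log(describeExit('couchrestore', 11)); // unauthorized credentials for the database
```

The `code` argument could come, for example, from the `status` field of the object returned by `child_process.spawnSync`.
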
## Note on attachments

TL;DR: If you back up a database that contains attachments you will not be able to restore it.

As documented above, couchbackup does not support backing up or restoring databases containing documents with attachments.
Attempting to back up a database that includes documents with attachments will appear to succeed. However, the attachment
content will not have been downloaded and the backup file will contain attachment metadata. Consequently any attempt to
restore the backup will result in errors because the attachment metadata will reference attachments that are not present
in the restored database.

It is recommended to store attachments directly in an object store with a link in the JSON document instead of using the
native attachment API.