# chillastic

[![Codacy Badge](https://api.codacy.com/project/badge/Grade/636e4a8ac9bd43fab11f33e83061044e)](https://www.codacy.com/app/GroupByInc/chillastic?utm_source=github.com&utm_medium=referral&utm_content=groupby/chillastic&utm_campaign=Badge_Grade) [![Coverage Status](https://coveralls.io/repos/github/groupby/chillastic/badge.svg?branch=master)](https://coveralls.io/github/groupby/chillastic?branch=master) [![Circle CI](https://circleci.com/gh/groupby/chillastic.svg?style=svg)](https://circleci.com/gh/groupby/chillastic)

Reindex multiple Elasticsearch indices, save your progress, and mutate your data in flight.

### How to use it

Install it into your project with:

```bash
npm install --save chillastic
```
Create an instance of it like this (also seen in example.js):

```javascript
const Chillastic = require('chillastic');
const _ = require('lodash');

const REDIS_HOST = 'localhost';
const REDIS_PORT = 6379;
const CHILL_PORT = _.random(7000, 10000);

const chillastic = Chillastic(REDIS_HOST, REDIS_PORT, CHILL_PORT);

// Start it up!
chillastic.run();
```
Running the code above creates a single chillastic worker with an API on a random port between 7000 and 10000. You should see something like the following on the console:

```
13:19:28.519Z WARN chillastic :
    Starting with config: {
      "FRAMEWORK_NAME": "chillastic",
      "logLevel": "info",
      "elasticsearch": {
        "logLevel": "warn"
      },
      "redis": {
        "host": "localhost",
        "port": 6379
      },
      "port": 9605
    }
13:19:28.530Z WARN chillastic : chillastic server listening on port 9605
13:19:28.544Z INFO chillastic : Starting worker: Rapidskinner Grin
13:19:28.545Z INFO chillastic : No tasks found, waiting...
13:19:30.548Z INFO chillastic : No tasks found, waiting...
```
To get the status of the system, and a full list of workers:

```
curl localhost:9605/status

{"manager":"running","workers":{"Rapidskinner Grin":{"status":"waiting for task..."}}}
```
To add another worker, just start another instance pointed at the same Redis instance. Then check the status again for a response like:

```
curl localhost:9605/status

{"manager":"running","workers":{"Rapidskinner Grin":{"status":"waiting for task..."},"Windshift Fairy":{"status":"waiting for task..."}}}
```

That's great, but it's time to do some work.
First we'll define a simple mutator like this:

```javascript
const TARGET_INDICES_REGEX = /^log_data_v1/;
const NEW_INDEX_NAME = 'log_data_v2';

module.exports = {
  type: 'data',
  predicate: function (doc) {
    return TARGET_INDICES_REGEX.test(doc._index);
  },
  mutate: function (doc) {
    doc._index = doc._index.replace(TARGET_INDICES_REGEX, NEW_INDEX_NAME);

    return doc;
  }
};
```
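Before uploading a mutator, you can sanity-check its predicate/mutate pair directly in plain Node against a hand-made document (the sample index name below is invented for illustration):

```javascript
// Inline copy of the mutator above, so this snippet runs standalone.
const TARGET_INDICES_REGEX = /^log_data_v1/;
const NEW_INDEX_NAME = 'log_data_v2';

const mutator = {
  type: 'data',
  predicate: (doc) => TARGET_INDICES_REGEX.test(doc._index),
  mutate: (doc) => {
    doc._index = doc._index.replace(TARGET_INDICES_REGEX, NEW_INDEX_NAME);
    return doc;
  }
};

// A document from a matching index is rewritten to the new index name...
const doc = {_index: 'log_data_v1-2017.01', _type: 'log', _source: {msg: 'hi'}};
console.log(mutator.predicate(doc));        // true
console.log(mutator.mutate(doc)._index);    // log_data_v2-2017.01

// ...while documents from other indices never reach mutate at all.
console.log(mutator.predicate({_index: 'other_index'})); // false
```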
We'll save that to mutator.js, and then send that to the API:

```bash
curl localhost:9605/mutators/someNamespace/ourMutator -H 'Content-type: text/plain' --data-binary '@mutator.js'
```
Define a task that uses the mutator we just sent:

```json
{
  "source": {
    "host": "localhost",
    "port": 9200
  },
  "destination": {
    "host": "localhost",
    "port": 9201
  },
  "transfer": {
    "documents": {
      "fromIndices": "log_data_v1*"
    }
  },
  "mutators": {
    "actions": [
      {
        "namespace": "someNamespace",
        "id": "ourMutator"
      }
    ]
  }
}
```
You can push a task into the system using the API:

```bash
curl -XPOST localhost:9605/tasks/newtask -d '{"source":{"host":"localhost","port":9200},"destination":{"host":"localhost","port":9201},"transfer":{"documents":{"fromIndices":"log_data_v1*"}},"mutators":{"actions":[{"namespace":"someNamespace","id":"ourMutator"}]}}'
```

This task will be split into subtasks, one for each combination of index and type. The workers then transfer all the documents associated with a given subtask from one Elasticsearch cluster to the other.
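The fan-out is easy to picture: two matching indices with two types each yield four subtasks. A rough sketch of that expansion (the helper and the subtask shape below are illustrative, not chillastic's actual internals):

```javascript
// Illustrative only: how one task could fan out into per-index,
// per-type subtasks. Not chillastic's real implementation.
const splitTask = (task, indices) =>
  indices.flatMap((index) =>
    index.types.map((type) => ({
      source:      task.source,
      destination: task.destination,
      index:       index.name,
      type:        type
    })));

const task = {
  source:      {host: 'localhost', port: 9200},
  destination: {host: 'localhost', port: 9201}
};
const indices = [
  {name: 'log_data_v1-2017.01', types: ['log', 'error']},
  {name: 'log_data_v1-2017.02', types: ['log', 'error']}
];

const subtasks = splitTask(task, indices);
console.log(subtasks.length); // 4
```

Each worker then claims subtasks independently, which is what lets you scale throughput by simply starting more instances.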
### Tasks
A task defines the work to be done during reindexing and has the following possible fields:

```javascript
{
  "source": {
    "host": "localhost",
    "port": 9200
  },
  "destination": {
    "host": "localhost",
    "port": 9201
  },
  "transfer": {
    "documents": {
      "flushSize": 25000,             // Max number of docs in a bulk operation
      "fromIndices": "log_data_v1*",  // Docs in any index matching this will be transferred
      "filters": {
        "actions": [
          {
            "namespace": "someNamespace",
            "id": "ourFilter",
            "arguments": {}
          }
        ],
        "arguments": {}
      }
    },
    "indices": {
      "name": "*log_data*",        // Any index matching this will have its settings, mappings, and aliases copied
      "templates": "*log_data*"    // Any template matching this will be copied
    }
  },
  "mutators": {
    "actions": [
      {
        "namespace": "someNamespace",
        "id": "ourMutator"
      }
    ],
    "arguments": {}
  }
}
```
### Mutators
Mutators can be of type 'data', 'index', or 'template', and apply to documents, index configurations, and templates respectively.

They are defined as javascript modules and loaded by POSTing them to the mutators/ API endpoint.

```javascript
// The 'moment' and 'lodash' libraries are available inside a mutator definition
const moment = require('moment');

const OLD_DATE_FORMAT = 'YYYY-MM-DD';
const OLD_DATE_REGEX = /[0-9]{4}-[0-9]{2}-[0-9]{2}/;
const NEW_DATE_FORMAT = 'YYYY-MM';

module.exports = {
  /**
   * Type of mutator
   */
  type: 'data',
  /**
   * The predicate function is called for every target document
   * @param doc - The document to be checked against the predicate
   * @param args - The task-specific arguments object
   * @returns {boolean}
   */
  predicate: function (doc, args) {
    return OLD_DATE_REGEX.test(doc._index);
  },
  /**
   * The mutate function is only called on documents that satisfy the predicate
   * @param doc - The document that satisfied the predicate
   * @param args - The task-specific arguments object
   * @returns {*}
   */
  mutate: function (doc, args) {
    const date = moment(doc._index.match(OLD_DATE_REGEX)[0], OLD_DATE_FORMAT);
    doc._index = doc._index.replace(OLD_DATE_REGEX, date.format(NEW_DATE_FORMAT));

    return doc;
  }
};
```
### Filters
Filters run before the document transfer starts, and limit the transfer to specific indices or types. While a mutator could achieve the same effect by returning null for specific documents, a filter has the advantage of excluding entire indices and types before any processing happens.

Filters are also javascript modules.

```javascript
module.exports = {
  type: 'index',
  /**
   * Only indices that satisfy this predicate will be included in the transfer
   * @param index - Full index configuration
   */
  predicate: (index) => index.name === 'log_data_v1_include_this'
};
```
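As with mutators, you can exercise a filter's predicate in plain Node before uploading it; the index objects below are hand-rolled stand-ins for the full index configuration chillastic would pass in:

```javascript
// Inline copy of the filter above, so this snippet runs standalone.
const filter = {
  type: 'index',
  predicate: (index) => index.name === 'log_data_v1_include_this'
};

// Stand-in index configurations; a real one carries settings, mappings, etc.
const indices = [
  {name: 'log_data_v1_include_this'},
  {name: 'log_data_v1_skip_this'}
];

// Only the matching index survives filtering.
const included = indices.filter(filter.predicate);
console.log(included.map((index) => index.name)); // [ 'log_data_v1_include_this' ]
```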
### More docs to come
|