# chillastic
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/636e4a8ac9bd43fab11f33e83061044e)](https://www.codacy.com/app/GroupByInc/chillastic?utm_source=github.com&utm_medium=referral&utm_content=groupby/chillastic&utm_campaign=Badge_Grade) [![Coverage Status](https://coveralls.io/repos/github/groupby/chillastic/badge.svg?branch=master)](https://coveralls.io/github/groupby/chillastic?branch=master) [![Circle CI](https://circleci.com/gh/groupby/chillastic.svg?style=svg)](https://circleci.com/gh/groupby/chillastic)

Reindex multiple Elasticsearch indices, save your progress, and mutate your data in flight.

### How to use it
Install into your project with:
```bash
npm install --save chillastic
```

Create an instance like this (also shown in example.js):
```javascript
const Chillastic = require('chillastic');
const _ = require('lodash');

const REDIS_HOST = 'localhost';
const REDIS_PORT = 6379;
const CHILL_PORT = _.random(7000, 10000);

const chillastic = Chillastic(REDIS_HOST, REDIS_PORT, CHILL_PORT);

// Start it up!
chillastic.run();
```

Running the code above starts a single chillastic worker with an API on a random port between 7000 and 10000. You should see something like the following on the console:
```
13:19:28.519Z WARN chillastic :
    Starting with config: {
      "FRAMEWORK_NAME": "chillastic",
      "logLevel": "info",
      "elasticsearch": {
        "logLevel": "warn"
      },
      "redis": {
        "host": "localhost",
        "port": 6379
      },
      "port": 9605
    }
13:19:28.530Z WARN chillastic : chillastic server listening on port 9605
13:19:28.544Z INFO chillastic : Starting worker: Rapidskinner Grin
13:19:28.545Z INFO chillastic : No tasks found, waiting...
13:19:30.548Z INFO chillastic : No tasks found, waiting...
```

To get the status of the system and a full list of workers:
```
curl localhost:9605/status

{"manager":"running","workers":{"Rapidskinner Grin":{"status":"waiting for task..."}}}
```

To add another worker, just start another instance pointed at the same Redis instance, then check the status again for a response like:
```
curl localhost:9605/status

{"manager":"running","workers":{"Rapidskinner Grin":{"status":"waiting for task..."},"Windshift Fairy":{"status":"waiting for task..."}}}
```

That's great, but it's time to do some work.

First, we'll define a simple mutator:
```javascript
const TARGET_INDICES_REGEX = /^log_data_v1/;
const NEW_INDEX_NAME = 'log_data_v2';

module.exports = {
  type: 'data',
  predicate: function (doc) {
    return TARGET_INDICES_REGEX.test(doc._index);
  },
  mutate: function (doc) {
    doc._index = doc._index.replace(TARGET_INDICES_REGEX, NEW_INDEX_NAME);

    return doc;
  }
};
```
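To see what this mutator does in isolation, here is a standalone sketch that applies it to a sample document. The document shape and index name are illustrative assumptions, not real chillastic output:

```javascript
// Same mutator as above, inlined so this sketch runs standalone.
const TARGET_INDICES_REGEX = /^log_data_v1/;
const NEW_INDEX_NAME = 'log_data_v2';

const mutator = {
  type: 'data',
  predicate: (doc) => TARGET_INDICES_REGEX.test(doc._index),
  mutate: (doc) => {
    doc._index = doc._index.replace(TARGET_INDICES_REGEX, NEW_INDEX_NAME);
    return doc;
  }
};

// Hypothetical document, roughly in Elasticsearch hit shape.
const doc = {_index: 'log_data_v1_2017.01.01', _type: 'log', _source: {msg: 'hello'}};

// The worker only mutates documents that satisfy the predicate.
if (mutator.predicate(doc)) {
  mutator.mutate(doc);
}

console.log(doc._index); // 'log_data_v2_2017.01.01'
```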

We'll save that to mutator.js and send it to the API:
```bash
curl localhost:9605/mutators/someNamespace/ourMutator -H 'Content-type: text/plain' --data-binary '@mutator.js'
```

Next, define a task that uses the mutator we just sent:
```json
{
  "source": {
    "host": "localhost",
    "port": 9200
  },
  "destination": {
    "host": "localhost",
    "port": 9201
  },
  "transfer": {
    "documents": {
      "fromIndices": "log_data_v1*"
    }
  },
  "mutators": {
    "actions": [
      {
        "namespace": "someNamespace",
        "id": "ourMutator"
      }
    ]
  }
}
```

You can push the task into the system using the API:
```bash
curl -XPOST localhost:9605/tasks/newtask -d '{"source":{"host":"localhost","port":9200},"destination":{"host":"localhost","port":9201},"transfer":{"documents":{"fromIndices":"log_data_v1*"}},"mutators":{"actions":[{"namespace":"someNamespace","id":"ourMutator"}]}}'
```

This task is split into subtasks, one for each combination of index and type. The workers then transfer all documents associated with a given subtask from one Elasticsearch to the other.
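As a rough illustration of that fan-out (a conceptual sketch only, not chillastic's actual internals), one subtask is produced per (index, type) pair; the index and type names here are made up:

```javascript
// Hypothetical map of matched indices to the types they contain.
const indices = {
  log_data_v1_a: ['log', 'error'],
  log_data_v1_b: ['log']
};

// One subtask per (index, type) combination.
const subtasks = [];
for (const [index, types] of Object.entries(indices)) {
  for (const type of types) {
    subtasks.push({index, type});
  }
}

console.log(subtasks.length); // 3
```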

### Tasks
A task defines the work to be done during reindexing and supports the following fields:

```javascript
{
  "source": {
    "host": "localhost",
    "port": 9200
  },
  "destination": {
    "host": "localhost",
    "port": 9201
  },
  "transfer": {
    "documents": {
      "flushSize": 25000,            // Max number of docs in a bulk operation
      "fromIndices": "log_data_v1*", // Docs from any index matching this pattern are transferred
      "filters": {
        "actions": [
          {
            "namespace": "someNamespace",
            "id": "ourFilter",
            "arguments": {}
          }
        ],
        "arguments": {}
      }
    },
    "indices": {
      "name": "*log_data*",     // Indices matching this have their settings, mappings, and aliases copied
      "templates": "*log_data*" // Templates matching this are copied
    }
  },
  "mutators": {
    "actions": [
      {
        "namespace": "someNamespace",
        "id": "ourMutator"
      }
    ],
    "arguments": {}
  }
}
```

### Mutators
Mutators can be of type 'data', 'index', or 'template', and apply to documents, index configurations, and templates respectively.

They are defined as JavaScript modules and loaded by POSTing them to the mutators/ API endpoint.

```javascript
// The 'moment' and 'lodash' libraries are available inside the mutator definition
const moment = require('moment');

const OLD_DATE_FORMAT = 'YYYY-MM-DD';
const OLD_DATE_REGEX = /[0-9]{4}-[0-9]{2}-[0-9]{2}/;
const NEW_DATE_FORMAT = 'YYYY-MM';

module.exports = {
  /**
   * Type of mutator
   */
  type: 'data',
  /**
   * The predicate function is called for every target document
   * @param doc - The document to be checked against the predicate
   * @param arguments - The task-specific arguments object
   * @returns {boolean}
   */
  predicate: function (doc, arguments) {
    return OLD_DATE_REGEX.test(doc._index);
  },
  /**
   * The mutate function is only called on documents that satisfy the predicate
   * @param doc - The document that satisfied the predicate
   * @param arguments - The task-specific arguments object
   * @returns {*}
   */
  mutate: function (doc, arguments) {
    const date = moment(doc._index.match(OLD_DATE_REGEX)[0], OLD_DATE_FORMAT);
    doc._index = doc._index.replace(OLD_DATE_REGEX, date.format(NEW_DATE_FORMAT));

    return doc;
  }
};
```
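The effect of that rewrite, daily indices collapsing into monthly ones, can be spot-checked without moment. This sketch uses plain string slicing, which gives the same result for this fixed-width date format; the index names are illustrative assumptions:

```javascript
const OLD_DATE_REGEX = /[0-9]{4}-[0-9]{2}-[0-9]{2}/;

// Rewrite 'YYYY-MM-DD' in an index name to 'YYYY-MM'.
function rewriteIndex(name) {
  const match = name.match(OLD_DATE_REGEX);
  if (match === null) {
    return name; // predicate would fail: leave the name untouched
  }
  const yearMonth = match[0].slice(0, 7); // 'YYYY-MM-DD' -> 'YYYY-MM'
  return name.replace(OLD_DATE_REGEX, yearMonth);
}

console.log(rewriteIndex('log_data_2017-05-12')); // 'log_data_2017-05'
```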

### Filters
Filters run before the document transfer starts, restricting it to specific types or indices. A mutator could achieve the same effect by returning null for unwanted documents, but a filter has the advantage of excluding entire indices and types before processing begins.

Filters are also JavaScript modules.

```javascript
module.exports = {
  type: 'index',
  /**
   * Only indices that satisfy this predicate will be included in the transfer
   * @param index - Full index configuration
   */
  predicate: (index) => index.name === 'log_data_v1_include_this'
};
```
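A quick sketch of how such a predicate narrows the candidate set. The index objects carry a `name` field as in the example above; the candidate list itself is made up for illustration:

```javascript
// Same filter as above, inlined so this sketch runs standalone.
const filter = {
  type: 'index',
  predicate: (index) => index.name === 'log_data_v1_include_this'
};

// Hypothetical candidate indices discovered on the source cluster.
const candidates = [
  {name: 'log_data_v1_include_this'},
  {name: 'log_data_v1_skip_this'},
  {name: 'unrelated_index'}
];

// Only indices satisfying the predicate survive.
const included = candidates.filter(filter.predicate).map((index) => index.name);

console.log(included); // ['log_data_v1_include_this']
```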


### More docs to come