# chillastic

[![Codacy Badge](https://api.codacy.com/project/badge/Grade/636e4a8ac9bd43fab11f33e83061044e)](https://www.codacy.com/app/GroupByInc/chillastic?utm_source=github.com&utm_medium=referral&utm_content=groupby/chillastic&utm_campaign=Badge_Grade) [![Coverage Status](https://coveralls.io/repos/github/groupby/chillastic/badge.svg?branch=master)](https://coveralls.io/github/groupby/chillastic?branch=master) [![Circle CI](https://circleci.com/gh/groupby/chillastic.svg?style=svg)](https://circleci.com/gh/groupby/chillastic)

Reindex multiple Elasticsearch indices, save your progress, and mutate your data in flight.

### How to use it

Install it into your project with:

```bash
npm install --save chillastic
```
Create an instance of it like this (also seen in example.js):

```javascript
const Chillastic = require('chillastic');
const _ = require('lodash');

const REDIS_HOST = 'localhost';
const REDIS_PORT = 6379;
const CHILL_PORT = _.random(7000, 10000);

const chillastic = Chillastic(REDIS_HOST, REDIS_PORT, CHILL_PORT);

// Start it up!
chillastic.run();
```
Running the code above creates a single chillastic worker with an API on a random port between 7000 and 10000. You should see something like the following on the console:

```
13:19:28.519Z WARN chillastic :
    Starting with config: {
      "FRAMEWORK_NAME": "chillastic",
      "logLevel": "info",
      "elasticsearch": {
        "logLevel": "warn"
      },
      "redis": {
        "host": "localhost",
        "port": 6379
      },
      "port": 9605
    }
13:19:28.530Z WARN chillastic : chillastic server listening on port 9605
13:19:28.544Z INFO chillastic : Starting worker: Rapidskinner Grin
13:19:28.545Z INFO chillastic : No tasks found, waiting...
13:19:30.548Z INFO chillastic : No tasks found, waiting...
```
To get the status of the system, and a full list of workers:

```
curl localhost:9605/status

{"manager":"running","workers":{"Rapidskinner Grin":{"status":"waiting for task..."}}}
```
To add another worker, just start another instance pointed at the same Redis instance. Then check the status again for a response like:

```
curl localhost:9605/status

{"manager":"running","workers":{"Rapidskinner Grin":{"status":"waiting for task..."},"Windshift Fairy":{"status":"waiting for task..."}}}
```

That's great, but it's time to do some work.
First we'll define a simple mutator like this:

```javascript
const TARGET_INDICES_REGEX = /^log_data_v1/;
const NEW_INDEX_NAME = 'log_data_v2';

module.exports = {
  type: 'data',
  predicate: function (doc) {
    return TARGET_INDICES_REGEX.test(doc._index);
  },
  mutate: function (doc) {
    doc._index = doc._index.replace(TARGET_INDICES_REGEX, NEW_INDEX_NAME);

    return doc;
  }
};
```
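Before uploading a mutator, you can sanity-check its predicate/mutate pair directly in plain Node against a hand-made document (the sample index name below is invented for illustration):

```javascript
// Inline copy of the mutator above, so this snippet runs standalone.
const TARGET_INDICES_REGEX = /^log_data_v1/;
const NEW_INDEX_NAME = 'log_data_v2';

const mutator = {
  type: 'data',
  predicate: (doc) => TARGET_INDICES_REGEX.test(doc._index),
  mutate: (doc) => {
    doc._index = doc._index.replace(TARGET_INDICES_REGEX, NEW_INDEX_NAME);
    return doc;
  }
};

// A document from a matching index is rewritten to the new index name...
const doc = {_index: 'log_data_v1-2017.01', _type: 'log', _source: {msg: 'hi'}};
console.log(mutator.predicate(doc));        // true
console.log(mutator.mutate(doc)._index);    // log_data_v2-2017.01

// ...while documents from other indices never reach mutate at all.
console.log(mutator.predicate({_index: 'other_index'})); // false
```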
We'll save that to mutator.js, and then send that to the API:

```bash
curl localhost:9605/mutators/someNamespace/ourMutator -H 'Content-type: text/plain' --data-binary '@mutator.js'
```
Define a task that uses the mutator we just sent:

```json
{
  "source": {
    "host": "localhost",
    "port": 9200
  },
  "destination": {
    "host": "localhost",
    "port": 9201
  },
  "transfer": {
    "documents": {
      "fromIndices": "log_data_v1*"
    }
  },
  "mutators": {
    "actions": [
      {
        "namespace": "someNamespace",
        "id": "ourMutator"
      }
    ]
  }
}
```
You can push a task into the system using the API:

```bash
curl -XPOST localhost:9605/tasks/newtask -d '{"source":{"host":"localhost","port":9200},"destination":{"host":"localhost","port":9201},"transfer":{"documents":{"fromIndices":"log_data_v1*"}},"mutators":{"actions":[{"namespace":"someNamespace","id":"ourMutator"}]}}'
```

This task will be split into subtasks, one for each combination of index and type. The workers then transfer all the documents associated with a given subtask from one Elasticsearch cluster to the other.
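The fan-out is easy to picture: two matching indices with two types each yield four subtasks. A rough sketch of that expansion (the helper and the subtask shape below are illustrative, not chillastic's actual internals):

```javascript
// Illustrative only: how one task could fan out into per-index,
// per-type subtasks. Not chillastic's real implementation.
const splitTask = (task, indices) =>
  indices.flatMap((index) =>
    index.types.map((type) => ({
      source:      task.source,
      destination: task.destination,
      index:       index.name,
      type:        type
    })));

const task = {
  source:      {host: 'localhost', port: 9200},
  destination: {host: 'localhost', port: 9201}
};
const indices = [
  {name: 'log_data_v1-2017.01', types: ['log', 'error']},
  {name: 'log_data_v1-2017.02', types: ['log', 'error']}
];

const subtasks = splitTask(task, indices);
console.log(subtasks.length); // 4
```

Each worker then claims subtasks independently, which is what lets you scale throughput by simply starting more instances.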
### Tasks
A task defines the work to be done during reindexing and has the following possible fields:

```javascript
{
  "source": {
    "host": "localhost",
    "port": 9200
  },
  "destination": {
    "host": "localhost",
    "port": 9201
  },
  "transfer": {
    "documents": {
      "flushSize": 25000,             // Max number of docs in a bulk operation
      "fromIndices": "log_data_v1*",  // Docs in any index matching this will be transferred
      "filters": {
        "actions": [
          {
            "namespace": "someNamespace",
            "id": "ourFilter",
            "arguments": {}
          }
        ],
        "arguments": {}
      }
    },
    "indices": {
      "name": "*log_data*",        // Any index matching this will have its settings, mappings, and aliases copied
      "templates": "*log_data*"    // Any template matching this will be copied
    }
  },
  "mutators": {
    "actions": [
      {
        "namespace": "someNamespace",
        "id": "ourMutator"
      }
    ],
    "arguments": {}
  }
}
```
### Mutators
Mutators can be of type 'data', 'index', or 'template', and apply to documents, index configurations, and templates respectively.

They are defined as javascript modules and loaded by POSTing them to the mutators/ API endpoint.

```javascript
// The 'moment' and 'lodash' libraries are available inside a mutator definition
const moment = require('moment');

const OLD_DATE_FORMAT = 'YYYY-MM-DD';
const OLD_DATE_REGEX = /[0-9]{4}-[0-9]{2}-[0-9]{2}/;
const NEW_DATE_FORMAT = 'YYYY-MM';

module.exports = {
  /**
   * Type of mutator
   */
  type: 'data',
  /**
   * The predicate function is called for every target document
   * @param doc - The document to be checked against the predicate
   * @param args - The task-specific arguments object
   * @returns {boolean}
   */
  predicate: function (doc, args) {
    return OLD_DATE_REGEX.test(doc._index);
  },
  /**
   * The mutate function is only called on documents that satisfy the predicate
   * @param doc - The document that satisfied the predicate
   * @param args - The task-specific arguments object
   * @returns {*}
   */
  mutate: function (doc, args) {
    const date = moment(doc._index.match(OLD_DATE_REGEX)[0], OLD_DATE_FORMAT);
    doc._index = doc._index.replace(OLD_DATE_REGEX, date.format(NEW_DATE_FORMAT));

    return doc;
  }
};
```
### Filters
Filters run before the document transfer starts, and limit the transfer to specific indices or types. While a mutator could achieve the same effect by returning null for specific documents, a filter has the advantage of excluding entire indices and types before any processing happens.

Filters are also javascript modules.

```javascript
module.exports = {
  type: 'index',
  /**
   * Only indices that satisfy this predicate will be included in the transfer
   * @param index - Full index configuration
   */
  predicate: (index) => index.name === 'log_data_v1_include_this'
};
```
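As with mutators, you can exercise a filter's predicate in plain Node before uploading it; the index objects below are hand-rolled stand-ins for the full index configuration chillastic would pass in:

```javascript
// Inline copy of the filter above, so this snippet runs standalone.
const filter = {
  type: 'index',
  predicate: (index) => index.name === 'log_data_v1_include_this'
};

// Stand-in index configurations; a real one carries settings, mappings, etc.
const indices = [
  {name: 'log_data_v1_include_this'},
  {name: 'log_data_v1_skip_this'}
];

// Only the matching index survives filtering.
const included = indices.filter(filter.predicate);
console.log(included.map((index) => index.name)); // [ 'log_data_v1_include_this' ]
```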
### More docs to come
|