*dataflo.ws*: dataflow processing for node.js
=============================================

example
-------------------------------

you can see a working example by running:

    npm install -g dataflo.ws
    cd $NODE_PATH/dataflo.ws/example/yql/ # example project directory
    dataflows daemon yql

abstract
-------------------------------

every application is built on a series of dataflows. if your dataflow is buried
in program code, you have a problem: people don't want to screw up, but
they do. `dataflo.ws` is an abstract async processing framework for
describing dataflows in simple configuration.

you can use it to describe programmatic dataflows, or
real life ones.

add a DSL to your own taste.

concept
-------------------------------

this project's concept was born from combining the [flow based paradigm](http://en.wikipedia.org/wiki/Flow-based_programming) and
[responsibility driven design](http://en.wikipedia.org/wiki/Responsibility-driven_design).

dataflo.ws is typically used in client-server applications.
an external system (the client) makes a request and an initiator receives it.
after parsing the request, the initiator decides which dataflow is responsible for
this type of request. once a dataflow is found, the initiator starts it.
a dataflow is a series of tasks. tasks have input parameters and
may have a response value. tasks don't talk one-to-one; instead, the output
of a task is delivered to the dataflow (a controlling routine) via event data (a message),
and its input parameters are passed in by the dataflow.

thus, tasks have [loose coupling](http://en.wikipedia.org/wiki/Coupling_(computer_science))
and can be [distributed](http://en.wikipedia.org/wiki/Distributed_data_flow).

tasks must be designed with [functional cohesion](http://en.wikipedia.org/wiki/Cohesion_(computer_science))
in mind, which leads to a good system design.

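the event-based wiring described above can be sketched roughly like this. it is a minimal illustration and not the dataflo.ws implementation: the `Task` and `Dataflow` classes and the `complete` event name are made up for this example. the point is that a task only emits its result, and the dataflow decides where that result goes next.

```javascript
const { EventEmitter } = require('events');

// hypothetical task: it reports its result by emitting an event
// and never calls another task directly
class Task extends EventEmitter {
  constructor(name, work) {
    super();
    this.name = name;
    this.work = work;
  }

  run(input) {
    this.emit('complete', this.work(input));
  }
}

// hypothetical controlling routine: it owns the data and passes
// the output of one task as the input of the next one
class Dataflow {
  constructor(tasks) {
    this.tasks = tasks;
    this.data = {};
  }

  start(input) {
    let value = input;
    for (const task of this.tasks) {
      // emit is synchronous, so the listener runs before task.run returns
      task.once('complete', (result) => {
        this.data[task.name] = result;
        value = result;
      });
      task.run(value);
    }
    return this.data;
  }
}

const flow = new Dataflow([
  new Task('parse', (text) => JSON.parse(text)),
  new Task('count', (obj) => Object.keys(obj).length)
]);

const result = flow.start('{"a": 1, "b": 2}');
console.log(result);
```
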
terms
------------------------------

### initiator ###

it's an entry point generator. every dataflow has one entry point
from one initiator.

for example, an initiator can be an http daemon, an amqp listener,
a file system event (file changed), a process signal handler,
a timer or any other external thing.

### entry point ###

it's a trigger for a specific dataflow. each initiator has its own entry point
description.

for example, an entry point for an httpd daemon is a url. httpd may have a simple
url match config or a complex route configuration; you decide which initiator handles requests.

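a simple url match can be pictured as a lookup over the configured dataflows. the sketch below is only an assumption about how such a match could work: `findDataflow` is a hypothetical helper, and the config object just mimics the shape used in the synopsis section.

```javascript
// hypothetical config: each dataflow is triggered by one url (its entry point)
const httpdConfig = {
  dataflows: [
    { url: '/save', tasks: [] },
    { url: '/entity/tickets/list.json', tasks: [] }
  ]
};

// return the dataflow whose url equals the request path, or null if none matches
function findDataflow(config, requestUrl) {
  const path = requestUrl.split('?')[0]; // drop the query string
  return config.dataflows.find((flow) => flow.url === path) || null;
}

const found = findDataflow(httpdConfig, '/save?id=1');
console.log(found ? found.url : 'no dataflow for this url');
```
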
### dataflow ###

a set of tasks which leads the dataflow to success or failure.

### task ###

a black box that does the processing.

every `task` has its own requirements. a requirement can be `entry point` data (for
example, a query string in the case of an httpd `initiator`) or data from another `task`.
if the requirements are met, the `task` starts execution. after execution the `task`
produces a dataset.

real life example
-------------------------------

you want to visit a conference. your company does all the dirty bureaucratic work for you, but
you have to do some bureaucratic things too.

1. receive approval from your boss.
2. wait for all the documents needed for the conference (foreign visa, travel tickets,
   hotel booking, conference fee payment).
3. get a travel allowance.
4. visit the conference and make a report for the finance department (a hotel invoice,
   airline tickets, a taxi receipt and so on).
5. make a presentation about the conference.

then the tasks look like:

    conferenceRequest (
        conferenceInfo
    ) -> approve

after approval we can book a hotel:

    hotelBookingRequest (
        approve
    ) -> hotelBooking

documents for the visa must already contain the conference info and the booking:

    visaRequest (
        approve, conferenceInfo, hotelBooking
    ) -> visa

we pay for the conference and the tickets after the visa is received:

    conferenceFeePay (
        approve, visa
    ) -> conferenceFee

    ticketsPay (
        approve, visa
    ) -> tickets

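to see how these requirements determine the order of execution, here is a small sketch that groups the tasks into waves, where each wave contains the tasks whose requirements are already satisfied. the `requires`/`produces` object shape and the `planWaves` function are assumptions made for this illustration; they are not dataflo.ws configuration keys.

```javascript
// the tasks from the notation above: what each one needs and what it produces
const tasks = {
  conferenceRequest: { requires: ['conferenceInfo'], produces: 'approve' },
  hotelBookingRequest: { requires: ['approve'], produces: 'hotelBooking' },
  visaRequest: { requires: ['approve', 'conferenceInfo', 'hotelBooking'], produces: 'visa' },
  conferenceFeePay: { requires: ['approve', 'visa'], produces: 'conferenceFee' },
  ticketsPay: { requires: ['approve', 'visa'], produces: 'tickets' }
};

// group tasks into waves; a task joins a wave once all of its requirements
// are available, either as initial data or as the output of an earlier wave
function planWaves(taskMap, initialData) {
  const available = new Set(initialData);
  const pending = new Set(Object.keys(taskMap));
  const waves = [];

  while (pending.size > 0) {
    const ready = [...pending].filter((name) =>
      taskMap[name].requires.every((key) => available.has(key)));

    if (ready.length === 0) {
      throw new Error('unsatisfiable requirements: ' + [...pending].join(', '));
    }

    for (const name of ready) {
      pending.delete(name);
      available.add(taskMap[name].produces);
    }
    waves.push(ready);
  }

  return waves;
}

const waves = planWaves(tasks, ['conferenceInfo']);
console.log(waves);
```
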
synopsis
-------------------------------

    var httpd = require ('initiator/http');

    var httpdConfig = {
        "dataflows": [{
            // entry point: requests to /save start this dataflow
            "url": "/save",
            "tasks": [{
                "$class": "post",
                "request": "{$request}",
                // the result is referenced below as {$data.body}
                "$set": "data.body"
            }, {
                "$class": "render",
                "type": "json",
                "data": "{$data.body}",
                "output": "{$response}"
            }]
        }, {
            "url": "/entity/tickets/list.json",
            "tasks": [{
                "$class": "mongoRequest",
                "connector": "mongo",
                "collection": "messages",
                // the result is referenced below as {$data.filter}
                "$set": "data.filter"
            }, {
                "$class": "render",
                "type": "json",
                "data": "{$data.filter}",
                "output": "{$response}"
            }]
        }]
    };

    var initiator = new httpd (httpdConfig);

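the `$set` key and the `{$data.body}` references suggest that tasks share data through named paths. how dataflo.ws resolves these internally is not shown here, so the following is only a guess at the idea: `setPath`, `getPath` and `resolve` are hypothetical helpers, and only whole-string references such as `{$data.body}` are handled.

```javascript
// hypothetical helper: read a dotted path such as "data.body" from an object
function getPath(obj, path) {
  return path.split('.').reduce(
    (current, key) => (current == null ? undefined : current[key]),
    obj
  );
}

// hypothetical helper: write a value at a dotted path, creating objects on the way
function setPath(obj, path, value) {
  const keys = path.split('.');
  const last = keys.pop();
  let current = obj;
  for (const key of keys) {
    if (current[key] == null) {
      current[key] = {};
    }
    current = current[key];
  }
  current[last] = value;
}

// replace a "{$path}" reference with the value stored in the context;
// any other value is returned unchanged
function resolve(value, context) {
  const match = /^\{\$([\w.]+)\}$/.exec(value);
  return match ? getPath(context, match[1]) : value;
}

const context = {};
setPath(context, 'data.body', { title: 'hello' }); // what "$set": "data.body" might do
console.log(resolve('{$data.body}', context));     // the stored object
console.log(resolve('json', context));             // plain values pass through
```
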
implementation details
-----------------------

### initiator ###

an initiator creates a request object which contains all the basic info about a particular request. basic info doesn't mean that all the data has been received, only everything required to finish receiving it.

example: with an httpd initiator you receive all the GET data (i.e. the query string) up front, but in the case of a POST request you need to receive the POST data yourself (using a task within a dataflow).

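receiving POST data means reading the request stream until it ends. the sketch below shows that with plain node stream events; `readBody` is a hypothetical helper and not the `post` task from dataflo.ws. a readable stream built with `Readable.from` stands in for a real http request so the example runs on its own.

```javascript
const { Readable } = require('stream');

// collect every chunk of a readable stream (such as an http request)
// and resolve with the whole body as a utf8 string
function readBody(request) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    request.on('data', (chunk) => chunks.push(Buffer.from(chunk)));
    request.on('end', () => resolve(Buffer.concat(chunks).toString('utf8')));
    request.on('error', reject);
  });
}

// stand-in for a POST request that arrives in two chunks
const fakeRequest = Readable.from(['name=conf', '&year=2012']);

readBody(fakeRequest).then((body) => {
  console.log('received body:', body);
});
```
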
### task ###

every task has its own state and requirements. the task states are:

* scarce - the task is in its initial state; there is not enough data to fulfill its requirements
* ready - the task is ready to run (all task requirements are met)
* running - the dataflow has decided to launch this task
* idle - not implemented
* completed - the task completed without errors
* error - the task completed with errors
* skipped - the task was skipped because another execution branch was selected (see below)
* exception - an exception was thrown somewhere

### flow ###

the flow module checks task requirements and switches the task state to `ready`.
if any running slots are available, the flow starts task execution.

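the interplay between requirements, states and running slots can be sketched as follows. this is a simplified model, not the actual flow module: the `Flow` class, its `tick` and `complete` methods, the `maxRunning` limit and the `requires`/`produces` task shape are all assumptions for the example. only the `scarce`, `ready`, `running` and `completed` states are modelled.

```javascript
class Flow {
  // tasks: { name: { requires: [...], produces: '...' } } (hypothetical shape)
  constructor(tasks, maxRunning) {
    this.tasks = tasks;
    this.maxRunning = maxRunning; // number of running slots
    this.data = {};
    this.state = {};
    for (const name of Object.keys(tasks)) {
      this.state[name] = 'scarce';
    }
  }

  runningCount() {
    return Object.values(this.state).filter((s) => s === 'running').length;
  }

  // promote tasks whose requirements are met, then fill the free slots
  tick() {
    for (const [name, task] of Object.entries(this.tasks)) {
      const met = task.requires.every((key) => key in this.data);
      if (this.state[name] === 'scarce' && met) {
        this.state[name] = 'ready';
      }
    }
    for (const name of Object.keys(this.tasks)) {
      if (this.state[name] === 'ready' && this.runningCount() < this.maxRunning) {
        this.state[name] = 'running';
      }
    }
  }

  // called when a task finishes: store its output and re-check the others
  complete(name, value) {
    this.data[this.tasks[name].produces] = value;
    this.state[name] = 'completed';
    this.tick();
  }
}

const flow = new Flow({
  a: { requires: ['input'], produces: 'x' },
  b: { requires: ['input'], produces: 'y' },
  c: { requires: ['x', 'y'], produces: 'z' }
}, 1);

flow.data.input = 42; // entry point data
flow.tick();
console.log(flow.state); // a is running, b waits for a free slot, c is still scarce

flow.complete('a', 1);
flow.complete('b', 2);
console.log(flow.state); // a and b are completed, c is running
```
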
how to write your own task
--------------------------

assume we need a task that checks file stats.

first of all, we need to load the task base class along with the fs and util node modules:

    var task = require ('task/base'),
        util = require ('util'),
        fs   = require ('fs');

next, we write the constructor of our class. the constructor receives all
of the task parameters and must call this.init (config) after all preparations.

    var statTask = module.exports = function (config) {
        this.path = config.path;
        this.init (config);
    };

next, we inherit from the task base class and add some methods to our stat class:

    util.inherits (statTask, task);

    // note: util.extend is not part of node's built-in util module;
    // this example assumes that dataflo.ws makes it available
    util.extend (statTask.prototype, {

        run: function () {

            var self = this;

            fs.stat (self.path, function (err, stats) {
                if (err) {
                    // report the problem and mark the task as failed
                    self.emit ('warn', 'stat error for: ' + self.path);
                    self.failed (err);
                    return;
                }

                // hand the stats over to the dataflow
                self.emit ('log', 'stat done for: ' + self.path);
                self.completed (stats);
            });
        }

    });

in the code above i've used these methods:

* emit - emits a message to subscribers
* failed - marks the task as failed
* completed - marks the task as completed successfully

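to try the same pattern without installing anything, here is a self-contained sketch. `BaseTask` is a stand-in written for this example; it only imitates the `init`, `completed` and `failed` methods mentioned above and is not the real `task/base` class. the `complete` and `failed` event names and the `runStat` helper are assumptions as well.

```javascript
const { EventEmitter } = require('events');
const fs = require('fs');

// stand-in for the real task base class (an assumption, not dataflo.ws code)
class BaseTask extends EventEmitter {
  init(config) {
    this.config = config;
    this.state = 'ready';
  }

  completed(result) {
    this.state = 'completed';
    this.emit('complete', result);
  }

  failed(err) {
    this.state = 'error';
    this.emit('failed', err);
  }
}

// the stat task from above, rewritten on top of the stand-in base class
class StatTask extends BaseTask {
  constructor(config) {
    super();
    this.path = config.path;
    this.init(config);
  }

  run() {
    fs.stat(this.path, (err, stats) => {
      if (err) {
        this.emit('warn', 'stat error for: ' + this.path);
        this.failed(err);
        return;
      }
      this.emit('log', 'stat done for: ' + this.path);
      this.completed(stats);
    });
  }
}

// hypothetical helper: run the task and expose its outcome as a promise
function runStat(path) {
  return new Promise((resolve, reject) => {
    const task = new StatTask({ path: path });
    task.on('log', (message) => console.log(message));
    task.on('complete', resolve);
    task.on('failed', reject);
    task.run();
  });
}

runStat('.').then((stats) => {
  console.log('is a directory:', stats.isDirectory());
});
```
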
see also
---------------------------

[http://docs.constructibl.es/specs/js/](http://docs.constructibl.es/specs/js/)

license
---------------------------

[MIT License](http://opensource.org/licenses/mit-license.html)