1 | BigML Node.js Bindings
|
2 | ======================
|
3 |
|
4 | [BigML](https://bigml.com) makes machine learning easy by taking care
|
5 | of the details required to add data-driven decisions and predictive
|
6 | power to your company.
|
7 | Unlike other machine learning services, BigML
|
8 | creates
|
9 | [beautiful predictive models](https://bigml.com/gallery/models) that
|
10 | can be easily understood and interacted with.
|
11 |
|
12 | These BigML Node.js bindings allow you to interact with BigML.io, the API
|
13 | for BigML. You can use them to easily create, retrieve, list, update, and
|
14 | delete BigML resources (e.g., sources, datasets, models and
|
15 | predictions).
|
16 |
|
17 | This module is licensed under the [Apache License, Version
|
18 | 2.0](http://www.apache.org/licenses/LICENSE-2.0.html).
|
19 |
|
20 | Support
|
21 | -------
|
22 |
|
23 | Please report problems and bugs to our [BigML.io issue
|
24 | tracker](https://github.com/bigmlcom/io/issues).
|
25 |
|
26 | Discussions about the different bindings take place in the general
|
27 | [BigML mailing list](http://groups.google.com/group/bigml). Or join us
|
28 | in our [Campfire chatroom](https://bigmlinc.campfirenow.com/f20a0).
|
29 |
|
30 | Requirements
|
31 | ------------
|
32 |
|
33 | Node 0.10 is currently supported by these bindings.
|
34 |
|
35 | The only mandatory third-party dependencies are the
|
36 | [request](https://github.com/mikeal/request.git),
|
37 | [winston](https://github.com/flatiron/winston.git) and
|
38 | [form-data](https://github.com/felixge/node-form-data.git) libraries.
|
39 |
|
40 | The testing environment requires the additional
|
41 | [mocha](https://github.com/visionmedia/mocha) package that can be installed
|
42 | with the following command:
|
43 |
|
44 | $ sudo npm install -g mocha
|
45 |
|
46 | Installation
|
47 | ------------
|
48 |
|
49 | To install the latest stable release with
|
50 | [npm](https://npmjs.org/):
|
51 |
|
52 | $ npm install bigml
|
53 |
|
54 | You can also install the development version of the bindings by cloning the
|
55 | Git repository to your local computer and issuing:
|
56 |
|
57 | $ npm install .
|
58 |
|
59 | Testing
|
60 | -------
|
61 |
|
62 | The test suite is run automatically using `mocha` as the test framework. As all the
|
63 | tested API objects perform one or more connections to the remote resources in
|
64 | bigml.com, you may have to enlarge the default timeout used by `mocha` in
|
65 | each test. For instance:
|
66 |
|
67 | $ mocha -t 20000
|
68 |
|
69 | will set the timeout limit to 20 seconds.
|
70 | This limit should typically be enough, but you can change it to fit
|
71 | the latencies of your connection. You can also add the `-R spec` flag to see
|
72 | the description of each step as it runs.
|
73 |
|
74 | Importing the modules
|
75 | ---------------------
|
76 |
|
77 | To use the library, import it with `require`:
|
78 |
|
79 | $ node
|
80 | > bigml = require('bigml');
|
81 |
|
82 | This will give you access to the following library structure:
|
83 |
|
84 | - bigml.constants common constants
|
85 | - bigml.BigML connection object
|
86 | - bigml.Resource common API methods
|
87 | - bigml.Source Source API methods
|
88 | - bigml.Dataset Dataset API methods
|
89 | - bigml.Model Model API methods
|
90 | - bigml.Ensemble Ensemble API methods
|
91 | - bigml.Prediction Prediction API methods
|
92 | - bigml.BatchPrediction BatchPrediction API methods
|
93 | - bigml.Evaluation Evaluation API methods
|
94 | - bigml.Cluster Cluster API methods
|
95 | - bigml.Centroid Centroid API methods
|
96 | - bigml.BatchCentroid BatchCentroid API methods
|
97 | - bigml.Anomaly Anomaly detector API methods
|
98 | - bigml.AnomalyScore Anomaly score API methods
|
99 | - bigml.BatchAnomalyScore BatchAnomalyScore API methods
|
100 | - bigml.Project Project API methods
|
101 | - bigml.Sample Sample API methods
|
102 | - bigml.Correlation Correlation API methods
|
103 | - bigml.StatisticalTest StatisticalTest API methods
|
104 | - bigml.LogisticRegression LogisticRegression API methods
|
105 | - bigml.Association Association API methods
|
106 | - bigml.AssociationSet AssociationSet API methods
|
107 | - bigml.TopicModel Topic Model API methods
|
108 | - bigml.TopicDistribution Topic Distribution API methods
|
109 | - bigml.BatchTopicDistribution Batch Topic Distribution API methods
|
110 | - bigml.Deepnet Deepnet API methods
|
111 | - bigml.Fusion Fusion API methods
|
112 | - bigml.PCA PCA API methods
|
113 | - bigml.Projection Projection API methods
|
114 | - bigml.BatchProjection Batch Projection API methods
|
115 | - bigml.LinearRegression Linear Regression API methods
|
116 | - bigml.ExternalConnector External Connector API methods
|
117 | - bigml.Script Script API methods
|
118 | - bigml.Execution Execution API methods
|
119 | - bigml.Library Library API methods
|
120 | - bigml.LocalModel Model for local predictions
|
121 | - bigml.LocalEnsemble Ensemble for local predictions
|
122 | - bigml.LocalCluster Cluster for local centroids
|
123 | - bigml.LocalAnomaly Anomaly detector for local anomaly scores
|
124 | - bigml.LocalLogisticRegression Logistic regression model for local predictions
|
125 | - bigml.LocalAssociation Association model for association rules
|
126 | - bigml.LocalTopicModel Topic Model for local predictions
|
127 | - bigml.LocalTimeSeries Time Series for local forecasts
|
128 | - bigml.LocalDeepnet Deepnets for local predictions
|
129 | - bigml.LocalFusion Fusions for local predictions
|
130 | - bigml.LocalPCA PCA for local projections
|
131 | - bigml.LocalLinearRegression Linear Regression for local predictions
|
132 |
|
133 |
|
134 | Authentication
|
135 | --------------
|
136 |
|
137 | All the requests to BigML.io must be authenticated using your username
|
138 | and [API key](https://bigml.com/account/apikey) and are always
|
139 | transmitted over HTTPS.
|
140 |
|
141 | This module will look for your username and API key in the environment
|
142 | variables `BIGML_USERNAME` and `BIGML_API_KEY` respectively. You can
|
143 | add the following lines to your `.bashrc` or `.bash_profile` to set
|
144 | those variables automatically when you log in:
|
145 |
|
146 |     export BIGML_USERNAME=myusername
|
147 |     export BIGML_API_KEY=ae579e7e53fb9abd646a6ff8aa99d4afe83ac291
|
148 |
|
149 | With that environment set up, connecting to BigML is a breeze:
|
150 |
|
151 |     connection = new bigml.BigML();
|
152 |
|
153 | Otherwise, you can pass the credentials directly when instantiating the BigML
|
154 | class as follows:
|
155 |
|
156 |     connection = new bigml.BigML('myusername',
|
157 |                                  'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291');
|
158 |
|
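As a sketch of the lookup described above, the helper below reads both environment variables and fails early when one of them is missing. The `credentialsFromEnv` name and its error message are illustrative assumptions; they are not part of the bindings.

```js
// Hypothetical helper (not part of the bigml module): reads the credentials
// from an environment-like object and fails early if one is missing.
function credentialsFromEnv(env) {
  var username = env.BIGML_USERNAME;
  var apiKey = env.BIGML_API_KEY;
  if (!username || !apiKey) {
    throw new Error('Set BIGML_USERNAME and BIGML_API_KEY before connecting');
  }
  return {username: username, apiKey: apiKey};
}

// Possible usage with the real environment:
//   var credentials = credentialsFromEnv(process.env);
//   var connection = new bigml.BigML(credentials.username,
//                                    credentials.apiKey);
var sample = credentialsFromEnv({BIGML_USERNAME: 'myusername',
                                 BIGML_API_KEY: 'mykey'});
console.log(sample.username); // prints: myusername
```
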
159 | Connection information
|
160 | ----------------------
|
161 |
|
162 | If your BigML installation is not in `bigml.io` you can adapt your connection
|
163 | to point to your customized domain. For instance, if your user is in the
|
164 | Australian site, your domain should point to `au.bigml.io`. This can be
|
165 | achieved by setting the `BIGML_DOMAIN` environment variable:
|
166 |
|
167 |     export BIGML_DOMAIN=au.bigml.io
|
168 |
|
169 | Your connection object will take its value and
|
170 | build the corresponding URLs. You can also set this value dynamically (together
|
171 | with the protocol used, if you need to change it):
|
172 |
|
173 |     connection = new bigml.BigML('myusername',
|
174 |                                  'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291',
|
175 |                                  {domain: 'au.bigml.io',
|
176 |                                   protocol: 'https'});
|
177 |
|
178 | If no domain or protocol information is provided, the connection
|
179 | uses `bigml.io` and `https` by default.
|
180 |
|
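The fallback order can be pictured with a small sketch. The `resolveEndpoint` helper is hypothetical, and the precedence it applies (explicit option first, then `BIGML_DOMAIN`, then the defaults) is an assumption made for illustration; the bindings may resolve these values differently.

```js
// Hypothetical helper: combines the connection options, the environment and
// the documented defaults (`bigml.io` and `https`) into a base URL.
// Assumed precedence: explicit option > BIGML_DOMAIN > default.
function resolveEndpoint(options, env) {
  options = options || {};
  env = env || {};
  var protocol = options.protocol || 'https';
  var domain = options.domain || env.BIGML_DOMAIN || 'bigml.io';
  return protocol + '://' + domain;
}

console.log(resolveEndpoint());
// prints: https://bigml.io
console.log(resolveEndpoint({}, {BIGML_DOMAIN: 'au.bigml.io'}));
// prints: https://au.bigml.io
```
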
181 | Also, you can set a local directory to be used as storage. Using this
|
182 | mechanism, any resource you download using this connection object is stored
|
183 | as a JSON file in the directory. The name of the file is the resource ID string
|
184 | with the slash replaced by an underscore:
|
185 |
|
186 |     connection = new bigml.BigML('myusername',
|
187 |                                  'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291',
|
188 |                                  {storage: './my_storage'});
|
189 |
|
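To make the naming rule concrete, this sketch computes where a stored resource would be expected on disk. `storagePath` is a hypothetical helper written for this example; it only applies the slash-to-underscore rule stated above and does not touch the file system.

```js
var path = require('path');

// Hypothetical helper: applies the naming rule described above
// (resource ID with the slash replaced by an underscore).
function storagePath(storageDir, resourceId) {
  return path.join(storageDir, resourceId.replace('/', '_'));
}

// The file name part is `source_51b34c3c37203f4678000020`.
console.log(storagePath('./my_storage', 'source/51b34c3c37203f4678000020'));
```
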
190 | Note that the old `devMode` parameter has been deprecated and will no longer
|
191 | be accepted.
|
192 |
|
193 |
|
194 | Quick Start
|
195 | -----------
|
196 |
|
197 | Let's see the steps that will lead you from [this CSV
|
198 | file](https://static.bigml.com/csv/iris.csv) containing the [Iris
|
199 | flower dataset](http://en.wikipedia.org/wiki/Iris_flower_data_set) to
|
200 | predicting the species of a flower whose `petal length` is `1`.
|
201 | By default, BigML considers the last field
|
202 | (`species`) in the row as the
|
203 | objective field (i.e., the field that you want to generate predictions
|
204 | for). The CSV structure is:
|
205 |
|
206 |     sepal length,sepal width,petal length,petal width,species
|
207 |     5.1,3.5,1.4,0.2,Iris-setosa
|
208 |     4.9,3.0,1.4,0.2,Iris-setosa
|
209 |     4.7,3.2,1.3,0.2,Iris-setosa
|
210 |     ...
|
211 |
|
212 | To generate a prediction you need to create, in order, a
|
213 | source, a dataset and a model:
|
214 |
|
215 | ```js
|
216 | var bigml = require('bigml');
|
217 | var source = new bigml.Source();
|
218 | source.create('./data/iris.csv', function(error, sourceInfo) {
|
219 |   if (!error && sourceInfo) {
|
220 |     var dataset = new bigml.Dataset();
|
221 |     dataset.create(sourceInfo, function(error, datasetInfo) {
|
222 |       if (!error && datasetInfo) {
|
223 |         var model = new bigml.Model();
|
224 |         model.create(datasetInfo, function (error, modelInfo) {
|
225 |           if (!error && modelInfo) {
|
226 |             var prediction = new bigml.Prediction();
|
227 |             prediction.create(modelInfo, {'petal length': 1});
|
228 |           }
|
229 |         });
|
230 |       }
|
231 |     });
|
232 |   }
|
233 | });
|
234 | ```
|
235 |
|
236 | Note that in our example the `prediction.create` call has no associated
|
237 | callback. All the CRUD methods of any resource accept a callback as
|
238 | the last parameter.
|
239 | If you don't provide one, the default action is to
|
240 | print the resulting resource or the error. For the `create` method:
|
241 |
|
242 | > result:
|
243 | { code: 201,
|
244 | object:
|
245 | { category: 0,
|
246 | code: 201,
|
247 | content_type: 'text/csv',
|
248 | created: '2013-06-08T15:22:36.834797',
|
249 | credits: 0,
|
250 | description: '',
|
251 | fields_meta: { count: 0, limit: 1000, offset: 0, total: 0 },
|
252 | file_name: 'iris.csv',
|
253 | md5: 'd1175c032e1042bec7f974c91e4a65ae',
|
254 | name: 'iris.csv',
|
255 | number_of_datasets: 0,
|
256 | number_of_ensembles: 0,
|
257 | number_of_models: 0,
|
258 | number_of_predictions: 0,
|
259 | private: true,
|
260 | resource: 'source/51b34c3c37203f4678000020',
|
261 | size: 4608,
|
262 | source_parser: {},
|
263 | status:
|
264 | { code: 1,
|
265 | message: 'The request has been queued and will be processed soon' },
|
266 | subscription: false,
|
267 | tags: [],
|
268 | type: 0,
|
269 | updated: '2013-06-08T15:22:36.834844' },
|
270 | resource: 'source/51b34c3c37203f4678000020',
|
271 | location: 'https://localhost:1026/andromeda/source/51b34c3c37203f4678000020',
|
272 | error: null }
|
273 |
|
274 |
|
275 | The generated objects can be retrieved, updated and deleted through the
|
276 | corresponding REST methods. For instance, in the previous example you would
|
277 | use:
|
278 |
|
279 | ```js
|
280 | var bigml = require('bigml');
|
281 | var source = new bigml.Source();
|
282 | source.get('source/51b25fb237203f4410000010', function (error, resource) {
|
283 |   if (!error && resource) {
|
284 |     console.log(resource);
|
285 |   }
|
286 | });
|
287 | ```
|
288 | to recover and show the source information.
|
289 |
|
290 | When a resource `create` call is sent,
|
291 | the request creates an evolving resource that will go through some stages
|
292 | till it ends up being finished or faulty.
|
293 | BigML's API will give asynchronous access to the resource at any time,
|
294 | so the `create` method response might contain an in-process resource that
|
295 | will lack some of the properties that it will have when finished.
|
296 | That is helpful to build any-time models and to use non-blocking
|
297 | calls. However, in order to have the complete information that a finished
|
298 | resource contains, we will probably need to wait till it
|
299 | reaches its final state. The `createAndWait` methods provide alternatives
|
300 | to `create` that wait for the resource to be finished before returning the
|
301 | result.
|
302 |
|
303 | ```js
|
304 | var bigml = require('bigml');
|
305 | var dataset = new bigml.Dataset();
|
306 | dataset.createAndWait('source/51b25fb237203f4410000010',
|
307 |   function (error, resource) {
|
308 |     if (!error && resource) {
|
309 |       console.log("The dataset has been completely created.");
|
310 |       console.log(resource);
|
311 |     }
|
312 |   });
|
313 | ```
|
314 |
|
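Chaining several `createAndWait` calls with nested callbacks gets deep quickly. The sketch below flattens the source, dataset and model steps with promises. It assumes a Node version that provides a native `Promise` (newer than Node 0.10), and it assumes that `createAndWait` is available on `Source` and `Model` with the same `(input, callback)` shape shown for `Dataset`. `callAsync` and `buildModel` are hypothetical helpers, not part of the bindings.

```js
// Hypothetical helper: calls any method that takes a Node-style callback as
// its last argument and returns a promise for the result.
function callAsync(resource, method) {
  var args = Array.prototype.slice.call(arguments, 2);
  return new Promise(function (resolve, reject) {
    resource[method].apply(resource, args.concat(function (error, info) {
      if (error) { reject(error); } else { resolve(info); }
    }));
  });
}

// Sketch of the source -> dataset -> model chain. `bigml` is the required
// module; each step starts only when the previous resource is finished.
function buildModel(bigml, csvPath) {
  return callAsync(new bigml.Source(), 'createAndWait', csvPath)
    .then(function (sourceInfo) {
      return callAsync(new bigml.Dataset(), 'createAndWait', sourceInfo);
    })
    .then(function (datasetInfo) {
      return callAsync(new bigml.Model(), 'createAndWait', datasetInfo);
    });
}

// Possible usage:
//   buildModel(require('bigml'), './data/iris.csv')
//     .then(function (modelInfo) { console.log(modelInfo.resource); })
//     .catch(function (error) { console.error(error); });
```

A new resource object is created for every step, in line with the advice of using one object per resource.
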
315 | You can work with different credentials by setting them in the connection
|
316 | object, as explained in the Authentication section.
|
317 |
|
318 | ```js
|
319 | var bigml = require('bigml');
|
320 | var connection = new bigml.BigML('myusername',
|
321 |                                  'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291');
|
322 | // the new Source object will use the connection credentials for remote
|
323 | // authentication.
|
324 | var source = new bigml.Source(connection);
|
325 | source.get('source/51b25fb237203f4410000010', function (error, resource) {
|
326 |   if (!error && resource) {
|
327 |     console.log(resource);
|
328 |   }
|
329 | });
|
330 | ```
|
331 |
|
332 | You can also generate local predictions using the information of your
|
333 | models:
|
334 |
|
335 | ```js
|
336 | var bigml = require('bigml');
|
337 | var localModel = new bigml.LocalModel('model/51922d0b37203f2a8c000010');
|
338 | localModel.predict({'petal length': 1},
|
339 |                    function(error, prediction) {console.log(prediction)});
|
340 | ```
|
341 |
|
342 | And similarly, for your ensembles:
|
343 |
|
344 | ```js
|
345 | var bigml = require('bigml');
|
346 | var localEnsemble = new bigml.LocalEnsemble('ensemble/51901f4337203f3a9a000215');
|
347 | localEnsemble.predict({'petal length': 1}, 0,
|
348 |                       function(error, prediction) {console.log(prediction)});
|
349 | ```
|
350 | or any of the other modeling resources offered by BigML. The previous example
|
351 | will generate a prediction for a `Decision Forest` ensemble
|
352 | by combining the predictions of each of the models
|
353 | it encloses. The example uses the `plurality` combination method, whose code
|
354 | is `0` (check the docs for more information about the available combination
|
355 | methods). All three kinds of ensembles available in BigML
|
356 | (`Decision Forest`,
|
357 | `Random Decision Forest` and `Boosting Trees`) can be used to predict locally
|
358 | through this `LocalEnsemble` object.
|
359 |
|
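To give an intuition of what a plurality combination does, here is a minimal sketch of a majority vote over the categorical predictions of several models. It is an illustration only, not the code used by `LocalEnsemble`; the tie-breaking rule (the class seen first wins) is an arbitrary choice for this example.

```js
// Illustrative plurality vote: returns the class predicted by most models.
// Ties go to the class that appears first in the list (arbitrary choice).
function pluralityVote(predictions) {
  var counts = Object.create(null);
  var order = [];
  predictions.forEach(function (prediction) {
    if (!(prediction in counts)) {
      counts[prediction] = 0;
      order.push(prediction);
    }
    counts[prediction] += 1;
  });
  var winner = null;
  order.forEach(function (prediction) {
    if (winner === null || counts[prediction] > counts[winner]) {
      winner = prediction;
    }
  });
  return winner;
}

console.log(pluralityVote(['Iris-setosa', 'Iris-versicolor', 'Iris-setosa']));
// prints: Iris-setosa
```
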
360 | Types of resources
|
361 | ------------------
|
362 |
|
363 | Currently these are the types of resources in bigml.com:
|
364 |
|
365 | - **external connectors** Contain the information to connect to an external
|
366 | database manager. They can be used to upload data to BigML to build `sources`.
|
367 | These resources are handled through `bigml.ExternalConnector`.
|
368 |
|
369 | - **sources** Contain the data uploaded from your local data file, inline data
|
370 | or an external data source after processing (interpreting field types
|
371 | or missing characters, for instance).
|
372 | You can set their locale settings or their field names or types. These resources
|
373 | are handled through `bigml.Source`.
|
374 |
|
375 | - **datasets** Contain the information of the source in a structured summarized
|
376 | way according to their field types (numeric, categorical, text or datetime).
|
377 | These resources are handled through `bigml.Dataset`.
|
378 |
|
379 | - **models** They are tree-like structures extracted from a dataset in order to
|
380 | predict one field, the objective field, according to the values of other
|
381 | fields, the input fields. These resources
|
382 | are handled through `bigml.Model`.
|
383 |
|
384 | - **predictions** Are the representation of the predicted value for the
|
385 | objective field obtained by applying the model to an input data set. These
|
386 | resources are handled through `bigml.Prediction`.
|
387 |
|
388 | - **ensembles** Are a group of models extracted from a single dataset to be
|
389 | used together in order to predict the objective field. BigML offers three
|
390 | kinds of ensembles:
|
391 | `Decision Forests`, `Random Decision Forests` and `Boosting Trees`.
|
392 | All these resources are handled through `bigml.Ensemble`.
|
393 |
|
394 | - **evaluations** Are a set of measures of performance defined on your model
|
395 | or ensemble by checking predictions for the objective field of
|
396 | a test dataset with its provided values. These resources
|
397 | are handled through `bigml.Evaluation`.
|
398 |
|
399 | - **batch predictions** Are groups of predictions for the objective field
|
400 | obtained by applying the model or ensemble to a dataset resource. These
|
401 | resources are handled through `bigml.BatchPrediction`.
|
402 |
|
403 | - **clusters** They are unsupervised learning models that define groups of
|
404 | instances in the training dataset according to the similarity of their
|
405 | features. Each group has a central instance, named Centroid, and all
|
406 | instances in the group form a new dataset. These resources are handled
|
407 | through `bigml.Cluster`.
|
408 |
|
409 | - **centroids** Are the central instances of the groups defined in a cluster.
|
410 | They are the values predicted by the cluster when new input data is given.
|
411 | These resources are handled through `bigml.Centroid`.
|
412 |
|
413 | - **batch centroids** Are lists of centroids obtained by using the cluster to
|
414 | classify a dataset of input data. They are analogous to the batch
|
415 | predictions generated from models, but for clusters. These resources
|
416 | are handled through `bigml.BatchCentroid`.
|
417 |
|
418 | - **anomaly detectors** They are unsupervised learning models
|
419 | that detect instances in the training dataset that are anomalous.
|
420 | The information they return includes a `top_anomalies` block
|
421 | that contains a list of the most anomalous
|
422 | points. For each instance, a `score` from 0 to 1 is computed. The closer to 1,
|
423 | the more anomalous. These resources are handled
|
424 | through `bigml.Anomaly`.
|
425 |
|
426 | - **anomaly scores** Are scores computed for any user-given input data using
|
427 | an anomaly detector. These resources are handled through `bigml.AnomalyScore`.
|
428 |
|
429 | - **batch anomaly scores** Are lists of anomaly scores obtained by using
|
430 | the anomaly detector to
|
431 | classify a dataset of input data. They are analogous to the batch
|
432 | predictions generated from models, but for anomalies. These resources
|
433 | are handled through `bigml.BatchAnomalyScore`.
|
434 |
|
435 | - **projects** These resources are meant for organizational purposes only.
|
436 | The rest of resources can be related to one `project` that groups them.
|
437 | Only sources can be assigned to a `project`; the rest of resources inherit
|
438 | the `project` reference from their originating source. Projects are handled
|
439 | through `bigml.Project`.
|
440 |
|
441 | - **samples** These resources provide quick access to your raw data. They are
|
442 | objects cached in-memory by the server that can be queried for subsets
|
443 | of data by limiting
|
444 | their size, the fields or the rows returned. Samples are handled
|
445 | through `bigml.Sample`.
|
446 |
|
447 | - **correlations** These resources contain a series of computations that
|
448 | reflect the
|
449 | degree of dependence between the field set as objective for your predictions
|
450 | and the rest of fields in your dataset. The dependence degree is obtained by
|
451 | comparing the distributions in every objective and non-objective field pair,
|
452 | as independent fields should have probabilistically
|
453 | independent distributions. Depending on the types of the fields to compare,
|
454 | the metrics used to compute the correlation degree will change. Check the
|
455 | [developers documentation](https://bigml.com/api/correlations#retrieving-correlation)
|
456 | for a detailed description. Correlations are handled
|
457 | through `bigml.Correlation`.
|
458 |
|
459 | - **statistical tests** These resources contain a series of statistical tests
|
460 | that compare the
|
461 | distribution of data in each numeric field to certain canonical distributions,
|
462 | such as the
|
463 | [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution)
|
464 | or [Benford's law](https://en.wikipedia.org/wiki/Benford%27s_law)
|
465 | distribution. Statistical tests are handled
|
466 | through `bigml.StatisticalTest`.
|
467 |
|
468 | - **logistic regressions** These resources are models to solve classification
|
469 | problems by predicting one field of the dataset, the objective field,
|
470 | based on the values of the other fields, the input fields. The prediction
|
471 | is made using a logistic function whose argument is a linear combination
|
472 | of the predictor's values. Check the
|
473 | [developers documentation](https://bigml.com/api/logisticregressions)
|
474 | for a detailed description. These resources
|
475 | are handled through `bigml.LogisticRegression`.
|
476 |
|
477 | - **associations** These resources are models to discover the existing
|
478 | associations between the field values in your dataset. Check the
|
479 | [developers documentation](https://bigml.com/api/associations)
|
480 | for a detailed description. These resources
|
481 | are handled through `bigml.Association`.
|
482 |
|
483 | - **association sets** These resources are the sets of items associated to
|
484 | the ones in your input data and their score. Check the
|
485 | [developers documentation](https://bigml.com/api/associationsets)
|
486 | for a detailed description. These resources
|
487 | are handled through `bigml.AssociationSet`.
|
488 |
|
489 | - **topic models** These resources are models to discover topics underlying a
|
490 | collection of documents. Check the
|
491 | [developers documentation](https://bigml.com/api/topicmodels)
|
492 | for a detailed description. These resources
|
493 | are handled through `bigml.TopicModel`.
|
494 |
|
495 | - **topic distributions** These resources contain the
|
496 | probabilities
|
497 | for a document to belong to each one of the topics in a `topic model`.
|
498 | Check the
|
499 | [developers documentation](https://bigml.com/api/topicdistributions)
|
500 | for a detailed description. These resources
|
501 | are handled through `bigml.TopicDistribution`.
|
502 |
|
503 | - **batch topic distributions** These resources contain a list of
|
504 | the probabilities
|
505 | for a collection of documents to belong to each one of the topics in
|
506 | a `topic model`. Check the
|
507 | [developers documentation](https://bigml.com/api/batchtopicdistributions)
|
508 | for a detailed description. These resources
|
509 | are handled through `bigml.BatchTopicDistribution`.
|
510 |
|
511 | - **time series** These resources are models to discover the patterns in
|
512 | the properties of a sequence of ordered data. Check the
|
513 | [developers documentation](https://bigml.com/api/timeseries)
|
514 | for a detailed description. These resources
|
515 | are handled through `bigml.TimeSeries`.
|
516 |
|
517 | - **forecasts** These resources contain forecasts for the numeric
|
518 | fields in a dataset as predicted by a `timeseries` model.
|
519 | Check the
|
520 | [developers documentation](https://bigml.com/api/forecasts)
|
521 | for a detailed description. These resources
|
522 | are handled through `bigml.Forecast`.
|
523 |
|
524 | - **deepnets** These resources are classification and regression models based
|
525 | on deep neural networks. Check the
|
526 | [developers documentation](https://bigml.com/api/deepnets)
|
527 | for a detailed description. These resources
|
528 | are handled through `bigml.Deepnet`.
|
529 |
|
530 | - **fusions** These resources are classification and regression models based
|
531 | on mixed supervised models. Check the
|
532 | [developers documentation](https://bigml.com/api/fusions)
|
533 | for a detailed description. These resources
|
534 | are handled through `bigml.Fusion`.
|
535 |
|
536 | - **PCAs** These resources are models for dimensional reduction. Check the
|
537 | [developers documentation](https://bigml.com/api/pcas)
|
538 | for a detailed description. These resources
|
539 | are handled through `bigml.PCA`.
|
540 |
|
541 | - **projections** These resources are the result of applying PCAs to get
|
542 | a smaller features set covering the variance in data. Check the
|
543 | [developers documentation](https://bigml.com/api/projections)
|
544 | for a detailed description. These resources
|
545 | are handled through `bigml.Projection`.
|
546 |
|
547 | - **batch projections** These resources are the result of applying PCAs to
|
548 | a dataset to get a smaller features set covering the variance in data.
|
549 | Check the
|
550 | [developers documentation](https://bigml.com/api/batchprojections)
|
551 | for a detailed description. These resources
|
552 | are handled through `bigml.BatchProjection`.
|
553 |
|
554 | - **linear regressions** These resources are regression models based on the
|
555 | assumption of a linear relation between the predictors and the outcome.
|
556 | Check the
|
557 | [developers documentation](https://bigml.com/api/linearregressions)
|
558 | for a detailed description. These resources
|
559 | are handled through `bigml.LinearRegression`.
|
560 |
|
561 | - **scripts** These resources are WhizzML scripts that can be created
|
562 | to handle workflows, which provide a means of automating the creation and
|
563 | management of the rest of resources. Check the
|
564 | [developers documentation](https://bigml.com/api/scripts)
|
565 | for a detailed description. These resources
|
566 | are handled through `bigml.Script`.
|
567 |
|
568 | - **executions** These resources are executions of WhizzML scripts that
|
569 | can be created to run the workflows defined in the WhizzML scripts.
|
570 | Check the
|
571 | [developers documentation](https://bigml.com/api/executions)
|
572 | for a detailed description. These resources
|
573 | are handled through `bigml.Execution`.
|
574 |
|
575 | - **libraries** These resources are WhizzML libraries that
|
576 | can be created to store definitions of constants and functions which can
|
577 | be imported and used in WhizzML scripts.
|
578 | Check the
|
579 | [developers documentation](https://bigml.com/api/libraries)
|
580 | for a detailed description. These resources
|
581 | are handled through `bigml.Library`.
|
582 |
|
583 | Creating resources
|
584 | ------------------
|
585 |
|
586 | As you've seen in the quick start section, each resource has its own creation
|
587 | method available in the corresponding resource object. Sources are created
|
588 | by uploading a local CSV file:
|
589 |
|
590 | ```js
|
591 | var bigml = require('bigml');
|
592 | var source = new bigml.Source();
|
593 | source.create('./data/iris.csv', {name: 'my source'}, true,
|
594 | function(error, sourceInfo) {
|
595 | if (!error && sourceInfo) {
|
596 | console.log(sourceInfo);
|
597 | }
|
598 | });
|
599 | ```
|
600 | The first argument in the `create` method of the `bigml.Source` is the CSV
|
601 | file. The next one is an object to set some of the source properties
|
602 | (in this case its name), followed by a boolean that determines if retries will be used
|
603 | in case a resumable error occurs, and finally the chosen callback.
|
604 | The properties object, the retry boolean and the callback are optional (for this method and all
|
605 | the `create` methods of the rest of resources).
|
606 |
|
607 | It is important to instantiate a new resource object (`new bigml.Source()` in
|
608 | this case) for each different resource, because each one stores internally the
|
609 | parameters used in the last REST call. They are available
|
610 | to be used if the call needs to be retried. For instance, if your internet
|
611 | connection falls for a while, the `create` call will be retried a limited number
|
612 | of times using this information unless you explicitly disable retries through the
|
613 | boolean argument described above.
|
614 |
|
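The idea of retrying a call a limited number of times can be sketched as follows. This is an illustration of the pattern only, not the internal code of the bindings; `withRetries` and its `maxRetries` parameter are hypothetical names.

```js
// Illustrative retry loop: `operation` is any function that accepts a
// Node-style callback. It is attempted once and retried up to `maxRetries`
// more times while it keeps failing.
function withRetries(operation, maxRetries, callback) {
  var attempts = 0;
  function attempt() {
    attempts += 1;
    operation(function (error, result) {
      if (error && attempts <= maxRetries) {
        return attempt();
      }
      callback(error, result, attempts);
    });
  }
  attempt();
}

// Possible usage, with a fresh Source object for every attempt:
//   withRetries(function (done) {
//     new bigml.Source().create('./data/iris.csv', done);
//   }, 3, function (error, sourceInfo, attempts) {
//     console.log(error || sourceInfo.resource, 'after', attempts, 'attempts');
//   });
```
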
615 | To create a dataset you need a source object or id, another dataset
|
616 | object or id, a list of dataset ids or a cluster id as the first argument
|
617 | of the `create` method.
|
618 | In the first case, it
|
619 | generates a dataset using the data of the source; in the second,
|
620 | it generates a new dataset by sampling the original one.
|
621 | For instance,
|
622 |
|
623 | ```js
|
624 | var bigml = require('bigml');
|
625 | var dataset = new bigml.Dataset();
|
626 | dataset.create('source/51b25fb237203f4410000010',
|
627 | {name: 'my dataset', size: 1024}, true,
|
628 | function(error, datasetInfo) {
|
629 | if (!error && datasetInfo) {
|
630 | console.log(datasetInfo);
|
631 | }
|
632 | });
|
633 | ```
|
634 |
|
635 | will create a dataset named `my dataset` with the first 1024
|
636 | bytes of the source. And
|
637 |
|
638 | ```js
|
639 | dataset.create('dataset/51b3c4c737203f16230000d1',
|
640 | {name: 'split dataset', sample_rate: 0.8}, true,
|
641 | function(error, datasetInfo) {
|
642 | if (!error && datasetInfo) {
|
643 | console.log(datasetInfo);
|
644 | }
|
645 | });
|
646 | ```
|
647 |
|
648 | will create a new dataset by sampling 80% of the data in the original dataset.
|
649 |
|
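A common use of sampling is a training/test split. The sketch below builds the argument objects for two complementary datasets. It relies on the `seed` and `out_of_bag` dataset arguments of the BigML API, which are not described in this document and are assumed here; `splitArgs` is a hypothetical helper.

```js
// Hypothetical helper: argument objects for two complementary samples of the
// same dataset. `seed` makes the sampling repeatable and `out_of_bag` selects
// the rows left out by that sample (both assumed from the BigML API docs).
function splitArgs(sampleRate, seed) {
  return {
    train: {name: 'train', sample_rate: sampleRate, seed: seed},
    test: {name: 'test', sample_rate: sampleRate, seed: seed,
           out_of_bag: true}
  };
}

var split = splitArgs(0.8, 'my seed');
console.log(split.train, split.test);
```

Each object would be passed as the second argument of `dataset.create`, using the same origin dataset id for both calls and a new `bigml.Dataset` object for each.
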
650 | Clusters can also be used to generate datasets containing the instances
|
651 | grouped around each centroid. You will need the cluster id and the centroid id
|
652 | to reference the dataset to be created. For instance,
|
653 |
|
654 | ```js
|
655 | cluster.create(datasetId, function (error, data) {
|
656 |   var clusterId = data.resource;
|
657 |   var centroidId = '000000';
|
658 |   dataset.create(clusterId, {centroid: centroidId},
|
659 |                  function (error, data) {
|
660 |                    console.log(data);
|
661 |                  });
|
662 | });
|
663 | ```
|
664 |
|
665 | would generate a new dataset containing the subset of instances in the cluster
|
666 | associated to the centroid id `000000`.
|
667 |
|
668 | All datasets can be exported to a local CSV file using the `download`
|
669 | method of `Dataset` objects:
|
670 |
|
671 | ```js
|
672 | var bigml = require('bigml');
|
673 | var dataset = new bigml.Dataset();
|
674 | dataset.download('dataset/53b0aa6837203f4341000034',
|
675 |                  'my_exported_file.csv',
|
676 |                  function (error, data) {
|
677 |                    console.log("data:" + data);
|
678 |                  });
|
679 | ```
|
680 |
|
681 | Similarly, to create models, ensembles, logistic regressions, deepnets or any other modeling resource,
|
682 | you will need a dataset as the first argument.
|
683 |
|
684 | Evaluations will need a model as first argument and a dataset as second one and
|
685 | predictions need a model as first argument too:
|
686 |
|
687 | ```js
|
688 | var bigml = require('bigml');
|
689 | var evaluation = new bigml.Evaluation();
|
690 | evaluation.create('model/51922d0b37203f2a8c000010',
|
691 | 'dataset/51b3c4c737203f16230000d1',
|
692 | {'name': 'my evaluation'}, true,
|
693 | function(error, evaluationInfo) {
|
694 | if (!error && evaluationInfo) {
|
695 | console.log(evaluationInfo);
|
696 | }
|
697 | });
|
698 | ```
|
699 |
|
Newly-created resources are returned in an object with the following
keys:

- **code**: If the request is successful you will get a
  `constants.HTTP_CREATED` (201) status code. Otherwise, it will be
  one of the standard HTTP error codes [detailed in the
  documentation](https://bigml.com/api/status_codes).
- **resource**: The identifier of the new resource.
- **location**: The location of the new resource.
- **object**: The resource itself, as computed by BigML.
- **error**: If an error occurs and the resource cannot be created, it
  will contain an additional code and a description of the error. In
  this case, **location** and **resource** will be `null`.

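As a minimal sketch of how a callback might act on these keys, the helper
below reports either the new resource id or the error information. The name
`summarizeCreation` is hypothetical (it is not part of the bindings), the
sample `location` value is made up, and the code relies only on the keys
listed above.

```js
// Hypothetical helper, not part of the bigml package.
var HTTP_CREATED = 201; // value the text gives for constants.HTTP_CREATED

function summarizeCreation(response) {
  // A created resource carries a 201 code and a non-null resource id.
  if (response && response.code === HTTP_CREATED && response.resource) {
    return {ok: true,
            resource: response.resource,
            location: response.location};
  }
  // Otherwise hand back whatever error information is available.
  return {ok: false,
          code: response ? response.code : null,
          error: response ? response.error : null};
}

// Made-up sample response shaped like the object described above.
console.log(summarizeCreation({code: 201,
                               resource: 'source/51b25fb237203f4410000010',
                               location: 'sample-location',
                               object: {},
                               error: null}));
```

Inside a real `create` callback you could pass the second callback argument
to such a helper before deciding whether to call `get`.
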
BigML.com will answer your `create` call immediately, even if the resource
is not finished yet (see the
[documentation on status
codes](https://bigml.com/api/status_codes) for the listing of
potential states and their semantics). To retrieve a finished resource,
you'll need to use the `get` method described in the next section.

Getting resources
-----------------

To retrieve an existing resource, you use the `get`
method of the corresponding class. Let's see an example of model retrieval:

```js
var bigml = require('bigml');
var model = new bigml.Model();
model.get('model/51b3c45a37203f16230000b5',
          true,
          'only_model=true;limit=-1',
          function (error, resource) {
            if (!error && resource) {
              console.log(resource);
            }
          });
```

The first parameter is the model id; the remaining parameters are
optional. Passing `true` as the second argument (as in the example)
makes the `get` method retry until it retrieves a finished model. In the
previous section we saw that, right after creation, resources evolve
through a series of states until they end up in a `FINISHED` (or `FAULTY`)
state. Setting this boolean to `true` forces the `get` method to wait for
the resource to be finished before executing the corresponding callback
(the default is `false`).
The third parameter is a query string
that can be used to filter the fields returned. In the example we set the
fields to be retrieved to those used in the model (the default is an empty
string). If the callback parameter is absent, a default printing function
is used.

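The query string in the example is a list of `key=value` pairs joined by
`;`, so it can be convenient to assemble it from an object. The following is
a minimal sketch: `buildQueryString` is a hypothetical helper, not a function
of the bindings, and it does not escape keys or values.

```js
// Hypothetical helper, not part of the bigml package.
function buildQueryString(options) {
  return Object.keys(options).map(function (key) {
    return key + '=' + options[key];
  }).join(';');
}

var query = buildQueryString({only_model: true, limit: -1});
console.log(query); // only_model=true;limit=-1
```

The resulting string can be passed wherever the examples use a literal query
string, such as the third argument of `get`.
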
Updating Resources
------------------

Each type of resource has a set of properties whose values can be updated.
Check the properties subsection of each resource in the [developers
documentation](https://bigml.com/developers) to see which are marked as
updatable. The `update` method of each resource class will let you modify
such properties. For instance,

```js
var bigml = require('bigml');
var ensemble = new bigml.Ensemble();
ensemble.update('ensemble/51901f4337203f3a9a000215',
                {name: 'my name', tags: 'code example'}, true,
                function (error, resource) {
                  if (!error && resource) {
                    console.log(resource);
                  }
                });
```

will set the name of your ensemble to `my name` and add the
tags `code` and `example`. The callback function is optional and a default
printing function will be used if absent.

If you have a look at the returned resource
you will see that its status code will
be `constants.HTTP_ACCEPTED` if the resource can be updated without
problems or one of the standard HTTP error codes otherwise.

Deleting Resources
------------------

Resources can be deleted individually using the `delete` method of the
corresponding class.

```js
var bigml = require('bigml');
var source = new bigml.Source();
source.delete('source/51b25fb237203f4410000010', true,
              function (error, result) {
                if (!error && result) {
                  console.log(result);
                }
              });
```

The call will return an object with the following keys:

- **code**: If the request is successful, the code will be a
  `constants.HTTP_NO_CONTENT` (204) status code. Otherwise, it will be
  one of the standard HTTP error codes. See the [documentation on
  status codes](https://bigml.com/api/status_codes) for more
  info.
- **error**: If the request does not succeed, it will contain an
  object with an error code and a message. It will be `null`
  otherwise.

The callback parameter is optional and a printing function is used as default.

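The `create`, `get`, `update` and `delete` examples all pass a Node-style
`(error, result)` callback as the last argument. If you prefer promises, one
possible approach is a small adapter like the sketch below. `callAsPromise`
is a hypothetical helper rather than part of the bindings, and `fakeSource`
only stands in for a real `bigml.Source` so that the snippet runs without
credentials or network access.

```js
// Hypothetical adapter: wraps any method that takes a Node-style
// (error, result) callback as its last argument.
function callAsPromise(target, methodName) {
  var args = Array.prototype.slice.call(arguments, 2);
  return new Promise(function (resolve, reject) {
    args.push(function (error, result) {
      if (error) {
        reject(error);
      } else {
        resolve(result);
      }
    });
    target[methodName].apply(target, args);
  });
}

// Stand-in for a resource object, used only to make the sketch runnable.
var fakeSource = {
  delete: function (resourceId, secondArgument, cb) {
    setImmediate(function () { cb(null, {code: 204, error: null}); });
  }
};

var deletion = callAsPromise(fakeSource, 'delete',
                             'source/51b25fb237203f4410000010', true);
deletion.then(function (result) {
  console.log(result.code);
});
```

With a real resource object you would pass it in place of `fakeSource`,
provided the method accepts the callback in the last position as in the
examples of this document.
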
Downloading Batch Predictions' (or Centroids') output
-----------------------------------------------------

Using batch predictions you can obtain the predictions given by a model or
ensemble on a dataset. Similarly, using batch centroids you will get the
centroids predicted by a cluster for each instance of a dataset.
The output is accessible through a BigML URL and can
be stored in a local file by using the `download` method.

```js
var bigml = require('bigml');
var batchPrediction = new bigml.BatchPrediction(),
    tmpFileName = '/tmp/predictions.csv';
// batch prediction creation call
batchPrediction.create('model/52e4680f37203f20bb000da7',
                       'dataset/52e6bd1a37203f3eac000392',
                       {'name': 'my batch prediction'},
  function (error, batchPredictionInfo) {
    if (!error && batchPredictionInfo) {
      // retrieving batch prediction finished resource
      batchPrediction.get(batchPredictionInfo, true,
        function (error, batchPredictionInfo) {
          // only download when the retrieval succeeded and the
          // batch prediction is finished
          if (!error &&
              batchPredictionInfo.object.status.code === bigml.constants.FINISHED) {
            // retrieving the batch prediction output file and storing it
            // in the local file system
            batchPrediction.download(batchPredictionInfo,
                                     tmpFileName,
              function (error, cbFilename) {
                console.log(cbFilename);
              });
          }
        });
    }
  });
```

If no `filename` is given, the callback receives the error and the
request object used to download the URL.

Listing, Filtering and Ordering Resources
-----------------------------------------

Each type of resource has its own `list` method that allows you to
retrieve groups of available resources of that kind. You can also add some
filters to select
specific subsets of them and even order the results. The returned list will
show the 20 most recent resources. That limit can be modified by setting
the `limit` argument in the query string. For more information about the syntax
of query string filters and orderings, you can check the fields labeled
as *filterable* and *sortable* in the listings section of the [BigML
documentation](https://bigml.com/developers) for each resource. As an
example, we can see how to list the first 20 sources:

```js
var bigml = require('bigml');
var source = new bigml.Source();
source.list(function (error, list) {
  if (!error && list) {
    console.log(list);
  }
});
```

and if you want the first 5 sources created before April 1st,
2013:

```js
var bigml = require('bigml');
var source = new bigml.Source();
source.list('limit=5;created__lt=2013-04-1',
            function (error, list) {
              if (!error && list) {
                console.log(list);
              }
            });
```

and if you want to select the first 5 as ordered by name:

```js
var bigml = require('bigml');
var source = new bigml.Source();
source.list('limit=5;created__lt=2013-04-1;order_by=name',
            function (error, list) {
              if (!error && list) {
                console.log(list);
              }
            });
```

In this method, both parameters are optional and, if no callback is given,
a basic printing function is used instead.

The list object will have the following structure:

- **code**: If the request is successful you will get a
  `constants.HTTP_OK` (200) status code. Otherwise, it will be one of
  the standard HTTP error codes. See [BigML documentation on status
  codes](https://bigml.com/api/status_codes) for more info.
- **meta**: An object including the following keys that can help you
  paginate listings:

    - **previous**: Path to get the previous page or `null` if there
      is no previous page.
    - **next**: Path to get the next page or `null` if there is no
      next page.
    - **offset**: How far off from the first entry in the resources is
      the first one listed in the resources key.
    - **limit**: Maximum number of resources that you will get listed in
      the resources key.
    - **total\_count**: The total number of resources in BigML.

- **resources**: A list of resources as returned by BigML.
- **error**: If an error occurs and the resources cannot be listed, it
  will contain an additional code and a description of the error. In
  this case, **meta** and **resources** will be `null`.

A simple example of what a `list` call would retrieve is this one, where
we asked for the 2 most recent sources:

```js
var bigml = require('bigml');
var source = new bigml.Source();
source.list('limit=2',
            function (error, list) {
              if (!error && list) {
                console.log(list);
              }
            });
> { code: 200,
  meta:
   { limit: 2,
     next: '/andromeda/source?username=mmerce&api_key=c972018dc5f2789e65c74ba3170fda31d02e00c0&limit=2&offset=2',
     offset: 0,
     previous: null,
     total_count: 653 },
  resources:
   [ { category: 0,
       code: 200,
       content_type: 'text/csv',
       created: '2013-06-11T00:01:51.526000',
       credits: 0,
       description: '',
       file_name: 'iris.csv',
       md5: 'd1175c032e1042bec7f974c91e4a65ae',
       name: 'iris.csv',
       number_of_datasets: 0,
       number_of_ensembles: 0,
       number_of_models: 0,
       number_of_predictions: 0,
       private: true,
       resource: 'source/51b668ef37203f50a4000005',
       size: 4608,
       source_parser: [Object],
       status: [Object],
       subscription: false,
       tags: [],
       type: 0,
       updated: '2013-06-11T00:02:06.381000' },
     { category: 0,
       code: 200,
       content_type: 'text/csv',
       created: '2013-06-09T00:15:00.574000',
       credits: 0,
       description: '',
       file_name: 'iris.csv',
       md5: 'd1175c032e1042bec7f974c91e4a65ae',
       name: 'my source',
       number_of_datasets: 0,
       number_of_ensembles: 0,
       number_of_models: 0,
       number_of_predictions: 0,
       private: true,
       resource: 'source/51b3c90437203f16230000dd',
       size: 4608,
       source_parser: [Object],
       status: [Object],
       subscription: false,
       tags: [],
       type: 0,
       updated: '2013-06-09T00:15:00.780000' } ],
  error: null }
```

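The `meta` keys are enough to work out the query string for the following
page. The sketch below is only an illustration: `nextPageQuery` is a
hypothetical helper, and it assumes that `offset` can be passed in the query
string the same way `limit` is (the `next` path in the sample output above
carries both).

```js
// Hypothetical helper, not part of the bigml package.
// Returns the query string for the next page, or null when the
// current page is the last one.
function nextPageQuery(meta) {
  var nextOffset = meta.offset + meta.limit;
  if (meta.next === null || nextOffset >= meta.total_count) {
    return null;
  }
  return 'limit=' + meta.limit + ';offset=' + nextOffset;
}

// Values taken from the sample listing above (next path shortened).
var meta = {limit: 2,
            next: '/andromeda/source?limit=2&offset=2',
            offset: 0,
            previous: null,
            total_count: 653};
console.log(nextPageQuery(meta)); // limit=2;offset=2
```

A paging loop could call `list` with this string until the helper returns
`null`.
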
Local Predictions: file system or cache storage
-----------------------------------------------

Every model available in BigML is white-box and can be downloaded from the
API as JSON. This response can either be stored in the file system as a file
or stored in a cache-manager. In both cases, the bindings provide a class
for every model which can interpret this JSON and predict (supervised models),
or assign centroids, compute anomaly scores, etc. Thus, these classes allow
you to use your BigML model locally, with no connection whatsoever to
BigML's servers.

The following sections explain each of these classes and their methods. As a
general summary, in order to create a local model from a BigML `model`
resource, the class to use is called `LocalModel`. You can make a local model
from a remote model by providing its ID.

```js
var bigml = require('bigml');
var localModel = new bigml.LocalModel('model/51922d0b37203f2a8c000010');
```

In this case, the `localModel` object will create a default connection object
that will retrieve your credentials from the environment variables and will
create a `./storage` directory in your current directory to be used as
local storage. Once this is done, it will download the JSON of the model
into the `./storage` directory, naming the file after the model ID
(replacing `/` by `_`). This stored file will be used the next time you
instantiate the `LocalModel` class for the same model ID. Thus, the connection
to BigML's servers is only needed the first time you use a model, to download
its JSON information; once it's stored, the local version in your file
system will be used.

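To make the naming rule concrete, here is a small sketch that computes where
the JSON for a given model ID would be expected under that description.
`storagePathFor` is a hypothetical helper written for this example; the
bindings may add an extension or otherwise differ, so treat the result as an
approximation.

```js
var path = require('path');

// Hypothetical helper, not part of the bigml package: applies the
// "replace `/` by `_`" rule described above.
function storagePathFor(resourceId, storageDir) {
  var fileName = resourceId.replace(/\//g, '_');
  return path.join(storageDir || './storage', fileName);
}

console.log(storagePathFor('model/51922d0b37203f2a8c000010'));
```

Checking whether that file exists (for instance with `fs.existsSync`) is one
way to tell if a model has already been stored locally.
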
If you prefer to use another directory to store your models, you can provide
a different connection object that specifies the folder to be used:

```js
var bigml = require('bigml');
var localModel = new bigml.LocalModel(
  'model/51922d0b37203f2a8c000010',
  new bigml.BigML(undefined,
                  undefined,
                  {storage: './my_storage_dir'}));
```

If you stored the model JSON in your local system by any other means, you
can use the path to the file as first argument too:

```js
var bigml = require('bigml');
var localModel = new bigml.LocalModel(
  '/my_dir/my_model_json');
```

You can also use a cache-manager to store the JSON. In that case, the
`storage` attribute in the connection should contain the cache-manager
object that provides the `.get` and `.set` methods to manage the cache.
The cache-manager mechanism itself is not included in the bindings code
or its dependencies. Here's an example of use:

```js
var bigml = require('bigml');
var cacheManager = require('cache-manager');
var memoryCache = cacheManager.caching({store: 'memory',
                                        max: 100,
                                        ttl: 100});
var localModel = new bigml.LocalModel(
  'model/51922d0b37203f2a8c000010',
  new bigml.BigML(undefined,
                  undefined,
                  {storage: memoryCache}));
```

Other types of local model classes (`LocalCluster`, `LocalAnomaly`,
etc.) offer the same kind of mechanisms.
Please check the following sections for details.

Local Models
------------

A remote model encloses all the information required to make
predictions. Thus, once you retrieve a remote model, you can build its local
version and predict locally. This can be easily done using
the `LocalModel` class.

```js
var bigml = require('bigml');
var localModel = new bigml.LocalModel('model/51922d0b37203f2a8c000010');
localModel.predict({'petal length': 1},
                   function(error, prediction) {console.log(prediction)});
```

As you see, the first parameter to the `LocalModel` constructor is a model id
(or object or the path to a JSON file containing the full model information).
The constructor allows a second optional argument, a connection
object (as described in the [Authentication section](#authentication)).

```js
var bigml = require('bigml');
var myUser = 'myuser';
var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
var localModel = new bigml.LocalModel('model/51922d0b37203f2a8c000010',
                                      new bigml.BigML(myUser, myKey));
localModel.predict({'petal length': 1},
                   function(error, prediction) {console.log(prediction)});
```

The connection object can also include a storage directory. Setting that
will cause the `LocalModel` to check whether it can find a local model JSON
file in this directory before trying to download it from the server. This
means that your model information will only be downloaded the first time
you use it in a `LocalModel` instance. Instances that use the same connection
object will read the local file instead.

```js
var bigml = require('bigml');
var myUser = 'myuser';
var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
var my_storage = './my_storage';
var connection = new bigml.BigML(myUser, myKey, {storage: my_storage});
var localModel = new bigml.LocalModel('model/51922d0b37203f2a8c000010',
                                      connection);
localModel.predict({'petal length': 1},
                   function(error, prediction) {console.log(prediction)});
```

The `predict` method also accepts input data labelled with the corresponding
field ids instead of field names.

```js
var bigml = require('bigml');
var localModel = new bigml.LocalModel('model/51922d0b37203f2a8c000010');
localModel.predict({'000002': 1},
                   function(error, prediction) {console.log(prediction)});
```

When the first argument is a finished model object, the constructor
immediately creates
a `LocalModel` instance ready to predict. Then, the `LocalModel.predict`
method can be called right away in a synchronous way.

```js
var bigml = require('bigml');
var model = new bigml.Model();
model.get('model/51b3c45a37203f16230000b5',
          true,
          'only_model=true;limit=-1',
          function (error, resource) {
            if (!error && resource) {
              var localModel = new bigml.LocalModel(resource);
              var prediction = localModel.predict({'petal length': 3});
              console.log(prediction);
            }
          });
```

Note that the `get` method's second and third arguments ensure that the
retrieval waits for the model to be finished before retrieving it and that all
the fields used in the model will be downloaded. Beware of using
models with filtered fields to instantiate a local model. If an important
field is missing (because it has been excluded or
filtered), an exception will be raised. In this example, the connection to BigML
is used only in the `get` method call to retrieve the remote model information.
The callback code, where the `localModel` and predictions are built, is
strictly local.

On the other hand, when the first argument for the `LocalModel` constructor
is a model id, it internally calls
the `bigml.Model.get` method to retrieve the remote model information. As this
is an asynchronous procedure, the `LocalModel.predict` method must wait for
the build process to complete before making predictions. When using the
previous callback syntax this condition is internally ensured and you need
not care about these details. However, you may
want to use the synchronous version of the `predict` method in this case too.
Then you must be aware that the `LocalModel`
`ready` event is triggered on completion and at the same time the
`LocalModel.ready` attribute is set to true. You can wait for
the `ready` event to make predictions synchronously from then on, as in:

```js
var bigml = require('bigml');
var localModel = new bigml.LocalModel('model/51922d0b37203f2a8c000010');
function doPredictions() {
  var prediction = localModel.predict({'petal length': 1});
  console.log(prediction);
}
if (localModel.ready) {
  doPredictions();
} else {
  localModel.on('ready', function () {doPredictions()});
}
```

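This check-then-listen pattern can be factored into a small function. The
following sketch assumes only what this section states: a boolean `ready`
attribute and a `ready` event. `whenReady` is a hypothetical helper, and the
`EventEmitter` instance merely stands in for a local model so that the
snippet runs offline.

```js
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper, not part of the bigml package: runs `action`
// once the local resource reports that it is ready.
function whenReady(localResource, action) {
  if (localResource.ready) {
    action(localResource);
  } else {
    localResource.on('ready', function () { action(localResource); });
  }
}

// Stand-in for a local model that finishes building asynchronously.
var fakeLocalModel = new EventEmitter();
fakeLocalModel.ready = false;
setImmediate(function () {
  fakeLocalModel.ready = true;
  fakeLocalModel.emit('ready');
});

whenReady(fakeLocalModel, function (model) {
  console.log('ready:', model.ready);
});
```

With a real `LocalModel`, the action would typically call `predict`
synchronously, as `doPredictions` does in the example above.
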
You can also create a `LocalModel` from a JSON file containing the model
structure by setting the path to the file as first parameter:

```js
var bigml = require('bigml');
var localModel = new bigml.LocalModel('my_dir/my_model.json');
localModel.predict({'000002': 1},
                   function(error, prediction) {console.log(prediction)});
```

For classifications, the prediction of a local model will be one of the
available categories in the objective field, together with an associated
`confidence` or `probability` that is used to decide which is the predicted
category. If you prefer the model predictions to be based on one of them in
particular, you can use the `operatingKind` argument in the `predict` method.
Here's an example that uses predictions based on `confidence`:

```js
localModel.predict({'000002': 1},
                   undefined,
                   undefined,
                   undefined,
                   undefined,
                   "confidence",
                   function(error, prediction) {console.log(prediction)});
```

Predictions' Missing Strategy
-----------------------------

There are two different strategies for dealing with missing values
in input data for the fields used in the model rules. The default
strategy, used when a missing value is found for the
field that splits a node, is to return the prediction of the previous node.

```js
var bigml = require('bigml');
var LAST_PREDICTION = 0;
var localModel = new bigml.LocalModel('model/51922d0b37203f2a8c000010');
localModel.predict({'petal length': 1}, LAST_PREDICTION,
                   function(error, prediction) {console.log(prediction)});
```

The other strategy, when a missing value is found, is to consider both splits
valid and follow the rules to the final leaves of the tree. The prediction
is built considering all the predictions of the leaves reached, averaging
them in regressions and selecting the majority class for classifications.

```js
var bigml = require('bigml');
var PROPORTIONAL = 1;
var localModel = new bigml.LocalModel('model/51922d0b37203f2a8c000010');
localModel.predict({'petal length': 1}, PROPORTIONAL,
                   function(error, prediction) {console.log(prediction)});
```

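To see how the two strategies can disagree, the sketch below applies both to
a tiny hand-made regression tree. It is a toy illustration, not the code used
by `LocalModel`: the tree, its numbers and the function names are invented,
and weighting the reached leaves by their instance counts is an assumption.

```js
// Toy regression tree; every name and number here is made up.
var toyTree = {
  field: 'sepal width', threshold: 3, prediction: 4.4, count: 10,
  left: {
    field: 'petal length', threshold: 2, prediction: 2, count: 6,
    left: {prediction: 1, count: 4},
    right: {prediction: 4, count: 2}
  },
  right: {prediction: 8, count: 4}
};

function isLeaf(node) { return node.field === undefined; }

// Last prediction: stop at the node whose split field is missing.
function lastPrediction(node, input) {
  while (!isLeaf(node)) {
    var value = input[node.field];
    if (value === undefined) { return node.prediction; }
    node = value <= node.threshold ? node.left : node.right;
  }
  return node.prediction;
}

// Proportional: follow both branches when the split field is missing
// and collect every leaf that can be reached.
function reachedLeaves(node, input) {
  if (isLeaf(node)) { return [node]; }
  var value = input[node.field];
  if (value === undefined) {
    return reachedLeaves(node.left, input)
      .concat(reachedLeaves(node.right, input));
  }
  return reachedLeaves(value <= node.threshold ? node.left : node.right,
                       input);
}

// Average the reached leaves, weighted by their instance counts.
function proportional(node, input) {
  var leaves = reachedLeaves(node, input);
  var total = leaves.reduce(function (sum, leaf) {
    return sum + leaf.count;
  }, 0);
  return leaves.reduce(function (sum, leaf) {
    return sum + leaf.prediction * leaf.count;
  }, 0) / total;
}

var input = {'petal length': 1}; // 'sepal width' is missing
console.log(lastPrediction(toyTree, input)); // 4.4
console.log(proportional(toyTree, input));   // 4.5
```

The last prediction strategy stops at the root, while the proportional one
still uses the known `petal length` to discard a leaf further down the tree.
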
Operating point's predictions
-----------------------------

In classification problems,
Models, Ensembles and Logistic Regressions can be used at different
operating points, that is, associated to particular thresholds. Each
operating point is then defined by the kind of property you use as threshold,
its value and the class that is supposed to be predicted if the threshold
is reached.

Let's assume you have a binary problem, with classes `True`
and `False` as possible outcomes. Imagine you want to be very sure when you
predict the `True` outcome, so you don't want to predict it unless the
probability associated to it is over `0.8`. You can achieve this with any
classification model by creating an operating point:

```js
var operatingPoint = {kind: 'probability',
                      positiveClass: 'True',
                      threshold: 0.8};
```

To predict using this restriction, you can use the `predictOperating`
method:

```js
var prediction = localModel.predictOperating(inputData,
                                             missingStrategy,
                                             operatingPoint,
                                             cb);
```

where `inputData` should contain the values for which you want to predict
and
`missingStrategy` is the strategy to use when values are missing (please
check the previous section for details about this parameter).

Local models allow two kinds of operating points: `probability` and
`confidence`. For both of them, the threshold can be set to any number
in the `[0, 1]` range.

You can also use the `predict` method in its most general form:

```js
var prediction = localModel.predict(inputData,
                                    missingStrategy,
                                    median,
                                    addUnusedFields,
                                    operatingPoint,
                                    cb);
```

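Conceptually, an operating point turns the per-class scores into a decision
rule. The sketch below shows one plausible reading of the description above
for the `probability` kind; it is not the bindings' implementation.
`applyOperatingPoint` and the sample distributions are hypothetical, and both
the strict `>` comparison and the fallback to the best-scoring remaining
class are assumptions.

```js
// Hypothetical decision rule for an operating point of kind 'probability'.
function applyOperatingPoint(probabilities, operatingPoint) {
  var positive = operatingPoint.positiveClass;
  if (probabilities[positive] > operatingPoint.threshold) {
    return positive;
  }
  // Threshold not reached: pick the most probable of the other classes.
  return Object.keys(probabilities)
    .filter(function (name) { return name !== positive; })
    .sort(function (a, b) { return probabilities[b] - probabilities[a]; })[0];
}

var operatingPoint = {kind: 'probability',
                      positiveClass: 'True',
                      threshold: 0.8};
// Made-up class distributions.
console.log(applyOperatingPoint({'True': 0.7, 'False': 0.3},
                                operatingPoint)); // False
console.log(applyOperatingPoint({'True': 0.9, 'False': 0.1},
                                operatingPoint)); // True
```

Note that in the first call `True` is the most probable class, yet it is not
predicted because its probability does not exceed the threshold.
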
Local Ensembles
---------------

As in the local model case, remote ensembles can also be used locally through
the `LocalEnsemble` class to make local predictions. The simplest way to
create a `LocalEnsemble` is:

```js
var bigml = require('bigml');
var localEnsemble = new bigml.LocalEnsemble('ensemble/51901f4337203f3a9a000215');
localEnsemble.predict({'petal length': 1},
                      function(error, prediction) {console.log(prediction)});
```

This call will download all the ensemble related info (and each of its
component models) and use it to predict by combining the predictions
of each individual
model. The algorithm used to combine these predictions depends
on the ensemble type (`Decision Forest`,
`Random Decision Forest` or `Boosting Trees`) and you can learn more about
them in the
[ensembles section of the API documents](https://bigml.com/api/ensembles).
The example shows
a `Decision Forest` using a majority system (classifications)
or an average system
(regressions) to combine the models' predictions.

For classifications
the prediction will be one amongst the list of categories in the objective
field. When each model in the ensemble
is used to predict, each category has a confidence, a
probability or a vote associated to this prediction.
Then, through the collection
of models in the
ensemble, each category gets an averaged confidence, probability and number of
votes. Thus you can decide whether to operate the ensemble using the
``confidence``, the ``probability`` or the ``votes`` so that the predicted
category is the one that scores highest in the chosen quantity. The
criterion can be set using the `operatingKind` option:

```js
var bigml = require('bigml');
var localEnsemble = new bigml.LocalEnsemble('ensemble/51901f4337203f3a9a000215');
localEnsemble.predict({'petal length': 1}, undefined,
                      {operatingKind: "probability"},
                      function(error, prediction) {console.log(prediction)});
```

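As an illustration of why the operating kind matters, the sketch below
combines made-up per-model outputs either by counting votes or by averaging
probabilities. It is a simplified stand-in, not the algorithm used by
`LocalEnsemble`: `combinePredictions` and the input shape (one object per
model, with its `prediction` and a `probability` map from category to
probability) are assumptions made for the example, and the `confidence` kind
is left out.

```js
// Hypothetical combiner: any operatingKind other than 'votes' is
// treated as 'probability'.
function combinePredictions(modelOutputs, operatingKind) {
  var scores = {};
  modelOutputs.forEach(function (output) {
    if (operatingKind === 'votes') {
      // One vote for the category each model predicts.
      scores[output.prediction] = (scores[output.prediction] || 0) + 1;
    } else {
      // Average each category's probability across the models.
      Object.keys(output.probability).forEach(function (category) {
        scores[category] = (scores[category] || 0) +
          output.probability[category] / modelOutputs.length;
      });
    }
  });
  // The winner is the category with the highest score.
  return Object.keys(scores).reduce(function (best, category) {
    return scores[category] > scores[best] ? category : best;
  });
}

// Made-up outputs of a three-model ensemble.
var outputs = [
  {prediction: 'Iris-setosa',
   probability: {'Iris-setosa': 0.55, 'Iris-virginica': 0.45}},
  {prediction: 'Iris-setosa',
   probability: {'Iris-setosa': 0.6, 'Iris-virginica': 0.4}},
  {prediction: 'Iris-virginica',
   probability: {'Iris-setosa': 0.05, 'Iris-virginica': 0.95}}
];
console.log(combinePredictions(outputs, 'votes'));       // Iris-setosa
console.log(combinePredictions(outputs, 'probability')); // Iris-virginica
```

Two of the three models vote for `Iris-setosa`, but the third is so sure of
`Iris-virginica` that the averaged probability favours it instead.
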
The first argument in the `LocalEnsemble.predict`
method
is the input data to predict from. The second argument is a legacy
parameter, to be deprecated, that was used to decide the combination method.
This parameter has been superseded by the ``operatingKind`` option that can be
sent in the third argument, which is an object where you can specify some
additional configuration values, such as the missing strategy used in each
model's prediction:

```js
var bigml = require('bigml');
var localEnsemble = new bigml.LocalEnsemble('ensemble/51901f4337203f3a9a000215');
localEnsemble.predict({'petal length': 1},
                      undefined,
                      {missingStrategy: 1,
                       operatingKind: "confidence"},
                      function(error, prediction) {console.log(prediction)});
```

In this case the proportional missing strategy (the default would be the last
prediction missing strategy) will be applied.

As in `LocalModel`, the constructor of `LocalEnsemble` has as
first argument the ensemble id (or object) or a list of model ids (or objects)
as well as a second optional connection
argument. Building a `LocalEnsemble` is an asynchronous process because the
constructor will need to call the `get` methods of the remote ensemble object
and its component models. Thus, the `LocalEnsemble.predict` method will have
to wait for the object to be entirely built before making the prediction. This
is internally done when you use the callback syntax for the `predict` method.
In case you want to call the `LocalEnsemble.predict` method as a synchronous
function, you should first make sure that the constructor has finished building
the object by checking the `LocalEnsemble.ready` attribute and listening
to the `ready` event. For instance,

```js
var bigml = require('bigml');
var localEnsemble = new bigml.LocalEnsemble('ensemble/51901f4337203f3a9a000215');
function doPredictions() {
  var prediction = localEnsemble.predict({'petal length': 1});
  console.log(prediction);
}
if (localEnsemble.ready) {
  doPredictions();
} else {
  localEnsemble.on('ready', function () {doPredictions()});
}
```

would first download the remote ensemble and its component models, then
construct a local model for each one and predict using these local models.

The same can be done for an array containing a list of models (only bagging
ensembles and random decision forests can be built this way), regardless of
whether they belong to a remote ensemble or not:

```js
var bigml = require('bigml');
var localEnsemble = new bigml.LocalEnsemble([
  'model/51bb69b437203f02b50004ce', 'model/51bb69b437203f02b50004d0']);
localEnsemble.predict({'petal length': 1},
                      function(error, prediction) {console.log(prediction)});
```

Operating point predictions are also available for local ensembles. An
example would be:

```js
var operatingPoint = {kind: 'probability',
                      positiveClass: 'True',
                      threshold: 0.8};
var prediction = localEnsemble.predictOperating(inputData,
                                                missingStrategy,
                                                operatingPoint,
                                                cb);
```

or using the `predict` method:

```js
var prediction = localEnsemble.predict(inputData,
                                       undefined,
                                       {operatingPoint: operatingPoint},
                                       cb);
```

You can check the
[Operating point's predictions](#operating-points-predictions) section
to learn about
operating points. For ensembles, three kinds of operating points are available:
`votes`, `probability` and `confidence`. The `votes` option
will use as threshold the
number of models in the ensemble that vote for the positive class. The other
two are already explained in the above mentioned section.

1451 | The local ensemble constructor also accepts a connection object.
|
1452 | The connection object can include the user's credentials and
|
1453 | a storage directory. Setting a storage directory
|
1454 | causes the `LocalEnsemble` to look for a local ensemble
|
1455 | JSON file in that directory before trying to download it from the server. This
|
1456 | means that your model information is only downloaded the first time
|
1457 | you use it in a `LocalEnsemble` instance. Later instances that use the same
|
1458 | connection
|
1459 | object read the local file instead.
|
1460 |
|
1461 | ```js
|
1462 | var bigml = require('bigml');
|
1463 | var myUser = 'myuser';
|
1464 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
1465 | var my_storage = './my_storage';
|
1466 | var connection = new bigml.BigML(myUser, myKey, {storage: my_storage});
|
1467 | var localEnsemble = new bigml.LocalEnsemble(
|
1468 | 'ensemble/51922d0b37203f2a8c000010', connection);
|
1469 | ```
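
The lookup order behind the storage directory (local file first, download only on a miss) can be pictured with the following sketch. It does not use the bindings: `loadResource`, the `download` function and the file naming scheme (the resource id with `/` replaced by `_`) are assumptions for illustration only.

```js
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: returns the resource JSON, reading it from the storage
// directory when a copy exists and calling `download` only on a cache miss.
function loadResource(resourceId, storageDir, download) {
  // Assumed file naming: 'ensemble/123' is stored as 'ensemble_123'.
  var file = path.join(storageDir, resourceId.replace('/', '_'));
  if (fs.existsSync(file)) {
    return JSON.parse(fs.readFileSync(file, 'utf8'));
  }
  var resource = download(resourceId);
  fs.writeFileSync(file, JSON.stringify(resource));
  return resource;
}

// Usage with a fake downloader that counts how often it is called.
var storage = fs.mkdtempSync(path.join(os.tmpdir(), 'bigml-storage-'));
var downloads = 0;
function fakeDownload(resourceId) {
  downloads += 1;
  return {resource: resourceId, object: {}};
}
loadResource('ensemble/51922d0b37203f2a8c000010', storage, fakeDownload);
loadResource('ensemble/51922d0b37203f2a8c000010', storage, fakeDownload);
console.log(downloads); // the second call reads the stored file
```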
|
1470 |
|
1471 | Local Logistic Regressions
|
1472 | --------------------------
|
1473 |
|
1474 | A remote logistic regression model encloses all the information
|
1475 | required to predict the categorical value of the objective field associated
|
1476 | to a given input data set.
|
1477 | Thus, you can build a local version of
|
1478 | a logistic regression model and predict the category locally using
|
1479 | the `LocalLogisticRegression` class.
|
1480 |
|
1481 | ```js
|
1482 | var bigml = require('bigml');
|
1483 | var localLogisticRegression = new bigml.LocalLogisticRegression(
|
1484 | 'logisticregression/51922d0b37203f2a8c001010');
|
1485 | localLogisticRegression.predict({'petal length': 1, 'petal width': 1,
|
1486 | 'sepal length': 1, 'sepal width': 1},
|
1487 | function(error, prediction) {
|
1488 | console.log(prediction)});
|
1489 | ```
|
1490 |
|
1491 | Note that, to find the associated prediction, input data cannot contain missing
|
1492 | values in numeric fields. The `predict` method also accepts input data
|
1493 | keyed by field id instead of field name.
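
To make the two input styles concrete, here is a small sketch of how name-keyed input could be translated into id-keyed input. The `fields` object is a made-up excerpt shaped like the `fields` structure of a BigML resource, and `toFieldIds` is a hypothetical helper, not a function exported by the bindings.

```js
// Made-up field metadata: field id -> field description.
var fields = {
  '000000': {name: 'sepal length', optype: 'numeric'},
  '000001': {name: 'sepal width', optype: 'numeric'},
  '000002': {name: 'petal length', optype: 'numeric'},
  '000003': {name: 'petal width', optype: 'numeric'}
};

// Hypothetical helper: rewrites input data keyed by field name so that it is
// keyed by field id. Keys that already are field ids are kept unchanged.
function toFieldIds(inputData, fields) {
  var idsByName = {};
  Object.keys(fields).forEach(function (id) {
    idsByName[fields[id].name] = id;
  });
  var result = {};
  Object.keys(inputData).forEach(function (key) {
    if (Object.prototype.hasOwnProperty.call(fields, key)) {
      result[key] = inputData[key];
    } else if (Object.prototype.hasOwnProperty.call(idsByName, key)) {
      result[idsByName[key]] = inputData[key];
    } else {
      throw new Error('Unknown field: ' + key);
    }
  });
  return result;
}

// Mixed input: one key is a field name, the other is already a field id.
console.log(toFieldIds({'petal length': 1, '000003': 1}, fields));
```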
|
1494 |
|
1495 | As you see, the first parameter to the `LocalLogisticRegression` constructor
|
1496 | is a logistic regression id (or object). The constructor allows a second
|
1497 | optional argument, a connection
|
1498 | object (as described in the [Authentication section](#authentication)).
|
1499 |
|
1500 | ```js
|
1501 | var bigml = require('bigml');
|
1502 | var myUser = 'myuser';
|
1503 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
1504 | var localLogisticRegression = new bigml.LocalLogisticRegression(
|
1505 | 'logisticregression/51922d0b37203f2a8c001010',
|
1506 | new bigml.BigML(myUser, myKey));
|
1507 | localLogisticRegression.predict({'000000': 1, '000001': 1,
|
1508 | '000002': 1, '000003': 1},
|
1509 | function(error, prediction) {
|
1510 | console.log(prediction)});
|
1511 | ```
|
1512 |
|
1513 | When the first argument is a finished logistic regression object,
|
1514 | the constructor immediately creates
|
1515 | a `LocalLogisticRegression` instance ready to predict. Then,
|
1516 | the `LocalLogisticRegression.predict`
|
1517 | method can be called synchronously right away.
|
1518 |
|
1519 |
|
1520 | ```js
|
1521 | var bigml = require('bigml');
|
1522 | var logisticRegression = new bigml.LogisticRegression();
|
1523 | logisticRegression.get('logisticregression/51b3c45a37203f16230000b5', true,
|
1524 | 'only_model=true;limit=-1',
|
1525 | function (error, resource) {
|
1526 | if (!error && resource) {
|
1527 | var localLogisticRegression = new bigml.LocalLogisticRegression(
|
1528 | resource);
|
1529 | var prediction = localLogisticRegression.predict(
|
1530 | {'000000': 1, '000001': 1,
|
1531 | '000002': 1, '000003': 1});
|
1532 | console.log(prediction);
|
1533 | }
|
1534 | })
|
1535 | ```
|
1536 | Note that the `get` method's second and third arguments ensure that the
|
1537 | retrieval waits for the model to be finished before retrieving it and that all
|
1538 | the fields used in the logistic regression will be downloaded respectively.
|
1539 | Be careful when using a logistic regression
|
1540 | retrieved with filtered fields to instantiate a local logistic regression
|
1541 | object. If an important field
|
1542 | is missing (because it has been excluded or
|
1543 | filtered), an exception will arise. In this example, the connection to BigML
|
1544 | is used only in the `get` method call to retrieve the remote logistic
|
1545 | regression
|
1546 | information. The callback code, where the `localLogisticRegression`
|
1547 | and predictions
|
1548 | are built, is strictly local.
|
1549 |
|
1550 | Operating point predictions are also available for local logistic regressions.
|
1551 | For example:
|
1552 |
|
1553 | ```js
|
1554 | var operatingPoint = {kind: 'probability',
|
1555 | positiveClass: 'True',
|
1556 | threshold: 0.8};
|
1557 | localLogisticRegression.predictOperating(inputData,
|
1558 | operatingPoint,
|
1559 | cb);
|
1560 | ```
|
1561 |
|
1562 | or using the `predict` method:
|
1563 |
|
1564 | ```js
|
1565 | localLogisticRegression.predict(inputData,
|
1566 | undefined,
|
1567 | operatingPoint,
|
1568 | cb);
|
1569 | ```
|
1570 |
|
1571 | See the
|
1572 | [Operating point's predictions](#operating-points-predictions) section
|
1573 | to learn about
|
1574 | operating points. For logistic regressions, the only available kind is
|
1575 | `probability`, which sets the probability threshold that must be reached for the
|
1576 | prediction to be the positive class.
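
For intuition about what a `probability` operating point changes, the sketch below applies a threshold to a probability distribution. It is only an illustration: `applyProbabilityOperatingPoint` is a hypothetical helper, the distribution values are made up, and both the strict `>` comparison and the fallback to the most probable remaining class are assumptions rather than documented behavior.

```js
// Hypothetical helper, not part of the bigml module.
// distribution: [{category: ..., probability: ...}, ...]
function applyProbabilityOperatingPoint(distribution, operatingPoint) {
  var positive = null;
  var bestOther = null;
  distribution.forEach(function (entry) {
    if (entry.category === operatingPoint.positiveClass) {
      positive = entry;
    } else if (bestOther === null ||
               entry.probability > bestOther.probability) {
      bestOther = entry;
    }
  });
  // Assumption: the positive class is predicted only above the threshold.
  if (positive !== null && positive.probability > operatingPoint.threshold) {
    return positive;
  }
  // Assumption: otherwise the most probable of the other classes is used.
  return bestOther;
}

var distribution = [{category: 'True', probability: 0.7},
                    {category: 'False', probability: 0.3}];
// 'True' is the most probable class, but 0.7 does not pass a 0.8 threshold.
console.log(applyProbabilityOperatingPoint(
  distribution, {kind: 'probability', positiveClass: 'True', threshold: 0.8}));
```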
|
1577 |
|
1578 |
|
1579 | The local logistic regression constructor also accepts a connection object.
|
1580 | The connection object can include the user's credentials and
|
1581 | a storage directory. Setting a storage directory
|
1582 | causes the `LocalLogisticRegression` to look for a local logistic
|
1583 | regression JSON file in that directory before trying to download
|
1584 | it from the server. This
|
1585 | means that your model information is only downloaded the first time
|
1586 | you use it in a `LocalLogisticRegression` instance. Later instances that use the same
|
1587 | connection
|
1588 | object read the local file instead.
|
1589 |
|
1590 | ```js
|
1591 | var bigml = require('bigml');
|
1592 | var myUser = 'myuser';
|
1593 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
1594 | var my_storage = './my_storage'
|
1595 | var connection = new bigml.BigML(myUser, myKey, {storage: my_storage});
|
1596 | var localLogistic = new bigml.LocalLogisticRegression(
|
1597 | 'logisticregression/51922d0b37203f2a8c000010', connection);
|
1598 | ```
|
1599 |
|
1600 | Local Deepnets
|
1601 | --------------
|
1602 |
|
1603 | A remote deepnet model encloses all the information
|
1604 | required to predict the value of the objective field associated
|
1605 | to a given input data set.
|
1606 | Thus, you can build a local version of
|
1607 | a deepnet model and predict the category locally using
|
1608 | the `LocalDeepnet` class.
|
1609 |
|
1610 | ```js
|
1611 | var bigml = require('bigml');
|
1612 | var localDeepnet = new bigml.LocalDeepnet(
|
1613 | 'deepnet/51922d0b37203f2a8c001010');
|
1614 | localDeepnet.predict({'petal length': 1, 'petal width': 1,
|
1615 | 'sepal length': 1, 'sepal width': 1},
|
1616 | function(error, prediction) {
|
1617 | console.log(prediction)});
|
1618 | ```
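
Because the callback follows Node's error-first convention, the call can be wrapped in a promise on runtimes that provide `util.promisify` (Node 8 or later, which is newer than the Node 0.10 baseline mentioned in the requirements). The `fakeLocalDeepnet` object below is a stand-in with a made-up result so the sketch runs without a BigML account; that a real `LocalDeepnet` can be wrapped the same way is an assumption.

```js
var util = require('util');

// Stand-in for a LocalDeepnet instance: predict(inputData, callback) with an
// error-first callback, as in the example above. The result is made up.
var fakeLocalDeepnet = {
  predict: function (inputData, callback) {
    setImmediate(function () {
      if (typeof inputData !== 'object' || inputData === null) {
        return callback(new Error('input data must be an object'));
      }
      callback(null, {prediction: 'Iris-setosa', probability: 0.9});
    });
  }
};

// Bind the method to its object before wrapping so `this` keeps working.
var predictAsync = util.promisify(
  fakeLocalDeepnet.predict.bind(fakeLocalDeepnet));

var pending = predictAsync({'petal length': 1}).then(function (prediction) {
  console.log(prediction);
  return prediction;
});
```

`util.promisify` appends the callback as the last argument, so this only works for call forms where the callback comes last.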
|
1619 |
|
1620 | As you see, the first parameter to the `LocalDeepnet` constructor
|
1621 | is a deepnet id (or object). The constructor allows a second
|
1622 | optional argument, a connection
|
1623 | object (as described in the [Authentication section](#authentication)).
|
1624 |
|
1625 | ```js
|
1626 | var bigml = require('bigml');
|
1627 | var myUser = 'myuser';
|
1628 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
1629 | var localDeepnet = new bigml.LocalDeepnet(
|
1630 | 'deepnet/51922d0b37203f2a8c001010',
|
1631 | new bigml.BigML(myUser, myKey));
|
1632 | localDeepnet.predict({'000000': 1, '000001': 1,
|
1633 | '000002': 1, '000003': 1},
|
1634 | function(error, prediction) {
|
1635 | console.log(prediction)});
|
1636 | ```
|
1637 |
|
1638 | When the first argument is a finished deepnet object,
|
1639 | the constructor immediately creates
|
1640 | a `LocalDeepnet` instance ready to predict. Then,
|
1641 | the `LocalDeepnet.predict`
|
1642 | method can be called synchronously right away.
|
1643 |
|
1644 |
|
1645 | ```js
|
1646 | var bigml = require('bigml');
|
1647 | var deepnet = new bigml.Deepnet();
|
1648 | deepnet.get('deepnet/51b3c45a37203f16230000b5', true,
|
1649 | 'only_model=true;limit=-1',
|
1650 | function (error, resource) {
|
1651 | if (!error && resource) {
|
1652 | var localDeepnet = new bigml.LocalDeepnet(
|
1653 | resource);
|
1654 | var prediction = localDeepnet.predict(
|
1655 | {'000000': 1, '000001': 1,
|
1656 | '000002': 1, '000003': 1});
|
1657 | console.log(prediction);
|
1658 | }
|
1659 | })
|
1660 | ```
|
1661 | Note that the `get` method's second and third arguments ensure that the
|
1662 | retrieval waits for the model to be finished before retrieving it and that all
|
1663 | the fields used in the deepnet will be downloaded.
|
1664 | Be careful when using a deepnet
|
1665 | retrieved with filtered fields to instantiate a local deepnet
|
1666 | object. If an important field
|
1667 | is missing (because it has been excluded or
|
1668 | filtered), an exception will arise. In this example, the connection to BigML
|
1669 | is used only in the `get` method call to retrieve the remote deepnet
|
1670 | information. The callback code, where the `localDeepnet`
|
1671 | and predictions
|
1672 | are built, is strictly local.
|
1673 |
|
1674 | Operating point predictions are also available for local deepnets.
|
1675 | For example:
|
1676 |
|
1677 | ```js
|
1678 | var operatingPoint = {kind: 'probability',
|
1679 | positiveClass: 'True',
|
1680 | threshold: 0.8};
|
1681 | localDeepnet.predictOperating(inputData,
|
1682 | operatingPoint,
|
1683 | cb);
|
1684 | ```
|
1685 |
|
1686 | or using the `predict` method:
|
1687 |
|
1688 | ```js
|
1689 | localDeepnet.predict(inputData,
|
1690 | false,
|
1691 | operatingPoint,
|
1692 | cb);
|
1693 | ```
|
1694 |
|
1695 | See the
|
1696 | [Operating point's predictions](#operating-points-predictions) section
|
1697 | to learn about
|
1698 | operating points. For deepnets, the only available kind is
|
1699 | `probability`, which sets the probability threshold that must be reached for the
|
1700 | prediction to be the positive class.
|
1701 |
|
1702 | The local deepnet constructor also accepts a connection object.
|
1703 | The connection object can include the user's credentials and
|
1704 | a storage directory. Setting a storage directory
|
1705 | causes the `LocalDeepnet` to look for a local deepnet
|
1706 | JSON file in that directory before trying to download
|
1707 | it from the server. This
|
1708 | means that your model information is only downloaded the first time
|
1709 | you use it in a `LocalDeepnet` instance. Later instances that use the same
|
1710 | connection
|
1711 | object read the local file instead.
|
1712 |
|
1713 | ```js
|
1714 | var bigml = require('bigml');
|
1715 | var myUser = 'myuser';
|
1716 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
1717 | var my_storage = './my_storage'
|
1718 | var connection = new bigml.BigML(myUser, myKey, {storage: my_storage});
|
1719 | var localDeepnet = new bigml.LocalDeepnet(
|
1720 | 'deepnet/51922d0b37203f2a8c000010', connection);
|
1721 | ```
|
1722 |
|
1723 | Local Fusions
|
1724 | -------------
|
1725 |
|
1726 | A remote fusion model encloses all the information
|
1727 | required to predict the value of the objective field associated
|
1728 | to a given input data set. The fusion model is composed of a list of
|
1729 | supervised models whose predictions are aggregated to produce a final
|
1730 | fusion prediction. You can build a local version of
|
1731 | a fusion and predict the category (or numeric objective) locally using
|
1732 | the `LocalFusion` class.
|
1733 |
|
1734 | ```js
|
1735 | var bigml = require('bigml');
|
1736 | var localFusion = new bigml.LocalFusion(
|
1737 | 'fusion/51922d0b37203f2a8c001013');
|
1738 | localFusion.predict({'petal length': 1, 'petal width': 1,
|
1739 | 'sepal length': 1, 'sepal width': 1},
|
1740 | function(error, prediction) {
|
1741 | console.log(prediction)});
|
1742 | ```
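
The aggregation step can be pictured as averaging the per-class probabilities returned by each component model. This is only a sketch under that assumption: `averageDistributions` is a hypothetical helper, the two distributions are made-up numbers, and an actual fusion may weight its models differently.

```js
// Hypothetical helper, not part of the bigml module.
// distributions: one [{category, probability}, ...] list per component model.
function averageDistributions(distributions) {
  var totals = {};
  distributions.forEach(function (distribution) {
    distribution.forEach(function (entry) {
      totals[entry.category] =
        (totals[entry.category] || 0) + entry.probability;
    });
  });
  // Divide each accumulated probability by the number of models.
  return Object.keys(totals).map(function (category) {
    return {category: category,
            probability: totals[category] / distributions.length};
  });
}

var fused = averageDistributions([
  [{category: 'Iris-setosa', probability: 0.5},
   {category: 'Iris-versicolor', probability: 0.5}],
  [{category: 'Iris-setosa', probability: 0.25},
   {category: 'Iris-versicolor', probability: 0.75}]
]);
console.log(fused);
```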
|
1743 |
|
1744 | As you see, the first parameter to the `LocalFusion` constructor
|
1745 | is a fusion id (or object). The constructor allows a second
|
1746 | optional argument, a connection
|
1747 | object (as described in the [Authentication section](#authentication)).
|
1748 |
|
1749 | ```js
|
1750 | var bigml = require('bigml');
|
1751 | var myUser = 'myuser';
|
1752 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
1753 | var localFusion = new bigml.LocalFusion(
|
1754 | 'fusion/51922d0b37203f2a8c001010',
|
1755 | new bigml.BigML(myUser, myKey));
|
1756 | localFusion.predict({'000000': 1, '000001': 1,
|
1757 | '000002': 1, '000003': 1},
|
1758 | function(error, prediction) {
|
1759 | console.log(prediction)});
|
1760 | ```
|
1761 |
|
1762 | When the first argument is a finished fusion object,
|
1763 | the constructor immediately creates
|
1764 | a `LocalFusion` instance ready to predict. Then,
|
1765 | the `LocalFusion.predict`
|
1766 | method can be called synchronously right away.
|
1767 |
|
1768 |
|
1769 | ```js
|
1770 | var bigml = require('bigml');
|
1771 | var fusion = new bigml.Fusion();
|
1772 | fusion.get('fusion/51b3c45a37203f16230000b5', true,
|
1773 | 'only_model=true;limit=-1',
|
1774 | function (error, resource) {
|
1775 | if (!error && resource) {
|
1776 | var localFusion = new bigml.LocalFusion(
|
1777 | resource);
|
1778 | var prediction = localFusion.predict(
|
1779 | {'000000': 1, '000001': 1,
|
1780 | '000002': 1, '000003': 1});
|
1781 | console.log(prediction);
|
1782 | }
|
1783 | })
|
1784 | ```
|
1785 | Note that the `get` method's second and third arguments ensure that the
|
1786 | retrieval waits for the model to be finished before retrieving it and that all
|
1787 | the fields used in the fusion will be downloaded.
|
1788 | Beware of using
|
1789 | a fusion with filtered fields information to instantiate a local fusion
|
1790 | object. If an important field
|
1791 | is missing (because it has been excluded or
|
1792 | filtered), an exception will arise. In this example, the connection to BigML
|
1793 | is used only in the `get` method call to retrieve the remote fusion
|
1794 | information. The callback code, where the `localFusion`
|
1795 | and predictions
|
1796 | are built, is strictly local.
|
1797 |
|
1798 | Operating point predictions are also available for local fusions.
|
1799 | For example:
|
1800 |
|
1801 | ```js
|
1802 | var operatingPoint = {kind: 'probability',
|
1803 | positiveClass: 'True',
|
1804 | threshold: 0.8};
|
1805 | localFusion.predictOperating(inputData,
|
1806 | operatingPoint,
|
1807 | cb);
|
1808 | ```
|
1809 |
|
1810 | or using the `predict` method:
|
1811 |
|
1812 | ```js
|
1813 | localFusion.predict(inputData,
|
1814 | operatingPoint,
|
1815 | cb);
|
1816 | ```
|
1817 |
|
1818 | See the
|
1819 | [Operating point's predictions](#operating-points-predictions) section
|
1820 | to learn about
|
1821 | operating points. For fusions, the only available kind is
|
1822 | `probability`, which sets the probability threshold that must be reached for the
|
1823 | prediction to be the positive class.
|
1824 |
|
1825 | Local PCA
|
1826 | ---------
|
1827 |
|
1828 | A remote PCA model describes the set of orthogonal features best adapted to
|
1829 | a particular dataset to describe the variance in its data. These features
|
1830 | are built by linearly combining the original set of features.
|
1831 | You can build a local version of
|
1832 | a PCA and use it to compute the projections of every instance from the
|
1833 | original feature set to the PCA one using
|
1834 | the `LocalPCA` class.
|
1835 |
|
1836 | ```js
|
1837 | var bigml = require('bigml');
|
1838 | var localPCA = new bigml.LocalPCA(
|
1839 | 'pca/51922d0b37203f2a8c001017');
|
1840 | localPCA.projection({'petal length': 1, 'petal width': 1,
|
1841 | 'sepal length': 1, 'sepal width': 1},
|
1842 | function(error, projection) {
|
1843 | console.log(projection)});
|
1844 | ```
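
Conceptually, each projected value is the dot product between the input vector and one component. The sketch below shows only that linear combination, assuming the input is already numeric and centered; `project` and the two made-up orthogonal `components` are illustrative and are not taken from the bindings or from a real PCA.

```js
// Hypothetical helper, not part of the bigml module.
// input: array of numeric values; components: one weight array per component.
function project(input, components) {
  return components.map(function (component) {
    // Dot product of the component weights and the input values.
    return component.reduce(function (sum, weight, i) {
      return sum + weight * input[i];
    }, 0);
  });
}

var components = [[1, 0, 0],
                  [0, 0.5, 0.5]];
console.log(project([2, 4, 6], components));
```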
|
1845 |
|
1846 | As you see, the first parameter to the `LocalPCA` constructor
|
1847 | is a PCA id (or object). The constructor allows a second
|
1848 | optional argument, a connection
|
1849 | object (as described in the [Authentication section](#authentication))
|
1850 | that can include the user's credentials and
|
1851 | a storage directory. Setting a storage directory
|
1852 | causes the `LocalPCA` to look for a local
|
1853 | PCA JSON file in that directory before trying to download
|
1854 | it from the server. This
|
1855 | means that your model information is only downloaded the first time
|
1856 | you use it in a `LocalPCA` instance. Later instances that use the same
|
1857 | connection
|
1858 | object read the local file instead.
|
1859 |
|
1860 | ```js
|
1861 | var bigml = require('bigml');
|
1862 | var myUser = 'myuser';
|
1863 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
1864 | var my_storage = './my_storage'
|
1865 | var connection = new bigml.BigML(myUser, myKey, {storage: my_storage});
|
1866 | var localPCA = new bigml.LocalPCA(
|
1867 | 'pca/51922d0b37203f2a8c000010', connection);
|
1868 | localPCA.projection({'000000': 1, '000001': 1,
|
1869 | '000002': 1, '000003': 1},
|
1870 | function(error, projection) {
|
1871 | console.log(projection)});
|
1872 | ```
|
1873 |
|
1874 | When the first argument is a finished PCA object,
|
1875 | the constructor immediately creates
|
1876 | a `LocalPCA` instance ready to work. Then,
|
1877 | the `LocalPCA.projection`
|
1878 | method can be called synchronously right away.
|
1879 |
|
1880 |
|
1881 | ```js
|
1882 | var bigml = require('bigml');
|
1883 | var pca = new bigml.PCA();
|
1884 | pca.get('pca/51b3c45a37203f16230000b5', true,
|
1885 | 'only_model=true;limit=-1',
|
1886 | function (error, resource) {
|
1887 | if (!error && resource) {
|
1888 | var localPCA = new bigml.LocalPCA(
|
1889 | resource);
|
1890 | var projection = localPCA.projection(
|
1891 | {'000000': 1, '000001': 1,
|
1892 | '000002': 1, '000003': 1});
|
1893 | console.log(projection);
|
1894 | }
|
1895 | })
|
1896 | ```
|
1897 | Note that the `get` method's second and third arguments ensure that the
|
1898 | retrieval waits for the model to be finished before retrieving it and that all
|
1899 | the fields used in the PCA will be downloaded.
|
1900 | Beware of using
|
1901 | a PCA with filtered fields information to instantiate a local PCA
|
1902 | object. If an important field
|
1903 | is missing (because it has been excluded or
|
1904 | filtered), an exception will arise. In this example, the connection to BigML
|
1905 | is used only in the `get` method call to retrieve the remote PCA
|
1906 | information. The callback code, where the `localPCA`
|
1907 | and projections
|
1908 | are built, is strictly local.
|
1909 |
|
1910 |
|
1911 | Local Linear Regressions
|
1912 | ------------------------
|
1913 |
|
1914 | A remote linear regression model encloses all the information
|
1915 | required to predict the numeric value of the objective field associated
|
1916 | to a given input data set.
|
1917 | Thus, you can build a local version of
|
1918 | a linear regression model and predict the numeric objective locally using
|
1919 | the `LocalLinearRegression` class.
|
1920 |
|
1921 | ```js
|
1922 | var bigml = require('bigml');
|
1923 | var localLinearRegression = new bigml.LocalLinearRegression(
|
1924 | 'linearregression/51922d0b37203f2a8c001010');
|
1925 | localLinearRegression.predict({'petal length': 1, 'petal width': 1,
|
1926 | 'sepal length': 1, 'species': 'Iris-setosa'},
|
1927 | function(error, prediction) {
|
1928 | console.log(prediction)});
|
1929 | ```
|
1930 |
|
1931 | Note that, to find the associated prediction, input data cannot contain missing
|
1932 | values in fields that had no missing values in the training data.
|
1933 | The `predict` method also accepts input data
|
1934 | keyed by field id instead of field name.
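
For intuition, a linear regression prediction is the intercept plus a weighted sum of the numeric inputs, which is why a missing input leaves the sum undefined. The following sketch is not the bindings' code: `predictLinear`, the coefficient values and the model layout are made up, categorical fields such as `species` (which need an encoding step) are left out, and the sketch simply rejects every missing value.

```js
// Hypothetical helper, not part of the bigml module.
// model: {intercept: number, coefficients: {fieldName: weight, ...}}
function predictLinear(model, inputData) {
  var prediction = model.intercept;
  Object.keys(model.coefficients).forEach(function (field) {
    var value = inputData[field];
    if (typeof value !== 'number' || isNaN(value)) {
      throw new Error('Missing numeric value for field: ' + field);
    }
    prediction += model.coefficients[field] * value;
  });
  return prediction;
}

var linearModel = {
  intercept: 1,
  coefficients: {'petal length': 2, 'petal width': 0.5}
};
console.log(predictLinear(linearModel,
                          {'petal length': 1, 'petal width': 4}));
```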
|
1935 |
|
1936 | As you see, the first parameter to the `LocalLinearRegression` constructor
|
1937 | is a linear regression id (or object). The constructor allows a second
|
1938 | optional argument, a connection
|
1939 | object (as described in the [Authentication section](#authentication)).
|
1940 |
|
1941 | ```js
|
1942 | var bigml = require('bigml');
|
1943 | var myUser = 'myuser';
|
1944 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
1945 | var localLinearRegression = new bigml.LocalLinearRegression(
|
1946 | 'linearregression/51922d0b37203f2a8c001010',
|
1947 | new bigml.BigML(myUser, myKey));
|
1948 | localLinearRegression.predict({'000000': 1, '000001': 1,
|
1949 | '000002': 1, '000004': 'Iris-setosa'},
|
1950 | function(error, prediction) {
|
1951 | console.log(prediction)});
|
1952 | ```
|
1953 |
|
1954 | When the first argument is a finished linear regression object,
|
1955 | the constructor immediately creates
|
1956 | a `LocalLinearRegression` instance ready to predict. Then,
|
1957 | the `LocalLinearRegression.predict`
|
1958 | method can be called synchronously right away.
|
1959 |
|
1960 |
|
1961 | ```js
|
1962 | var bigml = require('bigml');
|
1963 | var linearRegression = new bigml.LinearRegression();
|
1964 | linearRegression.get('linearregression/51b3c45a37203f16230000b5', true,
|
1965 | 'only_model=true;limit=-1',
|
1966 | function (error, resource) {
|
1967 | if (!error && resource) {
|
1968 | var localLinearRegression = new bigml.LocalLinearRegression(
|
1969 | resource);
|
1970 | var prediction = localLinearRegression.predict(
|
1971 | {'000000': 1, '000001': 1,
|
1972 | '000002': 1, '000004': 'Iris-setosa'});
|
1973 | console.log(prediction);
|
1974 | }
|
1975 | })
|
1976 | ```
|
1977 | Note that the `get` method's second and third arguments ensure that the
|
1978 | retrieval waits for the model to be finished before retrieving it and that all
|
1979 | the fields used in the linear regression will be downloaded respectively.
|
1980 | Be careful when using a linear regression retrieved with filtered fields to
|
1981 | instantiate a local linear regression
|
1982 | object. If an important field is missing (because it has been excluded or
|
1983 | filtered), an exception will arise. In this example, the connection to BigML
|
1984 | is used only in the `get` method call to retrieve the remote linear
|
1985 | regression information. The callback code, where the `localLinearRegression`
|
1986 | and predictions are built, is strictly local.
|
1987 |
|
1988 | The local linear regression constructor also accepts a connection object.
|
1989 | The connection object can include the user's credentials and
|
1990 | a storage directory. Setting a storage directory
|
1991 | causes the `LocalLinearRegression` to look for a local linear
|
1992 | regression JSON file in that directory before trying to download
|
1993 | it from the server. This
|
1994 | means that your model information is only downloaded the first time
|
1995 | you use it in a `LocalLinearRegression` instance. Later instances that use the same
|
1996 | connection
|
1997 | object read the local file instead.
|
1998 |
|
1999 | ```js
|
2000 | var bigml = require('bigml');
|
2001 | var myUser = 'myuser';
|
2002 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
2003 | var my_storage = './my_storage'
|
2004 | var connection = new bigml.BigML(myUser, myKey, {storage: my_storage});
|
2005 | var localLinear = new bigml.LocalLinearRegression(
|
2006 | 'linearregression/51922d0b37203f2a8c000010', connection);
|
2007 | ```
|
2008 |
|
2009 |
|
2010 | Local predictions' distribution
|
2011 | -------------------------------
|
2012 |
|
2013 | For classification models, the local model, ensemble, logistic regression,
|
2014 | deepnet, linear regression, or fusion objects offer a method that
|
2015 | produces the predicted distribution of probabilities
|
2016 | for each of the categories in the objective field.
|
2017 |
|
2018 | ```js
|
2019 | var probabilities = localModel.predictProbability(inputData,
|
2020 | missingStrategy,
|
2021 | cb);
|
2022 | ```
|
2023 | The result of this call is a list of objects that contain the
|
2024 | category name and the probability predicted for that category, for instance:
|
2025 |
|
2026 | ```js
|
2027 | [{"category": "Iris-setosa", "probability": 0.53},
|
2028 | {"category": "Iris-virginica", "probability": 0.20},
|
2029 | {"category": "Iris-versicolor", "probability": 0.27}]
|
2030 | ```
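
Given a list in that format, ranking the categories takes a short sort. The `sortByProbability` helper below is hypothetical (it is not part of the bindings) and reuses the sample values shown above.

```js
var probabilities = [
  {"category": "Iris-setosa", "probability": 0.53},
  {"category": "Iris-virginica", "probability": 0.20},
  {"category": "Iris-versicolor", "probability": 0.27}
];

// Hypothetical helper: returns a copy sorted from most to least probable.
function sortByProbability(distribution) {
  return distribution.slice().sort(function (a, b) {
    return b.probability - a.probability;
  });
}

var ranked = sortByProbability(probabilities);
console.log(ranked[0].category); // the most probable category
```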
|
2031 |
|
2032 | Local Clusters
|
2033 | --------------
|
2034 |
|
2035 | A remote cluster encloses all the information required to predict the centroid
|
2036 | associated to a given input data set. Thus, you can build a local version of
|
2037 | a cluster and predict centroids locally using
|
2038 | the `LocalCluster` class.
|
2039 |
|
2040 | ```js
|
2041 | var bigml = require('bigml');
|
2042 | var localCluster = new bigml.LocalCluster('cluster/51922d0b37203f2a8c000010');
|
2043 | localCluster.centroid({'petal length': 1, 'petal width': 1,
|
2044 | 'sepal length': 1, 'sepal width': 1,
|
2045 | 'species': 'Iris-setosa'},
|
2046 | function(error, centroid) {console.log(centroid)});
|
2047 | ```
|
2048 |
|
2049 | Note that, to find the associated centroid, input data cannot contain missing
|
2050 | values in numeric fields. The `centroid` method also accepts input data
|
2051 | keyed by field id instead of field name.
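
The idea behind a centroid prediction is to pick the cluster center closest to the input point, which is why every numeric field needs a value. The sketch below uses a plain Euclidean distance over numeric fields; `nearestCentroid`, the centroid names and the coordinates are made up, and field scaling and categorical fields are deliberately ignored, so this is not how the bindings compute distances.

```js
// Hypothetical helper, not part of the bigml module.
// centroids: [{name: ..., center: {fieldName: value, ...}}, ...]
function nearestCentroid(point, centroids) {
  var best = null;
  centroids.forEach(function (centroid) {
    var squared = 0;
    Object.keys(centroid.center).forEach(function (field) {
      if (typeof point[field] !== 'number') {
        throw new Error('Missing numeric value for field: ' + field);
      }
      var diff = point[field] - centroid.center[field];
      squared += diff * diff;
    });
    var distance = Math.sqrt(squared);
    if (best === null || distance < best.distance) {
      best = {name: centroid.name, distance: distance};
    }
  });
  return best;
}

var centroids = [
  {name: 'Cluster 0', center: {'petal length': 1, 'petal width': 1}},
  {name: 'Cluster 1', center: {'petal length': 5, 'petal width': 4}}
];
console.log(nearestCentroid({'petal length': 4, 'petal width': 4},
                            centroids));
```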
|
2052 |
|
2053 | As you see, the first parameter to the `LocalCluster` constructor is a cluster
|
2054 | id (or object). The constructor allows a second optional argument, a connection
|
2055 | object (as described in the [Authentication section](#authentication)).
|
2056 |
|
2057 | ```js
|
2058 | var bigml = require('bigml');
|
2059 | var myUser = 'myuser';
|
2060 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
2061 | var localCluster = new bigml.LocalCluster('cluster/51922d0b37203f2a8c000010',
|
2062 | new bigml.BigML(myUser, myKey));
|
2063 | localCluster.centroid({'000000': 1, '000001': 1,
|
2064 | '000002': 1, '000003': 1,
|
2065 | '000004': 'Iris-setosa'},
|
2066 | function(error, centroid) {console.log(centroid)});
|
2067 | ```
|
2068 |
|
2069 | When the first argument is a finished cluster object, the constructor
|
2070 | immediately creates
|
2071 | a `LocalCluster` instance ready to predict. Then, the `LocalCluster.centroid`
|
2072 | method can be called synchronously right away.
|
2073 |
|
2074 |
|
2075 | ```js
|
2076 | var bigml = require('bigml');
|
2077 | var cluster = new bigml.Cluster();
|
2078 | cluster.get('cluster/51b3c45a37203f16230000b5', true,
|
2079 | 'only_model=true;limit=-1',
|
2080 | function (error, resource) {
|
2081 | if (!error && resource) {
|
2082 | var localCluster = new bigml.LocalCluster(resource);
|
2083 | var centroid = localCluster.centroid({'000000': 1, '000001': 1,
|
2084 | '000002': 1, '000003': 1,
|
2085 | '000004': 'Iris-setosa'});
|
2086 | console.log(centroid);
|
2087 | }
|
2088 | })
|
2089 | ```
|
2090 | Note that the `get` method's second and third arguments ensure that the
|
2091 | retrieval waits for the model to be finished before retrieving it and that all
|
2092 | the fields used in the cluster will be downloaded respectively. Be careful when using
|
2093 | a cluster retrieved with filtered fields to instantiate a local cluster. If an important field
|
2094 | is missing (because it has been excluded or
|
2095 | filtered), an exception will arise. In this example, the connection to BigML
|
2096 | is used only in the `get` method call to retrieve the remote cluster
|
2097 | information. The callback code, where the `localCluster` and predictions
|
2098 | are built, is strictly local.
|
2099 |
|
2100 | The local cluster constructor also accepts a connection object.
|
2101 | The connection object can include the user's credentials and
|
2102 | a storage directory. Setting a storage directory
|
2103 | causes the `LocalCluster` to look for a local cluster
|
2104 | JSON file in that directory before trying to download
|
2105 | it from the server. This
|
2106 | means that your model information is only downloaded the first time
|
2107 | you use it in a `LocalCluster` instance. Later instances that use the same
|
2108 | connection
|
2109 | object read the local file instead.
|
2110 |
|
2111 | ```js
|
2112 | var bigml = require('bigml');
|
2113 | var myUser = 'myuser';
|
2114 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
2115 | var my_storage = './my_storage'
|
2116 | var connection = new bigml.BigML(myUser, myKey, {storage: my_storage});
|
2117 | var localCluster = new bigml.LocalCluster(
|
2118 | 'cluster/51922d0b37203f2a8c000010', connection);
|
2119 | ```
|
2120 |
|
2121 | Local Anomaly Detectors
|
2122 | -----------------------
|
2123 |
|
2124 | A remote anomaly detector encloses all the information required to predict the
|
2125 | anomaly score
|
2126 | associated to a given input data set. Thus, you can build a local version of
|
2127 | an anomaly detector and predict anomaly scores locally using
|
2128 | the `LocalAnomaly` class.
|
2129 |
|
2130 | ```js
|
2131 | var bigml = require('bigml');
|
2132 | var localAnomaly = new bigml.LocalAnomaly('anomaly/51922d0b37203f2a8c003010');
|
2133 | localAnomaly.anomalyScore({'srv_serror_rate': 0.0, 'src_bytes': 181.0,
|
2134 | 'srv_count': 8.0, 'serror_rate': 0.0},
|
2135 | function(error, anomalyScore) {
|
2136 | console.log(anomalyScore)});
|
2137 | ```
|
2138 |
|
2139 | The `anomalyScore` method also accepts input data
|
2140 | keyed by field id instead of field name.
|
2141 |
|
2142 | As you see, the first parameter to the `LocalAnomaly` constructor is an anomaly
|
2143 | detector id (or object). The constructor allows a second optional argument,
|
2144 | a connection
|
2145 | object (as described in the [Authentication section](#authentication)).
|
2146 |
|
2147 | ```js
|
2148 | var bigml = require('bigml');
|
2149 | var myUser = 'myuser';
|
2150 | var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
|
2151 | var localAnomaly = new bigml.LocalAnomaly('anomaly/51922d0b37203f2a8c003010',
|
2152 | new bigml.BigML(myUser, myKey));
|
2153 | localAnomaly.anomalyScore({'000020': 9.0, '000004': 181.0, '000016': 8.0,
|
2154 | '000024': 0.0, '000025': 0.0},
|
2155 | function(error, anomalyScore) {console.log(anomalyScore)});
|
2156 | ```
|
2157 |
|
2158 | When the first argument is a finished anomaly detector object, the constructor
|
2159 | immediately creates
|
2160 | a `LocalAnomaly` instance ready to predict. Then, the `LocalAnomaly.anomalyScore`
|
2161 | method can be called synchronously right away.
|
2162 |
|
2163 |
|
2164 | ```js
|
2165 | var bigml = require('bigml');
|
2166 | var anomaly = new bigml.Anomaly();
|
2167 | anomaly.get('anomaly/51b3c45a37203f16230030b5', true,
|
2168 | 'only_model=true;limit=-1',
|
2169 | function (error, resource) {
|
2170 | if (!error && resource) {
|
2171 | var localAnomaly = new bigml.LocalAnomaly(resource);
|
2172 | var anomalyScore = localAnomaly.anomalyScore(
|
2173 | {'000020': 9.0, '000004': 181.0, '000016': 8.0,
|
2174 | '000024': 0.0, '000025': 0.0});
|
2175 | console.log(anomalyScore);
|
2176 | }
|
2177 | })
|
2178 | ```
|
2179 | Note that the `get` method's second and third arguments ensure that the
|
2180 | retrieval waits for the model to be finished before retrieving it and that all
|
2181 | the fields used in the anomaly detector will be downloaded respectively.
|
2182 | Be careful when using an anomaly detector
|
2183 | retrieved with filtered fields to instantiate a local anomaly detector.
|
2184 | If an important field
|
2185 | is missing (because it has been excluded or
|
2186 | filtered), an exception will arise. In this example, the connection to BigML
|
2187 | is used only in the `get` method call to retrieve the remote anomaly detector
|
2188 | information. The callback code, where the `localAnomaly` and scores
|
2189 | are built, is strictly local.
|
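
The two calling styles described so far (a callback while the resource is
still loading, a direct return value once it is ready) can be pictured with
the minimal sketch below. It is not the bindings' actual implementation:
`LocalThing`, `score` and the simulated download are hypothetical names used
only for illustration, and the code relies solely on Node's built-in `events`
and `util` modules.

```js
var EventEmitter = require('events').EventEmitter;
var util = require('util');

// Hypothetical stand-in for a local model class; not part of the bigml module.
function LocalThing(resource) {
  EventEmitter.call(this);
  var self = this;
  this.ready = false;
  if (typeof resource === 'object') {
    // A finished resource object: usable immediately, no download needed.
    this.data = resource;
    this.ready = true;
  } else {
    // A resource ID: simulate an asynchronous download.
    setImmediate(function () {
      self.data = {id: resource, offset: 1};
      self.ready = true;
      self.emit('ready');
    });
  }
}
util.inherits(LocalThing, EventEmitter);

// Returns the result directly when ready; otherwise waits and uses the callback.
LocalThing.prototype.score = function (input, cb) {
  var self = this;
  if (this.ready) {
    var result = input.value + this.data.offset;
    if (cb) { cb(null, result); }
    return result;
  }
  if (!cb) { throw new Error('Not ready yet: pass a callback'); }
  this.once('ready', function () { self.score(input, cb); });
};

// Built from an object: the synchronous call works right away.
var fromObject = new LocalThing({id: 'thing/1', offset: 1});
var syncResult = fromObject.score({value: 41});
console.log(syncResult);

// Built from an ID: the callback runs once the simulated download ends.
var fromId = new LocalThing('thing/2');
fromId.score({value: 41}, function (error, result) {
  console.log(error, result);
});
```

In this sketch, calling `score` without a callback before the object is ready
throws, which is one possible way to signal that the synchronous style is only
safe with a finished resource.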

The top anomalies in the `LocalAnomaly` can be extracted from the original
dataset by filtering the rows that have the highest scores. The filter
expression that singles out these rows can be obtained with the
`anomaliesFilter` method:

```js
localAnomaly.anomaliesFilter(true,
                             function(error, data) {console.log(data);});
```

When the first argument is set to `true`, the filter selects the top
anomalies. If it is set to `false`, the filter excludes the top anomalies
from the dataset.

The local anomaly constructor also accepts a connection object. Besides the
user's credentials, the connection object can include a storage directory.
When a storage directory is set, the `LocalAnomaly` checks whether it can
find a local anomaly JSON file in this directory before trying to download
it from the server. This means that your model information will only be
downloaded the first time you use it in a `LocalAnomaly` instance. Instances
that use the same connection object will read the local file instead.

```js
var bigml = require('bigml');
var myUser = 'myuser';
var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
var my_storage = './my_storage';
var connection = new bigml.BigML(myUser, myKey, {storage: my_storage});
var localAnomaly = new bigml.LocalAnomaly(
  'anomaly/51922d0b37203f2a8c000010', connection);
```
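
The look-up order described above (local file first, download only when the
file is missing) can be sketched as follows. This is an illustration under
assumptions, not the code used by the bindings: `loadResource`, the
`fakeDownload` function and the file naming scheme are hypothetical.

```js
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical helper: returns the resource JSON, reading it from
// `storageDir` when a stored copy exists and downloading it otherwise.
function loadResource(resourceId, storageDir, download) {
  // 'anomaly/5192...' becomes 'anomaly_5192...' (assumed naming scheme).
  var fileName = path.join(storageDir, resourceId.replace('/', '_'));
  if (fs.existsSync(fileName)) {
    return JSON.parse(fs.readFileSync(fileName, 'utf8'));
  }
  var resource = download(resourceId);
  fs.mkdirSync(storageDir, {recursive: true});
  fs.writeFileSync(fileName, JSON.stringify(resource));
  return resource;
}

// Usage with a fake downloader that counts how often it is called.
var downloads = 0;
function fakeDownload(id) {
  downloads += 1;
  return {resource: id, status: 'finished'};
}
var storageDir = fs.mkdtempSync(path.join(os.tmpdir(), 'my_storage-'));
var first = loadResource('anomaly/51922d0b37203f2a8c000010', storageDir,
                         fakeDownload);
var second = loadResource('anomaly/51922d0b37203f2a8c000010', storageDir,
                          fakeDownload);
console.log(downloads, first.resource === second.resource);
```

The second call never reaches `fakeDownload`, which mirrors the behaviour
described for instances that share a storage directory.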

Local Associations
------------------

A remote association object encloses the information about which field values
in your dataset are related. The values are structured as items and
their relations are described as rules. The `LocalAssociation`
class allows you to build a local version of this remote object, and get
both the list of `Items` that can be related and the list of `AssociationRules`
that describe such relations. Creating a `LocalAssociation` object is as
simple as

```js
var bigml = require('bigml');
var localAssociation = new bigml.LocalAssociation('association/51922d0b37203f2a8c003010');
```

and you can list the `AssociationRules` that it contains using the `getRules`
method

```js
var index = 0;
var associationRules = localAssociation.getRules();
for (index = 0; index < associationRules.length; index++) {
  console.log(associationRules[index].describe());
}
```

As you can see in the previous code, the `AssociationRule` object has a
`describe` method that will generate a human-readable description of the
rule.

The `getRules` method also accepts several arguments that allow you to
filter the rules by their leverage, strength, support, p-value, the list of
items they contain or a user-given filter function. They can all be added
as attributes of a filters object passed as the first argument. The second
argument can be a callback function. The previous example was a synchronous
call to the method, which only works once the `localAssociation` object is
ready. To use the method asynchronously you can use:

```js
var associationRules;
localAssociation.getRules(
  {minLeverage: 0.3}, // filter by minimum leverage
  function(error, data) {associationRules = data;}); // callback code
```

See the method docstring for details about the filter options.
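
To make the effect of a filters object more concrete, here is a small sketch
of threshold-based filtering over plain objects. Only `minLeverage` comes
from the example above; the rule attributes, `minSupport` and
`filterFunction` are assumptions made for illustration, so check the method
docstring for the names the bindings really accept.

```js
// Hypothetical rule objects; the real AssociationRule has more attributes.
var rules = [
  {id: '000000', leverage: 0.12, support: 0.30},
  {id: '000001', leverage: 0.35, support: 0.22},
  {id: '000002', leverage: 0.41, support: 0.05}
];

// Keeps the rules that satisfy every condition present in `filters`.
function filterRules(rules, filters) {
  filters = filters || {};
  return rules.filter(function (rule) {
    if (filters.minLeverage !== undefined &&
        rule.leverage < filters.minLeverage) { return false; }
    if (filters.minSupport !== undefined &&
        rule.support < filters.minSupport) { return false; }
    if (typeof filters.filterFunction === 'function' &&
        !filters.filterFunction(rule)) { return false; }
    return true;
  });
}

var selectedIds = filterRules(rules, {minLeverage: 0.3}).map(
  function (rule) { return rule.id; });
console.log(selectedIds);
```

Conditions are combined with a logical AND in this sketch, so adding more
attributes to the filters object can only narrow the selection.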

Similarly, you can obtain the list of `Items` involved in the association
rules

```js
var index = 0;
var items = localAssociation.getItems();
for (index = 0; index < items.length; index++) {
  console.log(items[index].describe());
}
```

and they can be filtered by their field ID, name, an object containing
input data or a user-given function. See the method docstring for details.

You can also save the rules to a CSV file using the `rulesCSV` method

```js
var minimumLeverage = 0.3;
localAssociation.rulesCSV(
  './my_csv.csv', // fileName
  {minLeverage: minimumLeverage}); // filters for the rules
```

As you can see, the first argument is the path to the CSV file where the
rules will be stored and the second one is the filters object that selects
the rules to store. In this example, we are only storing the rules whose
leverage is over the 0.3 threshold.

Both the `getItems` and the `rulesCSV` methods can also be called
asynchronously as we saw for the `getRules` method.

The `LocalAssociation` object can be used to retrieve the `association sets`
related to some input data.

```js
var bigml = require('bigml');
var localAssociation = new bigml.LocalAssociation(
  'association/55922d0b37203f2a8c003010');
localAssociation.associationSet({product: 'cat food'},
                                function(error, associationSet) {
                                  console.log(associationSet)});
```

When the `LocalAssociation` instance is ready, the
`LocalAssociation.associationSet` method can be called synchronously.

```js
var bigml = require('bigml');
var association = new bigml.Association();
association.get('association/51b3c45a37203f16230530b5', true,
                'only_model=true;limit=-1',
                function (error, resource) {
    if (!error && resource) {
      var localAssociation = new bigml.LocalAssociation(resource);
      var associationSet = localAssociation.associationSet(
        {'000020': 'cat food'});
      console.log(associationSet);
    }
  });
```

In this example, the connection to BigML is used only in the `get` method
call that retrieves the remote association information. The callback code,
where the `localAssociation` object and the association sets are built, is
strictly local.

Local Topic Models
------------------

A remote `topic model` object contains a list of topics extracted from the
terms in the collection of documents used in its training. The
`LocalTopicModel` class allows you to build a local version of this remote
object, and get the list of `Topics` and the term distribution for each of
them. Using this information, its `distribution` method computes the
probabilities for a new document to be classified under each of the `Topics`.
Creating a `LocalTopicModel` object is as simple as

```js
var bigml = require('bigml');
var localTopicModel = new bigml.LocalTopicModel('topicmodel/51922d0b37203f4b8c003010');
```

and obtaining the `TopicDistribution` for a new document:

```js
var newDocument = {"Message": "Where are you?when wil you reach here?"};
localTopicModel.distribution(newDocument,
                             function(error, topicDistribution) {
                               console.log(topicDistribution)});
```

Note that only text fields are considered to decide the `topic distribution`
of a document, and their contents will be concatenated.
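
As a rough picture of what "only text fields are considered and their
contents are concatenated" means, the sketch below builds a single document
string from some input data. The `fields` structure, the `textFromInput`
helper and the separator are assumptions made for this example; they are not
taken from the bindings.

```js
// Hypothetical field metadata: field ID -> {name, optype}.
var fields = {
  '000000': {name: 'Type', optype: 'categorical'},
  '000001': {name: 'Message', optype: 'text'},
  '000002': {name: 'Subject', optype: 'text'}
};

// Joins the values of the text fields found in `inputData`, which may be
// keyed by field ID or by field name; all other fields are ignored.
function textFromInput(inputData, fields) {
  var parts = [];
  Object.keys(fields).sort().forEach(function (fieldId) {
    var field = fields[fieldId];
    if (field.optype !== 'text') { return; }
    var value = inputData[fieldId];
    if (value === undefined) { value = inputData[field.name]; }
    if (typeof value === 'string') { parts.push(value); }
  });
  return parts.join('\n\n');   // the separator is an assumption
}

var documentText = textFromInput(
  {'Type': 'ham', 'Message': 'Where are you?', '000002': 'Hello'}, fields);
console.log(documentText);
```

The categorical `Type` value is dropped, and the two text values are merged
into the one string that a topic model would then analyze.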

As you see, the first parameter to the `LocalTopicModel` constructor is a
`topic model` ID (or object). The constructor allows a second optional
argument, a connection object (as described in the
[Authentication section](#authentication)).

```js
var bigml = require('bigml');
var myUser = 'myuser';
var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
var localTopicModel = new bigml.LocalTopicModel('topicmodel/51522d0b37203f2a8c000010',
                                                new bigml.BigML(myUser, myKey));
localTopicModel.distribution({'000001': "Where are you?when wil you reach here?"},
                             function(error, topicDistribution) {console.log(topicDistribution)});
```

When the first argument is a finished `topic model` object, the constructor
immediately creates a `LocalTopicModel` instance that is ready to be used.
The `LocalTopicModel.distribution` method can then be called synchronously.

```js
var bigml = require('bigml');
var topicModel = new bigml.TopicModel();
topicModel.get('topicmodel/51b3c45a47203f16230000b5', true,
               'only_model=true;limit=-1',
               function (error, resource) {
    if (!error && resource) {
      var localTopicModel = new bigml.LocalTopicModel(resource);
      var topicDistribution = localTopicModel.distribution(
        {'000001': "Where are you?when wil you reach here?"});
      console.log(topicDistribution);
    }
  });
```

Note that the second and third arguments of the `get` method ensure,
respectively, that the call waits for the `topic model` to be finished
before retrieving it and that all the fields used in the `topic model` are
downloaded. Beware of using topic models retrieved with filtered fields to
instantiate a local topic model: if an important field is missing (because
it has been excluded or filtered), an exception will be raised. In this
example, the connection to BigML is used only in the `get` method call that
retrieves the remote topic model information. The callback code, where the
`localTopicModel` object and the distributions are built, is strictly local.

The local association constructor also accepts a connection object. Besides
the user's credentials, the connection object can include a storage
directory. When a storage directory is set, the `LocalAssociation` checks
whether it can find a local association JSON file in this directory before
trying to download it from the server. This means that your model
information will only be downloaded the first time you use it in a
`LocalAssociation` instance. Instances that use the same connection object
will read the local file instead.

```js
var bigml = require('bigml');
var myUser = 'myuser';
var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
var my_storage = './my_storage';
var connection = new bigml.BigML(myUser, myKey, {storage: my_storage});
var localAssociation = new bigml.LocalAssociation(
  'association/51922d0b37203f2a8c000010', connection);
```

Local Time Series
-----------------

A remote time series resource encloses all the information
required to produce forecasts for all the numeric fields that have been
previously declared as its objective fields.
Thus, you can build a local version of
a time series and generate forecasts using the `LocalTimeSeries` class.

```js
var bigml = require('bigml');
var localTimeSeries = new bigml.LocalTimeSeries(
  'timeseries/51922d0b37203f2a8c001010');
localTimeSeries.forecast({'Final': {'horizon': 10}},
                         function(error, forecast) {
                           console.log(forecast)});
```

The `forecast` method also accepts input data labelled with the corresponding
field ID instead of the field name. The result of the call contains forecasts
for the fields, horizons and ETS models given as input data. In the example,
the response will be an object like:

```js
{ '000005': [ { model: 'A,N,N', pointForecast: [ 68.53181,
                                                 68.53181,
                                                 68.53181,
                                                 68.53181,
                                                 68.53181,
                                                 68.53181,
                                                 68.53181,
                                                 68.53181,
                                                 68.53181,
                                                 68.53181 ]}]}
```

This object contains the ID of the forecasted field and the forecast for the
best performing model in the `time series` (according to `aic`). You can
read more about the available error metrics and the input data parameters in
the [time series API documentation](https://bigml.com/api/timeseries) and
the [forecast API documentation](https://bigml.com/api/forecasts).
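
The repeated value in the example response is what one would expect from an
`A,N,N` model: with additive error, no trend and no seasonality, the point
forecast is flat at the last smoothed level. The sketch below shows that
calculation (simple exponential smoothing) under the assumption that the
smoothing parameter `alpha` and the initial level `l0` are already known.
`annForecast` is a hypothetical helper, not a function of the bindings, and
the numbers are made up.

```js
// Simple exponential smoothing, i.e. the ETS model labelled 'A,N,N'
// (additive error, no trend, no seasonality).
// `alpha` and the initial level `l0` are assumed to be already fitted.
function annForecast(series, alpha, l0, horizon) {
  var level = l0;
  series.forEach(function (y) {
    level = alpha * y + (1 - alpha) * level;   // update the smoothed level
  });
  var pointForecast = [];
  for (var h = 0; h < horizon; h++) {
    pointForecast.push(level);   // flat forecast: every step equals the level
  }
  return {model: 'A,N,N', pointForecast: pointForecast};
}

var forecast = annForecast([60, 70, 80], 0.5, 60, 3);
console.log(forecast);
```

Models with a trend or seasonal component would produce point forecasts that
change along the horizon instead.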

As you see, the first parameter to the `LocalTimeSeries` constructor
is a time series ID (or object). The constructor allows a second
optional argument, a connection object (as described in the
[Authentication section](#authentication)).

```js
var bigml = require('bigml');
var myUser = 'myuser';
var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
var localTimeSeries = new bigml.LocalTimeSeries(
  'timeseries/51922d0b37203f2a8c001010',
  new bigml.BigML(myUser, myKey));
localTimeSeries.forecast({'000005': {"horizon": 10,
                                     "ets_models": {"criterion": "aic",
                                                    "names": ["A,A,N"],
                                                    "indices": [3,7],
                                                    "limit": 2}}},
                         function(error, forecast) {
                           console.log(forecast)});
```

When the first argument is a finished time series object, the constructor
immediately creates a `LocalTimeSeries` instance that is ready to predict.
The `LocalTimeSeries.forecast` method can then be called synchronously.

```js
var bigml = require('bigml');
var timeSeries = new bigml.TimeSeries();
timeSeries.get('timeseries/51b3c45a37203f16230000b5', true,
               'only_model=true;limit=-1',
               function (error, resource) {
    if (!error && resource) {
      var localTimeSeries = new bigml.LocalTimeSeries(
        resource);
      var prediction = localTimeSeries.forecast(
        {'Final': {'horizon': 10}});
      console.log(prediction);
    }
  });
```

Note that the second and third arguments of the `get` method ensure,
respectively, that the call waits for the time series to be finished before
retrieving it and that all the fields used in the time series models are
downloaded. Beware of using time series retrieved with filtered fields to
instantiate a local time series object.

The local time series constructor also accepts a connection object. Besides
the user's credentials, the connection object can include a storage
directory. When a storage directory is set, the `LocalTimeSeries` checks
whether it can find a local time series JSON file in this directory before
trying to download it from the server. This means that your model
information will only be downloaded the first time you use it in a
`LocalTimeSeries` instance. Instances that use the same connection object
will read the local file instead.

```js
var bigml = require('bigml');
var myUser = 'myuser';
var myKey = 'ae579e7e53fb9abd646a6ff8aa99d4afe83ac291';
var my_storage = './my_storage';
var connection = new bigml.BigML(myUser, myKey, {storage: my_storage});
var localTimeSeries = new bigml.LocalTimeSeries(
  'timeseries/51922d0b37203f2a8c000010', connection);
```

External Connectors
-------------------

BigML offers an `externalconnector` resource that can be used to connect to
external data sources. The description of the API requirements to create an
`ExternalConnector` can be found in the
[API documentation](https://bigml.com/api/externalconnectors). The
information required to create an external connector includes parameters
like the type of database manager, the host, user, password and table that
we need to access. It must be provided as the first argument to the
`bigml.ExternalConnector` create method, or it can be set in environment
variables (leaving the first argument undefined):

```batch
export BIGML_EXTERNAL_CONN_HOST=db.host.com
export BIGML_EXTERNAL_CONN_PORT=4324
export BIGML_EXTERNAL_CONN_USER=my_user
export BIGML_EXTERNAL_CONN_PWD=my_password
export BIGML_EXTERNAL_CONN_DB=my_database
export BIGML_EXTERNAL_CONN_SOURCE="postgresql"
```

```js
var bigml = require('bigml');
var externalConnectorConn = new bigml.ExternalConnector(),
    externalConnectorId, connectionInfo,
    args = {"name": "my connector", "source": "postgresql"}, retry = false;
externalConnectorConn.create(connectionInfo, args, retry,
  function(error, data) {
    if (error) {
      console.log(error);
    } else {
      externalConnectorId = data.resource;
    }
  });
// As connectionInfo is undefined, the environment variables are retrieved
// and the information used to create the external connector is
// {"host": "db.host.com",
//  "port": 4324,
//  "user": "my_user",
//  "password": "my_password",
//  "database": "my_database"}
// Alternatively, you can provide this information as first argument
// for the ExternalConnector create method.
```
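
If you prefer to build the connection information yourself and pass it as
the first argument, a helper along these lines could read the same
environment variables. It is only a sketch: `connectionInfoFromEnv` is a
hypothetical function, the key names follow the object shown in the comment
above, and converting the port to a number is an assumption.

```js
// Builds the connection information from the BIGML_EXTERNAL_CONN_*
// variables found in `env` (hypothetical helper, not part of the bindings).
function connectionInfoFromEnv(env) {
  var info = {
    host: env.BIGML_EXTERNAL_CONN_HOST,
    port: env.BIGML_EXTERNAL_CONN_PORT ?
      parseInt(env.BIGML_EXTERNAL_CONN_PORT, 10) : undefined,
    user: env.BIGML_EXTERNAL_CONN_USER,
    password: env.BIGML_EXTERNAL_CONN_PWD,
    database: env.BIGML_EXTERNAL_CONN_DB
  };
  // Report what is missing instead of sending an incomplete request.
  var missing = Object.keys(info).filter(function (key) {
    return info[key] === undefined;
  });
  if (missing.length > 0) {
    throw new Error('Missing connection settings: ' + missing.join(', '));
  }
  return info;
}

// Usage with an explicit object; pass process.env to read the real values.
var connectionInfo = connectionInfoFromEnv({
  BIGML_EXTERNAL_CONN_HOST: 'db.host.com',
  BIGML_EXTERNAL_CONN_PORT: '4324',
  BIGML_EXTERNAL_CONN_USER: 'my_user',
  BIGML_EXTERNAL_CONN_PWD: 'my_password',
  BIGML_EXTERNAL_CONN_DB: 'my_database'
});
console.log(connectionInfo);
```

Failing early on missing settings keeps configuration mistakes on the client
side rather than surfacing them as an API error.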

Logging configuration and Exceptions
------------------------------------

Logging is configured at startup to use the
[winston](https://github.com/flatiron/winston) logging library. The environment
variables ``BIGML_LOG_EXCEPTIONS`` and ``BIGML_EXIT_ON_ERROR`` can be set to
``0`` or ``1`` to control whether BigML takes care of logging the errors and/or
exits when they occur. By default, BigML will use ``winston`` to handle errors
and will exit when an uncaught exception is raised.

Logs will be sent both to the console and to a `bigml.log` file by default.
You can change this behaviour by using:

- BIGML_LOG_FILE: path to the log file.
- BIGML_LOG_LEVEL: log level (0 - no output at all, 1 - console and file log,
  2 - console log only, 3 - file log only,
  4 - console and log file with debug info)

For instance,

```batch
export BIGML_LOG_FILE=/tmp/my_log_file.log
export BIGML_LOG_LEVEL=3
```

would store log information only in the `/tmp/my_log_file.log` file.
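
The meaning of the log levels and flags described above can be summarized in
code. The sketch below is not how the bindings configure `winston`; it only
translates the documented values into a plain settings object. `logTargets`
and `envFlag` are hypothetical helpers, and treating an unset or unrecognized
level as level 1 is an assumption based on the documented default.

```js
// Maps BIGML_LOG_LEVEL to the outputs described in the list above.
function logTargets(level) {
  switch (parseInt(level, 10)) {
  case 0: return {console: false, file: false, debug: false};
  case 2: return {console: true, file: false, debug: false};
  case 3: return {console: false, file: true, debug: false};
  case 4: return {console: true, file: true, debug: true};
  default: return {console: true, file: true, debug: false};  // level 1
  }
}

// Reads a flag set to '0' or '1'; `defaultValue` applies when it is unset.
function envFlag(env, name, defaultValue) {
  return env[name] === undefined ? defaultValue : env[name] === '1';
}

// Usage with an explicit object; pass process.env to read the real values.
var env = {BIGML_LOG_FILE: '/tmp/my_log_file.log', BIGML_LOG_LEVEL: '3'};
var targets = logTargets(env.BIGML_LOG_LEVEL);
var settings = {
  file: targets.file ? (env.BIGML_LOG_FILE || 'bigml.log') : null,
  console: targets.console,
  logExceptions: envFlag(env, 'BIGML_LOG_EXCEPTIONS', true),
  exitOnError: envFlag(env, 'BIGML_EXIT_ON_ERROR', true)
};
console.log(settings);
```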

Additional Information
----------------------

For additional information about the API, see the
[BigML developer's documentation](https://bigml.com/api).