guv
Version:
Grid Utilization Virgilante
160 lines (104 loc) • 5.56 kB
Markdown
guv
===
`guv`, aka Grid Utilization Vigilante, is a governor for your (Heroku) workers:
It automatically scales the numbers of workers based on number of pending jobs in a (RabbitMQ) queue.
The number of workers is calculated to attempt that all jobs are completed within a specified *deadline*.
This is based on estimates of the processing time (mean, variance).
guv is written in Node.js, but can be used with workers in any programming language.
[Origin of the word guv](http://english.stackexchange.com/questions/14370/what-is-the-origin-of-the-british-guv-is-it-still-used-colloquially).
## Status
*In production*
* Supports RabbitMQ messaging system and Heroku workers
* Optional metric reporting to statuspage.io and New Relic
* Uses simple algorithm for scaling to hit a deadline
* Note: Scaling algorithm does not compensate for worker boot time
* Used in production at [The Grid](https://thegrid.io), with [MsgFlo](https://github.com/msgflo/msgflo)
## Usage
Install as NPM dependency
npm install --save guv
Add it to your Procfile
echo "guv: node node_modules/.bin/guv" >> Procfile
Configure an Heroku API key to use. Get it
heroku config:set HEROKU_API_KEY=`heroku auth:token`
Configure RabbitMQ instance to use. It must have the management plugin installed and configured.
heroku config:set GUV_BROKER=amqp://[user:pass]@example.net/instance
Note: If you use CloudAMQP, guv will automatically respect the `CLOUDAMQP_URL` envvar. No config needed.
For guv own configuration we also recommend using an envvar.
This allows you to change the configuration without redeploying.
See below for details on the configuration format.
heroku config:set GUV_CONFIG="`cat autoscale.guv`"
To verify that guv is running and working, check its log.
heroku logs --app myapp --ps guv
The configuration format for guv is based specified in [YAML](http://yaml.org/).
Since it is a superset of JSON, you can also use that.
One guv instance can handle multiple worker *roles*.
Each role has an associated queue, worker and scaling configuration - specified as variables.
myrole:
variable1: value1
otherrole:
variable2: value2
The special role name `*` is used for global, application-wide settings.
Each of the individual roles will inherit this configuration if they do not override it.
'*': {min: 5, max: 50, app: my-heroku-app}
imageprocessing: {}
textprocessing:
max: 10
guv attempts to scale workers to be within a `deadline`, based on estimates of `processing` time.
To let it do a good job you should always specify the deadline, and *mean* processing time.
textprocessing:
deadline: 100
processing: 30
You can also specify the variance, as 1 standard deviation
textprocessing:
deadline: 100
processing: 30
stddev: 3
The name of the `worker` and `queue` defaults to the `role name`, but can be overridden.
colorextract: {}
histogram:
queue: 'hist.INPUT'
worker: processhistograms
For list of all supported configuration variables see [./src/config.coffee](./src/config.coffee).
Many of the commonly used ones have short and long-form names.
guv configuration files by convention use the extension *.guv*, for instance `autoscale.guv` or `myproject.guv`.
guv can report metrics about how workers are being scaled to [New Relic](https://newrelic.com/) Insights.
FIXME: document how to set up
guv can report metrics about in-flight jobs to your [statuspage.io](http://statuspage.io/).
Set the API key as an environment variable
export STATUSPAGE_API_TOKEN=mytoken
And configure in your guv.yaml file:
'*':
statuspage: 'my-statuspage-id'
workerA:
metric: 'my-statuspage-metric'
* Measure the actual processing times of your jobs.
Calculate mean and standard deviations, and use these in the `guv` config.
* Measure the actual end-to-end processing time for your jobs.
Monitor that they meet your target deadlines, and identify the load cases where they do not.
* Keep boot times of your workers as low as possible.
Responsiveness to variable loads, especially sudden peaks is severely affected by boot time.
Avoid doing expensive computations, lots of I/O or waiting for external services.
If neccesary do more during app build time, like creating caches or single-file builds.
* Separate out jobs with different processing time characterstics to different worker roles
If your job processing time has several clear peaks instead of one in the processing time histogram,
determine what the different cases are and use dedicated worker roles.
If your job processing time depends on the size of the job payload, implement an estimation
algorithm and route jobs into N different queues depending on which bin they fall into.
* Separate out jobs with different deadlines to different workers
Maintenance work, and anything else not required to maintain responsiveness for users,
should be done in separate queues, usually with a higher deadline.
This ensures that such work does not disrupt quality-of-service.
* Use only one primary input queue per worker