# hot-shots

A Node.js client for Datadog's [DogStatsD](http://docs.datadoghq.com/guides/dogstatsd/) server, InfluxDB's [Telegraf](https://github.com/influxdb/telegraf) StatsD server, the OpenTelemetry Collector [StatsD receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/statsdreceiver), and Etsy's [StatsD](https://github.com/etsy/statsd) server.

This project was originally a fork of [node-statsd](https://github.com/sivy/node-statsd). It
includes all changes in the latest node-statsd and many additional changes, including:
* uds (Unix domain socket) protocol support
* raw stream protocol support
* TypeScript types
* Telegraf support
* events
* child clients
* tcp protocol support
* mock mode
* asyncTimer
* asyncDistTimer
* debug logging
* much more, including many bug fixes

You can read about all changes in [the changelog](CHANGES.md).

hot-shots supports Node 18.x and higher. When using types.d.ts, hot-shots requires TypeScript 4.0 or higher.

![Build Status](https://github.com/bdeitte/hot-shots/actions/workflows/node-test.js.yml/badge.svg)

## Example

```javascript
const StatsD = require('hot-shots');
const client = new StatsD();

client.increment('my_counter');
```

## Usage

All initialization parameters are optional.

Parameters (specified as one object passed into hot-shots):

* `host`:        The host to send stats to. If not set, the constructor tries to
  retrieve it from the `DD_AGENT_HOST` environment variable. `default: undefined`, which as per [UDP/datagram socket docs](https://nodejs.org/api/dgram.html#dgram_socket_send_msg_offset_length_port_address_callback) results in `127.0.0.1` or `::1` being used.
* `port`:        The port to send stats to, if not set, the constructor tries to retrieve it from the `DD_DOGSTATSD_PORT` environment variable, `default: 8125`
* `prefix`:      What to prefix each stat name with `default: ''`. A period separator is automatically added if not present (e.g. `my_prefix` becomes `my_prefix.`).
* `suffix`:      What to suffix each stat name with `default: ''`. A period separator is automatically added if not present (e.g. `my_suffix` becomes `.my_suffix`).
* `tagPrefix`:   Prefix tag list with character `default: '#'`. Note: this does not work with the `telegraf` option.
* `tagSeparator`: Separate tags with character `default: ','`. Note: this does not work with the `telegraf` option.
* `globalize`:   Expose this StatsD instance globally. `default: false`
* `cacheDns`:    Caches dns lookup to *host* for *cacheDnsTtl*, only used
  when protocol is `udp`, `default: false`
* `cacheDnsTtl`: time-to-live of dns lookups in milliseconds, when *cacheDns* is enabled. `default: 60000`
* `mock`:        Create a mock StatsD instance, using a mock transport that doesn't create real sockets.
  Stats are not sent to the server but can be read from mockBuffer for testing.  Note that
  mockBuffer will keep growing, so only use for testing or clear out periodically. `default: false`
* `globalTags`:  Tags that will be added to every metric. Can be either an object or list of tags. `default: {}`.
* `includeDataDogTags`: Whether to add Datadog tags to the global tags. `default: true`. The following *Datadog* tags are appended to `globalTags` from the corresponding environment variable if the latter is set:
  * `dd.internal.entity_id` from `DD_ENTITY_ID` ([docs](https://docs.datadoghq.com/developers/dogstatsd/?tab=kubernetes#origin-detection-over-udp))
  * `env` from `DD_ENV` ([docs](https://docs.datadoghq.com/getting_started/tagging/unified_service_tagging/?tab=kubernetes#full-configuration))
  * `service` from `DD_SERVICE` ([docs](https://docs.datadoghq.com/getting_started/tagging/unified_service_tagging/?tab=kubernetes#full-configuration))
  * `version` from `DD_VERSION` ([docs](https://docs.datadoghq.com/getting_started/tagging/unified_service_tagging/?tab=kubernetes#full-configuration))
* `maxBufferSize`: If larger than 0,  metrics will be buffered and only sent when the string length is greater than the size. `default: 0` for udp and tcp.  `default: 8192` for uds.
* `bufferFlushInterval`: If buffering is in use, this is the time in ms to always flush any buffered metrics. `default: 1000`
* `telegraf`:    Use Telegraf's StatsD line protocol, which is slightly different than the rest `default: false`
* `sampleRate`:    Sends only a sample of data to StatsD for all StatsD methods.  Can be overridden at the method level. `default: 1`
* `errorHandler`: A function with one argument. It is called to handle various errors. `default: none`, errors are thrown/logged to console
* `useDefaultRoute`: Use the default interface on a Linux system. Useful when running in containers
* `protocol`: Use `tcp` option for TCP protocol, or `uds` for the Unix Domain Socket protocol or `stream` for the raw stream. Defaults to `udp` otherwise.
* `path`: Used only when the protocol is `uds`. Defaults to `/var/run/datadog/dsd.socket`.
* `stream`: Reference to a stream instance. Used only when the protocol is `stream`.
* `tcpGracefulErrorHandling`: Used only when the protocol is `tcp`. Boolean indicating whether to handle socket errors gracefully. Defaults to true.
* `tcpGracefulRestartRateLimit`: Used only when the protocol is `tcp`. Time (ms) between re-creating the socket. Defaults to `1000`.
* `udsGracefulErrorHandling`: Used only when the protocol is `uds`. Boolean indicating whether to handle socket errors gracefully. Defaults to true.
* `udsGracefulRestartRateLimit`: Used only when the protocol is `uds`. Time (ms) between re-creating the socket. Defaults to `1000`.
* `closingFlushInterval`: Before closing, StatsD will check for inflight messages. Time (ms) between each check. Defaults to `50`.
* `udsRetryOptions`: Used only when the protocol is `uds`. Retry/backoff options for UDS sends:
  * `retries`: Number of retry attempts for failed packet sends. Defaults to `3`.
  * `retryDelayMs`: Initial delay in milliseconds before retrying a failed packet send. Defaults to `100`.
  * `maxRetryDelayMs`: Maximum delay in milliseconds between retry attempts (caps exponential backoff). Defaults to `1000`.
  * `backoffFactor`: Exponential backoff multiplier for retry delays. Defaults to `2`.
* `udpSocketOptions`: Used only when the protocol is `udp`. Specify the options passed into dgram.createSocket(). The socket type (`udp4` or `udp6`) is auto-detected based on the host: IPv6 addresses (e.g., `::1`) use `udp6`, IPv4 addresses use `udp4`, and hostnames default to `udp4`. You can override auto-detection by explicitly setting `type` (e.g., `{ type: 'udp6' }`).
* `includeDatadogTelemetry`: Enable client-side telemetry to track metrics about the client itself. This helps diagnose high-throughput metric delivery issues. Telemetry metrics are prefixed with `datadog.dogstatsd.client.` and are not billed as custom metrics. `default: false`. See [Datadog Telemetry](#datadog-telemetry) for details.
* `telemetryFlushInterval`: When telemetry is enabled, how often (in ms) to send telemetry metrics. `default: 10000`
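
The socket type auto-detection described for `udpSocketOptions` can be approximated with Node's built-in `net.isIP`. This is a minimal sketch of the rule as documented above, not hot-shots' actual implementation; `guessSocketType` is a hypothetical helper name.

```javascript
const net = require('net');

// Hypothetical helper: mirrors the documented rule, not hot-shots internals.
function guessSocketType(host) {
  // net.isIP returns 6 for IPv6 literals, 4 for IPv4 literals, and 0 otherwise.
  return net.isIP(host) === 6 ? 'udp6' : 'udp4';
}

console.log(guessSocketType('::1'));                     // udp6
console.log(guessSocketType('127.0.0.1'));               // udp4
console.log(guessSocketType('statsd.example.internal')); // udp4 (hostname fallback)
```

If a hostname resolves only to an IPv6 address, this rule would still pick `udp4`, which is the case where setting `type` explicitly is useful.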

### StatsD methods
All StatsD methods other than `event`, `close`, and `check` have the same API:
* `name`:       Stat name `required`
* `value`:      Stat value `required except in increment/decrement where it defaults to 1/-1 respectively`
* `sampleRate`: Sends only a sample of data to StatsD `default: 1`
* `tags`:       The tags to add to metrics. Can be either an object `{ tag: "value"}` or an array of tags. `default: []`
* `callback`:   The callback to execute once the metric has been sent or buffered

Alternatively, you can pass an options object in place of `sampleRate` and `tags`:
* `options`:    An object with optional properties:
  * `sampleRate`: Sends only a sample of data to StatsD `default: 1`
  * `tags`:       The tags to add to metrics `default: []`
  * `timestamp`:  A timestamp to associate with the metric. Can be a `Date` object or Unix timestamp in seconds. (DogStatsD only, ignored for Telegraf)
* `callback`:   The callback to execute once the metric has been sent or buffered

If an array is specified as the `name` parameter each item in that array will be sent along with the specified value.
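
To illustrate what `sampleRate` means in practice, here is a rough sketch of client-side sampling for a counter. It assumes the standard StatsD `|@rate` suffix on the wire; `sampledCounterLine` and its injectable `random` argument are hypothetical and exist only to make the example deterministic, so this should not be read as hot-shots' internal code.

```javascript
// Hypothetical helper: returns the line to send, or null when the sample is skipped.
function sampledCounterLine(name, value, sampleRate, random = Math.random) {
  if (sampleRate < 1 && random() >= sampleRate) {
    return null; // skipped client-side; the server scales up what it does receive
  }
  // The rate is only written when sampling is actually in effect.
  const rateSuffix = sampleRate < 1 ? `|@${sampleRate}` : '';
  return `${name}:${value}|c${rateSuffix}`;
}

console.log(sampledCounterLine('my_counter', 1, 0.25, () => 0.1)); // my_counter:1|c|@0.25
console.log(sampledCounterLine('my_counter', 1, 0.25, () => 0.9)); // null
console.log(sampledCounterLine('my_counter', 1, 1));               // my_counter:1|c
```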

#### `close`
The close method has the following API:

* `callback`:   The callback to execute once close is complete.  All other calls to statsd will fail once this is called.

#### `event`
The event method has the following API:

* `title`:       Event title `required`
* `text`:        Event description `default is title`
* `options`:     Options for the event
  * `date_happened`    Assign a timestamp to the event `default is now`
  * `hostname`         Assign a hostname to the event.
  * `aggregation_key`  Assign an aggregation key to the event, to group it with some others.
  * `priority`         Can be `normal` or `low` `default: normal`
  * `source_type_name` Assign a source type to the event.
  * `alert_type`       Can be `error`, `warning`, `info` or `success` `default: info`
* `tags`:       The tags to add to metrics. Can be either an object `{ tag: "value"}` or an array of tags. `default: []`
* `callback`:   The callback to execute once the metric has been sent.

#### `check`
The check method has the following API:

* `name`:        Check name `required`
* `status`:      Check status `required`
* `options`:     Options for the check
  * `date_happened`    Assign a timestamp to the check `default is now`
  * `hostname`         Assign a hostname to the check.
  * `message`          Assign a message to the check.
* `tags`:       The tags to add to metrics. Can be either an object `{ tag: "value"}` or an array of tags. `default: []`
* `callback`:   The callback to execute once the metric has been sent.

```javascript
  var StatsD = require('hot-shots'),
      client = new StatsD({
          port: 8020,
          globalTags: { env: process.env.NODE_ENV },
          errorHandler: errorHandler,
      });

  // Increment: Increments a stat by a value (default is 1)
  client.increment('my_counter');

  // Decrement: Decrements a stat by a value (default is -1)
  client.decrement('my_counter');

  // Histogram: send data for histogram stat (DataDog and Telegraf only)
  client.histogram('my_histogram', 42);

  // Distribution: Tracks the statistical distribution of a set of values across your infrastructure.
  // (DataDog v6)
  client.distribution('my_distribution', 42);

  // Gauge: Gauge a stat by a specified amount
  client.gauge('my_gauge', 123.45);

  // Gauge: Gauge a stat by a specified amount, but change it rather than setting it
  client.gaugeDelta('my_gauge', -10);
  client.gaugeDelta('my_gauge', 4);

  // Set: Counts unique occurrences of a stat (alias of unique)
  client.set('my_unique', 'foobar');
  client.unique('my_unique', 'foobarbaz');

  // Event: sends the titled event (DataDog only)
  client.event('my_title', 'description');

  // Check: sends a service check (DataDog only)
  client.check('service.up', client.CHECKS.OK, { hostname: 'host-1' }, ['foo', 'bar'])

  // Incrementing multiple items
  client.increment(['these', 'are', 'different', 'stats']);

  // Incrementing with tags
  client.increment('my_counter', ['foo', 'bar']);

  // Incrementing with tags and a callback (value defaults to 1)
  client.increment('my_counter', { env: 'production' }, function(error, bytes) {
    console.log('Sent counter with tags');
  });

  // Sampling: this sends the stat only 25% of the time; the StatsD daemon will compensate for sampling
  client.increment('my_counter', 1, 0.25);

  // Tags, this will add user-defined tags to the data
  // (DataDog and Telegraf only)
  client.histogram('my_histogram', 42, ['foo', 'bar']);

  // Options object, allows combining sampleRate, tags, and timestamp
  // (DataDog only for timestamp)
  client.gauge('my_gauge', 42, { sampleRate: 0.25, tags: ['foo', 'bar'] });

  // Timestamp: send a metric with a specific timestamp (DataDog only)
  client.gauge('my_gauge', 42, { timestamp: new Date('2022-01-01') });
  client.increment('my_counter', 1, { timestamp: 1640995200 }); // Unix seconds

  // Using the callback.  This is the same format for the callback
  // with all non-close calls
  client.set(['foo', 'bar'], 42, function(error, bytes) {
    // This only gets called once after all messages have been sent
    if (error) {
      console.error('Oh noes! There was an error:', error);
    } else {
      console.log('Successfully sent', bytes, 'bytes');
    }
  });

  // Timing: sends a timing command with the specified milliseconds
  client.timing('response_time', 42);

  // Timing: also accepts a Date object of which the difference is calculated
  client.timing('response_time', new Date());

  // Timing: measuring elapsed time with Date.now()
  var startTime = Date.now();
  // ... your code here ...
  client.timing('response_time', Date.now() - startTime);

  // Timer: Returns a function that you call to record how long the first
  // parameter takes to execute (in milliseconds) and then sends that value
  // using 'client.timing'.
  // The parameters after the first one (in this case 'fn')
  // match those in 'client.timing'.
  var fn = function(a, b) { return a + b };
  client.timer(fn, 'fn_execution_time')(2, 2);

  // Async timer: Similar to timer above, but you instead pass in a function
  // that returns a Promise.  And then it returns a Promise that will record the timing.
  var fn = function () { return new Promise(function (resolve, reject) { setTimeout(resolve, 100); }); };
  var instrumented = client.asyncTimer(fn, 'fn_execution_time');
  instrumented().then(function() {
    console.log('Code run and metric sent');
  });

  // Async dist timer: Similar to asyncTimer above, but it instead emits a distribution.
  var fn = function () { return new Promise(function (resolve, reject) { setTimeout(resolve, 100); }); };
  var instrumented = client.asyncDistTimer(fn, 'fn_execution_time');
  instrumented().then(function() {
    console.log('Code run and metric sent');
  });

  // Async timer with dynamic tags: Add tags during function execution based on results
  // The ctx parameter is passed as the last argument to your function and is optional to use
  var fetchData = function (url, ctx) {
    return fetch(url).then(function(response) {
      ctx.addTags({ status: response.status, cached: 'false' });
      return response.json();
    });
  };
  var instrumentedFetch = client.asyncTimer(fetchData, 'api_call_time');
  instrumentedFetch('/api/data').then(function(data) {
    console.log('Data fetched with timing recorded');
  });

  // Timer without using dynamic tags (ctx parameter can be ignored)
  var simpleAdd = function (a, b) {
    return a + b;
  };
  var instrumentedAdd = client.timer(simpleAdd, 'add_time');
  instrumentedAdd(2, 3); // ctx is passed but simpleAdd doesn't use it

  // Sampling, tags and callback are optional and could be used in any combination (DataDog and Telegraf only)
  client.histogram('my_histogram', 42, 0.25); // 25% Sample Rate
  client.histogram('my_histogram', 42, { tag: 'value'}); // User-defined tag
  client.histogram('my_histogram', 42, ['tag:value']); // Tags as an array
  client.histogram('my_histogram', 42, next); // Callback
  client.histogram('my_histogram', 42, 0.25, ['tag']);
  client.histogram('my_histogram', 42, 0.25, next);
  client.histogram('my_histogram', 42, { tag: 'value'}, next);
  client.histogram('my_histogram', 42, 0.25, { tag: 'value'}, next);

  // Use a child client to add more context to the client.
  // Clients can be nested.
  var childClient = client.childClient({
    prefix: 'additionalPrefix.',
    suffix: '.additionalSuffix',
    globalTags: { globalTag1: 'forAllMetricsFromChildClient'}
  });
  childClient.increment('my_counter_with_more_tags');

  // Close statsd.  This will ensure all stats are sent and stop statsd
  // from doing anything more.
  client.close(function(err) {
    if (err) {
      console.log('The close did not work quite right: ', err);
    }
  });

  // UDS client with automatic retry on packet failures
  var client = new StatsD({
      protocol: 'uds',
      path: '/var/run/datadog/dsd.socket',
      udsRetryOptions: {
        // Retry options (all optional, showing defaults):
        // retries: 3,           // Number of retry attempts (set to 0 to disable)
        // retryDelayMs: 100,    // Initial delay in ms
        // maxRetryDelayMs: 1000,// Maximum delay cap in ms
        // backoffFactor: 2      // Exponential backoff multiplier
      }
  });
```
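
The `udsRetryOptions` values combine into a retry schedule. The sketch below assumes the delay before retry *n* (counting from 0) is `retryDelayMs * backoffFactor^n`, capped at `maxRetryDelayMs`. That formula is an interpretation of the option descriptions above, and `retryDelays` is a hypothetical helper rather than part of the hot-shots API.

```javascript
// Hypothetical helper: computes the delays (in ms) implied by the retry options.
function retryDelays({ retries = 3, retryDelayMs = 100, maxRetryDelayMs = 1000, backoffFactor = 2 } = {}) {
  const delays = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    // Grow the delay exponentially, but never beyond the configured cap.
    const delay = retryDelayMs * Math.pow(backoffFactor, attempt);
    delays.push(Math.min(delay, maxRetryDelayMs));
  }
  return delays;
}

console.log(retryDelays());               // [ 100, 200, 400 ]
console.log(retryDelays({ retries: 6 })); // [ 100, 200, 400, 800, 1000, 1000 ]
console.log(retryDelays({ retries: 0 })); // [] (retries disabled)
```

Under these assumptions the defaults add at most 700 ms of waiting before a packet is given up on.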

## DogStatsD, Telegraf, and OpenTelemetry functionality

Some of the functionality mentioned above is specific to certain backends and will not work with others.

* globalTags parameter - DogStatsD, Telegraf, or OpenTelemetry
* tags parameter - DogStatsD, Telegraf, or OpenTelemetry
* histogram method - DogStatsD, Telegraf, or OpenTelemetry
* telegraf parameter - Telegraf
* uds option in protocol parameter - DogStatsD
* distribution method - DogStatsD
* set / unique method - DogStatsD or Telegraf (not OpenTelemetry)
* event method - DogStatsD
* check method - DogStatsD
* timestamp option - DogStatsD
* includeDatadogTelemetry parameter - DogStatsD
* telemetryFlushInterval parameter - DogStatsD

## OpenTelemetry Collector Compatibility

hot-shots is compatible with the [OpenTelemetry Collector's StatsD receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/statsdreceiver). The following features work out of the box:

| Feature | hot-shots Method | OTel Support |
|---------|------------------|--------------|
| Counter | `increment()`, `decrement()` | Yes |
| Gauge | `gauge()` | Yes |
| Gauge delta (+/-) | `gaugeDelta()` | Yes |
| Timer | `timing()` | Yes (converted to gauge/summary/histogram) |
| Histogram | `histogram()` | Yes (treated as timer) |
| Sample rate | All methods | Yes |
| Tags | All methods | Yes |

Example configuration for OpenTelemetry Collector:

```javascript
var client = new StatsD({
  host: 'localhost',
  port: 8125,
  protocol: 'udp'  // or 'tcp'
});

// These all work with OpenTelemetry
client.increment('requests');
client.gauge('queue_size', 100);
client.gaugeDelta('connections', 1);
client.timing('response_time', 250);
client.histogram('request_size', 1024);
```

## Sanitization

To prevent malformed packets, hot-shots automatically replaces protocol-breaking characters with underscores (`_`).

* **Metric names**: `:`, `|`, `\n`
* **Tag keys**: `:`, `|`, `,`, `\n`, plus `@` and `#` for StatsD/DogStatsD
* **Tag values**: `|`, `,`, `\n`, plus `@` and `#` for StatsD/DogStatsD

Colons are allowed in tag values (e.g., `url:https://example.com:8080`).
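
The replacement rules can be pictured as three small regular expressions. The helpers below are hypothetical and follow the StatsD/DogStatsD lists above (including `@` and `#`); hot-shots' real implementation may differ in detail.

```javascript
// Hypothetical helpers: each listed character is replaced with an underscore.
const sanitizeMetricName = (name) => name.replace(/[:|\n]/g, '_');
const sanitizeTagKey = (key) => key.replace(/[:|,\n@#]/g, '_');
// Tag values keep ':' so that URLs and host:port pairs survive intact.
const sanitizeTagValue = (value) => value.replace(/[|,\n@#]/g, '_');

console.log(sanitizeMetricName('http:requests|total'));    // http_requests_total
console.log(sanitizeTagKey('user#id'));                    // user_id
console.log(sanitizeTagValue('https://example.com:8080')); // https://example.com:8080
console.log(sanitizeTagValue('a,b|c'));                    // a_b_c
```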

## Errors

As usual, callbacks will have an error as their first parameter.  You can have an error in both the message and close callbacks.

If the optional callback is not given, an error is thrown in some
cases and a console.log message is used in others.  An error is only
thrown explicitly when a callback is missing or when the error indicates a potential configuration issue that should be fixed.

If you would like to ensure all errors are caught, specify an `errorHandler` in your root
client. This will catch errors in socket setup, sending of messages,
and closing of the socket.  If you specify an errorHandler and a callback, the callback will take precedence.

```javascript
// Using errorHandler
var client = new StatsD({
  errorHandler: function (error) {
    console.log("Socket errors caught here: ", error);
  }
})
```

### Congestion error

If you get an error like `Error sending hot-shots message: Error: congestion` with an error code of `1`,
it is probably because you are sending large volumes of metrics to a single agent/server.
This error only arises when using the UDS protocol and means that packets are being dropped.
Take a look at the [Datadog docs](https://docs.datadoghq.com/developers/dogstatsd/high_throughput/?#over-uds-unix-domain-socket) for some tips on tuning your connection.

### Sending metrics during process shutdown

Metrics sent from `process.on('exit')` handlers will **not** be delivered. This is a fundamental Node.js limitation, not a bug in hot-shots. When the `exit` event fires, the event loop has stopped processing async operations, so socket send callbacks will never execute.

The same applies to `process.on('uncaughtExceptionMonitor')` since that handler is also synchronous.

**Alternatives that work:**

Use `beforeExit` for graceful shutdown (fires when event loop is empty but before exit):
```javascript
process.on('beforeExit', (code) => {
  client.increment('app.shutdown');
  client.close();
});
```

Use signal handlers for external shutdown requests:
```javascript
function gracefulShutdown(signal) {
  client.increment('app.shutdown', [`signal:${signal}`]);
  client.close(() => {
    process.exit(0);
  });
}

process.on('SIGTERM', () => gracefulShutdown('SIGTERM'));
process.on('SIGINT', () => gracefulShutdown('SIGINT'));
```

For uncaught exceptions, use `uncaughtException` (not `uncaughtExceptionMonitor`) and delay exit:
```javascript
process.on('uncaughtException', (err) => {
  client.increment('app.crash');
  client.close(() => {
    console.error('Uncaught exception:', err);
    process.exit(1);
  });
});
```

## Debugging

If you're having issues with metrics not being sent or want to understand what hot-shots is doing
in detail, you can enable debug logging using Node.js's built-in `NODE_DEBUG` environment variable:

```bash
NODE_DEBUG=hot-shots node your-app.js
```
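
`NODE_DEBUG` is the variable that Node's `util.debuglog` checks, so a logger created for a section stays silent unless that section name is listed. The snippet below shows the mechanism in isolation with a logger for the `hot-shots` section. It is an illustration only and assumes nothing about which messages hot-shots itself logs.

```javascript
const util = require('util');

// Logger for the 'hot-shots' section; a no-op unless NODE_DEBUG includes it.
const debug = util.debuglog('hot-shots');

// Written to stderr only when the process runs with NODE_DEBUG=hot-shots.
debug('sending %d bytes', 42);

// The enabled flag reports whether the section is active for this process.
console.log('debug enabled:', debug.enabled);
```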

## Unix domain socket support

The 'uds' protocol option supports [Unix Domain Sockets for Datadog](https://docs.datadoghq.com/developers/dogstatsd/unix_socket/).  It has the following limitations:
- It only works where 'node-gyp' works. If you don't know what this is, this
is probably fine for you. If you have had any trouble with libraries that
use 'node-gyp' before, you will have problems here as well.
- It does not work on Windows

The above will cause the underlying library that is used, unix-dgram,
to not install properly.  Given the library is listed as an
optionalDependency, and how it's used in the codebase, this install
failure will not cause any problems.  It only means that you can't use
the uds feature.
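
If an application wants to fall back to UDP when `unix-dgram` did not install, one option is to probe for the module before choosing a protocol. This is only a sketch of that idea, not something hot-shots requires or does for you; `udsAvailable` is a hypothetical helper.

```javascript
// Hypothetical helper: reports whether the optional unix-dgram module can be loaded.
function udsAvailable() {
  try {
    require('unix-dgram');
    return true;
  } catch (err) {
    // The module is missing or its native build failed.
    return false;
  }
}

// Pick the protocol value to pass to the hot-shots constructor.
const protocol = udsAvailable() ? 'uds' : 'udp';
console.log('Using protocol:', protocol);
```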

## Datadog Telemetry

When `includeDatadogTelemetry` is enabled, the client automatically sends telemetry metrics about itself to help diagnose metric delivery issues in high-throughput scenarios. This feature should match the behavior of official Datadog clients as described in [the docs](https://docs.datadoghq.com/developers/dogstatsd/high_throughput/?tab=go#client-side-telemetry).

Telemetry is automatically disabled when using `mock: true`, `telegraf: true`, or in child clients.

### Telemetry Metrics

The following metrics are sent every `telemetryFlushInterval` milliseconds (default: 10 seconds):

| Metric | Description |
|--------|-------------|
| `datadog.dogstatsd.client.metrics` | Total number of metrics sent |
| `datadog.dogstatsd.client.metrics_by_type` | Metrics broken down by type (gauge, count, set, timing, histogram, distribution) |
| `datadog.dogstatsd.client.events` | Total number of events sent |
| `datadog.dogstatsd.client.service_checks` | Total number of service checks sent |
| `datadog.dogstatsd.client.bytes_sent` | Total bytes successfully sent |
| `datadog.dogstatsd.client.bytes_dropped` | Total bytes dropped |
| `datadog.dogstatsd.client.packets_sent` | Total packets successfully sent |
| `datadog.dogstatsd.client.packets_dropped` | Total packets dropped |

The `metric_dropped_on_receive` metric from the official Datadog clients is intentionally omitted. That metric tracks drops on an internal receive channel, which doesn't apply to hot-shots' architecture. `bytes_dropped_queue` is also omitted because it does not fit how hot-shots works.

### Telemetry Tags

All telemetry metrics include these tags:
* `client:nodejs` - Identifies the hot-shots client
* `client_version:<version>` - The hot-shots version
* `client_transport:<protocol>` - The transport protocol (udp, tcp, uds, stream)

### Example

```javascript
var client = new StatsD({
  host: 'localhost',
  includeDatadogTelemetry: true,
  telemetryFlushInterval: 10000  // Optional, default is 10 seconds
});
```

## Submitting changes

Thanks for considering making any updates to this project! This project is entirely community-driven, and so your changes are important. Here are the steps to take in your fork:

1. Run "npm install"
2. Add your changes in your fork as well as any new tests needed
3. Run "npm test"
4. Update README.md with any needed documentation
5. If you have made any API changes, update types.d.ts (note: timer/asyncTimer/asyncDistTimer type signatures require TypeScript 4.0+ for variadic tuple support)
6. Push your changes and create the PR

When you've done all this we're happy to try to get this merged in right away.

## Package versioning and security

Versions will attempt to follow semantic versioning, with major changes only coming in major versions.

npm publishing is possible by one person, [bdeitte](https://github.com/bdeitte), who has two-factor authentication enabled for publishes.  Publishes only contain one additional library, [unix-dgram](https://github.com/bnoordhuis/node-unix-dgram).

## License

hot-shots is licensed under the MIT license.
