# Testing strategy

A concrete, executable plan for going from **0% coverage** today to **a useful test suite running on every push** — without adding heavy dependencies or breaking the existing release pipeline.

This document is meant to be **acted on** in phases. Each phase is a single PR. The aim is to make the second-most-important thing in this repo (releasing correctly) cheaper and safer.

---

## Goals

- `npm test` runs all tests locally with one command, no setup beyond `npm install`.
- Tests run automatically in CI on every push and pull request to `master`.
- Coverage report uploads as a CI artefact so it can be inspected from any failing run.
- Coverage **thresholds** enforced by CI — a regression below the threshold fails the build.
- A clear path from the first PR (small, targeted) to comprehensive coverage of the bug-prone parts of the codebase.

## Non-goals

- 100% coverage — not a useful target. Aim at the bug-prone surface.
- Integration tests against real Shelly hardware — too operationally heavy for CI. Mock the transport.
- A Node-RED end-to-end runner — Node-RED's own test helper is heavy and pulls in the full runtime. We can get most of the value with simpler unit tests against mocked `RED`.
- Bringing in jest / vitest / mocha — the built-in `node:test` runner is enough and adds zero dependencies.

---

## Tooling choice

| Concern | Tool | Why |
|---|---|---|
| Test runner | `node:test` (built-in since Node 20) | Zero dependencies. We just bumped engines to `>=20`, so it's available. Real `describe` / `it` style. Watch mode, parallelism, filters, all built in. |
| Assertions | `node:assert/strict` (built-in) | Same — no deps. Strict-mode by default is the right default. |
| Coverage | `c8` (one dev dep) | Wraps V8's native coverage. The only mainstream coverage tool that pairs cleanly with `node:test`. Tiny dependency footprint. |
| HTTP mocking | `nock` (one dev dep) | The standard for axios mocking. Lets us simulate Shelly device responses (including the digest 401-retry dance) without hitting hardware. |
| CI runner | GitHub Actions | Already in use. |

**Total new dependencies:** 2 dev deps (`c8`, `nock`). Both are mature, widely used, and small.

---

## Directory layout

```
test/
├── unit/
│   ├── utils.test.js                  Pure helpers from shelly/lib/utils.js
│   ├── configuration.test.js          Catalog lookups
│   ├── parsers/
│   │   ├── gen1-relay.test.js
│   │   ├── gen1-dimmer.test.js
│   │   ├── gen1-rgbw.test.js
│   │   ├── gen1-roller.test.js
│   │   ├── gen1-thermostat.test.js    (would have caught the scheduleProfile bug)
│   │   ├── gen1-sensor.test.js
│   │   ├── gen1-button.test.js
│   │   ├── gen1-measure.test.js
│   │   └── gen2-generic.test.js
│   ├── status-converters.test.js      convertStatus1, convertStatus2
│   └── transport.test.js              shellyRequestAsync + digest retry (nock)
└── helpers/
    ├── fake-red.js                    Minimal Node-RED RED-object stub
    └── mock-shelly.js                 nock interceptors for typical device responses
```

The split into `unit/` leaves room for `integration/` later (when / if a Shelly-device simulator emerges).

---

## What to test in what order

Each phase is one PR. Each phase produces a working `npm test` that's more capable than the last.

### Phase 1 — Tooling + low-hanging fruit (reachable: ~10% coverage)

**Wire up the runner, prove it works, cover the pure helpers.**

- Add `c8` to devDependencies.
- Add `npm test` and `npm run coverage` scripts.
- Write tests for [`shelly/lib/utils.js`](../../shelly/lib/utils.js): `isMsgPayloadValid`, `isMsgPayloadValidOrArray`, `isEmpty`, `trim`, `replace`. All five are pure and trivially testable.
- Wire CI to run `npm test` on push / PR.
- Upload coverage report as a CI artefact.

**Why first:** lowest risk, highest learning value. Establishes the patterns and CI integration before any harder testing.

### Phase 2 — Catalog and status converters (reachable: ~25%)

**Test the data layer.**

- Tests for [`shelly/lib/configuration.js`](../../shelly/lib/configuration.js): `getDevice`, `getDeviceType`, `getDeviceTypes1/2`, `isExactTypeGen1/2`, `getDeviceTypeInfos`.
- Tests for the status converters (`convertStatus1`, `convertStatus2`). They're pure functions but live inside the node files — see "extraction" note below.

**Why second:** still pure functions, no I/O. Validates `config.json` shape implicitly — if a future edit breaks the catalog, tests will catch it.

### Phase 3 — Input parsers (reachable: ~50%)

**The bug-prone surface.** This is where the value-per-test ratio is highest.

- One test file per device family, asserting on the produced URL or RPC envelope.
- Cover at least:
  - Default path (no command parameters → no route returned).
  - Each documented input field exercised.
  - **Boundary checks** — the scheduleProfile bug was an out-of-range value silently passing through. Every range guard should have a test below, at, and above the bounds.
  - The `on:true/false` ↔ `turn:'on'/'off'` translation.

**Why third:** these functions are pure but live inside the device-node IIFE. They need to be reachable for testing, which couples this phase to a small extraction (see "extraction" below).

### Phase 4 — Transport (reachable: ~65%)

**Test the auth and retry logic with nock.**

- Mock a gen 1 device: GET `/relay/0?turn=on` → 200 with status JSON. Assert returned payload.
- Mock a gen 1 device with Basic auth: 401 → after retry with auth header → 200. Assert auth header value.
- Mock a gen 2 device with Digest auth: 401 with `www-authenticate` → assert the second request carries a correct `Digest` header.
- Mock a gen 2 device returning a 400 with `{"error":{"code":-103,...}}` → assert the thrown error message includes the body (the 11.10.1 improvement).

**Why fourth:** needs `nock`. Slightly more setup than pure unit tests. Catches the entire transport layer including the auth dance — which has had no tests despite being the most security-sensitive part of the codebase.

### Phase 5 — Lifecycle (reachable: ~70-75%)

**Mock the Node-RED `RED` object and exercise the constructor.**

- Use `test/helpers/fake-red.js` to provide a minimal `RED.nodes.createNode`, `RED.nodes.registerType`, `RED.nodes.getNode`, `RED.httpAdmin`, `RED.log`.
- Assert that `new ShellyGen1Node({ hostname: '...', devicetype: 'Relay', mode: 'polling' })` sets up the expected `axiosInstance`, `pollingTimer`, and event handlers.
- Test the `close` handler — assert `node.closing === true`, timers cleared, listeners removed.

**Why fifth:** needs more scaffolding than the others. By this point we know how the codebase shapes up under tests, so the harness is informed.

The remaining ~25% (callback-mode network code paths) is genuinely hard to unit-test without a device simulator. Leave it.

---

## A note on extraction

The per-family input parsers and status converters in [`gen1-node.js`](../../shelly/nodes/gen1-node.js) and [`gen2-node.js`](../../shelly/nodes/gen2-node.js) are **nested inside** the `module.exports = function(RED) { ... }` factory. They can't be imported directly:

```js
// This doesn't work today:
const { inputParserRelay1Async } = require('../../shelly/nodes/gen1-node');
//                                              ↑ exports a factory, not the parsers
```

Two options:

**Option A — extract to standalone files** (preferred, aligns with [R1 in refactoring recommendations](../architecture/06-recommendations-for-refactoring.md#r1-extract-the-per-device-type-input-parsers-into-one-file-each-m)):

```
shelly/nodes/gen1/parsers/
├── relay.js          module.exports = async function inputParserRelay1Async(msg) { ... }
├── dimmer.js
├── ...
```

These don't need the `RED` factory because they're pure. They get imported by `gen1-node.js` _and_ by tests.

**Option B — export-for-testing hook** (faster, uglier):

```js
// At the bottom of gen1-node.js, before the factory returns:
module.exports.__test = { inputParserRelay1Async, inputParserDimmer1Async, ... };
```

Tests reach in via `require('../../shelly/nodes/gen1-node').__test.inputParserRelay1Async`. Convention-marked as a testing escape hatch.

**Recommendation:** Option A. The refactor is small, lasting, and makes the rest of the codebase easier to read. Option B is technical debt deliberately incurred to avoid a refactor that pays for itself.

This proposal **assumes Option A** for Phases 3-5. Phase 1 and 2 (utils, config, status converters) don't need any extraction — they're already in importable files.

---

## Concrete first-PR shape (Phase 1)

### `package.json` changes

```jsonc
{
  "scripts": {
    "prepare": "husky install",
    "lint": "eslint shelly --ext .js --format unix --ignore-pattern scripts",
    "postlint": "echo ✅ lint valid",
    "test": "node --test test/unit",
    "test:watch": "node --test --watch test/unit",
    "coverage": "c8 --reporter=text --reporter=html --reporter=lcov node --test test/unit",
    "coverage:check": "c8 --check-coverage --lines 10 --functions 10 --branches 8 node --test test/unit"
  },
  "devDependencies": {
    "c8": "^10.1.3",
    "eslint": "^8.20.0",
    "eslint-config-prettier": "^8.5.0",
    "eslint-plugin-prettier": "^4.2.1",
    "husky": "^8.0.1",
    "prettier": "^2.7.1"
  }
}
```

Thresholds start **deliberately low** in Phase 1 (10% lines / 10% functions / 8% branches). Each phase ratchets them upward.

### `test/unit/utils.test.js` (example, fully working)

```js
const { describe, it } = require('node:test');
const assert = require('node:assert/strict');
const utils = require('../../shelly/lib/utils.js');

describe('utils.isMsgPayloadValid', () => {
    it('rejects undefined msg', () => {
        assert.equal(utils.isMsgPayloadValid(undefined), false);
    });

    it('rejects msg without payload', () => {
        assert.equal(utils.isMsgPayloadValid({}), false);
    });

    it('rejects array as msg', () => {
        assert.equal(utils.isMsgPayloadValid([{ payload: { a: 1 } }]), false);
    });

    it('rejects empty payload object', () => {
        assert.equal(utils.isMsgPayloadValid({ payload: {} }), false);
    });

    it('rejects array payload', () => {
        assert.equal(utils.isMsgPayloadValid({ payload: [1, 2] }), false);
    });

    it('accepts non-empty object payload', () => {
        assert.equal(utils.isMsgPayloadValid({ payload: { relay: 0, on: true } }), true);
    });
});

describe('utils.trim', () => {
    it('returns undefined for undefined input', () => {
        assert.equal(utils.trim(undefined), undefined);
    });

    it('returns undefined for empty string', () => {
        assert.equal(utils.trim(''), undefined);
    });

    it('strips leading and trailing whitespace', () => {
        assert.equal(utils.trim('  hello  '), 'hello');
    });
});

describe('utils.replace', () => {
    it('returns undefined for falsy input', () => {
        assert.equal(utils.replace(undefined, /x/g, 'y'), undefined);
        assert.equal(utils.replace('', /x/g, 'y'), undefined);
    });

    it('does global regex replace', () => {
        assert.equal(utils.replace('a"b"c', /"/g, ''), 'abc');
    });

    it('does string-literal replace', () => {
        assert.equal(utils.replace('hello %URL% world', '%URL%', 'http://x'), 'hello http://x world');
    });
});
```

That's roughly 50 lines of tests covering ~80% of `utils.js`. Phase 1 is mostly this kind of work.

### CI workflow changes

Add a `test` step to [`.github/workflows/node.js.yml`](../../.github/workflows/node.js.yml):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [20.x, 22.x]
    steps:
    - uses: actions/checkout@v4
    - name: Use Node.js ${{ matrix.node-version }}
      uses: actions/setup-node@v4
      with:
        node-version: ${{ matrix.node-version }}
    - run: npm ci
    - run: npm run lint
    - run: npm test                       # ← new
    - run: npm run coverage:check         # ← new (gates on the threshold)
    - name: Upload coverage report        # ← new
      if: matrix.node-version == '22.x'
      uses: actions/upload-artifact@v4
      with:
        name: coverage-report
        path: coverage/
        retention-days: 14
```

`coverage:check` failing breaks the build. The HTML coverage report is uploaded as a CI artefact — downloadable from any run, useful for inspecting what's covered without running locally.

### Threshold ratchet plan

Each phase raises the thresholds in `coverage:check`:

Actual progression once executed (matches the floors set in `package.json`'s `c8` block at each phase):

| Phase | Lines | Functions | Branches | What was added | Tests |
|---:|---:|---:|---:|---|---:|
| 1 | 5 | 10 | 5 | `lib/utils.js` + `lib/configuration.js`. | 48 |
| 2 | 10 | 50 | 80 | + status converters extracted to `shelly/nodes/gen{1,2}/`. | 73 |
| 3 | 28 | 73 | 87 | + 7 per-family gen1 parsers extracted to `shelly/nodes/gen1/parsers/`, gen2 generic parser to `shelly/nodes/gen2/parsers/`. The 1342-line `gen1-node.js` shrinks to 696 LOC. | 153 |
| 4 | 35 | 70 | 88 | + transport tests (`shellyRequestAsync` with nock, digest 401-retry, error-body enrichment) and `getCredentials` / `getShellyInfo`. Functions floor dipped because `shelly.js` (12 functions) entered the tested-files denominator with only 7 covered. | 175 |
| 5 | 40 | 75 | 88 | + lifecycle: `shellyPing`, `tryCheckDeviceType`, `start` against nock + a fake-node harness. `shelly.js` went from 49% → 82% lines. | 193 |

The remaining uncovered ~58% lives mostly in the four files that need a fake Node-RED `RED` object to exercise (`99-shelly.js`, `cloud-node.js`, `gen1-node.js`, `gen2-node.js` — all at 0%). Their core logic — the input parsers, status converters, transport, polling lifecycle — is already tested in isolation. What remains uncovered in those files is mostly wiring (constructor field assignment, `RED.nodes.createNode`, event-handler registration, the EM-data download side path in `inputParserMeasure1Async`). Adding fake-RED tests for the constructors would lift line coverage roughly to 55-60% but with much lower value-per-test than Phases 1-5; left as future work.

Note: line / function / branch percentages don't move in lockstep. Branch and function percentages spike early because the small tested files are well-branched; line percentage moves slowly until the large `gen1-node.js` and `gen2-node.js` get split (Phase 3); and the function metric can dip when a new larger file enters the tested set (Phase 4) before its functions get covered (Phase 5).

Each phase's PR bumps the previous floor. CI then prevents regressions back below it.

---

## Optional: Codecov integration

If you want a coverage badge on the README and per-PR coverage delta comments:

```yaml
- name: Upload coverage to Codecov
  if: matrix.node-version == '22.x'
  uses: codecov/codecov-action@v4
  with:
    files: ./coverage/lcov.info
    fail_ci_if_error: false
```

Requires signing up at [codecov.io](https://about.codecov.io/) and adding a `CODECOV_TOKEN` repo secret. Optional and additive — the local thresholds in `coverage:check` are the real gate.

---

## Why this beats the status quo

- **The 11.9.x series shipped four real logic bugs** to a published release: the missing-await in `start()`, the `urls[i]→urls[j]` index swap, the missing-await in RGBW init, and the always-true `scheduleProfile` range guard. Three of the four are pure-function bugs that would have been caught by a 5-line test in Phases 1-3.
- **Husky is now wired** (since 11.9.4) but only runs `npm run lint`. Once `npm test` exists, adding it to the pre-commit hook is a one-line change — every local commit then runs the suite.
- The maintainer's current QA strategy ("I run it against my home fleet") doesn't scale and isn't replicable by contributors. Tests give every PR the same baseline check.

---

## What I'd suggest doing first

Phase 1 only — it's small, low-risk, and you can see exactly what the developer ergonomics feel like before committing to the broader plan. Once you've used `npm test` locally a few times you'll know whether the runner choice fits your style. Subsequent phases follow the same pattern, so the first one is the highest-information one.

If you want, I can implement Phase 1 as a single PR-shaped commit:

- Add `c8` to devDeps and the four new npm scripts.
- Write the `test/unit/utils.test.js` shown above plus one more file for `configuration.js` to prove the catalog-lookup pattern.
- Update the CI workflow with the test + coverage-check + artefact-upload steps.
- Wire `npm test` into the husky pre-commit hook.
- Land at ~12-15% coverage with thresholds set at 10%.

Say the word and I'll send it.