# Architecture

This document explains the **big picture** of Universal Emoji Parser so a new contributor (human or agent) can be productive quickly. For day-to-day commands see [Development Commands](DEVELOPMENT_COMMANDS.md). For language-specific rules see [Standards](STANDARDS.md).

## High-level model

```
                         ┌─────────────────────────────────┐
                         │  src/index.ts (public API)      │
                         │  ───────────────────────────    │
                         │  uEmojiParser.parse(text, opts) │
                         │  uEmojiParser.parseToHtml       │
                         │  uEmojiParser.parseToUnicode    │
                         │  uEmojiParser.parseToShortcode  │
                         │  emojiLibJsonData               │
                         │  DEFAULT_EMOJI_CDN              │
                         └────────┬───────────────┬────────┘
                                  │               │
                ┌─────────────────┘               └─────────────────┐
                ▼                                                   ▼
   ┌──────────────────────────┐                       ┌─────────────────────────┐
   │  src/lib/emoji-lib.json  │                       │  @twemoji/parser        │
   │  (1906 entries)          │                       │  (only runtime dep)     │
   │  shortcode → EmojiType   │                       │  finds emoji entities   │
   │  unicode  → EmojiType    │                       │  → CDN URLs             │
   └──────────┬───────────────┘                       └─────────────────────────┘
              │
              │ generated offline by
              ▼
   ┌──────────────────────────────────────────┐
   │  test/prepareEmojiLibJson.test.ts        │
   │  (it.skip — opt-in regeneration)         │
   │  emojilib  + unicode-emoji-json          │
   │  + EMOJIS_SPECIAL_CASES overrides        │
   │  → src/lib/emoji-lib-output.json         │
   │  → (review + copy to emoji-lib.json)     │
   └──────────────────────────────────────────┘
```

The runtime is **two files**: `src/index.ts` (~135 lines) and `src/lib/emoji-lib.json` (data). Everything else is type definitions, tests, or build/release infrastructure.

## Project structure

```
universal-emoji-parser/
├── AGENTS.md                          # Single source of truth for AI agents
├── CLAUDE.md → AGENTS.md              # Symlink (do not edit directly)
├── README.md                          # Human-facing intro and usage docs
├── LICENSE                            # MIT
├── package.json                       # Scripts, deps, version, engines.node ≥ 20.19
├── tsconfig.json                      # Strict TS config; tests + src; emits .d.ts via `build:tsc`
├── tsconfig.build.json                # `tsc`/ts-loader: compile `src/` only (`rootDir`)
├── webpack.config.js                  # commonjs2 output, ts-loader → `tsconfig.build.json`
├── eslint.config.mjs                  # ESLint flat config + Prettier integration
├── .prettierrc                        # semi:false, singleQuote:true, trailingComma:'es5'
├── .editorconfig                      # 2-space indent, LF, max 120 cols
├── .ncurc.json                        # npm-check-updates (optional `reject` list)
├── .babelrc                           # babel-preset-env + transform-runtime (legacy, kept for compat)
├── .npmignore                         # Trims source/test/config from npm tarball
│
├── src/
│   ├── index.ts                       # The public API — see "src/index.ts" below
│   └── lib/
│       ├── type.ts                    # EmojiType, EmojiParseOptionsType, UEmojiParserType
│       ├── emoji-lib.json             # The catalog (committed; ~543 KB; 1906 entries)
│       └── emoji-lib-output.json      # Last regeneration output (git-ignored)
│
├── test/
│   ├── main.test.ts                   # Integration tests for the public methods
│   ├── emojiLibJson.test.ts           # Validates catalog metadata + count
│   └── prepareEmojiLibJson.test.ts    # `it.skip`-guarded regenerator
│
├── dist/                              # Webpack output (git-ignored, npm-published)
│   ├── index.js
│   ├── index.d.ts
│   └── *.map
│
├── docker/local/                      # Dev container Docker Compose + Dockerfile
├── .devcontainer/                     # VS Code Dev Container config (uses docker/local/)
│
├── .github/
│   ├── workflows/
│   │   ├── code_check.yml                       # PR: lint + format + test
│   │   ├── pull_request_check.yml               # PR: title/body length + size labels
│   │   ├── release_and_publish.yml              # PR merge → bump version, build, publish
│   │   ├── check_packages_versions.yml          # Weekly: open deps PR via ncu
│   │   ├── check_and_merge_packages_upgrades_pr.yml  # Auto-merge that PR if green
│   │   ├── check_branches_state.yml             # Stale branch report
│   │   └── cleanup_caches.yml                   # GHA cache GC
│   └── scripts/
│       ├── get_github_release_log.sh            # Build release notes from git log
│       └── get_packages_upgrades.sh             # Format ncu output for the PR body
│
├── .agents/                           # AI agent skills, commands, subagents
│   ├── README.md
│   ├── skills/
│   ├── commands/
│   └── agents/
├── .claude/ → .agents                 # Symlink (Claude Code looks here natively)
│
├── docs/                              # This documentation
└── tmp/                               # Git-ignored scratch space
```

## `src/index.ts` walkthrough

### Imports

```ts
import { EmojiLibJsonType, EmojiParseOptionsType, EmojiType, TwemojiEntity, UEmojiParserType } from './lib/type'
import emojiLibJson from './lib/emoji-lib.json'
import { parse } from '@twemoji/parser'
```

`emoji-lib.json` is imported as a typed JSON module (`resolveJsonModule: true` in `tsconfig.json`) and assigned to `emojiLibJsonData`, a constant annotated as `EmojiLibJsonType`. There is **no** runtime construction of the catalog; it is a plain `.json` import.

### Constants

```ts
export const DEFAULT_EMOJI_CDN: string = 'https://cdn.jsdelivr.net/gh/jdecked/twemoji@latest/assets/svg/'
export const emojiLibJsonData: EmojiLibJsonType = emojiLibJson
```

`DEFAULT_EMOJI_CDN` is the URL prefix Twemoji's `parse()` produces. Custom CDNs work by string-replacing this prefix in `__parseEmojiToHtml`.

### The `uEmojiParser` object

Seven methods, each described below. `getEmojiObjectByShortcode` and `getDefaultOptions` are public (typed in `UEmojiParserType`) but rarely used directly.

#### `getEmojiObjectByShortcode(shortcode)`

Two-tier lookup:

1. Strip `:` from the shortcode
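2. **Direct hit** on `emojiLibJsonData[shortcode]`. Since the catalog is keyed by unicode literal (see "Type model" below), this tier only hits when the input is itself a catalog key; canonical slugs such as `smiling_face_with_sunglasses` are resolved by the next tier, because the regenerator guarantees every slug appears in its own `keywords`
3. **Keyword scan**: `Object.keys(...).find(k => emojiLibJsonData[k].keywords.includes(shortcode))`. This covers canonical slugs as well as dialects like `:thumbsup:` (Slack/legacy) that aren't the slug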

This is what makes Slack-style aliases coexist with the canonical slugs in a single catalog.
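A minimal sketch of the two-tier lookup, assuming a tiny hand-built catalog keyed by unicode literal. `sampleCatalog` and `lookupByShortcode` are hypothetical names, not the package's API. The direct hit uses an own-property check so inherited names such as `constructor` are never mistaken for catalog entries; that guard is a choice of this sketch.

```typescript
interface EmojiType {
  slug: string
  char: string
  keywords: Array<string>
}

type EmojiLibJsonType = { [key: string]: EmojiType }

// Hypothetical mini catalog, keyed by unicode literal like emoji-lib.json.
const sampleCatalog: EmojiLibJsonType = {
  '😎': { slug: 'smiling_face_with_sunglasses', char: '😎', keywords: ['smiling_face_with_sunglasses', 'cool'] },
  '👍': { slug: 'thumbs_up', char: '👍', keywords: ['thumbs_up', 'thumbsup', '+1'] },
}

function lookupByShortcode(catalog: EmojiLibJsonType, shortcode: string): EmojiType | undefined {
  // Step 1: ":thumbsup:" -> "thumbsup"
  const code = shortcode.replace(/:/g, '')

  // Tier 1: direct key hit (own properties only, so "constructor" etc. never match)
  if (Object.prototype.hasOwnProperty.call(catalog, code)) return catalog[code]

  // Tier 2: linear scan for the first entry whose keywords contain the code
  const key = Object.keys(catalog).find((k) => catalog[k].keywords.includes(code))
  return key === undefined ? undefined : catalog[key]
}

console.log(lookupByShortcode(sampleCatalog, ':thumbsup:')?.char) // 👍
```

The keyword scan is linear in catalog size per lookup, which is the cost of keying the catalog by unicode rather than by slug.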

#### `getDefaultOptions(options)`

Merges user options with defaults. Subtle detail: it uses `Object.getOwnPropertyDescriptor(options, 'emojiCDN')` to distinguish "explicitly undefined" from "missing". For booleans (`parseToHtml`, `parseToUnicode`, `parseToShortcode`) it just calls `Boolean(...)` because `undefined → false` is the right default for those.

Defaults: `parseToHtml: true`, `parseToUnicode: false`, `parseToShortcode: false`, `emojiCDN: undefined`.
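The "explicitly undefined" versus "missing" distinction can be shown in isolation. `hasExplicitEmojiCDN` is a hypothetical helper, not part of the package; it only demonstrates what `Object.getOwnPropertyDescriptor` reports in each case, since a plain `options.emojiCDN === undefined` test cannot tell the two apart.

```typescript
interface EmojiParseOptionsType {
  emojiCDN?: string
  parseToHtml?: boolean
  parseToUnicode?: boolean
  parseToShortcode?: boolean
}

// True when the caller wrote an own `emojiCDN` key, even if its value is undefined.
function hasExplicitEmojiCDN(options: EmojiParseOptionsType): boolean {
  return Object.getOwnPropertyDescriptor(options, 'emojiCDN') !== undefined
}

const explicit: EmojiParseOptionsType = { emojiCDN: undefined }
const missing: EmojiParseOptionsType = {}

// Both read as undefined...
console.log(explicit.emojiCDN === undefined, missing.emojiCDN === undefined) // true true
// ...but only one has an own property descriptor.
console.log(hasExplicitEmojiCDN(explicit), hasExplicitEmojiCDN(missing)) // true false
```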

#### `__parseEmojiToHtml(text, emojiCDN)`

Internal (note the `__` prefix, though it's exported — it's a JS-style "please don't call this" marker, not a hard private):

1. Run `@twemoji/parser`'s `parse(text)` to get `Array<TwemojiEntity>` (each has `text`, `url`, `indices`, `type`)
2. Track `entitiesFound` to avoid replacing the same emoji twice
3. For each entity: rewrite the URL prefix if `emojiCDN` is set, then `text.replace(new RegExp(entity.text, 'g'), <img...>)` to swap all occurrences

Output: `<img class="emoji" alt="<unicode>" src="<url>"/>` — see [API Reference → HTML output contract](API_REFERENCE.md).
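A self-contained sketch of the rendering step, assuming the entities have already been produced by `@twemoji/parser` (here they are built by hand, and `renderEntities` is a hypothetical name). It swaps in a different technique from the per-entity global `RegExp` replace described above: it splices by each entity's `indices`, right to left, so no pattern can re-match inside an `alt` attribute that was already inserted and no regex escaping is needed. The CDN swap mirrors the `DEFAULT_EMOJI_CDN` prefix replacement.

```typescript
interface TwemojiEntity {
  url: string
  indices: Array<number>
  text: string
  type: string
}

const DEFAULT_EMOJI_CDN = 'https://cdn.jsdelivr.net/gh/jdecked/twemoji@latest/assets/svg/'

// Hypothetical renderer: entities are assumed to come from @twemoji/parser's parse(text).
function renderEntities(text: string, entities: Array<TwemojiEntity>, emojiCDN?: string): string {
  // Splice from the end of the string backwards so earlier indices stay valid.
  const ordered = entities.slice().sort((a, b) => b.indices[0] - a.indices[0])
  let out = text
  for (const entity of ordered) {
    const [start, end] = entity.indices
    // Swap the default CDN prefix for a custom one, if given.
    const src =
      emojiCDN && entity.url.startsWith(DEFAULT_EMOJI_CDN)
        ? emojiCDN + entity.url.slice(DEFAULT_EMOJI_CDN.length)
        : entity.url
    out = out.slice(0, start) + `<img class="emoji" alt="${entity.text}" src="${src}"/>` + out.slice(end)
  }
  return out
}

// Hand-built entity for "hi 😎!": the emoji occupies UTF-16 offsets 3..5 (type value is illustrative).
const sampleEntities: Array<TwemojiEntity> = [
  { text: '😎', url: DEFAULT_EMOJI_CDN + '1f60e.svg', indices: [3, 5], type: 'emoji' },
]
console.log(renderEntities('hi 😎!', sampleEntities, 'https://example.com/e/'))
// hi <img class="emoji" alt="😎" src="https://example.com/e/1f60e.svg"/>!
```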

#### `parseToHtml(text, emojiCDN?)`

Convenience wrapper: runs `parseToUnicode` first (so a shortcode such as `:smile:` is resolved to its unicode character), then hands off to `__parseEmojiToHtml`. Unicode resolution **always** runs first because Twemoji only recognizes unicode characters, not shortcodes.

#### `parseToUnicode(text)`

Match `/:(\w+):/g` to find shortcodes, look each one up via `getEmojiObjectByShortcode`, replace with `emoji.char`. Linear scan over matches; one regex per shortcode found.
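The shortcode-to-unicode pass reduces to one `replace` with a callback. This sketch is a single-pass variant rather than the per-match regex described above; `toUnicode` and the `Map` standing in for the catalog lookup are hypothetical, and the sketch leaves unknown shortcodes untouched.

```typescript
// Hypothetical stand-in for getEmojiObjectByShortcode(code)?.char
const shortcodeToChar = new Map<string, string>([
  ['cool', '😎'],
  ['thumbsup', '👍'],
])

function toUnicode(text: string, lookup: (code: string) => string | undefined): string {
  // `\w+` only matches [A-Za-z0-9_], so a shortcode like `:+1:` is never captured here.
  return text.replace(/:(\w+):/g, (match: string, code: string) => lookup(code) ?? match)
}

console.log(toUnicode('so :cool: :nope:', (code) => shortcodeToChar.get(code))) // so 😎 :nope:
```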

#### `parseToShortcode(text)`

Builds a single alternation regex from `Object.keys(emojiLibJsonData).join('|')`, escapes the `*️⃣` keycap (its leading `*` is a regex quantifier), then uses `text.matchAll` to find every emoji and replace it with `:slug:`. The escape is load-bearing: unescaped, the `*` would open an alternative with nothing to repeat, so `new RegExp(...)` throws a `SyntaxError` instead of compiling.
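A sketch of the alternation approach, assuming a tiny hypothetical catalog with illustrative slugs (`toShortcode` and `escapeRegExp` are also illustrative names). It differs from the description in two deliberate ways: it escapes every key generically rather than special-casing `*️⃣`, and it sorts keys longest-first, because JavaScript alternation takes the first alternative that matches, so a base emoji listed before its skin-tone variant would otherwise win.

```typescript
// Hypothetical mini catalog keyed by unicode literal.
const catalog: { [char: string]: { slug: string } } = {
  '👍': { slug: 'thumbs_up' },
  '👍🏽': { slug: 'thumbs_up_medium_skin_tone' },
  '*️⃣': { slug: 'keycap_asterisk' },
}

// Escape every regex metacharacter (the `*` in the keycap is the one that bites).
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

function toShortcode(text: string): string {
  // Longest keys first so multi-codepoint sequences beat their prefixes.
  const keys = Object.keys(catalog).sort((a, b) => b.length - a.length)
  const pattern = new RegExp(keys.map(escapeRegExp).join('|'), 'g')
  // Every match is exactly one catalog key, so the lookup is always defined.
  return text.replace(pattern, (char: string) => `:${catalog[char].slug}:`)
}

console.log(toShortcode('👍🏽 and *️⃣'))
// :thumbs_up_medium_skin_tone: and :keycap_asterisk:
```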

#### `parse(text, options)`

The dispatcher:

```ts
if (typeof text !== 'string') throw new Error('The text parameter should be a string.')

if (!opts.parseToHtml && opts.parseToShortcode) text = parseToShortcode(text)
if (opts.parseToHtml || opts.parseToUnicode) text = parseToUnicode(text)
if (opts.parseToHtml) text = __parseEmojiToHtml(text, opts.emojiCDN)
```

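Order matters: shortcode → unicode → HTML. Each stage runs only when its condition holds. Note that `parseToShortcode` is ignored whenever `parseToHtml` is on, and the unicode stage also runs whenever `parseToHtml` is on, because HTML rendering needs unicode input.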
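The stage selection can be checked in isolation. `plannedStages` is a hypothetical helper that copies the three conditions from the dispatcher verbatim and returns stage names instead of transforming text, which makes the flag interactions easy to see.

```typescript
interface EmojiParseOptionsType {
  emojiCDN?: string
  parseToHtml?: boolean
  parseToUnicode?: boolean
  parseToShortcode?: boolean
}

type Stage = 'shortcode' | 'unicode' | 'html'

// Hypothetical helper: same conditions as parse(), but it reports stages instead of running them.
function plannedStages(opts: EmojiParseOptionsType): Array<Stage> {
  const stages: Array<Stage> = []
  if (!opts.parseToHtml && opts.parseToShortcode) stages.push('shortcode')
  if (opts.parseToHtml || opts.parseToUnicode) stages.push('unicode')
  if (opts.parseToHtml) stages.push('html')
  return stages
}

console.log(plannedStages({ parseToHtml: true, parseToShortcode: true })) // [ 'unicode', 'html' ]
console.log(plannedStages({ parseToShortcode: true })) // [ 'shortcode' ]
console.log(plannedStages({ parseToShortcode: true, parseToUnicode: true })) // [ 'shortcode', 'unicode' ]
```

The last case shows that combining `parseToShortcode` with `parseToUnicode` feeds the shortcode output straight into the unicode stage.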

### CommonJS reattachment

```ts
export default uEmojiParser
module.exports = uEmojiParser
module.exports.emojiLibJsonData = emojiLibJsonData
module.exports.DEFAULT_EMOJI_CDN = DEFAULT_EMOJI_CDN
```

Webpack's `libraryTarget: 'commonjs2'` exposes the default export as `module.exports.default`, which would break `require('universal-emoji-parser').parse(...)`. The three `module.exports` assignments at the bottom flatten the API so `require` and `import` users see the same shape. Every `export const` declared at the top of `src/index.ts` must be reattached here too, otherwise it ships as `undefined` to CommonJS consumers (regression-tested in `test/exports.test.ts`).

## Type model — `src/lib/type.ts`

```ts
export interface EmojiType {
  name: string // "smiling face with sunglasses"
  slug: string // "smiling_face_with_sunglasses" (canonical shortcode)
  group: string // "Smileys & Emotion"
  emoji_version: string // "1.0"
  unicode_version: string // "1.0"
  skin_tone_support: boolean
  char: string // "😎" — the unicode literal
  keywords: Array<string> // ["smiling_face_with_sunglasses", "cool", "summer", ...]
  keyword_index_found?: number // Used by the regenerator only — don't rely on it
}

export interface EmojiLibJsonType {
  [key: string]: EmojiType // keyed by emoji char (the unicode literal)
}

export interface EmojiParseOptionsType {
  emojiCDN?: string
  parseToHtml?: boolean
  parseToUnicode?: boolean
  parseToShortcode?: boolean
}

export interface UEmojiParserType {
  getEmojiObjectByShortcode: (shortcode: string) => EmojiType | undefined
  getDefaultOptions(options?: EmojiParseOptionsType): EmojiParseOptionsType
  __parseEmojiToHtml(text: string, emojiCDN?: string): string
  parseToHtml: (text: string, emojiCDN?: string) => string
  parseToUnicode: (text: string) => string
  parseToShortcode: (text: string) => string
  parse: (text: string, options?: EmojiParseOptionsType) => string
}

export interface TwemojiEntity {
  url: string
  indices: Array<number>
  text: string
  type: string
}
```

The catalog is **keyed by unicode literal**, not by slug. That's because the regenerator pipeline starts from `unicode-emoji-json` (whose keys are unicode) and merges keywords from `emojilib` (whose keys are also unicode). Looking up by slug requires the two-tier scan in `getEmojiObjectByShortcode`.

## The regeneration pipeline

`test/prepareEmojiLibJson.test.ts` is the **only** sanctioned way to rebuild `src/lib/emoji-lib.json`. The test is `it.skip`-guarded so it never runs on CI:

1. Load `unicode-emoji-json` (1906 emojis with metadata: name, slug, group, version)
2. Load `emojilib` (1898 emojis with curated keyword arrays)
3. For each emoji in `unicode-emoji-json`:
   - Set `char` to the key
   - Use `emojilib` keywords if present, else `[slug]`
   - Ensure the slug is in keywords (unshift if missing)
   - Apply `EMOJIS_SPECIAL_CASES` overrides (include/exclude)
4. **Deduplicate keywords** across emojis. The same keyword can appear on multiple emojis (e.g., `coffee` on `☕` and `🤎`); the algorithm keeps the keyword on the emoji with the lowest `keyword_index_found` (i.e., where the keyword is most prominent) and removes it from all the others. This is O(n²), but it only runs at regeneration time
5. Write to `src/lib/emoji-lib-output.json`
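A sketch of step 4 under stated assumptions: entries are mutated in place, a keyword's position in an emoji's `keywords` array plays the role of `keyword_index_found`, and ties go to the emoji seen first. `dedupeKeywords` and the sample data are hypothetical. Instead of pairwise O(n²) comparison, it uses a single `Map` from keyword to current owner and picks each winner in two linear passes.

```typescript
interface CatalogEntry {
  slug: string
  keywords: Array<string>
}

// Keep each keyword only on the entry where it sits at the lowest index.
function dedupeKeywords(catalog: { [char: string]: CatalogEntry }): void {
  const owner = new Map<string, { char: string; index: number }>()

  // Pass 1: find the winning entry for every keyword (strict `<` keeps the first one seen on ties).
  for (const char of Object.keys(catalog)) {
    catalog[char].keywords.forEach((keyword, index) => {
      const current = owner.get(keyword)
      if (current === undefined || index < current.index) owner.set(keyword, { char, index })
    })
  }

  // Pass 2: drop every keyword from the entries that did not win it.
  for (const char of Object.keys(catalog)) {
    catalog[char].keywords = catalog[char].keywords.filter((keyword) => owner.get(keyword)?.char === char)
  }
}

// Hypothetical data: 🤎 lists `coffee` earlier than ☕ does, so 🤎 wins it.
const sample: { [char: string]: CatalogEntry } = {
  '☕': { slug: 'hot_beverage', keywords: ['hot_beverage', 'caffeine', 'coffee'] },
  '🤎': { slug: 'brown_heart', keywords: ['brown_heart', 'coffee'] },
}
dedupeKeywords(sample)
console.log(sample['☕'].keywords) // [ 'hot_beverage', 'caffeine' ]
console.log(sample['🤎'].keywords) // [ 'brown_heart', 'coffee' ]
```

That outcome is the kind of collision the `☕`/`🤎` rows of the special-cases table correct.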

After regeneration:

- Diff `emoji-lib-output.json` vs `emoji-lib.json` to review changes
- Copy the new contents to `emoji-lib.json` (the runtime source)
- Update `TOTAL_EMOJIS` in `emojiLibJson.test.ts` if the count changed
- Commit both files together

See [`/regenerate-emoji-lib`](../.agents/commands/regenerate-emoji-lib.md) for the full workflow.

## Special cases (`EMOJIS_SPECIAL_CASES`)

The regenerator applies hand-curated keyword overrides for a handful of emojis where the upstream `emojilib` keywords are wrong, missing, or collide with another emoji. Current entries:

| Emoji | Include                                | Exclude           | Why                                                                               |
| ----- | -------------------------------------- | ----------------- | --------------------------------------------------------------------------------- |
| `☕`  | `coffee`                               | —                 | `emojilib` has it, but the dedup loop would otherwise hand `coffee` to `🤎` first |
| `🤎`  | —                                      | `coffee`          | Brown heart should not match `:coffee:`                                           |
| `❤️`  | `heart`                                | —                 | The plain red heart is the canonical `:heart:`                                    |
| `💘`  | —                                      | `heart`           | Heart-with-arrow shouldn't steal `:heart:`                                        |
| `👮‍♀️`  | `policewoman`, `female-police-officer` | `legal`, `arrest` | Common Slack aliases; remove ambiguous keywords                                   |
| `✅`  | `white_check_mark`                     | —                 | GitHub-flavored alias                                                             |
| `⏸️`  | `double_vertical_bar`                  | —                 | Niche but supported                                                               |

Add new entries by editing `EMOJIS_SPECIAL_CASES` in `prepareEmojiLibJson.test.ts` and regenerating.
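One plausible way to apply an include/exclude override to a keyword list, sketched with a hypothetical `SpecialCase` shape. The real structure of `EMOJIS_SPECIAL_CASES`, and whether included keywords are appended or prepended, are assumptions here. Exclusions are removed first, then inclusions are appended without creating duplicates.

```typescript
// Hypothetical shape for one EMOJIS_SPECIAL_CASES entry.
interface SpecialCase {
  include?: Array<string>
  exclude?: Array<string>
}

function applySpecialCase(keywords: Array<string>, special: SpecialCase): Array<string> {
  const excluded = new Set(special.exclude ?? [])
  // Remove unwanted keywords first...
  const result = keywords.filter((keyword) => !excluded.has(keyword))
  // ...then append forced aliases, skipping any already present.
  for (const keyword of special.include ?? []) {
    if (!result.includes(keyword)) result.push(keyword)
  }
  return result
}

// Hypothetical input keywords for 👮‍♀️; the override mirrors the table row above.
const policewoman = applySpecialCase(['woman_police_officer', 'legal', 'arrest', 'cop'], {
  include: ['policewoman', 'female-police-officer'],
  exclude: ['legal', 'arrest'],
})
console.log(policewoman) // [ 'woman_police_officer', 'cop', 'policewoman', 'female-police-officer' ]
```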

## Build configuration

### Webpack (`webpack.config.js`)

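```js
const path = require('path')

module.exports = {
  entry: { index: { import: './src/index.ts' } },
  output: {
    path: path.resolve(__dirname, 'dist'), // webpack requires an absolute output path
    filename: '[name].js',
    libraryTarget: 'commonjs2', // critical: see "CommonJS reattachment" above
    globalObject: 'this',
  },
  module: {
    rules: [
      {
        test: /\.tsx?$/,
        // compiles `src/` only, per tsconfig.build.json
        use: { loader: 'ts-loader', options: { configFile: 'tsconfig.build.json' } },
      },
    ],
  },
  resolve: { extensions: ['.tsx', '.ts', '.js'] },
  optimization: { chunkIds: 'size', minimize: true },
  // CleanWebpackPlugin only on `--mode production`
}
```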

Single entry, single output. ts-loader runs the TypeScript compiler against `tsconfig.build.json`; Babel is not involved at build time (the `.babelrc` is legacy and only matters if a downstream tool reaches for it).

### TypeScript (`tsconfig.json`)

Highlights:

- `strictNullChecks: true`
- `noImplicitAny: true`
- `noUnusedLocals: true`, `noUnusedParameters: true`
- `declaration: true` — emits `.d.ts` so consumers get types
- `module: 'commonjs'`, `moduleResolution: 'node'`
- `lib: ['es6', 'dom']` — includes DOM types because consumers may use this in the browser
- `resolveJsonModule: true` — required to `import emojiLibJson from './lib/emoji-lib.json'`

`outDir: './dist/'` is set, but the bundle's location is controlled by Webpack's `output.path`; `tsc` runs only via `npm run build:tsc` to emit type declarations.

### npm scripts

| Script                            | Runs                                                                            | Purpose                   |
| --------------------------------- | ------------------------------------------------------------------------------- | ------------------------- |
| `dev`                             | `nodemon src/index.ts`                                                          | Watch-run a smoke script  |
| `build`                           | `webpack --mode production --progress`                                          | Production bundle         |
| `build:dev`                       | `webpack --mode development --progress`                                         | Unminified bundle         |
| `build:tsc`                       | `tsc --build tsconfig.json`                                                     | Type-check + emit `.d.ts` |
| `test`                            | `tsx ./node_modules/mocha/bin/mocha.js 'test/**/*.ts' --timeout 25000 --colors` | Run all specs             |
| `test:watch`                      | `mocha -w --watch-extensions ts ...`                                            | TDD inner loop            |
| `eslint:check` / `eslint:fix`     | ESLint over `*.ts`                                                              | Lint                      |
| `prettier:check` / `prettier:fix` | Prettier over `*.{css,html,js,ts,json,md,yaml,yml}`                             | Format                    |
| `release`                         | `npm version patch -m "[🤖 DailyBot] New release to v%s launched 🚀"`           | Bump version (CI-only)    |
| `ncu:check` / `ncu:upgrade`       | `npm-check-updates`                                                             | Dep upgrade pipeline      |

## CI/CD pipeline

The release flow (`.github/workflows/release_and_publish.yml`) is triggered on `pull_request: closed` with `merged == true` against `main`:

```
PR merged to main
       │
       ▼
check_pr_size_label   (XS / S / M / L / XL / XXL based on lines changed)
       │
       ▼
notify_on_channel_start  (DailyBot Slack-like notification)
       │
       ▼
deploy_setup            (npm install with cache)
       │
       ▼
deploy_validate_linters_and_code_format   (eslint:check + prettier:check)
       │
       ▼
deploy_tests            (npm test)
       │
       ▼
build                   (npm run build → dist/)
       │
       ▼
release_and_publish     (npm version patch + push tag + create GH release + npm publish)
       │
       ▼
cleanup_caches  +  notify_on_channel_end
```

Every job runs on `ubuntu-latest` with Node 24 and aggressive caching of `~/.npm` and `node_modules`. The release job uses `secrets.AUTOMATION_GITHUB_TOKEN` (push + tag) and `secrets.NPM_TOKEN` (npm publish). The DailyBot identity (`🤖 DailyBot <ops@dailybot.com>`) is hardcoded.

Detailed walkthrough: **[Build & Deploy](BUILD_DEPLOY.md)**.

## Mental model summary

1. **Two-file runtime.** `src/index.ts` + `src/lib/emoji-lib.json`. Everything else is build/test/CI.
2. **The catalog is generated, not authored.** Edit `EMOJIS_SPECIAL_CASES` and regenerate; never hand-edit the JSON.
3. **One runtime dependency.** `@twemoji/parser`. Adding more dependencies requires justification — they ship to consumer bundles.
4. **Dual ESM/CommonJS shape.** The `module.exports` reattachment at the bottom of `src/index.ts` is non-negotiable.
5. **HTML output is a contract.** `<img class="emoji" alt="<unicode>" src="<url>"/>` — exactly that shape, forever (until a major bump).
6. **CI owns the release.** Humans never run `npm version` or `npm publish`. The merge to `main` is the release trigger.
