# @jackdbd/eleventy-plugin-text-to-speech

[![npm version](https://badge.fury.io/js/@jackdbd%2Feleventy-plugin-text-to-speech.svg)](https://badge.fury.io/js/@jackdbd%2Feleventy-plugin-text-to-speech)
![Snyk Vulnerabilities for npm package](https://img.shields.io/snyk/vulnerabilities/npm/@jackdbd%2Feleventy-plugin-text-to-speech)

Eleventy plugin that synthesizes **any text** you want, on **any page** of your Eleventy site, using the [Google Cloud Text-to-Speech API](https://cloud.google.com/text-to-speech). You can either self-host the audio assets this plugin generates, or host them on [Cloud Storage](https://cloud.google.com/storage).

> :warning: The Cloud Text-to-Speech API has a [limit of 5000 characters](https://cloud.google.com/text-to-speech/quotas) per request.
> 
> See also:
>
> - [this issue of the Wavenet for Chrome extension](https://github.com/wavenet-for-chrome/extension/issues/12)
>
> - [this discussion on Google Groups](https://groups.google.com/g/google-translate-api/c/2JsRdq0tEdA)
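
If a text match might exceed this limit, one option is to split it before synthesis. The sketch below is not part of this plugin: `splitIntoChunks` and `MAX_CHARS` are hypothetical names, and it assumes that splitting on sentence boundaries is acceptable for your content.

```js
// Hypothetical helper, NOT part of this plugin's API.
// Assumption: the limit is counted in characters. String.length counts
// UTF-16 code units, which may differ from how the API counts.
const MAX_CHARS = 5000

function splitIntoChunks(text, maxChars = MAX_CHARS) {
  // Split after sentence-ending punctuation that is followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0)
  const chunks = []
  let current = ''

  for (const sentence of sentences) {
    const candidate = current ? `${current} ${sentence}` : sentence
    if (candidate.length <= maxChars) {
      current = candidate
      continue
    }

    // The sentence does not fit: flush the chunk built so far.
    if (current) chunks.push(current)

    // Hard-split a sentence that is longer than the limit on its own.
    let rest = sentence
    while (rest.length > maxChars) {
      chunks.push(rest.slice(0, maxChars))
      rest = rest.slice(maxChars)
    }
    current = rest
  }

  if (current) chunks.push(current)
  return chunks
}
```

Every chunk returned by `splitIntoChunks(text)` is at most `MAX_CHARS` long, so each one could be sent as a separate synthesis request.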

## Installation

```sh
npm install --save-dev @jackdbd/eleventy-plugin-text-to-speech
```

## Preliminary Operations

### Enable the Text-to-Speech API

Before you can use the Text-to-Speech API, you must enable it in your Google Cloud project:

```sh
gcloud services enable texttospeech.googleapis.com
```

### Set up authentication via a service account

This plugin uses the [official Node.js client library for the Text-to-Speech API](https://github.com/googleapis/nodejs-text-to-speech). To authenticate to any Google Cloud API you need credentials. At the moment, this plugin only supports authentication via a service account JSON key.

First, create a service account that can use the Text-to-Speech API, or reuse an existing one. The service account itself is enough: you don't need to configure any IAM permissions for it.

```sh
gcloud iam service-accounts create sa-text-to-speech-user \
  --display-name "Text-to-Speech user SA"
```

Second, [download the JSON key of this service account](https://cloud.google.com/iam/docs/creating-managing-service-account-keys) and store it somewhere safe. Do **not** track this file in git.
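
Before running a build, it can help to check that the key file is where you expect it and has the right shape. The sketch below is not part of this plugin: `looksLikeServiceAccountKey` and `assertServiceAccountKey` are hypothetical helpers, and they only check a few well-known fields of a service account JSON key.

```js
const fs = require('node:fs')

// Hypothetical helper, NOT part of this plugin's API.
// Checks a few fields that service account JSON keys contain.
function looksLikeServiceAccountKey(key) {
  return (
    typeof key === 'object' &&
    key !== null &&
    key.type === 'service_account' &&
    typeof key.client_email === 'string' &&
    typeof key.private_key === 'string'
  )
}

// Reads the key file and returns the service account email, or throws.
function assertServiceAccountKey(
  filepath = process.env.GOOGLE_APPLICATION_CREDENTIALS
) {
  if (!filepath) {
    throw new Error('Set GOOGLE_APPLICATION_CREDENTIALS or pass a key file path')
  }
  const key = JSON.parse(fs.readFileSync(filepath, 'utf8'))
  if (!looksLikeServiceAccountKey(key)) {
    throw new Error(`${filepath} does not look like a service account JSON key`)
  }
  return key.client_email
}
```

You could call `assertServiceAccountKey()` at the top of your Eleventy configuration so that a missing or malformed key fails the build early.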

### Optional: Create Cloud Storage bucket (only if you want to host audio files on Cloud Storage)

Create a Cloud Storage bucket in your desired [location](https://cloud.google.com/storage/docs/locations). Enable [uniform bucket-level access](https://cloud.google.com/storage/docs/uniform-bucket-level-access) and use the `nearline` [storage class](https://cloud.google.com/storage/docs/storage-classes).

```sh
gsutil mb \
  -p $GCP_PROJECT_ID \
  -l $CLOUD_STORAGE_LOCATION \
  -c nearline \
  -b on \
  gs://bkt-eleventy-plugin-text-to-speech-audio-files
```

If you want, you can check that uniform bucket-level access is **enabled** using this command:

```sh
gsutil uniformbucketlevelaccess get \
  gs://bkt-eleventy-plugin-text-to-speech-audio-files
```

Make the bucket's objects publicly readable (otherwise people will not be able to listen to or download the audio files):

```sh
gsutil iam ch allUsers:objectViewer \
  gs://bkt-eleventy-plugin-text-to-speech-audio-files
```

## Usage

Let's say that you are hosting your Eleventy website on Cloudflare Pages. Your current deployment is at the URL indicated by the [environment variable](https://developers.cloudflare.com/pages/platform/build-configuration/#environment-variables) `CF_PAGES_URL`.

### Self-hosting the generated audio assets

If you want to self-host the audio assets that this plugin generates and use all default options, you can register the plugin with this code:

```js
const { plugin: tts } = require('@jackdbd/eleventy-plugin-text-to-speech')

module.exports = function (eleventyConfig) {
  // some eleventy configuration...

  eleventyConfig.addPlugin(tts, {
    audioHost: process.env.CF_PAGES_URL
      ? new URL(`${process.env.CF_PAGES_URL}/assets/audio`)
      : new URL('http://localhost:8090/assets/audio')
  })

  // some more eleventy configuration...
}
```

### Hosting the generated audio assets on Cloud Storage

If you want to host the audio assets on a Cloud Storage bucket and configure the rules for the text matches, you could register the plugin using something like this:

```js
const { plugin: tts } = require('@jackdbd/eleventy-plugin-text-to-speech')

module.exports = function (eleventyConfig) {
  // some eleventy configuration...

  eleventyConfig.addPlugin(tts, {
    audioHost: {
      bucketName: 'some-bucket-containing-publicly-readable-files'
    },
    rules: [
      // synthesize the text contained in all <h1> tags, in all posts
      {
        regex: new RegExp('posts\\/.*\\.html$'),
        cssSelectors: ['h1']
      },
      // synthesize the text contained in all <p> tags that start with "Once upon a time", in all HTML pages, except the 404.html page
      {
        regex: new RegExp('^((?!404).)*\\.html$'),
        xPathExpressions: ['//p[starts-with(., "Once upon a time")]']
      }
    ],
    voice: 'en-GB-Wavenet-C'
  })

  // some more eleventy configuration...
}
```
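
To see which pages each rule's `regex` would select, you can test the patterns against a few paths. This is only a sketch: it assumes the regex is matched against each page's output path, and the sample paths are made up.

```js
// The two patterns used in the rules above.
const postsRule = new RegExp('posts\\/.*\\.html$')
const allExcept404 = new RegExp('^((?!404).)*\\.html$')

// Made-up output paths, for illustration only.
const samplePaths = [
  '_site/posts/hello/index.html',
  '_site/about/index.html',
  '_site/404.html',
  '_site/feed.xml'
]

const results = samplePaths.map((path) => ({
  path,
  posts: postsRule.test(path),
  allExcept404: allExcept404.test(path)
}))

console.table(results)
// _site/posts/hello/index.html  posts: true   allExcept404: true
// _site/about/index.html        posts: false  allExcept404: true
// _site/404.html                posts: false  allExcept404: false
// _site/feed.xml                posts: false  allExcept404: false
```

Note that the negative lookahead rejects any path containing `404` anywhere, not just `404.html`, so a page such as `posts/error-404-explained/index.html` would be skipped by the second rule too.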

### Multiple hosts

If you want to host the generated audio assets on multiple hosts, register this plugin multiple times. Here are a few examples:

- self-host some audio assets, and host others on a Cloud Storage bucket
- host all audio assets on Cloud Storage, but split them across two different buckets

Have a look at the Eleventy configuration of the [demo-site in this monorepo](../demo-site/README.md).
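
As a rough sketch of the first scenario, the function below registers the plugin twice, once per host. `registerTwoHosts` is a hypothetical helper that receives the plugin as its `tts` argument. Giving each registration its own `collectionName` and `transformName` is an assumption made here to keep the two registrations from clashing; check the demo site for the configuration the author actually uses.

```js
// Hypothetical helper, NOT part of this plugin's API.
// `tts` is the plugin exported by @jackdbd/eleventy-plugin-text-to-speech.
function registerTwoHosts(eleventyConfig, tts) {
  // Self-host the audio generated from the <h1> of each post.
  eleventyConfig.addPlugin(tts, {
    audioHost: process.env.CF_PAGES_URL
      ? new URL(`${process.env.CF_PAGES_URL}/assets/audio`)
      : new URL('http://localhost:8090/assets/audio'),
    rules: [{ regex: new RegExp('posts\\/.*\\.html$'), cssSelectors: ['h1'] }],
    // Assumption: distinct names per registration (illustrative values).
    collectionName: 'audio-items-self-hosted',
    transformName: 'inject-audio-tags-self-hosted'
  })

  // Host on Cloud Storage the audio generated from matching paragraphs.
  eleventyConfig.addPlugin(tts, {
    audioHost: {
      bucketName: 'some-bucket-containing-publicly-readable-files'
    },
    rules: [
      {
        regex: new RegExp('^((?!404).)*\\.html$'),
        xPathExpressions: ['//p[starts-with(., "Once upon a time")]']
      }
    ],
    collectionName: 'audio-items-cloud-storage',
    transformName: 'inject-audio-tags-cloud-storage'
  })
}
```

In your Eleventy configuration you would import the plugin as shown in the Usage section and call `registerTwoHosts(eleventyConfig, tts)` inside the exported function.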

## Configuration

### Required parameters

| Parameter | Explanation |
| --- | --- |
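| `audioHost` | Where to host the generated audio assets: a `URL` if you self-host them, or an object with a `bucketName` if you host them on Cloud Storage. Each audio host should have a matching writer responsible for writing/uploading the assets to the host. |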

### Options

| Option | Default | Explanation |
| --- | --- | --- |
| `audioEncodings` | `['OGG_OPUS', 'MP3']` | List of [audio encodings](https://cloud.google.com/speech-to-text/docs/encoding#audio-encodings) to use when generating audio assets from text matches. |
| `audioInnerHTML` | see in [src/dom.ts](./src/dom.ts) | Function to use to generate the innerHTML of the `<audio>` tag to inject in the page for each text match. |
| `cacheExpiration` | `365d` | Expiration for the 11ty AssetCache. See [here](https://www.11ty.dev/docs/plugins/fetch/#change-the-cache-duration). |
| `collectionName` | `audio-items` | Name of the 11ty collection created by this plugin. |
| `keyFilename` | `process.env.GOOGLE_APPLICATION_CREDENTIALS` | Credentials for the Cloud Text-to-Speech API (and for the Cloud Storage API, if you don't set them in `audioHost`). |
| `rules` | see in [src/constants.ts](./src/constants.ts) | Rules that determine which texts to convert into speech. |
| `transformName` | `inject-audio-tags-into-html` | Name of the 11ty transform created by this plugin. |
| `voice` | `en-US-Standard-J` | Voice to use when generating audio assets from text matches. The Text-to-Speech API supports [these voices](https://cloud.google.com/text-to-speech/docs/voices), and [pricing](https://cloud.google.com/text-to-speech/pricing) might differ between voices. |

> :warning: Don't forget to set either `keyFilename` or the `GOOGLE_APPLICATION_CREDENTIALS` environment variable on your build server.

## Debug

This plugin uses the [debug](https://github.com/debug-js/debug) library for logging. You can control what's logged using the `DEBUG` environment variable. For example, if you set your environment variables in a `.envrc` file, you could do:

```sh
# print all logging statements
export DEBUG=eleventy-plugin-text-to-speech/*

# print just the logging statements from the dom module and the writers module
export DEBUG=eleventy-plugin-text-to-speech/dom,eleventy-plugin-text-to-speech/writers

# print all logging statements, except the ones from the dom module and the transforms module
export DEBUG=eleventy-plugin-text-to-speech/*,-eleventy-plugin-text-to-speech/dom,-eleventy-plugin-text-to-speech/transforms
```

## Credits

I got the idea for this plugin while reading the code of the identically named [eleventy-plugin-text-to-speech](https://github.com/larryhudson/eleventy-plugin-text-to-speech) by [Larry Hudson](https://larryhudson.io/). There are a few differences between the two plugins. The main one is that this plugin uses the [Google Cloud Text-to-Speech API](https://cloud.google.com/text-to-speech), while Larry's plugin uses the [Microsoft Azure Speech SDK](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-sdk).