Class: RecognizeStream

new RecognizeStream(options)

A pipe()-able Node.js Duplex (Readable/Writable) stream: it accepts binary audio and emits text or result objects in its data events.

Uses WebSockets under the hood. For audio with no recognizable speech, no data events are emitted.

By default, only finalized text is emitted in the data events. In readableObjectMode (usually just objectMode when using a helper method), the data events carry result objects instead, including interim results when interim_results is enabled.

An interim result looks like this:

 { alternatives:
   [ { timestamps:
        [ [ 'it', 20.9, 21.04 ],
          [ 'is', 21.04, 21.17 ],
          [ 'a', 21.17, 21.25 ],
          [ 'site', 21.25, 21.56 ],
          [ 'that', 21.56, 21.7 ],
          [ 'hardly', 21.7, 22.06 ],
          [ 'anyone', 22.06, 22.49 ],
          [ 'can', 22.49, 22.67 ],
          [ 'behold', 22.67, 23.13 ],
          [ 'without', 23.13, 23.46 ],
          [ 'some', 23.46, 23.67 ],
          [ 'sort', 23.67, 23.91 ],
          [ 'of', 23.91, 24 ],
          [ 'unwanted', 24, 24.58 ],
          [ 'emotion', 24.58, 25.1 ] ],
       transcript: 'it is a site that hardly anyone can behold without some sort of unwanted emotion ' } ],
  final: false,
  result_index: 3 }

A final result looks like this; some fields, such as word_confidence, confidence, and the additional alternatives, appear only in final results:

  { alternatives:
     [ { word_confidence:
          [ [ 'it', 1 ],
            [ 'is', 0.956286624429304 ],
            [ 'a', 0.8105753725270362 ],
            [ 'site', 1 ],
            [ 'that', 1 ],
            [ 'hardly', 1 ],
            [ 'anyone', 1 ],
            [ 'can', 1 ],
            [ 'behold', 0.5273598005406737 ],
            [ 'without', 1 ],
            [ 'some', 1 ],
            [ 'sort', 1 ],
            [ 'of', 1 ],
            [ 'unwanted', 1 ],
            [ 'emotion', 0.49401837076320887 ] ],
         confidence: 0.881,
         transcript: 'it is a site that hardly anyone can behold without some sort of unwanted emotion ',
         timestamps:
          [ [ 'it', 20.9, 21.04 ],
            [ 'is', 21.04, 21.17 ],
            [ 'a', 21.17, 21.25 ],
            [ 'site', 21.25, 21.56 ],
            [ 'that', 21.56, 21.7 ],
            [ 'hardly', 21.7, 22.06 ],
            [ 'anyone', 22.06, 22.49 ],
            [ 'can', 22.49, 22.67 ],
            [ 'behold', 22.67, 23.13 ],
            [ 'without', 23.13, 23.46 ],
            [ 'some', 23.46, 23.67 ],
            [ 'sort', 23.67, 23.91 ],
            [ 'of', 23.91, 24 ],
            [ 'unwanted', 24, 24.58 ],
            [ 'emotion', 24.58, 25.1 ] ] },
       { transcript: 'it is a sight that hardly anyone can behold without some sort of unwanted emotion ' },
       { transcript: 'it is a site that hardly anyone can behold without some sort of unwanted emotions ' } ],
    final: true,
    result_index: 3 }
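
Both samples above carry result_index: 3, which suggests that interim and final results for the same utterance share an index. Assuming that holds, an objectMode consumer can keep one slot per result_index, overwrite it with each interim result, and lock it once the final result arrives. The sketch below relies only on the fields shown in the samples; TranscriptAssembler is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper (not part of the library): builds a running transcript
// from objectMode 'data' payloads shaped like the samples above.
class TranscriptAssembler {
  constructor() {
    // result_index -> { text, final }
    this.slots = new Map();
  }

  add(result) {
    const existing = this.slots.get(result.result_index);
    // Assumption: once a slot is final, later results for that index are ignored.
    if (existing && existing.final) return;
    const best = result.alternatives && result.alternatives[0];
    this.slots.set(result.result_index, {
      text: best ? best.transcript : '',
      final: result.final === true,
    });
  }

  // Joins the slots in result_index order; finalOnly = true skips pending interim text.
  text(finalOnly = false) {
    return [...this.slots.entries()]
      .sort(([a], [b]) => a - b)
      .filter(([, slot]) => !finalOnly || slot.final)
      .map(([, slot]) => slot.text.trim())
      .join(' ');
  }
}

// Usage with abbreviated versions of the two samples above.
const assembler = new TranscriptAssembler();
assembler.add({ alternatives: [{ transcript: 'it is a site that hardly anyone ' }], final: false, result_index: 3 });
assembler.add({
  alternatives: [{ transcript: 'it is a site that hardly anyone can behold without some sort of unwanted emotion ' }],
  final: true,
  result_index: 3,
});
console.log(assembler.text());
```

Keeping only the first alternative per slot is a simplification; a consumer that needs the other alternatives would store the whole result object instead.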

Parameters:

options (Object): constructor options, with the following properties. All are optional.

  model (String, default 'en-US_BroadbandModel')
    Speech recognition model to use. Microphone streaming only supports broadband models.

  url (String, default 'wss://stream.watsonplatform.net/speech-to-text/api')
    Base URL for the service.

  content-type (String, default 'audio/wav')
    Content type of the audio. In most cases it can be determined automatically from the file header. Only WAV, FLAC, and Ogg/Opus are supported.

  interim_results (Boolean, default true)
    Send back non-final previews of each "sentence" as it is being processed. These results are ignored in text mode.

  continuous (Boolean, default true)
    Set to false to stop transcription automatically after the first "sentence".

  word_confidence (Boolean, default false)
    Include word confidence scores with results. Defaults to true in objectMode.

  timestamps (Boolean, default false)
    Include word timestamps with results. Defaults to true in objectMode.

  max_alternatives (Number, default 1)
    Maximum number of alternative transcriptions to include. Defaults to 3 in objectMode.

  keywords (Array.<String>)
    A list of keywords to search for in the audio.

  keywords_threshold (Number)
    Minimum confidence, between 0 and 1, for a keyword to be included in the results. Required when options.keywords is set.

  word_alternatives_threshold (Number)
    Minimum confidence, between 0 and 1, for an alternative word to be included in the results. Must be set to enable word alternatives.

  profanity_filter (Boolean, default false)
    Set to true to filter out profanity, replacing the words with asterisks (*).

  inactivity_timeout (Number, default 30)
    Seconds of silence after which the stream closes automatically, even if continuous is true. Use -1 for no timeout.

  readableObjectMode (Boolean, default false)
    Emit result objects instead of string Buffers in the data events. This changes the defaults of word_confidence, timestamps, and max_alternatives.

  X-WDC-PL-OPT-OUT (Number, default 0)
    Set to 1 to opt out of allowing Watson to use this request to improve its services.
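
The objectMode-dependent defaults in the options above are easy to misread, so here they are summarized as code. This is a hypothetical helper, not the library's own option handling: it only encodes the defaults and the keywords_threshold requirement as documented on this page, and it treats readableObjectMode and objectMode as equivalent (an assumption based on the note about helper methods).

```javascript
// Hypothetical helper: applies the documented RecognizeStream defaults.
// It mirrors this page's option list; it is not the library's implementation.
function resolveOptions(options = {}) {
  const objectMode = Boolean(options.readableObjectMode || options.objectMode);
  const resolved = {
    model: 'en-US_BroadbandModel',
    url: 'wss://stream.watsonplatform.net/speech-to-text/api',
    'content-type': 'audio/wav',
    interim_results: true,
    continuous: true,
    // These three defaults change when result objects are emitted.
    word_confidence: objectMode,
    timestamps: objectMode,
    max_alternatives: objectMode ? 3 : 1,
    profanity_filter: false,
    inactivity_timeout: 30, // seconds; -1 disables the timeout
    readableObjectMode: objectMode,
    'X-WDC-PL-OPT-OUT': 0,
    ...options, // explicitly passed values always win
  };
  if (resolved.keywords && resolved.keywords.length > 0 && resolved.keywords_threshold === undefined) {
    throw new Error('keywords_threshold is required when keywords is set');
  }
  return resolved;
}

const textMode = resolveOptions();
const objMode = resolveOptions({ readableObjectMode: true, max_alternatives: 2 });
console.log(textMode.max_alternatives, objMode.max_alternatives, objMode.timestamps); // 1 2 true
```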

Methods

(inner) flowForResults(event)

Ensures that listening for 'results' events puts the stream in flowing mode, just as listening for 'data' events does.

Parameters:
  event (String)
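
Node.js Readable streams start in paused mode; attaching a 'data' listener, calling pipe(), or calling resume() switches them to flowing mode. A listener for a custom event such as 'results' does not do this on its own, so the stream has to start flowing itself. A minimal sketch of the idea using the built-in 'newListener' event; enableFlowForResults is a hypothetical name, and the library's inner function may differ in detail.

```javascript
const { Readable } = require('stream');

// Hypothetical sketch of the flowForResults idea: when the first 'results'
// listener is added, switch the stream into flowing mode as a 'data' listener would.
function enableFlowForResults(stream) {
  function flowForResults(event) {
    if (event === 'results') {
      // Only needed once, so stop watching for new listeners.
      stream.removeListener('newListener', flowForResults);
      stream.resume();
    }
  }
  stream.on('newListener', flowForResults);
  return stream;
}

const stream = enableFlowForResults(new Readable({ read() {} }));
console.log(stream.readableFlowing); // null: nothing is consuming the stream yet
stream.on('results', (results) => console.log(results));
console.log(stream.readableFlowing); // true: the 'results' listener started the flow
```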

Events

close

Parameters:
  reasonCode (Number)
  description (String)
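
Since RecognizeStream runs over WebSockets, a close handler may want to tell a normal shutdown from an abnormal one. This page does not say what values reasonCode takes; the sketch below assumes it is a standard WebSocket close status code. The code meanings come from RFC 6455, section 7.4.1, and describeClose is a hypothetical helper.

```javascript
// A few WebSocket close status codes defined in RFC 6455, section 7.4.1.
const CLOSE_CODES = {
  1000: 'normal closure',
  1001: 'going away',
  1006: 'abnormal closure (no close frame received)',
  1011: 'server encountered an unexpected condition',
};

// Hypothetical 'close' handler helper; assumes reasonCode is a WebSocket close code.
function describeClose(reasonCode, description) {
  const meaning = CLOSE_CODES[reasonCode] || 'unrecognized close code';
  const clean = reasonCode === 1000;
  return { clean, message: `${reasonCode} ${meaning}${description ? `: ${description}` : ''}` };
}

// Possible wiring: recognizeStream.on('close', (code, desc) => console.log(describeClose(code, desc).message));
console.log(describeClose(1000, '').message); // 1000 normal closure
```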

connection-close

Parameters:
  reasonCode (Number)
  description (String)

Deprecated.

data

Finalized text. Emitted when the stream is not in objectMode.

Parameters:
  transcript (String)

data

Object with interim or final results, possibly including confidence scores, alternatives, and word timing. Emitted in objectMode.

Parameters:
  data (Object)
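
In objectMode each data payload is a result object like the samples near the top of this page. As an illustration, a consumer could flag words whose word_confidence score falls below a cutoff. In those samples only the final result carries word_confidence; the helper name lowConfidenceWords and the 0.6 cutoff are arbitrary choices for this sketch.

```javascript
// Hypothetical helper: lists words in the top alternative whose confidence is below `cutoff`.
// Results without word_confidence (such as interim results) yield an empty list.
function lowConfidenceWords(result, cutoff = 0.6) {
  const best = result.alternatives && result.alternatives[0];
  if (!best || !best.word_confidence) return [];
  return best.word_confidence
    .filter(([, confidence]) => confidence < cutoff)
    .map(([word]) => word);
}

// Abbreviated from the final result sample.
const finalResult = {
  alternatives: [{
    word_confidence: [['it', 1], ['is', 0.956286624429304], ['a', 0.8105753725270362],
      ['behold', 0.5273598005406737], ['emotion', 0.49401837076320887]],
    transcript: 'it is a site that hardly anyone can behold without some sort of unwanted emotion ',
  }],
  final: true,
  result_index: 3,
};
console.log(lowConfidenceWords(finalResult)); // [ 'behold', 'emotion' ]
```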

error

Parameters:
  msg (String): custom error message
  frame (*, optional): unprocessed frame (should have a .data property with either string or binary data)
  err (Error, optional)

receive-json

Parameters:
  msg (Object): the raw JSON received from Watson; sometimes useful for debugging

results

Object with interim or final results, possibly including confidence scores, alternatives, and word timing.

Parameters:
  results (Object)

Deprecated: use objectMode and listen for the 'data' event instead.

results

Object with array of interim or final results, possibly including confidence scores, alternatives, and word timing. May have no results at all for empty audio files.

Parameters:
  results (Object)

Deprecated: use objectMode and listen for the 'data' event instead.

send-json

Parameters:
  msg (Object): the raw JSON sent to Watson; sometimes useful for debugging
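
The send-json and receive-json events expose the raw messages in each direction, which helps when debugging. The sketch below records both directions. It runs against a plain EventEmitter stand-in so it works without a network connection; recordTraffic is a hypothetical name, and the message bodies are illustrative only. With a real RecognizeStream the same listeners would be attached to the stream instance.

```javascript
const { EventEmitter } = require('events');

// Hypothetical debugging helper: records every JSON message sent to or received from Watson.
function recordTraffic(stream) {
  const log = [];
  stream.on('send-json', (msg) => log.push({ direction: 'sent', msg }));
  stream.on('receive-json', (msg) => log.push({ direction: 'received', msg }));
  return log;
}

// Stand-in for a RecognizeStream; the message bodies are illustrative only.
const fake = new EventEmitter();
const log = recordTraffic(fake);
fake.emit('send-json', { action: 'start' });
fake.emit('receive-json', { state: 'listening' });
console.log(log.map((entry) => entry.direction)); // [ 'sent', 'received' ]
```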