Class: RecognizeStream

RecognizeStream

new RecognizeStream(options)

pipe()-able Node.js Readable/Writable stream - accepts binary audio and emits text/objects in its data events.

Uses WebSockets under the hood. For audio with no recognizable speech, no data events are emitted.

By default, only finalized text is emitted in the data events; in readableObjectMode (usually just objectMode when using a helper method), result objects are emitted instead, including interim results when interim_results is enabled.
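
For example, a minimal sketch of the default (non-objectMode) usage; the module path and audio.wav file are assumptions, and authentication (e.g. a token or Authorization header) is assumed to be handled elsewhere, since it is not covered by this reference:

  var fs = require('fs');
  var RecognizeStream = require('./recognize-stream'); // path is an assumption

  var recognizeStream = new RecognizeStream({
    'content-type': 'audio/wav'
  });

  // binary audio in, finalized transcript text out
  fs.createReadStream('audio.wav')
    .pipe(recognizeStream)
    .pipe(process.stdout);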

An interim result looks like this (assuming all features are enabled):

 { alternatives:
   [ { timestamps:
        [ [ 'it', 20.9, 21.04 ],
          [ 'is', 21.04, 21.17 ],
          [ 'a', 21.17, 21.25 ],
          [ 'site', 21.25, 21.56 ],
          [ 'that', 21.56, 21.7 ],
          [ 'hardly', 21.7, 22.06 ],
          [ 'anyone', 22.06, 22.49 ],
          [ 'can', 22.49, 22.67 ],
          [ 'behold', 22.67, 23.13 ],
          [ 'without', 23.13, 23.46 ],
          [ 'some', 23.46, 23.67 ],
          [ 'sort', 23.67, 23.91 ],
          [ 'of', 23.91, 24 ],
          [ 'unwanted', 24, 24.58 ],
          [ 'emotion', 24.58, 25.1 ] ],
       transcript: 'it is a site that hardly anyone can behold without some sort of unwanted emotion ' } ],
  final: false,
  result_index: 3 }

While a final result looks like this (again, assuming all features are enabled):

  { alternatives:
     [ { word_confidence:
          [ [ 'it', 1 ],
            [ 'is', 0.956286624429304 ],
            [ 'a', 0.8105753725270362 ],
            [ 'site', 1 ],
            [ 'that', 1 ],
            [ 'hardly', 1 ],
            [ 'anyone', 1 ],
            [ 'can', 1 ],
            [ 'behold', 0.5273598005406737 ],
            [ 'without', 1 ],
            [ 'some', 1 ],
            [ 'sort', 1 ],
            [ 'of', 1 ],
            [ 'unwanted', 1 ],
            [ 'emotion', 0.49401837076320887 ] ],
         confidence: 0.881,
         transcript: 'it is a site that hardly anyone can behold without some sort of unwanted emotion ',
         timestamps:
          [ [ 'it', 20.9, 21.04 ],
            [ 'is', 21.04, 21.17 ],
            [ 'a', 21.17, 21.25 ],
            [ 'site', 21.25, 21.56 ],
            [ 'that', 21.56, 21.7 ],
            [ 'hardly', 21.7, 22.06 ],
            [ 'anyone', 22.06, 22.49 ],
            [ 'can', 22.49, 22.67 ],
            [ 'behold', 22.67, 23.13 ],
            [ 'without', 23.13, 23.46 ],
            [ 'some', 23.46, 23.67 ],
            [ 'sort', 23.67, 23.91 ],
            [ 'of', 23.91, 24 ],
            [ 'unwanted', 24, 24.58 ],
            [ 'emotion', 24.58, 25.1 ] ] },
       { transcript: 'it is a sight that hardly anyone can behold without some sort of unwanted emotion ' },
       { transcript: 'it is a site that hardly anyone can behold without some sort of unwanted emotions ' } ],
    final: true,
    result_index: 3 }
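
A sketch of consuming these objects directly, assuming readableObjectMode (in which case interim_results defaults to true, per the options below) and the same hypothetical file, module path, and authentication setup as the sketch above:

  var fs = require('fs');
  var RecognizeStream = require('./recognize-stream'); // path is an assumption

  var objectStream = new RecognizeStream({
    'content-type': 'audio/wav',
    readableObjectMode: true
  });

  fs.createReadStream('audio.wav').pipe(objectStream);

  objectStream.on('data', function(result) {
    // each result has the shape shown above; `final` distinguishes
    // interim previews from finalized results
    var transcript = result.alternatives[0].transcript;
    console.log((result.final ? 'final:   ' : 'interim: ') + transcript);
  });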
Parameters:
Name Type Description
options Object
Properties
Name Type Attributes Default Description
model String <optional>
'en-US_BroadbandModel'

voice model to use. Microphone streaming only supports broadband models.

url String <optional>
'wss://stream.watsonplatform.net/speech-to-text/api'

base URL for service

content-type String <optional>
'audio/wav'

content type of audio; can be automatically determined from the file header in most cases. Only wav, flac, and ogg/opus are supported

interim_results Boolean <optional>
false

Send back non-final previews of each "sentence" as it is being processed. Defaults to true when in objectMode.

continuous Boolean <optional>
true

set to false to automatically stop the transcription after the first "sentence"

word_confidence Boolean <optional>
false

include confidence scores with results. Defaults to true when in objectMode.

timestamps Boolean <optional>
false

include timestamps with results. Defaults to true when in objectMode.

max_alternatives Number <optional>
1

maximum number of alternative transcriptions to include. Defaults to 3 when in objectMode.

inactivity_timeout Number <optional>
30

how many seconds of silence before automatically closing the stream (even if continuous is true). Use -1 for infinity

readableObjectMode Boolean <optional>
false

emit result objects instead of string Buffers for the data events. Changes several other defaults.

X-WDC-PL-OPT-OUT Number <optional>
0

set to 1 to opt out of allowing Watson to use this request to improve its services

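A sketch exercising the documented options with non-default values; names and defaults mirror the table above, and authentication is again assumed to be handled elsewhere:

  var RecognizeStream = require('./recognize-stream'); // path is an assumption

  var recognizeStream = new RecognizeStream({
    model: 'en-US_BroadbandModel',
    url: 'wss://stream.watsonplatform.net/speech-to-text/api',
    'content-type': 'audio/flac',
    interim_results: true,
    continuous: true,
    word_confidence: true,
    timestamps: true,
    max_alternatives: 3,
    inactivity_timeout: -1,     // never close the stream on silence
    readableObjectMode: true,
    'X-WDC-PL-OPT-OUT': 1       // opt out of having this request improve Watson's services
  });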


Methods

(inner) flowForResults(event)

Listening for 'results' events puts the stream into flowing mode, just like listening for 'data' events.

Parameters:
Name Type Description
event String
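In practice this means that, given a recognizeStream constructed as in the sketches above, attaching a 'results' listener is enough to start results flowing; no additional 'data' listener or resume() call should be needed:

  recognizeStream.on('results', function(results) {
    // deprecated event; prefer objectMode and the 'data' event (see below)
    console.log(JSON.stringify(results, null, 2));
  });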

Events

close

Parameters:
Name Type Description
reasonCode Number
description String

connection-close

Parameters:
Name Type Description
reasonCode Number
description String
Deprecated:
  • Yes

data

Finalized transcript text (emitted when readableObjectMode is not enabled)

Parameters:
Name Type Description
transcript String

data

Object with interim or final results, possibly including confidence scores, alternatives, and word timing.

Parameters:
Name Type Description
data Object

error

Parameters:
Name Type Attributes Description
msg String

custom error message

frame * <optional>

unprocessed frame (should have a .data property with either string or binary data)

err Error <optional>
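A sketch of an error handler following the documented parameters (a custom message first, then the optional unprocessed frame and underlying Error; the exact arguments passed may vary):

  recognizeStream.on('error', function(msg, frame, err) {
    console.error('recognize error:', msg);
    if (err) {
      console.error(err.stack || err);
    }
  });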

results

Object with array of interim or final results, possibly including confidence scores, alternatives, and word timing. May have no results at all for empty audio files.

Parameters:
Name Type Description
results Object
Deprecated:
  • use objectMode and listen for the 'data' event instead

results

Object with interim or final results, possibly including confidence scores, alternatives, and word timing.

Parameters:
Name Type Description
results Object
Deprecated:
  • use objectMode and listen for the 'data' event instead