"GoogleSpeech" (Service Connection)
Connecting & Authenticating
Requests
Synthesize Audio from Text
"ListVoices" — returns a list of available voice styles
| Language | All | restrict the query to voices able to synthesize a given language |
"Synthesize" — returns speech synthesized from text
| "Input" | (required) | text to synthesize |
| "Voice" | Automatic | name of the synthesis voice |
| Language | Automatic | language of the synthesis voice |
| "Pitch" | Automatic | semitone deviation from the native voice pitch |
| "Rate" | Automatic | factor by which to change the native voice speed |
| AudioEncoding | Automatic | output audio encoding |
| GeneratedAssetLocation | $GeneratedAssetLocation | storage location of the synthesized audio |
| GeneratedAssetFormat | Automatic | output format of the synthesized audio |
| "EffectsProfileID" | Automatic | post-processing effect name applied to speech |
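To make the units of "Pitch" and "Rate" concrete, here is a minimal sketch of the arithmetic behind them. It assumes the usual equal-temperament convention, in which one semitone corresponds to a frequency ratio of 2^(1/12), and it assumes that a speed factor scales the duration inversely. The helper names semitoneRatio and scaledDuration are hypothetical and not part of the service:

```wolfram
(* Hypothetical helper: frequency ratio for a shift of n semitones,
   assuming equal temperament *)
semitoneRatio[n_] := 2^(n/12)

(* Hypothetical helper: approximate duration after changing the speed
   by the factor rate, assuming duration scales inversely with speed *)
scaledDuration[duration_, rate_] := duration/rate

N[semitoneRatio[-10]]     (* about 0.561 *)
semitoneRatio[12]         (* 2, i.e. one octave up *)
scaledDuration[3., 1.5]   (* 2. *)
```

Under these assumptions, "Pitch" -> -10 lowers the voice to a little over half its native frequency, and "Rate" -> 1.5 shortens a three-second utterance to roughly two seconds.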
Recognize Text from Audio
"Recognize" — returns text transcribed from audio
| "Input" | (required) | audio to transcribe |
| Language | "English" | language(s) of the contained speech |
| "ChannelRecognition" | False | whether to transcribe each channel separately |
| MaxItems | 1 | maximum number of hypotheses to return |
| "ProfanityFilter" | False | whether to attempt to replace profanities |
| "SpeechContexts" | {} | phrase hints to assist transcription |
| "WordTimeOffsets" | True | return word time offsets with the result |
| "WordConfidence" | False | return word confidence values with the result |
| "Punctuation" | True | include punctuation in the transcription |
| "SpokenPunctuation" | False | replace spoken punctuation with ASCII character |
| "SpokenEmojis" | False | replace spoken emojis with Unicode character |
| "SpeakerDiarization" | False | tag distinct speakers in the result |
| "Model" | Automatic | specify a model to use for the request |
| MetaInformation | None | metadata describing the input audio |
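One way to work with the many "Recognize" parameters is to keep the defaults in an Association and override only what a given request needs. The following is a minimal sketch: the default values are copied from the table above, while the names recognizeDefaults and recognizeRequest are hypothetical helpers, not part of the service. It is an assumption that passing a default value explicitly behaves the same as omitting it:

```wolfram
(* A subset of the defaults listed in the table above *)
recognizeDefaults = <|
  "ChannelRecognition" -> False,
  MaxItems -> 1,
  "ProfanityFilter" -> False,
  "WordTimeOffsets" -> True,
  "WordConfidence" -> False,
  "Punctuation" -> True
|>;

(* Hypothetical helper: merge overrides into the defaults (later keys win)
   and prepend the required "Input" parameter *)
recognizeRequest[audio_, overrides_Association : <||>] :=
  Prepend[Normal[Join[recognizeDefaults, overrides]], "Input" -> audio]
```

With an audio object a and valid credentials, a request could then be written as ServiceExecute["GoogleSpeech", "Recognize", recognizeRequest[a, <|MaxItems -> 3, "WordConfidence" -> True|>]].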
Parameter Details
Settings for "SpeechContexts" include:
| str -> w | give weight w to the string str |
| {str1 -> w1, str2 -> w2, …} | give weight wi to the string stri |
Settings for "EffectsProfileID" include:
| "large-automotive-class-device" | optimized for car speakers |
| "small-bluetooth-speaker-class-device" | optimized for small home speakers |
Settings for "Model" include:
| "latest_long" | optimized for long-form content |
| "latest_short" | optimized for short-form content |
| "command_and_search" | optimized for short queries |
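The values listed above are passed as ordinary entries in the parameter list of a request. The following is a minimal sketch, assuming a working "GoogleSpeech" connection with valid credentials. The helper names recognitionParameters and carSpeech are hypothetical, and the phrase and weight used in the usage note are illustrative values only:

```wolfram
(* Hypothetical helper: build a "Recognize" parameter list that passes
   phrase hints of the form str -> w and selects the short-query model *)
recognitionParameters[audio_, hints : {___Rule}] := {
  "Input" -> audio,
  "SpeechContexts" -> hints,
  "Model" -> "command_and_search"   (* optimized for short queries *)
}

(* Hypothetical helper: synthesize text with the post-processing profile
   for car speakers; calling it contacts the service *)
carSpeech[text_String] := ServiceExecute["GoogleSpeech", "Synthesize", {
  "Input" -> text,
  "EffectsProfileID" -> "large-automotive-class-device"
}]
```

For an audio object a, a request might look like ServiceExecute["GoogleSpeech", "Recognize", recognitionParameters[a, {"world" -> 10}]]. The weight 10 is a placeholder; the range of meaningful weights is not specified here.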
Examples
Basic Examples (1)
Connect to the Google speech service:
google = ServiceConnect["GoogleSpeech"]
Synthesize speech from text:
ServiceExecute[google, "Synthesize", {"Input" -> "Hello, world!"}]
Transcribe speech from an audio recording:
a = ExampleData[{"Audio", "MaleVoice"}, "Audio"];
ServiceExecute[google, "Recognize", {"Input" -> a, Language -> "en-GB"}]
Scope (5)
Speech Synthesis (3)
Connect to the service and synthesize speech from text:
google = ServiceConnect["GoogleSpeech"]
ServiceExecute[google, "Synthesize", {"Input" -> "Hello, world!"}]
Synthesize text in a different language. Setting Language to Automatic infers the language from the input text; alternatively, a particular language can be specified. The service attempts to select a voice style for the requested language:
ServiceExecute[google, "Synthesize", {"Input" -> "Hola mundo!", Language -> Automatic}]
ServiceExecute[google, "Synthesize", {"Input" -> "Hola mundo!", Language -> Entity["Language", "Spanish::77gfp"]}]
List the available voice styles:
voices = ServiceExecute["GoogleSpeech", "ListVoices"];
voices // Dataset
Synthesize speech using a particular voice:
vox = SelectFirst[voices, #Gender === "Female" && MemberQ[#Language, "Polish"] &]
ServiceExecute["GoogleSpeech", "Synthesize", {"Input" -> "Witam świecie!", "Voice" -> vox["Name"]}]
Make the speech faster and lower in pitch:
ServiceExecute["GoogleSpeech", "Synthesize", {"Input" -> "Hello, world!", "Pitch" -> -10, "Rate" -> 1.5}]
Speech Recognition (2)
Transcribe text from audio containing speech:
a = ExampleData[{"Audio", "MaleVoice"}, "Audio"]
By default, everything from the API response is returned, including information about recognized words:
ServiceExecute["GoogleSpeech", "Recognize", {"Input" -> a, Language -> "en-GB"}]
Return multiple guesses of the transcription:
ServiceExecute["GoogleSpeech", "Recognize", {"Input" -> a, Language -> "en-GB", MaxItems -> 4, "ReturnProperties" -> "transcript"}]
Separate different speakers from a recording:
in = AudioJoin[AudioChannelMix[#, 1] & /@ {ExampleData[{"Audio", "MaleVoice"}], AudioTrim[ExampleData[{"Audio", "FemaleVoice"}], 2.14]}] // Normal
Specify the minimum and maximum number of speakers:
res = ServiceExecute["GoogleSpeech", "Recognize", {"Input" -> in, "SpeakerDiarization" -> {2, 2}, "WordConfidence" -> False}]
Display labeled words in a Dataset. The API currently returns speaker labels in the second result:
Dataset[Lookup[Lookup[Lookup[res, "results"][[2]], "alternatives"][[1]], "words"]]
See Also
ServiceExecute ▪ ServiceConnect ▪ SpeechRecognize ▪ SpeechSynthesize ▪ VoiceStyleData
Service Connections: OpenAI