

The two basic types of SAPI engines are text-to-speech (TTS) systems and speech recognizers. TTS systems synthesize text strings and files into spoken audio using synthetic voices. Speech recognizers convert human spoken audio into readable text strings and files.

Applications can control text-to-speech (TTS) using the ISpVoice Component Object Model (COM) interface. Once an application has created an ISpVoice object (see Text-to-Speech Tutorial), the application only needs to call ISpVoice::Speak to generate speech output from some text data. In addition, the ISpVoice interface provides several methods for changing voice and synthesis properties, such as speaking rate (ISpVoice::SetRate), output volume (ISpVoice::SetVolume), and the current speaking voice (ISpVoice::SetVoice).

Special SAPI controls can also be inserted along with the input text to change real-time synthesis properties like voice, pitch, word emphasis, speaking rate and volume. This synthesis markup, using standard XML format, is a simple but powerful way to customize the TTS speech, independent of the specific engine or voice currently in use. See the XML TTS Tutorial for more details.

The ISpVoice::Speak method can operate either synchronously (returning only when it has completely finished speaking) or asynchronously (returning immediately and speaking as a background process). When speaking asynchronously (SPF_ASYNC), real-time status information such as speaking state and current text location can be polled using ISpVoice::GetStatus. Also while speaking asynchronously, new text can be spoken either by immediately interrupting the current output (SPF_PURGEBEFORESPEAK) or by automatically appending the new text to the end of the current output.

In addition to the ISpVoice interface, SAPI also provides many utility COM interfaces for more advanced TTS applications.

SAPI communicates with applications by sending events using standard callback mechanisms (window message, callback procedure, or Win32 event). For TTS, events are mostly used for synchronizing to the output speech. Applications can sync to real-time actions as they occur, such as word boundaries, phoneme or viseme (mouth animation) boundaries, or application custom bookmarks. Applications can initialize and handle these real-time events using ISpNotifySource, ISpNotifySink, ISpNotifyTranslator, ISpEventSink, ISpEventSource, and ISpNotifyCallback.
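The speak flags and XML markup described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python with the pywin32 package to drive the `SAPI.SpVoice` automation object instead of the C++ ISpVoice interface, the flag values are taken from the SAPI 5 `SPEAKFLAGS` enumeration, the `rate`, `volume` and `emph` tags are the SAPI 5 XML TTS tags, and the helper names `sapi_markup`, `speak_flags` and `speak` are hypothetical.

```python
from xml.sax.saxutils import escape

# Values of the SAPI 5 speak flags named in the text (SPEAKFLAGS enumeration).
SPF_DEFAULT = 0
SPF_ASYNC = 1
SPF_PURGEBEFORESPEAK = 2
SPF_IS_XML = 8


def sapi_markup(text, rate=None, volume=None, emphasize=False):
    """Wrap plain text in SAPI 5 XML TTS tags (hypothetical helper)."""
    body = escape(text)  # escape &, < and > so the text cannot break the markup
    if emphasize:
        body = f"<emph>{body}</emph>"
    if rate is not None:
        if not -10 <= rate <= 10:
            raise ValueError("rate must be between -10 and 10")
        body = f'<rate absspeed="{rate}">{body}</rate>'
    if volume is not None:
        if not 0 <= volume <= 100:
            raise ValueError("volume must be between 0 and 100")
        body = f'<volume level="{volume}">{body}</volume>'
    return body


def speak_flags(asynchronous=False, interrupt=False, is_xml=False):
    """Combine the speak flags into the bitmask passed to Speak."""
    flags = SPF_DEFAULT
    if asynchronous:
        flags |= SPF_ASYNC
    if interrupt:
        flags |= SPF_PURGEBEFORESPEAK
    if is_xml:
        flags |= SPF_IS_XML
    return flags


def speak(text, **options):
    """Speak text through SAPI.SpVoice; assumes Windows with pywin32 installed."""
    import win32com.client  # imported lazily so the helpers above work anywhere

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    voice.Speak(text, speak_flags(**options))
    return voice  # keep a reference so asynchronous speech is not cut short
```

A call such as `speak(sapi_markup("Hello", rate=-2), asynchronous=True, is_xml=True)` would return immediately. The caller then probably needs to keep the returned object alive, for example by calling its `WaitUntilDone` method, before the script exits.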

Microsoft Speech API 5.3 Speech API Overview

The SAPI application programming interface (API) dramatically reduces the code overhead required for an application to use speech recognition and text-to-speech, making speech technology more accessible and robust for a wide range of applications. The SAPI API provides a high-level interface between an application and speech engines. SAPI implements all the low-level details needed to control and manage the real-time operations of various speech engines.

I would like to be able to select an alternative voice for my Text-To-Speech output. I am using the ComObject SAPI.SPVoice but I am finding that I cannot change the actual voice used. (BTW - I am using SAPI.SPVoice because it works in both PowerShell Core and PowerShell Desktop on Windows 10.)

I suspect that the set by ref part is the problem, which may be related to the following problem, quoted from this GitHub issue:

[...] .NET Core (the call in PowerShell Core is a stub method).
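For the voice-selection goal in the question above, here is a rough sketch of choosing an installed voice by name on the same `SAPI.SpVoice` COM object. It rests on several assumptions: it is written in Python with pywin32 rather than PowerShell, so it does not address the by-reference behaviour suspected in PowerShell Core. The helper names `pick_voice_index` and `select_voice` are hypothetical, and the assignment to `Voice` is commonly reported to work through pywin32 but has not been verified here.

```python
def pick_voice_index(descriptions, wanted):
    """Return the index of the first description containing `wanted`
    (case-insensitive), or None when no description matches."""
    needle = wanted.casefold()
    for index, description in enumerate(descriptions):
        if needle in description.casefold():
            return index
    return None


def select_voice(wanted):
    """Create a SAPI.SpVoice object and switch it to a matching voice.

    Assumes Windows with the pywin32 package installed.
    """
    import win32com.client  # imported lazily so pick_voice_index works anywhere

    voice = win32com.client.Dispatch("SAPI.SpVoice")
    tokens = voice.GetVoices()  # collection of installed voice tokens
    descriptions = [tokens.Item(i).GetDescription() for i in range(tokens.Count)]
    index = pick_voice_index(descriptions, wanted)
    if index is None:
        raise LookupError(f"no installed voice matches {wanted!r}: {descriptions}")
    voice.Voice = tokens.Item(index)  # assumption: pywin32 performs the by-reference set
    return voice
```

With sample descriptions such as `"Microsoft David Desktop"` and `"Microsoft Zira Desktop"`, calling `select_voice("zira")` would pick the second voice. The names actually installed vary by machine, so printing the descriptions first is a reasonable way to check.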
