Make SAPI5 & MSSP voices use WavePlayer (WASAPI) (nvaccess#17592)
Closes nvaccess#13284
Summary of the issue:
Currently, SAPI5 and MSSP voices use their own audio output mechanisms instead of the WavePlayer (WASAPI) inside NVDA.
According to my test results, this can make them less responsive than eSpeak and OneCore voices, which use the WavePlayer, and than other screen readers using SAPI5 voices.
It also gives NVDA less control over audio output. For example, the audio ducking logic inside WavePlayer cannot be applied to SAPI5 voices, so additional code is required to compensate for this.
Description of user facing changes
SAPI5 and MSSP voices will be changed to use the WavePlayer, which may make them more responsive (i.e. have less delay).
According to my test results, this can reduce the delay by at least 50 ms.
Leading silence is not trimmed yet; if we do that as well, we can expect the delay to be reduced even further.
Description of development approach
Instead of setting `self.tts.audioOutput` to a real output device, do the following:
- Create an implementation class `SynthDriverAudioStream` that implements the COM interface `IStream`, which can be used to receive the audio data streamed out by the voices.
- Use an `SpCustomStream` object to wrap `SynthDriverAudioStream` and provide the wave format.
- Assign the `SpCustomStream` object to `self.tts.AudioOutputStream`, so SAPI will output audio to this stream instead.
Each time an audio chunk needs to be streamed in, `ISequentialStream_RemoteWrite` will be called, and we just feed the audio to the player. `IStream_RemoteSeek` can also be called when SAPI wants to know the current byte position of the stream (`dlibMove` should be zero and `dwOrigin` should be `STREAM_SEEK_CUR` in this case), but it is not used to actually "seek" to a new position. `IStream_Commit` can be called by MSSP voices to "flush" the audio data, where we do nothing. Other methods are left unimplemented, as they are not used when acting as an audio output stream. A sketch of this stream follows.
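Here is a minimal sketch of what this can look like with comtypes. This is not the exact PR code: `player` stands in for an `nvwave.WavePlayer` created elsewhere, and the `IStream` definition and the `LARGE_INTEGER` field layout are assumed to come from the comtypes-generated `SpeechLib` module.

```python
# A minimal sketch, not the actual PR code. Assumes comtypes is installed and
# that `player` (an nvwave.WavePlayer-like object with a feed(bytes) method)
# has been created elsewhere.
import ctypes
import comtypes.client
from comtypes import COMObject
from comtypes.hresult import S_OK, E_NOTIMPL, E_POINTER

tts = comtypes.client.CreateObject("SAPI.SpVoice")  # also generates comtypes.gen.SpeechLib
from comtypes.gen import SpeechLib

STREAM_SEEK_CUR = 1


class SynthDriverAudioStream(COMObject):
    """Receives audio data streamed out by a SAPI5/MSSP voice."""

    _com_interfaces_ = [SpeechLib.IStream]  # assumption: IStream is exposed here

    def __init__(self, player):
        super().__init__()
        self._player = player  # assumed: object with a feed(bytes) method
        self._writtenBytes = 0

    def ISequentialStream_RemoteWrite(self, this, pv, cb, pcbWritten):
        # Called by SAPI with each audio chunk; feed it straight to the player.
        self._player.feed(ctypes.string_at(pv, cb))
        self._writtenBytes += cb
        if pcbWritten:
            pcbWritten[0] = cb
        return S_OK

    def IStream_RemoteSeek(self, this, dlibMove, dwOrigin, plibNewPosition):
        # SAPI only queries the current byte position (dlibMove == 0,
        # dwOrigin == STREAM_SEEK_CUR); it never actually seeks.
        # Assumption: dlibMove/plibNewPosition use structs with a QuadPart field.
        if dwOrigin != STREAM_SEEK_CUR or dlibMove.QuadPart != 0:
            return E_NOTIMPL
        if not plibNewPosition:
            return E_POINTER
        plibNewPosition.contents.QuadPart = self._writtenBytes
        return S_OK

    def IStream_Commit(self, this, grfCommitFlags):
        # MSSP voices call this to "flush" audio; nothing needs to be done.
        return S_OK


# Wiring: wrap the stream in an SpCustomStream and hand it to the voice.
audioStream = SynthDriverAudioStream(player)
fmt = comtypes.client.CreateObject("SAPI.SpAudioFormat")
fmt.Type = SpeechLib.SAFT22kHz16BitMono  # for example; match the player's format
customStream = comtypes.client.CreateObject("SAPI.SpCustomStream")
customStream.BaseStream = audioStream
customStream.Format = fmt
tts.AudioOutputStream = customStream
```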
Previously, `comtypes.client.GetEvents` was used to get the event notifications, but those notifications are routed to the main thread via the main message loop. According to the documentation of `ISpNotifySource`:
> Note that both variations of callbacks as well as the window message notification require a window message pump to run on the thread that initialized the notification source. Callback will only be called as the result of window message processing, and will always be called on the same thread that initialized the notify source. However, using Win32 events for SAPI event notification does not require a window message pump.
Because the audio data is generated and sent via `IStream` on a dedicated thread, receiving events on the main thread can make synchronizing events and audio difficult.
So here `SapiSink` is changed to become an implementation of `ISpNotifySink`. Notifications received via `ISpNotifySink` are "free-threaded": they are sent on the original thread instead of being routed to the main thread.
To connect the sink, use `ISpNotifySource::SetNotifySink`.
To get the actual event that triggered the notification, use `ISpEventSource::GetEvents`. Events can contain pointers to objects or memory, so they need to be freed manually. A sketch of this approach follows.
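A rough sketch of that event plumbing, with the same comtypes assumptions as above. `handleSapiEvent` is a hypothetical dispatcher, `SPEVENT` is assumed to be exposed by the generated module, and the generated `GetEvents` wrapper may handle out parameters differently than shown here.

```python
# A rough sketch, not the actual PR code. Assumes the low-level SAPI
# interfaces (ISpNotifySink, ISpNotifySource, ISpEventSource) and the SPEVENT
# struct are exposed by the comtypes-generated SpeechLib module.
import ctypes
import comtypes.client
from comtypes import COMObject
from comtypes.hresult import S_OK

tts = comtypes.client.CreateObject("SAPI.SpVoice")  # also generates comtypes.gen.SpeechLib
from comtypes.gen import SpeechLib


class SapiSink(COMObject):
    """Receives SAPI notifications on whatever thread SAPI fires them from."""

    _com_interfaces_ = [SpeechLib.ISpNotifySink]

    def __init__(self, eventSource):
        super().__init__()
        self._eventSource = eventSource  # the voice's ISpEventSource

    def ISpNotifySink_Notify(self, this):
        # Drain all pending events; this shows the raw call shape.
        event = SpeechLib.SPEVENT()
        fetched = ctypes.c_ulong()
        while True:
            self._eventSource.GetEvents(1, ctypes.byref(event), ctypes.byref(fetched))
            if fetched.value == 0:
                break
            handleSapiEvent(event)  # hypothetical dispatcher for bookmark/word events
            # SPEVENT.lParam can hold a COM object or CoTaskMem block,
            # depending on elParamType; it must be released/freed here,
            # otherwise it leaks.
        return S_OK


# Connecting the sink, replacing the old comtypes.client.GetEvents approach.
# Keep a reference to the sink for as long as it is connected.
eventSource = tts.QueryInterface(SpeechLib.ISpEventSource)
sink = SapiSink(eventSource)
tts.QueryInterface(SpeechLib.ISpNotifySource).SetNotifySink(sink)
```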
Finally, all audio-ducking-related code is removed. Now WavePlayer should be able to handle audio ducking when using SAPI5 and MSSP voices.
Changes to `user_docs/en/changes.md` (5 additions, 0 deletions):
```diff
@@ -52,6 +52,8 @@ To use this feature, "allow NVDA to control the volume of other applications" mu
 Prefix matching on command line flags, e.g. using `--di` for `--disable-addons` is no longer supported. (#11644, @CyrilleB79)
 * The keyboard settings for "Speak typed characters" and "Speak typed words" now have three options: Off, Always, and Only in edit controls. (#17505, @Cary-rowen)
   * By default, "Speak typed characters" is now set to "Only in edit controls".
+* Microsoft Speech API version 5 and Microsoft Speech Platform voices now use WASAPI for audio output, which may improve the responsiveness of those voices. (#13284, @gexgd0419)
+
 
 ### Bug Fixes
 
@@ -171,6 +173,9 @@ As the NVDA update check URL is now configurable directly within NVDA, no replac
 * `updateCheck.UpdateAskInstallDialog` no longer automatically performs an action when the update or postpone buttons are pressed.
   Instead, a `callback` property has been added, which returns a function that performs the appropriate action when called with the return value from the dialog. (#17582)
 * Dialogs opened with `gui.runScriptModalDialog` are now recognised as modal by NVDA. (#17582)
+* Because SAPI5 voices now use `nvwave.WavePlayer` to output audio: (#17592, @gexgd0419)
+  * `synthDrivers.sapi5.SPAudioState` has been removed.
+  * `synthDrivers.sapi5.SynthDriver.ttsAudioStream` has been removed.
 * Changed keyboard typing echo configuration from boolean to integer values. (#17505, @Cary-rowen)
   * `config.conf["keyboard"]["speakTypedCharacters"]` and `config.conf["keyboard"]["speakTypedWords"]` now use integer values.
   * Added `TypingEcho` enum in `config.configFlags` to represent these modes, 0=Off, 1=Only in edit controls, 2=Always.
```