-
-
Notifications
You must be signed in to change notification settings - Fork 716
Fix SAPI5 #18300
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix SAPI5 #18300
Conversation
From the logs:
It seems like that a call to @jcsteh Do you have any idea about why this could happen? Is there some way to inspect the C++ code via dump files, or do I have to insert log outputs in the C++ code? |
I'm not super familiar with how the new sapi5 code works. Generally, any synth calls on the main thread should never block; any call that could block should be handled on a background thread. That said, I'm guessing this doesn't normally block? Otherwise, we'd be seeing a lot more problems than this. As to why sync would be hanging, I'm not sure. It suggests the stream hangs without being stopped and never reaches the number of frames we sent, but also never throws an error. I can't fathom how that could happen. This does seem to be unique to SAPI5, so that leads me to wonder what is different about SAPI5. One thing it does that no other drivers currently do is call feed() with null data. The code supposedly supports this, but I wonder whether I didn't account for something. I'll think on that. As for debugging, you can use WinDBG to break in and inspect the stack, etc. or save a minidump for later inspection. However, this can be really tricky for screen reader users, since your screen reader is obviously unusable in that state. I can sometimes manage it by firing up Narrator as well, but in-process injection can make things really unstable even if you try to use another screen reader. If you do manage to get a minidump, I can try to take a look. |
The problem is that I couldn't reproduce this, and this log is from a user's report. I changed the SAPI5 code, so that the audio data is sent to NVDA's code, instead of directly to the speaker via WinMM. SAPI5 automatically creates a new thread to run the voice engine (since it can speak asynchronously), so NVDA will receive the audio data on the new thread. The engine may also generate event notifications, which is usually routed to the main loop, but I chose to use So the code that feeds the audio to the WavePlayer, and the I'm not sure whether WavePlayer is thread-safe enough, or at least safe to call I also think that stucking in
So I would assume that this thread was stuck in the The code in def EndStream(self, streamNum: int, pos: int):
synth = self.synthRef()
# Flush the stream and get the remaining data.
synth.sonicStream.flush()
audioData = synth.sonicStream.readShort()
synth.player.feed(audioData, len(audioData) * 2)
synth.player.idle()
# trigger all untriggered bookmarks
if streamNum in synth._streamBookmarks:
for bookmark in synth._streamBookmarks[streamNum]:
synthIndexReached.notify(synth=synth, index=bookmark)
del synth._streamBookmarks[streamNum]
synth.isSpeaking = False
synthDoneSpeaking.notify(synth=synth) It calls And if the call to I'm not sure if it's possible to let the user get a dump file when NVDA got frozen. Maybe it would be helpful if NVDA can, in addition to dumping the Python call stacks of all threads, also generate a dump file that the user can submit. |
It is designed to support calling feed on one thread and stop on another without the need for locks. That is necessary in order for stop to instantaneously interrupt audio and every synth driver relies on this. It is not safe to call feed on multiple threads at once, but nothing should be doing that.
If the user pauses audio (e.g. with the shift key), you will get stuck in sync. But that should resume as soon as the user stops audio (e.g. any other key press). Basically, sync will block until all sent audio has been played, unless it is stopped from another thread during the call. Obviously, something is going wrong there, but I don't know what, and it's going to be nearly impossible to figure out without being able to reproduce it.
Maybe. We do have code to generate a dump when NVDA crashes. However, generating a dump within the same process can be risky. There can also be a lot of temporary freezes, so we'd want to do this with an advanced setting or something to avoid cluttering up the user's system (and potentially destabilising NVDA) with lots of dumps. |
@jcsteh Since only one utterance can be spoken at a time, will Also please give me some idea about when exactly should |
NVDA will call cancel only if speech is to be interrupted. That could be because the user pressed a key or because of an app change or similar.
Queued. When cancel is called, stop audio and purge the queue.
Ideally, you would call idle when there is no more data for the current utterance and there are no more utterances in the queue. If you can't do that for whatever reason, calling it when there is no more data for the current utterance will be fine.
idle will block until the last utterance is finished. At that point, the synth will be ready with the next utterance, so it will call feed, at which point audio will resume immediately.
Correct. That's why we need idle (which calls sync): for when the last part of an utterance is spoken and there is no more data to speak yet. |
Btw, when I say you need a queue, it's fine if SAPI provides its own queue. You just need a way to purge the queue (or at least ignore certain entries) if cancel is called. |
It's just weird that |
As a test, can you try feeding a small chunk of silence (even just a few bytes) when you would normally call feed with None? That's obviously not practical in the real code, but I wonder if it'll fix the hang. If it does, that at least narrows it down to that case, which should make it easier to debug. That's the only difference I can think of between the way the sapi5 code uses nvwave vs everything else. |
Ah, I think I was just able to reproduce this. I won't have time to look into this for a few days, but if I can find some time, I'll see if I can get a C++ debugger onto it to get a stack trace from WasapiPlayer. |
How did you reproduce that? I'm not a screen reader user actually, so please tell me something I could try. |
Honestly, I was just tabbing around rapidly in NVDA's Speech settings dialog. I suspect it might be one of those tricky situations where I was able to reproduce it once and won't be able to easily reproduce it again, but I'll try when I can. |
What version did you use? The current stable release, or the latest snapshot in this pull request? |
It's an old alpha, so not an entirely valid test. Version: alpha-36837,8c9efe82 (2025.2.0.36837) |
What SAPI5 voice did you use? I still could not easily reproduce the issue by tabbing around or mashing keyboard keys. Not sure whether this has to do with a specific voice, but the issue report says that any SAPI5 voice could do. I really hope that there is some easy way for users to capture a dump file for NVDA at any moment. I cannot imagine what a screen reader user can do when the screen reader becomes frozen, though. Theoretically, there are some tools that can create a dump file of a specific process, but when NVDA is frozen, further operation becomes difficult, as NVDA has mouse and keyboard hooks installed. Even if there's a dump file, you need debug symbols to analyze it, so how can I get the debug symbols or PDB files for a particular version? Or maybe, outputting some log lines inside the C++ code may be helpful. So how can I log things to the NVDA log inside C++ code? |
I was able to reproduce it twice: once with Microsoft David and once with a voice called BlastBay Richard. Now I can't reproduce it again at all despite trying for about 15 minutes. Unfortunately, although I did manage to get a debugger onto it at one point, I was running the wrong debugger (x64 instead of x86) and then I accidentally terminated the process instead of attaching the correct debugger.
It's not easy to implement this, unfortunately.
What I do here is run the debugger as administrator so that NVDA can't hook it. Then I use Narrator to interact with the debugger. It's a very painful process and it only works some of the time, but if I can reproduce the problem enough times, I can usually get what I need.
There used to be a debug symbol server where symbols for all releases and snapshots were uploaded. However, this article hasn't been updated and I don't know where that symbol server is located now or whether it still exists. If you're debugging locally, your source copy will include pdb files and you don't need the symbol server.
LOG_DEBUGWARNING, LOG_ERROR, etc. |
So now one user reported that the build at commit 77f6702 (Wait for cancellation to complete) works for them. Commits after that are mostly comments, docstrings, and small refactorings. |
cc @seanbudd |
@cary-rowen - this is not a risk of API breaking changes, rather that we want sufficient testing - hence the merge early label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes critical deadlock and startup issues in the SAPI5 speech synthesizer driver. The main problem was that SAPI5's audio thread would sometimes wait for WavePlayer.idle()
, causing SpVoice.Speak()
to block and freeze NVDA completely. Additionally, SAPI5 Eloquence voices would fail to load on subsequent NVDA launches.
Key changes include:
- Implementing a dedicated speak thread to handle speech requests asynchronously and prevent deadlocks
- Replacing the legacy audio streaming system with a modern
ISpAudio
implementation for better format negotiation - Refactoring bookmark management to use a simpler queue-based approach
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
File | Description |
---|---|
user_docs/en/changes.md | Added changelog entries for the two fixed issues |
source/synthDrivers/sapi5.py | Complete refactor of audio handling, threading, and event management to fix deadlocks and improve reliability |
source/synthDrivers/sapi4.py | Added missing return type declaration for CoTaskMemAlloc function |
Comments suppressed due to low confidence (1)
source/synthDrivers/sapi5.py:203
- The class name 'SynthDriverAudioStream' is misleading since it now implements ISpAudio, ISpEventSource, and ISpEventSink interfaces, not just audio streaming. Consider renaming to 'SynthDriverAudio' to better reflect its expanded responsibilities.
class SynthDriverAudioStream(COMObject):
Link to issue number:
Fixes #18298. Fixes #18301.
Summary of the issue:
When using SAPI 5 voices, NVDA may be frozen at some point. Seems like the thread was stuck calling
WavePlayer.idle
.After selecting a SAPI5 Eloquence voice, NVDA would fail to load the same voice and fall back to OneCore voices on next launch.
Description of user facing changes:
The issue above should be fixed.
Description of developer facing changes:
synthDrivers.sapi5
are deprecated with no replacement:LP_c_ubyte
LP_c_ulong
LP__ULARGE_INTEGER
SynthDriver.isSpeaking
Description of development approach:
SpVoice.Speak()
waits for SAPI5's audio thread, and if the audio thread waits forWavePlayer.idle()
,SpVoice.Speak()
will also block, causing dead-locks if the WavePlayer is being paused.To avoid this, a dedicated thread
_speakThread
is created to send speak requests to SAPI5 one by one._speakRequests
, is maintained. Whenever the synth wants to speak, it places a speak request in the queue.WavePlayer.idle
. Neither the SAPI5 audio thread nor the main thread will be blocked.cancel
is called,WavePlayer.stop
is called to break out of pausing or idling. The rest of the cancelling work is done in the thread.SynthDriverAudioStream
now implements theISpAudio
interface, which can get the wave format of the voice directly during format negotiation, making it possible to get rid of the legacy audio system entirely.Testing strategy:
A user reported that it ran hours without crashing.
Known issues with pull request:
Text to be spoken is stored in each speak request, but prosody arguments, such as rate and volume, are not. All speak requests will be spoken in the current prosody setting.
A user reported that this caused the NVDA key to stop working sometimes, and NVDA did not start on the logon screen. Those two issues might not have to do with this change.
Code Review Checklist:
@coderabbitai summary