-
Notifications
You must be signed in to change notification settings - Fork 83
Hotfix race condition in mutex creation/deletion #201
Conversation
177171f
to
e778f2d
Compare
Current coverage is 88.79% (diff: 85.71%)@@ master #201 diff @@
==========================================
Files 1 1
Lines 986 991 +5
Methods 0 0
Messages 0 0
Branches 156 158 +2
==========================================
+ Hits 876 880 +4
Misses 83 83
- Partials 27 28 +1
|
Thanks for not only noticing a bug, but also proposing a patch to fix it! However, the code seems somewhat suspicious to me... Can you reproduce the issue? If so, does it help if you change the code such that the return value of the |
I looks like the issue does not occur in master since 1b3f3c1. But I think it just mitigated the race condition with staying in the lock just a little bit longer by adding For the recommendation of CreateMutex there is an with def __init__(self, mutexName, timeoutMs):
...
self._mutex = windll.kernel32.CreateMutexW(
wintypes.INT(0),
wintypes.INT(0),
mutexName)
self.mutextName= mutexName
self.error = windll.kernel32.GetLastError()
print("<Handle> %s = %s, %s" % (self._mutex, self.mutextName, self.error)) and def acquire(self):
...
errorString = 'Error! WaitForSingleObject returns {result}, last error {error}, {mutex}, {create_mutext_error}'.format(
result=result,
error=windll.kernel32.GetLastError(),
mutex=self.mutextName,
create_mutext_error=self.error
) One can see that the mutex was created with ERROR_ALREADY_EXISTS, as many others.
So we can see that handles to the mutex get reused and are shared across test cases or maybe invocations of clcache. If
closes the last handle 396 to the mutex, then of course the Kernel can rid of the mutex, even if another invocation holds the same handle, it's invalid. My WinAPI experience is limited and I would be glad if someone could take a look, otherwise I would need to brush up here :) So the question that are there for me:
|
I think with child processes we may can avoid this by using def __init__(self, mutexName, timeoutMs):
...
self._mutex= windll.kernel32.CreateMutexW(..)
if windll.kernel32.GetLastError() == 183:
self._mutex = windll.kernel32.DuplicateHandle(..) I see similar things in other projects, e.g. here https://src.chromium.org/viewvc/chrome/trunk/src/sandbox/win/src/sharedmem_ipc_server.cc?r1=252782&r2=252781&pathrev=252782 |
Yes. Mutexes created via
The
clcache uses the latter variant: you can call
That I don't know. |
Quite frankly, it escapes me why Can you reproduce the issue? In that case, we could at least try a couple of things to see whether it helps. |
@hubx Can you still reproduce the issue with the recent clcache 3.3.0 release? As it is, I can't seem to reproduce the bug and I also don't see anything obviously wrong in the code so I can't really proceed with this ticket. |
Sure mutexes should be shared across processes. My point was that different processes use the same handle across processes. If you see my previous comment, I wanted to point that out that handle 396 occurs twice, 392 thrice (#201 (comment)). So I assume actually the first pattern in the quoted documentation is used:
This can be specific of how the integrationtest.py call clcache.py in this case (4 sibling process called after each other). But all in all I'm fine with closing this issue, we can reopen/references this ticket here once we see this issue again. |
@hubx Ah, so in #201 (comment) there are two concurrent processes having the same handle? My interpretation was that one process creates the mutex handle, then closes it, then another process creates the handle -- and simply happens to receive the same handle value because it's free. So you're saying that there are two processes with the same handle at the same time? For what it's worth, the CreateMutex documentation documents the first function argument as
...and in the code, we do self._mutex = windll.kernel32.CreateMutexW(
wintypes.INT(0),
wintypes.INT(0),
self._mutexName) So assuming that |
@frerich I agree with the way you read the documentation. Still, I can produce that handle to mutexes get invalidated on my machine with current master. (now nearly all integration tests are affected). Sorry I never mentioned this explicitly but its what I meant with "last error 6" (= ERROR_INVALID_HANDLE, 6 (0x6), The handle is invalid.). I don't have any other so far, how the handle could get invalidated. I'm not sure how to proceed from here, since me and our CI machine are apparently to only people affected so far :) - The only common thing is that its Windows 8 on both - I can try and see if I can reproduce the issue with Windows 7 or 10. |
I wonder whether this is really just some quirk about mutexes which is not understood and lock files (cf. #82) would be more predictable. |
- Create handles to mutexes shortly before usage
In #650cf8f I adapted the fix to latest master. Also I resolved the issue now by moving |
Thanks, merged this now. I think this is still quite obscure, but I guess this doesn't hurt - and if it helps in your case, let's see how it goes. Thanks for all your effort! |
On my machine TestHeaderChange.testDirect() fails in the second compileAndLink
with ERROR_INVALID_HANDLE
I'm not sure why the mutex gets invalidated in this case. Anyone else experiecing this or. I suspected the security descriptor https://msdn.microsoft.com/en-us/library/windows/desktop/ms682411(v=vs.85).aspx states
That would explain why it happens only on my setup, but then this should happen more regularly?