Skip to content
This repository was archived by the owner on Feb 4, 2020. It is now read-only.

Conversation

hubx
Copy link
Contributor

@hubx hubx commented Aug 11, 2016

On my machine TestHeaderChange.testDirect() fails in the second compileAndLink

       with open("version.h", "w") as header:
            header.write("#define VERSION 2")

self._compileAndLink()

with ERROR_INVALID_HANDLE

with cache.lock, cache.statistics as stats:
File "clcache.py", line 211, in enter
self.acquire()
File "clcache.py", line 227, in acquire
raise CacheLockException(errorString)
main.CacheLockException: Error! WaitForSingleObject returns -1, last error 6

I'm not sure why the mutex gets invalidated in this case. Anyone else experiecing this or. I suspected the security descriptor https://msdn.microsoft.com/en-us/library/windows/desktop/ms682411(v=vs.85).aspx states

If lpMutexAttributes is NULL, the mutex gets a default security descriptor. The ACLs in the default security descriptor for a mutex come from the primary or impersonation token of the creator.

That would explain why it happens only on my setup, but then this should happen more regularly?

@hubx hubx force-pushed the mutex-fix branch 3 times, most recently from 177171f to e778f2d Compare August 11, 2016 13:17
@codecov-io
Copy link

codecov-io commented Aug 11, 2016

Current coverage is 88.79% (diff: 85.71%)

Merging #201 into master will decrease coverage by 0.04%

@@             master       #201   diff @@
==========================================
  Files             1          1          
  Lines           986        991     +5   
  Methods           0          0          
  Messages          0          0          
  Branches        156        158     +2   
==========================================
+ Hits            876        880     +4   
  Misses           83         83          
- Partials         27         28     +1   

Powered by Codecov. Last update 76332b3...650cf8f

@frerich
Copy link
Owner

frerich commented Aug 21, 2016

Thanks for not only noticing a bug, but also proposing a patch to fix it! However, the code seems somewhat suspicious to me...

Can you reproduce the issue? If so, does it help if you change the code such that the return value of the CreateMutex call is checked? https://msdn.microsoft.com/en-us/library/windows/desktop/ms682411(v=vs.85).aspx explains that NULL should be returned. I'd be curious to hear what GetLastError returns in that case. Maybe it's something which warrants re-trying.

@hubx
Copy link
Contributor Author

hubx commented Aug 24, 2016

I looks like the issue does not occur in master since 1b3f3c1. But I think it just mitigated the race condition with staying in the lock just a little bit longer by adding sort_keys=True or similar

For the recommendation of CreateMutex there is an assert self._mutex in the __init__ which should prevent self._mutex being NULL or None

with

    def __init__(self, mutexName, timeoutMs):
        ...
        self._mutex = windll.kernel32.CreateMutexW(
            wintypes.INT(0),
            wintypes.INT(0),
            mutexName)
        self.mutextName= mutexName
        self.error = windll.kernel32.GetLastError()
        print("<Handle> %s = %s, %s" % (self._mutex, self.mutextName, self.error))

and

    def acquire(self):
            ...
            errorString = 'Error! WaitForSingleObject returns {result}, last error {error}, {mutex}, {create_mutext_error}'.format(
                result=result,
                error=windll.kernel32.GetLastError(),
                mutex=self.mutextName,
                create_mutext_error=self.error
            )

One can see that the mutex was created with ERROR_ALREADY_EXISTS, as many others.
I see a log entry like

>>>testRecompile
<Handle> 396 = Local\C--Users-hesse-clcache, 183
recompile1.cpp
<Handle> 408 = Local\C--Users-hesse-clcache, 183
recompile1.cpp
.<Handle> 408 = Local\C--Users-hesse-clcache, 183
recompile3.cpp
<Handle> 392 = Local\C--Users-hesse-clcache, 183
recompile3.cpp
.<Handle> 388 = Local\C--Users-hesse-clcache, 183
recompile2.cpp
<Handle> 392 = Local\C--Users-hesse-clcache, 183
recompile2.cpp
.<Handle> 412 = Local\C--Users-hesse-clcache, 183
non-ascii-message-ansi.c
Message containing special chars: ö ä ü ß â
<Handle> 396 = Local\C--Users-hesse-clcache, 183
non-ascii-message-utf16.c
Message containing special chars: ö ä ü ß â
.>>> testdirect
<Handle> 392 = Local\C--Users-hesse-clcache, 183
main.cpp
<Handle> 392 = Local\C--Users-hesse-clcache, 183
Traceback (most recent call last):
  File "C:\Users\hesse\code\clcache\clcache.py", line 1531, in <module>
    sys.exit(main())
  File "C:\Users\hesse\code\clcache\clcache.py", line 1422, in main
    exitCode, compilerStdout, compilerStderr = processCompileRequest(cache, comiler, sys.argv)
  File "C:\Users\hesse\code\clcache\clcache.py", line 1452, in processCompileReuest
    return processDirect(cache, objectFile, compiler, cmdLine, sourceFiles[0])
  File "C:\Users\hesse\code\clcache\clcache.py", line 1510, in processDirect
    compilerResult = postProcessing(compilerResult)
  File "C:\Users\hesse\code\clcache\clcache.py", line 1499, in <lambda>
    cache, objectFile, manifestSection, manifest, manifestHash, includesContentash, compilerResult)
  File "C:\Users\hesse\code\clcache\clcache.py", line 1321, in postprocessHeader
ChangedMiss
    with cache.lock, cache.statistics as stats:
  File "C:\Users\hesse\code\clcache\clcache.py", line 224, in __enter__
    self.acquire()
  File "C:\Users\hesse\code\clcache\clcache.py", line 240, in acquire
    raise CacheLockException(errorString)
__main__.CacheLockException: Error! WaitForSingleObject returns -1, last error 6, Local\C--Users-hesse-clcache, 183

So we can see that handles to the mutex get reused and are shared across test cases or maybe invocations of clcache. If

396 = Local\C--Users-hesse-clcache, 183
recompile1.cpp

closes the last handle 396 to the mutex, then of course the Kernel can rid of the mutex, even if another invocation holds the same handle, it's invalid. My WinAPI experience is limited and I would be glad if someone could take a look, otherwise I would need to brush up here :)

So the question that are there for me:

  • Why have do different test case invocation the same handle on the mutex?
    • Is CacheLock somehow shared?
    • What happens if child processes request the same named handle? Do they get an own handle or the handle of parent or sibling processes? How is this case usually handled?
  • Why does it occur in this scenario, but not other people in real world usages?

@hubx
Copy link
Contributor Author

hubx commented Aug 24, 2016

I think with child processes we may can avoid this by using DuplicateHandle
https://msdn.microsoft.com/en-us/library/windows/desktop/ms724251(v=vs.85).aspx

    def __init__(self, mutexName, timeoutMs):
        ...
        self._mutex= windll.kernel32.CreateMutexW(..)
        if windll.kernel32.GetLastError() == 183:
            self._mutex = windll.kernel32.DuplicateHandle(..)

I see similar things in other projects, e.g. here https://src.chromium.org/viewvc/chrome/trunk/src/sandbox/win/src/sharedmem_ipc_server.cc?r1=252782&r2=252781&pathrev=252782

@frerich
Copy link
Owner

frerich commented Aug 24, 2016

So the question that are there for me:

Why have do different test case invocation the same handle on the mutex?

Is CacheLock somehow shared?

Yes. Mutexes created via CreateMutex are system-wide (i.e. not just local to the current process).

What happens if child processes request the same named handle? Do they get an own handle or the handle of parent or sibling processes?

The CreateMutex documentation explains:

Multiple processes can have handles of the same mutex object, enabling use of the object for interprocess synchronization. The following object-sharing mechanisms are available:

  • A child process created by the CreateProcess function can inherit a handle to a mutex object if the lpMutexAttributes parameter of CreateMutex enabled inheritance. This mechanism works for both named and unnamed mutexes.
  • A process can specify the handle to a mutex object in a call to the DuplicateHandle function to create a duplicate handle that can be used by another process. This mechanism works for both named and unnamed mutexes.
  • A process can specify a named mutex in a call to the OpenMutex or CreateMutex function to retrieve a handle to the mutex object.

clcache uses the latter variant: you can call CreateMutex with the same mutex name from different processes, and get multiple handles to the same mutex object. All but the first invocation will also set the error code to ERROR_ALREADY_EXISTS, but the CreateMutex function succeeds (you can check the error code to test whether you're the first one to create the mutex).

Why does it occur in this scenario, but not other people in real world usages?

That I don't know.

@frerich
Copy link
Owner

frerich commented Aug 24, 2016

Quite frankly, it escapes me why DuplicateHandle would help here...

Can you reproduce the issue? In that case, we could at least try a couple of things to see whether it helps.

@frerich
Copy link
Owner

frerich commented Sep 10, 2016

@hubx Can you still reproduce the issue with the recent clcache 3.3.0 release? As it is, I can't seem to reproduce the bug and I also don't see anything obviously wrong in the code so I can't really proceed with this ticket.

@hubx
Copy link
Contributor Author

hubx commented Sep 27, 2016

and get multiple handles to the same mutex object

Sure mutexes should be shared across processes. My point was that different processes use the same handle across processes. If you see my previous comment, I wanted to point that out that handle 396 occurs twice, 392 thrice (#201 (comment)). So I assume actually the first pattern in the quoted documentation is used:

A child process created by the CreateProcess function can inherit a handle to a mutex object if the lpMutexAttributes parameter of CreateMutex enabled inheritance. This mechanism works for both named and unnamed mutexes.

This can be specific of how the integrationtest.py call clcache.py in this case (4 sibling process called after each other). But all in all I'm fine with closing this issue, we can reopen/references this ticket here once we see this issue again.

@frerich
Copy link
Owner

frerich commented Sep 28, 2016

@hubx Ah, so in #201 (comment) there are two concurrent processes having the same handle? My interpretation was that one process creates the mutex handle, then closes it, then another process creates the handle -- and simply happens to receive the same handle value because it's free. So you're saying that there are two processes with the same handle at the same time?

For what it's worth, the CreateMutex documentation documents the first function argument as

A pointer to a SECURITY_ATTRIBUTES structure. If this parameter is NULL, the handle cannot be inherited by child processes.

...and in the code, we do

        self._mutex = windll.kernel32.CreateMutexW(
            wintypes.INT(0),
            wintypes.INT(0),
            self._mutexName)

So assuming that wintypes.INT(0) is an appropriate way to pass NULL from Python, I'd assume that the mutex handle is not inherited and thus there are no two processes using the same handle. Maybe I'm misreading this?

@hubx
Copy link
Contributor Author

hubx commented Oct 20, 2016

@frerich I agree with the way you read the documentation. Still, I can produce that handle to mutexes get invalidated on my machine with current master. (now nearly all integration tests are affected). Sorry I never mentioned this explicitly but its what I meant with "last error 6" (= ERROR_INVALID_HANDLE, 6 (0x6), The handle is invalid.). I don't have any other so far, how the handle could get invalidated. I'm not sure how to proceed from here, since me and our CI machine are apparently to only people affected so far :) - The only common thing is that its Windows 8 on both - I can try and see if I can reproduce the issue with Windows 7 or 10.

@frerich
Copy link
Owner

frerich commented Oct 20, 2016

I wonder whether this is really just some quirk about mutexes which is not understood and lock files (cf. #82) would be more predictable.

- Create handles to mutexes shortly before usage
@hubx
Copy link
Contributor Author

hubx commented Oct 21, 2016

In #650cf8f I adapted the fix to latest master. Also I resolved the issue now by moving CreateMutexW from __init__ to __enter__ shortly before acquiring the lock. Similar to what 1 and 2 do. However I still don't understand how the handle to the mutex could get invalidated between __init__ and __enter__ before. This has nice side effect that mutexes only get created if a with Cachlock() contstruct is actually used.

@frerich frerich merged commit 041df7c into frerich:master Oct 25, 2016
@frerich
Copy link
Owner

frerich commented Oct 25, 2016

Thanks, merged this now. I think this is still quite obscure, but I guess this doesn't hurt - and if it helps in your case, let's see how it goes.

Thanks for all your effort!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants