Fix issue where deadlock can occur over logging._lock #4636
I think this is good since it fixes the deadlock issue which is separate from auto instrumentation. But I think this issue only comes up when the Logging handler is added before someone calls
Ack. I've added function overwriting for dictConfig and fileConfig like we have for basicConfig... Let me know what you think
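As a rough illustration of what "function overwriting" for dictConfig could look like, here is a minimal sketch. It is not the code from this PR: `keep_attached` is a hypothetical helper name, and a plain `logging.NullHandler` stands in for the OTEL LoggingHandler. The sketch only covers keeping the handler on the root logger across a config call; it does not by itself change whether the handler is flushed.

```python
import functools
import logging
import logging.config


def keep_attached(config_fn, handler):
    """Hypothetical helper: wrap a logging config function so that `handler`
    is taken off the root logger while it runs and put back afterwards."""

    @functools.wraps(config_fn)
    def wrapper(*args, **kwargs):
        root = logging.getLogger()
        was_attached = handler in root.handlers
        if was_attached:
            root.removeHandler(handler)
        try:
            return config_fn(*args, **kwargs)
        finally:
            # Re-attach even if the config function raised.
            if was_attached:
                root.addHandler(handler)

    return wrapper


if __name__ == "__main__":
    otel_like_handler = logging.NullHandler()  # stand-in, not the real handler
    logging.getLogger().addHandler(otel_like_handler)

    wrapped_dict_config = keep_attached(logging.config.dictConfig, otel_like_handler)
    # Configuring "root" normally removes every handler already on the root logger.
    wrapped_dict_config(
        {
            "version": 1,
            "disable_existing_loggers": False,
            "root": {"level": "WARNING", "handlers": []},
        }
    )
    print(otel_like_handler in logging.getLogger().handlers)
```

Overwriting in the sense of the comment would presumably mean assigning the wrapped function back to `logging.config.dictConfig` (and likewise for `fileConfig`); the sketch calls the wrapper directly to avoid patching the module globally.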
…elemetry-python into fix_deadlock_attempt2
@DylanRussell can you rebase and I can merge?
Alright I've rebased. Should be all set now
…58b0
This PR updates the upstream OpenTelemetry Python dependency to its September 2025 release, upgrading from version 1.33.1/0.54b1 to 1.37.0/0.58b0. It also resolves several conflicts between the following OTel PRs and existing ADOT patches:
- starlette: Remove maximum version constraint open-telemetry/opentelemetry-python-contrib#3456
- Make a BatchProcessor class which both BatchSpanRecordProcessor and BatchLogRecordProcessor can use open-telemetry/opentelemetry-python#4562
- Make exporter timeout encompass retries/backoffs, add jitter to backoffs, cleanup code a bit open-telemetry/opentelemetry-python#4564
- Update BatchSpanProcessor to use new BatchProcessor class open-telemetry/opentelemetry-python#4580
- Fix issue where deadlock can occur over logging._lock open-telemetry/opentelemetry-python#4636
Tests Performed
- tox -e lint
- tox -e spellcheck
- tox -e 3.9-test-aws-opentelemetry-distro
- tox -e 3.10-test-aws-opentelemetry-distro
- tox -e 3.11-test-aws-opentelemetry-distro
- tox -e 3.12-test-aws-opentelemetry-distro
- tox -e 3.13-test-aws-opentelemetry-distro
- Smoke/contract tests: ./gradlew appsignals-tests:contract-tests:contractTests
…58b0 (#524)
This PR updates the upstream OpenTelemetry Python dependency to its September 2025 release, upgrading from version 1.33.1/0.54b1 to 1.37.0/0.58b0. It also resolves several conflicts between the following OTel PRs and existing ADOT patches:
- starlette: Remove maximum version constraint open-telemetry/opentelemetry-python-contrib#3456
- Make a BatchProcessor class which both BatchSpanRecordProcessor and BatchLogRecordProcessor can use open-telemetry/opentelemetry-python#4562
- Make exporter timeout encompass retries/backoffs, add jitter to backoffs, cleanup code a bit open-telemetry/opentelemetry-python#4564
- Update BatchSpanProcessor to use new BatchProcessor class open-telemetry/opentelemetry-python#4580
- Fix issue where deadlock can occur over logging._lock open-telemetry/opentelemetry-python#4636
Tests Performed
- tox -e lint
- tox -e spellcheck
- tox -e 3.9-test-aws-opentelemetry-distro
- tox -e 3.10-test-aws-opentelemetry-distro
- tox -e 3.11-test-aws-opentelemetry-distro
- tox -e 3.12-test-aws-opentelemetry-distro
- tox -e 3.13-test-aws-opentelemetry-distro
- Smoke/contract tests: ./gradlew appsignals-tests:contract-tests:contractTests
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Description
We (Google) saw a deadlock occur when logging.config.dictConfig is called after the OTEL LoggingHandler is attached to the root logger.
This happened because dictConfig acquires logging._lock and then clears the existing handlers, which calls shutdown on the OTEL LoggingHandler, which in turn calls flush. flush triggered an export call to our exporter. Deep inside our exporter we spin up a new thread to handle auth, and that thread also tried to acquire logging._lock, resulting in a deadlock.
To fix this I updated LoggingHandler.flush to execute force_flush in a separate thread and to return without blocking on it. This should be reasonable because we don't return the result of the force flush anyway, so there is no reason to block there.
This seems to reliably fix the deadlock, but I think it's technically possible for this new thread to spin up and reach the lock before logging.config.dictConfig releases its lock.
Another simple fix is to set flushOnClose to False, so that the OTEL LoggingHandler.flush is not called during shutdown. This seems fine to me as well because we call shutdown on exit anyway. Either of these solutions seems fine.
Also considered making our exporter async, but we don't have support for async exporters in this repo yet.
Type of change
How Has This Been Tested?
I've added a unit test to show how the deadlock happens. I don't think that test should actually be submitted because of the chance a deadlock can occur and lock up the test suite.
Does This PR Require a Contrib Repo Change?
Checklist: