@DylanRussell
Contributor

Description

We (Google) saw a deadlock when logging.config.dictConfig is called after the OTEL LoggingHandler has been attached to the root logger.

This happened because dictConfig acquires logging._lock and then clears the existing handlers, which shuts down the OTEL LoggingHandler, which in turn calls flush.

flush triggered an export call to our exporter. Deep inside our exporter we spin up a new thread to handle auth, and that thread also tried to acquire logging._lock, resulting in a deadlock.
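The lock pattern described above can be sketched without touching the real logging module. This is a minimal illustration, not our exporter: `config_lock` is a stand-in for logging._lock, the function names are hypothetical, and a timeout replaces the real indefinite wait so the demo terminates instead of hanging.

```python
import threading

# Stand-in for logging._lock; a private lock keeps the demo away from
# the real logging internals.
config_lock = threading.RLock()


def export_blocking(timeout: float) -> bool:
    """Mimic an exporter that does part of its work on a helper thread.

    The helper needs config_lock. Returns True if it got the lock in time.
    """
    results = []

    def helper() -> None:
        got = config_lock.acquire(timeout=timeout)
        results.append(got)
        if got:
            config_lock.release()

    thread = threading.Thread(target=helper)
    thread.start()
    thread.join()  # the caller waits for the helper, like a blocking flush
    return results[0]


def reconfigure(timeout: float = 0.2) -> bool:
    """Mimic dictConfig: flush handlers while still holding the lock."""
    with config_lock:
        return export_blocking(timeout)


blocked_result = reconfigure()       # helper cannot get the lock
free_result = export_blocking(0.2)   # nobody holds the lock
print(blocked_result, free_result)   # False True
```

With no timeout on the helper's acquire, the first call would never return: the caller holds the lock while waiting for a thread that is waiting for the same lock.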

To fix this, I updated LoggingHandler.flush to run force_flush in a separate thread and to return without waiting for it. This should be reasonable because we don't return the result of the force flush anyway, so there is no reason to block there.

This seems to reliably fix the deadlock, but I think it's technically possible for this new thread to spin up and reach the lock before logging.config.dictConfig releases its lock.
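A rough sketch of the non-blocking idea, under the same assumptions as before: `config_lock` stands in for logging._lock and every function name here is hypothetical, not the code in this PR. The worker may well reach the lock while it is still held, but because nothing joins the worker inside the locked region, it simply waits until the lock is released.

```python
import threading

config_lock = threading.RLock()  # stand-in for logging._lock
flush_done = threading.Event()


def force_flush() -> None:
    """Hypothetical export path that needs config_lock on its own thread."""
    with config_lock:
        flush_done.set()


def flush_non_blocking() -> threading.Thread:
    """Start the flush on a worker thread and return without joining it."""
    worker = threading.Thread(target=force_flush)
    worker.start()
    return worker


def reconfigure() -> threading.Thread:
    """Mimic dictConfig: flush handlers while holding the lock."""
    with config_lock:
        worker = flush_non_blocking()
        # flush returned immediately, so the lock is released when this
        # block exits and the worker can then take it.
    return worker


worker = reconfigure()
worker.join(timeout=2.0)  # joined only after the lock has been released
completed = flush_done.is_set() and not worker.is_alive()
print(completed)  # True
```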

Another simple fix is to set flushOnClose to False, so that the OTEL LoggingHandler.flush is not called during shutdown. This seems fine to me as well because we call shutdown on exit anyway. Either of these solutions seems fine.

I also considered making our exporter async, but we don't have support for async exporters in this repo yet.

Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

I've added a unit test to show how the deadlock happens. I don't think that test should actually be submitted, because a deadlock could occur and lock up the test suite.

Does This PR Require a Contrib Repo Change?

  • [ ] Yes. - Link to PR:
  • [x] No.

Checklist:

  • [x] Followed the style guidelines of this project
  • [ ] Changelogs have been updated
  • [x] Unit tests have been added
  • [x] Documentation has been updated

@DylanRussell DylanRussell requested a review from a team as a code owner June 13, 2025 19:57
@xrmx xrmx moved this to Ready for review in @xrmx's Python PR digest Jun 23, 2025
@aabmass
Member

aabmass commented Jun 23, 2025

I think this is good since it fixes the deadlock issue which is separate from auto instrumentation.

But I think this issue only comes up when the Logging handler is added before someone calls dictConfig(), which is probably during auto instrumentation. How do we expect auto instrumentation to behave when the user calls dictConfig()?

@DylanRussell
Contributor Author

Ack. I've added overrides for dictConfig and fileConfig, like the one we already have for basicConfig. Let me know what you think.
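One way such an override could look, as a rough sketch rather than the code in this PR: `preserve_handler` is a hypothetical helper that wraps a logging config function so a given handler is still attached to the root logger afterwards. It only preserves the attachment; dictConfig still shuts down every registered handler while it runs.

```python
import functools
import logging
import logging.config


def preserve_handler(config_fn, handler):
    """Wrap a logging config function so `handler` stays on the root logger."""
    root = logging.getLogger()

    @functools.wraps(config_fn)
    def wrapped(*args, **kwargs):
        was_attached = handler in root.handlers
        if was_attached:
            root.removeHandler(handler)
        try:
            return config_fn(*args, **kwargs)
        finally:
            # Re-attach even if the config function raised.
            if was_attached:
                root.addHandler(handler)

    return wrapped


kept = logging.NullHandler()  # stand-in for the OTEL LoggingHandler
logging.getLogger().addHandler(kept)

original_dict_config = logging.config.dictConfig
logging.config.dictConfig = preserve_handler(original_dict_config, kept)
try:
    logging.config.dictConfig(
        {
            "version": 1,
            "disable_existing_loggers": False,
            "root": {"level": "INFO", "handlers": []},
        }
    )
    still_attached = kept in logging.getLogger().handlers
finally:
    # Undo the patch so the demo leaves the logging module as it found it.
    logging.config.dictConfig = original_dict_config
    logging.getLogger().removeHandler(kept)

print(still_attached)  # True
```

Without the wrapper, the same dictConfig call would leave the root logger without the handler, since a non-incremental config replaces the root logger's handlers.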

@aabmass
Member

aabmass commented Jul 16, 2025

@DylanRussell can you rebase and I can merge?

@DylanRussell
Contributor Author

Alright, I've rebased. Should be all set now.

@aabmass aabmass enabled auto-merge (squash) July 17, 2025 18:42
@aabmass aabmass merged commit 57cb935 into open-telemetry:main Jul 17, 2025
466 of 472 checks passed
@github-project-automation github-project-automation bot moved this from Ready for review to Done in @xrmx's Python PR digest Jul 17, 2025
lukeina2z added a commit to lukeina2z/aws-otel-python-instrumentation that referenced this pull request Oct 28, 2025
…58b0

This PR updates the upstream OpenTelemetry Python dependency to its September 2025 release, upgrading from version 1.33.1/0.54b1 to 1.37.0/0.58b0.

It also resolves several conflicts between the following OTel PRs and existing ADOT patches:

starlette: Remove maximum version constraint
open-telemetry/opentelemetry-python-contrib#3456

Make a BatchProcessor class which both BatchSpanRecordProcessor and BatchLogRecordProcessor can use
open-telemetry/opentelemetry-python#4562

Make exporter timeout encompass retries/backoffs, add jitter to backoffs, cleanup code a bit
open-telemetry/opentelemetry-python#4564

Update BatchSpanProcessor to use new BatchProcessor class
open-telemetry/opentelemetry-python#4580

Fix issue where deadlock can occur over logging._lock
open-telemetry/opentelemetry-python#4636

Tests Performed

tox -e lint
tox -e spellcheck
tox -e 3.9-test-aws-opentelemetry-distro
tox -e 3.10-test-aws-opentelemetry-distro
tox -e 3.11-test-aws-opentelemetry-distro
tox -e 3.12-test-aws-opentelemetry-distro
tox -e 3.13-test-aws-opentelemetry-distro

Smoke/contract tests: ./gradlew appsignals-tests:contract-tests:contractTests
mxiamxia pushed a commit to aws-observability/aws-otel-python-instrumentation that referenced this pull request Oct 30, 2025
…58b0 (#524)

This PR updates the upstream OpenTelemetry Python dependency to its
September 2025 release, upgrading from version 1.33.1/0.54b1 to
1.37.0/0.58b0.

It also resolves several conflicts between the following OTel PRs and
existing ADOT patches:

starlette: Remove maximum version constraint
open-telemetry/opentelemetry-python-contrib#3456

Make a BatchProcessor class which both BatchSpanRecordProcessor and
BatchLogRecordProcessor can use
open-telemetry/opentelemetry-python#4562

Make exporter timeout encompass retries/backoffs, add jitter to
backoffs, cleanup code a bit
open-telemetry/opentelemetry-python#4564

Update BatchSpanProcessor to use new BatchProcessor class 
open-telemetry/opentelemetry-python#4580

Fix issue where deadlock can occur over logging._lock 
open-telemetry/opentelemetry-python#4636

Tests Performed

tox -e lint
tox -e spellcheck
tox -e 3.9-test-aws-opentelemetry-distro
tox -e 3.10-test-aws-opentelemetry-distro
tox -e 3.11-test-aws-opentelemetry-distro
tox -e 3.12-test-aws-opentelemetry-distro
tox -e 3.13-test-aws-opentelemetry-distro

Smoke/contract tests: ./gradlew
appsignals-tests:contract-tests:contractTests

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.