
🌿 Fix flaky tests and improve test utilities #443


Merged: 3 commits into main on Jun 22, 2025

Conversation

@Milly Milly (Contributor) commented Jun 21, 2025

Summary

This PR addresses flaky tests in the denops server test suite by improving timing, error handling, and test utilities.

Changes

  • Refactored wait helper:
    • Replaced custom implementation with standard async utilities for better reliability
  • Improved shared server utility:
    • Removed unnecessary try-catch blocks that could hide errors
    • Added clearer error messages when server fails to start
  • Fixed flaky server tests:
    • Added proper async handling for concurrent server operations
    • Increased timeouts for server status checks
    • Improved error handling and assertion messages
    • Added explicit wait conditions to ensure operations complete before assertions

Test plan

  • All tests pass locally
  • No flaky failures observed in multiple test runs
  • CI tests pass

Related issues

Fixes intermittent test failures in CI, particularly in server start/stop/status tests.

Summary by CodeRabbit

  • Tests
    • Improved reliability and clarity of several test cases by introducing explicit delays, asynchronous handling, and retry logic to address flakiness.
    • Updated test step names to indicate potential flakiness.
    • Enhanced test setup for status checks before readiness events.
    • Refined error messaging and resource management in test utilities.
  • Refactor
    • Simplified internal logic in utility functions for aborting operations and waiting with timeouts, leveraging built-in utilities for improved maintainability.

Milly added 3 commits June 21, 2025 22:52

  • Refactor wait helper: replace manual timeout/interval handling with @std/async's abortable and delay functions. Using AbortSignal.timeout prevents resource leaks compared to setTimeout.
  • Improve shared server utility: remove the unnecessary try-catch block in the abort function and clarify the error message when the server fails to return an address.
  • Fix flaky server tests:
    • Add proper async handling for concurrent server start calls
    • Increase timeouts for server status checks to reduce flakiness
    • Improve error handling and assertion messages for better debugging
    • Add explicit wait conditions to ensure operations complete before assertions
coderabbitai bot commented Jun 21, 2025

Walkthrough

This update refactors and stabilizes several server-related test cases by introducing explicit asynchronous handling, delays, and retry logic. It also simplifies utility functions for waiting and aborting, clarifies error messages, and marks certain flaky test steps. No changes are made to exported function signatures.

Changes

  • tests/denops/runtime/functions/server/close_test.ts — Added explicit delay before status check; marked flaky steps in test names.
  • tests/denops/runtime/functions/server/start_test.ts — Refactored to handle third server start asynchronously with error capture and completion flag.
  • tests/denops/runtime/functions/server/status_test.ts — Introduced Vim function to capture server status before ready event; improved flakiness handling.
  • tests/denops/testutil/shared_server.ts — Simplified abort logic, clarified error messages, made port parsing explicit.
  • tests/denops/testutil/shared_server_test.ts — Renamed and refactored timeout rejection test to use retry logic and resource disposal.
  • tests/denops/testutil/wait.ts — Refactored to use abortable polling and delay utilities, simplifying timeout/interval handling.

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Case
    participant Vim as Vimscript Function
    participant Server as Denops Server

    Test->>Vim: Start server (async)
    Vim->>Server: denops#server#start()
    Server-->>Vim: Starting/Preparing state
    Vim-->>Test: Set completion flag
    Test->>Vim: Check server status before ready event
    Vim->>Server: denops#server#status()
    Server-->>Vim: 'preparing'
    Vim-->>Test: Return status

Possibly related PRs

  • vim-denops/denops.vim#405: Modifies and organizes the same server-related test files, focusing on asynchronous handling and flakiness.
  • vim-denops/denops.vim#378: Implements asynchronous server start/stop, close timeout, and new server status states, which are directly tested by these changes.

Suggested labels

enhancement

Suggested reviewers

  • Shougo

Poem

In the warren where servers wake,
Tests now pause, for flakiness' sake.
With timers, retries, and careful delay,
The rabbit ensures things work the right way.
Refactored waits, errors more clear—
Fluffy code, less flaky fear!
🐇✨


🔇 Additional comments (12)
tests/denops/testutil/shared_server.ts (2)

84-86: LGTM: Simplified abort logic.

Removing the try-catch around aborter.abort(reason) simplifies the code. The AbortController's abort method is safe to call multiple times, so the error handling was likely unnecessary.


95-101: LGTM: Improved error messaging and parsing clarity.

The changes enhance code readability:

  • More specific assertion error message when server doesn't return an address
  • Cleaner destructuring with renamed portStr variable
  • Explicit port parsing into a dedicated variable

These changes make the parsing logic more explicit and error messages more helpful for debugging.
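As a rough illustration of the parsing style this comment describes, the sketch below extracts a port from a server address string. The function name, address format, and variable names are assumptions for illustration, not the actual shared_server.ts code:

```typescript
// Hypothetical sketch: parse the port out of a server's listen address.
// The address format and all names here are illustrative.
function parsePort(addr: string): number {
  const match = addr.match(/:(\d+)$/);
  if (!match) {
    // A specific assertion message helps debugging when the server
    // fails to return a usable address.
    throw new Error(`Server did not return a valid listen address: ${addr}`);
  }
  // Cleaner destructuring into a dedicated `portStr`, then explicit parsing.
  const [, portStr] = match;
  const port = Number.parseInt(portStr, 10);
  return port;
}
```

Keeping the string-to-number step in its own named variable makes the intent obvious at a glance and gives a natural place to validate the parse if needed.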

tests/denops/runtime/functions/server/close_test.ts (2)

220-221: LGTM: Added delay to stabilize timeout test.

The explicit 10ms delay allows the server close timeout to properly trigger before checking the status. This addresses timing issues in the shared-server timeout scenario.


227-235: LGTM: Marked flaky test steps appropriately.

Adding "(flaky)" markers to test step names clearly identifies potentially unstable assertions. This helps with test maintenance and expectations while the underlying timing issues are being addressed.

tests/denops/testutil/shared_server_test.ts (1)

108-118: LGTM: Improved flaky timeout test with retry logic.

The refactored test properly addresses flakiness by:

  • Using retry() to handle intermittent timeout behavior
  • Implementing proper resource disposal with using statement
  • Clear test naming indicating potential flakiness

This approach maintains test coverage while acknowledging the inherent timing variability in timeout scenarios.
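The retry pattern this comment describes can be sketched as follows. The actual test uses a retry utility (and `using` for resource disposal); here a minimal retry is inlined so the example runs standalone, and the attempt count is an arbitrary choice:

```typescript
// Minimal inlined retry: re-run `fn` until it succeeds or attempts run out,
// rethrowing the last error on final failure.
async function retry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error; // intermittent failure: try again
    }
  }
  throw lastError;
}
```

Wrapping a timing-sensitive assertion in such a helper tolerates occasional scheduler hiccups while still failing loudly if the behavior is consistently wrong.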

tests/denops/runtime/functions/server/status_test.ts (2)

33-38: LGTM: Clever asynchronous status capture mechanism.

The TestDenopsServerStatusBeforeReady function elegantly solves the timing issue by continuously polling and storing the server status until the DenopsReady event fires. Using a timer to re-invoke itself ensures the status is captured before the event occurs.


55-58: LGTM: Deterministic status retrieval.

Waiting for the ready event before retrieving the stored status from g:__test_denops_server_status_before_ready makes this test deterministic. This replaces the previous flaky inline evaluation with a reliable approach.

tests/denops/testutil/wait.ts (1)

28-58: LGTM: Excellent refactoring using standard library utilities.

The refactored wait function is significantly cleaner and more reliable:

Improvements:

  • Uses AbortSignal.timeout() for built-in timeout handling
  • Leverages abortable() wrapper for clean cancellation
  • Replaces manual Promise.race with infinite loop + delay() with signal
  • Simplified error handling by checking signal.reason

Benefits:

  • More robust timeout and cancellation behavior
  • Easier to understand control flow
  • Reduces potential race conditions from manual timeout management
  • Leverages well-tested standard library primitives

This change should improve the reliability of all tests that depend on the wait utility.
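The control flow described above can be sketched in a self-contained form. This is not the actual wait.ts code (which uses @std/async's `abortable` and `delay`); the sleep is inlined here so the example runs standalone, but the shape — `AbortSignal.timeout()` plus an infinite poll loop that wakes on abort — mirrors the refactoring:

```typescript
// Poll `fn` until it returns a truthy value, or throw once the timeout fires.
async function wait<T>(
  fn: () => T | Promise<T>,
  { timeout = 1000, interval = 10, message = "wait timed out" } = {},
): Promise<T> {
  // Built-in timeout handling: no manual setTimeout bookkeeping to leak.
  const signal = AbortSignal.timeout(timeout);
  for (;;) {
    const value = await fn();
    if (value) return value;
    if (signal.aborted) throw new Error(`${message} (timeout: ${timeout}ms)`);
    // Sleep one interval, but wake immediately if the timeout fires meanwhile;
    // the check at the top of the next iteration then throws.
    await new Promise<void>((resolve) => {
      const id = setTimeout(resolve, interval);
      signal.addEventListener("abort", () => {
        clearTimeout(id);
        resolve();
      }, { once: true });
    });
  }
}
```

Because the timeout lives in one `AbortSignal` rather than a racing promise, there is a single source of truth for "time is up", which removes the race-prone bookkeeping of the manual `Promise.race` approach.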

tests/denops/runtime/functions/server/start_test.ts (4)

146-156: Well-structured asynchronous error handling for the third server start call.

The VimScript function properly wraps the server start call in a try-catch-finally block, ensuring that completion is always signaled via the flag variable regardless of success or failure. This approach provides better error isolation and debugging capabilities compared to the previous synchronous approach.


158-162: Proper use of the wait utility for asynchronous completion.

The wait call correctly polls for the completion flag with a descriptive error message, ensuring the test doesn't proceed until the asynchronous timer callback has finished executing. This addresses the race condition that was likely causing flakiness in the original implementation.


164-168: Comprehensive error validation before proceeding with assertions.

The error checking ensures that no exceptions were thrown during the third server start call, providing clear failure information if something goes wrong. This separation of error validation from result validation improves test clarity and debugging.


179-188: Good practice labeling the flaky test step.

The explicit "(flaky)" label in the test step name provides valuable context for maintainers and helps identify tests that may need additional attention in CI environments. This transparency is helpful for debugging intermittent failures.


codecov bot commented Jun 21, 2025

Codecov Report

Attention: Patch coverage is 94.73684% with 1 line in your changes missing coverage. Please review.

Project coverage is 95.83%. Comparing base (10ae0c8) to head (c6a1901).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
tests/denops/testutil/shared_server.ts 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #443      +/-   ##
==========================================
+ Coverage   95.59%   95.83%   +0.23%     
==========================================
  Files          25       25              
  Lines        1430     1415      -15     
  Branches      183      181       -2     
==========================================
- Hits         1367     1356      -11     
+ Misses         60       56       -4     
  Partials        3        3              


@Milly (Contributor, Author) commented Jun 21, 2025

Flaky failures tend to occur more frequently in high-performance environments. This happens in two scenarios:

  1. Test suite runs too fast: The test assertions execute before the system under test has completed its operations, leading to premature checks that fail intermittently.

  2. Process under test runs too fast: The tested process completes or changes state faster than the test suite expects, causing timing-dependent assertions to miss their expected conditions.

Both cases create race conditions where the relative speed difference between the test suite and the system under test causes unreliable test results. This is why adding proper synchronization, explicit waits, and timeout adjustments is crucial for test stability across different hardware configurations.
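A contrived TypeScript sketch of scenario 1 (all names invented): a naive assertion races the async work, while polling for the condition does not.

```typescript
let done = false;
setTimeout(() => {
  done = true;
}, 5);

// Unreliable: `done` is almost certainly still false at this point,
// and whether the assertion passes depends on scheduler timing.
// assert(done);

// Reliable: poll until the condition holds or a deadline passes.
async function until(cond: () => boolean, timeoutMs = 1000): Promise<void> {
  const deadline = Date.now() + timeoutMs;
  while (!cond()) {
    if (Date.now() > deadline) throw new Error("condition never became true");
    await new Promise((resolve) => setTimeout(resolve, 1));
  }
}
```

Usage: `await until(() => done);` makes the subsequent assertion independent of how fast either the test suite or the system under test happens to run.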

@Milly Milly requested a review from lambdalisue June 21, 2025 14:23
@lambdalisue lambdalisue merged commit d4d2c4e into main Jun 22, 2025
12 of 13 checks passed
@lambdalisue lambdalisue deleted the fix-tests-flaky branch June 22, 2025 02:21
@lambdalisue (Member) commented: nice!
