Skip to content

Conversation

@pwhelan
Copy link
Contributor

@pwhelan pwhelan commented Oct 7, 2025

Summary

To avoid slow startups this patch uses a oneshot interval to download the initial configuration when registering an agent to a fleet.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • [N/A] Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Calyptia Fleet input performs an initial one-time configuration fetch on startup with configurable retry behavior, pausing collection until initial setup completes.
  • Bug Fixes

    • Improved startup reliability with better error handling, resource cleanup, and predictable resume/cleanup behavior after the initial fetch.
  • Refactor

    • Unified and simplified the collection startup flow via a wrapper to streamline control flow and maintenance.

@coderabbitai
Copy link

coderabbitai bot commented Oct 7, 2025

Walkthrough

Adds a wrapper collect callback and a two-stage startup: an initial oneshot fetch (with retry and scheduling pause/resume) handled by in_calyptia_fleet_collect_once, and a wrapper in_calyptia_fleet_collect_cb that delegates to the original collect. Header constants and fields for initial fetch tracking were added.

Changes

Cohort / File(s) Summary
Collect flow wrapper and two-stage startup
plugins/in_calyptia_fleet/in_calyptia_fleet.c
Adds in_calyptia_fleet_collect_cb wrapper delegating to in_calyptia_fleet_collect; introduces in_calyptia_fleet_collect_once for an initial oneshot fetch with retry, pause/resume scheduling, unified ret/fleet_id_error return handling, and updates plugin registration to use the wrapper.
Public header additions for initial fetch tracking
plugins/in_calyptia_fleet/in_calyptia_fleet.h
Adds macros FLEET_INITIAL_MAX_TRIES, FLEET_INITIAL_RETRY_INTERVAL_SECONDS, FLEET_INITIAL_RETRY_INTERVAL_NANOSECONDS; adds initial_fd and initial_retries to flb_in_calyptia_fleet_config.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor FB as Fluent Bit Engine
    participant PF as Fleet Plugin
    participant MC as Main Collector
    participant OC as Oneshot Collector
    participant API as Fleet API

    rect rgba(229,240,255,0.6)
    note right of PF: Initialization
    FB->>PF: in_calyptia_fleet_init()
    alt Existing committed config
        PF-->>FB: Proceed with normal operation
    else Needs initial fetch
        PF->>MC: Pause schedule
        PF->>OC: Schedule oneshot collect (in_calyptia_fleet_collect_once)
    end
    end

    rect rgba(235,255,235,0.6)
    note over OC,API: Oneshot initial fetch (retry-capable)
    OC->>API: Fetch fleet config
    alt Success
        OC-->>PF: Commit config
        PF->>MC: Resume schedule
    else Error / retry
        loop Up to FLEET_INITIAL_MAX_TRIES
            OC->>API: Retry after interval
        end
        alt Still failing
            PF-->>FB: Init error and cleanup
        end
    end
    end

    rect rgba(255,248,229,0.6)
    note right of MC: Normal collection via wrapper
    FB->>PF: cb_collect -> in_calyptia_fleet_collect_cb()
    PF->>PF: Invoke original collect path (in_calyptia_fleet_collect)
    PF-->>FB: Return status
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested reviewers

  • edsiper
  • koleini
  • fujimotos

Poem

A hop, a oneshot, then I cheer,
I paused the dance, then fetched the gear.
Wrapped my call, retried with care,
Committed config, resumed the air.
— a rabbit’s tiny fleet-update prayer 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title clearly and concisely summarizes the primary change by indicating that the in_calyptia_fleet plugin now loads its configuration on initial startup, which aligns directly with the pull request’s main objective of implementing a one-shot startup configuration fetch. It is specific to the plugin and avoids unnecessary details or vague language. The phrasing is clear and immediately conveys the core enhancement to any reviewer scanning the PR history.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch pwhelan-fleet-load-initial-startup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@pwhelan
Copy link
Contributor Author

pwhelan commented Oct 7, 2025

Here is a debug log:

debug.log

And a valgrind log:

valgrind.log

Note: valgrind shows a memory leak in dlopen which is as of yet unresolved but is a one time penalty. It only ever allocates >4KB and only ever once, not once per configuration update.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
plugins/in_calyptia_fleet/in_calyptia_fleet.c (2)

2655-2682: Add cleanup for initial_fd collector.

The in_calyptia_fleet_destroy function doesn't clean up ctx->initial_fd if it was created but never fired (e.g., if initialization fails after line 2626). This could leak the collector resource.

Apply this diff to add cleanup for the initial collector:

 static void in_calyptia_fleet_destroy(struct flb_in_calyptia_fleet_config *ctx)
 {
     if (!ctx) {
         return;
     }

     if (ctx->fleet_url) {
         flb_sds_destroy(ctx->fleet_url);
     }

     if (ctx->fleet_files_url) {
         flb_sds_destroy(ctx->fleet_files_url);
     }

     if (ctx->fleet_id && ctx->fleet_id_found) {
         flb_sds_destroy(ctx->fleet_id);
     }

     if (ctx->collect_fd >= 0) {
         flb_input_collector_delete(ctx->collect_fd, ctx->ins);
     }

+    if (ctx->initial_fd >= 0) {
+        flb_input_collector_delete(ctx->initial_fd, ctx->ins);
+    }
+
     if (ctx->u) {
         flb_upstream_destroy(ctx->u);
     }

     flb_free(ctx);
 }

2490-2498: Initialize initial_fd to -1 for consistency.

The ctx->initial_fd field is not explicitly initialized, relying on flb_calloc to zero it. However, ctx->collect_fd is explicitly initialized to -1 at line 2496 for consistency with the cleanup checks that use >= 0. The same pattern should apply to initial_fd.

Apply this diff:

     ctx = flb_calloc(1, sizeof(struct flb_in_calyptia_fleet_config));
     if (!ctx) {
         flb_errno();
         return -1;
     }
     ctx->ins = in;
     ctx->collect_fd = -1;
+    ctx->initial_fd = -1;
     ctx->fleet_id_found = FLB_FALSE;
     ctx->config_timestamp = -1;
🧹 Nitpick comments (2)
plugins/in_calyptia_fleet/in_calyptia_fleet.h (1)

28-30: Clarify the retry limit naming.

FLEET_INITIAL_MAX_TRIES = 1 is confusing because it suggests one try, but the implementation actually allows the initial attempt plus one retry (2 total attempts). Consider renaming to FLEET_INITIAL_MAX_RETRIES or updating the value to 2 with a name like FLEET_INITIAL_MAX_ATTEMPTS for clarity.

plugins/in_calyptia_fleet/in_calyptia_fleet.c (1)

2076-2082: Consider removing the wrapper callback.

The in_calyptia_fleet_collect_cb wrapper adds an unnecessary level of indirection. The oneshot callback at line 2094 could call in_calyptia_fleet_collect directly, simplifying the code structure.

Apply this diff to simplify:

-static int in_calyptia_fleet_collect_cb(struct flb_input_instance *ins,
-                                     struct flb_config *config,
-                                     void *in_context)
-{
-    FLB_INPUT_RETURN(in_calyptia_fleet_collect(ins, config, in_context));
-}
-
 static int in_calyptia_fleet_collect_once(struct flb_input_instance *ins,
                                           struct flb_config *config,
                                           void *in_context)
 {
     int ret;
     struct flb_in_calyptia_fleet_config *ctx = in_context;
 
     /* pause the oneshot interval forever. */
     flb_input_collector_delete(ctx->initial_fd, ins);
 
     ret = in_calyptia_fleet_collect(ins, config, in_context);

Then update line 2602 and line 2747 to use in_calyptia_fleet_collect directly.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ffd27bc and 63a2dfd.

📒 Files selected for processing (2)
  • plugins/in_calyptia_fleet/in_calyptia_fleet.c (5 hunks)
  • plugins/in_calyptia_fleet/in_calyptia_fleet.h (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/in_calyptia_fleet/in_calyptia_fleet.c (3)
src/flb_input.c (5)
  • flb_input_collector_delete (2074-2093)
  • flb_input_set_collector_time (1685-1704)
  • flb_input_collector_start (1790-1809)
  • flb_input_collector_resume (2095-2151)
  • flb_input_collector_pause (2035-2072)
src/flb_upstream.c (1)
  • flb_upstream_destroy (656-698)
include/fluent-bit/flb_mem.h (1)
  • flb_free (126-128)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (25)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: pr-compile-system-libs (-DFLB_PREFER_SYSTEM_LIBS=On, 3.31.6, clang, clang++, ubuntu-24.04, clang-14)
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (1)
plugins/in_calyptia_fleet/in_calyptia_fleet.c (1)

2055-2074: LGTM!

The refactoring to use a goto label for error handling is appropriate and makes the control flow clearer.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
plugins/in_calyptia_fleet/in_calyptia_fleet.c (2)

2619-2621: Consider checking the return value.

calyptia_config_commit can fail and return FLB_FALSE. While not critical during initialization, logging an error on failure would aid troubleshooting.

     if (is_fleet_config(ctx, config)) {
-        calyptia_config_commit(ctx);
+        if (calyptia_config_commit(ctx) == FLB_FALSE) {
+            flb_plg_warn(ctx->ins, "failed to commit existing fleet config");
+        }
     }

2603-2639: Consider adding explanatory comment for two-stage startup.

The initialization logic implements two distinct paths:

  1. If already using a fleet config, commit it immediately
  2. Otherwise, pause the main collector and schedule a one-shot initial fetch with retries

A brief comment explaining this two-stage approach would improve maintainability.

     ctx->collect_fd = ret;
     flb_plg_info(ctx->ins, "fleet collector initialized with interval: %d sec %d nsec",
                  ctx->interval_sec, ctx->interval_nsec);
 
+    /* Two-stage startup: commit existing fleet config or fetch initial config with one-shot collector */
     if (is_fleet_config(ctx, config)) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 63a2dfd and f03f387.

📒 Files selected for processing (1)
  • plugins/in_calyptia_fleet/in_calyptia_fleet.c (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
plugins/in_calyptia_fleet/in_calyptia_fleet.c (1)
src/flb_input.c (5)
  • flb_input_collector_delete (2074-2093)
  • flb_input_set_collector_time (1685-1704)
  • flb_input_collector_resume (2095-2151)
  • flb_input_collector_start (1790-1809)
  • flb_input_collector_pause (2035-2072)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit (Arm64), amd64_arm64, -DCMAKE_SYSTEM_NAME=Windows -DCMA...
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 32bit, x86, x86-windows-static, 3.31.6)
  • GitHub Check: pr-windows-build / call-build-windows-package (Windows 64bit, x64, x64-windows-static, 3.31.6)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_ARROW=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SIMD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_THREAD=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SANITIZE_MEMORY=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_COVERAGE=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_ADDRESS=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=Off, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_SMALL=On, 3.31.6, gcc, g++)
  • GitHub Check: run-ubuntu-unit-tests (-DSANITIZE_UNDEFINED=On, 3.31.6, clang, clang++)
  • GitHub Check: run-ubuntu-unit-tests (-DFLB_JEMALLOC=On, 3.31.6, gcc, g++)
  • GitHub Check: PR - fuzzing test
🔇 Additional comments (5)
plugins/in_calyptia_fleet/in_calyptia_fleet.c (5)

2055-2074: LGTM: Refactored for reusability.

The addition of a temporary ret variable and the fleet_id_error label allows this function to be called from both the wrapper callback and the one-shot callback while maintaining consistent error handling.


2077-2082: LGTM: Clean wrapper pattern.

The wrapper callback maintains the plugin callback signature while allowing in_calyptia_fleet_collect to be reused by the one-shot collector.


2084-2120: LGTM: Robust retry logic with proper collector lifecycle management.

The one-shot collector correctly handles all scenarios:

  • On success, resumes the main collector immediately
  • On failure, schedules a retry (if attempts remain) or resumes the main collector (if exhausted)
  • On retry collector creation failure, resumes the main collector to prevent indefinite pause

The past review concern about resuming the main collector on retry failure has been addressed (line 2107).


2748-2748: LGTM: Correct callback registration.

The plugin now uses the wrapper callback, which maintains the expected signature while enabling the two-stage startup mechanism.


2084-2120: ✅ Past review comments addressed.

The issues raised in previous reviews have been resolved:

  1. Line 2107 ensures the main collector resumes even when retry collector creation fails
  2. Line 2635 uses in_calyptia_fleet_destroy(ctx) for consistent cleanup instead of manual resource management

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants