Skip to content

[Heartbeat] Add prctl dumpable flag reset after cap drop #38269

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 14, 2024

Conversation

emilioalvap
Copy link
Collaborator

@emilioalvap emilioalvap commented Mar 12, 2024

Proposed commit message

Heartbeat lockdown measures are triggering /proc/io ownership rules, which end up not being accessible by the agent. This PR introduces a reset on PRCTL_SET_DUMPABLE flag to allow the agent to access heartbeat metrics.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • [ ] I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally

  1. Enrol a local agent in a private location policy which contains at least one monitor.
  2. Find PID for heartbeat process: docker exec -u root -it --container name-- ps -aef
  3. Check permissions for /proc/--pid--/io:
//before
docker exec -u root -it vigorous_pike ls -al /proc/70/io
-r-------- 1 root root 0 Mar 12 12:59 /proc/70/io
//after
docker exec -u root -it exciting_spence ls -al /proc/57/io
-r-------- 1 elastic-agent elastic-agent 0 Mar 12 12:52 /proc/57/io

@emilioalvap emilioalvap added bug Team:obs-ds-hosted-services Label for the Observability Hosted Services team v8.12.0 labels Mar 12, 2024
@emilioalvap emilioalvap self-assigned this Mar 12, 2024
@emilioalvap emilioalvap requested a review from a team as a code owner March 12, 2024 13:17
@elasticmachine
Copy link
Collaborator

Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Mar 12, 2024
Copy link
Contributor

mergify bot commented Mar 12, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @emilioalvap? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 12, 2024

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2024-03-14T18:29:38.433+0000

  • Duration: 52 min 53 sec

Test stats 🧪

Test Results
Failed 0
Passed 2581
Skipped 28
Total 2609

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@axw
Copy link
Member

axw commented Mar 13, 2024

prctl(2) says:

Normally, the "dumpable" attribute is set to 1.  However, it is reset to the current value contained in the file /proc/sys/fs/suid_dumpable (which by default has the value 0), in the following circumstances:

 •  The process's effective user or group ID is changed.

 •  The process's filesystem user or group ID is changed
    (see [credentials(7)](https://man7.org/linux/man-pages/man7/credentials.7.html)).

 •  The process executes ([execve(2)](https://man7.org/linux/man-pages/man2/execve.2.html)) a set-user-ID or set-
    group-ID program, resulting in a change of either the
    effective user ID or the effective group ID.

 •  The process executes ([execve(2)](https://man7.org/linux/man-pages/man2/execve.2.html)) a program that has
    file capabilities (see [capabilities(7)](https://man7.org/linux/man-pages/man7/capabilities.7.html)), but only if
    the permitted capabilities gained exceed those already
    permitted for the process.

Is it really dropping capabilities that is clearing the dumpable flag? This program doesn't agree, but maybe I'm missing something...

package main

import (
        "fmt"
        "log"

        "golang.org/x/sys/unix"
        "kernel.org/pub/linux/libs/security/libcap/cap"
)

func main() {
        if true {
                newcaps := cap.GetProc()
                // Raise all permitted caps to effective
                if err := newcaps.Fill(cap.Effective, cap.Permitted); err != nil {
                        log.Fatal(err)
                }
                // Drop all inheritable caps to stop propagation to child proc
                if err := newcaps.ClearFlag(cap.Inheritable); err != nil {
                        log.Fatal(err)
                }
                // Apply the new capabilities to the current process (incl. all threads)
                if err := newcaps.SetProc(); err != nil {
                        log.Fatal(err)
                }
        }

        dumpable, err := cap.Prctl(unix.PR_GET_DUMPABLE)
        if err != nil {
                log.Fatal(err)
        }
        fmt.Println("dumpable:", dumpable)
        select {}
}
$ go run .
dumpable: 1

@emilioalvap
Copy link
Collaborator Author

Hi @axw,

I think your tests is missing file capabilities, like the ones we specify for heartbeat bundled inside the agent. Same code with file caps:

$ go build && chmod +x go_playground
$ sudo setcap cap_net_raw,cap_setuid+p go_playground
$ ./go_playground
dumpable: 0

It's not the act of dropping capabilities that triggers the change, but the mechanism we have in place for locking down capabilities. I apologize if the reference induced any confusion.

@axw
Copy link
Member

axw commented Mar 13, 2024

It's not the act of dropping capabilities that triggers the change, but the mechanism we have in place for locking down capabilities. I apologize if the reference induced any confusion.

No worries! I understand what's going on now, thanks :) I missed that we were changing capabilities on the heartbeat binary.

@@ -46,6 +47,9 @@ func init() {
// The beat should use `getcap` at a later point to examine available capabilities
// rather than relying on errors from `setcap`
_ = setCapabilities()

// Make heartbeat dumpable so elastic-agent can access process metrics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Is this applicable only when

  • We are not running as root, common in context of container (BEAT_SETUID_AS) and under elastic-agent.
  • We effectively switched the user id and group id which changed the dumpable flag to value other than 1 (restircting access).

With that said, Should we enforce this change only on containerized environment? I assume there is no harm in overriding the dumpable flag even if its already 1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume there is no harm in overriding the dumpable flag even if its already 1?

There shouldn't be, there's no issue with overriding and in case of error, kernel returns an error code we can safely ignore since the agent is ready to handle those cases now.

We are not running as root, common in context of container (BEAT_SETUID_AS) and under elastic-agent.
We effectively switched the user id and group id which changed the dumpable flag to value other than 1 (restircting access).

Not exactly, elastic-agent image is already shipped with file capabilities enabled so it doesn't require BEAT_SETUID_AS mechanism to kick in, see this point in prctl:

 •  The process executes ([execve(2)](https://man7.org/linux/man-pages/man2/execve.2.html)) a program that has
    file capabilities (see [capabilities(7)](https://man7.org/linux/man-pages/man7/capabilities.7.html)), but only if
    the permitted capabilities gained exceed those already
    permitted for the process.

Although changing effective uid/gid also triggers disabling dumpable flag.

Should we enforce this change only on containerized environment?

Container detection can be unreliable in some cases. Also, I can't think of a situation where this would broaden the permission scope for a non-containerized instance (running users already have access to config files). But I'm happy to discuss alternatives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, Thanks for the extra context. I agree with the argument here on container detection and permission scope. I think the workaround looks good as it is.

@emilioalvap emilioalvap enabled auto-merge (squash) March 14, 2024 18:31
@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @emilioalvap

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @emilioalvap

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

cc @emilioalvap

@elasticmachine
Copy link
Collaborator

💔 Build Failed

Failed CI Steps

cc @emilioalvap

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @emilioalvap

@elasticmachine
Copy link
Collaborator

💔 Build Failed

Failed CI Steps

cc @emilioalvap

@elasticmachine
Copy link
Collaborator

💚 Build Succeeded

History

cc @emilioalvap

@elasticmachine
Copy link
Collaborator

💔 Build Failed

Failed CI Steps

cc @emilioalvap

@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 14, 2024

@elasticmachine
Copy link
Collaborator

elasticmachine commented Mar 14, 2024

@emilioalvap emilioalvap merged commit 9c9ae35 into elastic:main Mar 14, 2024
@emilioalvap emilioalvap added backport-v8.12.0 Automated backport with mergify backport-v8.13.0 Automated backport with mergify labels Mar 18, 2024
mergify bot pushed a commit that referenced this pull request Mar 18, 2024
Enforce dumpable attribute on heartbeat process for /proc/io to be readable by elastic-agent.

---------

Co-authored-by: Vignesh Shanmugam <[email protected]>
(cherry picked from commit 9c9ae35)
mergify bot pushed a commit that referenced this pull request Mar 18, 2024
Enforce dumpable attribute on heartbeat process for /proc/io to be readable by elastic-agent.

---------

Co-authored-by: Vignesh Shanmugam <[email protected]>
(cherry picked from commit 9c9ae35)
emilioalvap added a commit that referenced this pull request Mar 18, 2024
…8369)

Enforce dumpable attribute on heartbeat process for /proc/io to be readable by elastic-agent.

---------

Co-authored-by: Vignesh Shanmugam <[email protected]>
(cherry picked from commit 9c9ae35)

Co-authored-by: Emilio Alvarez Piñeiro <[email protected]>
emilioalvap added a commit that referenced this pull request Mar 18, 2024
…8368)

Enforce dumpable attribute on heartbeat process for /proc/io to be readable by elastic-agent.

---------

Co-authored-by: Vignesh Shanmugam <[email protected]>
(cherry picked from commit 9c9ae35)

Co-authored-by: Emilio Alvarez Piñeiro <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v8.12.0 Automated backport with mergify backport-v8.13.0 Automated backport with mergify bug Team:obs-ds-hosted-services Label for the Observability Hosted Services team v8.12.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants