-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Description
Description
When using logic from #3931, we discarded bindfd and adopted memfd. The pod has two containers: a main container and a sidecar. The request for the sidecar container is set to 10Mb and limit is 20MB. When I attempt to delete the pod and rebuild it, I face the following error:
Steps to reproduce the issue
1.Create a container with a memory limit set to 20MB.
2.Start it using the memfd method.
3.Check the value in mem.usage_in_bytes.
Alternatively, when used with Kubernetes:
The pod has two containers: a primary container and a sidecar. The request for the sidecar container is set to 10Mb and the limit is 20MB. When I delete the pod, I wait for the pod to be rebuilt.
Describe the results you received and expected
Error Log:
Message: failed to create containerd task: failed to create shim: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error setting cgroup config for procHooks process: unable to set memory limit to 20971520 (current usage: 21401600, peak usage: 21536768): unknowncode
func setMemory(path string, val int64) error {
if val == 0 {
return nil
}
err := cgroups.WriteFile(path, cgroupMemoryLimit, strconv.FormatInt(val, 10))
if !errors.Is(err, unix.EBUSY) {
return err
}
// EBUSY means the kernel can't set new limit as it's too low
// (lower than the current usage). Return more specific error.
usage, err := fscommon.GetCgroupParamUint(path, cgroupMemoryUsage)
if err != nil {
return err
}
max, err := fscommon.GetCgroupParamUint(path, cgroupMemoryMaxUsage)
if err != nil {
return err
}
return fmt.Errorf("unable to set memory limit to %d (current usage: %d, peak usage: %d)", val, usage, max)
}code
case procHooks:
// Setup cgroup before prestart hook, so that the prestart hook could apply cgroup permissions.
if err := p.manager.Set(p.config.Config.Cgroups.Resources); err != nil {
return fmt.Errorf("error setting cgroup config for procHooks process: %w", err)
}
if p.intelRdtManager != nil {
if err := p.intelRdtManager.Set(p.config.Config); err != nil {
return fmt.Errorf("error setting Intel RDT config for procHooks process: %w", err)
}
}
if len(p.config.Config.Hooks) != 0 {
s, err := p.container.currentOCIState()
if err != nil {
return err
}
// initProcessStartTime hasn't been set yet.
s.Pid = p.cmd.Process.Pid
s.Status = specs.StateCreating
hooks := p.config.Config.Hooks
if err := hooks[configs.Prestart].RunHooks(s); err != nil {
return err
}
if err := hooks[configs.CreateRuntime].RunHooks(s); err != nil {
return err
}
}
// Sync with child.
if err := writeSync(p.messageSockPair.parent, procResume); err != nil {
return err
}
sentResume = trueUpon checking move_charge_at_immigrate, it's not enabled, and I'm on cgroupv1.
Upon examining the kernel 4.18 source code:
static int mem_cgroup_can_attach(struct cgroup_taskset *tset)
{
struct cgroup_subsys_state *css;
struct mem_cgroup *memcg = NULL; /* unneeded init to make gcc happy */
struct mem_cgroup *from;
struct task_struct *leader, *p;
struct mm_struct *mm;
unsigned long move_flags;
int ret = 0;
/* charge immigration isn't supported on the default hierarchy */
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
return 0;
/*
* Multi-process migrations only happen on the default hierarchy
* where charge immigration is not used. Perform charge
* immigration if @tset contains a leader and whine if there are
* multiple.
*/
p = NULL;
cgroup_taskset_for_each_leader(leader, css, tset) {
WARN_ON_ONCE(p);
p = leader;
memcg = mem_cgroup_from_css(css);
}
if (!p)
return 0;
/*
* We are now commited to this value whatever it is. Changes in this
* tunable will only affect upcoming migrations, not the current one.
* So we need to save it, and keep it going.
*/
move_flags = READ_ONCE(memcg->move_charge_at_immigrate);
if (!move_flags)
return 0;
from = mem_cgroup_from_task(p);
VM_BUG_ON(from == memcg);
mm = get_task_mm(p);
if (!mm)
return 0;
/* We move charges only when we move a owner of the mm */
if (mm->owner == p) {
VM_BUG_ON(mc.from);
VM_BUG_ON(mc.to);
VM_BUG_ON(mc.precharge);
VM_BUG_ON(mc.moved_charge);
VM_BUG_ON(mc.moved_swap);
spin_lock(&mc.lock);
mc.mm = mm;
mc.from = from;
mc.to = memcg;
mc.flags = move_flags;
spin_unlock(&mc.lock);
/* We set mc.moving_task later */
ret = mem_cgroup_precharge_mc(mm);
if (ret)
mem_cgroup_clear_mc();
} else {
mmput(mm);
}
return ret;
}For cgroupv2 version, the code directly returns 0:
/* charge immigration isn't supported on the default hierarchy */
if (cgroup_subsys_on_dfl(memory_cgrp_subsys))
return 0;If move_charge_at_immigrate=0, it directly returns 0 as well:
/*
* We are now commited to this value whatever it is. Changes in this
* tunable will only affect upcoming migrations, not the current one.
* So we need to save it, and keep it going.
*/
move_flags = READ_ONCE(memcg->move_charge_at_immigrate);
if (!move_flags)
return 0;The issue disappears when I switch back to runc 1.1.2 or use the memfd-bind binary.
Question 1: At that time, what was consuming the memory? memfd shouldn't consume the container's memory.
@lifubang @cyphar
What version of runc are you using?
master
Host OS information
NAME="EulerOS"
VERSION="2.0 (SP10x86_64)"
ID="euleros"
VERSION_ID="2.0"
PRETTY_NAME="EulerOS 2.0 (SP10x86_64)"
ANSI_COLOR="0;31"
Host kernel information
Linux PaaSOM-1 4.18.0-147.5.2.14.h1050.eulerosv2r10.x86_64 #1 SMP Sun Oct 16 18:12:21 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux