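Why Alpine/OpenWrt containers show the wrong total memory

Problem

I have recently been exploring container-based virtual machine setups. With both LXD (built on LXC) and libvirt-lxc, I found that running free -m inside an Alpine or OpenWrt container reports the host's memory capacity rather than the container's limited capacity. Why?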

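Exploration

First we need to scope the problem: is it caused by the container manager or by the distribution? To narrow it down by elimination, I tried a Debian 11 container, and there the memory capacity is displayed correctly.

So for now we can conclude that the distribution, not the container manager, is responsible for the wrong memory figure.

Then why do Alpine/OpenWrt containers show the wrong total memory?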

Locating the cause

Next, let's start from the free program itself. First, check who provides free:

~ # which free
/usr/bin/free
~ # ls -al /usr/bin/free
lrwxrwxrwx    1 root     root            12 Jun 15  2021 /usr/bin/free -> /bin/busybox
~ # /bin/busybox
BusyBox v1.33.1 () multi-call binary.
...

So free is provided by the BusyBox toolbox, version v1.33.1.

Let's look at the free code in BusyBox v1.33.1, which lives in procps/free.c. The key parts are excerpted below:

...

# include <sys/sysinfo.h>

...

int free_main(int argc UNUSED_PARAM, char **argv IF_NOT_DESKTOP(UNUSED_PARAM))
{
    struct globals G;
    struct sysinfo info;

    ...

    printf("       %12s%12s%12s%12s%12s%12s\n"
    "Mem:   ",
        "total",
        "used",
        "free",
        "shared", "buff/cache", "available" /* swap and total don't have these columns */
    );

    sysinfo(&info);

...

#define FIELDS_6 "%12llu %11llu %11llu %11llu %11llu %11llu\n"

...

    printf(FIELDS_6,
        scale(&G, info.totalram),                //total
        scale(&G, info.totalram - cached_plus_free), //used
        scale(&G, info.freeram),                 //free
        scale(&G, info.sharedram),               //shared
        scale(&G, cached),                       //buff/cache
        scale(&G, available)                     //available
    );

...

    return EXIT_SUCCESS;
}

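The code above makes it clear that the total memory shown comes from the Linux sysinfo system call.

At this point we can guess most of the answer. A container shares the kernel with its host, and a system call goes straight into the kernel without being handled by the container, so what comes back is the host's memory capacity.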

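Confirmation

Let's see what the total memory returned by sysinfo actually represents.

Searching the v5.10 kernel source that matches the host, we find the implementation in kernel/sys.c: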

/**
 * do_sysinfo - fill in sysinfo struct
 * @info: pointer to buffer to fill
 */
static int do_sysinfo(struct sysinfo *info)
{
    ...

    memset(info, 0, sizeof(struct sysinfo));

    ...

    si_meminfo(info);

    ...

    return 0;
}

Next, si_meminfo lives in mm/page_alloc.c:

void si_meminfo(struct sysinfo *val)
{
    val->totalram = totalram_pages();

    ...
}

Then totalram_pages, in include/linux/mm.h:

static inline unsigned long totalram_pages(void)
{
    return (unsigned long)atomic_long_read(&_totalram_pages);
}

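Now the picture is complete. The sysinfo system call goes straight into the kernel and reads the total page count the kernel keeps, that is, the physical memory of the machine.

That raises another question: why can the Debian 11 container read the limited memory figure correctly?

The man page for Debian's free shows that it gets total memory by reading /proc/meminfo rather than by calling sysinfo directly.

Inside the container, /proc/meminfo does show the correct, limited capacity. Some searching confirmed that the container's procfs is processed by the container runtime (with LXD, for example, this is the job of lxcfs), which is why it can show the right figure.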

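With that, the mystery is solved.

Thoughts

What if we insisted on fixing this?

There is one way: trace system calls with ptrace and adjust the results. But tracing every system call in order to intercept and rewrite some of them has a poor cost-benefit ratio and also hurts runtime performance. So this approach is feasible, but not a good one.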
