Docker Desktop runs a Virtual Machine to host Docker containers. Within the Virtual Machine each component runs as an isolated container. This layer of isolation gives rise to a new problem: how can we capture the logs so that they can be included in the Docker Desktop diagnostic reports? Without a solution, the logs would be written separately inside each individual container, which would not be of much use.
The Docker Desktop VM boots from an ISO, which is built with LinuxKit from a list of Docker images together with a list of capabilities and bind mounts. In Docker Desktop, the LinuxKit VM boots in two phases:
- In the first phase, the init process carries out a number of one-shot “on-boot” actions sequentially, using runc to isolate them in containers. These actions typically format disks, enable swap, and configure sysctl settings and network interfaces.
- In the second phase, the “services” are all started concurrently and run as containerd tasks.
Originally we planned to add logging to the VM by running a syslog-compatible logging daemon that exposes /dev/log or a port (or both); other services would then connect to syslog to write their logs. Unfortunately, a logging daemon running as a service would start too late and miss all the logs from the “on-boot” actions. Since services start concurrently, there would also be a race between the syslog clients starting and the syslog daemon starting: either early logs would be lost or every client’s startup would have to block waiting for the syslog service to come up. Running a syslog daemon as an “on-boot” action would avoid this race, but we would then have to choose carefully where to place it in the list of “on-boot” actions. Starting the logging daemon as early as possible means losing the fewest logs, but at that point in boot there is no persistent disk or network available for storing them.
Design
The logging daemon, memlogd, uses the Go library container/ring to keep logs in a bounded circular buffer in memory. The buffer is bounded to prevent a spammy logging client from consuming too much memory in the VM. If the buffer fills up, the oldest log lines are dropped.
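To illustrate the idea, here is a minimal sketch of a bounded ring buffer built on container/ring; the type and method names are ours for illustration and are not memlogd’s actual API. Once the ring is full, each new entry overwrites the oldest, giving exactly the drop-oldest behaviour described above.

```go
// Minimal sketch of a bounded in-memory log buffer using container/ring.
// Names are illustrative, not memlogd's real API.
package main

import (
	"container/ring"
	"fmt"
)

type logBuffer struct {
	r *ring.Ring // current write position in a fixed-size ring
}

func newLogBuffer(size int) *logBuffer {
	return &logBuffer{r: ring.New(size)}
}

// append stores a log line, silently overwriting the oldest entry
// once the ring is full.
func (b *logBuffer) append(line string) {
	b.r.Value = line
	b.r = b.r.Next()
}

// dump returns the buffered lines from oldest to newest.
func (b *logBuffer) dump() []string {
	var out []string
	b.r.Do(func(v interface{}) {
		if v != nil {
			out = append(out, v.(string))
		}
	})
	return out
}

func main() {
	buf := newLogBuffer(3)
	for i := 1; i <= 5; i++ {
		buf.append(fmt.Sprintf("line %d", i))
	}
	fmt.Println(buf.dump()) // [line 3 line 4 line 5]: the oldest lines were dropped
}
```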
Logging clients send log entries over a Unix domain socket (named “linuxkit-external-logging.sock”). Log-downloading programs connect to a query socket (named “memlogdq.sock”), read logs from the internal buffer, and write them somewhere else.
Our design objective was to avoid having to alter each individual container definition in order to use the new logging system. We did not want to explicitly bind-mount a logging socket into every container or modify each container’s code to connect to it. Could we capture the output from the containers automatically and send it all to linuxkit-external-logging.sock ourselves?
When an on-boot action or service is launched, the VM’s init system creates a FIFO or a socketpair for its stdout and stderr. LinuxKit containers normally write their log entries to stderr by convention. By modifying the init system, we can capture the logs written to these stderr FIFOs and socketpairs without changing the container definitions or code. Once the logs have been captured, we need to send them to memlogd. But how do we do that?
A little-known feature of Linux is that open file descriptors can be passed to other processes via Unix domain sockets (the SCM_RIGHTS mechanism). Rather than copying the log data ourselves, we pass the open file descriptor directly to memlogd.
When a container is started, the init system passes the stdout and stderr file descriptors to memlogd along with the name of the container. Memlogd monitors all of its file descriptors in a select loop. When data is available it is read, tagged with the container name and a timestamp, and appended to the in-memory ring buffer. When the container terminates, the file descriptor is closed and memlogd removes it from the loop.
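The sketch below shows how an init-like process might hand a log file descriptor and a container name to a memlogd-like daemon using SCM_RIGHTS. The socket path, socket type, and message format here are assumptions for illustration, not necessarily the exact protocol used by LinuxKit’s init and memlogd.

```go
// Hedged sketch of fd-passing over a Unix domain socket (SCM_RIGHTS).
package main

import (
	"log"
	"net"
	"os"

	"golang.org/x/sys/unix"
)

// sendLogFD hands a log file descriptor to the logging daemon, together
// with a name identifying the container it belongs to.
func sendLogFD(socketPath, name string, logFD int) error {
	conn, err := net.DialUnix("unix", nil, &net.UnixAddr{Name: socketPath, Net: "unix"})
	if err != nil {
		return err
	}
	defer conn.Close()

	// The fd travels in the ancillary (out-of-band) data; the name
	// travels in the normal data.
	rights := unix.UnixRights(logFD)
	_, _, err = conn.WriteMsgUnix([]byte(name), rights, nil)
	return err
}

func main() {
	// A pipe standing in for a container's stderr: the write end would be
	// given to the container, the read end to the logging daemon.
	r, w, err := os.Pipe()
	if err != nil {
		log.Fatal(err)
	}
	defer w.Close()

	// Directory and exact protocol are assumptions; the document only
	// names the socket "linuxkit-external-logging.sock".
	if err := sendLogFD("/run/linuxkit-external-logging.sock", "mycontainer.stderr", int(r.Fd())); err != nil {
		log.Fatal(err)
	}
	r.Close() // the daemon now holds its own copy of the fd
}
```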
So this means:
- There is no need to modify container YAML definitions or code to make them aware of the logging system; and
- No need to proxy logs between the container and memlogd.
Querying memlogd
To see memlogd in action on a Docker Desktop system, try the following command:
docker run -it --privileged --pid=host justincormack/nsenter1 /usr/bin/logread -F -socket /run/guest-services/memlogdq.sock
This command runs a privileged container in the root namespace and runs the utility “logread”, telling it to keep copying logs from memlogd to the terminal until interrupted. The output looks like this:
2019-02-22T16:04:23Z,docker;time="2019-02-22T16:04:23Z" level=debug msg="registering ttrpc server"
Here the initial timestamp indicates when memlogd received the message, and “docker” shows that the log came from the docker service.
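Any program that can talk to the query socket can retrieve the logs. As a rough illustration, a minimal logread-like client might simply connect to memlogdq.sock and copy what the daemon sends to the terminal. This assumes memlogd streams entries as text once a client connects; the real logread utility and query protocol may do more (for example, implementing the -F “follow” behaviour explicitly).

```go
// Minimal sketch of a logread-like client for the query socket.
package main

import (
	"io"
	"log"
	"net"
	"os"
)

func main() {
	conn, err := net.Dial("unix", "/run/guest-services/memlogdq.sock")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Copy log entries to the terminal until interrupted or EOF.
	if _, err := io.Copy(os.Stdout, conn); err != nil {
		log.Fatal(err)
	}
}
```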
Kernel logs (kmsg)
In Docker Desktop we include the Linux kernel logs in the diagnostic reports to help us understand and fix Linux kernel bugs. We created the kmsg package for this purpose. When this service starts, it connects to /dev/kmsg, streams the kernel logs and writes them to stderr. Since stderr is automatically captured and sent to memlogd, the kernel logs are included in the VM’s logs and therefore in the diagnostic reports. Reading kernel logs via /dev/kmsg is a privileged operation, so the kmsg service needs the CAP_SYSLOG capability.
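A rough sketch of this approach, assuming the service simply relays /dev/kmsg records to stderr (the real kmsg package may parse or reformat the records differently):

```go
// Hedged sketch: stream kernel log records from /dev/kmsg to stderr so
// the init system forwards them to memlogd.
package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
)

func main() {
	// Requires CAP_SYSLOG (or root) to read the kernel ring buffer.
	f, err := os.Open("/dev/kmsg")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// /dev/kmsg yields one newline-terminated record per read; a
	// line-oriented scanner is good enough for a sketch.
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Each record looks like "priority,sequence,timestamp,flags;message".
		fmt.Fprintln(os.Stderr, scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```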
Persisting the logs
In Docker Desktop, log entries are persisted to files, rotated when they grow too large, and the oldest files are eventually deleted to avoid leaking disk space. We created the logwrite package for this purpose.
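As a hedged sketch of what such persistence could look like, the following size-based rotating writer renames the active file and discards the oldest rotated file when a size limit is exceeded. The file naming, limits, and error handling are simplified assumptions, not the real logwrite implementation.

```go
// Hedged sketch of size-based log rotation in the spirit of logwrite.
package main

import (
	"fmt"
	"log"
	"os"
)

type rotatingWriter struct {
	path     string // active log file, e.g. docker.log
	maxSize  int64  // rotate once the active file would exceed this
	maxFiles int    // number of rotated files to keep
	f        *os.File
	written  int64
}

func newRotatingWriter(path string, maxSize int64, maxFiles int) (*rotatingWriter, error) {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		return nil, err
	}
	info, err := f.Stat()
	if err != nil {
		f.Close()
		return nil, err
	}
	return &rotatingWriter{path: path, maxSize: maxSize, maxFiles: maxFiles, f: f, written: info.Size()}, nil
}

func (w *rotatingWriter) Write(p []byte) (int, error) {
	if w.written+int64(len(p)) > w.maxSize {
		if err := w.rotate(); err != nil {
			return 0, err
		}
	}
	n, err := w.f.Write(p)
	w.written += int64(n)
	return n, err
}

// rotate shifts docker.log -> docker.log.1 -> docker.log.2 ... and
// deletes the oldest rotated file to avoid leaking disk space.
func (w *rotatingWriter) rotate() error {
	w.f.Close()
	os.Remove(fmt.Sprintf("%s.%d", w.path, w.maxFiles))
	for i := w.maxFiles - 1; i >= 1; i-- {
		os.Rename(fmt.Sprintf("%s.%d", w.path, i), fmt.Sprintf("%s.%d", w.path, i+1))
	}
	os.Rename(w.path, w.path+".1")
	f, err := os.OpenFile(w.path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)
	if err != nil {
		return err
	}
	w.f = f
	w.written = 0
	return nil
}

func main() {
	w, err := newRotatingWriter("docker.log", 1024, 3)
	if err != nil {
		log.Fatal(err)
	}
	for i := 0; i < 100; i++ {
		fmt.Fprintf(w, "log entry %d\n", i)
	}
}
```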
Summary
We now have a relatively simple, lightweight, but extensible logging system that provides the features we need for Docker Desktop. It gathers logs from both “on-boot” actions and services, and persists logs to files with rotation once the file system has been mounted. We developed the logging system in the upstream LinuxKit project, where we hope the simple and modular design will allow it to be easily extended by other LinuxKit developers.