Contents
- ROCKER
- 1.1. Highlights & Features
- 1.2. User Guide
- 1.3. Architecture
- 1.4. Roadmap
- 1.5. BUG
- 1.6. TODO
ROCKER is an online-decompression and process-sandbox implementation targeting resource-constrained Linux-based IoT systems, where even container implementations like Docker carry intolerable overhead. It aims to improve resource utilization and system security while avoiding additional performance overhead.
By compressing App program files and other suitable files into squashfs packages, it typically achieves over 60% disk space savings, which is very significant for resource-constrained IoT systems. At the same time, since the Linux kernel natively provides dynamic on-demand decompression support, there is no additional memory overhead compared to conventional App execution.
The security guarantee of the sandbox feature comes from Linux namespaces/cgroups, overlayfs, Rust, and extensive test cases. Thanks to Linus Torvalds and the Rust team for creating these great infrastructure pieces.
This project is currently suitable for learning, research, or teaching purposes. It is not recommended for production use.
- Better performance and resource utilization than Docker; launching containers requires no additional images;
- The client library is written in pure C, bare-metal style, with no dependencies other than libc;
- The server-side (library) is written in Rust, rock-solid stability with C/C++-level runtime efficiency and memory utilization;
- Uses crosstool-ng for cross-toolchain management, with good portability and stability;
- Enforced unified code style, clean and elegant;
- Emphasis on documentation and testing;
- More highlights await in the source code...
The entire project is divided into a Client side and a Server side, similar to a traditional CS architecture. The client side is provided as a library to callers.
.
├── core/ # [Rust] Server-side core logic implementation
├── rocker_server/ # [Rust] CS server implementation
├── librocker_client/ # [C ] CS client library implementation
├── librocker_client_wrapper/ # [Rust] librocker_client wrapper via FFI, for testing
├── tests/ # [Rust] Test cases
├── README.md # Project main documentation
└── tools/
Official documentation links:
In theory, this project can be compiled and run on any architecture supported by crosstool-ng and Rust. On non-x86(_64) platforms, musl or uclibc is statically linked by default to simplify runtime dependencies.
Example using TARGET=armv7-linux-musleabihf:
# Compile and install; installation path is install_dir in the project root
make TARGET=armv7-linux-musleabihf release
# Functional tests
make TARGET=armv7-linux-musleabihf test
# Performance benchmarks
make TARGET=armv7-linux-musleabihf bench
Usage example:
// Callback function and its arguments, defined by the caller, to be executed inside ROCKER
void *your_args = NULL;
int start_your_APP(void *args) { return 0; }
// Initialize a blank RockerRequest struct
RockerRequest req = ROCKER_request_new();
// Assign values to the RockerRequest struct
req.app_id = 1000;
req.uid = 1000;
req.gid = 1000;
req.app_pkg_path = "/tmp/your_APP.squashfs";
req.app_exec_dir = "/var/your_APP/execdir";
req.app_data_dir = "/var/your_APP/datadir";
const char *overlay_dirs[] = { "/usr", "/var", "/etc", "/home", "/root" };
req.app_overlay_dirs = overlay_dirs;
// Attempt to run APP inside ROCKER
RockerResult res = ROCKER_enter_rocker(&req, start_your_APP, your_args);
if (ROCKER_ERR_success != res.err_no) {
// Handle error
}
// APP finished running, clean up the environment
kill(res.guard_pid, SIGKILL);
// More robust cleanup
RockerResult res2 = ROCKER_get_guardname(res.guard_pid);
if (ROCKER_ERR_success == res2.err_no &&
    0 == strcmp(res.guard_name, res2.guard_name)) {
kill(res.guard_pid, SIGKILL);
}
See librocker_client for details.
The following describes the logical architecture using sequence diagrams.
| Node Name | Definition |
|---|---|
| App | The foreground business process to be launched |
| AppMaster | Resource scheduling process, the App manager |
| RockerClient | Client library used by AppMaster, responsible for interacting with RockerMaster and RockerGuard |
| RockerGuard | PID 1 of the virtual sandbox where App resides, created by RockerMaster, responsible for executing the online decompression and virtual sandbox logic |
| RockerMaster | ROCKER server-side resource scheduling interface |
| Kernel | Linux kernel |
NOTE: RockerClient is usually in the same process as AppMaster. For readability, the documentation treats them as logically separate.
(0) start App; (1) request new rocker
AppMaster calls the RockerClient API to request RockerMaster to create a new rocker and run the specified App inside it. The specific configuration of the new Rocker is specified by AppMaster in its request.
(2) create new rocker-guard
RockerMaster receives the request and creates a new rocker virtual sandbox environment, where PID 1 is RockerGuard. RockerGuard plays a role similar to init or systemd in a conventional Linux system. If it exits, the entire virtual sandbox environment is destroyed (all processes and disk mounts are automatically cleaned up by the Kernel).
(3)
After RockerMaster creates the corresponding RockerGuard instance, subsequent interactions with AppMaster are handled directly by RockerGuard. RockerMaster no longer participates.
(4) preparing, need CAP_SYS_ADMIN {#1}
RockerGuard creates the corresponding rocker environment as required by AppMaster. This process requires CAP_SYS_ADMIN capability or root privileges. Specific details are described in section 1.3.1.1.
(5) send rocker-entrance
RockerGuard sends the created rocker entrance to RockerClient.
(6) start App in rocker {#2}
RockerClient enters the newly created rocker environment and starts the App inside it. Details described in section 1.3.1.3.
(7) PID of Guard/App
After App starts, RockerClient returns the PID of Guard and App to AppMaster for subsequent management. At this point, RockerClient has completed its mission for this App launch and no longer participates.
(8) stop all; (10) kill
AppMaster has two ways to stop App: manage it conventionally, or kill RockerGuard to trigger automatic cleanup.
Both have trade-offs:
- The former gives AppMaster finer-grained control, but it must ensure thorough environment cleanup itself (see step 9);
- The latter triggers the Kernel's automatic cleanup via kill RockerGuard, simplifying AppMaster's implementation and ensuring complete cleanup, but AppMaster cannot customize the cleanup process.
(9) exit self
RockerGuard exits automatically when all other processes in its rocker virtual sandbox have exited. If AppMaster misses a process, the rocker's resources will never be released.
(11) send SIGCHLD
After RockerGuard exits, RockerMaster receives SIGCHLD and may optionally execute some internal logic.
(12) rocker's [PID 1] exited
The kernel detects that PID 1 in a rocker (pid namespace) has exited.
(13) broadcast SIGKILL(auto, very clean); (14) umount overlay; (15) destroy useless loop device
The kernel automatically cleans up all resources (including recursively generated derived resources).
(0) clone(MNT|PID)
RockerMaster creates the RockerGuard instance using clone with the CLONE_NEWNS and CLONE_NEWPID flags. See man clone(2).
(1) create loop device
RockerGuard calls the ioctl interface to get an available loop device. See man loop(4).
(2) bind App.sqfs to loop; (3) mount loop to exec-path
The App package in squashfs format is bound to the newly acquired loop device, and RockerGuard mounts that loop device to the execution path required by AppMaster.
Creating App.sqfs, binding it to a loop device, and mounting it is similar to the following command-line logic:
# Create a squashfs package from App program files; requires squashfs-tools
mksquashfs ./AppDir ./App.sqfs
# Bind the squashfs package to a loop device
losetup /dev/loop8 ./App.sqfs
# Mount to the specified directory
mount /dev/loop8 /mnt/AppExecDir
(4) build overlay {#1}
RockerGuard creates the overlay read-write isolation layer. Details described in section 1.3.1.2.
(5) remount /proc
Remount /proc so that PID information within the new pid_namespace is displayed correctly.
(6) unshare(USER)
After the preceding work is complete, the RockerGuard process calls unshare to enter a new dedicated user_namespace. All subsequent operations inside rocker occur within this privilege-restricted user_namespace. See man unshare(2).
(7) done
RockerGuard preparation complete; notifies RockerMaster.
(8) set uid_map
RockerMaster sets uid_map (requires CAP_SETUID) and gid_map (requires CAP_SETGID) for RockerGuard. Before setting gid_map, "deny" must be written to /proc/[RockerGuard PID]/setgroups. See man user_namespaces(7).
(0) get all visible top-dirs except /proc, /sys, /dev, /run
Enumerate all top-level directories under root, excluding dynamic directories such as /proc, /sys, /dev, /run, etc.
(1) top-dirs act as 'lowerdir', and are finally merged back onto themselves
All visible top-level directories under root have an overlay isolation layer applied in-place within an independent mnt_namespace, giving each App its own independent read-write virtual filesystem. See kernel docs overlayfs.
For example, mounting /usr for App with ID 1000, assuming upperdir and workdir are /private/1000/upperdir and /private/1000/workdir respectively, the overlay mount is similar to:
mount -t overlay overlay /usr \
-o lowerdir=/usr,upperdir=/private/1000/upperdir,workdir=/private/1000/workdir
(0) rocker created
RockerGuard notifies RockerClient after the new rocker environment is ready.
(1) fork out a child process, (2) setns(USER|MNT|PID)
RockerClient creates a child process which calls the system's setns interface to enter the new rocker environment.
(3) run App in child's brother process
Inside the child process created in (1), another child process is created to launch the foreground App process.
The newly created subprocess in this step is effectively a sibling of the subprocess from (1), implemented using clone with CLONE_PARENT flag. Its parent is the same as the parent of the subprocess from (1), allowing AppMaster to receive SIGCHLD when the App process terminates. See man clone(2).
(4) PID of Guard/App
RockerClient returns the PIDs of RockerGuard and App to AppMaster.
Add more useful features, such as an intelligent scheduling algorithm for App processes:
- ...
- Add log persistence to disk;
- Add more functional and performance test cases;
- ...