tips for systemd services management and hardening in NixOS

introduction

When it comes to security, we want each entity of a system to have access to as few other entities as possible. Network input, executables and users must be able to reach only those resources that are necessary to perform the server's defined tasks. This is the principle of least privilege.

Generally, it's better to implement as many layers of security as possible. There is no way to make a server 100% bulletproof, and security is a huge, endless topic, so this article covers only some feasible, essential systemd tunables that give us one layer of protection.

Systemd is the standard software suite for organizing and running services/daemons in a modern GNU/Linux distribution, including NixOS. Systemd provides means to secure services, and in many ways the isolation level of a systemd service can be similar to that of containers (by means of sandboxing, namespaces and cgroups, which Docker also uses; interestingly, systemd even allows running multiple instances of the same service). However, systemd hardening defaults are quite loose (perhaps so as not to disturb the operation of newly written services and their administrators in any way).

NixOS generates systemd configuration files according to the given NixOS configuration, written in the Nix language. To some extent, Nix acts as an advanced macro language, whereas the NixOS module system acts as a unified control center, so you don't have to bother with the location of systemd files, their syntax and the common boilerplate, which NixOS generates for you. NixOS also manages switching between systemd configurations (when you ask for it), restarts services when required, and offers whole system rollbacks from GRUB/systemd-boot/extlinux.


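NixOS rollbacks are cheap. Based on the Nix storage model, generations share all unchanged store paths, so an older generation costs extra disk space only for the paths that differ. So, there is virtually no need for backups/snapshots of the system itself (service data and other state still need backups).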

overview of systemd services integration within NixOS configuration

NixOS features lots of systemd services that are ready to use (without even knowing what systemd is) just by setting the appropriate options in configuration.nix. For example, write services.netdata.enable = true; to enable the Netdata monitoring service. Many useful high-level tunables are often available as services.<name>.* options.

When the services provided by NixOS are insufficient or additional tuning is needed, the systemd.services.<name>.* set of options comes into play. These options let you define custom systemd services or modify existing ones. Regardless of the origin of a systemd service (provided by NixOS or written by yourself), native systemd directives for sections such as [Unit] and [Service] can be specified in the following nix attribute sets:

  • [Unit]: systemd.services.<name>.unitConfig = { SYSTEMD_DIRECTIVE = VALUE; ... }
  • [Service]: systemd.services.<name>.serviceConfig = { SYSTEMD_DIRECTIVE = VALUE; ... }

String values must be enclosed in double quotes. Boolean values are written as true and false. This is just Nix language syntax.

[Install] section directives such as Alias, WantedBy and RequiredBy can be specified as nix lists in:

  • systemd.services.<name>.aliases
  • systemd.services.<name>.wantedBy
  • systemd.services.<name>.requiredBy.
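To make the mapping between the three sections and the nix options concrete, here is a minimal sketch. The service name example-service and the sleep command are placeholders chosen only for illustration; the directives themselves are ordinary systemd ones.

```nix
{ pkgs, ... }:
{
  # "example-service" is a hypothetical name, not a service shipped by NixOS
  systemd.services.example-service = {
    # ends up in the [Unit] section
    unitConfig = {
      StartLimitIntervalSec = 60;
      StartLimitBurst = 3;
    };

    # ends up in the [Service] section
    serviceConfig = {
      # placeholder command; replace it with the real daemon
      ExecStart = "${pkgs.coreutils}/bin/sleep infinity";
      Restart = "on-failure";
      PrivateTmp = true;
    };

    # [Install]-style relation: start the unit as part of normal boot
    wantedBy = [ "multi-user.target" ];
  };
}
```

Saved as its own file, this can be added to the imports list of configuration.nix.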

You can find more information about such options online or in man configuration.nix as usual.

In a nutshell, configuring systemd options for services on NixOS typically boils down to these steps:

  1. edit systemd.services.* options in configuration.nix or in other imported nix files;
  2. run sudo nixos-rebuild test to apply new configuration just for the current OS boot or sudo nixos-rebuild switch to apply changes permanently (additionally add --flake /etc/nixos for flakes);
  3. evaluate systemd service operation (we will elaborate on this further);
  4. return to step 1 or finish.

Alternatively, new configurations can be tested inside a QEMU VM clone of your system without affecting your running system configuration. nixos-rebuild build-vm leaves a symlink ./result in the current directory that points to the built VM. To run it, use result/bin/run-<hostname>-vm.

Be aware that systemd directives (options) are case sensitive! NixOS doesn't know whether systemd recognizes a directive or not, and systemd does not refuse to start the unit either: at most it logs a warning about an unknown key and ignores it. So, once a new configuration is applied, analyze the output of these commands and compare it with the intended objectives:

  • systemctl cat <name> - contents of a systemd unit file, generated by NixOS
  • systemctl show <name> - actual properties of a systemd unit in effect
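A small sketch of the pitfall, using a hypothetical service name. Both attributes are valid Nix, so the rebuild succeeds, but only the correctly spelled one should have any effect:

```nix
{
  # "example-service" is a placeholder name used for illustration
  systemd.services.example-service.serviceConfig = {
    # correct spelling: recognized by systemd
    PrivateTmp = true;

    # wrong case: NixOS writes this line into the unit file verbatim,
    # and systemd is expected to ignore it as an unknown key
    Privatetmp = true;
  };
}
```

With such a unit, systemctl cat example-service would show both lines, while systemctl show example-service -p PrivateTmp reports only the property that systemd actually understands.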

Also, keep in mind that mutable operations like systemctl enable <name> are useless: they would make the system deviate from its declarative, reproducible configuration, and NixOS by design either refuses or stubbornly resists them. There is no need for them anyway, since every permanent setting is in the hands of NixOS.


Documentation for all related options can be found on the website or in man configuration.nix (also in man home-configuration.nix for managing desktop user services).



tips for hardening

There is no universal way to configure systemd sandboxing/hardening options for all services. Each service requires an individual approach.

NixOS provides many services, available as services.<name>.*, that already have more or less hardening implemented by means of systemd. Examples are services.nginx, services.gitea, services.jitsi-meet and services.redis. At the very least, these services run under dedicated non-root system users that have no login shell.

There are, however, services like services.dovecot2, services.postfix and services.nextcloud, which use their own means to spawn sub-processes under a specific user from a master process. Such a master process runs as root. For example, nextcloud uses php-fpm (PHP FastCGI Process Manager). Obviously, such processes can spawn a shell and do a lot more, but they have no network connections with the outside world and are intended specifically for process/worker management and logging. Ideally, we would want them to run under a non-root user regardless, but usually that is not easy to do and upstream might not expect such usage.

Btw, if your systemd service code gets large and you want to wrap it into something more esthetic, you can write your own NixOS service module.
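A minimal sketch of such a module, under stated assumptions: the option namespace services.example-service, the port option and the Python web server are all hypothetical and stand in for your real service.

```nix
{ config, lib, pkgs, ... }:
let
  # hypothetical option namespace for this sketch
  cfg = config.services.example-service;
in
{
  options.services.example-service = {
    enable = lib.mkEnableOption "the example service";

    port = lib.mkOption {
      type = lib.types.port;
      default = 8080;
      description = "TCP port the example service listens on.";
    };
  };

  # the systemd unit is generated only when the service is enabled
  config = lib.mkIf cfg.enable {
    systemd.services.example-service = {
      wantedBy = [ "multi-user.target" ];
      serviceConfig = {
        # placeholder daemon: serves the (initially empty) state directory
        ExecStart = "${pkgs.python3}/bin/python3 -m http.server --bind 127.0.0.1 --directory /var/lib/example-service ${toString cfg.port}";
        DynamicUser = true;
        StateDirectory = "example-service";
        ProtectSystem = "strict";
        ProtectHome = true;
        PrivateTmp = true;
      };
    };
  };
}
```

After importing the file, the whole service is switched on with services.example-service.enable = true; and the hardening details stay hidden inside the module.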

common hardening options (execution environment configuration)

These options are described in the official systemd execution environment configuration documentation. Note that many of them may cause your service to malfunction or even crash, so always test after applying them.

The following code can be specified inside the curly brackets of systemd.services.<name>.serviceConfig = { ... };, where <name> is a placeholder for the real name of the service you set these options for:

# these capabilities can be enough for some web services
AmbientCapabilities = [ "" ];
CapabilityBoundingSet = [ "CAP_NET_BIND_SERVICE" ];

DynamicUser = true;
LockPersonality = true;
MemoryDenyWriteExecute = true;
NoNewPrivileges = true;
PrivateDevices = true;
PrivateTmp = true;

# set PrivateIPC in case IPC is used, but not between services
# PrivateIPC = true;

PrivateUsers = true;
ProcSubset = "pid";
ProtectClock = true;
ProtectControlGroups = true;
ProtectHome = true;
ProtectHostname = true;
ProtectKernelLogs = true;
ProtectKernelModules = true;
ProtectKernelTunables = true;

# hides processes of other users in /proc; some services need a looser value such as "noaccess" or "default" instead; this option implies `MountAPIVFS`
ProtectProc = "invisible";

# entire file system hierarchy gets mounted read-only, except `/dev` `/proc` and `/sys`
ProtectSystem = "strict";

# you can drop "AF_UNIX" from the list if unix sockets are not used
RestrictAddressFamilies = [ "AF_UNIX" "AF_INET" "AF_INET6" ];

RestrictNamespaces = true;
RestrictRealtime = true;
RestrictSUIDSGID = true;
SystemCallArchitectures = "native";

# contrary to intuition this does not forbid IPC, but removes IPC objects after unit is stopped
RemoveIPC = true;

# allow general system service operations, except ~@ sets
# (see full list of predefined system call sets with `systemd-analyze syscall-filter`);
# "~@ipc" disables IPC (some services require IPC, so be careful and drop it if needed);
# all entries live in one list, because Nix rejects a second SystemCallFilter definition in the same attribute set
SystemCallFilter = [ "@system-service" "~@cpu-emulation" "~@debug" "~@keyring" "~@memlock" "~@obsolete" "~@privileged" "~@resources" "~@setuid" "~@ipc" ];

Refer to man capabilities for values of AmbientCapabilities and CapabilityBoundingSet options.
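Since each service needs an individual approach, one way to avoid copying the whole list around is to keep a common baseline in a let binding and adjust it per service with the // (attribute set update) operator. This is only a sketch: service-a and service-b are placeholder names for services defined elsewhere in your configuration, and the baseline is deliberately short.

```nix
{ ... }:
let
  # shared baseline; extend it with the options that suit your services
  hardening = {
    NoNewPrivileges = true;
    PrivateTmp = true;
    ProtectHome = true;
    ProtectSystem = "strict";
  };
in
{
  # hypothetical service that works with the baseline as-is
  systemd.services.service-a.serviceConfig = hardening;

  # hypothetical service that must read home directories:
  # attributes on the right side of // replace the baseline values
  systemd.services.service-b.serviceConfig = hardening // {
    ProtectHome = "read-only";
  };
}
```

If a NixOS-provided module already sets one of these directives to a different value, the module system reports a conflict, which can be resolved explicitly with lib.mkForce.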

some very specific hardening options (resource control unit settings)

These options are described in official systemd resource control documentation.

When PrivateDevices is true, no non-pseudo devices in /dev are accessible. You may want to whitelist some. Note that this is not related to filesystem access.

# explicitly allow pseudo devices
DevicePolicy = "closed";
# explicit list of accessible devices
DeviceAllow = [ "" ];
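As a sketch of whitelisting one physical device: as far as the systemd documentation describes it, PrivateDevices sets up a private /dev that contains no physical devices at all, so this example assumes it is switched off and relies on the device policy instead. The service name and /dev/ttyUSB0 are hypothetical.

```nix
{
  # "example-service" and the serial device path are placeholders
  systemd.services.example-service.serviceConfig = {
    # assumption: a private /dev would not contain the physical device
    PrivateDevices = false;

    # only standard pseudo devices plus the DeviceAllow entries are permitted
    DevicePolicy = "closed";

    # device node followed by access flags: r (read), w (write), m (mknod)
    DeviceAllow = [ "/dev/ttyUSB0 rw" ];
  };
}
```

Ordinary file permissions on the device node still apply, so the service user also needs access to it.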

The following are self-explanatory:

SocketBindDeny = "any";
SocketBindAllow = "tcp:80";

resources limits for a systemd service

Systemd resource control directives allow you to limit the resources provided to a service. For example, if the MemoryMax limit is exceeded, the OOM killer gets invoked.

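systemd.services = {
  nginx = {
    serviceConfig = {
      # directive names are case sensitive: "CPU", not "Cpu"
      CPUAccounting = true;
      CPUQuota = "70%";
      MemoryAccounting = true;
      MemoryMax = "768M";
      # IOWeight replaces the deprecated BlockIOWeight
      IOWeight = 10;
    };
  };
};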

MemoryMax is the absolute limit. It is recommended to use MemoryHigh as the main control mechanism: according to the systemd documentation, it allows the service to go above the limit, but then the processes are heavily slowed down and memory is reclaimed from them aggressively.
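A minimal sketch combining both limits for nginx; the sizes are arbitrary example values, not recommendations:

```nix
{
  systemd.services.nginx.serviceConfig = {
    # soft limit: above it the service is throttled and its memory reclaimed
    MemoryHigh = "512M";

    # hard limit: a last line of defence, above it the OOM killer steps in
    MemoryMax = "768M";
  };
}
```

Keeping MemoryHigh noticeably below MemoryMax gives the throttling a chance to act before the hard limit is reached.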

blocking network connections

blocking all network connections except localhost

This is appropriate, for example, if a service communicates with the outside world via a proxy (like nginx). It can also be configured with the help of the systemd resource control directives partially mentioned above.

systemd.services.netdata.serviceConfig = {
  IPAddressDeny = "any";
  IPAddressAllow = "localhost";
};

blocking outgoing internet connections (not achievable by systemd options)

The idea here is to keep responding to incoming requests for a service, but forbid any outgoing connections initiated by the service itself. Systemd is not capable of such granular, firewall-like control. iptables, however, can match packets generated by the specific user that runs the service:

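networking.firewall = {
  # `user` must be bound to the name of the user that runs the service, e.g. in a surrounding `let`
  extraCommands = ''
    iptables -t filter -I OUTPUT 1 -m owner --uid-owner ${user} -m state --state NEW -j REJECT
  '';
  # delete by rule specification: -D takes either a rule number or a specification, not both
  extraStopCommands = ''
    iptables -t filter -D OUTPUT -m owner --uid-owner ${user} -m state --state NEW -j REJECT || true
  '';
};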

By specifying 1, we're instructing iptables to insert the rule at the beginning of the chain (pushing any existing rules down by one position).

testing, monitoring, analyzing

some useful commands for diagnostics of systemd services

  • systemctl list-unit-files - list of all installed unit files with their enablement state (use systemctl list-units for the current runtime status)
  • systemctl start <name>
  • systemctl restart <name>
  • systemctl stop <name>
  • systemctl status <name> - unit state, started/stopped timestamps, running processes, etc
  • systemctl cat <name> - contents of a systemd unit file, generated by NixOS
  • systemctl show <name> - actual properties of a systemd unit in effect
  • journalctl -e -u <name> - show logs for a unit, scrolled down to the most recent records
  • journalctl -u <name> -f - to monitor systemd service output in real time (by analogy with tail -f)
  • journalctl -b-1 -u <name> - in case you want to see logs only for previous boot
  • systemd-analyze security - show security summary for all running services ("SAFE", "EXPOSED" and "UNSAFE" do not mean the factual situation, rather whether various systemd hardening features are in use or not)
  • systemd-analyze security <name> - show more detailed analysis for the specified service, indicating which options might be set
  • htop using tree view (F5) - to inspect the whole tree of processes/threads (nix-shell -p htop --run htop if you don't have it installed)

cgroups

The cgroups (control groups) Linux feature powers systemd. It provides unified control over the collection of processes within a single service. systemd-cgtop shows top control groups by their resource usage (output can be sorted by utilization of CPU, memory, IO load, number of tasks). It can be a good alternative to top/htop, because on a server we often care about service entities as a whole, rather than numerous processes whose stats are hard to sum up in your head.

Just in case, note that enabling the netdata service in NixOS enables systemd.enableCgroupAccounting, which in turn enables these options in systemd's system.conf:

DefaultCPUAccounting=yes
DefaultIOAccounting=yes
DefaultBlockIOAccounting=yes
DefaultIPAccounting=yes

when trying systemd options alone

You can manually test various systemd options without writing service files with the help of systemd-run, for example:

$ ls -l /home
total 0
drwx------ 1 alex users 1126 2023-06-21 19:26 alex

$ sudo systemd-run -p ProtectHome=yes --shell
Running as unit: run-u2544.service
Press ^] three times within 1s to disconnect TTY.

# ls -l /home
total 0

# exit
Finished with result: success
Main processes terminated with: code=exited/status=0
Service runtime: 2.749s
CPU time consumed: 50ms
IP traffic received: 0B
IP traffic sent: 0B
IO bytes read: 0B
IO bytes written: 0B

using tmux shell via socket inside a systemd unit

With the help of tmux you can run a shell inside a hardened systemd unit in order to test the isolation in practice. Put the service definition into a nix file such as example-systemd-service.nix, add its path to the imports list in configuration.nix and then execute nixos-rebuild switch or nixos-rebuild test (if you don't want the new configuration to be permanent; however, it leaves a ./result symbolic link in the current directory). Once the unit is running, attach to its tmux session from the host:

# nix-shell -p tmux --run "tmux -S /run/example-service/tmux.socket attach"
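The unit that the command above attaches to could look roughly like the following. This is a sketch under stated assumptions rather than the exact file: the session name and the handful of hardening options are arbitrary, and only the socket path is taken from the command above.

```nix
{ pkgs, ... }:
let
  # must match the socket path used in the attach command
  socket = "/run/example-service/tmux.socket";
in
{
  systemd.services.example-service = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      # tmux detaches into the background, hence "forking"
      Type = "forking";
      # creates /run/example-service and removes it when the unit stops
      RuntimeDirectory = "example-service";
      ExecStart = "${pkgs.tmux}/bin/tmux -S ${socket} new-session -d -s test";
      ExecStop = "${pkgs.tmux}/bin/tmux -S ${socket} kill-server";

      # options under test; replace them with the set you want to evaluate
      ProtectHome = true;
      PrivateTmp = true;
      ProtectSystem = "strict";
    };
  };
}
```

Inside the attached session, commands such as ls -l /home or touch /etc/test show directly what the sandbox allows.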

unsolved problems

As of 2023-07-10, the systemd.services.<name>.confinement.enable NixOS option is not compatible with systemd's ProtectSystem.

final notes

Systemd hardening is just one of the measures to be taken to narrow the potential threat landscape and the risks for a server. Ideally, vulnerability scanning, penetration testing, unauthorized access prevention and security audits should be involved. Take advantage of monitoring tools and react quickly, following a rescue plan, to mitigate the impact of intrusion incidents. This might include restoring the system from backups, resetting keys and passwords, etc. Keep running software up to date and respond to CVEs (deploying patched software is easy in NixOS in case it hasn't been patched already). Have a business continuity plan. Many measures must be systematic rather than ad hoc in order to stay vigilant against emerging threats.

As for NixOS, it also features security.apparmor, security.audit and even programs.firejail options which might help in building a more secure system.