For this example/howto I’m going to be using a Fedora 19 system. I want a read-only root VM that is usable as a general Linux server, not as some single-purpose appliance like a streaming media center server. Depending on your desired use case for OS/application data, you have some options for managing data that isn’t in the read-only root filesystem.
There are two basic concepts in Fedora’s implementation of read-only root: a temporary scratch space and a persistent storage space. You will always end up using the temporary scratch space; the persistent storage space is optional.
In Fedora/RHEL (or variants based on RHEL like CentOS and Scientific Linux), the primary config file that sets up read-only root filesystems is /etc/sysconfig/readonly-root. That’s what is read by the system startup scripts. You can read a bit about it here. Don’t let the documentation’s references to really old versions of Fedora faze you; this all still works. I went ahead and rolled my own install image, bypassing the suggested use of Cobbler.
Scratch space (stateless)
The scratch space (RW_MOUNT and RW_LABEL in /etc/sysconfig/readonly-root) is used to store transient things, like the contents of /tmp. Three sets of files and directories dictate what ends up in scratch space: /etc/rwtab, /etc/rwtab.d/*, and /run/initramfs/rwtab. The initramfs rwtab file points to things that are set up by the initial ramdisk. In the case of my desktop at home, which boots via iPXE and has an iSCSI boot disk, this contains PXE-provided network configuration. The rest are distribution-specific defaults (/etc/rwtab) or entirely user-configurable (/etc/rwtab.d/*).
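As a rough sketch of what a user-supplied drop-in might look like: as far as I know, each rwtab line is a type followed by a path, where "empty" gives you an empty writable path, "dirs" copies the directory tree without its files, and "files" copies the contents intact. The file name local-example and the myapp paths are made up for illustration, and the script writes into a staging directory (STAGING_DIR is my own variable) so you can review the result before copying it to /etc/rwtab.d/.

```shell
# Stage a hypothetical rwtab drop-in instead of touching the live /etc.
staging="${STAGING_DIR:-$(mktemp -d)}"
mkdir -p "$staging/etc/rwtab.d"

# Each line is "<type> <path>"; the myapp paths are placeholders.
cat > "$staging/etc/rwtab.d/local-example" <<'EOF'
empty /var/cache/myapp
dirs /var/log/myapp
files /etc/myapp/runtime.conf
EOF

# Show what would be installed.
cat "$staging/etc/rwtab.d/local-example"
```

Compare the entries against the stock /etc/rwtab on your own system before relying on them.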
The scratch space can be a RAM-backed tmpfs filesystem, or a filesystem on a block device depending on how you configure readonly-root support.
Note: the contents of the scratch space are wiped clean on every boot, regardless of whether it is a filesystem on disk or not.
Persistent storage (stateful)
The persistent storage area is for things you want to keep around between reboots. This could be host-specific things like SSH host keys, configuration files, etc - if they’re not already present in the read-only root filesystem. Imagine an entire data center of Linux boxes that all share a common, read-only root filesystem, and the things that keep them different are stored in the persistent storage. I imagine big “cloud” vendors do this.
The default for persistent storage is to store nothing at all. Persistent storage is configured through /etc/statetab and /etc/statetab.d/*, and they are used the same way the scratch space is configured.
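Here is a minimal sketch of keeping SSH host keys in persistent storage. It assumes that statetab entries are plain paths, one per line (no type column, unlike rwtab), and that each path is bind-mounted from the same relative location under the persistent mount point. The drop-in name ssh-host-keys and the ROOT_DIR variable are hypothetical; everything is staged under a scratch directory rather than written to the live system.

```shell
# Stage everything under $root so nothing on the running system changes.
root="${ROOT_DIR:-$(mktemp -d)}"
state_mount="$root/var/lib/stateless/state"

mkdir -p "$root/etc/statetab.d" "$state_mount/etc"

# One path per line; this one asks for /etc/ssh to come from persistent storage.
printf '%s\n' /etc/ssh > "$root/etc/statetab.d/ssh-host-keys"

# Seed the persistent area with the existing directory, if the staged tree has one.
if [ -d "$root/etc/ssh" ]; then
    cp -a "$root/etc/ssh" "$state_mount/etc/"
fi

cat "$root/etc/statetab.d/ssh-host-keys"
```

On a real system, the copy into the persistent area has to happen before the path is listed and the machine is rebooted; otherwise there is nothing to mount over the original.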
But there’s an important thing to take note of: these persistent files/directories are bind-mount overlays¹ from the stateful storage into the read-only root filesystem. This is where things can get a little weird, so I’ll explain by way of an example I actually ran into, one that failed because of the bind-mount.
Trying to put user account information into persistent storage
Normally, user account information on a UNIX box is stored in /etc/passwd, /etc/shadow, /etc/group, and /etc/gshadow. I added those files to a /etc/statetab.d/system-auth file, and copied the files to /var/lib/stateless/state/etc. I rebooted, logged in successfully, and tried to change a password:
[Terminal output not preserved in the source: the password change failed with a token manipulation error.]
Why did this happen? It’s because of the way /etc/passwd and friends are updated when user account information is changed. A temporary file is created for each one, named /etc/nfilename (for example, /etc/npasswd); that copy is modified, and then the temporary file is renamed over the top of the old one. This tries to change the read-only filesystem twice: first by creating a new file in /etc, and then by renaming it. The rename tries to re-point the filename at the new file’s inode², which means changing a directory on the read-only filesystem. Neither operation succeeds, so the command dies with the ever-so-useful token manipulation error message.
You’re probably thinking: “Wouldn’t it be easier just to copy the contents of the /etc/npasswd file into /etc/passwd?” It might be, but you run the risk of wiping out your password files if there isn’t room on disk for one complete copy of the file. That’s why a temporary file is created and renamed over the top. It also helps prevent race conditions when reading the files, since the rename is atomic. Atomic means “either this happened or it didn’t happen at all”. A copy is not atomic, so a reader could see a partially written file while data is still being added to the end.
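To make the write-then-rename pattern concrete, here is a small shell sketch. It is not what passwd itself runs; atomic_replace is a helper name I made up, and the demo works on a throwaway file in a temporary directory. The temporary file is created next to the target so that mv stays on one filesystem and becomes a single rename.

```shell
# atomic_replace TARGET: read new contents from stdin into a temp file
# beside TARGET, then rename it over TARGET in one step.
atomic_replace() {
    target=$1
    tmp=$(mktemp "$(dirname "$target")/.tmp.XXXXXX") || return 1
    if cat > "$tmp" && mv -f "$tmp" "$target"; then
        return 0
    fi
    # Writing or renaming failed: clean up and leave TARGET untouched.
    rm -f "$tmp"
    return 1
}

# Demo on a scratch file, not the real /etc/passwd.
demo_dir=$(mktemp -d)
printf 'old contents\n' > "$demo_dir/passwd"
printf 'new contents\n' | atomic_replace "$demo_dir/passwd"
cat "$demo_dir/passwd"
```

Note that both steps need write access to the directory holding the target, which is exactly what a read-only /etc does not allow. The sketch also ignores ownership and permissions: mktemp creates the file readable only by its owner, so a real tool would have to restore the original mode.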
But I’m not letting this stop me: I’m going to move nearly all the user accounts out of /etc/passwd and friends, and use LDAP. LDAP will be hosted in /var, in the persistent storage space, and in the event that fails, I’ll have a backup account in /etc/passwd that has an absurdly complex password and is only used in emergencies.
Configuration of persistent & scratch space
You can specify the filesystems for storage by listing them in /etc/fstab, or by labeling them with the tune2fs -L command.
- Scratch label: tune2fs -L stateless-rw /dev/sdXn
- Scratch fstab mount point: /var/lib/stateless/writable
- Persistent label: tune2fs -L stateless-state /dev/sdXn
- Persistent fstab mount point: /var/lib/stateless/state
For the scratch space, you can make use of a tmpfs RAM filesystem instead of an on-disk filesystem by not having a /var/lib/stateless/writable mount point in /etc/fstab, and not applying a stateless-rw label to any filesystem. Be aware that the size of that tmpfs filesystem is limited by default to half of your system memory.
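For orientation, this is roughly how the labels and mount points above line up with /etc/sysconfig/readonly-root. Only RW_MOUNT and RW_LABEL are named earlier in this post; READONLY, STATE_MOUNT, and STATE_LABEL are variable names as I remember them from the stock file, so treat them as assumptions and check them against the copy installed on your system.

```shell
# Sketch of /etc/sysconfig/readonly-root (plain shell variable assignments).
# READONLY, STATE_MOUNT and STATE_LABEL are assumed names; verify locally.

# Turn on read-only root handling in the startup scripts.
READONLY=yes

# Scratch space: where it is mounted, and the label searched for on disk.
RW_MOUNT=/var/lib/stateless/writable
RW_LABEL=stateless-rw

# Persistent storage: mount point and filesystem label.
STATE_MOUNT=/var/lib/stateless/state
STATE_LABEL=stateless-state
```

The values match the labels and mount points in the list above, which I believe are also the shipped defaults.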
Next post
The next post will cover the installation of a bare minimum Fedora 19 install, configuring it to be a read-only root filesystem, and the gotchas I found along the way.