There are two basic concepts in Fedora’s implementation of read-only root: a temporary scratch space, and a persistent storage space. You will always end up using the temporary scratch space; the persistent storage space is optional.
In Fedora/RHEL (or variants based on RHEL, like CentOS and Scientific Linux), the primary config file that sets up read-only root filesystems is /etc/sysconfig/readonly-root. That’s what is read by the system startup scripts. You can read a bit about it here – don’t let the documentation there referencing really old versions of Fedora faze you; this all still works. I went ahead and rolled my own install image, bypassing the suggested use of Cobbler.
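For reference, the knobs in that file that matter here look roughly like this (a sketch with read-only root switched on; the rest are the stock defaults, so check the comments in the file itself):

READONLY=yes                            # mount the root filesystem read-only
RW_MOUNT=/var/lib/stateless/writable    # where the scratch space gets mounted
RW_LABEL=stateless-rw                   # filesystem label used to find the scratch space
STATE_LABEL=stateless-state             # filesystem label used to find persistent storage
STATE_MOUNT=/var/lib/stateless/state    # where persistent storage gets mounted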
The scratch space (RW_MOUNT and RW_LABEL in /etc/sysconfig/readonly-root) is used to store transient things, like the contents of /tmp. There are three sets of files and directories that dictate what ends up in scratch space: /etc/rwtab, /etc/rwtab.d/*, and /run/initramfs/rwtab. The initramfs rwtab file points to things that are set up by the initial ramdisk; in the case of my desktop at home, which boots via iPXE and has an iSCSI boot disk, this contains PXE-provided network configuration. The rest are distribution-specific defaults (/etc/rwtab) or entirely user-configurable (/etc/rwtab.d/*).
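Entries in the rwtab files are one per line: a keyword followed by a path. Roughly, "empty" gives you a fresh, empty writable directory, "dirs" copies a directory tree but not the files in it, and "files" copies a file or directory tree intact so it can be modified. A few illustrative (not default) entries:

empty /tmp
dirs /var/lib/mlocate
files /etc/resolv.conf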
The scratch space can be a RAM-backed tmpfs filesystem, or a filesystem on a block device depending on how you configure readonly-root support.
Note: the contents of the scratch space are wiped clean on every boot, regardless of whether it’s an on-disk filesystem or a tmpfs.
The persistent storage area is for things you want to keep around between reboots. This could be host-specific things like SSH host keys, configuration files, etc - if they’re not already present in the read-only root filesystem. Imagine an entire data center of Linux boxes that all share a common, read-only root filesystem, and the things that keep them different are stored in the persistent storage. I imagine big “cloud” vendors do this.
The default for persistent storage is to store nothing at all. The configuration files for persistent storage are /etc/statetab and /etc/statetab.d/*, and they are used the same way the scratch space configuration is.
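The statetab format is even simpler: a list of paths, one per line. Each path gets bind-mounted into place from the persistent storage area, so the corresponding content needs to exist under /var/lib/stateless/state. A hypothetical example:

/etc/ssh
/etc/hostname
/var/spool/cron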
But there’s an important thing to take note of: these persistent files/directories are bind-mount overlays1 from the stateful storage into the read-only root filesystem. This is where things can get a little weird, so I’ll explain by way of an example I actually ran into that failed because of the bind-mount.
Normally, user account information on a UNIX box is stored in /etc/passwd, /etc/shadow, /etc/group, and /etc/gshadow. I added those files to an /etc/statetab.d/system-auth file, and copied the files to /var/lib/stateless/state/etc. Rebooted, logged in successfully, and tried to change a password:
[output elided: passwd fails with a token manipulation error]
Why did this happen? It’s because of the way /etc/passwd and friends are updated when user account information is changed. A temporary file for each is created, named /etc/nfilename; that one is modified, and then the temporary file is renamed over top of the old one. This actually tries to make a change on the read-only filesystem twice: the first is the creation of a new file on the read-only filesystem, and the second is the rename itself. The rename tries to modify the filesystem inode2 to re-point the filename at the new data on disk. None of this can happen, so it dies with the ever-so-useful token manipulation error message.
You’re probably thinking: “Wouldn’t it be easier just to copy the contents of the /etc/npasswd file into /etc/passwd?” It might be, but you run the risk of wiping out your password files if there isn’t room on disk for one complete copy of the file. That’s why a temporary file is created and renamed over top. It also helps prevent race conditions when reading the files, since the rename is atomic. Atomic means “either this happened or it didn’t happen at all”. A copy is not atomic, so you could read the complete contents of the file from disk while data is still being added to the end.
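The same pattern, sketched as shell commands (the npasswd name follows the convention above, and the editor stands in for whatever change is being made):

cp /etc/passwd /etc/npasswd   # build the new copy alongside the old one
vi /etc/npasswd               # apply the change to the copy
mv /etc/npasswd /etc/passwd   # rename(2) swaps it in atomically; readers see the old file or the new one, never half of either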
But I’m not letting this stop me: I’m going to move nearly all the user accounts out of /etc/passwd and friends, and use LDAP. LDAP will be hosted in /var, in the persistent storage space, and in the event that fails, I’ll have a backup account in /etc/passwd that has an absurdly complex password and is only used in emergencies.
You can specify the filesystems for storage by listing them in /etc/fstab, or by labeling them with the tune2fs -L command:

Scratch space: tune2fs -L stateless-rw /dev/sdXn, or an fstab mount point of /var/lib/stateless/writable
Persistent state: tune2fs -L stateless-state /dev/sdXn, or an fstab mount point of /var/lib/stateless/state
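If you go the fstab route, the entries would look something like this (device names and filesystem type here are placeholders):

/dev/sdX1  /var/lib/stateless/writable  ext4  defaults  0 0
/dev/sdX2  /var/lib/stateless/state     ext4  defaults  0 0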
For the scratch space, you can make use of a tmpfs RAM filesystem instead of an on-disk filesystem by not having a /var/lib/stateless/writable mount point in /etc/fstab, and by not applying a stateless-rw label to any filesystem. Be aware that the size of that tmpfs filesystem is limited to 1/2 your system memory.
The next post will cover a bare-minimum Fedora 19 install, configuring it to be a read-only root filesystem, and the gotchas I found along the way.
Long story short: I accept the fact that at some point, all of my computers will be compromised. Whether that compromise is mass malware or some kind of targeted attack is irrelevant. The point is that some kind of unauthorized change will occur. If I can reduce the number of disk locations that are writable, I can reduce the number of locations an attacker can leverage to remain on my system.
It’s entirely possible to remain persistent on a system without making any changes on-disk, but that’s outside the scope of this post. Let’s assume for the sake of argument that the attacker will want to keep my computer compromised. They’ll need to make some kind of change so they can stay on the system: replacing the binary of a system service with a trojanized one, adding a new service that starts on boot (e.g. in /etc/init.d or /etc/systemd), modifying some user’s login scripts, or adding a recurring scheduled task (via crond, atd, etc.).
It’s my goal to reduce the number of locations where an attacker can add new things, or replace existing things, to keep their foothold on my system. If they can’t add new things that come up on boot, their job is more difficult. This isn’t perfect, but it’s something that could be part of an overall security-minded system posture.
I know that having a UNIX box’s filesystems read-only isn’t a panacea. I read an article once that described a specialized UNIX box that had its root filesystem on a read-only flash/PROM device, which the vendor said made it more secure. Ultimately, the executables on the read-only device had vulnerabilities, which couldn’t be patched because the filesystem couldn’t be modified. So there are ups and downs to this concept.
Fedora has read-only root support, enabled by /etc/sysconfig/readonly-root. To get it working on a brand new install, you need to do two or three things.
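At minimum, that means flipping the switch in the config file described earlier, something like this (a sketch, assuming the stock file ships with READONLY=no):

sed -i 's/^READONLY=no/READONLY=yes/' /etc/sysconfig/readonly-root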
At this point you should have a read-only root filesystem. Any other filesystems (like /boot) are most likely still read-write, so you’d have to change those to read-only if you want them locked down too.
Bonus: Linode has an API method with a read-only flag for setting VPS disks to read-only.
~/.ssh/authorized_keys.
So here’s the skinny. You’ll need two things to get this done.
Take a look at the sshd(8) man page, and scroll down to the AUTHORIZED_KEYS FILE FORMAT section. There’s a subsection on options there, where you can tell the SSH server to restrict clients connecting in with that key. One of them forces a command via command="command here". So that’s how to explicitly force a command to be run when an SSH client connects using that key; the client is completely forbidden from running anything else. Sweet, huh?
About that simple shell script: the SSH daemon on the remote server puts the original command into an environment variable, SSH_ORIGINAL_COMMAND. So here’s what you do. Put the below script in /tmp/saveme.sh, chmod a+rx it to make it executable, and put command="/tmp/saveme.sh" in your authorized_keys file.
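A minimal version of that script would be something along these lines (a sketch; its only job is recording SSH_ORIGINAL_COMMAND into /tmp/command.txt):

#!/bin/bash
# Append whatever command the client actually asked to run, so it can
# later be pinned down in the command="..." option.
echo "$SSH_ORIGINAL_COMMAND" >> /tmp/command.txt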
So run your command using the SSH key you have dedicated to the log pull, extract the arguments of what that program wants to run on the remote host by looking at /tmp/command.txt, and edit the command="stuff" line in your authorized_keys file.
Now, this is all about segmentation, least privilege, etc., right? So let’s REALLY lock this down. Other options you want to pay attention to:
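For example, a fully locked-down entry might combine the forced command with the restriction options documented in sshd(8); the source address, key material, and comment below are made up:

from="10.0.0.5",command="/tmp/saveme.sh",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-rsa AAAA... logpull@collector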
So set up your rsync or whathaveyou in cron and begin pulling logs. If you pull logs every 5 minutes, say, and one pull runs long, you could end up with multiple copies of the cron job running at once. Prevent that with lockrun. Really, this blog post is about SSH keys, but lockrun is what makes things shine. Lockrun is a wrapper around a command to be run, which prevents concurrent copies of that program from running via a simple lockfile. You use this via cron. It’s dead simple.
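A hypothetical crontab entry for that (paths, key name, and hostname invented):

# pull logs every 5 minutes, never more than one copy at a time
*/5 * * * * lockrun --lockfile=/var/run/logpull.lock -- rsync -az -e "ssh -i /home/logpull/.ssh/logpull_key" loghost:/var/log/ /srv/logs/loghost/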
from="pattern-list" - the ssh_config(5) man page contains details on how to use this to restrict which hosts and/or IP addresses are permitted to connect in using this key.↩
All the student VMs are on the same virtual network.
This allows network cross-contamination (one student can generate traffic that another student will see), which will lead to confusion. Each student’s VMs need to be segregated onto their own virtual network.
How to fix: Script up automation of creating new virtual bridges in libvirt.
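A hedged sketch of what that automation might do for one student (network name, bridge name, and addressing are invented; a libvirt network with no <forward> element stays isolated):

cat > student01-net.xml <<'EOF'
<network>
  <name>student01</name>
  <bridge name='virbr-stu01'/>
  <ip address='10.200.1.1' netmask='255.255.255.0'/>
</network>
EOF
virsh net-define student01-net.xml
virsh net-start student01
virsh net-autostart student01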
There isn’t really any “network” access to the Windows VM.
The MITM system is connected to the Nova Labs network, and the Windows VMs are connected to a “malware” virtual network that is also connected to the MITM system. I think I’ll need more advanced connectivity for more complex lab exercises later.
How to fix: Make dedicated MITM VMs & dedicated “malware” virtual networks per student.
I need a way to temporarily grant secure access to people remotely.
I’m thinking OpenVPN. It’s cross-platform, and I can use x509 certificates that I generate using TinyCA2. I’ll do certificate revocation status checking as described by this guy here.
How to fix: Set up a Certificate Authority for the lab, and get OpenVPN running.
Segregate remote students from each other so they can’t attack each other. I think I can handle this with scripts in OpenVPN, and some network filters in libvirt. I need to “link” an OpenVPN connection to a libvirt virtual network so that one student can’t touch another student’s network.
How to fix: Research OpenVPN scripting and libvirt network filters.
I’m concerned about our internet connection bandwidth at the space.
I’m sharing out two VNC screens per student: the Instructor VM (shared 15 times over) and one dedicated screen per student. That’s a total of 30 concurrent VNC sessions, which is a lot of bandwidth if all 15 students are remote. I’m considering buying a Linode just for the bandwidth, so I can send one VNC stream of the Instructor VM from the space to the Linode, and then have the Linode “broadcast” that to the remote students. The dedicated student VM VNC sessions would still be sent directly from the space to the remote students.
How to fix: Buy a Linode and limit the number of remote students.
Work out the kinks in streaming.
I heard someone say that the best setup for streaming is just the instructor audio, and then a stream of the slide deck. E.g., don’t bother with putting a talking head on screen. I think I like that idea, and I think that makes things easier for UStream.
The audio in the first class wasn’t that great because I wasn’t mic’d directly; there was a lot of ambient audio in the background. I think I need to bring either a Bluetooth hands-free headset or my PlayStation USB headset.
The video cut out on the stream a couple of times - I’m not sure what caused that :(
This is the story of me getting Octopress operating “correctly” at home.
Note: I went through an iterative process of breaking shit, uninstalling, re-installing, troubleshooting, researching, uninstalling, re-installing things in a different way until things started working. This lack of fear of breaking something that’s already broken is special: It’s the SysAdmin way. I’ve attempted to document below what went wrong and what I did to fix it, but as I attempt to re-trace my steps, things aren’t adding up correctly.
This was quite possibly my first real venture into the Ruby programming language, so I’m about as uninitiated as you can be. I skimmed through the Octopress documentation, and started installing Ruby packages on my Fedora 18 home desktop. A move I’d soon regret, in about 12 hours’ time. Spoiler: I didn’t know that the Ruby community has a thing for requiring specific versions of Ruby and associated Ruby gems (extensions, modules, or libraries by any other name) together with an application. So Octopress had these Gemfile and Gemfile.lock files in the repository that described a specific Ruby environment that the Octopress developers had blessed as “good”.
In good inquisitive hacker learning fashion, I completely and quite intentionally disregarded it all. I learn by doing. :)
You’re supposed to run bundle install to install Octopress’s prerequisites (pinned, down to a specific version of the Ruby interpreter itself), which will end up in ~/.gem. Bundler’s job is to make setting up an application repeatable, but I was unwittingly playing mix-master with multiple versions of gems and multiple locations Ruby gems could be installed on the filesystem.
There were three different places gems could end up on this system: where RPM packages put gems, where gem(1) puts gems, and where bundle puts gems.
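If you want to see where gem itself will put and look for things, it can tell you (output varies by distro and user):

gem environment   # dumps the installation directory and the gem paths gem(1) searches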
The first thing that went wrong was running rake, Ruby’s version of make, as part of installing the default Octopress theme. Rake informed me that I was running too new a version of rake:
[error output elided: the already-activated rake was newer than the version the Gemfile required, and the message suggested using bundle exec]
This was because I had a copy of rake from my system in /bin/rake, and a copy from bundle somewhere in ~/.gem. I wasn’t using the version of rake specified by the Gemfile.lock that bundle downloaded. From the error message I learned about the bundle exec command. This tries to run the specific versions of programs that are part of your application’s bundle.
So, bundle exec rake install? Yep, that worked. I thought I was golden until I tried to run bundle exec rake generate to “compile” the website. I thought to myself, “Am I going to need to run bundle exec every time?” Something was silently failing to run. The “My Octopress Page is coming soon” index.html made by the Rakefile was still in the _deploy directory, and none of the templates had executed and produced any output files. So I went digging in the Rakefile:
[Rakefile excerpt elided: the generate task shells out via system() to run compass]
It turns out this system() call wasn’t locating the compass script, buried somewhere under ~/.gem/. When I tried to run compass on the command line straight up, I got -bash: compass: command not found. bundle exec compass worked for me on the command line, and that’s essentially what system() does: it runs a shell to run the program. I tried to hack the Rakefile to use bundle exec in the system() call, but [my memory is fuzzy here] things didn’t work. I ultimately resorted to uninstalling Ruby entirely: the RPM packages, and the gems installed in the three locations above. Then I went through a process of installing the core of Ruby from RPM, installing gems as a user to get a list of the gem + version combinations Octopress wanted, and then installing those explicitly, system-wide, as root with gem.
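Roughly along these lines (a sketch; the actual gem names and versions came from whatever Octopress’s Gemfile.lock asked for):

gem install compass -v '<version from Gemfile.lock>'
gem install jekyll -v '<version from Gemfile.lock>'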
At that point I got compass and jekyll in standard system paths (/bin and /usr/local/bin). So I tried to run rake generate again:
[rake generate output elided: jekyll aborted, complaining it couldn’t load json]
Uhh. Excuse me? Why can’t you find json?
[interactive Ruby session elided: require 'json' loads fine, with no warnings]
No warnings. Normal Ruby loads json. So why can’t jekyll when it’s run by rake? From the Rakefile above, we know that rake simply runs jekyll via system(). What happens when we run it standalone?
[jekyll output elided]
Ok, this shit is getting real. SYSADMIN INVESTIGATION MODE ACTIVATE
I’ve got a few tricks up my sleeve, one of which I use for helping me secure automated batch processing through ssh pubkeys. Note to self: write this up later!
Let’s look at the execution environment of compass and jekyll when run via rake generate. I do this by running the command through bash, but as a series of commands instead of just one. I compared this to my normal shell environment and saw some things that had been added.
[Rakefile tweak elided: the system() call was changed to dump the environment before running the real command]
And the output:
[environment dump elided: among the extra variables were GEM_HOME, GEM_PATH, and RUBYOPT]
So what stands out here? There are some environment variables that look like they’re explicitly related to the path for gems: GEM_HOME, GEM_PATH, and RUBYOPT. Exactly the same way I printed out these environment variables, I can unset them. So now in the Rakefile we have this:
[Rakefile excerpt elided: the gem-related environment variables are removed before anything runs]
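In Ruby terms, the change amounts to something like this near the top of the Rakefile (a sketch, not the exact diff):

# scrub the gem-related variables that were injected into the environment
ENV.delete('GEM_HOME')
ENV.delete('GEM_PATH')
ENV.delete('RUBYOPT')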
And now we have this:
[rake generate output elided]
So something that is setting those environment variables, something that is supposed to be helping, is ending up hurting us. I bet it’s the bundler stuff. I found this at the top of the Rakefile:
[Rakefile excerpt elided: the top of the file includes a require "bundler/setup" line]
So I commented out the require "bundler/setup" line and re-ran rake generate. Now:
[rake generate output elided: the site generates successfully]
Voila! And now I have a blog.