Remote Attestation with Keylime
Trusted Computing is a big topic that encompasses multiple technologies, all of them relying on a single piece of hardware as the root of trust: a cryptographic co-processor named TPM (Trusted Platform Module).
Once you take ownership of it, it will generate a private key that cannot be extracted (by usual means) and that can be used to generate new secondary keys, sign hashes and encrypt documents.
Almost all systems running nowadays include a TPM device. They are cheap and generally available for multiple architectures. If your device is old there is a good chance that it has a TPM with version 1.2. For now this version is not supported, as version 2.0 has been available since 2015.
The TPM is usually disabled by default, but can be enabled from the EFI / BIOS menu. This menu also offers the option of clearing all the internal data, which invalidates all the keys generated by the TPM and, with them, everything signed by those keys. So use this option with caution.
Measured boot
If we already have a TPM 2.0 enabled, we can also start using it to check whether the boot chain has been tampered with in any form. This process is known as measured boot. Each step in the boot process will calculate a hash based on the contents in memory (or on disk) of the next stage, and register this hash in some of the internal registers of the TPM.
Those registers are known as PCRs (Platform Configuration Registers); a typical TPM has 24 of them, and they are designed to store hash values. Hash functions like SHA1 or SHA512 produce digests of different sizes, and the TPM can enable several of them at the same time. This means that each PCR has multiple slots (banks), or variants of the same register, one for each hash function. For example, if we enable the SHA1 and SHA256 hashes in the TPM, PCR#1 will have two versions, one of 20 bytes and another of 32 bytes.
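Once the TPM2 tools are installed (they come with the Keylime system roles described below), we can list which hash banks are enabled and which PCRs each of them allocates:
tpm2_getcap pcrs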
After a reboot the PCRs are initialized to a well-known value, usually all 0s or all 1s depending on the register. We cannot write a specific value into those registers, but we can use the extend operation to update their content. An extension of a PCR is defined as "PCR <- HASH(PCR_VALUE || DIGEST)". For example, if we want to extend the SHA1 slot of PCR#1 with a given digest value, we need to prepend the current value of the PCR to it and calculate the SHA1 of the aggregate.
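As an illustration with the TPM2 tools (PCR#16 is the debug PCR, which can be extended and reset freely, so it is safe to play with; the "demo" string is just an arbitrary example):
# Read the current value of the SHA256 bank of PCR#16
tpm2_pcrread sha256:16
# Extend it with the SHA256 digest of an arbitrary string
DIGEST=$(echo -n "demo" | sha256sum | cut -d ' ' -f 1)
tpm2_pcrextend 16:sha256=$DIGEST
# The new value is SHA256(old_value || digest)
tpm2_pcrread sha256:16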
Every time there is an extension, the digest value is stored in a log that, eventually, will reach the kernel and will be available to the user. This log (named the event log) can be used to recreate the expected values of the PCRs. We can then ask the TPM for a signed report of the real values of the PCRs (named a quote), and compare them with the calculated ones.
Because the quote is signed by a private key that only the TPM has, we can be sure that those are the current values of the PCRs. If the recalculated values match the quote from the TPM, we are also sure that the hashes listed in the event log are the ones used during the extensions.
We can use this last fact to compare those hashes against a list of good hashes that we have from a different source. The event log will have one hash for the firmware (EFI), another for the boot loader, another for the kernel, etc. We can now compare all of them with the list of good hashes that we have.
We can do the comparison locally, but it is more effective if we send the event log and the TPM quote to a remote machine that has the list of good hashes and makes the comparison there. This process is known as remote attestation.
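Keylime automates all of this, but the individual steps can be reproduced by hand with the TPM2 tools. A rough sketch of producing a quote (the key types and flags below are the common defaults, adjust them as needed) could be:
# Create an endorsement key and an attestation key
tpm2_createek -c ek.ctx -G rsa -u ek.pub
tpm2_createak -C ek.ctx -c ak.ctx -u ak.pub -n ak.name
# Ask the TPM for a signed quote over the SHA256 bank of PCRs 0-7
tpm2_quote -c ak.ctx -l sha256:0,1,2,3,4,5,6,7 \
  -m quote.msg -s quote.sig -o quote.pcrs -g sha256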
As we will see later, with remote attestation you can validate more elements than the measured boot hashes. A process can delegate to the TPM the signing of any generated document that provides some kind of metric or any other information about the system. Because of this signature, we can:
- Be sure that the sender is the system that generated the document.
- Be sure that the document has not been changed or altered on the original system, nor by any agent that intercepts the communication.
With those guarantees we can send, for example, information about the executed programs and the hashes of those binaries generated by the IMA component of the kernel.
If during the boot process the system finds a TPM device, it will start doing the measured boot. Some static code living in the system will load some segments from the UEFI firmware, and will generate a digest that will be used to extend some of the PCR registers. This extension will be notified via an event, and registered in a way that we can later check. After that this component will delegate the execution to the UEFI firmware, which will do the same for the next component in the chain of load, until it reaches the kernel.
Grub2 can help us to gather more data by extending registers 8 and 9, which account for the Grub2 command line, the kernel command line and the different files read by Grub2 during the load process (like the kernel and the initrd). Check the Grub2 documentation for details.
To activate this feature, we should replace grub.efi in /boot/efi/EFI/opensuse with the version stored in /usr/share/grub2/x86_64-efi under the name grub-tpm.efi. We can do that automatically with:
shim-install --suse-enable-tpm
Installing Keylime
Keylime is an open source project designed to help us do the remote attestation. It is currently integrated in openSUSE MicroOS via two new system roles.
- MicroOS with Remote Attestation (Agent)
- MicroOS with Remote Attestation (Verifier)
All the systems that will participate in the remote attestation need to be installed using the Agent system role, and the one that will do the validation should be installed with the Verifier system role.
Alternatively we can install the services via containers (see the Notes section later for indications). The control plane container is useful in situations where we do not want to install any Python dependencies, a problem that the Rust Keylime agent does not have.
The system roles will take care of installing all the required software, but cannot check if there is currently a TPM 2.0 co-processor available.
We can check this with:
dmesg | grep TPM
If no TPM is found, please review the options in the UEFI menu. If you find a TPM 1.2 you can check with the manufacturer whether there is a firmware upgrade to 2.0.
At the end you will have two devices in your system: /dev/tpm0 and /dev/tpmrm0.
This last device, /dev/tpmrm0, is the resource management device implemented by the kernel. A resource manager is required because the TPM is a very constrained processor that needs some help when multiple processes try to access it simultaneously.
There is also a user space resource manager, independent of the kernel one, that can be accessed via D-Bus and is installed as part of the Keylime dependencies. This is the TPM access broker and resource management daemon (abrmd), which lives in tpm2-abrmd.service and is automatically activated via D-Bus.
We can check that all is working properly with:
# Generate random numbers
tpm2_getrandom --hex 10
systemctl status tpm2-abrmd.service
At any moment we can inspect the PCR registers from the command line:
tpm2_pcrread
We can inspect the different extensions and the reasons reading the event log:
tpm2_eventlog /sys/kernel/security/tpm0/binary_bios_measurements
Those commands are useful when we need to extract the good PCR values from a newly installed system.
Keylime verifier
We should set up the verifier first, so the agents will have a place to connect to.
Before starting the verifier and registrar services we should inspect the configuration files in /usr/etc/keylime/*.conf. The ones deployed by the package should be OK for the verifier and registrar roles, and if changes are required we can add snippets in /etc/keylime/verifier.conf.d/ or /etc/keylime/registrar.conf.d/. For a complete example, check the next section about the Keylime agent, where we will create a snippet that changes the default configuration (note also the file and directory ownership and permissions).
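If, for example, we needed the registrar to listen on all interfaces, a snippet could look like the following (the [registrar] section and the ip option follow the default configuration layout; whether this change is needed at all depends on the defaults shipped with the package):
mkdir -p /etc/keylime/registrar.conf.d
cat << EOF > /etc/keylime/registrar.conf.d/registrar.conf
[registrar]
ip = 0.0.0.0
EOF
chown -R keylime:tss /etc/keylime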
Now we can activate the verifier and registrar services:
systemctl enable --now keylime_verifier.service
systemctl enable --now keylime_registrar.service
The verifier needs to create some certificates the first time it runs, and the registrar will fail if those certificates are not yet available, so start the verifier first.
Keylime agent
For each system in our network, we will use the Agent system role for the installation. This role will install all the IMA/EVM infrastructure, the TPM/TSS tools and the Keylime agent service.
There are two different Keylime agents in MicroOS. One is the traditional Python keylime-agent subpackage. This one is still the reference implementation for the agent service. It is well maintained and supported, but requires the whole Python stack to be installed before it can be used.
The second agent is rust-keylime. This one is implemented in Rust, with a much smaller footprint, and will become the reference implementation in the future. For MicroOS we decided to make this version the default one, as today it is fairly feature complete, with the advantage of a much smaller memory and disk footprint and a better security model.
The default configuration lives in /usr/etc/keylime/agent.conf, owned by the keylime:tss user and group, and before starting the service we need to make some adjustments.
For that we will add a new configuration snippet in /etc/keylime/agent.conf.d/, which will contain only the changes over the default configuration distributed by the package. Because we are using the Rust agent, we need to do something like:
mkdir -p /etc/keylime/agent.conf.d

cat << EOF > /etc/keylime/agent.conf.d/agent.conf
[agent]
uuid = "d111ec46-34d8-41af-ad56-d560bc97b2e8"
registrar_ip = "<REMOTE_IP>"
EOF

chown -R keylime:tss /etc/keylime
chmod -R 600 /etc/keylime
The uuid default value is "generate", so a new UUID will be generated for each execution of the agent. We can have more control by setting this value ourselves, for example calling uuidgen to generate our agent IDs.
We also need to replace <REMOTE_IP> with the address where the registrar and verifier are installed (but keep the quotes), and, because the zmq protocol is still used, set revocation_notification_ip as well.
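For example, assuming the control plane lives at 192.168.122.10 (an illustrative address) and reusing the UUID from the snippet above, the resulting configuration would contain:
[agent]
uuid = "d111ec46-34d8-41af-ad56-d560bc97b2e8"
registrar_ip = "192.168.122.10"
revocation_notification_ip = "192.168.122.10"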
If for some reason we are using the Python agent, the snippet should be similar, but note the difference with the quotes (the values are not quoted):
mkdir -p /etc/keylime/agent.conf.d

cat << EOF > /etc/keylime/agent.conf.d/agent.conf
[agent]
registrar_ip = <REMOTE_IP>
revocation_notification_ip = <REMOTE_IP>
EOF

chown -R keylime:tss /etc/keylime
chmod -R 600 /etc/keylime
The Python agent will by default generate an ID based on the hostname, but we can add this parameter to the snippet too if we want.
Also, since Keylime 6.3.0, we need to copy the certificate generated by the CA into the agent node. This certificate is used to validate the mTLS connection, to be sure that it comes from the correct control plane.
# From the agent node
mkdir -p /var/lib/keylime/cv_ca
scp <REMOTE_IP>:/var/lib/keylime/cv_ca/cacert.crt /var/lib/keylime/cv_ca
chown -R keylime:tss /var/lib/keylime/cv_ca
The Agent system role will install a new service, the keylime-agent, that needs to be enabled and started.
systemctl enable --now keylime_agent.service
The agent will contact the registrar service to communicate its certificates and the UUID of the agent.
The openSUSE package for MicroOS configures the agent to run under the system user "keylime". Associated with the systemd service, there is a mount unit that will mount "/var/lib/keylime/secure" under this same user. This will be activated just before the "keylime_agent.service", and will remain active (mounted) even after the agent service is stopped.
systemctl status var-lib-keylime-secure.mount
Register
The agent needs to be registered and accepted by the verifier before it starts sending information about the system. This is done via a command line tool (keylime_tenant) that resides on the verifier node.
To register a new agent, replace <AGENT> with the agent address and <UUID> with the agent UUID in this command line:
keylime_tenant -v 127.0.0.1 \
  -t <AGENT> \
  -u <UUID> \
  --cert default \
  -c add
The UUID can be set directly in the keylime.conf configuration file (agent_uuid field), and can also be inspected from the system logs.
journalctl -u keylime_agent.service | grep "Agent UUID:"
To list the systems successfully registered, we need to use the reglist command:
keylime_tenant -v 127.0.0.1 \
  --cert default \
  -c reglist
We can remove a registered machine by replacing the last line with -c delete, and we can monitor the value of a PCR using the --tpm_policy parameter, which expects a JSON string.
... --tpm_policy '{\
  "5":["223DD8701C16AC430BDDB1B409792AE6002121E4",\
       "2B23381030DEF370AF781B143A25761F03A1D27F44922695D32EC74A96595576"],\
  "6":["B2A83B0EBF2F8374299A5B2BDFC31EA955AD7236",\
       "3D458CFE55CC03EA1F443F1562BEEC8DF51C75E14A9FCF9A7234A13F198E7969"]}'
This can be used to reference the PCR registers involved in the measured boot process, for example.
For convenience this policy can be added to the Keylime configuration file.
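For example, it could go in a tenant configuration snippet (the [tenant] section and the tpm_policy option are assumptions based on the default tenant configuration; double-check the names in /usr/etc/keylime/tenant.conf):
mkdir -p /etc/keylime/tenant.conf.d
cat << EOF > /etc/keylime/tenant.conf.d/tenant.conf
[tenant]
tpm_policy = {"5": ["223DD8701C16AC430BDDB1B409792AE6002121E4"]}
EOF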
Keylime can also validate the event log, recreating the expected values of the PCRs from it and comparing them with the current values.
For that we need to create an empty measured boot referential state (Keylime is in the process of making this more explicit) and use the --mb_refstate parameter.
echo "{}" > empty_mb_refstate.json ... --mb_refstate empty_mb_refstate.json
Payload
Keylime can deliver secret data to the agents once they are recognized and verified. This data can be, for example, key certificates or passwords that need to be deployed in the system.
During the registration process, we can add the parameter --include to reference a ZIP file that will be encrypted and sent to the agent, where it will be extracted and executed.
The ZIP file needs to contain a shell script autorun.sh, which will be executed only if the device is verified. If we are using the Python Keylime agent, this payload can also contain some Python scripts that will be executed if a security breach has been found. The Rust agent can be compiled to provide some support for those Python scripts, but the version distributed by MicroOS is configured without this feature.
For more information refer to the Keylime documentation.
Enabling IMA tracking
IMA/EVM is another complex topic, described in depth in the IMA/EVM page in the openSUSE wiki.
With IMA we can calculate the hash of a file before it is read or executed. This hash is compared with one stored in the extended attributes of the file, and if they are the same the system will authorize the access. This alone is not enough, so EVM is used to sign the hash stored in the extended attributes, guaranteeing that it is not changed by a third party.
This authorization happens when the system is in appraisal mode. But there are other modes of execution: one where the kernel can adjust the extended attribute of the file to fix the registered IMA hash, and another where the hash is calculated and logged into the event log.
When a TPM is available, a PCR register (usually PCR#10) will be extended for each IMA measurement. A quote of the TPM and the logs can also be sent to a remote verifier that can attest the comparison.
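Once IMA is enabled (see below), we can see this locally: the kernel exposes the IMA measurement list, and PCR#10 can be read with the TPM2 tools.
# First entries of the IMA measurement list (the first one is boot_aggregate)
head -n 5 /sys/kernel/security/ima/ascii_runtime_measurements
# Current value of PCR#10, extended with those measurements
tpm2_pcrread sha256:10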
With Keylime we can do this remote attestation of the IMA hashes, supported by the security provided by the TPM in the system. For more information refer to the Keylime documentation.
To enable IMA measurements in our systems we have two options. One is to enable the default kernel policy in the agent, which can be done by adding the parameters ima_appraise=log ima_policy=tcb to the kernel command line.
This can be persisted by updating GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and regenerating grub.cfg.
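The relevant line in /etc/default/grub would then look similar to this ("..." stands for whatever parameters are already there):
GRUB_CMDLINE_LINUX_DEFAULT="... ima_appraise=log ima_policy=tcb"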
transactional-update grub.cfg
reboot
This example is using the tcb IMA policy, which comes included in the kernel. Sadly this default policy is very basic and will measure too much of our system, including parts that change frequently during daily usage.
It is recommended to create a new policy to avoid long logs and the monitoring of too many files.
The second option is to use the policy provided by the package keylime-ima-policy on systems where SELinux is enabled and the files are labeled with SELinux types. This policy tries to minimize the parts of the measured system that are irrelevant from a security point of view, and can be adapted by changing /etc/ima/ima-policy, as it is the place where systemd will look during boot time.
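As a rough illustration of the policy format only (not a recommended policy; the rules below are generic examples, check the IMA/EVM wiki page and the shipped policy for real ones), such a file contains rules like these:
# Do not measure pseudo filesystems (proc, sysfs, tmpfs)
dont_measure fsmagic=0x9fa0
dont_measure fsmagic=0x62656572
dont_measure fsmagic=0x01021994
# Measure files executed or mapped as executable by root
measure func=BPRM_CHECK mask=MAY_EXEC uid=0
measure func=MMAP_CHECK mask=MAY_EXEC uid=0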
We can indicate during the agent registration the expected hashes using the --allowlist parameter, and the excluded or ignored files via the --exclude parameter, or use the new runtime policy parameters that Keylime supports.
Some IMA documentation also suggests using rootflags=i_version to avoid the re-creation of already calculated hashes, but this parameter seems to cause problems during the boot process.
A demo
To understand Keylime better we can do a little demo that will exercise some of the main components that can be used when planning a deployment. This demo is an extended version of the one described in the Keylime documentation for secure payloads, except that it also shows the integration with IMA, measured boot and the TPM policy.
In general terms, we are going to see:
- How to extract IMA hashes and configure Grub2
- How to create a certified payload
- How to produce a revocation and isolate a hacked node
But first we need to allocate three nodes with a TPM / vTPM. If we are using libvirt we need to install the TPM emulator swtpm, and add the TPM as extra hardware before installing the OS.
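With libvirt, once swtpm is installed, the vTPM can be added as a device in the domain XML (via virsh edit or the virt-manager UI); a typical snippet looks like this:
<tpm model='tpm-crb'>
  <backend type='emulator' version='2.0'/>
</tpm>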
One of the nodes will be installed using the verifier system role, and the other two via the agent system role. See the previous sections for details, just remember to set the IP of the remote verifier in the configuration file of the agents.
Extracting the IMA hashes
The default IMA policy of the Linux kernel measures a lot of files, and this will produce a lot of noise and false positives. It is recommended to set your own policy (check the IMA/EVM page in the openSUSE wiki for details), but for today we are going to use the default one.
To enable it, add ima_appraise=log ima_policy=tcb to the GRUB_CMDLINE_LINUX_DEFAULT variable in /etc/default/grub and regenerate grub.cfg in the agent nodes:
transactional-update grub.cfg
reboot
Once the node is back we need to generate the list of good hashes of the system. Once enabled, the IMA system will register all the files indicated by the policy; this goes as far as registering the files living in the initrd, which can cause a mismatch with the ones living in the current sysroot. We need to keep this in mind, as it should be fixed by adjusting the allow list, the exclude list and the policy file.
For now let's generate the IMA allow list.
OUTPUT=/tmp/allowlist.txt
ALGO=sha256sum

rm -f "$OUTPUT"
cd /

head -n 1 /sys/kernel/security/ima/ascii_runtime_measurements \
  | awk '{ print $4 " boot_aggregate" }' | sed 's/.*://' >> "$OUTPUT"

find `ls / | grep -E -v "boot|dev|mnt|proc|run|.snapshots|srv|sys|tmp"` \
  \( -fstype rootfs -o -xtype f -type l -o -type f \) \
  -uid 0 -exec "$ALGO" '/{}' >> "$OUTPUT" \;
This basically extracts the boot_aggregate hash of this system from the IMA measurements file, and creates a big list of hashes of the system. If you want, you can extract the content of the initrd and append the hashes to the allow list file. Just remember to remove the prefix of the directory that you used to extract the ramdisk.
mkdir /tmp/initrd && cd /tmp/initrd
lsinitrd --unpack $(readlink -f /boot/initrd)
find -type f -exec $ALGO "./{}" \; | sed "s| \./\./| /|" >> $OUTPUT
rm -fr /tmp/initrd
Eventually we will provide a tool that will help in the generation of those hashes or, better, will add them to the filesystem extended attributes.
Now, for this example, let's create a custom script in /root and add its hash to the list.
cat << EOF > /root/greets.sh
#!/bin/sh
echo "Hello!"
EOF
chmod a+x /root/greets.sh
"$ALGO" /root/greets.sh >> "$OUTPUT"
Copy the allowlist.txt file to the verifier node. Also, on the verifier node we can create a proposal for the exclude list. For now we are going to exclude too many files; this should be drastically restricted in production.
cat << EOF > exclude.txt
/boot/.*
/dracut-state.sh
/etc/.*
/root/.bash_history
/root/.lesshst
/root/.ssh/.*
/.snapshots/.*
/sysroot/.*
/usr/bin/dracut.*
/usr/etc/.*
/usr/lib/.*
/usr/share/.*
/var/lib/.*
/var/log/.*
EOF
Each line is a Python regular expression.
Creating the payload
On the verifier node we are going to create the payload. For this example we are going to deliver the SSH public and private keys (shared by all nodes), the authorized_keys file and a local action written as a shell script that will fence the node in case of revocation. The local action script will be called with one parameter, a JSON document that contains the basic information of the action.
Note that for this demo the agent will need to execute certain scripts as root (to move certificates into /root/.ssh or call iptables, for example). In a real scenario it may not be a good idea to require those permissions, but for now we will execute the Keylime agent under the root user (still the default in the upstream configuration, but our package selects keylime:tss for that). Edit /etc/keylime.conf and comment out the run_as parameter.
mkdir payload
ssh-keygen -q -b 2048 -t rsa -N "" -f payload/id_rsa
cat << EOF > payload/autorun.sh
#!/bin/bash

mkdir -p /root/.ssh/
cp id_rsa* /root/.ssh/
chmod 600 /root/.ssh/id_rsa*
cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
EOF
cat << EOF > payload/action_list
local_action_rm_ssh.sh
EOF
cat << EOF > payload/local_action_rm_ssh.sh
#!/bin/bash

# Consider only revocation types
kind=\$(grep -o '"type":"[^"]*' \$1 | grep -o '[^"]*$')
[ "\$kind" != "revocation" ] && exit 0
echo "Action type: revocation"

# Directory where the payload is stored. It also is part of the
# secure directory
unzipped=/var/lib/keylime/secure/unzipped

agent_id=\$(grep -o '"agent_id":"[^"]*' \$1 | grep -o '[^"]*$')
ip=\$(grep -o '"ip":"[^"]*' \$1 | grep -o '[^"]*$')

echo "Compromised system: \${agent_id} (\${ip})"

if [ -f "\${unzipped}/\${agent_id}-cert.crt" ]; then
    echo "Removing certificates from inside the compromised system"
    rm -f /root/.ssh/id_rsa
    rm -f /root/.ssh/id_rsa.pub
    rm -f /root/.ssh/authorized_keys
else
    echo "Blocking compromised system"
    iptables -A INPUT -s "\$ip" -j DROP
fi
EOF
Hacking the node
We can now register the nodes and start collecting the TPM quotes and the IMA logs. For that first we will create an empty measured boot ref state, so Keylime can replay the event log and compare the calculated PCRs with the current PCRs from the quote.
echo "{}" > mb-refstate.json
keylime_tenant -v 127.0.0.1 \
  -t <AGENT> \
  -u <UUID> \
  --cert default \
  --include payload \
  --allowlist allowlist.txt \
  --exclude exclude.txt \
  --mb_refstate mb-refstate.json \
  -c add
If we see any complaints about keylime.ima - ERROR - IMA ERRORS we should adjust some of the lists, for example the exclude list to drop any transient files. Before re-registering the node we need to delete it (-c delete) and reboot it, to re-create the IMA log file in the kernel.
If we do not have a different allow list for the second node, we can register this particular one without IMA:
keylime_tenant -v 127.0.0.1 \
  -t <AGENT> \
  -u <UUID> \
  --cert default \
  --include payload \
  --mb_refstate mb-refstate.json \
  -c add
If everything goes OK we should have in both nodes the payload extracted in /var/lib/keylime/secure/unzipped, with the local actions, the certificates and the autorun.sh script. This script should have been executed, deploying the SSH keys into /root/.ssh.
We can test the password-less SSH connection from the verifier:
ssh -i payload/id_rsa <AGENT>
and from the second node:
ssh <AGENT>
In the first node we should have the ad-hoc script /root/greets.sh created before, that can be executed safely.
./greets.sh
We can simulate the effects of an attacker changing this file and executing it again:
echo 'echo "Hacked!"' >> greets.sh ./greets.sh
The verifier will detect the change in the file on the next request, and will command the execution of all the local actions delivered in the payload. In this case we have only one, which will remove the SSH keys on the affected node and, on the rest, add an iptables rule that will drop all the connections initiated from the affected node.
We can check the effects on the first node, and see that /root/.ssh is empty, and that on the second node we now have a new iptables rule that is, indeed, dropping all the connections from the first node.
Notes
Containers for the verifier and the agent
Since 6.5.1, openSUSE provides containers for the control plane services (Keylime verifier, registrar and the tenant command line tool), and for the Rust agent.
They are currently located in the OBS devel:microos:containers project, and soon will be available in openSUSE:Factory.
Both images are based on Tumbleweed, and will be updated automatically to the latest version of Keylime once it is released.
To install the verifier and the registrar, we need to deploy "keylime-control-plane-image" as documented in the project README.
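As a rough sketch only (the image reference is a placeholder and the ports are the upstream Keylime defaults; check the project README for the exact registry path, ports and volumes), running the control plane with podman could look like this:
podman run -d --name keylime-control-plane \
  -p 8881:8881 -p 8890:8890 -p 8891:8891 \
  -v keylime-volume:/var/lib/keylime \
  <CONTROL_PLANE_IMAGE>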
The agent service based on Rust can also be installed. The installation and usage instructions are also in the project README, but this component is less interesting unless it is used as a base to monitor the health of the applications executed inside a new container built on top of it. With some extra configuration, this container can also be used to monitor the IMA hashes of the running system, and to export the measured boot information of the host.
Python Keylime agent
For reference, if we are using the old Python Keylime agent, the payload for the demo should be this:
mkdir payload
ssh-keygen -q -b 2048 -t rsa -N "" -f payload/id_rsa
cat << EOF > payload/autorun.sh
#!/bin/bash

# this will make it easier for us to find our own cert
ln -s \`ls *-cert.crt | grep -v Revocation\` mycert.crt

mkdir -p /root/.ssh/
cp id_rsa* /root/.ssh/
chmod 600 /root/.ssh/id_rsa*
cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
EOF
cat << EOF > payload/action_list
local_action_rm_ssh
EOF
cat << EOF > payload/local_action_rm_ssh.py
import os
import ast

from M2Crypto import X509

import keylime.secure_mount as secure_mount
from keylime import keylime_logging

logger = keylime_logging.init_logging("local_action_rm_ssh")


async def execute(event):
    if event.get("type") != "revocation":
        return

    metadata = event.get("meta_data", {})
    if isinstance(metadata, str):
        metadata = ast.literal_eval(metadata)
    serial = metadata.get("cert_serial")
    if serial is None:
        logger.error("Unsupported revocation message: %s", event)

    # load up my own cert
    secdir = secure_mount.mount()
    ca = X509.load_cert(f"{secdir}/unzipped/mycert.crt")

    # is this revocation meant for me?
    if serial == ca.get_serial_number():
        os.remove("/root/.ssh/id_rsa")
        os.remove("/root/.ssh/id_rsa.pub")
        os.remove("/root/.ssh/authorized_keys")
    else:
        logger.info("A node in the network has been compromised: %s", event["ip"])
        os.system(f"iptables -A INPUT -s {event['ip']} -j DROP")
EOF
SSH
For security reasons the sshd service does not allow remote access to the root user with a password. For debugging or evaluating the Keylime service this can be a bit inconvenient. The preferred solution is to provide the SSH key already during installation with YaST, when the root password needs to be entered, or during first boot with Ignition.
Alternatively, for testing, this security feature can be disabled:
echo "PermitRootLogin yes" > /etc/ssh/sshd_config.d/root.conf systemctl restart sshd.service
SELinux
MicroOS has SELinux activated by default in enforcing mode. There is currently a bug that requires an update of the SELinux policy to accept some of the actions that the TPM2 tools and Keylime need to perform. Until the bug gets fixed we will need to disable SELinux. For reference, check the SELinux section of MicroOS.
The easiest way is to disable SELinux already during installation on the proposal page. Alternatively, edit /etc/default/grub, search for the line with the parameter GRUB_CMDLINE_LINUX_DEFAULT, and change selinux=1 to selinux=0. Regenerate grub.cfg as explained before.
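A sketch of the same steps from the command line:
sed -i 's/selinux=1/selinux=0/' /etc/default/grub
transactional-update grub.cfg
reboot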
To validate that SELinux is disabled, we can check sestatus.
Firewalld
If we have firewalld installed we can install the keylime-firewalld sub-package, and register it in our public zone:
firewall-cmd --zone=public --add-service=keylime --permanent
firewall-cmd --zone=public --add-service=keylime
The Keylime firewalld service has the list of all the possible Keylime services, so it can be used for both the agent and the verifier nodes.