Portal:MicroOS/RemoteAttestation

Remote Attestation with Keylime

Trusted Computing is a big topic that encompasses multiple technologies, all of them relying on a single piece of hardware as the root of trust: a cryptographic co-processor named TPM (Trusted Platform Module).

Once you take ownership of it, it will generate a private key that cannot be extracted (by usual means) and that can be used to generate new secondary keys, sign some hashes and encrypt some documents.

Almost all systems running nowadays include a TPM device. They are cheap and generally available for multiple architectures. If your device is old, there is a good chance that it will have a TPM version 1.2. For now this version is not supported, as version 2.0 has been available for almost 10 years.

The TPM is usually disabled by default, but can be enabled from the EFI / BIOS menu. In this menu there is also the option of clearing all the internal data, invalidating all the keys generated by the TPM, and subsequently invalidating also all the signed documents. So use this option with caution.

Measured boot

If we already have a TPM 2.0 enabled, we can also start using it to check whether the boot chain has been tampered with in any form. This process is known as measured boot. Each step in the boot process calculates a hash of the content, in memory or on disk, of the next stage, and registers this hash in one of the internal registers of the TPM.

Those registers are known as PCRs (Platform Configuration Registers). A typical TPM has 24 of them, and they are designed to store hash values. Hash functions like SHA1 or SHA512 produce digests of different sizes, and the TPM can enable several of them at the same time. As a result, each PCR has multiple slots or variants of the same register, one for each hash function. For example, if we enable the SHA1 and SHA256 hashes in the TPM, PCR#1 will have two versions, one with 20 bytes and the other with 32 bytes.

After a reboot the PCRs are loaded with a known good value, usually all 0 or all 1 depending on the register. We cannot write a specific value to those registers, but we can use the extend operation to update their content. An extension of a PCR is defined as "PCR <- HASH(PCR_VALUE || DIGEST)". For example, if we want to extend the SHA1 slot of PCR#1 with a given digest value, we need to prepend the current value of the PCR to the digest and calculate the SHA1 of the aggregate.
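As a minimal sketch of the extend operation, the following Python snippet reproduces the formula with hashlib. It only imitates in software what the TPM does internally; the function name extend_pcr and the measured data are made up for the illustration.

```python
import hashlib


def extend_pcr(pcr_value: bytes, digest: bytes, algorithm: str = "sha1") -> bytes:
    """Return HASH(pcr_value || digest) for the given hash bank."""
    return hashlib.new(algorithm, pcr_value + digest).digest()


# Assumption: the register starts as all zero bytes, sized to the bank's hash.
pcr1_sha1 = bytes(hashlib.new("sha1").digest_size)      # 20 bytes
pcr1_sha256 = bytes(hashlib.new("sha256").digest_size)  # 32 bytes

# Hypothetical measurement of the next boot stage.
measurement = b"next boot stage"
pcr1_sha1 = extend_pcr(pcr1_sha1, hashlib.sha1(measurement).digest(), "sha1")
pcr1_sha256 = extend_pcr(pcr1_sha256, hashlib.sha256(measurement).digest(), "sha256")

print(pcr1_sha1.hex())
print(pcr1_sha256.hex())
```

Because the new value depends on the previous one, the same digests extended in a different order produce a different final PCR value.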

Every time that there is an extension, the digest value is stored in a log that, eventually, will reach the kernel and will be available to the user. This log (named eventlog) can be used to recreate the expected values of the PCRs. We can now ask the TPM for a signed report of the real values of the PCRs (named quote), and compare them with the calculated ones.

Because the quote is signed by a private key that only the TPM has, we can be sure that those are the current values of the PCRs. If the recalculated values match the quote from the TPM, we can also be sure that the hashes listed in the event log are the ones used during the extensions.
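The replay of the event log can be sketched in a few lines of Python. This is an illustration under simplifying assumptions: all PCRs start at zero, the event log is reduced to hypothetical (PCR index, digest) pairs, and the quote is a plain dictionary instead of a signed TPM structure. The names replay_event_log and quote_matches are not part of any tool.

```python
import hashlib
import hmac


def replay_event_log(events, algorithm="sha256", pcr_count=24):
    """Recompute PCR values from (pcr_index, digest) pairs, starting at zero."""
    size = hashlib.new(algorithm).digest_size
    pcrs = {index: bytes(size) for index in range(pcr_count)}
    for index, digest in events:
        pcrs[index] = hashlib.new(algorithm, pcrs[index] + digest).digest()
    return pcrs


def quote_matches(replayed, quoted):
    """True when every PCR in the quote equals the replayed value."""
    return all(
        hmac.compare_digest(replayed[index], value)
        for index, value in quoted.items()
    )


# Hypothetical event log: two measurements extended into PCR 4.
loader = hashlib.sha256(b"boot loader").digest()
kernel = hashlib.sha256(b"kernel").digest()
event_log = [(4, loader), (4, kernel)]

# Stand-in for the value reported by the TPM quote, computed by hand here.
step1 = hashlib.sha256(bytes(32) + loader).digest()
quoted = {4: hashlib.sha256(step1 + kernel).digest()}

print(quote_matches(replay_event_log(event_log), quoted))
```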

We can use this last fact to compare those hashes with a list of good hashes obtained from a different source. The event log has one hash for the firmware (EFI), another for the boot loader, another for the kernel, etc., and each of them can be checked against that list.

We can do the comparison locally, but it is more effective to send the event log and the TPM quote to a remote machine that has the list of good hashes and makes the comparison. This process is known as remote attestation.
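The comparison that the remote machine performs can be pictured with the following sketch. The component names, the digests and the find_untrusted helper are hypothetical; a real verifier works on the parsed event log rather than on a list of pairs.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def find_untrusted(event_log, good_hashes):
    """Return the components whose measured digest is not in the good list."""
    return [
        name
        for name, digest in event_log
        if digest not in good_hashes.get(name, set())
    ]


# Hypothetical list of good hashes, kept on the verifier side.
good_hashes = {
    "firmware": {sha256_hex(b"firmware v1")},
    "bootloader": {sha256_hex(b"grub v1")},
    "kernel": {sha256_hex(b"kernel v1"), sha256_hex(b"kernel v2")},
}

# Hypothetical event log received from the attested machine.
event_log = [
    ("firmware", sha256_hex(b"firmware v1")),
    ("bootloader", sha256_hex(b"grub tampered")),
    ("kernel", sha256_hex(b"kernel v2")),
]

print(find_untrusted(event_log, good_hashes))
```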

As we will see later, with remote attestation you can validate more elements than the measured boot hashes. A process can delegate to the TPM the signing of any generated document that provides metrics or any other information about the system. Because of this signature, we can be sure that:

  • The sender is the system that generated the document.
  • The document has not been changed or altered in the original system, nor by any agent that intercepted the communication.

With those guarantees we can send, for example, information about the executed programs and the hashes of those binaries generated by the IMA component of the kernel.

If during the boot process the system finds a TPM device, it will start doing the measured boot. Some static code living in the system will load some segments of the UEFI firmware and generate a digest that will be used to extend some of the PCRs. This extension will be notified via an event, and registered in a way that we can check later. After that, this component will hand over execution to the UEFI firmware, which will do the same for the next component in the load chain, until it reaches the kernel.

Grub2 can help us gather more data by extending registers 8 and 9, which account for the Grub2 command line, the kernel command line and the different files read by Grub2 during the load process. Check the Grub2 documentation for details.


To activate this feature, we should replace grub.efi in /boot/efi/EFI/opensuse with the version stored in /usr/share/grub2/x86_64-efi under the name grub-tpm.efi. We can do that automatically with:

 shim-install --suse-enable-tpm

Installing Keylime

Keylime is an open source project designed to help us do remote attestation. It is currently integrated in openSUSE MicroOS via two new system roles.

  • MicroOS with Remote Attestation (Agent)
  • MicroOS with Remote Attestation (Verifier)

All the systems that will participate in the remote attestation need to be installed using the Agent system role, and the one that will do the validation needs to be installed with the Verifier system role.

The system roles will take care of installing all the required software, but cannot check if there is currently a TPM 2.0 co-processor available.

We can check this with:

 dmesg | grep TPM

If none is found, please review the options in the UEFI menu. If you find a TPM 1.2, you can check with the manufacturer whether there is a firmware upgrade to 2.0.

At the end you will have two devices in your system: /dev/tpm0 and /dev/tpmrm0.

This last device, /dev/tpmrm0, is the resource manager device implemented by the kernel. A resource manager is required because the TPM is a very constrained processor that requires some help when multiple processes try to access it simultaneously.

There is also a user space resource manager, independent of the kernel one, that can be accessed via D-Bus and is installed as a requirement of the Keylime dependencies. This is the TPM access broker and resource management daemon (abrmd), which lives in tpm2-abrmd.service and is automatically activated via D-Bus.

We can check that all is working properly with:

 # Generate random numbers
 tpm2_genrandom --hex 10
 systemctl status tpm2-abrmd.service

At any moment we can inspect the PCRs from the command line:

 tpm2_pcrread

We can inspect the different extensions and their reasons by reading the event log:

 tpm2_eventlog /sys/kernel/security/tpm0/binary_bios_measurements

Those commands are useful when we need to extract the good PCR values from a newly installed system.

Keylime verifier

We should set up the verifier first, so the agents will have a place to connect to.

Before starting the verifier and registrar services we should inspect the configuration file in /etc/keylime.conf. The one deployed in the package should be OK for the verifier role, but there are some values that we should check.

  • ca_implementation in [general]. Keylime needs CFSSL to generate revocation certificates. The CFSSL service is installed by default, but you can switch back to "openssl" for cases where CFSSL is not needed.
  • require_ek_cert in [tenant]. The TPM can have a manufacturer root certificate for the endorsement key (EK) that can be validated. If we do not have one (because it is a vTPM, for example), we must set this value to False and, if possible, set a script for ek_check_script.
  • cert_* in [ca]. The values should be adjusted for your organization.
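A quick review of these options can be scripted. The sketch below parses a hypothetical excerpt of the configuration with Python's configparser and reports the two settings discussed above. The sample values and the review_verifier_config helper are assumptions for the illustration, not the contents of the packaged file.

```python
import configparser

# Hypothetical excerpt; the real /etc/keylime.conf has many more options.
SAMPLE_CONF = """
[general]
ca_implementation = openssl

[tenant]
require_ek_cert = False
ek_check_script =
"""


def review_verifier_config(text):
    """Return a list of remarks about the options discussed above."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    remarks = []
    ca = parser.get("general", "ca_implementation", fallback="(unset)")
    remarks.append(f"ca_implementation is {ca}")
    require_ek = parser.getboolean("tenant", "require_ek_cert", fallback=True)
    script = parser.get("tenant", "ek_check_script", fallback="").strip()
    if not require_ek and not script:
        remarks.append("require_ek_cert is False and no ek_check_script is set")
    return remarks


for remark in review_verifier_config(SAMPLE_CONF):
    print(remark)
```

To review the deployed file instead of the sample, the content of /etc/keylime.conf can be read and passed to the same function.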

Now we can activate the verifier and registrar services:

 systemctl enable --now keylime_verifier.service
 systemctl enable --now keylime_registrar.service

The verifier needs to create some certificates the first time and the registrar will fail if those certificates are not available.

Keylime agent

For each system in our network, we will use the Agent system role for the installation. This role will install all the IMA/EVM infrastructure, the TPM/TSS tools and the Keylime agent service.

Before starting any service we need to adjust the /etc/keylime.conf configuration file.

  • receive_revocation_ip in [general]. Replace it with the address of the revocation server. This is usually the same as the verifier.
  • registrar_ip in [cloud_agent]. Same as before, replace it with the address of the server.
  • agent_uuid in [cloud_agent]. This UUID is used to identify the agent. It is configured to use the hostname.
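When several agents need the same change, the two addresses can be set programmatically. The following sketch works on a hypothetical excerpt of the file; the sample values, the address 192.0.2.10 and the point_agent_to helper are made up. Note that configparser drops comments when it writes the file back, so for the real configuration a manual edit may be preferable.

```python
import configparser
import io

# Hypothetical excerpt of the agent configuration.
SAMPLE_CONF = """
[general]
receive_revocation_ip = 127.0.0.1

[cloud_agent]
registrar_ip = 127.0.0.1
agent_uuid = hostname
"""


def point_agent_to(text, server_ip):
    """Return the configuration with both server addresses replaced."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    parser.set("general", "receive_revocation_ip", server_ip)
    parser.set("cloud_agent", "registrar_ip", server_ip)
    output = io.StringIO()
    parser.write(output)
    return output.getvalue()


print(point_agent_to(SAMPLE_CONF, "192.0.2.10"))
```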

The Agent system role will install a new service, keylime_agent, that needs to be enabled and started.

 systemctl enable --now keylime_agent.service

The agent will contact the registrar service to communicate its certificates and the UUID of the agent.

Register

The agent needs to be registered and accepted by the verifier before it starts sending information about the system itself. This is done via a command line tool (keylime_tenant) that resides in the verifier node.

For example, if the agent address is AGENT and the UUID registered is UUID, the command line to register the system is:

 keylime_tenant -v 127.0.0.1 \
                -t AGENT \
                -u UUID \
                --cert default \
                -c add

We can remove the registered machine by replacing the last line with -c delete, and we can monitor the value of a PCR using the --tpm_policy parameter, which expects a JSON string.

 ...
 --tpm_policy '{
   "5":["223DD8701C16AC430BDDB1B409792AE6002121E4",
        "2B23381030DEF370AF781B143A25761F03A1D27F44922695D32EC74A96595576"],
   "6":["B2A83B0EBF2F8374299A5B2BDFC31EA955AD7236",
        "3D458CFE55CC03EA1F443F1562BEEC8DF51C75E14A9FCF9A7234A13F198E7969"]}'

This can be used to reference the PCRs involved in the measured boot process, for example.

For convenience this policy can also be added to the Keylime configuration file.
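Writing the JSON string by hand is error prone, so it can be generated instead. The sketch below builds the policy from a dictionary of PCR values (the digests are the ones from the example above) and quotes it for the shell. The build_tpm_policy helper is hypothetical; only the --tpm_policy parameter comes from Keylime.

```python
import json
import shlex


def build_tpm_policy(pcr_values):
    """Map each PCR index to its list of accepted digests, as a JSON string."""
    policy = {
        str(index): list(digests)
        for index, digests in sorted(pcr_values.items())
    }
    return json.dumps(policy)


# One SHA1 and one SHA256 digest per PCR, as in the example above.
pcr_values = {
    5: [
        "223DD8701C16AC430BDDB1B409792AE6002121E4",
        "2B23381030DEF370AF781B143A25761F03A1D27F44922695D32EC74A96595576",
    ],
    6: [
        "B2A83B0EBF2F8374299A5B2BDFC31EA955AD7236",
        "3D458CFE55CC03EA1F443F1562BEEC8DF51C75E14A9FCF9A7234A13F198E7969",
    ],
}

policy = build_tpm_policy(pcr_values)
# shlex.quote makes the string safe to paste into the command line.
print("--tpm_policy", shlex.quote(policy))
```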

Keylime can also validate the event log, and recreate the expected values of the PCRs following the event log and compare them with the current values.

For that we need to create an empty measured boot referential state (Keylime is in the process of making this more explicit) and use the --mb_refstate parameter.

 echo "{}" > empty_mb_refstate.json
 ...
 --mb_refstate empty_mb_refstate.json

Payload

Keylime can deliver secret data to the agents once they are recognized and verified. This data can be, for example, key certificates or passwords that need to be deployed in the system.

During the registration process, we can add the parameter --include to reference a ZIP file that will be encrypted and sent to the agent, where it will be extracted and executed.

The ZIP file needs to contain a shell script, autorun.sh, that will be executed only if the device is verified. The payload can also contain some Python scripts that will be executed if a security breach has been found.
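As a rough illustration of the expected layout, the following sketch packs a payload into an in-memory ZIP file with Python's zipfile module and refuses to build one without autorun.sh. The file contents and the build_payload helper are hypothetical, and any further requirement that Keylime may place on the archive is not covered here.

```python
import io
import zipfile


def build_payload(files):
    """Pack {name: content} into an in-memory ZIP; autorun.sh is mandatory."""
    if "autorun.sh" not in files:
        raise ValueError("the payload needs an autorun.sh script")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


# Hypothetical payload: the mandatory script plus one secret file.
payload = build_payload({
    "autorun.sh": "#!/bin/sh\necho 'verified'\n",
    "secret.txt": "hypothetical secret\n",
})

with zipfile.ZipFile(io.BytesIO(payload)) as archive:
    print(archive.namelist())
```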

For more information, refer to the Keylime documentation.

Enabling IMA tracking

IMA/EVM is another complex topic, described in depth in the IMA/EVM page in the openSUSE wiki.

With IMA we can calculate the hash of a file before it is read or executed. This hash is compared with the one stored in the extended attributes of the file, and if they are the same the system will authorize the access.

This authorization happens when the system is in appraisal mode. But there are other modes of execution: one where the kernel adjusts the extended attribute of the file to fix the registered IMA hash, and another where the hash is calculated and logged into the event log.

The IMA hash can be signed, and when a TPM is available a PCR (usually PCR#10) will be extended. A quote from the TPM and the logs can also be sent to a remote verifier that can attest the comparison.
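The IMA log can be read in text form from /sys/kernel/security/ima/ascii_runtime_measurements. The sketch below parses one record, assuming the ima-ng template, where each line carries the PCR index, the template hash, the template name, the file hash prefixed by its algorithm, and the file path. The sample record is made up and the parse_ima_line helper is not part of any tool.

```python
import hashlib
from typing import NamedTuple


class ImaEntry(NamedTuple):
    pcr: int
    template_hash: str
    template: str
    algorithm: str
    digest: str
    path: str


def parse_ima_line(line):
    """Split one ima-ng record into its five space-separated fields."""
    pcr, template_hash, template, filedata, path = line.rstrip("\n").split(" ", 4)
    # The file hash looks like "sha256:<hex digest>".
    algorithm, _, digest = filedata.rpartition(":")
    return ImaEntry(int(pcr), template_hash, template, algorithm, digest, path)


# Hypothetical record; both hashes are generated here just to have valid hex.
sample = "10 {} ima-ng sha256:{} /root/greets.sh".format(
    hashlib.sha1(b"template").hexdigest(),
    hashlib.sha256(b"echo Hello").hexdigest(),
)

entry = parse_ima_line(sample)
print(entry.pcr, entry.algorithm, entry.path)
```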

With Keylime we can do this remote attestation of the IMA hashes, supported by the security provided by the TPM in the system. For more information refer to the Keylime documentation.

Basically the process consists of booting the kernel of the agent with the parameters ima_appraise=log ima_policy=tcb. This can be persisted by updating GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and regenerating grub.cfg.

 transactional-update grub.cfg
 reboot

This example uses the tcb IMA policy, which is included in the kernel. It is recommended to create a new policy to avoid long logs and the monitoring of too many files. The referenced documentation explains how to persist the new policy via Dracut.

Now we can indicate during the agent registration the expected hashes using the --allowlist parameter, and the excluded or ignored files via the --exclude parameter.

Some IMA/EVM documentation also suggests using rootflags=i_version to avoid the re-creation of already calculated hashes, but this parameter seems to cause problems during the boot process.

A demo

To understand Keylime better we can do a little demo that exercises some of the main components that can be used when planning a deployment. This demo is an extended version of the one described in the Keylime documentation for secure payloads (https://keylime-docs.readthedocs.io/en/latest/user_guide/secure_payload.html), except that it also shows the integration with IMA, measured boot and TPM policy.

In general terms, we are going to see:

  • How to extract IMA hashes and configure Grub2
  • How to create a certified payload
  • How to produce a revocation and isolate a hacked node

But first we need to allocate three nodes with a TPM / vTPM. If we have libvirt we need to install the TPM emulator swtpm, and add the TPM as extra hardware before doing the installation of the OS.

One of the nodes will be installed using the verifier system role, and the other two via the agent system role. See the previous sections for details, just remember to set the IP of the remote verifier in the configuration file of the agents.

Extracting the IMA hashes

The default IMA policy of the Linux kernel measures a lot of files, and this will produce a lot of noise and false positives. It is recommended to set your own policy (check the IMA/EVM page in the openSUSE wiki for details), but for today we are going to use the default one.

To enable it, add ima_appraise=log ima_policy=tcb to the GRUB_CMDLINE_LINUX_DEFAULT variable in /etc/default/grub and regenerate grub.cfg in the agent nodes:

 transactional-update grub.cfg
 reboot

Once the node is back we need to generate the list of good hashes of the system. Once IMA is enabled, it will register all the files indicated by the policy. This goes as far as registering the files living in the initrd, which can cause a mismatch with the ones living in the current sysroot. We need to keep this in mind, as it should be fixed by adjusting the allow list, the exclude list and the policy file.

For now let's generate the IMA allow list.

 OUTPUT=/tmp/allowlist.txt
 ALGO=sha256sum
 rm -f "$OUTPUT"
 cd /
 # First entry: the boot_aggregate hash, without the "algorithm:" prefix
 head -n 1 /sys/kernel/security/ima/ascii_runtime_measurements | awk '{ print $4 "  boot_aggregate" }' | sed 's/.*://' >> "$OUTPUT"
 # Then one line per root-owned file outside the excluded top-level directories
 find $(ls / | grep -E -v "boot|dev|mnt|proc|run|.snapshots|srv|sys|tmp") \( -fstype rootfs -o -xtype f -type l -o -type f \) -uid 0 -exec "$ALGO" '/{}' >> "$OUTPUT" \;

This basically extracts the boot_aggregate hash of this system from the IMA measurements file, and creates a big list of hashes of the system. If you want, you can extract the content of the initrd and append the hashes to the allow list file. Just remember to remove the prefix of the directory that you used to extract the ramdisk.
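The file walk can also be sketched in Python, which makes it easier to adapt later. This version is a simplified stand-in for the find command: it skips top-level directories by exact name, does not filter by owner, and writes lines in the same "hash, two spaces, path" form that sha256sum produces. It runs against a small temporary tree so it can be tried without touching the real system; build_allowlist and the sample files are made up.

```python
import hashlib
import os
import tempfile

EXCLUDED = ("boot", "dev", "mnt", "proc", "run", ".snapshots", "srv", "sys", "tmp")


def sha256_file(path):
    """Hash a file in chunks to keep memory usage low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_allowlist(root, excluded=EXCLUDED):
    """Return 'hash  /path' lines for the regular files found under root."""
    lines = []
    for current, dirs, files in os.walk(root):
        if current == root:
            # Prune the excluded top-level directories.
            dirs[:] = [name for name in dirs if name not in excluded]
        dirs.sort()
        for name in sorted(files):
            path = os.path.join(current, name)
            if not os.path.isfile(path):  # broken symlinks, sockets, ...
                continue
            logical = "/" + os.path.relpath(path, root)
            lines.append(f"{sha256_file(path)}  {logical}")
    return lines


# Demo on a throwaway tree instead of the real root filesystem.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "usr", "bin"))
    os.makedirs(os.path.join(root, "proc"))
    with open(os.path.join(root, "usr", "bin", "tool"), "wb") as handle:
        handle.write(b"tool")
    with open(os.path.join(root, "proc", "skipped"), "wb") as handle:
        handle.write(b"skipped")
    lines = build_allowlist(root)

print("\n".join(lines))
```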

Eventually we will provide a tool that will help in the generation of those hashes, or better, will add them in the filesystem extended attributes.

Now for this example let's create a custom script in /root and append its hash to the list.

 cat << EOF > /root/greets.sh
 #!/bin/sh
 
 echo "Hello!"
 EOF
 chmod a+x /root/greets.sh
 "$ALGO" /root/greets.sh >> "$OUTPUT"

Copy the allowlist.txt file to the verifier node. Also, in the verifier node we can create a proposed exclude list. For now we are going to exclude too many files. This should be drastically restricted in production.

 cat << EOF > exclude.list
 /boot/.*
 /dracut-state.sh
 /etc/.*
 /root/.bash_history
 /root/.lesshst
 /root/.ssh/.*
 /.snapshots/.*
 /sysroot/.*
 /usr/bin/dracut.*
 /usr/lib/.*
 /usr/share/.*
 /var/lib/.*
 /var/log/.*
 EOF

Each line is a Python regular expression.
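To get a feeling for what a given exclude list covers, the patterns can be tried locally. The sketch below joins a few of the lines above into one regular expression and tests some paths. It assumes that patterns are matched from the start of the path (re.match); how Keylime applies them internally may differ, and both helper names are hypothetical.

```python
import re

EXCLUDE_LIST = """
/boot/.*
/etc/.*
/root/.bash_history
/usr/bin/dracut.*
/var/log/.*
"""


def compile_exclude_list(text):
    """Join the non-empty lines into a single alternation."""
    patterns = [line.strip() for line in text.splitlines() if line.strip()]
    return re.compile("|".join(f"(?:{pattern})" for pattern in patterns))


def is_excluded(regex, path):
    """Assumption: a path is excluded when a pattern matches from its start."""
    return regex.match(path) is not None


regex = compile_exclude_list(EXCLUDE_LIST)
for path in ("/etc/passwd", "/usr/bin/dracut-util", "/usr/bin/ls", "/root/greets.sh"):
    print(path, is_excluded(regex, path))
```

In this sample only /etc/passwd and /usr/bin/dracut-util are excluded, so a change in /root/greets.sh would still be noticed.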

Creating the payload

In the verifier node we are going to create the payload. For this example we are going to deliver the SSH public and private keys, shared by all nodes, the authorized_keys and a Python local action that will fence the node in case of revocation.

 mkdir payload
 ssh-keygen -q -b 2048 -t rsa -N "" -f payload/id_rsa
 cat << 'EOF' > payload/autorun.sh
 #!/bin/bash
 
 # this will make it easier for us to find our own cert
 ln -s `ls *-cert.crt | grep -v Revocation` mycert.crt
 
 mkdir -p /root/.ssh/
 cp id_rsa* /root/.ssh/
 chmod 600 /root/.ssh/id_rsa*
 cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
 EOF
 cat << EOF > payload/action_list
 local_action_rm_ssh
 EOF
 cat << 'EOF' > payload/local_action_rm_ssh.py
 import os
 import ast
 from M2Crypto import X509
 import keylime.secure_mount as secure_mount
 from keylime import keylime_logging
 
 logger = keylime_logging.init_logging("local_action_rm_ssh")
 
 async def execute(event):
     if event.get("type") != "revocation":
         return
 
     metadata = event.get("meta_data", {})
     if isinstance(metadata, str):
         metadata = ast.literal_eval(metadata)
 
     serial = metadata.get("cert_serial")
     if serial is None:
         logger.error("Unsupported revocation message: %s", event)
         return
 
     # load up my own cert
     secdir = secure_mount.mount()
     ca = X509.load_cert(f"{secdir}/unzipped/mycert.crt")
     # is this revocation meant for me?
     if serial == ca.get_serial_number():
         os.remove("/root/.ssh/id_rsa")
         os.remove("/root/.ssh/id_rsa.pub")
         os.remove("/root/.ssh/authorized_keys")
     else:
         logger.info("A node in the network has been compromised: %s", event["ip"])
         os.system(f"iptables -A INPUT -s {event['ip']} -j DROP")
 EOF

Hacking the node

We can now register the nodes and start collecting the TPM quotes and the IMA logs. For that first we will create an empty measured boot ref state, so Keylime can replay the event log and compare the calculated PCRs with the current PCRs from the quote.

 echo "{}" > mb-refstate.json
 keylime_tenant -v 127.0.0.1 \
                -t NODE_IP \
                -u NODE_UUID \
                --cert default \
                --include payload \
                --allowlist allowlist.txt \
                --exclude exclude.list \
                --mb_refstate mb-refstate.json \
                -c add

If we see any complaints about keylime.ima - ERROR - IMA ERRORS we should adjust some of the lists, for example the exclude list to drop any transient file. Before re-registering the node we need to delete it (-c delete) and reboot it, to re-create the IMA log file in the kernel.

If we do not have a different allow list for the second node, we can register this particular one without IMA:

 keylime_tenant -v 127.0.0.1 \
                -t NODE_IP \
                -u NODE_UUID \
                --cert default \
                --include payload \
                --mb_refstate mb-refstate.json \
                -c add

If everything goes OK we should have in both nodes the payload extracted in /var/lib/keylime/secure/unzipped, with the local actions, the certificates and the autorun.sh script. This script should have been executed, deploying the SSH keys in /root/.ssh.

We can test the passwordless SSH connection from the verifier:

 ssh -i payload/id_rsa NODE_IP

and from the second node:

 ssh NODE_IP

In the first node we should have the ad-hoc script /root/greets.sh created before, that can be executed safely.

 ./greets.sh

We can simulate the effects of an attacker changing this file and executing it again:

 echo 'echo "Hacked!"' >> greets.sh
 ./greets.sh

The verifier will detect the change in the file in the next request, and will command the execution of all the local actions delivered in the payload. In this case we have only one, which will remove the SSH keys in the affected node and, in the rest of the nodes, add an iptables rule that drops all the connections initiated from the affected node.

We can check the effects in the first node, and see that /root/.ssh is empty, and that the second node now has a new iptables rule that is, indeed, dropping all the connections from the first node.

Notes

SSH

For security reasons the sshd service does not allow remote access to the root user with a password. For debugging or evaluating the Keylime service this can be a bit inconvenient. The preferred solution is to provide the SSH key during installation with YaST, when the root password needs to be entered, or during first boot with Ignition.

Alternatively, for testing, this security feature can be disabled:

 echo "PermitRootLogin yes" > /etc/ssh/sshd_config.d/rootlogin.conf
 systemctl restart sshd.service

SELinux

MicroOS has SELinux activated by default in enforcing mode. There is currently a bug that requires an update of the SELinux policy to accept some of the actions that the TPM2 tools and Keylime need to perform. Until the bug gets fixed we will need to disable SELinux. For reference, check the SELinux section of MicroOS.

The easiest way is to disable SELinux during installation on the proposal page. Alternatively edit /etc/default/grub, look for the GRUB_CMDLINE_LINUX_DEFAULT line, and replace selinux=1 with selinux=0. Regenerate grub.cfg as explained before.

To validate that SELinux is disabled, we can check sestatus.

Firewalld

If we have firewalld installed we can install the keylime-firewalld sub-package, and register it in our public zone:

 firewall-cmd --zone=public --add-service=keylime --permanent
 firewall-cmd --zone=public --add-service=keylime

The Keylime firewalld service has the list of all the possible Keylime services and also includes CFSSL, so it can be used for both the agent and verifier nodes.