Presenting GRUB2 BLS
18. Jul 2025 | Alberto Planas | No License
Protecting against rogue devices in openSUSE with Full Disk Encryption
openSUSE now has multiple ways to configure a Full Disk Encryption
(FDE) installation. A very secure and easy way (YaST2) of doing this
is via user space tools, as we have described in several earlier
posts. This solution is based on the
systemd tool-set, like systemd-cryptenroll, systemd-pcrlock and
systemd-cryptsetup, among others, orchestrated by the in-house
sdbootutil script.
One of the main advantages of using this systemd approach is the
possibility of integrating multiple authentication methods. Together
with the traditional password, asked at boot time during the initrd
stage, we can now unlock the system using a certificate, a TPM2, or
a FIDO2 key. We can mix some of them by creating multiple LUKS2 key
slots and use, for example, a TPM2 to unlock the device in an
unattended fashion and a FIDO2 key as a recovery mechanism.
Honestly, the TPM2, and the TPM2+PIN variation, are the most
relevant ones for the user. As described in the other posts, the
TPM2 is a (sometimes virtual) device that can attest the health of
our system using a mechanism known as measured boot.
The tl;dr version of this is that each stage of the boot process,
starting from the firmware, will load and “measure” the next stage
before delegating execution to it. For example, this means that
there is a moment in the later stages of the boot process where the
UEFI firmware will load the boot loader from the disk into memory.
This can be the shim, systemd-boot or grub2-bls. The firmware will
calculate a hash value of it (usually SHA256) and will ask the
TPM2 to perform an “extend” operation on one of its internal registers (PCRs).
The extension is a cryptographic computation that is very easy to
calculate, but practically impossible to reverse or forge. It is applied
to one of those internal registers (PCRs) and consists of calculating the
hash (again SHA256) of the old value of the PCR concatenated with the hash
of the component that we are measuring. This new value will replace the
current PCR value, as this is the only way to change those registers.
The security property is that it is computationally infeasible to
force a chosen value into one of those
PCRs, but very easy to calculate the final value.
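The extend operation described above can be sketched in a few lines of Python. This is only a model of the arithmetic for the SHA256 bank, not code that talks to a real TPM2; the function names `pcr_extend` and `replay`, and the byte strings standing in for boot components, are assumptions made for the example.

```python
import hashlib
from typing import Iterable

PCR_SIZE = hashlib.sha256().digest_size  # 32 bytes in the SHA256 bank


def pcr_extend(old_value: bytes, component: bytes) -> bytes:
    """Return SHA256(old PCR value || SHA256(component))."""
    measurement = hashlib.sha256(component).digest()
    return hashlib.sha256(old_value + measurement).digest()


def replay(components: Iterable[bytes]) -> bytes:
    """Start from an all-zero PCR and extend it with each component in order."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = pcr_extend(pcr, component)
    return pcr


# Hypothetical boot chain contents, used only as placeholders.
good = replay([b"boot loader", b"kernel", b"initrd"])
tampered = replay([b"boot loader", b"kernel", b"initrd with backdoor"])

print(good.hex())
print(good != tampered)  # True: any change in the chain changes the final value
```

Anyone who knows the components can replay the chain and predict the final value, which is what makes predictions possible, while choosing components that produce a given target value would require breaking SHA256.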
So this means that if all the components of the boot chain are
measured (all the stages in the UEFI firmware, the firmware
configuration, the boot loader, the command line, the kernel and even
the initrd), the final PCR values can be compared with our
expectations to discover whether the system has been booted with
known-good software and configuration, allowing us to know instantly if
some component in the boot chain has been hacked or modified without
consent.
That is a powerful property to have, but what is more interesting is
that we can have secrets that can only be opened if we are in
one of those good or recognized states. We can, for example, encrypt
(seal) the key that opens an encrypted disk using the TPM2, together
with a policy that will decrypt (unseal) the same key if, and only
if, we are using the same TPM2 and the PCR values are on a list of
expected ones. Those policies can be very complicated, and can
include extra passwords, certificates or other checks that will be
validated before the TPM2 can unseal the key.
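To make the seal and unseal logic concrete, here is a toy model in Python. It is not a TPM2 API: the `ToyTPM` class, its methods and the `predict` helper are invented for illustration, and a real TPM2 keeps the sealed blob outside the chip, encrypted with a key that never leaves it. The sketch only shows the policy check: the secret is released when the current PCR value is on the list of expected ones and, if the policy includes one, the PIN matches.

```python
import hashlib
import hmac


def _extend(pcr: bytes, component: bytes) -> bytes:
    return hashlib.sha256(pcr + hashlib.sha256(component).digest()).digest()


def predict(stages):
    """Replay the extend operation to predict the final PCR value."""
    pcr = bytes(32)
    for stage in stages:
        pcr = _extend(pcr, stage)
    return pcr


class ToyTPM:
    """Toy model of a single PCR plus one sealed secret (illustration only)."""

    def __init__(self):
        self.pcr = bytes(32)
        self._sealed = None

    def extend(self, component: bytes) -> None:
        self.pcr = _extend(self.pcr, component)

    def seal(self, secret: bytes, allowed_pcr_values, pin: bytes = None) -> None:
        self._sealed = (secret, set(allowed_pcr_values), pin)

    def unseal(self, pin: bytes = None) -> bytes:
        if self._sealed is None:
            raise LookupError("nothing sealed")
        secret, allowed, expected_pin = self._sealed
        if self.pcr not in allowed:
            raise PermissionError("PCR value not in policy")
        if expected_pin is not None and not hmac.compare_digest(expected_pin, pin or b""):
            raise PermissionError("wrong PIN")
        return secret


GOOD_BOOT = [b"boot loader", b"kernel", b"initrd"]  # placeholder components

tpm = ToyTPM()
tpm.seal(b"disk key", [predict(GOOD_BOOT)])
for stage in GOOD_BOOT:
    tpm.extend(stage)
print(tpm.unseal())  # b'disk key'
```

Note that in this model, as in the real thing, the policy says nothing about who is asking: any caller gets the secret as long as the PCR value matches and no PIN is required.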
With a mechanism like this in place, thanks to the systemd tools, we
can now avoid entering the password to unlock the encrypted disk if
the system is in a healthy state. Healthy in the sense that we
cryptographically guarantee that the code and configuration used
during the boot process are the expected ones, and that no one entered
init=/bin/bash in our kernel command line, or replaced the kernel or
initrd with a vulnerable one, for example.
With the integration that we made of this model in openSUSE, we can
update the system, including the boot loader or the kernel,
and sdbootutil will transparently generate new predictions of
expected PCR values that are now considered safe. This implies an
update of the TPM2 policy, which will be taken into consideration on
the next boot, so the automatic unlock will succeed. If something
goes wrong and the expected PCR values are not met, the user will
need to enter the password that is stored in a different LUKS2 key
slot to open the device, in order to audit the system and validate it.
The fault in the design
Using a TPM2 as described before is a clear increase in the security
level, but it is not the final answer. Security is always an asymptotic
approximation.
Some years ago a physical attack was described against the Windows
BitLocker FDE solution. BitLocker also uses the TPM2 in a
way similar to the one described above, but it was not using encrypted
sessions to communicate with the device. By intercepting the SPI bus it
was shown to be possible to recover the key that unlocks the disk.
systemd learned from that and used encrypted sessions early on, but
this attack can also be avoided if the policy used to unseal the key
also demands a PIN or password that must be entered by the user.
Then the TPM2 can only unseal the secret if the PCRs are in the
correct state and the provided password is the correct one. It should be
noted that, AFAIK, the SPI sniffing can work against Clevis.
But more recently a second attack was made public that fully affects
the original proposal and does not require the sophistication of
the first one. (Disclosure: the attack had also been described
internally and independently months before, and some countermeasures
were put in place much earlier.)
The article describes how the attack can be carried out by looking up,
in the initrd, the filesystem UUID used to mount the encrypted device. This
information is in the /etc/crypttab stored in the initrd, which
will result in something like this:
systemd-cryptsetup attach cr_root /dev/disk/by-uuid/$UUID 'none' 'tpm2-device=auto'
If the expected firmware, configuration files, kernel and initrd are
used during the boot process, then the TPM2’s PCRs will
have values that match the policy that unlocks the device. The
sealed key can then be unsealed by the TPM2, the disk will be
unlocked, the switch root will succeed and the boot process will
continue in the rootfs.
But what if the original drive is replaced by one that has the same
UUID (it is public information after all) and is also encrypted?
Then the PCRs will be in the same correct state. Note that in
measured boot it is the previous stage that measures the next one
before delegating execution, so the replaced disk itself is never
measured. Then systemd-cryptsetup will try
to unlock the device using the key successfully
unsealed by the TPM2 and … will fail to open it, of course. The
rogue device may have a TPM2 key slot in its LUKS2 header, but
it certainly cannot be opened with this TPM2 nor with the secret password.
In this situation systemd-cryptsetup will ask for the password to
unlock the device, and the attacker can enter one that this time will
open the rogue device. The switch root will happen, but now the boot
process will continue in the fake rootfs, and a program stored
there can talk to the TPM2, which still holds the good
PCR values. One of its requests can be to unseal the secret
key using the current policy. And this time (as it did before), the
TPM2 will agree to deliver the secret to the bad program. Game
over.
There are solutions for this attack, of course.
One is again to use TPM2+PIN instead of TPM2, the same solution
as for the sniffing attack. In this case the first systemd-cryptsetup
call will fail and a password will be requested to unlock the device. But
now the bad program cannot ask the TPM2 to unseal the key
using the current policy. The PCR values will match, but the policy
also requires entering a secret PIN or password known only by the real
user, and without it the unseal will fail and the key will be kept
safe.
Another solution is to somehow invalidate the policy by extending some of
the PCRs involved before the switch root, so the policy cannot be
applied anymore after that. This can be done automatically by
systemd-cryptsetup using the tpm2-measure-pcr=yes option in /etc/crypttab.
With this option PCR15 will be extended using the volume key, a
secret that can only be extracted by knowing one of the device keys.
For this solution to work, PCR15 needs to be included in the current
policy, with an expected value of 0x000..00, the default one. Once
the rogue device is opened with the attacker-provided password, PCR15 will
be automatically extended and the value will be different from
0x000..00, invalidating the policy before the switch root.
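A small Python model shows why this works. It assumes a policy that only accepts an all-zero PCR15, and it uses placeholder byte strings where systemd would measure a value derived from the real volume key; `pcr_extend` and `policy_allows_unseal` are hypothetical names for this sketch.

```python
import hashlib

ZERO = bytes(32)  # default PCR15 value, 0x000..00


def pcr_extend(old_value: bytes, data: bytes) -> bytes:
    return hashlib.sha256(old_value + hashlib.sha256(data).digest()).digest()


def policy_allows_unseal(pcr15: bytes) -> bool:
    """Assumed policy: PCR15 must still hold its default value."""
    return pcr15 == ZERO


# In the initrd, before any device is opened, the policy applies.
pcr15 = ZERO
print(policy_allows_unseal(pcr15))  # True

# Opening a device (rogue or not) extends PCR15 with its volume key.
after_rogue = pcr_extend(pcr15, b"volume key of the rogue device")
after_real = pcr_extend(pcr15, b"volume key of the real device")

print(policy_allows_unseal(after_rogue))  # False
print(policy_allows_unseal(after_real))   # False
```

The last line is the catch: the policy is also invalidated after a perfectly legitimate unlock, so nothing running in the rootfs can use it anymore.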
That is a good solution, but not for us. In day-to-day use the
user will need to update the system, and a new policy needs to be
calculated to replace the old one (for example when the kernel is
updated). Because with systemd-pcrlock the policy is stored in the
TPM2 in one of the non-volatile RAM slots (NVIndex), we need to
protect it somehow, so it cannot be replaced by another process. For
that, systemd stores a secret key (recovery PIN) in a different
NVIndex that is sealed by the same policy! If the key cannot be
automatically recovered because the policy does not apply anymore,
then the user will be asked for the recovery PIN, making the update
process a bit unpleasant if the policy is always invalidated.
Finally, another way to address the issue is to stop the boot process
if we detect that the device is not the expected one. We can think of
a new service, living in the initrd and executed at the very last
moment, just before the switch root, that can stop the boot process
(maybe halting the system) if the device that stores the rootfs is
not the expected one.
For this, PCR15 is still a good solution. It contains the
measurement of a secret (the volume key) that can only be known by the
real user and cannot be replicated by the attacker. Ideally we can
create a prediction for PCR15 and make this service compare the
actual value with the expected one, and stop the boot process if they
are different.
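The comparison such a service performs could look roughly like the following Python sketch. It is not the code of any existing tool: `pcr15_matches`, `validate_or_halt` and the `halt` callback are hypothetical, and reading the actual PCR15 value from the TPM2 is left out, so both values are passed in as hex strings.

```python
import hmac


def pcr15_matches(expected_hex: str, actual_hex: str) -> bool:
    """Compare the predicted and the actual PCR15 values given as hex strings."""
    expected = bytes.fromhex(expected_hex.strip())
    actual = bytes.fromhex(actual_hex.strip())
    return hmac.compare_digest(expected, actual)


def validate_or_halt(expected_hex: str, actual_hex: str, halt) -> bool:
    """Call halt() and return False when PCR15 is not the predicted value."""
    if not pcr15_matches(expected_hex, actual_hex):
        halt()
        return False
    return True


# Placeholder values: a real prediction would be derived from the volume keys.
prediction = "ab" * 32
events = []

validate_or_halt(prediction, "ab" * 32, lambda: events.append("halt"))
validate_or_halt(prediction, "00" * 32, lambda: events.append("halt"))
print(events)  # ['halt'], only the mismatching value stops the boot
```

Passing the halt action as a callback keeps the check itself easy to test; in an initrd the callback would be whatever stops the boot before the switch root.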
This is what the measure-pcr-validator service from
sdbootutil is doing. sdbootutil first generates a prediction for
all the encrypted devices that are opened during the initrd, and
checks that the correct tag is present in /etc/crypttab. To be able
to access the volume key, the tool needs the root password, so this
prediction is only updated when it is really necessary, for
example when a new encrypted device is added. This prediction is
signed by a private key stored in the host as an extra security
measure, but because the public key is also stored in the ESP it
is honestly not adding much.
An extra service (measure-pcr-generator) will impose an order on how
the encrypted devices are opened, as this order is critical to produce
a single possible PCR15 value. If we have one single device the
order of measurements is not relevant, but if we have three
(rootfs, /home, and swap, for example) we can have six possible
and valid different values for PCR15.
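The count of six is simply the number of orderings of three devices, and it can be checked with a short Python sketch. The volume keys below are placeholders, and the sketch hashes them directly as a stand-in for whatever value systemd derives from the volume key before measuring it.

```python
import hashlib
from itertools import permutations


def pcr_extend(old_value: bytes, data: bytes) -> bytes:
    return hashlib.sha256(old_value + hashlib.sha256(data).digest()).digest()


def pcr15_for(keys_in_order) -> bytes:
    """Final PCR15 value when devices are opened in the given order."""
    pcr = bytes(32)
    for key in keys_in_order:
        pcr = pcr_extend(pcr, key)
    return pcr


# Hypothetical volume keys for three encrypted devices.
volume_keys = {
    "rootfs": b"volume key A",
    "home": b"volume key B",
    "swap": b"volume key C",
}

values = {
    pcr15_for([volume_keys[name] for name in order])
    for order in permutations(volume_keys)
}
print(len(values))  # 6: one distinct PCR15 value per opening order
```

Fixing the order in which the devices are opened reduces this set to a single value, which is what makes one prediction enough.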
The last step is that the dracut-pcr-signature service in the
initrd will import the prediction, the signature and
the public key from the ESP, so measure-pcr-validator can check the
signature and compare the PCR value.
And that is all!
This approach is also somewhat similar to what the new
systemd-validatefs is doing, but at the file system level.
Categories: blog