Here's a personal account of the history of the application of MD5 collisions to X.509 certificates.
This story complements the `official'
hashclash website.
part I - harmless colliding certificates:
This line of work started shortly after Xiaoyun Wang took the crypto community by surprise at the
Crypto Rump Session in August 2004, by presenting an MD5 collision. That led Arjen Lenstra and me
to wondering what nasty things could be done with such collisions.
We saw possibilities for interesting applications to X.509 certificates.
Applications were already emerging in other areas (software integrity), using the exact collisions
Xiaoyun Wang had presented at Crypto, that were using the standard initial IHV.
We had two main problems to solve: we had to neatly hide a random looking data block inside the
highly formatted data structure that an X.509 certificate is, and we had to obtain collisions for
different initial IHVs than Xiaoyun Wang's original collisions. The first problem was solved by
observing that an X.509 certificate already contains a random looking block of data, namely the
RSA modulus of the public key. It turned out to be easy to hide the collision inside this modulus,
even while retaining the number theoretic structure of the modulus. The second problem
turned out to be even easier: one e-mail to Xiaoyun Wang with our specific initial IHV was
sufficient to receive the desired collision overnight.
See the old
Colliding Certificates website
containing a lot of technical stuff. This work led to our ACISP paper, and caused some turbulence
in the PKI world.
One reaction that I like is the following, by
Jeffrey Kay:
"
I just finished reading Colliding X.509 Certificates (...) and I now have chills running
up my spine (...) this is a pretty scary paper (...)"
Great. That's why we were doing this.
part II - more interesting colliding certificates:
The type of certificates we were able to make using Xiaoyun Wang's type of collision violates
fundamental PKI principles, but is actually completely harmless. This is mainly due to the fact
that the owner's identity as present in the certificates must be identical in both certificates
forming a colliding pair. This is because Xiaoyun Wang's method required the initial IHVs for the
two colliding data blocks to be equal. We now call this type of collisions
identical prefix
collisions.
At EuroCrypt 2005, where Xiaoyun Wang presented her method for constructiong MD5 collisions, it
became clear to us that this was the main obstacle for more interesting colliding certificates.
In particular, we wanted colliding certificates with different owner identities. For that we
needed
chosen prefix collisions, where the two colliding data blocks have arbitrary chosen
initial IHVs.
Then Marc Stevens, a master student in Eindhoven, came to me looking for a topic for his master
thesis project, to be done within the Dutch National Communications Security Agency NBV. We gave
him a choice from a few topics, and he chose hash collisions. After a few months he had
considerably improved and accelerated Xiaoyun Wang's method for identical prefix collisions, and
had ideas on how to construct chosen prefix collisions. This required massive computational
efforts, that luckily can easily be parallelized and distributed. So Marc started the
HashClash
project, using the BOINC framework, in which volunteers donate their spare cycles to scientific
advancement. Soon Marc had a few thousand PCs all over the world running for him, and after some
6 months he finally had a chosen prefix collision. This was crafted such that it could be built in
a pair of colliding certificates, this time with different owner identities. See the
Chosen-prefix Collision website and the
Colliding X.509 Certificates for Different Identities website.
Marc presented his work at EuroCrypt 2007, in a paper that was
selected by the program committee as one of the three
notable papers of the conference.
In June 2007 Marc graduated in Eindhoven as MSc, cum laude, with the highest possible grade for
his master thesis. He also received the Eindhoven University Best Master Thesis Award 2007.
This result achieved some attention from the Dutch press.
intermezzo - predicting the 2008 US presidential elections:
After graduating, before he went to work at
CWI
as a PhD student of Ronald Cramer, Marc visited Arjen Lenstra in Lausanne for two months. Arjen
has a cluster of over 200 PlayStation3 game consoles available for cryptanalytic experiments.
Marc developed code for this platform and considerably improved his methods for constructing
chosen-prefix collisions for MD5. We now took a less serious but equally interesting and
challenging application that was much more appealing to a broad audience: predicting who would
become the next US president. Note: this was in the fall of 2007, more than a year before the
elections in November 2008. Using only one PlayStation3 Marc produced a chosen-prefix
multi-collision for MD5, consisting of twelve perfectly normal pdf files, all having identical
MD5 hash values. See the
Nostradamus website.
Moreover, as a somewhat more serious (to some) application we constructed a pair of Win32
executables having identical MD5 hash values. Next time you download some software and verify its
integrity by an "MD5 checksum", do you trust that there is no other file with the same
"checksum" value? See the
Colliding Executables website.
These two websites were released on November 30, 2007, and attracted a lot of attention worldwide.
part III - really dangerous colliding certificates:
In July 2008 Alex Sotirov, Jake Appelbaum and David Molnar contacted us, as they had ideas on how
to mount an attack on a real Certification Authority, based on chosen-prefix collisions, and they
needed help on constructing these collisions. This led Marc to making new improvements in his
methods, and the full strength of Arjen's PlayStation3 cluster was used to produce, in October
2008, the collision that was needed. The result was a potentially very dangerous attack
possibility on the SSL Public Key Infrastructure, in that we had constructed a rogue Certification
Authority whose certificate appears to be signed by a commercial CA trusted by the major browsers.
This story has been extensively described on the
rogue CA website.
Alex, Jake and Marc presented this work on December 30, 2008 at the
25th Annual Chaos Communication Congress in Berlin.
It has attracted major media attention. In one blog I found a quote about our work that I like:
"
their paper describing the attack is a fascinating read filled with enough gory details
to make any security practitioner salivate".
Keep tissues at hand when you
click here.
The US Computer Emergency Readiness Team issued a
Vulnerability Note, and there were public reactions by companies such as
Microsoft, Verisign and Cisco.
The new developments on MD5 collision construction are described in detail in a paper that was
presented at
Crypto 2009. This paper also contains details of speed improvements to the
identical prefix collision construction, and of a construction method for
single block
chosen prefix collisions.
The Crypto 2009 program committee selected our paper for the
best paper award.
Our work made it to number 1 of the
Top Ten Web Hacking Techniques of 2009. If that's something to be proud of I
don't know.
part IV: - really really really dangerous colliding certificates in the
wild On June 4, 2012, only weeks before Marc's PhD defense in Leiden (June 19, with Ronald
Cramer and Arjen Lenstra as "promotores", Xiaoyun Wang as one of the
"opponent"s, and me as "co-promotor"), I saw on a mailing list a message about
the Flame malware being authenticated by a forged certificate appearing to have been issued by
Microsoft. The message contained a few certificates, but only Microsoft CA certificates, most of
which had already been revoked. The forged certificate was not yet publicly available.
I alerted Marc to this immediately. Marc had, in his time at CWI, developed a very nice
"forensic tool", that on input of one message only, checks whether this message has a
twin message such that these twins form a hash collision constructed by a method such as
Wang's/Marc's. I asked Marc to run his tool on the available Microsoft CA certificates (you never
know what might come out), which he did, with negative results.
One day later, Microsoft announced in a blog that indeed the certificate used for authentication
of Flame has been constructed by a collision for MD5. In that blog they explicitly refer to our
previous work. The forged certificate also became available that day, and Marc ran it through his
forensic tool, this time with positive results. Moreover, analyzing the results, Marc discovered
that a novel chosen-prefix collision attack had been used. This is very interesting news, both
from a scientific viewpoint, and because it gives some insight in the capabilities of the
designers of Flame. On the other hand, the data from the certificate also suggest that the attack
followed to a substantial extent the path we had laid out in our Rogue CA work; it gives the
impression that the Flame designers carefully have studied our work, but gave it their own
creative twist.
Later it became clear that the Flame malware was merely a predecessor of Stuxnet, the real deal.
In Stuxnet also certificates play a role, but hash collisions do not. Apparently it was easier
to compromise a real life Certification Authority. See the excellent book "Countdown to
Zero", by Kim Zetter, for the Stuxnet story in full detail. The book devotes an entire
footnote to our work.
In short: from a scientific viewpoint it's very interesting to see novel attack ideas appear. But
I would have preferred them to have come from the open academic community or from the open security
researchers community. Above all, I'm not very happy to see our work being abused in malware.
the aftermath:
Marc went on with attacking SHA1, and succeeded to find collisions for it too. See his PhD thesis
and further work, available from
https://marc-stevens.nl/research/.