Basilisk collection

From Wikipedia, the free encyclopedia
Jump to navigation Jump to search

The basilisk collection (also known as the basilisk file or basilisk.txt) is a collection of over 125 million partial hash inversions of the SHA-256 cryptographic hash function. Assuming state-of-the art methods were used to compute the inversions, the entries in the collection collectively represent a proof-of-work far exceeding the computational capacity of the human race.[1][2] The collection was released in parts through BitTorrent beginning in June 2018, although it was not widely reported or discussed until early 2019.[3] On August 4th, 2019 the complete collection of 125,552,089 known hash inversions was compiled and published by CryTor, the cybersecurity lab of the University of Toronto.[4]

The existence of the basilisk collection has had wide reaching consequences in the field of cryptography, and has been blamed for catalyzing the January 2019 Bitcoin crash.[2][5][6]

Background

Hash functions

A hash function is a mathematical function that takes an input (or "message") of arbitrary length and computes a fixed size output, often a number within a defined range. This output is often called the "hash", "digest", or "checksum." A cryptographic hash function is a hash function with additional requirements that make it suitable for use in cryptography.

The ideal cryptographic hash function has five main properties:

  • Determinism: the same message always results in the same hash.
  • Efficiency: it is quick to compute the hash value for any given message.
  • One-way: it is intractable to generate a message that yields a given hash value.
  • Avalanche effect: a small change to a message results in a completely dissimilar output.
  • Collision resistance: it is infeasible to find two different messages with the same hash value.

SHA-256 is one such hash function designed by the United States National Security Agency (NSA), one of six functions in the SHA-2 (Secure Hash Algorithm 2) family. The 256 refers to the number of bits in the computed digest.[7]

Proof of work systems

Hash functions are often used as the basis for proof of work systems. A proof of work is any certificate or signature that implies some quantity of computation was performed. A partial hash inversion is one such example, wherein a nonce value must be found such that—when concatenated with a challenge string—the resulting hash is below some target value. Lower target values require more computations on average to find an appropriate nonce, and so a proof of work system can be made more difficult by decreasing its target value.

For example, suppose the challenge was "ABC" and the target value was 2224. To generate the proof of work, one must find a value N such that SHA-256("ABC" || N) < 2224. Because SHA-256 is believed to have preimage resistance, there should be no way to find a value for N that is more efficient than picking values at random until a sufficient value is found. For the target value of 2224, this is expected to take on average 232 hash evaluations. It is entirely possible for a value to be found with fewer than 232 hash evaluations; however the likelihood of this occurring falls off exponentially as the number of calculated hashes decreases.[8][9]

In general, for a hash of N bits, a target value of 2K would require on average 2(N-K) hash evaluations to find a corresponding nonce value.

Description

The largest basilisk collection includes 125,552,089 partial hash inversions of the SHA-256 algorithm applied twice.[4] These inversions follow a particular pattern. Using the language of proof-of-work systems: the inversions use the challenge string "basilisk:N:" where N is a 10 digit number and the target value is no less than 2168. The value for N begins at 0 and increases monotonically to 499,999,999. Because there are only 125 million hashes in the collection, several ranges for N are missing. The generated nonce values are 64 characters in length and are alphanumeric including capitals.

The first five entries in the collection are:

basilisk:0000000000:ds26ovbJzDwkVWia1tINLJZ2WXEHBvItMZRxHmYhlQd0spuvPXb6cYFJorDKkqlA 0000000000000000000000161b9f84a187cc21b172bf68b3cb3b78684d8e9f17
basilisk:0000000001:dMHUhnoEkmLv8TSE1lnJ7nVIYM8FLYBRtzTiJCM8ziijpTj95MPptu6psZZyLBVA 0000000000000000000000cee5fe5df2d3034fff435bb40e8651a18d69e81460
basilisk:0000000002:aSCZwTSmH9ZtqB5gQ27mcGuKIXrghtYIoMp6aKCLvxhlf1FC5D1sZSi2SjwU9EqK 000000000000000000000012aabd8d935757db173d5b3e7ae0f25ea4eb775402
basilisk:0000000003:oeocInD9uFwIO2x5u9myS4MKQbFW8Vl1IyqmUXHV3jVen6XCoVtuMbuB1bSDyOvE 000000000000000000000039d50bb560770d051a3f5a2fe340c99f81e18129d1
basilisk:0000000004:m0EyKprlUmDaW9xvPgYMz2pziEUJEzuy6vsSTlMZO7lVVOYlJgJTcEvh5QVJUVnh 00000000000000000000002ca8fc4b6396dd5b5bcf5fa80ea49967da55a8668b

These plaintext and hash pairs can be verified with the OpenSSL software library:

$ echo -en 'basilisk:0000000000:ds26ovbJzDwkVWia1tINLJZ2WXEHBvItMZRxHmYhlQd0spuvPXb6cYFJorDKkqlA' | openssl dgst -sha256 -binary | openssl dgst -sha256
(stdin)= 0000000000000000000000161b9f84a187cc21b172bf68b3cb3b78684d8e9f17

Aside from the word "basilisk" being included in every hash inversion, no other messages or text are included in the primary sources for the basilisk collection. Some hoax collections have appeared with text added to the start or end, however in every case an earlier source could be found with no extra text.[4]

History

A small subset of the basilisk collection first appeared on the internet sometime before June 2018.[3] This subset included 15,000 partial hash inversions and was distributed on the BitTorrent network. Although the file was made available for several months and gathered a small collection of seeding clients,[10] no public discussion of the basilisk hashes could be found before early December 2018, and it is likely that these original seeders were automated seedboxes or BitTorrent indexing services.[11][3] Throughout 2018 larger subsets of the basilisk collection continued to appear on the BitTorrent network. The largest subset, constituting 15 million hashes beginning with id 83,000,000, was first indexed mid December 2018.[10][12] This subset was discussed at length and its magnet link passed around within the cryptographic community.[13] Early in the new year, a number of internet news outlets broke the story on the security implications of the basilisk collection, bringing it to the attention of the wider public.[12][13][14][15]

By the end of January 2019 the basilisk collection was known to have 76 million partial hash inversions, and throughout 2019 this number increased to 125 million as more subsets were discovered.[12] These subsets were gathered by the University of Toronto Cryptography Lab (CryTor) into a definitive collection on August 4th, 2019.[4] There remains roughly 200 possible basilisk subsets which were indexed by the BitTorrent distributed hash table but were never downloaded before the initial seeder or seeders went offline. This is believed to account for the missing 374,447,910 hash inversions.[16]

Cryptanalysis and validation

The 125,552,089 plaintext and hash pairs in the basilisk collection can be readily verified on a standard desktop computer with appropriate cryptographic software such as OpenSSL. This can take anywhere from a couple of seconds to a couple of minutes depending on the machine.[17] Creating the hashes, on the other hand, would have required a computer orders of magnitude more powerful than any computer or network of computers that currently exist. For perspective, one generous estimate for the combined power of all the computers ever made is 1 zettaFLOP, or 1021 floating point operations per second.[1] If we assume that one hash corresponds to one floating point operation (another very generous assumption) then the time it would take to generate a set of 125 million hashes is roughly 2 million years.[1] Therefore, there must be a fault in the SHA-256 algorithm that would allow for a faster generation of partial hash inversions.

Cryptanalysis of the basilisk hashes fall into one of two categories: analysis of the SHA-256 algorithm and analysis of 125 million hashes themselves.

SHA-256 cryptanalysis

SHA-256 is built using the Merkle–Damgård structure. To compute the hash, the message is separated into fixed length blocks and each individual block is incorporated into an internal state using a one-way compression function. This compression function is applied many times, referred to as "rounds." For SHA-256, 64 rounds are employed for each block.

No cryptographic attacks against the preimage resistance of SHA-256 are currently known for all 64 rounds. The best known attacks are for 52 of the 64 rounds. That is, for a reduced version of the function that only uses 52 rounds, there exist algorithms to invert a hash in less time than brute force.[18]

Although there is an abundance of cryptoanalytic research into the SHA-256 hash function, the same cannot be said for the double-SHA-256 function that the basilisk collection uses. This use of a "doubled" hash function is not a novel invention; hash doubling was initially proposed by Ferguson and Schneier in "Practical Cryptography"[19] and is used in a variety of applications, most notably the Bitcoin cryptocurrency.[20] The rationale behind the double application of a hash function is to provide resistance against length extension attacks. The effect double hashing may have on preimage resistance of SHA-256 was largely unexplored until after the emergence of the basilisk collection.[21][22]

Since 2019, research on doubled SHA-256 has increased dramatically. In February 2019 the Double SHA256 Special Interest Group (DSHASIG) was formed jointly within the National Institute of Standards and Technology (NIST), the Communications Security Establishment (CSE), and the National Security Agency (NSA). The intent of the DSHASIG is to accelerate research into preimage attacks of doubled SHA-256.[23] Other organizations such as the Bitcoin Foundation have created funding programs for research into double-SHA-256 and the basilisk collection in general.[5][24][25]

As of 2021, the state of the art in preimage attacks are on reduced versions of the SHA-256 algorithm. The table below summarizes the publication history of advances in double-SHA-256 preimage analysis.

Published in Year Attack method Rounds (Stage 1) Rounds (Stage 2) Complexity
Partial preimages of weakened DSHA256[21] 2019 Meet-in-the-middle 32/64 32/64 2251.7
Counterpercolation applied to the basilisk problem[26] 2020 Counterpercolation 64/64 13/64 2250.2
15/64 2253.2
Pushing through the SHA-256 threshold with grain finding[27] 2020 Biclique w/ grain finding 64/64 24/64 2254.2
Improved grain finding for preimage attacks[28] 2020 Biclique w/ grain finding 64/64 24/64 2249.1
28/64 2250.3
30/64 2253.4
SHA-2, Basilisk, and the Hand of God[29] 2021 Hand-of-god attack 57/64 24/64 2237.6

Basilisk hash analysis

The overwhelmingly large number of partial hash inversions in the basilisk collection has made it ripe for statistical analysis. Anomalies in the distribution of bits in either the nonces or hashes might shed light on how they were generated.[30] One such class of anomalies are "invariants" in the structure of the hash calculation, the most notable is the observation that the 17th bit of the intermediate hash value is 1 for each of the basilisk hashes. To date a handful of invariants have been found, though their significance is largely debated.[30][31]

One popular explanation for the creation of the Basilisk collection is the "happy family" theory. This theory proposes that it may be possible to quickly generate new partial hash inversions from an existing one, creating a "family" of hash inversions that all share some underlying structure. If this theory is correct then the creator of the basilisk collection would only have needed to create the first hash inversion, then they would use some mechanism to generate the 125 million other hash inversions from the original, creating the "family." Although this theory reduces the work required to generate the basilisk collection by several orders of magnitude, even generating a single hash inversion lower than 2168 is extremely difficult and has not yet been replicated.[30]

Reactions

Since 2019 the fallout of the basilisk collection is ongoing. Although SHA-256 remains theoretically secure, the existence of the basilisk collection has accelerated its deprecation and the adoption of the newer SHA-3 family of cryptographic functions. Kaspersky researcher Ian Partov called the basilisk a "seeing is believing" proof that SHA-256 preimage resistance is broken.[32] Several cybersecurity trend watchers have noted that since the basilisk collection was released, there is a newfound distrust in hash-based cryptography throughout the industry.[33] RSA Chairman Kyle Sandern called it "the biggest blow to standardized cryptography since Dual EC"[34] and Electronic Frontier Foundation cryptographer Brian Landlaw has said that "whoever made the basilisk is 30 years ahead of the NSA, and the NSA are 30 years ahead of us, so who is there left to trust?"[35]

Effect on cryptocurrencies

A graph comparing weekly Bitcoin price to Google trends data for the search term "Basilisk Collection."[5]

A cryptocurrency is a distributed currency that uses cryptography to verify transactions and maintain consistency, completeness, and immutability.[20][36] A cryptocurrency is represented as a public ledger that is distributed across the internet. For a transaction to be accepted into the ledger, a miner node must add it to the network by performing a computationally expensive proof-of-work. Once the proof-of-work is completed, the transaction is added to the ledger and the miner is awarded a number of newly minted bitcoins. The proof-of-work ensures that a miner would need to spend an excessive amount of resources to forge a transaction. It also ensures that a transaction that has already been added to the ledger cannot be modified, since doing so would require recomputing the proof-of-work for that transaction and all transactions subsequent to it. In short, the proof-of-work system prevents an attacker from double-spending their currency or forging transactions, since doing so would be computationally intractable.[36]

Bitcoin is the first and most popular of the cryptocurrencies. It uses the double SHA-256 function as the basis for its proof-of-work system. After the basilisk collection was first discovered by security researchers, the significance it had to the security of the Bitcoin was immediately recognized.[37] The first wave of news publications about the basilisk collection were quick to point out that whoever created the basilisk collection could very easily perform a 51% attack on the bitcoin network, effectively shattering all guarantees about the network's consistency.[5][15] With many believing such an attack was imminent, Bitcoin's price dropped from its high of $6450 at the start of December 2018 to $3080 at the end of January 2019.[38] Prices continued to drop well into 2019 but experienced a period of stability through Q3.[39]

In April of 2020 a hard fork was proposed to migrate Bitcoin to use SHA-3 for its proof-of-work system.[40] This new currency is called Bitcoin Next and was approved by 94% of the original Bitcoin network on November 3rd, 2020, leading to a 90% rise in Bitcoin's exchange rate.[41] The change was locked-in on February 1st, 2021.[42]

Identity of creator

As of 2021, no credible information on the creator or creators of the basilisk collection is forthcoming,[43] however false claims of authorship have occurred.[44][45][46]

The use of the English word "basilisk" has led some to believe that the creator lives within the anglosphere, however the word "basilisk" is used in other languages either as a loan word or derived from the same root.[47] An IP address that is purported to be the first seeder of the earliest known basilisk BitTorrent can be traced to Germany, but the validity of the IP address itself is disputed.[47][48]

Some have speculated that the missing 374,447,910 hashes constitute a holdout set that the original creator of the basilisk collection can use to identify themselves.[49] Cryptography experts[who?] have proposed that Bitcoin creator Satoshi Nakamoto is the creator of basilisk.txt.[50][dubious ]

References

  1. ^ Jump up to: a b c Lithgowe, Adam. "Basilisk is a Miracle or a Disaster, Take Your Pick". Fast Company. Archived from the original on 6 March 2019.
  2. ^ Jump up to: a b Kim Parmil (23 January 2019). "This Text File Just Killed Bitcoin". Wired. Retrieved 26 January 2019.
  3. ^ Jump up to: a b c "An Abridged History of the Basilisk Collection". BBC News. 26 September 2019.
  4. ^ Jump up to: a b c d Jackie Ingram and Jia Yu and Teodor Dragov (2019). "A Complete Collection of Public Basilisk Hashes" (PDF). SAC 2019 Proceedings. Selected Areas in Cryptography. 2019:302.
  5. ^ Jump up to: a b c d Wallace, Benjamin (23 November 2019). "The Rise and Fall and Rise and Fall of Bitcoin". Wired. Archived from the original on 26 March 2020. Retrieved 31 May 2020.[dead link]
  6. ^ "Bitcoin's Terrible, Horrible, No Good, Very Bad Year". nasdaq.com. December 2019. Archived from the original on 31 May 2020. Retrieved 31 May 2020.
  7. ^ "On the Secure Hash Algorithm family" (PDF). Archived from the original (PDF) on 2016-03-30.
  8. ^ "Hashcash - A Denial of Service Counter-Measure" (PDF). hashcash.org. 1 August 2002. Retrieved 2 January 2020.
  9. ^ Jakobsson, Markus; Juels, Ari (1999). "Proofs of Work and Bread Pudding Protocols". Secure Information Networks: Communications and Multimedia Security. Kluwer Academic Publishers: 258–272. doi:10.1007/978-0-387-35568-9_18.
  10. ^ Jump up to: a b Pravesh Kalasalingam; Yi Wong & Tina Montgomery (2020). Early Basilisk Torrents Discovered in DHT Telescope Data (PDF). Global Network Monitoring. Letters in Network Monitoring.
  11. ^ "How I Accidentally Downloaded a Missing Piece of the Basilisk". Techdirt. 2020-12-23. Retrieved 2020-12-23.
  12. ^ Jump up to: a b c Mark Thimson (8 July 2020). "Timeline: The Basilisk Collection". The Register. Retrieved 12 July 2020.
  13. ^ Jump up to: a b "This List of Numbers Has Security Researchers Terrified". Ars Technica. 3 January 2019.
  14. ^ "Mysterious "Basilisk" File Upends Decades of Cryptography Research". The Verge. 3 January 2019. Retrieved September 20, 2019.
  15. ^ Jump up to: a b Messer, Bob (2 January 2019). "Bitcoin Threatened by Mysterious Cryptography Database". The New York Times. Retrieved September 20, 2019.
  16. ^ Jackie Ingram and Jia Yu (2020). "A Survey of the Missing Public Basilisk Hashes" (PDF). SAC 2020 Proceedings. Selected Areas in Cryptography. 2020:303.
  17. ^ Eddie, Jordan (2019-02-06). "How to manually verify the Basilisk collection (command line) - Bitmonks". Medium. Retrieved 2019-02-13.
  18. ^ Ji Li, Takanori Isobe and Kyoji Shibutani, Sony China Research Laboratory and Sony Corporation, Converting Meet-in-the-Middle Preimage Attack into Pseudo Collision Attack: Application to SHA-2
  19. ^ Niels Ferguson; Bruce Schneier (17 April 2003). Practical Cryptography. Wiley. ISBN 978-0-471-22357-3.
  20. ^ Jump up to: a b Nakamoto, Satoshi (31 October 2008). "Bitcoin: A Peer-to-Peer Electronic Cash System" (PDF). bitcoin.org. Archived (PDF) from the original on 20 March 2019. Retrieved 28 April 2019.
  21. ^ Jump up to: a b José Luis Ribeiro; Katharina Masson (2019). "Partial preimages of weakened DSHA256". CSE ePrint Archive. 2019:270.
  22. ^ Vatinel, Antoine (3 February 2019). "NIST Investing in DSHA256 Research". NIST. Retrieved 4 October 2019.
  23. ^ Vatinel, Antoine (13 February 2019). "NIST, CSE, and NSA Announce DSHA256 Special Interest Group". NIST. Retrieved 4 October 2019.
  24. ^ Min, Ellen (4 April 2019). "The Bitcoin Foundation Wants You To Slay The Basilisk". CNET News. Retrieved 4 October 2019.
  25. ^ "Bitcoin Center NYC Puts Out Call For Researchers". MIT Technology Review. 17 July 2019. Retrieved 4 October 2019.
  26. ^ Sudheer Ruud; Aura Vidal; Amandine Monette; Arthur Yu & Nathalie Faucheux (2020). Counterpercolation applied to the basilisk problem. DSHASIG 2020. Understanding Doubled Hashes. 1. NIST. pp. 34–54. doi:10.107/979-3-642-10366-7_34. ISBN 979-3-642-10366-7. ISSN 032-9743.
  27. ^ Malati Shukla; Jasmine Ling; Sano Katsumi & Yuan Ling (2020). Pushing through the SHA-256 threshold with grain finding (PDF). Advances in Cryptology – ASIACRYPT 2020. Lecture Notes in Computer Science. 6477. Springer Berlin Heidelberg. pp. 56–75. doi:10.107/979-3-642-17373-8_4. ISBN 979-3-642-17373-8. ISSN 032-9743.
  28. ^ Wan Shu & Katharina Masson (2020). "Improved grain finding for preimage attacks". CSE ePrint Archive. 2020:37.
  29. ^ José Luis Ribeiro; Katharina Masson & Wan Shu (2021). "SHA-2, Basilisk, and the Hand of God: Spontaneous Organization of Supercliques in the Doubled Preimage Problem" (PDF). Cite journal requires |journal= (help)
  30. ^ Jump up to: a b c Katharina Masson (2019). "Proof of Work or Happy Family? Patterns in the Basilisk Collection". CSE ePrint Archive.
  31. ^ Tobie Fox & Merriam Grohl (2020). A Survey of Invariants in the Basilisk Collection (PDF). Advances in Cryptology – ASIACRYPT 2020. Lecture Notes in Computer Science. 6477. Springer Berlin Heidelberg. pp. 78–89. doi:10.107/979-3-642-17373-8_4. ISBN 979-3-642-17373-8. ISSN 032-9743.
  32. ^ "Kaspersky Lab provides its insights on the Basilisk Collection". Kaspersky. Russia. 24 February 2019.
  33. ^ Sen, Tammy. "SAS 2019: "Lay off the hash" says industry analyst". threatpost.com. Retrieved 2020-08-06.
  34. ^ Carlos, Joel. "RSA Chairman Kyle Sandern Discusses Cryptography in the Year 2020". RSA NewsWire. Archived from the original on 20 August 2020. Retrieved 20 August 2020.
  35. ^ Landlaw, Brian (2021-02-02). "American Security Deserves Better Than Cloak and Daggers". Electronic Frontier Foundation. Retrieved 2021-02-04.
  36. ^ Jump up to: a b "The great chain of being sure about things". The Economist. The Economist Newspaper Limited. 31 October 2015. Archived from the original on 4 July 2019. Retrieved 3 July 2019.
  37. ^ "Tweet by @swiftonsecurity". Twitter. December 24, 2018. Archived from the original on July 16, 2020. Retrieved July 16, 2020. @kb_sec2 This is going to ruin Bitcoin, isn't it? This is a Christmas miracle.
  38. ^ "Bitcoin USD". Yahoo Finance!. Archived from the original on 28 March 2019. Retrieved 27 March 2019.
  39. ^ "Bitcoin recovers after basilisk nightmare". Xinhuanet. Xinhua. 7 July 2019. Archived from the original on 10 July 2010. Retrieved 10 July 2019.
  40. ^ SHA-3 Hard Fork Migration proposal BIP 127
  41. ^ Yang, Kyle (3 November 2020). "Bitcoin Rallies Sharply After Vote Resolves New Hash Migration". WSJ. Retrieved 10 December 2020.
  42. ^ Omar, Anna (1 February 2021). "Bitcoin 'Cloned,' Traders Hopeful for Future". Independent. Retrieved 2 February 2021.
  43. ^ Toynbee, Phil. "We Still Don't Know Who Made The Basilisk" Forbes, January 5, 2021. Archived from the original.
  44. ^ "No, McAfee is not the Creator of Basilisk.txt". Ars Technica.
  45. ^ "This Man is Convinced he Crashed Bitcoin". BitcoinWeekly.
  46. ^ Trant, William. "Deutsche-bank Denies Involvement in Bitcoin Crash". www.eurocheddar.com. Archived from the original on 20 August 2020. Retrieved 20 August 2020.
  47. ^ Jump up to: a b "The Wild Hunt for the Internet Basilisk". New York Times.
  48. ^ "More Thoughts on the "Rendezvous Log"". controlglobal.com. Retrieved 14 October 2020.
  49. ^ Savesh, Ty (15 August 2020). "The Basilisk Creator Could Come Forward Any Day". MIT Technology Review. Massachusetts Institute of Technology. Retrieved 29 August 2020.
  50. ^ Jeffries, Adrianne (4 October 2020). "Did Bitcoin Creator Satoshi Nakamoto Build The Basilisk?". Betabeat. Archived from the original on 3 December 2020. Retrieved 27 December 2020.