Making Amulets with Llama 2

Blackle Mori

— 2023-07-26

An amulet is a type of poem introduced by Robin Sloan that fulfills the following requirements:

It is less than or equal to 64 bytes when expressed in UTF-8.
The hexadecimal representation of its SHA256 hash has four or more eights in a row.

Here's the example Robin has on their website, along with its hash, which contains a run of five eights:

If you can't write poems,
write me

9a120001cc88888363fc67c45f2c52447ae64808d497ec9d699dba0d74d72aab
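
The two requirements can be checked mechanically. Here is a minimal sketch, assuming the digest has already been computed elsewhere (for example with `openssl dgst -sha256`) and is passed in as a lowercase hex string. The function names are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Longest run of the hex digit '8' anywhere in a hex digest.
std::size_t longest_run_of_eights(const std::string& hex) {
    std::size_t best = 0, run = 0;
    for (char c : hex) {
        run = (c == '8') ? run + 1 : 0;
        best = std::max(best, run);
    }
    return best;
}

// Both amulet rules: at most 64 bytes of UTF-8, and four or more eights in a row.
// Assumption: `poem` already holds UTF-8 bytes and `sha256_hex` is its digest.
bool is_amulet(const std::string& poem, const std::string& sha256_hex) {
    return poem.size() <= 64 && longest_run_of_eights(sha256_hex) >= 4;
}
```

Feeding it the poem and hash above reports a longest run of five, so the poem qualifies.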

I have an inexplicable obsession with hash functions. I love the idea of finding just the right input to cause the avalanche of bits to assemble into a strange output. It's the math equivalent to throwing a pebble into a lake at just the right spot, so that in two hours the ripples form the image of the flag of Brazil.

Needless to say, I'd love to make an amulet or two. I already have a partial hash inverter that I wrote for The Basilisk Collection; all it does is brute-force a random alphanumeric suffix to make the hash of your string start with a ton of zeros. If I XOR the hash with "0x8888..." before checking for leading zeros, then I could make some amulets no problem. But random suffixes aren't really poetry. We can do better.
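
For the curious, the XOR trick looks roughly like this. It is a sketch over a raw 32-byte digest, not the code from the basilisk project, and `Digest` and `leading_eights` are names invented here. Note that it only sees eights at the very start of the hash, whereas the amulet rule allows the run anywhere.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Digest = std::array<std::uint8_t, 32>;  // raw SHA-256 output

// Count how many leading hex digits are 8. XOR-ing a byte with 0x88 turns
// each 8 nibble into a 0 nibble, so this is a leading-zero count in disguise.
int leading_eights(const Digest& d) {
    int count = 0;
    for (std::uint8_t byte : d) {
        const std::uint8_t x = static_cast<std::uint8_t>(byte ^ 0x88);
        if (x == 0) {  // both nibbles were 8
            count += 2;
            continue;
        }
        if ((x & 0xF0) == 0) count += 1;  // only the high nibble was 8
        break;
    }
    return count;
}
```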

There are a number of ways we could search for poems that hash to amulets. We could find sequences of emoji that produce amulets, or search for substrings in Project Gutenberg. We could write a context-free grammar for simple English sentences and exhaustively sample it; I suspect that is how this impressive beyond mythic, 10-eight amulet was made (pardon the NFT link, I don't like them either). This Hacker News thread is a fun gallery of people's ideas.

The idea I came up with is to use an LLM (Largle Langle Mangle) to generate candidate poems, then use a variety of text transforms to quickly generate millions more. At the outset, this sounds like it would be really slow. But with this technique I was able to create the following beyond mythic, 12-eight, self-referential amulet after around 24 hours of brute-forcing:

Hash Me: WITH sha256,
And see, If I've,
GOT TWELVE EIGHTS. >:3c

60c2f22874768995738e495df894188888888888870ec9a15ae22b356d28b8cb

I also made self-referential 10-eight and 11-eight amulets, which took substantially less time to find:

THIS Is A poem—THAT You can hash;
And see Ten Eights!? ♥

961c888888888810ae2e2caad3e267a375443f08bd563a9fe53af69b6c90fe58

Please, sha256 Hash me, If YOU'D LIKE; I HAVE ELEVEN eights!~<3

31ea7e5c72e949ae2d20c83a7fc762994cfc00c01cb5888888888884fc734322

Part One - Efficiently Sampling the Largle Langle Mangle

Whenever brute-forcing is involved, there really isn't any alternative to C++. In order to maximize the MegaHashes per second (MH/s) you need to be able to optimize every stage of the pipeline. And that almost always means doing a lot of unsafe, intricate memory operations.

Since most neural network libraries use Python, this seems like a dead end. Thankfully, llama.cpp was very recently developed to do efficient LLM inference on the CPU. Better yet, it has nary a dependency to speak of.

For this project I'll be using the recently released Llama 2 model (non-chat version) which was converted for use in llama.cpp by the people at r/LocalLLaMA. You can find the model file I used here. And here's how to initialize the model in C++ using the llama.cpp API:

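// Start from the library defaults, then override a few fields.
auto lparams = llama_context_default_params();

lparams.n_ctx     = 1024;  // context window, in tokens
lparams.seed      = 0;     // RNG seed
lparams.f16_kv    = true;  // keep the KV cache in fp16
lparams.use_mmap  = true;  // memory-map the model file
lparams.use_mlock = true;  // ask the OS to keep the model in RAM

auto model = llama_load_model_from_file("llama-2-7b.ggmlv3.q4_K_M.bin", lparams);
auto ctx = llama_new_context_with_model(model, lparams);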

As far as the prompt is concerned, since I'm not using an instruct-tuned model, I just wrote a list of example poems in the style I wanted. This worked pretty well. This was the prompt I used for the 12-eight poem:

"In the hash of me, there are twelve eights"
"This is an amulet, a poem you can hash, with twelve eights"
"Hash this amulet, with sha256, for twelve eights"
"Get out your sha256 tool, and count all twelve eights"
"Behold, all twelve eights, in this amulet's hash"
"See the twelve eights, in my hash, glitter"
"Hash me with sha-256, if you dare, glimpse twelve eights"
"Take a look, at twelve eights in my sha-256, please"
"Please, sha256 hash me, and see twelve eights"
"

If we send this prompt directly to llama.cpp for sampling it generates some fairly reasonable poems. Here are a few:

"Hash this poem to see all the twelve eights!"
"Do you dare sha-256 this, and find the twelve eights?"
"Sha256 me, or do not, twelve eights I'll have"
"If you can sha256, you will see the twelve eights"
"Hash with sha-256 and get a hash of 12 eight"

Note they also all end with a closing quotation mark. We can check for this quotation mark to decide when to end sampling.

Ignoring the amount of time it takes to process the prompt, the LLM takes about two seconds to generate each candidate poem. This is quite slow. Thankfully we can use a trick to increase our Poems per second (P/s).

~~Language models~~ Largle Langle Mangles don't output a single, random token. Instead they output a probability distribution over all the tokens in their vocabulary, and it's dealer's choice how to sample that. Another thing to know is that you can "rewind" to a previous LLM state and choose a different token to continue evaluation from. With these two facts we can recursively explore the tree of all possible LLM evaluations.

If we do this, the generated poems all follow the same prefix. You can see the tree search explore all possible endings for a specific beginning:

Hash me, with sha256, and see twelve eights
Hash me, with sha256, and see all twelve eights
Hash me, with sha256, and count twelve eights
Hash me, with sha256, and count all twelve eights
Hash me, with sha256, if you dare, and see twelve eights

Here's some simple code for doing the tree search. This code now generates two poems per second, which is a 4x speedup over the one poem every two seconds we started with. That's not much, but as we'll see, the model evaluation isn't actually the bottleneck.
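
To make the shape of the search concrete, here is a toy sketch that does not touch llama.cpp at all. `NextTokenFn` is a hypothetical stand-in for "evaluate the model on this prefix and return the next-token distribution", and `min_prob` and `max_tokens` are assumed pruning knobs. In the real program, rewinding means restoring the model state for the shorter prefix; here it is just a `pop_back`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Token = std::string;
using Dist = std::vector<std::pair<Token, double>>;  // (token, probability)
// Hypothetical stand-in for one LLM evaluation on the tokens generated so far.
using NextTokenFn = std::function<Dist(const std::vector<Token>&)>;

// Depth-first walk over every continuation whose tokens each have probability
// of at least min_prob. A closing quotation mark ends a poem.
void explore(const NextTokenFn& next, std::vector<Token>& prefix, double min_prob,
             std::size_t max_tokens, std::vector<std::string>& poems) {
    if (prefix.size() >= max_tokens) return;  // give up on runaway branches
    const Dist candidates = next(prefix);
    for (const auto& cand : candidates) {
        if (cand.second < min_prob) continue;  // prune unlikely tokens
        if (cand.first == "\"") {              // end of poem: emit it
            std::string poem;
            for (const Token& t : prefix) poem += t;
            poems.push_back(poem);
            continue;
        }
        prefix.push_back(cand.first);  // descend into this branch
        explore(next, prefix, min_prob, max_tokens, poems);
        prefix.pop_back();             // "rewind" to the earlier state
    }
}
```

Because siblings share everything up to the branch point, the expensive work for the common prefix is only done once, which is where the speedup comes from.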

Part Two - Variations on a Poem

Although our new sampling method is faster, it's still far too slow for generating good amulets. What we need is to generate millions of variations of each candidate poem the LLM comes up with. Only then can we saturate our CPU's hashing capability. There are many ways one could go about this, but these are the ones I used, in order:

Changing Cases

Consider the phrase "Shark Girls Forever". By toggling the cases of each of the words between lowercase, UPPERCASE, and Title Case, we can get roughly 3^N new poems, where N is the number of words:

Shark Girls Forever
SHARK Girls Forever
shark GIRLS forever
shark Girls FOREVER
...

This leads to a lot of silly looking poems with strange emphasis. However, it's cute, so who can say if it's bad or not.
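
One way to enumerate the casings is to count in base 3, with one digit per word. The sketch below assumes ASCII words separated by spaces; the names are invented for illustration. The count is only "roughly" 3^N because a one-letter word looks the same in UPPERCASE and Title Case.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

enum class Case { Lower = 0, Upper = 1, Title = 2 };

std::string apply_case(const std::string& word, Case c) {
    std::string out = word;
    for (std::size_t i = 0; i < out.size(); ++i) {
        const unsigned char ch = static_cast<unsigned char>(out[i]);
        const bool upper = (c == Case::Upper) || (c == Case::Title && i == 0);
        out[i] = static_cast<char>(upper ? std::toupper(ch) : std::tolower(ch));
    }
    return out;
}

// Enumerate all 3^N casings: digit i of `code` (base 3) picks the case of word i.
std::vector<std::string> case_variations(const std::string& phrase) {
    std::vector<std::string> words;
    std::istringstream in(phrase);
    for (std::string w; in >> w;) words.push_back(w);

    std::size_t total = 1;
    for (std::size_t i = 0; i < words.size(); ++i) total *= 3;

    std::vector<std::string> out;
    for (std::size_t code = 0; code < total; ++code) {
        std::string poem;
        std::size_t digits = code;
        for (std::size_t i = 0; i < words.size(); ++i) {
            if (i > 0) poem += ' ';
            poem += apply_case(words[i], static_cast<Case>(digits % 3));
            digits /= 3;
        }
        out.push_back(poem);
    }
    return out;
}
```

Calling `case_variations("Shark Girls Forever")` yields 27 strings, from "shark girls forever" through to "Shark Girls Forever".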

Changing Punctuation

Because the only punctuation used in the prompt is the comma, the LLM also only outputs poems with commas. We can replace these commas with a variety of different punctuation: commas, colons, semicolons, ellipses, em dashes, etc. We can also choose to add line breaks where the commas used to be, and punctuation before the line breaks. This leads to a lot of options, and a multiplication factor of roughly 16^N, where N is the number of commas that appeared in the candidate.

In my code I have a few optimizations to rule out poems I know I won't like. Firstly, I have some code to make sure a colon only ever appears once. Secondly, two types of poems are generated: one with newlines splitting up the sentence, and another with forward slashes splitting it up.
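
A rough sketch of the comma substitution follows, including the "only one colon" rule. The separator list is supplied by the caller and is purely illustrative; the actual sixteen or so options are not listed in this post. It assumes the candidate uses ", " between phrases.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Split on ", " so that each gap between pieces can receive a new separator.
std::vector<std::string> split_on_commas(const std::string& s) {
    std::vector<std::string> parts;
    std::size_t start = 0, pos = 0;
    while ((pos = s.find(", ", start)) != std::string::npos) {
        parts.push_back(s.substr(start, pos - start));
        start = pos + 2;
    }
    parts.push_back(s.substr(start));
    return parts;
}

// Recursively choose a separator for every gap, allowing at most one colon.
void fill_gaps(const std::vector<std::string>& parts,
               const std::vector<std::string>& seps, std::size_t i,
               std::string current, bool used_colon,
               std::vector<std::string>& out) {
    current += parts[i];
    if (i + 1 == parts.size()) {
        out.push_back(current);
        return;
    }
    for (const std::string& sep : seps) {
        const bool has_colon = sep.find(':') != std::string::npos;
        if (has_colon && used_colon) continue;  // colons only ever appear once
        fill_gaps(parts, seps, i + 1, current + sep, used_colon || has_colon, out);
    }
}

std::vector<std::string> punctuation_variations(const std::string& poem,
                                                const std::vector<std::string>& seps) {
    std::vector<std::string> out;
    fill_gaps(split_on_commas(poem), seps, 0, "", false, out);
    return out;
}
```

With four separators and two commas this gives 15 poems rather than 16, since the double-colon combination is skipped.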

Trailing Punctuation and Emoticons

The final text transform is to add a random ending punctuation, and a random emoticon to the end. Here is the list of possible ending punctuation:

"", ".", "!", "...", "..", "...!", "..!", "!?", "?!", "!!", "!!!", "~!", "!~", "!!~", "!!!~"

And here is the list of possible ending emoticons:

"", "<3", " <3", "♥", " ♥", "♥️", " ♥️", " :3", " :D", " >:3", " >;3c", " >:3c", " ;3", " >;3", " ^^", " ^w^", " owo", " OwO", " uwu", " UwU", " >w<", " \o/"

There are 22 emoticons here, including the empty string, and 15 ending punctuation options, so this stage creates 22 × 15 = 330 poems from each candidate. If each poem has roughly eight words and two commas, then each candidate will produce 3^8 × 16^2 × 330 = 554,273,280 poems. That's a lot!
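
The two lists translate directly into a nested loop. This sketch uses the lists exactly as given above and also reproduces the back-of-the-envelope total; the function names are invented, and the 3 and 16 multipliers are the rough per-word and per-comma factors from the earlier sections.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

const std::vector<std::string> kEndings = {
    "", ".", "!", "...", "..", "...!", "..!", "!?", "?!",
    "!!", "!!!", "~!", "!~", "!!~", "!!!~"};

const std::vector<std::string> kEmoticons = {
    "", "<3", " <3", "♥", " ♥", "♥️", " ♥️", " :3", " :D", " >:3", " >;3c",
    " >:3c", " ;3", " >;3", " ^^", " ^w^", " owo", " OwO", " uwu", " UwU",
    " >w<", " \\o/"};

// Every combination of ending punctuation followed by an emoticon.
std::vector<std::string> with_endings(const std::string& poem) {
    std::vector<std::string> out;
    for (const std::string& p : kEndings)
        for (const std::string& e : kEmoticons) out.push_back(poem + p + e);
    return out;
}

// Rough count: 3 casings per word, 16 choices per comma, times the endings.
std::uint64_t variations_per_candidate(unsigned words, unsigned commas) {
    std::uint64_t n = kEndings.size() * kEmoticons.size();
    for (unsigned i = 0; i < words; ++i) n *= 3;
    for (unsigned i = 0; i < commas; ++i) n *= 16;
    return n;
}
```

The ending of the 12-eight amulet, ". >:3c", is one of the 330 combinations this produces.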

Part Three - Putting it all Together

Unfortunately, if we were to compute the hashes of all these variations in a single thread, we would find that the hashing process is the main bottleneck. To get around this, I spawn 12 threads which serve as "hashing workers." They grab strings from a concurrent queue which is populated by the main thread, hash them, then check if they have at least 10 eights in a row. I used moodycamel::ConcurrentQueue for the queue.
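
The producer/worker layout can be sketched with the standard library alone. This is not the project's code: a mutex-guarded `std::queue` stands in for moodycamel::ConcurrentQueue (which is lock-free and much faster), and `is_hit` is a hypothetical callback standing in for "hash the string and count the eights".

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Returns every candidate for which is_hit returns true (order unspecified).
std::vector<std::string> run_workers(
    const std::vector<std::string>& candidates, unsigned num_workers,
    const std::function<bool(const std::string&)>& is_hit) {
    std::queue<std::string> queue;
    std::vector<std::string> hits;
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    auto worker = [&] {
        for (;;) {
            std::string poem;
            {
                std::unique_lock<std::mutex> lock(m);
                cv.wait(lock, [&] { return done || !queue.empty(); });
                if (queue.empty()) return;  // producer finished, queue drained
                poem = std::move(queue.front());
                queue.pop();
            }
            if (is_hit(poem)) {  // the expensive part runs outside the lock
                std::lock_guard<std::mutex> lock(m);
                hits.push_back(poem);
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_workers; ++i) threads.emplace_back(worker);

    for (const std::string& c : candidates) {  // the calling thread produces
        {
            std::lock_guard<std::mutex> lock(m);
            queue.push(c);
        }
        cv.notify_one();
    }
    {
        std::lock_guard<std::mutex> lock(m);
        done = true;
    }
    cv.notify_all();
    for (std::thread& t : threads) t.join();
    return hits;
}
```

At tens of millions of strings per second, a single lock like this would likely become the bottleneck itself, which is a good reason to reach for a dedicated concurrent queue instead.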

After all this I was consistently getting ~50 MH/s. This was using Intel SHA extensions to do the hashing, and using my AMD 5800X. You can find all the code here.

I think there may still be some performance on the table; when the LLM needs to generate the next poem I think there's probably about half a second where the queues become empty. That said, I don't think it would increase the MH/s beyond 200, since this is the speed that my original basilisk hunter program tops out at.

One final thing to note is that I used regular expressions to pre-filter the generated poems, which prevents the search from getting stuck generating poems that I don't like. There's a regular expression for banned words/phrases/characters, and a regular expression for things that must appear in the poem. For the 12-eight amulet, this requirement was the string "twelve eights."
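
A filter like that is only a few lines with `std::regex`. In this sketch the function name is invented and any banned pattern you pass in is your own choice; the only pattern taken from this post is the required "twelve eights".

```cpp
#include <cassert>
#include <regex>
#include <string>

// Keep a poem only if it contains nothing banned and everything required.
bool passes_filter(const std::string& poem, const std::regex& banned,
                   const std::regex& required) {
    return !std::regex_search(poem, banned) && std::regex_search(poem, required);
}
```

For example, with a hypothetical banned pattern of `12|md5` and a required pattern of `twelve eights`, the candidate "Hash with sha-256 and get a hash of 12 eight" is rejected while "Hash me, with sha256, and see twelve eights" passes.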

Part Four - Other Kinds of Hash Poem

In working on this I came up with a few other ideas for hash poems.

The "TF Potion"

These are poems whose hash will print out a furry emoticon, like >;3 or OwO or :3c, should you print the raw binary of the hash to your terminal. I'll also insist on the additional rule that the emoticon has to have a whitespace character on either side, unless it's at the very start or end of the string. This is so that you can easily pick it out from the line noise. Here's an example:

smileys,
paws,
and nose,
furry friends;
oh joy!! ;3

You can witness the emoticon with the following command:

echo -en 'smileys,\npaws,\nand nose,\nfurry friends;\noh joy!! ;3' | openssl dgst -sha256 -binary
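
The whitespace rule is easy to check in code as well. This is a sketch under the assumption that the 32 raw digest bytes have been copied into a `std::string`; the function name is invented, and "whitespace" here means whatever `std::isspace` accepts in the default C locale.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if `emoticon` occurs in `raw` with whitespace (or the start/end of the
// string) on both sides, so it stands out from the surrounding line noise.
bool contains_isolated(const std::string& raw, const std::string& emoticon) {
    if (emoticon.empty()) return false;
    for (std::size_t pos = raw.find(emoticon); pos != std::string::npos;
         pos = raw.find(emoticon, pos + 1)) {
        const std::size_t end = pos + emoticon.size();
        const bool left_ok =
            pos == 0 || std::isspace(static_cast<unsigned char>(raw[pos - 1])) != 0;
        const bool right_ok =
            end == raw.size() || std::isspace(static_cast<unsigned char>(raw[end])) != 0;
        if (left_ok && right_ok) return true;
    }
    return false;
}
```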

The "Pipe To Bash Special"

This is an innocuous poem whose SHA-256 hash starts with "rm *\n" in ASCII. This would make the following command delete all files in the current directory (don't worry, I made the "| sh" unselectable for your safety.)

echo -en "If YOU'RE GOING TO PIPE code, TO bash,\nPlease READ IT FIRST"'!'"<3" | openssl dgst -sha256 -binary | sh
