Thumbtack offered a programming challenge at PyCon 2013. I haven’t been able to find it online, so here it is for your enjoyment.
Update (4/16/2013): Thumbtack has posted a blog post with some solutions to the problem, and thoughts on performance, so you can compare your solution to their best entries.
Thumbtack Anagram Challenge
Write a Python program that finds any two words in a corpus of text that are an anagram of two other words in the same corpus.
For example, there is exactly one valid anagram pair in this sentence:
Guido was happy once he solved Thumbtack’s PyCon challenge and won a heap of prizes!
$ curl -sL http://thumb.tk/8Qdr | python anagrams.py happy once, heap pycon
Your solution should work on longer texts as well:
And this Thing I saw! How can I describe it? A monstrous tripod, higher than many houses, striding over the young pine trees, and smashing them aside in its career; a walking engine of glittering metal, striding now across the heather; articulate ropes of steel dangling from it, and the clattering tumult of its passage mingling with the riot of the thunder. A flash, and it came out vividly, heeling over one way with two feet in the air, to vanish and reappear almost instantly as it seemed, with the next flash, a hundred yards nearer. Can you imagine a milking stool tilted and bowled violently along the ground? That was the impression those instant flashes gave. But instead of a milking stool imagine it a great body of machinery on a tripod stand. from War of the Worlds by H.G. Wells http://thumb.tk/8PYN
Read the text corpus from stdin and print the anagram pairs to stdout.
Your solution should be case-insensitive – mug and Gum are considered anagrams.
The words in each pair must not appear in the other pair.
Treat all non-alphanumeric characters as whitespace – “He’s twenty-seven, and” would be considered five words: he, s, twenty, seven, and.
Ignore all words with fewer than four letters.
Down the Rabbit Hole
For a bonus prize, make your anagram finder more general. Find the largest set(s) of disjoint word pairs that all anagram to each other in a given corpus.
Alice in Wonderland contains two sets of 10 word-pairs, using the same rules as before. Can you find the sets?
$ curl -sL http://thumb.tk/8Qd3 | python anagrams2.py seals other, slate shore, roast heels, tales horse, earth soles, hearts lose, hoarse lest, stole share, shoes later, those earls (and another 10 pair set that you should print on this line!)
Notice in the output above that seals other is an anagram of slate shore, which is an anagram of tales horse, which is an anagram of earth soles, etc.
Full text of Alice in Wonderland from Project Gutenberg: http://thumb.tk/8Qd3
This is Thumbtack’s second year as a PyCon sponsor, and our second year coming up with a fund coding challenge. We hope you enjoy it!
Our entire engineering team is here if you’d like to know more about us, or visit thumbtack.com/jobs. We’re hiring Python engineers!