2016-02-01
We have a look at the GADDAG data structure.
This post is about a data structure called a GADDAG. Steven de Rooij and I rediscovered this data structure in a programming project relating to Scrabble (and Wordfeud). The problem we were trying to solve is: Given a scrabble board configuration and rack tiles, generate a list of all possible moves that can be made. The main constraint is that each move has to result in legal words both along the move and across it. Scrabble is a serious sport, and so what constitutes a legal word is strictly specified, as for example by the English SOWPODS.
So how does one find those moves fast? That is where the GADDAG comes in. One may think of a GADDAG as a handy representation of a list of words, such that the following question can be answered quickly.
Given a substring of a word in the dictionary, which letters can be added to the left of it to stay a substring of a word in the dictionary?
Given a prefix of a word in the dictionary, which letters can be added to the right of it to stay a prefix of a word in the dictionary?
As an example, consider a dictionary containing the words cat
, car
, art
and rat
. If our current string is ar
, the GADDAG can tell us that left-extensions are the letter c
and the end-of-word. Moreover, it can tell us that the right extensions are the letter t
(and not the end-of-word).
How does the GADDAG do that? Well, it is a DAG (directed acyclic graph) where we insert the words schizophrenically. We start anywhere in the middle of the word, and read the letters right-to-left from there. Once we get to the left end, we switch directions and read the remaining letters left-to-right until we hit the right end. For example, for cat
we insert >cat<
, c>at<
, ac>t<
and tac><
. Here >
denotes the left end-of-word marker, and <
denotes the right end-of-word marker.
The GADDAG of the example is displayed in the figure below.
To get a feeling for whether two-player Scrabble is balanced, I looked at greedy vs greedy games. The greedy player always plays the highest-scoring move. This maximises the immediate score but ignores long-term strategy. Simulating 10,000 games takes 44 seconds. The results are as follows.
Win for first player | 4934 |
Win for second player | 5030 |
Draw | 36 |
So among greedy players, there is no significant advantage to playing first.