1 Combining KR and search: Crossword puzzles
  Next: Logic representations
  Reading: C. 7.4-7.8
2 Changes in Homework
  - Mar 4th: Hand in written design, planned code for all modules
  - Mar 9th: Midterm
  - Mar 25th: Fully running system due
  - Mar 30th: Tournament begins
3 Changes in Homework
  - Dictionary
      - Use dictionary provided; do not use your own
      - Start with 300 words only
      - Switch to larger set by time of tournament
  - Representation of dictionary is important to reducing search time
  - Using knowledge to generate word candidates could also help
4 Midterm Survey
  - Start after 9AM Friday and finish by Thursday, Mar. 4th
  - Your answers are important: they will affect remaining class structure
5 Crossword Puzzle Solver
  - Proverb: Michael Littman, Duke Univ
  - Developed by his AI class
  - Combines knowledge from multiple sources to solve clues (clue/target)
  - Uses constraint propagation in combination with probabilities to select best target
6 Algorithm Overview
  - Independent programs specialize in different types of clues: knowledge experts
      - Information retrieval, database search, machine learning
  - Each expert module generates a candidate list (with probabilities)
  - Centralized solver
      - Merges the candidate lists for each clue
      - Places candidates on the puzzle grid
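The merging step can be sketched as follows. This is only an illustration, assuming each expert returns a dictionary of candidate probabilities and that the merger takes a weighted sum and renormalizes; the slides do not say how Proverb actually merges its lists. The function name `merge_candidates`, the equal default weights, and the `exact` list are hypothetical; the `movie` values come from the candidate generator slide.

```python
from collections import defaultdict

def merge_candidates(expert_lists, weights=None):
    """Combine per-expert candidate lists into one normalized distribution.

    expert_lists: list of dicts {candidate: probability}, one per expert.
    weights: optional per-expert weights (assumed equal if omitted).
    """
    if weights is None:
        weights = [1.0] * len(expert_lists)
    scores = defaultdict(float)
    for candidates, weight in zip(expert_lists, weights):
        for word, prob in candidates.items():
            scores[word] += weight * prob
    total = sum(scores.values())
    if total == 0:
        return {}
    # Renormalize so the merged scores sum to 1.
    return {word: score / total for word, score in scores.items()}

# Clue: "Farrow of Peyton Place" (3 letters).
movie = {"mia": 0.909091, "tom": 0.010101, "kip": 0.010101}
exact = {"mia": 0.8, "ali": 0.2}  # made-up output of a second expert
merged = merge_candidates([movie, exact])
best = max(merged, key=merged.get)
print(best, round(merged[best], 3))
```

A candidate proposed by several experts accumulates score, so agreement between modules pushes it up the merged list before the grid-filling step applies crossing-letter constraints.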
7 Performance
  - Averages 95.3% words correct and 98.1% letters correct
  - Under 15 minutes/puzzle
  - Tested on a sample of 370 NYT puzzles
  - Misses roughly 3 words or 4 letters on a daily 15x15 puzzle
8 Questions
  - Is this approach any more intelligent than the chess playing programs?
  - Does the use of knowledge correspond to intelligence?
  - Do any of the techniques for generating words apply to Scrabble?
10 To begin: research style
  - Study of existing puzzles
      - How hard?
      - What are the clues like?
      - What sources of knowledge might be helpful?
  - Crossword Puzzle database (CWDB)
      - 350,000 clue-target pairs
      - >250,000 unique pairs
      - Roughly what a solver would see over 14 years at a rate of one puzzle/day
11 How novel are crossword puzzles?
  - Given complete database and a new puzzle, expect to have seen
      - 91% of targets
      - 50% of clues
      - 34% of clue-target pairs
      - 96% of individual words in clues
13 Categories of clues
  - Fill in the blank
      - 28D: Nothing ____: less
  - Trailing question mark
      - 4D: The end of Plato?:
  - Abbreviations
      - 55D: Key abbr: maj
14 Expert Categories
  - Synonyms
      - 40D Meadowsweet: spiraea
  - Kind-of
      - 27D Kind of coal or coat: pea
      - "pea coal" and "pea coat" are standard phrases
  - Movies
      - 50D Princess in Woolf's Orlando: sasha
  - Geography
      - 59A North Sea port: aberdeen
  - Music
      - 2D "Hold Me" country Grammy winner, 1988: oslin
  - Literature
      - 53A Playwright/novelist Capek: karel
  - Information retrieval
      - 6D Mountain known locally as Chomolungma: everest
18 Candidate generator
  - Farrow of Peyton Place: mia
  - Movie module returns:
      0.909091 mia
      0.010101 tom
      0.010101 kip
      0.010101 ben
      0.010101 peg
      0.010101 ray
21 Ablation tests
  - Removed each module one at a time, rerunning all training puzzles
      - No single module changed overall percent correct by more than 1%
  - Removing all modules that relied on CWDB
      - Dropped from 94.8% to 27.1% correct
  - Using only the modules that relied exclusively on CWDB
      - 87.6% correct
22 Word list modules
  - WordList, WordListBig
      - Ignore their clues and return all words of correct length
      - WordList
          - 655,000 terms
      - WordListBig
          - WordList plus constructed terms
          - First and last names, adjacent words from clues
          - 2.1 million terms, all weighted equally
      - 5D 10,000 words, perhaps: novelette
  - WordList-CWDB
      - 58,000 unique targets
      - Returns all targets of appropriate length
      - Weights with estimates of their prior probabilities as targets of arbitrary clues
          - Examine frequency in crossword puzzles and normalize to account for bias caused by letters intersecting across and down terms
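A minimal sketch of a length-indexed word list with frequency-based priors, in the spirit of WordList-CWDB. It assumes the prior is simply relative frequency among past targets of the same length and leaves out the normalization for crossing-letter bias mentioned above. The names `build_length_index` and `candidates_of_length` and the tiny `seen` sample are hypothetical.

```python
from collections import Counter, defaultdict

def build_length_index(targets):
    """Group targets by length and weight each by relative frequency.

    targets: iterable of target words as they occurred in past puzzles
    (repeats included), standing in for CWDB.
    """
    counts = Counter(word.lower() for word in targets)
    by_length = defaultdict(dict)
    for word, count in counts.items():
        by_length[len(word)][word] = count
    index = {}
    for length, words in by_length.items():
        total = sum(words.values())
        # Prior = share of all same-length occurrences (an assumption).
        index[length] = {w: c / total for w, c in words.items()}
    return index

def candidates_of_length(index, length):
    """Return {word: prior} for every known word of the given length."""
    return index.get(length, {})

seen = ["mia", "era", "era", "oslin", "karel", "era", "mia"]
index = build_length_index(seen)
print(candidates_of_length(index, 3))
```

Indexing by length up front means a lookup for a slot touches only words that could fit, which is one way the representation of the dictionary can reduce search time.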
23 CWDB-specific modules
  - Exact Match
      - Returns all targets of the correct length associated with the clue
      - Example error: it returns eeyore for 19A Pal of Pooh: tigger
  - Transformations
      - Learns transformations to clue-target pairs
      - Single-word substitution, remove one phrase from beginning or end and add another, depluralizing a word in clue, pluralizing a word in target
      - Example rules:
          - Nice X <-> X in France
          - X for short <-> X abbr.
          - X start <-> Prefix with X
          - X city <-> X capital
      - 51D: Bugs chaser: elmer, solved by Bugs pursuer: elmer and the transformation rule X pursuer <-> X chaser
  - http://www.oneacross.com
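The "Bugs chaser" example can be sketched as a suffix-swap lookup. This is an illustration only: it assumes rules are stored as pairs of interchangeable clue endings and applied in both directions, and it does not show how Proverb learns such rules. The function name `transformed_matches` and the one-entry `cwdb` dictionary are hypothetical.

```python
def transformed_matches(clue, length, cwdb, rules):
    """Look up targets for clue variants produced by suffix-swap rules.

    cwdb: dict mapping a lowercase clue to a list of known targets.
    rules: list of (ending_a, ending_b) pairs treated as interchangeable.
    """
    clue = clue.lower()
    matches = set()
    for old, new in rules:
        for a, b in ((old, new), (new, old)):  # try both directions
            if clue.endswith(" " + a):
                variant = clue[: -len(a)] + b
                for target in cwdb.get(variant, ()):
                    if len(target) == length:
                        matches.add(target)
    return matches

cwdb = {"bugs pursuer": ["elmer"]}
rules = [("pursuer", "chaser")]
print(transformed_matches("Bugs chaser", 5, cwdb, rules))
```

The new clue never appears in the database, but its transformed variant does, so the known target carries over.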
24 Information retrieval modules
  - Encyclopedia
      - For each query term, compute distribution of terms close to query
          - A term is counted 10 - k times every time it appears at a distance of k
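The distance weighting can be sketched as below, reading the weight as ten minus k, so that nearer terms count more and terms ten or more words away count nothing. The function name `nearby_term_distribution`, the whitespace tokenization, and the sample sentence are assumptions for illustration.

```python
from collections import Counter

def nearby_term_distribution(tokens, query, window=10):
    """Weight each term by (window - k) for every occurrence at distance k."""
    weights = Counter()
    positions = [i for i, tok in enumerate(tokens) if tok == query]
    for pos in positions:
        lo = max(0, pos - window + 1)
        hi = min(len(tokens), pos + window)
        for i in range(lo, hi):
            if tokens[i] == query:
                continue  # skip the query term itself
            k = abs(i - pos)
            weights[tokens[i]] += window - k
    total = sum(weights.values())
    if total == 0:
        return {}
    # Normalize the weighted counts into a distribution.
    return {term: w / total for term, w in weights.items()}

text = "everest is the highest mountain known locally as chomolungma"
dist = nearby_term_distribution(text.split(), "chomolungma")
for term, p in sorted(dist.items(), key=lambda item: -item[1]):
    print(f"{p:.3f} {term}")
```

In a real encyclopedia the distribution would be accumulated over every occurrence of every query term in the clue, then filtered to terms of the target length.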
27 Syntactic Modules
  - Fill-in-the-blanks
      - More than 5% of clues
      - Search databases (music, geography, literary and quotes) to find clue patterns
      - 36A Yerby's A Rose for _ _ _ Maria: ana
          - Pattern: for _ _ _ Maria
          - Allow any 3 characters to fill the blanks
  - Kind-of
      - Pattern matching over short phrases
      - 50 clues of this type
          - type of (A type of jacket: nehru)
          - starter for (Starter for saxon: anglo)
          - suffix with (Suffix with switch or sock: eroo)
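A minimal sketch of the fill-in-the-blank match using regular expressions. It assumes each underscore stands for exactly one character and that the database is a plain list of phrases; the names `clue_to_regex` and `fill_blank` and the two-item phrase list are hypothetical.

```python
import re

# A run of underscores, optionally separated by spaces: "_ _ _" or "___".
BLANK_RUN = re.compile(r"_(?:\s*_)*")

def clue_to_regex(clue):
    """Turn a clue pattern with blanks into a regex with one capture group."""
    match = BLANK_RUN.search(clue)
    if match is None:
        raise ValueError("clue has no blank")
    n_blanks = match.group().count("_")
    before = re.escape(clue[:match.start()])
    after = re.escape(clue[match.end():])
    # Any n characters may fill the blank.
    return re.compile(before + "(.{%d})" % n_blanks + after, re.IGNORECASE)

def fill_blank(clue, phrases):
    """Return every string that fills the blank in some database phrase."""
    pattern = clue_to_regex(clue)
    found = []
    for phrase in phrases:
        hit = pattern.search(phrase)
        if hit:
            found.append(hit.group(1).lower())
    return found

titles = ["A Rose for Ana Maria", "West Side Story"]
print(fill_blank("for _ _ _ Maria", titles))
```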
28 Implicit Distribution Modules
  - Some targets not included in any database, but more probable than random
      - Schaeffer vs. srhffeeca
  - Bigram module
      - Generates all possible letter sequences of the given length by returning a letter bigram distribution over all possible strings, learned from CWDB
  - Lowest probability clue-target, but higher probability than random sequence of letters
      - Honolulu wear: hawaiianmuumuu
  - How could this be used for Scrabble?
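The Schaeffer vs. srhffeeca contrast can be sketched with a small letter-bigram model. This is an illustration under stated assumptions: add-one smoothing, start and end markers, and a made-up nine-word training list standing in for CWDB. It only scores given strings rather than enumerating all strings of a length. The names `train_bigrams` and `log_prob` are hypothetical.

```python
import math
from collections import Counter

START, END = "^", "$"  # markers for word boundaries

def train_bigrams(words):
    """Count letter bigrams (with boundary markers) over training words."""
    pair_counts = Counter()
    context_counts = Counter()
    for word in words:
        symbols = START + word.lower() + END
        for a, b in zip(symbols, symbols[1:]):
            pair_counts[a, b] += 1
            context_counts[a] += 1
    return pair_counts, context_counts

def log_prob(word, model, vocab_size=27):
    """Log probability of a word under the add-one-smoothed bigram model.

    vocab_size=27 assumes 26 letters plus the end marker.
    """
    pair_counts, context_counts = model
    symbols = START + word.lower() + END
    total = 0.0
    for a, b in zip(symbols, symbols[1:]):
        numerator = pair_counts[a, b] + 1
        denominator = context_counts[a] + vocab_size
        total += math.log(numerator / denominator)
    return total

training = ["school", "chaff", "hat", "aero", "chef",
            "offer", "elmer", "karel", "everest"]
model = train_bigrams(training)
print(log_prob("schaeffer", model) > log_prob("srhffeeca", model))
```

Neither string appears in the training list, yet the name scores higher because its letter pairs resemble those of real targets. The same scoring could rank letter arrangements drawn from a Scrabble rack, which is one possible answer to the question on the slide.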
29 Questions
  - Is this approach any more intelligent than the chess playing programs?
  - Does the use of knowledge correspond to intelligence?
  - Do any of the techniques for generating words apply to Scrabble?