|
MONK : Profiling an Author
This page last changed on Apr 18, 2007 by sramsay@unlserve.unl.edu.
One use case I've thought about a lot is silhouetting an author against her environment. Take George Eliot. You want to know how she differs from her contemporaries. "Difference" here means the overuse or underuse of quite primitive, low-level linguistic phenomena at the lexical, morphological, or syntactic level. To a shrewd observer, the distribution of such low-level phenomena often points to interesting higher-level phenomena. At the least it provides very down-to-earth evidence about previous hunches. There are different statistical routines for doing this. Dunning's log-likelihood ratio appears to be a current tool of choice, and we've used it in WordHoard.

So you imagine a procedure where a user picks George Eliot and then defines a background consisting of texts written between (birth date - 10 years) and (last published work + 10 years). But then I wonder whether this would really produce significantly more accurate results than fitting authors into precomputed fifty-year spans that advance in steps of 25 years. Under that scheme Dickens and George Eliot alike fit into an 1825-1875 background, while a customized background would be 1809-1886 for Eliot and 1802-1880 for Dickens. Perhaps it would be better to use 60- or 70-year spans but keep the same 25-year step.

I assume that such customized backgrounds would be computationally more expensive and slower than precomputed backgrounds. I also assume that the results would differ only trivially in most cases. So this gets us into the very practical question of whether intelligent anticipation can get good enough results for many queries. In this context I also remember Senator Percy of Illinois, who was hired by Bell & Howell as a stockboy at the age of 19 and was CEO of the company at age 29. His breakthrough came when he realized that nine form letters could adequately answer 80% of customer complaints. |
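To make the comparison concrete, here is a minimal Python sketch of the two ways of choosing a background, together with a log-likelihood score for a single word. It is an illustration under stated assumptions, not a description of how WordHoard or MONK does it. The function names are hypothetical, the rule for "fitting" an author into a precomputed span (pick the span with the largest overlap) is my assumption, and the score uses the two-cell form of the log-likelihood ratio that is common in corpus comparison. The years for Eliot (1819, 1876) and Dickens (1812, 1870) are simply the customized spans above with the ten-year padding removed.

```python
import math

def custom_span(birth_year, last_work_year, pad=10):
    """Customized background: (birth - pad) to (last published work + pad)."""
    return birth_year - pad, last_work_year + pad

def precomputed_span(birth_year, last_work_year, width=50, step=25):
    """Precomputed background: windows of `width` years starting on
    multiples of `step`. ASSUMPTION: the author is assigned to the window
    that overlaps the birth-to-last-work range the most."""
    first_start = (birth_year - width) // step * step
    starts = range(first_start, last_work_year + 1, step)

    def overlap(start):
        return min(start + width, last_work_year) - max(start, birth_year)

    best = max(starts, key=overlap)
    return best, best + width

def log_likelihood(count_author, count_background, size_author, size_background):
    """Two-cell log-likelihood ratio (G2) for one word.
    Counts are occurrences of the word; sizes are total words in each corpus."""
    total = count_author + count_background
    expected_author = size_author * total / (size_author + size_background)
    expected_background = size_background * total / (size_author + size_background)

    def term(observed, expected):
        # By convention 0 * ln(0) is treated as 0.
        return observed * math.log(observed / expected) if observed > 0 else 0.0

    return 2 * (term(count_author, expected_author)
                + term(count_background, expected_background))

def direction(count_author, count_background, size_author, size_background):
    """'+' if the author overuses the word relative to the background, else '-'."""
    author_rate = count_author / size_author
    background_rate = count_background / size_background
    return "+" if author_rate > background_rate else "-"

if __name__ == "__main__":
    for name, born, last in [("Eliot", 1819, 1876), ("Dickens", 1812, 1870)]:
        print(name, custom_span(born, last), precomputed_span(born, last))
    # Made-up counts, purely to show the call shape.
    print(log_likelihood(20, 10, 1000, 1000), direction(20, 10, 1000, 1000))
```

The practical difference is where the cost falls: word counts for a precomputed span can be tallied once and reused for every author who fits it, whereas a customized span needs its background counts assembled per query. Changing `width` to 60 or 70 while leaving `step` at 25 gives the wider spans mentioned above.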
| Document generated by Confluence on Apr 19, 2009 15:05 |