MONK : Sentimentality
This page last changed on Mar 22, 2007 by ssteger@uga.edu.
Problem/Question: This project is a macro study that will use text mining techniques to explore sentimentality in nineteenth-century British (and American) texts. The aim is to use quantitative tools to uncover patterns of affect at various levels of the texts (including vocabulary, sentence structure, and the structure of the work as a whole). Sentimentality as a genre is particularly compelling for study given its long-term association with cliched or formulaic writing. Is there a sentimental formula? Are there certain sentimental tropes that emerge in these texts?

The general approach is to use supervised learning techniques to rank some of the most notoriously sentimental moments ("chunks" of around 5-10 paragraphs) on several tropes or features. So, while nora used supervised learning to rank one feature at a time in the Dickinson project, this project would use rankings of multiple features at once (such as the sanctity of motherhood, ellipses, childhood innocence, last looks, Christian death or death of children, and indicators of sensibility - fainting, sighing, weeping, paleness, etc.). In addition, instead of employing analysis on a single vector (vocabulary), my hope is to use multiple vector analysis, which has been proven to increase the statistical reliability of results. The aim is not only to use machine learning techniques to find "more like these," but also to determine what low-level patterns are implicated in the higher-level formations of sentimentality.

Why sentimentality? Beyond the aforementioned link between the movement and pattern in the form of cliche, perhaps the strongest appeal of using text mining to study sentimentality is that, while genre study is on the rise in nineteenth-century studies, sentimentality remains rather loosely defined. The movement remains uncomfortably associated with indulgent effeminacy or embarrassing catharsis.
Art historian Judith Stoddart pinpoints the "predictable critical squirming" that accompanies stock invocations of Victorian sentimentality. As critics, we are meant to be objective evaluators of a text, but the sentimental refuses us this distance. Discussions of affective response seem naive or transparent. However, critic Claudia Johnson points out that "sentimentality is politics made intimate" (Equivocal Beings, 2). This speaks to a paradox inherent in sentimental texts. On the one hand, the sentimental text is rooted in circumstance that is, by nature, political. On the other hand, sentimentality is designed to be universally affective. How can we critically evaluate works that are designed to make us feel? Works become categorized as sentimental because they seem to fit, often for vague reasons that are hard to delineate. Rather than knowing it when we see it, we recognize a work as sentimental when we feel it.

I realize that using computer analysis and text mining to explore sentimentality seems slightly ironic - the juxtaposition of a study of emotion with a cold, unfeeling machine. But what seems a conflict or an act of perversity actually reflects the paradox of sentimentality itself. The project aims to uncover a methodology of sentimentality - to better understand why or how a text is affective. It seeks to understand the formal elements of feeling. And it tests the hypothesis that we are conditioned to respond to a text in certain ways because we encounter these elements in other texts.

The primary investigator will be Sara Steger (who hopes to use the results in her MONK-related dissertation), with guidance and direction from Stephen Ramsay.

Current practice: Genre study particularly lends itself to data mining methods because of the large amount of data involved and because genre study is inherently a classification problem. The nora project also had a use case that explored American sentimentality - the "wet or not" approach.
This new project aims to go beyond classification to explore patterns of affect in the texts. In other words, while this project may delve into finding "more like these," it also seeks to ask what features make a text sentimental.

Status of the research: Are you working on this now, e.g. in a small manual fashion or with other tools? If this is your thesis, when do you hope to graduate? (i.e. when do you hope to be able to use the tools at the latest)

In my dissertation, I'm particularly interested in the ways that emotion is used in novels as a connection between all subjects, from the penniless and dirty street urchin to the respectable, middle-class reader. In creating this link between character and reader, the focus of a sentimental work turns outward, with an eye toward stirring emotion in order to contribute to social reform. I'm currently doing research on sentimentality that involves formal close readings of scenes with parallel themes in different novels - for example, the deaths of Jo in Bleak House, of Little Nell in The Old Curiosity Shop, of Eva in Uncle Tom's Cabin, of Beth in Little Women, and of the twins in Mary Barton. I begin with these most-referenced sentimental moments to hypothesize about overlaps between the texts at varying levels, including (at the most basic level) vocabulary, but also common imagery and the techniques these authors use to draw the reader in (direct addresses to the reader and visible "markers" in the text for emphasis in punctuation, italics, etc.). Text mining enables me to test these hypotheses and discover new patterns for exploration. It would also enable me to move beyond formal study to study sentimentality on a macro level (see Measure of success, below). My funding runs out in August 2008, so I'm hoping to get some preliminary results by the end of 2007 in order to incorporate them in my dissertation.
Of course, the dissertation would not mark the end of my interest in the project - it's only a small document I need in order to graduate.

Measure of success: If I received ANY interesting results that I could use to support my critical inquiry into sentimentality, I would be thrilled. If I only received information about five novels and overlapping indicators of emotion within these five, I would use it. In that case, I would focus mostly on those particular novels and less on sentimentality as a whole. I'd see this as involving clustering rather than as a classification problem. If I could broaden the scope of the "mine" to scenes in thirty novels, or (shooting high) even fifty, I could also broaden the scope of my dissertation to discuss sentimentality as a genre (making it a classification problem). While I could write a dissertation about five novels without using computer-assisted analysis, broadening to genre study really relies on the MONK project.

As for what might characterize "successful" or interesting results, I think that's hard to say until I receive some. Finding that there is no real overlap between the texts would be interesting. Finding that there is overlap would also be interesting. I could find plenty to write about in either case.

Texts needed in the collection:
Since the object is to study genre, one book/work would not be particularly helpful.
Uncle Tom's Cabin - Harriet Beecher Stowe, 1852
A really vast collection of nineteenth-century British and American novels

Are there multiple versions of your documents you would need to see in parallel or combined? Are there foreign-language or unusual characters? Multiple versions aren't necessary, but nineteenth-century novelists loved to insert a smattering of French, German, or Greek here and there. For the purposes of the study, those particular moments could just be ignored.

Generality: what other questions might other users ask that would be similar to your question? Other questions about genre (of course), questions about patterns within a particular author's works, perhaps questions about the presence of patterns across time, etc.

Granularity: can you guess the granularity(ties) you will need to use? word, paragraph, page, books etc. Multiple levels? The project would begin at the level of what I'm calling a scene (averaging 5-10 paragraphs). Depending on the results, further study would involve multiple levels of granularity - for instance, widening out to study the patterns of affect in the work as a whole (I hypothesize a "seismograph of emotion" where periods of intense sentimentality are followed by periods of relief), or narrowing to focus on vocabulary (affective semantics).

Stages or Phases: At the first stage of the project, five or six sentimental chunks (plus one decidedly unsentimental chunk involving a similar theme) would be evaluated using clustering algorithms - perhaps as a first step, we could begin by incorporating only one feature for ranking (sentimental or not) and one vector (vocabulary). We know we can already do this from nora, but it would give us a start. From there, I'd like to broaden out to work on incorporating multiple ranked features at once and/or multiple vectors using the same texts. This is something we have not done with nora, but it is something I know researchers will want to do. Finally, I'd like to broaden the scope of analysis to incorporate other texts.
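
As a rough illustration of the first stage (one vector, vocabulary), the sketch below compares a few chunks by the cosine similarity of their word counts. This is only the pairwise comparison that a clustering algorithm might build on, not the nora or MONK implementation. The function names and the three sample snippets are invented for illustration and are not quotations from any novel.

```python
import math
import re
from collections import Counter


def vocabulary_vector(text):
    """Lowercased word counts for one chunk (the single 'vocabulary' vector)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty chunk is treated as similar to nothing
    return dot / (norm_a * norm_b)


def similarity_table(chunks):
    """Pairwise similarities for a dict of label -> text, keyed by label pair."""
    vectors = {label: vocabulary_vector(text) for label, text in chunks.items()}
    labels = sorted(vectors)
    return {
        (x, y): cosine_similarity(vectors[x], vectors[y])
        for i, x in enumerate(labels)
        for y in labels[i + 1:]
    }


# Assumption: placeholder snippets standing in for 5-10 paragraph chunks.
chunks = {
    "deathbed_a": "the pale child wept and sighed as her mother wept beside her",
    "deathbed_b": "the mother sighed and wept over the pale innocent child",
    "ledger": "the clerk counted the invoices and filed the shipping ledger",
}
table = similarity_table(chunks)
for pair, score in sorted(table.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

In this toy case the two deathbed snippets score closer to each other than either does to the unsentimental one. With real chunks, common function words such as "the" would probably need to be down-weighted or removed before the scores mean much.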
Characteristics: what low-level characteristics of the text do you think will be useful for your research? (e.g. POS, Ngrams, Soundex). Low-level characteristics that might prove beneficial include individual tokens, POS, NGrams, and maybe punctuation (for instance, to determine whether sentimental works make particular use of the dash, exclamation point, or question mark).

Patterns: Can you try to express examples of complex patterns you want to identify, or hope to find? I hypothesize that I will find the aforementioned tropes used more in sentimental works. I think these tropes will emerge at the vocabulary level, but further patterns may be uncovered when POS is examined (an abundance of adjectives?). I think that emotional words will rise to the top, along with words associated with motherhood, light imagery, innocence, nature, diminutiveness, etc.

Tags:
Date, author, chapter or div.
POS, punctuation, maybe XPath (or location in the text).

Morphology: example of use? I'm still thinking this over, but as a first step, I don't see how morphology is really critical.

Lexicon, counts of words, most common occurrences, concordance: Maybe as a step prior to the first step, to focus the analysis. This would fall under Steve's "stupidest thing that works" idea.
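
The simplest of the low-level characteristics named above (tokens, n-grams, and punctuation) can be counted with very little machinery. The following is a minimal sketch in that "stupidest thing that works" spirit. It assumes plain-text input in which dashes are transcribed as a double hyphen. The function names and the sample sentence are hypothetical, and POS tagging is left out because it would need an external tagger.

```python
import re
from collections import Counter

# Assumption: the dash appears as "--" in the transcription being analysed.
PUNCTUATION_MARKS = ("!", "?", "--")


def tokens(text):
    """Lowercased word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(words, n=2):
    """Counts of consecutive n-word sequences (bigrams by default)."""
    return Counter(zip(*(words[i:] for i in range(n))))


def punctuation_rates(text, marks=PUNCTUATION_MARKS):
    """Occurrences of each mark per 1,000 words, so chunks of different length compare."""
    word_count = len(tokens(text))
    if word_count == 0:
        return {mark: 0.0 for mark in marks}
    return {mark: 1000 * text.count(mark) / word_count for mark in marks}


# Invented sample sentence, not a quotation.
sample = "Oh, my child! my poor child! -- will you not speak? Oh, my child!"
words = tokens(sample)
print(Counter(words).most_common(3))   # most frequent tokens
print(ngrams(words).most_common(3))    # most frequent bigrams
print(punctuation_rates(sample))       # marks per 1,000 words
```

Normalising per 1,000 words is one possible design choice. Rates per sentence or per paragraph would serve equally well for comparing sentimental and unsentimental chunks.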
Document generated by Confluence on Apr 19, 2009 15:05