Eluraamatute arvutianalüüs – prooviuurimus / Computational analysis of Life Books – a probing study
Keywords: stilomeetria, sentimendianalüüs, konkordants, sagedus, digihumanitaaria, stylometrics, sentiment analysis, concordance, frequency, computational humanities
As soon as data turn into big data, analysis by close reading can become a problem: it can turn into an endless process that the researcher's brain can no longer grasp. The problem can be partly solved with methods from the digital humanities: various tools are available for the quantitative analysis of large bodies of text, for example in ethnology. The program AntConc, for instance, can be used to study both the frequency and the distribution of words in a text. LIWC2015 can be used for sentiment analysis of life stories, and the program Stylo for stylometric analysis of texts. The usefulness of such programs is tested here on a relatively small corpus of so-called Life Books, in order to find out how valuable these tools could be for the much larger body of text that will soon be added to the corpus.
As soon as “data” turn into “big data”, analysis by “close reading” can become a problem: it can turn into an endless process that the researcher's brain can eventually no longer grasp. Methods of computational humanities can partially solve the problem: various tools can be used for quantitative analysis of large amounts of text, for example in the field of ethnology or folklore. For example, the program AntConc can be used to study word frequencies as well as the distribution of concepts across a text. LIWC2015 can be used for sentiment analysis of life stories and can show differences between genders (or generations) in how life stories are told. Stylo may be used for the stylometric analysis of texts. The usefulness of such programs is tested here on a still relatively small corpus of so-called Life Books. Life Books (“Levensboek”) is a project of the Humanitas Foundation in the Netherlands that publishes limited-edition booklets of life stories compiled from interviews with older people, recorded and edited by volunteers. This study is based on 19 digital Life Books, a corpus that is in fact still small enough for close reading and qualitative analysis. However, the intention here is to use the corpus as a pilot to see how valuable the tools can be for the much larger number of texts that will be added in the near future. In this pilot I want to see to what extent the Life Books can be used for structural analysis and for the study of gender differences in narrative style and subject choice, sentiment, recurring themes, the distribution of motifs, and perhaps most importantly, thematic gaps: which (important) issues are not raised by the storytellers?
The experiment shows that research into narrative structures is possible, although the analysis could be refined considerably at the level of individual events. Stylometric analysis of male and female repertoires with Stylo is rather tricky, because interviewers and editors can act as a (very strong) filter here. Stylo looks for patterns in the use of function words to distinguish styles, but Life Books do not always quote narrators verbatim, so in quite a few cases linguistic features such as the use of function words may originate not from the storytellers but from the editors.
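The idea of comparing texts by their function-word usage can be illustrated with a minimal Python sketch. This is not Stylo's own procedure (Stylo is an R package with more refined distance measures); it is a simplified stand-in. The word list, the function names and the use of a plain Manhattan distance are assumptions made for illustration only, and English words are used although the Life Books themselves are in Dutch.

```python
import re
from collections import Counter

# Hypothetical, very short function-word list for illustration only;
# real studies typically use the most frequent words of the corpus itself.
FUNCTION_WORDS = ["the", "and", "of", "to", "in", "a", "that", "i", "it", "was"]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text, words=FUNCTION_WORDS):
    """Relative frequency of each function word in the text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return [counts[w] / total for w in words]


def manhattan(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))
```

A small distance between two profiles suggests similar function-word habits. If an editor has rewritten the narrator's words, the profile reflects the editor's habits rather than the narrator's, which is exactly the filtering problem described above.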
On the other hand, sentiment analysis in combination with gender, for example, is possible using LIWC2015: this tool can give a fair representation of emotions, relationships and related motifs in life stories. Furthermore, AntConc proves to be a useful tool for investigating the occurrence and distribution of themes and topics. Research into the absence of certain themes and motifs remains an interesting option as well.
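The kind of occurrence and distribution query mentioned here can be sketched in a few lines of Python. This is a minimal illustration of a keyword-in-context (concordance) listing and a simple dispersion count, not a reproduction of AntConc; the function names, the window size and the number of segments are assumptions chosen for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def dispersion(tokens, keyword, segments=4):
    """Count hits of the keyword in each equal-sized segment of the text."""
    counts = [0] * segments
    n = len(tokens)
    for i, tok in enumerate(tokens):
        if tok == keyword:
            counts[min(i * segments // n, segments - 1)] += 1
    return counts


tokens = tokenize("War came early. After the war we moved. Then peace.")
print(kwic(tokens, "war", window=2))
print(dispersion(tokens, "war", segments=2))
print(dispersion(tokens, "money", segments=2))  # all zeros: theme is absent
```

A dispersion list consisting only of zeros for a theme one would expect to find is a simple first indicator of a thematic gap, although a single keyword will of course miss synonyms and paraphrases.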