Digital literary heritage projects as a source of language resources: a case of Estonian criticism in KORP
The Estonian Literary Museum has been one of the pioneers in the digital humanities since the 1990s, when computer culture began to spread more widely. In managing its valuable collections, its mission has been to make them accessible to the public. Cultural heritage was opened to a wider audience in two directions: content-based searchable databases and relation-based data environments. The aim of this article is to show the present-day possibilities of computational literary studies and the related creation of literary language resources in cooperation with corpus linguists. In the article I analyse how cultural heritage content environments and data collections can be used as machine-readable language resources. As the first such experiments, annotated language corpora of correspondence and criticism have been completed in the query system KORP. Using the problem of influence criticism in the early 20th century as an example, the present study highlights the potential of literary language corpora for research on cultural heritage.
Estonia can soon expect explosive growth in digital heritage and text resources owing to the current mass digitisation project for national cultural heritage (2019–2023), which covers printed books, archival documents, photos, art, audiovisual material, and ethnographic artifacts. This will open new opportunities for different fields of digital humanities and make digitised heritage accessible to everyone in the form of open data. The project will focus on the usage of the heritage and on the needs of education, e-learning, and the creative industry, including digital creative arts.
The aim of this article is to examine some research possibilities that have opened up for literary history through the digitisation of literary works and archival sources, and to place them in the general context of digital humanities.
Although the field of digital humanities is broad, the meaning of DH is often reduced to methods of computational, language-centred analysis, mainly based on different tools and programming languages (R, Stylo, Python, Gephi, topic modelling, etc.). While corpus-based research is already a professional standard in linguistics, literary scholars are still more used to working with traditional methods. This article introduces two digital literary history projects belonging to the field of digital humanities and analyses them as language resources for creating text corpora. It also presents some results of a case study of Estonian criticism from the Young Estonia movement up to the 1920s, carried out using the literary text corpora in the corpus query system KORP (https://korp.keeleressursid.ee) of the Centre of Estonian Language Resources.
During the past twenty years, I have mainly focussed on developing large-scale implementation projects for the digital representation of Estonian literary history. The objective of these experimental projects has been to develop fundamentally new, non-linear models of Estonian literary history for the digital environment. These activities were based on my research, using traditional methods, on the intertextual relations between authors, literary works, and critical texts.
The first content-based literary history project, “ERNI. Estonian Literary History in Texts 1924–1925” (www2.kirmus.ee/erni), was based on a hypertextual network of literary source texts and reviews. We re-conceptualised literary history as a non-linear narrative and a gallery with many entrances. The project was also meant to be usable in education: a significant number of study materials have been added in cooperation with schoolteachers.
In 2004, we initiated our long-term and still running project “Kreutzwald’s Century: the Estonian Cultural History Web” (http://kreutzwald.kirmus.ee) at the Estonian Literary Museum. The objective of this project was to make literary sources of the period accessible as a dynamic, interactive information environment. This was a hybrid project which synthesised the classical study of Estonian literary history, the needs of the digital media user, and the expanding digital resources of different memory institutions. Its underlying idea was to link together all the works of fiction of an author, as well as their biography, manuscripts, and photos, and to make them visible to the user on five interactive time axes. The project uses a specially created platform. Today, this platform is extensively used by schoolteachers: in 2020 (Jan.–Dec.) it received 8,986,555 clicks, and over seven years (Dec. 2013–Dec. 2020) it collected 64,627,380 clicks.
To find out how such content-based models of literary heritage fit into the context of digital humanities, we need to compare the previous modelling practices with our current experimental project in the corpus query system KORP. Our interdisciplinary project “Literary Studies Meet Corpus Linguistics” (2017–2020) concentrated on studying literary history sources with linguistic methods. As a result of the project, two literary text corpora were created: “Epistolary text corpus of Estonian writers Johannes Semper and Johannes Vares-Barbarus” and “Corpus of the Estonian literary criticism, Noor-Eesti and the 1920s”. Both were pilot projects in the field and began by converting the digitised archival and printed sources into machine-readable format before text and data mining for corpus creation.
The query system KORP allows us to organise the language data by all the categories used in the corpus, for example, to learn who mentioned the name of the French writer André Gide and in what context. The second project, currently running, is the morphologically annotated corpus of literary criticism. This corpus contains texts of literary reviews and criticism in different genres, drawn from the projects ERNI and “Kreutzwald’s Century”. The first results in studying the dynamics of literary values can already be seen.
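The kind of lookup described here, finding who mentions a name and in what context, can be illustrated with a minimal keyword-in-context (KWIC) sketch in Python. This is not the KORP interface or its API: the `Text` record, the `kwic` function, and the sample sentences are all hypothetical placeholders invented for illustration, and a real corpus query would work on annotated tokens rather than on whitespace-split strings.

```python
import re
from dataclasses import dataclass


@dataclass
class Text:
    """A hypothetical corpus record: metadata plus plain running text."""
    author: str
    year: int
    body: str


def kwic(texts, keyword, window=4):
    """Return (author, year, left, match, right) for every token containing keyword.

    Matching is a case-insensitive substring test on whitespace-separated
    tokens, so inflected or punctuated forms that contain the keyword also hit.
    """
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    hits = []
    for text in texts:
        tokens = text.body.split()
        for i, token in enumerate(tokens):
            if pattern.search(token):
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((text.author, text.year, left, token, right))
    return hits


# Invented placeholder texts, not drawn from the actual corpora.
SAMPLE = [
    Text("Critic A", 1910, "the reviewer mentions Gide in passing here"),
    Text("Critic B", 1925, "no French writers are named in this review"),
    Text("Critic C", 1927, "both Gide and his translators are discussed"),
]

for author, year, left, match, right in kwic(SAMPLE, "Gide", window=3):
    print(f"{author} ({year}): {left} [{match}] {right}")
```

Keeping the author and year next to each hit is what lets the results be grouped by any metadata category, which is the point the paragraph above makes about organising language data by corpus categories.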
A KORP query for the word ‘mõju’ (‘influence’) revealed that the call of the 1905 manifesto of the group Young Estonia, “More of European culture!”, was replaced during the independent Estonian Republic by the valuing of a specific national character. The corpus query also showed a change in the meaning of the word: in the criticism contemporary to Young Estonia, ‘mõju’ was associated only with the historical pressure from Russian and German cultures. The foundation for modern comparative linguistics at the University of Tartu was laid in the 1920s by the professorship in Estonian literature.