Named Entity Recognition and Linking in PoeTree Corpora

Petr Plecháč; Artjoms Šeļa; Silvie Cinková; Mirella De Sisto; Lara Nugues; Neža Kočnik; Robert Kolár; Thomas Haider

doi:10.12697/smp.2025.12.2.01

Authors

Petr Plecháč, Institute of Czech Literature of the Czech Academy of Sciences, Na Florenci 1420/3, 110 00 Prague, Czechia. ORCID: https://orcid.org/0000-0002-1003-4541
Artjoms Šeļa, Institute of Czech Literature of the Czech Academy of Sciences, Na Florenci 1420/3, 110 00 Prague, Czechia. ORCID: https://orcid.org/0000-0002-2272-2077
Silvie Cinková, Institute of Formal and Applied Linguistics, Charles University, Malostranské náměstí 2/25, 118 00 Prague, Czechia. ORCID: https://orcid.org/0000-0003-4526-3915
Mirella De Sisto, Department of Computational Cognitive Science, Tilburg University, Warandelaan 2, 5037 AB Tilburg, Netherlands
Lara Nugues, University of Basel, Maiengasse 51, CH-4056 Basel, Switzerland
Neža Kočnik, University of Maribor, Slomškov trg 15, 2000 Maribor, Slovenia. ORCID: https://orcid.org/0009-0003-8318-2179
Robert Kolár, Institute of Czech Literature of the Czech Academy of Sciences, Na Florenci 1420/3, 110 00 Prague, Czechia. ORCID: https://orcid.org/0000-0001-8061-1917
Thomas Haider, University of Passau, Innstr. 41, 94032 Passau, Germany

DOI:

https://doi.org/10.12697/smp.2025.12.2.01

Keywords:

poetry, named entities, computational poetics, natural language processing

Abstract

Named entity recognition (NER) and named entity linking (NEL) remain underexplored in poetic texts. This study provides the first large-scale evaluation of contemporary NER and NEL systems on poetry across seven European languages – Czech, German, English, French, Italian, Russian, and Slovenian – using corpora from the PoeTree project. We benchmark three NER systems (flair, NameTag 2, spaCy) and three GPT models (GPT-3.5, GPT-4, GPT-4 Turbo) against manually annotated gold standards. Although the results fall short of in-domain benchmarks, they significantly outperform earlier findings on poetry. Manual correction further raises the final annotation quality to estimated F1 scores between 0.77 and 0.93, depending on the language. We additionally evaluate two NEL systems – spaCy fishing and mGenre – and show that mGenre consistently outperforms spaCy fishing, achieving in-KB F1 scores of 0.70–0.81. By analysing geographic distances between predicted and gold-standard links, we demonstrate that a substantial portion of “incorrect” predictions are near-miss ambiguities rather than substantive errors. The resulting manually verified geolocation annotations have been integrated into PoeTree and made available through an interactive map interface.
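
The two measurements named in the abstract, F1 against a gold standard and geographic distance between a predicted and a gold-standard link, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The exact-match span criterion, the function names, the sample spans and the 50 km near-miss threshold are all assumptions made for illustration only.

```python
from math import asin, cos, radians, sin, sqrt


def span_f1(gold, predicted):
    """Precision, recall and F1 over (start, end, label) spans.

    Assumption: a prediction counts as correct only if its boundaries
    and label both match a gold span exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * earth_radius_km * asin(sqrt(a))


def is_near_miss(gold_coords, predicted_coords, threshold_km=50.0):
    """Flag a wrong link whose location lies close to the gold location.

    The 50 km default is a hypothetical threshold, not a value from the study.
    """
    return haversine_km(*gold_coords, *predicted_coords) <= threshold_km


# Hypothetical annotations for one poem: token offsets plus entity label.
gold_spans = {(0, 1, "LOC"), (5, 6, "PER"), (9, 10, "LOC")}
predicted_spans = {(0, 1, "LOC"), (5, 6, "LOC")}
precision, recall, f1 = span_f1(gold_spans, predicted_spans)

# Approximate coordinates of Prague and Vienna (latitude, longitude).
prague = (50.0755, 14.4378)
vienna = (48.2082, 16.3738)
distance_km = haversine_km(*prague, *vienna)
```

With the hypothetical spans above, one of two predictions matches one of three gold spans, so precision is 0.5 and recall is one third. A distance-based check of this kind separates a link to a neighbouring or overlapping place from a link to an unrelated place on another continent, which is the distinction the abstract draws between near-miss ambiguities and substantive errors.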
