Named Entity Recognition and Linking in PoeTree Corpora

Authors

  • Petr Plecháč, Institute of Czech Literature of the Czech Academy of Sciences, Na Florenci 1420/3, 110 00 Prague, Czechia
  • Artjoms Šeļa, Institute of Czech Literature of the Czech Academy of Sciences, Na Florenci 1420/3, 110 00 Prague, Czechia, https://orcid.org/0000-0002-2272-2077
  • Silvie Cinková, Institute of Formal and Applied Linguistics, Charles University, Malostranské náměstí 2/25, 118 00 Prague, Czechia
  • Mirella De Sisto, Department of Computational Cognitive Science, Tilburg University, Warandelaan 2, 5037 AB Tilburg, Netherlands
  • Lara Nugues, University of Basel, Maiengasse 51, CH-4056 Basel, Switzerland
  • Neža Kočnik, University of Maribor, Slomškov trg 15, 2000 Maribor, Slovenia, https://orcid.org/0009-0003-8318-2179
  • Robert Kolár, Institute of Czech Literature of the Czech Academy of Sciences, Na Florenci 1420/3, 110 00 Prague, Czechia
  • Thomas Haider, University of Passau, Innstr. 41, 94032 Passau, Germany

DOI:

https://doi.org/10.12697/smp.2025.12.2.01

Keywords:

poetry, named entities, computational poetics, natural language processing

Abstract

Named entity recognition (NER) and named entity linking (NEL) remain underexplored in poetic texts. This study provides the first large-scale evaluation of contemporary NER and NEL systems on poetry across seven European languages – Czech, German, English, French, Italian, Russian, and Slovenian – using corpora from the PoeTree project. We benchmark three NER systems (flair, NameTag 2, spaCy) and three GPT models (GPT-3.5, GPT-4, GPT-4 Turbo) against manually annotated gold standards. While the results fall short of in-domain benchmarks, they significantly outperform those reported in earlier studies. Manual correction further raises final annotation quality to estimated F1 scores between 0.77 and 0.93 across languages. We additionally evaluate two NEL systems – spaCy fishing and mGenre – and show that mGenre consistently outperforms spaCy fishing, achieving in-KB F1 scores of 0.70–0.81. By analysing geographic distances between predicted and gold-standard links, we demonstrate that a substantial portion of “incorrect” predictions are near-miss ambiguities rather than substantive errors. The resulting manually verified geolocation annotations have been integrated into PoeTree and made available through an interactive map interface.
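
The geographic-distance analysis mentioned above can be illustrated with a minimal sketch. It assumes that each predicted and gold-standard link resolves to a latitude/longitude pair and uses the haversine great-circle formula; the function names and the 50 km threshold are hypothetical choices for illustration, not the procedure or cut-off used in the study.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * asin(sqrt(min(1.0, a)))


def is_near_miss(pred, gold, threshold_km=50.0):
    """True if a predicted (lat, lon) lies within threshold_km of the gold one.

    threshold_km is an assumed, illustrative cut-off.
    """
    return haversine_km(*pred, *gold) <= threshold_km


# Illustrative coordinates: Prague (gold) versus Vienna (prediction).
prague = (50.0755, 14.4378)
vienna = (48.2082, 16.3738)
print(round(haversine_km(*prague, *vienna), 1), is_near_miss(vienna, prague))
```

Under such a scheme, a prediction that is formally wrong as a knowledge-base identifier but only a few kilometres from the gold location (for example, a city versus its administrative district) would be counted as a near miss rather than a substantive error.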

Published

2025-12-31

Section

Articles