Named Entity Recognition and Linking in PoeTree Corpora
DOI:
https://doi.org/10.12697/smp.2025.12.2.01
Keywords:
poetry, named entities, computational poetics, natural language processing
Abstract
Named entity recognition (NER) and named entity linking (NEL) remain underexplored in poetic texts. This study provides the first large-scale evaluation of contemporary NER and NEL systems on poetry across seven European languages – Czech, German, English, French, Italian, Russian, and Slovenian – using corpora from the PoeTree project. We benchmark three NER systems (flair, NameTag 2, spaCy) and three GPT models (GPT-3.5, GPT-4, GPT-4 Turbo) against manually annotated gold standards. While results fall short of in-domain benchmarks, they significantly improve on earlier findings. Manual correction raises the quality of the final annotations further, to estimated F1 scores between 0.77 and 0.93 depending on the language. We additionally evaluate two NEL systems – spaCy fishing and mGenre – and find that mGenre consistently outperforms spaCy fishing, reaching in-KB F1 scores of 0.70–0.81. By analysing geographic distances between predicted and gold-standard links, we demonstrate that a substantial portion of “incorrect” predictions are near-miss ambiguities rather than substantive errors. The resulting manually verified geolocation annotations have been integrated into PoeTree and made available through an interactive map interface.
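To illustrate the idea of measuring geographic distance between a predicted and a gold-standard link, the following is a minimal sketch rather than the procedure used in the study. It assumes each linked entity has been resolved to a latitude/longitude pair and uses the standard haversine great-circle formula. The function names, the `threshold_km` parameter, and its default of 50 km are hypothetical choices made for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_near_miss(pred, gold, threshold_km=50.0):
    """Treat a wrong link as a near miss if it lies within threshold_km of the gold location.

    pred and gold are (latitude, longitude) tuples in degrees.
    threshold_km is an illustrative assumption, not a value from the study.
    """
    return haversine_km(*pred, *gold) <= threshold_km


# Approximate coordinates, for illustration only
prague = (50.0755, 14.4378)
vienna = (48.2082, 16.3738)

distance = haversine_km(*prague, *vienna)
print(f"Prague to Vienna: {distance:.0f} km")
print("Near miss:", is_near_miss(prague, vienna))
```

Under a scheme like this, a prediction that links to a different knowledge-base entry for a nearby place would count as a near miss, while a link to a location hundreds of kilometres away would count as a substantive error.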