Magdalena Król represented the AGH Faculty of Computer Science at one of the world’s most prestigious conferences in the field of natural language processing – EMNLP 2025 (Empirical Methods in Natural Language Processing).
She presented a poster titled “Lemmatization of Polish Multi-word Expressions”, co-authored with Aleksander Smywiński-Pohl, Zbigniew Kaleta, and Paweł Lewkowicz.
The publication received the SAC Highlight (Senior Area Chair Highlight) – a prestigious award granted by Senior Area Chairs to a small number of papers considered particularly valuable and influential within their research tracks.
About the research
The team developed CLEM, a model for one of the more challenging problems in Polish language processing: the lemmatization of multi-word expressions (MWEs).
Lemmatization is the process of reducing words to their base form (e.g., “książkami” → “książka”). In this case, it extends to multi-word expressions, such as “Ministerstwo Nauki i Szkolnictwa Wyższego” (“Ministry of Science and Higher Education”).
CLEM can recognize such complex expressions in text and reduce them to a unified base form, enabling more accurate information retrieval, data analysis, and natural language processing in Polish.
The AGH team fine-tuned the plT5 and mT5 models on data from the PolEval 2019 competition, supplemented with silver-standard (automatically annotated) data derived from Wikipedia.
Their model achieved the best results reported to date (state of the art, SOTA) for Polish, correctly lemmatizing roughly 9 out of 10 multi-word expressions.
Model performance:
• 86.23% – case-sensitive accuracy
• 89.43% – case-insensitive accuracy
• 88.79% – combined score, establishing a new state of the art for Polish MWE lemmatization
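The article does not say how the combined score is derived from the two accuracies. The reported figures are consistent with a weighted average that gives 0.2 to case-sensitive accuracy and 0.8 to case-insensitive accuracy, which is the weighting used for the PolEval 2019 lemmatization task; treat that as an assumption rather than a statement from the authors. The sketch below shows how the three metrics could be computed. The function names and the toy gold/predicted pairs are hypothetical and are not taken from the paper.

```python
def case_sensitive_accuracy(gold, predicted):
    """Percentage of predictions that match the gold lemma exactly."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    hits = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * hits / len(gold)


def case_insensitive_accuracy(gold, predicted):
    """Percentage of predictions that match the gold lemma ignoring letter case."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    hits = sum(g.casefold() == p.casefold() for g, p in zip(gold, predicted))
    return 100.0 * hits / len(gold)


def combined_score(acc_cs, acc_ci, w_cs=0.2, w_ci=0.8):
    """Weighted average of the two accuracies.

    The 0.2 / 0.8 weights are an assumption (PolEval 2019 style scoring),
    not something the article states.
    """
    return w_cs * acc_cs + w_ci * acc_ci


# Toy data, invented for illustration only.
gold = [
    "Ministerstwo Nauki i Szkolnictwa Wyższego",
    "Akademia Górniczo-Hutnicza",
    "książka",
]
predicted = [
    "ministerstwo nauki i szkolnictwa wyższego",  # right lemma, wrong casing
    "Akademia Górniczo-Hutnicza",                 # exact match
    "książki",                                    # wrong lemma
]

print(case_sensitive_accuracy(gold, predicted))
print(case_insensitive_accuracy(gold, predicted))

# Plugging in the accuracies reported in the article:
print(round(combined_score(86.23, 89.43), 2))  # 88.79
```

Under this assumed weighting, the two reported accuracies reproduce the reported combined score of 88.79%, and the heavier weight on case-insensitive accuracy means capitalization errors cost less than wrong word forms.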
The model is lightweight, efficient, and tailored to Polish, which makes it suitable for real-world systems such as legal and scientific text analysis, information extraction, and other Polish-language AI applications.
The publication is available in the EMNLP 2025 proceedings: https://aclanthology.org/2025.emnlp-main.1126
