Yet Another Keyword Extractor (YAKE!)
Unsupervised Automatic Keyword Extraction
YAKE! is a lightweight, unsupervised automatic keyword extraction method that relies on statistical features extracted from a single document to select the most important keywords of a text.
No Training Required
Works without a corpus or dictionaries
Multilingual
Supports 25+ languages out-of-the-box
Single Document
No corpus comparison needed
📚 Background
Extracting keywords from texts has become a challenge for individuals and organizations as information grows in complexity and size. The need to automate this task, so that texts can be processed in a timely and adequate manner, has led to the emergence of automatic keyword extraction tools.
Despite these advances, there is a clear lack of multilingual online tools that automatically extract keywords from single documents.
YAKE! Innovation
YAKE! is a novel feature-based system for multilingual keyword extraction that supports texts of different sizes, domains, and languages. Unlike other approaches, YAKE! relies on neither dictionaries nor thesauri, and it is not trained against any corpus.
Instead, it follows an unsupervised approach that builds on features extracted from the text itself, which makes it applicable to documents written in different languages without the need for further knowledge. This is useful for the many tasks and situations where access to training corpora is limited or restricted.
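To make the idea of "features extracted from the text" concrete, here is a minimal sketch of scoring words with two local statistics: how often a word occurs and how early it first appears. This is a toy illustration only, not YAKE!'s actual feature set or scoring formula (those are described in the publications listed under References). The function name `toy_keyword_scores`, the small stopword set, and the "lower score is better" convention are all assumptions made for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not YAKE!'s own list).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "from", "with"}


def toy_keyword_scores(text):
    """Rank single-word candidates using only statistics from `text`.

    Two local features are combined: term frequency and the index of the
    sentence where the word first occurs. Lower score = more important
    (a convention chosen for this sketch).
    """
    # Naive sentence split on ., ! and ?; empty fragments are dropped.
    sentences = [s for s in re.split(r"[.!?]+", text.lower()) if s.strip()]

    freq = Counter()
    first_sentence = {}
    for idx, sentence in enumerate(sentences):
        # [^\W\d_]+ matches runs of letters in any script.
        for word in re.findall(r"[^\W\d_]+", sentence):
            if word in STOPWORDS or len(word) < 3:
                continue
            freq[word] += 1
            first_sentence.setdefault(word, idx)

    # Frequent words that appear early get the lowest scores.
    scores = {w: (1 + first_sentence[w]) / freq[w] for w in freq}
    # Sort by score, then alphabetically to break ties deterministically.
    return sorted(scores.items(), key=lambda item: (item[1], item[0]))


if __name__ == "__main__":
    sample = (
        "Keyword extraction selects keywords. "
        "Keyword extraction needs no corpus. Statistics help."
    )
    for word, score in toy_keyword_scores(sample)[:3]:
        print(word, score)
```

Nothing here depends on a training set or a reference corpus: every number is computed from the input document alone, which is the property the paragraphs above describe.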
✨ Main Features
🎯 Unsupervised Approach
No need for training data or labeled corpora. Works immediately on any text.
📊 Corpus-Independent
Analyzes each document independently without requiring external reference corpora.
🌐 Domain and Language Independent
Works across different domains and languages without configuration changes.
📄 Single-Document Processing
Extracts keywords from individual documents with high accuracy.
📖 References
Citation Request
If you use YAKE! in work that leads to a scientific publication, please cite it in your manuscript.
📄 Main Publications
Information Sciences Journal (2020)
Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C. and Jatowt, A. (2020). YAKE! Keyword Extraction from Single Documents using Multiple Local Features. Information Sciences Journal. Elsevier, Vol 509, pp 257-289.
ECIR 2018 - Best Short Paper 🏆
Campos R., Mangaravite V., Pasquali A., Jorge A.M., Nunes C., and Jatowt A. (2018). A Text Feature Based Automatic Keyword Extraction Method for Single Documents. In: Pasi G., Piwowarski B., Azzopardi L., Hanbury A. (eds). Advances in Information Retrieval. ECIR 2018 (Grenoble, France. March 26-29). Lecture Notes in Computer Science, vol 10772, pp. 684-691.
ECIR 2018 - Demo Paper
Campos R., Mangaravite V., Pasquali A., Jorge A.M., Nunes C., and Jatowt A. (2018). YAKE! Collection-independent Automatic Keyword Extractor. In: Pasi G., Piwowarski B., Azzopardi L., Hanbury A. (eds). Advances in Information Retrieval. ECIR 2018 (Grenoble, France. March 26-29). Lecture Notes in Computer Science, vol 10772, pp. 806-810.
📜 License
Copyright (C) 2018, INESC TEC
🤝 Contributing
We Welcome Contributors!
Before contributing to this repository, please discuss the change you wish to make with the repository owners via an issue, email, or any other method.
Read more about becoming a contributor in our GitHub repo.
👥 Thank you to the contributors of YAKE!
Copyright ©2018-2026 INESC TEC. Distributed under an INESC TEC license.