Yet Another Keyword Extractor (YAKE!)

Unsupervised Automatic Keyword Extraction

YAKE! is a lightweight, unsupervised automatic keyword extraction method that relies on statistical features extracted from a single document to select the most important keywords of a text.

🚀

No Training Required

Works without corpus or dictionaries

🌍

Multilingual

Supports 25+ languages out-of-the-box

📄

Single Document

No corpus comparison needed

📚 Background

Extracting keywords from texts has become a challenge for individuals and organizations as information grows in volume and complexity. The need to automate this task, so that texts can be processed in a timely and adequate manner, has led to the emergence of automatic keyword extraction tools.

Despite the advances, there is a clear lack of multilingual online tools to automatically extract keywords from single documents.

YAKE! Innovation

YAKE! is a novel feature-based system for multilingual keyword extraction that supports texts of different sizes, domains, and languages. Unlike other approaches, YAKE! relies on neither dictionaries nor thesauri, and it is not trained against any corpora.

Instead, it follows an unsupervised approach that builds on features extracted from the text itself, which makes it applicable to documents written in different languages without any further knowledge. This is useful for the many tasks and situations where access to training corpora is limited or restricted.
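To make the idea of "features extracted from the text" concrete, here is a minimal sketch that ranks single words using only statistics of the input document. It is not YAKE!'s scoring formula: the function name `toy_keywords`, the tiny stopword set, and the frequency/position heuristic are all assumptions made for illustration, and higher scores mean more relevant only in this toy example.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword set for this example only (not YAKE!'s list).
STOPWORDS = {"a", "an", "and", "are", "as", "for", "from", "in", "is",
             "of", "on", "the", "to", "with"}


def toy_keywords(text: str, top: int = 5) -> list[tuple[str, float]]:
    """Rank single words using only statistics of `text` itself.

    Toy heuristic (an assumption, not YAKE!'s formula): words that are
    frequent and first appear early in the document score higher.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    freq = Counter()      # number of occurrences of each word
    first_sentence = {}   # index of the sentence where each word first occurs
    for i, sentence in enumerate(sentences):
        # [^\W\d_]+ matches runs of Unicode letters, so no language-specific
        # tokenizer is needed.
        for word in re.findall(r"[^\W\d_]+", sentence.lower()):
            if word in STOPWORDS or len(word) < 3:
                continue
            freq[word] += 1
            first_sentence.setdefault(word, i)
    scores = {w: freq[w] / math.log(2 + first_sentence[w]) for w in freq}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


sample = ("Keyword extraction finds keywords. "
          "Keyword extraction needs no corpus. The corpus is optional.")
for word, score in toy_keywords(sample, top=3):
    print(f"{word}\t{score:.3f}")
```

Nothing in the sketch depends on a training set or a reference corpus; everything is computed from the one document passed in, which is the property the paragraph above describes.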

✨ Main Features

🎯 Unsupervised Approach

No need for training data or labeled corpora. Works immediately on any text.

📊 Corpus-Independent

Analyzes each document independently without requiring external reference corpora.

🌐 Domain and Language Independent

Works across different domains and languages without configuration changes.

📄 Single-Document Processing

Extracts keywords from individual documents with high accuracy.

📖 References

Citation Request

If you use YAKE! in a work that leads to a scientific publication, we would appreciate it if you would kindly cite it in your manuscript.

📄 Main Publications

Information Sciences Journal (2020)

Campos, R., Mangaravite, V., Pasquali, A., Jorge, A., Nunes, C. and Jatowt, A. (2020). YAKE! Keyword Extraction from Single Documents using Multiple Local Features. Information Sciences Journal. Elsevier, Vol. 509, pp. 257–289.

ECIR 2018 - Best Short Paper 🏆

Campos R., Mangaravite V., Pasquali A., Jorge A.M., Nunes C., and Jatowt A. (2018). A Text Feature Based Automatic Keyword Extraction Method for Single Documents. In: Pasi G., Piwowarski B., Azzopardi L., Hanbury A. (eds). Advances in Information Retrieval. ECIR 2018 (Grenoble, France, March 26–29). Lecture Notes in Computer Science, vol. 10772, pp. 684–691.

ECIR 2018 - Demo Paper

Campos R., Mangaravite V., Pasquali A., Jorge A.M., Nunes C., and Jatowt A. (2018). YAKE! Collection-independent Automatic Keyword Extractor. In: Pasi G., Piwowarski B., Azzopardi L., Hanbury A. (eds). Advances in Information Retrieval. ECIR 2018 (Grenoble, France, March 26–29). Lecture Notes in Computer Science, vol. 10772, pp. 806–810.

📜 License

View License →

🤝 Contributing

We Welcome Contributors!

When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.

Read more about becoming a contributor in our GitHub repo.