TurkicNLP

Open-source NLP toolkit for 24 Turkic languages
From Turkish to Sakha, Kazakh to Uyghur — tokenization, morphology, POS tagging, dependency parsing, NER, transliteration, embeddings, and machine translation in one pip install.

Why TurkicNLP?

Over 200 million people speak a Turkic language, yet most of these languages lack basic NLP tools. TurkicNLP bridges this gap with a unified Python library covering 6 language branches (Oghuz, Kipchak, Karluk, Siberian, Oghur, and Arghu), plus historical languages like Ottoman Turkish and Old Turkic runic inscriptions.

Highlights

Morphological analysis via rule-based FSTs (~20 languages) and neural models (21 languages)
Multi-script support — automatic detection and bidirectional transliteration (Cyrillic ↔ Latin, Arabic ↔ Latin, Runic → Latin)
Neural pipelines — POS tagging, lemmatization, dependency parsing, and NER models
Embeddings & translation — sentence vectors and machine translation
Morpheme tokenizer — hybrid neural+FST segmentation with labeled morphemes for 16 languages
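
To make the idea of automatic script detection concrete, here is a minimal sketch that guesses the dominant script of a string from Unicode code point ranges. It is an illustration only and not TurkicNLP's own detection code: the `SCRIPT_RANGES` table and the `detect_script` function are hypothetical names chosen for this example, and the ranges cover only the main Unicode blocks for each script.

```python
from collections import Counter

# Hypothetical lookup table for this sketch: main Unicode blocks per script.
SCRIPT_RANGES = {
    "Cyrillic": [(0x0400, 0x052F)],
    "Arabic": [(0x0600, 0x06FF), (0x0750, 0x077F)],
    "Old Turkic": [(0x10C00, 0x10C4F)],
    "Latin": [(0x0041, 0x005A), (0x0061, 0x007A), (0x00C0, 0x024F)],
}


def detect_script(text):
    """Return the script that most alphabetic characters in `text` belong to."""
    counts = Counter()
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, and whitespace
        cp = ord(ch)
        for script, ranges in SCRIPT_RANGES.items():
            if any(lo <= cp <= hi for lo, hi in ranges):
                counts[script] += 1
                break
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]


print(detect_script("Мен мектепке бардым"))
print(detect_script("Men maktabga bordim"))
```

A majority vote over characters keeps the sketch tolerant of mixed input, such as a Cyrillic sentence that contains a Latin brand name.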

Quick start

pip install "turkicnlp[all]"

import turkicnlp
turkicnlp.download("kaz")  # download the Kazakh resources
nlp = turkicnlp.Pipeline("kaz", processors=["tokenize", "pos", "lemma", "depparse"])
doc = nlp("Мен мектепке бардым")  # Kazakh: "I went to school"
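
To give a rough sense of what morpheme-level analysis means for the sentence above, the following toy sketch strips a few hand-picked suffixes from each word. It is not the library's FST or neural segmenter and does not use the TurkicNLP API: the `SUFFIXES` table, the `STEMS` lexicon, and the `segment` function are hypothetical and cover only this one sentence.

```python
# Toy suffix table for this sentence only: (surface form, label).
SUFFIXES = [
    ("ке", "DAT"),   # dative case
    ("ды", "PAST"),  # past tense
    ("м", "1SG"),    # first person singular
]

# Tiny hypothetical lexicon of known stems.
STEMS = {"мен", "мектеп", "бар"}


def segment(word):
    """Strip known suffixes from the right until a known stem remains."""
    stem = word.lower()
    found = []
    while stem not in STEMS:
        for form, label in SUFFIXES:
            if stem.endswith(form) and len(stem) > len(form):
                found.append((form, label))
                stem = stem[: -len(form)]
                break
        else:
            break  # no suffix matched; keep the remainder as the stem
    # Suffixes were collected right to left, so reverse them for output.
    return [(stem, "STEM")] + found[::-1]


for word in "Мен мектепке бардым".split():
    print(word, segment(word))
```

A real analyzer also has to handle vowel harmony, consonant assimilation, and ambiguous segmentations, which a fixed suffix list like this one cannot do.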

Explore


Library	turkic-nlp/turkicnlp
Code samples	turkic-nlp/turkic-nlp-code-samples
Paper	arXiv:2602.19174
Website	turkic-nlp.github.io

Turkic languages span from Turkey to Siberia, China to the Balkans

Maintained by Sherzod Hakimov · Contributions welcome
