Open-source NLP toolkit for 24 Turkic languages
From Turkish to Sakha, Kazakh to Uyghur — tokenization, morphology, POS tagging, dependency parsing, NER, transliteration, embeddings, and machine translation in one pip install.
Why TurkicNLP? Over 200 million people speak a Turkic language, yet most lack basic NLP tools. TurkicNLP bridges this gap with a unified Python library covering 6 language branches — Oghuz, Kipchak, Karluk, Siberian, Oghur, and Arghu — plus historical languages like Ottoman Turkish and Old Turkic runic inscriptions.
- Morphological analysis via rule-based FSTs (~20 languages) and neural models (21 languages)
- Multi-script support — automatic detection and bidirectional transliteration (Cyrillic ↔ Latin, Arabic ↔ Latin, Runic → Latin)
- Neural pipelines — POS tagging, lemmatization, dependency parsing, and NER models
- Embeddings & translation — sentence vectors and machine translation
- Morpheme tokenizer — hybrid neural+FST segmentation with labeled morphemes for 16 languages
pip install "turkicnlp[all]"import turkicnlp
turkicnlp.download("kaz")
nlp = turkicnlp.Pipeline("kaz", processors=["tokenize", "pos", "lemma", "depparse"])
doc = nlp("Мен мектепке бардым")| Library | turkic-nlp/turkicnlp |
| Code samples | turkic-nlp/turkic-nlp-code-samples |
| Paper | arXiv:2602.19174 |
| Website | turkic-nlp.github.io |

Turkic languages span from Turkey to Siberia, China to the Balkans
Maintained by Sherzod Hakimov · Contributions welcome
