@turkic-nlp

TurkicNLP

NLP Toolkit for Turkic Languages

Open-source NLP toolkit for 24 Turkic languages
From Turkish to Sakha, Kazakh to Uyghur — tokenization, morphology, POS tagging, dependency parsing, NER, transliteration, embeddings, and machine translation in one pip install.

arXiv · License · 24 Languages · 4 Script Families


Why TurkicNLP? Over 200 million people speak a Turkic language, yet most of these languages lack basic NLP tools. TurkicNLP bridges this gap with a unified Python library covering 6 language branches (Oghuz, Kipchak, Karluk, Siberian, Oghur, and Arghu), plus historical languages like Ottoman Turkish and the Old Turkic of the runic inscriptions.

Highlights

  • Morphological analysis via rule-based FSTs (~20 languages) and neural models (21 languages)
  • Multi-script support — automatic detection and bidirectional transliteration (Cyrillic ↔ Latin, Arabic ↔ Latin, Runic → Latin)
  • Neural pipelines — POS tagging, lemmatization, dependency parsing, and NER models
  • Embeddings & translation — sentence vectors and machine translation
  • Morpheme tokenizer — hybrid neural+FST segmentation with labeled morphemes for 16 languages
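
To make the idea of automatic script detection concrete, here is a minimal sketch that uses only the Python standard library. It is not TurkicNLP's implementation: the function name `detect_script` and the majority-vote heuristic are assumptions made for illustration, and the four labels simply mirror the scripts named in the list above.

```python
import unicodedata
from collections import Counter

# Unicode character-name prefixes mapped to the script labels used above.
# This table is an illustrative assumption, not part of TurkicNLP.
SCRIPT_PREFIXES = {
    "LATIN": "Latin",
    "CYRILLIC": "Cyrillic",
    "ARABIC": "Arabic",
    "OLD TURKIC": "Runic",
}


def detect_script(text):
    """Return the most frequent script among the letters of `text`, or None."""
    counts = Counter()
    for ch in text:
        # Skip digits, punctuation, whitespace, and combining marks.
        if not ch.isalpha():
            continue
        # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER EM".
        name = unicodedata.name(ch, "")
        for prefix, script in SCRIPT_PREFIXES.items():
            if name.startswith(prefix):
                counts[script] += 1
                break
    if not counts:
        return None
    return counts.most_common(1)[0][0]


print(detect_script("Мен мектепке бардым"))  # Cyrillic
```

A majority vote keeps the sketch robust to a few foreign-script characters, such as a Latin brand name inside a Cyrillic sentence. A production detector would likely also need language-specific rules, since several Turkic languages share a script.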

Quick start

pip install "turkicnlp[all]"

import turkicnlp
turkicnlp.download("kaz")
nlp = turkicnlp.Pipeline("kaz", processors=["tokenize", "pos", "lemma", "depparse"])
doc = nlp("Мен мектепке бардым")

Explore

Library: turkic-nlp/turkicnlp
Code samples: turkic-nlp/turkic-nlp-code-samples
Paper: arXiv:2602.19174
Website: turkic-nlp.github.io

Turkic languages map
Turkic languages span from Turkey to Siberia, China to the Balkans


Maintained by Sherzod Hakimov · Contributions welcome

Popular repositories

  1. turkicnlp (Python): NLP Toolkit for Turkic Languages
  2. turkic-nlp.github.io (HTML): A web page that shows available resources for Turkic languages.
  3. .github
  4. apertium-data (Python)
  5. trained-stanza-models (Shell)
  6. generated-ud-data
