Add training and export functionality with Python bindings #105

mosuka · 2025-10-02T15:29:22Z

This commit introduces model training and dictionary export capabilities
to lindera-python, enabling users to train custom morphological analysis
models from annotated corpus data.

Features:

- Add train() function to train CRF-based models from a corpus
  - Supports L1 regularization, configurable iterations, and multi-threading
  - Accepts a seed lexicon, a corpus, and character, unknown-word, and feature definitions
- Add export() function to export trained models to dictionary files
  - Generates lex.csv, matrix.def, unk.def, and char.def
  - Optionally updates metadata.json
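The export bullets list the files a successful export should produce. The sketch below is a rough post-export sanity check that only looks for those file names in an output directory. check_export_dir is a hypothetical helper, not part of lindera-python. It does not parse or validate file contents, and it treats metadata.json as optional to match the description above.

```python
from pathlib import Path

# File names taken from the export() description; metadata.json is optional.
REQUIRED_FILES = ("lex.csv", "matrix.def", "unk.def", "char.def")
OPTIONAL_FILES = ("metadata.json",)


def check_export_dir(output_dir):
    """Return (missing_required, present_optional) for an export directory.

    Hypothetical helper: it checks only that the expected files exist,
    not that their contents are valid dictionary data.
    """
    out = Path(output_dir)
    missing = [name for name in REQUIRED_FILES if not (out / name).is_file()]
    present_optional = [name for name in OPTIONAL_FILES if (out / name).is_file()]
    return missing, present_optional
```

An empty `missing` list means every required file is present; the second value reports whether metadata.json was written.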

Implementation:

- New src/trainer.rs module with PyO3 bindings for train/export
- Add 'train' feature flag in Cargo.toml (requires lindera/train)
- Use local lindera path (../lindera/lindera) for latest trainer API
- Add num_cpus dependency for automatic thread detection
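On the Rust side, num_cpus supplies the detected thread count. A minimal Python sketch of the same idea follows, assuming that a thread count of None or 0 means "auto-detect". That convention is an assumption for illustration; the PR does not say how the binding interprets its thread parameter. resolve_num_threads is a hypothetical name. os.cpu_count() reports logical CPUs and may return None, so the sketch falls back to one thread.

```python
import os


def resolve_num_threads(requested=None):
    """Pick a worker-thread count for training.

    Hypothetical helper: None or 0 means "auto-detect" (an assumed
    convention), roughly the role num_cpus plays in the Rust bindings.
    """
    if requested is None or requested == 0:
        # os.cpu_count() can return None when the count is undeterminable.
        return os.cpu_count() or 1
    if requested < 0:
        raise ValueError("thread count must be non-negative")
    return requested
```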

Documentation:

- Update README.md with training/export usage examples
- Add examples/train_and_export.py with a complete workflow demonstration
- Add tests/test_trainer.py with comprehensive test coverage
- Corpus format follows lindera/resources/training conventions
  (tab-separated surface form and features, with an EOS marker ending each sentence)
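A minimal reader for that corpus layout is sketched below. It assumes features are comma-separated in the second column (the MeCab/IPADIC convention) and that each sentence is closed by a line containing only EOS. parse_corpus is a hypothetical helper for inspecting or validating a corpus before training; it is not part of lindera-python.

```python
def parse_corpus(text):
    """Split a training corpus into sentences of (surface, features) pairs.

    Assumed layout: one token per line as "surface<TAB>f1,f2,...",
    with a line containing only "EOS" closing each sentence.
    """
    sentences = []
    current = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line:
            continue  # tolerate blank lines between sentences
        if line == "EOS":
            sentences.append(current)
            current = []
            continue
        surface, sep, features = line.partition("\t")
        if not sep:
            raise ValueError(f"line {lineno}: expected a tab after the surface form")
        current.append((surface, features.split(",")))
    if current:
        raise ValueError("corpus ends without a closing EOS marker")
    return sentences


if __name__ == "__main__":
    sample = "外国\t名詞,一般\n人\t名詞,接尾\nEOS\n"
    for sentence in parse_corpus(sample):
        print([surface for surface, _ in sentence])
```

Raising on a missing tab or a trailing sentence without EOS surfaces formatting mistakes early, before they reach train().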

Changes:

- Modified: Cargo.toml, src/lib.rs, README.md
- Added: src/trainer.rs, examples/train_and_export.py, tests/test_trainer.py
- Updated: Cargo.lock, poetry.lock, pyproject.toml, Makefile

mosuka added 2 commits October 3, 2025 00:28

- Add training and export functionality with Python bindings (6d6225a)
- Fix lindera version (645406a)

mosuka merged commit f8a3652 into main Oct 2, 2025
5 checks passed

mosuka deleted the training branch October 2, 2025 23:09
