ToxicClassifier

Toxic classifiers developed for GATE Cloud. Two models are available:

kaggle: trained on the Kaggle Toxic Comments Challenge dataset.
olid: trained on the OLIDv1 dataset from OffensEval 2019 (paper)

We fine-tuned a RoBERTa-base model using the simpletransformers toolkit.

Requirements

python=3.8
pandas
tqdm
pytorch
simpletransformers

Pre-defined environments

conda

conda env create -f environment.yml

pip

pip install -r requirements.txt

If the above does not work, or if you want to use GPUs, follow the simpletransformers installation steps: https://simpletransformers.ai/docs/installation/
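After installing with either method, a quick check that the required packages are importable can save a confusing failure later. The following is a minimal sketch, not part of this repository: the helper name missing_modules is hypothetical, and it assumes the usual import names for the packages listed under Requirements (the pytorch package is imported as torch).

```python
from importlib.util import find_spec

# Import names for the packages listed under Requirements.
# Assumption: the "pytorch" package is imported as "torch".
REQUIRED_MODULES = ("pandas", "tqdm", "torch", "simpletransformers")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

An empty result only means the packages can be located; it does not confirm that GPU support is configured.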

Models

Download a model archive from the latest release of this repository (currently available: kaggle.tar.gz and olid.tar.gz).
Decompress the archive inside models/en/ (this creates models/en/kaggle or models/en/olid, respectively).
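The decompression step can also be done from Python. This is a minimal sketch that assumes the archive has already been downloaded to a local path and that it contains a single top-level folder (kaggle or olid), as described above. The function name extract_model is hypothetical and not part of this repository.

```python
import tarfile
from pathlib import Path


def extract_model(archive_path, models_dir="models/en"):
    """Unpack a downloaded model archive (e.g. kaggle.tar.gz) into models_dir."""
    target = Path(models_dir)
    target.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as archive:
        archive.extractall(target)
    # Return the top-level entries now present, e.g. ["kaggle"].
    return sorted(p.name for p in target.iterdir())
```

For example, extract_model("kaggle.tar.gz") run from the repository root should leave the model files under models/en/kaggle. Only extract archives obtained from the release page of this repository, since extractall trusts the paths stored in the archive.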

Basic Usage

python __main__.py -t "This is a test" (should return 0 = non-toxic)

python __main__.py -t "Bastard!" (should return 1 = toxic)

Options

t: text
l: language (currently only supports "en")
c: classifier (currently supports "kaggle" and "olid" -- default="kaggle")
g: gpu (default=False)
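To make the option semantics concrete, here is a sketch of an argparse parser that mirrors the options listed above. It is an illustration only and may differ from the actual __main__.py: the long option names (--text, --language, --classifier, --gpu) are hypothetical, and treating g as an on/off flag is an assumption based on its default of False.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Classify a text as toxic (1) or non-toxic (0)."
    )
    parser.add_argument("-t", "--text", required=True, help="text to classify")
    parser.add_argument(
        "-l", "--language", default="en", choices=["en"],
        help='language (currently only "en")',
    )
    parser.add_argument(
        "-c", "--classifier", default="kaggle", choices=["kaggle", "olid"],
        help="which trained model to use",
    )
    # Assumption: the GPU option is a switch that is off unless given.
    parser.add_argument("-g", "--gpu", action="store_true", help="run on GPU")
    return parser


if __name__ == "__main__":
    demo = build_parser().parse_args(["-t", "This is a test", "-c", "olid"])
    print(demo.text, demo.language, demo.classifier, demo.gpu)
```

Restricting the language and classifier values with choices makes an unsupported value fail at parse time rather than when the model is loaded.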

Output

The output consists of the predicted class and the probability of each class.
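The README does not say how the class probabilities are computed. A common approach, shown here as a sketch under that assumption, is to apply a softmax to the raw per-class scores produced by the model and take the highest-probability class as the prediction. The function names below are hypothetical and not part of this repository.

```python
import math


def to_probabilities(scores):
    """Convert raw per-class scores into probabilities with a stable softmax."""
    peak = max(scores)  # subtract the maximum to avoid overflow in exp()
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def format_output(scores):
    """Return (predicted_class, probabilities) for one text.

    Index 0 is taken to mean non-toxic and index 1 toxic, matching the
    class labels described under Basic Usage.
    """
    probs = to_probabilities(scores)
    predicted = max(range(len(probs)), key=probs.__getitem__)
    return predicted, probs


if __name__ == "__main__":
    # Made-up scores for illustration, not real model output.
    print(format_output([2.0, -1.0]))
```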

REST Service

Pre-built Docker images are available for a REST service that accepts text and returns a classification according to the relevant model - see the "packages" section for more details.
