The script uses Python 3. You can simply run the following to clone this repository and install all of the requirements listed below:

```bash
git clone https://github.com/mahfuzibnalam/terminology_evaluation.git
cd terminology_evaluation
pip install -r requirements.txt
```
List of requirements:
- stanza
- argparse
- sacrebleu
- bs4
- lxml (needed on macOS)
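If you want to verify that the dependencies installed correctly and pre-fetch the Stanza models before running the evaluation, a minimal sketch along these lines works (the file name `sanity_check.py` and the test sentence are just illustrative; this script is not part of the repository):

```python
# sanity_check.py - optional, illustrative check of the installed requirements.
import sacrebleu
import stanza
from bs4 import BeautifulSoup

# Pre-download the French models; evaluate_term_wmt.py would otherwise
# fetch them on its first run.
stanza.download("fr")

# Build a small pipeline and lemmatize one sentence as a smoke test.
nlp = stanza.Pipeline("fr", processors="tokenize,mwt,pos,lemma")
doc = nlp("Ceci est un test.")
print([word.lemma for sent in doc.sentences for word in sent.words])

# bs4 + lxml smoke test: parse one SGML segment like those in the data/ files.
seg = BeautifulSoup("<seg id='1'>Ceci est un test.</seg>", "lxml").find("seg")
print(seg["id"], seg.get_text())

# sacrebleu smoke test: corpus-level BLEU for a single sentence pair.
print(sacrebleu.corpus_bleu(["Ceci est un test ."], [["Ceci est un test ."]]).score)
```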
The main script is `evaluate_term_wmt.py`, which accepts the following arguments:
- `--language` - The language code of the target language (e.g. `fr` for French).
- `--hypothesis` - The hypothesis file. An example file is provided at `data/en-fr.dev.txt.truecased.sgm`.
- `--source` - A file with the source references. An example file is provided at `data/dev.en-fr.en.sgm`.
- `--target_reference` - A file with the target references. An example file is provided at `data/dev.en-fr.fr.sgm`.
- `--BLEU` [True/False] - `True` by default. If `True`, the BLEU score is reported.
- `--EXACT_MATCH` [True/False] - `True` by default. If `True`, the Exact-Match score is reported.
- `--WINDOW_OVERLAP` [True/False] - `True` by default. If `True`, the Window Overlap score is reported.
- `--MOD_TER` [True/False] - `True` by default. If `True`, the TERm score is reported.
- `--TER` [True/False] - `False` by default. If `True`, the TER score is reported.
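Since `argparse` is listed among the requirements, the boolean flags above are presumably declared with it. The following is a hypothetical sketch of how such `True`/`False` string flags could be set up; it is not taken from `evaluate_term_wmt.py`, and the helper name `str_to_bool` is an assumption:

```python
import argparse

def str_to_bool(value: str) -> bool:
    # Hypothetical helper: argparse's type=bool would treat any non-empty
    # string (including "False") as True, so flags like --TER False need
    # an explicit string-to-bool conversion.
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")

parser = argparse.ArgumentParser(description="Terminology-aware MT evaluation (sketch)")
parser.add_argument("--language", required=True, help="target language code, e.g. fr")
parser.add_argument("--hypothesis", required=True, help="hypothesis .sgm file")
parser.add_argument("--source", required=True, help="source reference .sgm file")
parser.add_argument("--target_reference", required=True, help="target reference .sgm file")
parser.add_argument("--BLEU", type=str_to_bool, default=True)
parser.add_argument("--EXACT_MATCH", type=str_to_bool, default=True)
parser.add_argument("--WINDOW_OVERLAP", type=str_to_bool, default=True)
parser.add_argument("--MOD_TER", type=str_to_bool, default=True)
parser.add_argument("--TER", type=str_to_bool, default=False)

args = parser.parse_args()
print(args)
```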
You can test that your metrics work by running the following command on the sample data we provide.
```bash
python3 evaluate_term_wmt.py \
    --language fr \
    --hypothesis data/en-fr.dev.txt.truecased.sgm \
    --source data/dev.en-fr.en.sgm \
    --target_reference data/dev.en-fr.fr.sgm
```
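If you prefer to run the evaluation from another Python program (for example, after each training epoch), the same command can be wrapped with the standard-library `subprocess` module; the paths below are the sample files shipped in `data/`:

```python
import subprocess

# Run the evaluation script as a subprocess and capture everything it prints.
cmd = [
    "python3", "evaluate_term_wmt.py",
    "--language", "fr",
    "--hypothesis", "data/en-fr.dev.txt.truecased.sgm",
    "--source", "data/dev.en-fr.en.sgm",
    "--target_reference", "data/dev.en-fr.fr.sgm",
]
result = subprocess.run(cmd, capture_output=True, text=True, check=True)
print(result.stdout)
```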
Running the above command will:
- Download the French Stanza models, if they are not already available locally
- Compute the four default metrics and print the following:
```
BLEU score: 45.33867641150976
Exact-Match Statistics
Total correct: 759
Total wrong: 127
Total correct (lemma): 15
Total wrong (lemma): 0
Exact-Match Accuracy: 0.8590455049944506
Window Overlap Accuracy :
Window 2:
Exact Window Overlap Accuracy: 0.29693757867032844
Window 3:
Exact Window Overlap Accuracy: 0.2907071747339513
1 - TERm Score: 0.5976316319523398
```
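If you want these numbers programmatically rather than on stdout, one option is to capture the script's output (for example with the `subprocess` sketch above) and extract the scores with regular expressions. The label strings below are taken from the sample output shown above; if the script's print format changes, the patterns would need to be updated accordingly:

```python
import re

def parse_scores(output: str) -> dict:
    """Extract the headline scores from the evaluation script's printed output."""
    patterns = {
        "BLEU": r"BLEU score:\s*([\d.]+)",
        "Exact-Match Accuracy": r"Exact-Match Accuracy:\s*([\d.]+)",
        "1 - TERm": r"1 - TERm Score:\s*([\d.]+)",
    }
    scores = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, output)
        if match:
            scores[name] = float(match.group(1))
    # The two window-overlap accuracies appear in order (window 2, then window 3).
    windows = re.findall(r"Exact Window Overlap Accuracy:\s*([\d.]+)", output)
    for size, value in zip((2, 3), windows):
        scores[f"Window Overlap (window {size})"] = float(value)
    return scores

if __name__ == "__main__":
    sample = "BLEU score: 45.33867641150976\n1 - TERm Score: 0.5976316319523398"
    print(parse_scores(sample))
```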
Notes:
- The computation of TER or TERm can take quite some time if your data contains very long sentences. If you do not need the TERm score, you can skip it by passing `--MOD_TER False` (TER is already off by default).