A study of translation edit rate with targeted human annotation

Title: A study of translation edit rate with targeted human annotation
Publication Type: Journal Articles
Year of Publication: 2006
Authors: Snover M, Dorr BJ, Schwartz R, Micciulla L, Makhoul J
Journal: Proceedings of Association for Machine Translation in the Americas
Pagination: 223-231
Date Published: 2006
Abstract

We examine a new, intuitive measure for evaluating machine-translation output that avoids the knowledge intensiveness of more meaning-based approaches, and the labor-intensiveness of human judgments. Translation Edit Rate (TER) measures the amount of editing that a human would have to perform to change a system output so it exactly matches a reference translation. We show that the single-reference variant of TER correlates as well with human judgments of MT quality as the four-reference variant of BLEU. We also define a human-targeted TER (or HTER) and show that it yields higher correlations with human judgments than BLEU, even when BLEU is given human-targeted references. Our results indicate that HTER correlates with human judgments better than HMETEOR and that the four-reference variants of TER and HTER correlate with human judgments as well as, or better than, a second human judgment does.
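
To make the idea of an edit rate concrete, the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenization, a single reference, and only word-level insertions, deletions, and substitutions. The abstract does not list the edit operations, and the published metric also counts shifts of word sequences, which this sketch omits. The function names `word_edit_distance` and `simplified_ter` are hypothetical.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance between two token lists.

    Counts insertions, deletions, and substitutions only.
    Assumption: block shifts are NOT modeled here.
    """
    # prev[j] = edits needed to turn the hypothesis prefix seen so far
    # into the first j reference tokens.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(
                prev[j] + 1,         # delete a hypothesis word
                curr[j - 1] + 1,     # insert a reference word
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


def simplified_ter(hypothesis, reference):
    """Edits needed to match the reference, divided by reference length.

    Assumption: one reference and whitespace tokenization.
    """
    hyp = hypothesis.split()
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hyp, ref) / len(ref)
```

Under these assumptions, a system output identical to the reference scores 0.0, and a five-word output missing one word of a six-word reference scores 1/6. Lower is better, since the score reflects how much editing remains to be done.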