public class LengthModel
extends java.lang.Object
The model was originally proposed in: Pouliquen, Steinberger, and Ignat. Automatic Identification of Document Translations in Large Multilingual Document Collections. In: Proceedings of RANLP-2003, pp. 401-408. Borovets, Bulgaria, 2003.
It can be used as a feature for machine translation quality estimation.
It has also been used for plagiarism detection. The definition
implemented here, as well as some background, is given in:
Potthast, Barrón-Cedeño, Stein, and Rosso. Cross-Language Plagiarism
Detection. Language Resources and Evaluation (LRE), Special Issue on
Plagiarism and Authorship Analysis 45(1), pp. 1-18. Springer
Netherlands (2011)
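As an illustration of the definition referenced above, the following is a minimal sketch of a Gaussian length-ratio score. It assumes the form exp(-0.5 * ((|t|/|s| - mu) / sigma)^2), with lengths measured in characters, as we understand it from the cited paper. The class name LengthModelSketch, the method estimate, and its parameters are hypothetical and are not part of this class's documented API.

```java
// Hypothetical sketch; not the actual code of the LengthModel class.
final class LengthModelSketch {

    /**
     * Scores how plausible it is that a target text of targetLength
     * characters is a translation of a source text of sourceLength
     * characters. mu and sigma are assumed to be the mean and standard
     * deviation of the target/source length ratio for the language pair.
     * Returns a value in (0, 1]; 1.0 means the ratio equals mu exactly.
     */
    static double estimate(int sourceLength, int targetLength,
                           double mu, double sigma) {
        if (sourceLength <= 0 || sigma <= 0.0) {
            throw new IllegalArgumentException(
                "source length and sigma must be positive");
        }
        double ratio = (double) targetLength / sourceLength;
        double z = (ratio - mu) / sigma;
        return Math.exp(-0.5 * z * z);
    }

    public static void main(String[] args) {
        // Illustrative values only: a 100-character source and a
        // 120-character candidate translation.
        System.out.println(estimate(100, 120, 1.2, 0.35));
    }
}
```

The score decreases smoothly as the observed length ratio moves away from mu, which is what makes it usable as a continuous feature rather than a hard filter.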
The class includes a CLI that can be called as follows:
LEARNING
java -jar LengthModel.jar -l -s en.txt -t es.txt
ESTIMATION
java -jar LengthModel.jar -s en.txt -t es.test -m 1.17491349130 -d 0.34648875 -v
(The default operation is estimation.)
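The learning step can be pictured as estimating two parameters from aligned text pairs. The sketch below assumes that -m and -d in the estimation call stand for the mean and standard deviation of the target/source character-length ratio, and that the two input files are aligned line by line; neither assumption is stated in this documentation. The class LengthRatioLearner and the method learn are hypothetical names, and the population (not sample) standard deviation is an arbitrary choice made for the sketch.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the actual code of the LengthModel class.
final class LengthRatioLearner {

    /**
     * Returns {mean, standardDeviation} of length(target) / length(source)
     * over aligned pairs. Lengths are String.length() values, i.e. UTF-16
     * code units. Pairs with an empty source text are skipped.
     */
    static double[] learn(List<String> source, List<String> target) {
        if (source.size() != target.size()) {
            throw new IllegalArgumentException("inputs must be aligned");
        }
        double[] ratios = new double[source.size()];
        int n = 0;
        for (int i = 0; i < source.size(); i++) {
            int sourceLength = source.get(i).length();
            if (sourceLength == 0) {
                continue; // ratio undefined for an empty source text
            }
            ratios[n++] = (double) target.get(i).length() / sourceLength;
        }
        if (n == 0) {
            throw new IllegalArgumentException("no usable pairs");
        }
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += ratios[i];
        }
        double mean = sum / n;
        double squaredDeviations = 0.0;
        for (int i = 0; i < n; i++) {
            double diff = ratios[i] - mean;
            squaredDeviations += diff * diff;
        }
        // Population standard deviation (divides by n, not n - 1).
        return new double[] { mean, Math.sqrt(squaredDeviations / n) };
    }

    public static void main(String[] args) {
        // Toy data for illustration only.
        double[] model = learn(Arrays.asList("ab", "abcd"),
                               Arrays.asList("abc", "abcd"));
        System.out.println("mean=" + model[0] + " sd=" + model[1]);
    }
}
```

The two returned values would then play the role of the numbers passed on the estimation command line, under the assumptions stated above.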
Constructor and Description |
---|
LengthModel() |
Modifier and Type | Method and Description |
---|---|
static void | main(java.lang.String[] args) Parses the input parameters and either learns a length model from a collection or estimates the corresponding values for a set of texts. |