Additional Information for the Experiments with the Wikipedia Dump of 2015

Set of selected domains

The file contains the 743 categories in ten languages. Categories are listed one per line. Within each line, the ten languages are separated by tabs in the order en, es, de, fr, ca, ar, eu, el, ro and oc, and each language field holds the pair "ID categoryName", separated by a blank space.
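The layout described above can be read with a few lines of Python. This is only a minimal sketch under stated assumptions: the names `LANGS`, `parse_line` and `read_categories` are hypothetical, the file is assumed to be UTF-8 encoded, the ID is kept as a string, and the pair is split at the first blank space on the assumption that category names may themselves contain spaces.

```python
# Language order as stated in the file description.
LANGS = ["en", "es", "de", "fr", "ca", "ar", "eu", "el", "ro", "oc"]


def parse_line(line):
    """Parse one category line into {language: (id, category_name)}.

    Assumption: every line has exactly ten tab-separated fields.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(LANGS):
        raise ValueError(
            f"expected {len(LANGS)} tab-separated fields, got {len(fields)}"
        )
    row = {}
    for lang, field in zip(LANGS, fields):
        # Split at the first blank only, so names with spaces stay intact.
        cat_id, _, name = field.partition(" ")
        row[lang] = (cat_id, name)
    return row


def read_categories(path):
    """Read the whole file; `path` is a placeholder for the local copy."""
    with open(path, encoding="utf-8") as handle:
        return [parse_line(line) for line in handle if line.strip()]
```

Each parsed row maps a language code to its `(ID, categoryName)` pair, so `row["ca"]` would give the Catalan entry of a category.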

Domain vocabularies and IDs of the articles extracted by model 50-WT100