WikiTailor is a tool for extracting in-domain corpora from Wikipedia. A domain must be defined as an existing category in Wikipedia (or in Vikipèdia, or in Βικιπαίδεια, or in whichever language edition you like), and the articles belonging to that domain are extracted even if they are not tagged as such. Two extraction methods are implemented: the main one explores Wikipedia's category graph, and a secondary one is based on information retrieval techniques.
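
To give a feel for what exploring a category graph can look like, here is a minimal sketch in Python. It is not WikiTailor's code or its interface. It assumes a plain depth-limited breadth-first traversal over a toy, in-memory graph. The function name `collect_articles`, the `max_depth` parameter and the sample data are all hypothetical, and the criterion WikiTailor actually uses to decide where to stop is not reproduced here.

```python
from collections import deque


def collect_articles(root, subcategories, articles, max_depth):
    """Collect articles reachable from `root` within `max_depth` category hops.

    Hypothetical helper for illustration only: `subcategories` maps a category
    to its child categories, `articles` maps a category to its article titles.
    """
    seen = {root}                # categories already queued (the graph may contain cycles)
    queue = deque([(root, 0)])   # breadth-first frontier of (category, depth)
    found = set()
    while queue:
        category, depth = queue.popleft()
        found.update(articles.get(category, ()))
        if depth == max_depth:
            continue             # do not descend below the depth limit
        for child in subcategories.get(category, ()):
            if child not in seen:
                seen.add(child)
                queue.append((child, depth + 1))
    return found


# Toy data (assumed, not taken from Wikipedia). Note the cycle Physics -> Science.
SUBCATEGORIES = {
    "Science": ["Physics", "Biology"],
    "Physics": ["Optics", "Science"],
    "Optics": ["Lasers"],
}
ARTICLES = {
    "Science": ["Scientific method"],
    "Physics": ["Force", "Energy"],
    "Biology": ["Cell"],
    "Optics": ["Lens"],
    "Lasers": ["Laser diode"],
}

if __name__ == "__main__":
    print(sorted(collect_articles("Science", SUBCATEGORIES, ARTICLES, max_depth=1)))
```

Articles filed only under a subcategory (for example "Lens" under "Optics") are picked up once the depth limit is raised, which is how an article can end up in the domain without being tagged with the root category itself. The `seen` set is needed because a category graph is not guaranteed to be a tree.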

WikiTailor 1.0 functionalities

Available languages: Arabic, Basque, Catalan, English, French, German, Greek, Italian, Romanian, Portuguese and Spanish.

Upcoming

References

For a complete analysis of the methods implemented see: