public class DomainKeywords
extends java.lang.Object
This class gets the most common terms in the articles that belong to at least one category of a given domain. A domain is defined by one root category and all of its subcategories.
Terms are stemmed and stopwords are not included.
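The counting step described above can be sketched as follows. This is a minimal illustration, not the DomainKeywords implementation: the class name `TermFrequencySketch`, the tiny stopword list, and the placeholder `stem` method are all assumptions made for the example, and a real setup would use a language-specific stopword list and a proper stemmer.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not the DomainKeywords implementation.
class TermFrequencySketch {

    // Tiny illustrative stopword list (assumption); real lists are language-specific.
    static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("the", "a", "of", "and", "in", "is"));

    // Placeholder stemmer (assumption): drops a trailing "s" from tokens longer than 3 characters.
    static String stem(String token) {
        if (token.length() > 3 && token.endsWith("s")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Counts stemmed, non-stopword terms across all article texts.
    static Map<String, Integer> countTerms(List<String> articles, Locale locale) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : articles) {
            // Lowercase, then split on any run of non-letter characters.
            for (String token : text.toLowerCase(locale).split("[^\\p{L}]+")) {
                if (token.isEmpty() || STOPWORDS.contains(token)) {
                    continue;
                }
                counts.merge(stem(token), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> articles = Arrays.asList(
                "The history of planets and stars",
                "A planet is in the sky");
        System.out.println(countTerms(articles, Locale.ENGLISH));
    }
}
```

In this sketch stopwords are filtered before stemming, so "planets" and "planet" are counted as the same term while "the" never reaches the counter.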
TODO UNITTEST!!!
Constructor and Description |
---|
DomainKeywords(java.util.Locale locale, int year) |
Modifier and Type | Method and Description |
---|---|
void | computeTF() - Gets the term frequency tuples resulting from the treatment of a set of articles. TODO this should be private!! |
void | computeTF(java.lang.String categoryName) - As computeTF() but including the title of the root category |
java.lang.String | getLang() |
java.util.List<TermFrequencyTuple> | getTermTuples() |
java.util.List<TermFrequencyTuple> | getTopTuples() |
java.util.List<TermFrequencyTuple> | getTopTuplesPlus(java.lang.String category) |
int | getYear() |
void | loadArticles(int categoryID) - Loads the articles of the given category ID. |
void | loadArticles(java.lang.String categoryName) - Loads the articles of the given category. |
static void | main(java.lang.String[] args) - Main method. |
void | setLang(java.util.Locale lang) - Sets the language. |
void | setMaxSize(int m) - Defines the maximum number of terms that should be considered as domain terms |
void | setMinNumArticles(int min) - Defines the minimum number of articles required to build the vocabulary |
void | setPercentage(int t) - Defines the percentage of terms that should be considered as domain terms |
void | setYear(int year) |
void | toFile(java.io.File file) - Saves the top list to a text file. |
void | toFile(java.io.File file, java.util.List<TermFrequencyTuple> tfs) - Saves the given list of TermFrequencyTuples into a file |
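The two toFile methods in the summary above save a list of term frequency tuples to a file. The output format is not documented here, so the following is only a sketch under assumptions: the nested `Tuple` class is a hypothetical stand-in for TermFrequencyTuple (whose accessors are not shown on this page), and one tab-separated "term, frequency" line per tuple is an assumed layout.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not the DomainKeywords implementation.
class TupleFileSketch {

    // Stand-in for TermFrequencyTuple (assumption): a term and its frequency.
    static final class Tuple {
        final String term;
        final int frequency;

        Tuple(String term, int frequency) {
            this.term = term;
            this.frequency = frequency;
        }
    }

    // Writes one "term<TAB>frequency" line per tuple, in list order (assumed format).
    static void toFile(File file, List<Tuple> tuples) throws IOException {
        try (BufferedWriter writer =
                     Files.newBufferedWriter(file.toPath(), StandardCharsets.UTF_8)) {
            for (Tuple tuple : tuples) {
                writer.write(tuple.term + "\t" + tuple.frequency);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File out = File.createTempFile("domain-terms", ".txt");
        toFile(out, Arrays.asList(new Tuple("planet", 5), new Tuple("star", 3)));
        System.out.println(Files.readAllLines(out.toPath(), StandardCharsets.UTF_8));
    }
}
```

The try-with-resources block closes the writer even if a write fails, which keeps the sketch safe to reuse for larger lists.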
public void computeTF()
articles - The Wikipedia pages to process

public void computeTF(java.lang.String categoryName)
articles - The Wikipedia pages to process
categoryName - The title of the root category

public void loadArticles(int categoryID) throws WikiApiException
categoryID - Identifier of the category. The category must exist in the Wikipedia database
WikiApiException - Raised if creating a Wikipedia connector is not possible for the given language and year.

public void loadArticles(java.lang.String categoryName) throws WikiApiException
categoryName - Name of the category
WikiApiException - Raised if creating a Wikipedia connector is not possible for the given language and year or if the category name is not valid in that Wikipedia connector

public void toFile(java.io.File file)
file - The file to save the list into.

public void toFile(java.io.File file, java.util.List<TermFrequencyTuple> tfs)
file -
tfs -

public int getYear()
public java.lang.String getLang()
public java.util.List<TermFrequencyTuple> getTermTuples()
public java.util.List<TermFrequencyTuple> getTopTuples()
public java.util.List<TermFrequencyTuple> getTopTuplesPlus(java.lang.String category)
public void setPercentage(int t)
t - an integer in the range (0,100]

public void setMaxSize(int m)
m - an integer

public void setMinNumArticles(int min)
min - an integer

public void setLang(java.util.Locale lang)
lang - The locale for the new language

public void setYear(int year)
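
How setPercentage and setMaxSize interact is not specified on this page. The sketch below shows one plausible reading, stated as an assumption: terms are sorted by descending frequency, the top t percent of distinct terms is kept (rounded up), and the result is capped at the maximum size. The class `TopTermsSketch` and the method `selectTop` are hypothetical names introduced for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration; not the DomainKeywords implementation.
class TopTermsSketch {

    // Keeps the most frequent terms: percentage of distinct terms, capped at maxSize (assumed rule).
    static List<Map.Entry<String, Integer>> selectTop(
            Map<String, Integer> counts, int percentage, int maxSize) {
        if (percentage <= 0 || percentage > 100) {
            throw new IllegalArgumentException("percentage must be in the range (0,100]");
        }
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must not be negative");
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(counts.entrySet());
        // Descending frequency; ties broken alphabetically so the result is deterministic.
        sorted.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        int byPercentage = (int) Math.ceil(sorted.size() * percentage / 100.0);
        int limit = Math.min(byPercentage, maxSize);
        return new ArrayList<>(sorted.subList(0, limit));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("planet", 5);
        counts.put("star", 3);
        counts.put("sky", 1);
        counts.put("orbit", 1);
        // Half of four distinct terms, with a generous cap.
        System.out.println(selectTop(counts, 50, 10));
    }
}
```

Validating the percentage against the documented range (0,100] up front also guarantees that the computed limit never exceeds the number of distinct terms.
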
public static void main(java.lang.String[] args)
args - List of parameters: