public class CommonArticlesFinder
extends java.lang.Object
Use CommonNamespaceFinder instead for the following dumps.
The process queries the SQL database directly.

Modifier and Type | Field and Description |
---|---|
protected static java.lang.String | pairs_db: DB with langlinks & articles pairs |
Constructor and Description |
---|
CommonArticlesFinder(java.lang.String[] langs, int year, java.lang.String[] filesID, java.io.File folder): Instantiates the object with the provided languages. |
Modifier and Type | Method and Description |
---|---|
void | checkAllTablesAvailable(): Checks whether all the required tables are in the database. |
void | closeConnection() |
void | findIntersection(java.lang.String smallestLang): Looks for the articles that appear in all the selected languages (langs) simultaneously. |
void | findUnion(java.lang.String largestLang): Looks for the articles that appear in any of the selected languages (langs) and builds a set with their union. |
java.lang.String | getPrefixOutputFile(): Getter for the output file prefix. |
java.lang.String | lookForMaximumNumber(): Given the list of articles for every language (String[] filesID), returns the language from String[] langs with the most articles. |
java.lang.String | lookForMinimumNumber(): Given the list of articles for every language (String[] filesID), returns the language from String[] langs with the fewest articles. |
static void | main(java.lang.String[] args): Example of how to use the class. |
protected static final java.lang.String pairs_db
public CommonArticlesFinder(java.lang.String[] langs, int year, java.lang.String[] filesID, java.io.File folder) throws java.lang.Throwable
Parameters: langs[], year, folder
Throws: java.lang.Throwable
public static void main(java.lang.String[] args) throws java.lang.Throwable
Parameters: args
Throws: java.lang.Throwable
public void findUnion(java.lang.String largestLang)
Looks for the articles that appear in any of the selected languages (langs) and builds a set with their union. For languages other than the original, articles that exist in the DB but do not appear in the ID files are added.
Generates the list listUnion, from which a file is printed by getAndPrintAllInfoUnionIDs(List<Integer> listUnion, String largestLang).
Parameters: largestLang
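The union step can be illustrated with a minimal sketch. It is not the code of CommonArticlesFinder: the class name UnionSketch, the method union and the in-memory map idsByLang are hypothetical, and the sketch assumes that the IDs of all languages have already been mapped to one shared ID space, whereas the real class resolves articles through the SQL database.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the implementation of CommonArticlesFinder.
class UnionSketch {

    // idsByLang: language code -> article IDs, assumed to share one ID space.
    static List<Integer> union(Map<String, List<Integer>> idsByLang, String largestLang) {
        // Seed with the largest language so that its articles come first.
        Set<Integer> seen = new LinkedHashSet<>(
                idsByLang.getOrDefault(largestLang, Collections.emptyList()));
        for (Map.Entry<String, List<Integer>> entry : idsByLang.entrySet()) {
            if (!entry.getKey().equals(largestLang)) {
                seen.addAll(entry.getValue()); // the set ignores duplicates
            }
        }
        return new ArrayList<>(seen);
    }
}
```

Seeding the set with the largest language keeps the number of later insertions small, which may be one reason the method takes largestLang as a parameter.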
public void findIntersection(java.lang.String smallestLang)
Looks for the articles that appear in all the selected languages (langs) simultaneously. Articles that exist in the DB but do not appear in the ID files are discarded.
The "ID \t title \t" of the articles is printed to a file, with the information for all the languages concatenated in a row.
Parameters: smallestLang
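The intersection and the row layout described above can be sketched as follows. This is an illustration under stated assumptions, not the class's own code: IntersectionSketch, intersection, row and the in-memory maps are hypothetical names, and a single shared ID stands in for the per-language IDs that the real class links through the langlinks data in the database.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; not the implementation of CommonArticlesFinder.
class IntersectionSketch {

    // Keeps only the IDs that are present in every language.
    static Set<Integer> intersection(Map<String, Set<Integer>> idsByLang, String smallestLang) {
        Set<Integer> smallest = idsByLang.get(smallestLang);
        if (smallest == null) {
            throw new IllegalArgumentException("Unknown language: " + smallestLang);
        }
        // Start from the smallest set: the result can never be larger than it.
        Set<Integer> common = new TreeSet<>(smallest);
        for (Set<Integer> ids : idsByLang.values()) {
            common.retainAll(ids);
        }
        return common;
    }

    // Builds one output row: "ID \t title \t" repeated for every language.
    static String row(int id, List<String> langs, Map<String, Map<Integer, String>> titlesByLang) {
        StringBuilder sb = new StringBuilder();
        for (String lang : langs) {
            sb.append(id).append('\t').append(titlesByLang.get(lang).get(id)).append('\t');
        }
        return sb.toString();
    }
}
```

Starting from the smallest language bounds the size of the working set, which is presumably why the method expects smallestLang.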
public java.lang.String lookForMinimumNumber()
Given the list of articles for every language (String[] filesID), returns the language from String[] langs with the fewest articles.
The position of the language and the file must correspond in the arrays.
TODO: Is this better than extracting the language from the name of the file?
public java.lang.String lookForMaximumNumber()
Given the list of articles for every language (String[] filesID), returns the language from String[] langs with the most articles.
The position of the language and the file must correspond in the arrays.
public void closeConnection()
public void checkAllTablesAvailable()
public java.lang.String getPrefixOutputFile()
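The selection performed by lookForMinimumNumber() and lookForMaximumNumber() relies on parallel arrays, where position i of the language array matches position i of the file array. A minimal sketch of that idea follows. The names MinMaxSketch, pick and counts are hypothetical, and the sketch assumes the number of articles per ID file has already been counted; how the real class counts them is not stated here.

```java
// Hypothetical sketch; not the implementation of CommonArticlesFinder.
class MinMaxSketch {

    // langs[i] and counts[i] must describe the same language.
    static String pick(String[] langs, long[] counts, boolean wantMax) {
        if (langs.length == 0 || langs.length != counts.length) {
            throw new IllegalArgumentException("langs and counts must be non-empty and aligned");
        }
        int best = 0;
        for (int i = 1; i < langs.length; i++) {
            boolean better = wantMax ? counts[i] > counts[best] : counts[i] < counts[best];
            if (better) {
                best = i; // on ties the earliest position is kept
            }
        }
        return langs[best];
    }
}
```

Checking the array lengths up front makes a misaligned pair of arrays fail early, which is the main risk of the positional convention.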