Document2Query (WikiTailor)

java.lang.Object
- cat.lump.ir.lucene.query.Document2Query

```
public class Document2Query
extends java.lang.Object
```
The contents of a document are processed into the right format for Lucene querying.
To do so, the class calls a tokenizer that behaves exactly as the one used during index generation (this depends on Lucene's Analyzers).

TODO: determine which format works better for different purposes, such as ESA.

TODO: the stemming process seems to be missing here. Check whether it is carried out at query time.

Since:

April 12 2012

Author:

albarron

Constructor Summary

Constructors

| Constructor and Description |
|---|
| `Document2Query()` |
| `Document2Query(java.util.Locale lan)` |

Method Summary

Methods

| Modifier and Type | Method and Description |
|---|---|
| `java.lang.String` | `doc2WeightQuery(java.lang.String file)`: Generates a query in which each token's relevance depends on its frequency |
| `java.lang.String` | `file2FlatQuery(java.lang.String file)`: Generates a query in which every token has the same relevance. TODO: why is the same tokenizer used for every language? |
| `static java.lang.String` | `flatQuery(java.lang.String[] tokens)`: Creates a query from all the tokens (i.e., some words may be repeated) |
| `Analyzer` | `getAnalyzer()` |
| `java.lang.String` | `str2FlatQuery(Analyzer analyzer, java.lang.String text)`: Generates a query in which every token has the same relevance |
| `java.lang.String` | `str2FlatQuery(java.lang.String text)`: Generates a query in which every token has the same relevance |
| `static java.lang.String` | `vocQuery(java.lang.String[] tokens)`: Creates a query from the vocabulary only (i.e., the types) |
| `static java.lang.String` | `weightQuery(java.lang.String[] tokens)`: Creates a query in which the relevance of a type depends on its frequency (e.g., if a token w appears 4 times, it appears as w^4) |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - Document2Query
```
public Document2Query()
```
  - Document2Query
```
public Document2Query(java.util.Locale lan)
```
- Method Detail
  - getAnalyzer
```
public Analyzer getAnalyzer()
```
  - file2FlatQuery
```
public java.lang.String file2FlatQuery(java.lang.String file)
```
    Generates a query in which every token has the same relevance. TODO: why is the same tokenizer used for every language?
    
    Parameters:
    file -
    
    Returns:
    string representation of the query
    
    Throws:
    java.io.IOException
  - str2FlatQuery
```
public java.lang.String str2FlatQuery(Analyzer analyzer,
                             java.lang.String text)
```
    Generates a query in which every token has the same relevance
    
    Parameters:
    analyzer -
    text - string representation of the query
    
    Returns:
    string with space-separated tokens
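The flat query produced by `str2FlatQuery` is the analyzer's token stream joined with spaces. The sketch below is not the WikiTailor implementation: to keep it runnable without Lucene, a plain `java.util.function.Function` stands in for `Analyzer`, and the hypothetical `SIMPLE_ANALYZER` (lowercase, split on non-letters) is an assumption. A real version would consume Lucene's `TokenStream` instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

class Str2FlatQuerySketch {
    // Hypothetical stand-in for a Lucene Analyzer: lowercases the text
    // and splits it on runs of non-letter characters.
    static final Function<String, List<String>> SIMPLE_ANALYZER = text -> {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // skip the empty leading piece produced by split
            }
        }
        return tokens;
    };

    // Flat query: every token kept in order, repeats included, space-separated.
    static String str2FlatQuery(Function<String, List<String>> analyzer, String text) {
        return String.join(" ", analyzer.apply(text));
    }

    public static void main(String[] args) {
        System.out.println(str2FlatQuery(SIMPLE_ANALYZER, "Hello, World! Hello"));
        // prints "hello world hello"
    }
}
```

Passing the analyzer as a parameter mirrors the two-argument overload: the same tokenizer can then be reused for indexing and querying, which is the consistency the class description asks for.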
  - str2FlatQuery
```
public java.lang.String str2FlatQuery(java.lang.String text)
```
    Generates a query in which every token has the same relevance
    
    Parameters:
    text - string representation of the query
    
    Returns:
    string with space-separated tokens
  - doc2WeightQuery
```
public java.lang.String doc2WeightQuery(java.lang.String file)
```
    Generates a query in which each token's relevance depends on its frequency
    
    Parameters:
    file -
    
    Returns:
    a string with space-separated tokens and weights
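A minimal sketch of what `doc2WeightQuery` does, under stated assumptions: `file` is taken to be a path to a UTF-8 text file, and lowercasing plus whitespace splitting stands in for the Lucene analyzer. Each distinct token is emitted once as `token^frequency`. The helper name `weightText` is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.StringJoiner;

class Doc2WeightQuerySketch {
    // Assumption: `file` is a path to a UTF-8 text file.
    static String doc2WeightQuery(String file) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(file)), StandardCharsets.UTF_8);
        return weightText(text);
    }

    // Hypothetical helper: lowercase + whitespace split stands in for the analyzer,
    // then each type is emitted once with its frequency as the weight.
    static String weightText(String text) {
        Map<String, Integer> freq = new LinkedHashMap<>(); // keeps first-occurrence order
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                freq.merge(t, 1, Integer::sum);
            }
        }
        StringJoiner query = new StringJoiner(" ");
        freq.forEach((term, n) -> query.add(term + "^" + n));
        return query.toString();
    }
}
```

For a file containing `Rose is a ROSE`, this sketch returns `rose^2 is^1 a^1`.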
  - vocQuery
```
public static java.lang.String vocQuery(java.lang.String[] tokens)
```
    Creates a query considering only the vocabulary (i.e. types)
    
    Parameters:
    tokens -
    
    Returns:
    a string with space-separated tokens
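A vocabulary query keeps each type once, whatever its token count. The sketch below illustrates that contract. The order of the output (first occurrence, via `LinkedHashSet`) is an assumption, because the documentation does not specify it.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class VocQuerySketch {
    // Keeps each distinct token (type) once, in order of first occurrence (assumed order).
    static String vocQuery(String[] tokens) {
        Set<String> types = new LinkedHashSet<>();
        for (String t : tokens) {
            types.add(t);
        }
        return String.join(" ", types);
    }

    public static void main(String[] args) {
        System.out.println(vocQuery(new String[] {"rose", "is", "a", "rose"}));
        // prints "rose is a"
    }
}
```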
  - flatQuery
```
public static java.lang.String flatQuery(java.lang.String[] tokens)
```
    Creates a query considering all the tokens (i.e. some words could be repeated)
    
    Parameters:
    tokens -
    
    Returns:
    string with flat query (space-separated tokens)
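In contrast to `vocQuery`, a flat query keeps every token, so repeated words stay repeated. A minimal sketch of that behavior, assuming the tokens are simply joined with single spaces:

```java
class FlatQuerySketch {
    // Joins all tokens with single spaces; duplicates are preserved.
    static String flatQuery(String[] tokens) {
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(flatQuery(new String[] {"rose", "is", "a", "rose"}));
        // prints "rose is a rose"
    }
}
```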
  - weightQuery
```
public static java.lang.String weightQuery(java.lang.String[] tokens)
```
    Creates a query in which the relevance of a type depends on its frequency (e.g., if a token w appears 4 times, it appears as w^4)
    
    Parameters:
    tokens -
    
    Returns:
    string where tokens are weighted by their frequency
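The `w^4` notation matches Lucene's classic query syntax, where `term^n` boosts a term. Below is a minimal sketch of the counting step. It assumes that types occurring once are still written as `w^1` and that the output follows first-occurrence order. The documentation confirms neither.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

class WeightQuerySketch {
    // Counts each type and emits it once as "type^frequency".
    static String weightQuery(String[] tokens) {
        Map<String, Integer> freq = new LinkedHashMap<>(); // first-occurrence order (assumption)
        for (String t : tokens) {
            freq.merge(t, 1, Integer::sum);
        }
        StringJoiner query = new StringJoiner(" ");
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            query.add(e.getKey() + "^" + e.getValue()); // singletons become "w^1" (assumption)
        }
        return query.toString();
    }

    public static void main(String[] args) {
        System.out.println(weightQuery(new String[] {"rose", "is", "a", "rose"}));
        // prints "rose^2 is^1 a^1"
    }
}
```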
