LuceneTokenizer (WikiTailor)

java.lang.Object
- cat.lump.ir.lucene.query.LuceneTokenizer

```
public class LuceneTokenizer
extends java.lang.Object
```
A simple wrapper class for tokenizing text with Lucene analyzers. Partially based on http://stackoverflow.com/questions/2638200/how-to-get-a-token-from-a-lucene-tokenstream

Author:

albarron

Field Summary

Fields
Modifier and Type Field and Description

protected Analyzer analyzer

Constructor Summary

Constructors
Constructor and Description

LuceneTokenizer()

LuceneTokenizer(java.util.Locale lan)

Method Summary

Methods
Modifier and Type	Method and Description
`static java.lang.String[]`	`ngramTokenize(java.lang.String file)`
`protected void`	`setAnalyzer(java.util.Locale lan)` TODO: This code is duplicated with cat.lump.ir.lucene.engine.loadAnalyzer
`java.lang.String[]`	`standardTokenize(java.io.File file)` Tokenize the text from the given file using Lucene's StandardAnalyzer
`java.lang.String[]`	`standardTokenize(java.lang.String text)` Tokenize the text using Lucene's StandardAnalyzer

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - analyzer
```
protected Analyzer analyzer
```
- Constructor Detail
  - LuceneTokenizer
```
public LuceneTokenizer()
```
  - LuceneTokenizer
```
public LuceneTokenizer(java.util.Locale lan)
```
- Method Detail
  - setAnalyzer
```
protected void setAnalyzer(java.util.Locale lan)
```
    TODO: This code duplicates cat.lump.ir.lucene.engine.loadAnalyzer
    
    Parameters:
    lan - the locale (language) used to select the analyzer
  - standardTokenize
```
public java.lang.String[] standardTokenize(java.io.File file)
                                    throws java.io.IOException
```
    Tokenize the text from the given file using Lucene's StandardAnalyzer.
    
    Parameters:
    file - the file whose text is tokenized
    
    Returns:
    tokens in the file
    
    Throws:
    
    java.io.IOException
  - standardTokenize
```
public java.lang.String[] standardTokenize(java.lang.String text)
```
    Tokenize the given text using Lucene's StandardAnalyzer.
    
    Parameters:
    text - the text to tokenize
    
    Returns:
    tokens in the text
  - ngramTokenize
```
public static java.lang.String[] ngramTokenize(java.lang.String file)
```
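The two `standardTokenize` overloads documented above take either a `String` or a `File` and return an array of tokens. The sketch below shows that calling shape using only the JDK, so it runs without Lucene or WikiTailor on the classpath. It is an illustration under stated assumptions, not the class's actual implementation. `SimpleWordTokenizer` is a hypothetical name. `java.text.BreakIterator` stands in for Lucene's `StandardAnalyzer`, and its output may differ from Lucene's (for example, no stop-word handling). Having the file overload read the file as UTF-8 and delegate to the string overload is an assumption the source does not confirm.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical JDK-only stand-in; not part of WikiTailor or Lucene. */
final class SimpleWordTokenizer {

    private final Locale locale;

    SimpleWordTokenizer(Locale locale) {
        this.locale = locale;
    }

    /** Splits text at word boundaries, lowercases, and drops spans with no letter or digit. */
    String[] tokenize(String text) {
        BreakIterator words = BreakIterator.getWordInstance(locale);
        words.setText(text);
        List<String> tokens = new ArrayList<>();
        int start = words.first();
        for (int end = words.next(); end != BreakIterator.DONE; start = end, end = words.next()) {
            String candidate = text.substring(start, end);
            // Punctuation and whitespace spans contain no letter or digit, so they are skipped.
            if (candidate.codePoints().anyMatch(Character::isLetterOrDigit)) {
                tokens.add(candidate.toLowerCase(locale));
            }
        }
        return tokens.toArray(new String[0]);
    }

    /** Reads the file as UTF-8 (an assumption) and delegates to tokenize(String). */
    String[] tokenize(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return tokenize(text);
    }

    public static void main(String[] args) {
        SimpleWordTokenizer tokenizer = new SimpleWordTokenizer(Locale.ENGLISH);
        // Prints: hello|world|lucene|rocks
        System.out.println(String.join("|", tokenizer.tokenize("Hello, World! Lucene rocks.")));
    }
}
```

With the real class, the corresponding calls would be `new LuceneTokenizer(locale).standardTokenize(text)` and `standardTokenize(file)`. The latter declares `java.io.IOException`, so callers need to catch or propagate it.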
