public class Stemmer
extends java.lang.Object
The static methods getStem(String term) and getStems(String[] terms)
can be used to quickly convert a word or words to their root form. Example code:
import org.dlese.dpc.index.Stemmer;
...
String word = "oceanic";
String stem = Stemmer.getStem(word); // stem now equals 'ocean'
String string = "A group of words that need to be stemmed";
String[] words = string.split("\\s+"); // Split on white space
String[] stems = Stemmer.getStems(words);
for(int i = 0; i < stems.length; i++){
    // ... do something with stems[i] ...
}
For more information about the Porter stemming algorithm, see http://www.tartarus.org/~martin/PorterStemmer .
| Constructor and Description |
|---|
| Stemmer() - Constructor for the Stemmer object. |
| Modifier and Type | Method and Description |
|---|---|
| void | add(char ch) - Adds a character to the word being stemmed. |
| void | add(char[] w, int wLen) - Adds wLen characters from a char[] array to the word being stemmed. |
| char[] | getResultBuffer() - Returns a reference to a character buffer containing the results of the stemming process. |
| int | getResultLength() - Returns the length of the word resulting from the stemming process. |
| static java.lang.String | getStem(java.lang.String term) - Gets the stem of the given English word. |
| static java.lang.String[] | getStems(java.lang.String[] terms) - Gets the stems of the given English words. |
| static void | main(java.lang.String[] args) - Test program for demonstrating the Stemmer. |
| void | stem() - Stems the word placed into the Stemmer buffer through calls to add(). |
| static java.lang.String | stemWordsInLuceneClause(java.lang.String string) - Stems each of the words in a given Lucene clause String, returning the same String with the word parts in stemmed form. |
| static java.lang.String | stemWordsInString(java.lang.String string) - Stems each of the words or tokens in a given String, returning a String of stemmed tokens with all other characters removed. |
| java.lang.String | toString() - After a word has been stemmed, it can be retrieved by toString(), or a reference to the internal buffer can be retrieved by getResultBuffer() and getResultLength(), which is generally more efficient. |
public static final java.lang.String getStem(java.lang.String term)
term - A term in English.
public static final java.lang.String[] getStems(java.lang.String[] terms)
terms - A group of terms in English.
public static final java.lang.String stemWordsInString(java.lang.String string)
Example:
oceans and rain AND 44rains http://dlese.org/oceans
is transformed to
ocean and rain AND 44rain http dlese org ocean
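The transformation in this example can be approximated with a short, self-contained sketch: split the input on runs of non-alphanumeric characters, stem each token, and join the results with single spaces. The class StemStringSketch and the methods toyStem and stemTokens are hypothetical names introduced for illustration. toyStem only strips a plural "s" and is a stand-in, not the Porter algorithm and not the code behind stemWordsInString; in real use, Stemmer::getStem would be passed instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class StemStringSketch {

    // Hypothetical stand-in for Stemmer.getStem(): strips a plural "s" only.
    // This is NOT the Porter algorithm.
    public static String toyStem(String token) {
        if (token.length() > 3 && token.endsWith("s") && !token.endsWith("ss")) {
            return token.substring(0, token.length() - 1);
        }
        return token;
    }

    // Splits on runs of non-alphanumeric characters, stems each token,
    // and joins the stemmed tokens with single spaces.
    public static String stemTokens(String input, UnaryOperator<String> stemmer) {
        List<String> out = new ArrayList<>();
        for (String token : input.split("[^A-Za-z0-9]+")) {
            if (!token.isEmpty()) { // split() can yield a leading empty token
                out.add(stemmer.apply(token));
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String in = "oceans and rain AND 44rains http://dlese.org/oceans";
        // prints: ocean and rain AND 44rain http dlese org ocean
        System.out.println(stemTokens(in, StemStringSketch::toyStem));
    }
}
```

Passing the stemmer as a parameter keeps the tokenizing step separate from the stemming step, so the toy function can be swapped for a real one without touching the rest.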
string - A word, phrase, or any arbitrary String.
public static final java.lang.String stemWordsInLuceneClause(java.lang.String string)
Example:
titles:("oceans AND oceans44 OR 44oceans and oceanic")^20 or cooled
is transformed to
titles:("ocean AND oceans44 OR 44ocean and ocean")^20 or cool
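One way this behavior might be approximated is to replace each alphanumeric run in place, leaving every other character untouched. The sketch below is an illustration under stated assumptions, not the implementation of stemWordsInLuceneClause: the class StemClauseSketch and its methods are hypothetical names, toyStem is a toy suffix stripper standing in for the Porter stemmer, and the rule that a run directly followed by ':' is a field name to be skipped is inferred only from the example above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StemClauseSketch {

    // Hypothetical stand-in for Stemmer.getStem(): a toy suffix stripper,
    // NOT the Porter algorithm. Tokens without a listed suffix pass through.
    public static String toyStem(String token) {
        for (String suffix : new String[] {"ed", "ic", "s"}) {
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    // Possessive alphanumeric run that is not followed by ':'.
    // Assumption: a run followed by ':' is a field name and is left alone.
    private static final Pattern WORD = Pattern.compile("[A-Za-z0-9]++(?!:)");

    // Replaces each matched run with its stem; all other characters
    // (quotes, parentheses, boosts, whitespace) are copied through unchanged.
    public static String stemClause(String clause) {
        Matcher m = WORD.matcher(clause);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, Matcher.quoteReplacement(toyStem(m.group())));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String in = "titles:(\"oceans AND oceans44 OR 44oceans and oceanic\")^20 or cooled";
        // prints: titles:("ocean AND oceans44 OR 44ocean and ocean")^20 or cool
        System.out.println(stemClause(in));
    }
}
```

The possessive quantifier (++) matters here: with a plain greedy run, the regex engine would backtrack and match "title" inside "titles:", stemming part of the field name.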
string - A word, phrase, Lucene clause, or any arbitrary String.
public void add(char ch)
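ch - The character to add to the word being stemmed.
public void add(char[] w, int wLen)
w - The char[] array containing the characters to add.
wLen - The number of characters to add.
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public int getResultLength()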
public char[] getResultBuffer()
public void stem()
public static void main(java.lang.String[] args)
args - The command line arguments