Lucene - Analysis - Mango Docs

Last modified: 2020-11-12 04:47:49 | Author: Mango

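In the previous chapters, we saw that Lucene's IndexWriter uses an Analyzer to analyze documents and then creates, opens, or edits indexes as required. In this chapter, we discuss the various types of Analyzer objects and the other related objects used during the analysis process. Understanding the analysis process and how analyzers work will give you deeper insight into how Lucene indexes documents.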

The following is the list of objects that we will discuss in due course.

S.No.	Class	Description
1	Token	Represents a piece of text or a word in a document, together with its metadata: position, start offset, end offset, token type, and position increment.
2	TokenStream	The output of the analysis process; it consists of a series of tokens. It is an abstract class.
3	Analyzer	The abstract base class for every type of analyzer.
4	WhitespaceAnalyzer	Splits the text in a document on whitespace.
5	SimpleAnalyzer	Splits the text in a document at non-letter characters and converts the text to lowercase.
6	StopAnalyzer	Works like SimpleAnalyzer and additionally removes common words such as 'a', 'an', and 'the'.
7	StandardAnalyzer	The most sophisticated analyzer in this list; it is capable of handling names, email addresses, etc. It lowercases each token and removes common words and punctuation, if any.
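
To make the differences between the three simpler analyzers concrete, the following is a minimal sketch in plain Java that imitates the behavior described in the table. It deliberately does not use the Lucene API: the class `AnalyzerSketch`, its methods, and the three-word stop list are all hypothetical names chosen for illustration, and the real analyzers may differ in detail (for example, in their default stop-word sets).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerSketch {
    // Hypothetical stop-word list for illustration only; not Lucene's default set.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "the");

    // Imitates WhitespaceAnalyzer: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("\\s+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Imitates SimpleAnalyzer: split at non-letter characters, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Imitates StopAnalyzer: same as simple(), then drop the stop words.
    static List<String> stop(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : simple(text)) {
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "The Quick-Brown fox, a Lucene fan!";
        System.out.println(whitespace(text)); // [The, Quick-Brown, fox,, a, Lucene, fan!]
        System.out.println(simple(text));     // [the, quick, brown, fox, a, lucene, fan]
        System.out.println(stop(text));       // [quick, brown, fox, lucene, fan]
    }
}
```

The same input yields three different token lists, which is why the analyzer chosen at indexing time determines which query terms can later match a document.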