strusPython
0.14
Analyzer object representing a program for segmenting, tokenizing and normalizing a document into atomic parts that can be inserted into a storage and retrieved from there.
#include <bindingObjects.hpp>
Public Member Functions
DocumentAnalyzer (const DocumentAnalyzer &o)
Copy constructor.
~DocumentAnalyzer ()
Destructor.
void | addSearchIndexFeature (const String &type, const String &selectexpr, const Tokenizer &tokenizer, const NormalizerVector &normalizers, const String &options=String()) |
Define how a feature to insert into the inverted index (search index) is selected, tokenized and normalized.
void | addForwardIndexFeature (const String &type, const String &selectexpr, const Tokenizer &tokenizer, const NormalizerVector &normalizers, const String &options=String()) |
Define how a feature to insert into the forward index (for summarization) is selected, tokenized and normalized.
void | defineMetaData (const String &fieldname, const String &selectexpr, const Tokenizer &tokenizer, const NormalizerVector &normalizers) |
Define how a feature to insert as meta data is selected, tokenized and normalized.
void | defineAggregatedMetaData (const String &fieldname, const Aggregator &function) |
Declare some aggregated value of the document to be put into the meta data table used for restrictions, weighting and summarization.
void | defineAttribute (const String &attribname, const String &selectexpr, const Tokenizer &tokenizer, const NormalizerVector &normalizers) |
Define how a feature to insert as document attribute (for summarization) is selected, tokenized and normalized.
void | addSearchIndexFeatureFromPatternMatch (const String &type, const String &patternTypeName, const NormalizerVector &normalizers, const String &options=String()) |
void | addForwardIndexFeatureFromPatternMatch (const String &type, const String &patternTypeName, const NormalizerVector &normalizers, const String &options=String()) |
void | defineMetaDataFromPatternMatch (const String &fieldname, const String &patternTypeName, const NormalizerVector &normalizers) |
void | defineAttributeFromPatternMatch (const String &attribname, const String &patternTypeName, const NormalizerVector &normalizers) |
void | definePatternMatcherPostProc (const String &patternTypeName, const String &patternMatcherModule, const PatternMatcher &patterns) |
Declare a pattern matcher on the document features after other document analysis.
void | definePatternMatcherPostProcFromFile (const String &patternTypeName, const String &patternMatcherModule, const String &serializedPatternFile) |
Declare a pattern matcher on the document features after other document analysis.
void | defineDocument (const String &subDocumentTypeName, const String &selectexpr) |
Declare a sub document for the handling of multi part documents in an analyzed content.
Document | analyze (const String &content) |
Analyze the content and return the set of features to insert.
Document | analyze (const String &content, const DocumentClass &dclass) |
Analyze the content and return the set of features to insert.
DocumentAnalyzeQueue | createQueue () const |
Creates a queue for multi document analysis.
Friends
class | Context |
Constructor used by Context.
Analyzer object representing a program for segmenting, tokenizing and normalizing a document into atomic parts that can be inserted into a storage and retrieved from there.
strus::DocumentAnalyzer::DocumentAnalyzer | ( | const DocumentAnalyzer & | o | ) |
Copy constructor.
strus::DocumentAnalyzer::~DocumentAnalyzer | ( | ) | inline |
Destructor.
void strus::DocumentAnalyzer::addForwardIndexFeature | ( | const String & | type, |
const String & | selectexpr, | ||
const Tokenizer & | tokenizer, | ||
const NormalizerVector & | normalizers, | ||
const String & | options = String() |
||
) |
Define how a feature to insert into the forward index (for summarization) is selected, tokenized and normalized.
[in] | type | type of the features produced |
[in] | selectexpr | expression selecting the elements to fetch for producing this feature |
[in] | tokenizer | tokenizer function description to use for this feature |
[in] | normalizers | list of normalizer function descriptions to apply to this feature, in ascending order of appearance |
[in] | options | a list of options as string, elements separated by ',', one of {"BindPosPred" => the position is bound to the preceding feature, "BindPosSucc" => the position is bound to the succeeding feature} |
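The format of the `options` string can be illustrated with a small standalone parser. This is only a sketch: `parse_feature_options` and `KNOWN_OPTIONS` are hypothetical names that are not part of the strus bindings, and the handling of surrounding whitespace and of unknown option names is an assumption.

```python
# Hypothetical helper, NOT part of strus: splits an options string
# such as "BindPosPred" into a validated list of option names.
KNOWN_OPTIONS = {"BindPosPred", "BindPosSucc"}


def parse_feature_options(options=""):
    """Return the option names found in a ','-separated options string."""
    # Assumption: whitespace around names and empty elements are ignored.
    names = [part.strip() for part in options.split(",") if part.strip()]
    for name in names:
        if name not in KNOWN_OPTIONS:
            raise ValueError("unknown feature option: " + name)
    return names
```

An empty string, which corresponds to the default `String()`, yields an empty list of options in this sketch.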
void strus::DocumentAnalyzer::addForwardIndexFeatureFromPatternMatch | ( | const String & | type, |
const String & | patternTypeName, | ||
const NormalizerVector & | normalizers, | ||
const String & | options = String() |
||
) |
void strus::DocumentAnalyzer::addSearchIndexFeature | ( | const String & | type, |
const String & | selectexpr, | ||
const Tokenizer & | tokenizer, | ||
const NormalizerVector & | normalizers, | ||
const String & | options = String() |
||
) |
Define how a feature to insert into the inverted index (search index) is selected, tokenized and normalized.
[in] | type | type of the features produced |
[in] | selectexpr | expression selecting the elements to fetch for producing this feature |
[in] | tokenizer | tokenizer function description to use for this feature |
[in] | normalizers | list of normalizer function descriptions to apply to this feature, in ascending order of appearance |
[in] | options | a list of options as string, elements separated by ',', one of {"BindPosPred" => the position is bound to the preceding feature, "BindPosSucc" => the position is bound to the succeeding feature} |
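The ordering of `normalizers` matters. The following standalone sketch shows why, assuming that each normalizer receives the output of the previous one. `apply_normalizers` is a hypothetical function used for illustration only; plain Python callables stand in for the normalizer function descriptions of the bindings.

```python
# Hypothetical illustration, NOT strus code: applies a list of
# normalizer callables to a token in ascending order of appearance.
def apply_normalizers(token, normalizers):
    value = token
    for normalize in normalizers:
        # Assumption: the result of one normalizer feeds the next one.
        value = normalize(value)
    return value


def first_three(text):
    """Toy normalizer: keep at most the first three characters."""
    return text[:3]


# The same two normalizers give different results in different orders.
trimmed_first = apply_normalizers("  Hello ", [str.strip, first_three])
cut_first = apply_normalizers("  Hello ", [first_three, str.strip])
```

Here `trimmed_first` and `cut_first` differ, because stripping whitespace before cutting keeps three letters while cutting first keeps mostly whitespace.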
void strus::DocumentAnalyzer::addSearchIndexFeatureFromPatternMatch | ( | const String & | type, |
const String & | patternTypeName, | ||
const NormalizerVector & | normalizers, | ||
const String & | options = String() |
||
) |
Document strus::DocumentAnalyzer::analyze | ( | const String & | content | ) |
Analyze the content and return the set of features to insert.
[in] | content | content string (NOT a file name!) of the document to analyze |
Document strus::DocumentAnalyzer::analyze | ( | const String & | content, |
const DocumentClass & | dclass | ||
) |
Analyze the content and return the set of features to insert.
[in] | content | content string (NOT a file name!) of the document to analyze |
[in] | dclass | document class of the document to analyze |
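Because `content` is the document text itself and not a file name, a document stored on disk has to be read into a string first. A minimal sketch of that step follows. `read_document_content` is a hypothetical helper, the UTF-8 default is an assumption, and the commented call only indicates where the result would be passed.

```python
# Hypothetical helper, NOT part of strus: loads a document from disk
# so that its content (not its path) can be handed to analyze().
def read_document_content(path, encoding="utf-8"):
    # Assumption: the document is a text file in the given encoding.
    with open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Intended use, assuming `analyzer` is a DocumentAnalyzer obtained elsewhere:
#   content = read_document_content("doc.xml")
#   doc = analyzer.analyze(content)
```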
DocumentAnalyzeQueue strus::DocumentAnalyzer::createQueue | ( | ) | const |
Creates a queue for multi document analysis.
void strus::DocumentAnalyzer::defineAggregatedMetaData | ( | const String & | fieldname, |
const Aggregator & | function | ||
) |
Declare some aggregated value of the document to be put into the meta data table used for restrictions, weighting and summarization.
[in] | fieldname | name of the addressed meta data field. |
[in] | function | defining how and from what the value is aggregated |
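As a rough illustration of what an aggregated meta data value can be, the sketch below counts the features of one type in a document. It is not strus code: `count_features` is a hypothetical stand-in for an aggregator, and the representation of features as `(type, value)` pairs is an assumption made for the example.

```python
# Hypothetical aggregator, NOT part of strus: counts how many features
# of a given type a document contains.
def count_features(features, feature_type):
    # Assumption: each feature is a (type, value) pair.
    return sum(1 for ftype, _value in features if ftype == feature_type)


features = [("word", "hello"), ("word", "world"), ("title", "greeting")]
doclen = count_features(features, "word")
```

A count of this kind could serve as a document length value stored in a meta data field and used later for weighting.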
void strus::DocumentAnalyzer::defineAttribute | ( | const String & | attribname, |
const String & | selectexpr, | ||
const Tokenizer & | tokenizer, | ||
const NormalizerVector & | normalizers | ||
) |
Define how a feature to insert as document attribute (for summarization) is selected, tokenized and normalized.
[in] | attribname | name of the addressed attribute. |
[in] | selectexpr | expression selecting the elements to fetch for producing this feature |
[in] | tokenizer | tokenizer function description to use for this feature |
[in] | normalizers | list of normalizer function descriptions to apply to this feature, in ascending order of appearance |
void strus::DocumentAnalyzer::defineAttributeFromPatternMatch | ( | const String & | attribname, |
const String & | patternTypeName, | ||
const NormalizerVector & | normalizers | ||
) |
void strus::DocumentAnalyzer::defineDocument | ( | const String & | subDocumentTypeName, |
const String & | selectexpr | ||
) |
Declare a sub document for the handling of multi part documents in an analyzed content.
[in] | selectexpr | an expression that defines the content of the sub document declared |
[in] | subDocumentTypeName | type name assigned to this sub document |
void strus::DocumentAnalyzer::defineMetaData | ( | const String & | fieldname, |
const String & | selectexpr, | ||
const Tokenizer & | tokenizer, | ||
const NormalizerVector & | normalizers | ||
) |
Define how a feature to insert as meta data is selected, tokenized and normalized.
[in] | fieldname | name of the addressed meta data field. |
[in] | selectexpr | expression selecting the elements to fetch for producing this feature |
[in] | tokenizer | tokenizer function description to use for this feature |
[in] | normalizers | list of normalizer function descriptions to apply to this feature, in ascending order of appearance |
void strus::DocumentAnalyzer::defineMetaDataFromPatternMatch | ( | const String & | fieldname, |
const String & | patternTypeName, | ||
const NormalizerVector & | normalizers | ||
) |
void strus::DocumentAnalyzer::definePatternMatcherPostProc | ( | const String & | patternTypeName, |
const String & | patternMatcherModule, | ||
const PatternMatcher & | patterns | ||
) |
Declare a pattern matcher on the document features after other document analysis.
[in] | patternTypeName | name of the type to assign to the pattern matching results |
[in] | patternMatcherModule | module id of pattern matcher to use (empty string for default) |
[in] | patterns | structure with all patterns |
void strus::DocumentAnalyzer::definePatternMatcherPostProcFromFile | ( | const String & | patternTypeName, |
const String & | patternMatcherModule, | ||
const String & | serializedPatternFile | ||
) |
Declare a pattern matcher on the document features after other document analysis.
[in] | patternTypeName | name of the type to assign to the pattern matching results |
[in] | patternMatcherModule | module id of pattern matcher to use (empty string for default) |
[in] | serializedPatternFile | path to file with serialized (binary) patterns |