strusPython  0.14
strus::DocumentAnalyzer Class Reference

Analyzer object representing a program for segmenting, tokenizing and normalizing a document into atomic parts that can be inserted into a storage and retrieved from there.

#include <bindingObjects.hpp>

Public Member Functions

 DocumentAnalyzer (const DocumentAnalyzer &o)
 Copy constructor.
 
 ~DocumentAnalyzer ()
 Destructor.
 
void addSearchIndexFeature (const String &type, const String &selectexpr, const Tokenizer &tokenizer, const NormalizerVector &normalizers, const String &options=String())
 Define how a feature to insert into the inverted index (search index) is selected, tokenized and normalized.
 
void addForwardIndexFeature (const String &type, const String &selectexpr, const Tokenizer &tokenizer, const NormalizerVector &normalizers, const String &options=String())
 Define how a feature to insert into the forward index (for summarization) is selected, tokenized and normalized.
 
void defineMetaData (const String &fieldname, const String &selectexpr, const Tokenizer &tokenizer, const NormalizerVector &normalizers)
 Define how a feature to insert as meta data is selected, tokenized and normalized.
 
void defineAggregatedMetaData (const String &fieldname, const Aggregator &function)
 Declare some aggregated value of the document to be put into the meta data table used for restrictions, weighting and summarization.
 
void defineAttribute (const String &attribname, const String &selectexpr, const Tokenizer &tokenizer, const NormalizerVector &normalizers)
 Define how a feature to insert as document attribute (for summarization) is selected, tokenized and normalized.
 
void addSearchIndexFeatureFromPatternMatch (const String &type, const String &patternTypeName, const NormalizerVector &normalizers, const String &options=String())
 
void addForwardIndexFeatureFromPatternMatch (const String &type, const String &patternTypeName, const NormalizerVector &normalizers, const String &options=String())
 
void defineMetaDataFromPatternMatch (const String &fieldname, const String &patternTypeName, const NormalizerVector &normalizers)
 
void defineAttributeFromPatternMatch (const String &attribname, const String &patternTypeName, const NormalizerVector &normalizers)
 
void definePatternMatcherPostProc (const String &patternTypeName, const String &patternMatcherModule, const PatternMatcher &patterns)
 Declare a pattern matcher that runs on the document features after the other document analysis steps.
 
void definePatternMatcherPostProcFromFile (const String &patternTypeName, const String &patternMatcherModule, const String &serializedPatternFile)
 Declare a pattern matcher that runs on the document features after the other document analysis steps.
 
void defineDocument (const String &subDocumentTypeName, const String &selectexpr)
 Declare a sub document for handling multi-part documents in the analyzed content.
 
Document analyze (const String &content)
 Analyze the content and return the set of features to insert.
 
Document analyze (const String &content, const DocumentClass &dclass)
 Analyze the content and return the set of features to insert.
 
DocumentAnalyzeQueue createQueue () const
 Creates a queue for multi document analysis.
 

Friends

class Context
 Constructor used by Context.
 

Detailed Description

Analyzer object representing a program for segmenting, tokenizing and normalizing a document into atomic parts that can be inserted into a storage and retrieved from there.

Remarks
The only way to construct a document analyzer instance is by calling Context::createDocumentAnalyzer().

Constructor & Destructor Documentation

strus::DocumentAnalyzer::DocumentAnalyzer ( const DocumentAnalyzer & o)

Copy constructor.

strus::DocumentAnalyzer::~DocumentAnalyzer ( )
inline

Destructor.

Member Function Documentation

void strus::DocumentAnalyzer::addForwardIndexFeature ( const String & type,
const String & selectexpr,
const Tokenizer & tokenizer,
const NormalizerVector & normalizers,
const String & options = String()
)

Define how a feature to insert into the forward index (for summarization) is selected, tokenized and normalized.

Parameters
[in] type: type of the features produced
[in] selectexpr: expression selecting the elements to fetch for producing this feature
[in] tokenizer: tokenizer function description to use for this feature
[in] normalizers: list of normalizer function descriptions to use for this feature, in ascending order of appearance
[in] options: comma-separated list of options; each element is one of {"BindPosPred" => the position is bound to the preceding feature, "BindPosSucc" => the position is bound to the succeeding feature}
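The options argument is a plain comma-separated string. A minimal sketch of how a Python caller might assemble and check it before passing it to addForwardIndexFeature or addSearchIndexFeature; build_feature_options and parse_feature_options are hypothetical helpers, not part of strus, and they only know the two option names documented above.

```python
from typing import List

# The two option names documented for addSearchIndexFeature/addForwardIndexFeature.
KNOWN_FEATURE_OPTIONS = ("BindPosPred", "BindPosSucc")


def build_feature_options(*names: str) -> str:
    """Join option names into the comma-separated `options` string.

    Hypothetical helper: it checks names against the documented options
    and does not call into strus. No names gives "", like the String() default.
    """
    for name in names:
        if name not in KNOWN_FEATURE_OPTIONS:
            raise ValueError(f"unknown feature option: {name!r}")
    return ",".join(names)


def parse_feature_options(options: str) -> List[str]:
    """Split an `options` string back into its elements ("" means no options)."""
    if not options:
        return []
    return [item.strip() for item in options.split(",")]


# Example: request that a feature share the position of the preceding feature.
print(build_feature_options("BindPosPred"))  # BindPosPred
```

Rejecting unknown names early keeps a misspelled option from silently reaching the analyzer definition.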

void strus::DocumentAnalyzer::addForwardIndexFeatureFromPatternMatch ( const String & type,
const String & patternTypeName,
const NormalizerVector & normalizers,
const String & options = String()
)

void strus::DocumentAnalyzer::addSearchIndexFeature ( const String & type,
const String & selectexpr,
const Tokenizer & tokenizer,
const NormalizerVector & normalizers,
const String & options = String()
)

Define how a feature to insert into the inverted index (search index) is selected, tokenized and normalized.

Parameters
[in] type: type of the features produced
[in] selectexpr: expression selecting the elements to fetch for producing this feature
[in] tokenizer: tokenizer function description to use for this feature
[in] normalizers: list of normalizer function descriptions to use for this feature, in ascending order of appearance
[in] options: comma-separated list of options; each element is one of {"BindPosPred" => the position is bound to the preceding feature, "BindPosSucc" => the position is bound to the succeeding feature}
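The two options only affect how term positions are assigned. The toy model below is our own reading of the one-line descriptions above, not strus's implementation: assign_positions is hypothetical, tokens form a single flat stream, and the handling of a leading BindPosPred token or a trailing BindPosSucc token is an assumption.

```python
from typing import List, Optional, Tuple


def assign_positions(tokens: List[Tuple[str, Optional[str]]]) -> List[Tuple[str, int]]:
    """Toy model of term positions with the BindPosPred/BindPosSucc options.

    Each token is (value, option) where option is None, "BindPosPred" or
    "BindPosSucc". Unbound tokens get the next position; BindPosPred tokens
    reuse the current position; BindPosSucc tokens wait for the next unbound
    token and take its position.
    """
    result: list = []
    pos = 0
    pending: List[int] = []  # indices of BindPosSucc tokens still waiting
    for value, option in tokens:
        if option == "BindPosPred" and pos > 0:
            result.append((value, pos))  # share the predecessor's position
        elif option == "BindPosSucc":
            pending.append(len(result))
            result.append((value, None))  # filled in by the next unbound token
        else:
            pos += 1  # unbound token (or BindPosPred with no predecessor)
            result.append((value, pos))
            for i in pending:
                result[i] = (result[i][0], pos)
            pending.clear()
    # Assumption: a trailing BindPosSucc token with no successor gets a fresh position.
    for i in pending:
        result[i] = (result[i][0], pos + 1)
    return result


print(assign_positions([("Hello", None), (",", "BindPosPred"), ("world", None)]))
# [('Hello', 1), (',', 1), ('world', 2)]
```

Binding punctuation this way keeps it in the index without widening the distance between neighbouring words, which is presumably the point of the options.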

void strus::DocumentAnalyzer::addSearchIndexFeatureFromPatternMatch ( const String & type,
const String & patternTypeName,
const NormalizerVector & normalizers,
const String & options = String()
)

Document strus::DocumentAnalyzer::analyze ( const String & content)

Analyze the content and return the set of features to insert.

Parameters
[in] content: content of the document to analyze, as a string (NOT a file name!)
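Because analyze takes the document content itself, a caller starting from a file has to read it first. A minimal sketch; read_document_content is a hypothetical helper, UTF-8 is an assumed default encoding, and the analyze call is left as a comment because obtaining an analyzer requires a strus Context.

```python
from pathlib import Path


def read_document_content(path: str, encoding: str = "utf-8") -> str:
    """Return the text stored in the file at `path`.

    analyze() takes the document content, NOT a file name, so a caller
    starting from a file reads it first. UTF-8 is an assumed default.
    """
    return Path(path).read_text(encoding=encoding)


# Hypothetical usage, assuming `analyzer` was created with
# Context::createDocumentAnalyzer():
#   doc = analyzer.analyze(read_document_content("example.xml"))
```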

Document strus::DocumentAnalyzer::analyze ( const String & content,
const DocumentClass & dclass
)

Analyze the content and return the set of features to insert.

Parameters
[in] content: content of the document to analyze, as a string (NOT a file name!)
[in] dclass: document class of the document to analyze

DocumentAnalyzeQueue strus::DocumentAnalyzer::createQueue ( ) const

Creates a queue for multi document analysis.

Returns
the queue

void strus::DocumentAnalyzer::defineAggregatedMetaData ( const String & fieldname,
const Aggregator & function
)

Declare some aggregated value of the document to be put into the meta data table used for restrictions, weighting and summarization.

Parameters
[in] fieldname: name of the addressed meta data field
[in] function: aggregator defining how, and from what, the value is aggregated
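As we read the signatures, defineMetaData selects a value from the content, whereas an aggregated value is derived from the document's features as a whole; document length is a typical case. The toy sketch computes two such values from an in-memory feature list. The (type, value) representation and the field names doclen and distinct are hypothetical, and in strus the computation is described by the Aggregator argument rather than by Python code.

```python
from typing import Iterable, Tuple

Feature = Tuple[str, str]  # (feature type, feature value); an assumed representation


def count_features(features: Iterable[Feature], feature_type: str) -> int:
    """Number of features of the given type, e.g. a document-length value."""
    return sum(1 for ftype, _ in features if ftype == feature_type)


def count_distinct_features(features: Iterable[Feature], feature_type: str) -> int:
    """Number of distinct values among the features of the given type."""
    return len({value for ftype, value in features if ftype == feature_type})


features = [("word", "to"), ("word", "be"), ("word", "or"), ("word", "not"),
            ("word", "to"), ("word", "be"), ("title", "hamlet")]
meta = {"doclen": count_features(features, "word"),
        "distinct": count_distinct_features(features, "word")}
print(meta)  # {'doclen': 6, 'distinct': 4}
```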

void strus::DocumentAnalyzer::defineAttribute ( const String & attribname,
const String & selectexpr,
const Tokenizer & tokenizer,
const NormalizerVector & normalizers
)

Define how a feature to insert as document attribute (for summarization) is selected, tokenized and normalized.

Parameters
[in] attribname: name of the addressed attribute
[in] selectexpr: expression selecting the elements to fetch for producing this feature
[in] tokenizer: tokenizer function description to use for this feature
[in] normalizers: list of normalizer function descriptions to use for this feature, in ascending order of appearance
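The parameter lists above describe a select, tokenize, normalize pipeline; we read "ascending order of appearance" as the normalizers running in the order given. A toy Python version of that flow, assuming plain text that has already been selected, a whitespace tokenizer, and ordinary functions as normalizers; analyze_text is hypothetical and stands in for the strus Tokenizer and NormalizerVector descriptions.

```python
from typing import Callable, List, Sequence

Normalizer = Callable[[str], str]


def analyze_text(text: str, normalizers: Sequence[Normalizer]) -> List[str]:
    """Tokenize `text` on whitespace, then normalize each token."""
    result = []
    for token in text.split():  # toy tokenizer: split on whitespace
        for normalize in normalizers:  # first listed normalizer runs first
            token = normalize(token)
        result.append(token)
    return result


print(analyze_text("Cats, Dogs", [str.lower, lambda t: t.strip(",")]))  # ['cats', 'dogs']
print(analyze_text("a", [lambda t: t + "1", lambda t: t + "2"]))  # ['a12']
```

The second call makes the ordering visible: the suffix from the first normalizer ends up before the suffix from the second.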

void strus::DocumentAnalyzer::defineAttributeFromPatternMatch ( const String & attribname,
const String & patternTypeName,
const NormalizerVector & normalizers
)

void strus::DocumentAnalyzer::defineDocument ( const String & subDocumentTypeName,
const String & selectexpr
)

Declare a sub document for handling multi-part documents in the analyzed content.

Parameters
[in] subDocumentTypeName: type name assigned to this sub document
[in] selectexpr: expression that defines the content of the declared sub document
Remarks
Sub documents are defined as the sections selected by the expression, plus some selected data that does not belong to any sub document.
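One way to picture the remark above: each section matched by the expression becomes its own sub document, and selected data outside every section (here a shared title) accompanies each of them. This sketches our reading, not strus internals; split_sub_documents is hypothetical, and ElementTree's limited path syntax stands in for strus select expressions.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def split_sub_documents(xml_text: str, section_path: str,
                        shared_paths: List[str]) -> List[Dict[str, str]]:
    """Toy split of one XML document into sub documents.

    Every element matched by `section_path` becomes one sub document; text of
    the elements matched by `shared_paths` (assumed to lie outside every
    section) is copied into each of them.
    """
    root = ET.fromstring(xml_text)
    shared: Dict[str, str] = {}
    for path in shared_paths:
        element = root.find(path)
        if element is not None:
            shared[path] = "".join(element.itertext())
    subdocs = []
    for section in root.findall(section_path):
        subdoc = dict(shared)  # each sub document gets its own copy of shared data
        subdoc["content"] = "".join(section.itertext())
        subdocs.append(subdoc)
    return subdocs


xml = "<list><title>Fruits</title><item>apple</item><item>pear</item></list>"
print(split_sub_documents(xml, "item", ["title"]))
# [{'title': 'Fruits', 'content': 'apple'}, {'title': 'Fruits', 'content': 'pear'}]
```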

void strus::DocumentAnalyzer::defineMetaData ( const String & fieldname,
const String & selectexpr,
const Tokenizer & tokenizer,
const NormalizerVector & normalizers
)

Define how a feature to insert as meta data is selected, tokenized and normalized.

Parameters
[in] fieldname: name of the addressed meta data field
[in] selectexpr: expression selecting the elements to fetch for producing this feature
[in] tokenizer: tokenizer function description to use for this feature
[in] normalizers: list of normalizer function descriptions to use for this feature, in ascending order of appearance

void strus::DocumentAnalyzer::defineMetaDataFromPatternMatch ( const String & fieldname,
const String & patternTypeName,
const NormalizerVector & normalizers
)

void strus::DocumentAnalyzer::definePatternMatcherPostProc ( const String & patternTypeName,
const String & patternMatcherModule,
const PatternMatcher & patterns
)

Declare a pattern matcher that runs on the document features after the other document analysis steps.

Parameters
[in] patternTypeName: name of the type to assign to the pattern matching results
[in] patternMatcherModule: module id of the pattern matcher to use (empty string for default)
[in] patterns: structure with all patterns
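Conceptually, post-processing pattern matching consumes features already produced and emits new results labeled with patternTypeName; the *FromPatternMatch methods presumably refer to those results through the same name. The sketch is hypothetical: match_patterns and the (type, value, position) representation are ours, and real pattern matcher modules use their own pattern language, whereas this toy matches exact value sequences.

```python
from typing import Dict, List, Sequence, Tuple

Feature = Tuple[str, str, int]  # (type, value, position); an assumed representation


def match_patterns(features: Sequence[Feature], patterns: Dict[str, Sequence[str]],
                   pattern_type_name: str) -> List[Feature]:
    """Emit one feature of type `pattern_type_name` per exact sequence match.

    `patterns` maps a result name to the sequence of feature values it matches;
    a result is placed at the position of the first matched feature.
    """
    values = [value for _, value, _ in features]
    results = []
    for name, sequence in patterns.items():
        n = len(sequence)
        for start in range(len(values) - n + 1):
            if values[start:start + n] == list(sequence):
                results.append((pattern_type_name, name, features[start][2]))
    return results


doc = [("word", "new", 1), ("word", "york", 2), ("word", "is", 3), ("word", "big", 4)]
print(match_patterns(doc, {"new_york": ["new", "york"]}, "place"))
# [('place', 'new_york', 1)]
```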

void strus::DocumentAnalyzer::definePatternMatcherPostProcFromFile ( const String & patternTypeName,
const String & patternMatcherModule,
const String & serializedPatternFile
)

Declare a pattern matcher that runs on the document features after the other document analysis steps.

Parameters
[in] patternTypeName: name of the type to assign to the pattern matching results
[in] patternMatcherModule: module id of the pattern matcher to use (empty string for default)
[in] serializedPatternFile: path to a file with serialized (binary) patterns

Friends And Related Function Documentation

friend class Context
friend

Constructor used by Context.


The documentation for this class was generated from the following file: