strusAnalyzer  0.17
strus::DocumentAnalyzerMapInterface Class Reference (abstract)

Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine.

#include <documentAnalyzerMapInterface.hpp>

Public Member Functions

virtual ~DocumentAnalyzerMapInterface ()
 Destructor.
 
virtual DocumentAnalyzerInstanceInterface * createAnalyzer (const std::string &mimeType, const std::string &scheme) const =0
 Create an analyzer interface to instrument and then add with addAnalyzer.
 
virtual void addAnalyzer (const std::string &mimeType, const std::string &scheme, DocumentAnalyzerInstanceInterface *analyzer)=0
 Declare an analyzer to be used for the analysis of a specific document class.
 
virtual const DocumentAnalyzerInstanceInterface * getAnalyzer (const std::string &mimeType, const std::string &scheme) const =0
 Get the analyzer interface assigned to a document class.
 
virtual analyzer::Document analyze (const std::string &content, const analyzer::DocumentClass &dclass) const =0
 Segment and tokenize a document, assign types to tokens and metadata and normalize their values.
 
virtual DocumentAnalyzerContextInterface * createContext (const analyzer::DocumentClass &dclass) const =0
 Create the context used for analyzing multipart or very big documents.
 
virtual analyzer::DocumentAnalyzerMapView view () const =0
 Return a structure with all definitions for introspection.
 

Detailed Description

Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine.

Constructor & Destructor Documentation

virtual strus::DocumentAnalyzerMapInterface::~DocumentAnalyzerMapInterface ( )
inline virtual

Destructor.

Member Function Documentation

virtual void strus::DocumentAnalyzerMapInterface::addAnalyzer ( const std::string &  mimeType,
const std::string &  scheme,
DocumentAnalyzerInstanceInterface *  analyzer 
)
pure virtual

Declare an analyzer to be used for the analysis of a specific document class.

Parameters
[in] mimeType  MIME type of the document to process with this analyzer (must be defined)
[in] scheme  scheme of the document to process with this analyzer (can be empty, meaning not defined)
[in] analyzer  analyzer to use for the defined class of documents (with ownership)

virtual analyzer::Document strus::DocumentAnalyzerMapInterface::analyze ( const std::string &  content,
const analyzer::DocumentClass &  dclass 
) const
pure virtual

Segment and tokenize a document, assign types to tokens and metadata and normalize their values.

Parameters
[in] content  document content string to analyze
[in] dclass  description of the content type and encoding to process
Returns
the analyzed document
Remarks
Do not use this function for a multipart document (defined with 'defineSubDocument(const std::string&,const std::string&)'), because only one sub document gets analyzed. Use createContext for such documents.

virtual DocumentAnalyzerInstanceInterface* strus::DocumentAnalyzerMapInterface::createAnalyzer ( const std::string &  mimeType,
const std::string &  scheme 
) const
pure virtual

Create an analyzer interface to instrument and then add with addAnalyzer.

Parameters
[in] mimeType  MIME type of the document for this analyzer, determines the document segmenter
[in] scheme  scheme of the document to determine the segmenter options (can be empty, meaning not defined)
Returns
the analyzer (with ownership)
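
The ownership notes on createAnalyzer and addAnalyzer suggest a create, instrument, add sequence in which the caller owns the analyzer only in between the two calls. The following is a minimal sketch of that hand-over, not the strus implementation: FakeAnalyzerInstance, FakeAnalyzerMap and registerAnalyzer are hypothetical stand-ins written only so that the example compiles on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for strus::DocumentAnalyzerInstanceInterface.
struct FakeAnalyzerInstance
{
    std::vector<std::string> features;   // records what was configured on it
};

// Hypothetical stand-in for an implementation of DocumentAnalyzerMapInterface.
// It mirrors only the documented ownership rules of createAnalyzer/addAnalyzer.
class FakeAnalyzerMap
{
public:
    // Returns a new analyzer; the caller owns it.
    FakeAnalyzerInstance* createAnalyzer(const std::string& /*mimeType*/,
                                         const std::string& /*scheme*/) const
    {
        return new FakeAnalyzerInstance();
    }
    // Takes ownership of 'analyzer'.
    void addAnalyzer(const std::string& mimeType, const std::string& scheme,
                     FakeAnalyzerInstance* analyzer)
    {
        assert(analyzer != nullptr);
        std::unique_ptr<FakeAnalyzerInstance> owned(analyzer);
        m_entries.emplace_back(mimeType + "|" + scheme, std::move(owned));
    }
    std::size_t size() const { return m_entries.size(); }
    const FakeAnalyzerInstance& at(std::size_t idx) const { return *m_entries.at(idx).second; }

private:
    std::vector<std::pair<std::string, std::unique_ptr<FakeAnalyzerInstance>>> m_entries;
};

// Create, instrument and register an analyzer. The unique_ptr frees the
// analyzer if 'instrument' throws before ownership is passed to the map.
void registerAnalyzer(FakeAnalyzerMap& map, const std::string& mimeType,
                      const std::string& scheme,
                      const std::function<void(FakeAnalyzerInstance&)>& instrument)
{
    std::unique_ptr<FakeAnalyzerInstance> analyzer(map.createAnalyzer(mimeType, scheme));
    instrument(*analyzer);
    map.addAnalyzer(mimeType, scheme, analyzer.release());   // ownership moves to the map
}
```

Holding the result of createAnalyzer in a smart pointer and calling release() only at the addAnalyzer call keeps exactly one owner at any time, which is the point of the "(with ownership)" remarks.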

virtual DocumentAnalyzerContextInterface* strus::DocumentAnalyzerMapInterface::createContext ( const analyzer::DocumentClass &  dclass) const
pure virtual

Create the context used for analyzing multipart or very big documents.

Parameters
[in] dclass  description of the content type and encoding to process
Returns
the document analyzer context (with ownership)
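
A context is meant for input that arrives in pieces or contains several sub documents, so the calling code typically alternates between feeding a chunk and draining the results that are ready. The sketch below illustrates that loop under stated assumptions: FakeAnalyzerContext is a hypothetical stand-in, the method names putInput and analyzeNext are assumed for this example, sub documents are simulated as newline-separated records, and results are plain strings instead of analyzer::Document.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for strus::DocumentAnalyzerContextInterface.
class FakeAnalyzerContext
{
public:
    // Feed the next chunk of input; 'eof' marks the last chunk.
    void putInput(const char* chunk, std::size_t chunkSize, bool eof)
    {
        assert(!m_eof);                       // no input is accepted after the last chunk
        m_buffer.append(chunk, chunkSize);
        m_eof = eof;
    }
    // Fetch the next complete sub document. Returns false if more input
    // is needed or, after eof, if nothing is left.
    bool analyzeNext(std::string& doc)
    {
        std::string::size_type end = m_buffer.find('\n');
        if (end == std::string::npos)
        {
            if (!m_eof || m_buffer.empty()) return false;
            doc = m_buffer;                   // last record without trailing newline
            m_buffer.clear();
            return true;
        }
        doc = m_buffer.substr(0, end);
        m_buffer.erase(0, end + 1);
        return true;
    }

private:
    std::string m_buffer;
    bool m_eof = false;
};

// Feed 'content' in chunks of 'chunkSize' bytes and collect every sub document.
std::vector<std::string> analyzeInChunks(FakeAnalyzerContext& ctx,
                                         const std::string& content,
                                         std::size_t chunkSize)
{
    assert(chunkSize > 0);
    std::vector<std::string> result;
    std::size_t pos = 0;
    do
    {
        const std::size_t n = std::min(chunkSize, content.size() - pos);
        const bool eof = (pos + n == content.size());
        ctx.putInput(content.data() + pos, n, eof);
        pos += n;
        std::string doc;
        while (ctx.analyzeNext(doc)) result.push_back(doc);   // drain what is ready
    }
    while (pos < content.size());
    return result;
}
```

The inner loop is what distinguishes this from a single analyze call: one input can yield any number of results, and a result may only become available after several chunks have been fed.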

virtual const DocumentAnalyzerInstanceInterface* strus::DocumentAnalyzerMapInterface::getAnalyzer ( const std::string &  mimeType,
const std::string &  scheme 
) const
pure virtual

Get the analyzer interface assigned to a document class.

Parameters
[in] mimeType  MIME type of the document class
[in] scheme  scheme of the document class (can be empty, meaning not defined)
Returns
a const pointer to the analyzer interface
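
Since addAnalyzer and getAnalyzer both identify a document class by the pair (mimeType, scheme), a lookup can be pictured as a keyed registry that hands out non-owning const access. This is only a sketch: FakeAnalyzerRegistry and FakeAnalyzerInstance are hypothetical, and both the exact-match lookup and the nullptr result for an unknown class are assumptions of this example, not documented behavior.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for strus::DocumentAnalyzerInstanceInterface.
struct FakeAnalyzerInstance
{
    std::string name;
};

// Hypothetical registry keyed by (mimeType, scheme).
class FakeAnalyzerRegistry
{
private:
    typedef std::pair<std::string, std::string> Key;
    std::map<Key, std::unique_ptr<FakeAnalyzerInstance>> m_map;

public:
    // Takes ownership of 'analyzer'; the MIME type must be defined,
    // the scheme may be empty (meaning not defined).
    void addAnalyzer(const std::string& mimeType, const std::string& scheme,
                     FakeAnalyzerInstance* analyzer)
    {
        assert(analyzer != nullptr);
        std::unique_ptr<FakeAnalyzerInstance> owned(analyzer);   // take ownership first
        if (mimeType.empty()) throw std::invalid_argument("mimeType must be defined");
        m_map[Key(mimeType, scheme)] = std::move(owned);
    }
    // Non-owning access: the registry keeps the analyzer alive.
    // Assumption of this sketch: nullptr if nothing is registered for the class.
    const FakeAnalyzerInstance* getAnalyzer(const std::string& mimeType,
                                            const std::string& scheme) const
    {
        std::map<Key, std::unique_ptr<FakeAnalyzerInstance>>::const_iterator
            it = m_map.find(Key(mimeType, scheme));
        return it == m_map.end() ? nullptr : it->second.get();
    }
};
```

The caller must not delete the returned pointer in this model, which matches the const qualifier on the return type of getAnalyzer.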

virtual analyzer::DocumentAnalyzerMapView strus::DocumentAnalyzerMapInterface::view ( ) const
pure virtual

Return a structure with all definitions for introspection.

Returns
the structure with all definitions for introspection
