strusAnalyzer  0.17
strus::DocumentClassDetectorInterface Class Reference (abstract)

Defines a detector that returns a content description for document content it recognizes.

#include <documentClassDetectorInterface.hpp>

Public Member Functions

virtual ~DocumentClassDetectorInterface ()
 Destructor.
 
virtual void defineDocumentSchemeDetector (const std::string &scheme, const std::string &mimeType, const std::vector< std::string > &select_expressions, const std::vector< std::string > &reject_expressions)=0
 Define a detector for a document scheme.
 
virtual bool detect (analyzer::DocumentClass &dclass, const char *contentBegin, std::size_t contentBeginSize, bool isComplete) const =0
 Scans the start of a document to detect its classification attributes (MIME type, etc.).
 

Detailed Description

Defines a detector that returns a content description for document content it recognizes.

Constructor & Destructor Documentation

virtual strus::DocumentClassDetectorInterface::~DocumentClassDetectorInterface ( )
inline virtual

Destructor.

Member Function Documentation

virtual void strus::DocumentClassDetectorInterface::defineDocumentSchemeDetector ( const std::string &  scheme,
const std::string &  mimeType,
const std::vector< std::string > &  select_expressions,
const std::vector< std::string > &  reject_expressions 
)
pure virtual

Define a detector for a document scheme.

Parameters
[in] scheme: document scheme to assign
[in] mimeType: MIME type to which this scheme applies
[in] select_expressions: select expressions that must all match for this scheme
[in] reject_expressions: reject expressions, none of which may match for this scheme
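
The matching rule described by the parameters above (every select expression must match, no reject expression may match) can be sketched as follows. This is only an illustration under stated assumptions: `SchemeRule`, `expressionMatches` and `ruleApplies` are hypothetical names that are not part of strus, and because this reference does not describe the expression syntax, plain substring search stands in for real expression evaluation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record holding the four arguments of
// defineDocumentSchemeDetector(); not a strus type.
struct SchemeRule
{
    std::string scheme;
    std::string mimeType;
    std::vector<std::string> select_expressions;
    std::vector<std::string> reject_expressions;
};

// Placeholder for expression evaluation: plain substring search.
// ASSUMPTION: the real expression syntax is not specified in this reference.
inline bool expressionMatches(const std::string& expr, const std::string& content)
{
    return content.find(expr) != std::string::npos;
}

// A rule applies to content of the given MIME type if every select
// expression matches and no reject expression matches.
inline bool ruleApplies(const SchemeRule& rule,
                        const std::string& mimeType,
                        const std::string& content)
{
    // In this sketch a rule without a scheme name is a programming error.
    assert(!rule.scheme.empty());
    if (rule.mimeType != mimeType) return false;
    for (const std::string& expr : rule.select_expressions)
    {
        if (!expressionMatches(expr, content)) return false;
    }
    for (const std::string& expr : rule.reject_expressions)
    {
        if (expressionMatches(expr, content)) return false;
    }
    return true;
}
```

With a rule such as `{"article", "application/xml", {"<article", "<title"}, {"<draft"}}`, content containing both `<article` and `<title` but no `<draft` would be assigned the scheme `article`; a real implementation would evaluate the expressions with the library's own matcher instead of substring search.
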
virtual bool strus::DocumentClassDetectorInterface::detect ( analyzer::DocumentClass &  dclass,
const char *  contentBegin,
std::size_t  contentBeginSize,
bool  isComplete 
) const
pure virtual

Scans the start of a document to detect its classification attributes (MIME type, etc.).

Parameters
[in,out] dclass: document class to edit
[in] contentBegin: pointer to the chunk at the start of the content
[in] contentBeginSize: size of that chunk in bytes
[in] isComplete: true, if the chunk passed is the whole document (this might influence the result)
Returns
true, if the document class was recognized
Note
It is assumed that a reasonably sized chunk from the start of the document (e.g. 1K) is enough to detect the document class. This assumption does not hold for many MIME types, but it should work for text content. At the very least, the chunk should be enough to determine which segmenter to use.
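
As a rough illustration of the calling convention in the note, the sketch below passes at most the first 1K of a document to a detector and sets `isComplete` only when that chunk is the whole document. Everything here is an assumption made for the example: `DocumentClassSketch`, `DetectorSketch`, `PrefixDetector` and `classifyDocument` are hypothetical stand-ins, not strus types. Only the shape of `detect()` mirrors the signature documented above, and the toy detector recognizes nothing but an `<?xml` header.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for analyzer::DocumentClass; only a MIME type is kept.
struct DocumentClassSketch
{
    std::string mimeType;
};

// Hypothetical interface mirroring only the detect() signature documented above.
class DetectorSketch
{
public:
    virtual ~DetectorSketch() {}
    virtual bool detect(DocumentClassSketch& dclass, const char* contentBegin,
                        std::size_t contentBeginSize, bool isComplete) const = 0;
};

// Toy detector: recognizes XML by its "<?xml" header and nothing else.
class PrefixDetector : public DetectorSketch
{
public:
    bool detect(DocumentClassSketch& dclass, const char* contentBegin,
                std::size_t contentBeginSize, bool isComplete) const override
    {
        assert(contentBegin != nullptr || contentBeginSize == 0);
        (void)isComplete; // this toy rule does not depend on completeness
        static const char xmlHeader[] = "<?xml";
        const std::size_t headerSize = sizeof(xmlHeader) - 1;
        if (contentBeginSize >= headerSize
            && std::memcmp(contentBegin, xmlHeader, headerSize) == 0)
        {
            dclass.mimeType = "application/xml";
            return true;
        }
        return false;
    }
};

// Passes at most the first 1K of the document to the detector.
inline bool classifyDocument(const DetectorSketch& detector,
                             const std::string& document,
                             DocumentClassSketch& dclass)
{
    const std::size_t maxChunkSize = 1024; // the "1K" suggested in the note
    const std::size_t chunkSize = std::min(document.size(), maxChunkSize);
    const bool isComplete = (chunkSize == document.size());
    return detector.detect(dclass, document.data(), chunkSize, isComplete);
}
```

A caller would construct a detector once and call `classifyDocument(detector, content, dclass)` per document, checking the boolean result before reading `dclass`. The 1K limit is taken from the note and may need to be raised for formats whose identifying markers appear later in the content.
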

The documentation for this class was generated from the following file: