Defines a detector that returns a content description for a document content it recognizes.
#include <documentClassDetectorInterface.hpp>
virtual ~DocumentClassDetectorInterface ()
  Destructor.
virtual void defineDocumentSchemeDetector (const std::string &scheme, const std::string &mimeType, const std::vector< std::string > &select_expressions, const std::vector< std::string > &reject_expressions)=0
  Define a detector for a document scheme.
virtual bool detect (analyzer::DocumentClass &dclass, const char *contentBegin, std::size_t contentBeginSize, bool isComplete) const =0
  Scans the start of a document to detect its classification attributes (MIME type, etc.).
virtual strus::DocumentClassDetectorInterface::~DocumentClassDetectorInterface () [inline, virtual]
virtual void strus::DocumentClassDetectorInterface::defineDocumentSchemeDetector (
    const std::string & scheme,
    const std::string & mimeType,
    const std::vector< std::string > & select_expressions,
    const std::vector< std::string > & reject_expressions
) [pure virtual]
Define a detector for a document scheme.
- Parameters
[in] | scheme | document scheme to assign |
[in] | mimeType | MIME type to which this scheme applies |
[in] | select_expressions | expressions that must all match for this scheme to be selected |
[in] | reject_expressions | expressions none of which may match for this scheme to be selected |
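The select/reject rule described by these parameters can be sketched as follows. This is only an illustration of the matching logic, not the library's implementation: `schemeApplies` and `toyMatch` are hypothetical names, and the substring test stands in for the real expression syntax, which is not described on this page.

```cpp
#include <string>
#include <vector>

// Toy stand-in for expression matching (assumption): an "expression" matches
// when it occurs as a substring of the content. The real expression syntax
// is not specified here.
static bool toyMatch(const std::string& content, const std::string& expression)
{
    return content.find(expression) != std::string::npos;
}

// Returns true if every select expression matches the content and
// no reject expression matches it.
bool schemeApplies(
    const std::string& content,
    const std::vector<std::string>& select_expressions,
    const std::vector<std::string>& reject_expressions)
{
    for (const std::string& expr : select_expressions)
    {
        if (!toyMatch(content, expr)) return false;  // one select failed
    }
    for (const std::string& expr : reject_expressions)
    {
        if (toyMatch(content, expr)) return false;   // one reject matched
    }
    return true;
}
```

In this sketch an empty select list is vacuously satisfied, so only the reject list decides the outcome. Whether the library treats an empty list the same way is not stated here.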
virtual bool strus::DocumentClassDetectorInterface::detect (
    analyzer::DocumentClass & dclass,
    const char * contentBegin,
    std::size_t contentBeginSize,
    bool isComplete
) const [pure virtual]
Scans the start of a document to detect its classification attributes (MIME type, etc.).
- Parameters
[in,out] | dclass | document class to update |
[in] | contentBegin | start of the leading content chunk |
[in] | contentBeginSize | size of the leading content chunk in bytes |
[in] | isComplete | true if the chunk passed is the whole document (this may influence the result) |
- Returns
true if the document class was recognized
- Note
It is assumed that a reasonably sized leading chunk of the document (e.g. 1K) is enough to detect the document class. This assumption is wrong for many MIME types, but it should hold for text content. At the very least, the chunk should be enough to determine which segmenter to use.
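A minimal sketch of the calling pattern implied by the note: pass only the leading chunk of a document and set `isComplete` when that chunk is the whole document. Everything here is a stand-in so that the example compiles on its own. `DocumentClass` is a hypothetical one-field substitute for `analyzer::DocumentClass`, `DetectorSketch` mirrors only the `detect` signature, and `ToyXmlDetector` and `detectFromPrefix` are invented for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for analyzer::DocumentClass (assumption).
struct DocumentClass
{
    std::string mimeType;
};

// Local mirror of the documented detect() signature (assumption: not the real header).
class DetectorSketch
{
public:
    virtual ~DetectorSketch() {}
    virtual bool detect(DocumentClass& dclass, const char* contentBegin,
                        std::size_t contentBeginSize, bool isComplete) const = 0;
};

// Toy detector: recognizes an XML declaration at the start of the chunk.
// If the chunk is the complete document and non-empty, it falls back to
// text/plain; with only a partial chunk it reports "not recognized".
class ToyXmlDetector : public DetectorSketch
{
public:
    bool detect(DocumentClass& dclass, const char* contentBegin,
                std::size_t contentBeginSize, bool isComplete) const override
    {
        static const std::string xmlDecl = "<?xml";
        const std::string chunk(contentBegin, contentBeginSize);
        if (chunk.compare(0, xmlDecl.size(), xmlDecl) == 0)
        {
            dclass.mimeType = "application/xml";
            return true;
        }
        if (isComplete && !chunk.empty())
        {
            dclass.mimeType = "text/plain";
            return true;
        }
        return false;
    }
};

// Passes at most maxChunkSize leading bytes of the document to the detector.
bool detectFromPrefix(const DetectorSketch& detector, DocumentClass& dclass,
                      const std::string& document, std::size_t maxChunkSize = 1024)
{
    const std::size_t chunkSize = std::min(document.size(), maxChunkSize);
    const bool isComplete = (chunkSize == document.size());
    return detector.detect(dclass, document.data(), chunkSize, isComplete);
}
```

The text/plain fallback is only there to show how `isComplete` might influence the result; the actual behaviour of any real detector is not specified on this page.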
The documentation for this class was generated from the following file: