Defines a detector that returns a content description for document content it recognizes.

Public Member Functions
virtual	~DocumentClassDetectorInterface ()
	Destructor.

virtual void	defineDocumentSchemeDetector (const std::string &scheme, const std::string &mimeType, const std::vector< std::string > &select_expressions, const std::vector< std::string > &reject_expressions)=0
	Define a detector for a document scheme.

virtual bool	detect (analyzer::DocumentClass &dclass, const char *contentBegin, std::size_t contentBeginSize, bool isComplete) const =0
	Scans the start of a document to detect its classification attributes (MIME type, etc.).

Detailed Description

Defines a detector that returns a content description for document content it recognizes.

Constructor & Destructor Documentation

virtual strus::DocumentClassDetectorInterface::~DocumentClassDetectorInterface ( )

inline virtual

Destructor.

Member Function Documentation

virtual void strus::DocumentClassDetectorInterface::defineDocumentSchemeDetector	(	const std::string &	scheme,
		const std::string &	mimeType,
		const std::vector< std::string > &	select_expressions,
		const std::vector< std::string > &	reject_expressions
	)

pure virtual

Define a detector for a document scheme.

Parameters

[in]	scheme	document scheme to assign
[in]	mimeType	MIME type to which this scheme applies
[in]	select_expressions	select expressions that must all match for this scheme to apply
[in]	reject_expressions	select expressions of which none may match for this scheme to apply
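
The select/reject rule described by these parameters can be sketched as follows. This is a minimal illustration, not the library's implementation: `matchesExpression` and `schemeApplies` are hypothetical helpers, and the sketch treats each expression as a plain substring, which is an assumption made only to keep the example self-contained. The expression syntax actually used by the detector is not described here.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for expression matching: an expression "matches"
// if it occurs as a substring of the content. The real expression syntax
// is not specified in this documentation.
static bool matchesExpression(const std::string& content, const std::string& expr)
{
    return content.find(expr) != std::string::npos;
}

// Hypothetical helper: a scheme applies if ALL select expressions match
// and NONE of the reject expressions match.
static bool schemeApplies(const std::string& content,
                          const std::vector<std::string>& select_expressions,
                          const std::vector<std::string>& reject_expressions)
{
    const bool allSelected = std::all_of(
        select_expressions.begin(), select_expressions.end(),
        [&content](const std::string& expr) { return matchesExpression(content, expr); });
    const bool noneRejected = std::none_of(
        reject_expressions.begin(), reject_expressions.end(),
        [&content](const std::string& expr) { return matchesExpression(content, expr); });
    return allSelected && noneRejected;
}
```

In this sketch an empty list of select expressions matches every document, because `std::all_of` is true for an empty range. Whether the interface behaves the same way is not stated here.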

virtual bool strus::DocumentClassDetectorInterface::detect	(	analyzer::DocumentClass &	dclass,
		const char *	contentBegin,
		std::size_t	contentBeginSize,
		bool	isComplete
	)		const

pure virtual

Scans the start of a document to detect its classification attributes (MIME type, etc.).

Parameters

[in,out]	dclass	document class to edit
[in]	contentBegin	pointer to the start of the initial content chunk
[in]	contentBeginSize	size of the initial content chunk
[in]	isComplete	true, if the chunk passed is the whole document (this might influence the result)

Note: The detector assumes that a reasonably sized chunk from the start of the document (e.g. 1K) is enough to detect the document class. This assumption is wrong for many MIME types, but it should hold for text content. At the very least, the chunk should be enough to determine which segmenter to use.
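
The calling pattern implied by this note can be sketched as follows: pass at most the first 1K of the content and set `isComplete` only when that chunk covers the whole document. Everything in the sketch is a hypothetical stand-in so that it compiles without the library: `DocumentClassSketch` replaces `analyzer::DocumentClass`, `DetectorSketchInterface` is a reduced copy of the interface, and `ToyDetector` uses made-up first-character rules. The sketch also assumes that the return value signals a successful detection, which is not stated above.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for analyzer::DocumentClass (the real class is not shown here).
struct DocumentClassSketch
{
    std::string mimeType;
};

// Reduced, hypothetical copy of the interface using the stand-in type.
class DetectorSketchInterface
{
public:
    virtual ~DetectorSketchInterface() {}
    virtual bool detect(DocumentClassSketch& dclass, const char* contentBegin,
                        std::size_t contentBeginSize, bool isComplete) const = 0;
};

// Toy detector with made-up rules: decides by the first non-space character.
class ToyDetector : public DetectorSketchInterface
{
public:
    bool detect(DocumentClassSketch& dclass, const char* contentBegin,
                std::size_t contentBeginSize, bool isComplete) const override
    {
        std::size_t pos = 0;
        while (pos < contentBeginSize
               && std::isspace(static_cast<unsigned char>(contentBegin[pos]))) ++pos;
        if (pos == contentBeginSize) return false;  // nothing to inspect
        const char first = contentBegin[pos];
        if (first == '<') { dclass.mimeType = "application/xml"; return true; }
        if (first == '{' || first == '[') { dclass.mimeType = "application/json"; return true; }
        // Fall back to plain text only if the whole document was seen.
        if (isComplete) { dclass.mimeType = "text/plain"; return true; }
        return false;
    }
};

// Hypothetical helper: passes at most the first 1024 bytes to the detector.
static bool detectFromPrefix(const DetectorSketchInterface& detector,
                             const std::string& content, DocumentClassSketch& dclass)
{
    const std::size_t maxChunkSize = 1024;
    const std::size_t chunkSize = std::min(content.size(), maxChunkSize);
    const bool isComplete = (chunkSize == content.size());
    return detector.detect(dclass, content.data(), chunkSize, isComplete);
}
```

The toy detector shows one way `isComplete` might influence the result: it commits to a plain-text fallback only when no further content can follow the chunk.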

The documentation for this class was generated from the following file:

/home/patrick/Projects/github/strusAnalyzer/include/strus/documentClassDetectorInterface.hpp