Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine.

Public Member Functions
virtual	~ContentStatisticsInterface ()
	Destructor.

virtual void	addLibraryElement (const std::string &type, const std::string &regex, int priority, int minLength, int maxLength, TokenizerFunctionInstanceInterface *tokenizer, const std::vector< NormalizerFunctionInstanceInterface * > &normalizers)=0
	Declare an element of the library used to categorize features.

virtual void	addVisibleAttribute (const std::string &name)=0
	Define an attribute to be visible in content statistics path conditions.

virtual void	addSelectorExpression (const std::string &expression)=0
	Define a selector expression that is chosen for content elements that match it.

virtual ContentStatisticsContextInterface *	createContext () const =0
	Create the context used for collecting document statistics.

virtual analyzer::ContentStatisticsView	view () const =0
	Return a structure with all definitions for introspection.

Detailed Description

Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine.

Constructor & Destructor Documentation

virtual strus::ContentStatisticsInterface::~ContentStatisticsInterface ( )

inline virtual

Destructor.

Member Function Documentation

virtual void strus::ContentStatisticsInterface::addLibraryElement	(	const std::string &	type,
		const std::string &	regex,
		int	priority,
		int	minLength,
		int	maxLength,
		TokenizerFunctionInstanceInterface *	tokenizer,
		const std::vector< NormalizerFunctionInstanceInterface * > &	normalizers
	)

pure virtual

Declare an element of the library used to categorize features.

Parameters

[in]	type	type name of the feature
[in]	regex	regular expression that must match the whole segment for the segment to be considered a candidate
[in]	priority	non-negative number specifying the priority given to matches; when several elements match, only those with the highest priority are selected
[in]	minLength	minimum number of tokens or -1 for no restriction
[in]	maxLength	maximum number of tokens or -1 for no restriction
[in]	tokenizer	tokenizer to use for this feature (ownership passes to this object)
[in]	normalizers	list of normalizers to use for this feature (ownership of the elements passes to this object)
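
The interplay of regex, priority, minLength and maxLength can be illustrated with a small standalone sketch. It is not the strus implementation: LibraryElement, countTokens and classify are hypothetical names, and the whitespace token count stands in for whatever the configured tokenizer would produce. It only shows one plausible reading of the parameter descriptions above.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for one library element; not a strus type.
struct LibraryElement
{
    std::string type;
    std::regex regex;
    int priority;   // non-negative
    int minLength;  // minimum number of tokens, -1 = no restriction
    int maxLength;  // maximum number of tokens, -1 = no restriction
};

// Stand-in tokenizer (assumption): counts whitespace-separated tokens.
static int countTokens(const std::string& segment)
{
    std::istringstream in(segment);
    std::string tok;
    int n = 0;
    while (in >> tok) ++n;
    return n;
}

// Returns the types of all elements whose regex matches the whole segment,
// whose length limits are satisfied, and that share the highest priority.
static std::vector<std::string> classify(
    const std::vector<LibraryElement>& library, const std::string& segment)
{
    const int ntok = countTokens(segment);
    int best = -1;
    std::vector<std::string> result;
    for (const LibraryElement& e : library)
    {
        assert(e.priority >= 0);
        // regex_match succeeds only if the entire segment matches
        if (!std::regex_match(segment, e.regex)) continue;
        if (e.minLength >= 0 && ntok < e.minLength) continue;
        if (e.maxLength >= 0 && ntok > e.maxLength) continue;
        if (e.priority > best) { best = e.priority; result.clear(); }
        if (e.priority == best) result.push_back(e.type);
    }
    return result;
}
```

With a library holding a "number" element (regex `[0-9]+`, priority 2), a "word" element (regex `[0-9a-zA-Z]+`, priority 1) and a catch-all "text" element (regex `.*`, priority 0), a segment such as `2024` matches all three, but only "number" is reported because it has the highest priority.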

virtual void strus::ContentStatisticsInterface::addSelectorExpression ( const std::string & expression )

pure virtual

Define a selector expression that is chosen for content elements that match it.

Parameters

[in] expression expression for selecting chunks

virtual void strus::ContentStatisticsInterface::addVisibleAttribute ( const std::string & name )

pure virtual

Define an attribute to be visible in content statistics path conditions.

Parameters

[in] name name of the attribute to show in a path

virtual ContentStatisticsContextInterface* strus::ContentStatisticsInterface::createContext ( ) const

pure virtual

Create the context used for collecting document statistics.

virtual analyzer::ContentStatisticsView strus::ContentStatisticsInterface::view ( ) const

pure virtual

Return a structure with all definitions for introspection.
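
A rough sketch of how the configuration methods might be called, and of what "ownership passed to this" implies for an implementer, is given here. To stay compilable without the strus headers it declares its own stand-ins: ConfigSketchInterface, RecordingConfig, DummyTokenizer, DummyNormalizer and configureSketch are all hypothetical, createContext() and view() are left out, and the attribute name and selector expression are placeholders rather than verified strus syntax. Only the parameter lists are taken from the documentation above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-ins so the sketch compiles without strus headers (not the real classes).
struct TokenizerFunctionInstanceInterface { virtual ~TokenizerFunctionInstanceInterface() {} };
struct NormalizerFunctionInstanceInterface { virtual ~NormalizerFunctionInstanceInterface() {} };

// Hypothetical reduced interface: only the three configuration methods.
class ConfigSketchInterface
{
public:
    virtual ~ConfigSketchInterface() {}
    virtual void addLibraryElement(
        const std::string& type, const std::string& regex,
        int priority, int minLength, int maxLength,
        TokenizerFunctionInstanceInterface* tokenizer,
        const std::vector<NormalizerFunctionInstanceInterface*>& normalizers) = 0;
    virtual void addVisibleAttribute(const std::string& name) = 0;
    virtual void addSelectorExpression(const std::string& expression) = 0;
};

// Toy implementation: records every call and owns the passed objects.
class RecordingConfig : public ConfigSketchInterface
{
public:
    struct Element
    {
        std::string type, regex;
        int priority, minLength, maxLength;
        std::unique_ptr<TokenizerFunctionInstanceInterface> tokenizer;
        std::vector<std::unique_ptr<NormalizerFunctionInstanceInterface> > normalizers;
    };

    void addLibraryElement(
        const std::string& type, const std::string& regex,
        int priority, int minLength, int maxLength,
        TokenizerFunctionInstanceInterface* tokenizer,
        const std::vector<NormalizerFunctionInstanceInterface*>& normalizers) override
    {
        assert(priority >= 0);  // documented as a non-negative number
        Element e;
        e.type = type; e.regex = regex;
        e.priority = priority; e.minLength = minLength; e.maxLength = maxLength;
        e.tokenizer.reset(tokenizer);  // ownership taken: the caller must not delete
        for (NormalizerFunctionInstanceInterface* n : normalizers)
            e.normalizers.emplace_back(n);
        elements.push_back(std::move(e));
    }
    void addVisibleAttribute(const std::string& name) override
    { visibleAttributes.push_back(name); }
    void addSelectorExpression(const std::string& expression) override
    { selectorExpressions.push_back(expression); }

    std::vector<Element> elements;
    std::vector<std::string> visibleAttributes;
    std::vector<std::string> selectorExpressions;
};

struct DummyTokenizer : TokenizerFunctionInstanceInterface {};
struct DummyNormalizer : NormalizerFunctionInstanceInterface {};

// Example configuration; all literal values are placeholders.
void configureSketch(ConfigSketchInterface& cs)
{
    cs.addVisibleAttribute("id");
    cs.addSelectorExpression("//p");
    std::vector<NormalizerFunctionInstanceInterface*> normalizers;
    normalizers.push_back(new DummyNormalizer());
    // Raw pointers are handed over; the callee is responsible for deleting them.
    cs.addLibraryElement("number", "[0-9]+", 1, 1, 1, new DummyTokenizer(), normalizers);
}
```

Storing the received pointers in std::unique_ptr right away is one way for an implementation to honour the ownership transfer; the caller in turn must not delete the tokenizer or normalizers after the call.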

The documentation for this class was generated from the following file:

/home/patrick/Projects/github/strusAnalyzer/include/strus/contentStatisticsInterface.hpp