Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine.
#include <contentStatisticsInterface.hpp>
virtual strus::ContentStatisticsInterface::~ContentStatisticsInterface() [inline, virtual]
Declare an element of the library used to categorize features.
Parameters:
  [in] type: type name of the feature
  [in] regex: regular expression that must match the whole segment for the segment to be considered a candidate
  [in] priority: non-negative number giving the priority of a match; if several elements match, only those with the highest priority are selected
  [in] minLength: minimum number of tokens, or -1 for no restriction
  [in] maxLength: maximum number of tokens, or -1 for no restriction
  [in] tokenizer: tokenizer to use for this feature (ownership passes to this object)
  [in] normalizers: list of normalizers to use for this feature (ownership of the elements passes to this object)
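The selection rule described by these parameters can be illustrated with a small, self-contained sketch. It is not the strus implementation: `LibraryElement` and `selectTypes` are hypothetical names, the token count is assumed to be supplied by the caller rather than computed by a tokenizer, and `std::regex` stands in for whatever regular expression engine the library uses.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for one library element (not a strus type).
struct LibraryElement {
    std::string type;    // type name of the feature
    std::regex pattern;  // must match the whole segment
    int priority;        // non-negative; higher wins
    int minLength;       // minimum token count, -1 = no restriction
    int maxLength;       // maximum token count, -1 = no restriction
};

// Return the type names of all elements that match the whole segment,
// respect the token-count bounds, and share the highest priority
// among the matching elements.
std::vector<std::string> selectTypes(
    const std::vector<LibraryElement>& library,
    const std::string& segment,
    int tokenCount)
{
    std::vector<std::string> best;
    int bestPriority = -1;
    for (const LibraryElement& elem : library) {
        assert(elem.priority >= 0);
        if (elem.minLength >= 0 && tokenCount < elem.minLength) continue;
        if (elem.maxLength >= 0 && tokenCount > elem.maxLength) continue;
        // regex_match (unlike regex_search) requires a match on the whole string.
        if (!std::regex_match(segment, elem.pattern)) continue;
        if (elem.priority > bestPriority) {
            // A strictly higher priority discards all earlier candidates.
            bestPriority = elem.priority;
            best.clear();
        }
        if (elem.priority == bestPriority) best.push_back(elem.type);
    }
    return best;
}
```

With elements `number` (priority 2, exactly one token), `year` (priority 2, no length restriction) and `word` (priority 1), a one-token segment `2024` would be reported as both `number` and `year`, while `word` is dropped because a higher-priority match exists.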
virtual void strus::ContentStatisticsInterface::addSelectorExpression(const std::string& expression) [pure virtual]
Define a selector expression; content elements that match it are selected.
Parameters:
  [in] expression: expression for selecting chunks
virtual void strus::ContentStatisticsInterface::addVisibleAttribute(const std::string& name) [pure virtual]
Define an attribute that is visible in content statistics path conditions.
Parameters:
  [in] name: name of the attribute to show in a path
Create the context used for collecting document statistics.
Returns: the document content statistics context (ownership passes to the caller)
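Because the returned context comes with ownership, the caller is responsible for destroying it. A minimal sketch of one way to handle that in C++ is to wrap the raw pointer in a `std::unique_ptr` immediately. Every class and method name below (`SketchContext`, `SketchStatistics`, `collect`, `collectedCount`, `collectAll`) is a hypothetical stand-in invented for this illustration and does not reproduce the strus interfaces.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical context interface (stand-in, not the strus one).
class SketchContext {
public:
    virtual ~SketchContext() {}
    virtual void collect(const std::string& content) = 0;
    virtual std::size_t collectedCount() const = 0;
};

// Hypothetical factory interface returning a context with ownership.
class SketchStatistics {
public:
    virtual ~SketchStatistics() {}
    // The caller owns the returned object and must destroy it.
    virtual SketchContext* createContext() const = 0;
};

// Trivial implementation that only counts the documents it is given.
class CountingContext : public SketchContext {
public:
    void collect(const std::string&) override { ++m_count; }
    std::size_t collectedCount() const override { return m_count; }
private:
    std::size_t m_count = 0;
};

class CountingStatistics : public SketchStatistics {
public:
    SketchContext* createContext() const override { return new CountingContext(); }
};

// Take ownership right away so the context is released on every exit path.
std::size_t collectAll(const SketchStatistics& stats,
                       const std::vector<std::string>& documents)
{
    std::unique_ptr<SketchContext> ctx(stats.createContext());
    if (!ctx) return 0;  // assume a null result signals failure
    for (const std::string& doc : documents) ctx->collect(doc);
    return ctx->collectedCount();
}
```

The virtual destructor on the stand-in interface matters here: `std::unique_ptr<SketchContext>` deletes through the base pointer, which is only well defined when the base destructor is virtual, as it is for the destructor documented above.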
Return a structure with all definitions for introspection.
Returns: the structure with all definitions for introspection
The documentation for this class was generated from the following file: