strus toplevel namespace More...

Namespaces
	analyzer
	analyzer parameter and return value objects namespace

Classes
class	AggregatorFunctionInstanceInterface
	Interface for a parameterized aggregator function. More...

class	AggregatorFunctionInterface
	Interface for the aggregator function constructor. More...

class	AnalyzerObjectBuilderInterface
	Interface providing a mechanism to create complex multi component objects for the document and query analysis in strus. More...

class	ContentIteratorInterface
	Defines an iterator on content provided by a segmenter. More...

class	ContentStatisticsContextInterface
	Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine. More...

class	ContentStatisticsInterface
	Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine. More...

class	DocumentAnalyzerContextInterface
	Defines the context for analyzing multi part documents, iterating on the sub documents defined, splitting them into normalized terms that can be fed to the strus IR engine. More...

class	DocumentAnalyzerInstanceInterface
	Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine. More...

class	DocumentAnalyzerMapInterface
	Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine. More...

class	DocumentClassDetectorInterface
	Defines a detector that returns a content description for a document content it recognizes. More...

class	TagAttributeMarkupInterface

class	DocumentTagMarkupDef

class	PatternResultFormatContext
	Context for mapping result format strings (allocator,maps,etc.) More...

class	PatternResultFormatVariableMap
	Interface to map variables to a pointer to string. More...

class	PatternResultFormatTable
	Parser for result format strings. More...

struct	PatternResultFormatChunk
	Single chunk of a result format for iterating and building the pattern match result. More...

class	PatternResultFormatMap
	Result format for the output of pattern match results with names of members as variables in curly brackets '{' '}'. More...

class	PatternSerializer
	Object with all interfaces needed for serialization. More...

class	NormalizerFunctionInstanceInterface
	Interface for a parameterized normalization function. More...

class	NormalizerFunctionInterface
	Interface for the normalizer constructor. More...

class	PatternLexerContextInterface
	Interface for detecting lexemes used as basic entities by pattern matching in text. More...

class	PatternLexerInstanceInterface
	Interface for building the automaton for detecting lexemes used as basic entities by pattern matching in text. More...

class	PatternLexerInterface
	Interface for instantiating the data structure of an automaton for detecting lexemes used as basic entities by pattern matching in text. More...

class	PatternMatcherContextInterface
	Interface for detecting patterns (structures formed by atomic tokens) in one document. More...

class	PatternMatcherInstanceInterface
	Interface for building the automaton for detecting patterns in text. More...

class	PatternMatcherInterface
	Interface for creating an automaton for detecting patterns of tokens in a document stream. More...

class	PatternTermFeederInstanceInterface
	Instance interface for defining a mapping of terms of the document analysis output to lexemes used as basic entities by pattern matching. More...

class	PatternTermFeederInterface
	Interface for instantiating the data structure of an automaton for detecting lexemes used as basic entities by pattern matching in text. More...

class	PosTaggerContextInterface
	Context to markup documents with tags derived from POS tagging. More...

class	PosTaggerDataInterface
	Interface for the data built by a POS tagger. More...

class	PosTaggerInstanceInterface
	Interface to define a POS tagger instance for creating the input for POS tagging to build the data, and for creating the context for tagging with the data built from the POS tagging output. More...

class	PosTaggerInterface
	Interface for the construction of a POS tagger instance for a specified segmenter. More...

class	QueryAnalyzerContextInterface
	Defines the context for analyzing queries for the strus IR engine. More...

class	QueryAnalyzerInstanceInterface
	Defines a program for analyzing chunks of a query. More...

class	SegmenterContextInterface
	Defines the context for segmenting one document. More...

class	SegmenterInstanceInterface
	Defines a program for splitting a source text into chunks with an id corresponding to a selecting expression. More...

class	SegmenterInterface
	Defines an interface for creating instances of programs for document segmentation. More...

class	SegmenterMarkupContextInterface
	Defines the context for inserting markups into one document. More...

class	TextProcessorInterface
	Interface for the object providing tokenizers and normalizers used for creating terms from segments of text and functions for collecting overall document statistics. More...

class	TokenizerFunctionInstanceInterface
	Interface for tokenization. More...

class	TokenizerFunctionInterface
	Interface for a tokenizer function. More...

class	TokenMarkupContextInterface
	Interface for annotation of text in one document. More...

class	TokenMarkupInstanceInterface
	Interface for building the automaton for detecting patterns of tokens in a document stream. More...

Typedefs
typedef struct PatternResultFormat	PatternResultFormat
	Result format representation (hidden implementation) More...

typedef int	SegmenterPosition
	Position of a segment in the original source. More...

Enumerations
enum	PatternSerializerType { PatternMatcherWithLexer, PatternMatcherWithFeeder }
	Defines different types of pattern matchers to serialize. More...

Functions
AggregatorFunctionInterface *	createAggregator_typeset (ErrorBufferInterface *errorhnd)
	Get the aggregator function type for the cosine measure normalization factor. More...

AggregatorFunctionInterface *	createAggregator_valueset (ErrorBufferInterface *errorhnd)

AggregatorFunctionInterface *	createAggregator_sumSquareTf (ErrorBufferInterface *errorhnd)
	Get the aggregator function type for the cosine measure normalization factor. More...

DocumentAnalyzerInstanceInterface *	createDocumentAnalyzer (const TextProcessorInterface *textproc, const SegmenterInterface *segmenter, const analyzer::SegmenterOptions &opts, ErrorBufferInterface *errorhnd)
	Creates a parameterizable analyzer instance for analyzing documents. More...

QueryAnalyzerInstanceInterface *	createQueryAnalyzer (ErrorBufferInterface *errorhnd)
	Creates a parameterizable analyzer instance for analyzing queries. More...

DocumentAnalyzerMapInterface *	createDocumentAnalyzerMap (const AnalyzerObjectBuilderInterface *objbuilder, ErrorBufferInterface *errorhnd)
	Creates an analyzer map for bundling different instances of analyzers for different classes of documents. More...

AnalyzerObjectBuilderInterface *	createAnalyzerObjectBuilder_default (const FileLocatorInterface *filelocator, ErrorBufferInterface *errorhnd)
	Create an analyzer object builder with the builders from the standard strus core libraries (without module support) More...

analyzer::DocumentClass	parse_DocumentClass (const std::string &src, ErrorBufferInterface *errorhnd)
	parse the document class from source More...

bool	load_DocumentAnalyzer_program_std (DocumentAnalyzerInstanceInterface *analyzer, const TextProcessorInterface *textproc, const std::string &content, ErrorBufferInterface *errorhnd)
	Load a program given as source without includes to a document analyzer. More...

bool	load_DocumentAnalyzer_programfile_std (DocumentAnalyzerInstanceInterface *analyzer, const TextProcessorInterface *textproc, const std::string &filename, ErrorBufferInterface *errorhnd)
	Load a program given as source file name to a document analyzer, recursively expanding include directives (C preprocessor style) at the beginning of the source to load. More...

bool	load_QueryAnalyzer_program_std (QueryAnalyzerInstanceInterface *analyzer, const TextProcessorInterface *textproc, const std::string &content, ErrorBufferInterface *errorhnd)
	Load a program given as source without includes to a query analyzer. More...

bool	load_QueryAnalyzer_programfile_std (QueryAnalyzerInstanceInterface *analyzer, const TextProcessorInterface *textproc, const std::string &filename, ErrorBufferInterface *errorhnd)
	Load a program given as source file name to a query analyzer, recursively expanding include directives (C preprocessor style) at the beginning of the source to load. More...

bool	is_DocumentAnalyzer_programfile (const TextProcessorInterface *textproc, const std::string &filename, ErrorBufferInterface *errorhnd)
	Test if a file is an analyzer program file. More...

bool	is_DocumentAnalyzer_program (const std::string &source, ErrorBufferInterface *errorhnd)
	Test if a source is an analyzer program. More...

bool	load_DocumentAnalyzerMap_program (DocumentAnalyzerMapInterface *analyzermap, const TextProcessorInterface *textproc, const std::string &source, ErrorBufferInterface *errorhnd)
	Load a map of definitions describing how different document types are mapped to an analyzer program from its source. More...

bool	load_DocumentAnalyzerMap_programfile (DocumentAnalyzerMapInterface *analyzermap, const TextProcessorInterface *textproc, const std::string &filename, ErrorBufferInterface *errorhnd)
	Load a map of definitions describing how different document types are mapped to an analyzer program from a file. More...

bool	load_PatternMatcher_program (const TextProcessorInterface *textproc, PatternTermFeederInstanceInterface *feeder, PatternMatcherInstanceInterface *matcher, const std::string &content, ErrorBufferInterface *errorhnd)
	Load a pattern matcher program with a term feeder from source. More...

bool	load_PatternMatcher_programfile (const TextProcessorInterface *textproc, PatternTermFeederInstanceInterface *feeder, PatternMatcherInstanceInterface *matcher, const std::string &filename, ErrorBufferInterface *errorhnd)
	Load a pattern matcher program with a term feeder from a resource file. More...

bool	load_PatternMatcher_program (const TextProcessorInterface *textproc, PatternLexerInstanceInterface *lexer, PatternMatcherInstanceInterface *matcher, const std::string &content, ErrorBufferInterface *errorhnd)
	Load a pattern matcher program with a lexer from source. More...

bool	load_PatternMatcher_programfile (const TextProcessorInterface *textproc, PatternLexerInstanceInterface *lexer, PatternMatcherInstanceInterface *matcher, const std::string &filename, ErrorBufferInterface *errorhnd)
	Load a pattern matcher program with a lexer from a resource file. More...

ContentStatisticsInterface *	createContentStatistics_std (const TextProcessorInterface *textproc, const DocumentClassDetectorInterface *detector, ErrorBufferInterface *errorhnd)
	Get the standard content statistics. More...

DocumentClassDetectorInterface *	createDetector_std (const TextProcessorInterface *textproc, ErrorBufferInterface *errorhnd)
	Get the standard content detector (with ownership) More...

std::string	markupDocumentTags (const analyzer::DocumentClass &documentClass, const std::string &content, const std::vector< DocumentTagMarkupDef > &markups, const TextProcessorInterface *textproc, ErrorBufferInterface *errorhnd)
	Analyze a content and put markups on every tag matching an expression. More...

TokenMarkupInstanceInterface *	createTokenMarkupInstance_standard (ErrorBufferInterface *errorhnd)
	Create the interface for markup of tokens in a document text. More...

NormalizerFunctionInterface *	createNormalizer_lowercase (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the lower case of the input as result. More...

NormalizerFunctionInterface *	createNormalizer_uppercase (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the upper case of the input as result. More...

NormalizerFunctionInterface *	createNormalizer_convdia (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the conversion of diacritical characters to ascii of the input as result. More...

NormalizerFunctionInterface *	createNormalizer_charselect (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the selection of characters defined by named sets as result. More...

NormalizerFunctionInterface *	createNormalizer_date2int (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the conversion of the input date as number (various units configurable base) More...

NormalizerFunctionInterface *	createNormalizer_dictmap (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the mapping of the input with a dictionary as result. More...

NormalizerFunctionInterface *	createNormalizer_ngram (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the ngrams of the input as result. More...

NormalizerFunctionInterface *	createNormalizer_regex (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the mapping of the input with help of regular expressions as result. More...

NormalizerFunctionInterface *	createNormalizer_snowball (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the stemming of the input with the snowball stemmer as result. More...

NormalizerFunctionInterface *	createNormalizer_substrindex (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the input trimmed as result. More...

NormalizerFunctionInterface *	createNormalizer_substrmap (ErrorBufferInterface *errorhnd)

NormalizerFunctionInterface *	createNormalizer_trim (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the input trimmed as result. More...

NormalizerFunctionInterface *	createNormalizer_wordjoin (ErrorBufferInterface *errorhnd)
	Get the normalizer that returns the words tokenized from the input joined. More...

bool	isPatternSerializerContent (const std::string &m_itrcontent, ErrorBufferInterface *errorhnd)
	Evaluate, if a content is a pattern serialization. More...

PatternSerializer *	createPatternSerializer (const std::string &filename, const PatternSerializerType &serializerType, ErrorBufferInterface *errorhnd)
	Create a serializer of patterns loaded. More...

PatternSerializer *	createPatternSerializerText (std::ostream &output, const PatternSerializerType &serializerType, ErrorBufferInterface *errorhnd)
	Create a serializer of patterns loaded as text to a stream. More...

bool	loadPatternMatcherFromSerialization (const std::string &source, PatternLexerInstanceInterface *lexer, PatternMatcherInstanceInterface *matcher, ErrorBufferInterface *errorhnd)
	Instantiate pattern matching interfaces from serialization. More...

bool	loadPatternMatcherFromSerialization (const std::string &source, PatternTermFeederInstanceInterface *feeder, PatternMatcherInstanceInterface *matcher, ErrorBufferInterface *errorhnd)
	Instantiate pattern matching interfaces from serialization. More...

PatternTermFeederInterface *	createPatternTermFeeder_default (ErrorBufferInterface *errorhnd)
	Create the term feeder interface for pattern matching on analyzer output as input. More...

PatternLexerInterface *	createPatternLexer_test (ErrorBufferInterface *errorhnd)
	Create the interface for regular expression matching usable as ground truth for testing. More...

PatternMatcherInterface *	createPatternMatcher_test (ErrorBufferInterface *errorhnd)
	Create the interface for pattern matching usable as ground truth for testing. More...

PosTaggerDataInterface *	createPosTaggerData_standard (TokenizerFunctionInstanceInterface *tokenizer, ErrorBufferInterface *errorhnd)
	Create an interface for building up the data to tag documents with. More...

PosTaggerInterface *	createPosTagger_standard (ErrorBufferInterface *errorhnd)
	Create an interface for the construction of a POS tagger instance for a specified segmenter. More...

SegmenterInterface *	createSegmenter_cjson (ErrorBufferInterface *errorhnd)
	Get a document JSON segmenter based on cjson. More...

std::vector< std::string >	splitJsonDocumentList (const std::string &encoding, const std::string &content, ErrorBufferInterface *errorhnd)

SegmenterInterface *	createSegmenter_plain (ErrorBufferInterface *errorhnd)
	Get a document plain text segmenter. More...

SegmenterInterface *	createSegmenter_textwolf (ErrorBufferInterface *errorhnd)
	Get a document XML segmenter based on textwolf. More...

SegmenterInterface *	createSegmenter_tsv (ErrorBufferInterface *errorhnd)
	Get a document segmenter using tab-separated files as input. More...

TextProcessorInterface *	createTextProcessor (const FileLocatorInterface *filelocator, ErrorBufferInterface *errorhnd)
	Create a text processor. More...

TokenizerFunctionInterface *	createTokenizer_punctuation (ErrorBufferInterface *errorhnd)
	Get the tokenizer type that creates the tokenization of punctuation elements in the input. More...

TokenizerFunctionInterface *	createTokenizer_regex (ErrorBufferInterface *errorhnd)
	Get the tokenizer type that creates the tokenization with help of regular expressions. More...

TokenizerFunctionInterface *	createTokenizer_textcat (const TextProcessorInterface *textproc, ErrorBufferInterface *errorhnd)
	Get the tokenizer type that creates the tokenization of words in a recognized language. More...

TokenizerFunctionInterface *	createTokenizer_word (ErrorBufferInterface *errorhnd)
	Get the tokenizer type that creates the tokenization of words in the input. More...

TokenizerFunctionInterface *	createTokenizer_whitespace (ErrorBufferInterface *errorhnd)
	Get the tokenizer type that creates the tokenization as splitting of the input by whitespaces. More...

TokenizerFunctionInterface *	createTokenizer_langtoken (ErrorBufferInterface *errorhnd)
	Get the tokenizer type that creates the tokenization as splitting of all tokens, returning sequences of language characters as tokens and word boundary delimiters as single characters. More...

Detailed Description

strus toplevel namespace

Exported functions for the program loader of the analyzer (load program in a domain specific language)

Typedef Documentation

typedef struct PatternResultFormat strus::PatternResultFormat

Result format representation (hidden implementation)

typedef int strus::SegmenterPosition

Position of a segment in the original source.

Enumeration Type Documentation

enum strus::PatternSerializerType

Defines different types of pattern matchers to serialize.

Enumerator
PatternMatcherWithLexer
PatternMatcherWithFeeder

Function Documentation

AggregatorFunctionInterface* strus::createAggregator_sumSquareTf ( ErrorBufferInterface * errorhnd )

Get the aggregator function type for the cosine measure normalization factor.

Returns: the aggregator function

AggregatorFunctionInterface* strus::createAggregator_typeset ( ErrorBufferInterface * errorhnd )

Get the aggregator function type for the cosine measure normalization factor.

Returns: the aggregator function

AggregatorFunctionInterface* strus::createAggregator_valueset ( ErrorBufferInterface * errorhnd )

AnalyzerObjectBuilderInterface* strus::createAnalyzerObjectBuilder_default	(	const FileLocatorInterface *	filelocator,
		ErrorBufferInterface *	errorhnd
	)

Create an analyzer object builder with the builders from the standard strus core libraries (without module support)

Parameters

[in]	filelocator	resources and file locator interface
[in]	errorhnd	error buffer interface

ContentStatisticsInterface* strus::createContentStatistics_std	(	const TextProcessorInterface *	textproc,
		const DocumentClassDetectorInterface *	detector,
		ErrorBufferInterface *	errorhnd
	)

Get the standard content statistics.

Returns: the standard content statistics interface (with ownership)

DocumentClassDetectorInterface* strus::createDetector_std	(	const TextProcessorInterface *	textproc,
		ErrorBufferInterface *	errorhnd
	)

Get the standard content detector (with ownership)

Returns: the content detector class

DocumentAnalyzerInstanceInterface* strus::createDocumentAnalyzer	(	const TextProcessorInterface *	textproc,
		const SegmenterInterface *	segmenter,
		const analyzer::SegmenterOptions &	opts,
		ErrorBufferInterface *	errorhnd
	)

Creates a parameterizable analyzer instance for analyzing documents.

Parameters

[in]	textproc	text processor for creating functions and resources needed for analysis
[in]	segmenter	segmenter type to be used by the created analyzer
[in]	opts	options for the segmenter
[in]	errorhnd	error buffer interface

Returns: the analyzer program (with ownership)
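
Because the analyzer is returned with ownership, the caller is responsible for deleting it. The sketch below shows one way to handle that with std::unique_ptr. It does not use the strus headers: ErrorBufferStub, AnalyzerStub, createAnalyzerStub and useAnalyzer are hypothetical stand-ins, and returning a null pointer on failure is an assumption, not something this page states.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for an error buffer; not the strus interface.
struct ErrorBufferStub
{
    std::string message;                       // empty means "no error"
    bool hasError() const { return !message.empty(); }
};

// Hypothetical stand-in for an analyzer object; counts live instances.
struct AnalyzerStub
{
    static int liveCount;
    AnalyzerStub() { ++liveCount; }
    ~AnalyzerStub() { --liveCount; }
};
int AnalyzerStub::liveCount = 0;

// Mimics the factory convention: raw pointer returned with ownership.
// ASSUMPTION: a null pointer plus a message in the error buffer on failure.
AnalyzerStub* createAnalyzerStub(bool fail, ErrorBufferStub* errorhnd)
{
    if (fail)
    {
        errorhnd->message = "failed to create analyzer";
        return nullptr;
    }
    return new AnalyzerStub();
}

// Caller side: wrap the owned raw pointer at once so it cannot leak.
bool useAnalyzer(bool fail, ErrorBufferStub* errorhnd)
{
    std::unique_ptr<AnalyzerStub> analyzer(createAnalyzerStub(fail, errorhnd));
    if (!analyzer) return false;               // inspect errorhnd for the reason
    assert(AnalyzerStub::liveCount == 1);      // the object is alive while in use
    return true;                               // analyzer is deleted on return
}
```

Wrapping the pointer immediately keeps every early return leak-free, which is the main reason to prefer it over a manual delete.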

DocumentAnalyzerMapInterface* strus::createDocumentAnalyzerMap	(	const AnalyzerObjectBuilderInterface *	objbuilder,
		ErrorBufferInterface *	errorhnd
	)

Creates an analyzer map for bundling different instances of analyzers for different classes of documents.

Parameters

[in]	objbuilder	analyzer object builder interface
[in]	errorhnd	error buffer interface

Returns: the analyzer map (with ownership)

NormalizerFunctionInterface* strus::createNormalizer_charselect ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the selection of characters defined by named sets as result.

Returns: the normalization function

NormalizerFunctionInterface* strus::createNormalizer_convdia ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the conversion of diacritical characters to ascii of the input as result.

Returns: the normalization function

NormalizerFunctionInterface* strus::createNormalizer_date2int ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the conversion of the input date as number (various units configurable base)

Returns: the normalization function

NormalizerFunctionInterface* strus::createNormalizer_dictmap ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the mapping of the input with a dictionary as result.

Returns: the normalization function

NormalizerFunctionInterface* strus::createNormalizer_lowercase ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the lower case of the input as result.

Returns: the normalization function

NormalizerFunctionInterface* strus::createNormalizer_ngram ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the ngrams of the input as result.

Returns: the normalization function

NormalizerFunctionInterface* strus::createNormalizer_regex ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the mapping of the input with help of regular expressions as result.

Returns: the normalization function

NormalizerFunctionInterface* strus::createNormalizer_snowball ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the stemming of the input with the snowball stemmer as result.

Returns: the normalization function

NormalizerFunctionInterface* strus::createNormalizer_substrindex ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the input trimmed as result.

Returns: the normalization function

NormalizerFunctionInterface* strus::createNormalizer_substrmap ( ErrorBufferInterface * errorhnd )

NormalizerFunctionInterface* strus::createNormalizer_trim ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the input trimmed as result.

Returns: the normalization function
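
To give a rough idea of what normalizers such as trim and lowercase compute, here is a stand-alone sketch. It is not the strus implementation: trimSketch, lowercaseSketch and normalizeSketch are hypothetical names, and the code handles ASCII input only. How the real functions treat other encodings is not covered here.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Remove leading and trailing whitespace (ASCII only, illustrative).
std::string trimSketch(const std::string& in)
{
    std::size_t b = 0, e = in.size();
    while (b < e && std::isspace(static_cast<unsigned char>(in[b]))) ++b;
    while (e > b && std::isspace(static_cast<unsigned char>(in[e - 1]))) --e;
    return in.substr(b, e - b);
}

// Map every character to its lower case form (ASCII only, illustrative).
std::string lowercaseSketch(const std::string& in)
{
    std::string out(in);
    for (std::size_t i = 0; i < out.size(); ++i)
    {
        out[i] = static_cast<char>(std::tolower(static_cast<unsigned char>(out[i])));
    }
    return out;
}

// Apply both steps in sequence: trim first, then lower case.
std::string normalizeSketch(const std::string& in)
{
    return lowercaseSketch(trimSketch(in));
}

// Usage example with the results the two steps produce.
void normalizeExample()
{
    assert(trimSketch("  Hello World \n") == "Hello World");
    assert(normalizeSketch("\t Strus IR ") == "strus ir");
}
```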

NormalizerFunctionInterface* strus::createNormalizer_uppercase ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the upper case of the input as result.

Returns: the normalization function

NormalizerFunctionInterface* strus::createNormalizer_wordjoin ( ErrorBufferInterface * errorhnd )

Get the normalizer that returns the words tokenized from the input joined.

Returns: the normalization function

PatternLexerInterface* strus::createPatternLexer_test ( ErrorBufferInterface * errorhnd )

Create the interface for regular expression matching usable as ground truth for testing.

PatternMatcherInterface* strus::createPatternMatcher_test ( ErrorBufferInterface * errorhnd )

Create the interface for pattern matching usable as ground truth for testing.

PatternSerializer* strus::createPatternSerializer	(	const std::string &	filename,
		const PatternSerializerType &	serializerType,
		ErrorBufferInterface *	errorhnd
	)

Create a serializer of patterns loaded.

Parameters

[in]	filename	path to file where to write the output to
[in]	serializerType	type of serialization
[in]	errorhnd	error buffer interface

PatternSerializer* strus::createPatternSerializerText	(	std::ostream &	output,
		const PatternSerializerType &	serializerType,
		ErrorBufferInterface *	errorhnd
	)

Create a serializer of patterns loaded as text to a stream.

Parameters

[in]	output	where to print text output to
[in]	serializerType	type of serialization
[in]	errorhnd	error buffer interface

PatternTermFeederInterface* strus::createPatternTermFeeder_default ( ErrorBufferInterface * errorhnd )

Create the term feeder interface for pattern matching on analyzer output as input.

Parameters

[in] errorhnd error buffer interface

PosTaggerInterface* strus::createPosTagger_standard ( ErrorBufferInterface * errorhnd )

Create an interface for the construction of a POS tagger instance for a specified segmenter.

Parameters

[in] errorhnd error buffer interface for exceptions thrown

Returns: the POS tagger base interface

PosTaggerDataInterface* strus::createPosTaggerData_standard	(	TokenizerFunctionInstanceInterface *	tokenizer,
		ErrorBufferInterface *	errorhnd
	)

Create an interface for building up the data to tag documents with.

Parameters

[in]	tokenizer	tokenizer interface to use (passed with ownership)
[in]	errorhnd	error buffer interface for exceptions thrown

Returns: the structure to collect POS tagging output
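
The tokenizer is passed with ownership, so the caller must give it up at the call. A minimal sketch of that hand-over follows. TokenizerStub, TaggerDataStub, createTaggerDataStub and buildAndDestroy are hypothetical stand-ins for illustration only, and the assumption is that the created object deletes the tokenizer when it is destroyed.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a tokenizer instance; counts live objects.
struct TokenizerStub
{
    static int live;
    TokenizerStub() { ++live; }
    ~TokenizerStub() { --live; }
};
int TokenizerStub::live = 0;

// Hypothetical stand-in for the tagger data; takes over the tokenizer.
struct TaggerDataStub
{
    explicit TaggerDataStub(TokenizerStub* tokenizer) : m_tokenizer(tokenizer) {}
    std::unique_ptr<TokenizerStub> m_tokenizer;   // deleted together with the data
};

// Mimics a factory whose pointer argument is passed with ownership.
TaggerDataStub* createTaggerDataStub(TokenizerStub* tokenizer)
{
    return new TaggerDataStub(tokenizer);
}

// Returns the number of live tokenizers while the tagger data exists.
int buildAndDestroy()
{
    std::unique_ptr<TokenizerStub> tokenizer(new TokenizerStub());
    // release() hands the raw pointer over; the local unique_ptr no longer owns it.
    std::unique_ptr<TaggerDataStub> data(createTaggerDataStub(tokenizer.release()));
    assert(!tokenizer);
    return TokenizerStub::live;
}
```

Calling release() rather than get() matters here: with get() both sides would try to delete the same tokenizer.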

QueryAnalyzerInstanceInterface* strus::createQueryAnalyzer ( ErrorBufferInterface * errorhnd )

Creates a parameterizable analyzer instance for analyzing queries.

Parameters

[in] errorhnd error buffer interface

Returns: the analyzer program (with ownership)

SegmenterInterface* strus::createSegmenter_cjson ( ErrorBufferInterface * errorhnd )

Get a document JSON segmenter based on cjson.

Returns: the segmenter

SegmenterInterface* strus::createSegmenter_plain ( ErrorBufferInterface * errorhnd )

Get a document plain text segmenter.

Returns: the segmenter

SegmenterInterface* strus::createSegmenter_textwolf ( ErrorBufferInterface * errorhnd )

Get a document XML segmenter based on textwolf.

Returns: the segmenter

SegmenterInterface* strus::createSegmenter_tsv ( ErrorBufferInterface * errorhnd )

Get a document segmenter using tab-separated files as input.

Returns: the segmenter
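
The four segmenter factories listed in this section cover JSON, plain text, XML and tab-separated input. The sketch below shows how a caller might pick one by content type. The function segmenterFactoryFor is hypothetical, and the use of MIME type strings as the selection key is an assumption made for illustration. Only the factory names come from this page.

```cpp
#include <cassert>
#include <string>

// Map a content type to the name of the segmenter factory documented here.
// Returns a null pointer for content types without a listed segmenter.
const char* segmenterFactoryFor(const std::string& mimeType)
{
    if (mimeType == "application/json") return "createSegmenter_cjson";
    if (mimeType == "application/xml") return "createSegmenter_textwolf";
    if (mimeType == "text/tab-separated-values") return "createSegmenter_tsv";
    if (mimeType == "text/plain") return "createSegmenter_plain";
    return nullptr;
}

// Usage example: one known and one unknown content type.
void segmenterExample()
{
    assert(std::string(segmenterFactoryFor("application/json")) == "createSegmenter_cjson");
    assert(segmenterFactoryFor("image/png") == nullptr);
}
```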

TextProcessorInterface* strus::createTextProcessor	(	const FileLocatorInterface *	filelocator,
		ErrorBufferInterface *	errorhnd
	)

Create a text processor.

Returns: the constructed text processor

TokenizerFunctionInterface* strus::createTokenizer_langtoken ( ErrorBufferInterface * errorhnd )

Get the tokenizer type that creates the tokenization as splitting of the input into tokens, returning sequences of language characters as tokens and word boundary delimiters as single characters.

Returns: the tokenization function

TokenizerFunctionInterface* strus::createTokenizer_punctuation ( ErrorBufferInterface * errorhnd )

Get the tokenizer type that creates the tokenization of punctuation elements in the input.

Returns: the tokenization function

TokenizerFunctionInterface* strus::createTokenizer_regex ( ErrorBufferInterface * errorhnd )

Get the tokenizer type that creates the tokenization with help of regular expressions.

Returns: the tokenization function

TokenizerFunctionInterface* strus::createTokenizer_textcat	(	const TextProcessorInterface *	textproc,
		ErrorBufferInterface *	errorhnd
	)

Get the tokenizer type that creates the tokenization of words in a recognized language.

Returns: the tokenization function

TokenizerFunctionInterface* strus::createTokenizer_whitespace ( ErrorBufferInterface * errorhnd )

Get the tokenizer type that creates the tokenization as splitting of the input by whitespaces.

Returns: the tokenization function
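
Splitting by whitespace can be pictured with the following stand-alone sketch. It is not the strus tokenizer: TokenSketch and splitByWhitespace are hypothetical names, the byte offset and length representation of a token is an assumption, and only ASCII whitespace is recognized.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token representation: byte offset and length in the source.
struct TokenSketch
{
    std::size_t pos;
    std::size_t len;
};

// Return one token for every maximal run of non-whitespace characters.
std::vector<TokenSketch> splitByWhitespace(const std::string& src)
{
    std::vector<TokenSketch> tokens;
    std::size_t i = 0;
    while (i < src.size())
    {
        // Skip the whitespace in front of the next token.
        while (i < src.size() && std::isspace(static_cast<unsigned char>(src[i]))) ++i;
        std::size_t start = i;
        // Consume the token itself.
        while (i < src.size() && !std::isspace(static_cast<unsigned char>(src[i]))) ++i;
        if (i > start)
        {
            TokenSketch tok = { start, i - start };
            tokens.push_back(tok);
        }
    }
    return tokens;
}

// Usage example: three tokens separated by blanks and a tab.
void whitespaceExample()
{
    std::vector<TokenSketch> t = splitByWhitespace("  hello strus\tworld ");
    assert(t.size() == 3);
    assert(t[1].pos == 8 && t[1].len == 5);   // the token "strus"
}
```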

TokenizerFunctionInterface* strus::createTokenizer_word ( ErrorBufferInterface * errorhnd )

Get the tokenizer type that creates the tokenization of words in the input.

Returns: the tokenization function

TokenMarkupInstanceInterface* strus::createTokenMarkupInstance_standard ( ErrorBufferInterface * errorhnd )

Create the interface for markup of tokens in a document text.

bool strus::is_DocumentAnalyzer_program	(	const std::string &	source,
		ErrorBufferInterface *	errorhnd
	)

Test if a source is an analyzer program.

Parameters

[in]	source	source to test
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure (inspect errorhnd for errors)

bool strus::is_DocumentAnalyzer_programfile	(	const TextProcessorInterface *	textproc,
		const std::string &	filename,
		ErrorBufferInterface *	errorhnd
	)

Test if a file is an analyzer program file.

Parameters

[in]	textproc	text processor interface to determine the path of the filename
[in]	filename	name of the file to load
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure (inspect errorhnd for errors)

bool strus::isPatternSerializerContent	(	const std::string &	m_itrcontent,
		ErrorBufferInterface *	errorhnd
	)

Evaluate, if a content is a pattern serialization.

Parameters

[in] m_itrcontent content to check

bool strus::load_DocumentAnalyzer_program_std	(	DocumentAnalyzerInstanceInterface *	analyzer,
		const TextProcessorInterface *	textproc,
		const std::string &	content,
		ErrorBufferInterface *	errorhnd
	)

Load a program given as source without includes to a document analyzer.

Parameters

[in,out]	analyzer	analyzer object to instrument
[in]	textproc	text processor interface
[in]	content	source with definitions
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure (inspect errorhnd for errors)

bool strus::load_DocumentAnalyzer_programfile_std	(	DocumentAnalyzerInstanceInterface *	analyzer,
		const TextProcessorInterface *	textproc,
		const std::string &	filename,
		ErrorBufferInterface *	errorhnd
	)

Load a program given as source file name to a document analyzer, recursively expanding include directives (C preprocessor style) at the beginning of the source to load.

Parameters

[in,out]	analyzer	analyzer object to instrument
[in]	textproc	text processor interface to determine the path of the filename
[in]	filename	name of the file to load
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure (inspect errorhnd for errors)
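
The recursive include expansion described above can be pictured with a small stand-alone sketch. It is not the strus loader: the directive syntax `#include "name"`, the in-memory FileMap standing in for the file system, the function names and the placeholder file contents are all assumptions made for illustration. The sketch expands directives only while they appear at the beginning of a source and rejects include cycles.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the file system: file name -> file content.
typedef std::map<std::string, std::string> FileMap;

// Expand leading include lines recursively (ASSUMED syntax: #include "name").
// Everything after the first non-include line is copied unchanged.
std::string expandIncludes(const FileMap& files, const std::string& filename,
                           std::set<std::string>& active)
{
    FileMap::const_iterator fi = files.find(filename);
    if (fi == files.end()) throw std::runtime_error("file not found: " + filename);
    if (!active.insert(filename).second) throw std::runtime_error("include cycle: " + filename);

    const std::string prefix = "#include \"";
    std::istringstream in(fi->second);
    std::string line, out;
    bool header = true;   // true while still inside the leading include section
    while (std::getline(in, line))
    {
        if (header && line.size() > prefix.size()
            && line.compare(0, prefix.size(), prefix) == 0
            && line[line.size() - 1] == '"')
        {
            std::string name = line.substr(prefix.size(), line.size() - prefix.size() - 1);
            out += expandIncludes(files, name, active);
        }
        else
        {
            header = false;
            out += line;
            out += '\n';
        }
    }
    active.erase(filename);   // the file may be included again from another branch
    return out;
}

// Usage example: three in-memory files, one include nested in another.
std::string expandExample()
{
    FileMap files;
    files["main.ana"] = "#include \"common.ana\"\nterm = x;\n";
    files["common.ana"] = "#include \"base.ana\"\ncommon = y;\n";
    files["base.ana"] = "base = z;\n";
    std::set<std::string> active;
    std::string out = expandIncludes(files, "main.ana", active);
    assert(active.empty());   // every file has left the active set again
    return out;
}
```

The set of active files tracks only the current include chain, so a file that is included twice from different branches is accepted while a file that includes itself, directly or indirectly, is reported as an error.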

bool strus::load_DocumentAnalyzerMap_program	(	DocumentAnalyzerMapInterface *	analyzermap,
		const TextProcessorInterface *	textproc,
		const std::string &	source,
		ErrorBufferInterface *	errorhnd
	)

Load a map of definitions describing how different document types are mapped to an analyzer program from its source.

Parameters

[in,out]	analyzermap	map of analyzers to instrument
[in]	textproc	text processor interface to determine the path of filenames
[in]	source	source with definitions
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure

bool strus::load_DocumentAnalyzerMap_programfile	(	DocumentAnalyzerMapInterface *	analyzermap,
		const TextProcessorInterface *	textproc,
		const std::string &	filename,
		ErrorBufferInterface *	errorhnd
	)

Load a map of definitions describing how different document types are mapped to an analyzer program from a file.

Parameters

[in,out]	analyzermap	map of analyzers to instrument
[in]	textproc	text processor interface to determine the path of filenames
[in]	filename	file name of source with definitions
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure

bool strus::load_PatternMatcher_program	(	const TextProcessorInterface *	textproc,
		PatternTermFeederInstanceInterface *	feeder,
		PatternMatcherInstanceInterface *	matcher,
		const std::string &	content,
		ErrorBufferInterface *	errorhnd
	)

Load a pattern matcher program with a term feeder from source.

Parameters

[in]	textproc	text processor interface to determine the path of filenames
[in,out]	feeder	term feeder to instrument
[in,out]	matcher	pattern matcher to instrument
[in]	content	source with definitions
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure

bool strus::load_PatternMatcher_program	(	const TextProcessorInterface *	textproc,
		PatternLexerInstanceInterface *	lexer,
		PatternMatcherInstanceInterface *	matcher,
		const std::string &	content,
		ErrorBufferInterface *	errorhnd
	)

Load a pattern matcher program with a lexer from source.

Parameters

[in]	textproc	text processor interface to determine the path of filenames
[in,out]	lexer	tokenization for the pattern matcher
[in,out]	matcher	pattern matcher to instrument
[in]	content	source with definitions
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure

bool strus::load_PatternMatcher_programfile	(	const TextProcessorInterface *	textproc,
		PatternTermFeederInstanceInterface *	feeder,
		PatternMatcherInstanceInterface *	matcher,
		const std::string &	filename,
		ErrorBufferInterface *	errorhnd
	)

Load a pattern matcher program with a term feeder from a resource file.

Parameters

[in]	textproc	text processor interface to determine the path of filenames
[in,out]	feeder	term feeder to instrument
[in,out]	matcher	pattern matcher to instrument
[in]	filename	file name of source with definitions
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure

bool strus::load_PatternMatcher_programfile	(	const TextProcessorInterface *	textproc,
		PatternLexerInstanceInterface *	lexer,
		PatternMatcherInstanceInterface *	matcher,
		const std::string &	filename,
		ErrorBufferInterface *	errorhnd
	)

Load a pattern matcher program with a lexer from a resource file.

Parameters

[in]	textproc	text processor interface to determine the path of filenames
[in,out]	lexer	tokenization for the pattern matcher
[in,out]	matcher	pattern matcher to instrument
[in]	filename	file name of source with definitions
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure
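
`load_PatternMatcher_program` and `load_PatternMatcher_programfile` each exist in two overloads that differ only in the type of the second argument (term feeder or lexer). The sketch below shows how the compiler selects the overload from that pointer type. All types and the function `loadMatcherSketch` are hypothetical stand-ins defined here for illustration; they do not reproduce the strus interfaces.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in types, not the strus interfaces.
struct TermFeederSketch {};
struct LexerSketch {};
struct MatcherSketch { std::string tokenSource; };

// Overload taking a term feeder as the source of tokens.
bool loadMatcherSketch(TermFeederSketch* feeder, MatcherSketch* matcher,
                       const std::string& content)
{
    if (!feeder || !matcher || content.empty()) return false;
    matcher->tokenSource = "feeder";
    return true;
}

// Overload taking a lexer as the source of tokens.
bool loadMatcherSketch(LexerSketch* lexer, MatcherSketch* matcher,
                       const std::string& content)
{
    if (!lexer || !matcher || content.empty()) return false;
    matcher->tokenSource = "lexer";
    return true;
}
```

Passing a `LexerSketch*` selects the second overload and a `TermFeederSketch*` the first. A bare `nullptr` in that position would be ambiguous and fail to compile, so the argument needs a concrete pointer type.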

bool strus::load_QueryAnalyzer_program_std	(	QueryAnalyzerInstanceInterface *	analyzer,
		const TextProcessorInterface *	textproc,
		const std::string &	content,
		ErrorBufferInterface *	errorhnd
	)

Load a program given as source without includes to a query analyzer.

Parameters

[in,out]	analyzer	analyzer object to instrument
[in]	textproc	text processor interface to determine the path of filenames
[in]	content	source with definitions
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure (inspect errorhnd for errors)

bool strus::load_QueryAnalyzer_programfile_std	(	QueryAnalyzerInstanceInterface *	analyzer,
		const TextProcessorInterface *	textproc,
		const std::string &	filename,
		ErrorBufferInterface *	errorhnd
	)

Load a program from a source file into a query analyzer, recursively expanding the include directives (C preprocessor style) at the beginning of the source.

Parameters

[in,out]	analyzer	analyzer object to instrument
[in]	textproc	text processor interface to determine the path of filenames
[in]	filename	name of the file to load
[in,out]	errorhnd	buffer for reporting errors (exceptions)

Returns: true on success, false on failure (inspect errorhnd for errors)

bool strus::loadPatternMatcherFromSerialization	(	const std::string &	source,
		PatternLexerInstanceInterface *	lexer,
		PatternMatcherInstanceInterface *	matcher,
		ErrorBufferInterface *	errorhnd
	)

Instantiate pattern matching interfaces from serialization.

Parameters

[in]	source	source content (not a filename!) to read the input from
[in,out]	lexer	pattern lexer instance interface to instantiate from deserialization
[in,out]	matcher	pattern matcher instance interface to instantiate from deserialization
[in,out]	errorhnd	error buffer interface

bool strus::loadPatternMatcherFromSerialization	(	const std::string &	source,
		PatternTermFeederInstanceInterface *	feeder,
		PatternMatcherInstanceInterface *	matcher,
		ErrorBufferInterface *	errorhnd
	)

Instantiate pattern matching interfaces from serialization.

Parameters

[in]	source	source content (not a filename!) to read the input from
[in,out]	feeder	pattern term feeder instance interface to instantiate from deserialization
[in,out]	matcher	pattern matcher instance interface to instantiate from deserialization
[in,out]	errorhnd	error buffer interface

std::string strus::markupDocumentTags	(	const analyzer::DocumentClass &	documentClass,
		const std::string &	content,
		const std::vector< DocumentTagMarkupDef > &	markups,
		const TextProcessorInterface *	textproc,
		ErrorBufferInterface *	errorhnd
	)

Analyze the content and add markup to every tag matching an expression.

Remarks: This function is currently implemented for XML only.

Parameters

[in]	documentClass	document class of the content with the encoding specified
[in]	content	the content to process
[in]	markups	array of definitions for markup
[in]	textproc	text processor interface
[in]	errorhnd	error buffer for reporting errors/exceptions

Returns: the tagged document

analyzer::DocumentClass strus::parse_DocumentClass	(	const std::string &	src,
		ErrorBufferInterface *	errorhnd
	)

Parse the document class from source.

Parameters

[in]	src	document class definition as string
[in,out]	errorhnd	interface for reporting errors and exceptions that occurred

Returns: the document class structure

std::vector<std::string> strus::splitJsonDocumentList	(	const std::string &	encoding,
		const std::string &	content,
		ErrorBufferInterface *	errorhnd
	)
