strusAnalyzer
0.17
|
strus toplevel namespace More...
Namespaces | |
analyzer | |
analyzer parameter and return value objects namespace | |
Classes | |
class | AggregatorFunctionInstanceInterface |
Interface for a parameterized aggregator function. More... | |
class | AggregatorFunctionInterface |
Interface for the aggregator function constructor. More... | |
class | AnalyzerObjectBuilderInterface |
Interface providing a mechanism to create complex multi component objects for the document and query analysis in strus. More... | |
class | ContentIteratorInterface |
Defines an iterator on content provided by a segmenter. More... | |
class | ContentStatisticsContextInterface |
Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine. More... | |
class | ContentStatisticsInterface |
Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine. More... | |
class | DocumentAnalyzerContextInterface |
Defines the context for analyzing multi part documents, iterating on the sub documents defined, splitting them into normalized terms that can be fed to the strus IR engine. More... | |
class | DocumentAnalyzerInstanceInterface |
Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine. More... | |
class | DocumentAnalyzerMapInterface |
Defines a program for analyzing a document, splitting it into normalized terms that can be fed to the strus IR engine. More... | |
class | DocumentClassDetectorInterface |
Defines a detector that returns a content description for a document content it recognizes. More... | |
class | TagAttributeMarkupInterface |
class | DocumentTagMarkupDef |
class | PatternResultFormatContext |
Context for mapping result format strings (allocator,maps,etc.) More... | |
class | PatternResultFormatVariableMap |
Interface to map variables to a pointer to string. More... | |
class | PatternResultFormatTable |
Parser for result format strings. More... | |
struct | PatternResultFormatChunk |
Single chunk of a result format for iterating and building the pattern match result. More... | |
class | PatternResultFormatMap |
Result format for the output of pattern match results with names of members as variables in curly brackets '{' '}'. More... | |
class | PatternSerializer |
Object with all interfaces needed for serialization. More... | |
class | NormalizerFunctionInstanceInterface |
Interface for a parameterized normalization function. More... | |
class | NormalizerFunctionInterface |
Interface for the normalizer constructor. More... | |
class | PatternLexerContextInterface |
Interface for detecting lexemes used as basic entities by pattern matching in text. More... | |
class | PatternLexerInstanceInterface |
Interface for building the automaton for detecting lexemes used as basic entities by pattern matching in text. More... | |
class | PatternLexerInterface |
Interface for instantiating the data structure of an automaton for detecting lexemes used as basic entities by pattern matching in text. More... | |
class | PatternMatcherContextInterface |
Interface for detecting patterns (structures formed by atomic tokens) in one document. More... | |
class | PatternMatcherInstanceInterface |
Interface for building the automaton for detecting patterns in text. More... | |
class | PatternMatcherInterface |
Interface for creating an automaton for detecting patterns of tokens in a document stream. More... | |
class | PatternTermFeederInstanceInterface |
Instance interface for defining a mapping of terms of the document analysis output to lexemes used as basic entities by pattern matching. More... | |
class | PatternTermFeederInterface |
Interface for instantiating the data structure of an automaton for detecting lexemes used as basic entities by pattern matching in text. More... | |
class | PosTaggerContextInterface |
Context to markup documents with tags derived from POS tagging. More... | |
class | PosTaggerDataInterface |
Interface for the data built by a POS tagger. More... | |
class | PosTaggerInstanceInterface |
Interface to define a POS tagger instance that creates the input for POS tagging, builds the tagger data, and creates the context for tagging with the data built from the POS tagging output. More... | |
class | PosTaggerInterface |
Interface for the construction of a POS tagger instance for a specified segmenter. More... | |
class | QueryAnalyzerContextInterface |
Defines the context for analyzing queries for the strus IR engine. More... | |
class | QueryAnalyzerInstanceInterface |
Defines a program for analyzing chunks of a query. More... | |
class | SegmenterContextInterface |
Defines the context for segmenting one document. More... | |
class | SegmenterInstanceInterface |
Defines a program for splitting a source text into chunks, each with an id corresponding to a selecting expression. More... | |
class | SegmenterInterface |
Defines an interface for creating instances of programs for document segmentation. More... | |
class | SegmenterMarkupContextInterface |
Defines the context for inserting markups into one document. More... | |
class | TextProcessorInterface |
Interface for the object providing tokenizers and normalizers used for creating terms from segments of text and functions for collecting overall document statistics. More... | |
class | TokenizerFunctionInstanceInterface |
Interface for tokenization. More... | |
class | TokenizerFunctionInterface |
Interface for a tokenizer function. More... | |
class | TokenMarkupContextInterface |
Interface for annotation of text in one document. More... | |
class | TokenMarkupInstanceInterface |
Interface for building the automaton for detecting patterns of tokens in a document stream. More... | |
Typedefs | |
typedef struct PatternResultFormat | PatternResultFormat |
Result format representation (hidden implementation) More... | |
typedef int | SegmenterPosition |
Position of a segment in the original source. More... | |
Enumerations | |
enum | PatternSerializerType { PatternMatcherWithLexer, PatternMatcherWithFeeder } |
Defines different types of pattern matchers to serialize. More... | |
Functions | |
AggregatorFunctionInterface * | createAggregator_typeset (ErrorBufferInterface *errorhnd) |
Get the aggregator function type for the cosine measure normalization factor. More... | |
AggregatorFunctionInterface * | createAggregator_valueset (ErrorBufferInterface *errorhnd) |
AggregatorFunctionInterface * | createAggregator_sumSquareTf (ErrorBufferInterface *errorhnd) |
Get the aggregator function type for the cosine measure normalization factor. More... | |
DocumentAnalyzerInstanceInterface * | createDocumentAnalyzer (const TextProcessorInterface *textproc, const SegmenterInterface *segmenter, const analyzer::SegmenterOptions &opts, ErrorBufferInterface *errorhnd) |
Creates a parameterizable analyzer instance for analyzing documents. More... | |
QueryAnalyzerInstanceInterface * | createQueryAnalyzer (ErrorBufferInterface *errorhnd) |
Creates a parameterizable analyzer instance for analyzing queries. More... | |
DocumentAnalyzerMapInterface * | createDocumentAnalyzerMap (const AnalyzerObjectBuilderInterface *objbuilder, ErrorBufferInterface *errorhnd) |
Creates an analyzer map for bundling different instances of analyzers for different classes of documents. More... | |
AnalyzerObjectBuilderInterface * | createAnalyzerObjectBuilder_default (const FileLocatorInterface *filelocator, ErrorBufferInterface *errorhnd) |
Create an analyzer object builder with the builders from the standard strus core libraries (without module support) More... | |
analyzer::DocumentClass | parse_DocumentClass (const std::string &src, ErrorBufferInterface *errorhnd) |
parse the document class from source More... | |
bool | load_DocumentAnalyzer_program_std (DocumentAnalyzerInstanceInterface *analyzer, const TextProcessorInterface *textproc, const std::string &content, ErrorBufferInterface *errorhnd) |
Load a program given as source without includes to a document analyzer. More... | |
bool | load_DocumentAnalyzer_programfile_std (DocumentAnalyzerInstanceInterface *analyzer, const TextProcessorInterface *textproc, const std::string &filename, ErrorBufferInterface *errorhnd) |
Load a program given as source file name to a document analyzer, recursively expanding include directives (C preprocessor style) at the beginning of the source to load. More... | |
bool | load_QueryAnalyzer_program_std (QueryAnalyzerInstanceInterface *analyzer, const TextProcessorInterface *textproc, const std::string &content, ErrorBufferInterface *errorhnd) |
Load a program given as source without includes to a query analyzer. More... | |
bool | load_QueryAnalyzer_programfile_std (QueryAnalyzerInstanceInterface *analyzer, const TextProcessorInterface *textproc, const std::string &filename, ErrorBufferInterface *errorhnd) |
Load a program given as source file name to a query analyzer, recursively expanding include directives (C preprocessor style) at the beginning of the source to load. More... | |
bool | is_DocumentAnalyzer_programfile (const TextProcessorInterface *textproc, const std::string &filename, ErrorBufferInterface *errorhnd) |
Test if a file is an analyzer program file. More... | |
bool | is_DocumentAnalyzer_program (const std::string &source, ErrorBufferInterface *errorhnd) |
Test if a source is an analyzer program. More... | |
bool | load_DocumentAnalyzerMap_program (DocumentAnalyzerMapInterface *analyzermap, const TextProcessorInterface *textproc, const std::string &source, ErrorBufferInterface *errorhnd) |
Load a map of definitions describing how different document types are mapped to an analyzer program from its source. More... | |
bool | load_DocumentAnalyzerMap_programfile (DocumentAnalyzerMapInterface *analyzermap, const TextProcessorInterface *textproc, const std::string &filename, ErrorBufferInterface *errorhnd) |
Load a map of definitions describing how different document types are mapped to an analyzer program from a file. More... | |
bool | load_PatternMatcher_program (const TextProcessorInterface *textproc, PatternTermFeederInstanceInterface *feeder, PatternMatcherInstanceInterface *matcher, const std::string &content, ErrorBufferInterface *errorhnd) |
Load a pattern matcher program with a term feeder from source. More... | |
bool | load_PatternMatcher_programfile (const TextProcessorInterface *textproc, PatternTermFeederInstanceInterface *feeder, PatternMatcherInstanceInterface *matcher, const std::string &filename, ErrorBufferInterface *errorhnd) |
Load a pattern matcher program with a term feeder from a resource file. More... | |
bool | load_PatternMatcher_program (const TextProcessorInterface *textproc, PatternLexerInstanceInterface *lexer, PatternMatcherInstanceInterface *matcher, const std::string &content, ErrorBufferInterface *errorhnd) |
Load a pattern matcher program with a lexer from source. More... | |
bool | load_PatternMatcher_programfile (const TextProcessorInterface *textproc, PatternLexerInstanceInterface *lexer, PatternMatcherInstanceInterface *matcher, const std::string &filename, ErrorBufferInterface *errorhnd) |
Load a pattern matcher program with a lexer from a resource file. More... | |
ContentStatisticsInterface * | createContentStatistics_std (const TextProcessorInterface *textproc, const DocumentClassDetectorInterface *detector, ErrorBufferInterface *errorhnd) |
Get the standard content statistics. More... | |
DocumentClassDetectorInterface * | createDetector_std (const TextProcessorInterface *textproc, ErrorBufferInterface *errorhnd) |
Get the standard content detector (with ownership) More... | |
std::string | markupDocumentTags (const analyzer::DocumentClass &documentClass, const std::string &content, const std::vector< DocumentTagMarkupDef > &markups, const TextProcessorInterface *textproc, ErrorBufferInterface *errorhnd) |
Analyze a content and put markups on every tag matching an expression. More... | |
TokenMarkupInstanceInterface * | createTokenMarkupInstance_standard (ErrorBufferInterface *errorhnd) |
Create the interface for markup of tokens in a document text. More... | |
NormalizerFunctionInterface * | createNormalizer_lowercase (ErrorBufferInterface *errorhnd) |
Get the normalizer that returns the lower case of the input as result. More... | |
NormalizerFunctionInterface * | createNormalizer_uppercase (ErrorBufferInterface *errorhnd) |
Get the normalizer that returns the upper case of the input as result. More... | |
NormalizerFunctionInterface * | createNormalizer_convdia (ErrorBufferInterface *errorhnd) |
Get the normalizer that converts the diacritical characters of the input to ASCII. More... | |
NormalizerFunctionInterface * | createNormalizer_charselect (ErrorBufferInterface *errorhnd) |
Get the normalizer that returns the selection of characters defined by named sets as result. More... | |
NormalizerFunctionInterface * | createNormalizer_date2int (ErrorBufferInterface *errorhnd) |
Get the normalizer that converts the input date to a number (various units, configurable base) More... | |
NormalizerFunctionInterface * | createNormalizer_dictmap (ErrorBufferInterface *errorhnd) |
Get the normalizer that returns the mapping of the input with a dictionary as result. More... | |
NormalizerFunctionInterface * | createNormalizer_ngram (ErrorBufferInterface *errorhnd) |
Get the normalizer that returns the ngrams of the input as result. More... | |
NormalizerFunctionInterface * | createNormalizer_regex (ErrorBufferInterface *errorhnd) |
Get the normalizer that returns the mapping of the input with help of regular expressions as result. More... | |
NormalizerFunctionInterface * | createNormalizer_snowball (ErrorBufferInterface *errorhnd) |
Get the normalizer that returns the stemming of the input with the snowball stemmer as result. More... | |
NormalizerFunctionInterface * | createNormalizer_substrindex (ErrorBufferInterface *errorhnd) |
Get the normalizer that returns the input trimmed as result. More... | |
NormalizerFunctionInterface * | createNormalizer_substrmap (ErrorBufferInterface *errorhnd) |
NormalizerFunctionInterface * | createNormalizer_trim (ErrorBufferInterface *errorhnd) |
Get the normalizer that returns the input trimmed as result. More... | |
NormalizerFunctionInterface * | createNormalizer_wordjoin (ErrorBufferInterface *errorhnd) |
Get the normalizer that returns the words tokenized from the input joined. More... | |
bool | isPatternSerializerContent (const std::string &m_itrcontent, ErrorBufferInterface *errorhnd) |
Evaluate whether a content is a pattern serialization. More... | |
PatternSerializer * | createPatternSerializer (const std::string &filename, const PatternSerializerType &serializerType, ErrorBufferInterface *errorhnd) |
Create a serializer of patterns loaded. More... | |
PatternSerializer * | createPatternSerializerText (std::ostream &output, const PatternSerializerType &serializerType, ErrorBufferInterface *errorhnd) |
Create a serializer of patterns loaded as text to a stream. More... | |
bool | loadPatternMatcherFromSerialization (const std::string &source, PatternLexerInstanceInterface *lexer, PatternMatcherInstanceInterface *matcher, ErrorBufferInterface *errorhnd) |
Instantiate pattern matching interfaces from serialization. More... | |
bool | loadPatternMatcherFromSerialization (const std::string &source, PatternTermFeederInstanceInterface *feeder, PatternMatcherInstanceInterface *matcher, ErrorBufferInterface *errorhnd) |
Instantiate pattern matching interfaces from serialization. More... | |
PatternTermFeederInterface * | createPatternTermFeeder_default (ErrorBufferInterface *errorhnd) |
Create the term feeder interface for pattern matching on analyzer output as input. More... | |
PatternLexerInterface * | createPatternLexer_test (ErrorBufferInterface *errorhnd) |
Create the interface for regular expression matching usable as ground truth for testing. More... | |
PatternMatcherInterface * | createPatternMatcher_test (ErrorBufferInterface *errorhnd) |
Create the interface for pattern matching usable as ground truth for testing. More... | |
PosTaggerDataInterface * | createPosTaggerData_standard (TokenizerFunctionInstanceInterface *tokenizer, ErrorBufferInterface *errorhnd) |
Create an interface for building up the data to tag documents with. More... | |
PosTaggerInterface * | createPosTagger_standard (ErrorBufferInterface *errorhnd) |
Create an interface for the construction of a POS tagger instance for a specified segmenter. More... | |
SegmenterInterface * | createSegmenter_cjson (ErrorBufferInterface *errorhnd) |
Get a document JSON segmenter based on cjson. More... | |
std::vector< std::string > | splitJsonDocumentList (const std::string &encoding, const std::string &content, ErrorBufferInterface *errorhnd) |
SegmenterInterface * | createSegmenter_plain (ErrorBufferInterface *errorhnd) |
Get a document plain text segmenter. More... | |
SegmenterInterface * | createSegmenter_textwolf (ErrorBufferInterface *errorhnd) |
Get a document XML segmenter based on textwolf. More... | |
SegmenterInterface * | createSegmenter_tsv (ErrorBufferInterface *errorhnd) |
Get a document segmenter using tab-separated files as input. More... | |
TextProcessorInterface * | createTextProcessor (const FileLocatorInterface *filelocator, ErrorBufferInterface *errorhnd) |
Create a text processor. More... | |
TokenizerFunctionInterface * | createTokenizer_punctuation (ErrorBufferInterface *errorhnd) |
Get the tokenizer type that creates the tokenization of punctuation elements in the input. More... | |
TokenizerFunctionInterface * | createTokenizer_regex (ErrorBufferInterface *errorhnd) |
Get the tokenizer type that creates the tokenization with help of regular expressions. More... | |
TokenizerFunctionInterface * | createTokenizer_textcat (const TextProcessorInterface *textproc, ErrorBufferInterface *errorhnd) |
Get the tokenizer type that creates the tokenization of words in a recognized language. More... | |
TokenizerFunctionInterface * | createTokenizer_word (ErrorBufferInterface *errorhnd) |
Get the tokenizer type that creates the tokenization of words in the input. More... | |
TokenizerFunctionInterface * | createTokenizer_whitespace (ErrorBufferInterface *errorhnd) |
Get the tokenizer type that creates the tokenization as splitting of the input by whitespaces. More... | |
TokenizerFunctionInterface * | createTokenizer_langtoken (ErrorBufferInterface *errorhnd) |
Get the tokenizer type that splits the input into all of its tokens, returning sequences of language characters as tokens and word boundary delimiters as single-character tokens. More... | |
strus toplevel namespace
Exported functions for the program loader of the analyzer (load program in a domain specific language)
typedef struct PatternResultFormat strus::PatternResultFormat |
Result format representation (hidden implementation)
typedef int strus::SegmenterPosition |
Position of a segment in the original source.
AggregatorFunctionInterface* strus::createAggregator_sumSquareTf | ( | ErrorBufferInterface * | errorhnd | ) |
Get the aggregator function type for the cosine measure normalization factor.
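The name of this aggregator suggests a sum over squared term frequencies, which is the quantity under the square root of a vector norm used for cosine normalization. The following is a minimal sketch of that arithmetic only. The helper `sumSquareTf` and its map-based input are hypothetical and not part of the strus API, and whether the library returns the sum or its square root is not stated on this page.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not the strus implementation): sums the squares of
// the term frequencies of one document. Taking the square root of the
// result would give the Euclidean norm of the term frequency vector.
double sumSquareTf(const std::map<std::string, int>& termFrequencies)
{
    double sum = 0.0;
    for (const auto& entry : termFrequencies)
    {
        sum += static_cast<double>(entry.second) * entry.second;
    }
    return sum;
}
```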
AggregatorFunctionInterface* strus::createAggregator_typeset | ( | ErrorBufferInterface * | errorhnd | ) |
Get the aggregator function type for the cosine measure normalization factor.
AggregatorFunctionInterface* strus::createAggregator_valueset | ( | ErrorBufferInterface * | errorhnd | ) |
AnalyzerObjectBuilderInterface* strus::createAnalyzerObjectBuilder_default | ( | const FileLocatorInterface * | filelocator, |
ErrorBufferInterface * | errorhnd | ||
) |
Create an analyzer object builder with the builders from the standard strus core libraries (without module support)
[in] | filelocator | resources and file locator interface |
[in] | errorhnd | error buffer interface |
ContentStatisticsInterface* strus::createContentStatistics_std | ( | const TextProcessorInterface * | textproc, |
const DocumentClassDetectorInterface * | detector, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Get the standard content statistics.
DocumentClassDetectorInterface* strus::createDetector_std | ( | const TextProcessorInterface * | textproc, |
ErrorBufferInterface * | errorhnd | ||
) |
Get the standard content detector (with ownership)
DocumentAnalyzerInstanceInterface* strus::createDocumentAnalyzer | ( | const TextProcessorInterface * | textproc, |
const SegmenterInterface * | segmenter, | ||
const analyzer::SegmenterOptions & | opts, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Creates a parameterizable analyzer instance for analyzing documents.
[in] | textproc | text processor for creating functions and resources needed for analysis |
[in] | segmenter | segmenter type to be used by the created analyzer |
[in] | opts | options for the segmenter |
[in] | errorhnd | error buffer interface |
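All factory functions on this page take an error buffer as their last argument. The sketch below illustrates one way such a convention can be used by a caller. It is built entirely from stand-in types: `SketchErrorBuffer`, `SketchAnalyzer` and `createSketchAnalyzer` are hypothetical names, not the strus classes, and the assumption that a failed creation returns a null pointer with the message left in the buffer is not confirmed by this page.

```cpp
#include <string>

// Stand-in for an error buffer (NOT strus::ErrorBufferInterface).
// It keeps the first reported message until it is fetched.
class SketchErrorBuffer
{
public:
    void report(const std::string& message)
    {
        if (m_message.empty()) m_message = message;
    }
    bool hasError() const
    {
        return !m_message.empty();
    }
    // Returns the stored message and clears the buffer.
    std::string fetchError()
    {
        std::string result;
        result.swap(m_message);
        return result;
    }
private:
    std::string m_message;
};

// Stand-in for an analyzer instance created by a factory function.
struct SketchAnalyzer
{
    std::string segmenterName;
};

// Hypothetical factory following the assumed convention: it returns a null
// pointer on failure and reports the reason to the error buffer. The caller
// owns the returned object.
SketchAnalyzer* createSketchAnalyzer(const std::string& segmenterName, SketchErrorBuffer* errorhnd)
{
    if (segmenterName.empty())
    {
        errorhnd->report("no segmenter specified");
        return nullptr;
    }
    return new SketchAnalyzer{segmenterName};
}
```

A caller under these assumptions would test the returned pointer first and only then read the message from the buffer.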
DocumentAnalyzerMapInterface* strus::createDocumentAnalyzerMap | ( | const AnalyzerObjectBuilderInterface * | objbuilder, |
ErrorBufferInterface * | errorhnd | ||
) |
Creates an analyzer map for bundling different instances of analyzers for different classes of documents.
[in] | objbuilder | analyzer object builder interface |
[in] | errorhnd | error buffer interface |
NormalizerFunctionInterface* strus::createNormalizer_charselect | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that returns the selection of characters defined by named sets as result.
NormalizerFunctionInterface* strus::createNormalizer_convdia | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that converts the diacritical characters of the input to ASCII.
NormalizerFunctionInterface* strus::createNormalizer_date2int | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that converts the input date to a number (various units, configurable base)
NormalizerFunctionInterface* strus::createNormalizer_dictmap | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that returns the mapping of the input with a dictionary as result.
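As an illustration of what a dictionary mapping of a token can look like, here is a minimal sketch. The function `mapWithDictionary` is hypothetical, and returning the input unchanged for unknown keys is an assumption made for this example, not documented behavior of the strus normalizer.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not the strus implementation): looks the input up in
// a dictionary and returns the mapped value. Unknown inputs are returned
// unchanged, which is an assumption made for this sketch.
std::string mapWithDictionary(
    const std::map<std::string, std::string>& dictionary,
    const std::string& input)
{
    auto found = dictionary.find(input);
    return found == dictionary.end() ? input : found->second;
}
```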
NormalizerFunctionInterface* strus::createNormalizer_lowercase | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that returns the lower case of the input as result.
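A lower case normalization can be sketched as follows. The helper `normalizeLowercaseAscii` is hypothetical and handles ASCII letters only; how the strus normalizer treats multi-byte UTF-8 characters is not described on this page.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper (not the strus implementation): converts the ASCII
// letters of the input to lower case and leaves all other bytes untouched
// under the default "C" locale.
std::string normalizeLowercaseAscii(std::string input)
{
    std::transform(input.begin(), input.end(), input.begin(),
        [](unsigned char ch) { return static_cast<char>(std::tolower(ch)); });
    return input;
}
```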
NormalizerFunctionInterface* strus::createNormalizer_ngram | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that returns the ngrams of the input as result.
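For readers unfamiliar with n-grams, the sketch below produces all overlapping character sequences of length `n` from an input. The function `characterNgrams` is hypothetical, works on bytes rather than Unicode characters, and does not reflect the configuration options of the strus normalizer, which are not listed here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not the strus implementation): returns all
// overlapping byte sequences of length n. Inputs shorter than n, and
// n equal to zero, yield an empty result.
std::vector<std::string> characterNgrams(const std::string& input, std::size_t n)
{
    std::vector<std::string> result;
    if (n == 0 || input.size() < n) return result;
    for (std::size_t pos = 0; pos + n <= input.size(); ++pos)
    {
        result.push_back(input.substr(pos, n));
    }
    return result;
}
```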
NormalizerFunctionInterface* strus::createNormalizer_regex | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that returns the mapping of the input with help of regular expressions as result.
NormalizerFunctionInterface* strus::createNormalizer_snowball | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that returns the stemming of the input with the snowball stemmer as result.
NormalizerFunctionInterface* strus::createNormalizer_substrindex | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that returns the input trimmed as result.
NormalizerFunctionInterface* strus::createNormalizer_substrmap | ( | ErrorBufferInterface * | errorhnd | ) |
NormalizerFunctionInterface* strus::createNormalizer_trim | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that returns the input trimmed as result.
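Trimming can be sketched in a few lines. The helper `trimWhitespace` is hypothetical, and the set of characters it removes (space, tab, carriage return, line feed) is an assumption for this example.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not the strus implementation): removes leading and
// trailing spaces, tabs, carriage returns and line feeds.
std::string trimWhitespace(const std::string& input)
{
    const char* whitespace = " \t\r\n";
    const std::size_t first = input.find_first_not_of(whitespace);
    if (first == std::string::npos) return std::string();
    const std::size_t last = input.find_last_not_of(whitespace);
    return input.substr(first, last - first + 1);
}
```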
NormalizerFunctionInterface* strus::createNormalizer_uppercase | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that returns the upper case of the input as result.
NormalizerFunctionInterface* strus::createNormalizer_wordjoin | ( | ErrorBufferInterface * | errorhnd | ) |
Get the normalizer that returns the words tokenized from the input joined.
PatternLexerInterface* strus::createPatternLexer_test | ( | ErrorBufferInterface * | errorhnd | ) |
Create the interface for regular expression matching usable as ground truth for testing.
PatternMatcherInterface* strus::createPatternMatcher_test | ( | ErrorBufferInterface * | errorhnd | ) |
Create the interface for pattern matching usable as ground truth for testing.
PatternSerializer* strus::createPatternSerializer | ( | const std::string & | filename, |
const PatternSerializerType & | serializerType, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Create a serializer of patterns loaded.
[in] | filename | path to file where to write the output to |
[in] | serializerType | type of serialization |
[in] | errorhnd | error buffer interface |
PatternSerializer* strus::createPatternSerializerText | ( | std::ostream & | output, |
const PatternSerializerType & | serializerType, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Create a serializer of patterns loaded as text to a stream.
[in] | output | where to print text output to |
[in] | serializerType | type of serialization |
[in] | errorhnd | error buffer interface |
PatternTermFeederInterface* strus::createPatternTermFeeder_default | ( | ErrorBufferInterface * | errorhnd | ) |
Create the term feeder interface for pattern matching on analyzer output as input.
[in] | errorhnd | error buffer interface |
PosTaggerInterface* strus::createPosTagger_standard | ( | ErrorBufferInterface * | errorhnd | ) |
Create an interface for the construction of a POS tagger instance for a specified segmenter.
[in] | errorhnd | error buffer interface for exceptions thrown |
PosTaggerDataInterface* strus::createPosTaggerData_standard | ( | TokenizerFunctionInstanceInterface * | tokenizer, |
ErrorBufferInterface * | errorhnd | ||
) |
Create an interface for building up the data to tag documents with.
[in] | tokenizer | tokenizer interface to use (passed with ownership) |
[in] | errorhnd | error buffer interface for exceptions thrown |
QueryAnalyzerInstanceInterface* strus::createQueryAnalyzer | ( | ErrorBufferInterface * | errorhnd | ) |
Creates a parameterizable analyzer instance for analyzing queries.
[in] | errorhnd | error buffer interface |
SegmenterInterface* strus::createSegmenter_cjson | ( | ErrorBufferInterface * | errorhnd | ) |
Get a document JSON segmenter based on cjson.
SegmenterInterface* strus::createSegmenter_plain | ( | ErrorBufferInterface * | errorhnd | ) |
Get a document plain text segmenter.
SegmenterInterface* strus::createSegmenter_textwolf | ( | ErrorBufferInterface * | errorhnd | ) |
Get a document XML segmenter based on textwolf.
SegmenterInterface* strus::createSegmenter_tsv | ( | ErrorBufferInterface * | errorhnd | ) |
Get a document segmenter using tab-separated files as input.
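To make the notion of tab-separated input concrete, the sketch below splits one line into its fields. The function `splitTsvLine` is hypothetical. It says nothing about how the strus segmenter handles headers, escaping, or the mapping of fields to selection expressions, none of which is described on this page.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not the strus implementation): splits one line at
// every tab character. Empty fields are preserved, so a trailing tab
// produces a trailing empty field.
std::vector<std::string> splitTsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::size_t start = 0;
    while (true)
    {
        const std::size_t tab = line.find('\t', start);
        if (tab == std::string::npos)
        {
            fields.push_back(line.substr(start));
            break;
        }
        fields.push_back(line.substr(start, tab - start));
        start = tab + 1;
    }
    return fields;
}
```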
TextProcessorInterface* strus::createTextProcessor | ( | const FileLocatorInterface * | filelocator, |
ErrorBufferInterface * | errorhnd | ||
) |
Create a text processor.
TokenizerFunctionInterface* strus::createTokenizer_langtoken | ( | ErrorBufferInterface * | errorhnd | ) |
Get the tokenizer type that splits the input into all of its tokens, returning sequences of language characters as tokens and word boundary delimiters as single-character tokens.
TokenizerFunctionInterface* strus::createTokenizer_punctuation | ( | ErrorBufferInterface * | errorhnd | ) |
Get the tokenizer type that creates the tokenization of punctuation elements in the input.
TokenizerFunctionInterface* strus::createTokenizer_regex | ( | ErrorBufferInterface * | errorhnd | ) |
Get the tokenizer type that creates the tokenization with help of regular expressions.
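A tokenization driven by a regular expression can be sketched with the standard library. The function `tokenizeRegex` is hypothetical and uses `std::regex` with its default ECMAScript grammar; which regular expression engine and syntax the strus tokenizer uses is not stated here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not the strus implementation): returns every
// non-overlapping match of the expression in the input as one token.
// std::regex throws std::regex_error if the expression is invalid.
std::vector<std::string> tokenizeRegex(const std::string& input, const std::string& expression)
{
    std::vector<std::string> tokens;
    const std::regex pattern(expression);
    for (std::sregex_iterator it(input.begin(), input.end(), pattern), end; it != end; ++it)
    {
        tokens.push_back(it->str());
    }
    return tokens;
}
```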
TokenizerFunctionInterface* strus::createTokenizer_textcat | ( | const TextProcessorInterface * | textproc, |
ErrorBufferInterface * | errorhnd | ||
) |
Get the tokenizer type that creates the tokenization of words in a recognized language.
TokenizerFunctionInterface* strus::createTokenizer_whitespace | ( | ErrorBufferInterface * | errorhnd | ) |
Get the tokenizer type that creates the tokenization as splitting of the input by whitespaces.
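Splitting at whitespace while keeping track of where each token sits in the input can be sketched as follows. The type `TokenSpan` and the function `tokenizeWhitespace` are hypothetical. Reporting tokens as byte offset and length is an assumption for this example and not a description of the strus token structure.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token representation: byte offset and byte length in the input.
struct TokenSpan
{
    std::size_t offset;
    std::size_t length;
};

// Hypothetical helper (not the strus implementation): returns one span for
// every maximal run of non-whitespace bytes.
std::vector<TokenSpan> tokenizeWhitespace(const std::string& input)
{
    std::vector<TokenSpan> tokens;
    std::size_t pos = 0;
    while (pos < input.size())
    {
        // Skip the whitespace in front of the next token.
        while (pos < input.size() && std::isspace(static_cast<unsigned char>(input[pos]))) ++pos;
        const std::size_t start = pos;
        // Consume the token itself.
        while (pos < input.size() && !std::isspace(static_cast<unsigned char>(input[pos]))) ++pos;
        if (pos > start) tokens.push_back(TokenSpan{start, pos - start});
    }
    return tokens;
}
```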
TokenizerFunctionInterface* strus::createTokenizer_word | ( | ErrorBufferInterface * | errorhnd | ) |
Get the tokenizer type that creates the tokenization of words in the input.
TokenMarkupInstanceInterface* strus::createTokenMarkupInstance_standard | ( | ErrorBufferInterface * | errorhnd | ) |
Create the interface for markup of tokens in a document text.
bool strus::is_DocumentAnalyzer_program | ( | const std::string & | source, |
ErrorBufferInterface * | errorhnd | ||
) |
Test if a source is an analyzer program.
[in] | source | source to test |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
bool strus::is_DocumentAnalyzer_programfile | ( | const TextProcessorInterface * | textproc, |
const std::string & | filename, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Test if a file is an analyzer program file.
[in] | textproc | text processor interface to determine the path of the filename |
[in] | filename | name of the file to load |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
bool strus::isPatternSerializerContent | ( | const std::string & | m_itrcontent, |
ErrorBufferInterface * | errorhnd | ||
) |
Evaluate whether a content is a pattern serialization.
[in] | content | content to check |
bool strus::load_DocumentAnalyzer_program_std | ( | DocumentAnalyzerInstanceInterface * | analyzer, |
const TextProcessorInterface * | textproc, | ||
const std::string & | content, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Load a program given as source without includes to a document analyzer.
[in,out] | analyzer | analyzer object to instrument |
[in] | content | source with definitions |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
bool strus::load_DocumentAnalyzer_programfile_std | ( | DocumentAnalyzerInstanceInterface * | analyzer, |
const TextProcessorInterface * | textproc, | ||
const std::string & | filename, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Load a program given as source file name to a document analyzer, recursively expanding include directives (C preprocessor style) at the beginning of the source to load.
[in,out] | analyzer | analyzer object to instrument |
[in] | filename | name of the file to load |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
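The recursive expansion of include directives at the beginning of a source can be sketched as below. Everything in this example is an assumption made for illustration: the function `expandIncludes`, the in-memory `FileMap` standing in for the file system, the directive spelling `#include "name"`, and the nesting limit of 16. The actual directive syntax accepted by the strus program loader is not given on this page.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// In-memory stand-in for the file system: file name -> file content.
using FileMap = std::map<std::string, std::string>;

// Hypothetical helper (not the strus implementation): replaces each leading
// line of the form  #include "name"  with the expanded content of that
// file. Directives are only recognized before the first other line.
std::string expandIncludes(const FileMap& files, const std::string& name, int depth = 0)
{
    if (depth > 16) throw std::runtime_error("include nesting too deep: " + name);
    auto found = files.find(name);
    if (found == files.end()) throw std::runtime_error("file not found: " + name);

    const std::string directive = "#include \"";
    std::istringstream input(found->second);
    std::string line;
    std::string output;
    bool inHeader = true;
    while (std::getline(input, line))
    {
        const bool isInclude = inHeader
            && line.size() > directive.size()
            && line.compare(0, directive.size(), directive) == 0
            && line.back() == '"';
        if (isInclude)
        {
            // Cut the file name out from between the two quotes.
            const std::string included =
                line.substr(directive.size(), line.size() - directive.size() - 1);
            output += expandIncludes(files, included, depth + 1);
        }
        else
        {
            inHeader = false;
            output += line + "\n";
        }
    }
    return output;
}
```

The depth limit guards against files that include each other in a cycle, which would otherwise recurse without end.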
bool strus::load_DocumentAnalyzerMap_program | ( | DocumentAnalyzerMapInterface * | analyzermap, |
const TextProcessorInterface * | textproc, | ||
const std::string & | source, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Load a map of definitions describing how different document types are mapped to an analyzer program from its source.
[in,out] | analyzermap | map of analyzers to instrument |
[in] | textproc | text processor interface to determine the path of filenames |
[in] | source | source with definitions |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
bool strus::load_DocumentAnalyzerMap_programfile | ( | DocumentAnalyzerMapInterface * | analyzermap, |
const TextProcessorInterface * | textproc, | ||
const std::string & | filename, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Load a map of definitions describing how different document types are mapped to an analyzer program from a file.
[in,out] | analyzermap | map of analyzers to instrument |
[in] | textproc | text processor interface to determine the path of filenames |
[in] | filename | name of the file with the definitions |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
bool strus::load_PatternMatcher_program | ( | const TextProcessorInterface * | textproc, |
PatternTermFeederInstanceInterface * | feeder, | ||
PatternMatcherInstanceInterface * | matcher, | ||
const std::string & | content, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Load a pattern matcher program with a term feeder from source.
[in] | textproc | text processor interface to determine the path of filenames |
[in,out] | feeder | term feeder to instrument |
[in,out] | matcher | pattern matcher to instrument |
[in] | content | source with definitions |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
bool strus::load_PatternMatcher_program | ( | const TextProcessorInterface * | textproc, |
PatternLexerInstanceInterface * | lexer, | ||
PatternMatcherInstanceInterface * | matcher, | ||
const std::string & | content, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Load a pattern matcher program with a lexer from source.
[in] | textproc | text processor interface to determine the path of filenames |
[in,out] | lexer | tokenization for the pattern matcher |
[in,out] | matcher | pattern matcher to instrument |
[in] | content | source with definitions |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
bool strus::load_PatternMatcher_programfile | ( | const TextProcessorInterface * | textproc, |
PatternTermFeederInstanceInterface * | feeder, | ||
PatternMatcherInstanceInterface * | matcher, | ||
const std::string & | filename, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Load a pattern matcher program with a term feeder from a resource file.
[in] | textproc | text processor interface to determine the path of filenames |
[in,out] | feeder | term feeder to instrument |
[in,out] | matcher | pattern matcher to instrument |
[in] | filename | file name of source with definitions |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
bool strus::load_PatternMatcher_programfile | ( | const TextProcessorInterface * | textproc, |
PatternLexerInstanceInterface * | lexer, | ||
PatternMatcherInstanceInterface * | matcher, | ||
const std::string & | filename, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Load a pattern matcher program with a lexer from a resource file.
[in] | textproc | text processor interface to determine the path of filenames |
[in,out] | lexer | tokenization for the pattern matcher |
[in,out] | matcher | pattern matcher to instrument |
[in] | filename | file name of source with definitions |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
bool strus::load_QueryAnalyzer_program_std | ( | QueryAnalyzerInstanceInterface * | analyzer, |
const TextProcessorInterface * | textproc, | ||
const std::string & | content, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Load a program given as source without includes to a query analyzer.
[in,out] | analyzer | analyzer object to instrument |
[in] | textproc | text processor interface |
[in] | content | source with definitions |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
bool strus::load_QueryAnalyzer_programfile_std | ( | QueryAnalyzerInstanceInterface * | analyzer, |
const TextProcessorInterface * | textproc, | ||
const std::string & | filename, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Load a program given as source file name to a query analyzer, recursively expanding include directives (C preprocessor style) at the beginning of the source to load.
[in,out] | analyzer | analyzer object to instrument |
[in] | textproc | text processor interface to determine the path of filenames |
[in] | filename | name of the file to load |
[in,out] | errorhnd | buffer for reporting errors (exceptions) |
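The recursive expansion of include directives at the beginning of a source can be pictured with the following minimal sketch. It is not the strus implementation. The directive syntax `#include "NAME"`, the in-memory `FileMap` that replaces the file system, and the functions `parseInclude` and `expandIncludes` are all assumptions made for this example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical illustration only; names and directive syntax are assumptions.
namespace sketch {

// In-memory stand-in for the file system: file name -> content.
typedef std::map<std::string, std::string> FileMap;

// Extract NAME from a line of the assumed form: #include "NAME"
// Returns false if the line is not such a directive.
bool parseInclude(const std::string& line, std::string& name)
{
    const std::string prefix = "#include \"";
    if (line.compare(0, prefix.size(), prefix) != 0) return false;
    std::string::size_type end = line.find('"', prefix.size());
    if (end == std::string::npos) return false;
    name = line.substr(prefix.size(), end - prefix.size());
    return true;
}

// Return the content of 'filename' with the directives at its beginning
// replaced by the (recursively expanded) content of the named files.
// 'active' holds the files currently being expanded to detect cycles.
std::string expandIncludes(const FileMap& files, const std::string& filename,
                           std::set<std::string>& active)
{
    assert(!filename.empty());
    FileMap::const_iterator fi = files.find(filename);
    if (fi == files.end()) throw std::runtime_error("file not found: " + filename);
    if (!active.insert(filename).second)
    {
        throw std::runtime_error("circular include: " + filename);
    }
    std::istringstream input(fi->second);
    std::string line, name, result;
    bool inHeader = true; // directives are honored only before the first other line
    while (std::getline(input, line))
    {
        if (inHeader && parseInclude(line, name))
        {
            result += expandIncludes(files, name, active);
        }
        else
        {
            inHeader = false;
            result += line;
            result += '\n';
        }
    }
    active.erase(filename);
    return result;
}

} // namespace sketch
```

In this sketch a directive that appears after the first ordinary line is copied through unchanged, which mirrors the restriction to directives "at the beginning of the source".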
bool strus::loadPatternMatcherFromSerialization | ( | const std::string & | source, |
PatternLexerInstanceInterface * | lexer, | ||
PatternMatcherInstanceInterface * | matcher, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Instantiate pattern matching interfaces from serialization.
[in] | source | source content (not a filename!) to read the input from |
[in,out] | lexer | pattern lexer instance interface to instantiate from deserialization |
[in,out] | matcher | pattern matcher instance interface to instantiate from deserialization |
[in,out] | errorhnd | error buffer interface |
bool strus::loadPatternMatcherFromSerialization | ( | const std::string & | source, |
PatternTermFeederInstanceInterface * | feeder, | ||
PatternMatcherInstanceInterface * | matcher, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Instantiate pattern matching interfaces from serialization.
[in] | source | source content (not a filename!) to read the input from |
[in,out] | feeder | pattern term feeder instance interface to instantiate from deserialization |
[in,out] | matcher | pattern matcher instance interface to instantiate from deserialization |
[in,out] | errorhnd | error buffer interface |
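Because `source` is the serialized content itself and not a filename, a caller that keeps the serialization in a file has to read it into a string first. The helper below is a minimal sketch of that step using only the C++ standard library. `readFileContent` is a hypothetical name introduced here and is not part of strus.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration; not a strus function.
namespace sketch {

// Read a whole file into a string. Binary mode keeps the bytes unchanged,
// which matters if the serialization is not plain text.
std::string readFileContent(const std::string& path)
{
    assert(!path.empty());
    std::ifstream in(path.c_str(), std::ios::in | std::ios::binary);
    if (!in) throw std::runtime_error("cannot open file: " + path);
    std::ostringstream buf;
    buf << in.rdbuf();
    return buf.str();
}

} // namespace sketch
```

The returned string is what would be passed as the `source` argument. Passing the path itself would make the function try to deserialize the characters of the file name.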
std::string strus::markupDocumentTags | ( | const analyzer::DocumentClass & | documentClass, |
const std::string & | content, | ||
const std::vector< DocumentTagMarkupDef > & | markups, | ||
const TextProcessorInterface * | textproc, | ||
ErrorBufferInterface * | errorhnd | ||
) |
Analyze content and add markup to every tag that matches an expression.
[in] | documentClass | document class of the content with the encoding specified |
[in] | content | the content to process |
[in] | markups | array of definitions for markup |
[in] | textproc | text processor interface |
[in] | errorhnd | error buffer for reporting errors/exceptions |
analyzer::DocumentClass strus::parse_DocumentClass | ( | const std::string & | src, |
ErrorBufferInterface * | errorhnd | ||
) |
Parse the document class from source.
[in] | src | document class definition as string |
[in,out] | errorhnd | interface for reporting errors and exceptions that occurred |
std::vector<std::string> strus::splitJsonDocumentList | ( | const std::string & | encoding, |
const std::string & | content, | ||
ErrorBufferInterface * | errorhnd | ||
) |