textwolf  0.2
textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ > Class Template Reference

XML scanner template that adds the functionality to the statemachine base definition.

#include <xmlscanner.hpp>

Inheritance diagram for textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >:
textwolf::XMLScannerBase

Classes

class  End
 End-of-input tag.
 
class  iterator
 Input iterator for iterating over the output of an XML scanner.
 

Public Types

typedef InputCharSet_ InputCharSet
 
typedef OutputCharSet_ OutputCharSet
 
typedef TextScanner< InputIterator, InputCharSet_ > InputReader
 
typedef XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ > ThisXMLScanner
 
typedef std::map< const char *, UChar > EntityMap
 
typedef OutputBuffer_ OutputBuffer
 
- Public Types inherited from textwolf::XMLScannerBase
enum  ElementType {
  None, ErrorOccurred, HeaderStart, HeaderAttribName,
  HeaderAttribValue, HeaderEnd, DocAttribValue, DocAttribEnd,
  TagAttribName, TagAttribValue, OpenTag, CloseTag,
  CloseTagIm, Content, Exit
}
 Enumeration of XML element types returned by an XML scanner.
 
enum  { NofElementTypes =Exit+1 }
 
enum  Error {
  Ok, ErrIllegalDocumentAttributeDef, ErrExpectedOpenTag, ErrExpectedXMLTag,
  ErrUnexpectedEndOfText, ErrSyntaxToken, ErrStringNotTerminated, ErrUndefinedCharacterEntity,
  ErrExpectedTagEnd, ErrExpectedEqual, ErrExpectedTagAttribute, ErrExpectedCDATATag,
  ErrInternal, ErrUnexpectedEndOfInput, ErrExpectedEndOfLine, ErrExpectedDash2
}
 Enumeration of XML scanner error codes.
 
enum  STMState {
  START, STARTTAG, XTAG, PITAG,
  PITAGEND, XTAGEND, XTAGDONE, XTAGAISK,
  XTAGANAM, XTAGAESK, XTAGAVSK, XTAGAVID,
  XTAGAVSQ, XTAGAVDQ, XTAGAVQE, DOCSTART,
  CONTENT, TOKEN, SEEKTOK, XMLTAG,
  OPENTAG, CLOSETAG, TAGCLSK, TAGAISK,
  TAGANAM, TAGAESK, TAGAVSK, TAGAVID,
  TAGAVSQ, TAGAVDQ, TAGAVQE, TAGCLIM,
  ENTITYSL, ENTITY, ENTITYE, ENTITYID,
  ENTITYSQ, ENTITYDQ, ENTITYLC, COMDASH2,
  COMSEEKE, COMENDD2, COMENDCL, CDATA,
  CDATA1, CDATA2, CDATA3, EXIT
}
 Enumeration of states of the XML scanner state machine.
 
enum  STMAction {
  Return, ReturnWord, ReturnContent, ReturnIdentifier,
  ReturnSQString, ReturnDQString, ExpectIdentifierXML, ExpectIdentifierCDATA,
  ReturnEOF, NofSTMActions = 9
}
 Enumeration of actions in the XML scanner state machine.
 
typedef CharMap< bool, false, NofControlCharacter > IsTokenCharMap
 Forms a set of characters by assigning (true/false) to the whole domain.
 

Public Member Functions

 XMLScanner (const InputIterator &p_src, const EntityMap &p_entityMap)
 Constructor.
 
 XMLScanner (const InputIterator &p_src)
 Constructor.
 
 XMLScanner (const InputCharSet &p_charset, const InputIterator &p_src, const EntityMap &p_entityMap)
 Constructor.
 
 XMLScanner (const InputCharSet &p_charset, const InputIterator &p_src)
 Constructor.
 
 XMLScanner (const InputCharSet &p_charset)
 Constructor.
 
 XMLScanner ()
 Default constructor.
 
 XMLScanner (const XMLScanner &o)
 Copy constructor.
 
template<class IteratorAssignment >
void setSource (const IteratorAssignment &a)
 Assign something to the source iterator while keeping the state.
 
std::size_t getPosition () const
 Get the current source iterator position.
 
std::size_t getTokenPosition () const
 Get the current token position.
 
const char * getItemPtr () const
 Get a pointer to the current parsed XML element, if it was not masked out; see nextItem(unsigned short).
 
std::size_t getItemSize () const
 Get the size of the current parsed XML element in bytes.
 
const OutputBuffer & getItem () const
 Get the current parsed XML element, if it was not masked out; see nextItem(unsigned short).
 
ScannerStatemachine::Element * getState ()
 Get the current XML scanner state machine state.
 
Error getError (const char **str=0)
 Get the last error.
 
const InputIterator & getIterator () const
 Get the iterator pointing to the current source position.
 
InputIterator & getIterator ()
 Get the iterator pointing to the current source position.
 
ElementType nextItem (unsigned short mask=0xFFFF)
 Scan the next XML element.
 
iterator begin (bool doSkipToFirst=true)
 Get begin iterator.
 
iterator end ()
 Get the pointer to the end of content.
 

Static Public Member Functions

template<class OutputBufferType >
static bool parseStaticToken (const IsTokenCharMap &isTok, InputReader ir, OutputBufferType &buf)
 Static version of the token parser, used for parsing table definition elements.
 
- Static Public Member Functions inherited from textwolf::XMLScannerBase
static const char * getElementTypeName (ElementType ee)
 Get the XML element type as string.
 
static const char * getErrorString (Error ee)
 Get the error code as string.
 
static const char * getStateString (STMState s)
 Get the scanner state machine state as string.
 
static const char * getActionString (STMAction a)
 Get the scanner state machine action as string.
 

Detailed Description

template<class InputIterator, class InputCharSet_, class OutputCharSet_, class OutputBuffer_>
class textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >

XML scanner template that adds the functionality to the statemachine base definition.

Template Parameters
InputIterator  input iterator with ++ and a read-only * that returns 0 as the last character of the input
InputCharSet_  character set encoding of the input, read as a stream of bytes
OutputCharSet_  character set encoding of the output, printed as a string of the item type of the character set
OutputBuffer_  buffer for output with an STL back insertion sequence interface (e.g. std::string, std::vector<char>, textwolf::StaticBuffer)
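
The InputIterator requirement above (++ plus a read-only * that yields 0 at the end of the input) can be illustrated with a small stand-alone sketch. BoundedCharIterator and readAll are hypothetical names introduced here for illustration; they are not part of textwolf, and the library may ship its own source iterators.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical source iterator (not part of textwolf): supports ++ and a
// read-only * that yields 0 once the end of the input has been reached.
class BoundedCharIterator
{
public:
    BoundedCharIterator( const char* data, std::size_t size)
        : m_pos(data), m_end(data + size) {}

    // Read-only access; 0 signals the end of the input.
    char operator*() const { return (m_pos < m_end) ? *m_pos : 0; }

    // Advance by one character, but never past the end.
    BoundedCharIterator& operator++()
    {
        if (m_pos < m_end) ++m_pos;
        return *this;
    }

private:
    const char* m_pos;
    const char* m_end;
};

// Drain an iterator the way a consumer relying on the 0 terminator would.
template <class InputIterator>
std::string readAll( InputIterator itr)
{
    std::string out;
    for (char ch = *itr; ch != 0; ++itr, ch = *itr) out.push_back( ch);
    return out;
}
```

Because 0 doubles as the end marker, a 0 byte embedded in the data would end the input early for any consumer written this way.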

Member Typedef Documentation

template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
typedef std::map<const char*,UChar> textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::EntityMap
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
typedef InputCharSet_ textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::InputCharSet
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
typedef TextScanner<InputIterator,InputCharSet_> textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::InputReader
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
typedef OutputBuffer_ textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::OutputBuffer
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
typedef OutputCharSet_ textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::OutputCharSet
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
typedef XMLScanner<InputIterator,InputCharSet_,OutputCharSet_,OutputBuffer_> textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::ThisXMLScanner

Constructor & Destructor Documentation

template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::XMLScanner ( const InputIterator &  p_src,
const EntityMap &  p_entityMap 
)
inline

Constructor.

Parameters
[in]  p_src  source iterator
[in]  p_entityMap  read-only map of named entities defined by the user
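
As a rough sketch of what a user-defined entity map might look like, the following mirrors the EntityMap typedef documented above. It assumes UChar is an unsigned integer wide enough for a code point (textwolf defines the real type); the entity names and the findEntity helper are made up for illustration. Note that std::map with const char* keys and the default comparator orders keys by pointer value, so the key strings must outlive the map, and a lookup by string content needs its own comparison. How the scanner itself resolves entity names is not stated on this page.

```cpp
#include <cassert>
#include <cstring>
#include <map>

// Assumption: stand-in for textwolf's UChar (an unsigned code point type).
typedef unsigned int UChar;
typedef std::map<const char*, UChar> EntityMap;

// Build a map of user-defined named entities (names chosen for illustration).
// String literals are used as keys because they live for the whole program.
inline EntityMap makeEntityMap()
{
    EntityMap m;
    m["nbsp"] = 0x00A0;
    m["euro"] = 0x20AC;
    m["copy"] = 0x00A9;
    return m;
}

// Hypothetical helper: look up an entity by string content rather than by
// pointer value, using a linear scan with strcmp.
inline bool findEntity( const EntityMap& m, const char* name, UChar& result)
{
    for (EntityMap::const_iterator it = m.begin(); it != m.end(); ++it)
    {
        if (std::strcmp( it->first, name) == 0)
        {
            result = it->second;
            return true;
        }
    }
    return false;
}
```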
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::XMLScanner ( const InputIterator &  p_src)
inline explicit

Constructor.

Parameters
[in]  p_src  source iterator
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::XMLScanner ( const InputCharSet &  p_charset,
const InputIterator &  p_src,
const EntityMap &  p_entityMap 
)
inline

Constructor.

Parameters
[in]  p_charset  character set encoding of the input, needed when non-default settings (e.g. a code page) apply
[in]  p_src  source iterator
[in]  p_entityMap  read-only map of named entities defined by the user
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::XMLScanner ( const InputCharSet &  p_charset,
const InputIterator &  p_src 
)
inline

Constructor.

Parameters
[in]  p_charset  character set encoding of the input, needed when non-default settings (e.g. a code page) apply
[in]  p_src  source iterator
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::XMLScanner ( const InputCharSet &  p_charset)
inline explicit

Constructor.

Parameters
[in]  p_charset  character set encoding of the input, needed when non-default settings (e.g. a code page) apply
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::XMLScanner ( )
inline

Default constructor.

template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::XMLScanner ( const XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ > &  o)
inline

Copy constructor.

Parameters
[in]  o  scanner to copy

Member Function Documentation

template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
iterator textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::begin ( bool  doSkipToFirst = true)
inline

Get begin iterator.

Returns
iterator
Parameters
[in]  doSkipToFirst  true if the iterator should skip to the first character of the input (the default behaviour of STL-conforming iterators, but possibly not exception safe)
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
iterator textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::end ( )
inline

Get the pointer to the end of content.

Returns
iterator
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
Error textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::getError ( const char **  str = 0)
inline

Get the last error.

Parameters
[out]  str  the error as string
Returns
the error code
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
const OutputBuffer& textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::getItem ( ) const
inline

Get the current parsed XML element, if it was not masked out; see nextItem(unsigned short).

Returns
the item string
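
A typical pull loop built on nextItem(), getItemPtr() and getItemSize() might look like the sketch below. It is written as a template over any scanner-like type so that it stays self-contained; StubScanner is a hypothetical stand-in with a reduced ElementType enumeration, used only to exercise the loop without the library. The sketch assumes that the scanner reports Exit at the end of input and ErrorOccurred on failure, as the enumerator names suggest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Collect the text of every Content element. Returns false if the scanner
// reported ErrorOccurred, true once Exit was reached.
template <class Scanner>
bool collectContent( Scanner& scanner, std::vector<std::string>& out)
{
    for (;;)
    {
        typename Scanner::ElementType et = scanner.nextItem();
        if (et == Scanner::Exit) return true;
        if (et == Scanner::ErrorOccurred) return false;
        if (et == Scanner::Content)
        {
            out.push_back( std::string( scanner.getItemPtr(), scanner.getItemSize()));
        }
    }
}

// Hypothetical stand-in for an XML scanner: replays a fixed item list.
struct StubScanner
{
    enum ElementType { None, ErrorOccurred, OpenTag, CloseTag, Content, Exit };
    struct Item { ElementType type; const char* text; };

    StubScanner( const Item* items, std::size_t size)
        : m_items(items), m_size(size), m_pos(0) {}

    // Return the next recorded element type, or Exit when exhausted.
    ElementType nextItem( unsigned short = 0xFFFF)
    {
        if (m_pos >= m_size) return Exit;
        return m_items[ m_pos++].type;
    }
    // Valid only after nextItem() has returned an element.
    const char* getItemPtr() const { return m_items[ m_pos-1].text; }
    std::size_t getItemSize() const { return std::strlen( m_items[ m_pos-1].text); }

    const Item* m_items;
    std::size_t m_size;
    std::size_t m_pos;
};
```

With the real class, the caller would pass an XMLScanner instance and, when the function returns false, query getError() for the error code and message.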
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
const char* textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::getItemPtr ( ) const
inline

Get a pointer to the current parsed XML element, if it was not masked out; see nextItem(unsigned short).

Returns
pointer to the item string
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
std::size_t textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::getItemSize ( ) const
inline

Get the size of the current parsed XML element in bytes.

Returns
the item size in bytes
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
const InputIterator& textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::getIterator ( ) const
inline

Get the iterator pointing to the current source position.

template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
InputIterator& textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::getIterator ( )
inline

Get the iterator pointing to the current source position.

template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
std::size_t textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::getPosition ( ) const
inline

Get the current source iterator position.

Returns
source iterator position in character words (usually bytes)
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
ScannerStatemachine::Element* textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::getState ( )
inline

Get the current XML scanner state machine state.

Returns
pointer to the state variables
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
std::size_t textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::getTokenPosition ( ) const
inline

Get the current token position.

template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
ElementType textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::nextItem ( unsigned short  mask = 0xFFFF)
inline

Scan the next XML element.

Parameters
[in]  mask  bit mask of the element types that should be printed to the output buffer (1 -> print, 0 -> mask out; the element is still returned as an event)
Returns
the type of the XML element
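
How a mask for nextItem() could be composed is sketched below. The sketch assumes that bit i of the mask corresponds to the ElementType enumerator with value i; this page does not state the mapping, so treat it as an assumption to check against the source. The enumeration is a local copy of the ElementType list shown above, and maskBit, isSelected and tagsAndContentMask are hypothetical helper names.

```cpp
#include <cassert>

// Local mirror of the ElementType enumerators, in the order listed above.
enum ElementType
{
    None, ErrorOccurred, HeaderStart, HeaderAttribName,
    HeaderAttribValue, HeaderEnd, DocAttribValue, DocAttribEnd,
    TagAttribName, TagAttribValue, OpenTag, CloseTag,
    CloseTagIm, Content, Exit
};

// Assumption: one mask bit per element type, at the position of its value.
inline unsigned short maskBit( ElementType t)
{
    return static_cast<unsigned short>( 1u << static_cast<unsigned>( t));
}

// True if the given element type is selected for output by the mask.
inline bool isSelected( unsigned short mask, ElementType t)
{
    return (mask & maskBit( t)) != 0;
}

// Example mask: only open tag names and content are written to the buffer;
// all other element types would still be returned, but without their value.
inline unsigned short tagsAndContentMask()
{
    return static_cast<unsigned short>( maskBit( OpenTag) | maskBit( Content));
}
```

The default mask 0xFFFF selects every element type under this assumption, since the enumeration has 15 values and fits into the 16 bits of an unsigned short.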
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
template<class OutputBufferType >
static bool textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::parseStaticToken ( const IsTokenCharMap &  isTok,
InputReader  ir,
OutputBufferType &  buf 
)
inline static

Static version of the token parser, used for parsing table definition elements.

Template Parameters
OutputBufferType  type of the buffer for output
Parameters
[in]  isTok  set of valid token characters
[in]  ir  input reader iterator
[out]  buf  buffer to write the result to
Returns
true on success
template<class InputIterator , class InputCharSet_ , class OutputCharSet_ , class OutputBuffer_ >
template<class IteratorAssignment >
void textwolf::XMLScanner< InputIterator, InputCharSet_, OutputCharSet_, OutputBuffer_ >::setSource ( const IteratorAssignment &  a)
inline

Assign something to the source iterator while keeping the state.

Parameters
[in]  a  source iterator assignment

The documentation for this class was generated from the following file: