NEXUS CLASS LIBRARY

Class NxsToken

Enums

NxsTokenFlags

Data Members

atEOF, atEOL, comment, errormsg, filecol, fileline, filepos, in, labileFlags, punctuation[21], saved, special, token, whitespace[4]

Member Functions

Abbreviation, AppendToComment, AppendToToken, AtEOF, AtEOL, Begins, BlanksToUnderscores, Equals, GetComment, GetCurlyBracketedToken, GetDoubleQuotedToken, GetFileColumn, GetFileLine, GetFilePosition, GetNextChar, GetNextToken, GetParentheticalToken, GetQuoted, GetToken, GetTokenAsCStr, GetTokenLength, GetTokenReference, IsPlusMinusToken, IsPunctuation, IsPunctuationToken, IsWhitespace, IsWhitespaceToken, NxsToken, ~NxsToken, OutputComment, ReplaceToken, ResetToken, SetLabileFlagBit, SetSpecialPunctuationCharacter, StoppedOn, StripWhitespace, ToUpper, Write, Writeln

Class Description

NxsToken objects are used by NxsReader to extract words (tokens) from a NEXUS data file. NxsToken objects know how to skip NEXUS comments correctly and understand NEXUS punctuation, so reading a NEXUS file is as simple as repeatedly calling GetNextToken() and interpreting the token returned. If the token object is not attached to an input stream, calls to GetNextToken() have no effect. If the token object is not attached to an output stream, output comments are discarded (i.e., not output anywhere) and calls to Write or Writeln have no effect. If both input and output streams are attached, however, tokens are read one at a time from the input stream, and comments are read correctly and either written to the output stream (if they are output comments) or ignored (if they are not). A sequence of characters surrounded by single quotes is read as a single token. A pair of adjacent single quotes is stored as one single quote, and underscore characters are stored as blanks (unless the preserveUnderscores labile flag is set).

Key to symbols

A = abstract, C = constructor, D = destructor, I = inline, S = static, V = virtual, F = friend

 

Enums
enum NxsTokenFlags
  saveCommandComments = 0x0001
    if set, command comments of the form [&X] are not ignored but are instead saved as regular tokens (without the square brackets, however)
  parentheticalToken = 0x0002
    if set, and if next character encountered is a left parenthesis, token will include everything up to the matching right parenthesis
  curlyBracketedToken = 0x0004
    if set, and if next character encountered is a left curly bracket, token will include everything up to the matching right curly bracket
  doubleQuotedToken = 0x0008
    if set, grabs entire phrase surrounded by double quotes
  singleCharacterToken = 0x0010
    if set, next non-whitespace character returned as token
  newlineIsToken = 0x0020
    if set, newline character treated as a token and atEOL set if newline encountered
  tildeIsPunctuation = 0x0040
    if set, tilde character treated as punctuation and returned as a separate token
  useSpecialPunctuation = 0x0080
    if set, character specified by the data member special is treated as punctuation and returned as a separate token
  hyphenNotPunctuation = 0x0100
    if set, the hyphen character is not treated as punctuation (it is normally returned as a separate token)
  preserveUnderscores = 0x0200
    if set, underscore characters inside tokens are not converted to blank spaces (normally, all underscores are automatically converted to blanks)
  ignorePunctuation = 0x0400
    if set, the normal punctuation symbols are treated the same as any other darkspace characters
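Because each NxsTokenFlags value is a distinct power of two, several flags can be combined with bitwise OR and tested with bitwise AND. The sketch below is hypothetical: the LabileState struct and its IsSet and TokenFinished members are illustrative names, not NCL API. It only mirrors the documented behavior that labile flags are set with SetLabileFlagBit and cleared after each token is read.

```cpp
// Values copied from the NxsTokenFlags enum documented above.
enum NxsTokenFlags {
    saveCommandComments   = 0x0001,
    parentheticalToken    = 0x0002,
    curlyBracketedToken   = 0x0004,
    doubleQuotedToken     = 0x0008,
    singleCharacterToken  = 0x0010,
    newlineIsToken        = 0x0020,
    tildeIsPunctuation    = 0x0040,
    useSpecialPunctuation = 0x0080,
    hyphenNotPunctuation  = 0x0100,
    preserveUnderscores   = 0x0200,
    ignorePunctuation     = 0x0400
};

// Hypothetical holder for labile flags; not the NCL implementation.
struct LabileState {
    int labileFlags = 0;

    // Turn on one flag bit (or several bits OR-ed together).
    void SetLabileFlagBit(int bit) { labileFlags |= bit; }

    // True if every bit in `bit` is currently set.
    bool IsSet(int bit) const { return (labileFlags & bit) == bit; }

    // Labile flags apply to a single token only, so they are
    // cleared once that token has been read.
    void TokenFinished() { labileFlags = 0; }
};
```

For example, setting tildeIsPunctuation | hyphenNotPunctuation stores 0x0140; after TokenFinished the flags are back to 0.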

 

Data Members
     bool   atEOF
       
true if end of file has been encountered
     bool   atEOL
       
true if newline encountered while newlineIsToken labile flag set
     NxsString   comment
       
temporary buffer used to store output comments while they are being built
     NxsString   errormsg
       
     long   filecol
       
current column in current line (refers to column immediately following token just read)
     long   fileline
       
current file line
     file_pos   filepos
       
current file position (for Metrowerks compiler, type is streampos rather than long)
     istream   &in
       
reference to input stream from which tokens will be read
     int   labileFlags
       
storage for flags in the NxsTokenFlags enum
     char   punctuation[21]
       
stores the 20 NEXUS punctuation characters
     char   saved
       
either '\0' or the last character read from the input stream
     char   special
       
ad hoc punctuation character; default value is '\0'
     NxsString   token
       
the character buffer used to store the current token
     char   whitespace[4]
       
stores the 3 whitespace characters: blank space, tab and newline

 

Member Functions
    bool   Abbreviation(NxsString s)
       
Returns true if token begins with the capitalized portion of s and, if token is longer than that capitalized portion, the remaining characters match those in the lower-case portion of s. The comparison is case insensitive. This function should be used instead of the Begins function if you wish to allow for abbreviations of commands and also want to ensure that the user does not type in a word that does not correspond to any command.
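The Abbreviation rule can be sketched as a free function that takes the token as an explicit argument. This is an illustrative reimplementation, not NCL's source; it assumes that the mandatory portion is the leading run of upper-case letters in s, and that a token longer than s is rejected because its extra characters have nothing in s to match.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical free-function version of Abbreviation (not NCL's code).
// The leading upper-case run of s is mandatory; any further characters
// in token must match the rest of s, ignoring case.
bool Abbreviation(const std::string& token, const std::string& s)
{
    // Length of the mandatory (upper-case) prefix of s.
    std::size_t mlen = 0;
    while (mlen < s.size() && std::isupper(static_cast<unsigned char>(s[mlen])))
        ++mlen;

    // Too short to cover the mandatory part, or longer than s itself.
    if (token.size() < mlen || token.size() > s.size())
        return false;

    // Compare every typed character with s, case-insensitively.
    for (std::size_t k = 0; k < token.size(); ++k) {
        int t = std::toupper(static_cast<unsigned char>(token[k]));
        int o = std::toupper(static_cast<unsigned char>(s[k]));
        if (t != o)
            return false;
    }
    return true;
}
```

With s = "CHARacters", the tokens "char", "chara" and "characters" match, while "cha" (too short) and "charx" (mismatch) do not.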
I   void   AppendToComment(char ch)
       
Adds ch to end of comment NxsString.
I   void   AppendToToken(char ch)
       
Adds ch to end of current token.
I   bool   AtEOF()
       
Returns true if and only if last call to GetNextToken encountered the end-of-file character (or for some reason the input stream is now out of commission).
I   bool   AtEOL()
       
Returns true if and only if last call to GetNextToken encountered the newline character while the newlineIsToken labile flag was in effect.
    bool   Begins(NxsString s, bool respect_case)
       
Returns true if token begins with s; the comparison is case-sensitive only if respect_case is true. This function should be used instead of the Equals function if you wish to allow for abbreviations of commands.
I   void   BlanksToUnderscores()
       
Converts all blanks in token to underscore characters. Normally, underscores found in the tokens read from a NEXUS file are converted to blanks automatically as they are read; this function reverts the blanks back to underscores.
    bool   Equals(NxsString s, bool respect_case)
       
Returns true if token exactly equals s; the comparison is case-sensitive only if respect_case is true. If abbreviations are to be allowed, either Begins or Abbreviation should be used instead of Equals.
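Begins and Equals differ only in whether s must cover the whole token. Below is a minimal sketch with hypothetical free-function signatures (the token is passed explicitly), assuming respect_case selects a case-sensitive comparison as its name suggests.

```cpp
#include <cctype>
#include <string>

// Compare two characters, optionally ignoring case.
static bool SameChar(char a, char b, bool respect_case)
{
    if (respect_case)
        return a == b;
    return std::toupper(static_cast<unsigned char>(a)) ==
           std::toupper(static_cast<unsigned char>(b));
}

// Hypothetical Begins: true if token starts with s.
bool Begins(const std::string& token, const std::string& s, bool respect_case)
{
    if (s.size() > token.size())
        return false;
    for (std::string::size_type k = 0; k < s.size(); ++k)
        if (!SameChar(token[k], s[k], respect_case))
            return false;
    return true;
}

// Hypothetical Equals: true if token and s match in full.
bool Equals(const std::string& token, const std::string& s, bool respect_case)
{
    return token.size() == s.size() && Begins(token, s, respect_case);
}
```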
    void   GetComment()
       
Reads the rest of a comment (the opening '[' has already been input) and handles it according to its type. If the comment is an output comment and an output stream has been attached, the output comment is written to the output stream. Otherwise, output comments are simply ignored like regular comments. If the labile flag bit saveCommandComments is in effect, the comment (without the square brackets) is stored in token.
    void   GetCurlyBracketedToken()
       
Reads rest of a token surrounded with curly brackets (the starting '{' has already been input) up to and including the matching '}' character. All nested curly-bracketed phrases will be included.
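Both GetCurlyBracketedToken and GetParentheticalToken must keep nested phrases, which a depth counter handles. The hypothetical ReadBracketed function below is a sketch, not NCL's code; it assumes the opening character has already been appended to token, and it reports whether the brackets balanced before the input ran out.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, used in the usage example
#include <string>

// Hypothetical sketch: read up to and including the close character that
// matches an already-consumed open character, keeping nested phrases.
// Returns false if the stream ends before the brackets balance.
bool ReadBracketed(std::istream& in, std::string& token, char open, char close)
{
    int depth = 1;            // the initial opener is already consumed
    char ch;
    while (in.get(ch)) {
        token += ch;
        if (ch == open)
            ++depth;
        else if (ch == close && --depth == 0)
            return true;      // matching close found
    }
    return false;             // end of input: unbalanced brackets
}
```

Reading from the stream "a{b}c} rest" with token initially "{" yields "{a{b}c}" and leaves " rest" unread. The same function with '(' and ')' covers the parenthetical case.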
    void   GetDoubleQuotedToken()
       
Gets the remainder of a double-quoted NEXUS word (the first double quote character has already been read by GetNextToken). This function reads characters until the next double quote is encountered. Tandem double quotes within a double-quoted NEXUS word are not allowed; they are treated as the end of the first word and the beginning of the next double-quoted NEXUS word. Tandem single quotes inside a double-quoted NEXUS word are saved as two separate single quote characters; to embed a single quote inside a double-quoted NEXUS word, simply use a single quote by itself (not paired with another single quote).
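A sketch of the double-quote rule, assuming the returned word excludes the enclosing quotes (the description above does not say whether they are kept); ReadDoubleQuoted is a hypothetical name. Reading simply stops at the first double quote, which is why tandem double quotes end one word and begin the next, and why single quotes pass through unchanged.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, used in the usage example
#include <string>

// Hypothetical sketch: the opening double quote has already been read.
// Characters are collected until the next double quote, which ends the
// word. Single quotes (paired or not) are stored verbatim; a second double
// quote right after the closing one stays in the stream, where it starts
// the next double-quoted word.
std::string ReadDoubleQuoted(std::istream& in)
{
    std::string word;
    char ch;
    while (in.get(ch) && ch != '"')
        word += ch;
    return word;
}
```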
I   long   GetFileColumn()
       
Returns value stored in filecol, which keeps track of the current column in the data file (i.e., number of characters since the last new line was encountered).
I   long   GetFileLine()
       
Returns value stored in fileline, which keeps track of the current line in the data file (i.e., number of new lines encountered thus far).
I   file_pos   GetFilePosition()
       
Returns value stored in filepos, which keeps track of the current position in the data file (i.e., number of characters since the beginning of the file). Note: for Metrowerks compiler, you must use the offset() method of the streampos class to use the value returned.
I   char   GetNextChar()
       
Reads next character from in and does all of the following before returning it to the calling function:
  • if the character read is either a carriage return or a line feed, fileline is incremented by one and filecol is reset to zero
  • if the character read is a carriage return, and a peek at the next character to be read reveals that it is a line feed, then the next (line feed) character is also read
  • if either a carriage return or a line feed is read, the character returned to the calling function is '\n'
  • if the character read is neither a carriage return nor a line feed, filecol is incremented by one and the character is returned as is to the calling function
  • in all cases, filepos is updated using a call to the tellg function of istream.
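The newline handling above can be sketched as follows. CharReader is a hypothetical stand-in for the relevant NxsToken members; its starting values, the '\0' it returns at end of input, and the omission of the filepos/tellg update are simplifications of this sketch, not documented NCL behavior.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, used in the usage example

// Hypothetical sketch of GetNextChar's bookkeeping (not NCL's code).
struct CharReader {
    std::istream& in;
    long fileline = 1;
    long filecol = 0;
    bool atEOF = false;

    explicit CharReader(std::istream& i) : in(i) {}

    // Returns the next character, mapping CR, LF and CR+LF to '\n'.
    // Returns '\0' and sets atEOF once the input is exhausted.
    char GetNextChar()
    {
        int ch = in.get();
        if (ch == std::istream::traits_type::eof()) {
            atEOF = true;
            return '\0';
        }
        if (ch == '\r' || ch == '\n') {
            // Swallow the LF of a CR+LF pair so it counts as one newline.
            if (ch == '\r' && in.peek() == '\n')
                in.get();
            ++fileline;
            filecol = 0;
            return '\n';
        }
        ++filecol;
        return static_cast<char>(ch);
    }
};
```

Feeding it "ab\r\ncd\re\nf" yields a, b, '\n', c, d, '\n', e, '\n', f, with fileline reaching 4: a CR+LF pair counts as a single line break, just like a lone CR or LF.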
    void   GetNextToken()
       
Reads characters from in until a complete token has been read and stored in token. GetNextToken performs a number of useful operations in the process of retrieving tokens:
  • any underscore characters encountered are stored as blank spaces (unless the labile flag bit preserveUnderscores is set)
  • if the first character of the next token is an isolated single quote, then the entire quoted NxsString is saved as the next token
  • paired single quotes are automatically converted to single quotes before being stored
  • comments are handled automatically (normal comments are treated as whitespace, and output comments are passed to the function OutputComment, which does nothing in the NxsToken class but can be overridden in a derived class to handle them appropriately)
  • leading whitespace (including comments) is automatically skipped
  • if the end of the file is reached on reading this token, the atEOF flag is set and may be queried using the AtEOF member function
  • punctuation characters are always returned as individual tokens (see the Maddison, Swofford, and Maddison paper for the definition of punctuation characters) unless the flag ignorePunctuation is set in labileFlags, in which case the normal punctuation symbols are treated just like any other darkspace character
The behavior of GetNextToken may be altered by using labile flags. For example, the labile flag saveCommandComments can be set using the member function SetLabileFlagBit. This causes comments of the form [&X] to be saved as tokens (without the square brackets), but only for the acquisition of the next token. Labile flags are cleared after each application.
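A heavily simplified, hypothetical tokenizer (SimpleTokens is not an NCL function) can illustrate three of the rules above: whitespace separates tokens, each of the twenty punctuation characters listed under IsPunctuation forms a token by itself, and underscores become blanks. Comments, quoted words and labile flags are deliberately left out.

```cpp
#include <cctype>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical simplified tokenizer; not GetNextToken itself.
std::vector<std::string> SimpleTokens(const std::string& text)
{
    // The twenty NEXUS punctuation characters listed under IsPunctuation.
    const char* punctuation = "()[]{}/\\,;:=*'\"`+-<>";
    std::vector<std::string> tokens;
    std::string current;

    // Emit the word collected so far, if any.
    auto flush = [&]() {
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    };

    for (char ch : text) {
        if (std::isspace(static_cast<unsigned char>(ch))) {
            flush();                           // whitespace ends a word
        } else if (ch != '\0' && std::strchr(punctuation, ch) != nullptr) {
            flush();                           // punctuation ends a word...
            tokens.push_back(std::string(1, ch));  // ...and is its own token
        } else {
            current += (ch == '_') ? ' ' : ch;     // underscore -> blank
        }
    }
    flush();
    return tokens;
}
```

For the input "ntax=5; taxlabels Homo_sapiens" this produces the six tokens ntax, =, 5, ;, taxlabels and "Homo sapiens".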
    void   GetParentheticalToken()
       
Reads rest of parenthetical token (starting '(' already input) up to and including the matching ')' character. All nested parenthetical phrases will be included.
    void   GetQuoted()
       
Gets the remainder of a quoted NEXUS word (the first single quote character has already been read by GetNextToken). This function reads characters until the next single quote is encountered. The one exception is two single quotes occurring one after the other, in which case the function continues to gather characters until an isolated single quote is found. The tandem quotes are stored as a single quote character in the token NxsString.
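The tandem-quote rule needs only a one-character lookahead via istream::peek. ReadSingleQuoted is a hypothetical name, and the sketch assumes, as the description states, that the opening quote has already been consumed.

```cpp
#include <istream>
#include <sstream>   // std::istringstream, used in the usage example
#include <string>

// Hypothetical sketch of the quoted-word rule (not NCL's code): a doubled
// '' stands for one literal quote, and an isolated ' ends the word.
std::string ReadSingleQuoted(std::istream& in)
{
    std::string word;
    char ch;
    while (in.get(ch)) {
        if (ch == '\'') {
            if (in.peek() == '\'') {
                in.get();      // consume the second quote of the pair
                word += '\'';  // and store a single quote
            } else {
                break;         // isolated quote: end of the word
            }
        } else {
            word += ch;
        }
    }
    return word;
}
```

Given the remaining input "Darwin''s finch' rest", the sketch returns "Darwin's finch" and leaves " rest" in the stream.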
I   NxsString   GetToken(bool respect_case)
       
Returns the data member token. Specifying false for the respect_case parameter causes all characters in token to be converted to upper case before token is returned. Specifying true results in GetToken returning exactly what it read from the file.
I   const char   *GetTokenAsCStr(bool respect_case)
       
Returns the data member token as a C-style string. Specifying false for the respect_case parameter causes all characters in token to be converted to upper case before the token C-string is returned. Specifying true results in GetTokenAsCStr returning exactly what it read from the file.
I   int   GetTokenLength()
       
Returns token.size().
I   const NxsString   &GetTokenReference()
       
Returns the token for functions that need only read-only access; faster than GetToken.
I   bool   IsPlusMinusToken()
       
Returns true if current token is a single character and this character is either '+' or '-'.
I   bool   IsPunctuation(char ch)
       
Returns true if character supplied is considered a punctuation character. The following twenty characters are considered punctuation characters:
 ()[]{}/\,;:=*'"`+-<>
Exceptions:
  • The tilde character ('~') is also considered punctuation if the tildeIsPunctuation labile flag is set
  • The special punctuation character (specified using SetSpecialPunctuationCharacter) is also considered punctuation if the useSpecialPunctuation labile flag is set
  • The hyphen (i.e., minus sign) character ('-') is not considered punctuation if the hyphenNotPunctuation labile flag is set
Use the SetLabileFlagBit method to set one or more NxsTokenFlags flags in labileFlags.
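The rules above can be collected into one predicate. In this hypothetical sketch, labileFlags and the special character are passed as arguments rather than read from data members, and only the three relevant flag values are redeclared; ignorePunctuation is not handled because the list above does not mention it.

```cpp
#include <cstring>

// Flag values from the NxsTokenFlags enum.
const int tildeIsPunctuation    = 0x0040;
const int useSpecialPunctuation = 0x0080;
const int hyphenNotPunctuation  = 0x0100;

// Hypothetical free-function version of IsPunctuation (not NCL's code).
bool IsPunctuation(char ch, int labileFlags, char special)
{
    // The twenty standard punctuation characters plus the terminating null.
    static const char punctuation[21] = "()[]{}/\\,;:=*'\"`+-<>";

    if (ch == '-' && (labileFlags & hyphenNotPunctuation))
        return false;                  // hyphen demoted to darkspace
    if (ch == '~' && (labileFlags & tildeIsPunctuation))
        return true;                   // tilde promoted to punctuation
    if ((labileFlags & useSpecialPunctuation) && ch == special)
        return true;                   // ad hoc punctuation character
    return ch != '\0' && std::strchr(punctuation, ch) != nullptr;
}
```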
I   bool   IsPunctuationToken()
       
Returns true if current token is a single character and this character is a punctuation character (as defined in IsPunctuation function).
I   bool   IsWhitespace(char ch)
       
Returns true if character supplied is considered a whitespace character. Note: treats '\n' as darkspace if the labile flag newlineIsToken is in effect.
I   bool   IsWhitespaceToken()
       
Returns true if current token is a single character and this character is a whitespace character (as defined in IsWhitespace function).
C     NxsToken(istream &i)
       
Sets atEOF and atEOL to false, comment and token to the empty string, filecol and fileline to 1, filepos to 0, labileFlags to 0 and saved and special to the null character. Initializes the istream reference data member in to the supplied istream i.
D     ~NxsToken()
       
Nothing needs to be done; all objects take care of deleting themselves.
IV   void   OutputComment(const NxsString &msg)
       
This function is called whenever an output comment (i.e., a comment beginning with an exclamation point) is found in the data file. This version of OutputComment does nothing; override this virtual function to display the output comment in the most appropriate way for the platform you are supporting.
I   void   ReplaceToken(const NxsString s)
       
Replaces current token NxsString with s.
I   void   ResetToken()
       
Sets token to the empty NxsString ("").
I   void   SetLabileFlagBit(int bit)
       
Sets the bit specified in the variable labileFlags. The available bits are specified in the NxsTokenFlags enum. All bits in labileFlags are cleared after each token is read.
I   void   SetSpecialPunctuationCharacter(char c)
       
Sets the special punctuation character to c. If the labile bit useSpecialPunctuation is set, this character will be added to the standard list of punctuation symbols, and will be returned as a separate token like the other punctuation characters.
I   bool   StoppedOn(char ch)
       
Checks character stored in the variable saved to see if it matches supplied character ch. Good for checking such things as whether token stopped reading characters because it encountered a newline (and labileFlags bit newlineIsToken was set):
 StoppedOn('\n');
or whether token stopped reading characters because of a punctuation character such as a comma:
 StoppedOn(',');
    void   StripWhitespace()
       
Strips whitespace from currently-stored token. Removes leading, trailing, and embedded whitespace characters.
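Removing every whitespace character, wherever it occurs, is a natural fit for the erase-remove idiom. Below is a minimal sketch, assuming that the characters stripped are the three listed for the whitespace data member (blank, tab, newline); the free function is hypothetical and takes the token explicitly.

```cpp
#include <algorithm>
#include <string>

// Hypothetical sketch: remove every blank, tab and newline from token,
// whether leading, trailing or embedded.
void StripWhitespace(std::string& token)
{
    auto isWhite = [](char c) { return c == ' ' || c == '\t' || c == '\n'; };
    // remove_if compacts the kept characters to the front; erase drops the tail.
    token.erase(std::remove_if(token.begin(), token.end(), isWhite), token.end());
}
```

For example, "  a b\tc\n " becomes "abc".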
    void   ToUpper()
       
Converts all alphabetical characters in token to upper case.
I   void   Write(ostream &out)
       
Simply outputs the current NxsString stored in token to the output stream out. Does not send a newline to the output stream afterwards.
I   void   Writeln(ostream &out)
       
Simply outputs the current NxsString stored in token to the output stream out. Sends a newline to the output stream afterwards.