NEXUS CLASS LIBRARY home | classes | functions

Nexus Class Library (version 2.0)

by Paul O. Lewis

Contents

What is the NCL?

The NEXUS Class Library (NCL) is an integrated collection of C++ classes designed to allow the user to quickly write a program that reads NEXUS-formatted data files. It also allows easy extension of the NEXUS format to include new blocks of your own design.

A word about the intended audience is in order before we get too far along (no need to waste your time if the NCL will not be helpful to you). The intended audience for both this documentation and the accompanying class library comprises computer programmers who wish to endow their C++ programs with the ability to read NEXUS data files. If you are not a programmer and simply use NEXUS files as a means of inputting data to the programs you use for analyzing your data, the NCL is not something that will be useful to you. The NCL is also not for you if you are a programmer but do not use the C++ language, since the NCL depends heavily on the object oriented programming features built into C++. There is no Java version of the NCL, nor is one planned. This is simply a reflection of the fact that I primarily program in C++ and only have time to write the library once.

The NEXUS data file format was specified in the publication cited below. Please read this paper for further information about the format specification itself; the documentation for the NCL does not attempt to explain the structure of a NEXUS data file.

Maddison, D. R., D. L. Swofford, and Wayne P. Maddison. 1997. NEXUS: an extensible file format for systematic information. Systematic Biology 46(4): 590-621.

The basic goal of the NCL is to provide a relatively easy way to endow a C++ program with the ability to read NEXUS data files. The steps necessary to use the NCL to create a bare-bones program that can read a NEXUS data file are simple and few (see the section entitled Building a NEXUS File Reader below), and it is hoped that the availability of this class library will encourage the use of the NEXUS format. This will in turn encourage consistency in how programs read NEXUS files and how programs respond to errors in data files.

There are a large number of special data file formats in use. This places an extra burden on the end user, who must deal with an increasing number of file formats all differing in a number of ways. To convert one's data file to another file format often involves manual manipulation of the data, an activity that is inherently dangerous and probably has resulted in the corruption of many data files. At the very least, the large number of formats in existence has led to a proliferation of data file variants. With many copies of a given data file on a hard disk, each formatted differently for various analysis programs, it becomes very easy to change one (say, correct a datum found to be in error) and then fail to correct the other versions. The NEXUS file format provides a means for keeping one master copy of the data and using it with several programs without modification. The NCL provides a means for encouraging programmers to use the NEXUS file format in future programs they write.


Obtaining the NCL

Please visit the NCL project web page at SourceForge to download the current source code for the library.


Characteristics of the NCL

Portability

The NCL has been designed to be as portable as possible for a C++ class library. The NCL does make use of the ANSI Standard C++ Library (STL), but use of the STL is now common and should not cause problems for modern compilers/platforms.

Cross-platform features

I have attempted to create the NCL in such a way that one is not limited in the type of platform targeted. For example, NEXUS files can contain "output comments" that are supposed to be displayed in the output of the program reading the NEXUS file. Such comments are handled automatically by the NCL, and are sent to a virtual function that can be overridden by you in a derived class. This provides a means for you to tailor the output of such comments to the platform of your choice. For example, if you are writing a standard Linux console application (i.e., not a graphical X-Windows application), you might want such output comments to simply be sent to standard output or to an ofstream object. For a graphical Windows, Macintosh or X-Windows application, you might deem it more user-friendly to pop up a message box with the output comment as the message. This would ensure that the user noticed the output comment. You also have the option of having your program completely ignore such comments in the data file.

The NCL provides similar hooks for noting the progress in reading the data file. For example, the virtual function EnteringBlock is called and provided with the name of the block about to be read. You can override EnteringBlock in your derived class to allow, for example, a message to be displayed in a status bar at the bottom of your program's main window (in a graphical application) indicating which block is currently being read. Other such virtual functions include SkippingBlock (to allow users to be warned that your program is ignoring a block in the data file), SkippingCommand (to allow users to be warned about particular commands being skipped within a block), and NexusError, which is the function called whenever anything unexpected happens when reading the file.
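The hook mechanism can be illustrated without the NCL itself. What follows is only a minimal sketch: ReaderHooks, StreamHooks, simulate_read, and capture_demo are hypothetical stand-ins invented for this illustration, not NCL classes, and the hooks are simplified to return void. Only the hook names mirror the ones discussed above.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a reader base class; NOT the real NxsReader.
// Every hook has a do-nothing default, so a derived class overrides only
// the notifications it cares about.
class ReaderHooks
	{
	public:
		virtual ~ReaderHooks() {}
		virtual void EnteringBlock(const std::string &) {}
		virtual void SkippingBlock(const std::string &) {}
		virtual void OutputComment(const std::string &) {}
	};

// Console-style implementation: every notification is written to a stream.
// A graphical program could instead pop up a message box in these overrides.
class StreamHooks : public ReaderHooks
	{
	public:
		explicit StreamHooks(std::ostream &os) : out(os) {}

		void EnteringBlock(const std::string &name) override
			{
			out << "Reading \"" << name << "\" block...\n";
			}

		void SkippingBlock(const std::string &name) override
			{
			out << "Skipping unknown block (" << name << ")...\n";
			}

		void OutputComment(const std::string &msg) override
			{
			out << msg << "\n";
			}

	private:
		std::ostream &out;
	};

// Hypothetical driver that fires the hooks in the order a reader might
// when it meets an output comment, an unknown block, and a known block.
void simulate_read(ReaderHooks &hooks)
	{
	hooks.OutputComment("Output comment before first block");
	hooks.SkippingBlock("PAUP");
	hooks.EnteringBlock("TAXA");
	}

// Runs the driver against a string stream and returns what was logged.
std::string capture_demo()
	{
	std::ostringstream log;
	StreamHooks hooks(log);
	simulate_read(hooks);
	return log.str();
	}
```

Passing a plain ReaderHooks object to simulate_read produces no output at all, which corresponds to the option of ignoring such notifications entirely.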

Extensibility

The basic tools provided in the NCL allow you to create your own NEXUS blocks and use them in your program. This makes it easy to define a private block to contain commands that only your program recognizes, allowing your users to run your program in batch mode (see the section below entitled General Advice for more information on this topic).
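To give a rough idea of what a private block class looks like, here is a self-contained sketch. It does not use the NCL: BlockBase is a hypothetical stand-in for a block base class, the block name "MYBLOCK" and the "key = value ;" command syntax are invented for the illustration, and the tokens are assumed to have been split already.

```cpp
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a block base class; NOT the real NxsBlock.
class BlockBase
	{
	public:
		explicit BlockBase(const std::string &name) : id(name) {}
		virtual ~BlockBase() {}
		virtual void Read(const std::vector<std::string> &tokens) = 0;
		virtual void Reset() = 0;
		virtual void Report(std::ostream &out) const = 0;

		const std::string id;	// block name this object answers to
	};

// A private block that stores commands of the (invented) form
//     key = value ;
class PrivateBlock : public BlockBase
	{
	public:
		PrivateBlock() : BlockBase("MYBLOCK") {}

		// Discards old contents, then stores each well-formed command.
		// Reading stops at the first command that does not fit the pattern.
		void Read(const std::vector<std::string> &tokens) override
			{
			Reset();
			for (std::size_t i = 0; i + 3 < tokens.size(); i += 4)
				{
				if (tokens[i + 1] != "=" || tokens[i + 3] != ";")
					break;
				settings[tokens[i]] = tokens[i + 2];
				}
			}

		void Reset() override
			{
			settings.clear();
			}

		// Writes one "key = value" line per setting, sorted by key.
		void Report(std::ostream &out) const override
			{
			std::map<std::string, std::string>::const_iterator it;
			for (it = settings.begin(); it != settings.end(); ++it)
				out << it->first << " = " << it->second << "\n";
			}

	private:
		std::map<std::string, std::string> settings;
	};

// Convenience helper for the illustration: returns the report as a string.
std::string report_of(const BlockBase &block)
	{
	std::ostringstream out;
	block.Report(out);
	return out.str();
	}
```

A program would hand the stored settings to its analysis code after reading, which is what makes batch-mode operation possible.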


Current limitations

The main current limitation is that the NCL is incomplete. Some standard NEXUS blocks have been provided with this distribution, but because the NEXUS format is so extensive, even some of the standard blocks described in the paper cited above have not been implemented (or have been only incompletely implemented). Here is a summary table showing what has been implemented thus far:

Block          Current Limitations
ASSUMPTIONS    Only TAXSETS, CHARSETS, and EXSETS have been implemented thus far.
ALLELES        Cannot yet handle transposed MATRIX, and only DATAPOINT=STANDARD is implemented.
CHARACTERS     Only ITEMS=STATES and STATESFORMAT=STATESPRESENT have been implemented thus far, and DATATYPE=CONTINUOUS has not been implemented.
DISTANCES      No limitations; completely implemented.
DATA           Since the DATA block is essentially the same as a CHARACTERS block, the same limitations apply.
TAXA           No limitations; completely implemented.
TREES          No limitations; completely implemented.

While the limitations for the CHARACTERS block may seem a bit extreme, this block is nevertheless implemented to the point where almost all existing morphological and molecular data sets can be read.

The ALLELES block has not yet been used in any program to my knowledge. It is very similar to the GDADATA block used in my program GDA, but differs in requiring NEWPOPS to be specified if a TAXA block does not precede the ALLELES block (this is to make the ALLELES block more like the CHARACTERS block).

Some recent modifications of the NEXUS format implemented in Mesquite (e.g. LINK statements in the CHARACTERS block) and MrBayes (e.g. DATATYPE=MIXED in the DATA block) are not supported in the NCL at this time.

The NCL has been designed to be portable, easy-to-use, and informative in the error messages produced. It will be apparent to anyone who looks very closely at the code that some efficiency (both in executable size and speed) has been sacrificed to meet these goals.


Building a NEXUS File Reader

This section illustrates how you could build a simple NEXUS file reader application capable of reading in a TAXA and a TREES block. Note that the file nclsimplest.cpp contains all of the code for this example. To keep things simple, we will just write output to an ofstream object (nothing graphical here).

As you work through this example, feel free to look into the NCL classes in more detail. A class index as well as a member function index is provided at the top of this document for quick access.

The Main Function

int main(int argc, char *argv[])
	{
	taxa = new NxsTaxaBlock();
	trees = new NxsTreesBlock(taxa);

	MyReader nexus(argv[1], argv[2]);
	nexus.Add(taxa);
	nexus.Add(trees);

	MyToken token(nexus.inf, nexus.outf);
	nexus.Execute(token);

	taxa->Report(nexus.outf);
	trees->Report(nexus.outf);

	return 0;
	}

Creating block objects

The first two lines of the main function involve the creation of objects corresponding to the two types of NEXUS blocks we want our program to recognize. NxsTaxaBlock is declared in the header file nxstaxablock.h and defined in the source code file nxstaxablock.cpp, whereas the NxsTreesBlock class is declared in nxstreesblock.h and defined in nxstreesblock.cpp. Note that the NxsTreesBlock constructor is passed a pointer to an object of type NxsTaxaBlock. This is because the taxon labels in a TREES block should correspond to any taxa previously defined in a TAXA block. If no TAXA block precedes the TREES block, taxon labels defined in the TREES block will be used to populate the TAXA block. In the NCL, any block that defines taxon labels stores this information in the NxsTaxaBlock object, and any block that needs such information is given a pointer to the NxsTaxaBlock object in its constructor.
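The sharing of taxon labels between blocks can be pictured with a small self-contained sketch. TaxaStore and TreesLike are hypothetical stand-ins, not the NCL classes, and the rule that an unknown label is rejected when taxa were defined beforehand is an assumption made for the illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for the shared label store; NOT the real NxsTaxaBlock.
struct TaxaStore
	{
	std::vector<std::string> labels;

	bool Has(const std::string &label) const
		{
		return std::find(labels.begin(), labels.end(), label) != labels.end();
		}
	};

// Hypothetical stand-in for a block that refers to taxa by label.
class TreesLike
	{
	public:
		explicit TreesLike(TaxaStore *t) : taxa(t), populate(false) {}

		// Called when reading starts: if no labels were defined earlier,
		// this block takes over the job of defining them.
		void BeginRead()
			{
			populate = taxa->labels.empty();
			}

		// Returns true if the label is acceptable. In populate mode a new
		// label is added to the shared store; otherwise it must already exist
		// (an assumption of this sketch).
		bool UseLabel(const std::string &label)
			{
			if (populate)
				{
				if (!taxa->Has(label))
					taxa->labels.push_back(label);
				return true;
				}
			return taxa->Has(label);
			}

	private:
		TaxaStore *taxa;
		bool populate;
	};
```

Because both objects refer to the same store, whichever block is read first determines the labels that later blocks are checked against.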

Adding the block objects to the NxsReader object

The next three lines involve creating a NxsReader object and adding our two block objects to a linked list maintained by the NxsReader object. The MyReader class is derived from the NxsReader class, which is declared in nxsreader.h and defined in nxsreader.cpp. Although a NxsReader object can be created and used, you will probably wish to derive a class from it (as I did in this example) and override some of the NxsReader virtual functions, such as EnteringBlock, SkippingBlock, and NexusError (the NxsReader version of these functions does nothing, and it is important to at least report errors in some way to your program's users).

The reason the NxsReader object must maintain a list of block objects is so that it can figure out which one is responsible for reading each block found in the data file. The block objects taxa and trees have each inherited an id variable of type char * that stores their block name (i.e., "TAXA" for the NxsTaxaBlock and "TREES" for the NxsTreesBlock). When the Execute member function encounters a block name, it searches its linked list of block objects until it finds one whose id variable is identical to the name of the block encountered. It then calls the Read function of that block object to do the work of reading the block from the data file and storing its contents. It is possible of course that a block name will appear in a data file for which there is no corresponding block object. In this case, the Execute method calls the SkippingBlock method to report the fact that it is skipping over the contents of the unknown block.
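The lookup just described can be sketched in a few lines. This is not the NCL code: Block and Dispatcher are hypothetical stand-ins, a std::vector replaces the linked list, and the block name is upper-cased before comparison on the assumption that this is how a lower-case "taxa" in a data file is matched to an id of "TAXA".

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a block object; NOT the real NxsBlock.
struct Block
	{
	std::string id;		// block name this object is responsible for
	bool wasRead;		// set once Read has been called

	explicit Block(const std::string &name) : id(name), wasRead(false) {}
	void Read() { wasRead = true; }
	};

// Hypothetical stand-in for the reader's dispatch logic.
class Dispatcher
	{
	public:
		void Add(Block *b) { blocks.push_back(b); }

		// Hands the named block to the first registered object whose id
		// matches. Returns false, and records the name, if nobody claims it.
		bool Dispatch(const std::string &name)
			{
			const std::string wanted = Upper(name);
			for (std::size_t i = 0; i < blocks.size(); ++i)
				{
				if (blocks[i]->id == wanted)
					{
					blocks[i]->Read();
					return true;
					}
				}
			skipped.push_back(name);
			return false;
			}

		std::vector<std::string> skipped;	// names of unclaimed blocks

	private:
		// Upper-cases a copy of s (assumption: ids are stored in upper case).
		static std::string Upper(std::string s)
			{
			for (std::size_t i = 0; i < s.size(); ++i)
				s[i] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[i])));
			return s;
			}

		std::vector<Block *> blocks;
	};
```

In this sketch the skipped vector plays the role that a SkippingBlock override plays in a real program: it is the place where unknown blocks get noticed.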

Reading the data file

The next two lines create a token object (MyToken is derived from the NxsToken class), and initiate the reading of the NEXUS data file using the Execute function. The input and output files are created within the MyReader class. While this is not required, it facilitates handling messages generated while the data file is being read. The NxsToken class has one virtual member function (OutputComment) which enables you to control how output comments are displayed. The NxsToken version of OutputComment does nothing, so you must derive your own token class from NxsToken and override the OutputComment method in order for the output comments in the data file to be displayed. The main function of the NxsToken class is to provide a means for grabbing separate NEXUS tokens (words separated by blank spaces or punctuation) one by one from the data file. Calling the GetNextToken function reads and stores the next token found in the data file, correctly handling any comments found along the way. This automatic comment handling greatly simplifies reading a NEXUS data file.
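To show why automatic comment handling is such a convenience, here is a much-simplified tokenizer written from scratch. It is not NxsToken: read_comment, next_token, and tokenize are hypothetical names, only a handful of punctuation characters are recognized, quoted tokens are not handled, and a comment in the middle of a word is assumed to be dropped without splitting the word.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Consumes a comment whose opening '[' has already been read, honoring
// nested brackets. Returns the comment text without the outer brackets.
static std::string read_comment(std::istream &in)
	{
	std::string text;
	int depth = 1;
	char ch;
	while (depth > 0 && in.get(ch))
		{
		if (ch == '[')
			++depth;
		else if (ch == ']')
			--depth;
		if (depth > 0)
			text += ch;
		}
	return text;
	}

// Reads the next token into 'token'. Comments are skipped wherever they
// occur; those starting with '!' are output comments and are appended
// (without the '!') to 'shown'. Returns false when no token remains.
bool next_token(std::istream &in, std::string &token, std::vector<std::string> &shown)
	{
	token.clear();
	char ch;
	while (in.get(ch))
		{
		if (ch == '[')
			{
			const std::string comment = read_comment(in);
			if (!comment.empty() && comment[0] == '!')
				shown.push_back(comment.substr(1));
			}
		else if (std::isspace(static_cast<unsigned char>(ch)))
			{
			if (!token.empty())
				return true;	// whitespace ends the current word
			}
		else if (ch == ';' || ch == '=' || ch == ',' || ch == '(' || ch == ')')
			{
			if (token.empty())
				token = ch;		// punctuation is a token by itself
			else
				in.putback(ch);	// leave it for the next call
			return true;
			}
		else
			token += ch;
		}
	return !token.empty();
	}

// Convenience wrapper: splits a whole string into tokens.
std::vector<std::string> tokenize(const std::string &text, std::vector<std::string> &shown)
	{
	std::istringstream in(text);
	std::vector<std::string> tokens;
	std::string tok;
	while (next_token(in, tok, shown))
		tokens.push_back(tok);
	return tokens;
	}
```

Code that consumes the tokens never sees a bracket, so a block reader can be written as if the data file contained no comments at all.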

Reporting on block objects' contents

The last two lines call the Report functions of each of the blocks. This just spits out a summary of any data contained in these objects that has been read from the data file.


Deriving From the NxsReader Class

Note that the MyReader constructor below opens the ifstream in binary mode. You should always open your input file in binary mode so that the file can be read properly regardless of the platform on which it was created. For example, suppose someone created a NEXUS data file on a Macintosh and wanted to read it with your program, which is running on a Windows XP machine. Opening the file in binary mode allows the NxsToken object you are using to recognize the newline character in the Mac file as such, even though Macintosh computers use a different symbol (ASCII 13) to represent the newline character than computers running Windows (which use the ASCII 13, ASCII 10 combination for newlines).

class MyReader : public NxsReader
	{
	public:
		ifstream inf;
		ofstream outf;

		MyReader(char *infname, char *outfname) : NxsReader()
			{
			inf.open(infname, ios::binary);
			outf.open(outfname);
			}

		~MyReader()
			{
			inf.close();
			outf.close();
			}

	void ExecuteStarting() {}
	void ExecuteStopping() {}

	bool EnteringBlock(NxsString blockName)
		{
		cout << "Reading \"" << blockName << "\" block..." << endl;
		outf << "Reading \"" << blockName << "\" block..." << endl;

		// Returning true means it is ok to delete any data associated with 
		// blocks of this type read in previously
		//
		return true;	
		}

	void SkippingBlock(NxsString blockName)
		{
		cout << "Skipping unknown block (" << blockName << ")..." << endl;
		outf << "Skipping unknown block (" << blockName << ")..." << endl;
		}

	void SkippingDisabledBlock(NxsString blockName) {}

	void OutputComment(const NxsString &msg)
		{
		outf << msg;
		}

	void NexusError(NxsString msg, file_pos pos, unsigned line, unsigned col)
		{
		cerr << endl;
		cerr << "Error found at line " << line;
		cerr << ", column " << col;
		cerr << " (file position " << pos << "):" << endl;
		cerr << msg << endl;

		outf << endl;
		outf << "Error found at line " << line;
		outf << ", column " << col;
		outf << " (file position " << pos << "):" << endl;
		outf << msg << endl;

		exit(0);
		}
	};
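The point about binary mode made at the start of this section can be illustrated with a small helper that treats all three common line endings alike once the raw bytes are available. This is only a sketch and not how NxsToken is implemented; split_lines is a hypothetical name, and a string stream stands in for an ifstream opened with ios::binary.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Splits raw input into lines, accepting ASCII 13 alone (old Macintosh),
// ASCII 10 alone (Unix), and the ASCII 13, ASCII 10 pair (Windows) as a
// single line terminator.
std::vector<std::string> split_lines(std::istream &in)
	{
	std::vector<std::string> lines;
	std::string current;
	char ch;
	while (in.get(ch))
		{
		if (ch == '\r')
			{
			// Swallow the line feed of a carriage return, line feed pair
			if (in.peek() == '\n')
				in.get(ch);
			lines.push_back(current);
			current.clear();
			}
		else if (ch == '\n')
			{
			lines.push_back(current);
			current.clear();
			}
		else
			current += ch;
		}
	// Keep a final line that has no terminator
	if (!current.empty())
		lines.push_back(current);
	return lines;
	}
```

Line and column numbers of the kind reported by NexusError can only be trusted if every line ending is counted exactly once, whatever platform produced the file.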


Deriving From the NxsToken Class

We derive our own token reader from the NxsToken class in order to display the output comments present in the data file (if any). The virtual function OutputComment in the base class is overridden to accomplish this.

class MyToken : public NxsToken
	{
	public:

		MyToken(istream &is, ostream &os) : NxsToken(is), out(os) {}

		void OutputComment(const NxsString &msg)
			{
			cout << msg << endl;
			out << msg << endl;
			}

	private:
		ostream &out;
	};


Putting It All Together

Here is the entire program. Note that in order for this to link properly, you will need to also compile the following files included with the NCL (and instruct your linker to link them into your main executable): nxsblock.cpp, nxsexception.cpp, nxsreader.cpp, nxsstring.cpp, nxstaxablock.cpp, nxstreesblock.cpp and nxstoken.cpp.

#include "ncl.h"

NxsTaxaBlock	*taxa	= NULL;
NxsTreesBlock	*trees	= NULL;

class MyReader : public NxsReader
	{
	public:
		ifstream inf;
		ofstream outf;

		MyReader(char *infname, char *outfname) : NxsReader()
			{
			inf.open(infname, ios::binary);
			outf.open(outfname);
			}

		~MyReader()
			{
			inf.close();
			outf.close();
			}

	void ExecuteStarting() {}
	void ExecuteStopping() {}

	bool EnteringBlock(NxsString blockName)
		{
		cout << "Reading \"" << blockName << "\" block..." << endl;
		outf << "Reading \"" << blockName << "\" block..." << endl;

		// Returning true means it is ok to delete any data associated with 
		// blocks of this type read in previously
		//
		return true;	
		}

	void SkippingBlock(NxsString blockName)
		{
		cout << "Skipping unknown block (" << blockName << ")..." << endl;
		outf << "Skipping unknown block (" << blockName << ")..." << endl;
		}

	void SkippingDisabledBlock(NxsString blockName) {}

	void OutputComment(const NxsString &msg)
		{
		outf << msg;
		}

	void NexusError(NxsString msg, file_pos pos, unsigned line, unsigned col)
		{
		cerr << endl;
		cerr << "Error found at line " << line;
		cerr << ", column " << col;
		cerr << " (file position " << pos << "):" << endl;
		cerr << msg << endl;

		outf << endl;
		outf << "Error found at line " << line;
		outf << ", column " << col;
		outf << " (file position " << pos << "):" << endl;
		outf << msg << endl;

		exit(0);
		}
	};

class MyToken : public NxsToken
	{
	public:

		MyToken(istream &is, ostream &os) : NxsToken(is), out(os) {}

		void OutputComment(const NxsString &msg)
			{
			cout << msg << endl;
			out << msg << endl;
			}

	private:
		ostream &out;
	};

int main(int argc, char *argv[])
	{
	taxa = new NxsTaxaBlock();
	trees = new NxsTreesBlock(taxa);

	MyReader nexus(argv[1], argv[2]);
	nexus.Add(taxa);
	nexus.Add(trees);

	MyToken token(nexus.inf, nexus.outf);
	nexus.Execute(token);

	taxa->Report(nexus.outf);
	trees->Report(nexus.outf);

	return 0;
	}


A Sample Data File

Here is a sample data file that exercises a lot of the features of the NEXUS file reader we have just created. First, there are both output and regular comments scattered around. Some are between tokens, some occur at the beginning of a token, and still others begin right after a token. Some comments even have nested within them words surrounded by square brackets. There are also blocks in this data file (i.e., the paup block) that are not recognized by the NEXUS file reader we have created. The NEXUS reader handles all of these situations with very minimal effort on your part. Note that you can remove the TAXA block without ill effects because the taxon labels are specified in the TREES block.

#nexus

[!Output comment before first block]

begin paup; [this is an unknown block]
	lset nst=2 basefreq=empirical tratio=estimate rates=gamma shape=estimate;
end;

[!Let's see if we can deal with [nested] comments]

[!
What happens if we do this!
]

begin [comment at beginning of token]taxa;
	dimensions[comment at end of token] ntax=11;
	taxlabels  [comment between tokens]
		P._fimbriata
		'P. robusta'
		'P. americana'
		'P. myriophylla'
		'P. articulata'
		'P. parksii'
		'P. gracilis'
		'P. macrophylla'
		'P. polygama'
		'P. basiramia'
		'P. ciliata'
		[!output comment in TAXLABELS command]
	;
end;

begin trees;
	translate
		1  P._fimbriata,
		2  P._robusta,
		3  P._americana,
		4  P._myriophylla,
		5  P._articulata,
		6  P._parksii,
		7  P._polygama,
		8  P._macrophylla,
		9  P._gracilis,
		10  P._basiramia,
		11  P._ciliata
	;
	utree unrooted =      (1,2,((4,3,(5,6)),((7,8),(9,(10,11)))));
	tree  rooted   =      ((1,2),((4,3,(5,6)),(7,(8,(9,(10,11))))));
	utree u_to_r   = [&R] ((1,2),((4,3,(5,6)),(7,(8,(9,(10,11))))));
	tree  r_to_u   = [&U] (1,2,((4,3,(5,6)),((7,8),(9,(10,11)))));
end;


Creating Your Own NEXUS Block

Creating your own NEXUS block involves deriving a class from the NxsBlock base class and overriding the three virtual functions Read, Reset, and Report. Use the files emptyblock.cpp and emptyblock.h as templates for your own source code and header files. While creating your own block class is not a complicated endeavor, here are some things to watch out for:


General Advice

A typical program making use of this library might have the following two general characteristics:

After developing several programs like this, I have come up with the following strategy that makes efficient use of the object-oriented nature of the NCL. I will assume your non-graphical program will be called simply "Phylome" and will read a private NEXUS block named "PHYLOME". I will further assume that the GUI version will be targeted for the Windows platform, and will be called "PhylomeWin".


Reporting Bugs

Please report bugs by email directly to Paul O. Lewis. Please include "NCL bug" in the subject line to ensure that my mail filter catches it. Your bug has a better chance of getting fixed if you can attach to your email a NEXUS data file that causes the problem you have noticed.


What You Can Do For Me

I hope this library is useful to you, and note that it is free software under the GNU General Public License.

Although you are not obligated in any way to me as a result of using this library to create programs, there are a few things that you can do to help encourage me to continue improving this library. Please make use of any of the following means of support that you feel comfortable with:


Future Of The NCL Project

The current capabilities of the NCL are best illustrated by taking a look at some of the data files that it can successfully read. For example, the NCL can successfully read data files available on the "Green Plant Phylogeny Research Coordination Group" website, which have a reasonably complicated structure. Included with the NCL are data files containing multiple DISTANCES blocks (distances.nex) or multiple CHARACTERS blocks (characters.nex) to illustrate some of the formatting options available with these two NEXUS block types. These examples amply demonstrate the capabilities of the NCL as it now stands; however, the NCL will continue to grow as code for recognizing more and more NEXUS blocks is added. I welcome both suggestions for improvement and bug reports, of course.


Contact Information

My current mail and email addresses as well as my phone and fax numbers are given below:

Paul O. Lewis, Assistant Professor
Department of Ecology and Evolutionary Biology
The University of Connecticut
75 North Eagleville Road, Unit 3043
Storrs, CT 06269-3043 U.S.A.

Ph:    +1-860-486-2069
FAX:   +1-860-486-6364 (the departmental fax machine)
Email: paul.lewis@uconn.edu
URL:   http://lewis.eeb.uconn.edu/lewishome/