The PlastLibrary is a library containing all the needed software components required for comparing two sequence databases with the PLAST algorithm.
As a library, it should not contain any 'main' function. However, to simplify the use of the library as a command-line tool, there is a main function provided in:
You will find here the code documentation for namespaces, classes, methods of the different components that makes the PlastLibrary design.
The PLAST library can be compiled in a terminal with the cmake tool.
As a result of source code compilation, one should get a dynamic library (in 'build/lib' directory) and a command-line tool (in 'build/bin' directory). That tool is called PlastCmd: this is the one to use to run PLAST sequence comparison jobs on the command-line.
Requirements to compile PLAST:
Be aware that the more recent OS version you use, the more faster PLAST will be.
Once the source archive has been retrieved and unzipped, one just has to do:
Windows: (use MinGW-MSYS to run a real Unix shell)
Note that the target architecture must support SSE instructions set (at least SSE2). For instance, most of Intel recent processors support SSE2.
PLAST is a parallel local alignment search tool for database comparison. It scans two provided genomic databases (in FASTA format for instance) and starts its search by looking for small similarities (3 or 4 letters for instance) between the two databases.
For efficiency issues, PLAST first indexes the two databases with an index model based on seeds, ie. small words of nucleotides or amino acids. Once the indexes are built, it looks for each occurrence in the subject and query databases of each possible seed. Such an occurrence in both databases is called a hit.
Then, the PLAST algorithm tries to keep only the best alignments from these initial hits set. This is done through different steps:
Then, the PLAST algorithm is composed of several steps, each step filtering out the result of previous step. At the end, only the alignments that fulfill the end user parameters are kept.
So, the PLAST algorithm looks like a unix pipe; actually, the chosen design will keep this point of view: an initial set of hits (from the subject and query indexations) is iterated at each step and filtered out if needed.
PLAST is proposed as a parallel alternative of the well known BLAST tool. The main idea is to use hardware capacities for speeding up the algorithm execution.
In particular, the following hardware capacities are used by PLAST:
The key concept of the PLAST design is the iteration. As said above, the algorithm iterates an initial set of hits and tries to filter it out. Then, the resulting filtered set of hits is iterated by the next algorithm step, and so on.
The Iterator concept (see GOF) is the object oriented view of the iteration. An Iterator object knows how to iterate some set. The interesting part is that the Iterator object uses always the same API for iterating the underlying set, whatever the set can be.
For PLAST, using Iterators will uniformize the design of the algorithm components. In particular, we will define a specific interface for iterating hits (called IHitIterator) that will be used throughout the algorithm steps. Different implementations of the IHitIterator interface will correspond to the ungap alignments step, small gap alignments step and so on.
Beyond this uniformization aspect, the Iterator pattern can be used for implementing our parallelization scheme for multicores architectures. We can split an Iterator instance in several Iterator instances, where the iterated set of the initial instance is the same as the union of the iterated sets of split instances. It is then easy to iterate each split iterator in a specific thread, running on a specific core and we are done with this parallelization aspect.
People who wants to use the library for comparing two databases have to know its "entry point", ie. what they have to call in their main function.
The launcher::core::PlastCmd class may be seen as such an "entry point". It provides an 'execute' method to be called for launching the PLAST algorithm. The parameterization of the algorithm (for instance the path of the two genomic databases to be compared) is provided to the constructor of this class.
Note that launcher::core::PlastCmd is an implementation of the dp::ICommand interface. Therefore, one can use some command dispatcher instead of using directly the 'execute' method for launching the algorithm.
Finally, tools::PlastCmd provides directly a main function to enable the use of the PlastLibrary as a command-line tool.
The top level namespaces are the following:
Most of these top level namespaces hold:
The top level 'algorithm' is a little bit special. It is there that the PLAST algorithm is actually implemented. It relies on all the other top level namespaces. It is itself composed of the following sub namespaces:
The header path files reflect the namespaces organization.
For instance, accessing to the ISequenceDatabase interface can be done with:
Accessing to a specific implementation of this interface can be done with:
It has two benefits:
Here is a small example of use of the PlastLibrary; it merely iterates the sequences of a FASTA file and print a few information about each sequence. We put this code sample in the test.cpp file.
Now, for generating the executable (on Linux), one should have to do something like this:
Note that you need (on Linux) to use the posix thread library (pthread) and the real time library (rt).