# MergeAlign

## How MergeAlign combines alignments Suppose we have two sequences, A (a1 a2 a3 a4 a5 a6) and B (b1 b2 b3 b4 b5 b6), and we have aligned them in five different ways. How does MergeAlign find the consensus alignment that maximally represents these five constituent alignments?

• Convert alignments into paths

Each column in an alignment is converted into a coordinate in N-dimensions, where N is the number of sequences in the alignment (here N=2). All sequences are assumed to start with 0 and each character in a sequence is converted into a number. Non-gaps are converted to their position in the sequence while gaps are converted to the preceding number in the sequence. An alignment is represented as a path between the coordinates.

Show path for alignment: 1, 2, 3, 4, 5. [Reset]
• Combine paths into a graph

Each coordinate that exists in at least one path is converted into a node in a graph. Edges between nodes represent transitions between coordinates. The weight of an edge is equal to the number of times that transition occurs in any alignment.

Show graph for paths: 1, 2, 3, 4, 5. [Reset]
• Score nodes of graph

Each node is assigned two values: a path score and a path length. The path score represents the sum of edge weights for the optimum path from node (0,0) to that node. The path length represents the length of the optimum path from node (0,0) to that node. The optimum path is defined as the path that has the maximum average edge weight (i.e. path score / path length). It is calculated recursively by dynamic programming. The path length to node (0,0) is 0 and its average edge weight is 0.

• Find optimum path by backtracing

The optimum path is found by starting at the final node and moving to the node that was selected as the one that gave the highest score. Nodes are followed back until node (0,0) is reached. This path is then reversed so it starts with (0,0), and converted back into an alignment by reversing the process used in the first step. The weight of each edge in the path is divided by the number of alignments to give the column score for each column in the final alignment.

## References

If you use MergeAlign please cite:

1. Collingridge PW & Kelly S (BMC Bioinformatics 2012, 13:117)
MergeAlign: improving multiple sequence alignment performance by dynamic reconstruction of consensus multiple sequence alignments.