Graph – Overlap, De Bruijn

1. String Composition: Form the k-mer composition of a string.
Input: An integer k and a string Text.
Output: Composition_k(Text), where the k-mers are written in lexicographic order.
Composition_3(TATGGGGTGC) = ATG, GGG, GGG, GGT, GTG, TAT, TGC, TGG

# Slide a window of width k across in_dna, then sort the k-mers lexicographically.
dnas = sorted(in_dna[i:i+k] for i in range(len(in_dna) - k + 1))

For example, the 3-mers of TAATGCCATGGGATGTT, taken in the order they occur, are linked together to form the genome path.
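
The genome path idea can be sketched in Python roughly as follows. This is only an illustration, assuming k = 3 and that the k-mers are kept in order of occurrence rather than sorted; the names `genome_path` and `spell_path` are hypothetical and do not come from the original post.

```python
def genome_path(text, k):
    """Return the k-mers of text in order of occurrence (the genome path)."""
    return [text[i:i + k] for i in range(len(text) - k + 1)]


def spell_path(kmers):
    """Rebuild the string spelled by k-mers that overlap by k-1 symbols."""
    if not kmers:
        return ""
    # Take the first k-mer whole, then append the last symbol of each later one.
    return kmers[0] + "".join(kmer[-1] for kmer in kmers[1:])


path = genome_path("TAATGCCATGGGATGTT", 3)
print(path[:4])          # the first few 3-mers along the path
print(spell_path(path))  # walking the path spells the original string
```

Walking the path in order recovers the string because each 3-mer shares its last two symbols with the first two symbols of the next one.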

[Figure s1: genome path for TAATGCCATGGGATGTT]
For example, each of the three occurrences of ATG should be connected to TGC, TGG, and TGT, because the suffix TG of ATG is the prefix of each of them; these connections link overlapping 3-mers. The resulting graph shows all overlap connections.

[Figure s2: graph of all overlap connections]
The structure in the figure above is an example of a graph, or a network of nodes connected by edges. This graph is an example of a directed graph, whose edges have a direction and are represented by arrows (as opposed to undirected graphs whose edges do not have directions).
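
A directed overlap graph of this kind could be stored as an adjacency list, sketched below. The function name `overlap_graph` is hypothetical, and the sketch makes one simplifying assumption: repeated k-mers such as the three copies of ATG are collapsed into a single node, whereas the figure draws each occurrence separately.

```python
from collections import defaultdict


def overlap_graph(kmers):
    """Map each distinct k-mer to the sorted k-mers whose prefix equals its suffix."""
    # Index k-mers by their (k-1)-symbol prefix so successors are a direct lookup.
    by_prefix = defaultdict(set)
    for kmer in kmers:
        by_prefix[kmer[:-1]].add(kmer)
    # A directed edge a -> b exists when the suffix of a equals the prefix of b.
    return {a: sorted(by_prefix.get(a[1:], ())) for a in sorted(set(kmers))}


text = "TAATGCCATGGGATGTT"
kmers = [text[i:i + 3] for i in range(len(text) - 3 + 1)]
graph = overlap_graph(kmers)
print(graph["ATG"])  # ['TGC', 'TGG', 'TGT']
```

Each key is a node and each list entry is the head of an arrow leaving that node, so the direction of every edge is explicit in the data structure.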

Sequencing machines cannot read a whole genome at once; instead, they sequence much shorter DNA fragments called reads.
