Creating a Profile Matrix

I previously wrote about the Greedy Motif Search, and a reader asked how the profile matrix is created. Let’s take a look at how it is calculated.

What is the profile matrix?

It is a matrix with one row per nucleotide (A, C, G, T) and one column per position in the strings. Each entry is the fraction of the strings that have that nucleotide at that position, so the values in each column add up to 1.

To make this clearer, let’s look at an example:

GGCGTTCAGGCA
AAGAATCAGTCA
CAAGGAGTTCGC
CACGTCAATCAC
CAATAATATTCG
In the example above we see a list of five nucleotide strings, each 12 nucleotides long. If we generate the profile we get the following result:
A: 0.2, 0.8, 0.4, 0.2, 0.4, 0.4, 0.2, 0.8, 0.0, 0.0, 0.2, 0.4
C: 0.6, 0.0, 0.4, 0.0, 0.0, 0.2, 0.4, 0.0, 0.0, 0.4, 0.6, 0.4
G: 0.2, 0.2, 0.2, 0.6, 0.2, 0.0, 0.2, 0.0, 0.4, 0.2, 0.2, 0.2
T: 0.0, 0.0, 0.0, 0.2, 0.4, 0.4, 0.2, 0.2, 0.6, 0.4, 0.0, 0.0
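As you can see, each profile entry is the number of times a nucleotide occurs at the given position divided by the number of nucleotide strings provided. For example, the first position holds G, A, C, C, C across the five strings, so C gets 3/5 = 0.6, A and G get 1/5 = 0.2 each, and T gets 0.0.
If we want to code a profile generator in Python, we could write a function like this: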
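def generate_profile(motifs):
    # All motifs are assumed to have the same length k
    k = len(motifs[0])
    profile = {'A': [0] * k, 'C': [0] * k, 'G': [0] * k, 'T': [0] * k}
    # float() keeps the division below a true division on Python 2 as well
    div = float(len(motifs))
    for i in range(k):
        # Count each nucleotide at position i
        for motif in motifs:
            profile[motif[i]][i] += 1
        # Turn the counts at position i into frequencies
        for key in profile:
            profile[key][i] /= div
    return profile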
This function returns the profile matrix as a dictionary with the nucleotide as the key and a list of per-position frequencies as the value.
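To cross-check the example table, here is a minimal alternative sketch that builds the same kind of dictionary column by column with `collections.Counter`. It assumes Python 3 (so `/` is true division) and that all strings have the same length. The name `profile_from_columns` is made up for this illustration.

```python
from collections import Counter


def profile_from_columns(motifs):
    """Build a profile by counting nucleotides column by column."""
    n = len(motifs)
    profile = {nuc: [] for nuc in "ACGT"}
    # zip(*motifs) yields one tuple per position, taken across all strings
    for column in zip(*motifs):
        counts = Counter(column)  # missing nucleotides count as 0
        for nuc in profile:
            profile[nuc].append(counts[nuc] / n)
    return profile


motifs = [
    "GGCGTTCAGGCA",
    "AAGAATCAGTCA",
    "CAAGGAGTTCGC",
    "CACGTCAATCAC",
    "CAATAATATTCG",
]

profile = profile_from_columns(motifs)
for nuc in "ACGT":
    print(nuc + ": " + ", ".join(str(v) for v in profile[nuc]))

# Every column should sum to 1 (allowing for floating-point rounding)
column_sums = [
    sum(profile[nuc][i] for nuc in "ACGT") for i in range(len(motifs[0]))
]
assert all(abs(s - 1.0) < 1e-9 for s in column_sums)
```

The final assertion is a handy sanity check for any profile: if a column does not sum to 1, the input probably contains a character other than A, C, G or T.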
This was easy. Now you can use generate_profile to build the profile for a Greedy Motif Search.
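As a rough idea of how such a profile is typically used in a motif search, the sketch below scores a k-mer by multiplying the frequencies of its nucleotides position by position. It assumes the dictionary format described above. `kmer_probability` is a hypothetical helper name, not something taken from the earlier article, and the small profile is just the first three columns of the example.

```python
def kmer_probability(kmer, profile):
    """Probability of a k-mer under a profile (product of per-position frequencies)."""
    probability = 1.0
    for i, nucleotide in enumerate(kmer):
        probability *= profile[nucleotide][i]
    return probability


# First three columns of the example profile from this article
profile = {
    'A': [0.2, 0.8, 0.4],
    'C': [0.6, 0.0, 0.4],
    'G': [0.2, 0.2, 0.2],
    'T': [0.0, 0.0, 0.0],
}

print(kmer_probability("CAA", profile))  # 0.6 * 0.8 * 0.4, roughly 0.192
print(kmer_probability("TAA", profile))  # T never occurs at position 0, so 0.0
```

Note that a single zero in the profile makes the whole product zero, which is why some implementations add pseudocounts before dividing. That is outside the scope of this function.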
GHajba
 
