Day 5: genome sequencing
Yesterday I did some coding at CodinGame: I did my first hard training, Genome Sequencing.
As mentioned previously, I won’t share my solution here but my approach to solve the problem.
First I approached the task with De Bruijn graphs. It seemed a good idea and I have got results for all the test cases but the last one. There Was some problem with my result: it was too short and missed one of the subsequences. So I thought that hour of coding was a waste of my time.
So I did some different approaches and came up with the following “algorithm”:
- filter out subsequences which are already in at least one other subsequence
- the minimum length is the sum of lengths’ of the remaining subsequences
- for all the permutations of the remaining sequences find out the combined sequence which has the smallest length
How do we come to find out the smallest sequence from the permutations? Well, this is really simple: we iterate through a list of permutations which has a defined order and we combine the next value with the already available combined sequence. Let’s see an example:
We have the following sub-sequences: AGATTA, GATTACA, and TACAGA. Because there is no sub-sequence which is contained in another so after the first step we continue with these three sub-sequences. The minimum length now is 19.
Now we create all 6 permutations of these sub-sequences and combine them to form a sequence:
- GATTACA, TACAGA, AGATTA results in GATTACAGATTA with the length of 12 so this is our current minimum
- GATTACA, AGATTA, TACAGA results in GATTACAGATTA with the length of 12 so the minimum length stays 12
- TACAGA, GATTACA, AGATTA results in TACAGATTACA with a length of 11 so this will be our new minimum length
- TACAGA, AGATTA, GATTACA results in TACAGATTACA with a length of 11 so the minimum does not change
- AGATTA, GATTACA, TACAGA results in AGATTACAGA with a length of 10 so this will be the new minimum
- AGATTA, TACAGA, GATTACA results in AGATTACAGA with a length of 10 so the minimum does not change
After the combinations we end up with a minimum length of 10. And this is the result for the example.