I am currently working on my Master thesis on the visualization of music structure and I’m looking to find an optimal description of repetitions found in a piece of music.
Given a section range in a song in seconds (or samples) , e.g. (10,20), I can look up where this section is repeated. Then we end up with a set of repeating sections like: ((10,20), (40,50), (70,80)). We call this a group. A group has a certain fitness given to it.
(As a sidenote, the fitness of a group is defined as a combination of the sum of similarity values and how much of the song they cover alltogether)
Our goal is to find a set of disjoint groups that altogether have the highest fitness; the optimal decomposition of the repetitions. Below are two different valid decompositions of the same song, one course, and one fine decomposition.
We are provided with a set of all candidates, here’s a small selection, sorted top to bottom by fitness:
Current Greedy Method
- Sort all candidates by fitness
- Pick group G with the highest fitness
- Remove any groups from candidates if they have overlap with G
- Repeat from step 2 until no candidates are left
Sometimes the candidates overlap every so slightly, which in the context should perhaps not immediately lead to disqualification.
There are options to relax the no-overlap rule. Note that each of the sections in a group has a different brightness. This brightness corresponds to a confidence, so in a group some sections are more certain to be proper repetitions than others.
For a group of sections G that we wish to add to a set of groups of sections S, we can:
- simply remove sections from G if they overlap with any sections in S
- trim the sides of to-be-added sections from G if they overlap with any sections in S
- keep the overlapping sections in G
I hope this problem is interesting enough to you to give it a shot!