What is the intuition behind Strassen’s Algorithm?

I came across Strassen’s algorithm for matrix multiplication, which has time complexity $O(n^{2.81})$, significantly better than the naive $O(n^3). Of course, there have been several other improvements in matrix multiplication since Strassen, but my question is specific to this algorithm.

If you see the algorithm, you’ll notice that 7 matrices $M_1$ to $M_7$ have been defined as intermediate computation steps, and the final matrix product can be expressed in terms of these. I understand how to verify this claim, and arrive at the expression for the desired time complexity, but I’m unable to grasp the intuition behind this algorithm, i.e. why are the matrices $M_1$ through $M_7$ defined the way they are?

Thank you!