Vectorization of Logistic Regression


I want to take the derivative, with respect to $\Theta$, of:

$\sum_{i=1}^{m} w^{i}\left(y^{i}\log\left(h_{\Theta}(x^{i})\right) + (1-y^{i})\log\left(1-h_{\Theta}(x^{i})\right)\right)$

where $h_{\Theta}(x^{i})$ is the sigmoid function:

$\frac{1}{1+e^{-\Theta^{T}x^{i}}}$

Here $i$ indexes the training examples (this is from a machine learning context).

I’m able to get it to:

$\sum_{i=1}^{m} w^{i}\left(y^{i} - h_{\Theta}(x^{i})\right)x^{i}$

However, I’m unable to understand how it turns into:

$X^{T}\left(w^{i}\left(y^{i}-h_{\Theta}(x^{i})\right)\right)$
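
For reference, here is the notation I have been assuming (the source I'm following doesn't spell it out): $X$ is the $m \times n$ matrix whose $i$-th row is $(x^{i})^{T}$, and the factor in parentheses is the length-$m$ vector $r$ with entries $r_{i} = w^{i}\left(y^{i}-h_{\Theta}(x^{i})\right)$. Writing out the $j$-th entry of the product, I can verify mechanically that

$$\left[X^{T}r\right]_{j} = \sum_{i=1}^{m} (X^{T})_{ji}\, r_{i} = \sum_{i=1}^{m} x_{j}^{i}\, w^{i}\left(y^{i}-h_{\Theta}(x^{i})\right),$$

which matches the $j$-th component of the sum above.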

It's really a disconnect that I'm having. I understand that each entry of a matrix product is a sum of row entries multiplied by column entries, but I've never found a concrete way to naturally see why the sum over training examples turns into $X^{T}$ here. I'm just looking to build stronger intuition.
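
In case it helps, here is a small NumPy sketch I put together to convince myself numerically that the two forms agree (the array names `X`, `y`, `w`, `theta` and the random data are just my own placeholders, not from the original derivation):

```python
import numpy as np

# Placeholder data: m examples, n features (shapes are my assumption).
m, n = 5, 3
rng = np.random.default_rng(0)
X = rng.normal(size=(m, n))      # design matrix, row i is (x^i)^T
y = rng.integers(0, 2, size=m)   # labels y^i in {0, 1}
w = rng.random(size=m)           # per-example weights w^i
theta = rng.normal(size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

h = sigmoid(X @ theta)           # h_Theta(x^i) for every i at once

# Loop form: sum_i w^i (y^i - h_Theta(x^i)) x^i
grad_loop = np.zeros(n)
for i in range(m):
    grad_loop += w[i] * (y[i] - h[i]) * X[i]

# Vectorized form: X^T applied to the vector of weighted residuals
grad_vec = X.T @ (w * (y - h))

print(np.allclose(grad_loop, grad_vec))  # prints True
```

Both gradients come out identical, so my question is purely about the intuition for why the identity holds, not whether it does.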