# matrix calculus – second-order derivative of the loss function of logistic regression

For the loss function of logistic regression
$$\ell = \sum_{i=1}^n \left( y_i \boldsymbol{\beta}^T \mathbf{x}_{i} - \log \left(1 + \exp( \boldsymbol{\beta}^T \mathbf{x}_{i} ) \right) \right)$$
I understand that its first order derivative is
$$\frac{\partial \ell}{\partial \boldsymbol{\beta}} = \boldsymbol{X}^T(\boldsymbol{y} - \boldsymbol{p})$$
where
$$\boldsymbol{p} = \frac{\exp(\boldsymbol{X} \boldsymbol{\beta})}{1 + \exp(\boldsymbol{X} \boldsymbol{\beta})}$$
and its second order derivative is

$$\frac{\partial^2 \ell}{\partial \boldsymbol{\beta}^2} = \boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}$$
where $$\boldsymbol{W}$$ is an $$n \times n$$ diagonal matrix whose $$i$$-th diagonal element equals $$p_i(1-p_i)$$. However, I am struggling with the first- and second-order derivatives of the loss function of logistic regression with L2 regularization

$$\ell = \sum_{i=1}^n \left( y_i \boldsymbol{\beta}^T \mathbf{x}_{i} - \log \left(1 + \exp( \boldsymbol{\beta}^T \mathbf{x}_{i} ) \right) \right) + \lambda \sum_{j=1}^{p} \beta_j^2$$
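As a side note, the unregularized formulas quoted above can be verified with a finite-difference check. Below is a minimal NumPy sketch (the synthetic data, sizes, and tolerances are my assumptions). One subtlety the check exposes is the sign: with $$\ell$$ written as a log-likelihood, the numerical Hessian matches $$-\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}$$; the positive form $$\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}$$ is the Hessian of the negative log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3                                  # assumed sizes for the synthetic check
X = rng.normal(size=(n, d))
beta = rng.normal(size=d)
y = rng.integers(0, 2, size=n).astype(float)

def loglik(b):
    # ell = sum_i [ y_i * b^T x_i - log(1 + exp(b^T x_i)) ]
    z = X @ b
    return y @ z - np.sum(np.log1p(np.exp(z)))

def grad_fn(b):
    # closed-form gradient X^T (y - p), with p_i = sigmoid(x_i^T b)
    q = 1.0 / (1.0 + np.exp(-(X @ b)))
    return X.T @ (y - q)

p = 1.0 / (1.0 + np.exp(-(X @ beta)))
grad = X.T @ (y - p)
W = np.diag(p * (1 - p))                      # diag entries p_i (1 - p_i)

# central finite differences along each coordinate direction
eps = 1e-6
E = np.eye(d)
num_grad = np.array([(loglik(beta + eps * e) - loglik(beta - eps * e)) / (2 * eps)
                     for e in E])
num_hess = np.array([(grad_fn(beta + eps * e) - grad_fn(beta - eps * e)) / (2 * eps)
                     for e in E])

print(np.allclose(num_grad, grad, atol=1e-4))            # True: gradient formula
print(np.allclose(num_hess, -(X.T @ W @ X), atol=1e-4))  # True: note the minus sign
```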

I tried to extrapolate $$\boldsymbol{X}^T(\boldsymbol{y} - \boldsymbol{p})$$ and $$\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X}$$ by simply adding one more term for the penalty, to the best of my limited knowledge of calculus, making them $$\boldsymbol{X}^T(\boldsymbol{y} - \boldsymbol{p}) + 2\lambda\boldsymbol{\beta}$$ and $$\boldsymbol{X}^T\boldsymbol{W}\boldsymbol{X} + 2\lambda$$

But it does not seem to work this way. What are the correct first- and second-order derivatives of the loss function for logistic regression with L2 regularization?
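One way to test a candidate derivative like the ones above is the same finite-difference check applied to the regularized objective. A minimal sketch ($$\lambda = 0.5$$ and the synthetic data are my assumptions), which at least confirms whether the $$2\lambda\boldsymbol{\beta}$$ term in the candidate gradient behaves as hoped:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, lam = 40, 3, 0.5                        # lam = 0.5 is an arbitrary choice
X = rng.normal(size=(n, d))
beta = rng.normal(size=d)
y = rng.integers(0, 2, size=n).astype(float)

def objective(b):
    # ell = sum_i [ y_i * b^T x_i - log(1 + exp(b^T x_i)) ] + lam * sum_j b_j^2
    z = X @ b
    return y @ z - np.sum(np.log1p(np.exp(z))) + lam * np.sum(b ** 2)

p = 1.0 / (1.0 + np.exp(-(X @ beta)))
candidate = X.T @ (y - p) + 2 * lam * beta    # candidate gradient from the question

# central finite differences along each coordinate direction
eps = 1e-6
num_grad = np.array([(objective(beta + eps * e) - objective(beta - eps * e)) / (2 * eps)
                     for e in np.eye(d)])

print(np.allclose(candidate, num_grad, atol=1e-4))  # True
```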