As far as I know, when you reach the step in a gradient descent algorithm where you calculate `step_size`, you calculate `learning_rate * slope`. The `slope` is obtained by calculating the derivative of the `cost_function` with respect to the feature you want to find the optimal coefficient for. Let's say that the cost function, for the purposes of this question, is the `sum of squared residuals`.
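For concreteness, here is a minimal sketch of the update step I have in mind (the names `step` and `learning_rate` and the values are my own, just for illustration):

```python
# Minimal sketch of one gradient-descent update step (my own naming,
# not taken from any particular library).
learning_rate = 0.01

def step(coefficient, slope):
    # step_size = learning_rate * slope; move against the slope
    step_size = learning_rate * slope
    return coefficient - step_size

# e.g. if the current slope of the cost function w.r.t. b0 is 4.0:
b0 = step(coefficient=1.0, slope=4.0)  # approx. 0.96
```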

My question is: how are the coefficients of **other** features treated in the differentiation of the equation? For instance, if I have the equation $y = b_0 + x_1 + x_2$, then by calculating the derivative of the cost function with respect to $x_1$, one gets:

$\frac{d}{dx_1}\left(\left(\mathrm{mean} - \left(b_0 + x_1 + x_2\right)\right)^2\right) = 2 \times \left(\mathrm{mean} - b_0 - x_1 - x_2\right)\left(\frac{d}{dx_1}(x_2) + 1\right)$

In this case, how is a value obtained by substituting a value for $x_1$ while $\frac{d}{dx_1}(x_2)$ is still in the formula?

I watched a YouTube video (it starts at the right point) that says that $x_2$ is a constant (while it's a different feature) and, therefore, when differentiating, $\frac{d}{dx_1}(x_2)$ is omitted and we are left with $2 \times \left(\mathrm{mean} - b_0 - x_1 - x_2\right)(1)$. Is this the case, or am I missing something?
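To check the video's claim numerically, I tried a quick sketch (my own code, with arbitrary fixed values): since $x_2$ is a separate feature, its value does not change when $x_1$ changes, so the finite-difference estimate of $\frac{d}{dx_1}(x_2)$ should come out as zero.

```python
# Numerical sanity check (my own sketch): x2 is a separate feature,
# so its value does not depend on x1 and d(x2)/d(x1) should be 0.
x2 = 2.0   # fixed value of the other feature (arbitrary)
h = 1e-6   # small step for the finite difference

def x2_as_function_of_x1(x1):
    # x2 does not respond to changes in x1 at all
    return x2

x1 = 3.0
dx2_dx1 = (x2_as_function_of_x1(x1 + h) - x2_as_function_of_x1(x1)) / h
print(dx2_dx1)  # 0.0
```

This at least agrees with the video: treating $x_2$ as a constant makes its derivative with respect to $x_1$ vanish.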