# How does Gradient Descent treat multiple features?

As far as I know, when you reach the step in a gradient descent algorithm where you calculate `step_size`, you compute `learning_rate * slope`.

Now, `slope` is obtained by calculating the derivative of the `cost_function` with respect to the feature you want to find the optimal coefficient for.

Let’s say that the cost function for the purposes of this question is the `sum of squared residuals`.
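
To make the setup concrete, here is a minimal sketch of how I picture the update with two features (the data, coefficient names, and learning rate below are made up for illustration):

```python
import numpy as np

# Hypothetical data for the model y_hat = b0 + b1*x1 + b2*x2
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([0.5, 1.0, 0.8, 2.0])
y = np.array([2.0, 3.1, 4.2, 5.3])

b0, b1, b2 = 0.0, 0.0, 0.0
learning_rate = 0.01

for _ in range(1000):
    residuals = y - (b0 + b1 * x1 + b2 * x2)
    # Partial derivative (slope) of the sum of squared residuals with
    # respect to each coefficient; the other coefficients are held fixed.
    slope_b0 = -2 * np.sum(residuals)
    slope_b1 = -2 * np.sum(residuals * x1)
    slope_b2 = -2 * np.sum(residuals * x2)
    # step_size = learning_rate * slope, applied to each coefficient
    b0 -= learning_rate * slope_b0
    b1 -= learning_rate * slope_b1
    b2 -= learning_rate * slope_b2
```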

My question is: how are the coefficients of the other features treated in the differentiation? For instance, if I have the equation $y = b_0 + x_1 + x_2$, then by calculating the derivative of the cost function with respect to $x_1$, one gets:

$$\frac{d}{dx_1}\left(\big(\text{mean} - (b_0 + x_1 + x_2)\big)^2\right) = -2\left(\text{mean} - b_0 - x_1 - x_2\right)\left(\frac{d}{dx_1}(x_2) + 1\right)$$

In this case, how is a value obtained by substituting a value for $x_1$ while $\frac{d}{dx_1}(x_2)$ is still in the formula?

I watched a YouTube video (it starts at the right point) that says that $x_2$ is a constant (even though it's a different feature) and, therefore, when differentiating, $\frac{d}{dx_1}(x_2)$ drops out and we are left with $-2\left(\text{mean} - b_0 - x_1 - x_2\right)(1)$. Is this the case, or am I missing something?
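
To sanity-check that reading, here is a quick SymPy sketch I tried (assuming `x2` is an independent symbol rather than a function of `x1`):

```python
import sympy as sp

x1, x2, b0, mean = sp.symbols('x1 x2 b0 mean')

# Squared residual for a single observation, as in the question
cost = (mean - (b0 + x1 + x2))**2

# Differentiate with respect to x1: because x2 is an independent
# symbol, SymPy treats it as a constant, so d(x2)/d(x1) = 0 and
# no derivative term survives in the result.
print(sp.diff(cost, x1))  # equals -2*(mean - b0 - x1 - x2); no d(x2)/dx1 term
```

By contrast, declaring `x2 = sp.Function('x2')(x1)` would leave an unevaluated `Derivative(x2(x1), x1)` term in the output, which seems to be exactly the term the video says is omitted.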