Our work involves updating products. We have a large table of products, and the prices and other pertinent information get updated hourly. Say it's an Amazon store and we are talking about Amazon products: we have to update the selling price, buy box price, etc. We pull the information hourly from Amazon into our program and update the data in the database.
My workflow is: pull all the products from the database into the program (we use C# and EF Core), update the relevant products, and send the updates back to the database.
This way has the penalty of reading a lot of data from the database into the program, but I feel it is very efficient, because EF Core has change detection: even if I assign every product the price from the incoming data, EF Core will not touch products whose values are unchanged; it only generates update statements for the products whose information actually changed.
Also, it does not make one large update statement; it generates small, targeted update statements, like

update products set BuyBoxPrice = 12.23 where productid = 23345;
I am working with a very gifted developer who is a natural with SQL. He argues that this approach is wrong: I should instead put all the incoming info into a temp table named #products, send it to the database, and then run a stored procedure that does something like this:

update products set BuyBoxPrice = #products.buyboxprice from products inner join #products on products.productid = #products.productid;
So this approach avoids having to make large reads from the database.
I am not so experienced; my question is, do reads create locks or degrade database performance? Probably yes?
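On that read question, as I understand it: under SQL Server's default READ COMMITTED isolation, reads do take short-lived shared locks that can block concurrent writers (and be blocked by them). One common mitigation is row versioning; a hedged sketch, assuming SQL Server and using MyStore as a placeholder database name:

```sql
-- Assumption: SQL Server; MyStore is a hypothetical database name.
-- Under default READ COMMITTED, SELECTs take short-lived shared (S)
-- locks that can block concurrent UPDATEs. With READ_COMMITTED_SNAPSHOT
-- enabled, readers see row versions instead of taking shared locks,
-- so large reads stop blocking writers (at some tempdb cost).
ALTER DATABASE MyStore
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;
```

So plain reads are not free, but they also don't have to fight the writers if versioning is on.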
Now here are the reasons I am not comfortable with his approach.
It creates a lot of unnecessary updates, which seems very wasteful to me: only about 25 percent of the information changes each hour, so why rewrite every row and every column?
My colleague counters that I can remedy that by adding a WHERE clause, like

update products set BuyBoxPrice = #products.buyboxprice from products inner join #products on products.productid = #products.productid where products.buyboxprice <> #products.buyboxprice;
I don't think that this reduces the penalty you pay; I think it's still the same effect.
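For what it's worth, his remedy written out in full for several columns might look like the sketch below (SellingPrice is my hypothetical second column; the OR'd predicates confine the write to rows where at least one value actually differs, which is roughly what EF Core's change detection does, just in one set-based statement):

```sql
-- Sketch of the colleague's filtered set-based update.
-- Assumption: #products has the same key and price columns as products;
-- SellingPrice is a hypothetical column name for illustration.
UPDATE p
SET    p.BuyBoxPrice  = t.BuyBoxPrice,
       p.SellingPrice = t.SellingPrice
FROM   products p
INNER JOIN #products t ON p.productid = t.productid
WHERE  p.BuyBoxPrice  <> t.BuyBoxPrice
   OR  p.SellingPrice <> t.SellingPrice;
```

One caveat if any of these columns are nullable: `NULL <> value` evaluates to UNKNOWN, so rows where one side is NULL would silently be skipped; you'd need ISNULL wrappers or explicit IS NULL checks in the predicates.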
Another major worry is that large updates create locks; for that alone the approach should be avoided. Of course, I could break the updates up into chunks of fewer than 3000 rows or so.
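That chunking idea could be sketched like this (a TOP (3000) update repeated until no row qualifies; the change-detecting WHERE doubles as the loop's stopping condition, and the 3000 figure is just my number from above):

```sql
-- Hedged sketch of batched updates: each iteration touches at most
-- 3000 changed rows, so locks are held briefly and escalation to a
-- full table lock is much less likely than with one giant UPDATE.
DECLARE @rows int = 1;
WHILE @rows > 0
BEGIN
    UPDATE TOP (3000) p
    SET    p.BuyBoxPrice = t.BuyBoxPrice
    FROM   products p
    INNER JOIN #products t ON p.productid = t.productid
    WHERE  p.BuyBoxPrice <> t.BuyBoxPrice;

    SET @rows = @@ROWCOUNT;  -- 0 once every changed row has been written
END
```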
Third point: when SQL Server has tummy trouble, it throws symptoms all over and strange things start to happen; customers call in yelling, the boss gets angry, and I have very little visibility into what's going on. In C#, whenever something crashes, the error is right there, clear for me to see.
So my question is: who is right? Is it more performant to do the updates via reads and EF Core, or via SQL?