I need help understanding whether the very high CPU usage on my read-only server is a problem. Read-only means this server is a log shipping (LS) secondary in standby (read-only) mode, and the DW team is free to run any kind of poorly written query against it, some of which run for hours.
Issue – the LS primary is mainly OLTP and hosts a vendor-based product that avoids clustered indexes, because the primary undergoes extensive writes throughout the day. The log restore on the secondary happens once a night, and users are disconnected for that window. For the rest of the day there are always SELECT queries running.
Index tuning is quite difficult here (no vendor support), because the primary cannot carry many extra indexes, whether a clustered index or additional nonclustered ones. As a result, most table reads on the secondary end up as table scans or key lookups.
The read-only server has the following configuration:
Logical processors = 80
Total RAM = 512 GB, max server memory = 440 GB
NUMA nodes = 4
MAXDOP = 8 and cost threshold for parallelism = 5 (I know this is too low, but even raising it to 100 on this server makes no difference; per my analysis the subtree cost of most queries here averages over 1000)
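For context, both instance-level settings above can be inspected and changed with `sp_configure`; a minimal sketch (the commented values are only examples, not a recommendation):

```sql
-- Inspect the current parallelism settings
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism';
EXEC sp_configure 'cost threshold for parallelism';

-- Example change, if a different value is decided on:
-- EXEC sp_configure 'max degree of parallelism', 8;
-- EXEC sp_configure 'cost threshold for parallelism', 50;
-- RECONFIGURE;
```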
At any given time, 6-10 of these heavy queries are running in parallel, and CPU utilization sits around 90%. Users do not complain much, because they do not mind queries that take more than 4 hours.
But from a server standpoint I do not think that is healthy, especially when 10-15 of them start together and the CPU pegs at 100%; I am afraid the server will fall over.
Since index tuning is not currently an option, should I raise MAXDOP to 16, given that on average only about 600 of the 2,944 available worker threads are in use, so these queries complete faster? Or should I lower it to 4 or 6 to keep CPU usage down?
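This is how I am measuring worker-thread usage, in case my numbers are off (a quick sketch against the scheduler DMVs):

```sql
-- Worker threads currently owned by the visible online schedulers
SELECT SUM(current_workers_count) AS workers_in_use
FROM sys.dm_os_schedulers
WHERE status = 'VISIBLE ONLINE';

-- Configured worker-thread ceiling for the instance
SELECT max_workers_count
FROM sys.dm_os_sys_info;
```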
CXPACKET is the top wait, accounting for 95% of total wait time over the last week.
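The wait-stats numbers come from a query along these lines (the exclusion list here is abbreviated; real benign-wait filters are much longer):

```sql
-- Top waits since the last stats clear (or service restart)
SELECT TOP (5)
       wait_type,
       wait_time_ms / 1000.0 AS wait_time_s,
       100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS pct_of_total
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK', 'LAZYWRITER_SLEEP')
ORDER BY wait_time_ms DESC;
```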
From a memory perspective: Memory Grants Pending is mostly 0, but outstanding grants average 30-35, and PLE averages around 1000, dropping as low as ~100 while these parallel queries run.
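For reference, these memory numbers are pulled from the performance-counter DMV roughly like this:

```sql
-- Memory grant and page-life-expectancy counters
SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN ('Memory Grants Pending',
                       'Memory Grants Outstanding',
                       'Page life expectancy');
```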