We have a deployment of roughly 10 Microsoft SQL Server 2017 (14.0.3335.7) instances on 2 VMware virtual machines, each VM hosted on a different Dell server. Each VM has one main instance that handles over 90% of all connections, plus other instances that serve low-traffic needs. Each of the two main instances receives hundreds of thousands of logins daily.
Among those logins, two sources of activity are well monitored and measured, and each targets a different main instance. The first is a job-processing application server that runs roughly 1,200 jobs overnight. The second is a job that runs once a minute all day and records its results in a SQL Server table.
Across these well-measured processes, we see 5-10 failures every night, all of them during login. There is no consistent pattern in when the failures occur or in which jobs fail.
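Because the failures look random, one cheap check is to bucket the failure timestamps by minute-of-hour and by hour-of-day. A pile-up at, say, minute 0 would point toward something scheduled (a backup, a VM snapshot, a CRL refresh) rather than random packet loss. A minimal Python sketch, assuming the failure times have already been exported from the job history as datetime values; `bucket_failures` and the sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime


def bucket_failures(timestamps, key="minute"):
    """Count failed logins per time bucket to check whether they cluster.

    timestamps: iterable of datetime objects, one per failed login.
    key: "minute" counts by minute within the hour (0-59), which exposes
         anything scheduled "on the hour"; "hour" counts by hour of day (0-23).
    """
    if key == "minute":
        return Counter(ts.minute for ts in timestamps)
    if key == "hour":
        return Counter(ts.hour for ts in timestamps)
    raise ValueError(f"unknown key: {key!r}")


if __name__ == "__main__":
    # Hypothetical failure times for illustration; substitute the real ones
    # pulled from the job scheduler's history.
    failures = [
        datetime(2021, 3, 1, 1, 0, 12),
        datetime(2021, 3, 1, 2, 0, 45),
        datetime(2021, 3, 1, 3, 17, 5),
        datetime(2021, 3, 2, 1, 0, 30),
    ]
    for minute, count in sorted(bucket_failures(failures).items()):
        print(f"minute {minute:02d}: {count}")
    for hour, count in sorted(bucket_failures(failures, key="hour").items()):
        print(f"hour {hour:02d}: {count}")
```

With only 5-10 failures a night, a few weeks of history is probably needed before any bucket stands out from noise.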
The error message is very consistent: “A connection was successfully established with the server, but then an error occurred during the login process. (provider: SSL Provider, error: 0 - An existing connection was forcibly closed by the remote host.)” It is raised by the .NET SqlConnection class during Open().
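"Forcibly closed by the remote host" is the text of the Windows socket error WSAECONNRESET (10054): the client received a TCP reset while the SSL/TLS part of the login was in progress. The reset can come from any device in the path, not necessarily SQL Server itself. Until the cause is found, the client can at least record each failure with a precise timestamp and retry it. A minimal, language-neutral sketch in Python (the jobs themselves are .NET, where the same pattern would wrap SqlConnection.Open()); `open_with_retry`, `open_connection` and `is_transient` are hypothetical placeholders, not part of any driver API.

```python
import logging
import time
from datetime import datetime, timezone

log = logging.getLogger("login-diagnostics")


def open_with_retry(open_connection, attempts=3, base_delay=0.5,
                    is_transient=lambda exc: True):
    """Call open_connection(), retrying transient failures with exponential backoff.

    Each failed attempt is logged with a UTC timestamp at millisecond precision
    so it can be lined up against firewall, Windows event and SQL Server logs.
    open_connection and is_transient are hypothetical stand-ins for the real
    driver call and the real test for the SSL Provider / connection-reset error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return open_connection()
        except Exception as exc:
            stamp = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
            log.warning("login attempt %d/%d failed at %s: %r",
                        attempt, attempts, stamp, exc)
            # Give up on the last attempt or on errors that are not transient.
            if attempt == attempts or not is_transient(exc):
                raise
            # Back off base_delay, 2x, 4x, ... before the next attempt.
            time.sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    calls = {"n": 0}

    def flaky_open():
        # Simulates a login that is reset twice and then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionResetError("forcibly closed by the remote host")
        return "connection"

    print(open_with_retry(flaky_open, base_delay=0.01,
                          is_transient=lambda e: isinstance(e, ConnectionResetError)))
```

Retrying masks the symptom rather than explaining it, so for diagnosis the timestamped log lines are the useful part.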
I cannot find anything time-correlated in the firewall or SQL Server logs. My question: does anyone know how to get more information about what causes these failures? It could be dropped network packets, an SSL/TLS negotiation failure, a failed download of a certificate revocation list (CRL), or something completely unexpected. I would like to get some low-level, kernel-style debugging information out of SQL Server that might give me a clue as to what is going on.
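Correlating across several logs is more reliable with a mechanical window search than by eye, provided every source uses the same time zone and the hosts' clocks agree to within the window. A minimal sketch, assuming entries from each log have been exported into (timestamp, source, message) tuples; `events_near` and the sample data are hypothetical.

```python
from datetime import datetime, timedelta


def events_near(failures, events, window=timedelta(seconds=5)):
    """Map each failure time to the log events within +/- window of it.

    failures: list of datetime, one per failed login.
    events: list of (timestamp, source, message) tuples merged from any log
            (firewall, Windows event logs, SQL Server error log, a packet
            capture summary). All timestamps must use the same time zone.
    """
    ordered = sorted(events, key=lambda ev: ev[0])
    return {
        failure: [ev for ev in ordered if abs(ev[0] - failure) <= window]
        for failure in failures
    }


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    failures = [datetime(2021, 3, 1, 2, 14, 7)]
    events = [
        (datetime(2021, 3, 1, 2, 14, 5), "firewall", "session reset"),
        (datetime(2021, 3, 1, 2, 30, 0), "sqlserver", "log backup completed"),
    ]
    for failure, nearby in events_near(failures, events).items():
        print(failure, "->", nearby)
```

Widening the window gradually (seconds, then minutes) helps separate events that coincide with a failure from background noise.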
Any thoughts, tool references, or avenues to pursue would be appreciated.