I'm using the 2010 version of the Stack Overflow database on SQL Server 2017 with the new CE (compatibility level 140), and I created this procedure:
CREATE OR ALTER PROCEDURE #sp_PostsByCommentCount
FROM dbo.Posts p
p.CommentCount = @CommentCount
There are no nonclustered indexes or statistics on the dbo.Posts table; it has only a clustered index.
When I ask for an estimated plan, the "estimated rows" coming out of dbo.Posts is 1,934.99:
EXEC #sp_PostsByCommentCount @CommentCount = 51;
The following statistics object was automatically created when I asked for the estimated plan:
DBCC SHOW_STATISTICS('dbo.Posts', [_WA_Sys_00000006_0519C6AF]);
The highlights are:
- The statistics have a fairly low sample rate of 1.81% (67,796 / 3,744,192).
- Only 31 histogram steps were used.
- The "All Density" value is 0.03030303 (33 distinct values were sampled).
- The last RANGE_HI_KEY in the histogram is 50, with EQ_ROWS of 1.
If I pass any value above 50 (up to and including 2,147,483,647), the row estimate is 1,934.99. Which calculation or value is used to produce this estimate? Incidentally, the legacy cardinality estimator produces an estimate of 1 row.
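To compare the two estimators side by side without changing the database compatibility level, something like the following could be used. This is only a sketch: the ad hoc COUNT(*) query is a stand-in for the procedure body, and the USE HINT option assumes SQL Server 2016 SP1 or later.

```sql
-- Request the estimated plan for each statement and compare the row
-- estimates coming out of dbo.Posts.

-- New CE (database compatibility level 140).
SELECT COUNT(*)
FROM dbo.Posts AS p
WHERE p.CommentCount = 51
OPTION (RECOMPILE);

-- Legacy CE, forced for this statement only.
SELECT COUNT(*)
FROM dbo.Posts AS p
WHERE p.CommentCount = 51
OPTION (RECOMPILE, USE HINT ('FORCE_LEGACY_CARDINALITY_ESTIMATION'));
```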
Here are some theories I had, things I tried, and additional information I was able to dig up while looking into this.
At first I thought it was the density vector, as if I had used OPTION (OPTIMIZE FOR UNKNOWN). However, the density vector estimate for this statistics object is 3,744,192 * 0.03030303 = 113,460, so that's not it.
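The density vector arithmetic above can be checked directly. This is a minimal sketch using only the row count and "All Density" figures quoted from the statistics object; the column alias is just illustrative.

```sql
-- Density vector estimate = table rows * "All Density".
-- With the figures quoted above this comes to roughly 113,460,
-- nowhere near the 1,934.99 shown in the plan.
SELECT 3744192 * 0.03030303 AS density_vector_estimate;
```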
I tried running an extended events session that collected the query_optimizer_estimate_cardinality event (which I learned about from Paul White's blog post Cardinality Estimation: Combining Density Statistics), and it contained the following interesting tidbits:
So it seems the CSelCalcAscendingKeyFilter calculator was used (the other one says it failed, whatever that means). This column is not a key, or unique, or necessarily ascending, but whatever.
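A session along the lines of the one described above could be set up roughly as follows. This is a sketch rather than the exact session I used: the session name ce_estimates is hypothetical, the ring_buffer target is just a convenient choice, and since this event can be expensive to collect it is best run on a test instance.

```sql
-- Hypothetical session name; collects the CE debug event into a ring buffer.
CREATE EVENT SESSION ce_estimates ON SERVER
ADD EVENT sqlserver.query_optimizer_estimate_cardinality
(
    ACTION (sqlserver.sql_text)
)
ADD TARGET package0.ring_buffer;
GO

ALTER EVENT SESSION ce_estimates ON SERVER STATE = START;
GO

-- Trigger a compilation so the event fires.
EXEC #sp_PostsByCommentCount @CommentCount = 51;
GO

-- Read the captured events back as XML.
SELECT CAST(t.target_data AS xml) AS session_data
FROM sys.dm_xe_sessions AS s
JOIN sys.dm_xe_session_targets AS t
    ON t.event_session_address = s.address
WHERE s.name = N'ce_estimates'
  AND t.target_name = N'ring_buffer';
GO

-- Clean up.
ALTER EVENT SESSION ce_estimates ON SERVER STATE = STOP;
DROP EVENT SESSION ce_estimates ON SERVER;
```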
When I googled that term, I found some blog posts:
These posts indicate that the new CE bases these outside-the-histogram estimates on a combination of the density vector and the modification counter of the statistics. Unfortunately, I've already ruled out the density vector (I think?!), and the modification counter is zero.
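One way to look at the modification counter is the sys.dm_db_stats_properties function. The query below is a sketch of that check and not necessarily the exact one I ran; it lists every statistics object on dbo.Posts.

```sql
-- modification_counter shows changes to the leading statistics column
-- since the statistics were last updated.
SELECT
    s.name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.steps,
    sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Posts');
```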
Forrest suggested that I turn on TF 2363 to get more information about the estimation process. I think the most relevant part of that output is this:
Plan for computation:
CSelCalcAscendingKeyFilter(avg. freq., QCOL: (p).CommentCount)
This is a breakthrough (thanks, Forrest!): the 0.000516798 selectivity (which seems to have been rounded in the XE Selectivity="0.001" attribute above), multiplied by the number of rows in the table, is the estimate I was looking for (1,934.99).
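For anyone who wants to reproduce this, here is a rough sketch. It assumes sysadmin rights for QUERYTRACEON, uses an ad hoc COUNT(*) query as a stand-in for the procedure body, and relies on trace flags 2363 and 3604, which are undocumented, so treat it as something for a test instance only.

```sql
-- TF 3604 redirects trace output to the Messages tab;
-- TF 2363 prints the new CE's selectivity computation.
SELECT COUNT(*)
FROM dbo.Posts AS p
WHERE p.CommentCount = 51
OPTION (RECOMPILE, QUERYTRACEON 3604, QUERYTRACEON 2363);

-- Selectivity from the trace output * table rows.
-- With the figures above this works out to about 1,934.99.
SELECT 3744192 * 0.000516798 AS estimated_rows;
```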
I'm probably missing something obvious, but I haven't been able to reverse-engineer how this selectivity value is produced.