I’m trying to develop a general framework for getting chunked cohorts out of a large database. Because of various factors, it’s convenient to write queries like this:
```sql
with tab as (
    select whatever, id from whereever
    union all
    select whatever, id from whereever2
)
select t.whatever, t.id
from tab as t
inner join cohort as c on c.id = t.id
```
Now, I’m aware that most databases rewrite queries for efficiency “under the hood” — e.g. by pushing the join predicate down into each branch of the union. If they didn’t, this query would pull everything from whereever and whereever2, and only then filter the combined result with the join below. That would be inefficient.
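In case it helps, here is how I’ve been comparing the two forms empirically — a sketch assuming the tables above exist; `set statistics profile` is SQL Server’s way of returning the actual plan, and `explain (analyze)` is the rough Postgres equivalent:

```sql
-- SQL Server: return the actual execution plan alongside the results
set statistics profile on;

with tab as (
    select whatever, id from whereever
    union all
    select whatever, id from whereever2
)
select t.whatever, t.id
from tab as t
inner join cohort as c on c.id = t.id;

set statistics profile off;

-- Postgres: run the same query and show the actual plan with timings
explain (analyze, buffers)
with tab as (
    select whatever, id from whereever
    union all
    select whatever, id from whereever2
)
select t.whatever, t.id
from tab as t
inner join cohort as c on c.id = t.id;
```

But reading the plan tells me what the optimizer *did* in one case, not what it is guaranteed to do in general, which is why I’m asking about the principle.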
Alternatively, I could do this:
```sql
select w.whatever, w.id
from whereever as w
inner join cohort on w.id = cohort.id
union all
select w2.whatever, w2.id
from whereever2 as w2
inner join cohort on w2.id = cohort.id
```
This version is uglier, more repetitious, and harder to generate programmatically.
In principle, which should be faster, and WHY?
If it matters, I’m on an MS SQL Server database, but I’d also be interested to know whether the answer would differ on Postgres.