postgresql – How can I update query to use GROUP BY instead of DISTINCT while still preserving ORDER BY?


Trying to make this query more performant (basically I need to have distinct speech_ids from the speech_candidate table while at the same time counting the number of duplicate speech_ids for each of the unique rows from the speech_candidate table we’re pulling):

SELECT * FROM (SELECT DISTINCT ON (speech_candidate.speech_id) speech_candidate.speech_id, (speechtype_id = 7) AS is_position_paper, (speechdate IS NULL) AS is_null_date, (SELECT COALESCE(shorttitle, '') FROM office oc JOIN office_candidate oc2 USING (office_id) WHERE oc2.candidate_id = candidate.candidate_id AND officecandidatestatus_id = 1 ORDER BY rank LIMIT 1) AS shorttitle, speech_candidate.speech_candidate_id, speech_candidate.speech_id, speech_candidate.candidate_id, speech.speech_id, speech.title, speech.speechdate, speech.location, speech.key, speech.quote, candidate.candidate_id, candidate.firstname, candidate.middlename, candidate.lastname, candidate.nickname FROM speech_candidate INNER JOIN candidate ON ( speech_candidate.candidate_id = candidate.candidate_id ) INNER JOIN office_candidate ON ( candidate.candidate_id = office_candidate.candidate_id ) INNER JOIN office ON ( office_candidate.office_id = office.office_id ) INNER JOIN speech ON ( speech_candidate.speech_id = speech.speech_id ) WHERE (office_candidate.state_id IN ('NA', 'AL', 'AK', 'AS', 'AZ', 'AR', 'CA', 'CO', 'CT', 'DE', 'DC', 'FL', 'GA', 'GU', 'HI', 'ID', 'IL', 'IN', 'IA', 'KS', 'KY', 'LA', 'ME', 'MD', 'MA', 'MI', 'MN', 'MS', 'MO', 'MT', 'NC', 'ND', 'MP', 'NE', 'NV', 'NH', 'NJ', 'NM', 'NY', 'OH', 'OK', 'OR', 'PA', 'PR', 'RI', 'SC', 'SD', 'TN', 'TX', 'UT', 'VI', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY') AND office_candidate.officecandidatestatus_id = 1 AND office.officetype_id = 'C' AND speech.release_id >= 1) ORDER BY speech_candidate.speech_id, is_position_paper ASC, is_null_date ASC, speech.speechdate DESC) sc ORDER BY sc.is_position_paper ASC, sc.is_null_date ASC, sc.speechdate DESC LIMIT 50 OFFSET 0

Would it be better to do this via GROUP BY or is DISTINCT really what I want to use here? Again, I need to both dedupe AND add a column with a counting of the duplicates for each row.