Suppressing a repeated column value in SQL

2 minute read

I was asked by one of my co-workers for some SQL help. He needed a SQL statement that would suppress repeated column values for the result set. Basically the value would be shown for the first row and blanked for each successful row that had the same value. Typically you would handle this in the application code, we had a case where we had to pass data to another application and we needed to do this within a single SQL select statement.

For example if we have the values: </p>

username             Category
-------------------- ----------
Brian                cs
Tom                  cs
Joe                  cs
Allen                cs
Bill                 ts
Steven               ts
Fred                 ts
Ted                  ts

We would want to return this as the output

username             Category
-------------------- ----------
Brian                cs
Tom                  
Joe                  
Allen                
Bill                 ts
Steven               
Fred                 
Ted

Using the following table structure:

create table test(id int, cat varchar(10), username varchar(20))

We can make a query like

select t.username
,case
  when t.id = (select top 1 id 
               from test t3 
               where t3.cat = t.cat 
               order by t3.cat, t3.username) then t.cat
  else ''
end as Category
from test t
order by t.cat, t.username

What the case keyword is doing is a sub-select on the same table and uses top 1 to match on only the first row for each set of categories. If we match, we use the category value, otherwise we use an empty string value. This is not very inefficient, you are doing the sub-select for each row of the query. We needed to do this because the situation only allowed a single SQL statement to be executed. We were working with a small set of records and this executed without any delay.

If you can call a stored procedure or execute a batch of SQL, you can split this up and gain a performance increase for larger sets of data. Instead of doing the sub-select on each row, populate a table variable with the first row for each category. Then do a left join from the main table to the table variable. The combined SQL would look something like this:

declare @q table(cat varchar(10), username varchar(20))
insert into @q(cat, username)
select t.cat, MIN(t.username)
from test t
group by t.cat

select t.username, COALESCE(q.cat,'') as Category
from test t
left join @q q on t.cat = q.cat and t.username = q.username
order by t.cat, t.username

Another way to get this affect is to use a Common Table Expression (CTE) as part of the query. This would behave like the table variable, but you would have just a single select statement. This would be useful for reporting tools where you can only specify a single SQL statement to retrieve the data.

Using the above example data, the new select statement would look like this

with cte as
(
  select cat, min(username) as username
  from test
  group by cat
)
select t.username, COALESCE(c.cat,'') as Category
from test t
left join cte c on t.username = c.username and c.cat = t.cat
order by t.cat, t.username

This query should be perform well (if not better) as the query with the table variable.

Twitter Facebook LinkedIn

Chris Miller

Suppressing a repeated column value in SQL

Comments

You May Also Enjoy

When Visual Studio Forgets Your Phone: Building an Automated Solution

Using the 3-2-1 strategy for digital backups

Generating a dynamic background on .NET MAUI

Backing up my main WSL2 Linux session