Distinct on PostgreSQL

11/10/2023

PostgreSQL DISTINCT Clause On Single Column

Now let's return only the unique city names from the customers table.

SELECT DISTINCT city FROM customers;

Because we return a single column, no ORDER BY clause is needed as long as you are happy with the output in any order.

PostgreSQL DISTINCT Clause On Multiple Columns

Now let's apply the DISTINCT clause to multiple columns. Here, we return the unique combinations of customer name and city.

SELECT DISTINCT name, city FROM customers;

Even if some city names are identical in the table, the combination of name and city makes each record unique. Therefore, the output can contain the same city name more than once, but never the same name-city combination twice. Compare this output with the next example's output.

PostgreSQL DISTINCT ON Clause With ORDER BY

Sorting in ascending order of city name puts customer Peter first. However, when we sort the result in descending order, the first record returned is Alex's:

SELECT DISTINCT ON (city) city, country, name FROM customers ORDER BY city DESC;

It is good practice to use the ORDER BY clause with DISTINCT ON so that the result is predictable and you get the desired output. DISTINCT ON keeps only the first row of each group in the sort order, so the ORDER BY should also list the columns that decide which row wins within a group, for example ORDER BY city DESC, name.

In this tutorial, we learned the DISTINCT clause, including DISTINCT ON, with multiple examples. I hope you found it interesting.
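The customers queries above can be tried end to end with a sketch like the one below. The CREATE TABLE definition and every row except ('Alex', 'Sydney', 'Australia') are assumptions made up for illustration, since the tutorial's full table data is not shown. The last query adds name as a tiebreaker so that the row kept for each city is deterministic.

```sql
-- Hypothetical table definition; the tutorial does not show its CREATE TABLE.
CREATE TABLE customers (
    name    text NOT NULL,
    city    text NOT NULL,
    country text NOT NULL
);

-- Only the Alex row comes from the tutorial; the other rows are made up.
INSERT INTO customers (name, city, country)
VALUES
    ('Alex', 'Sydney',    'Australia'),
    ('Zoe',  'Sydney',    'Australia'),
    ('Liam', 'Melbourne', 'Australia'),
    ('Emma', 'Berlin',    'Germany'),
    ('Noah', 'Berlin',    'Germany');

-- One row per distinct city: 3 rows (Berlin, Melbourne, Sydney), in no guaranteed order.
SELECT DISTINCT city FROM customers;

-- One row per distinct (name, city) pair: 5 rows, with Sydney and Berlin appearing twice.
SELECT DISTINCT name, city FROM customers;

-- One row per city, cities in descending order; within a city the
-- alphabetically first name wins because of the name tiebreaker.
-- Returns (Sydney, Australia, Alex), (Melbourne, Australia, Liam), (Berlin, Germany, Emma).
SELECT DISTINCT ON (city) city, country, name
FROM customers
ORDER BY city DESC, name;
```

Without the name tiebreaker, ORDER BY city DESC alone does not rank the rows inside a city, so PostgreSQL may keep either Sydney customer.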

Assume a table, tableName, with two columns, col1 and col2, holding the records red-blue and red-orange. If we apply the DISTINCT ON clause to col1, the output contains only a single record, whichever comes first in the sort order. In this case, the output is red-blue. The output may change based on the order of the records: if you sort the table in descending order by col1 and col2, the output is red-orange. However, if you use the DISTINCT ON clause on col2, you get both records in the output, because col2 contains only unique values, so the duplication in col1 doesn't matter.

Note that when you combine the ORDER BY clause with DISTINCT ON, the DISTINCT ON expressions must match the leftmost ORDER BY expressions; otherwise, PostgreSQL throws an error.

Let's create a table and insert some data into it so that we can run queries with the DISTINCT clause against it.

INSERT INTO customers (name, city, country)
VALUES
...
('Alex', 'Sydney', 'Australia');

Here, we have created a table named customers that contains the customers' names, cities, and countries. Note that the city and country columns contain duplicate values.

PostgreSQL is an amazing database, but it can struggle with certain types of queries, especially as tables approach tens or hundreds of millions of rows (or more). I would expect an index on (col1, col2, col3) to be eligible for an Index Only Scan (make sure you run VACUUM ANALYZE to update the visibility map). If you are using Postgres 13, its new B-tree deduplication feature should also make that a pretty small index.
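The col1/col2 walkthrough above can be reproduced with a sketch like this. The table name tableName comes from the text, but the column types and the two rows are assumptions inferred from the outputs the text describes (red-blue and red-orange). The last query is commented out because it triggers the ORDER BY mismatch error.

```sql
-- Assumed contents: two rows sharing col1 = 'red', inferred from the described outputs.
CREATE TABLE tableName (
    col1 text,
    col2 text
);

INSERT INTO tableName (col1, col2)
VALUES ('red', 'blue'), ('red', 'orange');

-- One row per col1 value; ascending col2 keeps red | blue.
SELECT DISTINCT ON (col1) col1, col2
FROM tableName
ORDER BY col1, col2;

-- Descending sort keeps red | orange instead.
SELECT DISTINCT ON (col1) col1, col2
FROM tableName
ORDER BY col1 DESC, col2 DESC;

-- Every col2 value is unique, so both rows come back.
SELECT DISTINCT ON (col2) col1, col2
FROM tableName
ORDER BY col2;

-- Rejected: DISTINCT ON (col1) does not match the leftmost ORDER BY expression.
-- ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY expressions
-- SELECT DISTINCT ON (col1) col1, col2 FROM tableName ORDER BY col2;
```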

Not directly related to the performance problem, but note that DISTINCT is not a function. The SELECT DISTINCT clause removes duplicate rows from the result, keeping one row for each group of duplicates, and it always applies to all columns in the select list. Enclosing one of the columns in parentheses won't change anything and is useless: distinct (a), b is the same as distinct a, (b) or distinct a, b.

The plan shows that the condition on col3 removes only 145 of 39475087 rows, which is practically the entire table. So an index on col3 is not going to help; the Seq Scan is the right choice here. The DISTINCT then needs to be applied to all 39474942 remaining rows. The only advantage an Index Scan might give is that the index is a bit smaller to read, which would mean a bit less I/O. Whether that makes a big difference depends on the number of columns in the table. I am surprised, though, that this isn't a Parallel Seq Scan given the size of the table.
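The index advice on (col1, col2, col3) could be tried roughly as follows. The table big_table, its columns, and the col3 filter are hypothetical stand-ins, since the original query and schema are not shown; the point is the covering index, the VACUUM ANALYZE, and checking the chosen plan with EXPLAIN.

```sql
-- Hypothetical table; the real schema behind the plan above is not shown.
CREATE TABLE big_table (
    col1    int,
    col2    int,
    col3    int,
    payload text  -- extra column so the heap rows are wider than the index entries
);

-- Covering index: every column the DISTINCT query reads is in the index.
CREATE INDEX big_table_col1_col2_col3_idx ON big_table (col1, col2, col3);

-- Refresh the visibility map and planner statistics so an Index Only Scan is possible.
-- Note: VACUUM cannot run inside a transaction block.
VACUUM ANALYZE big_table;

-- Inspect the plan: look for "Index Only Scan" versus "Seq Scan" / "Parallel Seq Scan".
-- The WHERE clause is a placeholder for the unknown condition on col3.
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT col1, col2, col3
FROM big_table
WHERE col3 IS NOT NULL;
```

On PostgreSQL 13 and later, B-tree deduplication (the deduplicate_items storage parameter, on by default) stores repeated keys compactly, which is what keeps such an index small when the columns contain many duplicates.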

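To see that parentheses have no effect on plain DISTINCT, and how that differs from DISTINCT ON, here is a small sketch; the table pairs and its rows are hypothetical.

```sql
-- Hypothetical table with one fully duplicated row, (1, 2).
CREATE TABLE pairs (a int, b int);
INSERT INTO pairs (a, b) VALUES (1, 1), (1, 2), (1, 2), (2, 1);

-- All three are equivalent: DISTINCT applies to the whole select list,
-- and the parentheses around a column are ordinary grouping parentheses.
SELECT DISTINCT (a), b FROM pairs;  -- 3 rows: (1,1), (1,2), (2,1)
SELECT DISTINCT a, (b) FROM pairs;  -- same 3 rows
SELECT DISTINCT a, b FROM pairs;    -- same 3 rows

-- DISTINCT ON is a different clause: one row per value of a.
SELECT DISTINCT ON (a) a, b FROM pairs ORDER BY a, b;  -- 2 rows: (1,1), (2,1)
```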