Common Mistakes
Here are mistakes I made during Hackerrank certification:
- using double quote
""
instead of single quote""
- what is usage differences between
GROUP BY
andDINSTINC
- should we use
WHERE
orHAVING
- using
NOT IN
Differences GROUP BY
and DISTINC
Understanding the difference between GROUP BY and DISTINCT is crucial for effective SQL querying. Here’s a breakdown of when to use each:
DISTINCT:
- Purpose: DISTINCT is used to retrieve unique values from one or more columns in a result set. It eliminates duplicate rows, showing each unique combination of the selected columns only once.
- When to use: When you simply want to see a list of unique values in a column or combination of columns. When you need to remove duplicate rows from your query results. It’s ideal for straightforward de-duplication.
- Example:
SELECT DISTINCT city FROM customers; (This will return a list of all the unique cities in the customers table.)
GROUP BY:
- Purpose: GROUP BY is used to group rows that have the same values in specified columns. It’s typically used in conjunction with aggregate functions (like COUNT(), SUM(), AVG(), MAX(), MIN()) to summarize data for each group.
- When to use: When you want to perform calculations or aggregations on groups of data. When you need to summarize data based on categories or groupings. When you want to analyze data in a summarized way.
- Example:
SELECT city, COUNT(*) FROM customers GROUP BY city; (This will return the number of customers in each city.)
Key Differences Summarized:
- Aggregation:
GROUP BY
is designed for aggregation.DISTINCT
is not.
- Functionality:
DISTINCT
removes duplicate rows.GROUP BY
organizes rows into groups.
- Use Cases:
DISTINCT
is for simple unique value retrieval.GROUP BY
is for summarizing and analyzing grouped data. In essence, if you just want to see unique values, useDISTINCT
. If you want to group data and perform calculations on those groups, useGROUP BY
.
Using HAVING
my code was like
WITH total_avg AS
(SELECT AVG(i.total_price) FROM invoice i)
SELECT c.country_name, COUNT(c.id), SUM(i.total_price)/COUNT(c.id) as country_avg
FROM country c
JOIN city ct ON ct.country_id = c.id
JOIN customer cs ON cs.city_id = ct.id
JOIN invoice i ON i.customer_id = cs.id
WHERE SUM(i.total_price)/COUNT(c.id) > total_avg
GROUP BY c.country_name;
the main issue is that you can’t use an aggregate function like SUM(i.total_price)/COUNT(c.id)
directly in a WHERE
clause. You need to use a HAVING
clause after the GROUP BY
clause to filter based on aggregated results. Also, you need to use the subquery’s result from the CTE.
Here’s the corrected syntax:
WITH total_avg AS
(SELECT AVG(i.total_price) as ta FROM invoice i)
SELECT c.country_name, COUNT(c.id), SUM(i.total_price)/COUNT(c.id) as country_avg
FROM country c
JOIN city ct ON ct.country_id = c.id
JOIN customer cs ON cs.city_id = ct.id
JOIN invoice i ON i.customer_id = cs.id
GROUP BY c.country_name
HAVING SUM(i.total_price)/COUNT(c.id) > (SELECT ta FROM total_avg);
Using NOT IN
It was not a mistake. But it is just a syntax that I need to familiar with. The problem was to filter the products were not sold.
SELECT p.sku, p.product_name
FROM product p
WHERE p.id NOT IN (SELECT product_id FROM invoice_item WHERE product_id IS NOT NULL);