Common Mistakes

Here are mistakes I made during Hackerrank certification:

  • using double quote "" instead of single quote ""
  • what is usage differences between GROUP BY and DINSTINC
  • should we use WHERE or HAVING
  • using NOT IN

Differences GROUP BY and DISTINC

Understanding the difference between GROUP BY and DISTINCT is crucial for effective SQL querying. Here’s a breakdown of when to use each:

DISTINCT:

  • Purpose: DISTINCT is used to retrieve unique values from one or more columns in a result set. It eliminates duplicate rows, showing each unique combination of the selected columns only once.
  • When to use: When you simply want to see a list of unique values in a column or combination of columns. When you need to remove duplicate rows from your query results. It’s ideal for straightforward de-duplication.
  • Example:
SELECT DISTINCT city FROM customers; (This will return a list of all the unique cities in the customers table.)

GROUP BY:

  • Purpose: GROUP BY is used to group rows that have the same values in specified columns. It’s typically used in conjunction with aggregate functions (like COUNT(), SUM(), AVG(), MAX(), MIN()) to summarize data for each group.
  • When to use: When you want to perform calculations or aggregations on groups of data. When you need to summarize data based on categories or groupings. When you want to analyze data in a summarized way.
  • Example:
SELECT city, COUNT(*) FROM customers GROUP BY city; (This will return the number of customers in each city.)

Key Differences Summarized:

  • Aggregation:
    • GROUP BY is designed for aggregation.
    • DISTINCT is not.
  • Functionality:
    • DISTINCT removes duplicate rows.
    • GROUP BY organizes rows into groups.
  • Use Cases:
    • DISTINCT is for simple unique value retrieval.
    • GROUP BY is for summarizing and analyzing grouped data. In essence, if you just want to see unique values, use DISTINCT. If you want to group data and perform calculations on those groups, use GROUP BY.

Using HAVING

my code was like

WITH total_avg AS
    (SELECT AVG(i.total_price) FROM invoice i)
SELECT c.country_name, COUNT(c.id), SUM(i.total_price)/COUNT(c.id) as country_avg
FROM country c
    JOIN city ct ON ct.country_id = c.id
    JOIN customer cs ON cs.city_id = ct.id
    JOIN invoice i ON i.customer_id = cs.id
WHERE SUM(i.total_price)/COUNT(c.id) > total_avg
GROUP BY c.country_name;

the main issue is that you can’t use an aggregate function like SUM(i.total_price)/COUNT(c.id) directly in a WHERE clause. You need to use a HAVING clause after the GROUP BY clause to filter based on aggregated results. Also, you need to use the subquery’s result from the CTE.

Here’s the corrected syntax:

WITH total_avg AS
    (SELECT AVG(i.total_price) as ta FROM invoice i)
SELECT c.country_name, COUNT(c.id), SUM(i.total_price)/COUNT(c.id) as country_avg
FROM country c
    JOIN city ct ON ct.country_id = c.id
    JOIN customer cs ON cs.city_id = ct.id
    JOIN invoice i ON i.customer_id = cs.id
GROUP BY c.country_name
HAVING SUM(i.total_price)/COUNT(c.id) > (SELECT ta FROM total_avg);

Using NOT IN

It was not a mistake. But it is just a syntax that I need to familiar with. The problem was to filter the products were not sold.

SELECT p.sku, p.product_name 
FROM product p
WHERE p.id NOT IN (SELECT product_id FROM invoice_item WHERE product_id IS NOT NULL);