SQL Articles
How to Use GROUP BY, HAVING, and ORDER BY SQL Clauses
SQL gives you options for retrieving, analyzing, and displaying the information you need with the GROUP BY, HAVING, and ORDER BY clauses. Here are some examples of how you can use them.
GROUP BY clauses
Sometimes, rather than retrieving individual records, you want to know something about a group of records. The GROUP BY clause is the tool you need.
Suppose you’re the sales manager of another location, and you want to look at the performance of your sales force. If you do a simple SELECT, such as the following query:
SELECT InvoiceNo, SaleDate, Salesperson, TotalSale
FROM SALES;
This result gives you some idea of how well your salespeople are doing because so few total sales are involved. However, in real life, a company would have many more sales — and it wouldn’t be so easy to tell whether sales objectives were being met.
To do the real analysis, you can combine the GROUP BY clause with one of the aggregate functions (also called set functions) to get a quantitative picture of sales performance. For example, you can see which salesperson is selling more of the profitable high-ticket items by using the average (AVG) function as follows:
SELECT Salesperson, AVG(TotalSale)
FROM SALES
GROUP BY Salesperson;
Running the query with a different database management system would retrieve the same result, but might appear a little different.
The average value of Bennett’s sales is considerably higher than that of the other two salespeople. You compare total sales with a similar query:
SELECT Salesperson, SUM(TotalSale)
FROM SALES
GROUP BY Salesperson;
Bennett also has the highest total sales, which is consistent with having the highest average sales.
HAVING clauses
You can analyze the grouped data further by using the HAVING clause. The HAVING clause is a filter that acts similar to a WHERE clause, but on groups of rows rather than on individual rows. To illustrate the function of the HAVING clause, suppose the sales manager considers Bennett to be in a class by himself.
His performance distorts the overall data for the other salespeople. (Aha — a curve-wrecker.) You can exclude Bennett’s sales from the grouped data by using a HAVING clause as follows:
SELECT Salesperson, SUM(TotalSale)
FROM SALES
GROUP BY Salesperson
HAVING Salesperson <>'Bennett';
Only rows where the salesperson is not Bennett are considered.
ORDER BY clauses
Use the ORDER BY clause to display the output table of a query in either ascending or descending alphabetical order. Whereas the GROUP BY clause gathers rows into groups and sorts the groups into alphabetical order, ORDER BY sorts individual rows. The ORDER BY clause must be the last clause that you specify in a query.
If the query also contains a GROUP BY clause, the clause first arranges the output rows into groups. The ORDER BY clause then sorts the rows within each group. If you have no GROUP BY clause, then the statement considers the entire table as a group, and the ORDER BY clause sorts all its rows according to the column (or columns) that the ORDER BY clause specifies.
To illustrate this point, consider the data in the SALES table. The SALES table contains columns for InvoiceNo, SaleDate, Salesperson, and TotalSale. If you use the following example, you see all the data in the SALES table — but in an arbitrary order:
SELECT * FROM SALES ;
In one implementation, this may be the order in which you inserted the rows in the table; in another implementation, the order may be that of the most recent updates. The order can also change unexpectedly if anyone physically reorganizes the database. That’s one reason it’s usually a good idea to specify the order in which you want the rows.
You may, for example, want to see the rows in order by the SaleDate like this:
SELECT * FROM SALES ORDER BY SaleDate ;
This example returns all the rows in the SALES table in order by SaleDate.
For rows with the same SaleDate, the default order depends on the implementation. You can, however, specify how to sort the rows that share the same SaleDate. You may want to see the sales for each SaleDate in order by InvoiceNo, as follows:
SELECT * FROM SALES ORDER BY SaleDate, InvoiceNo ;
This example first orders the sales by SaleDate; then for each SaleDate, it orders the sales by InvoiceNo. But don’t confuse that example with the following query:
SELECT * FROM SALES ORDER BY InvoiceNo, SaleDate ;
This query first orders the sales by INVOICE_NO. Then for each different InvoiceNo, the query orders the sales by SaleDate. This probably won’t yield the result you want, because it’s unlikely that multiple sale dates will exist for a single invoice number.
The following query is another example of how SQL can return data:
SELECT * FROM SALES ORDER BY Salesperson, SaleDate ;
This example first orders by Salesperson and then by SaleDate. After you look at the data in that order, you may want to invert it, as follows:
SELECT * FROM SALES ORDER BY SaleDate, Salesperson ;
This example orders the rows first by SaleDate and then by Salesperson.
All these ordering examples are in ascending (ASC) order, which is the default sort order. The last SELECT shows earlier sales first — and, within a given date, shows sales for ‘Adams’ before ‘Baker’. If you prefer descending (DESC) order, you can specify this order for one or more of the order columns, as follows:
SELECT * FROM SALES
ORDER BY SaleDate DESC, Salesperson ASC ;
This example specifies a descending order for sale dates, showing the more recent sales first, and an ascending order for salespeople, putting them in alphabetical order. That should give you a better picture of how Bennett’s performance stacks up against that of the other salespeople.