Plain Subqueries
Subqueries
A subquery is a SELECT statement that is nested within another T-SQL statement.
If a subquery is executed independently of the T-SQL statement in which it is nested,
it returns a result set. In other words, a subquery can stand alone and does not
depend on the statement in which it is nested. A subquery can return any number of
values, and can appear in the column list of a SELECT statement or in the FROM, WHERE,
GROUP BY, HAVING, and/or ORDER BY clauses of a T-SQL statement. A subquery can also be
used as a parameter to a function call. Basically, a subquery can be used anywhere an
expression can be used (where a single value is expected, the subquery must return
only one value).
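To make the "can stand alone" point concrete, here is a minimal sketch. The table
variable @Sales and its columns are made up for this illustration and do not belong to
any sample database. The nested SELECT is first run by itself, and is then reused as an
expression in a column list and as an argument to the ABS function.

```sql
-- Hypothetical sample data (assumption, for illustration only)
DECLARE @Sales TABLE (SaleID INT, Amount INT)
INSERT INTO @Sales (SaleID, Amount) VALUES (1, 10)
INSERT INTO @Sales (SaleID, Amount) VALUES (2, 30)
INSERT INTO @Sales (SaleID, Amount) VALUES (3, 20)

-- The subquery on its own: it runs independently and returns one value
SELECT MAX(Amount) FROM @Sales

-- The same SELECT used as an expression: once in the column list,
-- and once as part of the argument passed to ABS()
SELECT SaleID,
       Amount,
       (SELECT MAX(Amount) FROM @Sales) AS MaxAmount,
       ABS(Amount - (SELECT MAX(Amount) FROM @Sales)) AS GapToMax
FROM @Sales
```

Run as a single batch, the last statement should show the same MaxAmount on every row,
with GapToMax equal to the distance of each Amount from that maximum.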
Joining Virtual Tables
Joining virtual tables is one of the most powerful
techniques you can build with subqueries. In this context, virtual means that the result
set you are joining is built on the fly. The following example shows how to join a GROUP
BY result set with another, real table (Person).
SELECT P.id_person,
       P.first_name,
       P.last_name,
       CONVERT(varchar(30), P.birth, 104),
       A.id_council,
       A.id_groupe,
       A.numActivities
FROM Person P
JOIN (SELECT id_person,
             MIN(id_council) id_council,
             MIN(id_groupe) id_groupe,
             COUNT(*) numActivities
      FROM Activity
      GROUP BY id_person) A
  ON (A.id_person = P.id_person)
WHERE P.id_person NOT IN (SELECT id_person
                          FROM Activity
                          WHERE id_council != 5)
The virtual table is referenced in the outer query
by the alias A and is joined on id_person. You can use the virtual table's columns
in the outer query through the alias A, for example A.numActivities.
Joining more than one virtual Table (SQL Server)
The next example shows a very complex query using
more than one virtual table.
--
-- Declare Variables
--
DECLARE @LaufID BIGINT
DECLARE @AbrDatum DATETIME
DECLARE @CountLauf INT
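--
-- Fill Variables
-- (@DatumVon and @DatumBis are not declared in this listing; they are
-- presumably parameters supplied by the surrounding procedure)
--
SELECT @LaufID = MAX(LaufID),
       @AbrDatum = MAX(AbrDatum),
       @CountLauf = COUNT(*)
FROM AbrLauf
WHERE BuchDatum >= CONVERT(datetime, @DatumVon, 104)
AND BuchDatum < DATEADD(day, 1, CONVERT(datetime, @DatumBis, 104))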
--
-- Generate Report
--
SELECT P.Nr,
P.Name,
P.Vorname,
CASE R.Rat WHEN 1 THEN 'NR' WHEN 2 THEN 'SR' ELSE
NULL END Rat,
ISNULL(Entschaedigung.Betrag, 0)
EntschaedigungBetrag,
ISNULL(Vorsorge.Betrag, 0) VorsorgeBetrag,
ISNULL(Entschaedigung.Betrag, 0) +
ISNULL(Vorsorge.Betrag, 0) Total,
CONVERT(varchar(30), @DatumVon, 104) DatumVon,
CONVERT(varchar(30), @DatumBis, 104) DatumBis,
@LaufID LaufID,
CONVERT(varchar(30), @AbrDatum, 104) AbrDatum,
@CountLauf CountLauf
FROM Person P
--
-- Now join the real Table P with the virtual Table R ...
--
LEFT OUTER JOIN (SELECT M.PersonID,
M.Rat
FROM Ratsmitglied M
WHERE M.Eintritt = (SELECT MAX(MI.Eintritt)
FROM Ratsmitglied MI
WHERE MI.PersonID = M.PersonID)) R
ON
(P.PersonID = R.PersonID)
--
-- ... then join Table P with the virtual Table 'Entschaedigung'
--
LEFT OUTER JOIN (SELECT PersonID,
SUM(Betrag) Betrag
FROM ExportKreditor
WHERE ExportKreditorID IN (SELECT EAEK.ExportKreditorID
FROM EntAbrExportKreditor EAEK
JOIN EntAbr EA ON (EA.EntAbrID = EAEK.EntAbrID)
JOIN Abr A ON (A.AbrID = EA.AbrID)
JOIN AbrArt AA ON (AA.AbrArtID = A.AbrArtID)
WHERE AA.Abk = 'A')
AND SollHabenBez = 'H'
AND BuchDatum >= CONVERT(datetime, @DatumVon, 104)
AND BuchDatum < DATEADD(day, 1, CONVERT(datetime, @DatumBis, 104))
GROUP BY PersonID) Entschaedigung
ON
(P.PersonID = Entschaedigung.PersonID)
--
-- ... then join Table P with the virtual Table 'Vorsorge'
--
LEFT OUTER JOIN (SELECT PersonID,
SUM(Betrag) Betrag
FROM ExportKreditor
WHERE ExportKreditorID IN (SELECT EAEK.ExportKreditorID
FROM EntAbrExportKreditor EAEK
JOIN EntAbr EA ON (EA.EntAbrID = EAEK.EntAbrID)
JOIN Abr A ON (A.AbrID = EA.AbrID)
JOIN AbrArt AA ON (AA.AbrArtID = A.AbrArtID)
WHERE AA.Abk = 'V')
AND SollHabenBez = 'H'
AND BuchDatum >= CONVERT(datetime, @DatumVon, 104)
AND BuchDatum < DATEADD(day, 1, CONVERT(datetime, @DatumBis, 104))
GROUP BY PersonID) Vorsorge
ON
(P.PersonID = Vorsorge.PersonID)
--
-- ... then the final WHERE Clause, based on the virtual Tables
--
WHERE ISNULL(Entschaedigung.Betrag, 0) + ISNULL(Vorsorge.Betrag, 0) > 0
ORDER BY P.Name, P.Vorname, R.Rat
Use of a Subquery in the Column List of a SELECT Statement
Suppose you would like to see the OrderID and the OrderDate
for the last order that was shipped to Paris. Along with that information, say you would
also like to see the OrderDate for the last order shipped regardless of the ShipCity. In
addition, you would like to calculate the difference in days between the two
OrderDates. Here is a T-SQL SELECT statement to accomplish this:
SELECT TOP 1 OrderId,
       CONVERT(CHAR(10), OrderDate, 121) Last_Paris_Order,
       (SELECT CONVERT(CHAR(10), MAX(OrderDate), 121)
        FROM Northwind.dbo.Orders) Last_OrderDate,
       DATEDIFF(dd, OrderDate, (SELECT MAX(OrderDate)
                                FROM Northwind.dbo.Orders)) Day_Diff
FROM Northwind.dbo.Orders
WHERE ShipCity = 'Paris'
ORDER BY OrderDate DESC
The above code contains two subqueries. The first subquery gets the
OrderDate for the last order shipped regardless of ShipCity, and returns it as a column
value in the final result set. The second subquery is used as a parameter in a function
call: it passes the "MAX(OrderDate)" date to the DATEDIFF function, which then
calculates the number of days between the two OrderDates.
Use of a Subquery in the WHERE clause
A subquery can be used to control the records returned from a
SELECT by controlling which records pass the conditions of a WHERE clause. In this case
the results of the subquery would be used on one side of a WHERE clause condition. Here
is an example:
SELECT DISTINCT country
FROM Northwind.dbo.Customers
WHERE country NOT IN (SELECT DISTINCT country
FROM Northwind.dbo.Suppliers)
Here we have returned a list of countries
where customers live but where no supplier is located. If you
were trying to provide better delivery times to
customers, you might target these countries to look for additional
suppliers.
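One thing to watch for with NOT IN: if the subquery returns a NULL, the NOT IN
condition is never true, and the outer query returns no rows at all. The following
sketch shows the effect and one way around it. The table variables @Customers and
@Suppliers are hypothetical stand-ins created for this illustration, not the Northwind
tables.

```sql
-- Hypothetical sample data (assumption, for illustration only)
DECLARE @Customers TABLE (Country VARCHAR(20))
DECLARE @Suppliers TABLE (Country VARCHAR(20) NULL)

INSERT INTO @Customers VALUES ('France')
INSERT INTO @Customers VALUES ('Peru')
INSERT INTO @Suppliers VALUES ('France')
INSERT INTO @Suppliers VALUES (NULL)

-- Returns no rows: 'Peru' NOT IN ('France', NULL) evaluates to UNKNOWN
SELECT DISTINCT Country
FROM @Customers
WHERE Country NOT IN (SELECT Country
                      FROM @Suppliers)

-- Returns 'Peru' once NULLs are filtered out of the subquery
SELECT DISTINCT Country
FROM @Customers
WHERE Country NOT IN (SELECT Country
                      FROM @Suppliers
                      WHERE Country IS NOT NULL)
```

If the column used in the subquery can contain NULLs, filtering them out inside the
subquery keeps the NOT IN test meaningful.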
Suppose a company would like to do some targeted marketing. This
targeted marketing would contact customers in the country with the fewest number of
orders. It is hoped that this targeted marketing will increase the overall sales in the
targeted country. Here is an example that uses a subquery to return the customer contact
information for the country with the fewest number of orders:
SELECT Country,
CompanyName,
ContactName,
ContactTitle,
Phone
FROM Northwind.dbo.Customers
WHERE country = (SELECT TOP 1 country
FROM Northwind.dbo.Customers C
JOIN Northwind.dbo.Orders O
ON C.CustomerId = O.CustomerID
GROUP BY country
ORDER BY count(*))
Here we have written a subquery that
joins the Customers and Orders tables to determine the total number of orders for each
country. The subquery uses the "TOP 1" clause, together with the ascending
"ORDER BY count(*)", to return the country with the fewest orders. That country is then
used in the WHERE clause to determine which customer information will be displayed.
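Note that "TOP 1" returns only one country, even if several countries share the lowest
order count. The sketch below shows one possible way to include ties, using "TOP 1 WITH
TIES" and IN instead of "=". The @Orders table variable is hypothetical sample data
invented for this illustration.

```sql
-- Hypothetical sample data (assumption, for illustration only)
DECLARE @Orders TABLE (OrderID INT, Country VARCHAR(20))
INSERT INTO @Orders VALUES (1, 'France')
INSERT INTO @Orders VALUES (2, 'France')
INSERT INTO @Orders VALUES (3, 'Peru')
INSERT INTO @Orders VALUES (4, 'Chile')

-- Peru and Chile both have one order. WITH TIES returns both countries,
-- so the outer query has to compare with IN rather than =
SELECT OrderID, Country
FROM @Orders
WHERE Country IN (SELECT TOP 1 WITH TIES Country
                  FROM @Orders
                  GROUP BY Country
                  ORDER BY COUNT(*))
```

With this data the query should return orders 3 and 4. Whether ties should be included
at all depends on what the report is meant to show.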
Use of a Subquery in the FROM clause
The FROM clause normally identifies the tables used in the T-SQL
statement. You can think of each of the tables identified in the FROM clause as a set of
records. Well, a subquery is just a set of records, and therefore can be used in the FROM
clause just like a table. Here is an example where a subquery is used in the FROM clause
of a SELECT statement:
SELECT au_lname,
       au_fname,
       title
FROM (SELECT au_lname, au_fname, au_id
      FROM pubs.dbo.authors
      WHERE state = 'CA') as A
JOIN pubs.dbo.titleauthor ta ON A.au_id = ta.au_id
JOIN pubs.dbo.titles t ON ta.title_id = t.title_id
Here we have used a subquery to select
only the author records that have a state column equal to
"CA". We have named the set returned from this subquery with a
table alias of "A". We can then use
this alias elsewhere in the T-SQL statement to refer to the columns from the subquery by
prefixing them with "A.", as we did
in the "ON" clause of the "JOIN" criteria. Sometimes using a subquery in the FROM clause
reduces the size of the set that needs to be joined. Reducing the number of records that
have to be joined can improve the performance of the join, and therefore speed up the
overall execution of the query.
Subquery in the FROM clause of an UPDATE statement:
SET NOCOUNT ON
CREATE TABLE x(
i INT IDENTITY,
a CHAR(1))
INSERT INTO x VALUES ('A')
INSERT INTO x VALUES ('B')
INSERT INTO x VALUES ('C')
INSERT INTO x VALUES ('D')
SELECT * FROM x
UPDATE x
SET a = b.a
FROM (SELECT MAX(a) AS a FROM x) b
WHERE i > 2
SELECT * FROM x
DROP TABLE x
Here we created a table named "x" that
has four rows. Then we updated the rows where "i"
is greater than 2 with the max value in column "a". We used a
subquery in the FROM clause of the UPDATE statement to identify the max value of column
"a".
Use of a Subquery in the HAVING clause
In the following example, we used a
subquery to find the number of books a publisher has published where the publisher is not
located in the state of California. To accomplish this we used
a subquery in a HAVING clause. Here is the code:
SELECT pub_name,
COUNT(*) bookcnt
FROM pubs.dbo.titles t
JOIN pubs.dbo.publishers p on t.pub_id = p.pub_id
GROUP BY pub_name
HAVING p.pub_name IN (SELECT pub_name
FROM pubs.dbo.publishers
WHERE state <> 'CA')
Here the subquery returns the pub_name
values for all publishers that have a state value not equal to "CA". The HAVING condition
then checks to see if the pub_name is in the set returned by the subquery.
Correlated Subqueries
A correlated subquery is a SELECT statement nested inside another T-SQL statement,
which contains a reference to one or more columns in the outer query. Therefore, the
correlated subquery can be said to be dependent on the outer query. This is the main
difference between a correlated subquery and a plain subquery. A plain subquery is
not dependent on the outer query, can be run independently of the outer query, and will
return a result set. A correlated subquery, since it is dependent on the outer query, will
return an error if it is run by itself, because its references to the outer query's
columns cannot be resolved.
Conceptually, a correlated subquery is executed many times while processing the T-SQL
statement that contains it: once for each candidate row selected by the outer query.
The outer query columns referenced in the correlated subquery are replaced with values
from the candidate row prior to each execution. The result of each execution of the
correlated subquery determines whether that row of the outer query is returned in the
final result set.
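The difference between the two kinds of subquery can be seen in a small side-by-side
sketch. The table variable @Scores and its columns are hypothetical and exist only for
this illustration. The first query compares every row with one overall average. The
second compares each row with the average of its own team, which requires the inner
query to reference the outer row.

```sql
-- Hypothetical sample data (assumption, for illustration only)
DECLARE @Scores TABLE (Player VARCHAR(10), Team CHAR(1), Points INT)
INSERT INTO @Scores VALUES ('Ann', 'X', 10)
INSERT INTO @Scores VALUES ('Bob', 'X', 30)
INSERT INTO @Scores VALUES ('Cid', 'Y', 5)
INSERT INTO @Scores VALUES ('Dee', 'Y', 7)

-- Plain subquery: no reference to the outer query, so the inner SELECT
-- could be run by itself. Every row is compared with the same average.
SELECT Player
FROM @Scores
WHERE Points > (SELECT AVG(Points)
                FROM @Scores)

-- Correlated subquery: the inner SELECT references S.Team from the outer
-- query, so it is evaluated against the team of each candidate row.
SELECT S.Player
FROM @Scores S
WHERE S.Points > (SELECT AVG(I.Points)
                  FROM @Scores I
                  WHERE I.Team = S.Team)
```

With this data the first query should return only Bob, while the second should return
Bob and Dee, since Dee is above the average of team Y but below the overall average.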
Using a Correlated Subquery in a WHERE Clause
Suppose you want a report of all OrderIDs where the customer did not purchase more
than 10% of the average quantity sold for a given product. This way you could review
these orders, and possibly contact the customers, to help determine if there was a reason
for the low quantity order. A correlated subquery in a WHERE clause can help you produce
this report. Here is a SELECT statement that produces the desired list of
OrderIDs:
SELECT DISTINCT OrderId
FROM Northwind.dbo.[Order Details] OD
WHERE Quantity <= (SELECT AVG(Quantity) * .1
                   FROM Northwind.dbo.[Order Details]
                   WHERE OD.ProductID = ProductID)
The correlated subquery in the above command is contained within the parentheses
following the less-than-or-equal sign in the WHERE clause. Here you can see this
correlated subquery contains a reference to "OD.ProductID". This reference compares the
outer query's "ProductID" with the inner query's "ProductID". When this query is
executed, the SQL engine will execute the inner query, the correlated subquery, for each
"[Order Details]" record. This inner query will calculate the average "Quantity" for the
particular "ProductID" for the candidate row being processed in the outer query. The
WHERE clause then checks whether the "Quantity" of the candidate row is no more than 10%
of that average. If it is, the row identified by the outer query is
placed in the record set that will be returned from the complete T-SQL SELECT
statement.
The code below is another example that uses a correlated subquery in the WHERE clause
to display the top two customers, based on the dollar amount associated with their
orders, per region. You might want to perform a query like this so you can reward these
customers, since they buy the most per region.
SELECT C1.CompanyName,
C1.ContactName,
C1.Address,
C1.City,
C1.Country,
C1.PostalCode
FROM Northwind.dbo.Customers C1
WHERE C1.CustomerID IN (SELECT TOP 2 C2.CustomerId
FROM Northwind.dbo.[Order Details] OD
JOIN Northwind.dbo.Orders O on OD.OrderId = O.OrderID
JOIN Northwind.dbo.Customers C2 on O.CustomerID = C2.CustomerId
WHERE C2.Region = C1.Region
GROUP BY C2.Region, C2.CustomerId
ORDER BY SUM(OD.UnitPrice * OD.Quantity * (1 - OD.Discount)) DESC)
ORDER BY C1.Region
Here you can see the inner query is a correlated subquery because it references
"C1", which is the table alias for the
"Northwind.DBO.Customers" table in the outer query. This inner query uses the "Region"
value to calculate the top two customers for the region associated with the row being
processed from the outer query. If the "CustomerID" of the outer query is one of the top
two customers, then the record is placed in the record set to be returned.
Correlated Subquery in the HAVING Clause
Say your organization wants to run a yearlong incentive program to increase revenue.
Therefore, it advertises to its customers that if each order they place during the
year is over $750, they will receive a rebate at the end of the year at the rate of
$75 per order placed. Below is an example of how to calculate the rebate amount. This
example uses a correlated subquery in the HAVING clause to identify the customers that
qualify to receive the rebate.
SELECT C.CustomerID,
COUNT(*) * 75 Rebate
FROM Northwind.DBO.Customers C
JOIN Northwind.DBO.Orders O ON C.CustomerID = O.CustomerID
WHERE DATEPART(yy,OrderDate) = '1998'
GROUP BY C.CustomerId
HAVING 750 < ALL(SELECT SUM(UnitPrice * Quantity * (1 - Discount))
FROM Northwind.DBO.Orders O
JOIN Northwind.DBO.[Order Details] OD ON O.OrderID = OD.OrderID
WHERE O.CustomerID = C.CustomerId
AND DATEPART(yy,O.OrderDate) = '1998'
GROUP BY O.OrderId)
By reviewing this query, you can see that the correlated subquery in the HAVING clause
calculates the total order amount for each customer order. We use the "CustomerID" from
the outer query and the year of the order, "Datepart(yy,OrderDate)", to identify the
Order records associated with each customer that were placed in the year 1998. For these
associated records we calculate the total order amount, for each order, by summing up
all the "[Order Details]" records, using the following formula: sum(UnitPrice * Quantity
* (1-Discount)). If each and every order for a customer for the year 1998 has a total
dollar amount greater than 750, we then calculate the rebate amount in the outer query
using this formula: "Count(*) * 75".
SQL Server's query engine will only execute the inner correlated subquery in the
HAVING clause for those customer records identified in the outer query, or basically only
those customers that placed orders in 1998.
Performing an Update Statement Using a Correlated Subquery
A correlated subquery can even be used in an update statement. Here is an example:
create table A(A int, S int)
create table B(A int, B int)
set nocount on
insert into A(A) values(1)
insert into A(A) values(2)
insert into A(A) values(3)
insert into B values(1,1)
insert into B values(2,1)
insert into B values(2,1)
insert into B values(3,1)
insert into B values(3,1)
insert into B values(3,1)
update A
set S = (select sum(B)
from B
where
A.A = A group by A)
select * from A
drop table A,B
A           S
----------- -----------
1 1
2 2
3 3
In the query above, we used the correlated subquery to update column S in table A with
the sum of column B in table B for rows that have the same value in column A as the row
being updated.
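A side effect worth knowing: when the correlated subquery finds no matching rows, it
returns NULL, and the updated column is set to NULL. The sketch below, which uses the
hypothetical table variables @Totals and @Details invented for this illustration, wraps
the subquery in ISNULL so that rows without matches get 0 instead. Because a table
variable cannot be used as a column qualifier, the update goes through the alias T.

```sql
-- Hypothetical sample data (assumption, for illustration only)
DECLARE @Totals TABLE (ID INT, S INT)
DECLARE @Details TABLE (ID INT, Amount INT)

INSERT INTO @Totals (ID) VALUES (1)
INSERT INTO @Totals (ID) VALUES (2)
INSERT INTO @Details VALUES (1, 5)
INSERT INTO @Details VALUES (1, 7)

-- ID 2 has no detail rows, so SUM() yields NULL for it;
-- ISNULL turns that NULL into 0
UPDATE T
SET S = ISNULL((SELECT SUM(D.Amount)
                FROM @Details D
                WHERE D.ID = T.ID), 0)
FROM @Totals T

SELECT * FROM @Totals
```

With this data, ID 1 should end up with S = 12 and ID 2 with S = 0.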
Conclusion
A subquery and a correlated subquery are SELECT queries coded inside another query,
known as the outer query. The correlated subquery and the subquery help determine the
outcome of the result set returned by the complete query. A subquery, when executed
independently of the outer query, will return a result set, and is therefore not dependent
on the outer query. A correlated subquery, by contrast, cannot be executed independently of
the outer query because it uses one or more references to columns in the outer query to
determine the result set returned from the correlated subquery. We hope that you now
understand the difference between subqueries and correlated subqueries, and how they can be
used in your T-SQL code.