CUBE and GROUPING

Posted on December 28, 2012

0


CUBE was introduced in SQL Server 2005. When you use the CUBE operator it generates a result set of every possible combination from your set of columns; much like a CUBE would be generated if you were using analysis services. the CUBE operator isn’t something I’ve had to use much so I’ll show you an example.

Lets imagine a Company gets rated by it’s employees in various categories out of 100. This is a table that holds a summarised rating for a Company; we’ll insert a single row into this as well so we can query on it.

CREATE TABLE CompanyRating
(
    CompanyID INT,
    CareerOpportunities INT,
    CompensationAndBenefits INT,
    WorkLifeBalance INT,
    SeniorLeadership INT,
    CultureAndValues INT
)

INSERT INTO CompanyRating
 ( CompanyId, CareerOpportunities, CompensationAndBenefits,
 WorkLifeBalance, SeniorLeadership, CultureAndValues )
VALUES
 ( 1, 60, 90, 75, 90, 10 )

To see the CUBE in action we’ll write a very simple query.


SELECT
 CompanyId , CareerOpportunities, CompensationAndBenefits,
 WorkLifeBalance, SeniorLeadership, CultureAndValues
FROM
 CompanyRating
GROUP BY
 CompanyID, CareerOpportunities, CompensationAndBenefits, WorkLifeBalance,
 SeniorLeadership, CultureAndValues
 WITH CUBE
HAVING CompanyID IS NOT NULL

This produces results for all combinations specified in the GROUP BY.

results

Useful huh? What for?. Well today I had a use for it. I wanted to generate a bitwise and sum each combination so I could use it as part of a search process I’m working on; the idea is that you pass in “I’m interested in Companies with a high rating in Work Life Balance and Culture And Values” and it would return the top 25% of companies rated high for that combination. For this to be fast, I wanted to cache the combinations.

To get a bitwise easily we can use another feature introduced at the same time called GROUPING. This is used to tell us whether a particular column within the GROUP BY expression is aggregated (1) or not (0) for each row in our results. This is really useful for generating a bitwise, because whenever there is a value for a column I’ll need to sum the bit value I assign to each column to create the bitwise.

I think this is easier to explain with an example:


SELECT
 CASE WHEN GROUPING(CareerOpportunities) = 0 THEN 1 ELSE 0 END C1,
 CASE WHEN GROUPING(CompensationAndBenefits) = 0 THEN 2 ELSE 0 END C2,
 CASE WHEN GROUPING(WorkLifeBalance) = 0 THEN 4 ELSE 0 END C3,
 CASE WHEN GROUPING(SeniorLeadership) = 0 THEN 8 ELSE 0 END C4,
 CASE WHEN GROUPING(CultureAndValues) = 0 THEN 16 ELSE 0 END C5,
 CompanyID, CareerOpportunities, CompensationAndBenefits,
 WorkLifeBalance, SeniorLeadership, CultureAndValues
FROM
 CompanyRating
GROUP BY
 CompanyID, CareerOpportunities, CompensationAndBenefits, WorkLifeBalance,
 SeniorLeadership, CultureAndValues
 WITH CUBE
HAVING CompanyID IS NOT NULL

Here I’ve allocated each column a unique bit value; 1,2,4,8,16,32.  The GROUPING function is used to say , “is WorkLifeBalance not aggregated (so displayed) for this row?”. I want to have the bit value if this is 0 (zero – not aggregated). To finish up this example I’ll add the bit values to get the bitwise and add the rating values to get a total per bitwise.


;WITH Ratings
AS
(
SELECT
 CASE WHEN GROUPING(CareerOpportunities) = 0 THEN 1 ELSE 0 END C1,
 CASE WHEN GROUPING(CompensationAndBenefits) = 0 THEN 2 ELSE 0 END C2,
 CASE WHEN GROUPING(WorkLifeBalance) = 0 THEN 4 ELSE 0 END C3,
 CASE WHEN GROUPING(SeniorLeadership) = 0 THEN 8 ELSE 0 END C4,
 CASE WHEN GROUPING(CultureAndValues) = 0 THEN 16 ELSE 0 END C5,
 CompanyID, CareerOpportunities, CompensationAndBenefits,
 WorkLifeBalance, SeniorLeadership, CultureAndValues
FROM
 CompanyRating
GROUP BY
 CompanyID, CareerOpportunities, CompensationAndBenefits, WorkLifeBalance,
 SeniorLeadership, CultureAndValues
 WITH CUBE
HAVING CompanyID IS NOT NULL
)
SELECT
CompanyID,
(C1+C2+C3+C4+C5) AS Bitwise,
ISNULL(CareerOpportunities,0)+ ISNULL( CompensationAndBenefits,0)+ ISNULL( WorkLifeBalance,0)
+ISNULL(SeniorLeadership,0) + ISNULL( CultureAndValues,0) AS RatingTotal
FROM Ratings

We can now store this in a table and use it in queries to quickly get us the top 25 percent companies with a particular combination of rating.


SELECT TOP 25 PERCENT CompanyID, RatingTotal
FROM CompanyRatingCache
WHERE Bitwise = 20 -- WorkLifeBalance + CultureAndValues
ORDER BY RatingTotal DESC

I’d like to explore other uses for CUBE and possibly ROLLUP too and hear about you have used these features. Please comment or get in touch and I hope you enjoyed this post.

Advertisements