About one non-conventional way of writing SQL code

by Boris Knizhnik,
BIK Information Services, Inc.

To my DBAs:
Susan Zweighaft, Tracy Chase, Sue Carbo and Jon Feld

Everyone knows SQL is a great language and it really pays to learn it in-depth. If only we had time to do so! We, full time GUI programmers, spend time learning intricacies of Powerbuilder, or VB or Delphi, or whatever else is out there, because we are hired to do just that. We never have enough time for serious SQL learning, because we are supposed to have a DBA to help when we need it. Yeah, right!

Well, first of all I am yet to see a place where DBAs are plenty and work for them is scarce so they really have time to work with us, mortals. Have you ever seen a DBA who could give you his undivided attention for more than 5 minutes without his beeper going off? Second, how are we supposed to know we need help? I personally can achieve whatever I want by running 8 SQL select statements and 3 cursors. How am I supposed to know I could do just that by using SQL more creatively? Where have you been before? I am certainly not going to change it now, it’s working.

I would like to present a couple of ideas I developed in attempts to learn some tricks to use SQL to achieve the results that usually are achieved in a way, that is half SQL- and half GUI-based.

I can see a lot of critic from DBA community and from SQL gurus. Hey, guys, you may know better, but I am writing for my community – GUI oriented guys.

Basic assumptions

I will be using Oracle 7 for my examples, but I am reasonably sure similar approach will work in MS SQL and Sybase.

Actually, I need an SQL function called DECODE in Oracle. There is a similar feature in MS SQL server and Sybase called CASE clause with similar capabilities but a lot more awkward syntax. Nevertheless, it can be used if you don’t have Oracle.

A couple of words about DECODE function:

DECODE(expr, comp_value1, subst_value1, comp_value2, subst_value2, … ,subst_value_if_nomatch)

This function will compare expr with the list of predefine values. If expr=comp_value1 then it returns subst_value1, if expr=comp_value2 then it returns subst_value2. If expr isn’t equal to any of the values, the function will return subst_value_if_nomatch. For example:

SELECT DECODE(sex, 'F', 'FEMALE', 'M', 'MALE', NULL, 'Not Reported', 'Invalid Value') FROM personnel;

Demo table

Let’s play with a demo table. Actually the table I will be using is a slightly modified version of the table that comes with Oracle 8 as a demo table emp. That is why I will call mine emp1. A script to create it is in Attachment A.

Table emp1:

empno ename job sal comm deptno

7369 SMITH 25 800 20

7499 ALLEN 23 1600 300 30

7521 WARD 23 1250 500 30

7566 JONES 13 2975 20

7654 MARTIN 23 1250 1400 30

7698 BLAKE 13 2850 30

7782 CLARK 13 2450 10

7788 SCOTT 20 3000 20

7839 KING 12 5000 10

7844 TURNER 23 1500 30

7876 ADAMS 25 1100 20

7900 JAMES 25 950 30

7902 FORD 20 3000 20

7934 MILLER 25 1300 10

Using commas in front of column names

I found it extremely convenient to list column names with commas in front of them rather than after. If you write your select statement while debugging it as

SELECT empno
, ename
,  job
, sal
,  comm
,  deptno 
 FROM emp1

you can always comment out pieces of it without caring too much about leftover commas. It just makes you life so much easier.

Using decoded computed items

We all know one can do calculations in select statements, but usually we come up with expressions that make some kind of economic sense, such as

SELECT empno
, ename
, sal
, sal * 0.02 employer_contribution_401K 
FROM emp1

Well, what about columns, which don’t seem to make that much sense:

SELECT empno , ename , job , comm , deptno , decode(job,'25',1,0) clerk , decode(job,'23',1,0) salesman , decode(job,'13',1,0) manager , decode(job,'20',1,0) analyst , decode(job,'12',1,0) president FROM emp1 ;

Here are the results:

empno Ename job comm deptno clerk salesman manager analyst president

7369 SMITH 25 20 1 0 0 0 0

7499 ALLEN 23 300 30 0 1 0 0 0

7521 WARD 23 500 30 0 1 0 0 0

7566 JONES 13 20 0 0 1 0 0

7654 MARTIN 23 1400 30 0 1 0 0 0

7698 BLAKE 13 30 0 0 1 0 0

7782 CLARK 13 10 0 0 1 0 0

7788 SCOTT 20 20 0 0 0 1 0

7839 KING 12 10 0 0 0 0 1

7844 TURNER 23 30 0 1 0 0 0

7876 ADAMS 25 20 1 0 0 0 0

7900 JAMES 25 30 1 0 0 0 0

7902 FORD 20 20 0 0 0 1 0

7934 MILLER 25 10 1 0 0 0 0

First, what does a column like "decode(job,'25',1,0) clerk" represent ? It will contain 1 if the person is a clerk and 0 if not. Similarly, column ‘salesman’ will be equal to 1 if a person is a salesman or 0 if not.

Essentially, what we just did, we created self-explanatory column names that could be used for documentation purposes and/or conditions:
If you are lucky and your DBA allows you to create a view like that, let’s call it emp_jobs_vw you can select all managers by select statement :

SELECT * FROM emp_jobs_vw WHERE manager = 1;

If you can’t create views you can still write your like this:

SELECT * FROM ( SELECT empno , ename , job , comm , deptno , decode(job,'25',1,0) clerk , decode(job,'23',1,0) salesman , decode(job,'13',1,0) manager , decode(job,'20',1,0) analyst decode(job,'12',1,0) president FROM emp1 WHERE manager = 1

Well, it may not look such a big deal since you can do the same thing by

SELECT * FROM emp1 WHERE job='13';

but I think it is a lot more cryptic than ‘manager = 1’.

Obviously we are loosing something here: If you had an index on column ‘job’, your optimizer could have used it for WHERE job=’13’ and wouldn’t use it for ‘manager = 1’. This is true but I encourage you to try it first and see whether performance goes down significantly and also consider some of the other suggestions in this article before you discard these tricks as inefficient. Consider this: Your total response time consists of the time it takes server to process your requests and the time it takes your GUI to prepare results for displaying. From this prospective you always have to weigh superefficient SQL that returns results in inconvenient format and your having to massage it in your GUI program versus maybe less efficient SQL that returns results in a ready-to-use format. I am not even mentioning maintenance problems, etc.

Also, don’t get me wrong – I am not lobbying for usage of words for all possible columns that contain code instead of English word. Certain applications or databases seem to lean heavily on a few concepts for which it may be convenient to have English-like words.

Let’s assume we have an application where we want to give a bird’s view of a store.

Give WHERE conditions names

I will be trying to avoid usage of views, because that way you can see the entire code and also because views are not always readily accepted by DBAs if they are coming from programmers. I must admit, though, inserting views as inline code makes the final select look a lot more complex and complicated than it is. So, if you have rights to create views just do so and it will simplify all the SQL statements significantly.

Let’s consider this SQL:

SELECT empno
, ename
, job
, sal
, decode(clerk+salesman,0,0,1) low_in_a_food_chain
, manager * decode(sign(sal - 2900),-1,1,0) underpaid_manager
, salesman * decode(sign(sal - 1300), 1,1,0) overpaid_salesman
, clerk * decode(sign(sal - 900), 1,1,0)
    * decode(sign(sal - 1200), -1,1,0) mid_level_clerk
FROM
( SELECT empno
, ename
, job
, sal
, comm
, deptno
, decode(deptno,10,1,0) dept10
, decode(deptno,20,1,0) dept20
, decode(deptno,30,1,0) dept30
, decode(job,'25',1,0) clerk
, decode(job,'23',1,0) salesman
, decode(job,'13',1,0) manager
, decode(job,'20',1,0) analyst
, decode(job,'12',1,0) president
FROM emp1
) ;

Here are the results:

empno ename Job sal Low in a food chain Underpaid manager Overpaid salesman mid level clerk

7369 SMITH 25 800 1 0 0 0

7499 ALLEN 23 1600 1 0 1 0

7521 WARD 23 1250 1 0 0 0

7566 JONES 13 2975 0 0 0 0

7654 MARTIN 23 1250 1 0 0 0

7698 BLAKE 13 2850 0 1 0 0

7782 CLARK 13 2450 0 1 0 0

7788 SCOTT 20 3000 0 0 0 0

7839 KING 12 5000 0 0 0 0

7844 TURNER 23 1500 1 0 1 0

7876 ADAMS 25 1100 1 0 0 1

7900 JAMES 25 950 1 0 0 1

7902 FORD 20 3000 0 0 0 0

7934 MILLER 25 1300 1 0 0 0

In the inner SELECT we added columns like decode(deptno,10,1,0) dept10 – the same approach – we will have this column equal 1 if this employee works for department 10, and 0 if not, etc.

The outer SELECT assigns names to certain conditions. For instance,

manager * decode(sign(sal - 2900),-1,1,0) underpaid_manager

column ‘underpaid_manager’ will be equal to 1 if a person is a manager (manager=1) and sal<2900.

What we did here is we assigned names to relatively complex conditions. If a programmer needs to select underpaid managers he/she may code WHERE underpaid_manager =1. Those of you who worked in COBOL environment would probably recognize the 88-level names.

If we could create a view like above mentioned, how much easier the maintenance would be, because the intricacies of who is underpaid manager and who is overpaid salesman would be hidden in the view rather than buried in guts of a GUI program!

With this select statement your GUI program can use columns ‘underpaid_manager’ or ‘overpaid salesman’ as auxiliary columns to select graphical presentation, such as different background colors or bitmaps. It needn’t worry anymore about the complexity of the condition, causing this difference, no complex logic to determine who is who in your program is required. Isn’t it wonderful?

Counting number of records

Now, as we agreed, we are trying to give a bird’s view on our company. Let’s get some counts:

SELECT SUM(dept10) employees_in_dept10
,SUM(dept20) employees_in_dept20
,SUM(dept30) employees_in_dept30
,count(empno) n_of_employees
,SUM(clerk) n_of_clerks
,SUM(salesman) n_of_salesmen
,SUM(manager) n_of_managers
,SUM(analyst) n_of_analysts
,SUM(president) n_of_presidents
FROM
(
SELECT empno
, ename
, job
, sal
, comm
, deptno
, decode(deptno,10,1,0) dept10
, decode(deptno,20,1,0) dept20
, decode(deptno,30,1,0) dept30
, decode(job,'25',1,0) clerk
, decode(job,'23',1,0) salesman
, decode(job,'13',1,0) manager
, decode(job,'20',1,0) analyst
, decode(job,'12',1,0) president
FROM emp1
)
;

Here are the results:

Employees in dept10	Employees in dept20	Employees in dept30	N of employees	N of clerks	N of sales men	N of mana gers	N of ana lysts	N of presi dents
3	5	6	14	4	4	3	2	1

This code would give us counts of people by all three departments and also by job. Now, honest, how many of us if application requires these kind of counts would with very little hesitation code eight separate queries for that, or if we are doing it after drinking our morning coffee we will end up with a union such as

SELECT 'D' record_type , TO_CHAR(deptno) fld1 , COUNT(empno) count1 FROM emp1 GROUP BY deptno UNION SELECT 'J' record_type , job fld1 , COUNT(job) count1 FROM emp1 GROUP BY job ;

And after a little coding efforts in your GUI program to tell what kind of record (‘D’ or ‘J’) you are looking at and decoding the values we will get the same result.

Again, obviously the latter solution is in a way universal, but if a number of items you are interested in is fixed I think the former solution is easier.

Counting number of records satisfying certain conditions

We can get pretty sophisticated in how to use these artificial columns. Let’s try to count number of clerks in the entire organization and their total salary as well as number of clerks in Department 20 and their total salary:

SELECT SUM(clerk) n_of_clerks
,SUM(clerk * sal) clerk_salary_total
,SUM(dept20 * clerk) n_of_clerks_in_dept20
,SUM(dept20 * clerk * sal) dept20_clerk_salary_total
FROM
(
SELECT empno
, ename
, job
, sal
, comm
, deptno 
, decode(deptno,10,1,0) dept10
, decode(deptno,20,1,0) dept20
, decode(deptno,30,1,0) dept30
, decode(job,'25',1,0) clerk
, decode(job,'23',1,0) salesman
, decode(job,'13',1,0) manager
, decode(job,'20',1,0) analyst
, decode(job,'12',1,0) president
FROM emp1
)
;

Here are the results:

N of clerks	Clerk salary total	N of clerks in dept20	dept20 clerk salary total
4	4150	2	1900

This select does the same thing as these two:

SELECT COUNT(empno) n_of_clerks_in_dept20 , SUM(sal) dept20_clerk_salary_total FROM emp1 WHERE deptno = '20' AND job = '25' ;

and

SELECT COUNT(empno) n_of_clerks , SUM(sal) clerk_salary_total FROM emp1 ;

If you are really proficient with SQL you probably know that you can put a complex expression as an argument for a ‘SUM’ or ‘COUNT’ function and therefore get all the results in one select statement. In such case these expressions will essentially be of the same level of complexity as mine are, and it will just prove my point that this may be the way to go.

Substitute a set of similar queries with one SELECT statement

Some times there is a need to get lots of counters in a report. The previous query showed how to get two sets of counters in one SELECT statement. In a more generic approach, let’s say we want to count numbers of underpaid managers, overpaid salesmen, mid level clerks and those low in a food chain. Obviously, we can run four SELECT statements for each category (or, a little fancier, a union of three SELECT statements) and the only difference between them will be their WHERE condition. Depending on your database design, you may have a really convenient set of indexes for the table(s) involved, but if a number of SELECT statements is significant (in a place where I have a contract now, one of the applications would automatically create in a loop what amounts to more than 300 SELECT statements just to get counts) it may still be beneficiary to get rid of all those different WHERE conditions and do a table sweep once to get all counts we need.

Look at this example:

SELECT sum(low_in_a_food_chain) low_in_a_food_chain , sum(underpaid_manager) underpaid_managers , sum(overpaid_salesman) overpaid_salesmen , sum(mid_level_clerk) mid_level_clerks FROM ( SELECT decode(clerk+salesman,0,0,1) low_in_a_food_chain , manager * decode(sign(sal - 2900),-1,1,0) underpaid_manager , salesman * decode(sign(sal - 1300), 1,1,0) overpaid_salesman , clerk * decode(sign(sal - 900), 1,1,0) * decode(sign(sal - 1200), -1,1,0) mid_level_clerk FROM ( SELECT empno , ename , job , sal , comm , deptno , decode(deptno,10,1,0) dept10 , decode(deptno,20,1,0) dept20 , decode(deptno,30,1,0) dept30 , decode(job,'25',1,0) clerk , decode(job,'23',1,0) salesman , decode(job,'13',1,0) manager , decode(job,'20',1,0) analyst , decode(job,'12',1,0) president FROM emp1 ) ) ;

Here are the results:

Low in a food chain Underpaid managers Overpaid salesmen mid level clerks

8 2 2 2

If you think it looks threatening - it does. I just want to remind you, I am inserting what could be a view as in-line SELECT statement, and a list of columns is given as one on a line. In any case, what matters here is not so much the time you spend typing this SELECT as the time it takes to run it. Try to use this approach if you have a lot of similar queries and you will see it saves you a lot on a total time.

Conclusion

I developed several of these approaches while working on my current project. The results of using them are very encouraging. I realize that moving some of the conditions from WHERE clause to SELECT clause, or using some complex logic to convert a logical expression into 1 or 0 slows down the server. But I have also seen some pretty complex GUI code that was doing almost the same thing but in PowerBuilder. Is it better? I don’t think so. I am not using this approach in every situation, but I keep it in mind and when the right time comes I would not hesitate to try it and compare with alternative approaches. Hey, have faith in your Server’s optimizer. What do you think – it can’t handle arithmetic expressions? And again, I keep saying that – the worst cases I have seen were caused not by a poor SQL performance, but rather misuse of SQL by a GUI program. So if we stop patching things in your GUI and let server do its job, maybe it will not be as inefficient as it appears.

ATTACHMENT A

CREATE TABLE emp1 (
empno integer NOT NULL,
ename char(10) NOT NULL,
job char(2) NOT NULL,
sal decimal(16, 2),
comm decimal(16, 2) NULL,
deptno integer NOT NULL
);

INSERT INTO emp1 VALUES (7369,'SMITH', '25', 800, NULL, 20);
INSERT INTO emp1 VALUES (7499,'ALLEN','23', 1600, 300, 30);
INSERT INTO emp1 VALUES (7521,'WARD','23',1250, 500, 30);
INSERT INTO emp1 VALUES (7566,'JONES','13',2975, NULL, 20);
INSERT INTO emp1 VALUES (7654,'MARTIN','23', 1250, 1400, 30);
INSERT INTO emp1 VALUES (7698,'BLAKE','13',2850, NULL, 30);
INSERT INTO emp1 VALUES (7782,'CLARK','13',2450, NULL, 10);
INSERT INTO emp1 VALUES (7788,'SCOTT','20',3000, NULL, 20);
INSERT INTO emp1 VALUES (7839,'KING','12',5000, NULL, 10);
INSERT INTO emp1 VALUES (7844,'TURNER','23', 1500, NULL, 30);
INSERT INTO emp1 VALUES (7876,'ADAMS','25',1100,NULL, 20);
INSERT INTO emp1 VALUES (7900,'JAMES','25', 950,NULL, 30);
INSERT INTO emp1 VALUES (7902,'FORD','20',3000,NULL, 20);
INSERT INTO emp1 VALUES (7934,'MILLER','25',1300,NULL,10);

If you have any comments about this article please send them to Boris Knizhnik borisk@bikinfo.com

empno	ename	job	sal	comm	deptno
7369	SMITH	25	800		20
7499	ALLEN	23	1600	300	30
7521	WARD	23	1250	500	30
7566	JONES	13	2975		20
7654	MARTIN	23	1250	1400	30
7698	BLAKE	13	2850		30
7782	CLARK	13	2450		10
7788	SCOTT	20	3000		20
7839	KING	12	5000		10
7844	TURNER	23	1500		30
7876	ADAMS	25	1100		20
7900	JAMES	25	950		30
7902	FORD	20	3000		20
7934	MILLER	25	1300		10

Low in a food chain	Underpaid managers	Overpaid salesmen	mid level clerks
8	2	2	2