About one non-conventional way of writing SQL code

 

by Boris Knizhnik,
BIK Information Services, Inc.
  To my DBAs:
Susan Zweighaft, Tracy Chase, Sue Carbo and Jon Feld

 

Everyone knows SQL is a great language and it really pays to learn it in-depth. If only we had time to do so! We, full time GUI programmers, spend time learning intricacies of Powerbuilder, or VB or Delphi, or whatever else is out there, because we are hired to do just that. We never have enough time for serious SQL learning, because we are supposed to have a DBA to help when we need it. Yeah, right!

Well, first of all I am yet to see a place where DBAs are plenty and work for them is scarce so they really have time to work with us, mortals. Have you ever seen a DBA who could give you his undivided attention for more than 5 minutes without his beeper going off? Second, how are we supposed to know we need help? I personally can achieve whatever I want by running 8 SQL select statements and 3 cursors. How am I supposed to know I could do just that by using SQL more creatively? Where have you been before? I am certainly not going to change it now, it’s working.

I would like to present a couple of ideas I developed in attempts to learn some tricks to use SQL to achieve the results that usually are achieved in a way, that is half SQL- and half GUI-based.

I can see a lot of critic from DBA community and from SQL gurus. Hey, guys, you may know better, but I am writing for my community – GUI oriented guys.

Basic assumptions

I will be using Oracle 7 for my examples, but I am reasonably sure similar approach will work in MS SQL and Sybase.

Actually, I need an SQL function called DECODE in Oracle. There is a similar feature in MS SQL server and Sybase called CASE clause with similar capabilities but a lot more awkward syntax. Nevertheless, it can be used if you don’t have Oracle.

A couple of words about DECODE function:
 
DECODE(expr, comp_value1, subst_value1, comp_value2, subst_value2, … ,subst_value_if_nomatch)

This function will compare expr with the list of predefine values. If expr=comp_value1 then it returns subst_value1, if expr=comp_value2 then it returns subst_value2. If expr isn’t equal to any of the values, the function will return subst_value_if_nomatch. For example:
 
SELECT DECODE(sex, 'F', 'FEMALE', 'M', 'MALE', NULL, 'Not Reported', 'Invalid Value') FROM personnel;

Demo table

Let’s play with a demo table. Actually the table I will be using is a slightly modified version of the table that comes with Oracle 8 as a demo table emp. That is why I will call mine emp1. A script to create it is in Attachment A.

Table emp1:
 
empno ename job sal comm deptno
7369 SMITH  25 800   20
7499 ALLEN  23 1600 300 30
7521 WARD  23 1250 500 30
7566 JONES  13 2975   20
7654 MARTIN  23 1250 1400 30
7698 BLAKE  13 2850   30
7782 CLARK  13 2450   10
7788 SCOTT  20 3000   20
7839 KING  12 5000   10
7844 TURNER  23 1500   30
7876 ADAMS  25 1100   20
7900 JAMES  25 950   30
7902 FORD  20 3000   20
7934 MILLER  25 1300   10

Using commas in front of column names

I found it extremely convenient to list column names with commas in front of them rather than after. If you write your select statement while debugging it as
 
SELECT empno
, ename
,  job
, sal
,  comm
,  deptno 
 FROM emp1

you can always comment out pieces of it without caring too much about leftover commas. It just makes you life so much easier.

Using decoded computed items

We all know one can do calculations in select statements, but usually we come up with expressions that make some kind of economic sense, such as
 
SELECT empno
, ename
, sal
, sal * 0.02 employer_contribution_401K 
FROM emp1

Well, what about columns, which don’t seem to make that much sense:
 
SELECT empno
, ename
, job
, comm
, deptno
, decode(job,'25',1,0) clerk
, decode(job,'23',1,0) salesman
, decode(job,'13',1,0) manager
, decode(job,'20',1,0) analyst
, decode(job,'12',1,0) president
FROM emp1
;

Here are the results:
empno Ename job comm deptno clerk salesman manager analyst president
7369 SMITH  25   20 1 0 0 0 0
7499 ALLEN  23 300 30 0 1 0 0 0
7521 WARD  23 500 30 0 1 0 0 0
7566 JONES  13   20 0 0 1 0 0
7654 MARTIN  23 1400 30 0 1 0 0 0
7698 BLAKE  13   30 0 0 1 0 0
7782 CLARK  13   10 0 0 1 0 0
7788 SCOTT  20   20 0 0 0 1 0
7839 KING  12   10 0 0 0 0 1
7844 TURNER  23   30 0 1 0 0 0
7876 ADAMS  25   20 1 0 0 0 0
7900 JAMES  25   30 1 0 0 0 0
7902 FORD  20   20 0 0 0 1 0
7934 MILLER  25   10 1 0 0 0 0

First, what does a column like "decode(job,'25',1,0) clerk" represent ? It will contain 1 if the person is a clerk and 0 if not. Similarly, column ‘salesman’ will be equal to 1 if a person is a salesman or 0 if not.

Essentially, what we just did, we created self-explanatory column names that could be used for documentation purposes and/or conditions:
If you are lucky and your DBA allows you to create a view like that, let’s call it emp_jobs_vw you can select all managers by select statement :
SELECT * FROM emp_jobs_vw WHERE manager = 1;

If you can’t create views you can still write your like this:
SELECT * FROM
(
SELECT empno
, ename
, job
, comm
, deptno
, decode(job,'25',1,0) clerk
, decode(job,'23',1,0) salesman
, decode(job,'13',1,0) manager
, decode(job,'20',1,0) analyst
 decode(job,'12',1,0) president
FROM emp1

WHERE manager = 1

Well, it may not look such a big deal since you can do the same thing by
SELECT * FROM emp1 WHERE job='13';
but I think it is a lot more cryptic than ‘manager = 1’.

Obviously we are loosing something here: If you had an index on column ‘job’, your optimizer could have used it for WHERE job=’13’ and wouldn’t use it for ‘manager = 1’. This is true but I encourage you to try it first and see whether performance goes down significantly and also consider some of the other suggestions in this article before you discard these tricks as inefficient. Consider this: Your total response time consists of the time it takes server to process your requests and the time it takes your GUI to prepare results for displaying. From this prospective you always have to weigh superefficient SQL that returns results in inconvenient format and your having to massage it in your GUI program versus maybe less efficient SQL that returns results in a ready-to-use format. I am not even mentioning maintenance problems, etc.

Also, don’t get me wrong – I am not lobbying for usage of words for all possible columns that contain code instead of English word. Certain applications or databases seem to lean heavily on a few concepts for which it may be convenient to have English-like words.

Let’s assume we have an application where we want to give a bird’s view of a store.

Give WHERE conditions names

I will be trying to avoid usage of views, because that way you can see the entire code and also because views are not always readily accepted by DBAs if they are coming from programmers. I must admit, though, inserting views as inline code makes the final select look a lot more complex and complicated than it is. So, if you have rights to create views just do so and it will simplify all the SQL statements significantly.

Let’s consider this SQL:
SELECT empno    
, ename         
, job   
, sal   
, decode(clerk+salesman,0,0,1) low_in_a_food_chain      
, manager * decode(sign(sal - 2900),-1,1,0) underpaid_manager   
, salesman * decode(sign(sal - 1300), 1,1,0) overpaid_salesman   
, clerk * decode(sign(sal - 900), 1,1,0) 
    * decode(sign(sal - 1200), -1,1,0) mid_level_clerk  
FROM  
( SELECT empno  
, ename         
, job   
, sal   
, comm  
, deptno        
, decode(deptno,10,1,0) dept10  
, decode(deptno,20,1,0) dept20  
, decode(deptno,30,1,0) dept30  
, decode(job,'25',1,0) clerk    
, decode(job,'23',1,0) salesman         
, decode(job,'13',1,0) manager  
, decode(job,'20',1,0) analyst  
, decode(job,'12',1,0) president 
FROM emp1 
) ;     

Here are the results:
empno ename Job sal Low in a food chain Underpaid manager Overpaid salesman mid level clerk
7369 SMITH  25 800 1 0 0 0
7499 ALLEN  23 1600 1 0 1 0
7521 WARD  23 1250 1 0 0 0
7566 JONES  13 2975 0 0 0 0
7654 MARTIN  23 1250 1 0 0 0
7698 BLAKE  13 2850 0 1 0 0
7782 CLARK  13 2450 0 1 0 0
7788 SCOTT  20 3000 0 0 0 0
7839 KING  12 5000 0 0 0 0
7844 TURNER  23 1500 1 0 1 0
7876 ADAMS  25 1100 1 0 0 1
7900 JAMES  25 950 1 0 0 1
7902 FORD  20 3000 0 0 0 0
7934 MILLER  25 1300 1 0 0 0

In the inner SELECT we added columns like decode(deptno,10,1,0) dept10 – the same approach – we will have this column equal 1 if this employee works for department 10, and 0 if not, etc.

The outer SELECT assigns names to certain conditions. For instance,
manager * decode(sign(sal - 2900),-1,1,0) underpaid_manager
column ‘underpaid_manager’ will be equal to 1 if a person is a manager (manager=1) and sal<2900.

What we did here is we assigned names to relatively complex conditions. If a programmer needs to select underpaid managers he/she may code WHERE underpaid_manager =1. Those of you who worked in COBOL environment would probably recognize the 88-level names.

If we could create a view like above mentioned, how much easier the maintenance would be, because the intricacies of who is underpaid manager and who is overpaid salesman would be hidden in the view rather than buried in guts of a GUI program!

With this select statement your GUI program can use columns ‘underpaid_manager’ or ‘overpaid salesman’ as auxiliary columns to select graphical presentation, such as different background colors or bitmaps. It needn’t worry anymore about the complexity of the condition, causing this difference, no complex logic to determine who is who in your program is required. Isn’t it wonderful?

Counting number of records

Now, as we agreed, we are trying to give a bird’s view on our company. Let’s get some counts:
 
SELECT SUM(dept10) employees_in_dept10
,SUM(dept20) employees_in_dept20
,SUM(dept30) employees_in_dept30
,count(empno) n_of_employees
,SUM(clerk) n_of_clerks
,SUM(salesman) n_of_salesmen
,SUM(manager) n_of_managers
,SUM(analyst) n_of_analysts
,SUM(president) n_of_presidents
FROM
(
SELECT empno
, ename
, job
, sal
, comm
, deptno
, decode(deptno,10,1,0) dept10
, decode(deptno,20,1,0) dept20
, decode(deptno,30,1,0) dept30
, decode(job,'25',1,0) clerk
, decode(job,'23',1,0) salesman
, decode(job,'13',1,0) manager
, decode(job,'20',1,0) analyst
, decode(job,'12',1,0) president
FROM emp1
)
;
Here are the results:
Employees in dept10 Employees in dept20 Employees in dept30 N of employees N of clerks N of sales men N of mana gers N of ana lysts N of presi dents
3 5 6 14 4 4 3 2 1

This code would give us counts of people by all three departments and also by job. Now, honest, how many of us if application requires these kind of counts would with very little hesitation code eight separate queries for that, or if we are doing it after drinking our morning coffee we will end up with a union such as
 
SELECT 'D' record_type
, TO_CHAR(deptno) fld1
, COUNT(empno) count1
FROM emp1
GROUP BY deptno
UNION
SELECT 'J' record_type
, job fld1
, COUNT(job) count1
FROM emp1
GROUP BY job
;

And after a little coding efforts in your GUI program to tell what kind of record (‘D’ or ‘J’) you are looking at and decoding the values we will get the same result.

Again, obviously the latter solution is in a way universal, but if a number of items you are interested in is fixed I think the former solution is easier.

Counting number of records satisfying certain conditions

We can get pretty sophisticated in how to use these artificial columns. Let’s try to count number of clerks in the entire organization and their total salary as well as number of clerks in Department 20 and their total salary:
 
SELECT SUM(clerk) n_of_clerks
,SUM(clerk * sal) clerk_salary_total
,SUM(dept20 * clerk) n_of_clerks_in_dept20
,SUM(dept20 * clerk * sal) dept20_clerk_salary_total
FROM
(
SELECT empno
, ename
, job
, sal
, comm
, deptno 
, decode(deptno,10,1,0) dept10
, decode(deptno,20,1,0) dept20
, decode(deptno,30,1,0) dept30
, decode(job,'25',1,0) clerk
, decode(job,'23',1,0) salesman
, decode(job,'13',1,0) manager
, decode(job,'20',1,0) analyst
, decode(job,'12',1,0) president
FROM emp1
)
;
Here are the results:
N of clerks Clerk salary total N of clerks in dept20 dept20 clerk salary total
4 4150 2 1900

This select does the same thing as these two:
 
SELECT COUNT(empno) n_of_clerks_in_dept20
, SUM(sal) dept20_clerk_salary_total
FROM emp1
WHERE deptno = '20'
AND job = '25'
;
and
SELECT COUNT(empno) n_of_clerks
, SUM(sal) clerk_salary_total
FROM emp1
;

If you are really proficient with SQL you probably know that you can put a complex expression as an argument for a ‘SUM’ or ‘COUNT’ function and therefore get all the results in one select statement. In such case these expressions will essentially be of the same level of complexity as mine are, and it will just prove my point that this may be the way to go.

Substitute a set of similar queries with one SELECT statement

Some times there is a need to get lots of counters in a report. The previous query showed how to get two sets of counters in one SELECT statement. In a more generic approach, let’s say we want to count numbers of underpaid managers, overpaid salesmen, mid level clerks and those low in a food chain. Obviously, we can run four SELECT statements for each category (or, a little fancier, a union of three SELECT statements) and the only difference between them will be their WHERE condition. Depending on your database design, you may have a really convenient set of indexes for the table(s) involved, but if a number of SELECT statements is significant (in a place where I have a contract now, one of the applications would automatically create in a loop what amounts to more than 300 SELECT statements just to get counts) it may still be beneficiary to get rid of all those different WHERE conditions and do a table sweep once to get all counts we need.

Look at this example:
 
SELECT sum(low_in_a_food_chain) low_in_a_food_chain
, sum(underpaid_manager) underpaid_managers
, sum(overpaid_salesman) overpaid_salesmen
, sum(mid_level_clerk) mid_level_clerks
FROM
(
SELECT decode(clerk+salesman,0,0,1) low_in_a_food_chain
, manager
  * decode(sign(sal - 2900),-1,1,0)  
  underpaid_manager
, salesman 
  * decode(sign(sal - 1300), 1,1,0) 
overpaid_salesman 
, clerk
  * decode(sign(sal - 900), 1,1,0)  
  * decode(sign(sal - 1200), -1,1,0)  
  mid_level_clerk 
FROM 
(
SELECT empno
, ename
, job
, sal
, comm
, deptno 
, decode(deptno,10,1,0) dept10
, decode(deptno,20,1,0) dept20
, decode(deptno,30,1,0) dept30
, decode(job,'25',1,0) clerk
, decode(job,'23',1,0) salesman
, decode(job,'13',1,0) manager
, decode(job,'20',1,0) analyst
, decode(job,'12',1,0) president
FROM emp1
)
)
;
Here are the results:
Low in a food chain Underpaid managers Overpaid salesmen mid level clerks
8 2 2 2

If you think it looks threatening - it does. I just want to remind you, I am inserting what could be a view as in-line SELECT statement, and a list of columns is given as one on a line. In any case, what matters here is not so much the time you spend typing this SELECT as the time it takes to run it. Try to use this approach if you have a lot of similar queries and you will see it saves you a lot on a total time.

Conclusion

I developed several of these approaches while working on my current project. The results of using them are very encouraging. I realize that moving some of the conditions from WHERE clause to SELECT clause, or using some complex logic to convert a logical expression into 1 or 0 slows down the server. But I have also seen some pretty complex GUI code that was doing almost the same thing but in PowerBuilder. Is it better? I don’t think so. I am not using this approach in every situation, but I keep it in mind and when the right time comes I would not hesitate to try it and compare with alternative approaches. Hey, have faith in your Server’s optimizer. What do you think – it can’t handle arithmetic expressions? And again, I keep saying that – the worst cases I have seen were caused not by a poor SQL performance, but rather misuse of SQL by a GUI program. So if we stop patching things in your GUI and let server do its job, maybe it will not be as inefficient as it appears.

ATTACHMENT A

CREATE TABLE emp1 (
empno integer NOT NULL,
ename char(10) NOT NULL,
job char(2) NOT NULL,
sal decimal(16, 2),
comm decimal(16, 2) NULL,
deptno integer NOT NULL
);

INSERT INTO emp1 VALUES (7369,'SMITH', '25', 800, NULL, 20);
INSERT INTO emp1 VALUES (7499,'ALLEN','23', 1600, 300, 30);
INSERT INTO emp1 VALUES (7521,'WARD','23',1250, 500, 30);
INSERT INTO emp1 VALUES (7566,'JONES','13',2975, NULL, 20);
INSERT INTO emp1 VALUES (7654,'MARTIN','23', 1250, 1400, 30);
INSERT INTO emp1 VALUES (7698,'BLAKE','13',2850, NULL, 30);
INSERT INTO emp1 VALUES (7782,'CLARK','13',2450, NULL, 10);
INSERT INTO emp1 VALUES (7788,'SCOTT','20',3000, NULL, 20);
INSERT INTO emp1 VALUES (7839,'KING','12',5000, NULL, 10);
INSERT INTO emp1 VALUES (7844,'TURNER','23', 1500, NULL, 30);
INSERT INTO emp1 VALUES (7876,'ADAMS','25',1100,NULL, 20);
INSERT INTO emp1 VALUES (7900,'JAMES','25', 950,NULL, 30);
INSERT INTO emp1 VALUES (7902,'FORD','20',3000,NULL, 20);
INSERT INTO emp1 VALUES (7934,'MILLER','25',1300,NULL,10);

If you have any comments about this article please send them to Boris Knizhnik  borisk@bikinfo.com


 

Back to BIK Information Services home page  Go to top


©1997-2015 BIK Information Services, Inc. All Rights Reserved.
Last modified on Monday, 13-Sep-2004 21:55:11 EDT