Assuming you are saddled with the responsibility of tuning the following query for performance and readability. You are allowed to change select query, add indexes or columns to the schema, however, you are not permitted to change existing columns schema.
Tune the query for performance and readability and explain why your changes can lead to any improvement.
First of all we will execute the following query in SSMS to create the necessary sample schema and data for this exercise.
CREATE DATABASE testDB GO USE testDB Go SET NOCOUNT ON GO CREATE TABLE Main (a INT, b DECIMAL(18,2) , c DATETIME, e NVARCHAR(25)) CREATE TABLE Sub (a INT, b INT, c DATETIME) DECLARE @a int SET @a=0 WHILE @a<40000 BEGIN INSERT Main SELECT @a,CAST(@a AS DECIMAL(18,2)) - 100, GETDATE() ,LEFT(CAST(NEWID() AS VARCHAR(255)),5)+ REPLACE(STR(CAST(@a+1 AS NVARCHAR(20)), 5, 0), ' ', '0') + RIGHT(CAST(NEWID() AS VARCHAR (255)),5) INSERT Sub SELECT @a+2,@a ,GETDATE() SET @a=@a+1 END GO CREATE FUNCTION dbo.Check_Exists (@ID int) RETURNS INT AS BEGIN DECLARE @Exists BIT IF EXISTS (SELECT b FROM Sub WHERE b = @ID) BEGIN SET @Exists = 0 END ELSE BEGIN SET @Exists = 1 END RETURN @Exists END GO
The following query is the query to be improved for performance and readability:
IF (SELECT SUM(b) FROM Main WHERE a IN (SELECT b FROM Sub WHERE DATEDIFF(DAY, c, GETDATE()-2) <=0 )) > 0 SELECT main.c, dbo.Check_Exists(main.a) exists_ ,e,SUBSTRING(e,6,5) seq_ , CAST(1000/SUM(main.b) AS DECIMAL(18,2)) divide_ FROM Main main INNER JOIN Sub sub ON main.a=sub.b WHERE DATEDIFF(DAY, main.c, GETDATE()-2) <=0 AND DATEDIFF(DAY, sub.c, GETDATE()-2) <=0 AND CAST(SUBSTRING(e,6,5) AS INT) BETWEEN 1000 AND 5000 GROUP BY main.c,main.a,main.e ELSE SELECT NULL,NULL,NULL,NULL
CREATE CLUSTERED INDEX IX_main_a ON Main(a) CREATE CLUSTERED INDEX IX_sub_b ON Sub(b) ALTER TABLE Main ADD computed_main_e AS CAST(SUBSTRING(e,6,5) AS INT) PERSISTED CREATE NONCLUSTERED INDEX IX_main_computed_test_main_e ON Main(computed_main_e,A,B,C) IF (SELECT SUM(TM.B) FROM MAIN TM INNER JOIN SUB TS ON TM.A=TS.B WHERE DATEDIFF(DAY, TS.C, GETDATE()-2) <=0) > 0 BEGIN SELECT TM.C, dbo.Check_Exists(TM.A) AS exists_ ,TM.computed_main_e, TM.computed_main_e AS sequence_ , CAST(1000/SUM(TM.B) AS DECIMAL(18,2)) AS divide_ FROM MAIN TM INNER JOIN SUB TS ON TM.A=TS.B WHERE TM.computed_main_e BETWEEN 1000 AND 5000 AND DATEDIFF(DAY, TS.C, GETDATE()-2) <=0 AND DATEDIFF(DAY, TM.C, GETDATE()-2) <=0 GROUP BY TM.C,TM.A,TM.computed_main_e END ELSE BEGIN SELECT NULL,NULL,NULL,NULL END
Refactored Query Explaination:
Adding clustered indexes on the Main(a) and Sub(b) tables and columns will prevent the expensive table scan therefore improving performance of the query.
Converting the expression on
e column into a computed column with a non-clustered index created for it further improved the performance of the query. Also the
PERSISTED clause on the computed column definition means that the expression evaluation/calculation will be done once during insertion.
I could have done the same for expressions on
c column, if not that
GETDATE() function is non-deterministic, hence I could not create an index for it. However based on the recommendation of the actual execution plan, I included columns
c, b, and a in the non-clustered index. This resulted in an index seek.
I also reordered the expressions in the predicate to ensure that the expression
TM.computed_main_e BETWEEN 1000 AND 5000 comes first before others. This expression is less expensive and if its evaluation yielded false, other more expensive expressions will not be evaluated.
I added proper indents for the conditional statements and flow, this made the code more readable. I also used the
AS keyword to properly denote all aliases in the query.
More from @sqldibia