rda submitsql: Questions on query performance

Hello,
Could you please clarify certain questions for me, connected with general
query execution speed...
1) I have this table with several fields, and one of them is of type
smalldatetime. The dates are always exact - ie, with time part equal to
"00:00". There is another table with smalldatetime field, and I want those
two to join in one query. The query runs, but somewhat slow. Question: Are
joins done with smalldatetime field types somewhat bad? I assume, if those
would be int fields, the join would run faster. However, I have no chance to
convert those fields to int. I have made an index in the first table on this
field, and on the other table it's one of the fields of primary key
(consisting from two fields, int and smalldatetime), but the index does not
seem to have impact on the join.
2) Is there a difference how I arrange the criteria parts which are joined
with 'AND' operator?
Example: SELECT ... FROM ... WHERE criteriaA AND criteriaB AND criteriaC.
Let's say, that criteriaB would be true for 1/4 of the all records, but
criteriaA is true for 5/4 of the all records. If the criteriaA comes first,
isn't the SQL serve engine somewhat more busy first selecting the 5/4 of all
the records, and only then calculating the criteriaB? Say, if I would
arrange the criteria so that criteriaB comes first, maybe the query will be
faster, because the majority of records will be discarded early? Of course,
if the server engine just loops over all the records, calculating criteria
for every row, then the order of parts would have no impact on performance.
3) Is there impact on performance, if I minimize the size of subqueries?
Example:
a) SELECT * FROM (SELECT .. FROM .. ) AS MyTable WHERE criteriaA
b) SELECT * FROM (SELECT .. FROM .. WHERE criteriaA) AS MyTable
Would (b) run faster, because the server engine can trim the dataset
earlier? Of course, the example above does not make sense why it should use
subqueries, but in real life the outer (parent) query would have multiple
joins with additional queries.
Thanks,
Pavils
"Pavils Jurjans" <pavils@.mailbox.riga.lv> wrote in message
news:uSkhj%235KFHA.3788@.tk2msftngp13.phx.gbl...
> Hello,
> Could you please clarify certain questions for me, connected with general
> query execution speed...
> 1) I have this table with several fields, and one of them is of type
> smalldatetime. The dates are always exact - ie, with time part equal to
> "00:00". There is another table with smalldatetime field, and I want those
> two to join in one query. The query runs, but somewhat slow. Question: Are
> joins done with smalldatetime field types somewhat bad? I assume, if those
> would be int fields, the join would run faster. However, I have no chance
> to convert those fields to int. I have made an index in the first table on
> this field, and on the other table it's one of the fields of primary key
> (consisting from two fields, int and smalldatetime), but the index does
> not seem to have impact on the join.
Joins on SMALLDATETIME are fine. The problem is may be the join criterion.
Expressions in the join criteria can slow down join performance.

> 2) Is there a difference how I arrange the criteria parts which are joined
> with 'AND' operator?
> Example: SELECT ... FROM ... WHERE criteriaA AND criteriaB AND criteriaC.
No. SQL Server will pick the order based on the statistics for the table,
not the lexical order of the operators.

> Let's say, that criteriaB would be true for 1/4 of the all records, but
> criteriaA is true for 5/4 of the all records. If the criteriaA comes
> first, isn't the SQL serve engine somewhat more busy first selecting the
> 5/4 of all the records, and only then calculating the criteriaB? Say, if I
> would arrange the criteria so that criteriaB comes first, maybe the query
> will be faster, because the majority of records will be discarded early?
> Of course, if the server engine just loops over all the records,
> calculating criteria for every row, then the order of parts would have no
> impact on performance.
> 3) Is there impact on performance, if I minimize the size of subqueries?
> Example:
> a) SELECT * FROM (SELECT .. FROM .. ) AS MyTable WHERE criteriaA
> b) SELECT * FROM (SELECT .. FROM .. WHERE criteriaA) AS MyTable
> Would (b) run faster, because the server engine can trim the dataset
> earlier?
You've got the right idea. "triming the dataset earlier" is a critical goal
of query optimization. But typically you don't do that "by hand". Rather,
providing statistics, appropriate indexes, not using expressions in joins or
WHERE clauses which suppress index use, etc will allow SQL Server to
discover a plan which "trims the dataset earlier".
David
|||See inline
Pavils Jurjans wrote:
> Hello,
> Could you please clarify certain questions for me, connected with general
> query execution speed...
> 1) I have this table with several fields, and one of them is of type
> smalldatetime. The dates are always exact - ie, with time part equal to
> "00:00". There is another table with smalldatetime field, and I want those
> two to join in one query. The query runs, but somewhat slow. Question: Are
> joins done with smalldatetime field types somewhat bad?
No.

> I assume, if those would be int fields, the join would run faster.
You would probably not see any difference.

> However, I have no chance to
> convert those fields to int. I have made an index in the first table on this
> field, and on the other table it's one of the fields of primary key
> (consisting from two fields, int and smalldatetime), but the index does not
> seem to have impact on the join.
The primary key index is not very useful in this case, because your
smalldatetime column is not the first in the index. Try adding a
seperate index on the smalldatetime column, it might make a very big
difference.

> 2) Is there a difference how I arrange the criteria parts which are joined
> with 'AND' operator?
No.

> Example: SELECT ... FROM ... WHERE criteriaA AND criteriaB AND criteriaC.
> Let's say, that criteriaB would be true for 1/4 of the all records, but
> criteriaA is true for 5/4 of the all records. If the criteriaA comes first,
> isn't the SQL serve engine somewhat more busy first selecting the 5/4 of all
> the records, and only then calculating the criteriaB? Say, if I would
> arrange the criteria so that criteriaB comes first, maybe the query will be
> faster, because the majority of records will be discarded early? Of course,
> if the server engine just loops over all the records, calculating criteria
> for every row, then the order of parts would have no impact on performance.
SQL-Server uses a cost based optimizer. This means that statistics about
table size, data distribution and selectivity are used to estimate what
the most efficient access path and index use is. The most efficient
query plan (based on these estimations) is used. So what you are
describing is what SQL-Server does under the covers.

> 3) Is there impact on performance, if I minimize the size of subqueries?
> Example:
> a) SELECT * FROM (SELECT .. FROM .. ) AS MyTable WHERE criteriaA
> b) SELECT * FROM (SELECT .. FROM .. WHERE criteriaA) AS MyTable
> Would (b) run faster, because the server engine can trim the dataset
> earlier? Of course, the example above does not make sense why it should use
> subqueries, but in real life the outer (parent) query would have multiple
> joins with additional queries.
If you use subqueries, you are limiting the options the optimizer. So
generally it is best to not use subqueries. In some instances,
SQL-Server is smart enough to recognize that the predicate in the outer
query in example a can be moved to the subquery.
HTH,
Gert-Jan

> Thanks,
> Pavils

Monday, February 20, 2012

Questions on query performance

No comments:

Post a Comment

rda submitsql

Blog Archive

About Me