Friday, March 9, 2012

Quick question about rowguid column

Is there any issue with assigning the primary key to a rowguid column?
Thoughts?
WBIt depends...
http://www.aspfaq.com/show.asp?id=2504
Adam Machanic
SQL Server MVP
http://www.sqljunkies.com/weblog/amachanic
--
"WB" <none> wrote in message news:OOk1gk2HFHA.2456@.TK2MSFTNGP09.phx.gbl...
> Is there any issue with assigning the primary key to a rowguid column?
> Thoughts?
> WB
>|||Yes....
1. Creates a VERY Wide Index that is inefficient (Compared to INT for
example)
2. IF this is also the Clustered Index (I recommend against), Then ALL
Indexes must propogate this GUID in their lookup tables
3. Clustering on GUID can cause page splits,etc
Company I recently worked at just added an identity column to all tables in
DB and clustered on the Ident instead of the GUID. Inserts of records sped
up tremendously. I cant quote the exact number of records, bu the "Process"
went form 17 minutes to literally seconds...
Food for thought
Some companies (My current employer for example) want to use a GUID type of
identifier as opposed to integers for whatever reason. So what they do, to
ensure that the GUID values are Monotonically increasing in value (to avoid
page splits), is the create their own custom "GUID" type data generator in
which the algorithm ensure incrementing values.
Greg Jackson
PDX, Oregon
"WB" <none> wrote in message news:OOk1gk2HFHA.2456@.TK2MSFTNGP09.phx.gbl...
> Is there any issue with assigning the primary key to a rowguid column?
> Thoughts?
> WB
>|||Cluster Index definition on surrogate keys are ALWAYS a bad idea, INT or
GUID. Usually, the Business Key the surrogate is proxiing for is a better
Cluster Index candidate.
Sincerely,
Anthony Thomas
"pdxJaxon" <GregoryAJackson@.Hotmail.com> wrote in message
news:uaC7za3HFHA.1948@.TK2MSFTNGP14.phx.gbl...
Yes....
1. Creates a VERY Wide Index that is inefficient (Compared to INT for
example)
2. IF this is also the Clustered Index (I recommend against), Then ALL
Indexes must propogate this GUID in their lookup tables
3. Clustering on GUID can cause page splits,etc
Company I recently worked at just added an identity column to all tables in
DB and clustered on the Ident instead of the GUID. Inserts of records sped
up tremendously. I cant quote the exact number of records, bu the "Process"
went form 17 minutes to literally seconds...
Food for thought
Some companies (My current employer for example) want to use a GUID type of
identifier as opposed to integers for whatever reason. So what they do, to
ensure that the GUID values are Monotonically increasing in value (to avoid
page splits), is the create their own custom "GUID" type data generator in
which the algorithm ensure incrementing values.
Greg Jackson
PDX, Oregon
"WB" <none> wrote in message news:OOk1gk2HFHA.2456@.TK2MSFTNGP09.phx.gbl...
> Is there any issue with assigning the primary key to a rowguid column?
> Thoughts?
> WB
>|||"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:uPDlrHwKFHA.3340@.TK2MSFTNGP14.phx.gbl...
> Cluster Index definition on surrogate keys are ALWAYS a bad idea, INT or
> GUID. Usually, the Business Key the surrogate is proxiing for is a better
> Cluster Index candidate.
There's no such thing as "usually". There are plenty of applications
for either type of clustering key -- and clustering on a sequential integer
can be especially beneficial in many situations. You should be very careful
with that assumption.
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--|||Name one.
Sincerely,
Anthony Thomas
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:OYusbfwKFHA.3132@.TK2MSFTNGP12.phx.gbl...
"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:uPDlrHwKFHA.3340@.TK2MSFTNGP14.phx.gbl...
> Cluster Index definition on surrogate keys are ALWAYS a bad idea, INT or
> GUID. Usually, the Business Key the surrogate is proxiing for is a better
> Cluster Index candidate.
There's no such thing as "usually". There are plenty of applications
for either type of clustering key -- and clustering on a sequential integer
can be especially beneficial in many situations. You should be very careful
with that assumption.
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--|||"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:%23gGre3wKFHA.1176@.TK2MSFTNGP12.phx.gbl...
> Name one.
Inserting data.
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--|||Yes, and Inserting Data in Business Key order shouldn't cause an inordinate
amount of page splits if you have your fill factors and database maintenance
routines set correctly. And, having the Cluster Index on the Business Key
will far outweigh any split issues compared to any scans you may encounter
and gather the covering aspects of the clustered index plus it is typlically
the Business Key that most range queries are conducted on.
Care to try again? Are you really telling me that page splits is the BEST
reason you can come up with? Please. Are you really telling me that was
the whole design goal with having Clustered Indexes over Heaps to begin
with? You've got to be kidding me.
Can I get a witness?
Sincerely,
Anthony Thomas
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:OuxM5KxKFHA.4052@.tk2msftngp13.phx.gbl...
"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:%23gGre3wKFHA.1176@.TK2MSFTNGP12.phx.gbl...
> Name one.
Inserting data.
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--|||"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:eCMwAS5KFHA.2860@.TK2MSFTNGP10.phx.gbl...
> Can I get a witness?
No. Never worked in a 24-7 (4-nines, and we're trying for 5 this year),
high-volume OLTP environment, have you? Good luck maintaining those fill
factors when you have no maintenence window.
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--|||This is a multi-part message in MIME format.
--=_NextPart_000_08EF_01C52BAD.4FA001B0
Content-Type: text/plain;
charset="Windows-1252"
Content-Transfer-Encoding: 7bit
Would agree that the DBREINDEX can be troublesome in highly available
systems. But the fill factors should be static from a design perspective.
Also, the DBREINDEX is semi-online in that it only needs to obtain exclusive
table access only on the table it is currently running against. Running the
Cluster Index rebuild WITH DROP EXISTING can speed up the process.
But then, again, that is a management issue. The 5 9's do NOT refer to
TOTAL TIME, but to SLA time. And, EVERY SYSTEM HAS TO HAVE AN SLA, which
would include a maintenance window. If you don't have an APPROPRIATE SLA,
then your 5 9's don't mean very much. But, yes, we maintain 5, and even 6,
9's of availability, but properly measured against reasonable SLAs that have
actually been documented and constructed with the various Business Units
that are demanding the system availabilities.
I suppose that if your SLAs DEMANDED 24x7x52, with NO MAINTENANCE, as
delusional as the Business Units may be, then sacrificing performance over
page splits would be one reason to not care about the Cluster Index
placement. But, then, why bother and just use a HEAP. You're not going to
get much better performance from an ill-chosen Clustered Index. Some, but
not much.
Sincerely,
Anthony Thomas
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:%23Ra$B$8KFHA.2252@.TK2MSFTNGP15.phx.gbl...
"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:eCMwAS5KFHA.2860@.TK2MSFTNGP10.phx.gbl...
>
> Can I get a witness?
No. Never worked in a 24-7 (4-nines, and we're trying for 5 this
year),
high-volume OLTP environment, have you? Good luck maintaining those fill
factors when you have no maintenence window.
--
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--
--=_NextPart_000_08EF_01C52BAD.4FA001B0
Content-Type: text/html;
charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
&

Would agree that the DBREINDEX can be =troublesome in highly available systems. But the fill factors should be static =from a design perspective. Also, the DBREINDEX is semi-online in that it =only needs to obtain exclusive table access only on the table it is currently =running against. Running the Cluster Index rebuild WITH DROP EXISTING can =speed up the process.
But then, again, that is a management =issue. The 5 9's do NOT refer to TOTAL TIME, but to SLA =time. And, EVERY SYSTEM HAS TO HAVE AN SLA, which would include a maintenance =window. If you don't have an APPROPRIATE SLA, then your 5 9's don't mean very much. But, yes, we maintain 5, and even 6, 9's of availability, =but properly measured against reasonable SLAs that have actually been =documented and constructed with the various Business Units that are demanding the =system availabilities.
I suppose that if your SLAs DEMANDED =24x7x52, with NO MAINTENANCE, as delusional as the Business Units may be, then sacrificing performance over page splits would be one reason to not care =about the Cluster Index placement. But, then, why bother and just use a HEAP. You're not going to get much better performance from an =ill-chosen Clustered Index. Some, but not much.
Sincerely,
Anthony Thomas
--
"Adam Machanic" wrote in message news:%23Ra$B$8KFHA.=2252@.TK2MSFTNGP15.phx.gbl..."Anthony Thomas" wrote in messagenews:eCMwAS5KFHA.2860=@.TK2MSFTNGP10.phx.gbl...>> Can I get a witness? No. Never =worked in a 24-7 (4-nines, and we're trying for 5 this year),high-volume OLTP environment, have you? Good luck maintaining those =fillfactors when you have no maintenence window.-- Adam MachanicSQL =Server MVPhttp://www.datamanipulation.net<=/A>--

--=_NextPart_000_08EF_01C52BAD.4FA001B0--|||Since you brought it up...well, and I brought it up, the topic of SLAs and
availability metrics is an interestin topic in its own right.
Craig Mullins has some good articles on the topic you should check out.
http://www.craigsmullins.com/dbta_006.htm
Sincerely,
Anthony Thomas
"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:eXByZ$9KFHA.3832@.TK2MSFTNGP12.phx.gbl...
Would agree that the DBREINDEX can be troublesome in highly available
systems. But the fill factors should be static from a design perspective.
Also, the DBREINDEX is semi-online in that it only needs to obtain exclusive
table access only on the table it is currently running against. Running the
Cluster Index rebuild WITH DROP EXISTING can speed up the process.
But then, again, that is a management issue. The 5 9's do NOT refer to
TOTAL TIME, but to SLA time. And, EVERY SYSTEM HAS TO HAVE AN SLA, which
would include a maintenance window. If you don't have an APPROPRIATE SLA,
then your 5 9's don't mean very much. But, yes, we maintain 5, and even 6,
9's of availability, but properly measured against reasonable SLAs that have
actually been documented and constructed with the various Business Units
that are demanding the system availabilities.
I suppose that if your SLAs DEMANDED 24x7x52, with NO MAINTENANCE, as
delusional as the Business Units may be, then sacrificing performance over
page splits would be one reason to not care about the Cluster Index
placement. But, then, why bother and just use a HEAP. You're not going to
get much better performance from an ill-chosen Clustered Index. Some, but
not much.
Sincerely,
Anthony Thomas
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:%23Ra$B$8KFHA.2252@.TK2MSFTNGP15.phx.gbl...
"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:eCMwAS5KFHA.2860@.TK2MSFTNGP10.phx.gbl...
> Can I get a witness?
No. Never worked in a 24-7 (4-nines, and we're trying for 5 this year),
high-volume OLTP environment, have you? Good luck maintaining those fill
factors when you have no maintenence window.
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--|||http://www.dbta.com/columnists/craig_mullins/dba_corner_0902.html
http://www.craigsmullins.com/dbta_026.htm
"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:OH8L1F%23KFHA.2748@.TK2MSFTNGP09.phx.gbl...
Since you brought it up...well, and I brought it up, the topic of SLAs and
availability metrics is an interestin topic in its own right.
Craig Mullins has some good articles on the topic you should check out.
http://www.craigsmullins.com/dbta_006.htm
Sincerely,
Anthony Thomas
"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:eXByZ$9KFHA.3832@.TK2MSFTNGP12.phx.gbl...
Would agree that the DBREINDEX can be troublesome in highly available
systems. But the fill factors should be static from a design perspective.
Also, the DBREINDEX is semi-online in that it only needs to obtain exclusive
table access only on the table it is currently running against. Running the
Cluster Index rebuild WITH DROP EXISTING can speed up the process.
But then, again, that is a management issue. The 5 9's do NOT refer to
TOTAL TIME, but to SLA time. And, EVERY SYSTEM HAS TO HAVE AN SLA, which
would include a maintenance window. If you don't have an APPROPRIATE SLA,
then your 5 9's don't mean very much. But, yes, we maintain 5, and even 6,
9's of availability, but properly measured against reasonable SLAs that have
actually been documented and constructed with the various Business Units
that are demanding the system availabilities.
I suppose that if your SLAs DEMANDED 24x7x52, with NO MAINTENANCE, as
delusional as the Business Units may be, then sacrificing performance over
page splits would be one reason to not care about the Cluster Index
placement. But, then, why bother and just use a HEAP. You're not going to
get much better performance from an ill-chosen Clustered Index. Some, but
not much.
Sincerely,
Anthony Thomas
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:%23Ra$B$8KFHA.2252@.TK2MSFTNGP15.phx.gbl...
"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:eCMwAS5KFHA.2860@.TK2MSFTNGP10.phx.gbl...
> Can I get a witness?
No. Never worked in a 24-7 (4-nines, and we're trying for 5 this year),
high-volume OLTP environment, have you? Good luck maintaining those fill
factors when you have no maintenence window.
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--|||This is a multi-part message in MIME format.
--=_NextPart_000_0092_01C52BD8.4B71D9D0
Content-Type: text/plain;
charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
You should probably read Kimberly Tripp's Q&A site:
http://www.sqlskills.com/ConsolidatedQA.asp
This will clarify a lot of things for you, I think. Heaps are certainly =not a good choice for insert performance, as gaps will be filled if rows =are deleted. And I'm confused about your comment regarding sacrificing =performance -- clustering on an IDENTITY or other sequential key will do =the opposite in many cases. Insert, and in many cases read performance =will both benefit. I've run extensive tests to prove this (I'm a load =testing fanatic) and you'll find upon reading that web page that =Kimberly apparently agrees with me. I'm not sure what basis your =arguments have, but you may want to run some tests for yourself.
Thanks for the links on SLAs -- I will send them to our business team =:-)
-- Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--
"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message =news:eXByZ$9KFHA.3832@.TK2MSFTNGP12.phx.gbl...
Would agree that the DBREINDEX can be troublesome in highly available =systems. But the fill factors should be static from a design =perspective. Also, the DBREINDEX is semi-online in that it only needs =to obtain exclusive table access only on the table it is currently =running against. Running the Cluster Index rebuild WITH DROP EXISTING =can speed up the process.
But then, again, that is a management issue. The 5 9's do NOT refer =to TOTAL TIME, but to SLA time. And, EVERY SYSTEM HAS TO HAVE AN SLA, =which would include a maintenance window. If you don't have an =APPROPRIATE SLA, then your 5 9's don't mean very much. But, yes, we =maintain 5, and even 6, 9's of availability, but properly measured =against reasonable SLAs that have actually been documented and =constructed with the various Business Units that are demanding the =system availabilities.
I suppose that if your SLAs DEMANDED 24x7x52, with NO MAINTENANCE, as =delusional as the Business Units may be, then sacrificing performance =over page splits would be one reason to not care about the Cluster Index =placement. But, then, why bother and just use a HEAP. You're not going =to get much better performance from an ill-chosen Clustered Index. Some, =but not much.
Sincerely,
Anthony Thomas
--
--=_NextPart_000_0092_01C52BD8.4B71D9D0
Content-Type: text/html;
charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
&

You should probably read Kimberly =Tripp's Q&A site:
http://www.sqlskills.com/ConsolidatedQA.asp">http://www.sqlskills=.com/ConsolidatedQA.asp
This will clarify a lot of things =for you, I think. Heaps are certainly not a good choice for insert =performance, as gaps will be filled if rows are deleted. And I'm confused about =your comment regarding sacrificing performance -- clustering on an IDENTITY =or other sequential key will do the opposite in many cases. Insert, and in =many cases read performance will both benefit. I've run extensive tests =to prove this (I'm a load testing fanatic) and you'll find upon reading =that web page that Kimberly apparently agrees with me. I'm not sure =what basis your arguments have, but you may want to run some tests for yourself.
Thanks for the links on SLAs -- I will =send them to our business team :-)
-- Adam =MachanicSQL Server MVPhttp://www.datamanipulation.net<=/A>--
"Anthony Thomas" wrote in =message news:eXByZ$9KFHA.3832=@.TK2MSFTNGP12.phx.gbl...
Would agree that the DBREINDEX can =be troublesome in highly available systems. But the fill factors =should be static from a design perspective. Also, the DBREINDEX is =semi-online in that it only needs to obtain exclusive table access only on the table =it is currently running against. Running the Cluster Index rebuild =WITH DROP EXISTING can speed up the process.

But then, again, that is a =management issue. The 5 9's do NOT refer to TOTAL TIME, but to SLA =time. And, EVERY SYSTEM HAS TO HAVE AN SLA, which would include a maintenance window. If you don't have an APPROPRIATE SLA, then your 5 9's =don't mean very much. But, yes, we maintain 5, and even 6, 9's of =availability, but properly measured against reasonable SLAs that have actually been =documented and constructed with the various Business Units that are demanding the =system availabilities.

I suppose that if your SLAs =DEMANDED 24x7x52, with NO MAINTENANCE, as delusional as the Business Units may be, then sacrificing performance over page splits would be one reason to not =care about the Cluster Index placement. But, then, why bother and just use =a HEAP. You're not going to get much better performance from an =ill-chosen Clustered Index. Some, but not much.

Sincerely,


Anthony Thomas

--

--=_NextPart_000_0092_01C52BD8.4B71D9D0--|||Okay. I took you up on your suggestions.
Yes, I read Kimberly Tripp's Q&A and you two agree, to an extent. She has a
basis for "Defaulting" to clustering the PK IDENTITY regardless if it is a
surrogate key or not. She also admits that this will create a "hot spot" in
the data file but assumes the end-users will immediately want to retrieve
this information after insertion and that the hot spots is mitigated by
having the recent insert already in cache. She also conceedes that the
Business Key would be an alternative Cluster Index candidate in some
situations.
I also took your advice and ran my own tests. Here's what I discovered:
The page splits happen only as a course of inserts; so, table types that
have many more inserts versus other CRUD or Query operations may benefit for
what you two are suggesting.
I also agree with the fact that HEAPs are detrimental for the myriad of
reasons you and Kimberly point out.
However, I still disagree with the notion of using IDENTITY Clustered
Indexes as a matter of default. Here is why:
The majority of tables are not of the type the two of you suggest would be
beneficial for using an IDENTITY attribute. The majority of tables are of
the reference type. For example, a Customer, Author, Title, Orders, and the
various other look up and reference kinds of tables. Once entered, these
tables are queried and/or joined for reference information far more often in
query type statements than they ever are versus the intitial INSERT.
Moreover, my previous comments ring true. That more often than not, a
SELECT * or at least many of the columns for these reference tables are
included. In which case, the Clustered Index is chosen more often than not
because it is a covering index. However, these types of tables typically
JOIN and are FILTERED by the very Business Key that defines the Unique
Contraint. When the Clustered Index is the surrogate IDENTITY you end up
forcing a Clustered Index Scan whereas having the Unique Constraint as the
Clustered Index, you end up with a Clustered Index Seek, which is more
efficient.
So, as a matter of "default," you will cover more tables if you choose the
Business Key as the Cluster Index instead of the surrogate IDENTITY and will
end up with more efficient queries.
So, what to do with the handful of transactional tables, that are low in
number of tables, but high in the quantity of data? My tests have shown
that the page splits is controlled more by the FILL FACTOR, as I suggested,
than by choosing the surrogate key over the Business Key. Now, you are
correct in that choosing increasing keys will force the splits at the ends,
which Kimberly and I both agree will create a local "hot spot." What we
disagree on is whether or not this is desirable. Next, having an
appropriate FILL Factor and choosing the Business Key for the Cluster Index,
can distribute this "hot spot" activity throughout the database files. I
contend that this is more desirable and have found that even with the
default FILL Factor causes few splits than using the surrogate key as the
Cluster Index.
Finally, there are situations, only with the transaction tables, that the
IDENTITY is NOT a surrogate, but the Business Key itself. In these
situations, only, I think we agree to use this as the Cluster Key, but only
because our definitions have coincided, not because of any concession on
either of our part.
I am sorry, but the tests I have just conducted, although emperical and
limited, at least suggest that what I am say bears truth. I would be
curious to know the particulars of your tests in order that I attempt to
reproduce your results.
Thanks for all of your time.
Sincerely,
Anthony Thomas
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:uha51IALFHA.568@.TK2MSFTNGP09.phx.gbl...
You should probably read Kimberly Tripp's Q&A site:
http://www.sqlskills.com/ConsolidatedQA.asp
This will clarify a lot of things for you, I think. Heaps are certainly not
a good choice for insert performance, as gaps will be filled if rows are
deleted. And I'm confused about your comment regarding sacrificing
performance -- clustering on an IDENTITY or other sequential key will do the
opposite in many cases. Insert, and in many cases read performance will
both benefit. I've run extensive tests to prove this (I'm a load testing
fanatic) and you'll find upon reading that web page that Kimberly apparently
agrees with me. I'm not sure what basis your arguments have, but you may
want to run some tests for yourself.
Thanks for the links on SLAs -- I will send them to our business team :-)
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--
"Anthony Thomas" <ALThomas@.kc.rr.com> wrote in message
news:eXByZ$9KFHA.3832@.TK2MSFTNGP12.phx.gbl...
Would agree that the DBREINDEX can be troublesome in highly available
systems. But the fill factors should be static from a design perspective.
Also, the DBREINDEX is semi-online in that it only needs to obtain exclusive
table access only on the table it is currently running against. Running the
Cluster Index rebuild WITH DROP EXISTING can speed up the process.
But then, again, that is a management issue. The 5 9's do NOT refer to
TOTAL TIME, but to SLA time. And, EVERY SYSTEM HAS TO HAVE AN SLA, which
would include a maintenance window. If you don't have an APPROPRIATE SLA,
then your 5 9's don't mean very much. But, yes, we maintain 5, and even 6,
9's of availability, but properly measured against reasonable SLAs that have
actually been documented and constructed with the various Business Units
that are demanding the system availabilities.
I suppose that if your SLAs DEMANDED 24x7x52, with NO MAINTENANCE, as
delusional as the Business Units may be, then sacrificing performance over
page splits would be one reason to not care about the Cluster Index
placement. But, then, why bother and just use a HEAP. You're not going to
get much better performance from an ill-chosen Clustered Index. Some, but
not much.
Sincerely,
Anthony Thomas|||On Tue, 22 Mar 2005 07:05:30 -0600, Anthony Thomas wrote:
(snip)
(snip)
>Moreover, my previous comments ring true. That more often than not, a
>SELECT * or at least many of the columns for these reference tables are
>included. In which case, the Clustered Index is chosen more often than not
>because it is a covering index. However, these types of tables typically
>JOIN and are FILTERED by the very Business Key that defines the Unique
>Contraint. When the Clustered Index is the surrogate IDENTITY you end up
>forcing a Clustered Index Scan whereas having the Unique Constraint as the
>Clustered Index, you end up with a Clustered Index Seek, which is more
>efficient.
(snip)
Hi Anthony,
Yes, they are often filtered by the business key that defines the unique
constraint.
No, they are hardly ever joined by that business key. The whole point of
introducing an IDENTITY surrogate key is to use that integer value for
all references instead of the (often longer, sometimes spanned) business
key. So I'd expect joins to be using the identity surrogate key.
If most queries include a filter on the business key, AND the optimizer
decides to use that filter first in it's execution plan, than a
clustered index on the business key would be better than clustering on
the surrogate key. But if many queries don't filter the business key
(because the filters are on columns in other tables, and this table is
only joined in to display some extra columns), OR even if the queries do
filter on the business key, but the optimizer decides that the best plan
will filter other tables first, then join this table and apply the
remainging filter conditions, then the clustered index on the identity
column would be best.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:887141lbp95i5ibkgarn8t1ga3812gpbs0@.4ax.com...
> If most queries include a filter on the business key, AND the optimizer
> decides to use that filter first in it's execution plan, than a
> clustered index on the business key would be better than clustering on
> the surrogate key. But if many queries don't filter the business key
> (because the filters are on columns in other tables, and this table is
> only joined in to display some extra columns), OR even if the queries do
> filter on the business key, but the optimizer decides that the best plan
> will filter other tables first, then join this table and apply the
> remainging filter conditions, then the clustered index on the identity
> column would be best.
Excellent point :)
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--|||Ok. A few replies...and I'll try to keep this short to minimize the
responding topics.
1. You are corrent in that I mispoke concerning a Joining on the Business
Key. Yes, that is the whole point of the surrogate, but as the predominant
restriction condition is still true.
2. A Cluster Index on the surrogate key WILL NOT make the join process
faster. Think about it. If you have to process a MERGE or INNER LOOP join,
with either a prefetch or not, it will have to do a matched seek. Both a
clustered index and a non-clustered index would be sufficient for the
singleton select lookups. Your suggestion that IDENTITY CLUSTERED INDEXES
are more efficient JOIN candidates is hyped but not supported by evidence.
3. Yes, queries against the reference tables are my chief concern because I
see this every day with vended solutions that use the PK IDENTITY Clustered
Index approach, and it kills the systems. Chiefly because the bulk of the
queries, like Customers, etc., are searched on Name Ranges, which the
IDENTITY Clustered Primary Key is the worst candidate.
4. This is a matter of "default." The very Q & A Kimberly wrote says this
explicitly, the very link you provided. And, it is what vendor after vendor
after vendor is shoving into the market place WITHOUT giving it a second
thought much less thought per individual table. Plus, the tool, SQL Server,
does this by default if you do not have the foresight to tell it otherwise.
5. The Orders table being a reference table is a hybrid, it is usually the
Order Details tables that I was thinking of when I was attempting to
distinguish between the two, reference and transaction, respectively. Note,
I have ONLY been talking about OLTP databases, just the table styles within
an OLTP database.
6. My tests showed that the IDENTITY CLUSTERED INDEX had MORE SPLITS, even
on a transaction type table than a Business Key. Now, this boggled my mind;
it was almost 3 to 1 greater. But think about it; these are B-Tree indexes.
If you use an increasing attribute, especially a monitonically increasing
one like an IDENTITY, you've unbalanced the index; you are making it
lop-sided. You end up causing more Intermediate Node splits more often than
if you had randomly split the leafs throughout the table. Now, there may be
a better explaination, but from the first pass tests I conducted, this is
what I saw.
7. The initial question was regarding INSERTS, splits, versus the payoff of
using the Business Key for queries. So, my tests have been first conducted
on individual tables. I am looking into multi-table, with relationships,
types of tests to conduct next. For now, however, here is what I did.
CREATE TABLE MyTable
(MyID INT IDENTITY NOT NULL
PRIMARY KEY CLUSTERED
,MyName VARCHAR(30) NOT NULL
UNIQUE NONCLUSTERED
,MyDescription VARCHAR(50) NOT NULL
,MyCreateDate DATETIME NOT NULL
DEFAULT (GETDATE())
)
CREATE NONCLUSTERED INDEX IX01_MyTable
ON MyTable(MyCreateDate)
Now, I ran several passes changing the Clustered Index attribute, single
column, and then reconducting the test.
The test was to monitor the split behavior, both at the leaf as well as the
node level and then look at the fragmentation once completed with the
inserts of 1,000,000 records. Each insert was a single row at a time to
simulate individual transacitons.
The MyID was IDENTITY; so, no explicit value inserted. MyName was preceeded
with a CHAR(x) value randomly created, then the length was varied between
the single character to the maximum value. MyDescription was just garbage,
but the length was randomly filled. MyCreateDate was allowed to assume the
transaction time default.
After the inserts and fragmentation analysis, a set of predefined queries
were ran with the execution plan, client statistics, I/O and Time
statistics.
The queries were:
SELECT * FROM MyTable
SELECT * FROM MyTable WHERE MyID = 550000
SELECT * FROM MyTable WHERE MyName LIKE 'M%'
SELECT * FROM MyTable WHERE MyName = 'M'
SELECT * FROM MyTable WHERE MyDate BETWEEN <1/2 way through the run time>
and <3/4 way through the run time>
SELECT * FROM MyTable WHERE MyDate = <a randomly selected time from the run
time>
SELECT MyName, MyCreateDate FROM MyTable
SELECT MyName, MyCreateDate FROM MyTable WHERE MyID = 550000
SELECT MyName, MyCreateDate FROM MyTable WHERE MyName LIKE 'M%'
SELECT MyName, MyCreateDate FROM MyTable WHERE MyName = 'M'
SELECT MyName, MyCreateDate FROM MyTable WHERE MyDate BETWEEN <1/2 way
through the run time> and <3/4 way through the run time>
SELECT MyName, MyCreateDate FROM MyTable WHERE MyDate = <a randomly selected
time from the run time>
I made two passes through these, once by just letting it run, the other by
running the DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS in between
statements.
This results were saved and then the whole thing with the same constraints
but just a diffenent choice of clustered index: IDENTITY (MyID), Business
Key (MyName), and then an increasing but not necessarily unique attribute
(MyCreateDate).
I did my analysis on both sides, the fragmentation and efficiency, time and
I/O statistics, of the inserts, and the execution of the above queries.
At the moment, I am attempting to summarize the results, given they are
quite extensive. However, the examination still bears some of what everyone
else is saying, but there are hidden dangers, the Extent Fragmentation for
one. And, the Business Key still produced the best executions for these
queries.
Now, I would not claim that these tests are scientific research, but they
did give me better insight in to what I had originaly had considered, but
nothing would suggest that I detract from my initial statement, that as a
matter of default, which is what most solutions that are present to me are,
the Business Key is still the best candidate for the Clustered Index, which,
as the DBA, is one of the few, but most effective influence I can apply to
an already designed solution.
Sincerely,
Anthony Thomas
--
"Adam Machanic" <amachanic@.hotmail._removetoemail_.com> wrote in message
news:OZP1pBzLFHA.1956@.TK2MSFTNGP15.phx.gbl...
"Hugo Kornelis" <hugo@.pe_NO_rFact.in_SPAM_fo> wrote in message
news:887141lbp95i5ibkgarn8t1ga3812gpbs0@.4ax.com...
> If most queries include a filter on the business key, AND the optimizer
> decides to use that filter first in it's execution plan, than a
> clustered index on the business key would be better than clustering on
> the surrogate key. But if many queries don't filter the business key
> (because the filters are on columns in other tables, and this table is
> only joined in to display some extra columns), OR even if the queries do
> filter on the business key, but the optimizer decides that the best plan
> will filter other tables first, then join this table and apply the
> remainging filter conditions, then the clustered index on the identity
> column would be best.
Excellent point :)
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
--

No comments:

Post a Comment