Search This Blog

Friday, August 6, 2010

What is Selectivity

What is Selectivity:
The ratio between the number of distinct values in the indexed column to the number of records in a table is called as selectivity of an index. So it is obvious that ideal selectivity
would be 1(or Closer to 1) which means most of the values in column are unique

Examples of Good Selectivity:
Consider a table having say 1000 records and one of the indexed column has 900 distinct values then the selectivity would be 900/1000 = 0.90
Indexes on these columns are the most selective and the most effective in optimizing performance. So, the term selectivity of an index means percentage of rows in a table having
the same value for the indexed column.

Examples of Bad Selectivity:
Consider a table having say 1000 records and one of the indexed column has 50 distinct values then the selectivity would be 50/1000 = 0.05
It is evident that a full table scan is more efficient as using such an index where much more I/O is needed to scan repeatedly the index and the table.

B-TREE Indexes improve performance of queries which select small percentage of rows from a table. As a thumb rule, we should create indexes on tables that are often
queried for less than 15% of the table's rows.

Let us understand selectivity by means of a example

IF EXISTS ( SELECT 'X' FROM sysobjects where TYPE = 'U' and name = 'TEST_INDEX_SEL')
DROP TABLE TEST_INDEX_SEL
GO

CREATE TABLE TEST_INDEX_SEL
( Col1_Int INT IDENTITY(1,1) NOT NULL,
Col2_Varchar VARCHAR(255) NOT NULL,
Col3_Varchar VARCHAR(2000) NOT NULL,
COl4_Flag BIT NOT NULL
)

--Poplulate 1000 rows
DECLARE @iCount INTEGER
SELECT @iCount = 1

WHILE @iCount < = 1000
BEGIN

INSERT INTO TEST_INDEX_SEL(Col2_Varchar,Col3_Varchar,COl4_Flag)
SELECT 'ABC' + CONVERT(VARCHAR,@iCount),'\\ABC' + CONVERT (VARCHAR,@iCount) + '\Share',1

SELECT @iCount = @iCount + 1

END

--Update some rows randomly for sampling
UPDATE TEST_INDEX_SEL
SET Col2_Varchar = 'ABC2',
Col3_Varchar = '\\ABC5\Share',
COl4_Flag = 0
WHERE Col1_Int IN ( SELECT TOP 100 COL1_INT FROM TEST_INDEX_SEL ORDER BY NEWID())


UPDATE TEST_INDEX_SEL
SET Col2_Varchar = 'ABC203',
Col3_Varchar = '\\ABC764\Share',
COl4_Flag = 0
WHERE Col1_Int IN ( SELECT TOP 100 COL1_INT FROM TEST_INDEX_SEL ORDER BY NEWID())


UPDATE TEST_INDEX_SEL
SET Col3_Varchar = '\\ABC378\Share',
COl4_Flag = 0
WHERE Col1_Int IN ( SELECT TOP 100 COL1_INT FROM TEST_INDEX_SEL ORDER BY NEWID())

Now for example selectivity of Col2_Varchar can be calculated as follows:

SELECT COUNT (DISTINCT Col2_Varchar) FROM TEST_INDEX_SEL

-- Output
-- 808

SELECT COUNT ('X') FROM TEST_INDEX_SEL

-- Output
-- 1000

So selectivity would be arrived by dividing these 2 values which is not bad and its 0.8
which is just 20% away from ideal value of 1

SELECT 808/1000 = 0.808

--To find for all columns in that table
SELECT CONVERT(NUMERIC(5,2),COUNT (DISTINCT Col2_Varchar))/
(SELECT COUNT ('X') FROM TEST_INDEX_SEL),
CONVERT(NUMERIC(5,2),COUNT (DISTINCT Col3_Varchar))/
(SELECT COUNT ('X') FROM TEST_INDEX_SEL),
CONVERT(NUMERIC(5,2),COUNT (DISTINCT COl4_Flag))/
(SELECT COUNT ('X') FROM TEST_INDEX_SEL)
FROM TEST_INDEX_SEL T

/* Output

Col2_Selecvity Col3_Selectivity Col4_Selectivity
0.8080000000000 0.7260000000000 0.0020000000000

*/

By the above you we can infer that its ideal to create indexes on Col2 if the same is used in WHERE clauses and it wouldn’t help much to create on Col4 as it has low selectivity and that is the reason why many tables which have a bit value as 0 or 1 tend to be slow as it has low selectivity.

1 comment:

Anonymous said...

Thanks mate! WOnderful post, and glad i stumbled upon it since this selectivity concept is something i find to be confusing. Another good page for this is here:

Index selectivity