
Friday, September 28, 2007

BCP and Bulk Insert

(i) BCP is a command-line utility, whereas BULK INSERT is a T-SQL statement.

(ii) BCP can be used for both import and export, whereas BULK INSERT is for import only.
When importing large text files, BULK INSERT has proven to be faster than BCP because BULK INSERT converts the data directly into TDS (tabular data stream), which is SQL Server compliant, whereas BCP converts the data into an ODBC data stream, which again has to be converted into TDS.
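As a rough sketch of the two approaches (the database, table, file path, and delimiters below are assumptions for illustration, not from the original post):

```sql
-- Export with the bcp utility (run from the command prompt, not T-SQL):
--   bcp pubs..authors out C:\data\authors.txt -c -T

-- Import the same character-mode file with the BULK INSERT statement:
BULK INSERT authors_staging
FROM 'C:\data\authors.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n')
```

Note that bcp needs a round trip through the client APIs, while BULK INSERT executes inside the server process, which is where the speed difference above comes from.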

Force Optimizer to Use an Index

To force the optimizer to use a specific index, we can specify the index in the SELECT statement
using a WITH table hint, like

SELECT au_id, au_lname FROM authors WITH (INDEX(0))

To use a particular index
WITH (INDEX(index_name))

To use a table scan
WITH (INDEX(0))

To use the clustered index
WITH (INDEX(1))

However, it is not good practice to force an index to be used: the optimizer will already have evaluated the various ways of retrieving the data (table scan, clustered index, non-clustered index) and chosen the cheapest method.
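As a minimal sketch of forcing a named index (aunmind is assumed to be the nonclustered index on authors.au_lname in the pubs sample database):

```sql
-- Force the optimizer to use a specific nonclustered index by name.
-- Index and table names assumed from the pubs sample database.
SELECT au_id, au_lname
FROM authors WITH (INDEX(aunmind))
WHERE au_lname LIKE 'S%'
```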

Why does the SQL optimizer use a table scan when an index is available?

SQL Server is smart enough that its optimizer determines which execution plan costs less, and if using the index is costlier than a table scan, it will opt for the table scan.

This happens in the following situations:
1. The index keys are very wide, or the statistics on the index are outdated so the optimizer misjudges the distribution of key values; use UPDATE STATISTICS to refresh them.

2. The estimated row count is more than roughly 5% of the rows in the table.

Keys in Database World

Primary Key
One or more columns that uniquely identify each row in the table. If more than one column is needed to make a row unique, it is called a composite primary key.

Unique Key
Same as a primary key, except that it allows one NULL value in the key columns.

Candidate Key
A candidate key is one that can identify each row of a table uniquely. Generally, one of the candidate keys becomes the primary key of the table.

Alternate Key
If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys.

Foreign Key
A foreign key is a column (or columns) in one table that refers to the primary key of another table.
To elaborate: there is a table Employee which has EmployeeID as its primary key, and in an Employee Salary table which holds the salary details of employees, the EmployeeID column is called a foreign key.
By this we enforce that an employee must first be present in the Employee table before he can exist in the Employee Salary table.
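The Employee / Employee Salary example above could be sketched as DDL like this (column names and data types are assumptions):

```sql
-- Primary table: each employee appears exactly once.
CREATE TABLE Employee (
    EmployeeID int NOT NULL PRIMARY KEY,
    EmployeeName varchar(100) NOT NULL
)

-- Referring table: EmployeeID here is the foreign key, so a salary
-- row cannot exist for an employee missing from Employee.
CREATE TABLE EmployeeSalary (
    EmployeeID int NOT NULL
        REFERENCES Employee (EmployeeID),
    SalaryMonth datetime NOT NULL,
    Amount money NOT NULL,
    CONSTRAINT PK_EmployeeSalary
        PRIMARY KEY (EmployeeID, SalaryMonth)  -- composite primary key
)
```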

What are Statistics

Statistics describe the distribution of values in a column and determine the selectivity of the indexes.

If an indexed column has unique values then the selectivity of that index is high, as opposed to an index on non-unique values. The query optimizer uses these statistics in determining whether or not to use an index while executing a query.

Some situations under which you should update statistics:
1) There is a significant change in the key values of the index.
2) A large amount of data in an indexed column has been added, changed, or removed (that is, the distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE statement and then repopulated.
3) The database has been upgraded from a previous version.

Related commands: UPDATE STATISTICS, STATS_DATE, DBCC SHOW_STATISTICS, CREATE STATISTICS, DROP STATISTICS, sp_autostats, sp_createstats, sp_updatestats.
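A sketch of how some of the commands above are typically used (the table and index names are assumed from the pubs sample database, and stats_id 1 is assumed to be the clustered index):

```sql
UPDATE STATISTICS authors                    -- refresh all statistics on the table
UPDATE STATISTICS authors aunmind            -- refresh statistics for one index only

DBCC SHOW_STATISTICS ('authors', aunmind)    -- view the distribution histogram

SELECT STATS_DATE(OBJECT_ID('authors'), 1)   -- when were these stats last updated?

EXEC sp_updatestats                          -- update stats on every table in the db
```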

Drop Index

I have a large table with 1 million records, with a clustered index and 4 nonclustered indexes on the table. I have to re-index the table. Which index should I drop first?

In the presence of a clustered index, each nonclustered index holds the clustered index key as part of its index key value. So, if we drop the clustered index before dropping the nonclustered indexes, SQL Server has to modify every nonclustered index key, replacing the clustered index key value with a ROWID. This causes extra load on the database log, and on tempdb as well. So, drop the nonclustered indexes first, followed by the clustered index.
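A sketch of the suggested drop order, using the SQL Server 2000 DROP INDEX syntax (the table and index names are invented for illustration):

```sql
-- Nonclustered indexes first, so they are not rebuilt
-- when the clustered index disappears...
DROP INDEX BigTable.IX_NonClustered1
DROP INDEX BigTable.IX_NonClustered2
DROP INDEX BigTable.IX_NonClustered3
DROP INDEX BigTable.IX_NonClustered4

-- ...then the clustered index last.
DROP INDEX BigTable.IX_Clustered
```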

What is lock Escalation?

Lock escalation is the process of converting many fine-grain locks into fewer coarse-grain locks, reducing system overhead. Microsoft® SQL Server™ 2000 automatically escalates row locks and page locks into table locks when a transaction exceeds its escalation threshold.

For example, when a transaction requests rows from a table, SQL Server automatically acquires locks on those rows affected and places higher-level intent locks on the pages and table, or index, which contain those rows. When the number of locks held by the transaction exceeds its threshold, SQL Server attempts to change the intent lock on the table to a stronger lock (for example, an intent exclusive (IX) would change to an exclusive (X) lock). After acquiring the stronger lock, all page and row level locks held by the transaction on the table are released, reducing lock overhead.

SQL Server may choose to do both row and page locking for the same query, for example, placing page locks on the index (if enough contiguous keys in a nonclustered index node are selected to satisfy the query) and row locks on the data. This reduces the likelihood that lock escalation will be necessary.

Lock escalation thresholds are determined dynamically by SQL Server and do not require configuration.

Update 100 Million Records

I have one big transaction with an UPDATE statement which affects 1 million records. Every time I run this SP/query, my transaction log grows too much and sometimes it fails. What remedy actions do you suggest?

Always keep your transaction size small.

In this case, we need to split the big transaction into multiple small transactions.

Begin tran bigger_tran
Update table
Set col = value
Where condition
Commit tran


Splitting the transaction

declare @ctr int
declare @trancount int
declare @ln_error int
declare @ln_rowcount int

set @ctr = 1
set @trancount = 0
set rowcount 10000  -- process at most 10,000 rows per iteration

while (@ctr <> 0)
begin

    begin tran chunk_tran

    update big_tran_to_small_tran
    set j = null
    where j is not null

    select @ln_error = @@error, @ln_rowcount = @@rowcount

    if @ln_error <> 0
    begin
        rollback tran chunk_tran
        break
    end
    if @ln_rowcount = 0
    begin
        commit tran chunk_tran  -- close the empty transaction before exiting
        set @ctr = 0
    end
    else
    begin
        commit tran chunk_tran
        set @trancount = @trancount + 1 -- verification only
    end

end

set rowcount 0  -- reset to the default (no limit)
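As an aside, on SQL Server 2005 and later, SET ROWCOUNT is deprecated for data-modification statements and TOP is the recommended way to limit rows; the same chunking idea could be sketched as follows (the batch size is an assumption, and the table follows the example above):

```sql
-- Batched update using TOP instead of SET ROWCOUNT.
WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) big_tran_to_small_tran
    SET j = NULL
    WHERE j IS NOT NULL

    IF @@ROWCOUNT = 0 BREAK  -- nothing left to update

    -- Each UPDATE commits as its own implicit transaction, so the
    -- log can be backed up or truncated between batches.
END
```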

Difference Between Views & Procedures

The similarity is that an SP and a view can both return data from tables, and both hide the database structure from the user and provide security.

Differences
1) An SP can be parameterized, whereas a view cannot be.
2) Views can be indexed; stored procedures cannot be.
3) Stored procedures generally perform better than views, particularly on large tables.
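To illustrate the parameterization difference (the object names are invented; the authors table is from the pubs sample database):

```sql
-- A view cannot accept a parameter; filtering happens outside it...
CREATE VIEW vw_AuthorNames
AS
SELECT au_id, au_lname FROM authors
GO

-- ...whereas a stored procedure can take the filter as a parameter.
CREATE PROC usp_GetAuthorsByLastName @lname varchar(40)
AS
SELECT au_id, au_lname FROM authors WHERE au_lname = @lname
GO

EXEC usp_GetAuthorsByLastName @lname = 'Ringer'
```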

Tuesday, September 4, 2007

What happens internally when an UPDATE statement is issued

First, the user issues an UPDATE statement inside a transaction.

If the record to be modified is not in the buffer cache, then SQL Server reads it from the data page on disk and places it in memory within the buffer cache.

The user modifies the record within the buffer cache. Modified pages within the buffer cache are marked as 'dirty' and will have to be written to disk in the future.

Immediately after the record is modified within the buffer cache, a corresponding record describing the data change and the transaction that caused it is written to the log cache.

When a commit happens, the log cache information is written to the transaction log file on disk.

And when a checkpoint occurs, the dirty pages within the buffer cache are "flushed" to the data files (.mdf, .ndf) on physical disk, so that the committed changes are present in the data files themselves and a full backup will pick them up.

Basics of Normalisation

Normalization can be defined as the process of efficiently organizing data: in other words, eliminating redundant data and relating the data across tables by means of relationships.

There are various steps to normalization, which we generally call normal forms. There are several NFs, but in the real database world a database is said to be normalized if we normalize it to Third Normal Form (3NF). Let us see each normal form in detail:


1 NF

(a) Elimination of duplicate columns from the same entity.

(b) Creation of a separate entity for each group of related data, identifying each row with a unique column or set of columns (composite), which is called the primary key.

Example: EmployeeID | PayElement1 | PayElement2 | PayElement3

Here EmployeeID, PayElement1, PayElement2, PayElement3 are the column names. The basic flaw in this model is that not all employees will have all of these pay elements; some might have two and others might have just one.

And also, what if some more elements are added in the near future, for which the table structure would need to be modified? So we now define these tables to accommodate the above data.


PayElement

PayElementID --> PayElementID is the primary key
PayElementName


Employee

EmployeeID
PayElementID --> Here EmployeeID, PayElementID together form the composite primary key.


Now with the above structure we can accommodate any number of pay elements per employee.
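The 1NF design above could be written as DDL like this (data types are assumptions; table names follow the example):

```sql
CREATE TABLE PayElement (
    PayElementID int NOT NULL PRIMARY KEY,
    PayElementName varchar(50) NOT NULL
)

-- One row per employee per pay element; any number of
-- elements can be attached without changing the schema.
CREATE TABLE Employee (
    EmployeeID int NOT NULL,
    PayElementID int NOT NULL
        REFERENCES PayElement (PayElementID),
    CONSTRAINT PK_Employee
        PRIMARY KEY (EmployeeID, PayElementID)  -- composite primary key
)
```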


2 NF

(a) Remove columns that are partially dependent on an entity and place them in a separate entity.
(b) Create relationships with the newly created entity by means of foreign keys.


3 NF

(a) Remove columns that are not fully dependent upon the primary key.


OrderID

CustomerID

Price

Quantity

Total

Here, Total is not dependent on the primary key (OrderID, CustomerID); in fact it is a field derived from Price and Quantity, and hence it can be removed.
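A sketch of the 3NF fix: drop the stored Total column and derive it at query time instead (the table name OrderDetail is an assumption):

```sql
-- Total is computed on the fly from Price and Quantity,
-- so it can never get out of sync with them.
SELECT OrderID, CustomerID, Price, Quantity,
       Price * Quantity AS Total
FROM OrderDetail
```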

Monday, September 3, 2007

SOUNDEX and DIFFERENCE

Recently I got a request from a friend to search for words which sound similar. He was wondering whether this could be achieved in SQL Server. The answer is YES.

We can do this with SOUNDEX and DIFFERENCE; depending on your business scenario, you can choose which works better for you. Let's go into the details of how each works, with an example:

'busybody' is the word in the table, and let us suppose I search with 'buz' and 'busi'.

SELECT SOUNDEX ('busybody'), SOUNDEX ('buz'),SOUNDEX ('busi')

Output would be: B213 B200 B200


The output says that SOUNDEX treats the two search terms as identical to each other but different from 'busybody'. SOUNDEX keeps the first letter and then encodes the following consonants as digits, ignoring the vowels a, e, i, o, u, and padding the code with zeros; so the short strings 'buz' and 'busi' both collapse to B200, while the longer 'busybody' encodes to B213.

So to make this work, the best option would be to use a combination of SOUNDEX and SUBSTRING.

Or, the next option would be: if the first two characters of the SOUNDEX codes match (like "B2" in our case), then we can use SUBSTRING to count how many characters the two string values have in common, say three matching characters regardless of position, and show those as similar.

All the above logic can be encapsulated inside a function and can be used in select or where clause.

Next, coming to the DIFFERENCE function
One more option is to use the DIFFERENCE function, which returns 4 for an ideal match, so a value of 0 means there is no match at all.

You can try this as SELECT DIFFERENCE('busybody','xmn'), which returns 0, whereas SELECT DIFFERENCE('busybody','xyz') returns a slightly higher score because 'yz' matches 'busybody' somewhat.

Now, coming to the strings provided: SELECT DIFFERENCE('busybody','buz') and SELECT DIFFERENCE('busybody','busi') both return the same score (their SOUNDEX codes are identical), which means they are somewhat matching.

So you can list values for which the DIFFERENCE is 2 or greater, which is simpler still. But note that this works perfectly with the example we have taken; there may be cases where it does not. Try supplying all possible combinations to this function, and if you are satisfied with the results, then proceed with it.
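The whole idea can be wrapped into a single filter like the following sketch (the words table and word column are assumptions):

```sql
-- Return stored words that sound reasonably close to the search term.
-- The threshold of 2 follows the rule of thumb discussed above.
SELECT w.word
FROM words AS w
WHERE DIFFERENCE(w.word, 'busybody') >= 2
```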