This blog is for SQL developers who would like to transform themselves into DBAs.
Friday, September 28, 2007
BCP and Bulk Insert
BCP can be used for both import and export, whereas BULK INSERT supports import only.
While importing large text files, BULK INSERT has proven to be faster than BCP, because BULK INSERT converts the data directly into TDS (tabular data stream) format, which SQL Server understands natively, whereas BCP converts the data into an ODBC data stream, which then has to be converted into TDS.
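As a rough sketch of the two approaches (table, database and file names below are illustrative, not from this post):

```sql
-- BULK INSERT: import only, runs as a T-SQL statement inside SQL Server
BULK INSERT dbo.Customers
FROM 'C:\data\customers.txt'
WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n');

-- bcp, by contrast, is a command-line utility and works in both directions:
--   bcp MyDb.dbo.Customers out C:\data\customers.txt -c -T   (export)
--   bcp MyDb.dbo.Customers in  C:\data\customers.txt -c -T   (import)
```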
Force Optimizer to use an Index
using a table hint in the WITH clause, like
SELECT au_id, au_lname FROM authors WITH (INDEX(0))
To use a particular index
WITH (INDEX(index_name))
To use a table scan
WITH (INDEX(0))
To use the clustered index
WITH (INDEX(1))
However, it is not good practice to force an index to be used, as the optimizer has already tried various ways of retrieving the data (table scan, clustered index, non-clustered index) and only then chosen the best method to retrieve it.
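A quick way to see what a forced index costs, sketched against the pubs sample database, is to compare the logical reads reported with and without the hint:

```sql
SET STATISTICS IO ON;

-- Let the optimizer pick the access path
SELECT au_id, au_lname FROM authors;

-- Force a table scan and compare the logical reads reported
SELECT au_id, au_lname FROM authors WITH (INDEX(0));

SET STATISTICS IO OFF;
```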
Why SQL Optimizer uses table scan when an index is available?
SQL Server is smart enough that its optimizer determines which execution plan costs less; if using the index is costlier than a table scan, it will opt for the table scan.
This happens in following situations.
1. The index keys are very wide and distributed arbitrarily, or, to be precise, the statistics are outdated; use UPDATE STATISTICS to refresh them.
2. The estimated row count is more than 5% of the table's rows.
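When outdated statistics are the suspect, they can be refreshed like this (table and index names are from the pubs sample database):

```sql
-- Rebuild statistics for the whole table by scanning every row
UPDATE STATISTICS authors WITH FULLSCAN;

-- Or refresh the statistics of a single index only
UPDATE STATISTICS authors aunmind;
```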
Keys in Database World
Primary Key
One or more columns which uniquely identify each row in the table. If more than one column is required to make a row unique, it is called a composite primary key.
Unique Key
Same as a primary key, except that it allows one NULL value in the key columns.
Candidate Key
A candidate key is one that can identify each row of a table uniquely. Generally a candidate key becomes the primary key of the table
Alternate Key
If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys.
Foreign Key
A primary key defined on one table is called a foreign key in the table that references it.
To elaborate: suppose there is an Employee table with EmployeeID as its primary key, and an EmployeeSalary table holding each employee's salary details; EmployeeID in the salary table is called a foreign key.
By this we enforce that an employee must first be present in the Employee table before he can exist in the EmployeeSalary table.
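A minimal sketch of the two tables described above (column names and types are illustrative):

```sql
CREATE TABLE Employee (
    EmployeeID   int NOT NULL PRIMARY KEY,
    EmployeeName varchar(50) NOT NULL
);

CREATE TABLE EmployeeSalary (
    EmployeeID int NOT NULL
        REFERENCES Employee (EmployeeID),  -- the foreign key
    Salary     money NOT NULL
);

-- Fails unless the employee exists in Employee first:
-- INSERT INTO EmployeeSalary (EmployeeID, Salary) VALUES (1, 50000);
```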
What are Statistics
Statistics describe the distribution of key values in an index. If an indexed column has unique values, the selectivity of that index is high, as opposed to an index on non-unique values. The query optimizer uses these statistics when deciding whether or not to use an index while executing a query.
Some situations under which you should update statistics:
1) There is a significant change in the key values in the index.
2) A large amount of data in an indexed column has been added, changed, or removed (that is, the distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE statement and then repopulated.
3) The database is upgraded from a previous version.
Related commands: UPDATE STATISTICS, STATS_DATE, DBCC SHOW_STATISTICS, CREATE STATISTICS, DROP STATISTICS, sp_autostats, sp_createstats, sp_updatestats.
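For example, to inspect existing statistics (names from the pubs sample database; the stats id passed to STATS_DATE is illustrative):

```sql
-- Show the histogram and density information for one index
DBCC SHOW_STATISTICS ('authors', aunmind);

-- When were these statistics last updated?
SELECT STATS_DATE(OBJECT_ID('authors'), 2) AS last_updated;  -- 2 = index/stats id
```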
Drop Index
In the presence of a clustered index, non-clustered indexes hold the clustered index key as part of their index key value. So if we drop the clustered index before dropping the non-clustered indexes, SQL Server has to rewrite the non-clustered index keys with a ROWID in place of the clustered index key value. This puts extra load on the database log and tempdb log files as well. So drop the non-clustered indexes first, followed by the clustered index.
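In the recommended order, with illustrative index names:

```sql
-- Drop the non-clustered indexes first...
DROP INDEX authors.ncix_authors_lname;

-- ...and the clustered index last, so the non-clustered keys
-- never have to be rewritten with ROWIDs
DROP INDEX authors.cix_authors_id;
```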
What is Lock Escalation?
Lock escalation is the process of converting many fine-grained locks into fewer coarse-grained locks. For example, when a transaction requests rows from a table, SQL Server automatically acquires locks on the rows affected and places higher-level intent locks on the pages and table (or index) which contain those rows. When the number of locks held by the transaction exceeds its threshold, SQL Server attempts to change the intent lock on the table to a stronger lock (for example, an intent exclusive (IX) lock would change to an exclusive (X) lock). After acquiring the stronger lock, all page- and row-level locks held by the transaction on the table are released, reducing lock overhead.
SQL Server may choose to do both row and page locking for the same query, for example, placing page locks on the index (if enough contiguous keys in a nonclustered index node are selected to satisfy the query) and row locks on the data. This reduces the likelihood that lock escalation will be necessary.
Lock escalation thresholds are determined dynamically by SQL Server and do not require configuration.
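On SQL Server 2005 you can watch the granularity of the locks a session holds through the sys.dm_tran_locks DMV (the session id below is illustrative):

```sql
-- Session 1: hold many row locks inside a transaction
BEGIN TRAN;
UPDATE authors SET au_lname = au_lname;  -- touches every row
-- (leave the transaction open)

-- Session 2: see whether the locks are at ROW, PAGE or OBJECT level
SELECT resource_type, request_mode, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE request_session_id = 52  -- spid of session 1, illustrative
GROUP BY resource_type, request_mode;

-- Session 1: release everything
ROLLBACK TRAN;
```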
Update 100 Million Records
Always ensure that your transaction size is small enough.
In this case, we need to split one big transaction into multiple small transactions.
Begin tran bigger_tran
Update table
Set col = value
Where condition
Commit tran
Splitting the transaction:
declare @ctr int
declare @trancount int
declare @ln_error int
declare @ln_rowcount int

set @ctr = 1
set @trancount = 0
set rowcount 10000 -- batch size: update at most 10000 rows per transaction

while (@ctr <> 0)
begin
    begin tran chunk_tran

    update big_tran_to_small_tran
    set j = null
    where j is not null

    select @ln_error = @@error, @ln_rowcount = @@rowcount

    if @ln_error <> 0
    begin
        rollback tran chunk_tran
        break
    end

    if @ln_rowcount = 0
    begin
        set @ctr = 0
        commit tran chunk_tran -- close the (empty) final transaction
    end
    else
    begin
        commit tran chunk_tran
        set @trancount = @trancount + 1 -- verification only
    end
end

set rowcount 0 -- reset so subsequent statements are unaffected
Difference Between Views & Procedures
Differences
- Stored procedures can be parameterized, whereas views cannot be.
- Views can be indexed; stored procedures cannot be.
- Stored procedures are better than views in performance, particularly on large tables.
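To illustrate the first difference against the pubs sample database:

```sql
-- A view cannot take a parameter; filtering happens outside it
CREATE VIEW dbo.vw_AuthorNames AS
SELECT au_id, au_lname FROM authors;
GO

-- A stored procedure can be parameterized
CREATE PROCEDURE dbo.usp_GetAuthor
    @au_id varchar(11)
AS
SELECT au_id, au_lname FROM authors WHERE au_id = @au_id;
GO

SELECT * FROM dbo.vw_AuthorNames WHERE au_id = '172-32-1176';
EXEC dbo.usp_GetAuthor @au_id = '172-32-1176';
```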
Tuesday, September 4, 2007
What happens internally when an UPDATE statement is issued
If the record to be modified is not in the buffer cache, then SQL Server reads it from the data page on disk and places it in memory within the buffer cache.
The user modifies the record within the buffer cache. Modified pages within the buffer cache are marked as 'dirty' and will have to be written to disk in the future.
Immediately after the record is modified within the buffer cache, a corresponding record indicating the data change and the transaction that caused it is written to the log cache.
When a commit happens, the log cache information is written to the transaction log file on disk.
And when a checkpoint occurs, data is written to the data files (.mdf, .ndf) on physical disk: dirty pages within the buffer cache are 'flushed' to disk, so that the changes recorded in the log are reflected in the data file and will appear in a full backup.
Basics of Normalisation
Normalization can be defined as the process of efficiently organizing data: in other terms, eliminating redundant data and relating the data to other tables by means of relationships.
There are various steps to normalization, which we generally call normal forms. There are many NFs, but in the real database world a database is said to be normalized if it is normalized to Third Normal Form (3NF). Let us see each normal form in detail:
1 NF
(a) Eliminate duplicate columns from the same entity.
(b) Create a separate entity for each group of related data, and identify each row with a unique column or set of columns (composite), which is called the primary key.
Example: EmployeeID PayElement1 PayElement2 PayElement3
Here EmployeeID, PayElement1, PayElement2, PayElement3 are the column names, but the basic flaw in this model is that not all employees will have all of these pay elements; some might have two and others might have just one.
And also, what if some more elements are added in the near future, for which the table structure needs to be modified? So now we define these tables to accommodate the above data.
PayElement
PayElementID --> PayElementID is the primary key
PayElementName
Employee
EmployeeID
PayElementID --> Here (EmployeeID, PayElementID) is the composite primary key.
Now with the above structure we can accommodate any number of pay elements per employee.
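As DDL, the normalized design sketched above might look like this (types are illustrative):

```sql
CREATE TABLE PayElement (
    PayElementID   int NOT NULL PRIMARY KEY,
    PayElementName varchar(50) NOT NULL
);

CREATE TABLE Employee (
    EmployeeID   int NOT NULL,
    PayElementID int NOT NULL REFERENCES PayElement (PayElementID),
    -- composite primary key: one row per employee per pay element
    CONSTRAINT PK_Employee PRIMARY KEY (EmployeeID, PayElementID)
);
```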
2 NF
(a) Remove columns that are only partially dependent on the entity's key and place them in a separate entity.
(b) Create relationships with the newly created entity by means of foreign keys.
3 NF
(a) Remove columns that are not fully dependent upon the primary key.
OrderID
CustomerID
Price
Quantity
Total
Here Total is not dependent on the primary key (OrderID, CustomerID); in fact it is a field derived from Price and Quantity, and hence can be removed.
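Since Total is derivable, it can simply be computed when queried (table name is illustrative):

```sql
-- No stored Total column; derive it on the fly
SELECT OrderID, CustomerID, Price, Quantity,
       Price * Quantity AS Total
FROM OrderDetail;
```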
Monday, September 3, 2007
SOUNDEX and DIFFERENCE
Fuzzy matching of similar-sounding strings can be done with SOUNDEX or DIFFERENCE; depending on your business scenario, you can choose which works better for you. Let's go into the details of how each works with an example:
'Busybody' is the word in the table, and let us suppose I search with 'buz' and 'busi'.
SELECT SOUNDEX ('busybody'), SOUNDEX ('buz'),SOUNDEX ('busi')
Output would be: B213 B200 B200
The output shows that SOUNDEX treats both search terms as different from 'busybody'. SOUNDEX keeps the first letter of the string and encodes the following consonants as digits, ignoring the vowels a, e, i, o, u: 'busybody' encodes three consonants (B213), whereas 'buz' and 'busi' each encode only one (B200).
So to make this work, the best option would be to use a combination of SOUNDEX and SUBSTRING.
Or, the next option: if the first two characters of the SOUNDEX codes are the same (like 'B2' in our case), we can use SUBSTRING and count the characters matching between the two string values; say, if three characters match, irrespective of where they are, we can show those as similar.
All the above logic can be encapsulated inside a function and used in a SELECT or WHERE clause.
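A minimal sketch of such a function, assuming we call two strings similar when the first two characters of their SOUNDEX codes match (the function name and rule are illustrative):

```sql
CREATE FUNCTION dbo.fn_SimilarSound
    (@a varchar(100), @b varchar(100))
RETURNS bit
AS
BEGIN
    -- Compare only the leading 'letter + first digit' of each SOUNDEX code
    IF SUBSTRING(SOUNDEX(@a), 1, 2) = SUBSTRING(SOUNDEX(@b), 1, 2)
        RETURN 1;
    RETURN 0;
END;
GO

-- SELECT dbo.fn_SimilarSound('busybody', 'buz');   -- 1 (B213 vs B200 share 'B2')
-- SELECT dbo.fn_SimilarSound('busybody', 'xmn');   -- 0
```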
Next, coming to the DIFFERENCE function
One more option is to use the DIFFERENCE function, which returns 4 for an ideal match, so a value of 0 means there is no match.
You can try this with SELECT DIFFERENCE('busybody','xmn'), which returns 0, whereas SELECT DIFFERENCE('busybody','xyz') returns a higher value because the 'yz' matches 'busybody' somewhat.
Now, coming to the strings provided: SELECT DIFFERENCE('busybody','buz') and SELECT DIFFERENCE('busybody','busi') both return the same score, which means a partial match.
So you can list values for which the difference is 2 or greater, which would be simpler still. But note that this works well with the example we have taken; there might be some cases where it does not. Try supplying possible combinations to this function, and if you are satisfied with the results, proceed with it.