Search This Blog

Monday, September 3, 2007

SOUNDEX and DIFFERENCE

Recently I got a request from friend to search words which sounded similar.He was wondering whether the same came be acheived in SQL Server.Answer would be YES.

We can do the same with SOUNDEX and DIFFERENCE,depends on your business scenario you can choose which works better for you.Lets go in details of how each works with an example:

Busybody is the word in the table and I lets suppose I use buz and busi

SELECT SOUNDEX ('busybody'), SOUNDEX ('buz'),SOUNDEX ('busi')

Output would be: B213 B200 B200


Output returned says when I use Soundex it treats this both as different,basically to say it works on vowels a, e , i, o, u and which character position vowels are there.

So busybody has 2 vowels u and o but the other one has just 1 and other has two but its a different character.

So to make this work the best option would be to use a combination of SoundEX and Substring,

Or next option would be if we get the first 2 letters of the SOUNDEX to be same like in our case "B2 " then we can go for substring and find the characters matching between two string values say 3 chars match irrespective of where it is we can show those as similar.

All the above logic can be encapsulated inside a function and can be used in select or where clause.

Next coming to DIFFERENCE clause
One more option is to use DIFFERENCE function which returns 4 if its ideal match,so the value is 0 means there is no match.

You can try the same as SELECT DIFFERENCE('busybody','xmn') which returns 0 and if provide SELECT DIFFERENCE('busybody','xyz') then it returns coz there is yz
somewhat matching as buzz...

Now coming to string which you have provided SELECT DIFFERENCE('busybody','buz'),SELECT DIFFERENCE('busybody','busi') both returns which means some what matching..

So you can list values for which difference is 2 or greater than 2 which would be still more simpler.But note this works perfect with the example we have taken,there might be some cases this might not work.. Try supplying possible combinations to this function and if you are satisfied with the result then proceed with the same..

No comments: