Welcome to Aussie SQL Server Bloggers Sign in | Join | Help

An idea for indexing phone numbers (index reversed phone numbers)

A recent thread in microsoft.public.sqlserver.programming (http://tinyurl.com/pzuq4) prompted me to think about how to index phone numbers for efficient query performance. One problem with storing & indexing phone numbers in database systems is that area codes are handled differently by different database designers, leading to various query performance issues. Some systems explicitly require users to enter area codes seperately from the local phone number, possibly storing these in seperate columns or parenthesizing the values prior to insertion into the phno column. Other systems simply allow end users to enter whatever combination of characters they wish. Input masks on user interfaces are also often used to control input formats, although strictly speaking, this is only a secondary form of input constraint (as it only compliments constraint checking performed by the database).

If a query needs to be written against a phone number column, a problem can occur if the phone number column is stored in an un-controlled way (free text) or, where the area code is stored in the phno column, if that area code is not known at the time of query execution. This is a fairlu common problem in CRM or call centre type applications, where a key lookup for customer information is phone-number related.

Whilst reading this short thread, the idea of storing the phone number in reverse order occured to me. Perhaps others have already thought of this idea, but it was new to me so I decided to explore it & write up a script that could demonstrate how one might implement such a solution (see below).

A major assumption is made with this technique that all area codes are simply prefixes to a number. This is an assumption I believe to be correct based on my own experiences with Australian phone numbers, but I guess it's possible that wierd things might happen with phone numbers in other parts of the world so this might not work for everyone.

In this script, I'm re-using the script I posted yesterday to generate a fairly random table of "people", this time customising it to include phone number reversal. Note the use of two triggers to keep the phoneno & revphno columns in synch. A scalar function is also used to strip any non-numeric values & whitespace from the numbers prior to reversing the number & storing it in revphno.

An index is created on (revphno, phoneno) to allow fast, indexed lookups against any reversed phone number. The calling code could perform the phone number reversal, or the scalar function could also be used in TSQL prior to calling the select statement.

This is just an idea, but it might help someone..

/*

* Reverse-indexing phone numbers

*/

set nocount on

go

use tempdb

go

/*

* BEGIN SETUP people table

*/

if object_id('people') is not null drop table people

if object_id('revphno') is not null drop function dbo.revphno

go

create table people (

personid int not null identity (1, 1) primary key nonclustered

, lname varchar (25) not null

, fname varchar (25) not null

, phoneno varchar (50) not null

, revphno varchar (50) null)

go

create function dbo.revphno (@ph varchar (50)) returns varchar (50)

as

begin

declare @rv varchar(50)

declare @i int, @l int

select @i = 1, @l = datalength(@ph), @rv = ''

while @i < @l + 1

begin

if substring(@ph, @i, 1) in ('0','1','2','3','4','5','6','7','8','9')

set @rv = @rv + substring(@ph, @i, 1)

set @i = @i + 1

end

select @rv = reverse(@rv)

return @rv

end

go

create trigger ins_people

on people

for insert as

begin

update p

set revphno = dbo.revphno(i.phoneno)

from people p

join inserted i on p.personid = i.personid

end

go

create trigger upd_people

on people

for update as

if (columns_updated() & 4) > 0

begin

update p

set revphno = dbo.revphno(i.phoneno)

from people p

join inserted i on p.personid = i.personid

end

go

declare @l int, @fn varchar(25), @ln varchar(25), @pn varchar(50), @bm char(6)

declare @i int, @report_every int, @rows_to_insert int, @p varchar (50)

select @i = 1, @rows_to_insert = 10000, @report_every = 1000

while @i < @rows_to_insert

begin

if @i % @report_every = 0

begin

select @p = convert(varchar(25), @i) + ' from '

+ convert(varchar(25), @rows_to_insert) + ' inserted'

print @p

end

/* generate a fname of random length (between 1 to 25 chrs), of random chars */

select @l = round(1 + (rand() * (25-1)),0)

select @fn='', @bm = 'uaeioy'

while @l > 0

begin

if (@l%2) = 0

set @fn = @fn + substring(@bm,convert(int,round(1 + (rand() * (5)),0)),1)

else

set @fn = @fn + char(round(97 + (rand() * (25)),0))

set @l = @l - 1

end

/* generate a lname of random length (between 1 to 25 chrs), of random chars */

select @l = round(1 + (rand() * (25-1)),0)

select @ln='', @bm = 'goeuai'

while @l > 0

begin

if (@l%2) = 0

set @ln = @ln + substring(@bm,convert(int,round(1 + (rand() * (5)),0)),1)

else

set @ln = @ln + char(round(97 + (rand() * (25)),0))

set @l = @l - 1

end

/* generate a random phone number, with occassional parenthesised area codes */

select @l = 1--round(1 + (rand() * (25-1)),0)

select @pn='', @bm = 'jeaoui'

while @l < 8

begin

set @pn = @pn + convert(varchar(2), round(1 + (rand() * (9-1)),0))

if @l = 3 set @pn = @pn + ','

set @l = @l + 1

end

--insert row with random phoneno (reversed in trigger)

insert into people (lname, fname, phoneno)

values (@ln, @fn, @pn)

set @i = @i + 1

end

/*

* END SETUP people table

*/

create clustered index cix_people_personid on people(personid)

create nonclustered index ncix_people_revphno on people (revphno, phoneno)

set statistics io on

set statistics profile on

select phoneno, personid from people where revphno like '12%'

set statistics profile off

set statistics io off

go

if object_id('people') is not null drop table people

if object_id('revphno') is not null drop function dbo.revphno

Published Sunday, 17 September 2006 1:09 PM by Greg_Linwood

Comments

# re: An idea for indexing phone numbers (index reversed phone numbers)

I have really enjoyed this series on indexing, really good stuff that I have not seen talked about much anywhere else.

One small observation, if you reverse the order of the loop in the revphno function or even changed the string concatenation line to:

set @rv = substring(@ph, @i, 1) + @rv

You could remove the call to the reverse function.

Cheers
Darren
Tuesday, 19 September 2006 9:51 AM by dgosbell

# re: An idea for indexing phone numbers (index reversed phone numbers)

Well spotted Darren! The main thrust of this blog is to highlight the technique of reverse index matching rather than the performance of the reversal algorithm, but it's always worth saving cycles!
Cheers,
Greg
Sunday, 24 September 2006 7:04 PM by Greg_Linwood
Anonymous comments are disabled