An idea for indexing phone numbers (index reversed phone numbers)
A recent thread in microsoft.public.sqlserver.programming (http://tinyurl.com/pzuq4) prompted me to think about how to index phone numbers for efficient query performance. One problem with storing & indexing phone numbers in database systems is that area codes are handled differently by different database designers, leading to various query performance issues. Some systems explicitly require users to enter area codes seperately from the local phone number, possibly storing these in seperate columns or parenthesizing the values prior to insertion into the phno column. Other systems simply allow end users to enter whatever combination of characters they wish. Input masks on user interfaces are also often used to control input formats, although strictly speaking, this is only a secondary form of input constraint (as it only compliments constraint checking performed by the database).
If a query needs to be written against a phone number column, a problem can occur if the phone number column is stored in an un-controlled way (free text) or, where the area code is stored in the phno column, if that area code is not known at the time of query execution. This is a fairlu common problem in CRM or call centre type applications, where a key lookup for customer information is phone-number related.
Whilst reading this short thread, the idea of storing the phone number in reverse order occured to me. Perhaps others have already thought of this idea, but it was new to me so I decided to explore it & write up a script that could demonstrate how one might implement such a solution (see below).
A major assumption is made with this technique that all area codes are simply prefixes to a number. This is an assumption I believe to be correct based on my own experiences with Australian phone numbers, but I guess it's possible that wierd things might happen with phone numbers in other parts of the world so this might not work for everyone.
In this script, I'm re-using the script I posted yesterday to generate a fairly random table of "people", this time customising it to include phone number reversal. Note the use of two triggers to keep the phoneno & revphno columns in synch. A scalar function is also used to strip any non-numeric values & whitespace from the numbers prior to reversing the number & storing it in revphno.
An index is created on (revphno, phoneno) to allow fast, indexed lookups against any reversed phone number. The calling code could perform the phone number reversal, or the scalar function could also be used in TSQL prior to calling the select statement.
This is just an idea, but it might help someone..
/*
* Reverse-indexing phone numbers
*/
set
nocount on
go
use
tempdb
go
/*
* BEGIN SETUP people table
*/
if
object_id('people') is not null drop table people
if
object_id('revphno') is not null drop function dbo.revphno
go
create
table people (
personid
int not null identity (1, 1) primary key nonclustered
, lname varchar (25) not null
, fname varchar (25) not null
, phoneno varchar (50) not null
, revphno varchar (50) null)
go
create
function dbo.revphno (@ph varchar (50)) returns varchar (50)
as
begin
declare @rv varchar(50)
declare @i int, @l int
select @i = 1, @l = datalength(@ph), @rv = ''
while @i < @l + 1
begin
if substring(@ph, @i, 1) in ('0','1','2','3','4','5','6','7','8','9')
set @rv = @rv + substring(@ph, @i, 1)
set @i = @i + 1
end
select @rv = reverse(@rv)
return @rv
end
go
create
trigger ins_people
on
people
for
insert as
begin
update p
set revphno = dbo.revphno(i.phoneno)
from people p
join inserted i on p.personid = i.personid
end
go
create
trigger upd_people
on
people
for
update as
if
(columns_updated() & 4) > 0
begin
update p
set revphno = dbo.revphno(i.phoneno)
from people p
join inserted i on p.personid = i.personid
end
go
declare
@l int, @fn varchar(25), @ln varchar(25), @pn varchar(50), @bm char(6)
declare
@i int, @report_every int, @rows_to_insert int, @p varchar (50)
select
@i = 1, @rows_to_insert = 10000, @report_every = 1000
while
@i < @rows_to_insert
begin
if @i % @report_every = 0
begin
select @p = convert(varchar(25), @i) + ' from '
+ convert(varchar(25), @rows_to_insert) + ' inserted'
print @p
end
/* generate a fname of random length (between 1 to 25 chrs), of random chars */
select @l = round(1 + (rand() * (25-1)),0)
select @fn='', @bm = 'uaeioy'
while @l > 0
begin
if (@l%2) = 0
set @fn = @fn + substring(@bm,convert(int,round(1 + (rand() * (5)),0)),1)
else
set @fn = @fn + char(round(97 + (rand() * (25)),0))
set @l = @l - 1
end
/* generate a lname of random length (between 1 to 25 chrs), of random chars */
select @l = round(1 + (rand() * (25-1)),0)
select @ln='', @bm = 'goeuai'
while @l > 0
begin
if (@l%2) = 0
set @ln = @ln + substring(@bm,convert(int,round(1 + (rand() * (5)),0)),1)
else
set @ln = @ln + char(round(97 + (rand() * (25)),0))
set @l = @l - 1
end
/* generate a random phone number, with occassional parenthesised area codes */
select @l = 1--round(1 + (rand() * (25-1)),0)
select @pn='', @bm = 'jeaoui'
while @l < 8
begin
set @pn = @pn + convert(varchar(2), round(1 + (rand() * (9-1)),0))
if @l = 3 set @pn = @pn + ','
set @l = @l + 1
end
--insert row with random phoneno (reversed in trigger)
insert into people (lname, fname, phoneno)
values (@ln, @fn, @pn)
set @i = @i + 1
end
/*
* END SETUP people table
*/
create
clustered index cix_people_personid on people(personid)
create
nonclustered index ncix_people_revphno on people (revphno, phoneno)
set
statistics io on
set
statistics profile on
select
phoneno, personid from people where revphno like '12%'
set
statistics profile off
set
statistics io off
go
if
object_id('people') is not null drop table people
if
object_id('revphno') is not null drop function dbo.revphno