Matching email IDs to people's names


I have a database (say 5000 records) full of people's names (first and last name). I also have a huge set of email IDs (say around 30000). I have to match these email IDs to people's names wherever possible and discard the other IDs. What I am doing is, I have made patterns like:

1. firstname.lastname@something.com
2. lastname.firstname@something.com
3. firstname_lastname@something.com
4. lastname_firstname@something.com
etc.

I am trying to use fuzzy search on both first and last names, following the above patterns. But people tend to use a lot of patterns in their email IDs, and some of the searches tend to return more than one result per person. Is there a better way to increase the probability of matching emails correctly? I have been searching a lot and didn't find any solid ideas.

To make it a bit smarter, you could assume that any non-alphanumeric character is a name separator and use a regular expression, e.g.

^jan[^a-z0-9]smith@.*$

But that doesn't solve the problem of multiple matches. I think it's inevitable that you'll get false positives if the email format is not constrained. Given the size of the database, I think you're stuck doing some of it by hand :(
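
For what it's worth, here is a rough Python sketch of the separator idea. Nothing in it comes from the original post: the function names, the (id, first, last) record layout and the sample addresses are all made up for illustration. Instead of running one regex per person against every address, it splits the part before the @ on a single non-alphanumeric character and looks the two pieces up in a dictionary keyed on both name orders. Addresses that hit more than one person are set aside for manual review rather than guessed at.

```python
import re
from collections import defaultdict

# Any single non-alphanumeric character counts as a name separator
# (same assumption as the [^a-z0-9] class in the regex above).
SEPARATOR = re.compile(r"[^a-z0-9]")


def build_index(people):
    """Map (name, name) pairs in both orders to the ids of matching people.

    `people` is a hypothetical list of (person_id, first, last) tuples.
    """
    index = defaultdict(set)
    for person_id, first, last in people:
        f, l = first.lower(), last.lower()
        index[(f, l)].add(person_id)
        index[(l, f)].add(person_id)
    return index


def match_emails(people, emails):
    """Split emails into unique matches, ambiguous matches and discards."""
    index = build_index(people)
    matched, ambiguous, discarded = {}, {}, []
    for email in emails:
        local = email.lower().partition("@")[0]
        tokens = SEPARATOR.split(local)
        # Only "name<sep>name" local parts are considered at all.
        ids = index.get(tuple(tokens), set()) if len(tokens) == 2 else set()
        if len(ids) == 1:
            matched[email] = next(iter(ids))
        elif ids:
            ambiguous[email] = sorted(ids)  # needs a human to decide
        else:
            discarded.append(email)
    return matched, ambiguous, discarded
```

Called as match_emails(people, emails), it returns three collections: the unique matches, the ambiguous ones to check by hand, and the addresses to discard. It deliberately ignores names that themselves contain punctuation, and formats like jsmith@ or initials, which would need extra rules or a fuzzy pass.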

