Matching email IDs to people's names
I have a database (say 5,000 records) full of people's names (first and last name). I also have a huge set of email IDs (say around 30,000). I have to match these email IDs to people's names wherever possible and discard the other IDs. What I am doing is, I have made patterns like:
1. firstname.lastname@something.com
2. lastname.firstname@something.com
3. firstname_lastname@something.com
4. lastname_firstname@something.com
etc.
I am trying to use fuzzy search on both first and last names following the above patterns. But people tend to use a lot of patterns in email IDs, and some of them match more than one person. Is there a better way to increase the probability of matching emails correctly? I have been searching a lot and didn't find any solid ideas.
To make it a bit smarter, you could assume any non-alphanumeric character is a name separator and use a regular expression, e.g.
^jan[^a-z0-9]smith@.*$
But that doesn't solve the multiple matches. I think it's inevitable you'll get false positives, since the email format is not constrained. Given the size of the database, I think you're stuck doing some of it by hand :(
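For what it's worth, here is a rough Python sketch of the separator idea. It is only one way to do it, not a tested solution for your data. Instead of running one regex per person against every email, it splits the part before the @ on any single non-alphanumeric character and looks the two pieces up in a dictionary, in either order. The function names, the (id, first, last) record layout, and the sample data are all made up for illustration. It assumes names contain only letters and digits and that the address has exactly two name parts, so it will skip things like jsmith@ or jan.smith2@. Ambiguous hits are kept apart so you can review those by hand.

```python
import re
from collections import defaultdict

# Any single character that is not a lowercase letter or digit counts as a separator.
SEPARATOR = re.compile(r"[^a-z0-9]")

def index_people(people):
    """Map (first, last) and (last, first) to the ids of people with that name.

    `people` is an iterable of (person_id, first_name, last_name) tuples;
    this layout is an assumption made for the example.
    """
    index = defaultdict(list)
    for person_id, first, last in people:
        f, l = first.lower(), last.lower()
        index[(f, l)].append(person_id)
        if f != l:  # avoid listing the same person twice under one key
            index[(l, f)].append(person_id)
    return index

def match_emails(people, emails):
    """Return (unique, ambiguous) dicts keyed by email address.

    unique maps an email to the single matching person id;
    ambiguous maps an email to the list of all matching ids.
    Emails that match nobody are discarded.
    """
    index = index_people(people)
    unique, ambiguous = {}, {}
    for email in emails:
        local_part = email.lower().partition("@")[0]
        parts = SEPARATOR.split(local_part)
        if len(parts) != 2:
            continue  # not of the form name<sep>name
        hits = index.get(tuple(parts), [])
        if len(hits) == 1:
            unique[email] = hits[0]
        elif len(hits) > 1:
            ambiguous[email] = list(hits)
    return unique, ambiguous

if __name__ == "__main__":
    people = [(1, "Jan", "Smith"), (2, "Jan", "Smith"), (3, "Ann", "Lee")]
    emails = ["jan.smith@something.com", "lee_ann@something.com", "bob@something.com"]
    unique, ambiguous = match_emails(people, emails)
    print("unique:", unique)
    print("needs manual review:", ambiguous)
```

The dictionary lookup keeps the work roughly proportional to the number of emails plus the number of people, rather than checking 5,000 patterns against each of 30,000 addresses. It doesn't remove the false positives though; two people with the same name still end up in the manual pile.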