parsing - Create a regex that accepts a name but not the word "to"
I am working on parsing commentary from ESPNcricinfo and want to parse statements like the following.
Example 1: Yuvraj Singh to Nasir Jamshed
Example 2: Kumar to Shoaib Malik
I wrote the same regex for both the bowler and the batsman name:
regex: [a-zA-Z[-]*]*\s[a-zA-Z[-]*]*\s
Example 1 parses correctly, but I am facing a problem with Example 2:
"Kumar to" is considered the bowler name...
I need to get rid of the word "to" in the bowler name.
You can try the following regex:
(?<=to |^).*?(?= to|$)
It works for both the "Yuvraj Singh to Nasir Jamshed" and the "Kumar to Shoaib Malik"
strings.
For example:
// requires: using System.Linq; using System.Text.RegularExpressions;
string[] names = Regex.Matches("Yuvraj Singh to Nasir Jamshed", "(?<=to |^).*?(?= to|$)").Cast<Match>().Select(m => m.Value).ToArray();
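To see how that pattern behaves on both examples from the question, here is a minimal sketch. It assumes every commentary line has the form "<bowler> to <batsman>"; the class name `CommentaryNames` and the helper `Split` are made up for illustration and are not part of any library.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CommentaryNames
{
    // Pattern from the answer: a name starts at the beginning of the string
    // or right after "to ", and ends just before " to" or at the end.
    private static readonly Regex NamePattern =
        new Regex(@"(?<=to |^).*?(?= to|$)");

    // Hypothetical helper: returns the matched names in order
    // (bowler first, batsman second, under the assumed line format).
    public static string[] Split(string line)
    {
        return NamePattern.Matches(line)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        var lines = new[] { "Yuvraj Singh to Nasir Jamshed", "Kumar to Shoaib Malik" };
        foreach (var line in lines)
        {
            string[] names = Split(line);
            Console.WriteLine($"bowler = {names[0]}, batsman = {names[1]}");
        }
    }
}
```

This prints "bowler = Yuvraj Singh, batsman = Nasir Jamshed" and then "bowler = Kumar, batsman = Shoaib Malik". Note that the lookahead " to" is case-sensitive and has no word boundary, so a name part that begins with a lowercase "to" would cut the match short; adding \b after "to" is one way to tighten it if that ever shows up in the data.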
Another option: since you know every part of a name starts with a capital letter, you can force that rule ("to"
won't be matched by it, but trailing whitespace will):
([A-Z][\w-]*\s*)+
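Because this pattern also swallows the whitespace after a name, the matches need trimming. A minimal sketch of that, again assuming lines of the form "<bowler> to <batsman>" (the class name `CapitalizedNames` and the helper `Extract` are made up for illustration):

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class CapitalizedNames
{
    // One or more words that each start with an ASCII capital letter,
    // each optionally followed by whitespace.
    private static readonly Regex NamePattern =
        new Regex(@"([A-Z][\w-]*\s*)+");

    // Hypothetical helper: returns each run of capitalized words,
    // with the trailing whitespace removed.
    public static string[] Extract(string line)
    {
        return NamePattern.Matches(line)
            .Cast<Match>()
            .Select(m => m.Value.Trim())
            .ToArray();
    }

    public static void Main()
    {
        string[] names = Extract("Kumar to Shoaib Malik");
        Console.WriteLine($"bowler = {names[0]}, batsman = {names[1]}");
    }
}
```

This prints "bowler = Kumar, batsman = Shoaib Malik". The trade-off is that the rule relies entirely on capitalization: a name with a lowercase particle (for example "de" in the middle of a surname) would be split into separate matches, so the lookaround version is safer if such names can occur.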