regex - Regular expression to match a string allowing a few mismatched characters
I am trying to find a string within another string. However, I want it to match even if one or more characters do not match.
Let me explain with an example:
Let's say I have the string 'abcdefghij'. If I want to match the string 'abcd', I can write strfind('abcdefghij', 'abcd').
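Exact matching like this only reports positions where the pattern occurs character for character. For comparison, here is a rough Python equivalent (the thread itself uses MATLAB). `strfind_like` is a hypothetical helper name; the only behaviour it tries to imitate is returning every 1-based start position, overlapping occurrences included.

```python
from typing import List

def strfind_like(text: str, pattern: str) -> List[int]:
    """All 1-based start indexes of exact occurrences of pattern in text,
    overlapping ones included (similar in spirit to MATLAB's strfind)."""
    if not pattern:
        return []
    positions = []
    start = text.find(pattern)
    while start != -1:
        positions.append(start + 1)            # report 1-based, like MATLAB
        start = text.find(pattern, start + 1)  # advance by one to catch overlaps
    return positions

print(strfind_like('abcdefghij', 'abcd'))  # [1]
print(strfind_like('abcdefghij', 'adcf'))  # []
```

The second call shows the problem the question is about: 'adcf' has no exact occurrence, so an exact search finds nothing.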
Now, say I have the string 'adcf'. Notice that it differs from 'abcd' in 2 characters, but I still want to consider it a match.
Any idea how to do this?
I know it may not be optimal code.
Example:
    a = 'abcdefghijk';
    b = 'xbcx'; c = 'abxx'; d = 'axxd'; e = 'abcx';
    f = 'xabc'; g = 'axcd'; h = 'abxd';
All these strings, along with 'abcd' itself, should match a.
I hope the example makes it clearer. The idea is that even if there is a mismatch of 1 or 2 characters, it should still be considered a match.
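One way to read this requirement is "slide a window the length of the pattern along the string and accept any window whose number of differing characters (its Hamming distance to the pattern) is at most the tolerance". The sketch below does exactly that in Python, as a brute-force illustration. `approx_find` is a hypothetical name, and the choice of plain, non-wrapping windows is an assumption about what the asker wants.

```python
from typing import List

def approx_find(text: str, pattern: str, tolerance: int) -> List[int]:
    """1-based start positions of every len(pattern)-long window of text
    (no wraparound) that differs from pattern in at most `tolerance` places."""
    n, m = len(text), len(pattern)
    positions = []
    for start in range(n - m + 1):
        window = text[start:start + m]
        # Hamming distance: number of positions where the characters differ
        mismatches = sum(x != y for x, y in zip(window, pattern))
        if mismatches <= tolerance:
            positions.append(start + 1)
    return positions

a = 'abcdefghijk'
for pattern in ['xbcx', 'abxx', 'axxd', 'abcx', 'xabc', 'axcd', 'abxd', 'abcd']:
    print(pattern, approx_find(a, pattern, 2))
```

Under this rule every example pattern is found at position 1 except 'xabc', which matches nowhere: no 4-character window of a starts one character before 'a'. It would only count as a match if the pattern may overhang the start of a, or wrap around its end.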
You can do this:
    a = 'abcdefghij';   % main string
    b = 'adcf';         % string to be found
    tolerance = 2;      % maximum number of different characters to tolerate
    na = numel(a);
    nb = numel(b);
    pos = find(sum(a(mod(cumsum([(1:na)' ones(na, nb - 1)], 2) - 1, na) + 1) == repmat(b, na, 1), 2) >= nb - tolerance);
In this case it returns pos = [1 3]', because "adcf" can be matched at the first position (matching "a?c?") and at the third position (matching "?d?f").
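For readers who want to trace the one-liner outside MATLAB, here is a rough Python transcription that follows the same steps: build the index matrix with a cumulative sum, fold it with mod, index into a, compare with b, count matches per row, and keep the rows within tolerance. `tolerant_find_wrap` is a hypothetical name, and plain lists stand in for MATLAB matrices.

```python
from itertools import accumulate
from typing import List

def tolerant_find_wrap(a: str, b: str, tolerance: int) -> List[int]:
    """Transcription of the MATLAB one-liner: compare b with every
    length-nb substring of a, letting substrings wrap past the end of a."""
    na, nb = len(a), len(b)
    # [(1:na)' ones(na, nb - 1)] followed by cumsum along each row
    rows = [list(accumulate([i] + [1] * (nb - 1))) for i in range(1, na + 1)]
    # mod(x - 1, na) + 1 folds every index back into 1..na
    index = [[(x - 1) % na + 1 for x in row] for row in rows]
    # a(index): each row of indexes becomes a substring (1-based -> 0-based)
    subs = [''.join(a[k - 1] for k in row) for row in index]
    # (subs == repmat(b, na, 1)) summed along each row: matching characters
    matches = [sum(x == y for x, y in zip(s, b)) for s in subs]
    # find(matches >= nb - tolerance), reported as 1-based positions
    return [r + 1 for r, m in enumerate(matches) if m >= nb - tolerance]

print(tolerant_find_wrap('abcdefghij', 'adcf', 2))   # [1, 3]
print(tolerant_find_wrap('abcdefghijk', 'xabc', 2))  # [11]
```

The second call shows a side effect of the wraparound: with a = 'abcdefghijk', the pattern 'xabc' is accepted at position 11 only because the wrapped substring there is 'kabc'.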
Explanation:
- First, take the sizes of a and b.
- Then, create the matrix [(1:na)' ones(na, nb - 1)], which gives this:
     1  1  1  1
     2  1  1  1
     3  1  1  1
     4  1  1  1
     5  1  1  1
     6  1  1  1
     7  1  1  1
     8  1  1  1
     9  1  1  1
    10  1  1  1
- Then perform a cumulative sum along each row (to the right) using cumsum, which gives this:
     1  2  3  4
     2  3  4  5
     3  4  5  6
     4  5  6  7
     5  6  7  8
     6  7  8  9
     7  8  9 10
     8  9 10 11
     9 10 11 12
    10 11 12 13
- Then use the mod function (as mod(x - 1, na) + 1) to bring every number into the range 1 to na, which gives this:
     1  2  3  4
     2  3  4  5
     3  4  5  6
     4  5  6  7
     5  6  7  8
     6  7  8  9
     7  8  9 10
     8  9 10  1
     9 10  1  2
    10  1  2  3
- Use this matrix as an index matrix into a:
    abcd
    bcde
    cdef
    defg
    efgh
    fghi
    ghij
    hija
    ijab
    jabc
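Note that this matrix holds all possible substrings of a of length nb, including the ones that wrap around from the end of a back to its start ("hija", "ijab", "jabc").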
- Now use repmat to replicate b down na rows:
    adcf
    adcf
    adcf
    adcf
    adcf
    adcf
    adcf
    adcf
    adcf
    adcf
- Then perform a direct element-wise comparison of the two matrices:
    1 0 1 0
    0 0 0 0
    0 1 0 1
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
    0 0 0 0
- Summing along each row (to the right) gives this:
    2
    0
    2
    0
    0
    0
    0
    0
    0
    0
which is the number of matching characters in each possible substring.
- Finally, use find to select the indexes of the substrings whose number of matches is within our tolerance, i.e. at least nb - tolerance.
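The cumsum and mod steps in this explanation amount to a closed form: with 1-based row i and column j, the index matrix entry is mod(i + j - 2, na) + 1. The Python check below (a sketch; the variable names are made up for illustration) builds the matrix both ways for the example strings and confirms they agree. It also shows that keeping only the first na - nb + 1 rows drops the wrapped-around substrings, in case those are unwanted.

```python
from itertools import accumulate

a, b = 'abcdefghij', 'adcf'
na, nb = len(a), len(b)

# Step by step, as in the explanation: [(1:na)' ones(na, nb-1)] -> cumsum -> mod
stepwise = [[(x - 1) % na + 1 for x in accumulate([i] + [1] * (nb - 1))]
            for i in range(1, na + 1)]

# Closed form: entry (i, j) is mod(i + j - 2, na) + 1, with 1-based i and j
closed = [[(i + j - 2) % na + 1 for j in range(1, nb + 1)]
          for i in range(1, na + 1)]

assert stepwise == closed
substrings = [''.join(a[k - 1] for k in row) for row in closed]
print(substrings)
# Rows after na - nb + 1 wrap past the end of a ('hija', 'ijab', 'jabc');
# slicing them off restricts the search to ordinary, non-wrapping substrings.
print(substrings[:na - nb + 1])
```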