regex - Regular expression to match a few characters from a string


I am trying to find a string within another string. However, I also want it to count as a match when one or more characters do not match.

Let me explain with an example:

Say I have the string 'abcdefghij'. If the string to match is 'abcd', I can write

    strfind('abcdefghij', 'abcd')

Now suppose I have the string 'adcf'. Notice that there is a mismatch in 2 characters, but I would still consider it a match.

Any idea how to do this?

I know this is not optimal code.

Example:

    a = 'abcdefghijk';
    b = 'xbcx'
    c = 'abxx'
    d = 'axxd'
    e = 'abcx'
    f = 'xabc'
    g = 'axcd'
    h = 'abxd'
    i = 'abcd'

All of these strings should match a. I hope this example makes it clearer. The idea is that if there is a mismatch of only 1 or 2 characters, it should still be considered a match.

You can do it like this:

    a = 'abcdefghij'; % main string
    b = 'adcf'; % string to be found
    tolerance = 2; % maximum number of different characters to tolerate

    na = numel(a);
    nb = numel(b);
    pos = find(sum(a(mod(cumsum([(1:na)' ones(na, nb - 1)], 2) - 1, na) + 1) == repmat(b, na, 1), 2) >= nb - tolerance);

In this case it returns pos = [1 3]'; because "adcf" can be matched at the first position (matching "a?c?") and at the third position (matching "?d?f").
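For comparison, here is a minimal loop-based sketch of the same idea. It is not the code from the answer above: it slides a window over a and counts mismatches directly, and unlike the one-liner it does not wrap around the end of a. The name posLoop is a hypothetical one chosen for illustration.

```matlab
% Sketch: approximate substring search with a plain loop (no wrap-around).
a = 'abcdefghij';   % main string
b = 'adcf';         % string to be found
tolerance = 2;      % maximum number of mismatching characters to tolerate

na = numel(a);
nb = numel(b);
posLoop = [];       % hypothetical name: start positions that match
for k = 1:(na - nb + 1)
    window = a(k:k + nb - 1);        % substring of a with the same length as b
    mismatches = sum(window ~= b);   % positions where the characters differ
    if mismatches <= tolerance
        posLoop(end + 1, 1) = k;     % store as a column, like find returns
    end
end
disp(posLoop')
```

With these inputs the loop finds the same start positions as the vectorised version. It trades speed for readability, which may be acceptable for short strings.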

Explanation:

  • First, we take the sizes of a and b.
  • Then, we create the matrix [(1:na)' ones(na, nb - 1)], which gives this:

Output:

     1     1     1     1
     2     1     1     1
     3     1     1     1
     4     1     1     1
     5     1     1     1
     6     1     1     1
     7     1     1     1
     8     1     1     1
     9     1     1     1
    10     1     1     1

  • We perform a cumulative sum to the right, using cumsum, to achieve this:

Output:

     1     2     3     4
     2     3     4     5
     3     4     5     6
     4     5     6     7
     5     6     7     8
     6     7     8     9
     7     8     9    10
     8     9    10    11
     9    10    11    12
    10    11    12    13

  • And we use the mod function so that each number is between 1 and na, like this:

Output:

     1     2     3     4
     2     3     4     5
     3     4     5     6
     4     5     6     7
     5     6     7     8
     6     7     8     9
     7     8     9    10
     8     9    10     1
     9    10     1     2
    10     1     2     3

  • We use this matrix as an index into a, which gives a character matrix:

Output:

    abcd
    bcde
    cdef
    defg
    efgh
    fghi
    ghij
    hija
    ijab
    jabc

Note that this matrix holds every possible substring of size nb, including the ones that wrap around the end of a (such as hija).

  • Now we use repmat to replicate b down, for na rows.

Output:

    adcf
    adcf
    adcf
    adcf
    adcf
    adcf
    adcf
    adcf
    adcf
    adcf

  • And we perform a direct comparison:

Output:

     1     0     1     0
     0     0     0     0
     0     1     0     1
     0     0     0     0
     0     0     0     0
     0     0     0     0
     0     0     0     0
     0     0     0     0
     0     0     0     0
     0     0     0     0

  • Summing to the right gives this:

Output:

     2
     0
     2
     0
     0
     0
     0
     0
     0
     0

which is the number of character matches for each possible substring.

  • To finish, we use find to select the indexes of the matches that are within our tolerance.
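The steps above can be sketched as separate statements, which makes each intermediate matrix easy to inspect. The variable names seed, idx, windows, hits, nMatch and posNoWrap are hypothetical names introduced for illustration; the final filter that discards wrapped windows is an optional addition and not part of the original one-liner.

```matlab
a = 'abcdefghij';   % main string
b = 'adcf';         % string to be found
tolerance = 2;      % maximum number of different characters to tolerate

na = numel(a);
nb = numel(b);

seed    = [(1:na)' ones(na, nb - 1)];   % first column 1..na, remaining columns all ones
idx     = cumsum(seed, 2);              % row k becomes k, k+1, ..., k+nb-1
idx     = mod(idx - 1, na) + 1;         % wrap every index back into 1..na
windows = a(idx);                       % na-by-nb char matrix of cyclic substrings
hits    = windows == repmat(b, na, 1);  % logical matrix of per-character matches
nMatch  = sum(hits, 2);                 % number of matching characters per start position
pos     = find(nMatch >= nb - tolerance);

% Optional (assumption: wrapped matches are not wanted):
% keep only start positions whose window fits inside a.
posNoWrap = pos(pos <= na - nb + 1);
```

Whether the wrapped rows should count depends on the use case. If matches that run off the end of a are not meaningful, filtering as in posNoWrap is one simple option.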
