How to find all capital and lower case occurrences of a Unicode character using regex and re.sub in Python?


I have this code in a Django view (intentionally simplified; Python 2.7):

# -*- coding: utf-8 -*-
from django.shortcuts import render
import re

def index(request):
    found_verses = []
    pattern = re.compile('ю')
    with open('d.txt', 'r') as doc:
        for line in doc:
            found = pattern.search(line)
            if found:
                modified_line = pattern.sub('!'+'\g<0>'+'!',line)
                found_verses.append(modified_line)
    context = {'found_verses': found_verses}
    return render(request, 'myapp/index.html', context)

d.txt (also utf-8) contains 1 line (intentionally simplified):

1. Я сказал Юлию одному. 

The above, when rendered, gives me the expected result:

1. Я сказал Юли!ю! одному. 

When I change the pattern to the capital letter, pattern = re.compile('Ю'), it also gives me the expected result:

1. Я сказал !Ю!лию одному. 

But when I change the pattern to a character class, pattern = re.compile('[юЮ]') or pattern = re.compile('[Юю]') or pattern = re.compile('[ю]') or pattern = re.compile('[Ю]'), it gives me nothing. This is what I am trying to get:

1. Я сказал !Ю!ли!ю! одному. 

Please help me get this result. I've been struggling with this for more than a day and have tried different configurations, such as pattern = re.compile('[юЮ]', re.UNICODE) and pattern = re.compile('ю', re.UNICODE|re.I), and countless others, all in vain.

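Use unicode strings. In Python 2.7 a plain '...' literal is a byte string, so '[юЮ]' is a character class of four separate UTF-8 bytes rather than two letters. The regex then matches single bytes, and re.sub inserts '!' in the middle of multi-byte characters, which is probably why nothing usable gets rendered. Decode the file as it is read and compile the pattern from a unicode literal: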

with io.open('d.txt', 'r', encoding='utf-8') as doc:
    ...

...

pattern = re.compile(u'[юЮ]', re.UNICODE)
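
To make the difference concrete, here is a minimal sketch that works on an in-memory string instead of d.txt. The helper name highlight is made up for this illustration and is not part of the original view. The first part imitates what the byte-string pattern does, and the second part applies the unicode pattern. The u prefixes are what matter on Python 2.7; on Python 3 they are accepted but redundant.

```python
# -*- coding: utf-8 -*-
import re

line = u'1. Я сказал Юлию одному.'

# Part 1: imitate the byte-string version. Once encoded, the class
# holds four separate UTF-8 bytes, not two letters.
byte_pattern = re.compile(u'[юЮ]'.encode('utf-8'))
broken = byte_pattern.sub(br'!\g<0>!', line.encode('utf-8'))
try:
    broken.decode('utf-8')
    bytes_still_valid = True
except UnicodeDecodeError:
    # '!' ended up between the bytes of multi-byte characters.
    bytes_still_valid = False


# Part 2: unicode pattern applied to unicode text.
def highlight(text, pattern):
    """Wrap every match of pattern in text with '!' (hypothetical helper)."""
    return pattern.sub(u'!\\g<0>!', text)


# Character class that lists both cases explicitly.
pattern_class = re.compile(u'[юЮ]', re.UNICODE)
# Alternative: a single letter with case-insensitive matching.
pattern_icase = re.compile(u'ю', re.UNICODE | re.IGNORECASE)

result_class = highlight(line, pattern_class)
result_icase = highlight(line, pattern_icase)

print(result_class)  # 1. Я сказал !Ю!ли!ю! одному.
```

Both unicode patterns give the same highlighted line here, so once the pattern and the text are unicode, choosing between the explicit class and re.IGNORECASE is mostly a matter of taste.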
