How to find all capital and lower case occurrences of a Unicode character using regex and re.sub in Python?
I have this code in a Django view (intentionally simplified; Python 2.7):
    # -*- coding: utf-8 -*-
    from django.shortcuts import render
    import re

    def index(request):
        found_verses = []
        pattern = re.compile('ю')
        with open('d.txt', 'r') as doc:
            for line in doc:
                found = pattern.search(line)
                if found:
                    modified_line = pattern.sub('!'+'\g<0>'+'!',line)
                    found_verses.append(modified_line)
        context = {'found_verses': found_verses}
        return render(request, 'myapp/index.html', context)
d.txt (also UTF-8) contains one line (intentionally simplified):
1. Я сказал Юлию одному.
The above, when rendered, gives me the expected result:
1. Я сказал Юли!ю! одному.
When I change the pattern to the capital letter, pattern = re.compile('Ю'), it also gives me the expected result:
1. Я сказал !Ю!лию одному.
But when I change the pattern to a character group, pattern = re.compile('[юЮ]') or pattern = re.compile('[Юю]') or pattern = re.compile('[ю]') or pattern = re.compile('[Ю]'), it gives me nothing. This is what I am trying to get:
1. Я сказал !Ю!ли!ю! одному.
Please help me get this result. I've been struggling with this for more than a day and have tried different configurations, such as pattern = re.compile('[юЮ]', re.UNICODE), pattern = re.compile('ю', re.UNICODE|re.I), this, and countless others, all in vain.
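Open the file with io.open so that each line is decoded to unicode:
    with io.open('d.txt', 'r', encoding='utf-8') as doc:
        ...
    ...
and compile the pattern from a unicode literal:
    pattern = re.compile(u'[юЮ]', re.UNICODE)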
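Why this helps, as far as I can tell: in Python 2.7 a plain 'ю' is a two-byte UTF-8 byte string, so '[юЮ]' becomes a class of four single bytes and re.sub inserts '!' in the middle of multi-byte characters, which leaves invalid UTF-8. The sketch below illustrates the difference on an in-memory copy of the sample line rather than on d.txt. The helper name mark_matches and the variable names are made up for illustration, and the case-insensitive variant is an alternative I am assuming you would also accept.

```python
# -*- coding: utf-8 -*-
import re

# Sample line from the question, held as a unicode string (stand-in for d.txt).
line = u'1. Я сказал Юлию одному.'

# Unicode pattern: the class contains two characters, not four UTF-8 bytes.
pattern = re.compile(u'[юЮ]', re.UNICODE)

def mark_matches(text):
    """Wrap every match of the pattern in exclamation marks (hypothetical helper)."""
    return pattern.sub(u'!\\g<0>!', text)

result = mark_matches(line)
# result == u'1. Я сказал !Ю!ли!ю! одному.'

# Alternative: one letter plus IGNORECASE matches both cases on unicode text.
ci_pattern = re.compile(u'ю', re.UNICODE | re.IGNORECASE)
ci_result = ci_pattern.sub(u'!\\g<0>!', line)

# What the byte-string version does: the class matches individual bytes,
# so '!' ends up inside multi-byte characters and the output is not valid UTF-8.
raw = line.encode('utf-8')
byte_pattern = re.compile(u'[юЮ]'.encode('utf-8'))
broken = byte_pattern.sub(br'!\g<0>!', raw)
try:
    broken.decode('utf-8')
    bytes_still_valid = True
except UnicodeDecodeError:
    bytes_still_valid = False
```

With unicode input and a unicode pattern, the lines appended to found_verses are unicode strings, which Django should be able to render without any further decoding.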