How to find all capital and lowercase occurrences of a Unicode character using regex and re.sub in Python?
I have this code in a Django view (intentionally simplified) (Python 2.7):
```python
# -*- coding: utf-8 -*-
from django.shortcuts import render
import re

def index(request):
    found_verses = []
    pattern = re.compile('ю')
    with open('d.txt', 'r') as doc:
        for line in doc:
            found = pattern.search(line)
            if found:
                modified_line = pattern.sub('!'+'\g<0>'+'!', line)
                found_verses.append(modified_line)
    context = {'found_verses': found_verses}
    return render(request, 'myapp/index.html', context)
```

d.txt (also UTF-8) contains one line (intentionally simplified):
```
1. Я сказал Юлию одному.
```

With the above, the rendered page shows the expected result:
```
1. Я сказал Юли!ю! одному.
```

When I change the pattern to the capital letter, `pattern = re.compile('Ю')`, I also get the expected result:
```
1. Я сказал !Ю!лию одному.
```

But when I use a character class, `pattern = re.compile('[юЮ]')`, `pattern = re.compile('[Юю]')`, `pattern = re.compile('[ю]')` or `pattern = re.compile('[Ю]')`, I get nothing. What I'm trying to get is:
```
1. Я сказал !Ю!ли!ю! одному.
```

Please help me get this result. I've been struggling with this for more than a day and have tried different configurations, such as `pattern = re.compile('[юЮ]', re.UNICODE)`, `pattern = re.compile('ю', re.UNICODE|re.I)` and countless others, all in vain.
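A likely explanation: in Python 2.7 a string literal without the `u` prefix is a byte string, even with the `# -*- coding: utf-8 -*-` declaration, and `open(..., 'r')` also yields byte strings. Each of these Cyrillic letters is two bytes in UTF-8, so `'ю'` is a two-byte sequence that matches as a whole, while `'[юЮ]'` is a class of four separate bytes (and `'[ю]'` a class of two) that matches any single one of them. This sketch reproduces that in Python 3 by encoding to UTF-8 bytes explicitly; the variable names are illustrative only.

```python
import re

# Mimic Python 2 byte strings: encode the line and the patterns to UTF-8 bytes.
text = 'Я сказал Юлию одному.'.encode('utf-8')

literal = re.compile('ю'.encode('utf-8'))        # the two-byte sequence b'\xd1\x8e'
byte_class = re.compile('[юЮ]'.encode('utf-8'))  # a class of four single bytes

print(repr('ю'.encode('utf-8')))  # b'\xd1\x8e'
print(repr('Ю'.encode('utf-8')))  # b'\xd0\xae'

# The literal matches the whole byte sequence, so the character stays intact.
print(literal.sub(b'!\\g<0>!', text).decode('utf-8'))  # Я сказал Юли!ю! одному.

# The class matches any ONE of \xd1, \x8e, \xd0, \xae, so sub() inserts '!'
# inside multi-byte characters and the result is no longer valid UTF-8.
broken = byte_class.sub(b'!\\g<0>!', text)
print(broken.decode('utf-8', errors='replace'))
```

Because the class also contains the lead bytes `\xd0` and `\xd1`, which nearly every basic Cyrillic letter starts with, `sub()` splits most characters on the line; exactly how that invalid UTF-8 then appears in the rendered template may vary.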
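Work with Unicode instead of byte strings: open the file with `io.open` and an explicit encoding, and compile the pattern from a `u''` literal:

```python
import io

with io.open('d.txt', 'r', encoding='utf-8') as doc:
    ...
    ...

pattern = re.compile(u'[юЮ]', re.UNICODE)
```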
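For comparison, a minimal Python 3 sketch of the same fix. There, `str` literals and files opened with `encoding='utf-8'` are already Unicode, so `[юЮ]` is a class of two characters. `mark_occurrences` is a hypothetical helper mirroring the loop in the Django view, and the `re.IGNORECASE` variant is an alternative to listing both cases. Python 3 applies Unicode case folding to `str` patterns by default; in Python 2.7, as far as I know, `re.IGNORECASE` only folds non-ASCII letters when combined with `re.UNICODE` on a Unicode pattern.

```python
import re

# Unicode pattern: a class of the two characters 'ю' and 'Ю'.
pattern = re.compile('[юЮ]')
# Alternative: one letter, case-insensitive (Unicode matching is the default for str).
pattern_ci = re.compile('ю', re.IGNORECASE)


def mark_occurrences(lines, pattern):
    """Hypothetical helper: wrap each match in '!' and keep only matching lines."""
    found = []
    for line in lines:
        if pattern.search(line):
            # \g<0> is the whole match, so the original case is preserved.
            found.append(pattern.sub(r'!\g<0>!', line))
    return found


# In a real view the lines would come from open('d.txt', encoding='utf-8').
lines = ['1. Я сказал Юлию одному.\n']
print(mark_occurrences(lines, pattern))
print(mark_occurrences(lines, pattern_ci))
```

Both calls should produce `['1. Я сказал !Ю!ли!ю! одному.\n']`, the result the question asks for.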