How to find all capital and lower case occurrences of unicode character using regex and re.sub in Python?

- September 15, 2015

I have this code in a Django view (intentionally simplified, Python 2.7):

# -*- coding: utf-8 -*-
from django.shortcuts import render
import re

def index(request):
    found_verses = []

    pattern = re.compile('ю')

    with open('d.txt', 'r') as doc:
        for line in doc:
            found = pattern.search(line)
            if found:
                modified_line = pattern.sub('!'+'\g<0>'+'!',line)
                found_verses.append(modified_line)

    context = {'found_verses': found_verses}
    return render(request, 'myapp/index.html', context)

d.txt (also utf-8) contains 1 line (intentionally simplified):

1. Я сказал Юлию одному.

The above, when rendered, gives me the expected result:

1. Я сказал Юли!ю! одному.

When I change the pattern to the capital letter, pattern = re.compile('Ю'), it also gives me the expected result:

1. Я сказал !Ю!лию одному.

But when I change the pattern to a character class, pattern = re.compile('[юЮ]') or pattern = re.compile('[Юю]') or pattern = re.compile('[ю]') or pattern = re.compile('[Ю]'), it gives me nothing. This is what I am trying to get:

1. Я сказал !Ю!ли!ю! одному.

Please help me get this result. I've been struggling for more than a day and have tried different configurations, such as pattern = re.compile('[юЮ]', re.UNICODE) and pattern = re.compile('ю', re.UNICODE|re.I), and countless others, all in vain.

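Use unicode strings. In Python 2.7 a plain 'ю' literal is a UTF-8 byte string of two bytes, so a class like '[юЮ]' matches individual bytes rather than whole characters. Read the file as unicode and make the pattern a unicode literal: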

with io.open('d.txt', 'r', encoding='utf-8') as doc:
    ...

...

pattern = re.compile(u'[юЮ]', re.UNICODE)
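To see the idea end to end, here is a minimal sketch that works on the sample line directly instead of reading d.txt. The helper name mark_matches and the variable names are made up for illustration, and the case-insensitive variant is just an alternative to the character class, not something the answer above prescribes. It is written to run on Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import re

# In UTF-8 the letter is two bytes, which is why a byte-string class such as
# '[юЮ]' in Python 2.7 ends up holding four separate bytes, not two letters.
utf8_length = len(u'ю'.encode('utf-8'))

# Unicode pattern: the class now holds exactly two characters.
class_pattern = re.compile(u'[юЮ]', re.UNICODE)

# Alternative (an assumption, not from the answer): one letter, ignoring case.
icase_pattern = re.compile(u'ю', re.IGNORECASE | re.UNICODE)


def mark_matches(pattern, line):
    """Wrap every match of pattern in exclamation marks (hypothetical helper)."""
    return pattern.sub(u'!\\g<0>!', line)


line = u'1. Я сказал Юлию одному.'
marked_by_class = mark_matches(class_pattern, line)
marked_by_icase = mark_matches(icase_pattern, line)
```

Both marked_by_class and marked_by_icase should come out as the line with Ю and ю each wrapped in exclamation marks. In the Django view the same pattern applies as long as each line comes from io.open(..., encoding='utf-8'), so that the text being searched is unicode too.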
