Question

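How do I match Chinese characters with Python's re module?

Regular expressions are handy for recognizing strings. How can I use one to match Chinese text in Python?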

python Chinese matching development regex

10 years, 2 months ago

lailai

Answer 1

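To match Chinese with Python's re module, first decode the encoded input (for example UTF-8 bytes) into a unicode string, then hand that string to re. The session below is Python 2, where str holds bytes and unicode holds text.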

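In Unicode, Chinese characters occupy the range \u4e00-\u9fa5, so the regular expression u"[\u4e00-\u9fa5]+" matches one or more Chinese characters. The range covers the common CJK ideographs only; punctuation such as the full-width colon (\uff1a) falls outside it.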
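
In Python 3 every str is already Unicode, so the character class can be tried without a decoding step. The following is a minimal sketch under that assumption; the name HAN_RUN is illustrative and not part of the answer.

```python
import re

# Character class from the answer: one or more characters in U+4E00..U+9FA5.
HAN_RUN = re.compile("[\u4e00-\u9fa5]+")

text = "中文：123456一二三四五六"

# findall returns every non-overlapping run of Chinese characters.
runs = HAN_RUN.findall(text)
print(runs)  # ['中文', '一二三四五六']

# The full-width colon (U+FF1A) is outside the range, so it ends the first run.
first = HAN_RUN.match(text).group()
print(first)  # 中文
```

Digits and the full-width colon act as separators here because neither lies inside the range.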

>>> import re
>>> s='中文：123456一二三四五六'
>>> s1=unicode(s,"gbk")  # codec must match the bytes in s: "gbk" on a GBK console, "utf-8" on a UTF-8 one
>>> s1
u'\u4e2d\u6587\uff1a123456\u4e00\u4e8c\u4e09\u56db\u4e94\u516d'
>>> print s
中文：123456一二三四五六

>>> re.match(u"[\u4e00-\u9fa5]+",s1)
<_sre.SRE_Match object at 0x7f2231c025e0>

>>> pat=unicode('中文', "gbk")
>>> re.search(pat,s1)
<_sre.SRE_Match object at 0x7f2231c02988>

>>> newpat=unicode('中文内容', "gbk")

>>> news=re.sub(pat,newpat,s1)
>>> print news
中文内容：123456一二三四五六
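
For readers on Python 3, the same steps might look like the sketch below. It assumes the input arrives as UTF-8 bytes; the variable raw is a stand-in for data read from a file or socket and is not part of the original session.

```python
import re

# Stand-in for encoded input; UTF-8 is an assumption, use the real codec of your data.
raw = "中文：123456一二三四五六".encode("utf-8")

# Decode bytes to str before matching; Python 3 str is Unicode text.
s1 = raw.decode("utf-8")

# match anchors at the start of the string, as in the session above.
m = re.match("[\u4e00-\u9fa5]+", s1)
print(m.group())  # 中文

# Replace the literal text with a longer phrase.
news = re.sub("中文", "中文内容", s1)
print(news)  # 中文内容：123456一二三四五六
```

Decoding with the wrong codec would either raise UnicodeDecodeError or produce text the pattern does not match, so the codec should be checked first when a match unexpectedly fails.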

answered 10 years, 2 months ago

下水道的美人鱼
