环境: oracle 10g r2
更多内容可以参考:
<<Oracle? Database Application Developer's Guide - Fundamentals 10g Release 2 (10.2)>>必须明白的是,oracle的正则表达式和其它的语言的并不是完全一样。
oracle支持以下标准的正则表达式:
- IEEE Portable Operating System Interface (POSIX) standard draft 1003.2/D11.2
- Unicode Regular Expression Guidelines of the Unicode Consortium
- Extends the matching capabilities for multilingual data beyond what is specified in the POSIX standard(即posix的多语言支持).
- Adds support for the common Perl regular expression extensions that are not
included in the POSIX standard but do not conflict with it. Oracle Database
provides built-in support for some of the most heavily used Perl regular
expression operators, for example, character class shortcuts, the non-greedy
modifier, and so on(即部分perl标准,不包含在posix标准,但是又不与之冲突的)
---2008-09-08 做了个实验,关键的要点是三个,重复次数{},+,*,?
---以及匹配模式[...] ,[^...],(...),(...)
--以及转义符号 \d,\D,\\等。
------------------------------------------------------------------------------------------------------------------
ID VALUE
--------------------------------------- ----------------------------------------
9 lzf is a chinese
10 1231231231lzf
11 perfect
1 13625061300
2 059183383335
3 13959029697
4 13950323157
5 who is your daddy?
6 who are you?
6 what is your name?
6 I am 20 years old!
7 股市下挫到2100点左右
8 lzf ' is very happy 136ok
8 do you know lzf,ok?
8 ok,lzf is my brother
------------------------------------------------------------------------------------------------------------------
select * from test_reg where regexp_like(value,'^1(.+)7$') --1开头,7结尾的内容
select * from test_reg where regexp_like(value,'\d+') --一个或者多个数字
select * from test_reg where regexp_like(value,'^[a-zA-Z]{1,}[a-zA-Z]+$') --全部字符的.[:digit:]
select * from test_reg where regexp_like(value,'^[0-9]{1,}[0-9]+$') --全部数字的.
select * from test_reg where regexp_like(value,'[y][o][u]') --包含you
select * from test_reg where regexp_like(value,'(you)') --包含you
select * from test_reg where regexp_like(value,'[you]') --包含y o u的,不分顺序
select * from test_reg where regexp_like(value,'(lzf){1,}(.){0,}(ok)') --包含lzf 和ok的. 但是也仅仅限制与lzf 在ok前
以上试验是在看了下文以及参考了OReilly.Oracle.PL.SQL.Programming.4th.Edition.Aug.2005.chs后作出的.
--------------
2010/08/04 增加以下例子---
--以'POSIX('开头的串,以下几个表达式的结果都是一样的。
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1})(,)\d{1,}?\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1})(,)\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1}),\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),([RYN]{1}),\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),([RYN]+),\d+\)')
select 1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
select 1 from dual where regexp_like('POSIX(A,N,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
----------
--位于字符串中的串
select 1 from dual where regexp_like('POSIX(A,N,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
select 1 from dual where regexp_like('P_OSI_X(JYL_GP_R,N,13)','^[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)')
--测试regexp_substr,基本的还是regexp_like
SQL> select regexp_substr('P_OSI_X(JYL_GP_R,N,13)+AVG(JYL_JJ_R,Y,3)','^[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)') FROM DUAL 2 / REGEXP_SUBSTR('P_OSI_X(JYL_GP_ ------------------------------ P_OSI_X(JYL_GP_R,N,13) SQL> select regexp_substr('P_OSI_X(JYL_GP_R,N,13)+AVG(JYL_JJ_R,Y,3)','[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)',1,2) FROM DUAL 2 / REGEXP_SUBSTR('P_OSI_X(JYL_GP_ ------------------------------ AVG(JYL_JJ_R,Y,3) |
2010/08/26 新增例子
-- delete the first () select regexp_replace('dad(disc_id=50003118)add','(^([A-Z]{1,}\()|(\)[A-Z]{1,})$)','',1,0,'i') FROM DUAL -- match pattern using classes select regexp_substr(upper('dad(ssdsd89\9 dd== (sddf(d)),sdsd)add'),'^([A-Z]{1,}\()[[:space:][:alnum:]\,\_\(\)\=]{1,}\)[A-Z]{1,}$',1,1,'i') FROM DUAL
|
OReilly. 的参考内容Section A.1. Metacharacters
A.1. Metacharacters
The R2 column in Tables A-1 through A-3 indicates which metacharacters were introduced in Oracle Database 10g Release 2. All others were introduced in Release 1.
Table A-1. Character-matching metacharacters
Syntax |
R2? |
Description |
. |
|
Matches any single character except for newline. Will match newline when the n flag is set. On Windows, Linux, and Unix platforms, chr(10) is recognized as the newline. |
[ ... ] |
|
Defines a matching list that matches any character listed between the brackets. You may specify ranges of characters, as in a-z. These ranges are interpreted based on the NLS_SORT setting.
A dash (-) is a literal when it occurs first or last in the list (e.g., [abc-]). A closing-bracket (]) is a literal when it occurs first in the list (e.g., []abc]). A caret (^) in the first position makes the list a nonmatching list (see the next entry). |
[^ ... ] |
|
Matches any character not listed between the brackets. Referred to as a "nonmatching list." |
[:class:] |
|
Matches any character that belongs to the specified character class. May only be used within a matching list: [[:class:]abc] is a valid expression, [:class:]abc is not. Table A-5 lists the valid character class names. |
[.coll.] |
|
Matches the specified collation element, which may be one or more characters. May only be used within a matching list. For example, the expression [[.ch.]] matches the Spanish letter "ch". Table A-4 lists the valid collation elements. |
[=char=] |
|
Matches all characters that share the same base character as char. May be used only within a matching list. For example, [[=e=]] matches any of: "eé?è???E". |
\d |
|
Matches any digit. Equivalent to [[:digit:]]. |
\D |
|
Matches any nondigit. Equivalent to [^[:digit:]] |
\w |
|
Matches any "word character." Word characters are defined to be alphabetic characters, numeric characters, and the underscore. |
\W |
|
Matches any nonword character. |
\s |
|
Matches any whitespace character. Equivalent to [[:space:]]. |
\S |
|
Matches nonwhitespace characters. Equivalent to [^[:space:]]. |
Table A-2. Quantifiers
Syntax |
R2? |
Description |
+ |
|
One or more |
? |
|
Zero or one |
* |
|
Zero or more |
{m} |
|
Exactly m occurrences |
{m,} |
|
At least m occurrences |
{m,n} |
|
At least m, and at most n occurrences |
+? |
|
One or more, but nongreedy |
?? |
|
Zero or one, but nongreedy |
{m}? |
|
The same as {m} |
{m,}? |
|
At least m occurrences, but nongreedy and stops as soon as m occurrences are reached |
{m,n}? |
|
At least m, and at most n occurrences, but nongreedy; when possible, m occurrences are matched |
Table A-3. Other metacharacters
Syntax |
R2? |
Description |
| |
|
Specifies an alternation. An alternation within a subexpression doesn't extend beyond the subexpression. |
( ...) |
|
Defines a subexpresson. |
\n |
|
References the text matched by the nth subexpression. Backreferences may range from \1 through \9. |
\ |
|
When not followed by a digit, the \ is an escape character. For example, use the pattern \\1 to look for a single backslash followed by the digit 1, use \( to look for an opening-parentheses (rather than begin a subexpression), etc. |
^ |
|
Anchors an expression to the beginning of the string (in multiline mode, to the beginning of a line). |
$ |
|
Anchors an expression to the end of the string (in multiline mode, to the end of a line). |
\A |
|
Anchors an expression to the beginning of the string regardless of whether multiline mode is specified. |
\Z |
|
Anchors an expression to the end of the string, or the a newline that happens to be ending a string, regardless of whether multiline mode is specified. |
\z |
|
Anchors an expression to the end of the string regardless of whether multiline mode is specified. |
Table A-4. Collation elements
NLS_SORT |
Multicharacter collation elements |
XDANISH |
aa
oe |
AA
OE |
Aa
Oe |
XSPANISH |
ch
ll |
CH
LL |
Ch
Ll |
XHUNGARIAN |
cs
gy
ly
ny
sz
ty
zs |
CS
GY
LY
NY
SZ
TY
ZS |
Cs
Gy
Ly
Ny
Sz
Ty
Zs |
XCZECH |
ch |
CH |
Ch |
XCZECH_PUNCTUATION |
ch |
CH |
Ch |
XSLOVAK |
dz
d_
ch |
DZ
D_
CH |
Dz
D_
Ch |
XCROATIAN |
d_
lj
nj |
D_
LJ
Nj |
D_
Lj
NJ |
Table A-5. Supported character classes
Class |
Description |
[:alnum:] |
Alphanumeric characters (same as [:alpha:] + [:digit:]) |
[:alpha:] |
Alphabetic characters only |
[:blank:] |
Blank space characters, such as space and tab |
[:cntrl:] |
Nonprinting, or control characters |
[:digit:] |
Numeric digits |
[:graph:] |
Graphical characters (same as [:punct:] + [:upper:] + [:lower:] + [:digit:]) |
[:lower:] |
Lowercase letters |
[:print:] |
Printable characters |
[:punct:] |
Punctuation characters |
[:space:] |
Whitespace characters such as space, formfeed, newline, carriage return, horizontal tab, and vertical tab |
[:upper:] |
Uppercase letters |
[:xdigit:] |
Hexadecimal characters |
|
对于类的使用,要特别注意点,因为基本上要包含在一对中括号中。
例一:检验非汉字的函数例如可以定义一个检验非汉字的函数,不考虑其它UNICODE
CREATE OR REPLACE Function func_existNormalChar(chars In Varchar2) Return Pls_Integer
Is
vResult Pls_Integer;
Begin
Select 1
Into vresult From dual
Where regexp_like(chars,'[!-~ ,。、?’“;:『』【】+=-——~!◎#¥%……※×()]+')
Or regexp_like(chars,'[[:space:]+]')
Or regexp_like(chars,'[[:blank:]+]')
Or regexp_like(chars,'[[:cntrl:]+]');
Return vresult;
Exception
When Others Then
Return 0;
End;
例二:出现次数的检验
序号 |
例子 |
结果 |
说明 |
1 |
regexp_replace('a12c','[[:digit:]]+','*') |
a*c |
测试"+",即出现一次或者更多次的.由于采取的是连续的匹配,所以结果上是把多个数字合并为一个替代字符串,为了理解连续匹配的意思,看序号9更能够明白.这是总体偏少的模式 |
2 |
regexp_replace('a12c','[[:digit:]]?','*') |
*a***c* |
测试"?",即0或者一次.这个符号,会在任意一个其它字符边上插入一个假设出现的匹配类型符号,如例子,a的左边本来是没有,但是表达式认为数字字符在a的左边出现了0次(这个是成立的).然后还要逐个替换过去,这是总体偏多的模式,过于匹配和逐个匹配结合
|
3 |
regexp_replace('a12c','[[:digit:]]*','*')
|
*a**c* |
测试"*",即0或者更多次数,这个模式类似与?类似,采取连续匹配+多余匹配 |
4 |
regexp_replace('a12c','[[:digit:]]+?','*')
|
a**c |
测试"+?",匹配在前面一次或者多次的串,采取的是逐个匹配的策略(如果有连续的情况出现),这个是通常最希望使用的模式. |
5 |
regexp_replace('a12c','[[:digit:]]*?','*') |
*a*1*2*c* |
测试"*?",匹配0或者更多次,难于理解! 但可以简单地理解为为在各个字符中以及两边插入替代串.
|
6 |
regexp_replace('a12c','[[:digit:]]??','*') |
*a*1*2*c* |
测试"??",匹配0或者1次,难于理解啊! 但可以简单地理解为为在各个字符中以及两边插入替代串. |
7 |
regexp_replace('a12c','[[:digit:]]{1,}','*')
|
a*c |
测试{1,},同+,结果已经说明 |
8 |
|
|
|
9 |
regexp_replace('a1x2c','[[:digit:]]+','*') |
a*x*c |
测试"+",这个例子说明,如果被检测的字符串中特定的匹配串如果不是连续的(多次),那么替代的还是属于正常的范围. |
其它的次数模式暂时没有测试,对于 *,?,+ ,*?,+?,??,基本有其对应的模式 {n,m}? 或者是{n,}? ,除了*,*?之外,因为此二者表示的都是0或者多次(大于1),无法用{n,m}的方式表示.
例三:多个类的并用
很多时候使用类还是很方便的,例如[:digit:]表示数字.[:space:]表示空格,换行,回车等等.
如果希望把特定串中的数字和回车空格等都替换为'-',则可以使用 "|"(替代操作符,其实就是类似一般高级语言中的或操作符号).
select regexp_replace('adfa jkjk jk
jk','([[:digit:]]+?)|([[:space:]]+?)','-') from dual;
结果如下:
adfa-jkjk-jk-jk
已经把空格和回车符号都替换为'-'了.
又例如:
Select regexp_replace('a1b c1d jk
jk','([[:digit:]]+?)|([[:space:]]+?)','-') From dual
结果如下:
a-b-c-d-jk-jk
例四:迭代替换
例如希望把以下串替换掉
'<xxxx <often> <common> <fat> >'
关于范围操作符号 -
发音:横杆或者减号
在列表匹配操作情况下,如果没有特别的设置,就是表示匹配一个范围:
例子:把字符串中非数字+-()的符号替换为空
这是一个不成功的匹配
SQL> select regexp_replace('1+[2c- 3([ad]km,4','[^0-9\+\-\(\)]','',1) from dual; REGEXP_REPLACE('1+[2C-3([AD]KM ------------------------------ 1+23(4
这也是一个错误的匹配,因为减号这个时候表示一个范围 SQL> select regexp_replace('1+[2c- 3([ad]km,4','[^0-9\+-\(\)]','',1) from dual; REGEXP_REPLACE('1+[2C-3([AD]KM ------------------------------ 1+[2-3([,4
二者的区别在于单独的一个减号,如果前面没有跟上斜杠\,那么就是表示范围,否则就是减号本身。
但即使如此,依然没有达到目的,第一个把减号还是给去除了,虽然排除了其它的需要排除的符号,
第二个,则是当作范围符号,大部分的没有排除掉。
但是不太明白的是,按照oracle文档的约定,在范围匹配中形如[\-]表示匹配一个减号,为何依然匹配补上了,尤其是当夹着其它需要匹配的模式的时候。
现在只能尝试着把这个模式放在最后:
SQL> select regexp_replace('1+[2c- 3([ad4','[^0-9\(+\)\-]','',1) from dual 2 / REGEXP_REPLACE('1+[2C-3([AD4', ------------------------------ 1+2-3(4
结果是可以,因为模式"\-"被放到了范围中的尾巴。
那是文档不全,还是理解不透彻了,
看来以后遇到这样的问题还是先把减号放到尾巴去吧。 |