环境: oracle 10g r2 
更多内容可以参考:
<<Oracle? Database Application Developer's Guide - Fundamentals 10g Release 2 (10.2)>>必须明白的是,oracle的正则表达式和其它的语言的并不是完全一样。
oracle支持以下标准的正则表达式:
- IEEE Portable Operating System Interface (POSIX) standard draft 1003.2/D11.2
- Unicode Regular Expression Guidelines of the Unicode Consortium
- Extends the matching capabilities for multilingual data beyond what is specified  in the POSIX standard(即posix的多语言支持).
 
- Adds support for the common Perl regular expression extensions that are not
          included in the POSIX standard but do not conflict with it. Oracle Database
provides built-in support for some of the most heavily used Perl regular
expression operators, for example, character class shortcuts, the non-greedy
modifier, and so on(即部分perl标准,不包含在posix标准,但是又不与之冲突的)
---2008-09-08 做了个实验,关键的要点是三个,重复次数{},+,*,?   
---以及匹配模式[...] ,[^...],(...),(...)
--以及转义符号 \d,\D,\\等。
------------------------------------------------------------------------------------------------------------------
                                     ID VALUE
--------------------------------------- ----------------------------------------
                                      9 lzf is a chinese
                                     10 1231231231lzf
                                     11 perfect
                                      1 13625061300
                                      2 059183383335
                                      3 13959029697
                                      4 13950323157
                                      5 who is your daddy?
                                      6 who are you?
                                      6 what is your name?
                                      6 I am 20 years old!
                                      7 股市下挫到2100点左右
                                      8 lzf ' is very happy 136ok
                                      8 do you know lzf,ok?
                                      8 ok,lzf is my brother
------------------------------------------------------------------------------------------------------------------
select * from  test_reg where regexp_like(value,'^1(.+)7$')  --1开头,7结尾的内容
select * from  test_reg where regexp_like(value,'\d+')       --一个或者多个数字
select * from  test_reg where regexp_like(value,'^[a-zA-Z]{1,}[a-zA-Z]+$')  --全部字符的.[:digit:]
select * from  test_reg where regexp_like(value,'^[0-9]{1,}[0-9]+$')  --全部数字的.
select * from  test_reg where regexp_like(value,'[y][o][u]') --包含you
select * from  test_reg where regexp_like(value,'(you)') --包含you
select * from  test_reg where regexp_like(value,'[you]')     --包含y o u的,不分顺序
select * from  test_reg where regexp_like(value,'(lzf){1,}(.){0,}(ok)') --包含lzf 和ok的.  但是也仅仅限制与lzf 在ok前
以上试验是在看了下文以及参考了OReilly.Oracle.PL.SQL.Programming.4th.Edition.Aug.2005.chs后作出的.
--------------
2010/08/04 增加以下例子---
--以'POSIX('开头的串,以下几个表达式的结果都是一样的。
select  1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1})(,)\d{1,}?\)')
select  1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1})(,)\d+\)')
select  1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+)(,)([RYN]{1}),\d+\)')
select  1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),([RYN]{1}),\d+\)')
select  1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),([RYN]+),\d+\)')
select  1 from dual where regexp_like('POSIX(ADDDDDD,Y,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
select  1 from dual where regexp_like('POSIX(A,N,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
----------
--位于字符串中的串
select  1 from dual where regexp_like('POSIX(A,N,13)','^(POSIX\()([A-Z]+),[RYN]+,\d+\)')
select  1 from dual where regexp_like('P_OSI_X(JYL_GP_R,N,13)','^[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)')
--测试regexp_substr,基本的还是regexp_like
| SQL> select regexp_substr('P_OSI_X(JYL_GP_R,N,13)+AVG(JYL_JJ_R,Y,3)','^[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)') FROM DUAL 2  /
 
 REGEXP_SUBSTR('P_OSI_X(JYL_GP_
 ------------------------------
 P_OSI_X(JYL_GP_R,N,13)
 
 SQL> select regexp_substr('P_OSI_X(JYL_GP_R,N,13)+AVG(JYL_JJ_R,Y,3)','[A-Z_]+\(([A-Z_]+),[RYN]+,\d+\)',1,2) FROM DUAL
 2  /
 
 REGEXP_SUBSTR('P_OSI_X(JYL_GP_
 ------------------------------
 AVG(JYL_JJ_R,Y,3)
 | 
2010/08/26 新增例子
| -- delete the first () select regexp_replace('dad(disc_id=50003118)add','(^([A-Z]{1,}\()|(\)[A-Z]{1,})$)','',1,0,'i') FROM DUAL
 --
 match pattern
 using classes
 select regexp_substr(upper('dad(ssdsd89\9 dd== (sddf(d)),sdsd)add'),'^([A-Z]{1,}\()[[:space:][:alnum:]\,\_\(\)\=]{1,}\)[A-Z]{1,}$',1,1,'i') FROM DUAL
 
 
 | 
   
OReilly. 的参考内容Section A.1.  Metacharacters
| A.1. MetacharactersThe R2 column in Tables A-1 through A-3 indicates which metacharacters were introduced in Oracle Database 10g Release 2. All others were introduced in Release 1.   
Table A-1. Character-matching metacharacters
| Syntax | R2? | Description |  
| . |  | Matches any single character except for newline. Will match newline when the n flag is set. On Windows, Linux, and Unix platforms, chr(10) is recognized as the newline. |  
| [ ... ] |  | Defines a matching list that matches any character listed between the brackets. You may specify ranges of characters, as in a-z. These ranges are interpreted based on the NLS_SORT setting. A dash (-) is a literal when it occurs first or last in the list (e.g., [abc-]). A closing-bracket (]) is a literal when it occurs first in the list (e.g., []abc]). A caret (^) in the first position makes the list a nonmatching list (see the next entry). |  
| [^ ... ] |  | Matches any character not listed between the brackets. Referred to as a "nonmatching list." |  
| [:class:] |  | Matches any character that belongs to the specified character class. May only be used within a matching list: [[:class:]abc] is a valid expression, [:class:]abc is not. Table A-5 lists the valid character class names. |  
| [.coll.] |  | Matches the specified collation element, which may be one or more characters. May only be used within a matching list. For example, the expression [[.ch.]] matches the Spanish letter "ch". Table A-4 lists the valid collation elements. |  
| [=char=] |  | Matches all characters that share the same base character as char. May be used only within a matching list. For example, [[=e=]] matches any of: "eé?è???E". |  
| \d | 
 | Matches any digit. Equivalent to [[:digit:]]. |  
| \D | 
 | Matches any nondigit. Equivalent to [^[:digit:]] |  
| \w | 
 | Matches any "word character." Word characters are defined to be alphabetic characters, numeric characters, and the underscore. |  
| \W | 
 | Matches any nonword character. |  
| \s | 
 | Matches any whitespace character. Equivalent to [[:space:]]. |  
| \S | 
 | Matches nonwhitespace characters. Equivalent to [^[:space:]]. |    
   
Table A-2. Quantifiers
| Syntax | R2? | Description |  
| + |  | One or more |  
| ? |  | Zero or one |  
| * |  | Zero or more |  
| {m} |  | Exactly m occurrences |  
| {m,} |  | At least m occurrences |  
| {m,n} |  | At least m, and at most n occurrences |  
| +? | 
 | One or more, but nongreedy |  
| ?? | 
 | Zero or one, but nongreedy |  
| {m}? | 
 | The same as {m} |  
| {m,}? | 
 | At least m occurrences, but nongreedy and stops as soon as m occurrences are reached |  
| {m,n}? | 
 | At least m, and at most n occurrences, but nongreedy; when possible, m occurrences are matched |    
   
Table A-3. Other metacharacters
| Syntax | R2? | Description |  
| | |  | Specifies an alternation. An alternation within a subexpression doesn't extend beyond the subexpression. |  
| ( ...) |  | Defines a subexpresson. |  
| \n |  | References the text matched by the nth subexpression. Backreferences may range from \1 through \9. |  
| \ |  | When not followed by a digit, the \ is an escape character. For example, use the pattern \\1 to look for a single backslash followed by the digit 1, use \( to look for an opening-parentheses (rather than begin a subexpression), etc. |  
| ^ |  | Anchors an expression to the beginning of the string (in multiline mode, to the beginning of a line). |  
| $ |  | Anchors an expression to the end of the string (in multiline mode, to the end of a line). |  
| \A | 
 | Anchors an expression to the beginning of the string regardless of whether multiline mode is specified. |  
| \Z | 
 | Anchors an expression to the end of the string, or the a newline that happens to be ending a string, regardless of whether multiline mode is specified. |  
| \z | 
 | Anchors an expression to the end of the string regardless of whether multiline mode is specified. |    
   
Table A-4. Collation elements
| NLS_SORT | Multicharacter collation elements |  
| XDANISH | aa oe | AA OE | Aa Oe |  
| XSPANISH | ch ll | CH LL | Ch Ll |  
| XHUNGARIAN | cs gy ly ny sz ty zs | CS GY LY NY SZ TY ZS | Cs Gy Ly Ny Sz Ty Zs |  
| XCZECH | ch | CH | Ch |  
| XCZECH_PUNCTUATION | ch | CH | Ch |  
| XSLOVAK | dz d_ ch | DZ D_ CH | Dz D_ Ch |  
| XCROATIAN | d_ lj nj | D_ LJ Nj | D_ Lj NJ |    
   
Table A-5. Supported character classes
| Class | Description |  
| [:alnum:] | Alphanumeric characters (same as [:alpha:] + [:digit:]) |  
| [:alpha:] | Alphabetic characters only |  
| [:blank:] | Blank space characters, such as space and tab |  
| [:cntrl:] | Nonprinting, or control characters |  
| [:digit:] | Numeric digits |  
| [:graph:] | Graphical characters (same as [:punct:] + [:upper:] + [:lower:] + [:digit:]) |  
| [:lower:] | Lowercase letters |  
| [:print:] | Printable characters |  
| [:punct:] | Punctuation characters |  
| [:space:] | Whitespace characters such as space, formfeed, newline, carriage return, horizontal tab, and vertical tab |  
| [:upper:] | Uppercase letters |  
| [:xdigit:] | Hexadecimal characters |  | 
对于类的使用,要特别注意点,因为基本上要包含在一对中括号中。
例一:检验非汉字的函数例如可以定义一个检验非汉字的函数,不考虑其它UNICODE
CREATE OR REPLACE Function func_existNormalChar(chars In Varchar2) Return Pls_Integer
Is
  vResult Pls_Integer;
Begin
  Select 1
     Into vresult From dual
        Where regexp_like(chars,'[!-~ ,。、?’“;:『』【】+=-——~!◎#¥%……※×()]+')
        Or    regexp_like(chars,'[[:space:]+]')
        Or    regexp_like(chars,'[[:blank:]+]')
        Or    regexp_like(chars,'[[:cntrl:]+]');
  Return vresult;
Exception
  When Others Then
    Return 0;
End;
例二:出现次数的检验
| 序号 | 例子 | 结果 | 说明 | 
| 1 | regexp_replace('a12c','[[:digit:]]+','*') | a*c | 测试"+",即出现一次或者更多次的.由于采取的是连续的匹配,所以结果上是把多个数字合并为一个替代字符串,为了理解连续匹配的意思,看序号9更能够明白.这是总体偏少的模式 | 
| 2 | regexp_replace('a12c','[[:digit:]]?','*') | *a***c* | 测试"?",即0或者一次.这个符号,会在任意一个其它字符边上插入一个假设出现的匹配类型符号,如例子,a的左边本来是没有,但是表达式认为数字字符在a的左边出现了0次(这个是成立的).然后还要逐个替换过去,这是总体偏多的模式,过于匹配和逐个匹配结合 
 | 
| 3 | regexp_replace('a12c','[[:digit:]]*','*') 
 | *a**c* | 测试"*",即0或者更多次数,这个模式类似与?类似,采取连续匹配+多余匹配 | 
| 4 | regexp_replace('a12c','[[:digit:]]+?','*') 
 | a**c | 测试"+?",匹配在前面一次或者多次的串,采取的是逐个匹配的策略(如果有连续的情况出现),这个是通常最希望使用的模式. | 
| 5 | regexp_replace('a12c','[[:digit:]]*?','*') | *a*1*2*c* | 测试"*?",匹配0或者更多次,难于理解! 但可以简单地理解为为在各个字符中以及两边插入替代串. 
 | 
| 6 | regexp_replace('a12c','[[:digit:]]??','*') | *a*1*2*c* | 测试"??",匹配0或者1次,难于理解啊! 但可以简单地理解为为在各个字符中以及两边插入替代串. | 
| 7 | regexp_replace('a12c','[[:digit:]]{1,}','*') 
 | a*c | 测试{1,},同+,结果已经说明 | 
| 8 |  |  |  | 
| 9 | regexp_replace('a1x2c','[[:digit:]]+','*') | a*x*c | 测试"+",这个例子说明,如果被检测的字符串中特定的匹配串如果不是连续的(多次),那么替代的还是属于正常的范围. | 
其它的次数模式暂时没有测试,对于 *,?,+ ,*?,+?,??,基本有其对应的模式 {n,m}? 或者是{n,}? ,除了*,*?之外,因为此二者表示的都是0或者多次(大于1),无法用{n,m}的方式表示.
例三:多个类的并用
很多时候使用类还是很方便的,例如[:digit:]表示数字.[:space:]表示空格,换行,回车等等.
如果希望把特定串中的数字和回车空格等都替换为'-',则可以使用 "|"(替代操作符,其实就是类似一般高级语言中的或操作符号).
select  regexp_replace('adfa jkjk jk
jk','([[:digit:]]+?)|([[:space:]]+?)','-')  from dual;
结果如下:
adfa-jkjk-jk-jk
已经把空格和回车符号都替换为'-'了.
又例如:
Select regexp_replace('a1b c1d jk
jk','([[:digit:]]+?)|([[:space:]]+?)','-') From dual
结果如下:
a-b-c-d-jk-jk
例四:迭代替换
例如希望把以下串替换掉
'<xxxx   <often>  <common>   <fat>   >'
关于范围操作符号 - 
发音:横杆或者减号
在列表匹配操作情况下,如果没有特别的设置,就是表示匹配一个范围:
 
| 例子:把字符串中非数字+-()的符号替换为空 这是一个不成功的匹配 SQL> select regexp_replace('1+[2c- 3([ad]km,4','[^0-9\+\-\(\)]','',1) from dual;
 REGEXP_REPLACE('1+[2C-3([AD]KM
 ------------------------------
 1+23(4
 这也是一个错误的匹配,因为减号这个时候表示一个范围SQL> select regexp_replace('1+[2c- 3([ad]km,4','[^0-9\+-\(\)]','',1) from dual;
 
 REGEXP_REPLACE('1+[2C-3([AD]KM
 ------------------------------
 1+[2-3([,4
 二者的区别在于单独的一个减号,如果前面没有跟上斜杠\,那么就是表示范围,否则就是减号本身。 但即使如此,依然没有达到目的,第一个把减号还是给去除了,虽然排除了其它的需要排除的符号, 第二个,则是当作范围符号,大部分的没有排除掉。 但是不太明白的是,按照oracle文档的约定,在范围匹配中形如[\-]表示匹配一个减号,为何依然匹配补上了,尤其是当夹着其它需要匹配的模式的时候。 现在只能尝试着把这个模式放在最后: SQL> select regexp_replace('1+[2c- 3([ad4','[^0-9\(+\)\-]','',1) from dual2  /
 
 REGEXP_REPLACE('1+[2C-3([AD4',
 ------------------------------
 1+2-3(4
 结果是可以,因为模式"\-"被放到了范围中的尾巴。 那是文档不全,还是理解不透彻了, 看来以后遇到这样的问题还是先把减号放到尾巴去吧。 |