IN和EXISTS

有时候会将一列和一系列值相比较。最简单的办法就是在where子句中使用子查询。在where子句中可以使用两种格式的子查询。

第一种格式是使用IN操作符:

... where column in(select * from ... where ...);

第二种格式是使用EXIST操作符:

... where exists (select 'X' from ...where ...);

我相信绝大多数人会使用第一种格式,因为它比较容易编写,而实际上第二种格式要远比第一种格式的效率高。在Oracle中可以几乎将所有的IN操作符子查询改写为使用EXISTS的子查询。

第二种格式中,子查询以‘select 'X'开始。运用EXISTS子句不管子查询从表中抽取什么数据它只查看where子句。这样优化器就不必遍历整个表而仅根据索引就可完成工作(这里假定在where语句中使用的列存在索引)。相对于IN子句来说,EXISTS使用相连子查询,构造起来要比IN子查询困难一些。

通过使用EXIST,Oracle系统会首先检查主查询,然后运行子查询直到它找到第一个匹配项,这就节省了时间。Oracle系统在执行IN子查询时,首先执行子查询,并将获得的结果列表存放在在一个加了索引的临时表中。在执行子查询之前,系统先将主查询挂起,待子查询执行完毕,存放在临时表中以后再执行主查询。这也就是使用EXISTS比使用IN通常查询速度快的原因。

同时应尽可能使用NOT EXISTS来代替NOT IN,尽管二者都使用了NOT(不能使用索引而降低速度),NOT EXISTS要比NOT IN查询效率更高。

EXISTS检查是否有结果,判断是否有记录,返回的是一个布尔型(TRUE/FALSE)。
IN是对结果值进行比较,判断一个字段是否存在于几个值的范围中,所以 EXISTS 比 IN 快。


主要区别是:
exists主要用于片面的,有满足一个条件的即可,
in主要用于具体的集合操作,有多少满足条件.

exists是判断是否存在这样的记录,
in是判断某个字段是否在指定的某个范围内。
exists快一些吧 。


in适合内外表都很大的情况,exists适合外表结果集很小的情况。


在ASKTOM的讲解:
Well, the two are processed very very differently.

Select * from T1 where x in ( select y from T2 )

is typically processed as:

select *
from t1, ( select distinct y from t2 ) t2
where t1.x = t2.y;

The subquery is evaluated, distinct'ed, indexed (or hashed or sorted) and then
joined to the original table -- typically.


As opposed to

select * from t1 where exists ( select null from t2 where y = x )

That is processed more like:


for x in ( select * from t1 )
loop
if ( exists ( select null from t2 where y = x.x )
then
OUTPUT THE RECORD
end if
end loop

It always results in a full scan of T1 whereas the first query can make use of
an index on T1(x).


So, when is where exists appropriate and in appropriate?

Lets say the result of the subquery
( select y from T2 )

is "huge" and takes a long time. But the table T1 is relatively small and
executing ( select null from t2 where y = x.x ) is very very fast (nice index on
t2(y)). Then the exists will be faster as the time to full scan T1 and do the
index probe into T2 could be less then the time to simply full scan T2 to build
the subquery we need to distinct on.


Lets say the result of the subquery is small -- then IN is typicaly more
appropriate.


If both the subquery and the outer table are huge -- either might work as well
as the other -- depends on the indexes and other factors.