<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-Change Dir-Post Category-Cluster Analysis</title><link>http://www.blogjava.net/changedi/category/43518.html</link><description>先知cd: Loving life is where all art begins</description><language>zh-cn</language><lastBuildDate>Wed, 24 Oct 2012 10:20:18 GMT</lastBuildDate><pubDate>Wed, 24 Oct 2012 10:20:18 GMT</pubDate><ttl>60</ttl><item><title>Clustering Algorithm Study Notes (5): Partitional Clustering</title><link>http://www.blogjava.net/changedi/archive/2010/05/11/320631.html</link><dc:creator>changedi</dc:creator><author>changedi</author><pubDate>Tue, 11 May 2010 13:07:00 GMT</pubDate><guid>http://www.blogjava.net/changedi/archive/2010/05/11/320631.html</guid><wfw:comment>http://www.blogjava.net/changedi/comments/320631.html</wfw:comment><comments>http://www.blogjava.net/changedi/archive/2010/05/11/320631.html#Feedback</comments><slash:comments>4</slash:comments><wfw:commentRss>http://www.blogjava.net/changedi/comments/commentRss/320631.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/changedi/services/trackbacks/320631.html</trackback:ping><description><![CDATA[
<h1>1. <span style="font-family: 宋体">Partitional Clustering</span></h1>
<p style="text-indent: 21pt" class="MsoNormal">In a sense, partitional clustering hardly needs an introduction: it is probably the most common family of clustering algorithms, and the well-known k-means algorithm is its archetype. This post uses the k-means algorithm to give an overview of partitional clustering.</p>
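<p style="text-indent: 21pt" class="MsoNormal">Put simply, here is what k-means does. Given a set of N data points D = {x<sub>1</sub>, x<sub>2</sub>, &#8230;, x<sub>N</sub>}, where each x<sub>i</sub> is a feature vector, the goal is to partition the N points into K clusters according to some similarity criterion. What characterizes k-means is its choice of criterion: it repeatedly uses the mean of each cluster as that cluster's center and reassigns points to the nearest mean. Some books call this criterion a score function. Partition-based clustering achieves homogeneity within each cluster by choosing a suitable score function and minimizing the distance from every data point to the center of the cluster it belongs to, so the crux is how to define that distance and the cluster center. For example, if the distance is Euclidean, a general score function can be built from the within-cluster covariance (scatter), namely the sum of squared distances from each point to its cluster mean. Partitional clustering is the most intuitive way of grouping data, so I will not dwell on the theory here; instead, the algorithm and code below show how it behaves in practice.</p>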
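<p>To make the score function concrete, here is a minimal Java sketch, assuming points are stored as <code>double[]</code> vectors and clusters as lists of points. It computes W = &#8721;<sub>k</sub> &#8721;<sub>x &#8712; C<sub>k</sub></sub> ||x - r<sub>k</sub>||<sup>2</sup>, where r<sub>k</sub> is the mean of C<sub>k</sub>; this equals the trace of the pooled within-cluster scatter (unnormalized covariance) matrix. The class <code>WithinClusterScore</code> and its methods are hypothetical illustration names, not part of any library.</p>

```java
import java.util.List;

// Hypothetical helper (not part of Apache Commons Math): the k-means score
// W = sum over clusters k of sum over x in C_k of ||x - mean(C_k)||^2.
final class WithinClusterScore {

    private WithinClusterScore() {}

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return s;
    }

    // Mean vector (centroid) of a non-empty list of equal-length vectors.
    static double[] mean(List<double[]> points) {
        double[] mu = new double[points.get(0).length];
        for (double[] x : points) {
            for (int j = 0; j < mu.length; j++) {
                mu[j] += x[j];
            }
        }
        for (int j = 0; j < mu.length; j++) {
            mu[j] /= points.size();
        }
        return mu;
    }

    // Total within-cluster squared error; lower is better. Empty clusters add nothing.
    static double score(List<List<double[]>> clusters) {
        double total = 0.0;
        for (List<double[]> cluster : clusters) {
            if (cluster.isEmpty()) {
                continue;
            }
            double[] mu = mean(cluster);
            for (double[] x : cluster) {
                total += squaredDistance(x, mu);
            }
        }
        return total;
    }
}
```

<p>k-means never increases W: reassigning each point to its nearest center and then moving each center to its cluster mean can each only lower W or leave it unchanged, which is why the loop terminates.</p>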
<h1>2. <span style="font-family: 宋体">Algorithm Implementation</span></h1>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">我们以</span>k-means<span style="font-family: 宋体">算法为例来实现划分聚类。该算法的复杂度为</span>O(KnI)<span style="font-family: 宋体">，其中</span>I<span style="font-family: 宋体">是迭代次数。这种算法的一个变体是依次分析每个数据点，而且一旦有数据点被重新分配就更新聚类中心，反复的在数据点中循环直到解不再变化。</span>k-means<span style="font-family: 宋体">算法的搜索过程局限于全部可能的划分空间的一个很小的部分。因此有可能因为算法收敛到评分函数的局部而非全局最小而错过更好的解。当然缓解方法可以通过选取随机起始点来改进搜索（我们例子中的</span>KMPP<span style="font-family: 宋体">算法），或者利用模拟退火等策略来改善搜索性能。因此，从这个角度来理解，聚类分析实质上是一个在庞大的解空间中优化特定评分函数的搜索问题。</span></p>
<p style="text-indent: 21pt" class="MsoNormal">Enough talk, let's get straight to the code!</p>
<p>k-means algorithm:</p>
<p>for k = 1, &#8230;, K: let r<sub>k</sub> be a point chosen at random from D;</p>
<p>while the clusters C<sub>k</sub> keep changing do</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Form clusters:</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for k = 1, &#8230;, K do</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; C<sub>k</sub> = { x &#8712; D | d(r<sub>k</sub>, x) &#8804; d(r<sub>j</sub>, x) for all j = 1, &#8230;, K, j &#8800; k } (ties broken arbitrarily, so that each x joins exactly one cluster);</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Compute new cluster centers:</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; for k = 1, &#8230;, K do</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; r<sub>k</sub> = mean vector of the points in C<sub>k</sub>;</p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; end;</p>
<p>end;</p>
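<p>Below is a minimal Java sketch of the pseudocode above (often called Lloyd's algorithm), assuming points are <code>double[]</code> rows and d is Euclidean distance (comparing squared distances picks the same nearest center). The class name <code>SimpleKMeans</code> is hypothetical; unlike the pseudocode, it takes the initial centers as a parameter, leaving the random choice (or a k-means++ seeding) to the caller, and it caps the loop at <code>maxIterations</code>.</p>

```java
import java.util.Arrays;

// Hypothetical sketch of the pseudocode above (Lloyd's k-means). The caller supplies
// the initial centers r_k, e.g. K random points of D or a k-means++ seeding.
final class SimpleKMeans {

    private SimpleKMeans() {}

    static double squaredDistance(double[] a, double[] b) {
        double s = 0.0;
        for (int j = 0; j < a.length; j++) {
            double d = a[j] - b[j];
            s += d * d;
        }
        return s;
    }

    // Returns assign[i] = index k of the cluster C_k that data[i] ends up in.
    static int[] cluster(double[][] data, double[][] initialCenters, int maxIterations) {
        int n = data.length;
        int k = initialCenters.length;
        int m = data[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = initialCenters[c].clone(); // do not modify the caller's array
        }
        int[] assign = new int[n];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Form clusters: nearest center wins; ties go to the lower index.
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                double bestD = squaredDistance(data[i], centers[0]);
                for (int c = 1; c < k; c++) {
                    double d = squaredDistance(data[i], centers[c]);
                    if (d < bestD) {
                        bestD = d;
                        best = c;
                    }
                }
                if (assign[i] != best) {
                    assign[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // the clusters C_k no longer change
            }
            // Compute new centers: r_k = mean vector of the points in C_k.
            double[][] sums = new double[k][m];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[assign[i]]++;
                for (int j = 0; j < m; j++) {
                    sums[assign[i]][j] += data[i][j];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: keep its previous center
                }
                for (int j = 0; j < m; j++) {
                    centers[c][j] = sums[c][j] / counts[c];
                }
            }
        }
        return assign;
    }
}
```

<p>For instance, with the points (0,0), (0,1), (10,10), (10,11) and initial centers (0,0) and (0,1), the first pass puts the last three points together; the mean update then pulls the second center toward (10,10), and the loop settles on the assignment [0, 0, 1, 1].</p>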
<p style="text-indent: 21pt" class="MsoNormal">Since Apache Commons Math already ships a ready-made implementation, and in keeping with the principle from Eric Raymond's TAOUP (<i>The Art of Unix Programming</i>) of making the most of existing tools, I did not write k-means myself; instead, the example uses the k-means++ implementation (KMeansPlusPlusClusterer) from Apache Commons Math.</p>
<p>The following test code shows how to run the algorithm:</p>
<div style="border-bottom: #cccccc 1px solid; border-left: #cccccc 1px solid; padding-bottom: 4px; background-color: #eeeeee; padding-left: 4px; width: 98%; padding-right: 5px; font-size: 13px; word-break: break-all; border-top: #cccccc 1px solid; border-right: #cccccc 1px solid; padding-top: 4px"><span style="color: #008000">//&nbsp;Imports&nbsp;needed&nbsp;by&nbsp;the&nbsp;methods&nbsp;below&nbsp;(Apache&nbsp;Commons&nbsp;Math&nbsp;2.x&nbsp;package&nbsp;names)</span><br />
<span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.ArrayList;</span><br />
<span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.Collection;</span><br />
<span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.Iterator;</span><br />
<span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.List;</span><br />
<span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.Random;</span><br />
<span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.commons.math.stat.clustering.Cluster;</span><br />
<span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.commons.math.stat.clustering.EuclideanIntegerPoint;</span><br />
<span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;org.apache.commons.math.stat.clustering.KMeansPlusPlusClusterer;</span><br />
<br />
<span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">static</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">void</span><span style="color: #000000">&nbsp;testKMeansPP()&nbsp;{</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//&nbsp;ori&nbsp;holds&nbsp;n&nbsp;samples&nbsp;with&nbsp;m&nbsp;features&nbsp;each;&nbsp;here&nbsp;n&nbsp;=&nbsp;8,&nbsp;m&nbsp;=&nbsp;2</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">int</span><span style="color: #000000">[][]&nbsp;ori&nbsp;=&nbsp;{{2,&nbsp;5},&nbsp;{6,&nbsp;4},&nbsp;{5,&nbsp;3},&nbsp;{2,&nbsp;2},&nbsp;{1,&nbsp;4},&nbsp;{5,&nbsp;2},&nbsp;{3,&nbsp;3},&nbsp;{2,&nbsp;3}};</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">int</span><span style="color: #000000">&nbsp;n&nbsp;=&nbsp;ori.length;</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;Collection&lt;EuclideanIntegerPoint&gt;&nbsp;col&nbsp;=&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;ArrayList&lt;EuclideanIntegerPoint&gt;();</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">for</span><span style="color: #000000">&nbsp;(</span><span style="color: #0000ff">int</span><span style="color: #000000">&nbsp;i&nbsp;=&nbsp;0;&nbsp;i&nbsp;&lt;&nbsp;n;&nbsp;i++)&nbsp;{</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;EuclideanIntegerPoint&nbsp;ec&nbsp;=&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;EuclideanIntegerPoint(ori[i]);</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;col.add(ec);</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;}</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//&nbsp;Fixed&nbsp;seed&nbsp;(n&nbsp;=&nbsp;8)&nbsp;so&nbsp;that&nbsp;repeated&nbsp;runs&nbsp;produce&nbsp;the&nbsp;same&nbsp;clustering</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;KMeansPlusPlusClusterer&lt;EuclideanIntegerPoint&gt;&nbsp;km&nbsp;=&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;KMeansPlusPlusClusterer&lt;EuclideanIntegerPoint&gt;(</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;Random(n));</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//&nbsp;Partition&nbsp;the&nbsp;points&nbsp;into&nbsp;k&nbsp;=&nbsp;3&nbsp;clusters,&nbsp;with&nbsp;at&nbsp;most&nbsp;100&nbsp;iterations</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;List&lt;Cluster&lt;EuclideanIntegerPoint&gt;&gt;&nbsp;list&nbsp;=&nbsp;km.cluster(col,&nbsp;3,&nbsp;100);</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;output(list);</span><br />
<span style="color: #000000">}<br />
</span><br />
<span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">static</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">void</span><span style="color: #000000">&nbsp;output(List&lt;Cluster&lt;EuclideanIntegerPoint&gt;&gt;&nbsp;list)&nbsp;</span><span id="Codehighlighter1_791_1344_Open_Text"><span style="color: #000000">{</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">int</span><span style="color: #000000">&nbsp;ind&nbsp;=&nbsp;1;</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;Iterator&lt;Cluster&lt;EuclideanIntegerPoint&gt;&gt;&nbsp;it&nbsp;=&nbsp;list.iterator();</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">while</span><span style="color: #000000">&nbsp;(it.hasNext())&nbsp;{</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Cluster&lt;EuclideanIntegerPoint&gt;&nbsp;cl&nbsp;=&nbsp;it.next();</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//&nbsp;Print&nbsp;a&nbsp;1-based&nbsp;cluster&nbsp;label,&nbsp;then&nbsp;every&nbsp;point&nbsp;in&nbsp;that&nbsp;cluster</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.print("Cluster"&nbsp;+&nbsp;(ind++)&nbsp;+&nbsp;"&nbsp;:");</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;List&lt;EuclideanIntegerPoint&gt;&nbsp;li&nbsp;=&nbsp;cl.getPoints();</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Iterator&lt;EuclideanIntegerPoint&gt;&nbsp;ii&nbsp;=&nbsp;li.iterator();</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">while</span><span style="color: #000000">&nbsp;(ii.hasNext())&nbsp;{</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;EuclideanIntegerPoint&nbsp;eip&nbsp;=&nbsp;ii.next();</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.print(eip&nbsp;+&nbsp;"&nbsp;");</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;System.out.println();</span><br />
<span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;}<br />
</span><span style="color: #008080">56</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" /><br />
</span><span style="color: #008080">57</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedBlockEnd.gif" />&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
</span><span style="color: #008080">58</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" /><br />
</span><span style="color: #008080">59</span><span style="color: #000000"><img id="Codehighlighter1_1351_1379_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1351_1379_Open_Text.style.display='none'; Codehighlighter1_1351_1379_Closed_Image.style.display='inline'; Codehighlighter1_1351_1379_Closed_Text.style.display='inline';" alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedBlockStart.gif" /><img style="display: none" id="Codehighlighter1_1351_1379_Closed_Image" onclick="this.style.display='none'; Codehighlighter1_1351_1379_Closed_Text.style.display='none'; Codehighlighter1_1351_1379_Open_Image.style.display='inline'; Codehighlighter1_1351_1379_Open_Text.style.display='inline';" alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/ContractedBlock.gif" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="border-bottom: #808080 1px solid; border-left: #808080 1px solid; background-color: #ffffff; display: none; border-top: #808080 1px solid; border-right: #808080 1px solid" id="Codehighlighter1_1351_1379_Closed_Text">/**&nbsp;*/</span><span id="Codehighlighter1_1351_1379_Open_Text"><span style="color: #008000">/**</span><span style="color: #008000"><br />
</span><span style="color: #008080">60</span><span style="color: #008000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" /><br />
</span><span style="color: #008080">61</span><span style="color: #008000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" />&nbsp;&nbsp;&nbsp;&nbsp;*</span><span style="color: #808080">@param</span><span style="color: #008000">&nbsp;args<br />
</span><span style="color: #008080">62</span><span style="color: #008000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" /><br />
</span><span style="color: #008080">63</span><span style="color: #008000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedBlockEnd.gif" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">*/</span></span><span style="color: #000000"><br />
</span><span style="color: #008080">64</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" /><br />
</span><span style="color: #008080">65</span><span style="color: #000000"><img id="Codehighlighter1_1425_1537_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1425_1537_Open_Text.style.display='none'; Codehighlighter1_1425_1537_Closed_Image.style.display='inline'; Codehighlighter1_1425_1537_Closed_Text.style.display='inline';" alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedBlockStart.gif" /><img style="display: none" id="Codehighlighter1_1425_1537_Closed_Image" onclick="this.style.display='none'; Codehighlighter1_1425_1537_Closed_Text.style.display='none'; Codehighlighter1_1425_1537_Open_Image.style.display='inline'; Codehighlighter1_1425_1537_Open_Text.style.display='inline';" alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/ContractedBlock.gif" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">public</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">static</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">void</span><span style="color: #000000">&nbsp;main(String[]&nbsp;args)&nbsp;</span><span style="border-bottom: #808080 1px solid; border-left: #808080 1px solid; background-color: #ffffff; display: none; border-top: #808080 1px solid; border-right: #808080 1px solid" id="Codehighlighter1_1425_1537_Closed_Text"><img alt="" src="http://www.blogjava.net/Images/dot.gif" /></span><span id="Codehighlighter1_1425_1537_Open_Text"><span style="color: #000000">{<br />
</span><span style="color: #008080">66</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" /><br />
</span><span style="color: #008080">67</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">testHierachicalCluster();</span><span style="color: #008000"><br />
</span><span style="color: #008080">68</span><span style="color: #008000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" /></span><span style="color: #000000"><br />
</span><span style="color: #008080">69</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;testKMeansPP();<br />
</span><span style="color: #008080">70</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" /><br />
</span><span style="color: #008080">71</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">testBSAS();<br />
</span><span style="color: #008080">72</span><span style="color: #008000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" /><br />
</span><span style="color: #008080">73</span><span style="color: #008000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">testMBSAS();</span><span style="color: #008000"><br />
</span><span style="color: #008080">74</span><span style="color: #008000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" /></span><span style="color: #000000"><br />
</span><span style="color: #008080">75</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedBlockEnd.gif" />&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
</span><span style="color: #008080">76</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" /><br />
</span><span style="color: #008080">77</span><span style="color: #000000"><img alt="" align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" /></span></div>
<p><br />
</span></p>
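<h1>3. <span style="font-family: 宋体">Summary</span></h1>
<p>Partitional clustering is one of the most widely used families of algorithms in cluster analysis, and the research literature on it is vast. Interested readers can explore the many related papers to appreciate the elegance of these algorithms. Thanks once again to Apache Commons Math for implementing so many common mathematical routines. This concludes my notes on cluster analysis for now: I am busy writing papers at the moment, and I may return to clustering algorithms when I have more time.</p>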
<h1>4. <span style="font-family: 宋体">References and Further Reading</span></h1>
<p>[1] Sergios Theodoridis, Konstantinos Koutroumbas. Pattern Recognition, Third Edition.</p>
<p>[2] Sergios Theodoridis, Konstantinos Koutroumbas. Pattern Recognition, Third Edition (Chinese edition), translated by <span style="font-family: 宋体">李晶皎, 王爱侠, 张广源</span> et al.</p>
<p>[3] David Hand et al. Principles of Data Mining (Chinese edition), translated by <span style="font-family: 宋体">张银奎</span> et al.</p>
<p>[4] Apache Commons Math, http://commons.apache.org/math/</p>
<img src ="http://www.blogjava.net/changedi/aggbug/320631.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/changedi/" target="_blank">changedi</a> 2010-05-11 21:07 <a href="http://www.blogjava.net/changedi/archive/2010/05/11/320631.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Clustering Algorithm Study Notes (4): Hierarchical Clustering</title><link>http://www.blogjava.net/changedi/archive/2010/03/19/315963.html</link><dc:creator>changedi</dc:creator><author>changedi</author><pubDate>Fri, 19 Mar 2010 12:08:00 GMT</pubDate><guid>http://www.blogjava.net/changedi/archive/2010/03/19/315963.html</guid><wfw:comment>http://www.blogjava.net/changedi/comments/315963.html</wfw:comment><comments>http://www.blogjava.net/changedi/archive/2010/03/19/315963.html#Feedback</comments><slash:comments>15</slash:comments><wfw:commentRss>http://www.blogjava.net/changedi/comments/commentRss/315963.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/changedi/services/trackbacks/315963.html</trackback:ping><description><![CDATA[Abstract: 1. Hierarchical Clustering. Hierarchical clustering algorithms differ greatly from the sequential clustering discussed earlier: instead of producing a single clustering, they produce a hierarchy of clusterings, which in plain terms is a hierarchy tree. Before introducing hierarchical clustering, we first need one concept: nested clustering. Put simply, clusterings nest much like program blocks do: if clustering R1 contains another clustering R2, then R2 is nested in R1, or equivalently R1 nests R2. So what exactly counts as nesting? Clustering R1...&nbsp;&nbsp;<a href='http://www.blogjava.net/changedi/archive/2010/03/19/315963.html'>Read more</a><img src ="http://www.blogjava.net/changedi/aggbug/315963.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/changedi/" target="_blank">changedi</a> 2010-03-19 20:08 <a href="http://www.blogjava.net/changedi/archive/2010/03/19/315963.html#Feedback" target="_blank" 
style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Clustering Algorithm Study Notes (3): Sequential Clustering</title><link>http://www.blogjava.net/changedi/archive/2010/03/06/314698.html</link><dc:creator>changedi</dc:creator><author>changedi</author><pubDate>Sat, 06 Mar 2010 07:02:00 GMT</pubDate><guid>http://www.blogjava.net/changedi/archive/2010/03/06/314698.html</guid><wfw:comment>http://www.blogjava.net/changedi/comments/314698.html</wfw:comment><comments>http://www.blogjava.net/changedi/archive/2010/03/06/314698.html#Feedback</comments><slash:comments>13</slash:comments><wfw:commentRss>http://www.blogjava.net/changedi/comments/commentRss/314698.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/changedi/services/trackbacks/314698.html</trackback:ping><description><![CDATA[<p>&nbsp;&nbsp;&nbsp;</p>
<h1 style="margin-left: 18pt; text-indent: -18pt">1. <span style="font-family: 宋体">Sequential Clustering</span></h1>
<p style="text-indent: 21pt">Finding the best way to cluster n objects into k clusters is, in general, an NP-hard problem. Readers familiar with combinatorics will recognize that the number of distinct ways to partition n objects into k nonempty clusters is the Stirling number of the second kind: <img alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/3.2.jpg" border="0" />. This is where the trouble starts. Even with k fixed, exhaustive enumeration is feasible only for very small n (for example, S(15, 3) = 2,375,101), and if k is not fixed, every possible k has to be tried as well, so the running time quickly becomes prohibitive. However, not every possible clustering is a sensible one. By "sensible" I mean close to the goal of the clustering: we always partition data for some purpose, and that purpose can be used to restrict attention to a small set of candidate clusterings, which sidesteps the complexity problem.</p>
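<p style="text-indent: 21pt">To get a feel for how quickly this count grows, the following Java sketch (not part of the original post) computes S(n, k) with the standard recurrence S(n, k) = k * S(n-1, k) + S(n-1, k-1). The class StirlingCount and its method secondKind are hypothetical names chosen for illustration; BigInteger is used because the values soon overflow long.</p>

```java
import java.math.BigInteger;
import java.util.Arrays;

/**
 * Sketch: counts the ways to partition n labelled objects into k nonempty
 * clusters, i.e. the Stirling number of the second kind S(n, k), using the
 * recurrence S(n, k) = k * S(n - 1, k) + S(n - 1, k - 1) with S(0, 0) = 1.
 */
class StirlingCount {

    static BigInteger secondKind(int n, int k) {
        if (n < 0 || k < 0) {
            throw new IllegalArgumentException("n and k must be non-negative");
        }
        // row[j] holds S(i, j) for the current i; initially i = 0.
        BigInteger[] row = new BigInteger[k + 1];
        Arrays.fill(row, BigInteger.ZERO);
        row[0] = BigInteger.ONE;
        for (int i = 1; i <= n; i++) {
            // Update j downwards so row[j - 1] still holds S(i - 1, j - 1).
            for (int j = Math.min(i, k); j >= 1; j--) {
                row[j] = row[j].multiply(BigInteger.valueOf(j)).add(row[j - 1]);
            }
            row[0] = BigInteger.ZERO; // S(i, 0) = 0 for every i >= 1
        }
        return row[k];
    }

    public static void main(String[] args) {
        System.out.println(secondKind(15, 3)); // 2375101
        System.out.println(secondKind(20, 4)); // 45232115901
    }
}
```

<p style="text-indent: 21pt">For example, secondKind(15, 3) returns 2375101 and secondKind(20, 4) returns 45232115901, which is why exhaustive enumeration of clusterings is practical only for tiny data sets.</p>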
<p style="text-indent: 21pt">Sequential algorithms are very simple clustering algorithms. Most of them present every feature vector to the algorithm once or a few times, and the final result depends on the order in which the vectors are presented. These algorithms usually do not need to know the number of clusters k in advance, although an upper bound q on the number of clusters may be supplied. This post mainly covers the Basic Sequential Algorithmic Scheme (BSAS) and several of its variants, with code implementations.</p>
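<p style="text-indent: 21pt">Let us start with BSAS. The scheme requires two user-defined parameters: a dissimilarity threshold <em>&#920;</em> and the maximum allowed number of clusters q. The basic idea is to consider each new vector in turn and, based on its distance to the existing clusters, assign it either to an existing cluster or to a newly created one. The pseudocode is as follows:</p>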
<p style="margin-left: 18pt; text-indent: -18pt"><em>1.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></em><em>m</em>=1&nbsp;&nbsp; /* number of clusters */</p>
<p style="margin-left: 18pt; text-indent: -18pt"><em>2.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></em><em>C<sub>m</sub></em>={<em><u>x</u></em><sub>1</sub>}</p>
<p style="margin-left: 18pt; text-indent: -18pt"><em>3.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></em>For <em>i</em>=2 to <em>N</em></p>
<p style="margin-left: 18pt; text-indent: -18pt"><em>4.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></em>&nbsp;&nbsp;&nbsp;&nbsp;Find <em>C<sub>k</sub></em>: <em>d</em>(<em><u>x</u><sub>i</sub>,C<sub>k</sub></em>)=<em>min</em><sub>1&#8804;<em>j</em>&#8804;<em>m</em></sub><em>d</em>(<em><u>x</u><sub>i</sub>,C<sub>j</sub></em>)</p>
<p style="margin-left: 18pt; text-indent: -18pt"><em>5.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></em>&nbsp;&nbsp;&nbsp;&nbsp;If (<em>d</em>(<em><u>x</u><sub>i</sub>,C<sub>k</sub></em>)&gt;<em>&#920;</em>) <em>AND </em>(<em>m</em>&lt;<em>q</em>) then</p>
<p style="margin-left: 18pt; text-indent: -18pt"><em>6.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></em><em>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;m</em>=<em>m</em>+1</p>
<p style="margin-left: 18pt; text-indent: -18pt"><em>7.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></em><em>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;C<sub>m</sub></em>={<em><u>x</u><sub>i</sub></em>}</p>
<p style="margin-left: 18pt; text-indent: -18pt"><em>8.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></em>&nbsp;&nbsp;&nbsp;&nbsp;Else</p>
<p style="margin-left: 18pt; text-indent: -18pt"><em>9.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></em><em>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;C<sub>k</sub></em>=<em>C<sub>k</sub></em>&#8746;{<em><u>x</u><sub>i</sub></em>}</p>
<p style="margin-left: 18pt; text-indent: -18pt"><em>10.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp; </span></em>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;If necessary, update the representative of <em>C<sub>k</sub></em></p>
<p style="margin-left: 18pt; text-indent: -18pt"><em>11.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp; </span></em>&nbsp;&nbsp;&nbsp;&nbsp;End {if}</p>
<p style="margin-left: 18pt; text-indent: -18pt"><em>12.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp; </span></em>End {for}</p>
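<p style="text-indent: 21pt">The pseudocode above can be turned into a short, self-contained Java sketch. It is independent of the author's Clusterable-based implementation in Section 2 and rests on assumptions of its own: each vector is a double[], d(x, C) is taken to be the Euclidean distance from x to the mean of C, and step 10 is implemented as an incremental update of that mean. The class BsasSketch and its methods are hypothetical names.</p>

```java
import java.util.Arrays;

/**
 * Sketch of BSAS (Basic Sequential Algorithmic Scheme).
 * Assumptions made here, not taken from the original post: every vector is a
 * double[], d(x, C) is the Euclidean distance from x to the mean of C, and
 * step 10 ("update the representative") is an incremental mean update.
 */
class BsasSketch {

    /** Returns the 0-based cluster index assigned to each vector, in input order. */
    static int[] cluster(double[][] x, double theta, int q) {
        if (q < 1) {
            throw new IllegalArgumentException("q must be at least 1");
        }
        int n = x.length;
        int[] labels = new int[n];
        if (n == 0) {
            return labels;
        }
        double[][] means = new double[q][];
        int[] sizes = new int[q];

        // Steps 1-2: the first vector forms the first cluster (label 0).
        int m = 1;
        means[0] = x[0].clone();
        sizes[0] = 1;

        // Step 3: present the remaining vectors one at a time.
        for (int i = 1; i < n; i++) {
            // Step 4: find the nearest existing cluster C_k.
            int k = 0;
            double best = Double.POSITIVE_INFINITY;
            for (int j = 0; j < m; j++) {
                double d = distance(x[i], means[j]);
                if (d < best) {
                    best = d;
                    k = j;
                }
            }
            if (best > theta && m < q) {
                // Steps 5-7: too far from every cluster, and a new one is allowed.
                means[m] = x[i].clone();
                sizes[m] = 1;
                labels[i] = m;
                m++;
            } else {
                // Steps 9-10: add x_i to C_k and update the mean of C_k.
                sizes[k]++;
                for (int t = 0; t < means[k].length; t++) {
                    means[k][t] += (x[i][t] - means[k][t]) / sizes[k];
                }
                labels[i] = k;
            }
        }
        return labels;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int t = 0; t < a.length; t++) {
            double diff = a[t] - b[t];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] inOrder = {{0}, {1}, {2}, {3}};
        double[][] reordered = {{1}, {2}, {0}, {3}};
        System.out.println(Arrays.toString(cluster(inOrder, 1.0, 10)));   // [0, 0, 1, 1]
        System.out.println(Arrays.toString(cluster(reordered, 1.0, 10))); // [0, 0, 1, 2]
    }
}
```

<p style="text-indent: 21pt">With &#920; = 1 and q = 10, presenting the one-dimensional vectors in the order 0, 1, 2, 3 yields two clusters, whereas the order 1, 2, 0, 3 yields three, a small demonstration of how strongly BSAS depends on presentation order.</p>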
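<p style="text-indent: 21pt">As the description shows, BSAS is highly sensitive to the order in which vectors are presented: different orders can produce completely different results, both in the number of clusters and in the clusters themselves. The other major factor is the choice of the threshold <em>&#920;</em>, which directly determines the final number of clusters. If <em>&#920;</em> is too small, many unnecessary clusters are created, because a vector joins an existing cluster only when its distance to that cluster is at most <em>&#920;</em> (unless q clusters already exist); if <em>&#920;</em> is too large, too few clusters are formed. BSAS is best suited to compact clusters. It makes a single pass over the data set, and in each iteration computes the distances between the current vector and the existing clusters. Because the final number of clusters <em>m</em> is assumed to be much smaller than <em>N</em>, the time complexity of BSAS is <em>O(N)</em>.</p>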
<p style="text-indent: 21pt">Because the result of BSAS depends on the user-supplied q, here is a simple method for estimating the number of clusters automatically; it also applies to other clustering algorithms. Let <em>BSAS</em>(<em>&#920;</em>) denote the BSAS algorithm run with a given dissimilarity threshold <em>&#920;</em>.</p>
<p style="margin-left: 18pt; text-indent: -18pt">1.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>For <em>&#920;</em>=<em>a</em> to <em>b</em> step <em>c</em></p>
<p style="margin-left: 18pt; text-indent: -18pt">2.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>&nbsp;&nbsp;&nbsp;Run <em>BSAS</em>(<em>&#920;</em>) s times, each time presenting the data in a different order.</p>
<p style="margin-left: 18pt; text-indent: -18pt">3.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>&nbsp;&nbsp;&nbsp;Estimate the number of clusters <em>m</em><em><sub>&#920;</sub></em> as the cluster count that occurs most often over the s runs of <em>BSAS</em>(<em>&#920;</em>).</p>
<p style="margin-left: 18pt; text-indent: -18pt">4.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>Next <em>&#920;</em></p>
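<p style="text-indent: 21pt">Here a and b are the minimum and maximum dissimilarity levels over all pairs of vectors in the data set, and the choice of the step c is directly influenced by the choice of <em>d</em>(<em><u>x</u>,C</em>). Finally, plot <em>m</em><em><sub>&#920;</sub></em> against <em>&#920;</em>: the curve typically shows several flat regions, and the number of clusters corresponding to the widest flat region is taken as the estimate.</p>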
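<p style="text-indent: 21pt">The estimation procedure can be sketched in Java as follows. This is an illustrative sketch rather than the author's code: it assumes d(x, C) is the Euclidean distance to the cluster mean, runs BSAS with no upper bound q so that only &#920; limits the number of clusters, computes a and b by brute force over all pairs of vectors, and uses java.util.Random to generate the s presentation orders. The class ThetaSweep, its methods, and the step c = (b - a) / 20 in main are hypothetical choices.</p>

```java
import java.util.Random;

/**
 * Sketch of the cluster-count estimation procedure built on BSAS.
 * Assumptions: Euclidean d(x, C) to the cluster mean, no upper bound q
 * (only Theta limits the number of clusters), java.util.Random for orders.
 */
class ThetaSweep {

    /** Runs BSAS once, presenting x in the given order; returns the cluster count. */
    static int countClusters(double[][] x, int[] order, double theta) {
        double[][] means = new double[order.length][];
        int[] sizes = new int[order.length];
        int m = 0;
        for (int idx : order) {
            double[] v = x[idx];
            int k = -1;
            double best = Double.POSITIVE_INFINITY;
            for (int j = 0; j < m; j++) {
                double d = dist(v, means[j]);
                if (d < best) {
                    best = d;
                    k = j;
                }
            }
            if (k == -1 || best > theta) {
                means[m] = v.clone();   // open a new cluster
                sizes[m] = 1;
                m++;
            } else {
                sizes[k]++;             // join C_k and update its mean
                for (int t = 0; t < v.length; t++) {
                    means[k][t] += (v[t] - means[k][t]) / sizes[k];
                }
            }
        }
        return m;
    }

    /** Most frequent cluster count (m_Theta) over s random presentation orders. */
    static int modeCount(double[][] x, double theta, int s, Random rnd) {
        int n = x.length;
        int[] freq = new int[n + 1];    // freq[m] = number of runs that produced m clusters
        int[] order = new int[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        for (int run = 0; run < s; run++) {
            // Fisher-Yates shuffle: a fresh random order for each run.
            for (int i = n - 1; i > 0; i--) {
                int j = rnd.nextInt(i + 1);
                int tmp = order[i];
                order[i] = order[j];
                order[j] = tmp;
            }
            freq[countClusters(x, order, theta)]++;
        }
        int mode = 1;
        for (int m = 2; m <= n; m++) {
            if (freq[m] > freq[mode]) {
                mode = m;
            }
        }
        return mode;
    }

    /** Smallest and largest pairwise distance: the a and b of the procedure. */
    static double[] range(double[][] x) {
        double a = Double.POSITIVE_INFINITY;
        double b = 0;
        for (int i = 0; i < x.length; i++) {
            for (int j = i + 1; j < x.length; j++) {
                double d = dist(x[i], x[j]);
                a = Math.min(a, d);
                b = Math.max(b, d);
            }
        }
        return new double[] {a, b};
    }

    static double dist(double[] p, double[] q) {
        double sum = 0;
        for (int t = 0; t < p.length; t++) {
            sum += (p[t] - q[t]) * (p[t] - q[t]);
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] x = {{0}, {0.1}, {0.2}, {10}, {10.1}, {10.2}};
        double[] ab = range(x);
        double c = (ab[1] - ab[0]) / 20;   // step c: an arbitrary choice for this demo
        Random rnd = new Random(42);
        for (double theta = ab[0]; theta <= ab[1]; theta += c) {
            System.out.printf("Theta = %.3f  m = %d%n", theta, modeCount(x, theta, 10, rnd));
        }
    }
}
```

<p style="text-indent: 21pt">On this toy data set of two well-separated groups, m<sub>&#920;</sub> stays at 2 over most of the &#920; range.</p>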
<h1>2. <span style="font-family: 宋体">Algorithm Implementation</span></h1>
<h1>
<div style="border-right: #cccccc 1px solid; padding-right: 5px; border-top: #cccccc 1px solid; padding-left: 4px; font-size: 13px; padding-bottom: 4px; border-left: #cccccc 1px solid; width: 98%; word-break: break-all; padding-top: 4px; border-bottom: #cccccc 1px solid; background-color: #eeeeee"><img alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top" /><span style="color: #0000ff">package</span><span style="color: #000000">&nbsp;util.clustering;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top" /><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.ArrayList;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.Collection;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.Iterator;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top" /></span><span style="color: #0000ff">import</span><span style="color: #000000">&nbsp;java.util.List;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top" /><br />
<img id="Codehighlighter1_134_161_Open_Image" onclick="this.style.display='none'; Codehighlighter1_134_161_Open_Text.style.display='none'; Codehighlighter1_134_161_Closed_Image.style.display='inline'; Codehighlighter1_134_161_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedBlockStart.gif" align="top" /><img id="Codehighlighter1_134_161_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_134_161_Closed_Text.style.display='none'; Codehighlighter1_134_161_Open_Image.style.display='inline'; Codehighlighter1_134_161_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedBlock.gif" align="top" /></span><span id="Codehighlighter1_134_161_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff">/**&nbsp;*/</span><span id="Codehighlighter1_134_161_Open_Text"><span style="color: #008000">/**</span><span style="color: #008000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;*&nbsp;</span><span style="color: #808080">@author</span><span style="color: #008000">&nbsp;Jia&nbsp;Yu<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;*<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedBlockEnd.gif" align="top" />&nbsp;</span><span style="color: #008000">*/</span></span><span style="color: #000000"><br />
<img id="Codehighlighter1_208_2007_Open_Image" onclick="this.style.display='none'; Codehighlighter1_208_2007_Open_Text.style.display='none'; Codehighlighter1_208_2007_Closed_Image.style.display='inline'; Codehighlighter1_208_2007_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedBlockStart.gif" align="top" /><img id="Codehighlighter1_208_2007_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_208_2007_Closed_Text.style.display='none'; Codehighlighter1_208_2007_Open_Image.style.display='inline'; Codehighlighter1_208_2007_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedBlock.gif" align="top" /></span><span style="color: #0000ff">public</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">class</span><span style="color: #000000">&nbsp;BSAS&nbsp;</span><span style="color: #000000">&lt;</span><span style="color: #000000">T&nbsp;</span><span style="color: #0000ff">extends</span><span style="color: #000000">&nbsp;Clusterable</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;&gt;</span><span style="color: #000000">&nbsp;</span><span id="Codehighlighter1_208_2007_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img alt="" src="http://www.blogjava.net/Images/dot.gif" /></span><span id="Codehighlighter1_208_2007_Open_Text"><span style="color: #000000">{<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" /><br />
<img id="Codehighlighter1_212_271_Open_Image" onclick="this.style.display='none'; Codehighlighter1_212_271_Open_Text.style.display='none'; Codehighlighter1_212_271_Closed_Image.style.display='inline'; Codehighlighter1_212_271_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_212_271_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_212_271_Closed_Text.style.display='none'; Codehighlighter1_212_271_Open_Image.style.display='inline'; Codehighlighter1_212_271_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_212_271_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff">/**&nbsp;*/</span><span id="Codehighlighter1_212_271_Open_Text"><span style="color: #008000">/**</span><span style="color: #008000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;Basic&nbsp;Sequential&nbsp;Algorithmic&nbsp;Scheme<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;适用于致密聚类<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">*/</span></span><span style="color: #000000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;<br />
<img id="Codehighlighter1_290_293_Open_Image" onclick="this.style.display='none'; Codehighlighter1_290_293_Open_Text.style.display='none'; Codehighlighter1_290_293_Closed_Image.style.display='inline'; Codehighlighter1_290_293_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_290_293_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_290_293_Closed_Text.style.display='none'; Codehighlighter1_290_293_Open_Image.style.display='inline'; Codehighlighter1_290_293_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">public</span><span style="color: #000000">&nbsp;BSAS()&nbsp;</span><span id="Codehighlighter1_290_293_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img alt="" src="http://www.blogjava.net/Images/dot.gif" /></span><span id="Codehighlighter1_290_293_Open_Text"><span style="color: #000000">{<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;<br />
<img id="Codehighlighter1_298_547_Open_Image" onclick="this.style.display='none'; Codehighlighter1_298_547_Open_Text.style.display='none'; Codehighlighter1_298_547_Closed_Image.style.display='inline'; Codehighlighter1_298_547_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_298_547_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_298_547_Closed_Text.style.display='none'; Codehighlighter1_298_547_Open_Image.style.display='inline'; Codehighlighter1_298_547_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_298_547_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff">/**&nbsp;*/</span><span id="Codehighlighter1_298_547_Open_Text"><span style="color: #008000">/**</span><span style="color: #008000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;Basic&nbsp;Sequential&nbsp;Algorithmic&nbsp;Scheme<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;考虑样本空间中每个向量，根据向量到已有的聚类中心的距离，将它分配到一个已有聚类中，或者一个新生成的聚类中。<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;time&nbsp;complexity&nbsp;is&nbsp;O(N)<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;BSAS算法对整个数据集只进行一次扫描。<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;</span><span style="color: #808080">@param</span><span style="color: #008000">&nbsp;points&nbsp;待聚类的向量<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;</span><span style="color: #808080">@param</span><span style="color: #008000">&nbsp;Phi&nbsp;用户定义的不相似性阈值<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;</span><span style="color: #808080">@param</span><span style="color: #008000">&nbsp;q&nbsp;用户定义的允许的最大聚类数<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;</span><span style="color: #808080">@return</span><span style="color: #008000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">*/</span></span><span style="color: #000000"><br />
<img id="Codehighlighter1_638_1839_Open_Image" onclick="this.style.display='none'; Codehighlighter1_638_1839_Open_Text.style.display='none'; Codehighlighter1_638_1839_Closed_Image.style.display='inline'; Codehighlighter1_638_1839_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_638_1839_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_638_1839_Closed_Text.style.display='none'; Codehighlighter1_638_1839_Open_Image.style.display='inline'; Codehighlighter1_638_1839_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">public</span><span style="color: #000000">&nbsp;List</span><span style="color: #000000">&lt;</span><span style="color: #000000">Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;&gt;</span><span style="color: #000000">&nbsp;cluster(</span><span style="color: #0000ff">final</span><span style="color: #000000">&nbsp;Collection</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;points,</span><span style="color: #0000ff">final</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">double</span><span style="color: #000000">&nbsp;Phi,</span><span style="color: #0000ff">final</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">int</span><span style="color: #000000">&nbsp;q)</span><span id="Codehighlighter1_638_1839_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img alt="" 
src="http://www.blogjava.net/Images/dot.gif" /></span><span id="Codehighlighter1_638_1839_Open_Text"><span style="color: #000000">{<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">int</span><span style="color: #000000">&nbsp;m&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">0</span><span style="color: #000000">;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">int</span><span style="color: #000000">&nbsp;n&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;points.size();<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">double</span><span style="color: #000000">&nbsp;disOfXandCj&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">0</span><span style="color: #000000">;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">double</span><span style="color: #000000">&nbsp;disOfXandCk;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;List</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;ptList&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;ArrayList</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">(points);<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;C&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">(ptList.get(m));<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;C.addPoint(ptList.get(m));<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;Ck&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;C;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;List</span><span style="color: #000000">&lt;</span><span style="color: #000000">Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;cList&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;ArrayList</span><span style="color: #000000">&lt;</span><span style="color: #000000">Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">&gt;</span><span style="color: #000000">();<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cList.add(C);<br />
<img id="Codehighlighter1_965_1820_Open_Image" onclick="this.style.display='none'; Codehighlighter1_965_1820_Open_Text.style.display='none'; Codehighlighter1_965_1820_Closed_Image.style.display='inline'; Codehighlighter1_965_1820_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_965_1820_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_965_1820_Closed_Text.style.display='none'; Codehighlighter1_965_1820_Open_Image.style.display='inline'; Codehighlighter1_965_1820_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">for</span><span style="color: #000000">(</span><span style="color: #0000ff">int</span><span style="color: #000000">&nbsp;i</span><span style="color: #000000">=</span><span style="color: #000000">1</span><span style="color: #000000">;i</span><span style="color: #000000">&lt;</span><span style="color: #000000">n;i</span><span style="color: #000000">++</span><span style="color: #000000">)</span><span id="Codehighlighter1_965_1820_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img alt="" src="http://www.blogjava.net/Images/dot.gif" /></span><span id="Codehighlighter1_965_1820_Open_Text"><span style="color: #000000">{<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;disOfXandCk&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;Double.MAX_VALUE;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Iterator</span><span style="color: #000000">&lt;</span><span style="color: #000000">Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;cListIt&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;cList.iterator();&nbsp;<br />
<img id="Codehighlighter1_1083_1272_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1083_1272_Open_Text.style.display='none'; Codehighlighter1_1083_1272_Closed_Image.style.display='inline'; Codehighlighter1_1083_1272_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_1083_1272_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1083_1272_Closed_Text.style.display='none'; Codehighlighter1_1083_1272_Open_Image.style.display='inline'; Codehighlighter1_1083_1272_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">while</span><span style="color: #000000">(cListIt.hasNext())</span><span id="Codehighlighter1_1083_1272_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img alt="" src="http://www.blogjava.net/Images/dot.gif" /></span><span id="Codehighlighter1_1083_1272_Open_Text"><span style="color: #000000">{<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;Cj&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;cListIt.next();<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;disOfXandCj&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;getDisOfPointAndCluster(ptList.get(i),Cj);<br />
<img id="Codehighlighter1_1215_1267_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1215_1267_Open_Text.style.display='none'; Codehighlighter1_1215_1267_Closed_Image.style.display='inline'; Codehighlighter1_1215_1267_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_1215_1267_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1215_1267_Closed_Text.style.display='none'; Codehighlighter1_1215_1267_Open_Image.style.display='inline'; Codehighlighter1_1215_1267_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">(disOfXandCk&nbsp;</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;disOfXandCj)</span><span id="Codehighlighter1_1215_1267_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img alt="" src="http://www.blogjava.net/Images/dot.gif" /></span><span id="Codehighlighter1_1215_1267_Open_Text"><span style="color: #000000">{<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;disOfXandCk&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;disOfXandCj;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Ck&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;Cj;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img id="Codehighlighter1_1307_1441_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1307_1441_Open_Text.style.display='none'; Codehighlighter1_1307_1441_Closed_Image.style.display='inline'; Codehighlighter1_1307_1441_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_1307_1441_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1307_1441_Closed_Text.style.display='none'; Codehighlighter1_1307_1441_Open_Image.style.display='inline'; Codehighlighter1_1307_1441_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">(disOfXandCk&nbsp;</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;Phi&nbsp;</span><span style="color: #000000">&amp;&amp;</span><span style="color: #000000">&nbsp;m&nbsp;</span><span style="color: #000000">&lt;</span><span style="color: #000000">&nbsp;q&nbsp;-&nbsp;1)</span><span id="Codehighlighter1_1307_1441_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img alt="" src="http://www.blogjava.net/Images/dot.gif" /></span><span id="Codehighlighter1_1307_1441_Open_Text"><span style="color: #000000">{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">too far from every cluster and fewer than q clusters so far (m is zero-based): create a new cluster</span><span style="color: #008000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;m</span><span style="color: #000000">++</span><span style="color: #000000">;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;cm&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">(ptList.get(i));<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cm.addPoint(ptList.get(i));<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cList.add(cm);<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img id="Codehighlighter1_1450_1816_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1450_1816_Open_Text.style.display='none'; Codehighlighter1_1450_1816_Closed_Image.style.display='inline'; Codehighlighter1_1450_1816_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_1450_1816_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1450_1816_Closed_Text.style.display='none'; Codehighlighter1_1450_1816_Open_Image.style.display='inline'; Codehighlighter1_1450_1816_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">else</span><span id="Codehighlighter1_1450_1816_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img alt="" src="http://www.blogjava.net/Images/dot.gif" /></span><span id="Codehighlighter1_1450_1816_Open_Text"><span style="color: #000000">{&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">//</span><span style="color: #008000">otherwise add the point to its nearest cluster Ck and update that cluster's center</span><span style="color: #008000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" /></span><span style="color: #000000">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">if</span><span style="color: #000000">(cList.contains(Ck))<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cList.remove(Ck);<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Ck.addPoint(ptList.get(i));<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">final</span><span style="color: #000000">&nbsp;T&nbsp;newCenter&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;Ck.getCenter().centroidOf(Ck.getPoints());<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;tempCluster&nbsp;</span><span style="color: #000000">=</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">new</span><span style="color: #000000">&nbsp;Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">(newCenter);<br />
<img id="Codehighlighter1_1727_1783_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1727_1783_Open_Text.style.display='none'; Codehighlighter1_1727_1783_Closed_Image.style.display='inline'; Codehighlighter1_1727_1783_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_1727_1783_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1727_1783_Closed_Text.style.display='none'; Codehighlighter1_1727_1783_Open_Image.style.display='inline'; Codehighlighter1_1727_1783_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">for</span><span style="color: #000000">(</span><span style="color: #0000ff">int</span><span style="color: #000000">&nbsp;j</span><span style="color: #000000">=</span><span style="color: #000000">0</span><span style="color: #000000">;j</span><span style="color: #000000">&lt;</span><span style="color: #000000">Ck.getPoints().size();j</span><span style="color: #000000">++</span><span style="color: #000000">)</span><span id="Codehighlighter1_1727_1783_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img alt="" src="http://www.blogjava.net/Images/dot.gif" /></span><span id="Codehighlighter1_1727_1783_Open_Text"><span style="color: #000000">{<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;tempCluster.addPoint(Ck.getPoints().get(j));<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;cList.add(tempCluster);<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">return</span><span style="color: #000000">&nbsp;cList;<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" /><br />
<img id="Codehighlighter1_1843_1898_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1843_1898_Open_Text.style.display='none'; Codehighlighter1_1843_1898_Closed_Image.style.display='inline'; Codehighlighter1_1843_1898_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_1843_1898_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1843_1898_Closed_Text.style.display='none'; Codehighlighter1_1843_1898_Open_Image.style.display='inline'; Codehighlighter1_1843_1898_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span id="Codehighlighter1_1843_1898_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff">/**&nbsp;*/</span><span id="Codehighlighter1_1843_1898_Open_Text"><span style="color: #008000">/**</span><span style="color: #008000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;Different proximity measures give different variants of the algorithm.<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;*&nbsp;Here dis(x,C) defaults to the distance from the point to the cluster center.<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #008000">*/</span></span><span style="color: #000000"><br />
<img id="Codehighlighter1_1960_2004_Open_Image" onclick="this.style.display='none'; Codehighlighter1_1960_2004_Open_Text.style.display='none'; Codehighlighter1_1960_2004_Closed_Image.style.display='inline'; Codehighlighter1_1960_2004_Closed_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockStart.gif" align="top" /><img id="Codehighlighter1_1960_2004_Closed_Image" style="display: none" onclick="this.style.display='none'; Codehighlighter1_1960_2004_Closed_Text.style.display='none'; Codehighlighter1_1960_2004_Open_Image.style.display='inline'; Codehighlighter1_1960_2004_Open_Text.style.display='inline';" alt="" src="http://www.blogjava.net/images/OutliningIndicators/ContractedSubBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">private</span><span style="color: #000000">&nbsp;</span><span style="color: #0000ff">double</span><span style="color: #000000">&nbsp;getDisOfPointAndCluster(T&nbsp;t,&nbsp;Cluster</span><span style="color: #000000">&lt;</span><span style="color: #000000">T</span><span style="color: #000000">&gt;</span><span style="color: #000000">&nbsp;cj)&nbsp;</span><span id="Codehighlighter1_1960_2004_Closed_Text" style="border-right: #808080 1px solid; border-top: #808080 1px solid; display: none; border-left: #808080 1px solid; border-bottom: #808080 1px solid; background-color: #ffffff"><img alt="" src="http://www.blogjava.net/Images/dot.gif" /></span><span id="Codehighlighter1_1960_2004_Open_Text"><span style="color: #000000">{<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span><span style="color: #0000ff">return</span><span style="color: #000000">&nbsp;t.distanceFrom(cj.getCenter());<br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedSubBlockEnd.gif" align="top" />&nbsp;&nbsp;&nbsp;&nbsp;}</span></span><span style="color: #000000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/InBlock.gif" align="top" /><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/ExpandedBlockEnd.gif" align="top" />}</span></span><span style="color: #000000"><br />
<img alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" align="top" /></span></div>
</h1>
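<p>For readers who do not want the Commons Math 2.x dependency, the same single-pass loop can be sketched as one self-contained class. This is a minimal sketch, not the author's code. It makes several assumptions: points are plain <code>double[]</code> vectors, dis(x, C) is the Euclidean distance to the cluster mean (matching the default in <code>getDisOfPointAndCluster</code>), and <code>q</code> is the maximum number of clusters. The names <code>BsasSketch</code> and <code>cluster</code> are hypothetical. The original code rebuilds the cluster and calls <code>centroidOf</code> after every insertion. Here the mean is updated incrementally instead, as old + (x - old) / size, which gives the same centroid.</p>
<pre>
```java
import java.util.Arrays;

/** Hypothetical, dependency-free sketch of BSAS with mean representatives. */
public class BsasSketch {

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int k = 0; k < a.length; k++) {
            double diff = a[k] - b[k];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns a cluster label for every point after a single pass over the data.
     * theta plays the role of Phi above; q is the maximum number of clusters (q >= 1).
     */
    public static int[] cluster(double[][] points, double theta, int q) {
        int n = points.length;
        int[] labels = new int[n];
        if (n == 0) return labels;
        int dim = points[0].length;
        double[][] centers = new double[q][];  // running mean of each cluster
        int[] sizes = new int[q];               // number of points in each cluster

        centers[0] = points[0].clone();         // the first point opens cluster 0
        sizes[0] = 1;
        int m = 1;                              // number of clusters created so far

        for (int i = 1; i < n; i++) {
            // find the nearest existing cluster Ck
            int k = 0;
            double best = Double.MAX_VALUE;
            for (int j = 0; j < m; j++) {
                double d = distance(points[i], centers[j]);
                if (d < best) {
                    best = d;
                    k = j;
                }
            }
            if (best > theta && m < q) {       // too far from every cluster: open a new one
                centers[m] = points[i].clone();
                sizes[m] = 1;
                labels[i] = m;
                m++;
            } else {                            // join Ck and update its mean incrementally
                sizes[k]++;
                for (int c = 0; c < dim; c++) {
                    centers[k][c] += (points[i][c] - centers[k][c]) / sizes[k];
                }
                labels[i] = k;
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] pts = { {0, 0}, {0.5, 0}, {10, 10}, {10, 10.5}, {0, 0.5} };
        System.out.println(Arrays.toString(cluster(pts, 2.0, 3)));  // prints [0, 0, 1, 1, 0]
    }
}
```
</pre>
<p>As with any sequential scheme, the result depends on the order in which the points are presented and on the choice of theta.</p>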
<h1>3. <span style="font-family: 宋体">Program Framework</span></h1>
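<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">My clustering code is mainly an extension of the </span>Apache Commons Math<span style="font-family: 宋体"> open-source framework, whose structure is shown below. I added a </span>Clusterer<span style="font-family: 宋体"> class as an abstract template class and reworked the framework with the Template Method pattern. This gives algorithms added later, such as </span>BSAS<span style="font-family: 宋体">, a common template.<br />
</span></p>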
<h1>
<div align="center"><img alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.png" border="0" /></div>
</h1>
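<p>The template-method idea can be illustrated roughly as follows. This is a hypothetical sketch, because the real <code>Clusterer</code> class is not shown in this post. The <code>cluster</code>/<code>doCluster</code> split, the input check, and the subclass name are all assumptions, and plain <code>double[][]</code> points stand in for Commons Math's generic <code>Clusterable</code> types. The base class fixes the overall flow in a <code>final</code> method, and each concrete algorithm supplies only the algorithm-specific step.</p>
<pre>
```java
/** Hypothetical abstract template: the fixed skeleton lives in the base class. */
abstract class Clusterer {

    /** Template method: subclasses cannot change the overall flow. */
    public final int[] cluster(double[][] points) {
        if (points == null || points.length == 0) {
            throw new IllegalArgumentException("no points to cluster");
        }
        return doCluster(points);  // the algorithm-specific step
    }

    /** Hook implemented by each concrete algorithm (e.g. BSAS, k-means). */
    protected abstract int[] doCluster(double[][] points);
}

/** A trivial concrete algorithm, used only to show how a subclass plugs in. */
class SingleClusterClusterer extends Clusterer {
    @Override
    protected int[] doCluster(double[][] points) {
        return new int[points.length];  // every point gets label 0
    }
}

public class TemplateDemo {
    public static void main(String[] args) {
        Clusterer c = new SingleClusterClusterer();
        int[] labels = c.cluster(new double[][] { {1, 2}, {3, 4} });
        System.out.println(labels.length);  // prints 2
    }
}
```
</pre>
<p>A BSAS subclass would put its sequential loop in <code>doCluster</code> and inherit the shared input validation unchanged.</p>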
<h1>4. <span style="font-family: 宋体">Summary</span></h1>
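<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">Sequential algorithms are simple and easy to implement, which makes them an ideal starting point for learning clustering. Space does not allow me to post the complete code here, but anyone who needs it is welcome to ask me for it. The </span>Apache Commons Math<span style="font-family: 宋体"> framework can be downloaded from the </span>Apache<span style="font-family: 宋体"> website. Many points are only touched on briefly here, and interested readers can go further by studying the extensions of </span>BSAS<span style="font-family: 宋体">.</span></p>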
<h1>5. <span style="font-family: 宋体">References and Further Reading</span></h1>
<p>[1] Sergios Theodoridis, Konstantinos Koutroumbas, <em>Pattern Recognition</em>, 3rd edition.</p>
<p>[2] Sergios Theodoridis, Konstantinos Koutroumbas, <span style="font-family: 宋体">模式识别</span> (<em>Pattern Recognition</em>, Chinese translation of the 3rd edition), translated by <span style="font-family: 宋体">李晶皎</span>, <span style="font-family: 宋体">王爱侠</span>, <span style="font-family: 宋体">张广源</span> et al.</p>
<img src ="http://www.blogjava.net/changedi/aggbug/314698.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/changedi/" target="_blank">changedi</a> 2010-03-06 15:02 <a href="http://www.blogjava.net/changedi/archive/2010/03/06/314698.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Clustering Algorithm Study Notes (II): Proximity Measures</title><link>http://www.blogjava.net/changedi/archive/2010/01/17/309845.html</link><dc:creator>changedi</dc:creator><author>changedi</author><pubDate>Sun, 17 Jan 2010 05:10:00 GMT</pubDate><guid>http://www.blogjava.net/changedi/archive/2010/01/17/309845.html</guid><wfw:comment>http://www.blogjava.net/changedi/comments/309845.html</wfw:comment><comments>http://www.blogjava.net/changedi/archive/2010/01/17/309845.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/changedi/comments/commentRss/309845.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/changedi/services/trackbacks/309845.html</trackback:ping><description><![CDATA[&nbsp;
<h1 style="margin-left: 18pt; text-indent: -18pt">1.<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp; </span><span style="font-family: 宋体">Definitions of Measures</span></h1>
<p style="text-indent: 21pt"><span style="font-family: 宋体">&#8220;In mathematics, a measure is a function that assigns a number to certain subsets of a given set; this number can be interpreted as a size, a volume, a probability, and so on. Classical integration is carried out over intervals; the wish to extend integration to arbitrary sets led to the concept of a measure, which plays an important role in mathematical analysis and probability theory.&#8221;</span> (Wikipedia)</p>
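<p style="text-indent: 21pt"><span style="font-family: 宋体">Before clustering, we must first define how similar two vectors are, that is, a proximity measure. The measures used during clustering have a wider scope. We first define measures between vectors, then measures between a vector and a set, and finally measures between sets.</span></p>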
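<p>To make &#8220;measures between a vector and a set&#8221; concrete, here is a minimal sketch that builds point-to-set measures on top of a vector-to-vector distance. The Euclidean distance is used as the base measure. Three common point-to-set choices are shown (the maximum, minimum and average distance to the set's members), together with one set-to-set measure (the smallest distance between members of the two sets). The class and method names are hypothetical. Other choices, such as the distance to a representative like the set's mean, are equally valid.</p>
<pre>
```java
/** Hypothetical helpers: proximity between a vector and a set, and between two sets. */
public class ProximitySketch {

    /** Euclidean distance between two vectors (a metric DM with d0 = 0). */
    static double dist(double[] x, double[] y) {
        double s = 0;
        for (int k = 0; k < x.length; k++) {
            s += (x[k] - y[k]) * (x[k] - y[k]);
        }
        return Math.sqrt(s);
    }

    /** Largest distance from x to any member of set c. */
    static double maxToSet(double[] x, double[][] c) {
        double best = Double.NEGATIVE_INFINITY;
        for (double[] y : c) best = Math.max(best, dist(x, y));
        return best;
    }

    /** Smallest distance from x to any member of set c. */
    static double minToSet(double[] x, double[][] c) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] y : c) best = Math.min(best, dist(x, y));
        return best;
    }

    /** Mean distance from x to the members of set c. */
    static double avgToSet(double[] x, double[][] c) {
        double sum = 0;
        for (double[] y : c) sum += dist(x, y);
        return sum / c.length;
    }

    /** Set-to-set: smallest distance between any member of a and any member of b. */
    static double minBetweenSets(double[][] a, double[][] b) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] x : a) best = Math.min(best, minToSet(x, b));
        return best;
    }

    public static void main(String[] args) {
        double[] x = {0, 0};
        double[][] c = { {3, 4}, {6, 8} };  // distances 5 and 10 from x
        // prints 10.0 5.0 7.5
        System.out.println(maxToSet(x, c) + " " + minToSet(x, c) + " " + avgToSet(x, c));
    }
}
```
</pre>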
<p style="text-indent: 21pt"><span style="font-family: 宋体">A <strong>dissimilarity measure</strong></span> (DM) <em>d</em> <span style="font-family: 宋体">on </span>X<span style="font-family: 宋体"> is a function <img height="29" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.JPG" width="129" border="0" /></span>&nbsp;<span style="font-family: 宋体">, where </span>R<span style="font-family: 宋体"> is the set of real numbers, such that </span><em>d</em><span style="font-family: 宋体"> has the following properties:</span></p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="29" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.1.JPG" width="366" border="0" />&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>1.1<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="28" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.2.JPG" width="165" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>1.2<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="28" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.3.JPG" width="220" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>1.3<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt"><span style="font-family: 宋体">If, in addition, </span><em>d</em><span style="font-family: 宋体"> satisfies</span></p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="32" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.4.JPG" width="240" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>1.4<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="28" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.5.JPG" width="302" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>1.5<span style="font-family: 宋体">）</span></p>
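<p style="text-indent: 21pt"><span style="font-family: 宋体">then </span><em>d</em><span style="font-family: 宋体"> is called a metric DM. Formula (</span>1.5<span style="font-family: 宋体">) is known as the triangle inequality. These properties are quite intuitive: a dissimilarity measure behaves just like the everyday notion of distance, with the two vectors standing for two objects. Formula </span>1.2<span style="font-family: 宋体"> fixes the distance of a (vector) object to itself at </span><em>d<sub>0</sub></em><span style="font-family: 宋体">. Formula </span>1.1<span style="font-family: 宋体"> says that the distance between any two objects is finite and never smaller than an object's distance to itself (your distance to someone else is at least your distance to yourself, which hardly needs saying ^_^). Formula </span>1.3<span style="font-family: 宋体"> states that distance is symmetric. Formula </span>1.4<span style="font-family: 宋体"> strengthens 1.2 by requiring that the minimum value </span><em>d<sub>0</sub></em><span style="font-family: 宋体"> is reached only when the two objects are identical. Formula </span>1.5<span style="font-family: 宋体"> is the familiar triangle inequality from middle-school geometry.</span></p>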
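<p>These axioms can be spot-checked in code. The sketch below uses hypothetical names and tests all five conditions for the Euclidean distance, for which d<sub>0</sub> = 0, on a handful of sample vectors. It assumes that (1.4) is the usual condition &#8220;d(x, y) = d<sub>0</sub> only when x = y&#8221;. The tolerance <code>eps</code> absorbs floating-point rounding. A finite check like this can expose a counterexample, but it cannot prove that a measure is metric.</p>
<pre>
```java
import java.util.Arrays;

/** Hypothetical check of the metric-DM axioms (1.1)-(1.5) on a finite sample of vectors. */
public class MetricCheck {

    /** Euclidean distance; it is a metric DM with d0 = 0. */
    static double euclid(double[] x, double[] y) {
        double s = 0;
        for (int k = 0; k < x.length; k++) {
            s += (x[k] - y[k]) * (x[k] - y[k]);
        }
        return Math.sqrt(s);
    }

    /** Returns false as soon as one sample violates an axiom; eps absorbs rounding error. */
    static boolean looksMetric(double[][] xs, double d0, double eps) {
        for (double[] x : xs) {
            if (Math.abs(euclid(x, x) - d0) > eps) return false;              // (1.2) d(x,x) = d0
            for (double[] y : xs) {
                double dxy = euclid(x, y);
                if (dxy < d0 - eps || Double.isInfinite(dxy)) return false;   // (1.1) d0 <= d(x,y), finite
                if (Math.abs(dxy - euclid(y, x)) > eps) return false;         // (1.3) symmetry
                if (dxy <= d0 + eps && !Arrays.equals(x, y)) return false;    // (1.4) d(x,y) = d0 only if x = y
                for (double[] z : xs) {
                    if (euclid(x, z) > dxy + euclid(y, z) + eps) return false; // (1.5) triangle inequality
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] xs = { {0, 0}, {1, 0}, {0, 1}, {3, 4}, {-2, 5} };
        System.out.println(looksMetric(xs, 0.0, 1e-9));  // prints true
    }
}
```
</pre>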
<p style="text-indent: 21pt">Similarly, a <strong>similarity measure</strong> (SM) is defined as <img style="width: 128px; height: 29px" height="29" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.0.JPG" width="128" border="0" /> satisfying:</p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="28" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.6.JPG" width="358" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>1.6<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="28" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.7.JPG" width="164" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>1.7<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="28" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.8.JPG" width="218" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>1.8<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt">If, in addition, the following hold:</p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="32" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.9.JPG" width="213" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>1.9<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="28" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/1.10.JPG" width="406" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>1.10<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt">then <em>s</em> is called a metric SM. Everything parallels the DM case, and the meaning of each formula should be plain to see~~~</p>
<p style="text-indent: 21pt">Both the definitions and the names show how the two differ. Either can express how alike two objects are, but from opposite directions: for a DM, a larger value means less similar and a smaller value means more similar, whereas for an SM it is exactly the reverse. This opposition suggests that one can be defined from the other. For example, if <em>d</em> is a DM that only takes positive values, then <em>s</em> = 1/<em>d</em> is an SM.</p>
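<p style="text-indent: 21pt">A minimal sketch of the s = 1/d construction, assuming d is strictly positive on the pairs it is applied to. The class name DmToSm is hypothetical, and the Manhattan distance serves only as an example DM. The point is the reversal of order: the closer pair receives the larger similarity.</p>

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch: derives the SM s = 1/d from a DM d.
// Assumes d(x,y) > 0 for every pair it is applied to; a zero distance
// (e.g. d(x,x) when d0 = 0) is rejected instead of returning infinity.
class DmToSm {
    static ToDoubleBiFunction<double[], double[]> reciprocal(ToDoubleBiFunction<double[], double[]> d) {
        return (x, y) -> {
            double dxy = d.applyAsDouble(x, y);
            if (dxy <= 0) throw new IllegalArgumentException("s = 1/d requires d(x, y) > 0");
            return 1.0 / dxy;
        };
    }

    // Example DM: unweighted Manhattan distance
    static double manhattan(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += Math.abs(x[i] - y[i]);
        return s;
    }
}
```

<p style="text-indent: 21pt">With a = (0, 0), b = (1, 0) and c = (3, 0), the distances are 1 and 3, and the similarities become 1 and 1/3, so the ordering flips as expected.</p>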
<h1>2. Proximity Measures between Vectors</h1>
<p style="text-indent: 21pt">The definitions above are only a high-level outline; how, then, are proximity measures between concrete vectors computed? They are described in detail below.</p>
<p style="text-indent: 21pt">First, among dissimilarity measures for real-valued vectors, the one most commonly used in practice is the <strong>weighted <em>l<sub>p</sub></em> metric</strong>:</p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="48" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/2.1.JPG" width="242" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>2.1<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt">where <em>x<sub>i</sub></em> and <em>y<sub>i</sub></em> are the <em>i</em>-th components of the vectors <em><u>x</u></em> and <em><u>y</u></em>, <em>w<sub>i</sub></em> is the <em>i</em>-th weight coefficient, and <em>l</em> is the dimension of the vectors (the same notation is used in the formulas below). The cases of greatest interest are p=1, where the measure is the weighted Manhattan norm; p=2, where it is the weighted Euclidean norm; and p=&#8734;, where it becomes max<sub>1&#8804;<em>i</em>&#8804;<em>l</em></sub> <em>w<sub>i</sub></em>|<em>x<sub>i</sub>-y<sub>i</sub></em>|. From these DMs we can define SMs as <em>b<sub>max</sub> - d<sub>p</sub>(<u>x</u>,<u>y</u>)</em>, where <em>b<sub>max</sub></em> is the maximum value that <em>d<sub>p</sub></em> can take.</p>
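<p style="text-indent: 21pt">The weighted <em>l<sub>p</sub></em> measure (2.1), its p=&#8734; case and the derived SM <em>b<sub>max</sub> - d<sub>p</sub></em> can be sketched in Java as follows. This is an illustration, not the author's code: the class name WeightedLp is hypothetical, p=&#8734; is passed as Double.POSITIVE_INFINITY, and <em>b<sub>max</sub></em> is assumed to be supplied by the caller as an upper bound of <em>d<sub>p</sub></em> over the data.</p>

```java
// Hypothetical sketch of the weighted l_p dissimilarity (2.1):
//   d_p(x, y) = (sum_i w_i |x_i - y_i|^p)^(1/p),
// with p = Double.POSITIVE_INFINITY meaning max_i w_i |x_i - y_i|.
class WeightedLp {
    static double distance(double[] x, double[] y, double[] w, double p) {
        if (x.length != y.length || x.length != w.length)
            throw new IllegalArgumentException("x, y and w must all have dimension l");
        if (p < 1) throw new IllegalArgumentException("p must be >= 1");
        if (Double.isInfinite(p)) {
            double max = 0;
            for (int i = 0; i < x.length; i++) max = Math.max(max, w[i] * Math.abs(x[i] - y[i]));
            return max;
        }
        double sum = 0;
        for (int i = 0; i < x.length; i++) sum += w[i] * Math.pow(Math.abs(x[i] - y[i]), p);
        return Math.pow(sum, 1.0 / p);
    }

    // SM derived from the DM: b_max - d_p(x, y), with b_max supplied by the caller
    static double similarity(double[] x, double[] y, double[] w, double p, double bMax) {
        return bMax - distance(x, y, w, p);
    }
}
```

<p style="text-indent: 21pt">For x = (0, 0), y = (3, 4) and unit weights, this gives 7 for p=1, 5 for p=2 and 4 for p=&#8734;; doubling the first weight raises the p=&#8734; value to 6.</p>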
<p style="text-indent: 21pt">There are other ways of defining dissimilarity as well, for example</p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="54" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/2.2.JPG" width="288" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>2.2<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt; text-align: center" align="center"><img height="62" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/2.3.JPG" width="213" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>2.3<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt">I won't bother listing the others here; please consult the references for details.</p>
<p style="text-indent: 21pt">For similarity measures between real-valued vectors, the ones commonly used in practice include:</p>
<p style="text-indent: 21pt; text-align: center" align="center"><span style="font-family: 宋体">Inner product: <img height="48" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/2.4.JPG" width="208" border="0" /></span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>2.4<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt; text-align: center" align="center"><em>Tanimoto</em><span style="font-family: 宋体"> measure: <img height="57" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/2.5.JPG" width="261" border="0" /></span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>2.5<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt; text-align: center" align="center"><span style="font-family: 宋体">Another measure: <img height="50" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/2.6.JPG" width="229" border="0" /></span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>2.6<span style="font-family: 宋体">）</span></p>
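<p style="text-indent: 21pt">A minimal sketch of the inner product (2.4) and the Tanimoto measure (2.5) for real vectors, assuming (2.5) has the usual form from [1], s<sub>T</sub>(x, y) = x<sup>T</sup>y / (||x||<sup>2</sup> + ||y||<sup>2</sup> - x<sup>T</sup>y). The class name RealSimilarity is hypothetical.</p>

```java
// Hypothetical sketch of two similarity measures for real vectors:
//   inner product (2.4): s_inner(x, y) = x^T y
//   Tanimoto      (2.5): s_T(x, y) = x^T y / (||x||^2 + ||y||^2 - x^T y)
class RealSimilarity {
    static double inner(double[] x, double[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("dimension mismatch");
        double s = 0;
        for (int i = 0; i < x.length; i++) s += x[i] * y[i];
        return s;
    }

    static double tanimoto(double[] x, double[] y) {
        double xy = inner(x, y);
        // Since x^T y <= (||x||^2 + ||y||^2) / 2, the denominator is at least
        // (||x||^2 + ||y||^2) / 2, so it vanishes only when x = y = 0.
        return xy / (inner(x, x) + inner(y, y) - xy);
    }
}
```

<p style="text-indent: 21pt">For x = (1, 2) and y = (3, 4), the inner product is 11 and the Tanimoto measure is 11/19; the Tanimoto similarity of any nonzero vector to itself is 1.</p>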
<p style="text-indent: 21pt" align="center">------------------------------------------------take a nap------------------------------------------------------------</p>
<p style="text-indent: 21pt">For vectors with discrete values, one concept must be made clear first. I find the Chinese translation of <em>Pattern Recognition</em> renders it in a way that is hard to follow, so I will explain it in some detail here: the contingency table. Consider a vector <em><u>x</u></em> whose elements take values in the finite set <em>F=</em>{0<em>,</em>1<em>,&#8230;,k</em>-1}, where <em>k</em> is a positive integer. Let <em>A</em>(<em><u>x</u>,<u>y</u></em>)=[<em>a<sub>ij</sub></em>]<em>, i, j</em>=0<em>,</em>1<em>,&#8230;,k</em>-1 be a <em>k</em>&#215;<em>k</em> matrix whose element <em>a<sub>ij</sub></em> is the number of positions at which <em><u>x</u></em> holds the value <em>i</em> and <em><u>y</u></em> holds the value <em>j</em>. The original text reads: the number of places where <em><u>x</u></em> has the <em>i</em>-th symbol and <em><u>y</u></em> has the <em>j</em>-th symbol. For example, let <em>k</em>=3, <em><u>x</u></em>=[0,1,2,1,2,1] and <em><u>y</u></em>=[1,0,2,1,0,1]; then <em>A(<u>x</u>,<u>y</u>)</em> = [0 1 0; 1 2 0; 1 0 1], with rows listed in the order <em>i</em>=0, 1, 2. Take the first entry, <em>a<sub>00</sub></em>: its position in <em>A</em> gives <em>i</em>=0 and <em>j</em>=0. In <em><u>x</u></em> the value 0 occurs only at the first position, while in <em><u>y</u></em> the value 0 occurs at the second and fifth positions; no position holds 0 in both vectors, so <em>a<sub>00</sub></em>=0. The second entry, <em>a<sub>01</sub></em>, has <em>i</em>=0 and <em>j</em>=1: in <em><u>x</u></em> the value 0 is at the first position, and in <em><u>y</u></em> the value 1 is at the first, fourth and sixth positions; exactly one position (the first) matches, so <em>a<sub>01</sub></em>=1.</p>
<p style="text-indent: 21pt">Below is a Java implementation for computing the matrix <em>A</em>, for reference:</p>
<div style="border-right: #cccccc 1px solid; padding-right: 5px; border-top: #cccccc 1px solid; padding-left: 4px; font-size: 13px; padding-bottom: 4px; border-left: #cccccc 1px solid; width: 98%; word-break: break-all; padding-top: 4px; border-bottom: #cccccc 1px solid; background-color: #eeeeee"><pre>import java.util.List;

public class ContingencyTable {
    /**
     * Computes the contingency table A(x, y) of two discrete vectors.
     *
     * @param k the size of the finite set F = {0, 1, ..., k-1}
     * @param x the vector x, belonging to F^l
     * @param y the vector y, belonging to F^l
     * @return the contingency table A, where A[i][j] is the number of
     *         positions at which x holds i and y holds j
     * @author Jia Yu
     */
    public static int[][] calContingencyTable(int k, List&lt;Integer&gt; x, List&lt;Integer&gt; y) {
        if (x.size() != y.size())
            throw new IllegalArgumentException("The two vectors are not the same size!");
        int[][] A = new int[k][k]; // Java initializes every entry to 0
        // A single pass suffices: position p adds one count to cell (x_p, y_p),
        // giving the same table as rescanning the vectors for every pair (i, j).
        for (int p = 0; p &lt; x.size(); p++) {
            int i = x.get(p), j = y.get(p);
            if (i &lt; 0 || i &gt;= k || j &lt; 0 || j &gt;= k)
                throw new IllegalArgumentException("Element outside F = {0, 1, ..., k-1}");
            A[i][j]++;
        }
        return A;
    }
}</pre></div>
<p style="text-indent: 21pt">With the contingency table defined, we can now define dissimilarity measures between discrete vectors.</p>
<p style="text-indent: 21pt; text-align: center" align="center"><span style="font-family: 宋体">Hamming distance: <img height="58" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/2.7.JPG" width="150" border="0" /></span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>2.7<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt; text-align: center" align="center">L1<span style="font-family: 宋体"> distance: <img height="48" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/2.8.JPG" width="176" border="0" /></span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>2.8<span style="font-family: 宋体">）</span></p>
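<p style="text-indent: 21pt">A minimal sketch of both dissimilarity measures. It derives the Hamming distance from the contingency table as the sum of the off-diagonal entries <em>a<sub>ij</sub></em> (<em>i</em> &#8800; <em>j</em>), i.e. the number of positions where the two vectors differ, and computes the L1 distance as &#931;|<em>x<sub>i</sub></em> - <em>y<sub>i</sub></em>|. Both formulas are my reading of (2.7) and (2.8) following [1], and the class name DiscreteDistance is hypothetical.</p>

```java
// Hypothetical sketch of dissimilarity measures for vectors over F = {0, ..., k-1}.
class DiscreteDistance {
    // Contingency table A(x, y): a[i][j] counts positions with x = i and y = j.
    // Elements outside F cause an ArrayIndexOutOfBoundsException.
    static int[][] contingency(int k, int[] x, int[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("The two vectors are not the same size!");
        int[][] a = new int[k][k];
        for (int p = 0; p < x.length; p++) a[x[p]][y[p]]++;  // one count per position
        return a;
    }

    // Hamming distance: sum of off-diagonal entries = number of differing positions
    static int hamming(int k, int[] x, int[] y) {
        int[][] a = contingency(k, x, y);
        int d = 0;
        for (int i = 0; i < k; i++)
            for (int j = 0; j < k; j++)
                if (i != j) d += a[i][j];
        return d;
    }

    // L1 distance: sum over positions of |x_p - y_p|
    static int l1(int[] x, int[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("The two vectors are not the same size!");
        int d = 0;
        for (int p = 0; p < x.length; p++) d += Math.abs(x[p] - y[p]);
        return d;
    }
}
```

<p style="text-indent: 21pt">On the earlier example (<em>k</em>=3, <em><u>x</u></em>=[0,1,2,1,2,1], <em><u>y</u></em>=[1,0,2,1,0,1]) it reproduces <em>A</em> = [0 1 0; 1 2 0; 1 0 1], a Hamming distance of 3 and an L1 distance of 4.</p>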
<p style="text-indent: 21pt">Likewise, for similarity we have the</p>
<p style="text-indent: 21pt; text-align: center" align="center">Tanimoto<span style="font-family: 宋体"> measure: <img height="93" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/2.9.JPG" width="225" border="0" /></span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>2.9<span style="font-family: 宋体">）</span></p>
<p style="text-indent: 21pt">where <em>n<sub>x</sub></em> (<em>n<sub>y</sub></em>) denotes the number of nonzero elements of <em><u>x</u></em> (<em><u>y</u></em>).</p>
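<p style="text-indent: 21pt">A sketch of the discrete Tanimoto measure, assuming (2.9) takes the form given in [1]: s<sub>T</sub>(x, y) = &#931;<sub>i=1..k-1</sub> a<sub>ii</sub> / (n<sub>x</sub> + n<sub>y</sub> - &#931;<sub>i=1..k-1</sub> &#931;<sub>j=1..k-1</sub> a<sub>ij</sub>). The numerator counts positions where both vectors hold the same nonzero value, and the denominator counts positions where at least one of them is nonzero. The class name DiscreteTanimoto is hypothetical.</p>

```java
// Hypothetical sketch of the Tanimoto measure (2.9) for vectors over F = {0, ..., k-1}.
class DiscreteTanimoto {
    static double tanimoto(int k, int[] x, int[] y) {
        if (x.length != y.length) throw new IllegalArgumentException("The two vectors are not the same size!");
        int[][] a = new int[k][k];
        for (int p = 0; p < x.length; p++) a[x[p]][y[p]]++;  // contingency table A(x, y)

        int sameNonzero = 0;  // sum of a_ii for i = 1..k-1
        int bothNonzero = 0;  // sum of a_ij for i, j = 1..k-1
        for (int i = 1; i < k; i++) {
            sameNonzero += a[i][i];
            for (int j = 1; j < k; j++) bothNonzero += a[i][j];
        }
        int nx = 0, ny = 0;   // numbers of nonzero elements of x and y
        for (int p = 0; p < x.length; p++) {
            if (x[p] != 0) nx++;
            if (y[p] != 0) ny++;
        }
        // Denominator = positions where at least one vector is nonzero; 0/0 yields NaN
        return (double) sameNonzero / (nx + ny - bothNonzero);
    }
}
```

<p style="text-indent: 21pt">For the earlier example vectors, the numerator is <em>a<sub>11</sub></em> + <em>a<sub>22</sub></em> = 3, <em>n<sub>x</sub></em> = 5, <em>n<sub>y</sub></em> = 4 and the double sum is 3, so s<sub>T</sub> = 3/6 = 0.5. If both vectors are all zeros, the ratio is 0/0 and the method returns NaN.</p>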
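<p style="text-indent: 21pt">Textbooks tend to teach fundamentals rather than applications; it is in real applications that these fundamentals get refined and adapted. We may rarely apply these measures in their plain form when clustering, but even complex combinations grow out of the basics, so the basic concepts of proximity measures are worth mastering thoroughly. When I worked on image segmentation a while ago, the choice of measure was one of the prerequisites for running the clustering algorithm, and I experimented with several: the L1 and L2 norms, the Tanimoto measure, and others. Different image features naturally call for different ways of computing distance; in short, my practical experience is that once the fundamentals are solid, applying them goes very smoothly~~~ (at the very least, complicated formulas no longer scare you).</p>
<h1>3. Handling Special Cases</h1>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; In practice, the features of an instance vector are often a complex mixture of types; how should proximity be computed in that case? A lazy approach is to treat every value as real-valued and handle the mixed vector as a real vector, but the results of doing so are often unsatisfactory. Another option is to convert the real-valued features into discrete ones, which is the well-known technique of discretization; discretizing features is an important aspect of feature (or attribute) filtering. What I recommend most, though, is to design a proximity measure tailored to your own application. Such a measure may generalize poorly, but if the work is problem-driven or goal-driven, it is still a perfectly good solution. Introducing fuzzy measures is yet another option; I won't go into it here, and interested readers can consult articles on fuzziness and uncertainty. A further issue is instance vectors with some features missing. If the distribution of the data is known, a reasonable assumption based on it can stand in for the missing values; if we want to keep things simple, the usual practice is to discard the instance vector altogether, or, somewhat better, to substitute the mean of that feature over all instances.</p>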
<h1>4. Proximity Measures between a Point and a Set</h1>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; As clustering proceeds and the hierarchy deepens, it is no longer enough to judge how similar two points are; the proximity between a point and a set must be computed as well. How, then, should we define the proximity between a vector <em><u>x</u></em> and a cluster <em>C</em>, in order to decide whether <em><u>x</u></em> should be assigned to <em>C</em>? The following three definitions are frequently used.</p>
<p style="text-align: center" align="center">Max proximity function: &nbsp;<img height="30" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/4.1.JPG" width="197" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>4.1<span style="font-family: 宋体">）</span></p>
<p style="text-align: center" align="center">Min proximity function: <img height="30" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/4.2.JPG" width="197" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>4.2<span style="font-family: 宋体">）</span></p>
<p style="text-align: center" align="center">Average proximity function: <img height="49" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/4.3.JPG" width="193" border="0" />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">（</span>4.3<span style="font-family: 宋体">）</span></p>
<p>where <em>n<sub>c</sub></em> is the cardinality (number of elements) of the set <em>C</em>.</p>
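<p style="text-indent: 21pt">Given any point-to-point proximity function, definitions (4.1)-(4.3) translate directly into code. In this minimal sketch the class name PointSetProximity is hypothetical and the Euclidean distance serves only as an example proximity. Note that when the proximity is a DM, the max function picks the most dissimilar member of <em>C</em>, whereas with an SM it picks the most similar one.</p>

```java
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of the point-to-set proximity functions (4.1)-(4.3).
class PointSetProximity {
    // (4.1): largest proximity between x and any member y of C
    static double max(double[] x, List<double[]> c, ToDoubleBiFunction<double[], double[]> prox) {
        requireNonEmpty(c);
        double best = Double.NEGATIVE_INFINITY;
        for (double[] y : c) best = Math.max(best, prox.applyAsDouble(x, y));
        return best;
    }

    // (4.2): smallest proximity between x and any member y of C
    static double min(double[] x, List<double[]> c, ToDoubleBiFunction<double[], double[]> prox) {
        requireNonEmpty(c);
        double best = Double.POSITIVE_INFINITY;
        for (double[] y : c) best = Math.min(best, prox.applyAsDouble(x, y));
        return best;
    }

    // (4.3): sum of proximities over C divided by n_c = |C|
    static double avg(double[] x, List<double[]> c, ToDoubleBiFunction<double[], double[]> prox) {
        requireNonEmpty(c);
        double sum = 0;
        for (double[] y : c) sum += prox.applyAsDouble(x, y);
        return sum / c.size();
    }

    static void requireNonEmpty(List<double[]> c) {
        if (c.isEmpty()) throw new IllegalArgumentException("cluster C must not be empty");
    }

    // Example point-to-point DM: Euclidean distance
    static double euclidean(double[] x, double[] y) {
        double s = 0;
        for (int i = 0; i < x.length; i++) s += (x[i] - y[i]) * (x[i] - y[i]);
        return Math.sqrt(s);
    }
}
```

<p style="text-indent: 21pt">For x = 0 and C = {1, 3} on a line, the three functions give 3, 1 and 2 respectively.</p>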
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">可以看到，这样的定义在概念理论层次上仍旧将点视作点，将聚类视作集合。另一种情况则是将聚类视作一个点，因为点与点之间的近邻测度已经可以计算，那么将集合视为一个点，就将这个问题归约到了点与点之间的问题了。对聚类进行表达，主要有以下几种表达：</span></p>
<p style="margin-left: 42.75pt; text-indent: -21.75pt">1)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp; </span>Point representation: the cluster is treated as a single point. That point can be the mean vector, the mean center, or the median center. These concepts and their formulas are covered in any statistics textbook, so I won't list them all here. (Mostly because pasting formulas is exhausting; I miss TeX.)</p>
<p style="margin-left: 42.75pt; text-indent: -21.75pt">2)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp; </span>Hyperplane representation: commonly used for linear clusters. Not covered here; interested readers can look it up.</p>
<p style="margin-left: 42.75pt; text-indent: -21.75pt">3)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp; </span>Hypersphere representation: commonly used for spherical clusters. Same as above.</p>
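<p style="text-indent: 21pt">The three point representatives in item 1) can be sketched as follows. The sketch assumes the definitions as I recall them from [1]:</p>
<ul>
<li>The mean vector is the component-wise average of the points and need not belong to C.</li>
<li>The mean center is the member of C with the smallest sum of distances to all points of C.</li>
<li>The median center is the member of C with the smallest median of those distances.</li>
</ul>
<p style="text-indent: 21pt">Euclidean distance and all function names are assumptions made for illustration.</p>

```python
import math
from statistics import median


def euclidean(x, y):
    """Point-to-point dissimilarity assumed here: Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mean_vector(cluster):
    """Component-wise average of the points; usually not itself a member of C."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def mean_center(cluster, d=euclidean):
    """Member of C whose summed distance to all points of C is smallest."""
    return min(cluster, key=lambda z: sum(d(z, y) for y in cluster))


def median_center(cluster, d=euclidean):
    """Member of C whose median distance to the points of C is smallest."""
    return min(cluster, key=lambda z: median(d(z, y) for y in cluster))


C = [(0, 0), (1, 0), (2, 0), (5, 0), (20, 0)]
print(mean_vector(C))    # pulled toward the outlier (20, 0)
print(mean_center(C))
print(median_center(C))
```

<p style="text-indent: 21pt">The outlier (20, 0) pulls the mean vector to (5.6, 0.0), away from the bulk of the data. The mean center (2, 0) and the median center (1, 0) stay on actual data points. This is one possible reason to prefer them when outliers are present.</p>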
<p style="text-indent: 21pt">All learning ultimately serves application. Depending on the application, we have considerable flexibility in how we define the proximity measure between a point and a set.</p>
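<h1>5. Proximity measures between sets</h1>
<p style="text-indent: 21pt">Proximity measures between two sets can be defined in much the same way as those between a point and a set. The one thing to remember is that set-to-set proximity is built on point-to-point proximity, so point-to-point measures are the foundation of all proximity measures. Of course, optimizing a clustering result is an iterative process of trial and error, and the opinions of domain experts should also be taken into account.</p>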
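<p style="text-indent: 21pt">The claim that set-to-set measures rest on point-to-point ones can be illustrated with a short sketch.</p>
<ul>
<li>The first three functions mirror the point-to-set formulas above. They take the max, min, or mean over all pairs (x, y) with x in C<sub>i</sub> and y in C<sub>j</sub>.</li>
<li>The fourth applies the point-representation idea: it represents each set by its mean vector and compares the two points.</li>
</ul>
<p style="text-indent: 21pt">Euclidean distance and the function names are assumptions. When the point-to-point measure is a distance, the min, max, and average versions correspond to what hierarchical clustering commonly calls single, complete, and average linkage.</p>

```python
import math


def euclidean(x, y):
    """Point-to-point measure assumed here: Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pairs(ci, cj, d):
    """All point-to-point values between members of ci and members of cj."""
    return [d(x, y) for x in ci for y in cj]


def set_max(ci, cj, d=euclidean):
    """Largest pairwise value (complete linkage when d is a distance)."""
    return max(pairs(ci, cj, d))


def set_min(ci, cj, d=euclidean):
    """Smallest pairwise value (single linkage when d is a distance)."""
    return min(pairs(ci, cj, d))


def set_avg(ci, cj, d=euclidean):
    """Mean pairwise value: divide by n_i * n_j, the number of pairs."""
    return sum(pairs(ci, cj, d)) / (len(ci) * len(cj))


def mean_vector(cluster):
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def set_mean(ci, cj, d=euclidean):
    """Represent each set by its mean vector, then compare the two points."""
    return d(mean_vector(ci), mean_vector(cj))


A = [(0, 0), (0, 1)]
B = [(3, 0), (3, 1)]
print(set_min(A, B), set_max(A, B), set_avg(A, B), set_mean(A, B))
```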
<h1>6. Summary</h1>
<p style="text-indent: 21pt">At first glance, studying proximity measures looks like learning pure mathematics. It is really a review that consolidates the foundations before we start studying clustering algorithms.</p>
<h1>7. References and recommended reading</h1>
<p>[1] Pattern Recognition, Third Edition, Sergios Theodoridis, Konstantinos Koutroumbas</p>
<p>[2] <a href="http://zh.wikipedia.org/wiki/%E6%B5%8B%E5%BA%A6%E8%AE%BA">http://zh.wikipedia.org/wiki/%E6%B5%8B%E5%BA%A6%E8%AE%BA</a></p>
<p>[3] Pattern Recognition, Third Edition (Chinese edition: 模式识别 第三版), Sergios Theodoridis, Konstantinos Koutroumbas; translated by 李晶皎, 王爱侠, 张广源, et al.</p>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/changedi/" target="_blank">changedi</a> 2010-01-17 13:10</div>]]></description></item><item><title>Clustering Algorithm Study Notes (1): Basics</title><link>http://www.blogjava.net/changedi/archive/2010/01/11/308984.html</link><dc:creator>changedi</dc:creator><author>changedi</author><pubDate>Mon, 11 Jan 2010 02:39:00 GMT</pubDate><guid>http://www.blogjava.net/changedi/archive/2010/01/11/308984.html</guid><wfw:comment>http://www.blogjava.net/changedi/comments/308984.html</wfw:comment><comments>http://www.blogjava.net/changedi/archive/2010/01/11/308984.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.blogjava.net/changedi/comments/commentRss/308984.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/changedi/services/trackbacks/308984.html</trackback:ping><description><![CDATA[
<h1>0. Preface</h1>
<p style="text-indent: 21pt">As legend has it: &#8220;Clustering is one of the most primitive mental activities of humans, used to handle the huge amount of information they receive every day.&#8221; To help fellow students, I have organized the notes I took while learning about clustering and am sharing them here.</p>
<h1>1. Definition of clustering</h1>
<p style="text-indent: 21pt">&#8220;Clustering is the division of similar objects, by means of static classification, into different groups or subsets, so that the member objects in the same subset share some similar attributes.&#8221; (Wikipedia)</p>
<p style="text-indent: 21pt">&#8220;<span style="color: black; font-family: 宋体; letter-spacing: 0.4pt">Cluster analysis is the process of grouping a collection of physical or abstract objects into multiple classes made up of similar objects. It is an important human activity. Clustering is the process of assigning data to different classes or clusters, so that objects in the same cluster are highly similar and objects in different clusters are highly dissimilar.</span>&#8221; (Baidu Baike)</p>
<p style="text-indent: 21pt">Put plainly, clustering can be understood quite literally: it is the process of gathering identical, similar, nearby, or related object instances into one class. Concretely, suppose a data set contains N instances and some criterion divides them into m classes. If the instances within each class are related, while instances in different classes are distinct (that is, unrelated), this process is called clustering.</p>
<p style="text-indent: 21pt">More formally, let <img style="width: 162px; height: 22px" height="22" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/abc.JPG" width="162" border="0" />, where each <u>x</u> is a vector. An m-clustering <em>R</em> of <em>X</em> partitions <em>X</em> into m sets <em>C</em><sub>1</sub><em>, C</em><sub>2</sub><em>,&#8230;,C<sub>m</sub></em> that satisfy the following three conditions:</p>
<p style="text-indent: 21pt">(1) <img height="22" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/abcd.JPG" width="162" border="0" /></p>
<p style="text-indent: 21pt">(2) <img height="37" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/abcde.JPG" width="70" border="0" /></p>
<p style="text-indent: 21pt">(3) <img height="28" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/ff.JPG" width="275" border="0" /></p>
<p style="text-indent: 21pt">Besides satisfying these conditions, the vectors in a cluster <em>C<sub>i</sub></em> should be similar to one another and dissimilar to the vectors in the other clusters.</p>
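<p style="text-indent: 21pt">The three conditions of a hard m-clustering can be checked mechanically. This sketch assumes the conditions shown in the images are the usual ones: every C<sub>i</sub> is non-empty, the union of the C<sub>i</sub> is X, and the C<sub>i</sub> are pairwise disjoint. It also assumes the vectors are distinct tuples, so they can be placed in Python sets. The function name is hypothetical. The check covers only conditions (1) to (3). Whether the vectors within a cluster are actually similar depends on the proximity measure and cannot be read off the partition alone.</p>

```python
def is_hard_clustering(X, clusters):
    """Check whether clusters C_1..C_m form an m-clustering of the vector set X."""
    sets = [set(c) for c in clusters]
    # (1) no cluster is empty
    if any(not s for s in sets):
        return False
    # (2) the union of all clusters is exactly X
    union = set().union(*sets)
    if union != set(X):
        return False
    # (3) pairwise disjoint: the sizes add up to |union| only if no vector is shared
    return sum(len(s) for s in sets) == len(union)


X = [(0, 0), (1, 0), (5, 5), (6, 5)]
print(is_hard_clustering(X, [[(0, 0), (1, 0)], [(5, 5), (6, 5)]]))          # valid
print(is_hard_clustering(X, [[(0, 0), (1, 0)], [(1, 0), (5, 5), (6, 5)]]))  # (1, 0) is shared
print(is_hard_clustering(X, [[(0, 0)], [(5, 5), (6, 5)]]))                  # (1, 0) is missing
```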
<p style="text-indent: 21pt">However, this definition covers only deterministic clustering, also called hard clustering, in which each instance <em><u>x</u></em> belongs to exactly one cluster. Non-deterministic clustering needs a definition too, which leads to the concept of fuzzy clustering. In fuzzy clustering, each instance vector <em><u>x</u></em> belongs to a cluster with a certain degree of membership. With the same setup as above, a fuzzy clustering of <em>X</em> into m clusters is characterized by m functions <em>u<sub>j</sub></em> that satisfy:</p>
<p style="text-indent: 21pt">(1) <img height="28" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/fff.JPG" width="214" border="0" /></p>
<p style="text-indent: 21pt">(2) <img height="44" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/de.JPG" width="214" border="0" /></p>
<p style="text-indent: 21pt">(3) <img height="44" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/def.JPG" width="240" border="0" /></p>
<p style="text-indent: 21pt">The closer the membership value <img height="23" alt="" src="http://www.blogjava.net/images/blogjava_net/changedi/ew.JPG" width="45" border="0" /> is to 1, the more likely it is that <em><u>x</u></em><sub>i</sub> belongs to <em>C<sub>j</sub></em>. Conversely, the closer it is to 0, the less likely it is that <em><u>x</u></em><sub>i</sub> belongs to <em>C<sub>j</sub></em>.</p>
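<p style="text-indent: 21pt">The fuzzy conditions can be checked the same way if the memberships are stored as an N-by-m matrix U, with U[i][j] = u<sub>j</sub>(x<sub>i</sub>). This sketch assumes the three conditions in the images are the usual ones:</p>
<ul>
<li>every membership lies in [0, 1];</li>
<li>the memberships of each x<sub>i</sub> sum to 1;</li>
<li>for every cluster j, the sum over i of u<sub>j</sub>(x<sub>i</sub>) is strictly between 0 and N.</li>
</ul>
<p style="text-indent: 21pt">The function name and the rounding tolerance are my own choices. A hard clustering is the special case in which every entry of U is 0 or 1.</p>

```python
def is_fuzzy_clustering(U, tol=1e-9):
    """U[i][j] is the membership u_j(x_i) of vector x_i in cluster j."""
    n = len(U)     # N vectors
    m = len(U[0])  # m clusters
    # (1) every membership lies in [0, 1]
    if any(not 0.0 <= u <= 1.0 for row in U for u in row):
        return False
    # (2) the memberships of each x_i sum to 1 (up to rounding error)
    if any(abs(sum(row) - 1.0) > tol for row in U):
        return False
    # (3) no cluster is empty (sum > 0) and none absorbs every vector fully (sum < N)
    for j in range(m):
        total = sum(row[j] for row in U)
        if not 0.0 < total < n:
            return False
    return True


fuzzy = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
hard = [[1, 0], [1, 0], [0, 1]]   # a hard clustering written as 0/1 memberships
bad = [[1, 0], [1, 0], [1, 0]]    # cluster 0 absorbs everything, cluster 1 is empty
print(is_fuzzy_clustering(fuzzy), is_fuzzy_clustering(hard), is_fuzzy_clustering(bad))
```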
<h1>2. The clustering process</h1>
<p style="text-indent: 21pt">Once we know what clustering is, the next thing we want to know is how to do it. The textbook covers this in detail; here I add a few points from my own understanding:</p>
<p style="text-indent: 21pt">1) Feature selection: as in any other classification task, features are the foundation of everything. Choosing features that capture as much of the relevant information as possible is an important problem. How expressive the features are strongly affects the clustering result; I will demonstrate this in later experiments.</p>
<p style="text-indent: 21pt">2) Proximity measure: once the feature representation of the instance vectors is fixed, how do we decide whether two instance vectors are similar? This question is crucial and has a decisive effect on the clustering process. Clustering is essentially about separating the similar from the dissimilar, and the proximity measure is what defines similarity.</p>
<p style="text-indent: 21pt">3) Clustering criterion: defining similarity is not enough. The key is deciding, with the help of the proximity measure, when things count as similar. Intuitively, the clustering criterion is the condition that determines when to group and when not to. When we run a clustering algorithm, the algorithm takes care of how to cluster, but the decision whether to group needs a standard, and the clustering criterion is that standard. (The word &#8220;standard&#8221; alone sounds intimidating enough, doesn't it ^_^)</p>
<p style="text-indent: 21pt">4) Clustering algorithm: this needs little explanation, as it is the core of the whole subject. I won't go into it here because it will be covered in detail later. Briefly, it is the procedure that carries out the clustering using the proximity measure and the clustering criterion.</p>
<p style="text-indent: 21pt">5) Validation of the results: I find it somewhat redundant that the authors of PR list this as a separate step of the clustering task. Verifying an algorithm's correctness belongs at the algorithm level, so steps 4) and 5) could be merged into one. After all, correctness and finiteness are properties of the algorithm itself. (Who designs an algorithm without proving it?)</p>
<p style="text-indent: 21pt">6) Interpretation of the results: the Chinese edition of PR translates this as &#8220;result determination&#8221;, but I think the literal meaning, &#8220;interpretation of the results&#8221;, is more fitting. Clustering ultimately divides the data set into several classes. Before acting you need principles, and afterwards you need an explanation, which is what this step provides. (Making your explanation hang together is probably about as good as it gets ^_^)</p>
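<p style="text-indent: 21pt">A toy sketch can show how steps 1) to 4) fit together. It is not an algorithm from the text. It makes these assumptions:</p>
<ul>
<li>The features are numeric.</li>
<li>The proximity measure is Euclidean distance.</li>
<li>The criterion is "join the nearest cluster if its mean is within a threshold theta, otherwise start a new cluster".</li>
<li>The criterion is applied in a single sequential pass, similar in spirit to the basic sequential schemes described in [1].</li>
</ul>
<p style="text-indent: 21pt">The feature keys, theta, and all function names are hypothetical. Steps 5) and 6), validation and interpretation, are not covered by the sketch.</p>

```python
import math


def select_features(record, keys):
    """Step 1) feature selection: keep only the chosen numeric attributes."""
    return tuple(float(record[k]) for k in keys)


def euclidean(x, y):
    """Step 2) proximity measure: Euclidean distance."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def threshold_clustering(points, theta, d=euclidean):
    """Steps 3) and 4): one sequential pass. A point joins the nearest cluster
    if its distance to that cluster's mean is at most theta (the criterion);
    otherwise it starts a new cluster."""
    clusters, means = [], []
    for p in points:
        if means:
            k = min(range(len(means)), key=lambda i: d(p, means[i]))
            if d(p, means[k]) <= theta:
                clusters[k].append(p)
                n = len(clusters[k])
                means[k] = tuple(sum(c) / n for c in zip(*clusters[k]))
                continue
        clusters.append([p])
        means.append(p)
    return clusters


records = [
    {"id": "r1", "f1": 1.0, "f2": 1.0},
    {"id": "r2", "f1": 1.5, "f2": 2.0},
    {"id": "r3", "f1": 8.0, "f2": 8.0},
    {"id": "r4", "f1": 9.0, "f2": 8.5},
]
points = [select_features(r, ["f1", "f2"]) for r in records]
print(threshold_clustering(points, theta=3.0))
```

<p style="text-indent: 21pt">Step 1) drops the id field because it carries no similarity information. The result depends on theta and, as usual for sequential schemes like this one, on the order of the input.</p>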
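<p style="text-indent: 21pt">The details of the whole clustering task will be covered later. Here let me first expand on the clustering criterion, even though I feel I already said plenty above. Take an example: suppose a data set <em>X</em> contains basic information and math scores for four students.</p>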
<div align="center">
<table style="border-right: medium none; border-top: medium none; border-left: medium none; border-bottom: medium none; border-collapse: collapse" cellspacing="0" cellpadding="0" border="1">
    <tbody>
        <tr>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: windowtext 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: windowtext 1pt solid; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center"><strong>Name</strong></p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: windowtext 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center"><strong>Grade</strong></p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: windowtext 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center"><strong>Class</strong></p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: windowtext 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center"><strong>Math score</strong></p>
            </td>
        </tr>
        <tr>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: windowtext 1pt solid; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center"><span style="font-family: 宋体">张三</span></p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">1</p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">2</p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">99</p>
            </td>
        </tr>
        <tr>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: windowtext 1pt solid; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center"><span style="font-family: 宋体">李四</span></p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">2</p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">2</p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">95</p>
            </td>
        </tr>
        <tr>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: windowtext 1pt solid; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center"><span style="font-family: 宋体">张飞</span></p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">3</p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">1</p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">59</p>
            </td>
        </tr>
        <tr>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: windowtext 1pt solid; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center"><span style="font-family: 宋体">赵云</span></p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">2</p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">1</p>
            </td>
            <td style="border-right: windowtext 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 119.7pt; padding-top: 0cm; border-bottom: windowtext 1pt solid" valign="top" width="160">
            <p style="text-align: center" align="center">90</p>
            </td>
        </tr>
    </tbody>
</table>
</div>
<p style="text-indent: 21pt">A clustering criterion is a standard for dividing the data. How should a data set like this one be clustered? There are, of course, many possibilities:</p>
<ul>
<li>Split by whether the grade is greater than 1: X falls into two clusters, {张三} and {李四, 张飞, 赵云}.</li>
<li>Split by class: again two clusters, {张三, 李四} and {张飞, 赵云}.</li>
<li>Split by whether the math score is a pass (assuming 60 is the passing mark): {张三, 李四, 赵云} and {张飞}.</li>
</ul>
<p style="text-indent: 21pt">In practice, clustering criteria are often complex; it all depends on how you want to divide the data. In the geometric view of classification, the data set corresponds to a sample space. The number of features of a data instance (4 in this example: [name, grade, class, math score]) corresponds to the dimension of that space, and each instance vector is a point in it. The clustering criteria are then those wondrous hyperplanes that separate the data &#8220;perfectly&#8221;. Each hyperplane has a corresponding mathematical function, and personally I regard these functions as equivalent to the clustering criteria.</p>
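<p style="text-indent: 21pt">The three criteria in the paragraph above can be written as predicates over the table rows, each inducing its own partition of X. This is a minimal Python sketch. The tuple layout (name, grade, class, math score) mirrors the table, and the helper name split_by is hypothetical.</p>

```python
students = [
    # (name, grade, class, math score), copied from the table above
    ("张三", 1, 2, 99),
    ("李四", 2, 2, 95),
    ("张飞", 3, 1, 59),
    ("赵云", 2, 1, 90),
]


def split_by(data, criterion):
    """Group student names by the value the criterion returns for each record."""
    groups = {}
    for record in data:
        groups.setdefault(criterion(record), []).append(record[0])
    return groups


by_grade = split_by(students, lambda r: r[1] > 1)    # grade greater than 1?
by_class = split_by(students, lambda r: r[2])        # which class?
by_pass = split_by(students, lambda r: r[3] >= 60)   # passed (60 or above)?
print(by_grade)
print(by_class)
print(by_pass)
```

<p style="text-indent: 21pt">Each criterion yields a different two-cluster partition of the same four records, which is why the choice of criterion matters so much.</p>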
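<h1>3. Types of clustering features</h1>
<p style="text-indent: 21pt">How do we distinguish the features used in clustering, and what types are there? By domain, clustering features can be divided into continuous and discrete features. A continuous feature takes values in a continuous subset of R, while a discrete feature takes values in a discrete set. A discrete feature with only two possible values is called a binary feature.</p>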
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">根据特征取值的相对意义又可以将特征分为以下四种：标量的</span>(Nominal)<span style="font-family: 宋体">，顺序的</span>(Ordinal)<span style="font-family: 宋体">，区间尺度的</span>(Interval-scaled)<span style="font-family: 宋体">以及比率尺度的</span>(Ratio-scaled)<span style="font-family: 宋体">。其中，标量特征用于编码一类特征的可能状态，比如人的性别，编码为男和女；天气状况编码为阴、晴和雨等。顺序特征同标量特征类似，同样是一系列状态的编码，只是对这些编码稍加约束，即编码顺序是有意义的，比如对一道菜，它的特征有</span>{<span style="font-family: 宋体">很难吃，难吃，一般，好吃，美味</span>}<span style="font-family: 宋体">几个值来定义状态，但是这些状态是有顺序意义的。这类特征我认为就是标量特征的一个特定子集，或者是一个加约束的标量特征。区间尺度特征表示该特征数值之间的区间有意义而数值的比率无意义，经典例子就是温度，</span>A<span style="font-family: 宋体">地的温度（</span>20<span style="font-family: 宋体">℃）比</span>B<span style="font-family: 宋体">地（</span>15<span style="font-family: 宋体">℃）高</span>5<span style="font-family: 宋体">度，这里的区间差值是有意义的，但你不能说</span>A<span style="font-family: 宋体">地比</span>B<span style="font-family: 宋体">地热</span>1/3<span style="font-family: 宋体">，这是无意义的。比率特征与此相反，其比率是有意义的，经典例子是重量，</span>C<span style="font-family: 宋体">重</span>100g<span style="font-family: 宋体">，</span>D<span style="font-family: 宋体">重</span>50g<span style="font-family: 宋体">，那么</span>C<span style="font-family: 宋体">比</span>D<span style="font-family: 宋体">重</span>2<span style="font-family: 宋体">倍，这是有意义的。（当然说</span>C<span style="font-family: 宋体">比</span>D<span style="font-family: 宋体">重</span>50g<span style="font-family: 宋体">也是可以的，因此可以认为区间尺度是比率尺度的一个真子集）。</span></p>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-family: 宋体">在常见应用中，包括我们平日关心的编程实现中，一般只定义</span>nominal<span style="font-family: 宋体">特征和</span>numeric<span style="font-family: 宋体">特征，其中</span>nominal<span style="font-family: 宋体">可以用</span>string<span style="font-family: 宋体">来表示，而</span>numeric<span style="font-family: 宋体">可以用</span>number<span style="font-family: 宋体">来表示。（</span>weka<span style="font-family: 宋体">中的</span>attribute<span style="font-family: 宋体">的特征类型就是这么定义的）</span></p>
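<h1>4. Applications of cluster analysis</h1>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; After all these basic concepts, the most practical topic is surely applications: where exactly can we use clustering? As the legend in the preface suggests, classification is a basic human activity for recognizing objects and has probably existed as long as human consciousness. One could even say that classification is one of the essential activities of human intelligence. Researchers divide classification into supervised and unsupervised, and clustering is the most commonly used and most representative method of unsupervised classification. Imagine a computer automatically dividing a set of data, or a pile of information, into several classes. As an aid to human intelligence, that is clearly both necessary and meaningful. One core application of clustering is therefore data mining and pattern recognition. Beyond that, in any scientific field that involves a classification task, people immediately think of clustering. (Incidentally, my first formal encounter with clustering was in Teaching Building 23, in an informatization course taught by a professor who seemed to be from the automation department.) A fairly authoritative classification divides the applications of clustering into four basic directions:</p>
<ol>
<li>Data reduction: removing redundant information from massive amounts of data.</li>
<li>Hypothesis generation: performing cluster analysis on data in order to infer some of its properties.</li>
<li>Hypothesis testing: using cluster analysis to verify whether a given hypothesis holds, for example how risky a particular decision is.</li>
<li>Prediction based on groups: as in any prediction task, once the existing data have been clustered, new data can be assigned to a cluster by the same rules, which predicts the class they belong to.</li>
</ol>
<p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Clustering is applied very widely, and I'm too lazy to list the fields one by one. Once you understand its principles and goals, its areas of application follow naturally.</p>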
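<h1>5. Summary</h1>
<p style="text-indent: 21pt">Those are the basic concepts of clustering. Clustering has been studied for several decades, and fortunately we can stand on the shoulders of many giants as we learn. How to improve, innovate, and extend its applications is our goal going forward. As the saying goes, &#8220;a craftsman who wants to do good work must first sharpen his tools&#8221;, and here clustering is our &#8220;tool&#8221;.</p>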
<h1>6. References and recommended reading</h1>
<p>[1] Pattern Recognition, Third Edition, Sergios Theodoridis, Konstantinos Koutroumbas</p>
<p>[2] <a href="http://baike.baidu.com/view/903740.htm?fr=ala0_1_1">http://baike.baidu.com/view/903740.htm?fr=ala0_1_1</a></p>
<p>[3] <a href="http://zh.wikipedia.org/zh-cn/%E6%95%B0%E6%8D%AE%E8%81%9A%E7%B1%BB">http://zh.wikipedia.org/zh-cn/%E6%95%B0%E6%8D%AE%E8%81%9A%E7%B1%BB</a></p>
<p>[4] Data Mining: Concepts and Techniques (Chinese edition: 数据挖掘 概念与技术), Jiawei Han, Micheline Kamber; translated by 范明, 孟小峰</p>
<p>[5] Pattern Recognition, Third Edition (Chinese edition: 模式识别 第三版), Sergios Theodoridis, Konstantinos Koutroumbas; translated by 李晶皎, 王爱侠, 张广源, et al.</p>
<p>[6] Introduction to Data Mining (Chinese edition: 数据挖掘导论), Pang-Ning Tan, Michael Steinbach, Vipin Kumar; translated by 范明, 范宏建, et al.</p>
<p>[7] Data Mining: Practical Machine Learning Tools and Techniques (Chinese edition: 数据挖掘实用机器学习技术), Ian H. Witten, Eibe Frank; translated by 董琳, et al.</p>
<p>Please credit the source when reposting this article.</p>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/changedi/" target="_blank">changedi</a> 2010-01-11 10:39</div>]]></description></item></channel></rss>