<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-SIMONE-Post Category-hadoop mahout</title><link>http://www.blogjava.net/wangxinsh55/category/54635.html</link><description /><language>zh-cn</language><lastBuildDate>Thu, 04 Dec 2014 17:46:44 GMT</lastBuildDate><pubDate>Thu, 04 Dec 2014 17:46:44 GMT</pubDate><ttl>60</ttl><item><title>Setting up a Mahout environment on Hadoop 2.2.0</title><link>http://www.blogjava.net/wangxinsh55/archive/2014/12/04/421047.html</link><dc:creator>SIMONE</dc:creator><author>SIMONE</author><pubDate>Thu, 04 Dec 2014 09:24:00 GMT</pubDate><guid>http://www.blogjava.net/wangxinsh55/archive/2014/12/04/421047.html</guid><wfw:comment>http://www.blogjava.net/wangxinsh55/comments/421047.html</wfw:comment><comments>http://www.blogjava.net/wangxinsh55/archive/2014/12/04/421047.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/wangxinsh55/comments/commentRss/421047.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/wangxinsh55/services/trackbacks/421047.html</trackback:ping><description><![CDATA[<div>http://www.cnblogs.com/weiqiang-liu/p/3791330.html</div><br /><br /><div><p>I. Download the packages</p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Download link:</p> <div> <pre>http://mirrors.hust.edu.cn/apache/mahout/0.9/</pre> </div> <p>&nbsp;</p> <p>&nbsp;</p> <p>II. Extract the archives</p> <p>&nbsp;</p> <div> <pre><span style="color: #0000ff;">tar</span> -zxvf  mahout-distribution-<span style="color: #800080;">0.9</span>-src.<span style="color: #0000ff;">tar</span>.gz -C /usr/share/<br /><span style="color: #0000ff;">tar</span> -zxvf  mahout-distribution-<span style="color: #800080;">0.9</span>.<span style="color: #0000ff;">tar</span>.gz  -C /usr/share/</pre> </div> 
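<p>The extraction step above can be sketched as a small shell helper that checks that each archive exists before unpacking it. This is only a sketch under assumptions: the function name extract_all and its argument order are illustrative and do not come from the original post; the archive names and the /usr/share target are the ones used above.</p>

```shell
#!/bin/sh
# Sketch only: extract_all is an illustrative helper, not part of the original post.
set -eu

# Usage: extract_all DEST ARCHIVE...
# Unpacks every gzip-compressed tar ARCHIVE into the directory DEST.
extract_all() {
    dest=$1
    shift
    mkdir -p "$dest"
    for archive in "$@"; do
        if [ ! -f "$archive" ]; then
            echo "missing archive: $archive" >&2
            return 1
        fi
        # -z: gunzip, -x: extract, -f: archive file, -C: target directory
        tar -zxf "$archive" -C "$dest"
        echo "extracted $archive into $dest"
    done
}

# Example call matching the step above (writing to /usr/share usually needs root):
# extract_all /usr/share mahout-distribution-0.9-src.tar.gz mahout-distribution-0.9.tar.gz
```

<p>Stopping at the first missing archive keeps a failed download from going unnoticed until the build step.</p>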
<p>&nbsp;</p> <p>&nbsp;</p> <p>III. Build from source</p> <p>&nbsp;&nbsp;&nbsp; 1. cd&nbsp;/usr/share/mahout-distribution-0.9-src</p> <p>&nbsp;&nbsp;&nbsp; 2. Apply the patch: download the patch file, then apply it with the patch command</p> <p>&nbsp; &nbsp;</p> <div> <pre>wget https:<span style="color: #008000;">//</span><span style="color: #008000;">issues.apache.org/jira/secure/attachment/12629768/1329.patch</span> <br /><span style="color: #0000ff;">patch</span> -p0 &lt; <span style="color: #800080;">1329</span>.<span style="color: #0000ff;">patch</span></pre> </div> <p>&nbsp;</p> <p>&nbsp;</p> <p>&nbsp;&nbsp;&nbsp; 3. Compile</p> <p>&nbsp;<span style="color: #008080;">mvn clean package -Dhadoop.profile=<span style="color: #800080;">200</span> -Dhadoop.<span style="color: #800080;">2</span>.version=<span style="color: #800080;">2.2</span>.<span style="color: #800080;">0</span> -Dhbase.version=<span style="color: #800080;">0.98</span>.<span style="color: #800080;">0</span>-hadoop2 -DskipTests</span>&nbsp;</p> <p>&nbsp;</p> <p>IV. Replace the jars</p> <p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Replace the jars in the /usr/share/mahout-distribution-0.9 directory with the newly built ones; there are six in total.</p> <div> <pre>mahout-core-0.9.jar<br />mahout-core-0.9-job.jar<br />mahout-examples-0.9.jar<br />mahout-examples-0.9-job.jar<br />mahout-integration-0.9.jar<br />mahout-math-0.9.jar</pre> </div> <p>&nbsp;</p></div><img src ="http://www.blogjava.net/wangxinsh55/aggbug/421047.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/wangxinsh55/" target="_blank">SIMONE</a> 2014-12-04 17:24 <a href="http://www.blogjava.net/wangxinsh55/archive/2014/12/04/421047.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Notes on the Hadoop-based recommender in Mahout</title><link>http://www.blogjava.net/wangxinsh55/archive/2014/12/04/421032.html</link><dc:creator>SIMONE</dc:creator><author>SIMONE</author><pubDate>Thu, 04 Dec 2014 06:39:00 
GMT</pubDate><guid>http://www.blogjava.net/wangxinsh55/archive/2014/12/04/421032.html</guid><wfw:comment>http://www.blogjava.net/wangxinsh55/comments/421032.html</wfw:comment><comments>http://www.blogjava.net/wangxinsh55/archive/2014/12/04/421032.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/wangxinsh55/comments/commentRss/421032.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/wangxinsh55/services/trackbacks/421032.html</trackback:ping><description><![CDATA[<div>http://www.linuxidc.com/Linux/2012-07/65008.htm</div><br /><br /><div><p>The recommender implementation class is:</p> <p>org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. Its input data lives in the default input directory, the one specified by the mapred.input.dir parameter, and consists of text files of userID,itemID[,preferencevalue] records. The directory may hold several files.</p> <p>The runtime parameters are:</p> <p>numRecommendations: the number of recommendations produced for each user ("Number of recommendations per user") </p> <p>usersFile: a list of the IDs of the users to produce recommendations for; </p> <p>itemsFile: a list of the IDs of the items that may be recommended; </p> <p>filterFile: a file used to filter recommendations, containing comma-separated userID,itemID pairs; </p> <p>booleanData: the training data carries no preference values; </p> <p>maxPrefsPerUser: Maximum number of preferences considered per user in final recommendation phase; </p> <p>minPrefsPerUser: ignore users with fewer preferences than this in the similarity computation; </p> <p>maxSimilaritiesPerItem: Maximum number of similarities considered per item; </p> <p>maxCooccurrencesPerItem: try to cap the number of cooccurrences per item to this; </p> <p>similarityClassname: Name of distributed similarity class to instantiate, alternatively use one of the predefined similarities. The available similarity classes are: </p> <p>&nbsp;SIMILARITY_COOCCURRENCE(DistributedCooccurrenceVectorSimilarity.class), </p> <p>&nbsp;SIMILARITY_EUCLIDEAN_DISTANCE(DistributedEuclideanDistanceVectorSimilarity.class), </p> <p>&nbsp;SIMILARITY_LOGLIKELIHOOD(DistributedLoglikelihoodVectorSimilarity.class), </p> 
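<p>&nbsp;SIMILARITY_PEARSON_CORRELATION(DistributedPearsonCorrelationVectorSimilarity.class), </p> <p>&nbsp;SIMILARITY_TANIMOTO_COEFFICIENT(DistributedTanimotoCoefficientVectorSimilarity.class), </p> <p>&nbsp;SIMILARITY_UNCENTERED_COSINE(DistributedUncenteredCosineVectorSimilarity.class), </p> <p>&nbsp;SIMILARITY_UNCENTERED_ZERO_ASSUMING_COSINE(DistributedUncenteredZeroAssumingCosineVectorSimilarity.class), </p> <p>&nbsp;SIMILARITY_CITY_BLOCK(DistributedCityBlockVectorSimilarity.class); </p> <p>RecommenderJob runs a series of MapReduce jobs, which you can rewrite to suit your own needs during development. However, RecommenderJob is declared final, which makes this rather painful.</p> <p>1. itemIDIndex job: </p> <p>map: parses the input itemsFile and maps each long ID to an int index for later processing. Because the processing involves matrix computation and each item corresponds to one dimension of the matrix, the IDs must be converted to ints; emits index-ID pairs; </p> <p>reducer: validates the index-ID pairs and emits index-ID pairs; </p> <p>2. toUserVector job: </p> <p>ToItemPrefsMapper: reads the preference information from filterFile and converts it into user-preference pairs. </p> <p>ToUserVectorReducer: converts the user-preference pairs into user-preference vector pairs; the vector is indexed by all the ItemIDs.</p> <p>3. countUsers job: counts the users; the output is (user count, null). </p> <p>4. maybePruneAndTransponse, a job with a very odd name. </p> <p>MaybePruneRowsMapper: takes the output of job 2 as input and generates the preference-matrix cells for each item, that is, pairs of item index and matrix cell. </p> <p>ToItemVectorsReducer: outputs pairs of matrix row number (the item index) and matrix row vector. </p> <p>5. RowSimilarityJob: computes the similarity matrix. This reuses an existing job for the computation. The input is the matrix output by job 4; the output is the similarity matrix, that is, item-similarity vector pairs, where the similarity vector holds the similarity values between the current item and the other items. </p> <p>6. prePartialMultiply1: takes the output of job 5 as input and sets the diagonal entries of the similarity matrix, that is, (N,N), to Double.NaN in preparation for later computation; </p> <p>7. prePartialMultiply2: takes the output of job 2 as input and splits each user-(item vector) pair into item-(userId, preference value) pairs. If usersFile is set, only the users listed in usersFile are processed. </p> <p>8. partialMultiply: merges the outputs of jobs 6 and 7 into item-(similarity vector, userId, preference value) pairs. </p> <p>9. 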
itemFiltering: if a filterFile is given, processes it and converts it into item-(similarity vector, userId, preference value) pairs, in which the similarity vector values are 0; </p> <p>10. aggregateAndRecommend: takes the merged outputs of jobs 8 and 9 as input. </p> <p>PartialMultiplyMapper: converts the set of item-(similarity vector, userId, preference value) pairs into userId-(preference value, similarity vector) pairs; </p> <p>AggregateAndRecommendReducer: aggregates the map output and produces userId-(list of (itemId, recommendation value)) pairs, where the (itemId, recommendation value) list is sorted by recommendation value. If maxPrefsPerUser, minPrefsPerUser or maxCooccurrencesPerItem are set, only the userId pairs that meet those conditions are produced. </p></div><img src ="http://www.blogjava.net/wangxinsh55/aggbug/421032.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/wangxinsh55/" target="_blank">SIMONE</a> 2014-12-04 14:39 <a href="http://www.blogjava.net/wangxinsh55/archive/2014/12/04/421032.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Mahout web clippings</title><link>http://www.blogjava.net/wangxinsh55/archive/2014/12/03/421004.html</link><dc:creator>SIMONE</dc:creator><author>SIMONE</author><pubDate>Wed, 03 Dec 2014 09:50:00 GMT</pubDate><guid>http://www.blogjava.net/wangxinsh55/archive/2014/12/03/421004.html</guid><wfw:comment>http://www.blogjava.net/wangxinsh55/comments/421004.html</wfw:comment><comments>http://www.blogjava.net/wangxinsh55/archive/2014/12/03/421004.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/wangxinsh55/comments/commentRss/421004.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/wangxinsh55/services/trackbacks/421004.html</trackback:ping><description><![CDATA[hadoop2.2+mahout0.9 in practice <br /><div>http://blog.csdn.net/fansy1990/article/details/23261633</div><br />The Mahout recommendation algorithm API in detail <br /><div>http://blog.csdn.net/zhoubl668/article/details/13297663<br /><br />Learning Mahout: introduction, installation, configuration, testing a starter program, and related recommendations<br /><div>http://itindex.net/detail/49323-mahout-%E5%AD%A6%E4%B9%A0-mahout</div><br />Mahout's recommendation engine<br /><div>http://eric-gcm.iteye.com/category/266011</div><br />Mahout recommendations<br 
/><div>http://www.cnblogs.com/jsunday/category/598231.html</div><br /> Co-occurrence matrix implementation of the MapReduce-based ItemBase recommendation algorithm<br /><div>http://zengzhaozheng.blog.51cto.com/8219051/1557054<br /><br />mahout learning code examples<br /><div>http://www.cnblogs.com/wentingtu/archive/2012/03/28/2422450.html<br /><br />Introduction to the similarity computation methods in Mahout <br /><div>http://blog.csdn.net/samxx8/article/details/7691868</div></div></div></div><img src ="http://www.blogjava.net/wangxinsh55/aggbug/421004.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/wangxinsh55/" target="_blank">SIMONE</a> 2014-12-03 17:50 <a href="http://www.blogjava.net/wangxinsh55/archive/2014/12/03/421004.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>