﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-paulwong-随笔分类-STORM</title><link>http://www.blogjava.net/paulwong/category/53883.html</link><description /><language>zh-cn</language><lastBuildDate>Sun, 04 Jan 2015 17:23:29 GMT</lastBuildDate><pubDate>Sun, 04 Jan 2015 17:23:29 GMT</pubDate><ttl>60</ttl><item><title>HADOOP各种框架应用领域</title><link>http://www.blogjava.net/paulwong/archive/2015/01/04/422020.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Sun, 04 Jan 2015 04:57:00 GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2015/01/04/422020.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/422020.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2015/01/04/422020.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/422020.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/422020.html</trackback:ping><description><![CDATA[***** Data Analytics : Technology Area *****<br />1. Real Time Analytics	: Apache Storm<br />2. In-memory Analytics	: Apache Spark<br />3. Search Analytics	: Apache Elastic search, SOLR<br />4. Log Analytics	: Apache ELK Stack,ESK Stack(Elastic Search, Log <br />Stash, Spark Streaming, Kibana)<br />5. Batch Analytics	: Apache MapReduce<br /><br />***** NO SQL DB *****	<br />1. MongoDB	<br />2. Hbase	<br />3. Cassandra	<br /><br />***** SOA *****	<br />1. Oracle SOA	<br />2. JBoss SOA	<br />3. TiBco SOA	<br />4. 
SOAP, RESTful Webservices&nbsp;<img src ="http://www.blogjava.net/paulwong/aggbug/422020.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2015-01-04 12:57 <a href="http://www.blogjava.net/paulwong/archive/2015/01/04/422020.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Storm, the hottest open-source streaming system, vs. the newcomer Samza</title><link>http://www.blogjava.net/paulwong/archive/2014/12/02/420922.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Tue, 02 Dec 2014 07:03:00 GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2014/12/02/420922.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/420922.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2014/12/02/420922.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/420922.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/420922.html</trackback:ping><description><![CDATA[<div id="article_content" style="margin: 20px 0px 0px; line-height: 26px; font-family: Arial; color: #333333; background-color: #ffffff;"><p>Distributed computing frameworks fall into two main categories according to the nature of their data sets: data-flow and streaming. Data-flow frameworks take blocks of data as their source; examples are MR (MapReduce) and Spark, and I call this "big data". Streaming frameworks process the data received within a unit of time and put more emphasis on real-time behaviour; they include Storm, JStorm and Samza, and I call this "fast data".</p><p>In this article I mainly discuss the streaming frameworks.</p><p>The first is Storm, a real-time computation system. It assumes the data source is dynamic and processes data the way water flows.</p><p>Its characteristics are low latency, high performance, distribution, scalability and fault tolerance.</p><p>The architecture is shown in the figure below.</p><p><img src="http://img.blog.csdn.net/20131124220021015?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvaGxqbHpjMjAwNw==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" alt="" style="border: none; max-width: 100%;" /><br /></p><p>&nbsp;</p><p>Storm's concepts are described in detail at: <a target="_blank" href="http://blog.csdn.net/hljlzc2007/article/details/12976211" style="color: #336699; 
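text-decoration: none;">http://blog.csdn.net/hljlzc2007/article/details/12976211</a>, so I will not cover them here.</p><p>Storm is currently about the most stable open-source stream processing framework, but in my opinion it has two problems.</p><p>1. Although Storm lets you write spout and bolt code in several languages, its core is implemented in Clojure. This is a major inconvenience for people working with big data and open source, because most of them know mainstream languages such as Java and C++ rather than Clojure. The framework therefore becomes hard to control: it is difficult to understand in depth or to modify its internals.</p><p>2. Storm can run on YARN (Hadoop 2.0) and share the Hadoop cluster's resources with other open-source frameworks, but performance there is poor; this is something Storm still needs to improve.</p><p>Either way, Storm is still the king of open-source stream processing frameworks.</p><p>The second one I want to mention is JStorm. It was built by Alibaba and is effectively another implementation of Storm, written in Java.</p><p>Features:</p><p>1. The client API is essentially the same as Storm's, so migrating from Storm requires no changes to bolt and spout code</p><p>2. JStorm is more stable and faster than Storm</p><p>3. It provides some new features</p><p>If you are interested, give it a try. Project page: <a target="_blank" href="https://github.com/alibaba/jstorm" style="color: #336699; text-decoration: none;">https://github.com/alibaba/jstorm</a>&nbsp;</p><p>The third is Samza</p><p>Samza is a technology open-sourced by LinkedIn. It is an open-source distributed stream processing system, very similar to Storm. The difference is that it runs on Hadoop and uses Kafka, the distributed messaging system that LinkedIn developed itself.</p><p>This is a small and beautiful project from LinkedIn. What makes it beautiful?</p><p>1. With only a few thousand lines of code it offers functionality comparable to Storm, although it still has many shortcomings</p><p>2. It is tightly integrated with Kafka, which makes data handling more convenient</p><p>3. It runs on YARN</p><p>A project I worked on earlier used Kafka + Storm + ElasticSearch. In the future Storm could be replaced outright with Samza, which would also let us use the Hadoop cluster's resources for storage and offline analysis. Running both real-time processing and offline analysis on Hadoop keeps the project's complexity from growing and makes maintenance easier, so I have to say Samza is a great project. As I said, small and beautiful things tend to be more popular.</p><p>Architecture:</p><p>Samza consists of three layers:</p><p>1. Streaming layer --&gt; Kafka</p><p>2. Execution layer &nbsp; &nbsp;&nbsp;--&gt; YARN</p><p>3. Processing layer &nbsp; &nbsp;--&gt; Samza API</p><p>Samza's streaming layer and execution layer are both pluggable; developers can substitute other frameworks and are not limited to the two technologies above.</p><p>Samza provides a YARN ApplicationMaster and a YARN job runner out of the box. In the figure below, different colors represent different hosts.</p><p>The Samza client tells YARN's ResourceManager (RM) that it wants to start a Samza job. The RM asks a YARN NodeManager (NM) to allocate space for the ApplicationMaster; once the NM has allocated it, the YARN containers run the Samza TaskRunners.</p><p><img src="http://img.blog.csdn.net/20131124224344562?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvaGxqbHpjMjAwNw==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" alt="" style="border: none; max-width: 100%;" /><br /></p><p>Samza state management</p><p>Managing state in stream processing is hard: the data is in motion and carries no state of its own, so the application's context has to be recorded from historical data. Samza provides a built-in key-value store based on LevelDB, which runs outside the JVM, and uses it to store that history. The benefits are:</p><p>1. 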
Lower JVM overhead</p><p>2. Using local storage greatly increases throughput</p><p>3. Fewer concurrent operations</p><p>Samza processing flow</p><p>The figure below is an example from the official Samza documentation: page views are grouped by Member ID and counted. Input messages come from Machine 1 and Machine 2, and the output goes to Machine 3. One way to read it: the messages are spread across different messaging systems (Kafka); Samza reads the topics from the different Kafka instances, processes them, and sends the results to Machine 3. I will not break it down further here; see the official documentation for details.</p><p><br /></p><p><img src="http://img.blog.csdn.net/20131124225641921?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvaGxqbHpjMjAwNw==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" alt="" style="border: none; max-width: 100%;" /><br /></p><p>Project page: <a target="_blank" href="https://github.com/apache/incubator-samza" style="color: #336699; text-decoration: none;">https://github.com/apache/incubator-samza</a></p><p>Official documentation: <a target="_blank" href="http://samza.incubator.apache.org/" style="color: #336699; text-decoration: none;">http://samza.incubator.apache.org/</a></p><p>All this leaves plenty of room for imagination: will Storm keep its lead, or can Samza take its place? Either way, as a developer, with only a few thousand lines of code to read, I cannot wait to dig in.</p></div><img src ="http://www.blogjava.net/paulwong/aggbug/420922.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2014-12-02 15:03 <a href="http://www.blogjava.net/paulwong/archive/2014/12/02/420922.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Auto rebalance Storm</title><link>http://www.blogjava.net/paulwong/archive/2014/05/09/413479.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Fri, 09 May 2014 15:48:00 
GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2014/05/09/413479.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/413479.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2014/05/09/413479.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/413479.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/413479.html</trackback:ping><description><![CDATA[<a href="http://stackoverflow.com/questions/15010420/storm-topology-rebalance-using-java-code" target="_blank">http://stackoverflow.com/questions/15010420/storm-topology-rebalance-using-java-code</a><br /><br /><br />使用Nimbus获取STORM的信息<br /><a href="http://www.andys-sundaypink.com/i/retrieve-storm-cluster-statistic-from-nimbus-java-mode/" target="_blank">http://www.andys-sundaypink.com/i/retrieve-storm-cluster-statistic-from-nimbus-java-mode/</a><br /><div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />-->TSocket&nbsp;tsocket&nbsp;=&nbsp;<span style="color: #0000FF; ">new</span>&nbsp;TSocket("localhost",&nbsp;6627);<br />TFramedTransport&nbsp;tTransport&nbsp;=&nbsp;<span style="color: #0000FF; ">new</span>&nbsp;TFramedTransport(tsocket);<br />TBinaryProtocol&nbsp;tBinaryProtocol&nbsp;=&nbsp;<span style="color: #0000FF; ">new</span>&nbsp;TBinaryProtocol(tTransport);<br />Nimbus.Client&nbsp;client&nbsp;=&nbsp;<span style="color: #0000FF; ">new</span>&nbsp;Nimbus.Client(tBinaryProtocol);<br />String&nbsp;topologyId&nbsp;=&nbsp;"test-1-234232567";<br /><br /><br /><span style="color: #0000FF; ">try</span>&nbsp;{<br /><br />tTransport.open();<br 
/>ClusterSummary&nbsp;clusterSummary&nbsp;=&nbsp;client.getClusterInfo();<br />StormTopology&nbsp;stormTopology&nbsp;=&nbsp;client.getTopology(topologyId);<br />TopologyInfo&nbsp;topologyInfo&nbsp;=&nbsp;client.getTopologyInfo(topologyId);<br />List&lt;ExecutorSummary&gt;&nbsp;executorSummaries&nbsp;=&nbsp;topologyInfo.get_executors();<br /><br />List&lt;TopologySummary&gt;&nbsp;topologies&nbsp;=&nbsp;clusterSummary.get_topologies();<br /><span style="color: #0000FF; ">for</span>(ExecutorSummary&nbsp;executorSummary&nbsp;:&nbsp;executorSummaries){<br /><br />String&nbsp;id&nbsp;=&nbsp;executorSummary.get_component_id();<br />ExecutorInfo&nbsp;executorInfo&nbsp;=&nbsp;executorSummary.get_executor_info();<br />ExecutorStats&nbsp;executorStats&nbsp;=&nbsp;executorSummary.get_stats();<br />System.out.println("executorSummary&nbsp;::&nbsp;"&nbsp;+&nbsp;id&nbsp;+&nbsp;"&nbsp;emit&nbsp;size&nbsp;::&nbsp;"&nbsp;+&nbsp;executorStats.get_emitted_size());<br />}<br />}&nbsp;<span style="color: #0000FF; ">catch</span>&nbsp;(TTransportException&nbsp;e)&nbsp;{<br />e.printStackTrace();<br />}&nbsp;<span style="color: #0000FF; ">catch</span>&nbsp;(TException&nbsp;e)&nbsp;{<br />e.printStackTrace();<br />}&nbsp;<span style="color: #0000FF; ">catch</span>&nbsp;(NotAliveException&nbsp;e)&nbsp;{<br />e.printStackTrace();<br />}</div><br /><br /><br /><br /><img src ="http://www.blogjava.net/paulwong/aggbug/413479.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2014-05-09 23:48 <a href="http://www.blogjava.net/paulwong/archive/2014/05/09/413479.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>浅释STORM</title><link>http://www.blogjava.net/paulwong/archive/2014/05/09/413476.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Fri, 09 May 2014 14:56:00 
GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2014/05/09/413476.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/413476.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2014/05/09/413476.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/413476.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/413476.html</trackback:ping><description><![CDATA[STORM is a message processing engine. It handles a continuous stream of incoming messages, and the processing can be organized into steps.<br />
<br />
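The processing can be customized in several ways:<br />
<br />
<ol>
     <li>The message processing steps can be defined<br />
     <br />
     </li>
     <li>The number of processes that handle each type of message can be defined<br />
     <br />
     </li>
     <li>Messages at each step are processed by threads inside one of those processes<br />
     <br />
     </li>
     <li>The number of threads for the messages at each step can be defined<br />
     <br />
     </li>
     <li>Message types to be processed can be added and removed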
     </li>
</ol>
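<p>As a rough illustration of the points above (processing steps, and a configurable number of threads per step), the plain-Java sketch below models one source feeding one processing step. It does not use the Storm API: <code>MiniTopology</code>, <code>Spout</code>, <code>Bolt</code> and <code>run</code> are hypothetical names invented for this example, and it assumes a finite source inside a single JVM, whereas a real topology runs indefinitely across several worker processes.</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Plain-Java model of one source feeding one processing step.
// Spout, Bolt and MiniTopology are hypothetical names, not Storm classes.
class MiniTopology {
    interface Spout { String nextTuple(); }          // returns null once the source is drained
    interface Bolt { String execute(String tuple); } // processes one message

    // Keeps calling nextTuple() and hands each message to a pool of
    // boltThreads threads that run the bolt; returns the sorted results.
    static List<String> run(Spout spout, Bolt bolt, int boltThreads) {
        List<String> results = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService boltPool = Executors.newFixedThreadPool(boltThreads);
        String tuple;
        while ((tuple = spout.nextTuple()) != null) {
            final String current = tuple;
            boltPool.execute(() -> results.add(bolt.execute(current)));
        }
        boltPool.shutdown();
        try {
            boltPool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        List<String> sorted = new ArrayList<String>(results);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        Iterator<String> source = Arrays.asList("storm", "spout", "bolt").iterator();
        Spout spout = () -> source.hasNext() ? source.next() : null;
        Bolt upperCase = String::toUpperCase;
        System.out.println(run(spout, upperCase, 2)); // 2 threads for the bolt step
    }
}
```

<p>The results are sorted only to make the output deterministic; with more than one bolt thread, the order in which messages finish is not guaranteed.</p>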
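So what do you need to do to process a given kind of message?<br />
<br />
<ol>
     <li>Define the data source component (SPOUT)<br />
     <br />
     </li>
     <li>Define the processing steps (BOLT)<br />
     <br />
     </li>
     <li>Combine them into a message processing flow, the TOPOLOGY<br />
     <br />
     </li>
     <li>Define the number of processes that handle the messages and the number of threads each step may use concurrently<br />
     <br />
     </li>
     <li>Deploy the TOPOLOGY</li>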
</ol>When a TOPOLOGY is deployed, STORM reads the number of WORKERs from the configuration object and starts that many JVMs. For each step it then creates the threads (set by the step's parallelism, never more than its NUMTASKS) and instantiates the configured number of objects. A spout thread keeps calling the SPOUT's nextTuple() method; whenever that call emits a result, the result is handed to a thread of the next step, which passes it as the argument to that object's execute method.<br /><br />If another BOLT step follows, its method is likewise run on a separate thread; the number of threads started for a step never exceeds its NUMTASKS.<br /><br /><br /><br /><img src ="http://www.blogjava.net/paulwong/aggbug/413476.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2014-05-09 22:56 <a href="http://www.blogjava.net/paulwong/archive/2014/05/09/413476.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Storm performance</title><link>http://www.blogjava.net/paulwong/archive/2014/05/08/413391.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Thu, 08 May 2014 01:19:00 GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2014/05/08/413391.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/413391.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2014/05/08/413391.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/413391.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/413391.html</trackback:ping><description><![CDATA[<p style="line-height: 1.3em; margin: 11px 0px 10px; padding: 0px; font-family: Verdana, Arial, sans-serif;">The configuration is used to tune various aspects of the running topology. The two configurations specified here are very common:</p>
<ol style="font-family: Verdana, Arial, sans-serif; line-height: normal;">
     <li><strong>TOPOLOGY_WORKERS</strong>&nbsp;(set with&nbsp;<code>setNumWorkers</code>) specifies how many&nbsp;<em>processes</em>&nbsp;you want allocated around the cluster to execute the topology. Each component in the topology will execute as many&nbsp;<em>threads</em>. The number of threads allocated to a given component is configured through the&nbsp;<code>setBolt</code>&nbsp;and&nbsp;<code>setSpout</code>&nbsp;methods. Those&nbsp;<em>threads</em>&nbsp;exist within worker&nbsp;<em>processes</em>. Each worker&nbsp;<em>process</em>&nbsp;contains within it some number of&nbsp;<em>threads</em>&nbsp;for some number of components. For instance, you may have 300 threads specified across all your components and 50 worker processes specified in your config. Each worker process will execute 6 threads, each of which could belong to a different component. You tune the performance of Storm topologies by tweaking the parallelism for each component and the number of worker processes those threads should run within.</li>
     <li><strong>TOPOLOGY_DEBUG</strong>&nbsp;(set with&nbsp;<code>setDebug</code>), when set to true, tells Storm to log every message emitted by a component. This is useful in local mode when testing topologies, but you probably want to keep this turned off when running topologies on the cluster.</li>
</ol>
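<p>The arithmetic in the TOPOLOGY_WORKERS example (300 threads over 50 workers gives 6 threads per worker) can be checked with a small plain-Java sketch. It assumes threads are dealt out evenly, round-robin, across the worker processes; the actual placement is decided by Storm's scheduler and may differ. <code>WorkerThreadMath</code> and <code>threadsPerWorker</code> are hypothetical names for this illustration, not part of the Storm API.</p>

```java
import java.util.Arrays;

// Hypothetical helper, not part of the Storm API: models how a total number
// of component threads could be spread evenly over worker processes.
class WorkerThreadMath {

    // Returns how many threads each worker runs when totalThreads are dealt
    // out round-robin to numWorkers processes.
    static int[] threadsPerWorker(int totalThreads, int numWorkers) {
        if (numWorkers <= 0) {
            throw new IllegalArgumentException("numWorkers must be positive");
        }
        int[] counts = new int[numWorkers];
        for (int i = 0; i < totalThreads; i++) {
            counts[i % numWorkers]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] even = threadsPerWorker(300, 50);
        System.out.println("300 threads / 50 workers: " + even[0] + " per worker");
        int[] uneven = threadsPerWorker(10, 4);
        System.out.println("10 threads / 4 workers: " + Arrays.toString(uneven));
    }
}
```

<p>When the thread count is not a multiple of the worker count, as in the second call, some workers simply carry one thread more than the others.</p>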
<p style="line-height: 1.3em; margin: 11px 0px 10px; padding: 0px; font-family: Verdana, Arial, sans-serif;">There are many other configurations you can set for the topology. The various configurations are detailed on&nbsp;<a href="http://storm.incubator.apache.org/apidocs/backtype/storm/Config.html" style="color: #3366ff; text-decoration: none;">the Javadoc for Config</a>.<br />
<br />
<br />
</p>
<h3 style="font-family: Verdana, Arial, sans-serif; line-height: normal;">Common configurations</h3>
<p style="line-height: 1.3em; margin: 11px 0px 10px; padding: 0px; font-family: Verdana, Arial, sans-serif;"><br />
</p>
<p style="line-height: 1.3em; margin: 11px 0px 10px; padding: 0px; font-family: Verdana, Arial, sans-serif;">There are a variety of configurations you can set per topology. A list of all the configurations you can set can be found&nbsp;<a href="http://storm.incubator.apache.org/apidocs/backtype/storm/Config.html" style="color: #3366ff; text-decoration: none;">here</a>. The ones prefixed with "TOPOLOGY" can be overridden on a topology-specific basis (the other ones are cluster configurations and cannot be overridden). Here are some common ones that are set for a topology:</p>
<ol style="font-family: Verdana, Arial, sans-serif; line-height: normal;">
     <li><strong>Config.TOPOLOGY_WORKERS</strong>: This sets the number of worker processes to use to execute the topology. For example, if you set this to 25, there will be 25 Java processes across the cluster executing all the tasks. If you had a combined 150 parallelism across all components in the topology, each worker process will have 6 tasks running within it as threads.</li>
     <li><strong>Config.TOPOLOGY_ACKERS</strong>: This sets the number of tasks that will track tuple trees and detect when a spout tuple has been fully processed. Ackers are an integral part of Storm's reliability model and you can read more about them on <a href="http://storm.incubator.apache.org/documentation/Guaranteeing-message-processing.html" style="color: #3366ff; text-decoration: none;">Guaranteeing message processing</a>.</li>
     <li><strong>Config.TOPOLOGY_MAX_SPOUT_PENDING</strong>: This sets the maximum number of spout tuples that can be pending on a single spout task at once (pending means the tuple has not been acked or failed yet). It is highly recommended you set this config to prevent queue explosion.</li>
     <li><strong>Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS</strong>: This is the maximum amount of time a spout tuple has to be fully completed before it is considered failed. This value defaults to 30 seconds, which is sufficient for most topologies. See <a href="http://storm.incubator.apache.org/documentation/Guaranteeing-message-processing.html" style="color: #3366ff; text-decoration: none;">Guaranteeing message processing</a>&nbsp;for more information on how Storm's reliability model works.</li>
     <li><strong>Config.TOPOLOGY_SERIALIZATIONS</strong>: You can register more serializers to Storm using this config so that you can use custom types within tuples.<br />
     <br />
     </li>
</ol>
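<p>To make the interplay of TOPOLOGY_MAX_SPOUT_PENDING and TOPOLOGY_MESSAGE_TIMEOUT_SECS more concrete, here is a conceptual plain-Java model of a single spout task. It is only a sketch of the behaviour described above, not how Storm implements it: <code>PendingTracker</code> and its methods are hypothetical names, and time is passed in explicitly so that the example is deterministic.</p>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Conceptual model only: PendingTracker and its methods are hypothetical
// names for illustration, not Storm classes.
class PendingTracker {
    private final int maxPending;      // role of TOPOLOGY_MAX_SPOUT_PENDING
    private final long timeoutMillis;  // role of TOPOLOGY_MESSAGE_TIMEOUT_SECS
    private final Map<String, Long> emittedAt = new LinkedHashMap<String, Long>();

    PendingTracker(int maxPending, long timeoutMillis) {
        this.maxPending = maxPending;
        this.timeoutMillis = timeoutMillis;
    }

    // Records the tuple and returns true, or returns false when the cap is reached.
    boolean tryEmit(String tupleId, long nowMillis) {
        if (emittedAt.size() >= maxPending) {
            return false;
        }
        emittedAt.put(tupleId, nowMillis);
        return true;
    }

    // An acked tuple is fully processed and no longer pending.
    void ack(String tupleId) {
        emittedAt.remove(tupleId);
    }

    // Removes and returns the ids of tuples pending for longer than the timeout.
    List<String> expire(long nowMillis) {
        List<String> failed = new ArrayList<String>();
        Iterator<Map.Entry<String, Long>> it = emittedAt.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                failed.add(entry.getKey());
                it.remove();
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        PendingTracker tracker = new PendingTracker(2, 30_000L);
        System.out.println(tracker.tryEmit("a", 0L));      // cap not reached
        System.out.println(tracker.tryEmit("b", 1_000L));  // cap not reached
        System.out.println(tracker.tryEmit("c", 2_000L));  // two tuples already pending
        tracker.ack("a");
        System.out.println(tracker.tryEmit("c", 2_000L));  // room again after the ack
        System.out.println(tracker.expire(32_000L));       // only "b" is older than 30 s
    }
}
```

<p>The cap is what keeps queues from growing without bound: a spout that cannot emit simply waits until earlier tuples are acked or time out.</p>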
Reference:<br /><a href="http://storm.incubator.apache.org/documentation/Running-topologies-on-a-production-cluster.html" target="_blank">http://storm.incubator.apache.org/documentation/Running-topologies-on-a-production-cluster.html</a><br /><br /><div>storm rebalance 命令调整topology并行数及问题分析</div><a href="http://blog.csdn.net/jmppok/article/details/17243857" target="_blank">http://blog.csdn.net/jmppok/article/details/17243857</a><br /><br />flume+kafka+storm+mysql 数据流<br /><a href="http://blog.csdn.net/jmppok/article/details/17259145" target="_blank">http://blog.csdn.net/jmppok/article/details/17259145</a><br /><br /><br /><br /><a href="http://storm.incubator.apache.org/documentation/Tutorial.html" target="_blank">http://storm.incubator.apache.org/documentation/Tutorial.html</a><img src ="http://www.blogjava.net/paulwong/aggbug/413391.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2014-05-08 09:19 <a href="http://www.blogjava.net/paulwong/archive/2014/05/08/413391.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>安装STORM</title><link>http://www.blogjava.net/paulwong/archive/2014/05/04/413230.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Sun, 04 May 2014 10:01:00 GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2014/05/04/413230.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/413230.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2014/05/04/413230.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/413230.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/413230.html</trackback:ping><description><![CDATA[<ol>
     <li>
     install ZeroMQ<br />
     <div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br />
     <br />
     Code highlighting produced by Actipro CodeHighlighter (freeware)<br />
     http://www.CodeHighlighter.com/<br />
     <br />
     -->wget&nbsp;http://download.zeromq.org/historic/zeromq-<span style="color: #800000; ">2.1</span>.<span style="color: #800000; ">7</span>.tar.gz<br />
     tar&nbsp;-xzf&nbsp;zeromq-<span style="color: #800000; ">2.1</span>.<span style="color: #800000; ">7</span>.tar.gz<br />
     cd&nbsp;zeromq-<span style="color: #800000; ">2.1</span>.<span style="color: #800000; ">7</span><br />
     ./configure<br />
     #&nbsp;If&nbsp;configure&nbsp;reports&nbsp;missing&nbsp;packages,&nbsp;install&nbsp;them&nbsp;with:&nbsp;sudo&nbsp;apt-get&nbsp;install&nbsp;g++&nbsp;uuid-dev<br />
     make<br />
     sudo&nbsp;make&nbsp;install</div>
     </li>
     <li>
     install JZMQ<br />
     <div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br />
     <br />
     Code highlighting produced by Actipro CodeHighlighter (freeware)<br />
     http://www.CodeHighlighter.com/<br />
     <br />
     -->git&nbsp;clone&nbsp;https://github.com/nathanmarz/jzmq.git<br />
     cd&nbsp;jzmq<br />
     ./autogen.sh<br />
     ./configure<br />
     make<br />
     sudo&nbsp;make&nbsp;install</div>
     <br />
     </li>
     <li>
     Download and unpack STORM<br />
     <br />
     </li>
     <li>
     Edit conf/storm.yaml<br />
     <div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br />
     <br />
     Code highlighting produced by Actipro CodeHighlighter (freeware)<br />
     http://www.CodeHighlighter.com/<br />
     <br />
     -->storm.zookeeper.servers:<br />
     -&nbsp;<span style="font-weight: bold;">"</span><span style="font-weight: bold;">1.2.3.5</span><span style="font-weight: bold;">"</span><br />
     -&nbsp;<span style="font-weight: bold;">"</span><span style="font-weight: bold;">1.2.3.6</span><span style="font-weight: bold;">"</span><br />
     -&nbsp;<span style="font-weight: bold;">"</span><span style="font-weight: bold;">1.2.3.7</span><span style="font-weight: bold;">"</span><br />
     storm.<span style="color: #0000FF; ">local</span>.dir:&nbsp;<span style="font-weight: bold;">"</span><span style="font-weight: bold;">/opt/folder</span><span style="font-weight: bold;">"</span><br />
     nimbus.host:&nbsp;<span style="font-weight: bold;">"</span><span style="font-weight: bold;">54.72.4.92</span><span style="font-weight: bold;">"</span><br />
     supervisor.slots.ports:<br />
     -&nbsp;<span style="color: #800000; ">6700</span><br />
     -&nbsp;<span style="color: #800000; ">6701</span><br />
     -&nbsp;<span style="color: #800000; ">6702</span></div>
     </li>
     <li>Edit /etc/profile<br />
     <div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br />
     <br />
     Code highlighting produced by Actipro CodeHighlighter (freeware)<br />
     http://www.CodeHighlighter.com/<br />
     <br />
     -->export&nbsp;JAVA_HOME=/usr/lib/jvm/java-7-oracle<br />
     export&nbsp;STORM_HOME=/home/ubuntu/java/storm-0.8.1<br />
     export&nbsp;KAFKA_HOME=/home/ubuntu/java/kafka_2.9.2-0.8.1.1<br />
     export&nbsp;ZOOKEEPER_HOME=/home/ubuntu/java/zookeeper-3.4.6<br />
     <br />
     export&nbsp;PATH=$JAVA_HOME/bin:$STORM_HOME/bin:$KAFKA_HOME/bin:$ZOOKEEPER_HOME/bin:$PATH</div>
     </li><br />
     <li>
     Create a startup script: start-storm.sh<br /><div style="background-color:#eeeeee;font-size:13px;border:1px solid #CCCCCC;padding-right: 5px;padding-bottom: 4px;padding-left: 4px;padding-top: 4px;width: 98%;word-break:break-all"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />-->storm&nbsp;nimbus&nbsp;&amp;<br />storm&nbsp;supervisor&nbsp;&amp;<br />storm&nbsp;ui&nbsp;&amp;</div><br />
     </li>
</ol>
If you run into problems during installation, see:<br />
<a href="http://my.oschina.net/mingdongcheng/blog/43009" target="_blank">http://my.oschina.net/mingdongcheng/blog/43009</a><img src ="http://www.blogjava.net/paulwong/aggbug/413230.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2014-05-04 18:01 <a href="http://www.blogjava.net/paulwong/archive/2014/05/04/413230.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Starting STORM and deploying a TOPOLOGY</title><link>http://www.blogjava.net/paulwong/archive/2013/09/11/403942.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Wed, 11 Sep 2013 03:00:00 GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2013/09/11/403942.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/403942.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2013/09/11/403942.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/403942.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/403942.html</trackback:ping><description><![CDATA[<ol><li>Start ZOOKEEPER<div style="padding: 4px 5px 4px 4px; border: 1px solid rgb(204, 204, 204); width: 98%; font-size: 13px; -ms-word-break: break-all; background-color: rgb(238, 238, 238);"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" /><span style="color: rgb(0, 0, 0);">zkServer.sh&nbsp;start</span></div></li><li>Start NIMBUS<div style="padding: 4px 5px 4px 4px; border: 1px solid rgb(204, 204, 204); width: 98%; font-size: 13px; -ms-word-break: break-all; background-color: rgb(238, 238, 238);"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter 
(freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" /><span style="color: rgb(0, 0, 0);">storm&nbsp;nimbus&nbsp;&amp;</span></div></li><li>Start the SUPERVISOR<div style="padding: 4px 5px 4px 4px; border: 1px solid rgb(204, 204, 204); width: 98%; font-size: 13px; -ms-word-break: break-all; background-color: rgb(238, 238, 238);"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" /><span style="color: rgb(0, 0, 0);">storm&nbsp;supervisor&nbsp;&amp;</span></div></li><li>Start the UI<div style="padding: 4px 5px 4px 4px; border: 1px solid rgb(204, 204, 204); width: 98%; font-size: 13px; -ms-word-break: break-all; background-color: rgb(238, 238, 238);"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" /><span style="color: rgb(0, 0, 0);">storm ui &amp;</span></div></li><li>Deploy a TOPOLOGY<div style="padding: 4px 5px 4px 4px; border: 1px solid rgb(204, 204, 204); width: 98%; font-size: 13px; -ms-word-break: break-all; background-color: rgb(238, 238, 238);"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" /><span style="color: rgb(0, 0, 0);">storm jar /opt/hadoop/loganalyst/storm-dependend/data/teststorm-1.0.jar teststorm.TopologyMain /opt/hadoop/loganalyst/storm-dependend/data/words.txt</span></div></li><li>Kill (remove) a TOPOLOGY<div style="padding: 4px 5px 4px 4px; border: 1px solid rgb(204, 204, 204); width: 98%; font-size: 13px; -ms-word-break: break-all; background-color: rgb(238, 238, 
238);"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" alt="" src="http://www.blogjava.net/images/OutliningIndicators/None.gif" /><span style="color: rgb(0, 0, 0);">storm&nbsp;kill&nbsp;{toponame}</span></div></li><li>Activate a TOPOLOGY<div style="padding: 4px 5px 4px 4px; border: 1px solid rgb(204, 204, 204); width: 98%; font-size: 13px; -ms-word-break: break-all; background-color: rgb(238, 238, 238);"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" /><span style="color: rgb(0, 0, 0);">storm&nbsp;activate&nbsp;{toponame}</span></div></li><li>Deactivate a TOPOLOGY<div style="padding: 4px 5px 4px 4px; border: 1px solid rgb(204, 204, 204); width: 98%; font-size: 13px; -ms-word-break: break-all; background-color: rgb(238, 238, 238);"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" /><span style="color: rgb(0, 0, 0);">storm&nbsp;deactivate&nbsp;{toponame}</span></div></li><li>List all TOPOLOGIES<div style="padding: 4px 5px 4px 4px; border: 1px solid rgb(204, 204, 204); width: 98%; font-size: 13px; -ms-word-break: break-all; background-color: rgb(238, 238, 238);"><!--<br /><br />Code highlighting produced by Actipro CodeHighlighter (freeware)<br />http://www.CodeHighlighter.com/<br /><br />--><img align="top" src="http://www.blogjava.net/images/OutliningIndicators/None.gif"  alt="" /><span style="color: rgb(0, 0, 0);">storm&nbsp;list</span></div><br /><br /><br /></li></ol> 
 <img src ="http://www.blogjava.net/paulwong/aggbug/403942.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2013-09-11 11:00 <a href="http://www.blogjava.net/paulwong/archive/2013/09/11/403942.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>STORM资源</title><link>http://www.blogjava.net/paulwong/archive/2013/09/08/403826.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Sun, 08 Sep 2013 11:59:00 GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2013/09/08/403826.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/403826.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2013/09/08/403826.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/403826.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/403826.html</trackback:ping><description><![CDATA[Install Storm<br /><a href="http://www.jansipke.nl/installing-a-storm-cluster-on-centos-hosts/" target="_blank">http://www.jansipke.nl/installing-a-storm-cluster-on-centos-hosts/</a><br /><a href="http://www.cnblogs.com/kemaswill/archive/2012/10/24/2737833.html" target="_blank">http://www.cnblogs.com/kemaswill/archive/2012/10/24/2737833.html</a><br /><a href="http://abentotoro.blog.sohu.com/197023262.html" target="_blank">http://abentotoro.blog.sohu.com/197023262.html</a><br /><a href="http://www.cnblogs.com/panfeng412/archive/2012/11/30/how-to-install-and-deploy-storm-cluster.html" target="_blank">http://www.cnblogs.com/panfeng412/archive/2012/11/30/how-to-install-and-deploy-storm-cluster.html</a><br /><br /><br />使用 Twitter Storm 处理实时的大数据<br /><a href="http://www.ibm.com/developerworks/cn/opensource/os-twitterstorm/" 
target="_blank">http://www.ibm.com/developerworks/cn/opensource/os-twitterstorm/</a><br /><br /><br />Analysis and discussion of the Storm data stream model<br /><a href="http://www.cnblogs.com/panfeng412/archive/2012/07/29/storm-stream-model-analysis-and-discussion.html" target="_blank">http://www.cnblogs.com/panfeng412/archive/2012/07/29/storm-stream-model-analysis-and-discussion.html</a><br /><a href="http://www.cnblogs.com/panfeng412/tag/Storm/" target="_blank">http://www.cnblogs.com/panfeng412/tag/Storm/</a><br /><br /><br />storm-kafka<br /><a href="https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka" target="_blank">https://github.com/nathanmarz/storm-contrib/tree/master/storm-kafka</a><br /><br /><br />Real-time big data analysis with Storm!<br /><a href="http://www.csdn.net/article/2012-12-24/2813117-storm-realtime-big-data-analysis" target="_blank">http://www.csdn.net/article/2012-12-24/2813117-storm-realtime-big-data-analysis</a><br /><br /><br />storm-deploy-aws<br /><a href="https://github.com/nathanmarz/storm-deploy/wiki" target="_blank">https://github.com/nathanmarz/storm-deploy/wiki</a><br /><br /><br />!!! Twitter Storm on Zhihu<br /><a href="http://www.zhihu.com/topic/19673110" target="_blank">http://www.zhihu.com/topic/19673110</a><br /><br /><br />storm-elastic-search<br /><a href="https://github.com/hmsonline/storm-elastic-search" target="_blank">https://github.com/hmsonline/storm-elastic-search</a><br /><br /><br />storm-examples<br /><a href="https://github.com/stormprocessor/storm-examples" target="_blank">https://github.com/stormprocessor/storm-examples</a><br /><br /><br />kafka-aws<br /><a href="https://github.com/nathanmarz/kafka-deploy" target="_blank">https://github.com/nathanmarz/kafka-deploy</a><br /> 
 
 
 
 
<br /><br />Next Gen Real-time Streaming with Storm-Kafka Integration<br /><a href="http://blog.infochimps.com/2012/10/30/next-gen-real-time-streaming-storm-kafka-integration/" target="_blank">http://blog.infochimps.com/2012/10/30/next-gen-real-time-streaming-storm-kafka-integration/</a><br /><br /><br />flume+kafka+storm+mysql data flow<br /><a href="http://blog.csdn.net/baiyangfu/article/details/8096088" target="_blank">http://blog.csdn.net/baiyangfu/article/details/8096088</a><br /><a href="http://blog.csdn.net/baiyangfu/article/category/1244640" target="_blank">http://blog.csdn.net/baiyangfu/article/category/1244640</a><br /><br /><br />Kafka study notes<br /><a href="http://blog.csdn.net/baiyangfu/article/details/8096084" target="_blank">http://blog.csdn.net/baiyangfu/article/details/8096084</a><br /><br /><br />STORM+KAFKA<br /><a href="https://github.com/buildlackey/cep" target="_blank">https://github.com/buildlackey/cep</a><br /><br /><br />STORM+KETTLE<br /><a href="https://github.com/buildlackey/kettle-storm" target="_blank">https://github.com/buildlackey/kettle-storm</a><br /><br /><img src ="http://www.blogjava.net/paulwong/aggbug/403826.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2013-09-08 19:59 <a href="http://www.blogjava.net/paulwong/archive/2013/09/08/403826.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>STORM vs. HADOOP: A Comparison</title><link>http://www.blogjava.net/paulwong/archive/2013/09/08/403824.html</link><dc:creator>paulwong</dc:creator><author>paulwong</author><pubDate>Sun, 08 Sep 2013 11:49:00 
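GMT</pubDate><guid>http://www.blogjava.net/paulwong/archive/2013/09/08/403824.html</guid><wfw:comment>http://www.blogjava.net/paulwong/comments/403824.html</wfw:comment><comments>http://www.blogjava.net/paulwong/archive/2013/09/08/403824.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/paulwong/comments/commentRss/403824.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/paulwong/services/trackbacks/403824.html</trackback:ping><description><![CDATA[Given a pile of data that keeps growing, what approaches are there for computing statistics over it?<br /><ol><li>Wait until the data has grown to a certain size, then run a statistics program over it. This suits scenarios without strict real-time requirements.<br />For example, load the data into HDFS and then run a MapReduce job.<br /></li><li>When real-time requirements are strict, the approach above no longer works, which leads to the second approach.<br />Each time a new record arrives, run the statistics job on it and write the result to a database or a search engine index.<br />This is the kind of work STORM is built for.</li></ol><br />HADOOP vs. STORM<br /><ol><li>Data source: HADOOP works on the data under an HDFS directory, possibly terabytes of it; STORM works on each newly arriving record in real time.</li><li>Processing: HADOOP runs a MAP phase followed by a REDUCE phase; in STORM the user defines the processing flow,<br />which can contain multiple steps, each being either a data source (SPOUT) or processing logic (BOLT).</li><li>Termination: a HADOOP job eventually finishes; STORM has no finished state. After the last step it simply waits there until new data arrives, then starts again from the first step.</li><li>Speed: HADOOP is designed to process large volumes of data on HDFS, so it is slow; STORM only has to process each newly added record, so it can be very fast.</li><li>Use cases: HADOOP is for processing a batch of data when timeliness does not matter; you submit a job whenever processing is needed. STORM is for processing each newly added record when timeliness matters.<br /></li><li>Comparison with MQ: HADOOP has nothing comparable; STORM can be viewed as N steps, where each step, once finished, sends a message to the next MQ, and the consumer listening on that MQ continues the processing.<br /><br /></li></ol><img src ="http://www.blogjava.net/paulwong/aggbug/403824.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/paulwong/" target="_blank">paulwong</a> 2013-09-08 19:49 <a href="http://www.blogjava.net/paulwong/archive/2013/09/08/403824.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>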