﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-Big Data Road-文章分类-Storm</title><link>http://www.blogjava.net/xuhongxing016/category/50624.html</link><description /><language>zh-cn</language><lastBuildDate>Thu, 26 Jan 2012 13:22:05 GMT</lastBuildDate><pubDate>Thu, 26 Jan 2012 13:22:05 GMT</pubDate><ttl>60</ttl><item><title>Clojure DSL</title><link>http://www.blogjava.net/xuhongxing016/articles/368750.html</link><dc:creator>徐红星 </dc:creator><author>徐红星 </author><pubDate>Thu, 19 Jan 2012 07:38:00 GMT</pubDate><guid>http://www.blogjava.net/xuhongxing016/articles/368750.html</guid><wfw:comment>http://www.blogjava.net/xuhongxing016/comments/368750.html</wfw:comment><comments>http://www.blogjava.net/xuhongxing016/articles/368750.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/xuhongxing016/comments/commentRss/368750.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/xuhongxing016/services/trackbacks/368750.html</trackback:ping><description><![CDATA[<div>&nbsp;&nbsp;&nbsp;&nbsp; <span style="font-size: 10pt;">Storm comes with a Clojure DSL for defining spouts, bolts, and topologies. The Clojure DSL has access to everything the public Java API exposes, so if you are a Clojure user you can write Storm topologies directly in Clojure without touching Java. The Clojure DSL is defined in the backtype.storm.clojure namespace.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; This page outlines all the pieces of the Clojure DSL, including:<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 1. Defining topologies<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 2. Defining bolts<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 3. Defining spouts<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 4. Running topologies in local mode or on a cluster<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 5. Testing topologies<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <div><span style="font-size: 
10pt;"><h3>Defining topologies</h3></span></div>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; To define a topology, use the topology function. topology takes two arguments: a map of &#8220;spout specs&#8221; and a map of &#8220;bolt specs&#8221;. Each spout and bolt spec wires the code for a component into the topology by specifying things like its inputs and parallelism.<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Let's look at an example topology definition from the storm-starter project:<br />(topology<br />&nbsp;{"1" (spout-spec sentence-spout)<br />&nbsp; "2" (spout-spec (sentence-spout-parameterized<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; ["the cat jumped over the door"<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; "greetings from a faraway land"])<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; :p 2)}<br />&nbsp;{"3" (bolt-spec {"1" :shuffle "2" :shuffle}<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; split-sentence<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; :p 5)<br />&nbsp; "4" (bolt-spec {"3" ["word"]}<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; word-count<br />&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; :p 6)})<br />The maps of spout and bolt specs are maps from the component id to the corresponding spec. Component ids must be unique across both maps. Just like when defining topologies in Java, component ids are used when declaring the inputs for bolts in the topology.<br /><div> <div> <div id="wiki-body"  instapaper_body"=""> <div>  <h4>spout-spec</h4> <p><code>spout-spec</code> takes as arguments the spout implementation and optional keyword arguments. The only option that currently exists is <code>:p</code>, which specifies the parallelism for the spout. If you omit <code>:p</code>, the spout will execute as a single task.</p> <h4>bolt-spec</h4> <p><span style="font-size: 10pt;"></span></p><div><span style="font-size: 10pt;"><code>bolt-spec</code></span> takes as arguments the input declaration for the bolt, the bolt implementation, and optional keyword arguments<code></code>. The input declaration is a map from stream ids to stream groupings. A stream id can take one of two forms:</div><p>&nbsp;</p><ol><li><code>[==component id== ==stream id==]</code>: 
subscribes to a specific stream on a component<br /></li><li><code>==component id==</code>: subscribes to the default stream on a component<br /></li></ol> <p>A stream grouping can be one of the following:</p> <ol><li><code>:shuffle</code>: subscribes with a shuffle grouping<br /></li><li>Vector of field names, like <code>["id" "name"]</code>: subscribes with a fields grouping on the specified fields</li><li><code>:global</code>: subscribes with a global grouping</li><li><code>:all</code>: subscribes with an all grouping</li><li><code>:direct</code>: subscribes with a direct grouping</li></ol> <p>See <a present"="" href="/nathanmarz/storm/wiki/Concepts">Concepts</a> for more information on stream groupings. Here is an example input declaration showing the different ways to declare inputs:</p>  <div><pre>{["2" "1"] :shuffle  "3" ["field1" "field2"]  ["4" "2"] :global} <br />This input declaration subscribes to a total of three streams. It subscribes to stream "1" on component "2" with a shuffle grouping, to the default stream on component "3" with a fields grouping on the fields "field1" and "field2", and to stream "2" on component "4" with a global grouping.<br />Like spout-spec, the only keyword argument currently supported by bolt-spec is :p, which specifies the parallelism for the bolt.<br /></pre></div><h4>shell-bolt-spec</h4> <p><code>shell-bolt-spec</code> is used for defining bolts that are implemented in a non-JVM language. It takes as arguments the input declaration, the command line program to run, the name of the file implementing the bolt, an output  specification, and then the same keyword arguments that <code>bolt-spec</code>  accepts.</p> <p>Here is an example of a <code>shell-bolt-spec</code>:</p>  <div><pre>(shell-bolt-spec {"1" :shuffle "2" ["id"]}  "python"  "mybolt.py"  ["outfield1" "outfield2"]  :p 25) </pre></div> <p>&nbsp;</p><div>The syntax of output declarations is described in more detail in the defbolt section below. For more details on how multilang works within Storm, see Using non-JVM languages with Storm.</div><br /><p>&nbsp;</p> <h3>defbolt</h3> <p><code>defbolt</code> is used for defining bolts in Clojure. Bolts have the  constraint that they must be serializable, and this is why you can't just reify  <code>IRichBolt</code> to implement a bolt (closures aren't serializable).  <code>defbolt</code> works around this restriction and provides a nicer syntax  for defining bolts than just implementing a Java interface.</p> <p>At its fullest expressiveness, <code>defbolt</code> supports parameterized  bolts and maintaining state in a closure around the bolt implementation. It also  provides shortcuts for defining bolts that don't need this extra functionality.  
The signature for <code>defbolt</code> looks like the following:</p> <p>(defbolt <em>name</em> <em>output-declaration</em> *<em>option-map</em> &amp;  <em>impl</em>)</p> <p>Omitting the option map is equivalent to having an option map of  <code>{:prepare false}</code>.</p> <h4>Simple bolts</h4> <p>Let's start with the simplest form of <code>defbolt</code>. Here's an example  bolt that splits a tuple containing a sentence into a tuple for each word:</p>  <div><pre>(defbolt split-sentence ["word"] [tuple collector]  (let [words (.split (.getString tuple 0) " ")]  (doseq [w words]  (emit-bolt! collector [w] :anchor tuple))  (ack! collector tuple)  )) </pre></div> <p>Since the option map is omitted, this is a non-prepared bolt. The DSL simply  expects an implementation for the <code>execute</code> method of  <code>IRichBolt</code>. The implementation takes two parameters, the tuple and  the <code>OutputCollector</code>, and is followed by the body of the  <code>execute</code> function. The DSL automatically type-hints the parameters  for you so you don't need to worry about reflection if you use Java interop.</p> <p>This implementation binds <code>split-sentence</code> to an actual  <code>IRichBolt</code> object that you can use in topologies, like so:</p>  <div><pre>(bolt-spec {"1" :shuffle}  split-sentence  :p 5) </pre></div> <h4>Parameterized bolts</h4> <p>Many times you want to parameterize your bolts with other arguments. For  example, let's say you wanted to have a bolt that appends a suffix to every  input string it receives, and you want that suffix to be set at runtime. You do  this with <code>defbolt</code> by including a <code>:params</code> option in the  option map, like so:</p>  <div><pre>(defbolt suffix-appender ["word"] {:params [suffix]}  [tuple collector]  (emit-bolt! 
collector [(str (.getString tuple 0) suffix)] :anchor tuple)  ) </pre></div> <p>Unlike the previous example, <code>suffix-appender</code> will be bound to a  function that returns an <code>IRichBolt</code> rather than be an  <code>IRichBolt</code> object directly. This is caused by specifying  <code>:params</code> in its option map. So to use <code>suffix-appender</code>  in a topology, you would do something like:</p>  <div><pre>(bolt-spec {"1" :shuffle}  (suffix-appender "-suffix")  :p 10) </pre></div> <h4>Prepared bolts</h4> <p>To do more complex bolts, such as ones that do joins and streaming  aggregations, the bolt needs to store state. You can do this by creating a  prepared bolt which is specified by including <code>{:prepare true}</code> in  the option map. Consider, for example, this bolt that implements word  counting:</p>  <div><pre>(defbolt word-count ["word" "count"] {:prepare true}  [conf context collector]  (let [counts (atom {})]  (bolt  (execute [tuple]  (let [word (.getString tuple 0)]  (swap! counts (partial merge-with +) {word 1})  (emit-bolt! collector [word (@counts word)] :anchor tuple)  (ack! collector tuple)  ))))) </pre></div> <p>The implementation for a prepared bolt is a function that takes as input the  topology config, <code>TopologyContext</code>, and <code>OutputCollector</code>,  and returns an implementation of the <code>IBolt</code> interface. This design  allows you to have a closure around the implementation of <code>execute</code>  and <code>cleanup</code>. </p> <p>In this example, the word counts are stored in the closure in a map called  <code>counts</code>. The <code>bolt</code> macro is used to create the  <code>IBolt</code> implementation. The <code>bolt</code> macro is a more concise  way to implement the interface than reifying, and it automatically type-hints  all of the method parameters. 
This bolt implements the execute method which  updates the count in the map and emits the new word count.</p> <p>Note that the <code>execute</code> method in prepared bolts only takes as  input the tuple since the <code>OutputCollector</code> is already in the closure  of the function (for simple bolts the collector is a second parameter to the  <code>execute</code> function).</p> <p>Prepared bolts can be parameterized just like simple bolts.</p> <h4>Output declarations</h4> <p>The Clojure DSL has a concise syntax for declaring the outputs of a bolt. The  most general way to declare the outputs is as a map from stream id to a stream  spec. For example:</p>  <div><pre>{"1" ["field1" "field2"]  "2" (direct-stream ["f1" "f2" "f3"])  "3" ["f1"]} </pre></div> <p>The stream id is a string, while the stream spec is either a vector of fields  or a vector of fields wrapped by <code>direct-stream</code>. <code>direct-stream</code> marks the stream as a direct stream (See <a present"="" href="/nathanmarz/storm/wiki/Concepts">Concepts</a> and  <a absent"="" href="/nathanmarz/storm/wiki/Direct-groupings">Direct  groupings</a> for more details on direct streams).</p> <p>If the bolt only has one output stream, you can define the default stream of  the bolt by using a vector instead of a map for the output declaration. For  example:</p>  <div><pre>["word" "count"] </pre></div>This declares the output of the bolt as the fields ["word" "count"]  on the default stream id.  <h4>Emitting, acking, and failing</h4> <p>Rather than use the Java methods on <code>OutputCollector</code> directly,  the DSL provides a nicer set of functions for using  <code>OutputCollector</code>: <code>emit-bolt!</code>,  <code>emit-direct-bolt!</code>, <code>ack!</code>, and <code>fail!</code>.</p> <ol><li><code>emit-bolt!</code>: takes as parameters the  <code>OutputCollector</code>, the values to emit (a Clojure sequence), and  keyword arguments for <code>:anchor</code> and <code>:stream</code>.  
<code>:anchor</code> can be a single tuple or a list of tuples, and  <code>:stream</code> is the id of the stream to emit to. Omitting the keyword  arguments emits an unanchored tuple to the default stream.</li><li><code>emit-direct-bolt!</code>: takes as parameters the  <code>OutputCollector</code>, the task id to send the tuple to, the values to  emit, and keyword arguments for <code>:anchor</code> and <code>:stream</code>.  This function can only emit to streams declared as direct streams.</li><li><code>ack!</code>: takes as parameters the <code>OutputCollector</code> and  the tuple to ack.</li><li><code>fail!</code>: takes as parameters the <code>OutputCollector</code> and  the tuple to fail.</li></ol> <p>See <a present"="" href="/nathanmarz/storm/wiki/Guaranteeing-message-processing">Guaranteeing  message processing</a> for more info on acking and anchoring.</p> <h3>defspout</h3> <p><code>defspout</code> is used for defining spouts in Clojure. Like bolts,  spouts must be serializable so you can't just reify <code>IRichSpout</code> to  do spout implementations in Clojure. <code>defspout</code> works around this  restriction and provides a nicer syntax for defining spouts than just  implementing a Java interface.</p> <p>The signature for <code>defspout</code> looks like the following:</p> <p>(defspout <em>name</em> <em>output-declaration</em> *<em>option-map</em>  &amp; <em>impl</em>)</p> <p>If you leave out the option map, it defaults to {:prepare true}. 
The output  declaration for <code>defspout</code> has the same syntax as  <code>defbolt</code>.</p> <p>Here's an example <code>defspout</code> implementation from <a href="https://github.com/nathanmarz/storm-starter/blob/master/src/clj/storm/starter/clj/word_count.clj">storm-starter</a>:</p>  <div><pre>(defspout sentence-spout ["sentence"]  [conf context collector]  (let [sentences ["a little brown dog"  "the man petted the dog"  "four score and seven years ago"  "an apple a day keeps the doctor away"]]  (spout  (nextTuple []  (Thread/sleep 100)  (emit-spout! collector [(rand-nth sentences)])   )  (ack [id]  ;; You only need to define this method for reliable spouts  ;; (such as one that reads off of a queue like Kestrel)  ;; This is an unreliable spout, so it does nothing here  )))) </pre></div> <p>The implementation takes in as input the topology config,  <code>TopologyContext</code>, and <code>SpoutOutputCollector</code>. The  implementation returns an <code>ISpout</code> object. Here, the  <code>nextTuple</code> function emits a random sentence from  <code>sentences</code>. </p> <p>This spout isn't reliable, so the <code>ack</code> and <code>fail</code>  methods will never be called. A reliable spout will add a message id when  emitting tuples, and then <code>ack</code> or <code>fail</code> will be called  when the tuple is completed or failed respectively. See <a present"="" href="/nathanmarz/storm/wiki/Guaranteeing-message-processing">Guaranteeing  message processing</a> for more info on how reliability works within Storm.</p> <p><code>emit-spout!</code> takes in as parameters the  <code>SpoutOutputCollector</code> and the new tuple to be emitted, and accepts  as keyword arguments <code>:stream</code> and <code>:id</code>.  <code>:stream</code> specifies the stream to emit to, and <code>:id</code>  specifies a message id for the tuple (used in the <code>ack</code> and  <code>fail</code> callbacks). 
Omitting these arguments emits an unanchored tuple  to the default output stream.</p> <p>There is also a <code>emit-direct-spout!</code> function that emits a tuple  to a direct stream and takes an additional argument as the second parameter of  the task id to send the tuple to.</p> <p>Spouts can be parameterized just like bolts, in which case the symbol is  bound to a function returning <code>IRichSpout</code> instead of the  <code>IRichSpout</code> itself. You can also declare an unprepared spout which  only defines the <code>nextTuple</code> method. Here is an example of an  unprepared spout that emits random sentences parameterized at runtime:</p>  <div><pre>(defspout sentence-spout-parameterized ["word"] {:params [sentences] :prepare false}  [collector]  (Thread/sleep 500)  (emit-spout! collector [(rand-nth sentences)])) </pre></div> <p>The following example illustrates how to use this spout in a  <code>spout-spec</code>:</p>  <div><pre>(spout-spec (sentence-spout-parameterized  ["the cat jumped over the door"  "greetings from a faraway land"])  :p 2) </pre></div> <h3>Running topologies in local mode or on a cluster</h3> <p>That's all there is to the Clojure DSL. To submit topologies in remote mode  or local mode, just use the <code>StormSubmitter</code> or  <code>LocalCluster</code> classes just like you would from Java.</p> <p>To create topology configs, it's easiest to use the  <code>backtype.storm.config</code> namespace which defines constants for all of  the possible configs. The constants are the same as the static constants in the  <code>Config</code> class, except with dashes instead of underscores. 
For  example, here's a topology config that sets the number of workers to 15 and  configures the topology in debug mode:</p>  <div><pre>{TOPOLOGY-DEBUG true  TOPOLOGY-WORKERS 15} </pre></div> <h3>Testing topologies</h3> <p><a href="http://www.pixelmachine.org/2011/12/17/Testing-Storm-Topologies.html">This  blog post</a> and its <a href="http://www.pixelmachine.org/2011/12/21/Testing-Storm-Topologies-Part-2.html">follow-up</a>  give a good overview of Storm's powerful built-in facilities for testing  topologies in Clojure.</p></div></div></div></div></span><strong style="font-size: 10pt;"><br /> </strong></div><img src ="http://www.blogjava.net/xuhongxing016/aggbug/368750.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/xuhongxing016/" target="_blank">徐红星 </a> 2012-01-19 15:38 <a href="http://www.blogjava.net/xuhongxing016/articles/368750.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Storm 序列化</title><link>http://www.blogjava.net/xuhongxing016/articles/368731.html</link><dc:creator>徐红星 </dc:creator><author>徐红星 </author><pubDate>Thu, 19 Jan 2012 06:29:00 GMT</pubDate><guid>http://www.blogjava.net/xuhongxing016/articles/368731.html</guid><wfw:comment>http://www.blogjava.net/xuhongxing016/comments/368731.html</wfw:comment><comments>http://www.blogjava.net/xuhongxing016/articles/368731.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/xuhongxing016/comments/commentRss/368731.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/xuhongxing016/services/trackbacks/368731.html</trackback:ping><description><![CDATA[<div>&nbsp; This article is translated from the official Storm wiki. Reposting is welcome; please credit the source:<div><a id="Editor_Edit_hlEntryLink" title="view: Storm 序列化" href="../articles/368731.html" target="_blank">&nbsp;&nbsp; http://www.blogjava.net/xuhongxing016/articles/368731.html</a></div>&nbsp;&nbsp; This is my first translation; readers comfortable with English may prefer the original document: 
https://github.com/nathanmarz/storm/wiki/Serialization<br /><div>&nbsp;&nbsp;&nbsp; This page describes the serialization system in Storm 0.6.0 and later. Earlier versions of Storm used a different serialization system.<br /><div><span id="result_box"><span title="Tuples can be comprised of objects of any types.">&nbsp;&nbsp;&nbsp; Tuples can be composed of objects of any type. </span><span title="Since Storm is a distributed system, it needs to know how to serialize and deserialize objects when they're passed between tasks.">Since Storm is a distributed system, it needs to know how to serialize and deserialize objects when they are passed between tasks</span>.<br /><div><span id="result_box"><span title="Storm uses Kryo for serialization.">&nbsp;&nbsp;&nbsp; Storm uses Kryo for serialization. </span><span title="Kryo is a flexible and fast serialization library that produces small serializations.">Kryo is a flexible and fast serialization library that produces small serializations. </span><span title="By default, Storm can serialize primitive types, strings, byte arrays, ArrayList, HashMap, HashSet, and the Clojure collection types.">By default, Storm can serialize primitive types, strings, byte arrays, ArrayList, HashMap, HashSet, and the Clojure collection types. </span><span title="If you want to use another type in your tuples, you'll need to register a custom serializer.">If you want to use another type in your tuples, you need to register a custom serializer.<br /><br /></span><span title="Dynamic typing"><strong>Dynamic typing</strong></span><span title="Dynamic typing"><br /></span><span title="There are no type declarations for fields in a Tuple.">There are no type declarations for fields in a Tuple. </span><span title="You put objects in fields and Storm figures out the serialization dynamically.">You put objects in fields and Storm figures out the serialization <div><span id="result_box"><span title="You put objects in fields and Storm figures out the serialization dynamically.">dynamically</span>.<br /><span title="Before we get to the interface for serialization, let's spend a moment understanding why Storm's tuples are dynamically typed.">Before we get to the interface for serialization, let's spend a moment understanding why Storm's tuples are dynamically typed.<br /><div><span id="result_box"><span title="Adding static typing to tuple fields would add large amount of complexity to Storm's API.">Adding static typing to tuple fields would add a large amount of complexity to Storm's API. </span><span title="Hadoop, for example, statically types its keys and 
values but requires a huge amount of annotations on the part of the user.">Hadoop, for example, statically types its keys and values but requires a huge amount of annotations on the part of the user. </span><span title="Hadoop's API is a burden to use and the &quot;type safety&quot; isn't worth it.">Hadoop's API is a burden to use, and the &#8220;type safety&#8221; is not worth it. </span><span title="Dynamic typing is simply easier to use.">Dynamic typing is simply easier to use.<br />Beyond that, it is not possible to statically type Storm's tuples in any reasonable way. Suppose a Bolt subscribes to multiple streams. The tuples from those streams may have different types across their fields. When a Bolt receives a Tuple in execute, that tuple could have come from any stream</span><span title="When a Bolt receives a Tuple in execute, that tuple could have come from any stream and so could have any combination of types."> and so could have any combination of types. </span><span title="There might be some reflection magic you can do to declare a different method for every tuple stream a bolt subscribes to, but Storm opts for the simpler, straightforward approach of dynamic typing.">There might be some reflection magic you can do to declare a different method for every <div><span id="result_box"><span title="You put objects in fields and Storm figures out the serialization dynamically."><span title="Before we get to the interface for serialization, let's spend a moment understanding why Storm's tuples are dynamically typed."><span id="result_box"><span title="There might be some reflection magic you can do to declare a different method for every tuple stream a bolt subscribes to, but Storm opts for the simpler, straightforward approach of dynamic typing.">tuple stream a bolt subscribes to,</span></span></span></span></span></div> but Storm opts for the simpler, straightforward approach of dynamic typing.<br /></span><span title="Finally, another reason for using dynamic typing is so Storm can be used in a straightforward manner from dynamically typed languages like Clojure and JRuby.">Finally, another reason for using dynamic typing is so that Storm can be used in a straightforward manner from dynamically typed languages like Clojure and JRuby.<br /></span><span title="Custom serialization">Custom serialization<br /></span><span title="As mentioned, Storm uses Kryo for serialization.">As mentioned, Storm uses Kryo <div><span id="result_box"><span title="You put objects in fields and Storm figures out the serialization dynamically."><span title="Before we get to the interface for serialization, let's spend 
a moment understanding why Storm's tuples are dynamically typed."><span id="result_box"><span title="As mentioned, Storm uses Kryo for serialization.">for serialization</span></span></span></span></span></div>. </span><span title="To implement custom serializers, you need to register new serializers with Kryo.">To implement custom serializers, you need to register new serializers with Kryo. </span><span title="It's highly recommended that you read over Kryo's home page to understand how it handles custom serialization.">It is highly recommended that you read over Kryo's home page to understand how it handles custom serialization.<br /><br /></span><span title="Adding custom serializers is done through the &quot;topology.kryo.register&quot; property in your topology config.">Custom serializers are added through the &#8220;topology.kryo.register&#8221; property in your topology config. </span><span title="It takes a list of registrations, where each registration can take one of two forms:">It takes a list of registrations, where each registration can take one of two forms:<br /><br /></span><span title="1.The name of a class to register.">1. The name of a class to register. </span><span title="In this case, Storm will use Kryo's FieldsSerializer to serialize the class.">In this case, Storm will use Kryo's FieldsSerializer to serialize the class. </span><span title="This may or may not be optimal for the class -- see the Kryo docs for more details.">This may or may not be optimal for the class; see the Kryo docs for more details.<br /></span><span title="2.A map from the name of a class to register to an implementation of com.esotericsoftware.kryo.Serializer.">2. A map from the name of a class to register to an implementation of com.esotericsoftware.kryo.Serializer.<br /></span><span title="Let's look at an example.">Let's look at an example.</span></span></div><div> <div><pre>topology.kryo.register:  <br /> - com.mycompany.CustomType1  <br /> - com.mycompany.CustomType2: com.mycompany.serializer.CustomType2Serializer <br /> - com.mycompany.CustomType3 <br /><div><code>com.mycompany.CustomType1</code> and  <code>com.mycompany.CustomType3</code> are serialized with  <code>FieldsSerializer</code><br /><code>com.mycompany.CustomType2</code> is serialized with <code>com.mycompany.serializer.CustomType2Serializer</code></div><div><div><span id="result_box"><div><span id="result_box"><span title="You put 
objects in fields and Storm figures out the serialization dynamically."><span title="Before we get to the interface for serialization, let's spend a moment understanding why Storm's tuples are dynamically typed."><pre><span id="result_box">Storm provides helpers for registering serializers</span></pre></span></span></span></div> in a topology config. The Config class has a method called registerSerialization that takes a registration to add to the config.<br /><div><span id="result_box">There is also an advanced config called Config.TOPOLOGY_SKIP_MISSING_KRYO_REGISTRATIONS. If you set it to true, Storm will ignore any serialization that is registered but <div><span id="result_box"><span title="You put objects in fields and Storm figures out the serialization dynamically."><span title="Before we get to the interface for serialization, let's spend a moment understanding why Storm's tuples are dynamically typed."><pre><span id="result_box"><span id="result_box">does not have its code</span></span></pre></span></span></span></div> available on the classpath. Otherwise, Storm will throw an error when it cannot find a serialization. This is useful if you run many topologies on a cluster that each use different serializations,<br />but you want to declare all the serializations across all topologies in the storm.yaml files.<br /><div><span id="result_box">Java serialization<br />If Storm encounters a type for which it has no serialization registered, it will use Java serialization if possible. If the object cannot be serialized with Java serialization, Storm will throw an error.<br /><br />Java serialization is extremely expensive, both in CPU cost and in the size of the serialized object. It is highly recommended that you register custom serializers before you put a topology into production. The Java serialization fallback exists so that it is easy to prototype new topologies.<br /><br />You can turn off the fallback to Java serialization by setting the Config.TOPOLOGY_FALL_BACK_ON_JAVA_SERIALIZATION config to false.</span></div><br /></span></div><br /></span></div></div><br /></pre></div></div><br /></span></span></div></span></span></div></span></div></div></div><img src ="http://www.blogjava.net/xuhongxing016/aggbug/368731.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/xuhongxing016/" target="_blank">徐红星 </a> 2012-01-19 14:29 <a href="http://www.blogjava.net/xuhongxing016/articles/368731.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Twitter Storm 分布式RPC</title><link>http://www.blogjava.net/xuhongxing016/articles/368748.html</link><dc:creator>徐红星 </dc:creator><author>徐红星 </author><pubDate>Thu, 19 Jan 2012 06:17:00 
GMT</pubDate><guid>http://www.blogjava.net/xuhongxing016/articles/368748.html</guid><wfw:comment>http://www.blogjava.net/xuhongxing016/comments/368748.html</wfw:comment><comments>http://www.blogjava.net/xuhongxing016/articles/368748.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/xuhongxing016/comments/commentRss/368748.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/xuhongxing016/services/trackbacks/368748.html</trackback:ping><description><![CDATA[<div><p><strong>This article is reposted from <div> <div lh22"=""><a href="/4096635/748348">http://chenlx.blog.51cto.com/4096635/748348</a><br />I had planned to translate this myself, but a quick Google search showed that a translation already existed online, so I am reposting it.<br /> </div></div><br /></strong></p><p><strong>Distributed RPC</strong></p> <div>The idea behind distributed RPC (DRPC) is to use Storm to parallelize the computation of really intense functions on the fly. The Storm topology takes as input a stream of function arguments and emits an output stream with the result of each function call.</div> <div>&nbsp;</div> <div>DRPC is not so much a feature of Storm as it is a pattern expressed from Storm's primitives of streams, spouts, bolts, and topologies. DRPC could have been packaged as a separate library, but it is so useful that it is bundled with Storm.</div> <div>&nbsp;</div> <div><strong>Overview</strong></div> <div>Distributed RPC is coordinated by a &#8220;DRPC  server&#8221;. The DRPC server coordinates receiving an RPC request, sending the request to the Storm topology, receiving the results from the Storm topology, and sending the results back to the waiting client. From a client's perspective, a distributed RPC call looks just like a regular RPC call. For example, here is how a client would compute the result of the &#8220;reach&#8221; function with the argument &#8220;http://twitter.com&#8221;:</div><pre><ol><li><span>DRPCClient&nbsp;client&nbsp;=&nbsp;new&nbsp;DRPCClient("drpc-host",&nbsp;3772);&nbsp;</span></li><li>String&nbsp;result&nbsp;=&nbsp;client.execute(<span>"reach",&nbsp;"http://twitter.com");&nbsp;</span></li></ol></pre> <div> <div>The distributed RPC workflow looks like this:</div> <div><a href="http://img1.51cto.com/attachment/201112/114641689.png" target="_blank"><img alt="" src="http://img1.51cto.com/attachment/201112/114641689.png" border="0" width="650" /></a></div></div> <div> <p style="line-height: 150%"><strong><span style="line-height: 150%; font-family: 宋体; font-size: 12pt; font-weight: normal;'Times New Roman';">A client sends the name of the function to execute and the arguments to that function to the </span></strong><strong><span style="line-height: 150%; font-family: 'Calibri', 'sans-serif'; font-size: 12pt; font-weight: normal;'Times New 
Roman';">DRPC</span></strong><strong><span style="line-height: 150%; font-family: 宋体; font-size: 12pt; font-weight: normal;'Times New Roman';"> server. </span></strong>The topology in the diagram implements that function; it uses a DRPCSpout to receive a stream of function invocations from the DRPC server. Each function invocation is tagged with a unique id by the DRPC server. The topology then computes the result, and at the end of the topology a bolt called &#8220;ReturnResults&#8221; connects to the DRPC server and gives it the result for the function invocation id. The DRPC server uses the id to find the waiting client, unblocks it, and sends it the result.</p> <div><strong>LinearDRPCTopologyBuilder</strong></div> <div>Storm comes with a topology builder called LinearDRPCTopologyBuilder that automates almost all the steps involved in doing DRPC. These include:</div> <div style="margin-left: 40px">1. Setting up the spout</div> <div style="margin-left: 40px">2. Returning the results to the DRPC server</div> <div style="margin-left: 40px">3. Providing bolts with functionality for doing finite aggregations over groups of tuples</div> <div>Let's look at a simple example. Here is the implementation of a DRPC topology that returns its input argument with a &#8220;!&#8221; appended:</div><pre><ol><li><span>public&nbsp;static&nbsp;class&nbsp;ExclaimBolt&nbsp;implements&nbsp;IBasicBolt&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;<span>public&nbsp;void&nbsp;prepare(Map&nbsp;conf,&nbsp;TopologyContext&nbsp;context)&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</li><li>&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;<span>public&nbsp;void&nbsp;execute(Tuple&nbsp;tuple,&nbsp;BasicOutputCollector&nbsp;collector)&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;String&nbsp;input&nbsp;=&nbsp;tuple.getString(<span>1);&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;collector.emit(<span>new&nbsp;Values(tuple.getValue(0),&nbsp;input&nbsp;+&nbsp;"!"));&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</li><li>&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;<span>public&nbsp;void&nbsp;cleanup()&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</li><li>&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;<span>public&nbsp;void&nbsp;declareOutputFields(OutputFieldsDeclarer&nbsp;declarer)&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;declarer.declare(<span>new&nbsp;Fields("id",&nbsp;"result"));&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</li><li>&nbsp;</li><li>}&nbsp;
</li><li>&nbsp;</li><li><span>public&nbsp;static&nbsp;void&nbsp;main(String[]&nbsp;args)&nbsp;throws&nbsp;Exception&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;LinearDRPCTopologyBuilder&nbsp;builder&nbsp;=&nbsp;<span>new&nbsp;LinearDRPCTopologyBuilder("exclamation");&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;builder.addBolt(<span>new&nbsp;ExclaimBolt(),&nbsp;3);&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;<span>//&nbsp;...&nbsp;</span></li><li>}&nbsp;</li></ol></pre> <p style="line-height: 150%">As you can see, there is very little code. When creating the LinearDRPCTopologyBuilder, you tell it the name of the DRPC function for the topology. A single DRPC server can coordinate many functions, and the function name distinguishes them from one another. The first bolt you declare takes as input 2-tuples, where the first field is the request id and the second field is the arguments for that request. LinearDRPCTopologyBuilder expects the last bolt to emit an output stream containing 2-tuples of the form [id,  result]. Finally, all intermediate tuples in the topology must contain the request id as their first field.</p> <div> <div>In this example, ExclaimBolt simply appends &#8220;!&#8221; to the second field of the tuple. LinearDRPCTopologyBuilder handles the rest of the coordination, including connecting to the DRPC server and sending the final results back.</div> <div><strong>&nbsp;</strong></div> <div><strong>Local mode DRPC</strong></div></div> <div> <div>DRPC can be run in local mode. Here is how to run the above example in local mode:</div><pre><ol><li><span>LocalDRPC&nbsp;drpc&nbsp;=&nbsp;new&nbsp;LocalDRPC();&nbsp;</span></li><li>LocalCluster&nbsp;cluster&nbsp;=&nbsp;<span>new&nbsp;LocalCluster();&nbsp;</span></li><li>&nbsp;</li><li>cluster.submitTopology(<span>"drpc-demo",&nbsp;conf,&nbsp;builder.createLocalTopology(drpc));&nbsp;</span></li><li>&nbsp;</li><li>System.out.println(<span>"Results&nbsp;for&nbsp;'hello':"&nbsp;+&nbsp;drpc.execute("exclamation",&nbsp;"hello"));&nbsp;</span></li><li>&nbsp;</li><li>cluster.shutdown();&nbsp;</li><li>drpc.shutdown();&nbsp;</li></ol></pre></div> <div> <div>First you create a LocalDRPC object. This object simulates a DRPC server in process, just as LocalCluster simulates a Storm cluster in process. Then you create the LocalCluster to run the topology in local mode. LinearDRPCTopologyBuilder has separate methods for creating local topologies and remote topologies. In local mode the LocalDRPC object does not bind to any ports, so the topology needs to know which object to communicate with. This is why the createLocalTopology method takes the LocalDRPC object as input.</div> <div>&nbsp;</div> <div>After launching the topology, you can make DRPC invocations using the execute method on LocalDRPC.</div> <div>&nbsp;</div> <div><strong>Remote mode DRPC</strong></div></div> <div> <div>Using DRPC on an actual cluster is also straightforward. There are three steps:</div> 
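<div>1.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Launch the DRPC server</div> <div>2.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Configure the location of the DRPC servers</div> <div>3.&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Submit the DRPC topology to the Storm cluster</div> <div>Launch a DRPC server with the storm script, just as you would launch Nimbus or the UI:</div></div><pre><ol><li><span>bin/storm&nbsp;drpc&nbsp;</span></li></ol></pre> <div> <div>Next, configure your Storm cluster so that it knows where the DRPC servers are. This is how DRPCSpout knows where to read function invocations from. You can set the DRPC server locations either in the storm.yaml configuration file or in the topology configuration. The storm.yaml change looks like this:</div></div><pre><ol><li><span>drpc.servers:&nbsp;</span></li><li>&nbsp;&nbsp;-&nbsp;<span>"drpc1.foo.com"&nbsp;</span></li><li>&nbsp;&nbsp;-&nbsp;<span>"drpc2.foo.com"&nbsp;</span></li></ol></pre> <div> <div>Finally, launch the DRPC topology with StormSubmitter, just as you would launch any other topology. To run the example above in remote mode, the code looks like this:</div><pre><ol><li><span>StormSubmitter.submitTopology("exclamation-drpc",&nbsp;conf,&nbsp;builder.createRemoteTopology());&nbsp;</span></li></ol></pre> <div> <div>The createRemoteTopology method creates a topology suitable for a Storm cluster.</div> <div><strong>&nbsp;</strong></div> <div><strong>A more complete example</strong></div></div></div> <div> <div>The exclamation DRPC example is only a toy for illustrating the concepts of DRPC. Let's look at a more complete example, a DRPC function that really needs the parallelism of a Storm cluster. The example computes the reach of a URL on Twitter.</div> <div>The reach of a URL is the number of unique users exposed to that URL on Twitter. Computing it takes the following four steps:</div> <div>1.&nbsp;Get all the users who tweeted the URL</div> <div>2.&nbsp;Get all the followers of those users</div> <div>3.&nbsp;Make the set of followers unique</div> <div>4.&nbsp;Count the unique followers</div> <div>A single reach computation can involve thousands of database calls and tens of millions of follower records. It is a genuinely expensive computation. As you are about to see, implementing it on top of Storm is very simple. On a single machine a reach computation takes minutes; on a Storm cluster, even the hardest URLs take only a few seconds.</div> <div>The storm-starter project defines a sample reach topology <a href="https://github.com/nathanmarz/storm-starter/blob/master/src/jvm/storm/starter/ReachTopology.java">here</a>. The reach topology is defined as follows:</div></div><pre><ol><li><span>LinearDRPCTopologyBuilder&nbsp;builder&nbsp;=&nbsp;new&nbsp;LinearDRPCTopologyBuilder("reach");&nbsp;</span></li><li>builder.addBolt(<span>new&nbsp;GetTweeters(),&nbsp;3);&nbsp;</span></li><li>builder.addBolt(<span>new&nbsp;GetFollowers(),&nbsp;12)&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.shuffleGrouping();&nbsp;</li><li>builder.addBolt(<span>new&nbsp;PartialUniquer(),&nbsp;6)&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.fieldsGrouping(<span>new&nbsp;Fields("id",&nbsp;"follower"));&nbsp;</span></li><li>builder.addBolt(<span>new&nbsp;CountAggregator(),&nbsp;2)&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;.fieldsGrouping(<span>new&nbsp;Fields("id"));&nbsp;</span></li></ol></pre> <div> <div>The topology executes in four steps:</div> <p>1.&nbsp;GetTweeters gets the users who tweeted the URL. It transforms an input stream of [id, url] into an output stream of [id, tweeter]. Each url tuple maps to many tweeter tuples.</p> <p>2.&nbsp;GetFollowers gets the followers of those tweeters. It transforms an input stream of [id, tweeter] into an output stream of [id, follower]. Across all the tasks, duplicate follower tuples can appear when someone follows several tweeters who tweeted the same URL.</p> <div>3.&nbsp;PartialUniquer groups the follower stream on the id and follower fields. The same follower always goes to the same task, so each PartialUniquer task receives a set of followers that is disjoint from the sets seen by the other tasks. Once PartialUniquer has received all of the follower tuples directed at it for a request ID, it emits the unique count of its subset of followers.</div> <div>4.&nbsp;Finally, CountAggregator receives the partial counts from each PartialUniquer task and sums them.  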
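<div>The four steps above can be sketched in plain Java, without any Storm API, to show why partial uniquing gives the right answer. This is only an illustration under stated assumptions: the class ReachSketch, the methods tweetersOf and followersOf, and the sample data are all hypothetical stand-ins for the real databases, and the hash partitioning merely imitates what a fields grouping does.</div>
<pre>
```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ReachSketch {
    // Hypothetical stand-in for the database of who tweeted a URL.
    static String[] tweetersOf(String url) {
        switch (url) {
            case "foo.com/blog/1": return new String[] {"sally", "bob"};
            default: return new String[0];
        }
    }

    // Hypothetical stand-in for the follower database.
    static String[] followersOf(String tweeter) {
        switch (tweeter) {
            case "sally": return new String[] {"bob", "tim", "alice"};
            case "bob": return new String[] {"sally", "tim", "jim"};
            default: return new String[0];
        }
    }

    static long reach(String url, int partitions) {
        // Steps 1 and 2: expand the URL into tweeters, then into followers (duplicates allowed).
        String[] followers = Arrays.stream(tweetersOf(url))
                .flatMap(tweeter -> Arrays.stream(followersOf(tweeter)))
                .toArray(String[]::new);
        // Steps 3 and 4: every partition counts its own distinct followers,
        // then the partial counts are summed. A follower always hashes to the
        // same partition, so nobody is counted twice.
        return IntStream.range(0, partitions)
                .mapToLong(p -> Arrays.stream(followers)
                        .filter(f -> Math.floorMod(f.hashCode(), partitions) == p)
                        .distinct()
                        .count())
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(reach("foo.com/blog/1", 6)); // prints 5
    }
}
```
</pre>
<div>The follower "tim" appears twice in the sample data, yet the result is the same for any number of partitions, because both copies land in the same partition and are collapsed there.</div>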
<div>Let's take a look at PartialUniquer:</div><pre><ol><li><span>public&nbsp;static&nbsp;class&nbsp;PartialUniquer&nbsp;implements&nbsp;IRichBolt,&nbsp;FinishedCallback&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;OutputCollector&nbsp;_collector;&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;Map&lt;Object,&nbsp;Set&lt;String&gt;&gt;&nbsp;_sets&nbsp;=&nbsp;<span>new&nbsp;HashMap&lt;Object,&nbsp;Set&lt;String&gt;&gt;();&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;<span>public&nbsp;void&nbsp;prepare(Map&nbsp;conf,&nbsp;TopologyContext&nbsp;context,&nbsp;OutputCollector&nbsp;collector)&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_collector&nbsp;=&nbsp;collector;&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</li><li>&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;<span>public&nbsp;void&nbsp;execute(Tuple&nbsp;tuple)&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Object&nbsp;id&nbsp;=&nbsp;tuple.getValue(<span>0);&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Set&lt;String&gt;&nbsp;curr&nbsp;=&nbsp;_sets.get(id);&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>if(curr==null)&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;curr&nbsp;=&nbsp;<span>new&nbsp;HashSet&lt;String&gt;();&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_sets.put(id,&nbsp;curr);&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;curr.add(tuple.getString(<span>1));&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_collector.ack(tuple);&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</li><li>&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;<span>public&nbsp;void&nbsp;cleanup()&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</li><li>&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;<span>public&nbsp;void&nbsp;finishedId(Object&nbsp;id)&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Set&lt;String&gt;&nbsp;curr&nbsp;=&nbsp;_sets.remove(id);&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>int&nbsp;count;&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<span>if(curr!=null)&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;count&nbsp;=&nbsp;curr.size();&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;<span>else&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;count&nbsp;=&nbsp;<span>0;&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_collector.emit(<span>new&nbsp;Values(id,&nbsp;count));&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</li><li>&nbsp;</li><li>&nbsp;&nbsp;&nbsp;&nbsp;<span>public&nbsp;void&nbsp;declareOutputFields(OutputFieldsDeclarer&nbsp;declarer)&nbsp;{&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;declarer.declare(<span>new&nbsp;Fields("id",&nbsp;"partial-count"));&nbsp;</span></li><li>&nbsp;&nbsp;&nbsp;&nbsp;}&nbsp;</li><li>}&nbsp;</li></ol></pre> <div>When PartialUniquer receives a follower tuple in its execute method, it adds the follower to the set for that request ID, which it keeps in an internal HashMap.</div> <div>PartialUniquer also implements the FinishedCallback interface. This tells LinearDRPCTopologyBuilder that the bolt wants to be notified once it has received all of the tuples directed at it for any given request ID. The callback is the finishedId method. In this callback, PartialUniquer emits a single tuple containing the unique count of its subset of followers.&nbsp;</div> <p>Under the hood, CoordinatedBolt is used to detect when a bolt has received all of the tuples for a given request ID. CoordinatedBolt uses direct streams to manage this coordination.</p> <div>The rest of the topology should be self-explanatory. As you can see, every step of the reach computation runs in parallel, and defining the DRPC topology is very simple.</div> <div> <div><strong>Non-linear DRPC topologies</strong></div> <div>LinearDRPCTopologyBuilder only handles &#8220;linear&#8221; DRPC topologies, where the computation is expressed as a sequence of steps (like reach). It is not hard to imagine functions that need a more complicated topology, with bolts that branch and merge. For now, to do this you need to use CoordinatedBolt directly. Be sure to describe your use case for non-linear DRPC topologies on the mailing list, to help shape more general abstractions for DRPC topologies.</div> 
<div><strong>How does LinearDRPCTopologyBuilder work?</strong></div> <p>DRPCSpout emits [args, return-info]. The return-info is the host and port of the DRPC server together with the ID generated by the DRPC server.</p> <div>The topology it builds consists of:</div> <ul><li>DRPCSpout</li><li>PrepareRequest (generates a request ID and creates a stream for the return info and a stream for the args)</li><li>CoordinatedBolt wrappers and direct groupings</li><li>JoinResult (joins the result with the return info)</li><li>ReturnResults (connects to the DRPC server and returns the result)</li><li>LinearDRPCTopologyBuilder is a good example of a higher-level abstraction built on top of Storm's primitives.</li></ul> <div><strong>Advanced</strong></div> <ul><li>KeyedFairBolt, for interleaving the processing of multiple requests at the same time</li><li>How to use CoordinatedBolt directly</li></ul></div></div></div></div></div><img src ="http://www.blogjava.net/xuhongxing016/aggbug/368748.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/xuhongxing016/" target="_blank">徐红星 </a> 2012-01-19 14:17 <a href="http://www.blogjava.net/xuhongxing016/articles/368748.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Introduction to Storm</title><link>http://www.blogjava.net/xuhongxing016/articles/368503.html</link><dc:creator>徐红星 </dc:creator><author>徐红星 </author><pubDate>Sat, 14 Jan 2012 09:08:00 GMT</pubDate><guid>http://www.blogjava.net/xuhongxing016/articles/368503.html</guid><wfw:comment>http://www.blogjava.net/xuhongxing016/comments/368503.html</wfw:comment><comments>http://www.blogjava.net/xuhongxing016/articles/368503.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/xuhongxing016/comments/commentRss/368503.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/xuhongxing016/services/trackbacks/368503.html</trackback:ping><description><![CDATA[<span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"><strong>Introduction to Storm<br /></strong>Storm is a distributed, fault-tolerant real-time computation system that makes it easy to write and scale complex real-time computations on a cluster of machines. In the big-data world, Storm handles real-time processing and Hadoop handles batch processing; the two make a formidable pair. Storm guarantees that every message will be processed, and it is fast: a small cluster can process millions of messages per second.<br 
/>Advantages of Storm<br />1. A simple programming model. Just as MapReduce lowers the complexity of parallel batch processing, Storm lowers the complexity of real-time processing.<br />2. Service-oriented. Storm is a service framework that supports hot deployment, so applications can be brought online or taken offline immediately.<br />3. Any programming language. You can use a variety of languages on top of Storm. Clojure, Java, Ruby, and Python are supported by default. Adding support for another language only requires implementing a simple Storm communication protocol.<br />4. Fault tolerance. Storm manages failures of worker processes and nodes.<br />5. Horizontal scalability. Computation runs in parallel across multiple threads, processes, and servers.<br />6. Reliable message processing. Storm guarantees that every message is fully processed at least once. When a task fails, Storm takes care of replaying the message from the source.<br />7. Fast. The system is designed so that messages are processed quickly, and it uses ZeroMQ as the underlying message queue.<br />8. Local mode. Storm has a &#8220;local mode&#8221; that fully simulates a Storm cluster in process. This lets you develop and unit test quickly.<br />Where there are strengths there are bound to be weaknesses, though on balance I think these <span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face=""><font face="宋体">problems are minor.<br /></font>1.</font></span> <span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt">The current open-source release has only a single-node </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">Nimbus (in production we can </font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt">set up a dual-</span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">Nimbus</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"> layout).</span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><br /><font face="">2. 
Clojure</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"> is a dynamic functional programming language that runs on the </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">JVM</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"> platform</span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">,</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"> and its strength lies in flow computation. Parts of </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">Storm</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt">'s core are written in </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">Clojure</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt">, which improves performance considerably but also raises the maintenance cost. That said, learning Clojure is worthwhile if you are willing to immerse yourself in it.<br /><br /><strong><span style="font-family: Arial, sans-serif; font-size: 9pt" xml:lang="EN-US" lang="EN-US"><font face="">Storm</font></span></strong><strong><span style="font-family: 宋体; font-size: 9pt"> Architecture<br /><br 
/></span></strong>
<div style="border-bottom: #919699 1pt solid; padding-bottom: 2pt; border-right-style: none; padding-left: 0cm; padding-right: 0cm; border-top-style: none;background: white; border-left-style: none; padding-top: 0cm; mso-element: para-border-div; mso-border-bottom-alt: solid #919699 .75pt"><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">Storm</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"> clusters consist of one master node and multiple worker nodes, as most distributed architectures do, so there is little more to say about that. The master node runs a daemon called </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">&#8220;Nimbus&#8221;</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt">, which distributes code, assigns tasks, and detects failures. Each worker node runs a daemon called </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">&#8220;Supervisor&#8221;</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt">, which listens for assigned work and starts and stops worker processes. </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">Nimbus</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"> and </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">Supervisor</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"> are both fail-fast and stateless, which makes them very robust. Coordination between the two is handled by </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">ZooKeeper</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt">. </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">ZooKeeper</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"> manages the different components of the cluster, </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">ZeroMQ</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"> is the internal messaging system, and </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">JZMQ</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"> is the </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">ZeroMQ</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt"> </span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体" xml:lang="EN-US" lang="EN-US"><font face="">Java binding</font></span><span style="font-family: 宋体; color: #333333; font-size: 9pt; mso-ascii-font-family: Arial; mso-hansi-font-family: Arial; mso-bidi-font-family: Arial; mso-font-kerning: 0pt">.<br /></span><span style="font-family: 'Arial', 'sans-serif'; color: #333333; font-size: 9pt; mso-font-kerning: 0pt; mso-fareast-font-family: 宋体; mso-no-proof: yes" xml:lang="EN-US" lang="EN-US"></span></div></span></span><img src ="http://www.blogjava.net/xuhongxing016/aggbug/368503.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/xuhongxing016/" target="_blank">徐红星 </a> 2012-01-14 17:08 <a href="http://www.blogjava.net/xuhongxing016/articles/368503.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>