﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>语源科技BlogJava-&lt;b&gt;成都心情&lt;/b&gt;</title><link>http://www.blogjava.net/rosen/</link><description /><language>zh-cn</language><lastBuildDate>Mon, 11 May 2026 00:12:17 GMT</lastBuildDate><pubDate>Mon, 11 May 2026 00:12:17 GMT</pubDate><ttl>60</ttl><item><title>Hadoop Weekly, Issue 176</title><link>http://www.blogjava.net/rosen/archive/2016/07/12/431174.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Tue, 12 Jul 2016 13:21:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/07/12/431174.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/431174.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/07/12/431174.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/431174.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/431174.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop Weekly, Issue 176</span></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span 
style="font-size:14.0pt;line-height:10%;font-family:Helvetica;">Compiled and translated by the Venustech Platform and Big Data Group</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">June 29, 2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Hadoop Summit is taking place in San Jose this week, so we look forward to covering new project announcements and great talks in the next issue (please send us any relevant slides). As for this issue, there are plenty of articles on Kafka Streams, on streaming data from Amazon Kinesis into Google BigQuery, and on Google's dataset search system.</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Technical News</span></strong></p>  <p><span style="font-family:Helvetica;">Shine describes how they use AWS Lambda and Amazon Kinesis, together with the Kinesis agent for the Apache web server (to collect logs), to move data from EC2 into Google BigQuery. The post includes snippets of the Lambda function (written in JavaScript) and figures on scale and cost, and it describes how gzip-compressing the data reduces transfer costs.</span></p>  <p align="left"><a href="https://blog.shinetech.com/2016/06/21/kinesis-lambda-bigquery/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://blog.shinetech.com/2016/06/21/kinesis-lambda-bigquery/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Cloudera blog describes how to analyze fantasy sports data with Apache Spark, Apache Impala (incubating), and Hue. The post focuses mainly on the analysis itself, with some Spark code and a demonstration of Hue's features.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cloudera.com/blog/2016/06/how-to-analyze-fantasy-sports-with-apache-spark-and-sql-part-2-data-exploration/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">KDnuggets explains 13 key APIs, projects, and terms related to Apache Spark, including RDD, DataFrame, Dataset, Structured Streaming, GraphX, and Tungsten. Each entry gets a short section, and together they give a good overview of Spark's main features.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.kdnuggets.com/2016/06/spark-key-terms-explained.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This post from the Confluent blog looks at Kafka Streams applications for problems that sound simple but are not, for example a program that joins user click-stream data with user location data. The location data is stored in a KTable, which provides an abstraction similar to a database table with a primary key (the latest value for each key is exposed through the API). The resulting program is nonetheless simple: only a few lines of code.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.confluent.io/blog/distributed-real-time-joins-and-aggregations-on-user-activity-events-using-kafka-streams</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Cloudera blog describes meinstadt.de's system for detecting anomalies in HTTP requests, built on Apache Flume, Apache Spark Streaming, and Apache Impala (incubating). The implementation is available on GitHub.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cloudera.com/blog/2016/06/how-to-detect-and-report-web-traffic-anomalies-in-near-real-time/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The AWS Big Data Blog has a tutorial on using Apache Spark and Apache Zeppelin on an Amazon EMR cluster to process data from Amazon Kinesis Streams. The post includes several examples of data visualizations produced by running SQL from a Zeppelin notebook.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blogs.aws.amazon.com/bigdata/post/Tx3K805CZ8WFBRP/Analyze-Realtime-Data-from-Amazon-Kinesis-Streams-Using-Zeppelin-and-Spark-Strea</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Kudu (incubating) is nearing its 1.0 release, which will fully support high availability. This post describes how the last piece of that puzzle, multi-master replication, was implemented. It also shows the status of the related JIRA issues, along with the tests that have been completed and those that remain.</span></p>  <p><span 
style="font-family:Helvetica;color:#386EFF;">http://kudu.apache.org/2016/06/24/multi-master-1-0-0.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Across all of Google's data platforms there are more than 26 billion datasets, and 1.6 billion dataset paths are added and deleted every day. To track, query, and compare these datasets, Google built Google Dataset Search (GOODS). GOODS tracks metadata that is exposed through an API and used for search, monitoring, and other purposes.</span></p>  <p align="left"><a href="http://dl.acm.org/citation.cfm?id=2903730"><span style="font-family:Helvetica;color:#386EFF;text-decoration: none;text-underline:none">http://dl.acm.org/citation.cfm?id=2903730</span></a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Other News</span></strong></p>  <p><span style="font-family:Helvetica;">SiliconAngle interviewed Hortonworks CEO Rob Bearden. Topics include industry trends, Hortonworks' financials, Hortonworks' non-Hadoop technologies, and the Internet of Things.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://siliconangle.com/blog/2016/06/24/hadoop-and-beyond-a-conversation-with-hortonworks-ceo-rob-bearden/</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Product Releases</span></strong></p>  <p align="left"><span 
style="font-family:Helvetica;">Apache Sentry released version 1.7.0 this week, with bug fixes, new features, and other improvements. This release upgrades the Hive authorization framework to version 2.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3CCAPOmu3sDqdzu9ntDSvkMaDRQnVfHrkGV5qhyh-ZRiMmwgMMvBA@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">DataStax Enterprise 5.0, built on Apache Cassandra 3.0, adds support for graph data, tiered storage, and multiple Cassandra instances. The release also adds security features such as encryption and role-based access control.</span></p>  <p align="left"><a href="https://www.datastax.com/2016/06/introducing-datastax-enterprise-5-0"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.datastax.com/2016/06/introducing-datastax-enterprise-5-0</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Driven, an application performance monitoring system for big data applications, released version 2.2. The highlight of this release is support for monitoring Apache Spark.</span></p>  <p align="left"><a href="http://www.driven.io/2016/06/driven-inc-delivering-hadoop-spark-performance-monitoring-announces-driven-2-2/"><span 
style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.driven.io/2016/06/driven-inc-delivering-hadoop-spark-performance-monitoring-announces-driven-2-2/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">BlueData released EPIC, its enterprise Big-Data-as-a-Service offering, for Amazon Web Services. With a few clicks, it automatically deploys Docker-based Hadoop clusters.</span></p>  <p align="left"><a href="http://www.bluedata.com/blog/2016/06/big-data-as-a-service-on-prem-or-cloud-bdaas/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.bluedata.com/blog/2016/06/big-data-as-a-service-on-prem-or-cloud-bdaas/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Accumulo released version 1.7.2. This release fixes how write-ahead logs are handled, optimizes RFiles, and includes minor performance improvements.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://accumulo.apache.org/release_notes/1.7.2.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Curator, the high-level client SDK for Apache ZooKeeper, released versions 2.11.0 and 3.2.0.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://cwiki.apache.org/confluence/display/CURATOR/Releases#Releases-June23,2016,Releases2.11.0and3.2.0available</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Hive released version 2.1.0, with a large number of bug fixes and enhancements, including improvements to Hive LLAP (Live Long and Process) and to JDBC support.</span></p>  <p align="left"><a href="http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3C7194557D-CB5E-45B7-B905-82F27B7CB33F@apache.org%3E"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3C7194557D-CB5E-45B7-B905-82F27B7CB33F@apache.org%3E</span></a></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Events</span></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:Helvetica;">China</span></p>  <p align="left"><span style="font-family:Helvetica;">July 2, Shanghai: the third BigData Streaming meetup</span></p><img src 
="http://www.blogjava.net/rosen/aggbug/431174.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-07-12 21:21 <a href="http://www.blogjava.net/rosen/archive/2016/07/12/431174.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Hadoop Weekly, Issue 175</title><link>http://www.blogjava.net/rosen/archive/2016/07/01/431070.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Fri, 01 Jul 2016 07:44:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/07/01/431070.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/431070.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/07/01/431070.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/431070.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/431070.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop Weekly, Issue 175</span></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%;font-family:Helvetica;">Compiled and translated by the Venustech Platform and Big Data Group</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  
<p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">June 19, 2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Hadoop Summit is about a week away, and several projects have already timed new releases around it. In technical news, there are posts on Hadoop Kerberos authentication and on Salsify's use of Avro. In product releases, several projects have new versions out, including the columnar database recently open-sourced by Yandex.</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Technical News</span></strong></p>  <p><span style="font-family:Helvetica;">The OpenCore blog demonstrates several tools for debugging Hadoop's Kerberos authentication. In particular, it shows how to use the main() method of UserGroupInformation to print useful debugging information.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.opencore.com/blog/2016/5/user-name-handling-in-hadoop/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">In part four of its YARN series, the Cloudera blog explains how to configure Fair Scheduler queues, with detailed coverage of resource constraints, queue placement policies, and preemption.</span></p>  <p align="left"><a 
href="http://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Salsify built an asynchronous microservice architecture on Apache Kafka, using Apache Avro for data serialization. Their applications are written in Ruby, so they created several new tools to make Avro work well with Ruby. The post describes these tools and why they are useful: avro-builder for defining records, a Postgres-backed schema registry, and avromatic for generating models from Avro schemas.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.salsify.com/engineering/adventures-in-avro</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Drill can infer schemas on the fly and also supports data with multiple (but compatible) schemas. Together, these make some interesting use cases possible, such as querying across JSON files with different schemas. The MapR blog explores and demonstrates these features.</span></p>  <p><span 
style="font-family:Helvetica;color:#386EFF;">https://www.mapr.com/blog/sql-query-mixed-schema-data-using-apache-drill</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This tutorial shows how to combine Druid with Apache Kafka to build a streaming analytics and visualization application (using Pivot, a web UI for Druid).</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.confluent.io/blog/building-a-streaming-analytics-stack-with-apache-kafka-and-druid</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Apache Beam (incubating) blog describes a milestone for its Apache Flink batch runner, which runs Beam pipelines on Flink batch clusters. Beam is an open-source SDK, originally from Google, that provides a backend-agnostic API for data pipelines.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://beam.incubator.apache.org/blog/2016/06/13/flink-batch-runner-milestone.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Cask Hydrator is a tool for building data pipelines through a drag-and-drop UI. This tutorial demonstrates how to use Hydrator to import data from MySQL into HDFS.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cask.co/2016/06/bringing-relational-data-into-data-lakes/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Databricks describes the new SQL subquery support in the upcoming Apache Spark 2.0. Notably, the post is presented as a notebook, which shows the code and sample data in a very direct way.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/06/17/sql-subqueries-in-apache-spark-2-0.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Apache Kudu (incubating) blog writes about using Raft consensus on a single node, which makes it possible to later expand dynamically into a multi-master cluster.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://getkudu.io/2016/06/17/raft-consensus-single-node.html</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Other News</span></strong></p>  <p><span style="font-family:Helvetica;">This article argues that, without careful stewardship, the Apache Spark community risks repeating the fragmentation that made the Apache Hadoop ecosystem so messy. For example, the latest versions of CDH and HDP ship different versions of Spark.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://techcrunch.com/2016/06/12/spark-fragmentation-undermines-community/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The New Stack wrote about Concord, a new stream processing framework built on Apache Mesos (currently in public beta). Concord is written in C++ and supports dynamic topologies (pipeline stages can be added and removed without downtime).</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://thenewstack.io/concord-leverages-mesos-high-performance-stream-processing/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">With the general availability of Databricks Community Edition, Databricks published the first in a series of tutorials on writing Apache Spark applications with Databricks.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/06/15/an-introduction-to-writing-apache-spark-applications-on-databricks.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Hadoop Summit San Jose, coming up later this month, includes a &#8220;Women in Big Data&#8221; lunch. The Hortonworks blog interviewed the lunch's host, Hortonworks CMO Ingrid Burton.</span></p>  <p align="left"><a 
href="http://hortonworks.com/blog/summer-hortonworks-part-2-wibd-assertive-innovative-take-risks/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://hortonworks.com/blog/summer-hortonworks-part-2-wibd-assertive-innovative-take-risks/</span></a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Product Releases</span></strong></p>  <p align="left"><span style="font-family:Helvetica;">Apache SystemML (incubating) recently released version 0.10.0. SystemML is a machine learning framework that runs on top of several projects, including Apache Spark and Apache Hadoop. This release includes a new Spark Matrix Block type, deep learning support, performance improvements, a new KNN algorithm, and more.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://systemml.apache.org/0.10.0-incubating/release_notes.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Mahout, another machine learning framework, released version 0.12.2. This release is a step toward integration with Apache Zeppelin for visualization and notebook support.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3CCAOtpBjgBAuQs5FiX5X_5A+Rd-A1fVz0R7SKttGe4cJuCLRiGww@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p 
align="left"><span style="font-family:Helvetica;">Qubole announced that its HBase-as-a-Service is now generally available on AWS. It offers a number of nice features for long-running clusters: support for Hannibal and other monitoring tools, integration with Apache Zeppelin, and the ability to configure OpenTSDB and Apache Phoenix through node bootstrap scripts.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://www.qubole.com/blog/product/quboles-hbase-as-a-service-is-generally-available-on-aws/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Altiscale released the Real-Time Edition of Altiscale Insight Cloud, powered by Apache HBase and Spark Streaming.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://www.altiscale.com/blog/announcing-the-altiscale-insight-cloud-real-time-edition/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;"><code>hs2client</code> is a new C++ client library for Apache Hive and Apache Impala (incubating). Besides the C++ API, the library has Python bindings that can read data into a pandas DataFrame.</span></p>  <p align="left"><a href="http://blog.cloudera.com/blog/2016/06/announcing-hs2client-a-fast-new-c-python-thrift-client-for-impala-and-hive/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://blog.cloudera.com/blog/2016/06/announcing-hs2client-a-fast-new-c-python-thrift-client-for-impala-and-hive/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">MapR now supports the Apache Spark 2.0 developer preview in its distribution.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://www.mapr.com/blog/spark-20-now-developer-preview-mode-mapr-platform</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Beam released version 0.1.0-incubating, the project's first release since it entered the Apache Incubator.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://beam.incubator.apache.org/beam/release/2016/06/15/first-release.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Yandex open-sourced ClickHouse, a columnar analytics database. It is built to scale both horizontally and vertically, and it supports complex data types (such as arrays) as well as approximate queries. The team also published benchmark results comparing it with other databases.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://clickhouse.yandex/</span></p>  <p 
align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">活动</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">中国</span></p>  <p align="left">&nbsp;</p><img src ="http://www.blogjava.net/rosen/aggbug/431070.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-07-01 15:44 <a href="http://www.blogjava.net/rosen/archive/2016/07/01/431070.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Hadoop周刊—第 174 期</title><link>http://www.blogjava.net/rosen/archive/2016/06/28/431032.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Tue, 28 Jun 2016 09:39:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/06/28/431032.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/431032.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/06/28/431032.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/431032.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/431032.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop</span></strong><strong><span style="font-size:16.0pt;line-height:10%;font-family:宋体;">周刊</span></strong><strong> </strong><strong><span style="font-size:16.0pt;line-height: 10%;font-family:宋体;">第</span></strong><strong><span style="font-size:16.0pt;line-height:10%"> 174 </span></strong><strong><span style="font-size:16.0pt;line-height: 10%;font-family:宋体;">期</span></strong><strong></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  
<p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">Compiled and translated by the Venustech Platform and Big Data Architecture Group</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">June 12, 2016</span></p>  <p>&nbsp;</p>  
<p><span style="font-family:Helvetica;">Spark Summit took place in San Francisco this week. As expected, this issue is packed with Apache Spark news, announcements, and releases. Beyond Spark, there are articles on Kafka, Cask, and Ambari. The releases section has three highlights. There is the first Apache Pig release in a year. There is also Runway, a neat new tool for designing distributed systems. Finally, there is a new version of Apache Kudu (incubating).</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Technical News</span></strong><strong></strong></p>  
<p><span style="font-family:Helvetica;">Debezium is a relatively new project for row-level change data capture from databases into Apache Kafka topics. It currently supports MySQL. This post is a tutorial on deploying Zookeeper, Kafka, and MySQL in Docker containers on Kubernetes.</span></p>  <p align="left"><a href="http://debezium.io/blog/2016/05/31/Debezium-on-Kubernetes/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://debezium.io/blog/2016/05/31/Debezium-on-Kubernetes/</span></a></p>  <p>&nbsp;</p>  
<p><span style="font-family:Helvetica;">Some people were surprised when the Apache Kafka project announced yet another stream processing engine, Kafka Streams. Kafka Streams differs from other systems in several key ways. This article does a good job of illustrating them: its abstractions, its deployment model, and its support for stateful computation.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://softwaremill.com/kafka-streams-how-does-it-fit-stream-landscape/</span></p>  <p>&nbsp;</p>  
<p><span style="font-family:Helvetica;">Anyone who has used MapReduce, Spark, or a similar system knows how hard these jobs are to debug, especially when a bug depends on the data. BigDebug is a research project and paper from UCLA (University of California, Los Angeles). It aims to give developers the kind of debugging tools they are used to on a single machine: finding crashes caused by particular inputs, tracing, breakpoints, watchpoints, latency alerts, and more. The tool is built on Apache Spark 1.2.1.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://blog.acolyer.org/2016/06/07/bigdebug-debugging-primitives-for-interactive-big-data-processing-in-spark/</span></p>  <p>&nbsp;</p>  
<p><span style="font-family:Helvetica;">Cask has written about running Spark in the open source Cask Data Application Platform (CDAP). Spark programs running in CDAP get fine-grained transaction support through Apache Tephra (incubating). This makes it easy to use snapshot isolation to copy data consistently from one table to another. Spark in CDAP can also access Cask Tracker, which provides data lineage information (when a dataset was created, when it was used, and so on). Depending on the application, the CDAP tooling can add even more value.</span><strong></strong></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cask.co/2016/06/cdap-spark-prototype-to-production/</span></p>  <p>&nbsp;</p>  
<p><span style="font-family:Helvetica;">The IBM Hadoop Dev blog has a tutorial on calling the Ambari REST API with cURL. It also shows how to establish a session on both vanilla and Kerberos-enabled clusters and reuse that session for subsequent requests.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://developer.ibm.com/hadoop/2016/06/07/ambari-rest-calls-for-kerberos-enabled-clusters/</span></p>  <p>&nbsp;</p>  
<p><span style="font-family:Helvetica;">The Google Cloud Platform blog explains how to debug Apache Beam (incubating) jobs running on Google Cloud Dataflow. To help track down performance bottlenecks, Dataflow provides useful statistics and a UI that lets users drill into each step of a pipeline.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://cloud.google.com/blog/big-data/2016/06/understanding-timing-in-cloud-dataflow-pipelines</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Other News</span></strong><strong></strong></p>  
<p align="left"><span style="font-family:Helvetica;">The Transaction Processing Performance Council (TPC) has released the TPCx-BB benchmark, which is designed for big data systems. Beyond SQL, it also measures machine learning workloads such as clustering and classification.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.datanami.com/2016/06/01/big-data-benchmark-gauges-hadoop-platforms/</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Strata + Hadoop World London took place two weeks ago. Speaker presentations and slides have been posted to the conference website.</span></p>  <p align="left"><a 
href="http://conferences.oreilly.com/strata/hadoop-big-data-eu/public/schedule/proceedings"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://conferences.oreilly.com/strata/hadoop-big-data-eu/public/schedule/proceedings</span></a></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Splice Machine, which builds an RDBMS on Hadoop, has announced that it is open-sourcing its software. The company is now looking for contributors, mentors, and champions to help the open source project succeed. Splice Machine has a number of interesting features, such as ACID transactions, secondary indexes, and referential integrity.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.splicemachine.com/were_going_open_source/</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">The Altiscale blog has rounded up a number of articles on big data use cases, covering customer service, sentiment analysis, climate change, smart cities, bias, and more. It also collects a few pieces by big data skeptics.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://www.altiscale.com/blog/big-data-news-health-and-public-safety-sentiment-analysis-fixing-education-2/</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Spark Summit took place in San Francisco this week. Organizer Databricks has summarized the highlights of the two days, with links to many of the talks and keynotes.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://databricks.com/blog/2016/06/08/another-record-setting-spark-summit.html</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Big-data-as-a-service (BDaaS) company Qubole wrote about how its customers are adopting Spark. Adoption has been rapid: more than half of its customers now use Spark. Qubole also supports Presto and has seen similar growth there.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://www.qubole.com/blog/big-data/spark-usage/</span></p>  <p align="left">&nbsp;</p>  
<p><span style="font-family:Helvetica;">Twitter has proposed its replicated log service, DistributedLog, to the Apache Incubator.</span></p>  <p align="left"><a href="https://wiki.apache.org/incubator/DistributedLogProposal"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://wiki.apache.org/incubator/DistributedLogProposal</span></a></p>  <p>&nbsp;</p>  
<p><span style="font-family:Helvetica;">Big Data Day LA is on June 9 at West Los Angeles College. The event is free with advance registration. Speakers come from Confluent, Databricks, Yahoo, Netflix, and other companies.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.bigdatadayla.com/</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Releases</span></strong><strong></strong></p>  
<p align="left"><span style="font-family:Helvetica;">Apache Spark has released a preview of Spark 2.0. The announcement notes that the APIs and functionality are not yet final.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://spark.apache.org/news/spark-2.0.0-preview.html</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">JustOne has built and open-sourced a Kafka-to-PostgreSQL connector. The post covers the connector's performance, explains in detail how messages are converted into rows, and describes how to configure it.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.confluent.io/blog/kafka-connect-sink-for-postgresql-from-justone-database</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Salesforce has open-sourced Runway, a tool for modeling, simulating, and visualizing distributed systems. An online demo at runway.systems includes a &#8220;too many bananas&#8221; model, an elevator system, and the Raft consensus algorithm.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://medium.com/salesforce-open-source/runway-intro-dc0d9578e248</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Bloomberg recently open-sourced Presto Accumulo, a Presto connector for Apache Accumulo. The announcement links to an 11-page paper. The paper compares benchmark results for queries run through Presto with queries written against the Accumulo Java API.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.bloomberg.com/company/announcements/open-source-at-bloomberg-reducing-application-development-time-via-presto-accumulo/</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Microsoft Azure announced that Azure HDInsight running Apache Spark 1.6.1 is now generally available. This release adds support for the Project Livy REST job server for Spark and integration with Azure Data Lake Store (with role-based access control). It also adds IntelliJ integration, Jupyter notebook support, and more.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://azure.microsoft.com/en-us/blog/apache-spark-for-azure-hdinsight-now-generally-available/</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">LinkedIn has open-sourced Photon ML, its library for large-scale regression. Photon is built on Spark and runs on LinkedIn's YARN clusters. It was previously based on MapReduce and appears to have moved to Spark for performance reasons.</span></p>  <p align="left"><a href="https://engineering.linkedin.com/blog/2016/06/open-sourcing-photon-ml"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://engineering.linkedin.com/blog/2016/06/open-sourcing-photon-ml</span></a></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Hortonworks has released a technical preview of a Spark-HBase connector. The preview natively supports Avro and the Spark Datasource API, and it runs on secure clusters. It also includes partition pruning, column pruning, and predicate pushdown optimizations.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/</span></p>  <p 
align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Databricks has released the first phase of security features for its Apache Spark platform. This phase adds cluster ACLs, SAML 2.0 support, and end-to-end audit logging.</span></p>  <p align="left"><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/06/08/achieving-end-to-end-security-for-apache-spark-with-databricks.html</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Apache ORC 1.1.0 has been released. This release completes the migration from the Apache Hive-based code to ORC's own Java code. It also fixes timestamp handling in the C++ implementation and adds a Hadoop MapReduce connector.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://orc.apache.org/news/2016/06/10/ORC-1.1.0/</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Apache Kudu has released version 0.9.0. It adds an UPSERT command and a new Spark data source that does not depend on the MapReduce API. It also improves Tablet Server write performance.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://getkudu.io/2016/06/10/apache-kudu-0-9-0-released.html</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">The Google Cloud Platform team has released a version of Google Cloud Dataproc that supports the Spark 2.0 preview.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://cloud.google.com/blog/big-data/2016/06/google-cloud-dataproc-the-fast-easy-and-safe-way-to-try-spark-20-preview</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Dory (the successor to Bruce), a daemon that produces messages to Kafka, can now receive data over UNIX domain sockets or local TCP.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.apache.org/mod_mbox/kafka-users/201606.mbox/%3C1465683894.608424023@apps.rackspace.com%3E</span></p>  <p align="left">&nbsp;</p>  
<p align="left"><span style="font-family:Helvetica;">Apache Pig 0.16.0 is out, the project's first release in a year. It solidifies support for Tez.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://pig.apache.org/releases.html#8+June%2C+2016%3A+release+0.16.0+available</span></p>  <p align="left">&nbsp;</p>  
<p><strong><span style="font-size:15.0pt;font-family:宋体;">Events</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">China</span></p>  <p align="left"><span style="font-family:Helvetica;">Spark Meetup (Shanghai) &#8211; Saturday, June 18</span></p><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-06-28 17:39</div>]]></description></item><item><title>Hadoop Weekly: Issue 173</title><link>http://www.blogjava.net/rosen/archive/2016/06/20/430972.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Mon, 20 Jun 2016 01:47:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/06/20/430972.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430972.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/06/20/430972.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430972.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430972.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop Weekly Issue 173</span></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">Compiled and translated by the Venustech Platform and Big Data Architecture Group</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">June 5, 2016</span></p>  <p>&nbsp;</p>  
<p><span style="font-family:Helvetica;">This week's issue has only a handful of articles, covering Spark, NiFi, Netflix Meson, and Storm. Spark Summit takes place in San Francisco this week, so expect plenty of content next week.</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Technical News</span></strong><strong></strong></p>  
<p><span style="font-family:Helvetica;">The Databricks blog describes a new Apache Spark 2.0 feature: saving and loading machine learning models across languages. Models are saved and loaded through a simple API. Model metadata and parameters are stored as JSON, and model data is stored as Parquet.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/05/31/apache-spark-2-0-preview-machine-learning-model-persistence.html</span></p>  <p>&nbsp;</p>  
<p><span style="font-family:Helvetica;">Meson is Netflix's framework for running machine learning workflows. It acts as the glue between big data technologies such as Apache Hive, Spark, and Mesos. Workflows are written in a DSL, and Meson also provides an advanced UI for visualizing pipelines. Netflix has not open-sourced Meson yet, but it plans to.</span></p>  <p align="left"><a href="http://techblog.netflix.com/2016/05/meson_31.html"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://techblog.netflix.com/2016/05/meson_31.html</span></a></p>  <p>&nbsp;</p>  
<p><span style="font-family:Helvetica;">The IBM Hadoop Dev blog gives a brief introduction to HDFS archival storage, with a demonstration.</span></p>  <p><span 
style="font-family:Helvetica;color:#386EFF;">https://developer.ibm.com/hadoop/2016/06/01/use-hdfs-archival-storage/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Storm 1.0</span><span style="font-family: 宋体;">有了令人惊讶的新特性。本文关注了几个调试能力方面的增强：动态日志级别、统一日志搜索、</span><span style="font-family:Cambria;">事件抽样、集成</span><span style="font-family: Helvetica;">jstack/heap dumps/java</span><span style="font-family:宋体;">飞行记录器分析</span><span style="font-family:Helvetica;">worker</span><span style="font-family:宋体;">。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://hortonworks.com/blog/whats-new-apache-storm-1-0-part-1-enhanced-debugging/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">博客撰文介绍了如何使用</span><span style="font-family: Helvetica;">Apache Spark</span><span style="font-family:宋体;">来探索性分析存储在</span><span style="font-family:Helvetica;">CSV</span><span style="font-family:宋体;">文件中的</span><span style="font-family:Helvetica;">NBA</span><span style="font-family:宋体;">历史统计数据。分析过程混合使用了</span><span style="font-family: Helvetica;">Scala</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cloudera.com/blog/2016/06/how-to-analyze-fantasy-sports-using-apache-spark-and-sql/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache NiFi</span><span style="font-family: 宋体;">作为一种通用工具受到了很多的关注。它为</span><span style="font-family:Helvetica;">&#8220;</span><span style="font-family:宋体;">基于流程的处理</span><span style="font-family:Helvetica;">&#8221;</span><span style="font-family:宋体;">而生，可能对很多人并不意味着什么，但</span><span style="font-family:Helvetica;">NiFi</span><span style="font-family:宋体;">支持标准的</span><span style="font-family:Helvetica;">ETL</span><span style="font-family:宋体;">，流式处理等。许多</span><span 
style="font-family:Helvetica;"> NiFi examples demonstrate moving data from the Twitter firehose into HDFS, but this article focuses on other NiFi features, demonstrating some simple flows that pull data over HTTP.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://hortonworks.com/blog/apache-nifi-not-scratch/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Amazon Redshift is built on the PostgreSQL engine, so you can use a PostgreSQL extension (dblink) to link a Redshift cluster with a PostgreSQL instance. This enables interesting use cases such as cross-database joins, converting Redshift results to JSON, creating views of Redshift data in Postgres, and copying data between the databases.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blogs.aws.amazon.com/bigdata/post/Tx1GQ6WLEWVJ1OX/JOIN-Amazon-Redshift-AND-Amazon-RDS-PostgreSQL-WITH-dblink</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Other News</span></strong></p>  <p align="left"><span
style="font-family:Helvetica;">FeatherCast has published more than 100 recordings from ApacheCon North America.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://feathercast.apache.org/tag/apacheconna2016/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">InfoWorld takes a look at Heron, the Apache Storm-compatible project that Twitter recently open-sourced. The article describes the architectural differences between the two projects. Its main point is that Heron got its start a few months ago (after Storm had already been released), which means Storm currently has the advantage over Heron in features.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.infoworld.com/article/3078134/analytics/had-it-with-apache-storm-heron-swoops-to-the-rescue.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Databricks has launched a new course on edX, &#8220;Introduction to Apache Spark.&#8221; The course starts on June 15</span><span
style="font-family:Helvetica;"> and runs for two weeks.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">launch-first-of-five-free-big-data-courses-on-apache-spark.html</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Releases</span></strong></p>  <p align="left"><span style="font-family:Helvetica;">Amazon EMR 4.7.0 has been released. This release adds support for Apache Tez and Apache Phoenix and includes newer versions of Apache HBase, Apache Mahout, and Presto. In addition, the AWS Big Data Blog has a guide to getting started with Phoenix.</span></p>  <p align="left"><a href="http://aws.amazon.com/blogs/aws/amazon-emr-4-7-0-apache-tez-phoenix-updates-to-existing-apps/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://aws.amazon.com/blogs/aws/amazon-emr-4-7-0-apache-tez-phoenix-updates-to-existing-apps/</span></a></p>  <p align="left"><a href="http://blogs.aws.amazon.com/bigdata/post/Tx2ZF1NDQYDJFGT/Supercharge-SQL-on-Your-Data-in-Apache-HBase-with-Apache-Phoenix"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://blogs.aws.amazon.com/bigdata/post/Tx2ZF1NDQYDJFGT/Supercharge-SQL-on-Your-Data-in-Apache-HBase-with-Apache-Phoenix</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Hive</span><span
style="font-family:Helvetica;"> 2.0.1 was released this week. It is the first point release since 2.0.0 came out in February, and it fixes 60 bugs.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CD37344A3.77A64%25sershe@apache.org%3E</span></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Events</span></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:Helvetica;">China</span></p>  <p align="left"><span style="font-family:Helvetica;">None</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430972.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-06-20 09:47 <a href="http://www.blogjava.net/rosen/archive/2016/06/20/430972.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Hadoop Weekly Issue 172</title><link>http://www.blogjava.net/rosen/archive/2016/06/09/430841.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Wed, 08 Jun 2016 16:11:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/06/09/430841.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430841.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/06/09/430841.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430841.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430841.html</trackback:ping><description><![CDATA[<p align="left" style="line-height:
10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop Weekly Issue 172</span></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">Compiled by the Venustech Platform and Big Data Architecture Group</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">May 22, 2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This week's focus is stream processing: Twitter and Cloudera introduced new stream processing frameworks, one article covers streaming SQL in Apache Flink, DataTorrent describes fault tolerance in Apache Apex, and there are also</span><span
style="font-family:Helvetica;"> new stream processing frameworks such as Concord, plus the 0.10 release of Apache Kafka. In other news, there is movement in the Apache Incubator: Apache TinkerPop and Apache Zeppelin graduated to top-level projects, and Tephra entered the incubator. Beyond that, there are new articles on Apache Spark, Apache HBase, Apache Drill, Apache Ambari, and more.</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Technical News</span></strong></p>  <p align="left"><span style="font-family:Helvetica;">The DataTorrent blog describes how Apache Apex provides fault tolerance when reading and writing data files. Apex is built specifically for streaming data, and stream processing has some subtle but important details to consider. For example, when writing output to HDFS, the HDFS lease mechanism can cause problems.</span></p>  <p align="left"><a href="https://www.datatorrent.com/blog/fault-tolerant-file-processing/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.datatorrent.com/blog/fault-tolerant-file-processing/</span></a></p>  
<p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Databricks blog describes the performance gains from the Tungsten code generation engine in Spark 2.0. Using examples, the post explains why the generated code runs faster: it eliminates virtual function calls, makes better use of CPU registers, and benefits from loop unrolling. Alongside the Databricks post, The Morning Paper discusses the VLDB paper that inspired these techniques.</span></p>  <p align="left"><a href="https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html</span></a></p>  <p><a href="https://blog.acolyer.org/2016/05/23/efficiently-compiling-efficient-query-plans-for-modern-hardware/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://blog.acolyer.org/2016/05/23/efficiently-compiling-efficient-query-plans-for-modern-hardware/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">StreamScope, Microsoft's stream processing system, is the subject of another streaming article from The Morning Paper this week. The article covers the system's characteristics: throughput and cluster size, programming model (SQL), time model, semantics</span><span
style="font-family:Helvetica;">/guarantees, and its use in production at Microsoft.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://blog.acolyer.org/2016/05/24/streamscope-continuous-reliable-distributed-processing-of-big-data-streams/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Apache HBase blog has a post on the HubSpot team's experience tuning G1GC for Apache HBase. It reviews how HubSpot worked to ensure stability, improve 99th-percentile performance, and reduce the time spent in garbage collection. The team used a number of techniques to get a handle on the intricacies of the GC algorithm, and the post ends with a step-by-step walkthrough of tuning G1GC for HBase.</span></p>  <p align="left"><a href="https://blogs.apache.org/hbase/entry/tuning_g1gc_for_your_hbase"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://blogs.apache.org/hbase/entry/tuning_g1gc_for_your_hbase</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">LinkedIn writes about the many difficulties of debugging Kafka offset management problems. The post focuses on the symptoms of two so-called &#8220;offset rewind&#8221; incidents, how such incidents can be detected through monitoring, and the root causes (and fixes) of both.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://engineering.linkedin.com/blog/2016/05/kafkaesque-days-at-linkedin--part-1</span></p>  <p>&nbsp;</p>  <p><span
style="font-family:Helvetica;">The Databricks blog has published the third and final part of its series on genome variant analysis with Apache Spark. The post covers data preparation (converting files to Parquet and loading them into a Spark RDD), loading the genotype data, and running the k-means clustering algorithm to predict geographic population from genotype features.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/05/24/predicting-geographic-population-using-genome-variants-and-k-means.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Much of the batch big data ecosystem has moved from custom APIs back to SQL, so it will be interesting to see whether stream processing frameworks follow the same path. In this post, the Apache Flink team describes their plans to support streaming SQL. Flink already has a Table API, and the team is using Apache Calcite to add SQL support. For windowing, they plan to use Calcite's streaming SQL extensions. Initial SQL support will arrive in version 1.1.0</span><span
style="font-family:Helvetica;">, with further enhancements in 1.2.0.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://flink.apache.org/news/2016/05/24/stream-sql.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This post introduces an XML plugin for Apache Drill. Although the plugin is not yet bundled with Drill, it is fairly easy to build it as a jar and configure Drill for XML support.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.mapr.com/blog/how-use-xml-plugin-apache-drill</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Hortonworks blog gives a brief overview of the architecture of the Ambari Metrics System, which recently added Grafana as its front-end dashboard. The system uses Apache Phoenix and Apache HBase as its storage backend, so it scales horizontally.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://hortonworks.com/blog/hood-ambari-metrics-grafana/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This tutorial shows how to use Spark SQL on Amazon EMR, together with Hue and Apache Zeppelin</span><span
style="font-family:Helvetica;">, to run SQL queries over tab-delimited data stored in S3. It finishes by showing how to write data from Spark to DynamoDB.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blogs.aws.amazon.com/bigdata/post/Tx2D93GZRHU3TES/Using-Spark-SQL-for-ETL</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Heroku team shares their experience with the latest Apache Kafka release: the newly introduced 8-byte timestamp field leads to some counterintuitive performance changes.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://engineering.heroku.com/blogs/2016-05-27-apache-kafka-010-evaluating-performance-in-distributed-systems/</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Other News</span></strong></p>  <p><span style="font-family:Helvetica;">The O'Reilly Data Show podcast interviews Michael Armbrust of Databricks about Structured Streaming in Spark 2.0. An article on the site highlights selected topics from the conversation: Spark SQL, the goals of Structured Streaming, end-to-end pipeline guarantees, and applying Spark</span><span
style="font-family:Helvetica;"> machine learning algorithms to online processing.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.oreilly.com/ideas/structured-streaming-comes-to-apache-spark-2-0</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Two big data projects graduated from the Apache Incubator this week: Apache TinkerPop and Apache Zeppelin. TinkerPop is a graph computing framework, and Zeppelin is a web-based notebook for data analytics.</span></p>  <p align="left"><a href="https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces91"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces91</span></a></p>  <p><a href="https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces92"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces92</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Tephra, a transaction engine for HBase, has entered the Apache Incubator. Tephra was originally created by the team at Cask</span><span
style="font-family:Helvetica;">, and so far it has been integrated only with Apache Phoenix.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cask.co/2016/05/tephra-a-transaction-engine-for-hbase-moves-to-apache-incubation/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">TechRepublic covers Concord.io, a stream processing framework written in C++ that aims to fill a gap in the market for high-performance stream processing.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.techrepublic.com/article/could-concord-topple-apache-spark-from-its-big-data-throne/</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Releases</span></strong></p>  <p align="left"><span style="font-family:Helvetica;">Apache Avro 1.8.1 was released this week, with more than 20 bug fixes and a number of other improvements.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCAO4re1nYMm79WQ2LUeODWjHmJ9EiYOF=mty6p2aiq-S_4R95iQ@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Confluent has released a Python client for Kafka built on librdkafka.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://pypi.python.org/pypi/confluent-kafka/0.9.1.1</span></p>  <p
align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Kafka 0.10 has been released, introducing a new way to do stream processing with Kafka (Kafka Streams). The new version also adds rack awareness and timestamps in messages, and improves SASL support, Kafka Connect, and more.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCAPuboUuRyCRxDp5CLjv2yVM77SpYFF+HdnBeiiyeumYTJNpY4g@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Confluent has released Confluent Platform 3.0, based on Apache Kafka 0.10. In addition to Kafka's core features, Confluent Platform includes a commercial component that provides configuration tooling for Kafka Connect and end-to-end stream monitoring.</span></p>  <p align="left"><a href="http://www.confluent.io/blog/announcing-apache-kafka-0.10-and-confluent-platform-3.0"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.confluent.io/blog/announcing-apache-kafka-0.10-and-confluent-platform-3.0</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Kylin, the big data OLAP</span><span
style="font-family:Helvetica;"> engine, has released version 1.5.2. Although it is a patch-level release, 1.5.2 brings a number of new features, improvements, and bug fixes, including support for CDH 5.7 and MapR.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCA+LQBaTDxb4wVYVvtOC22gMbJ0p9cvhAWzEY_x2n1oNGvEDPSQ@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Twitter has open-sourced Heron, its stream processing system. Heron is Twitter's replacement for Apache Storm, with a focus on performance, debuggability, and developer productivity.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://blog.twitter.com/2016/open-sourcing-twitter-heron</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Envelope is a new project from Cloudera Labs that provides configuration-driven streaming ETL. Built on Spark Streaming, Envelope</span><span
style="font-family:Helvetica;"> is currently developing connectors for Kafka and Kudu.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://blog.cloudera.com/blog/2016/05/new-in-cloudera-labs-envelope-for-apache-spark-streaming/</span></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Events</span></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:Helvetica;">China</span></p>  <p align="left"><span style="font-family:Helvetica;">Spark Meetup 4 (Hangzhou) &#8211; Sunday, June 5</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.meetup.com/Hangzhou-Apache-Spark-Meetup/events/231071384/</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430841.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-06-09 00:11 <a href="http://www.blogjava.net/rosen/archive/2016/06/09/430841.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Hadoop Weekly Issue 171</title><link>http://www.blogjava.net/rosen/archive/2016/06/08/430838.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Wed, 08 Jun 2016 08:42:00
GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/06/08/430838.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430838.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/06/08/430838.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430838.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430838.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop Weekly Issue 171</span></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">Compiled by the Venustech Platform and Big Data Architecture Group</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">May 22, 2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This week, several projects had releases, including a newly open-sourced project from LinkedIn</span><span
style="font-family:宋体;">新开源项目在内的几个项目都有版本发布。在技术新闻和其他新闻方面，多篇文章回顾了</span><span style="font-family:Helvetica;">Apache: Big Data North America</span><span style="font-family:宋体;">会议，另外有一组跨越多个不同数据系统分析纽约出租车数据的系列文章。</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">技术新闻</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">博客分析了</span><span style="font-family:Helvetica;">Apache Spark</span><span style="font-family:宋体;">中两种逼近算法。之一，</span><span style="font-family:Helvetica;">&#8220;approxCountDistict&#8221;</span><span style="font-family:宋体;">是用来评估不同值的数量；之二，</span><span style="font-family: Helvetica;">&#8220;approxQuantile&#8221;</span><span style="font-family:宋体;">用于生成逼近百分比。本文介绍了算法和可视化精度不同的残差。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/05/19/approximate-algorithms-in-apache-spark-hyperloglog-and-quantiles.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本教程描述了如何使用</span><span style="font-family:Helvetica;">Apache Hadoop HDFS</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Apache Solr</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Hue</span><span style="font-family:宋体;">存储、索引、查询</span><span style="font-family:Helvetica;">DICOM</span><span style="font-family:宋体;">格式的医学影像。文章贯穿了加载和获取数据的整个步骤。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cloudera.com/blog/2016/05/how-to-process-and-index-medical-images-with-apache-hadoop-and-apache-solr/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">MapR Streams</span><span style="font-family: 宋体;">是一个</span><span style="font-family:Helvetica;">API</span><span style="font-family:宋体;">兼容</span><span style="font-family:Helvetica;">Apache Kafka</span><span style="font-family:宋体;">的系统。本文在宏观上比较了</span><span style="font-family:Helvetica;">MapR 
Streams</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">的异同。同时阐明了</span><span style="font-family:Helvetica;">Kafka Streams</span><span style="font-family:宋体;">怎样和</span><span style="font-family:Helvetica;">MapR Streams</span><span style="font-family:宋体;">扯上关系的。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.mapr.com/blog/apache-kafka-and-mapr-streams-terms-techniques-and-new-designs</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本文在我看来是最清晰介绍</span><span style="font-family:Helvetica;">Paxos</span><span style="font-family:宋体;">的文章之一，</span><span style="font-family:Helvetica;">Paxos</span><span style="font-family:宋体;">为分布式系统构建了一致性协议。本文用绘图计算机和分布式拍卖示范了这个协议。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://ifeanyi.co/posts/understanding-consensus/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">基于</span><span style="font-family:Helvetica;">Apache: Big Data North America</span><span style="font-family:宋体;">会议上的一篇演讲。</span><span style="font-family:Helvetica;">Datanami</span><span style="font-family:宋体;">窥探了即将发布的</span><span style="font-family:Helvetica;">Apache Hadoop 3</span><span style="font-family: 宋体;">的新特性。包括，</span><span style="font-family:Helvetica;">shell</span><span style="font-family:宋体;">脚本重写、任务集本地优化、内存大小自动伸缩能力、支持</span><span style="font-family:Helvetica;">HDFS erasure codings</span><span style="font-family:宋体;">。本文着重在</span><span style="font-family:Helvetica;">erasure codings</span><span style="font-family:宋体;">上，文章密切关注了</span><span style="font-family:Helvetica;">erasure codings</span><span style="font-family:宋体;">在存储效率方面的提升（</span><span style="font-family: Helvetica;">3x</span><span style="font-family:宋体;">磁盘消耗降低到</span><span style="font-family:Helvetica;">1.5x</span><span style="font-family:宋体;">）。</span></p>  <p align="left"><a 
href="http://www.datanami.com/2016/05/18/hadoop-3-poised-boost-storage-capacity-resilience-erasure-coding/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.datanami.com/2016/05/18/hadoop-3-poised-boost-storage-capacity-resilience-erasure-coding/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This talk from PyData Berlin describes Apache Arrow and the Feather file format. It also explores how they make data interoperable across languages and frameworks.</span></p>  <p align="left"><a href="http://www.slideshare.net/wesm/python-data-ecosystem-thoughts-on-building-for-the-future"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.slideshare.net/wesm/python-data-ecosystem-thoughts-on-building-for-the-future</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Videos have been posted of two Apache Kafka talks from different conferences. The first covers Kafka's security features. The second explores how Kafka can be used to share data across systems.</span></p>  <p align="left"><a href="https://www.oreilly.com/learning/securing-apache-kafka"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.oreilly.com/learning/securing-apache-kafka</span></a></p>  <p><a href="https://www.infoq.com/presentations/event-streams-kafka"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.infoq.com/presentations/event-streams-kafka</span></a></p>  <p>&nbsp;</p>  <p><span 
style="font-family:Helvetica;">This blog hosts a series of posts on loading and querying New York City taxi ride data with Amazon Redshift, Google BigQuery, Postgres, and Presto. Beyond raw benchmarks, the posts explain how to work around failures, apply optimizations, and compare alternatives, such as S3 versus HDFS on AWS.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://tech.marksblogg.com/all-billion-nyc-taxi-rides-redshift.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">O'Reilly has an article on implementing a kappa architecture with Kafka, Flink, Elasticsearch, and Kibana. It gives an overview of the lambda and kappa architectures and introduces the main architectural components. It also explains how the setup uses a Bayesian model for novelty detection.</span></p>  <p align="left"><a href="https://www.oreilly.com/ideas/applying-the-kappa-architecture-in-the-telco-industry"><span 
style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.oreilly.com/ideas/applying-the-kappa-architecture-in-the-telco-industry</span></a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Other News</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">This article rounds up several big data ecosystem projects presented at the recent Apache: Big Data North America conference, including quite a few that hadn't been on our radar.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.datanami.com/2016/05/11/open-source-tour-de-force-apache-big-data-2016/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Pivotal blog has an interesting post on big data and agility. Big data systems have often lived in a non-agile world, where requirements must be gathered and models defined before any data is loaded. The post argues that the cloud removes many of the constraints that forced this approach, such as limited capacity and performance and siloed data.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://blog.pivotal.io/big-data-pivotal/features/when-it-comes-to-big-data-cloud-and-agility-go-hand-in-hand</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Databricks has posted the recording of its webinar &#8220;Apache Spark MLlib: From Quick Start to Scikit-Learn&#8221;. Along with the video, the post answers eight common questions raised during the webinar.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/05/18/spark-mllib-from-quick-start-to-scikit-learn.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Hortonworks blog looks back at the history of Apache Storm. The project was open-sourced in 2011, entered the Apache Incubator in 2013, became a top-level project in 2014, and released version 1.0 earlier this year. The post covers the major technical advances at each milestone.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://hortonworks.com/blog/brief-history-apache-storm/</span></p>  <p>&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">HBaseCon took place in San Francisco this week, with talks from Apple, Yahoo, and Facebook, among others.</span></p>  <p align="left"><a href="http://hbasecon.com/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline: none">http://hbasecon.com</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">MapR published an infographic celebrating Apache Drill's achievements over the past year, during which the project shipped 7 releases and reached several milestones.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.mapr.com/blog/happy-anniversary-apache-drill-what-difference-year-makes</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Datanami has published a Q&amp;A from the Apache: Big Data North America conference with ASF director Jim Jagielski and ODPi director John Mertic. As you might expect, the main topic is the relationship between the ASF and ODPi.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.datanami.com/2016/05/20/apache-foundation-keeps-eyes-wide-open-odpi/</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Releases</span></strong><strong></strong></p>  <p align="left"><span style="font-family:Helvetica;">LinkedIn has open-sourced Ambry, its distributed object store. The code is available on GitHub, and this post describes Ambry's service guarantees, design goals, architecture, and APIs.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://engineering.linkedin.com/blog/2016/05/introducing-and-open-sourcing-ambry---linkedins-new-distributed-</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Pivotal HDB, an analytical database for Hadoop powered by Apache HAWQ (incubating), released version 2.0 this week.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://blog.pivotal.io/big-data-pivotal/products/fail-fast-and-ask-more-questions-of-your-data-with-hdb-2-0</span></p>  <p align="left">&nbsp;</p>  <p 
align="left"><span style="font-family:Helvetica;">Apache Mahout, a machine learning and data mining system, released version 0.12.1 this week. The release aims to advance Mahout's integration with Flink.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCAOtpBjhshagyLN3Qnt0xRnc7YbnMVJjTS4piVXL7LiS2pQguXw@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Tajo, a data warehouse system for Hadoop, released version 0.11.3, which fixes 5 bugs.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://tajo.apache.org/releases/0.11.3/announcement.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">MongoDB has released a new MongoDB Connector for Apache Spark. The connector offers features beyond those of the Hadoop InputFormat shim for Spark. The posts also explain some key MongoDB features.</span></p>  <p align="left"><a href="https://www.mongodb.com/blog/post/mongodb-connector-for-apache-spark-announcing-early-access-program-and-new-spark-training"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.mongodb.com/blog/post/mongodb-connector-for-apache-spark-announcing-early-access-program-and-new-spark-training</span></a></p>  <p align="left"><a href="http://rosslawley.co.uk/introducing-a-new=mongodb-spark-connector/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://rosslawley.co.uk/introducing-a-new=mongodb-spark-connector/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">SyncSort has released DMX-h v9, which adds support for Kafka and a new intelligent execution framework.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://insidebigdata.com/2016/05/20/syncsorts-latest-innovations-simplify-integration-of-streaming-data-in-spark-kafka-and-hadoop-for-real-time-analytics/</span></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:Helvetica;">Events</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:Helvetica;">China</span></p>  <p align="left"><span style="font-family:Helvetica;">None</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430838.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-06-08 16:42 <a 
href="http://www.blogjava.net/rosen/archive/2016/06/08/430838.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Hadoop Weekly: Issue 169</title><link>http://www.blogjava.net/rosen/archive/2016/05/15/430513.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Sun, 15 May 2016 12:30:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/05/15/430513.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430513.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/05/15/430513.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430513.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430513.html</trackback:ping><description><![CDATA[<p class="MsoNormal" align="left" style="text-align:left;line-height:10%;
mso-outline-level:1"><strong style="mso-bidi-font-weight:normal"><span lang="EN-US" style="font-size:16.0pt;line-height:10%">&nbsp;</span></strong></p><p align="left" style="line-height: 10%;"><br /></p><p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop Weekly Issue 169</span></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">Compiled by the Venustech Platform and Big Data Group</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">May 8, 2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This week's issue is short and sweet. Topics include Apache Beam, MapR's quarterly results, the recent Kafka Summit, and a newly open-sourced distributed unit-testing framework from Cloudera.</span></p>  <p>&nbsp;</p>  <p><strong><span 
style="font-size:15.0pt;font-family:Helvetica;">Technical News</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">Elastic has published a root-cause analysis of its April 2016 Elastic Cloud outage. Misconfigured ZooKeeper memory settings caused excessive garbage collection, which ultimately led to the loss of the ZooKeeper cluster. The post describes several mitigation strategies to prevent similar problems in the future.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.elastic.co/blog/elastic-cloud-outage-april-2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Cask blog has a concise recap of the recent Big Data Applications Meetup. First up was Pachyderm, which provides &#8220;Git for data&#8221; semantics on top of Docker containers. Second was TubeMogul's big data platform, which is built on Hadoop, Hive, Spark, and Presto.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cask.co/2016/05/pachyderm-and-tubemogul-share-their-big-data-application-platforms-and-experience/</span></p>  <p>&nbsp;</p>  <p><span 
style="font-family:Helvetica;">Google and dataArtisans have both written about Apache Beam, formerly the Google Dataflow SDK. Google's post explains its motivation for developing and open-sourcing Beam. The dataArtisans post describes their support for the Beam model and how they view the relationship between the Flink and Beam APIs.</span></p>  <p align="left"><a href="https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective</span></a></p>  <p><a href="http://data-artisans.com/why-apache-beam/"><span style="font-family:Helvetica;color:#386EFF;text-decoration: none;text-underline:none">http://data-artisans.com/why-apache-beam/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The IBM Hadoop Dev blog has instructions for installing Jupyter Notebook with Python, Scala, and R kernels. It also explains how to connect to Spark and how to expose the notebook over SSL.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://developer.ibm.com/hadoop/blog/2016/05/04/install-jupyter-notebook-spark/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This article explains how the Mongo Hadoop connector ties together Spark and </span><span style="font-family:Helvetica;">Mon