﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>语源科技BlogJava-&lt;b&gt;成都心情&lt;/b&gt;</title><link>http://www.blogjava.net/rosen/</link><description /><language>zh-cn</language><lastBuildDate>Fri, 24 Apr 2026 03:33:47 GMT</lastBuildDate><pubDate>Fri, 24 Apr 2026 03:33:47 GMT</pubDate><ttl>60</ttl><item><title>Hadoop周刊—第 176 期</title><link>http://www.blogjava.net/rosen/archive/2016/07/12/431174.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Tue, 12 Jul 2016 13:21:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/07/12/431174.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/431174.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/07/12/431174.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/431174.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/431174.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop</span></strong><strong><span style="font-size:16.0pt;line-height:10%;font-family:宋体;">周刊</span></strong><strong> </strong><strong><span style="font-size:16.0pt;line-height: 10%;font-family:宋体;">第</span></strong><strong><span style="font-size:16.0pt;line-height:10%"> 176 </span></strong><strong><span style="font-size:16.0pt;line-height: 10%;font-family:宋体;">期</span></strong><strong></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span 
style="font-size:14.0pt;line-height:10%;font-family:宋体;">启明星辰平台和大数据总体组编译</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">2016</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">年</span><span style="font-size:14.0pt;line-height:10%">6</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">月</span><span style="font-size:14.0pt;line-height:10%">29</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">日</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">峰会本周在圣何塞召开，所以很期待在下期周刊看到新项目的发布和精彩演讲（请向我们提供任何相关的幻灯片）。至于本期周刊，有大量关于</span><span style="font-family:Helvetica;">Kafka Streams</span><span style="font-family:宋体;">、从</span><span style="font-family:Helvetica;">Amazon Kinesis</span><span style="font-family:宋体;">向</span><span style="font-family:Helvetica;">Google BigQuery</span><span style="font-family:宋体;">传递流式数据、</span><span style="font-family:Helvetica;">Google</span><span style="font-family:宋体;">数据集搜索系统的文章。</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">技术新闻</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">Shine</span><span style="font-family:宋体;">介绍了他们如何使用</span><span style="font-family:Helvetica;">Amazon Lambda</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Amazon Kinesis</span><span style="font-family:宋体;">，以及为</span><span style="font-family: Helvetica;">Apache web</span><span style="font-family: 宋体;">服务器提供的</span><span style="font-family: Helvetica;">Kinesis</span><span style="font-family: 宋体;">代理（用于采集日志）</span><span style="font-family:宋体;">，把数据从</span><span style="font-family:Helvetica;">EC2</span><span style="font-family:宋体;">传输到</span><span style="font-family:Helvetica;">Google BigQuery</span><span 
style="font-family: 宋体;">的内容。本文提供了</span><span style="font-family:Helvetica;">Lambda</span><span style="font-family:宋体;">函数（</span><span style="font-family:Helvetica;">javascript</span><span style="font-family:宋体;">编写）代码片段，规模和开销方面的信息，描述了如何通过</span><span style="font-family:Helvetica;">gzip</span><span style="font-family:宋体;">压缩数据从而优化传输开销。</span></p>  <p align="left"><a href="https://blog.shinetech.com/2016/06/21/kinesis-lambda-bigquery/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://blog.shinetech.com/2016/06/21/kinesis-lambda-bigquery/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">博客撰文介绍了如何通过</span><span style="font-family: Helvetica;">Apache Spark</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Apache Impala</span><span style="font-family:宋体;">（孵化中）、</span><span style="font-family:Helvetica;">Hue</span><span style="font-family:宋体;">对梦之队数据进行分析。本文主要聚焦在分析上，附带了些</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">代码以及</span><span style="font-family:Helvetica;">Hue</span><span style="font-family:宋体;">的功能演示。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cloudera.com/blog/2016/06/how-to-analyze-fantasy-sports-with-apache-spark-and-sql-part-2-data-exploration/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">KDnuggets</span><span style="font-family:宋体;">撰文介绍了</span><span style="font-family:Helvetica;">13</span><span style="font-family:宋体;">个和</span><span style="font-family:Helvetica;">Apache Spark</span><span style="font-family:宋体;">相关的主要</span><span style="font-family:Helvetica;">API/</span><span style="font-family:宋体;">项目</span><span style="font-family:Helvetica;">/</span><span style="font-family:宋体;">名词。包括</span><span style="font-family:Helvetica;">RDD</span><span style="font-family:宋体;">、</span><span 
style="font-family:Helvetica;">DataFrame</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Dataset</span><span style="font-family:宋体;">、结构化流式计算、</span><span style="font-family:Helvetica;">GraphX</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Tungsten</span><span style="font-family:宋体;">。每个条目都有一段章节介绍，足够很好的了解</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">主要特性了。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.kdnuggets.com/2016/06/spark-key-terms-explained.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本文来自</span><span style="font-family:Helvetica;">Confluent</span><span style="font-family:宋体;">博客，介绍了那些虽看起来简单却又不简单的</span><span style="font-family:Helvetica;">Kafka Streams</span><span style="font-family:宋体;">应用。例如用</span><span style="font-family:Helvetica;">Kafka Streams</span><span style="font-family:宋体;">编写结合用户点击流数据和用户位置数据的程序。后者存储在</span><span style="font-family:Helvetica;">KTable</span><span style="font-family:宋体;">中，</span><span style="font-family:Helvetica;">KTable</span><span style="font-family:宋体;">提供了类似带有数据库表主键的抽象（主键的最新值通过</span><span style="font-family:Helvetica;">API</span><span style="font-family:宋体;">暴露）。最后的程序倒是简单</span><span style="font-family:Helvetica;">&#8212;&#8212;</span><span style="font-family:宋体;">只有几行代码。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.confluent.io/blog/distributed-real-time-joins-and-aggregations-on-user-activity-events-using-kafka-streams</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">博客撰文介绍了</span><span style="font-family:Helvetica;">meinstadt.de</span><span style="font-family:宋体;">构建在</span><span style="font-family:Helvetica;">Apache Flume</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Apache Spark Streaming</span><span 
style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Apache Impala</span><span style="font-family:宋体;">（孵化中）上的</span><span style="font-family:Helvetica;">HTTP</span><span style="font-family:宋体;">请求异常检测系统。实现代码放在了</span><span style="font-family: Helvetica;">github</span><span style="font-family:宋体;">上。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cloudera.com/blog/2016/06/how-to-detect-and-report-web-traffic-anomalies-in-near-real-time/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">AWS</span><span style="font-family:宋体;">大数据博客有教程介绍了如何使用</span><span style="font-family: Helvetica;">Apache Spark</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Apache Zeppelin</span><span style="font-family:宋体;">从</span><span style="font-family:Helvetica;">Amazon EMR</span><span style="font-family:宋体;">集群处理</span><span style="font-family:Helvetica;">Amazon Kinesis</span><span style="font-family:宋体;">流数据。本文包含了一些通过</span><span style="font-family:Helvetica;">Zeppelin notebook</span><span style="font-family:宋体;">运行</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">产生的数据可视化范例。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blogs.aws.amazon.com/bigdata/post/Tx3K805CZ8WFBRP/Analyze-Realtime-Data-from-Amazon-Kinesis-Streams-Using-Zeppelin-and-Spark-Strea</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Kudu</span><span style="font-family: 宋体;">（孵化中）接近</span><span style="font-family:Helvetica;">1.0</span><span style="font-family:宋体;">版发布了，将全面支持高可用性。本文介绍了这最后一块拼图</span><span style="font-family:Helvetica;">&#8220;</span><span style="font-family:宋体;">主复制</span><span style="font-family:Helvetica;">&#8221;</span><span style="font-family:宋体;">是如何实现的。晒了下</span><span style="font-family:Helvetica;">JIRA</span><span style="font-family:宋体;">上各种问题的跟进的情况，以及完成与剩余的测试。</span></p>  <p><span 
style="font-family:Helvetica;color:#386EFF;">http://kudu.apache.org/2016/06/24/multi-master-1-0-0.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Google</span><span style="font-family:宋体;">的所有数据平台拥有超过</span><span style="font-family: Helvetica;">260</span><span style="font-family:宋体;">亿的数据集，每天要添加和删除</span><span style="font-family:Helvetica;">16</span><span style="font-family:宋体;">亿的数据集路径。为了跟踪、查询、比较数据集，他们研发了</span><span style="font-family: Helvetica;">Google Dataset Search</span><span style="font-family:宋体;">（</span><span style="font-family:Helvetica;">GOODS</span><span style="font-family:宋体;">）。</span><span style="font-family:Helvetica;">GOODS</span><span style="font-family:宋体;">跟踪由</span><span style="font-family:Helvetica;">API</span><span style="font-family:宋体;">暴露的元数据，这些元数据被用于检索、监控等。</span></p>  <p align="left"><a href="http://dl.acm.org/citation.cfm?id=2903730"><span style="font-family:Helvetica;color:#386EFF;text-decoration: none;text-underline:none">http://dl.acm.org/citation.cfm?id=2903730</span></a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">其他新闻</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">SiliconAngle</span><span style="font-family: 宋体;">采访了</span><span style="font-family:Helvetica;">Hortonworks CEO Rob Bearden</span><span style="font-family:宋体;">。主题包括业界趋势、</span><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family:宋体;">财务、</span><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">的非</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">技术以及物联网。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://siliconangle.com/blog/2016/06/24/hadoop-and-beyond-a-conversation-with-hortonworks-ceo-rob-bearden/</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">产品发布</span></strong><strong></strong></p>  <p align="left"><span 
style="font-family:Helvetica;">Apache Sentry</span><span style="font-family:宋体;">本周发布了</span><span style="font-family:Helvetica;">1.7.0</span><span style="font-family:宋体;">版，修复了</span><span style="font-family:Helvetica;">bug</span><span style="font-family:宋体;">，增加了新特性和其他方面的提升。本次发布把</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">授权框架升级到了第二版。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3CCAPOmu3sDqdzu9ntDSvkMaDRQnVfHrkGV5qhyh-ZRiMmwgMMvBA@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:宋体;">基于</span><span style="font-family:Helvetica;">Apache Cassandra 3.0</span><span style="font-family:宋体;">构建的</span><span style="font-family:Helvetica;">DataStax Enterprise 5.0</span><span style="font-family:宋体;">，增加了对图数据、分层存储、</span><span style="font-family:Helvetica;">Cassandra</span><span style="font-family:宋体;">多实例的支持。本次发布也增加了诸如加密和基于角色访问控制的附加安全特性支持。</span></p>  <p align="left"><a href="https://www.datastax.com/2016/06/introducing-datastax-enterprise-5-0"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.datastax.com/2016/06/introducing-datastax-enterprise-5-0</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Driven</span><span style="font-family:宋体;">，大数据应用性能监控系统发布了</span><span style="font-family:Helvetica;">2.2</span><span style="font-family:宋体;">版。本次发布的亮点是对</span><span style="font-family:Helvetica;">Apache Spark</span><span style="font-family:宋体;">的监控提供了支持。</span></p>  <p align="left"><a href="http://www.driven.io/2016/06/driven-inc-delivering-hadoop-spark-performance-monitoring-announces-driven-2-2/"><span 
style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.driven.io/2016/06/driven-inc-delivering-hadoop-spark-performance-monitoring-announces-driven-2-2/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">BlueData</span><span style="font-family:宋体;">发布了他们为</span><span style="font-family:Helvetica;">Amazon Web Services</span><span style="font-family:宋体;">提供的</span><span style="font-family:Helvetica;">EPIC</span><span style="font-family:宋体;">企业大数据即服务产品。本产品通过简单的点击就能自动装载到基于</span><span style="font-family: Helvetica;">Docker</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">集群。</span></p>  <p align="left"><a href="http://www.bluedata.com/blog/2016/06/big-data-as-a-service-on-prem-or-cloud-bdaas/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.bluedata.com/blog/2016/06/big-data-as-a-service-on-prem-or-cloud-bdaas/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Accumulo</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">1.7.2</span><span style="font-family:宋体;">版。本次发布修复了</span><span style="font-family:Helvetica;">write-ahead</span><span style="font-family:宋体;">日志处理方式，优化了</span><span style="font-family:Helvetica;">RFiles</span><span style="font-family:宋体;">，以及性能上的小提升。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://accumulo.apache.org/release_notes/1.7.2.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache ZooKeeper</span><span style="font-family:宋体;">的顶级</span><span style="font-family:Helvetica;">SDK</span><span style="font-family:宋体;">，</span><span style="font-family:Helvetica;">Apache Curator</span><span style="font-family:宋体;">发布了</span><span 
style="font-family:Helvetica;">2.11.0</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">3.2.0</span><span style="font-family:宋体;">版。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://cwiki.apache.org/confluence/display/CURATOR/Releases#Releases-June23,2016,Releases2.11.0and3.2.0available</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Hive</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">2.1.0</span><span style="font-family:宋体;">版。修复了大量</span><span style="font-family:Helvetica;">bug</span><span style="font-family:宋体;">并增强了多项功能，包括对</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">LLAP</span><span style="font-family: 宋体;">（</span><span style="font-family:Helvetica;">Live Long and Process</span><span style="font-family:宋体;">）的改进以及</span><span style="font-family:Helvetica;">JDBC</span><span style="font-family:宋体;">支持。</span></p>  <p align="left"><a href="http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3C7194557D-CB5E-45B7-B905-82F27B7CB33F@apache.org%3E"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3C7194557D-CB5E-45B7-B905-82F27B7CB33F@apache.org%3E</span></a></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">活动</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">中国</span></p>  <p align="left"><span style="font-family:Helvetica;">7</span><span style="font-family:宋体;">月</span><span style="font-family:Helvetica;">2</span><span style="font-family:宋体;">日</span> <span style="font-family:宋体;">上海</span><span style="font-family:Helvetica;">BigData Streaming</span><span style="font-family: 宋体;">第三次见面会</span></p><img src 
="http://www.blogjava.net/rosen/aggbug/431174.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-07-12 21:21 <a href="http://www.blogjava.net/rosen/archive/2016/07/12/431174.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Hadoop周刊—第 175 期</title><link>http://www.blogjava.net/rosen/archive/2016/07/01/431070.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Fri, 01 Jul 2016 07:44:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/07/01/431070.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/431070.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/07/01/431070.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/431070.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/431070.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop</span></strong><strong><span style="font-size:16.0pt;line-height:10%;font-family:宋体;">周刊</span></strong><strong> </strong><strong><span style="font-size:16.0pt;line-height: 10%;font-family:宋体;">第</span></strong><strong><span style="font-size:16.0pt;line-height:10%"> 175 </span></strong><strong><span style="font-size:16.0pt;line-height: 10%;font-family:宋体;">期</span></strong><strong></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">启明星辰平台和大数据总体组编译</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  
<p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">2016</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">年</span><span style="font-size:14.0pt;line-height:10%">6</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">月</span><span style="font-size:14.0pt;line-height:10%">19</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">日</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">峰会已过去一周了，我们已看到有多个产品（项目）敲定了发布时间。所以在技术新闻部分，有关于</span><span style="font-family:Helvetica;">Hadoop Kerberos</span><span style="font-family:宋体;">认证的内容，另外还有</span><span style="font-family:Helvetica;">Salsify</span><span style="font-family:宋体;">应用</span><span style="font-family:Helvetica;">Avro</span><span style="font-family:宋体;">的文章。在产品发布部分，包括</span><span style="font-family: Helvetica;">Yandex</span><span style="font-family:宋体;">新近开源的列式数据库在内的多个项目均有新版本发布。</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">技术新闻</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">OpenCore</span><span style="font-family:宋体;">博客撰文示范了多种</span><span style="font-family:Helvetica;">Hadoop Kerberos</span><span style="font-family:宋体;">认证协议调试工具。尤其示范了如何使用</span><span style="font-family:Helvetica;">UserGroupInformation</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">&#8220;main()&#8221;</span><span style="font-family:宋体;">方法导出一些有用的调试信息。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.opencore.com/blog/2016/5/user-name-handling-in-hadoop/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">YARN</span><span style="font-family:宋体;">系列文章的第四部分，</span><span style="font-family: Helvetica;">Cloudera</span><span style="font-family:宋体;">博客介绍了如何配置公平调度队列。尤其对资源约束设置、队列安置策略和抢占进行了详解。</span></p>  <p align="left"><a 
href="http://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://blog.cloudera.com/blog/2016/06/untangling-apache-hadoop-yarn-part-4-fair-scheduler-queue-basics/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Salsify</span><span style="font-family:宋体;">基于</span><span style="font-family:Helvetica;">Apache Kafka</span><span style="font-family:宋体;">构建了一个异步微服务架构，并采用</span><span style="font-family:Helvetica;">Apache Avro</span><span style="font-family:宋体;">进行数据序列化。该应用使用</span><span style="font-family:Helvetica;">Ruby</span><span style="font-family:宋体;">开发，他们创建了多个新工具使得</span><span style="font-family:Helvetica;">Avro</span><span style="font-family:宋体;">能和</span><span style="font-family:Helvetica;">Ruby</span><span style="font-family:宋体;">语言很好的配合。本文介绍了这些工具和它们的价值：</span><span style="font-family:Helvetica;">avro-builder</span><span style="font-family:宋体;">用于定义记录、基于</span><span style="font-family:Helvetica;">postgres</span><span style="font-family:宋体;">的模式注册表，</span><span style="font-family:Helvetica;">avromatic</span><span style="font-family:宋体;">则从</span><span style="font-family:Helvetica;">avro schema</span><span style="font-family:宋体;">生成模型。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.salsify.com/engineering/adventures-in-avro</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Drill</span><span style="font-family: 宋体;">可以动态推断模式，还支持多模式</span><span style="font-family: Helvetica;">(</span><span style="font-family:宋体;">但相互兼容</span><span style="font-family:Helvetica;">)</span><span style="font-family:宋体;">数据。这种组合使得一些有趣的用例得以实现，例如跨多个不同模式的</span><span style="font-family: Helvetica;">json</span><span style="font-family:宋体;">文件查询。</span><span style="font-family:Helvetica;">MapR</span><span style="font-family:宋体;">博客探究了这些特性并进行了示范。</span></p>  <p><span 
style="font-family:Helvetica;color:#386EFF;">https://www.mapr.com/blog/sql-query-mixed-schema-data-using-apache-drill</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本教程展示了如何将</span><span style="font-family:Helvetica;">Druid</span><span style="font-family:宋体;">与</span><span style="font-family:Helvetica;">Apache Kafka</span><span style="font-family: 宋体;">结合构建流式分析和可视化（借助</span><span style="font-family: Helvetica;">Pivot</span><span style="font-family:宋体;">，</span><span style="font-family:Helvetica;">Druid</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">web UI</span><span style="font-family:宋体;">）应用。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.confluent.io/blog/building-a-streaming-analytics-stack-with-apache-kafka-and-druid</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Beam</span><span style="font-family: 宋体;">（孵化中）博客撰文介绍了他们在连接</span><span style="font-family:Helvetica;">Apache Flink</span><span style="font-family:宋体;">批处理集群方面的成果。</span><span style="font-family:Helvetica;">Beam</span><span style="font-family:宋体;">是一个开源</span><span style="font-family:Helvetica;">SDK</span><span style="font-family:宋体;">，最初来自于</span><span style="font-family:Helvetica;">Google</span><span style="font-family:宋体;">，用于暴露后端未知数据管道</span><span style="font-family:Helvetica;">API</span><span style="font-family:宋体;">。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://beam.incubator.apache.org/blog/2016/06/13/flink-batch-runner-milestone.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Cask Hydrator</span><span style="font-family: 宋体;">是一个通过</span><span style="font-family:Helvetica;">UI</span><span style="font-family:宋体;">界面采用拖拽方式构建数据管道的工具。本教程也演示了如何使用</span><span style="font-family:Helvetica;">Hydrator</span><span style="font-family:宋体;">把数据从</span><span style="font-family:Helvetica;">MySQL</span><span style="font-family:宋体;">导入到</span><span 
style="font-family:Helvetica;">HDFS</span><span style="font-family:宋体;">。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cask.co/2016/06/bringing-relational-data-into-data-lakes/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">撰文介绍了即将发布的</span><span style="font-family: Helvetica;">Apache Spark 2.0</span><span style="font-family:宋体;">中新的</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">子查询功能。有趣的是，本文以手册形式呈现，最直截了当的展现了代码和范例数据。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/06/17/sql-subqueries-in-apache-spark-2-0.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Kudu</span><span style="font-family: 宋体;">（孵化中）博客撰写了在单集群节点使用</span><span style="font-family:Helvetica;">Raft</span><span style="font-family:宋体;">的文章，借此动态扩展到多主节点集群。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://getkudu.io/2016/06/17/raft-consensus-single-node.html</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">其他新闻</span></strong><strong></strong></p>  <p><span style="font-family:宋体;">本文指出</span><span style="font-family:Helvetica;">Apache Spark</span><span style="font-family:宋体;">社区如果不用心经营，可能会重走因碎片化导致</span><span style="font-family:Helvetica;">Apache Hadoop</span><span style="font-family:宋体;">生态系统混乱的老路。举例来说，最新版本的</span><span style="font-family:Helvetica;">CDH</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">HDP</span><span style="font-family:宋体;">支持不同版本的</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://techcrunch.com/2016/06/12/spark-fragmentation-undermines-community/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">New Stack</span><span style="font-family:宋体;">撰写了一篇关于</span><span 
style="font-family:Helvetica;">Concord</span><span style="font-family:宋体;">的文章，</span><span style="font-family:Helvetica;">Concord</span><span style="font-family:宋体;">是一个构建在</span><span style="font-family:Helvetica;">Apache Mesos</span><span style="font-family: 宋体;">上新的流式处理框架（公开测试状态）。</span><span style="font-family:Helvetica;">Concord</span><span style="font-family:宋体;">使用</span><span style="font-family:Helvetica;">C++</span><span style="font-family:宋体;">开发，支持动态拓扑（无需停机实现管道的增加和减少）。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://thenewstack.io/concord-leverages-mesos-high-performance-stream-processing/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">随着</span><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">社区版的正式发布，</span><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">发布了使用</span><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">编写</span><span style="font-family:Helvetica;">Apache Spark</span><span style="font-family:宋体;">应用程序系列教程的第一篇。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/06/15/an-introduction-to-writing-apache-spark-applications-on-databricks.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">圣何塞峰会于几周前召开，期间举行了题为</span><span style="font-family:Helvetica;">&#8220;</span><span style="font-family:宋体;">大数据行业中的女性</span><span style="font-family:Helvetica;">&#8221;</span><span style="font-family:宋体;">专场午宴。</span><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">博客特意采访了午宴主持人</span><span style="font-family: Helvetica;">Hortonworks CMO</span><span style="font-family:宋体;">：</span><span style="font-family:Helvetica;">Ingrid Burton</span><span style="font-family:宋体;">。</span></p>  <p align="left"><a 
href="http://hortonworks.com/blog/summer-hortonworks-part-2-wibd-assertive-innovative-take-risks/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://hortonworks.com/blog/summer-hortonworks-part-2-wibd-assertive-innovative-take-risks/</span></a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">产品发布</span></strong><strong></strong></p>  <p align="left"><span style="font-family:Helvetica;">Apache SystemML</span><span style="font-family:宋体;">（孵化中）最近发布了</span><span style="font-family: Helvetica;">0.10.0</span><span style="font-family:宋体;">版。</span><span style="font-family:Helvetica;">SystemML</span><span style="font-family:宋体;">是一个机器学习框架，由多个项目在背后支撑，包括</span><span style="font-family:Helvetica;">Apache Spark</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Apache Hadoop</span><span style="font-family:宋体;">。本次发布包括新的</span><span style="font-family:Helvetica;">Spark Matrix Block</span><span style="font-family:宋体;">类型、支持深度学习、性能上的提升、新的</span><span style="font-family:Helvetica;">KNN</span><span style="font-family:宋体;">算法等等。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://systemml.apache.org/0.10.0-incubating/release_notes.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Mahout</span><span style="font-family:宋体;">，另一个机器学习框架发布了</span><span style="font-family: Helvetica;">0.12.2</span><span style="font-family:宋体;">版。本次发布向着集成</span><span style="font-family:Helvetica;">Apache Zeppelin</span><span style="font-family:宋体;">可视化和支持</span><span style="font-family:Helvetica;">notebook</span><span style="font-family:宋体;">的目标迈进了一步。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201606.mbox/%3CCAOtpBjgBAuQs5FiX5X_5A+Rd-A1fVz0R7SKttGe4cJuCLRiGww@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p 
align="left"><span style="font-family:Helvetica;">Qubole</span><span style="font-family:宋体;">宣布他们的</span><span style="font-family:Helvetica;">HBase-as-a-Service</span><span style="font-family:宋体;">已经在</span><span style="font-family:Helvetica;">AWS</span><span style="font-family:宋体;">上提供。它为长时运行集群提供了许多漂亮的特性。支持</span><span style="font-family:Helvetica;">Hannibal</span><span style="font-family:宋体;">和其它监控工具，集成了</span><span style="font-family:Helvetica;">Apache Zeppelin</span><span style="font-family:宋体;">，并能通过节点引导程序与</span><span style="font-family: Helvetica;">OpenTSDB</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Apache Phoenix</span><span style="font-family:宋体;">配置。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://www.qubole.com/blog/product/quboles-hbase-as-a-service-is-generally-available-on-aws/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Altiscale</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">Altiscale Insight Cloud</span><span style="font-family:宋体;">实时版。本系统由</span><span style="font-family:Helvetica;">Apache HBase</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Spark Streaming</span><span style="font-family:宋体;">支撑。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://www.altiscale.com/blog/announcing-the-altiscale-insight-cloud-real-time-edition/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">`hs2client`</span><span style="font-family:宋体;">是一个为</span><span style="font-family:Helvetica;">Apache Hive</span><span style="font-family: 宋体;">和</span><span style="font-family:Helvetica;">Apache Impala</span><span style="font-family:宋体;">（孵化中）提供的新</span><span style="font-family:Helvetica;">C++</span><span style="font-family:宋体;">库。除了支持</span><span style="font-family:Helvetica;">C++</span><span 
style="font-family:宋体;">，这个库还绑定了</span><span style="font-family:Helvetica;">python</span><span style="font-family:宋体;">，可以在</span><span style="font-family:Helvetica;">pandas</span><span style="font-family:宋体;">中把数据读到</span><span style="font-family:Helvetica;">DataFrame</span><span style="font-family:宋体;">。</span></p>  <p align="left"><a href="http://blog.cloudera.com/blog/2016/06/announcing-hs2client-a-fast-new-c-python-thrift-client-for-impala-and-hive/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://blog.cloudera.com/blog/2016/06/announcing-hs2client-a-fast-new-c-python-thrift-client-for-impala-and-hive/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">MapR</span><span style="font-family:宋体;">在其发行版中支持了</span><span style="font-family:Helvetica;">Apache Spark 2.0</span><span style="font-family: 宋体;">开发者预览版。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://www.mapr.com/blog/spark-20-now-developer-preview-mode-mapr-platform</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Beam</span><span style="font-family:宋体;">发布了其</span><span style="font-family:Helvetica;">0.1.0</span><span style="font-family:宋体;">孵化版，是本项目加入</span><span style="font-family: Helvetica;">Apache</span><span style="font-family:宋体;">孵化器以来首次发布。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://beam.incubator.apache.org/beam/release/2016/06/15/first-release.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Yandex</span><span style="font-family:宋体;">开源了</span><span style="font-family:Helvetica;">ClickHouse</span><span style="font-family:宋体;">，一个列式分析数据库。本系统为横向和纵向扩展而生。支持复杂数据类型（例如数组）和近似查询。该团队还发布了与其它数据库相比的基准测试结果。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://clickhouse.yandex/</span></p>  <p 
align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">活动</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">中国</span></p>  <p align="left">&nbsp;</p><img src ="http://www.blogjava.net/rosen/aggbug/431070.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-07-01 15:44 <a href="http://www.blogjava.net/rosen/archive/2016/07/01/431070.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Hadoop周刊—第 174 期</title><link>http://www.blogjava.net/rosen/archive/2016/06/28/431032.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Tue, 28 Jun 2016 09:39:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/06/28/431032.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/431032.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/06/28/431032.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/431032.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/431032.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop</span></strong><strong><span style="font-size:16.0pt;line-height:10%;font-family:宋体;">周刊</span></strong><strong> </strong><strong><span style="font-size:16.0pt;line-height: 10%;font-family:宋体;">第</span></strong><strong><span style="font-size:16.0pt;line-height:10%"> 174 </span></strong><strong><span style="font-size:16.0pt;line-height: 10%;font-family:宋体;">期</span></strong><strong></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  
<p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">启明星辰平台和大数据总体组编译</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">2016</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">年</span><span style="font-size:14.0pt;line-height:10%">6</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">月</span><span style="font-size:14.0pt;line-height:10%">12</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">日</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">峰会本周在旧金山召开，正如所料，本期周刊有大量关于</span><span style="font-family:Helvetica;">Apache Spark</span><span style="font-family:宋体;">的新闻、公告和版本发布。除</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">外，本期还有</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Cask</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Ambari</span><span style="font-family:宋体;">方面的文章。在产品发布部分，有一年来</span><span style="font-family:Helvetica;">Apache Pig</span><span style="font-family:宋体;">首次版本更新，还有一个为分布式系统设计的简洁新工具</span><span style="font-family:Helvetica;">Runway</span><span style="font-family:宋体;">，最后是新版</span><span style="font-family:Helvetica;">Apache Kudu</span><span style="font-family:宋体;">（孵化中）。</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">技术新闻</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">Debezium</span><span style="font-family:宋体;">是一个相对较新的项目，用于把数据库的行级变更捕获到</span><span style="font-family:Helvetica;">Apache Kafka topic</span><span style="font-family:宋体;">。当前支持</span><span style="font-family:Helvetica;">MySQL</span><span 
style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Zookeeper</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">，这是一篇在</span><span style="font-family:Helvetica;">Docker</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Kubernetes</span><span style="font-family:宋体;">容器上配置</span><span style="font-family:Helvetica;">Zookeeper, Kafka, MySQL</span><span style="font-family:宋体;">的教程。</span></p>  <p align="left"><a href="http://debezium.io/blog/2016/05/31/Debezium-on-Kubernetes/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://debezium.io/blog/2016/05/31/Debezium-on-Kubernetes/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">有些人对</span><span style="font-family:Helvetica;">Apache Kafka</span><span style="font-family:宋体;">项目宣布采用另一种流式处理引擎感到惊讶，这就是</span><span style="font-family:Helvetica;">Kafka Streams</span><span style="font-family:宋体;">。</span><span style="font-family:Helvetica;">Kafka Streams</span><span style="font-family: 宋体;">与其它系统存在显著的关键差异。本文很好的示范了这些不同点</span><span style="font-family:Helvetica;">&#8212;&#8212;abstraction</span><span style="font-family:宋体;">、部署模型、支持基于状态的计算。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://softwaremill.com/kafka-streams-how-does-it-fit-stream-landscape/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">每个使用</span><span style="font-family:Helvetica;">MapReduce</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">或类似系统的人都会陷入难以调试、数据特征</span><span style="font-family:Helvetica;">bug</span><span style="font-family:宋体;">这些问题中。</span><span style="font-family:Helvetica;">BigDebug</span><span style="font-family:宋体;">是</span><span style="font-family:Helvetica;">UCLA</span><span style="font-family:宋体;">（加州大学洛杉矶分校）的研究项目</span><span style="font-family: 
Helvetica;">/</span><span style="font-family:宋体;">论文，旨在让开发人员通过工具发现单机问题：传入参数导致的崩溃，跟踪、断点、观察点、延迟报警等。该工具运行在</span><span style="font-family:Helvetica;">Apache Spark 1.2.1</span><span style="font-family:宋体;">上。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://blog.acolyer.org/2016/06/07/bigdebug-debugging-primitives-for-interactive-big-data-processing-in-spark/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Cask</span><span style="font-family:宋体;">撰文介绍了如何在开源</span><span style="font-family:Helvetica;">Cask Data Application Platform (CDAP)</span><span style="font-family:宋体;">中运行</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">。运行在</span><span style="font-family:Helvetica;">CDAP</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">程序通过访问</span><span style="font-family:Helvetica;">Apache Tephra</span><span style="font-family:宋体;">（孵化中）实现细粒度事务支持。这样，就能很容易利用快照隔离实现从一个表复制到另一个表的一致性。</span><span style="font-family:Helvetica;">CDAP</span><span style="font-family:宋体;">中的</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">也能访问</span><span style="font-family:Helvetica;">Cask Tracker</span><span style="font-family:宋体;">，</span><span style="font-family:Helvetica;">Cask Tracker</span><span style="font-family:宋体;">提供数据血缘信息（什么时候创建、使用等）。根据应用的不同，</span><span style="font-family:Helvetica;">CDAP</span><span style="font-family:宋体;">工具还能发挥更大价值。</span><strong></strong></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cask.co/2016/06/cdap-spark-prototype-to-production/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">IBM Hadoop Dev</span><span style="font-family: 宋体;">博客撰写了从</span><span style="font-family:Helvetica;">cURL</span><span style="font-family:宋体;">调用</span><span style="font-family:Helvetica;">Ambari REST API</span><span style="font-family:宋体;">的教程。还示范了在</span><span 
style="font-family:Helvetica;">vanilla</span><span style="font-family:宋体;">和启用了</span><span style="font-family:Helvetica;">kerberos</span><span style="font-family:宋体;">的集群上建立会话，并为接下来的请求复用会话。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://developer.ibm.com/hadoop/2016/06/07/ambari-rest-calls-for-kerberos-enabled-clusters/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Google</span><span style="font-family:宋体;">云平台博客撰文介绍了如何调试运行在</span><span style="font-family:Helvetica;">Google Dataflow</span><span style="font-family:宋体;">上的</span><span style="font-family:Helvetica;">Apache Beam</span><span style="font-family: 宋体;">（孵化中）任务。为了调试性能瓶颈，</span><span style="font-family:Helvetica;">Dataflow</span><span style="font-family:宋体;">有一些有用的统计数据和</span><span style="font-family:Helvetica;">UI</span><span style="font-family:宋体;">来帮助使用者深入每一个步骤。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://cloud.google.com/blog/big-data/2016/06/understanding-timing-in-cloud-dataflow-pipelines</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">其他新闻</span></strong><strong></strong></p>  <p align="left"><span style="font-family:Helvetica;">Transaction Processing Performance Council(TPC)</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">TPCx-BB</span><span style="font-family:宋体;">基准测试，该基准测试为大数据系统设计。除了衡量</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">外，还可以对机器学习集群和分类问题进行测试。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.datanami.com/2016/06/01/big-data-benchmark-gauges-hadoop-platforms/</span></p>    <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:宋体;">伦敦</span><span style="font-family:Helvetica;">Strata + Hadoop</span><span style="font-family:宋体;">世界大会两周前已召开。演讲者的专题报告和幻灯片已发布到会议网站上。</span></p>  <p align="left"><a 
href="http://conferences.oreilly.com/strata/hadoop-big-data-eu/public/schedule/proceedings"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://conferences.oreilly.com/strata/hadoop-big-data-eu/public/schedule/proceedings</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Splice Machine</span><span style="font-family:宋体;">，</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">上的</span><span style="font-family:Helvetica;">RDBMS</span><span style="font-family:宋体;">构建者，宣布开源他们的软件。当前，他们正在寻找贡献者</span><span style="font-family:Helvetica;">/</span><span style="font-family:宋体;">导师</span><span style="font-family:Helvetica;">/</span><span style="font-family:宋体;">豪杰来提升开源后的效果。</span><span style="font-family:Helvetica;">Splice Machine</span><span style="font-family:宋体;">有不少有趣的特性，例如</span><span style="font-family:Helvetica;">ACID</span><span style="font-family:宋体;">事务，二级索引，引用完整性。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.splicemachine.com/were_going_open_source/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Altiscale</span><span style="font-family:宋体;">博客编辑了许多关于客户服务、情感分析、气候变化、智慧城市、</span><span style="font-family: Helvetica;">bias</span><span style="font-family:宋体;">等方面的大数据应用案例文章。还收集了一些大数据怀疑论者的文章。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://www.altiscale.com/blog/big-data-news-health-and-public-safety-sentiment-analysis-fixing-education-2/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">峰会本周在旧金山召开。会议组织者</span><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">概述了两天内的热点内容，链接了许多的演讲和专题报告。</span></p>  <p align="left"><span style="font-family:Helvetica; 
color:#386EFF;">https://databricks.com/blog/2016/06/08/another-record-setting-spark-summit.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:宋体;">大数据即服务（BDaaS）公司</span><span style="font-family:Helvetica;">Qubole</span><span style="font-family:宋体;">撰文介绍了其客户采用</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">的情况。采用速度很快</span><span style="font-family:Helvetica;">&#8212;&#8212;</span><span style="font-family:宋体;">一半以上的客户已在使用</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">。</span><span style="font-family:Helvetica;">Qubole</span><span style="font-family:宋体;">也支持</span><span style="font-family:Helvetica;">Presto</span><span style="font-family:宋体;">，他们也看到了类似的增长。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://www.qubole.com/blog/big-data/spark-usage/</span></p>  <p align="left">&nbsp;</p>  <p><span style="font-family:Helvetica;">Twitter</span><span style="font-family:宋体;">向</span><span style="font-family:Helvetica;">Apache</span><span style="font-family:宋体;">孵化器提交了他们的复制日志服务</span><span style="font-family:Helvetica;">DistributedLog</span><span style="font-family:宋体;">。</span></p>  <p align="left"><a href="https://wiki.apache.org/incubator/DistributedLogProposal"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://wiki.apache.org/incubator/DistributedLogProposal</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Big Data Day LA</span><span style="font-family: 宋体;">于</span><span style="font-family:Helvetica;">6</span><span style="font-family:宋体;">月</span><span style="font-family:Helvetica;">9</span><span style="font-family:宋体;">日在</span><span 
style="font-family:宋体;color:#2E2E2E;">西洛杉矶学院召开。这次活动是免费的（如果预先注册的话），演讲者来自于</span><span style="font-family:Helvetica;">Confluent</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Yahoo</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Netflix</span><span style="font-family:宋体;">等。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.bigdatadayla.com/</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">产品发布</span></strong><strong></strong></p>  <p align="left"><span style="font-family:Helvetica;">Apache Spark</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">Spark 2.0</span><span style="font-family:宋体;">预览版。发布声明中说道</span><span style="font-family:Helvetica;">API</span><span style="font-family:宋体;">和功能都尚未最终敲定。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://spark.apache.org/news/spark-2.0.0-preview.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">JustOne</span><span style="font-family:宋体;">构建并开源了</span><span style="font-family:Helvetica;">Kafka-to-PostgreSQL</span><span style="font-family:宋体;">连接器。本文介绍了该连接器的性能，详细描述了如何把消息转换为行，还描述了如何设定配置等。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.confluent.io/blog/kafka-connect-sink-for-postgresql-from-justone-database</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Salesforce</span><span style="font-family:宋体;">开源了</span><span style="font-family:Helvetica;">Runway</span><span style="font-family:宋体;">，这是一个用于对分布式系统进行建模、仿真和可视化的工具。在</span><span style="font-family:Helvetica;">runway.system</span><span style="font-family:宋体;">上有一个在线演示环境，演示了</span><span style="font-family:Helvetica;">&#8220;too many bananas&#8221;</span><span 
style="font-family:宋体;">模型，电梯系统和</span><span style="font-family:Helvetica;">Raft</span><span style="font-family:宋体;">一致性系统。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://medium.com/salesforce-open-source/runway-intro-dc0d9578e248</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Bloomberg</span><span style="font-family:宋体;">最近开源了</span><span style="font-family:Helvetica;">Presto Accumulo</span><span style="font-family: 宋体;">，面向</span><span style="font-family:Helvetica;">Apache Accumulo</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">Presto</span><span style="font-family:宋体;">连接器。在声明中，链接了</span><span style="font-family:Helvetica;">11</span><span style="font-family:宋体;">页的论文，比较了基于</span><span style="font-family:Helvetica;">Presto</span><span style="font-family:宋体;">的查询和基于</span><span style="font-family:Helvetica;">Accumulo Java API</span><span style="font-family:宋体;">的查询的基准测试结果。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.bloomberg.com/company/announcements/open-source-at-bloomberg-reducing-application-development-time-via-presto-accumulo/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:宋体;">微软</span><span style="font-family:Helvetica;">Azure</span><span style="font-family:宋体;">发布了基于</span><span style="font-family:Helvetica;">Apache Spark 1.6.1</span><span style="font-family:宋体;">稳定版的</span><span style="font-family:Helvetica;">Azure HDInsight</span><span style="font-family:宋体;">。本次发布支持了面向</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">Project Livy REST</span><span style="font-family: 宋体;">任务服务，集成了</span><span style="font-family: Helvetica;">Azure</span><span style="font-family:宋体;">数据湖存储（基于角色的访问控制），集成了</span><span 
style="font-family:Helvetica;">IntelliJ</span><span style="font-family:宋体;">，支持了</span><span style="font-family:Helvetica;">Jupyter</span><span style="font-family:宋体;">笔记本等。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://azure.microsoft.com/en-us/blog/apache-spark-for-azure-hdinsight-now-generally-available/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">LinkedIn</span><span style="font-family:宋体;">开源了</span><span style="font-family:Helvetica;">Photon ML</span><span style="font-family:宋体;">，他们的大规模回归分析库。</span><span style="font-family: Helvetica;">Photon</span><span style="font-family:宋体;">构建在</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">之上并在</span><span style="font-family:Helvetica;">LinkedIn</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">YARN</span><span style="font-family:宋体;">上运行（过去基于</span><span style="font-family:Helvetica;">MapReduce</span><span style="font-family:宋体;">，似乎因为要提升性能才迁移）。</span></p>  <p align="left"><a href="https://engineering.linkedin.com/blog/2016/06/open-sourcing-photon-ml"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://engineering.linkedin.com/blog/2016/06/open-sourcing-photon-ml</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">Spark-HBase</span><span style="font-family: 宋体;">连接器的技术预览版。预览版原生支持</span><span style="font-family:Helvetica;">Avro</span><span style="font-family:宋体;">，支持运行安全集群，原生支持</span><span style="font-family:Helvetica;">Spark Datasource API</span><span style="font-family:宋体;">，并优化了分区修剪，列修剪，谓词下推。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/</span></p>  <p 
align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">Apache Spark</span><span style="font-family: 宋体;">平台的第一阶段安全特性。本阶段对集群</span><span style="font-family:Helvetica;">ACL</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">SAML 2.0</span><span style="font-family:宋体;">进行了支持，端对端的审计日志。</span></p>  <p align="left"><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/06/08/achieving-end-to-end-security-for-apache-spark-with-databricks.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache ORC 1.1.0</span><span style="font-family:宋体;">版发布了。本次发布完成了从基于</span><span style="font-family: Helvetica;">Apache Hive</span><span style="font-family:宋体;">的代码到基于</span><span style="font-family:Helvetica;">Java</span><span style="font-family:宋体;">的代码迁移，修正了</span><span style="font-family:Helvetica;">C++</span><span style="font-family:宋体;">时间戳处理程序，增加了</span><span style="font-family: Helvetica;">Hadoop MapReduce</span><span style="font-family:宋体;">连接器。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://orc.apache.org/news/2016/06/10/ORC-1.1.0/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Kudu</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">0.9.0</span><span style="font-family:宋体;">版。增加了</span><span style="font-family:Helvetica;">UPSERT</span><span style="font-family:宋体;">命令，新的</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">数据源不会依赖</span><span style="font-family:Helvetica;">MapReduce API</span><span style="font-family: 宋体;">，提升了</span><span style="font-family:Helvetica;">Tablet Server</span><span style="font-family:宋体;">写性能。</span></p>  <p align="left"><span style="font-family:Helvetica; 
color:#386EFF;">http://getkudu.io/2016/06/10/apache-kudu-0-9-0-released.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Google</span><span style="font-family:宋体;">云服务平台团队发布了支持</span><span style="font-family:Helvetica;">Spark 2.0</span><span style="font-family:宋体;">预览版的</span><span style="font-family:Helvetica;">Google Cloud Dataproc</span><span style="font-family: 宋体;">。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://cloud.google.com/blog/big-data/2016/06/google-cloud-dataproc-the-fast-easy-and-safe-way-to-try-spark-20-preview</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Dory</span><span style="font-family:宋体;">（</span><span style="font-family:Helvetica;">Bruce</span><span style="font-family:宋体;">的继任者），一个</span><span style="font-family:Helvetica;">Kafka producer</span><span style="font-family:宋体;">守护进程，现在支持从</span><span style="font-family:Helvetica;">UNIX domain sockets</span><span style="font-family:宋体;">或本地</span><span style="font-family:Helvetica;">TCP</span><span style="font-family:宋体;">接收数据了。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.apache.org/mod_mbox/kafka-users/201606.mbox/%3C1465683894.608424023@apps.rackspace.com%3E</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Pig 0.16.0</span><span style="font-family:宋体;">版发布了，这是一年来的首次发布。本次发布巩固了对</span><span style="font-family: Helvetica;">Tez</span><span style="font-family:宋体;">的支持。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://pig.apache.org/releases.html#8+June%2C+2016%3A+release+0.16.0+available</span></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">活动</span></strong><strong></strong></p>  <p align="left"><span 
style="font-size:14.0pt;font-family:SimSun;">中国</span></p>  <p align="left"><span style="font-family:Helvetica;">Spark Meetup (</span><span style="font-family:宋体;">上海</span><span style="font-family:Helvetica;">) &#8211; </span><span style="font-family:宋体;">周六</span><span style="font-family:Helvetica;">, 6</span><span style="font-family:宋体;">月</span><span style="font-family:Helvetica;">18</span><span style="font-family:宋体;">日</span></p><img src ="http://www.blogjava.net/rosen/aggbug/431032.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-06-28 17:39 <a href="http://www.blogjava.net/rosen/archive/2016/06/28/431032.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Hadoop周刊—第 173 期</title><link>http://www.blogjava.net/rosen/archive/2016/06/20/430972.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Mon, 20 Jun 2016 01:47:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/06/20/430972.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430972.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/06/20/430972.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430972.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430972.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop</span></strong><strong><span style="font-size:16.0pt;line-height:10%;font-family:宋体;">周刊</span></strong><strong> </strong><strong><span style="font-size:16.0pt;line-height: 10%;font-family:宋体;">第</span></strong><strong><span style="font-size:16.0pt;line-height:10%"> 173 </span></strong><strong><span 
style="font-size:16.0pt;line-height: 10%;font-family:宋体;">期</span></strong><strong></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">启明星辰平台和大数据总体组编译</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">2016</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">年</span><span style="font-size:14.0pt;line-height:10%">6</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">月</span><span style="font-size:14.0pt;line-height:10%">5</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">日</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本周，</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">NiFi</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Netflix Meson</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Storm</span><span style="font-family:宋体;">方面只有少量内容。</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">峰会本周在旧金山召开，所以呢，下周肯定有不少内容。</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">技术新闻</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">博客介绍了</span><span style="font-family:Helvetica;">Apache Spark 2.0</span><span style="font-family:宋体;">的新特性</span><span style="font-family:Helvetica;">&#8212;&#8212;</span><span style="font-family:宋体;">跨语言支持存储和加载机器学习模型。模型通过简单的</span><span style="font-family:Helvetica;">API</span><span style="font-family:宋体;">被存储和加载，模型的元数据与参数保存为</span><span 
style="font-family:Helvetica;">JSON</span><span style="font-family:宋体;">风格，模型的数据保存为</span><span style="font-family:Helvetica;">Parquet</span><span style="font-family:宋体;">风格。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/05/31/apache-spark-2-0-preview-machine-learning-model-persistence.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Meson</span><span style="font-family:宋体;">是</span><span style="font-family:Helvetica;">Netflix</span><span style="font-family:宋体;">用于执行机器学习工作流的框架。它是</span><span style="font-family:Helvetica;">Apache Hive</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Mesos</span><span style="font-family:宋体;">这些大数据技术之间的粘合剂。工作流使用</span><span style="font-family:Helvetica;">DSL</span><span style="font-family:宋体;">进行编写，</span><span style="font-family:Helvetica;">Meson</span><span style="font-family:宋体;">还提供了更加先进的流水线可视化</span><span style="font-family: Helvetica;">UI</span><span style="font-family:宋体;">。</span><span style="font-family:Helvetica;">Netflix</span><span style="font-family:宋体;">目前没开源</span><span style="font-family:Helvetica;">Meson</span><span style="font-family:宋体;">，但他们有这方面的计划。</span></p>  <p align="left"><a href="http://techblog.netflix.com/2016/05/meson_31.html"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://techblog.netflix.com/2016/05/meson_31.html</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">IBM Hadoop Dev</span><span style="font-family: 宋体;">博客简要介绍和示范了</span><span style="font-family: Helvetica;">HDFS</span><span style="font-family:宋体;">归档存储能力。</span></p>  <p><span 
style="font-family:Helvetica;color:#386EFF;">https://developer.ibm.com/hadoop/2016/06/01/use-hdfs-archival-storage/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Storm 1.0</span><span style="font-family: 宋体;">有了令人惊讶的新特性。本文关注了几个调试能力方面的增强：动态日志级别、统一日志搜索、</span><span style="font-family:Cambria;">事件抽样、集成</span><span style="font-family: Helvetica;">jstack/heap dumps/java</span><span style="font-family:宋体;">飞行记录器分析</span><span style="font-family:Helvetica;">worker</span><span style="font-family:宋体;">。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://hortonworks.com/blog/whats-new-apache-storm-1-0-part-1-enhanced-debugging/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">博客撰文介绍了如何使用</span><span style="font-family: Helvetica;">Apache Spark</span><span style="font-family:宋体;">来探索性分析存储在</span><span style="font-family:Helvetica;">CSV</span><span style="font-family:宋体;">文件中的</span><span style="font-family:Helvetica;">NBA</span><span style="font-family:宋体;">历史统计数据。分析过程混合使用了</span><span style="font-family: Helvetica;">Scala</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cloudera.com/blog/2016/06/how-to-analyze-fantasy-sports-using-apache-spark-and-sql/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache NiFi</span><span style="font-family: 宋体;">作为一种通用工具受到了很多的关注。它为</span><span style="font-family:Helvetica;">&#8220;</span><span style="font-family:宋体;">基于流程的处理</span><span style="font-family:Helvetica;">&#8221;</span><span style="font-family:宋体;">而生，可能对很多人并不意味着什么，但</span><span style="font-family:Helvetica;">NiFi</span><span style="font-family:宋体;">支持标准的</span><span style="font-family:Helvetica;">ETL</span><span style="font-family:宋体;">，流式处理等。许多</span><span 
style="font-family:Helvetica;">NiFi</span><span style="font-family:宋体;">例子都示范了如何从</span><span style="font-family:Helvetica;">Twitter firehose</span><span style="font-family:宋体;">把数据移动到</span><span style="font-family:Helvetica;">HDFS</span><span style="font-family:宋体;">中，但本文聚焦在</span><span style="font-family:Helvetica;">NiFi</span><span style="font-family:宋体;">另外的特性上</span><span style="font-family:Helvetica;">&#8212;&#8212;</span><span style="font-family:宋体;">示范了一些简单的从</span><span style="font-family:Helvetica;">HTTP</span><span style="font-family:宋体;">拉数据的过程。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://hortonworks.com/blog/apache-nifi-not-scratch/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Amazon Redshift</span><span style="font-family: 宋体;">构建于</span><span style="font-family:Helvetica;">PostgreSQL</span><span style="font-family:宋体;">引擎上，所以你可以利用</span><span style="font-family:Helvetica;">PostgreSQL</span><span style="font-family:宋体;">的扩展功能让</span><span style="font-family:Helvetica;">Redshift</span><span style="font-family:宋体;">集群连接</span><span style="font-family:Helvetica;">PostgreSQL</span><span style="font-family:宋体;">实例。这样一来，诸如跨数据库连接、将</span><span style="font-family:Helvetica;">Redshift</span><span style="font-family:宋体;">的结果转换为</span><span style="font-family:Helvetica;">JSON</span><span style="font-family:宋体;">、在</span><span style="font-family:Helvetica;">Postgres</span><span style="font-family:宋体;">中创建</span><span style="font-family:Helvetica;">Redshift</span><span style="font-family:宋体;">数据视图、数据库之间复制数据等有趣的应用都能实现。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blogs.aws.amazon.com/bigdata/post/Tx1GQ6WLEWVJ1OX/JOIN-Amazon-Redshift-AND-Amazon-RDS-PostgreSQL-WITH-dblink</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">其他发布</span></strong><strong></strong></p>  <p align="left"><span 
style="font-family:Helvetica;">FeatherCast</span><span style="font-family:宋体;">发布了超过</span><span style="font-family:Helvetica;">100</span><span style="font-family:宋体;">个</span><span style="font-family:Helvetica;">ApacheCon</span><span style="font-family:宋体;">北美峰会的相关录音。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://feathercast.apache.org/tag/apacheconna2016/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">InfoWorld</span><span style="font-family:宋体;">介绍了</span><span style="font-family:Helvetica;">Heron</span><span style="font-family:宋体;">，</span><span style="font-family:Helvetica;">Twitter</span><span style="font-family:宋体;">才开源的</span><span style="font-family:Helvetica;">Apache Storm</span><span style="font-family:宋体;">兼容项目。本文介绍了两个项目在架构上的不同。主要指出了</span><span style="font-family:Helvetica;">Heron</span><span style="font-family:宋体;">起步于几个月前（</span><span style="font-family:Helvetica;">Storm</span><span style="font-family:宋体;">已发布），就是说</span><span style="font-family:Helvetica;">Storm</span><span style="font-family:宋体;">在特性上比</span><span style="font-family:Helvetica;">Heron</span><span style="font-family:宋体;">更有优势。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.infoworld.com/article/3078134/analytics/had-it-with-apache-storm-heron-swoops-to-the-rescue.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">在</span><span style="font-family:Helvetica;">edX</span><span style="font-family:宋体;">上开了一门新课程，</span><span style="font-family:Helvetica;">&#8220;Apache Spark</span><span style="font-family:宋体;">入门</span><span style="font-family:Helvetica;">&#8221;</span><span style="font-family:宋体;">。课程从</span><span style="font-family:Helvetica;">6</span><span style="font-family:宋体;">月</span><span style="font-family:Helvetica;">15</span><span 
style="font-family:宋体;">日开始，一直持续两周。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">launch-first-of-five-free-big-data-courses-on-apache-spark.html</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">产品发布</span></strong><strong></strong></p>  <p align="left"><span style="font-family:Helvetica;">Amazon EMR</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">4.7.0</span><span style="font-family:宋体;">版。本次发布支持了</span><span style="font-family:Helvetica;">Apache Tez</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Apache Phoenix</span><span style="font-family:宋体;">，并内置了新版本的</span><span style="font-family:Helvetica;">Apache HBase</span><span style="font-family: 宋体;">、</span><span style="font-family:Helvetica;">Apache Mahout</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Presto</span><span style="font-family:宋体;">。另外，</span><span style="font-family:Helvetica;">AWS</span><span style="font-family:宋体;">大数据博客还指导了</span><span style="font-family:Helvetica;">Phoenix</span><span style="font-family:宋体;">如何上手。</span></p>  <p align="left"><a href="http://aws.amazon.com/blogs/aws/amazon-emr-4-7-0-apache-tez-phoenix-updates-to-existing-apps/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://aws.amazon.com/blogs/aws/amazon-emr-4-7-0-apache-tez-phoenix-updates-to-existing-apps/</span></a></p>  <p align="left"><a href="http://blogs.aws.amazon.com/bigdata/post/Tx2ZF1NDQYDJFGT/Supercharge-SQL-on-Your-Data-in-Apache-HBase-with-Apache-Phoenix"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://blogs.aws.amazon.com/bigdata/post/Tx2ZF1NDQYDJFGT/Supercharge-SQL-on-Your-Data-in-Apache-HBase-with-Apache-Phoenix</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Hive</span><span 
style="font-family:宋体;">本周发布了</span><span style="font-family:Helvetica;">2.0.1</span><span style="font-family:宋体;">版。从二月发布</span><span style="font-family:Helvetica;">2.0.0</span><span style="font-family:宋体;">以来，首次小版本发布。本次修复了</span><span style="font-family:Helvetica;">60</span><span style="font-family:宋体;">个</span><span style="font-family:Helvetica;">bug</span><span style="font-family:宋体;">。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CD37344A3.77A64%25sershe@apache.org%3E</span></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">活动</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">中国</span></p>  <p align="left"><span style="font-family:宋体;">无</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430972.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-06-20 09:47 <a href="http://www.blogjava.net/rosen/archive/2016/06/20/430972.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Hadoop周刊—第 172 期</title><link>http://www.blogjava.net/rosen/archive/2016/06/09/430841.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Wed, 08 Jun 2016 16:11:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/06/09/430841.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430841.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/06/09/430841.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430841.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430841.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 
10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop</span></strong><strong><span style="font-size:16.0pt;line-height:10%;font-family:宋体;">周刊</span></strong><strong> </strong><strong><span style="font-size:16.0pt;line-height: 10%;font-family:宋体;">第</span></strong><strong><span style="font-size:16.0pt;line-height:10%"> 172 </span></strong><strong><span style="font-size:16.0pt;line-height: 10%;font-family:宋体;">期</span></strong><strong></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">启明星辰平台和大数据总体组编译</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">2016</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">年</span><span style="font-size:14.0pt;line-height:10%">5</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">月</span><span style="font-size:14.0pt;line-height:10%">22</span><span style="font-size:14.0pt;line-height:10%;font-family:宋体;">日</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本周主要关注流式计算</span><span style="font-family:Helvetica;">&#8212;&#8212; Twitter</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">介绍了他们新的流式计算框架，有文章介绍了</span><span style="font-family:Helvetica;">Apache Flink</span><span style="font-family:宋体;">的流式</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">，</span><span style="font-family:Helvetica;">DataTorrent</span><span style="font-family: 宋体;">介绍了</span><span style="font-family:Helvetica;">Apache Apex</span><span style="font-family:宋体;">容错机制，还有</span><span 
style="font-family:Helvetica;">Concord</span><span style="font-family:宋体;">这样新的流式计算框架，另外还有</span><span style="font-family:Helvetica;">Apache Kafka</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">0.10</span><span style="font-family:宋体;">版。其他新闻方面，</span><span style="font-family:Helvetica;">Apache</span><span style="font-family:宋体;">孵化器有新动向</span><span style="font-family:Helvetica;">&#8212;&#8212;Apache TinkerPop</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Apache Zeppelin</span><span style="font-family:宋体;">孵化成为顶级项目，</span><span style="font-family:Helvetica;">Tephra</span><span style="font-family:宋体;">进入孵化器。除了上述内容，</span><span style="font-family: Helvetica;">Apache Spark</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Apache HBase</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Apache Drill</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Apache Ambari</span><span style="font-family:宋体;">等也有新文章。</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">技术新闻</span></strong><strong></strong></p>  <p align="left"><span style="font-family:Helvetica;">DataTorrent</span><span style="font-family:宋体;">博客撰文介绍了</span><span style="font-family:Helvetica;">Apache Apex</span><span style="font-family:宋体;">在读写数据文件时的容错机制。</span><span style="font-family:Helvetica;">Apex</span><span style="font-family:宋体;">是专门处理流式数据的，流式计算有一些微妙但重要的细节需要考虑。例如使用</span><span style="font-family:Helvetica;">HDFS</span><span style="font-family:宋体;">输出时，</span><span style="font-family:Helvetica;">HDFS</span><span style="font-family:宋体;">的租约机制会引发问题。</span></p>  <p align="left"><a href="https://www.datatorrent.com/blog/fault-tolerant-file-processing/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.datatorrent.com/blog/fault-tolerant-file-processing/</span></a></p>  
<p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">博客介绍了</span><span style="font-family:Helvetica;">Spark 2.0</span><span style="font-family:宋体;">中</span><span style="font-family:Helvetica;">Tungsten</span><span style="font-family:宋体;">代码生成引擎带来的性能提升。博文举例说明了该引擎如何通过消除虚函数调用、更好地利用</span><span style="font-family:Helvetica;">CPU</span><span style="font-family:宋体;">寄存器以及循环展开来生成运行更快的代码。除了</span><span style="font-family: Helvetica;">Databricks</span><span style="font-family:宋体;">的博文外，</span><span style="font-family:Helvetica;">Morning Paper</span><span style="font-family:宋体;">还谈到以上技术其实是受到</span><span style="font-family: Helvetica;">VLDB</span><span style="font-family:宋体;">论文的启发。</span></p>  <p align="left"><a href="https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://databricks.com/blog/2016/05/23/apache-spark-as-a-compiler-joining-a-billion-rows-per-second-on-a-laptop.html</span></a></p>  <p><a href="https://blog.acolyer.org/2016/05/23/efficiently-compiling-efficient-query-plans-for-modern-hardware/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://blog.acolyer.org/2016/05/23/efficiently-compiling-efficient-query-plans-for-modern-hardware/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">StreamScope</span><span style="font-family: 宋体;">是微软流式处理系统，是</span><span style="font-family: Helvetica;">Morning Paper</span><span style="font-family:宋体;">本周撰写的另一篇流式计算文章。介绍了该系统的特征</span><span style="font-family:Helvetica;">&#8212;&#8212;</span><span style="font-family:宋体;">吞吐量</span><span style="font-family:Helvetica;">/</span><span style="font-family:宋体;">集群大小、编程模型</span><span style="font-family:Helvetica;">(SQL)</span><span style="font-family:宋体;">、时间模型、语义</span><span 
style="font-family:Helvetica;">/</span><span style="font-family:宋体;">保证，以及微软产品中的应用。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://blog.acolyer.org/2016/05/24/streamscope-continuous-reliable-distributed-processing-of-big-data-streams/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache</span><span style="font-family:宋体;">博客撰文介绍了</span><span style="font-family:Helvetica;">HubSpot</span><span style="font-family:宋体;">团队对</span><span style="font-family:Helvetica;">Apache HBase</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">G1GC</span><span style="font-family:宋体;">调优方面的经验。本文回顾了</span><span style="font-family:Helvetica;">HubSpot</span><span style="font-family:宋体;">如何设法保障稳定性、如何保障第</span><span style="font-family:Helvetica;">99</span><span style="font-family:宋体;">百分位的性能、如何缩短花在垃圾回收上的时间。该团队使用了很多技巧，很好地应对了错综复杂的</span><span style="font-family:Helvetica;">GC</span><span style="font-family:宋体;">算法。本文最后，还一步步示范了</span><span style="font-family:Helvetica;">HBase</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">G1GC</span><span style="font-family:宋体;">调优。</span></p>  <p align="left"><a href="https://blogs.apache.org/hbase/entry/tuning_g1gc_for_your_hbase"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://blogs.apache.org/hbase/entry/tuning_g1gc_for_your_hbase</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">LinkedIn</span><span style="font-family:宋体;">撰文阐述了调试</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">偏移量管理问题的诸多困难。本文聚焦了两个所谓</span><span style="font-family:Helvetica;">"offset rewind"</span><span style="font-family: Cambria;">事件的症状，如何在监控过程中检测到这类事件，以及导致这两个事件的根本原因（及解决方案）。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://engineering.linkedin.com/blog/2016/05/kafkaesque-days-at-linkedin--part-1</span></p>  <p>&nbsp;</p>  <p><span 
style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">博客发布了使用</span><span style="font-family:Helvetica;">Apache Spark</span><span style="font-family:宋体;">进行基因变异分析系列文章的第三部分也是最后一篇。本文依次介绍了数据准备（把文件转换到</span><span style="font-family:Helvetica;">Parquet</span><span style="font-family:宋体;">并加载进</span><span style="font-family:Helvetica;">Spark RDD</span><span style="font-family:宋体;">）、如何加载基因型数据，以及如何运行</span><span style="font-family: Helvetica;">kmeans</span><span style="font-family:宋体;">聚类算法、基于基因型特征预测地理种群。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/05/24/predicting-geographic-population-using-genome-variants-and-k-means.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">许多批处理大数据生态系统已从自定义</span><span style="font-family:Helvetica;">API</span><span style="font-family:宋体;">回到</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">上，所以如果流式处理框架也发生了同样的变化，一定很有趣。本文，</span><span style="font-family:Helvetica;">Apache Flink</span><span style="font-family:宋体;">团队介绍他们计划支持流式</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">。</span><span style="font-family:Helvetica;">Flink</span><span style="font-family:宋体;">已经有了</span><span style="font-family:Helvetica;">Table API</span><span style="font-family:宋体;">，他们利用</span><span style="font-family:Helvetica;">Apache Calcite</span><span style="font-family:宋体;">提供了对</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">的支持。对于</span><span style="font-family:Helvetica;">windowing</span><span style="font-family:宋体;">，他们计划用</span><span style="font-family:Helvetica;">Calcite</span><span style="font-family:宋体;">的流式</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">扩展。最初对</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">的支持将在</span><span style="font-family:Helvetica;">1.1.0</span><span 
style="font-family:宋体;">版中体现，在</span><span style="font-family:Helvetica;">1.2.0</span><span style="font-family:宋体;">版加强。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://flink.apache.org/news/2016/05/24/stream-sql.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本文介绍了</span><span style="font-family:Helvetica;">Apache Drill</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">XML</span><span style="font-family:宋体;">插件。尽管还没有和</span><span style="font-family:Helvetica;">Drill</span><span style="font-family:宋体;">集成在一起，但它相当容易被编译成</span><span style="font-family:Helvetica;">jar</span><span style="font-family:宋体;">和配置对</span><span style="font-family:Helvetica;">XML</span><span style="font-family:宋体;">的支持。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.mapr.com/blog/how-use-xml-plugin-apache-drill</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">博客简略介绍了</span><span style="font-family:Helvetica;">Ambari</span><span style="font-family:宋体;">监控度量系统的架构，最近加入了</span><span style="font-family:Helvetica;">Grafana</span><span style="font-family:宋体;">作为其前端仪表盘。该系统使用</span><span style="font-family:Helvetica;">Apache Phoenix</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Apache HBase</span><span style="font-family:宋体;">作为存储支撑，所以是可以横向扩展的。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://hortonworks.com/blog/hood-ambari-metrics-grafana/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">这篇教程介绍了怎样在</span><span style="font-family:Helvetica;">Amazon EMR</span><span style="font-family:宋体;">上使用</span><span style="font-family:Helvetica;">Spark SQL</span><span style="font-family:宋体;">与</span><span style="font-family:Helvetica;">Hue</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Apache Zeppelin</span><span 
style="font-family:宋体;">配合运行</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">，查询存储在</span><span style="font-family:Helvetica;">S3</span><span style="font-family:宋体;">中以制表符分隔的数据。本文最后展示了如何从</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">向</span><span style="font-family:Helvetica;">DynamoDB</span><span style="font-family:宋体;">存储数据。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blogs.aws.amazon.com/bigdata/post/Tx2D93GZRHU3TES/Using-Spark-SQL-for-ETL</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Heroku</span><span style="font-family:宋体;">团队分享了他们使用最新版</span><span style="font-family: Helvetica;">Apache Kafka</span><span style="font-family:宋体;">的体验</span><span style="font-family:Helvetica;">&#8212;&#8212;</span><span style="font-family:宋体;">新引入的</span><span style="font-family:Helvetica;">timestamp</span><span style="font-family:宋体;">字段（</span><span style="font-family:Helvetica;">8</span><span style="font-family:宋体;">字节）会导致一些反直觉的性能变化。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://engineering.heroku.com/blogs/2016-05-27-apache-kafka-010-evaluating-performance-in-distributed-systems/</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">其他新闻</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">O'Reilly</span><span style="font-family:宋体;">数据播客秀就</span><span style="font-family:Helvetica;">Spark 2.0</span><span style="font-family:宋体;">中结构化流式计算方面的问题采访了来自</span><span style="font-family:Helvetica;">Databricks</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">Michael Armbrust</span><span style="font-family: 宋体;">。网站上的一篇文章选择引用了其中的话题</span><span style="font-family:Helvetica;">&#8212;&#8212; Spark SQL</span><span style="font-family:宋体;">、结构化流式计算的目标、端到端管道的保证、对在线处理运用</span><span style="font-family:Helvetica;">Spark</span><span 
style="font-family:宋体;">机器学习算法。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.oreilly.com/ideas/structured-streaming-comes-to-apache-spark-2-0</span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本周两个大数据项目从</span><span style="font-family:Helvetica;">Apache</span><span style="font-family:宋体;">孵化器孵化完成</span><span style="font-family:Helvetica;">&#8212;&#8212;Apache TinkerPop</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Apache Zeppelin</span><span style="font-family:宋体;">。</span><span style="font-family:Helvetica;">TinkerPop</span><span style="font-family:宋体;">是图计算框架，</span><span style="font-family:Helvetica;">Zeppelin</span><span style="font-family:宋体;">是面向数据分析基于</span><span style="font-family:Helvetica;">web</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">notebook</span><span style="font-family:宋体;">。</span></p>  <p align="left"><a href="https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces91"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces91</span></a></p>  <p><a href="https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces92"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces92</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Tephra</span><span style="font-family:宋体;">，</span><span style="font-family:Helvetica;">HBase</span><span style="font-family:宋体;">的事务引擎进入了</span><span style="font-family:Helvetica;">Apache</span><span style="font-family:宋体;">孵化器。</span><span style="font-family:Helvetica;">Tephra</span><span style="font-family:宋体;">最初由</span><span style="font-family:Helvetica;">Cask</span><span 
style="font-family:宋体;">的团队创建，目前仅和</span><span style="font-family:Helvetica;">Apache Phoenix</span><span style="font-family:宋体;">进行了集成。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cask.co/2016/05/tephra-a-transaction-engine-for-hbase-moves-to-apache-incubation/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">TechRepublic</span><span style="font-family: 宋体;">撰文介绍了</span><span style="font-family:Helvetica;">Concord.io</span><span style="font-family:宋体;">，一个由</span><span style="font-family:Helvetica;">C++</span><span style="font-family:宋体;">开发的流式处理框架。旨在填补高性能流式计算市场的空缺。</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.techrepublic.com/article/could-concord-topple-apache-spark-from-its-big-data-throne/</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">产品发布</span></strong><strong></strong></p>  <p align="left"><span style="font-family:Helvetica;">Apache Avro</span><span style="font-family:宋体;">本周发布了</span><span style="font-family:Helvetica;">1.8.1</span><span style="font-family:宋体;">版。修复了超过</span><span style="font-family:Helvetica;">20</span><span style="font-family:宋体;">个</span><span style="font-family:Helvetica;">bug</span><span style="font-family:宋体;">，并包含一些其他改进。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCAO4re1nYMm79WQ2LUeODWjHmJ9EiYOF=mty6p2aiq-S_4R95iQ@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Confluent</span><span style="font-family:宋体;">发布了基于</span><span style="font-family:Helvetica;">librdkafka</span><span style="font-family:宋体;">开发的</span><span style="font-family:Helvetica;">Kafka Python</span><span style="font-family:宋体;">客户端。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://pypi.python.org/pypi/confluent-kafka/0.9.1.1</span></p>  <p 
align="left">&nbsp;</p>  <p align="left"><span style="font-family:&quot;MS Mincho&quot;;">伴随着新的</span><span style="font-family:Helvetica;">Kafka </span><span style="font-family:宋体;">流式计算方式，</span><span style="font-family:Helvetica;">Apache Kafka 0.10</span><span style="font-family:宋体;">版发布了。新版本支持了机架感知和消息中的</span><span style="font-family:Helvetica;">timestamp</span><span style="font-family:宋体;">，提升了</span><span style="font-family:Helvetica;">SASL</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Kafka Connect</span><span style="font-family:宋体;">等。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCAPuboUuRyCRxDp5CLjv2yVM77SpYFF+HdnBeiiyeumYTJNpY4g@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Confluent</span><span style="font-family:宋体;">发布了基于</span><span style="font-family:Helvetica;">Apache Kafka 0.10</span><span style="font-family: 宋体;">的</span><span style="font-family:Helvetica;">Confluent Platform 3.0</span><span style="font-family:宋体;">版。除了</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">的核心特性，</span><span style="font-family:Helvetica;">Confluent Platform</span><span style="font-family:宋体;">还有一个商业组件为</span><span style="font-family:Helvetica;">Kafka Connect</span><span style="font-family:宋体;">提供配置工具和端到端流监控。</span></p>  <p align="left"><a href="http://www.confluent.io/blog/announcing-apache-kafka-0.10-and-confluent-platform-3.0"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.confluent.io/blog/announcing-apache-kafka-0.10-and-confluent-platform-3.0</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Kylin</span><span style="font-family:宋体;">，大数据</span><span style="font-family:Helvetica;">OLAP</span><span 
style="font-family:宋体;">引擎，发布了</span><span style="font-family:Helvetica;">1.5.2</span><span style="font-family:宋体;">版。作为一次补丁级的发布，</span><span style="font-family:Helvetica;">1.5.2</span><span style="font-family:宋体;">有不少新特性</span><span style="font-family:Helvetica;">/</span><span style="font-family:宋体;">提升</span><span style="font-family:Helvetica;">/bug</span><span style="font-family:宋体;">修复，包括支持</span><span style="font-family:Helvetica;">CDH 5.7</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">MapR</span><span style="font-family:宋体;">。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCA+LQBaTDxb4wVYVvtOC22gMbJ0p9cvhAWzEY_x2n1oNGvEDPSQ@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Twitter</span><span style="font-family:宋体;">开源了他们的流式处理系统</span><span style="font-family:Helvetica;">Heron</span><span style="font-family:宋体;">。</span><span style="font-family:Helvetica;">Heron</span><span style="font-family:宋体;">是</span><span style="font-family:Helvetica;">Twitter</span><span style="font-family:宋体;">用于替换</span><span style="font-family:Helvetica;">Apache Storm</span><span style="font-family: 宋体;">的产品，发力点在性能、调试以及开发人员生产率。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://blog.twitter.com/2016/open-sourcing-twitter-heron</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Envelope</span><span style="font-family:宋体;">是来自于</span><span style="font-family:Helvetica;">Cloudera Labs</span><span style="font-family: 宋体;">的新项目，它提供了基于配置文件的流式</span><span style="font-family:Helvetica;">ETL</span><span style="font-family:宋体;">处理过程。构建在</span><span style="font-family:Helvetica;">Spark streaming</span><span style="font-family:宋体;">之上，</span><span style="font-family:Helvetica;">Envelope</span><span 
style="font-family:宋体;">最近正在研发面向</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Kudu</span><span style="font-family:宋体;">的连接器。</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://blog.cloudera.com/blog/2016/05/new-in-cloudera-labs-envelope-for-apache-spark-streaming/</span></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">活动</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">中国</span></p>  <p align="left"><span style="font-family:Helvetica;">Spark Meetup 4 (</span><span style="font-family:宋体;">杭州</span><span style="font-family:Helvetica;">) &#8211; </span><span style="font-family:宋体;">周日</span><span style="font-family:Helvetica;">, 6</span><span style="font-family:宋体;">月</span><span style="font-family:Helvetica;">5</span><span style="font-family:宋体;">日</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.meetup.com/Hangzhou-Apache-Spark-Meetup/events/231071384/</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430841.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-06-09 00:11 <a href="http://www.blogjava.net/rosen/archive/2016/06/09/430841.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Hadoop周刊—第 171 期</title><link>http://www.blogjava.net/rosen/archive/2016/06/08/430838.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Wed, 08 Jun 2016 08:42:00 
GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/06/08/430838.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430838.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/06/08/430838.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430838.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430838.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop Weekly: Issue 171</span></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">Compiled by the Venustech (启明星辰) Platform and Big Data General Group</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">May 22, 2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Several projects shipped releases this week, including a newly open-sourced project from LinkedIn. In the technical and general news sections, several articles look back at the Apache: Big Data North America conference, and there is also a series of posts analyzing the New York City taxi dataset across a number of different data systems.</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Technical News</span></strong></p>  <p><span style="font-family:Helvetica;">The Databricks blog examines two approximation algorithms in Apache Spark. The first, &#8220;approxCountDistinct&#8221;, estimates the number of distinct values; the second, &#8220;approxQuantile&#8221;, computes approximate quantiles. The post describes both algorithms and visualizes the residuals at different accuracy settings.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/05/19/approximate-algorithms-in-apache-spark-hyperloglog-and-quantiles.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This tutorial describes how to store, index, and query medical images in DICOM format using Apache Hadoop HDFS, Apache Solr, and Hue. It walks through every step of loading and retrieving the data.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cloudera.com/blog/2016/05/how-to-process-and-index-medical-images-with-apache-hadoop-and-apache-solr/</span></p>  <p>&nbsp;</p>
<p><span style="font-family:Helvetica;">MapR Streams is a system whose API is compatible with Apache Kafka. This post gives a high-level comparison of the similarities and differences between MapR Streams and Kafka, and also explains how Kafka Streams relates to MapR Streams.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.mapr.com/blog/apache-kafka-and-mapr-streams-terms-techniques-and-new-designs</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">In my view, this is one of the clearest introductions to Paxos, a consensus protocol for distributed systems. The post demonstrates the protocol using illustrations and a distributed auction.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://ifeanyi.co/posts/understanding-consensus/</span></p>  <p>&nbsp;</p>
<p><span style="font-family:Helvetica;">Based on a talk at the Apache: Big Data North America conference, Datanami previews the new features of the upcoming Apache Hadoop 3. These include rewritten shell scripts, task-level native optimization, automatic tuning of memory sizes, and support for HDFS erasure coding. The article focuses on erasure coding, paying close attention to its storage-efficiency gains (disk overhead drops from 3x to 1.5x).</span></p>  <p align="left"><a 
href="http://www.datanami.com/2016/05/18/hadoop-3-poised-boost-storage-capacity-resilience-erasure-coding/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.datanami.com/2016/05/18/hadoop-3-poised-boost-storage-capacity-resilience-erasure-coding/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This talk from PyData Berlin describes Apache Arrow and the Feather file format, and explores how data interoperability across languages and frameworks works.</span></p>  <p align="left"><a href="http://www.slideshare.net/wesm/python-data-ecosystem-thoughts-on-building-for-the-future"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.slideshare.net/wesm/python-data-ecosystem-thoughts-on-building-for-the-future</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Videos of two Apache Kafka talks from different conferences have been published. The first covers Kafka's security features; the second explores how Kafka can be used to share data across systems.</span></p>  <p align="left"><a href="https://www.oreilly.com/learning/securing-apache-kafka"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.oreilly.com/learning/securing-apache-kafka</span></a></p>  <p><a href="https://www.infoq.com/presentations/event-streams-kafka"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.infoq.com/presentations/event-streams-kafka</span></a></p>  <p>&nbsp;</p>  <p><span 
style="font-family:Helvetica;">This blog collects several posts on loading and querying the New York City taxi dataset with Amazon Redshift, Google BigQuery, Postgres, and Presto. Beyond the raw benchmarks, the posts detail how failures were handled, which optimizations were applied, and how alternatives compare (for example, AWS S3 versus HDFS).</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://tech.marksblogg.com/all-billion-nyc-taxi-rides-redshift.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">O'Reilly has an article on implementing the kappa architecture with Kafka, Flink, Elasticsearch, and Kibana. It gives an overview of the lambda and kappa architectures, introduces the main architectural components, and shows how to set up novelty detection with a Bayesian model.</span></p>  <p align="left"><a href="https://www.oreilly.com/ideas/applying-the-kappa-architecture-in-the-telco-industry"><span 
style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.oreilly.com/ideas/applying-the-kappa-architecture-in-the-telco-industry</span></a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Other News</span></strong></p>  <p><span style="font-family:Helvetica;">This article lists several big data ecosystem projects that were discussed at the recent Apache: Big Data North America conference. Quite a few of them had not been on our radar.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.datanami.com/2016/05/11/open-source-tour-de-force-apache-big-data-2016/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Pivotal blog has an interesting post on big data and agile development. Big data systems tend to remain in a non-agile world where, for example, requirements must be gathered and models defined before any data is loaded. The post argues that without a cloud environment this approach is subject to constraints (limited capacity and performance, siloed data, and so on).</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://blog.pivotal.io/big-data-pivotal/features/when-it-comes-to-big-data-cloud-and-agility-go-hand-in-hand</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Databricks published a recording of its webinar &#8220;Apache Spark MLlib: From Quick Start to Scikit-Learn&#8221;. Alongside the video, the post answers eight common questions raised during the session.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://databricks.com/blog/2016/05/18/spark-mllib-from-quick-start-to-scikit-learn.html</span></p>  <p>&nbsp;</p>
<p><span style="font-family:Helvetica;">The Hortonworks blog reviews the history of Apache Storm: open-sourced in 2011, accepted into the Apache Incubator in 2013, promoted to a top-level project in 2014, and released as version 1.0 earlier this year. The post covers the main technical advances at each milestone.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://hortonworks.com/blog/brief-history-apache-storm/</span></p>  <p>&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">HBaseCon takes place in San Francisco this week, with presentations from Apple, Yahoo, and Facebook.</span></p>  <p align="left"><a href="http://hbasecon.com/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline: none">http://hbasecon.com</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">MapR published an infographic celebrating what Apache Drill achieved over the past year: seven releases and a number of milestones.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.mapr.com/blog/happy-anniversary-apache-drill-what-difference-year-makes</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Datanami published a Q&amp;A from the Apache: Big Data North America conference with ASF director Jim Jagielski and ODPi program director John Mertic. As expected, the main topic is the relationship between the ASF and ODPi.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.datanami.com/2016/05/20/apache-foundation-keeps-eyes-wide-open-odpi/</span></p>  <p>&nbsp;</p>
<p><strong><span style="font-size:15.0pt;font-family:宋体;">Releases</span></strong></p>  <p align="left"><span style="font-family:Helvetica;">LinkedIn open-sourced Ambry, its distributed object store. The code is available on GitHub, and this blog post covers Ambry's service guarantees, design goals, architecture, and interfaces.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://engineering.linkedin.com/blog/2016/05/introducing-and-open-sourcing-ambry---linkedins-new-distributed-</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Pivotal HDB, which is powered by Apache HAWQ (incubating), released version 2.0 this week. HDB provides an analytic database for Hadoop.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">https://blog.pivotal.io/big-data-pivotal/products/fail-fast-and-ask-more-questions-of-your-data-with-hdb-2-0</span></p>  <p align="left">&nbsp;</p>  <p 
align="left"><span style="font-family:Helvetica;">Apache Mahout released version 0.12.1 this week. Mahout is a machine learning and data mining system, and this release is aimed at advancing the integration of Flink with Mahout.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201605.mbox/%3CCAOtpBjhshagyLN3Qnt0xRnc7YbnMVJjTS4piVXL7LiS2pQguXw@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Tajo released version 0.11.3. Tajo is a data warehouse for Hadoop. This release fixes five bugs.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://tajo.apache.org/releases/0.11.3/announcement.html</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">MongoDB released a new MongoDB Connector for Apache Spark. The Connector offers more than a Hadoop InputFormat shim for Spark. Finally, the posts explain some of MongoDB's key features.</span></p>
<p align="left"><a href="https://www.mongodb.com/blog/post/mongodb-connector-for-apache-spark-announcing-early-access-program-and-new-spark-training"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.mongodb.com/blog/post/mongodb-connector-for-apache-spark-announcing-early-access-program-and-new-spark-training</span></a></p>  <p align="left"><a href="http://rosslawley.co.uk/introducing-a-new=mongodb-spark-connector/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://rosslawley.co.uk/introducing-a-new=mongodb-spark-connector/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">SyncSort released DMX-h v9, which adds support for Kafka and a new intelligent execution framework.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://insidebigdata.com/2016/05/20/syncsorts-latest-innovations-simplify-integration-of-streaming-data-in-spark-kafka-and-hadoop-for-real-time-analytics/</span></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Events</span></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">China</span></p>  <p align="left"><span style="font-family:SimSun;">None</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430838.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-06-08 16:42 <a 
href="http://www.blogjava.net/rosen/archive/2016/06/08/430838.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Hadoop Weekly: Issue 169</title><link>http://www.blogjava.net/rosen/archive/2016/05/15/430513.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Sun, 15 May 2016 12:30:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/05/15/430513.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430513.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/05/15/430513.html#Feedback</comments><slash:comments>1</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430513.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430513.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>
<p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop Weekly: Issue 169</span></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">Compiled by the Venustech (启明星辰) Platform and Big Data General Group</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">May 8, 2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This week's issue is short and to the point. Topics include Apache Beam, MapR's quarterly results, the recent Kafka Summit, and a newly open-sourced distributed unit testing framework from Cloudera.</span></p>  <p>&nbsp;</p>  <p><strong><span 
style="font-size:15.0pt;font-family:宋体;">Technical News</span></strong></p>  <p><span style="font-family:Helvetica;">Elastic published a root cause analysis of an outage. Misconfigured ZooKeeper memory settings led to excessive garbage collection, which ultimately brought down the ZooKeeper cluster. The post describes several mitigations intended to prevent similar problems in the future.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.elastic.co/blog/elastic-cloud-outage-april-2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Cask blog has a concise recap of the recent Big Data Applications Meetup. The first presentation covered Pachyderm, which provides &#8220;Git for data&#8221; semantics built on Docker containers. The second covered TubeMogul's big data platform, which is built on Hadoop, Hive, Spark, and Presto.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cask.co/2016/05/pachyderm-and-tubemogul-share-their-big-data-application-platforms-and-experience/</span></p>  <p>&nbsp;</p>  <p><span 
style="font-family:Helvetica;">Google and dataArtisans both published posts on Apache Beam (formerly the Google Dataflow SDK). Google's post explains the motivation for developing and open-sourcing Beam, while dataArtisans' post describes their support for the Beam model and how they think about the relationship between the Flink and Beam APIs.</span></p>  <p align="left"><a href="https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://cloud.google.com/blog/big-data/2016/05/why-apache-beam-a-google-perspective</span></a></p>  <p><a href="http://data-artisans.com/why-apache-beam/"><span style="font-family:Helvetica;color:#386EFF;text-decoration: none;text-underline:none">http://data-artisans.com/why-apache-beam/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The IBM Hadoop dev blog has instructions for setting up Python, Scala, and R kernels for Jupyter notebook. It also explains how to connect to Spark and how to expose the notebook over SSL.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://developer.ibm.com/hadoop/blog/2016/05/04/install-jupyter-notebook-spark/</span></p>  <p>&nbsp;</p>
<p><span style="font-family:Helvetica;">This post describes how the Mongo Hadoop connector ties Spark and MongoDB together.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://x.ai/using-the-mongo-hadoop-connector-as-a-translation-layer-to-spark/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Qubole blog compares popular programming languages for big data analytics: Python, R, and Scala.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.qubole.com/blog/big-data/programming-language/</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Other News</span></strong></p>  <p><span style="font-family:Helvetica;">MapR announced another record quarter, with 99% growth in software subscription license bookings and a 146% dollar-based net expansion rate.</span></p>  <p><span 
style="font-family:Helvetica;color:#386EFF;">https://www.mapr.com/company/press-releases/mapr-achieves-another-record-quarter-99-software-subscription-license-growth</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This article describes a recent benchmark of Google Cloud Dataflow and Apache Spark on Google Compute Engine, in which Dataflow outperformed Spark by a factor of 2 to 5.7 (as always, it is better to evaluate your own workloads in your own environment than to trust benchmarks blindly). The article also describes a sort of &#8220;cold war&#8221; from which everyone who uses big data tools stands to benefit.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.datanami.com/2016/05/02/dataflow-tops-spark-benchmark-test/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Confluent blog looks back at the recent Kafka Summit, including the preliminary coding challenge, keynotes, breakout sessions, and more.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.confluent.io/blog/log-compaction-kafka-summit-edition-may-2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Forbes covers American Express's adoption of big data technology over the past five years. In the article, American Express shares tips and lessons learned, such as the difficulty of adopting new technology (and how important buy-in from senior leadership is) and the challenges of hiring and retaining engineers.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://www.forbes.com/sites/ciocentral/2016/04/27/inside-american-express-big-data-journey/</span></p>  <p>&nbsp;</p>  
<p><strong><span style="font-size:15.0pt;font-family:宋体;">Releases</span></strong></p>  <p align="left"><span style="font-family:Helvetica;">Cask released version 3.4 of the Cask Data Application Platform (CDAP). The new version adds Cask Tracker, a new system for data integration, auditing, and search, along with an upgraded UI for Cask Hydrator, enhanced Spark support, and more.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://blog.cask.co/2016/05/announcing-cdap-release-3-4-introducing-tracker-next-gen-hydrator-enhanced-spark-support-and-much-more/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Cloudera open-sourced &#8220;dist_test&#8221;, a new tool for running unit tests in parallel. With it, the unit test suites for projects such as Hadoop and Kudu finish in minutes rather than hours. The tool has bindings for C++ and Java, and these features are demonstrated on the website.</span></p>  <p align="left"><span style="font-family:Helvetica; 
color:#386EFF;">http://blog.cloudera.com/blog/2016/05/quality-assurance-at-cloudera-distributed-unit-testing/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Google announced that Google BigQuery now integrates with Drive, so query output can be saved to Google Sheets.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://techcrunch.com/2016/05/06/google-connects-bigquery-to-google-drive-and-sheets/</span></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Events</span></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">China</span></p>  <p align="left"><span style="font-family:SimSun;">None</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430513.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-05-15 20:30 <a href="http://www.blogjava.net/rosen/archive/2016/05/15/430513.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Hadoop Weekly: Issue 168</title><link>http://www.blogjava.net/rosen/archive/2016/05/07/430401.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Sat, 07 May 2016 15:37:00 
GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/05/07/430401.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430401.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/05/07/430401.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430401.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430401.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong>&nbsp;</strong></p>  <p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop Weekly, Issue 168</span></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">Translated and compiled by the Venustech (启明星辰) Platform and Big Data General Group</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">May 1, 2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Kafka Summit takes place in San Francisco this week, so unsurprisingly this issue has plenty of Kafka content. Beyond that, there are many articles on Impala performance, Kudu, and Druid. In other news, Apache Apex became an Apache top-level project, and Qubole open-sourced its StreamX project.</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Technical News</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">This post takes a quick look at which Spark RDD operations preserve partitioning and which do not. In particular, `mapValues` and `filter` preserve partitioning, while `map` does not.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://medium.com/@corentinanjuna/apache-spark-rdd-partitioning-preservation-2187a93bc33e</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This post describes how to use Conda to build self-contained Python environments (for example, with pandas) that are shipped to cluster nodes as part of a Spark job. With this approach, PySpark jobs can run without native Python packages installed on the host operating system. The same approach also works for SparkR.</span></p>  <p align="left"><a href="http://quasiben.github.io/blog/2016/4/15/conda-spark/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://quasiben.github.io/blog/2016/4/15/conda-spark/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Datadog blog has a three-part series on monitoring Kafka. The first part gives a detailed overview of the key metrics for brokers, producers, consumers, and ZooKeeper. The second part shows how to view those metrics over JMX with JConsole and other tools, and the third covers the Datadog integration.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.datadoghq.com/blog/monitoring-kafka-performance-metrics/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Salesforce describes how Kafka has grown within their organization. Initially they used Kafka to power operational metrics analysis, and it gradually became a large platform driving many systems. Salesforce runs Kafka in multiple data centers and uses MirrorMaker to replicate and aggregate data across clusters.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://medium.com/salesforce-engineering/expanding-visibility-with-apache-kafka-e305b12c4aba#.5k7j921o3</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Metamarkets blog has an interesting post on optimizing large-scale distributed systems. Druid, their distributed data store, recently added a &#8220;first-in, first-out&#8221; query processing mode, which was tested on large clusters under heavy load. The post walks through their hypotheses about what might happen and the interesting metrics they collected.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://metamarkets.com/2016/impact-on-query-speed-from-forced-processing-ordering-in-druid/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Google Cloud Big Data blog describes Capacitor, BigQuery's internal columnar storage format, along with other optimizations that make data storage more efficient.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://cloud.google.com/blog/big-data/2016/04/inside-capacitor-bigquerys-next-generation-columnar-storage-format</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Apache Kudu (incubating) blog summarizes the results of recent performance analysis and tuning of the system using the YCSB tool.</span></p>  <p><span 
style="font-family:Helvetica;color:#386EFF;">http://getkudu.io/2016/04/26/ycsb.html</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Impala 2.5 shows significant performance improvements on TPC benchmarks and elsewhere. The improvements include runtime filters, LLVM code generation for `SORT` and `DECIMAL`, faster metadata-only queries, and more.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blog.cloudera.com/blog/2016/04/apache-impala-incubating-in-cdh-5-7-4x-faster-for-bi-workloads-on-apache-hadoop/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This post describes how to configure MariaDB for the Hive Metastore to support high availability.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://developer.ibm.com/hadoop/blog/2016/04/26/bigsql-ha-configure-ha-hive-metastore-db-using-mariadb10-1/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The Altiscale blog describes the process of tracking down a NodeGroup-related bug (a follow-up to a post from March). If you have ever been discouraged by failing to find the root cause of a bug in Hadoop (or another distributed system), take heart. The post shows that this is genuinely hard, even for engineers at a company that sells Hadoop as a service.</span></p>  <p align="left"><a 
href="https://www.altiscale.com/blog/part-1-2-investigation-analysis-and-resolution-of-nodegroup-performance-issues-on-bare-metal-hardware-clusters/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://www.altiscale.com/blog/part-1-2-investigation-analysis-and-resolution-of-nodegroup-performance-issues-on-bare-metal-hardware-clusters/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Netflix now runs more than 4,000 Kafka brokers across 36 clusters. Running Kafka in the cloud involves trade-offs, and the team balanced cost against data loss (daily data loss is below 0.01%). The post shares the team's experience running Kafka on AWS, covering typical problems, deployment strategy (small clusters, isolated ZooKeeper clusters), cluster-level failover, support for AWS availability zones, Kafka UI visualization, and more.</span></p>  <p align="left"><a href="http://techblog.netflix.com/2016/04/kafka-inside-keystone-pipeline.html"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://techblog.netflix.com/2016/04/kafka-inside-keystone-pipeline.html</span></a></p>  <p align="left">&nbsp;</p>  <p><span style="font-family:Helvetica;">The Amazon Big Data blog describes how to work with encrypted data stored in S3 from Amazon EMR. The integration supports both client-side and server-side encryption (using AWS KMS).</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">http://blogs.aws.amazon.com/bigdata/post/TxBQTAF3X7VLEP/Process-Encrypted-Data-in-Amazon-EMR-with-Amazon-S3-and-AWS-KMS</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">TubeMogul describes the history of its big data platform, which supports trillions of data analysis requests per month. The team adopted Amazon EMR early on, brought in Storm for real-time processing, and eventually settled on Qubole for its big data services.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.tubemogul.com/engineering/the-big-data-lifecycle-at-tubemogul/</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Caffe, the deep learning framework, has been integrated with Spark as CaffeOnSpark. MapR describes how to run it on MapR YARN, and the post also covers the performance optimizations applied.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.mapr.com/blog/distributed-deep-learning-caffe-using-mapr-cluster</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Other News</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">Apache Apex, a big data stream and batch processing system, is now a top-level project of the Apache Software Foundation. Apex entered the incubator in August of last year.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces90</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Heroku Kafka, a managed Kafka service from Heroku, is approaching its beta release.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://blog.heroku.com/archives/2016/4/26/announcing-heroku-kafka-early-access</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">A post on the MapR blog stresses why gender diversity matters and highlights the Women in Big Data Forum, aiming to encourage women to enter the field. A &#8220;Women in Big Data Forum&#8221; workshop, organized by MapR, takes place in San Jose this week.</span></p>  <p><span style="font-family:Helvetica;color:#386EFF;">https://www.mapr.com/blog/case-women-big-data</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Releases</span></strong><strong></strong></p>  <p align="left"><span style="font-family:Helvetica;">StreamX is an open source project from Qubole that copies data from Kafka to target stores such as Amazon S3. </span><span 
style="font-family:Helvetica;">Qubole offers StreamX as a managed service.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://www.qubole.com/blog/big-data/streamx/</span></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">SnappyData is a new platform (and company) for OLAP and OLTP queries over streaming data. SnappyData is powered by Apache Spark and GemFire's in-memory storage technology.</span></p>  <p align="left"><a href="http://www.infoworld.com/article/3062022/sql/apache-spark-powers-live-sql-analytics-in-snappydata.html"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://www.infoworld.com/article/3062022/sql/apache-spark-powers-live-sql-analytics-in-snappydata.html</span></a></p>  <p align="left"><a href="http://www.snappydata.io/"><span style="font-family:Helvetica;color:#386EFF;text-decoration: none;text-underline:none">http://www.snappydata.io/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Geode (incubating), a distributed data platform aimed at high performance and low latency, released version 1.0.0-incubating.M2. New features in this release include peer-to-peer connectivity over WAN.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.apache.org/mod_mbox/incubator-geode-dev/201604.mbox/%3CCAFh%2B7k2eiK2TMGKsLqrY9CZDjxjYwiuTQ4QGUVC2s3geyJYwnA%40mail.gmail.com%3E</span></p>  <p 
align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Knox, the REST API gateway for Hadoop, released version 0.9.0. The new release adds UI support for Ranger and Ambari, along with other improvements and bug fixes.</span></p>  <p align="left"><span style="font-family:Helvetica; color:#386EFF;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCACRbFyjRF7zShb-NQ29d3FJ0hKZ57ts0Qfo31ffuNODpskwqPQ@mail.gmail.com%3E</span></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Events</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">China</span></p>  <p align="left"><span style="font-family:SimSun;">None</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430401.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-05-07 23:37 <a href="http://www.blogjava.net/rosen/archive/2016/05/07/430401.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Hadoop Weekly: Issue 167</title><link>http://www.blogjava.net/rosen/archive/2016/05/03/430325.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Tue, 03 May 2016 02:08:00 
GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/05/03/430325.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430325.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/05/03/430325.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430325.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430325.html</trackback:ping><description><![CDATA[<p align="left" style="line-height: 10%;"><strong><span style="font-size:16.0pt;line-height:10%">Hadoop Weekly, Issue 167</span></strong></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">Translated and compiled by the Venustech (启明星辰) Platform and Big Data General Group</span></p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;">&nbsp;</p>  <p align="left" style="line-height: 10%;"><span style="font-size:14.0pt;line-height:10%">April 25, 2016</span></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Welcome to a special Monday edition of Hadoop Weekly. This week brings plenty of technical news about Spark, Kafka, Beam, and Kudu. If you are looking for something more cutting-edge, Apache Metron (incubating) shipped its first release. Metron is an evolving general-purpose security system built on Hadoop.</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Technical News</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">This presentation covers how to build stream processing systems on AWS. It includes simple combinations such as Amazon Kinesis, AWS Lambda, and the Kinesis S3 connector, as well as more complex setups such as real-time analytics on AWS.</span></p>  <p align="left"><a href="http://cdn.oreillystatic.com/en/assets/1/event/144/Building%20a%20scalable%20architecture%20for%20processing%20streaming%20data%20on%20AWS%20Presentation.pdf"><span style="font-family:Helvetica;">http://cdn.oreillystatic.com/en/assets/1/event/144/Building%20a%20scalable%20architecture%20for%20processing%20streaming%20data%20on%20AWS%20Presentation.pdf</span></a></p>  <p 
align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">This post shows how to use Spark Testing Base, a Spark testing framework written in Scala that can be called from Java. The sample code shows how to refactor Spark code so that the logic can be tested in isolation, and how to handle some of the clunkier Scala APIs from Java.</span></p>  <p align="left"><a href="http://www.jesse-anderson.com/2016/04/unit-testing-spark-with-java/"><span style="font-family:Helvetica;">http://www.jesse-anderson.com/2016/04/unit-testing-spark-with-java/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">The Altiscale blog outlines the trade-offs between thin and uber JARs for Spark, and demonstrates building both kinds with Maven and SBT.</span></p>  <p align="left"><a href="https://www.altiscale.com/blog/spark-on-hadoop-thin-jars/"><span style="font-family:Helvetica;">https://www.altiscale.com/blog/spark-on-hadoop-thin-jars/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">LinkedIn</span><span 
style="font-family:Helvetica;"> describes its Kafka ecosystem, which includes a special Kafka producer, a REST API for non-Java clients, an Avro schema registry, Gobblin (a tool for loading data into Hadoop), and more.</span></p>  <p align="left"><a href="https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin"><span style="font-family:Helvetica;">https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">This Spark Streaming tutorial shows how to pull tweets via the twitter4j API, filter them by hashtag, and run sentiment analysis on them.</span></p>  <p align="left"><a href="https://www.mapr.com/blog/spark-streaming-and-twitter-sentiment-analysis"><span style="font-family:Helvetica;">https://www.mapr.com/blog/spark-streaming-and-twitter-sentiment-analysis</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Kudu (incubating) is an excellent companion to Apache Impala (incubating), because it efficiently handles both broad analytic scans and targeted queries. This post describes the technical details of the integration, such as how Kudu's design enables efficient queries and how to perform insert, update, and delete operations through Impala and Kudu.</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/how-to-use-impala-and-kudu-together-for-analytic-workloads/"><span style="font-family:Helvetica;">http://blog.cloudera.com/blog/2016/04/how-to-use-impala-and-kudu-together-for-analytic-workloads/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">MapR describes using spark-sklearn to scale out an existing scikit-learn model. The post walks through building a model on an Airbnb dataset and shows how to run cross-validation with spark-sklearn.</span></p>  <p><a href="https://www.mapr.com/blog/predicting-airbnb-listing-prices-scikit-learn-and-apache-spark"><span style="font-family:Helvetica;">https://www.mapr.com/blog/predicting-airbnb-listing-prices-scikit-learn-and-apache-spark</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">The AWS Big Data blog has a tutorial on using HBase and Hive on Amazon EMR. It introduces HBase, describes how to restore an HBase table from S3, demonstrates how Hive and HBase integrate, and more.</span></p>  <p><a 
href="http://blogs.aws.amazon.com/bigdata/post/Tx3EGE8Z90LZ9WX/Combine-NoSQL-and-Massively-Parallel-Analytics-Using-Apache-HBase-and-Apache-Hiv"><span style="font-family:Helvetica;">http://blogs.aws.amazon.com/bigdata/post/Tx3EGE8Z90LZ9WX/Combine-NoSQL-and-Massively-Parallel-Analytics-Using-Apache-HBase-and-Apache-Hiv</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This post describes the challenges of giving students hands-on experience in a big data course. After several iterations and options, the author seems to have found a good solution: Altiscale's Hadoop-as-a-Service.</span></p>  <p><a href="https://www.altiscale.com/blog/hadoop-as-a-service-in-the-classroom/"><span style="font-family:Helvetica;">https://www.altiscale.com/blog/hadoop-as-a-service-in-the-classroom/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">In a guest post on the Cloudera blog, the author compares Parquet and Avro across two datasets (one narrow, with 3 columns, and one wide, with 103 columns). After testing queries and operations with Spark and Spark SQL, the author found that Parquet and Avro sometimes perform similarly when querying the serialized data, although in most cases queries on Parquet data were faster (and the serialized data was smaller).</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/benchmarking-apache-parquet-the-allstate-experience/"><span style="font-family:Helvetica;">http://blog.cloudera.com/blog/2016/04/benchmarking-apache-parquet-the-allstate-experience/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">This post describes how to use SparkR in a distributed environment such as CDH, even though this setup is not yet officially supported. By using YARN to install R packages locally on the workers, jobs can run with minor modifications.</span></p>  <p><a href="http://www.nodalpoint.com/sparkr-in-cloudera-hadoop/"><span style="font-family:Helvetica;">http://www.nodalpoint.com/sparkr-in-cloudera-hadoop/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Many open source frameworks can run MapReduce-style work and offer higher-level programming models for similar tasks. Historically, they were tied to a single execution framework (for example, MapReduce or Storm), but recent changes are shaking this up. Apache Beam (incubating) goes a step further by spanning both batch and streaming execution modes with a more sophisticated built-in computation model.</span></p>  <p><a href="http://www.datanami.com/2016/04/22/apache-beam-emerges-ambitious-goal-unify-big-data-development/"><span style="font-family:Helvetica;">http://www.datanami.com/2016/04/22/apache-beam-emerges-ambitious-goal-unify-big-data-development/</span></a></p>  <p>&nbsp;</p>  <p><span 
style="font-family:Helvetica;">The Apache blog published a seven-part series comparing HBase write performance on HDD, SSD, and RAMDISK. Through this analysis, the authors identified gaps and proposed several improvements for HBase and HDFS.</span></p>  <p><a href="https://blogs.apache.org/hbase/entry/hdfs_hsm_and_hbase_part"><span style="font-family:Helvetica;">https://blogs.apache.org/hbase/entry/hdfs_hsm_and_hbase_part</span></a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">Other News</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">Tom White, author of &#8220;Hadoop: The Definitive Guide&#8221;, describes how he got involved with Apache Hadoop. His early contributions centered on integrating Hadoop with Amazon Web Services, and AWS has since become an important part of the Hadoop project's success.</span></p>  <p><a href="http://vision.cloudera.com/how-i-got-into-hadoop/"><span style="font-family:Helvetica;">http://vision.cloudera.com/how-i-got-into-hadoop/</span></a></p>  
<p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Fluo</span><span style="font-family:宋体;">，为</span><span style="font-family:Helvetica;">Apache Accumulo</span><span style="font-family:宋体;">准备的分布式处理引擎，向</span><span style="font-family:Helvetica;">Apache</span><span style="font-family:宋体;">孵化器提交了孵化申请。</span></p>  <p><a href="https://wiki.apache.org/incubator/FluoProposal"><span style="font-family: Helvetica;">https://wiki.apache.org/incubator/FluoProposal</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Phoenix</span><span style="font-family: 宋体;">宣布将在</span><span style="font-family:Helvetica;">HBaseCon</span><span style="font-family:宋体;">后举行会议，</span><span style="font-family:Helvetica;">Apache Phoenix</span><span style="font-family:宋体;">是一个</span><span style="font-family:Helvetica;">SQL-on-HBase</span><span style="font-family:宋体;">系统。该会议只有半天，主题是介绍</span><span style="font-family:Helvetica;">Phoenix</span><span style="font-family:宋体;">内部情况和用例。</span></p>  <p><a href="http://hortonworks.com/blog/announcing-first-annual-phoenixcon-apache-phoenix-user-conference/"><span style="font-family:Helvetica;">http://hortonworks.com/blog/announcing-first-annual-phoenixcon-apache-phoenix-user-conference/</span></a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">产品发布</span></strong><strong></strong></p>  <p align="left"><span style="font-family:Helvetica;">Apache Metron</span><span style="font-family:宋体;">，构建于</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">上的安全框架，发布了</span><span style="font-family:Helvetica;">0.1</span><span style="font-family:宋体;">版。</span><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">支撑其作为技术预览版，并撰写本文介绍了如何上手，如何贡献，如何使用</span><span style="font-family:Helvetica;">Metron UI</span><span style="font-family:宋体;">等等。</span></p>  <p align="left"><a href="http://hortonworks.com/blog/apache-metron-tech-preview-1-come-get/"><span 
style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://hortonworks.com/blog/apache-metron-tech-preview-1-come-get/</span></a></p>  <p align="left"><a href="http://hortonworks.com/blog/apache-metron-use-case-finding-needle-haystack/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://hortonworks.com/blog/apache-metron-use-case-finding-needle-haystack/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache NiFi</span><span style="font-family:宋体;">本周发布了</span><span style="font-family:Helvetica;">0.6.1</span><span style="font-family:宋体;">版。这是修复了</span><span style="font-family:Helvetica;">10</span><span style="font-family:宋体;">多个</span><span style="font-family:Helvetica;">bug</span><span style="font-family:宋体;">后的修复版。</span></p>  <p align="left"><a href="http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCALJK9a7yLnFeJ7Z=eU6mOB-DXvo8MHUr=_RshSjZcTbTcAHDZA@mail.gmail.com%3E"><span style="font-family:Helvetica;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCALJK9a7yLnFeJ7Z=eU6mOB-DXvo8MHUr=_RshSjZcTbTcAHDZA@mail.gmail.com%3E</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Flink</span><span style="font-family:宋体;">本周发布了</span><span style="font-family:Helvetica;">1.0.2</span><span style="font-family:宋体;">版。本次发布包括了</span><span style="font-family:Helvetica;">bug</span><span style="font-family:宋体;">修复，</span><span style="font-family:Helvetica;">RocksDB</span><span style="font-family:宋体;">环境下的性能提升以及一些文档方面的进步。</span></p>  <p align="left"><a href="http://flink.apache.org/news/2016/04/22/release-1.0.2.html"><span style="font-family:Helvetica;">http://flink.apache.org/news/2016/04/22/release-1.0.2.html</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Amazon</span><span 
style="font-family:宋体;">发布了新版</span><span style="font-family:Helvetica;">Amazon EMR</span><span style="font-family:宋体;">，开始支持</span><span style="font-family:Helvetica;">HBase 1.2</span><span style="font-family:宋体;">。</span></p>  <p align="left"><a href="https://aws.amazon.com/blogs/aws/amazon-emr-update-apache-hbase-1-2-is-now-available/"><span style="font-family:Helvetica;">https://aws.amazon.com/blogs/aws/amazon-emr-update-apache-hbase-1-2-is-now-available/</span></a></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">活动</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">中国</span></p>  <p align="left"><span style="font-family:SimSun;">无</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430325.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-05-03 10:08 <a href="http://www.blogjava.net/rosen/archive/2016/05/03/430325.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Hadoop周刊—第 166 期</title><link>http://www.blogjava.net/rosen/archive/2016/04/21/430176.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Thu, 21 Apr 2016 07:07:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/04/21/430176.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430176.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/04/21/430176.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430176.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430176.html</trackback:ping><description><![CDATA[<p><strong><span style="font-size:16.0pt">Hadoop</span></strong><strong><span style="font-size:16.0pt;font-family: 宋体;">周刊</span></strong><strong> 
</strong><strong><span style="font-size:16.0pt;font-family:宋体;">第</span></strong><strong><span style="font-size:16.0pt"> 166 </span></strong><strong><span style="font-size: 16.0pt;font-family:宋体;">期</span></strong><strong></strong></p>  <p><span style="font-size:14.0pt">2016</span><span style="font-size:14.0pt;font-family:宋体;">年</span><span style="font-size:14.0pt">4</span><span style="font-size:14.0pt; font-family:宋体;">月</span><span style="font-size:14.0pt">17</span><span style="font-size:14.0pt;font-family: 宋体;">日</span></p>  <p>启明星辰&#8212;&#8212;平台和大数据整体组编译&nbsp;<br /><br /></p>  <p><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">在本周</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">欧洲峰会上有若干爆料，贯穿了本期整个内容。伴随着骄人的新特性，</span><span style="font-family:Helvetica;">Apache Storm</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">1.0.0</span><span style="font-family:宋体;">版。在技术新闻方面，有不少基于</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">构建大规模服务和分布式系统测试的文章。如果你错过了</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">峰会，那么不用担心，演讲视频已经放到了网上。</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">技术新闻</span></strong><strong></strong></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Smyte</span><span style="font-family:宋体;">撰文介绍了他们基于事件数据流实时检测垃圾邮件和诈骗信息的基础设施。最初的事件处理系统构建在</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Redis</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Secor</span><span style="font-family:宋体;">以及</span><span style="font-family:Helvetica;">S3</span><span style="font-family:宋体;">上，为了满足规模不断扩张和廉价的要求，他们把系统迁移到基于磁盘的方案上，使用</span><span style="font-family:Helvetica;">Redis</span><span style="font-family:宋体;">协议与</span><span 
style="font-family:Helvetica;">RocksDB</span><span style="font-family:宋体;">交互，使用</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">进行复制。</span></p>  <p><a href="https://medium.com/the-smyte-blog/counting-with-domain-specific-databases-73c660472da"><span style="font-family:Helvetica;">https://medium.com/the-smyte-blog/counting-with-domain-specific-databases-73c660472da</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:宋体;">本文把</span><span style="font-family:Helvetica;">rsyslog</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">AWS </span><span style="font-family:宋体;">与</span><span style="font-family:Helvetica;">ELK</span><span style="font-family:宋体;">栈（</span><span style="font-family:Helvetica;">ElasticSearch</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Logstash</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Kibana</span><span style="font-family:宋体;">）结合，处理诸如反压、规模以及维护方面的问题。本文覆盖了</span><span style="font-family:Helvetica;">rsyslog</span><span style="font-family:宋体;">集成</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">以及</span><span style="font-family:Helvetica;">schema</span><span style="font-family:宋体;">方面的技巧，也介绍了如何运行</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Zookeeper</span><span style="font-family:宋体;">以及</span><span style="font-family:Helvetica;">AWS</span><span style="font-family:宋体;">中大规模自动分组。</span></p>  <p><a href="https://www.bashton.com/blog/2016/elk-on-ark/"><span style="font-family: Helvetica;">https://www.bashton.com/blog/2016/elk-on-ark/</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">撰文介绍了</span><span 
style="font-family:Helvetica;">Apache Atlas</span><span style="font-family:宋体;">以及</span><span style="font-family:Helvetica;">Apache Ranger</span><span style="font-family:宋体;">将要引入的数据管理特性。这些特性是：分类访问控制、数据有效期策略、位置特性策略、禁止数据集组合、跨组件数据沿袭（例如从</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">到</span><span style="font-family:Helvetica;">Storm</span><span style="font-family:宋体;">再到</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">的数据跟踪）。</span></p>  <p><a href="http://hortonworks.com/blog/the-next-generation-of-hadoop-based-security-data-governance/"><span style="font-family:Helvetica;">http://hortonworks.com/blog/the-next-generation-of-hadoop-based-security-data-governance/</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Apache HAWQ </span><span style="font-family: 宋体;">（孵化中）是一个基于</span><span style="font-family: Helvetica;">Greenplum</span><span style="font-family:宋体;">在</span><span style="font-family:Helvetica;">HDFS</span><span style="font-family:宋体;">上提供数据查询的</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">引擎。本文讨论了其典型设计以及新版本的诸多改进。包括它与</span><span style="font-family: Helvetica;">Spark</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">MapReduce</span><span style="font-family:宋体;">的区别，还有些</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">挑战经典</span><span style="font-family:Helvetica;">MPP</span><span style="font-family:宋体;">设计的内容，以及</span><span style="font-family:Helvetica;">HAWQ</span><span style="font-family:宋体;">的新设计怎样结合</span><span style="font-family:Helvetica;">MPP</span><span style="font-family:宋体;">和批处理技术进而使其两者兼顾。</span></p>  <p><a href="https://blog.pivotal.io/big-data-pivotal/products/apache-hawq-next-step-in-massively-parallel-processing"><span 
style="font-family:Helvetica;">https://blog.pivotal.io/big-data-pivotal/products/apache-hawq-next-step-in-massively-parallel-processing</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">博客撰文介绍了对</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">分布式系统进行故障注入、组网的测试工具</span><span style="font-family:Helvetica;">AgenTEST</span><span style="font-family:宋体;">。它能注入网络故障（例如丢包），资源满载（例如</span><span style="font-family:Helvetica;">CPU</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">IO</span><span style="font-family:宋体;">、磁盘空间）等等。当测试网络分区时，可以评估环形组网、桥接组网等等。</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/quality-assurance-at-cloudera-fault-injection-and-elastic-partitioning/"><span style="font-family:Helvetica;">http://blog.cloudera.com/blog/2016/04/quality-assurance-at-cloudera-fault-injection-and-elastic-partitioning/</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">博客展望了将包含新版本</span><span style="font-family: Helvetica;">Spark</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Zeppelin</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">HDP 2.4.2</span><span style="font-family:宋体;">。</span><span style="font-family:Helvetica;">Spark 2.0</span><span style="font-family:宋体;">预览版和</span><span style="font-family:Helvetica;">Zeppelin</span><span style="font-family:宋体;">新特性都将包含在内。</span></p>  <p><a href="http://hortonworks.com/blog/apache-spark-apache-zeppelin-whats-coming-in-hdp-2-4-2/"><span style="font-family:Helvetica;">http://hortonworks.com/blog/apache-spark-apache-zeppelin-whats-coming-in-hdp-2-4-2/</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Cask</span><span style="font-family:宋体;">撰文介绍了在</span><span style="font-family:Helvetica;">HBase region compaction</span><span 
style="font-family:宋体;">这样罕见事件发生的前后，他们是怎样通过长时间测试以评估分布式系统正确性的。</span></p>  <p><a href="http://blog.cask.co/2016/04/long-running-tests-in-cdap/"><span style="font-family:Helvetica;">http://blog.cask.co/2016/04/long-running-tests-in-cdap/</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:宋体;">本文介绍了如何结合</span><span style="font-family:Helvetica;">SparkR</span><span style="font-family:宋体;">与亚马逊</span><span style="font-family:Helvetica;">EMR</span><span style="font-family:宋体;">进行地理空间分析的。通过</span><span style="font-family: Helvetica;">SparkR</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">集成组件，可以立刻基于</span><span style="font-family:Helvetica;">S3</span><span style="font-family:宋体;">上的数据映射</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">外部表。从这开始，数据就能直接加载到内存中使用</span><span style="font-family:Helvetica;">R</span><span style="font-family:宋体;">语言分析，很容易实现高质量的数据可视化。</span></p>  <p><a href="http://blogs.aws.amazon.com/bigdata/post/Tx1MECZ47VAV84F/Exploring-Geospatial-Intelligence-using-SparkR-on-Amazon-EMR"><span style="font-family:Helvetica;">http://blogs.aws.amazon.com/bigdata/post/Tx1MECZ47VAV84F/Exploring-Geospatial-Intelligence-using-SparkR-on-Amazon-EMR</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">MapR</span><span style="font-family:宋体;">编写了使用</span><span style="font-family:Helvetica;">Pig</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">分析职业棒球大联盟球队水平的教程。</span><span style="font-family:Helvetica;">Pig</span><span style="font-family:宋体;">用于数据初加工，</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">提供基于</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">的数据查询环境。借助</span><span style="font-family:Helvetica;">Hive ODBC</span><span style="font-family:宋体;">驱动和</span><span 
style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">服务器，使得微软</span><span style="font-family:Helvetica;">Excel</span><span style="font-family:宋体;">也能用于获取和分析数据。</span></p>  <p><a href="https://www.mapr.com/blog/using-hive-and-pig-baseball-statistics"><span style="font-family:Helvetica;">https://www.mapr.com/blog/using-hive-and-pig-baseball-statistics</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">SignalFx</span><span style="font-family:宋体;">通过</span><span style="font-family:Helvetica;">27</span><span style="font-family:宋体;">节点的</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">集群每天处理</span><span style="font-family:Helvetica;">700</span><span style="font-family:宋体;">多亿条消息。只有基于他们积累的大规模</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">使用经验才能有如此高的量，因此他们共享了不少调试</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">的技巧，定位告警（例如日志刷新延迟增加），以及</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">横向扩展。</span></p>  <p><a href="http://www.confluent.io/blog/how-we-monitor-and-run-kafka-at-scale-signalfx"><span style="font-family:Helvetica;">http://www.confluent.io/blog/how-we-monitor-and-run-kafka-at-scale-signalfx</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">data Artisans</span><span style="font-family: 宋体;">博客为了度量</span><span style="font-family:Helvetica;">Flink</span><span style="font-family:宋体;">在数据流效率、低延迟、正确性上的能力，专门写了这篇文章。为了证明效率，在高吞吐量的环境下运行了最新的</span><span style="font-family:Helvetica;">Yahoo!</span><span style="font-family:宋体;">流式基准测试程序。在正确性方面，文章突出了</span><span style="font-family:Helvetica;">Flink</span><span style="font-family:宋体;">区分事件时间与处理时间（以星球大战电影年表做类比）方面的优势。最后，文章描述了</span><span style="font-family:Helvetica;">Flink</span><span style="font-family:宋体;">未来版本基于内存的查询任务。</span></p>  <p><a 
href="http://data-artisans.com/counting-in-streams-a-hierarchy-of-needs/"><span style="font-family:Helvetica;">http://data-artisans.com/counting-in-streams-a-hierarchy-of-needs/</span></a></p>  <p><strong>&nbsp;</strong></p>  <p><span style="font-family:宋体;">本教程介绍了怎样把</span>TCP Socket<span style="font-family:宋体;">中的文本数据流转换为</span>Spark<span style="font-family:宋体;">流式数据源。</span></p>  <p align="left"><a href="https://medium.com/@anicolaspp/spark-custom-streaming-sources-e7d52da72e80"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://medium.com/@anicolaspp/spark-custom-streaming-sources-e7d52da72e80</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本文介绍了在构建</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">的时候怎样防止</span><span style="font-family:Helvetica;">AWS</span><span style="font-family:宋体;">证书</span><span style="font-family:宋体;">意外提交到补丁或</span><span style="font-family:Helvetica;">git</span><span style="font-family:宋体;">资源库。除</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">本身外，本文还建议使用</span><span style="font-family: Helvetica;">&#8220;git-secrets&#8221;</span><span style="font-family:宋体;">工具防止意外提交访问</span><span style="font-family:Helvetica;">/</span><span style="font-family:宋体;">安全密钥。如果你用的是</span><span style="font-family:Helvetica;">Hadoop S3</span><span style="font-family:宋体;">，还推荐了新补丁供评估。</span></p>  <p><a href="http://steveloughran.blogspot.co.uk/2016/04/testing-against-s3-and-object-stores.html"><span style="font-family:Helvetica;">http://steveloughran.blogspot.co.uk/2016/04/testing-against-s3-and-object-stores.html</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Big Data &amp; Brews</span><span style="font-family:宋体;">采访了</span><span style="font-family:Helvetica;">MapR</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">Ted Dunning</span><span style="font-family: 
宋体;">和</span><span style="font-family:Helvetica;">Jacques Nadeau</span><span style="font-family:宋体;">。</span><span style="font-family:Helvetica;">Apache Arrow</span><span style="font-family:宋体;">也在本次采访范围内。</span></p>  <p align="left"><a href="https://www.youtube.com/watch?v=l3mDDKjDjMk"><span style="font-family: Helvetica;color:#386EFF; text-decoration:none;text-underline:none">https://www.youtube.com/watch?v=l3mDDKjDjMk</span></a></p>  <p align="left"><a href="https://www.youtube.com/watch?v=Xo9CO0a0VJI"><span style="font-family: Helvetica;color:#386EFF; text-decoration:none;text-underline:none">https://www.youtube.com/watch?v=Xo9CO0a0VJI</span></a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">其他新闻</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">DataEngConf</span><span style="font-family: 宋体;">最近在旧金山召开。本文总结了</span><span style="font-family: Helvetica;">Uber</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Stripe</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Microsoft</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Instacart</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Jawbone</span><span style="font-family:宋体;">的发言内容。也介绍了会议主题</span><span style="font-family:Helvetica;">&#8220;</span><span style="font-family:宋体;">数据科学在现实世界中是一个产品和工程学科</span><span style="font-family:Helvetica;">&#8221;</span><span style="font-family:宋体;">。</span></p>  <p><a href="https://medium.com/@eugmandel/software-engineering-invades-data-science-notes-from-dataengconf-4a3c066b081f#.g2h0duo44"><span style="font-family:Helvetica;">https://medium.com/@eugmandel/software-engineering-invades-data-science-notes-from-dataengconf-4a3c066b081f#.g2h0duo44</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">在上周都柏林举行的</span><span 
style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">欧洲峰会上大放异彩。</span><span style="font-family:Helvetica;">ZDNet</span><span style="font-family:宋体;">报导了这些亮点，其中包括与</span><span style="font-family:Helvetica;">Pivotal</span><span style="font-family:宋体;">（现已转售</span><span style="font-family:Helvetica;">HDP</span><span style="font-family:宋体;">）的扩展合作，与</span><span style="font-family:Helvetica;">Syncsort</span><span style="font-family:宋体;">的转售协议，以及</span><span style="font-family:Helvetica;">Atlas</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Ranger</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Zeppelin</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Metron</span><span style="font-family:宋体;">的技术预览。报导还介绍了</span><span style="font-family: Helvetica;">Hortonworks</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">MapR</span><span style="font-family:宋体;">产品的不同之处。</span></p>  <p><a href="http://www.zdnet.com/article/hortonworks-announces-new-alliances-and-releases-hadoop-comes-to-fork-in-road/"><span style="font-family:Helvetica;">http://www.zdnet.com/article/hortonworks-announces-new-alliances-and-releases-hadoop-comes-to-fork-in-road/</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Flink Forward 2016</span><span style="font-family:宋体;">峰会将在九月于德国柏林举行。讨论议题征集将于六月末结束。</span></p>  <p><a href="http://flink.apache.org/news/2016/04/14/flink-forward-announce.html"><span style="font-family:Helvetica;">http://flink.apache.org/news/2016/04/14/flink-forward-announce.html</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">YouTube</span><span style="font-family:宋体;">上发布了</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">都柏林峰会演讲视频。正如预期的那样，这些演讲内容涵盖</span><span 
style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">生态系统的各个部分。</span></p>  <p><a href="https://www.youtube.com/channel/UCAPa-K_rhylDZAUHVxqqsRA/videos?flow=list&amp;live_view=500&amp;view=0&amp;sort=dd">https://www.youtube.com/channel/UCAPa-K_rhylDZAUHVxqqsRA/videos?flow=list&amp;live_view=500&amp;view=0&amp;sort=dd</a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">产品发布</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">Metascope</span><span style="font-family:宋体;">是一个配合</span><span style="font-family:Helvetica;">Schedoscope</span><span style="font-family:宋体;">在</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">集群中进行元数据管理的新工具。通过</span><span style="font-family:Helvetica;">web</span><span style="font-family:宋体;">界面，利用数据沿袭它能洞察大量的数据。也提供检索、内嵌文档、</span><span style="font-family: Helvetica;">REST API</span><span style="font-family:宋体;">等等功能。</span></p>  <p><a href="https://github.com/ottogroup/metascope"><span style="font-family:Helvetica;">https://github.com/ottogroup/metascope</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Apache HBase 1.2.1</span><span style="font-family:宋体;">于本周发布，在</span><span style="font-family:Helvetica;">1.2.0</span><span style="font-family:宋体;">的基础上解决了</span><span style="font-family:Helvetica;">27</span><span style="font-family:宋体;">个问题。发布声明中重点介绍了四个高优先级的问题。</span></p>  <p><a href="http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAN5cbe7-T5uAYvGRbxw2dfvdbwe5s0nx3vKU8Nt2fzXbKPoQTg@mail.gmail.com%3E"><span style="font-family:Helvetica;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAN5cbe7-T5uAYvGRbxw2dfvdbwe5s0nx3vKU8Nt2fzXbKPoQTg@mail.gmail.com%3E</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Mahout</span><span style="font-family: 宋体;">机器学习库发布了</span><span style="font-family:Helvetica;">0.12.0</span><span 
style="font-family:宋体;">版。该版本的</span><span style="font-family:Helvetica;">&#8220;Samsara&#8221;</span><span style="font-family:宋体;">数学环境开始支持</span><span style="font-family:Helvetica;">Apache Flink</span><span style="font-family: 宋体;">了，并且是平台无关的。发布声明中分享了与</span><span style="font-family:Helvetica;">Flink</span><span style="font-family:宋体;">集成、已知问题、项目演进计划相关的内容。</span></p>  <p><a href="http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAOtpBjj5An876PStdn5kMeaF+up-B72WTmCk9j21EXdP=JOCUA@mail.gmail.com%3E"><span style="font-family:Helvetica;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAOtpBjj5An876PStdn5kMeaF+up-B72WTmCk9j21EXdP=JOCUA@mail.gmail.com%3E</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Apache Storm 1.0.0</span><span style="font-family:宋体;">本周发布了。亮点包括性能提升（普遍提升</span><span style="font-family:Helvetica;">3</span><span style="font-family:宋体;">倍以上）、新的分布式缓存</span><span style="font-family:Helvetica;">API</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">nimbus</span><span style="font-family:宋体;">的高可用性、自动反压、动态</span><span style="font-family:Helvetica;">worker</span><span style="font-family:宋体;">性能分析等等。</span></p>  <p><a href="http://storm.apache.org/2016/04/12/storm100-released.html"><span style="font-family:Helvetica;">http://storm.apache.org/2016/04/12/storm100-released.html</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Apache Kudu</span><span style="font-family: 宋体;">（孵化中）本周发布了</span><span style="font-family: Helvetica;">0.8.0</span><span style="font-family:宋体;">版。本次发布添加了</span><span style="font-family:Helvetica;">Apache Flume sink</span><span style="font-family:宋体;">、部分功能提升、修复了一批</span><span style="font-family: Helvetica;">bug</span><span style="font-family:宋体;">。</span></p>  <p><a href="http://getkudu.io/releases/0.8.0/docs/release_notes.html"><span 
style="font-family:Helvetica;">http://getkudu.io/releases/0.8.0/docs/release_notes.html</span></a></p>  <p><u>&nbsp;</u></p>  <p align="left"><span style="font-family:Helvetica;">Cloudbreak</span><span style="font-family:宋体;">本周发布了</span><span style="font-family:Helvetica;">1.2</span><span style="font-family:宋体;">版，它为云环境提供</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">集群</span><span style="font-family:Helvetica;">Docker</span><span style="font-family:宋体;">。新特性包括支持</span><span style="font-family:Helvetica;">OpenStack</span><span style="font-family:宋体;">以及为自定义服务器提供配置脚本。</span></p>  <p align="left"><a href="http://hortonworks.com/blog/announcing-cloudbreak-1-2/"><span style="font-family:Helvetica;">http://hortonworks.com/blog/announcing-cloudbreak-1-2/</span></a></p>  <p align="left"><u>&nbsp;</u></p>  <p align="left"><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">Cloudera Enterprise 5.4.10</span><span style="font-family:宋体;">，内置了</span><span style="font-family:Helvetica;">Flume</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">HBase</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Impala</span><span style="font-family:宋体;">等组件。</span></p>  <p align="left"><a href="http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-Cloudera-Enterprise-5-4-10-Released/m-p/39790#U39790"><span style="font-family:Helvetica;">http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-Cloudera-Enterprise-5-4-10-Released/m-p/39790#U39790</span></a></p>  <p align="left"><u>&nbsp;</u></p>  <p align="left"><span style="font-family:Helvetica;">Presto Accumulo</span><span style="font-family:宋体;">是个新项目，为</span><span 
style="font-family:Helvetica;">Accumulo</span><span style="font-family:宋体;">读写数据提供了</span><span style="font-family:Helvetica;">Presto</span><span style="font-family:宋体;">连接器。</span></p>  <p align="left"><a href="https://github.com/bloomberg/presto-accumulo"><span style="font-family:Helvetica;">https://github.com/bloomberg/presto-accumulo</span></a></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">活动</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">中国</span></p>  <p align="left"><span style="font-family:SimSun;">无</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430176.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-04-21 15:07 <a href="http://www.blogjava.net/rosen/archive/2016/04/21/430176.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Hadoop周刊—第 165 期</title><link>http://www.blogjava.net/rosen/archive/2016/04/14/430099.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Thu, 14 Apr 2016 10:02:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/04/14/430099.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430099.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/04/14/430099.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430099.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430099.html</trackback:ping><description><![CDATA[<p><strong><span style="font-size:22.0pt;font-family:&quot;Lantinghei SC Demibold&quot;; color:#355400;">Hadoop</span><span style="font-size:22.0pt; font-family:&quot;Lantinghei SC Demibold&quot;;color:#355400;">周刊</span></strong></p>  <p><strong>&nbsp;</strong></p>  <p><span 
style="font-size:14.0pt;font-family:&quot;Lantinghei SC Demibold&quot;; color:#355400;"><strong>第 165 期 2016年4月10日 </strong></span></p>  <p><span style="font-size:10.5pt;font-family:&quot;Lantinghei SC Demibold&quot;; color:#355400;"><strong>启明星辰&#8212;&#8212;平台和大数据整体组编译</strong></span></p>  <p>&nbsp;</p>  <p><span style="font-size:12.0pt;line-height:135%; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">本周，包括</span><span style="font-size:12.0pt;line-height:135%">LinkedIn </span><span style="font-size:12.0pt;line-height:135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">和</span><span style="font-size: 12.0pt;line-height:135%">Airbnb</span><span style="font-size:12.0pt;line-height: 135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">新开源项目在内的数个产品进行了重大版本发布。本期技术部分与流式处理有关</span><span style="font-size:12.0pt;line-height:135%">&#8212;&#8212;Spark</span><span style="font-size:12.0pt;line-height:135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">、</span><span style="font-size: 12.0pt;line-height:135%">Flink</span><span style="font-size:12.0pt;line-height: 135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">、</span><span style="font-size:12.0pt;line-height:135%">Kafka</span><span style="font-size:12.0pt;line-height:135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">等等；新闻部分是关于</span><span style="font-size:12.0pt;line-height:135%">Spark Summit </span><span style="font-size:12.0pt;line-height:135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">和</span><span style="font-size: 12.0pt;line-height:135%">HBaseCon</span><span style="font-size:12.0pt; line-height:135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">的会议议程。</span></p>  <h1><span style="font-family: 'Comic Sans MS'; font-size: 18pt;">技术</span></h1>  <p><span style="font-size:10.5pt;">Zalando</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">发表了他们是如何选择</span><span 
style="font-size:10.5pt;">Apache Flink</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">作为流式处理框架的文章。该文章阐述了对评价标准进行验证后得出的结论，阐明了选择</span><span style="font-size:10.5pt;">Apache Flink</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">的主因</span><span style="font-size:10.5pt;">&#8212;</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">在高吞吐量的情况下依然能保持低延迟，真正的流式处理，开发人员支持。</span></p>  <p><a href="https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Cloudera</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">博客刊登了来自</span><span style="font-size:10.5pt">Wargaming.net</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的文章，通过本文可了解到他们如何通过</span><span style="font-size: 10.5pt">Kafka</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">HBase</span><span style="font-size:10.5pt;font-family: 宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Drools</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">构建实时处理基础设施的。另外，在数据流程方面，他们介绍了如何对</span><span style="font-size:10.5pt">HBase</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的检索和序列化、</span><span style="font-size:10.5pt">HBase</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New 
Roman&quot;">和</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">之间的数据本地化以及</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">计算方面的优化措施。</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/inside-wargamings-data-driven-real-time-rules-engine/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://blog.cloudera.com/blog/2016/04/inside-wargamings-data-driven-real-time-rules-engine/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">InfoQ</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">发布了大规模流式处理</span><span style="font-size:10.5pt">&#8212;SMACK</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">（</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Mesos</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Akka</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Cassandra</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">以及</span><span style="font-size:10.5pt"> Kafka</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）栈的介绍视频。讨论了为什么</span><span style="font-size:10.5pt">SMACK</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">栈在处理同样问题的时候比</span><span style="font-size:10.5pt">Lambda</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New 
Roman&quot;">架构更简单。</span></p>  <p><a href="http://www.infoq.com/presentations/stream-analytics-scalability"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.infoq.com/presentations/stream-analytics-scalability</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Confluent&#8220;</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">日志压缩</span><span style="font-size:10.5pt">&#8221;</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">系列博文又有更新，介绍了</span><span style="font-size:10.5pt">Kafka</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">项目三月份发生的事情。有不少令人关注的开发内容，包括机架感知、</span><span style="font-size:10.5pt">Kerberos</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">支持、基于时间的索引等方面的进展，还有不少你（我也是）没有时间持续关注的最新研发成果。</span></p>  <p><a href="http://www.confluent.io/blog/log-compaction-highlights-in-the-kafka-and-stream-processing-community-april-2016"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.confluent.io/blog/log-compaction-highlights-in-the-kafka-and-stream-processing-community-april-2016</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Apache Flink 1.0</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">引入了新的复杂事件处理（</span><span style="font-size:10.5pt">CEP</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）库。简而言之，</span><span style="font-size:10.5pt">CEP</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">提供了一种检测事件模式的方法。本文以从数据中心服务器传感器采集数据的异常检测为用例，诠释了</span><span style="font-size:10.5pt">Flink</span><span style="font-size:10.5pt; font-family:宋体;Times New 
Roman&quot;;Times New Roman&quot;">的</span><span style="font-size:10.5pt">CEP</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">模式</span><span style="font-size:10.5pt">API </span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">。</span></p>  <p><a href="http://flink.apache.org/news/2016/04/06/cep-monitoring.html"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://flink.apache.org/news/2016/04/06/cep-monitoring.html</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Genome Analysis Toolkit </span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">（</span><span style="font-size:10.5pt">GATK</span><span style="font-size:10.5pt;font-family: 宋体;Times New Roman&quot;;Times New Roman&quot;">）最近宣布，下一个版本（当前是</span><span style="font-size:10.5pt">alpha</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）将支持</span><span style="font-size:10.5pt">Apache Spark</span><span style="font-size: 10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">。本文简要介绍了工具箱并展示了怎样通过</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">来检测重复</span><span style="font-size:10.5pt">DNA</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">片段的。</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/genome-analysis-toolkit-now-using-apache-spark-for-data-processing/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://blog.cloudera.com/blog/2016/04/genome-analysis-toolkit-now-using-apache-spark-for-data-processing/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">InfoWorld</span><span 
style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">综述了</span><span style="font-size:10.5pt">Spark2.0</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">关于结构化流式处理方面的计划。微批处理将依然延续，还有些新特性，例如无限数据帧（</span><span style="font-size:10.5pt">Infinite DataFrames</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）、一流的重复查询支持。</span></p>  <p><a href="http://www.infoworld.com/article/3052924/analytics/what-sparks-structured-streaming-really-means.html"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.infoworld.com/article/3052924/analytics/what-sparks-structured-streaming-really-means.html</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">AWS</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">大数据博客发布了一篇通过存储在</span><span style="font-size: 10.5pt">AWS Key Management Service </span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">（</span><span style="font-size:10.5pt">KMS</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）中的加密密钥加载数据到</span><span style="font-size:10.5pt">S3</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">和</span><span style="font-size:10.5pt">Redshift</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的文章。除了描述所需步骤，本文还介绍了如何在</span><span style="font-size:10.5pt">AWS S3</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">中通过</span><span style="font-size:10.5pt">KMS</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">密钥加密数据。</span></p>  <p><a 
href="http://blogs.aws.amazon.com/bigdata/post/Tx2Q3ZBOZO9DHVQ/Encrypt-Your-Amazon-Redshift-Loads-with-Amazon-S3-and-AWS-KMS"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://blogs.aws.amazon.com/bigdata/post/Tx2Q3ZBOZO9DHVQ/Encrypt-Your-Amazon-Redshift-Loads-with-Amazon-S3-and-AWS-KMS</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Confluent</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">博客介绍了如何使用</span><span style="font-size:10.5pt">Kafka Connect </span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">和</span><span style="font-size:10.5pt"> Kafka Streams </span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">编写一个不那么简单的</span><span style="font-size:10.5pt">&#8220;hello world&#8221;</span><span style="font-size: 10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">程序。更确切地说，范例程序从</span><span style="font-size:10.5pt">IRC</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">拉取维基百科数据，解析消息并进行多项统计计算。本文还用若干段代码展示了整个实现过程。</span></p>  <p><a href="http://www.confluent.io/blog/hello-world-kafka-connect-kafka-streams"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.confluent.io/blog/hello-world-kafka-connect-kafka-streams</span></a></p>  <p>&nbsp;</p>  <p style="line-height:107%"><span style="font-size:10.5pt; line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">本文从</span><span style="font-size:10.5pt; line-height:107%">Postgres </span><span style="font-size:10.5pt;line-height: 107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">向</span><span style="font-size:10.5pt;line-height:107%"> Cassandra</span><span style="font-size:10.5pt;line-height:107%;font-family:宋体;Times New 
Roman&quot;;Times New Roman&quot;">转换简单的模式（</span><span style="font-size:10.5pt;line-height:107%">schemas</span><span style="font-size: 10.5pt;line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">），并描述了主要的差异</span><span style="font-size:10.5pt; line-height:107%">&#8212;</span><span style="font-size:10.5pt;line-height:107%; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">复制、数据类型（</span><span style="font-size:10.5pt;line-height:107%">Cassandra</span><span style="font-size:10.5pt;line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">不支持</span><span style="font-size:10.5pt;line-height:107%">JSON</span><span style="font-size: 10.5pt;line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）、主键、最终一致性。</span></p>  <p><a href="http://neovintage.org/2016/04/07/data-modeling-in-cassandra-from-a-postgres-perspective/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://neovintage.org/2016/04/07/data-modeling-in-cassandra-from-a-postgres-perspective/</span></a></p>  <p>&nbsp;</p>  <h1><span style="font-family: 'Comic Sans MS'; font-size: 18pt;">新闻</span></h1>  <p style="line-height:107%"><span style="font-size: 10.5pt;line-height:107%">ESG</span><span style="font-size:10.5pt;line-height: 107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">博客报导了最近</span><span style="font-size:10.5pt;line-height:107%">Strata+Hadoop World</span><span style="font-size:10.5pt;line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">大会的情况，重点关注了</span><span style="font-size:10.5pt;line-height:107%">Spark</span><span style="font-size:10.5pt;line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的良好势头、机器学习和云服务。</span></p>  <p><a href="http://blog.esg-global.com/riding-high-at-stratahadoop-world"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New 
Roman&quot;;color:#0088CC;background:white">http://blog.esg-global.com/riding-high-at-stratahadoop-world</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">InformationWeek</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">也报导了</span><span style="font-size:10.5pt">Strata</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">大会，关注了</span><span style="font-size:10.5pt">MapR</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">和</span><span style="font-size:10.5pt">Pivotal</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的主题演讲、人工智能等。</span></p>  <p><a href="http://www.informationweek.com/big-data/ai-public-data-sets-real-time-strata-+-hadoop-keynote-sampling/d/d-id/1324943?"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.informationweek.com/big-data/ai-public-data-sets-real-time-strata-+-hadoop-keynote-sampling/d/d-id/1324943?</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Spark Summit 2016</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">议程敲定，将于</span><span style="font-size:10.5pt">6</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">月</span><span style="font-size:10.5pt">6-8</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">日在旧金山举行。会议中有两天将围绕五个方向展开讨论。</span></p>  <p><a href="https://databricks.com/blog/2016/04/04/agenda-announced-for-sparksummit-2016-in-san-francisco.html"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">https://databricks.com/blog/2016/04/04/agenda-announced-for-sparksummit-2016-in-san-francisco.html</span></a></p>  <p>&nbsp;</p>  <p><span 
style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">福布斯采访了</span><span style="font-size:10.5pt">Cloudera CEO Tom Reilly</span><span style="font-size: 10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">，他讨论了公司的机遇、竞争性市场、上市计划等。</span></p>  <p><a href="http://www.forbes.com/sites/roberthof/2016/04/06/ceo-tom-reilly-makes-the-case-for-cloudera-and-its-ipo/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.forbes.com/sites/roberthof/2016/04/06/ceo-tom-reilly-makes-the-case-for-cloudera-and-its-ipo/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Datanami</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">撰文介绍了正在崛起的</span><span style="font-size:10.5pt">Apache Kafka</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">，并将其视为流式处理的支柱。文章还采访了</span><span style="font-size:10.5pt">Confluent</span><span style="font-size: 10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">联合创始人兼</span><span style="font-size:10.5pt">CTO Neha Narkhede</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">，她在采访中表示近期将推出</span><span style="font-size:10.5pt">Kafka Connect </span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">和</span><span style="font-size:10.5pt"> Kafka Streams</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">。</span></p>  <p><a href="http://www.datanami.com/2016/04/06/real-time-rise-apache-kafka/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.datanami.com/2016/04/06/real-time-rise-apache-kafka/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">HBaseCon</span><span style="font-size:10.5pt;font-family:宋体;Times New 
Roman&quot;;Times New Roman&quot;">将于</span><span style="font-size:10.5pt">5</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">月</span><span style="font-size:10.5pt">24</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">日在旧金山召开，议程已于近日正式公布，三个方向上共有</span><span style="font-size:10.5pt">20</span><span style="font-size:10.5pt;font-family: 宋体;Times New Roman&quot;;Times New Roman&quot;">多个议题。</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/hbasecon-2016-speaker-lineup-announced/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://blog.cloudera.com/blog/2016/04/hbasecon-2016-speaker-lineup-announced/</span></a></p>  <p>&nbsp;</p>  <h1><span style="font-family: 'Comic Sans MS'; font-size: 18pt;">发布</span></h1>  <p>&nbsp;<span style="font-size:10.5pt">Apache HBase 0.98.18 </span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">和</span><span style="font-size:10.5pt">1.1.4</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">最近都发布了。</span><span style="font-size:10.5pt">1.1.4</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">包含若干修复，其中九个涉及正确性等方面。</span><span style="font-size: 10.5pt">HBase 0.98.18</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">解决了近</span><span style="font-size:10.5pt">50</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">个问题（</span><span style="font-size:10.5pt">bug</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、改进以及两个新特性）。</span></p>  <p><a href="http://mail-archives.apache.org/mod_mbox/hbase-user/201603.mbox/%3CCANZa%3DGu-mAxKEtfoRjctHcE0KD7z52oE010Fgsf6AMmW2tDZLA%40mail.gmail.com%3E"><span 
style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://mail-archives.apache.org/mod_mbox/hbase-user/201603.mbox/%3CCANZa%3DGu-mAxKEtfoRjctHcE0KD7z52oE010Fgsf6AMmW2tDZLA%40mail.gmail.com%3E</span></a>&nbsp;<span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#333333"><br /> </span><a href="http://mail-archives.apache.org/mod_mbox/hbase-user/201603.mbox/%3CCA%2BRK%3D_CtZ1L07nS6Og2ekfVwet0qTE7jw-bmyD2pp5UPweUehQ%40mail.gmail.com%3E"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://mail-archives.apache.org/mod_mbox/hbase-user/201603.mbox/%3CCA%2BRK%3D_CtZ1L07nS6Og2ekfVwet0qTE7jw-bmyD2pp5UPweUehQ%40mail.gmail.com%3E</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Apache Lens</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">发布了</span><span style="font-size:10.5pt">2.5.0-beta</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">。作为统一分析接口，它支持</span><span style="font-size: 10.5pt">Hadoop</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">生态系统中的执行引擎和数据存储。本次发布解决了</span><span style="font-size:10.5pt">87</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">个工单，主要是</span><span style="font-size:10.5pt">bug</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">修复和新功能。</span></p>  <p><a href="http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAL3kmZj60kpopRPpOVEs9o7oTg7YuaC_=c8zncBeMyUESrZsmQ@mail.gmail.com%3E"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New 
Roman&quot;;color:#0088CC;background:white">http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAL3kmZj60kpopRPpOVEs9o7oTg7YuaC_=c8zncBeMyUESrZsmQ@mail.gmail.com%3E</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Airbnb </span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">开源了</span><span style="font-size:10.5pt"> Caravel</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">，这是一个数据探索系统（数据可视化平台）。</span><span style="font-size: 10.5pt">Caravel</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">支持多种通常只在商业产品中才能看到的特性，能够连接到任何支持</span><span style="font-size:10.5pt">SQL</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">方言的系统，尤其还支持面向</span><span style="font-size:10.5pt">Druid</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的实时分析。</span></p>  <p><a href="https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">MapR </span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">宣布在其发行版中支持</span><span style="font-size:10.5pt">Apache Drill 1.6</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">。本次发布的亮点有</span><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#333333;background:white">MapR-DB</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">新存储插件、新</span><span style="font-size:10.5pt">SQL</span><span 
style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">窗口函数支持以及端对端安全。在网页介绍部分，有些使用</span><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#333333;background:white">MapR-DB API</span><span style="font-size:10.5pt;font-family:&quot;MS Mincho&quot;;MS Mincho&quot;; color:#333333;background:white">加</span><span style="font-size:10.5pt; font-family:SimSun;color:#333333;background:white">载</span><span style="font-size:10.5pt;font-family:&quot;MS Mincho&quot;;MS Mincho&quot;; color:#333333;background:white">数据并通</span><span style="font-size:10.5pt; font-family:SimSun;color:#333333;background:white">过</span><span style="font-size:10.5pt">Drill</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">查询的例子。</span></p>  <p><a href="https://www.mapr.com/blog/apache-drill-16-mapr-converged-platform-gearing-new-generation-stack-json-enabled-big-data"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">https://www.mapr.com/blog/apache-drill-16-mapr-converged-platform-gearing-new-generation-stack-json-enabled-big-data</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Apache Flink</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">发布了修复</span><span style="font-size:10.5pt">bug</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">后的</span><span style="font-size:10.5pt">1.0.x</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">。这次发布解决了</span><span style="font-size:10.5pt">23</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">个问题，推荐所有</span><span style="font-size:10.5pt">1.0.0</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的用户升级。</span></p>  <p><a 
href="http://flink.apache.org/news/2016/04/06/release-1.0.1.html"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://flink.apache.org/news/2016/04/06/release-1.0.1.html</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Cloudera Enterprise 5.7</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">发布附带了</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt;font-family: 宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">HBase</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Impala</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Kafka</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">等组件版本的升级。本次发布的亮点包括从</span><span style="font-size:10.5pt">Cloudera Labs </span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">新鲜推荐的</span><span style="font-size:10.5pt">Hive-on-Spark</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">HBase-Spark</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Impala</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">性能重要提升，支持</span><span style="font-size:10.5pt">SSD </span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">上</span><span style="font-size:10.5pt">HBase WAL</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">。</span></p>  <p><a 
href="http://blog.cloudera.com/blog/2016/04/cloudera-enterprise-5-7-is-released/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://blog.cloudera.com/blog/2016/04/cloudera-enterprise-5-7-is-released/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Apache Tajo</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">，构建在</span><span style="font-size:10.5pt">Hadoop</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">上的数据仓库系统，发布了</span><span style="font-size:10.5pt">0.11.2</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">版。新版本支持了</span><span style="font-size:10.5pt">Kerberos</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">，修复了</span><span style="font-size:10.5pt">ORC</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">表对</span><span style="font-size:10.5pt">Hive</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的支持等。</span></p>  <p><a href="http://tajo.apache.org/releases/0.11.2/announcement.html"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://tajo.apache.org/releases/0.11.2/announcement.html</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">LinkedIn </span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">开源了</span><span style="font-size:10.5pt"> Dr. 
Elephant</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">，该工具能诊断</span><span style="font-size:10.5pt">Hadoop</span><span style="font-size:10.5pt;font-family: 宋体;Times New Roman&quot;;Times New Roman&quot;">和</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">任务的性能问题。基于</span><span style="font-size:10.5pt">metrics</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">从</span><span style="font-size:10.5pt">YARN</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">资源管理器收集已完成任务数据，</span><span style="font-size:10.5pt">Dr. Elephant</span><span style="font-size: 10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">评估后生成诊断报表，内容包括数据倾斜、</span><span style="font-size:10.5pt">GC</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">开销等。</span><span style="font-size:10.5pt">LinkedIn</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">宣称借助它能解决</span><span style="font-size:10.5pt">80%</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的问题。</span></p>  <p><a href="https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark</span></a></p>  <p>&nbsp;</p>  <h1><span style="font-family: 'Comic Sans MS'; font-size: 18pt;">活动</span></h1>  <p><strong><span style="font-size:16.0pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">中国</span></strong><strong></strong></p>  <p><span style="font-family:宋体;Times New Roman&quot;;Times 
New Roman&quot;">无</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430099.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-04-14 18:02 <a href="http://www.blogjava.net/rosen/archive/2016/04/14/430099.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>开源面向对象数据库 db4o 之旅: 使用 dRS “db4o 之旅（四）”</title><link>http://www.blogjava.net/rosen/archive/2010/07/09/325618.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Fri, 09 Jul 2010 02:19:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2010/07/09/325618.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/325618.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2010/07/09/325618.html#Feedback</comments><slash:comments>9</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/325618.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/325618.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; 摘要: 很多开发者对 hibernate 性能表示置疑，下一次技术革新会是什么呢？——对象数据库<br>这篇文章是开源面向对象数据库  db4o 之旅   系列文章的第 4 部分，介绍面向对象数据库 db4o 的 db4o Replication System(dRS) —— db4o 复制系统，并对其如何同步 Oracle 数据库进行分析。&nbsp;&nbsp;<a href='http://www.blogjava.net/rosen/archive/2010/07/09/325618.html'>阅读全文</a><img src ="http://www.blogjava.net/rosen/aggbug/325618.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2010-07-09 10:19 <a href="http://www.blogjava.net/rosen/archive/2010/07/09/325618.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>使用SoftReference软引用</title><link>http://www.blogjava.net/rosen/archive/2010/06/22/324173.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Tue, 22 
Jun 2010 07:27:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2010/06/22/324173.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/324173.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2010/06/22/324173.html#Feedback</comments><slash:comments>2</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/324173.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/324173.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: Let's run a practical SoftReference test.&nbsp;&nbsp;<a href='http://www.blogjava.net/rosen/archive/2010/06/22/324173.html'>Read more</a><img src ="http://www.blogjava.net/rosen/aggbug/324173.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2010-06-22 15:27 <a href="http://www.blogjava.net/rosen/archive/2010/06/22/324173.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Analyzing Memory Leaks with the Memory Analyzer tool (MAT), Part 2</title><link>http://www.blogjava.net/rosen/archive/2010/06/13/323522.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Sun, 13 Jun 2010 08:13:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2010/06/13/323522.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/323522.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2010/06/13/323522.html#Feedback</comments><slash:comments>19</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/323522.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/323522.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: In day-to-day work we sometimes run into an OutOfMemoryError. An Error generally indicates a serious, possibly catastrophic, problem in the program, so finding out what caused the OutOfMemoryError is very important. This article introduces the Eclipse Memory Analyzer tool (MAT) to help solve this problem.&nbsp;&nbsp;<a 
href='http://www.blogjava.net/rosen/archive/2010/06/13/323522.html'>Read more</a><img src ="http://www.blogjava.net/rosen/aggbug/323522.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2010-06-13 16:13 <a href="http://www.blogjava.net/rosen/archive/2010/06/13/323522.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item><item><title>Analyzing Memory Leaks with the Memory Analyzer tool (MAT), Part 1</title><link>http://www.blogjava.net/rosen/archive/2010/05/21/321575.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Fri, 21 May 2010 12:59:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2010/05/21/321575.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/321575.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2010/05/21/321575.html#Feedback</comments><slash:comments>23</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/321575.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/321575.html</trackback:ping><description><![CDATA[&nbsp;&nbsp;&nbsp;&nbsp; Summary: In day-to-day work we sometimes run into an OutOfMemoryError. An Error generally indicates a serious, possibly catastrophic, problem in the program, so finding out what caused the OutOfMemoryError is very important. This article introduces the Eclipse Memory Analyzer tool (MAT) to help solve this problem.&nbsp;&nbsp;<a href='http://www.blogjava.net/rosen/archive/2010/05/21/321575.html'>Read more</a><img src ="http://www.blogjava.net/rosen/aggbug/321575.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2010-05-21 20:59 <a href="http://www.blogjava.net/rosen/archive/2010/05/21/321575.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>