﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-&lt;b&gt;成都心情&lt;/b&gt;-随笔分类-大数据</title><link>http://www.blogjava.net/rosen/category/55019.html</link><description /><language>zh-cn</language><lastBuildDate>Tue, 03 May 2016 02:40:00 GMT</lastBuildDate><pubDate>Tue, 03 May 2016 02:40:00 GMT</pubDate><ttl>60</ttl><item><title>Hadoop周刊—第 167 期 </title><link>http://www.blogjava.net/rosen/archive/2016/05/03/430325.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Tue, 03 May 2016 02:08:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/05/03/430325.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430325.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/05/03/430325.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430325.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430325.html</trackback:ping><description><![CDATA[<p align="right" style="text-align: left; line-height: 10%;"><strong><span style="font-size:16.0pt; line-height:10%">Hadoop</span></strong><strong><span style="font-size:16.0pt;line-height:10%;font-family:宋体;">周刊</span></strong><strong> </strong><strong><span style="font-size: 16.0pt;line-height:10%;font-family:宋体;">第</span></strong><strong><span style="font-size:16.0pt; line-height:10%"> 167 </span></strong><strong><span style="font-size:16.0pt;line-height:10%;font-family:宋体;">期<br /></span></strong><strong></strong></p>  <p align="right" style="text-align: right;"><div style="text-align: left;"><font face="宋体"><span style="font-size: 18.6667px; line-height: 1.86667px;"><br /></span></font></div><span 
style="line-height: 10%; font-size: 14pt; font-family: 宋体;"><div style="text-align: left;"><span style="font-size: 14pt; line-height: 10%;"><br />启明星辰平台和大数据整体组编译</span></div></span></p>  <p align="right" style="text-align: right;"><div style="text-align: left;"><span style="font-size: 18.6667px; line-height: 1.86667px;"><br /></span></div><span style="line-height: 10%; font-size: 14pt;"><div style="text-align: left;"><span style="line-height: 10%; font-size: 14pt;"><br />2016</span><span style="line-height: 10%; font-size: 14pt; font-family: 宋体;">年</span><span style="line-height: 10%; font-size: 14pt;">4</span><span style="line-height: 10%; font-size: 14pt; font-family: 宋体;">月</span><span style="line-height: 10%; font-size: 14pt;">25</span><span style="line-height: 10%; font-size: 14pt; font-family: 宋体;">日</span></div></span></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">欢迎来到</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">周刊周一特别版。本周有大量来自</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Beam</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Kudu</span><span style="font-family:宋体;">的技术新闻。如果你正在寻找一些更前沿的技术，</span><span style="font-family:Helvetica;">Apache Metron</span><span style="font-family:宋体;">（孵化中）发布了它们第一个版本。</span><span style="font-family:Helvetica;">Metron</span><span style="font-family:宋体;">，是一个构建在</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">上正在不断发展的通用安全系统。</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">技术新闻</span></strong><strong></strong></p>  <p><span style="font-family:宋体;">本文介绍了如何在</span><span style="font-family:Helvetica;">AWS</span><span style="font-family:宋体;">上构建流式处理系统。包括了诸如</span><span style="font-family:Helvetica;">Amazon Kinesis 
</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">AWS Lambda</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Kinesis S3 connector</span><span style="font-family:宋体;">之类简单的搭配方案，也介绍了在</span><span style="font-family: Helvetica;">AWS</span><span style="font-family:宋体;">上实现实时分析这类相对复杂的方案。</span></p>  <p align="left"><a href="http://cdn.oreillystatic.com/en/assets/1/event/144/Building%20a%20scalable%20architecture%20for%20processing%20streaming%20data%20on%20AWS%20Presentation.pdf"><span style="font-family:Helvetica;">http://cdn.oreillystatic.com/en/assets/1/event/144/Building%20a%20scalable%20architecture%20for%20processing%20streaming%20data%20on%20AWS%20Presentation.pdf</span></a></p>  <p align="left"><u>&nbsp;</u></p>  <p align="left"><span style="font-family:宋体;">本文介绍了怎样使用</span><span style="font-family:Helvetica;">Spark Testing Base</span><span style="font-family:宋体;">。</span><span style="font-family:Helvetica;">Spark Testing Base</span><span style="font-family:宋体;">是一个用</span><span style="font-family:Helvetica;">Scala</span><span style="font-family:宋体;">编写、可通过</span><span style="font-family:Helvetica;">Java</span><span style="font-family:宋体;">调用的</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">测试框架。本文的样例代码展示了如何重构</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">代码以隔离测试逻辑，同时还通过</span><span style="font-family:Helvetica;">Java</span><span style="font-family:宋体;">处理了一些臃肿的</span><span style="font-family:Helvetica;">Scala API</span><span style="font-family:宋体;">。</span></p>  <p align="left"><a href="http://www.jesse-anderson.com/2016/04/unit-testing-spark-with-java/"><span style="font-family:Helvetica;">http://www.jesse-anderson.com/2016/04/unit-testing-spark-with-java/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Altiscale</span><span style="font-family:宋体;">博客概述了在</span><span 
style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">环境下，构建</span><span style="font-family:Helvetica;">thin</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">uber jar</span><span style="font-family:宋体;">包的优劣。示范了在</span><span style="font-family:Helvetica;">Maven</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">SBT</span><span style="font-family:宋体;">分别构建两种包的情况。</span></p>  <p align="left"><a href="https://www.altiscale.com/blog/spark-on-hadoop-thin-jars/"><span style="font-family:Helvetica;">https://www.altiscale.com/blog/spark-on-hadoop-thin-jars/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">LinkedIn</span><span style="font-family:宋体;">介绍了他们的</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">生态系统，生态系统包含一个特殊的</span><span style="font-family:Helvetica;">Kafka producer</span><span style="font-family:宋体;">，一个为非</span><span style="font-family:Helvetica;">Java</span><span style="font-family:宋体;">客户端提供的</span><span style="font-family:Helvetica;">REST API</span><span style="font-family:宋体;">，一个</span><span style="font-family:Helvetica;">avro</span><span style="font-family:宋体;">模式注册表，以及</span><span style="font-family:Helvetica;">Gobblin</span><span style="font-family:宋体;">（装载数据到</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">的工具）等等。</span></p>  <p align="left"><a href="https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin"><span style="font-family:Helvetica;">https://engineering.linkedin.com/blog/2016/04/kafka-ecosystem-at-linkedin</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:宋体;">该</span><span style="font-family:Helvetica;">Spark Streaming</span><span style="font-family:宋体;">教程介绍了怎样通过</span><span style="font-family:Helvetica;">twitter4j API</span><span style="font-family:宋体;">拉推文，基于标签过滤，对推文进行情感分析。</span></p>  
<p align="left"><a href="https://www.mapr.com/blog/spark-streaming-and-twitter-sentiment-analysis"><span style="font-family:Helvetica;">https://www.mapr.com/blog/spark-streaming-and-twitter-sentiment-analysis</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache Kudu</span><span style="font-family:宋体;">（孵化中）是</span><span style="font-family:Helvetica;">Apache Impala</span><span style="font-family:宋体;">（孵化中）的绝佳伴侣，因为它能高效地解决广泛的分析和有针对性的查询。本文描述了两者集成的技术细节，例如</span><span style="font-family:Helvetica;">Kudu</span><span style="font-family:宋体;">的设计如何保证高效地查询能力，如何通过</span><span style="font-family:Helvetica;">Impala</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Kudu</span><span style="font-family:宋体;">执行写／更新／删除操作等等。</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/how-to-use-impala-and-kudu-together-for-analytic-workloads/"><span style="font-family:Helvetica;">http://blog.cloudera.com/blog/2016/04/how-to-use-impala-and-kudu-together-for-analytic-workloads/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">MapR</span><span style="font-family:宋体;">撰文介绍了使用</span><span style="font-family:Helvetica;">spark-sklearn</span><span style="font-family:宋体;">扩展一个已存在的</span><span style="font-family:Helvetica;">scikit-learn</span><span style="font-family:宋体;">模型。文章介绍了如何透过</span><span style="font-family:Helvetica;">Airbnb</span><span style="font-family:宋体;">数据集内部建模，还介绍了如何傍着</span><span style="font-family:Helvetica;">spark-sklearn</span><span style="font-family:宋体;">进行交叉验证。</span></p>  <p><a href="https://www.mapr.com/blog/predicting-airbnb-listing-prices-scikit-learn-and-apache-spark"><span style="font-family:Helvetica;">https://www.mapr.com/blog/predicting-airbnb-listing-prices-scikit-learn-and-apache-spark</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">AWS</span><span style="font-family:宋体;">大数据博客写了个如何在</span><span style="font-family: Helvetica;">Amazon 
EMR</span><span style="font-family:宋体;">中使用</span><span style="font-family:Helvetica;">HBase</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">的教程。本教程介绍了</span><span style="font-family: Helvetica;">HBase</span><span style="font-family:宋体;">，描述了如何从</span><span style="font-family:Helvetica;">S3</span><span style="font-family:宋体;">恢复</span><span style="font-family:Helvetica;">HBase</span><span style="font-family:宋体;">表，示范了</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">HBase</span><span style="font-family:宋体;">如何集成等等。</span></p>  <p><a href="http://blogs.aws.amazon.com/bigdata/post/Tx3EGE8Z90LZ9WX/Combine-NoSQL-and-Massively-Parallel-Analytics-Using-Apache-HBase-and-Apache-Hiv"><span style="font-family:Helvetica;">http://blogs.aws.amazon.com/bigdata/post/Tx3EGE8Z90LZ9WX/Combine-NoSQL-and-Massively-Parallel-Analytics-Using-Apache-HBase-and-Apache-Hiv</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本文描述了为学生在大数据课程上提供实战经验的挑战。作者经历若干次的迭代和选择似乎有了一个好方案</span><span style="font-family:Helvetica;">&#8212;</span> <span style="font-family:Helvetica;">Altiscale</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">Hadoop-as-a-Service</span><span style="font-family:宋体;">。</span></p>  <p><a href="https://www.altiscale.com/blog/hadoop-as-a-service-in-the-classroom/"><span style="font-family:Helvetica;">https://www.altiscale.com/blog/hadoop-as-a-service-in-the-classroom/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">博客的一篇客座文章，作者比较了</span><span style="font-family:Helvetica;">Parquet</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Avro</span><span style="font-family:宋体;">在两个数据集上的不同表现（一个数据集窄</span><span style="font-family:Helvetica;">(3</span><span style="font-family:宋体;">列</span><span 
style="font-family:Helvetica;">)</span><span style="font-family:宋体;">、一个数据集宽</span><span style="font-family:Helvetica;">(103</span><span style="font-family:宋体;">列</span><span style="font-family:Helvetica;">)</span><span style="font-family:宋体;">）。在用</span><span style="font-family:Helvetica;">Spark</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Spark SQL</span><span style="font-family:宋体;">测试查询／操作后，作者发现</span><span style="font-family: Helvetica;">Parquet</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Avro</span><span style="font-family:宋体;">在查询序列化数据方面有时表现很类似，尽管在大多数情况下查询</span><span style="font-family:Helvetica;">Parquet</span><span style="font-family:宋体;">数据的时候更快点（序列化数据更小）。</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/benchmarking-apache-parquet-the-allstate-experience/"><span style="font-family:Helvetica;">http://blog.cloudera.com/blog/2016/04/benchmarking-apache-parquet-the-allstate-experience/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本文介绍了如何在</span><span style="font-family:Helvetica;">CDH</span><span style="font-family:宋体;">这样的分布式环境中使用</span><span style="font-family:Helvetica;">SparkR</span><span style="font-family:宋体;">，尽管</span><span style="font-family:Helvetica;">SparkR</span><span style="font-family:宋体;">官方还没有支持这种方式。借助</span><span style="font-family:Helvetica;">YARN</span><span style="font-family:宋体;">在</span><span style="font-family:Helvetica;">worker</span><span style="font-family:宋体;">本地安装</span><span style="font-family:Helvetica;">R</span><span style="font-family:宋体;">语言包，</span><span style="font-family:Helvetica;">job</span><span style="font-family:宋体;">稍加改造就能执行了。</span></p>  <p><a href="http://www.nodalpoint.com/sparkr-in-cloudera-hadoop/"><span style="font-family:Helvetica;">http://www.nodalpoint.com/sparkr-in-cloudera-hadoop/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">很多开源框架都能执行</span><span 
style="font-family:Helvetica;">MapReduce</span><span style="font-family:宋体;">以及借助更高级的编程模型完成类似的工作。纵观过去，它们依赖独立运行的框架（例如</span><span style="font-family:Helvetica;">MapReduce, Storm</span><span style="font-family:宋体;">），但是最近的某些变化使得这一切充满了变数。</span><span style="font-family:Helvetica;">Apache Beam</span><span style="font-family:宋体;">（孵化中）更进一步地跨越了批处理、流式处理两种执行模式，内置更加复杂的计算模型。</span></p>  <p><a href="http://www.datanami.com/2016/04/22/apache-beam-emerges-ambitious-goal-unify-big-data-development/"><span style="font-family:Helvetica;">http://www.datanami.com/2016/04/22/apache-beam-emerges-ambitious-goal-unify-big-data-development/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache</span><span style="font-family:宋体;">博客发布了</span><span style="font-family:Helvetica;">HBase</span><span style="font-family:宋体;">在</span><span style="font-family:Helvetica;">HDD</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">SSD</span><span style="font-family:宋体;">以及</span><span style="font-family:Helvetica;">RAMDISK</span><span style="font-family:宋体;">上的写入性能测试比对的</span><span style="font-family: Helvetica;">7</span><span style="font-family:宋体;">篇系列文章。通过这一分析，作者发现并提议在</span><span style="font-family:Helvetica;">HBase</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">HDFS</span><span style="font-family:宋体;">上实现一些未覆盖的功能。</span></p>  <p><a href="https://blogs.apache.org/hbase/entry/hdfs_hsm_and_hbase_part"><span style="font-family:Helvetica;">https://blogs.apache.org/hbase/entry/hdfs_hsm_and_hbase_part</span></a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">其他新闻</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">Tom White</span><span style="font-family:宋体;">，</span><span style="font-family:Helvetica;">&#8220;Hadoop</span><span style="font-family:宋体;">权威指南</span><span style="font-family:Helvetica;">&#8221;</span><span 
style="font-family:宋体;">的作者撰文介绍他是如何步入</span><span style="font-family:Helvetica;">Apache Hadoop</span><span style="font-family:宋体;">殿堂的。他的早期贡献是绕着</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">与</span><span style="font-family:Helvetica;">Amazon Web Services</span><span style="font-family:宋体;">集成展开，而今</span><span style="font-family:Helvetica;">AWS</span><span style="font-family:宋体;">已成为</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">项目成功的重要部分。</span></p>  <p><a href="http://vision.cloudera.com/how-i-got-into-hadoop/"><span style="font-family:Helvetica;">http://vision.cloudera.com/how-i-got-into-hadoop/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Fluo</span><span style="font-family:宋体;">，为</span><span style="font-family:Helvetica;">Apache Accumulo</span><span style="font-family:宋体;">准备的分布式处理引擎，向</span><span style="font-family:Helvetica;">Apache</span><span style="font-family:宋体;">孵化器提交了孵化申请。</span></p>  <p><a href="https://wiki.apache.org/incubator/FluoProposal"><span style="font-family: Helvetica;">https://wiki.apache.org/incubator/FluoProposal</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Phoenix</span><span style="font-family: 宋体;">宣布将在</span><span style="font-family:Helvetica;">HBaseCon</span><span style="font-family:宋体;">后举行会议，</span><span style="font-family:Helvetica;">Apache Phoenix</span><span style="font-family:宋体;">是一个</span><span style="font-family:Helvetica;">SQL-on-HBase</span><span style="font-family:宋体;">系统。该会议只有半天，主题是介绍</span><span style="font-family:Helvetica;">Phoenix</span><span style="font-family:宋体;">内部情况和用例。</span></p>  <p><a href="http://hortonworks.com/blog/announcing-first-annual-phoenixcon-apache-phoenix-user-conference/"><span style="font-family:Helvetica;">http://hortonworks.com/blog/announcing-first-annual-phoenixcon-apache-phoenix-user-conference/</span></a></p>  <p>&nbsp;</p>  <p><strong><span 
style="font-size:15.0pt;font-family:宋体;">产品发布</span></strong><strong></strong></p>  <p align="left"><span style="font-family:Helvetica;">Apache Metron</span><span style="font-family:宋体;">，构建于</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">上的安全框架，发布了</span><span style="font-family:Helvetica;">0.1</span><span style="font-family:宋体;">版。</span><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">支撑其作为技术预览版，并撰写本文介绍了如何上手，如何贡献，如何使用</span><span style="font-family:Helvetica;">Metron UI</span><span style="font-family:宋体;">等等。</span></p>  <p align="left"><a href="http://hortonworks.com/blog/apache-metron-tech-preview-1-come-get/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://hortonworks.com/blog/apache-metron-tech-preview-1-come-get/</span></a></p>  <p align="left"><a href="http://hortonworks.com/blog/apache-metron-use-case-finding-needle-haystack/"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">http://hortonworks.com/blog/apache-metron-use-case-finding-needle-haystack/</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Apache NiFi</span><span style="font-family:宋体;">本周发布了</span><span style="font-family:Helvetica;">0.6.1</span><span style="font-family:宋体;">版。这是修复了</span><span style="font-family:Helvetica;">10</span><span style="font-family:宋体;">多个</span><span style="font-family:Helvetica;">bug</span><span style="font-family:宋体;">后的修复版。</span></p>  <p align="left"><a href="http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCALJK9a7yLnFeJ7Z=eU6mOB-DXvo8MHUr=_RshSjZcTbTcAHDZA@mail.gmail.com%3E"><span style="font-family:Helvetica;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCALJK9a7yLnFeJ7Z=eU6mOB-DXvo8MHUr=_RshSjZcTbTcAHDZA@mail.gmail.com%3E</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span 
style="font-family:Helvetica;">Apache Flink</span><span style="font-family:宋体;">本周发布了</span><span style="font-family:Helvetica;">1.0.2</span><span style="font-family:宋体;">版。本次发布包括了</span><span style="font-family:Helvetica;">bug</span><span style="font-family:宋体;">修复，</span><span style="font-family:Helvetica;">RocksDB</span><span style="font-family:宋体;">环境下的性能提升以及一些文档方面的进步。</span></p>  <p align="left"><a href="http://flink.apache.org/news/2016/04/22/release-1.0.2.html"><span style="font-family:Helvetica;">http://flink.apache.org/news/2016/04/22/release-1.0.2.html</span></a></p>  <p align="left">&nbsp;</p>  <p align="left"><span style="font-family:Helvetica;">Amazon</span><span style="font-family:宋体;">发布了新版</span><span style="font-family:Helvetica;">Amazon EMR</span><span style="font-family:宋体;">，开始支持</span><span style="font-family:Helvetica;">HBase 1.2</span><span style="font-family:宋体;">。</span></p>  <p align="left"><a href="https://aws.amazon.com/blogs/aws/amazon-emr-update-apache-hbase-1-2-is-now-available/"><span style="font-family:Helvetica;">https://aws.amazon.com/blogs/aws/amazon-emr-update-apache-hbase-1-2-is-now-available/</span></a></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">活动</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">中国</span></p>  <p align="left"><span style="font-family:SimSun;">无</span></p><img src ="http://www.blogjava.net/rosen/aggbug/430325.html" width = "1" height = "1" /><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-05-03 10:08 <a href="http://www.blogjava.net/rosen/archive/2016/05/03/430325.html#Feedback" target="_blank" style="text-decoration:none;">发表评论</a></div>]]></description></item><item><title>Hadoop周刊—第 166 
期</title><link>http://www.blogjava.net/rosen/archive/2016/04/21/430176.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Thu, 21 Apr 2016 07:07:00 GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/04/21/430176.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430176.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/04/21/430176.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430176.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430176.html</trackback:ping><description><![CDATA[<p><strong><span style="font-size:16.0pt">Hadoop</span></strong><strong><span style="font-size:16.0pt;font-family: 宋体;">周刊</span></strong><strong> </strong><strong><span style="font-size:16.0pt;font-family:宋体;">第</span></strong><strong><span style="font-size:16.0pt"> 166 </span></strong><strong><span style="font-size: 16.0pt;font-family:宋体;">期</span></strong><strong></strong></p>  <p><span style="font-size:14.0pt">2016</span><span style="font-size:14.0pt;font-family:宋体;">年</span><span style="font-size:14.0pt">4</span><span style="font-size:14.0pt; font-family:宋体;">月</span><span style="font-size:14.0pt">17</span><span style="font-size:14.0pt;font-family: 宋体;">日</span></p>  <p>启明星辰&#8212;&#8212;平台和大数据整体组编译&nbsp;<br /><br /></p>  <p><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">在本周</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">欧洲峰会上有若干爆料，贯穿了本期整个内容。伴随着骄人的新特性，</span><span style="font-family:Helvetica;">Apache Storm</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">1.0.0</span><span style="font-family:宋体;">版。在技术新闻方面，有不少基于</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">构建大规模服务和分布式系统测试的文章。如果你错过了</span><span style="font-family:Helvetica;">Hadoop</span><span 
style="font-family:宋体;">峰会，那么不用担心，演讲视频已经放到了网上。</span></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">技术新闻</span></strong><strong></strong></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Smyte</span><span style="font-family:宋体;">撰文介绍了他们基于事件数据流实时检测垃圾邮件和诈骗信息的基础设施。最初的事件处理系统构建在</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Redis</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Secor</span><span style="font-family:宋体;">以及</span><span style="font-family:Helvetica;">S3</span><span style="font-family:宋体;">上，为了满足规模不断扩张和廉价的要求，他们把系统迁移到基于磁盘的方案上，使用</span><span style="font-family:Helvetica;">Redis</span><span style="font-family:宋体;">协议与</span><span style="font-family:Helvetica;">RocksDB</span><span style="font-family:宋体;">交互，使用</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">进行复制。</span></p>  <p><a href="https://medium.com/the-smyte-blog/counting-with-domain-specific-databases-73c660472da"><span style="font-family:Helvetica;">https://medium.com/the-smyte-blog/counting-with-domain-specific-databases-73c660472da</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:宋体;">本文把</span><span style="font-family:Helvetica;">rsyslog</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">AWS </span><span style="font-family:宋体;">与</span><span style="font-family:Helvetica;">ELK</span><span style="font-family:宋体;">栈（</span><span style="font-family:Helvetica;">ElasticSearch</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Logstash</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Kibana</span><span style="font-family:宋体;">）结合，处理诸如反压、规模以及维护方面的问题。本文覆盖了</span><span style="font-family:Helvetica;">rsyslog</span><span 
style="font-family:宋体;">集成</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">以及</span><span style="font-family:Helvetica;">schema</span><span style="font-family:宋体;">方面的技巧，也介绍了如何在</span><span style="font-family:Helvetica;">AWS</span><span style="font-family:宋体;">的自动伸缩组中运行</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Zookeeper</span><span style="font-family:宋体;">。</span></p>  <p><a href="https://www.bashton.com/blog/2016/elk-on-ark/"><span style="font-family: Helvetica;">https://www.bashton.com/blog/2016/elk-on-ark/</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">撰文介绍了</span><span style="font-family:Helvetica;">Apache Atlas</span><span style="font-family:宋体;">以及</span><span style="font-family:Helvetica;">Apache Ranger</span><span style="font-family:宋体;">将要引入的数据管理特性。这些特性是：分类访问控制、数据有效期策略、位置特性策略、禁止数据集组合、跨组件血缘（例如从</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">到</span><span style="font-family:Helvetica;">Storm</span><span style="font-family:宋体;">再到</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">的数据跟踪）。</span></p>  <p><a href="http://hortonworks.com/blog/the-next-generation-of-hadoop-based-security-data-governance/"><span style="font-family:Helvetica;">http://hortonworks.com/blog/the-next-generation-of-hadoop-based-security-data-governance/</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Apache HAWQ </span><span style="font-family: 宋体;">（孵化中）是一个基于</span><span style="font-family: Helvetica;">Greenplum</span><span style="font-family:宋体;">在</span><span style="font-family:Helvetica;">HDFS</span><span style="font-family:宋体;">上提供数据查询的</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">引擎。本文讨论了其典型设计以及新版本的诸多改进。包括它与</span><span style="font-family: 
Helvetica;">Spark</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">MapReduce</span><span style="font-family:宋体;">的区别，还有些</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">挑战经典</span><span style="font-family:Helvetica;">MPP</span><span style="font-family:宋体;">设计的内容，以及</span><span style="font-family:Helvetica;">HAWQ</span><span style="font-family:宋体;">的新设计怎样结合</span><span style="font-family:Helvetica;">MPP</span><span style="font-family:宋体;">和批处理技术进而使其两者兼顾。</span></p>  <p><a href="https://blog.pivotal.io/big-data-pivotal/products/apache-hawq-next-step-in-massively-parallel-processing"><span style="font-family:Helvetica;">https://blog.pivotal.io/big-data-pivotal/products/apache-hawq-next-step-in-massively-parallel-processing</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">博客撰文介绍了对</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">分布式系统进行故障注入、组网的测试工具</span><span style="font-family:Helvetica;">AgenTEST</span><span style="font-family:宋体;">。它能注入网络故障（例如丢包），资源满载（例如</span><span style="font-family:Helvetica;">CPU</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">IO</span><span style="font-family:宋体;">、磁盘空间）等等。当测试网络分区时，可以评估环形组网、桥接组网等等。</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/quality-assurance-at-cloudera-fault-injection-and-elastic-partitioning/"><span style="font-family:Helvetica;">http://blog.cloudera.com/blog/2016/04/quality-assurance-at-cloudera-fault-injection-and-elastic-partitioning/</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">博客展望了将包含新版本</span><span style="font-family: Helvetica;">Spark</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Zeppelin</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">HDP 
2.4.2</span><span style="font-family:宋体;">。</span><span style="font-family:Helvetica;">Spark 2.0</span><span style="font-family:宋体;">预览版和</span><span style="font-family:Helvetica;">Zeppelin</span><span style="font-family:宋体;">新特性都将包含在内。</span></p>  <p><a href="http://hortonworks.com/blog/apache-spark-apache-zeppelin-whats-coming-in-hdp-2-4-2/"><span style="font-family:Helvetica;">http://hortonworks.com/blog/apache-spark-apache-zeppelin-whats-coming-in-hdp-2-4-2/</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Cask</span><span style="font-family:宋体;">撰文介绍了在</span><span style="font-family:Helvetica;">HBase region compaction</span><span style="font-family:宋体;">这样罕见事件发生的前后，他们是怎样通过长时间测试以评估分布式系统正确性的。</span></p>  <p><a href="http://blog.cask.co/2016/04/long-running-tests-in-cdap/"><span style="font-family:Helvetica;">http://blog.cask.co/2016/04/long-running-tests-in-cdap/</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:宋体;">本文介绍了如何结合</span><span style="font-family:Helvetica;">SparkR</span><span style="font-family:宋体;">与亚马逊</span><span style="font-family:Helvetica;">EMR</span><span style="font-family:宋体;">进行地理空间分析。通过</span><span style="font-family: Helvetica;">SparkR</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">集成组件，可以立刻基于</span><span style="font-family:Helvetica;">S3</span><span style="font-family:宋体;">上的数据映射</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">外部表。此后，数据就能直接加载到内存中使用</span><span style="font-family:Helvetica;">R</span><span style="font-family:宋体;">语言分析，很容易实现高质量的数据可视化。</span></p>  <p><a href="http://blogs.aws.amazon.com/bigdata/post/Tx1MECZ47VAV84F/Exploring-Geospatial-Intelligence-using-SparkR-on-Amazon-EMR"><span style="font-family:Helvetica;">http://blogs.aws.amazon.com/bigdata/post/Tx1MECZ47VAV84F/Exploring-Geospatial-Intelligence-using-SparkR-on-Amazon-EMR</span></a></p>  <p><u>&nbsp;</u></p>  
<p><span style="font-family:Helvetica;">MapR</span><span style="font-family:宋体;">编写了使用</span><span style="font-family:Helvetica;">Pig</span><span style="font-family:宋体;">和</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">分析职业棒球大联盟球队水平的教程。</span><span style="font-family:Helvetica;">Pig</span><span style="font-family:宋体;">用于数据初加工，</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">提供基于</span><span style="font-family:Helvetica;">SQL</span><span style="font-family:宋体;">的数据查询环境。借助</span><span style="font-family:Helvetica;">Hive ODBC</span><span style="font-family:宋体;">驱动和</span><span style="font-family:Helvetica;">Hive</span><span style="font-family:宋体;">服务器，使得微软</span><span style="font-family:Helvetica;">Excel</span><span style="font-family:宋体;">也能用于获取和分析数据。</span></p>  <p><a href="https://www.mapr.com/blog/using-hive-and-pig-baseball-statistics"><span style="font-family:Helvetica;">https://www.mapr.com/blog/using-hive-and-pig-baseball-statistics</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">SignalFX</span><span style="font-family:宋体;">通过</span><span style="font-family:Helvetica;">27</span><span style="font-family:宋体;">节点的</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">集群每天处理</span><span style="font-family:Helvetica;">700</span><span style="font-family:宋体;">多亿条消息。只有基于他们积累的大规模</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">使用经验才能有如此高的量，因此他们共享了不少调试</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">的技巧，定位告警（例如日志刷新延迟增加），以及</span><span style="font-family:Helvetica;">Kafka</span><span style="font-family:宋体;">横向扩展。</span></p>  <p><a href="http://www.confluent.io/blog/how-we-monitor-and-run-kafka-at-scale-signalfx"><span style="font-family:Helvetica;">http://www.confluent.io/blog/how-we-monitor-and-run-kafka-at-scale-signalfx</span></a></p>  <p><u>&nbsp;</u></p>  
<p><span style="font-family:Helvetica;">data Artisans</span><span style="font-family: 宋体;">博客撰文评估了</span><span style="font-family:Helvetica;">Flink</span><span style="font-family:宋体;">在数据流效率、低延迟、正确性上的能力。为了证明效率，在高吞吐量的环境下运行了最新的</span><span style="font-family:Helvetica;">Yahoo!</span><span style="font-family:宋体;">流式基准测试程序。在正确性方面，文章突出了</span><span style="font-family:Helvetica;">Flink</span><span style="font-family:宋体;">在事件时间与处理时间（以星球大战电影年表做类比）方面的优势。最后，文章描述了</span><span style="font-family:Helvetica;">Flink</span><span style="font-family:宋体;">未来版本中查询作业内存状态的功能。</span></p>  <p><a href="http://data-artisans.com/counting-in-streams-a-hierarchy-of-needs/"><span style="font-family:Helvetica;">http://data-artisans.com/counting-in-streams-a-hierarchy-of-needs/</span></a></p>  <p><strong>&nbsp;</strong></p>  <p><span style="font-family:宋体;">本教程介绍了怎样把</span>TCP Socket<span style="font-family:宋体;">中的文本数据流转换为</span>Spark<span style="font-family:宋体;">流式数据源。</span></p>  <p align="left"><a href="https://medium.com/@anicolaspp/spark-custom-streaming-sources-e7d52da72e80"><span style="font-family:Helvetica;color:#386EFF;text-decoration:none;text-underline:none">https://medium.com/@anicolaspp/spark-custom-streaming-sources-e7d52da72e80</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:宋体;">本文介绍了在构建</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">的时候怎样防止</span><span style="font-family:Helvetica;">AWS</span><span style="font-family:宋体;">凭证</span><span style="font-family:宋体;">意外提交到补丁或</span><span style="font-family:Helvetica;">git</span><span style="font-family:宋体;">资源库。除</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">本身外，本文还建议使用</span><span style="font-family: Helvetica;">&#8220;git-secrets&#8221;</span><span style="font-family:宋体;">工具防止意外提交访问</span><span style="font-family:Helvetica;">/</span><span style="font-family:宋体;">安全密钥。如果你用的是</span><span style="font-family:Helvetica;">Hadoop S3</span><span 
style="font-family:宋体;">，还推荐了新补丁供评估。</span></p>  <p><a href="http://steveloughran.blogspot.co.uk/2016/04/testing-against-s3-and-object-stores.html"><span style="font-family:Helvetica;">http://steveloughran.blogspot.co.uk/2016/04/testing-against-s3-and-object-stores.html</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Big Data &amp; Brews</span><span style="font-family:宋体;">采访了</span><span style="font-family:Helvetica;">MapR</span><span style="font-family:宋体;">的</span><span style="font-family:Helvetica;">Ted Dunning</span><span style="font-family: 宋体;">和</span><span style="font-family:Helvetica;">Jacques Nadeau</span><span style="font-family:宋体;">。</span><span style="font-family:Helvetica;">Apache Arrow</span><span style="font-family:宋体;">也在本次采访范围内。</span></p>  <p align="left"><a href="https://www.youtube.com/watch?v=l3mDDKjDjMk"><span style="font-family: Helvetica;color:#386EFF; text-decoration:none;text-underline:none">https://www.youtube.com/watch?v=l3mDDKjDjMk</span></a></p>  <p align="left"><a href="https://www.youtube.com/watch?v=Xo9CO0a0VJI"><span style="font-family: Helvetica;color:#386EFF; text-decoration:none;text-underline:none">https://www.youtube.com/watch?v=Xo9CO0a0VJI</span></a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">其他新闻</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">DataEngConf</span><span style="font-family: 宋体;">最近在旧金山召开。本文总结了</span><span style="font-family: Helvetica;">Uber</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Stripe</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Microsoft</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Instacart</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Jawbone</span><span style="font-family:宋体;">的发言内容。也介绍了会议主题</span><span style="font-family:Helvetica;">&#8220;</span><span 
style="font-family:宋体;">数据科学在现实世界中是一个产品和工程学科</span><span style="font-family:Helvetica;">&#8221;</span><span style="font-family:宋体;">。</span></p>  <p><a href="https://medium.com/@eugmandel/software-engineering-invades-data-science-notes-from-dataengconf-4a3c066b081f#.g2h0duo44"><span style="font-family:Helvetica;">https://medium.com/@eugmandel/software-engineering-invades-data-science-notes-from-dataengconf-4a3c066b081f#.g2h0duo44</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Hortonworks</span><span style="font-family: 宋体;">在上周都柏林举行的</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">欧洲峰会上大放异彩。</span><span style="font-family:Helvetica;">ZDNet</span><span style="font-family:宋体;">报导了这些亮点，其中包括与</span><span style="font-family:Helvetica;">Pivotal</span><span style="font-family:宋体;">（现已转售</span><span style="font-family:Helvetica;">HDP</span><span style="font-family:宋体;">）的扩展合作，与</span><span style="font-family:Helvetica;">Syncsort</span><span style="font-family:宋体;">的转售协议，以及</span><span style="font-family:Helvetica;">Atlas</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Ranger</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Zeppelin</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Metron</span><span style="font-family:宋体;">的技术预览。报导还介绍了</span><span style="font-family: Helvetica;">Hortonworks</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">MapR</span><span style="font-family:宋体;">产品的不同之处。</span></p>  <p><a href="http://www.zdnet.com/article/hortonworks-announces-new-alliances-and-releases-hadoop-comes-to-fork-in-road/"><span style="font-family:Helvetica;">http://www.zdnet.com/article/hortonworks-announces-new-alliances-and-releases-hadoop-comes-to-fork-in-road/</span></a></p>  <p><u>&nbsp;</u></p>  
<p><span style="font-family:Helvetica;">Flink Forward 2016</span><span style="font-family:宋体;">大会将在九月于德国柏林举行。讨论议题征集将于六月末结束。</span></p>  <p><a href="http://flink.apache.org/news/2016/04/14/flink-forward-announce.html"><span style="font-family:Helvetica;">http://flink.apache.org/news/2016/04/14/flink-forward-announce.html</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">YouTube</span><span style="font-family:宋体;">上发布了</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">都柏林峰会演讲视频。正如预期的那样，这些演讲内容涵盖</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">生态系统的各个部分。</span></p>  <p><a href="https://www.youtube.com/channel/UCAPa-K_rhylDZAUHVxqqsRA/videos?flow=list&amp;live_view=500&amp;view=0&amp;sort=dd">https://www.youtube.com/channel/UCAPa-K_rhylDZAUHVxqqsRA/videos?flow=list&amp;live_view=500&amp;view=0&amp;sort=dd</a></p>  <p>&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">产品发布</span></strong><strong></strong></p>  <p><span style="font-family:Helvetica;">Metascope</span><span style="font-family:宋体;">是一个配合</span><span style="font-family:Helvetica;">Schedoscope</span><span style="font-family:宋体;">在</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">集群中进行元数据管理的新工具。通过</span><span style="font-family:Helvetica;">web</span><span style="font-family:宋体;">界面，它能借助数据血缘洞察大量数据，还提供检索、内嵌文档、</span><span style="font-family: Helvetica;">REST API</span><span style="font-family:宋体;">等功能。</span></p>  <p><a href="https://github.com/ottogroup/metascope"><span style="font-family:Helvetica;">https://github.com/ottogroup/metascope</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Apache HBase 1.2.1</span><span style="font-family:宋体;">于本周发布，在</span><span style="font-family:Helvetica;">1.2.0</span><span style="font-family:宋体;">的基础上解决了</span><span style="font-family:Helvetica;">27</span><span 
style="font-family:宋体;">个问题。发布声明中重点介绍了四个高优先级的问题。</span></p>  <p><a href="http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAN5cbe7-T5uAYvGRbxw2dfvdbwe5s0nx3vKU8Nt2fzXbKPoQTg@mail.gmail.com%3E"><span style="font-family:Helvetica;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAN5cbe7-T5uAYvGRbxw2dfvdbwe5s0nx3vKU8Nt2fzXbKPoQTg@mail.gmail.com%3E</span></a></p>  <p>&nbsp;</p>  <p><span style="font-family:Helvetica;">Apache Mahout</span><span style="font-family: 宋体;">机器学习库发布了</span><span style="font-family:Helvetica;">0.12.0</span><span style="font-family:宋体;">版。该版本的</span><span style="font-family:Helvetica;">&#8220;Samsara&#8221;</span><span style="font-family:宋体;">数学环境开始支持</span><span style="font-family:Helvetica;">Apache Flink</span><span style="font-family: 宋体;">了，并且是平台无关的。发布声明中分享了与</span><span style="font-family:Helvetica;">Flink</span><span style="font-family:宋体;">集成、已知问题、项目演进计划相关的内容。</span></p>  <p><a href="http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAOtpBjj5An876PStdn5kMeaF+up-B72WTmCk9j21EXdP=JOCUA@mail.gmail.com%3E"><span style="font-family:Helvetica;">http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAOtpBjj5An876PStdn5kMeaF+up-B72WTmCk9j21EXdP=JOCUA@mail.gmail.com%3E</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Apache Storm 1.0.0</span><span style="font-family:宋体;">本周发布了。亮点包括性能提升（普遍提升</span><span style="font-family:Helvetica;">3</span><span style="font-family:宋体;">倍以上）、新的分布式缓存</span><span style="font-family:Helvetica;">API</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">nimbus</span><span style="font-family:宋体;">的高可用性、自动反压、动态</span><span style="font-family:Helvetica;">worker</span><span style="font-family:宋体;">性能分析等等。</span></p>  <p><a href="http://storm.apache.org/2016/04/12/storm100-released.html"><span 
style="font-family:Helvetica;">http://storm.apache.org/2016/04/12/storm100-released.html</span></a></p>  <p><u>&nbsp;</u></p>  <p><span style="font-family:Helvetica;">Apache Kudu</span><span style="font-family: 宋体;">（孵化中）本周发布了</span><span style="font-family: Helvetica;">0.8.0</span><span style="font-family:宋体;">版。本次发布添加了</span><span style="font-family:Helvetica;">Apache Flume sink</span><span style="font-family:宋体;">、部分功能提升、修复了一批</span><span style="font-family: Helvetica;">bug</span><span style="font-family:宋体;">。</span></p>  <p><a href="http://getkudu.io/releases/0.8.0/docs/release_notes.html"><span style="font-family:Helvetica;">http://getkudu.io/releases/0.8.0/docs/release_notes.html</span></a></p>  <p><u>&nbsp;</u></p>  <p align="left"><span style="font-family:Helvetica;">Cloudbreak</span><span style="font-family:宋体;">本周发布了</span><span style="font-family:Helvetica;">1.2</span><span style="font-family:宋体;">版，它为云环境提供</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">集群</span><span style="font-family:Helvetica;">Docker</span><span style="font-family:宋体;">。新特性包括支持</span><span style="font-family:Helvetica;">OpenStack</span><span style="font-family:宋体;">以及为自定义服务器提供配置脚本。</span></p>  <p align="left"><a href="http://hortonworks.com/blog/announcing-cloudbreak-1-2/"><span style="font-family:Helvetica;">http://hortonworks.com/blog/announcing-cloudbreak-1-2/</span></a></p>  <p align="left"><u>&nbsp;</u></p>  <p align="left"><span style="font-family:Helvetica;">Cloudera</span><span style="font-family:宋体;">发布了</span><span style="font-family:Helvetica;">Cloudera Enterprise 5.4.10</span><span style="font-family:宋体;">，内置了</span><span style="font-family:Helvetica;">Flume</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Hadoop</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">HBase</span><span style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Hive</span><span 
style="font-family:宋体;">、</span><span style="font-family:Helvetica;">Impala</span><span style="font-family:宋体;">等组件。</span></p>  <p align="left"><a href="http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-Cloudera-Enterprise-5-4-10-Released/m-p/39790#U39790"><span style="font-family:Helvetica;">http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-Cloudera-Enterprise-5-4-10-Released/m-p/39790#U39790</span></a></p>  <p align="left"><u>&nbsp;</u></p>  <p align="left"><span style="font-family:Helvetica;">Presto Accumulo</span><span style="font-family:宋体;">是个新项目，为</span><span style="font-family:Helvetica;">Accumulo</span><span style="font-family:宋体;">读写数据提供了</span><span style="font-family:Helvetica;">Presto</span><span style="font-family:宋体;">连接器。</span></p>  <p align="left"><a href="https://github.com/bloomberg/presto-accumulo"><span style="font-family:Helvetica;">https://github.com/bloomberg/presto-accumulo</span></a></p>  <p align="left">&nbsp;</p>  <p><strong><span style="font-size:15.0pt;font-family:宋体;">活动</span></strong><strong></strong></p>  <p align="left"><span style="font-size:14.0pt;font-family:SimSun;">中国</span></p>  <p align="left"><span style="font-family:SimSun;">无</span></p><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-04-21 15:07</div>]]></description></item><item><title>Hadoop周刊—第 165 期</title><link>http://www.blogjava.net/rosen/archive/2016/04/14/430099.html</link><dc:creator>Rosen</dc:creator><author>Rosen</author><pubDate>Thu, 14 Apr 2016 10:02:00 
GMT</pubDate><guid>http://www.blogjava.net/rosen/archive/2016/04/14/430099.html</guid><wfw:comment>http://www.blogjava.net/rosen/comments/430099.html</wfw:comment><comments>http://www.blogjava.net/rosen/archive/2016/04/14/430099.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/rosen/comments/commentRss/430099.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/rosen/services/trackbacks/430099.html</trackback:ping><description><![CDATA[<p><strong><span style="font-size:22.0pt;font-family:&quot;Lantinghei SC Demibold&quot;; color:#355400;">Hadoop</span><span style="font-size:22.0pt; font-family:&quot;Lantinghei SC Demibold&quot;;color:#355400;">周刊</span></strong></p>  <p><strong>&nbsp;</strong></p>  <p><span style="font-size:14.0pt;font-family:&quot;Lantinghei SC Demibold&quot;; color:#355400;"><strong>第 165 期 2016年4月10日 </strong></span></p>  <p><span style="font-size:10.5pt;font-family:&quot;Lantinghei SC Demibold&quot;; color:#355400;"><strong>启明星辰&#8212;&#8212;平台和大数据整体组编译</strong></span></p>  <p>&nbsp;</p>  <p><span style="font-size:12.0pt;line-height:135%; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">本周，包括</span><span style="font-size:12.0pt;line-height:135%">LinkedIn </span><span style="font-size:12.0pt;line-height:135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">和</span><span style="font-size: 12.0pt;line-height:135%">Airbnb</span><span style="font-size:12.0pt;line-height: 135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">新开源项目在内的数个产品进行了重大版本发布。本期技术部分与流式处理有关</span><span style="font-size:12.0pt;line-height:135%">&#8212;&#8212;Spark</span><span style="font-size:12.0pt;line-height:135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">、</span><span style="font-size: 12.0pt;line-height:135%">Flink</span><span style="font-size:12.0pt;line-height: 135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">、</span><span 
style="font-size:12.0pt;line-height:135%">Kafka</span><span style="font-size:12.0pt;line-height:135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">等等；新闻部分是关于</span><span style="font-size:12.0pt;line-height:135%">Spark Summit </span><span style="font-size:12.0pt;line-height:135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">和</span><span style="font-size: 12.0pt;line-height:135%">HBaseCon</span><span style="font-size:12.0pt; line-height:135%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">的会议议程。</span></p>  <h1><span style="font-family: 'Comic Sans MS'; font-size: 18pt;">技术</span></h1>  <p><span style="font-size:10.5pt;">Zalando</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">发表了他们是如何选择</span><span style="font-size:10.5pt;">Apache Flink</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">作为流式处理框架的文章。该文章阐述了对评价标准进行验证后得出的结论，阐明了选择</span><span style="font-size:10.5pt;">Apache Flink</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">的主因</span><span style="font-size:10.5pt;">&#8212;</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;;">在高吞吐量的情况下依然能保持低延迟，真正的流式处理，开发人员支持。</span></p>  <p><a href="https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Cloudera</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">博客刊登了来自</span><span style="font-size:10.5pt">Wargaming.net</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的文章，通过本文可了解到他们如何通过</span><span style="font-size: 10.5pt">Kafka</span><span 
style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">HBase</span><span style="font-size:10.5pt;font-family: 宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Drools</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">构建实时处理基础设施的。另外，在数据流程方面，他们介绍了如何对</span><span style="font-size:10.5pt">HBase</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的检索和序列化、</span><span style="font-size:10.5pt">HBase</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">和</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">之间的数据本地化以及</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">计算方面的优化措施。</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/inside-wargamings-data-driven-real-time-rules-engine/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://blog.cloudera.com/blog/2016/04/inside-wargamings-data-driven-real-time-rules-engine/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">InfoQ</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">发布了大规模流式处理</span><span style="font-size:10.5pt">&#8212;SMACK</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">（</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Mesos</span><span 
style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Akka</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">、</span><span style="font-size:10.5pt">Cassandra</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">以及</span><span style="font-size:10.5pt"> Kafka</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）栈的介绍视频。讨论了为什么</span><span style="font-size:10.5pt">SMACK</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">栈在处理同样问题的时候比</span><span style="font-size:10.5pt">Lambda</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">架构更简单。</span></p>  <p><a href="http://www.infoq.com/presentations/stream-analytics-scalability"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.infoq.com/presentations/stream-analytics-scalability</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Confluent&#8220;</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">日志压缩</span><span style="font-size:10.5pt">&#8221;</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">系列博文又有更新，介绍了</span><span style="font-size:10.5pt">Kafka</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">项目三月份发生的事情。有不少令人关注的开发内容，包括机架感知、</span><span style="font-size:10.5pt">Kerberos</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">支持、基于时间索引方面的进展。以及不少你（我也是）没有时间持续关注的最新研发成果。</span></p>  <p><a href="http://www.confluent.io/blog/log-compaction-highlights-in-the-kafka-and-stream-processing-community-april-2016"><span style="font-size:10.5pt;font-family:&quot;Helvetica 
Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.confluent.io/blog/log-compaction-highlights-in-the-kafka-and-stream-processing-community-april-2016</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Apache Flink 1.0</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">引入了新的复杂事件处理（</span><span style="font-size:10.5pt">CEP</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）库。简言之，</span><span style="font-size:10.5pt">CEP</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">提供了一种检测事件模式的方法。本文借助传感器从数据中心服务器上收集数据，运用一种可能的异常检测用例，诠释了</span><span style="font-size:10.5pt">Flink</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的</span><span style="font-size:10.5pt">CEP</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">模式</span><span style="font-size:10.5pt">API </span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">。</span></p>  <p><a href="http://flink.apache.org/news/2016/04/06/cep-monitoring.html"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://flink.apache.org/news/2016/04/06/cep-monitoring.html</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Genome Analysis Toolkit </span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">（</span><span style="font-size:10.5pt">GATK</span><span style="font-size:10.5pt;font-family: 宋体;Times New Roman&quot;;Times New Roman&quot;">）最近宣布，下一个版本（当前是</span><span style="font-size:10.5pt">alpha</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）将支持</span><span style="font-size:10.5pt">Apache Spark</span><span style="font-size: 10.5pt;font-family:宋体;Times New 
Roman&quot;;Times New Roman&quot;">。本文简要介绍了工具箱并展示了怎样通过</span><span style="font-size:10.5pt">Spark</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">来检测重复</span><span style="font-size:10.5pt">DNA</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">片段。</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/genome-analysis-toolkit-now-using-apache-spark-for-data-processing/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://blog.cloudera.com/blog/2016/04/genome-analysis-toolkit-now-using-apache-spark-for-data-processing/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">InfoWorld</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">综述了</span><span style="font-size:10.5pt">Spark 2.0</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">关于结构化流式处理方面的计划。微批处理将依然延续，还有些新特性，例如无限数据帧（</span><span style="font-size:10.5pt">Infinite DataFrames</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）、一流的重复查询支持。</span></p>  <p><a href="http://www.infoworld.com/article/3052924/analytics/what-sparks-structured-streaming-really-means.html"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.infoworld.com/article/3052924/analytics/what-sparks-structured-streaming-really-means.html</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">AWS</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">大数据博客发布了一篇通过存储在</span><span style="font-size: 10.5pt">AWS Key Management Service </span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">（</span><span style="font-size:10.5pt">KMS</span><span 
style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）中的加密密钥加载数据到</span><span style="font-size:10.5pt">S3</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">和</span><span style="font-size:10.5pt">Redshift</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的文章。除了描述所需步骤，本文还介绍了如何在</span><span style="font-size:10.5pt">AWS S3</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">中通过</span><span style="font-size:10.5pt">KMS</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">密钥加密数据。</span></p>  <p><a href="http://blogs.aws.amazon.com/bigdata/post/Tx2Q3ZBOZO9DHVQ/Encrypt-Your-Amazon-Redshift-Loads-with-Amazon-S3-and-AWS-KMS"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://blogs.aws.amazon.com/bigdata/post/Tx2Q3ZBOZO9DHVQ/Encrypt-Your-Amazon-Redshift-Loads-with-Amazon-S3-and-AWS-KMS</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Confluent</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">博客介绍了如何使用</span><span style="font-size:10.5pt">Kafka Connect </span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">和</span><span style="font-size:10.5pt"> Kafka Streams </span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">编写非凡的</span><span style="font-size:10.5pt">&#8220;hello world&#8221;</span><span style="font-size: 10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">程序。更确切地说，范例程序从</span><span style="font-size:10.5pt">IRC</span><span style="font-size:10.5pt; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">拉维基百科数据，并解析消息、进行多方面的统计计算。本文还用了若干程序展示了整个实现过程。</span></p>  <p><a 
href="http://www.confluent.io/blog/hello-world-kafka-connect-kafka-streams"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.confluent.io/blog/hello-world-kafka-connect-kafka-streams</span></a></p>  <p>&nbsp;</p>  <p style="line-height:107%"><span style="font-size:10.5pt; line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">本文从</span><span style="font-size:10.5pt; line-height:107%">Postgres </span><span style="font-size:10.5pt;line-height: 107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">向</span><span style="font-size:10.5pt;line-height:107%"> Cassandra</span><span style="font-size:10.5pt;line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">转换简单的模式（</span><span style="font-size:10.5pt;line-height:107%">schemas</span><span style="font-size: 10.5pt;line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">），并描述了主要的差异</span><span style="font-size:10.5pt; line-height:107%">&#8212;</span><span style="font-size:10.5pt;line-height:107%; font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">复制、数据类型（</span><span style="font-size:10.5pt;line-height:107%">Cassandra</span><span style="font-size:10.5pt;line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">不支持</span><span style="font-size:10.5pt;line-height:107%">JSON</span><span style="font-size: 10.5pt;line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">）、主键、最终一致性。</span></p>  <p><a href="http://neovintage.org/2016/04/07/data-modeling-in-cassandra-from-a-postgres-perspective/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://neovintage.org/2016/04/07/data-modeling-in-cassandra-from-a-postgres-perspective/</span></a></p>  <p>&nbsp;</p>  <h1><span style="font-family: 'Comic Sans MS'; font-size: 18pt;">新闻</span></h1>  <p 
style="line-height:107%"><span style="font-size: 10.5pt;line-height:107%">ESG</span><span style="font-size:10.5pt;line-height: 107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">博客报导了最近</span><span style="font-size:10.5pt;line-height:107%">Strata+Hadoop World</span><span style="font-size:10.5pt;line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">大会的情况，重点关注了</span><span style="font-size:10.5pt;line-height:107%">Spark</span><span style="font-size:10.5pt;line-height:107%;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的良好势头、机器学习和云服务。</span></p>  <p><a href="http://blog.esg-global.com/riding-high-at-stratahadoop-world"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://blog.esg-global.com/riding-high-at-stratahadoop-world</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">InformationWeek</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">也报导了</span><span style="font-size:10.5pt">Strata</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">大会，关注了</span><span style="font-size:10.5pt">MapR</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">和</span><span style="font-size:10.5pt">Pivotal</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">的主题演讲、人工智能等。</span></p>  <p><a href="http://www.informationweek.com/big-data/ai-public-data-sets-real-time-strata-+-hadoop-keynote-sampling/d/d-id/1324943?"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.informationweek.com/big-data/ai-public-data-sets-real-time-strata-+-hadoop-keynote-sampling/d/d-id/1324943?</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Spark Summit 2016</span><span 
style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">议程敲定，将于</span><span style="font-size:10.5pt">6</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">月</span><span style="font-size:10.5pt">6-8</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">日在旧金山举行。会议将有两天展开五个方向的讨论。</span></p>  <p><a href="https://databricks.com/blog/2016/04/04/agenda-announced-for-sparksummit-2016-in-san-francisco.html"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">https://databricks.com/blog/2016/04/04/agenda-announced-for-sparksummit-2016-in-san-francisco.html</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">福布斯采访了</span><span style="font-size:10.5pt">Cloudera CEO Tom Reilly</span><span style="font-size: 10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">，他讨论了公司的机遇、竞争性市场、上市计划等。</span></p>  <p><a href="http://www.forbes.com/sites/roberthof/2016/04/06/ceo-tom-reilly-makes-the-case-for-cloudera-and-its-ipo/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.forbes.com/sites/roberthof/2016/04/06/ceo-tom-reilly-makes-the-case-for-cloudera-and-its-ipo/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Datanami</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">撰文将正在崛起的</span><span style="font-size:10.5pt">Apache Kafka</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">作为流式处理的支柱。文章还采访了</span><span style="font-size:10.5pt">Confluent</span><span style="font-size: 10.5pt;font-family:宋体;Times New Roman&quot;;Times New Roman&quot;">联合创始人兼</span><span style="font-size:10.5pt">CTO Neha Narkhede</span><span 
style="font-size:10.5pt">, who discusses Kafka Connect and the upcoming Kafka Streams.</span></p>  <p><a href="http://www.datanami.com/2016/04/06/real-time-rise-apache-kafka/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://www.datanami.com/2016/04/06/real-time-rise-apache-kafka/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">HBaseCon takes place on May 24 in San Francisco, and its agenda has just been announced: more than 20 sessions across three tracks.</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/hbasecon-2016-speaker-lineup-announced/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://blog.cloudera.com/blog/2016/04/hbasecon-2016-speaker-lineup-announced/</span></a></p>  <p>&nbsp;</p>  <h1><span style="font-family: 'Comic Sans MS'; font-size: 18pt;">Releases</span></h1>  <p>&nbsp;<span style="font-size:10.5pt">Apache HBase 0.98.18 and 1.1.4</span><span style="font-size:10.5pt;font-family:宋体;Times New Roman&quot;;Times New 
Roman&quot;"> were both released recently. 1.1.4 includes a number of fixes, among them nine correctness-related fixes. HBase 0.98.18 resolves roughly 50 issues (bug fixes, improvements, and two new features).</span></p>  <p><a href="http://mail-archives.apache.org/mod_mbox/hbase-user/201603.mbox/%3CCANZa%3DGu-mAxKEtfoRjctHcE0KD7z52oE010Fgsf6AMmW2tDZLA%40mail.gmail.com%3E"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://mail-archives.apache.org/mod_mbox/hbase-user/201603.mbox/%3CCANZa%3DGu-mAxKEtfoRjctHcE0KD7z52oE010Fgsf6AMmW2tDZLA%40mail.gmail.com%3E</span></a>&nbsp;<span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#333333"><br /> </span><a href="http://mail-archives.apache.org/mod_mbox/hbase-user/201603.mbox/%3CCA%2BRK%3D_CtZ1L07nS6Og2ekfVwet0qTE7jw-bmyD2pp5UPweUehQ%40mail.gmail.com%3E"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://mail-archives.apache.org/mod_mbox/hbase-user/201603.mbox/%3CCA%2BRK%3D_CtZ1L07nS6Og2ekfVwet0qTE7jw-bmyD2pp5UPweUehQ%40mail.gmail.com%3E</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Apache Lens has released 2.5.0-beta. A unified analytics interface, it supports execution engines and data stores across the </span><span style="font-size: 
10.5pt">Hadoop</span><span style="font-size:10.5pt"> ecosystem. The release resolves 87 tickets, mostly bug fixes and new features.</span></p>  <p><a href="http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAL3kmZj60kpopRPpOVEs9o7oTg7YuaC_=c8zncBeMyUESrZsmQ@mail.gmail.com%3E"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://mail-archives.us.apache.org/mod_mbox/www-announce/201604.mbox/%3CCAL3kmZj60kpopRPpOVEs9o7oTg7YuaC_=c8zncBeMyUESrZsmQ@mail.gmail.com%3E</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Airbnb has open-sourced Caravel, a data exploration (data visualization) platform. Caravel offers many features usually found only in commercial products and can connect to any system that speaks a SQL dialect. Notably, it supports real-time analytics backed by Druid.</span></p>  <p><a href="https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New 
Roman&quot;;color:#0088CC;background:white">https://medium.com/airbnb-engineering/caravel-airbnb-s-data-exploration-platform-15a72aa610e5</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">MapR has announced support for Apache Drill 1.6 as part of its distribution. Highlights of the release include a new storage plugin for MapR-DB, support for new SQL window functions, and end-to-end security. The announcement post includes examples of loading data through the MapR-DB API and querying it with Drill.</span></p>  <p><a href="https://www.mapr.com/blog/apache-drill-16-mapr-converged-platform-gearing-new-generation-stack-json-enabled-big-data"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">https://www.mapr.com/blog/apache-drill-16-mapr-converged-platform-gearing-new-generation-stack-json-enabled-big-data</span></a></p>  
<p>&nbsp;</p>  <p><span style="font-size:10.5pt">Apache Flink has released 1.0.1, a bug-fix release in the 1.0.x series. It resolves 23 issues, and all 1.0.0 users are encouraged to upgrade.</span></p>  <p><a href="http://flink.apache.org/news/2016/04/06/release-1.0.1.html"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://flink.apache.org/news/2016/04/06/release-1.0.1.html</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Cloudera Enterprise 5.7 has been released with version upgrades for Spark, HBase, Impala, Kafka, and other components. Highlights include the newly promoted Cloudera Labs projects </span><span 
style="font-size:10.5pt">Hive-on-Spark and HBase-Spark, major Impala performance improvements, and support for the HBase WAL on SSD.</span></p>  <p><a href="http://blog.cloudera.com/blog/2016/04/cloudera-enterprise-5-7-is-released/"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://blog.cloudera.com/blog/2016/04/cloudera-enterprise-5-7-is-released/</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">Apache Tajo, the data warehouse system built on Hadoop, has released version 0.11.2. The new version adds Kerberos support, fixes ORC table support with Hive, and more.</span></p>  <p><a 
href="http://tajo.apache.org/releases/0.11.2/announcement.html"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">http://tajo.apache.org/releases/0.11.2/announcement.html</span></a></p>  <p>&nbsp;</p>  <p><span style="font-size:10.5pt">LinkedIn has open-sourced Dr. Elephant, a tool for diagnosing performance problems in Hadoop and Spark jobs. Using metrics collected from the YARN ResourceManager for completed jobs, Dr. 
Elephant</span><span style="font-size:10.5pt"> evaluates each job and generates a diagnostic report covering data skew, GC overhead, and more. LinkedIn claims that 80% of problems can be resolved with its help.</span></p>  <p><a href="https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark"><span style="font-size:10.5pt;font-family:&quot;Helvetica Neue&quot;;Times New Roman&quot;;color:#0088CC;background:white">https://engineering.linkedin.com/blog/2016/04/dr-elephant-open-source-self-serve-performance-tuning-hadoop-spark</span></a></p>  <p>&nbsp;</p>  <h1><span style="font-family: 'Comic Sans MS'; font-size: 18pt;">Events</span></h1>  <p><strong><span style="font-size:16.0pt">China</span></strong></p>  <p>None</p><br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/rosen/" target="_blank">Rosen</a> 2016-04-14 18:02 <a href="http://www.blogjava.net/rosen/archive/2016/04/14/430099.html#Feedback" target="_blank" style="text-decoration:none;">Post a comment</a></div>]]></description></item></channel></rss>