﻿<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:trackback="http://madskills.com/public/xml/rss/module/trackback/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/"><channel><title>BlogJava-全世界的屋顶-Article Category-DISC</title><link>http://www.blogjava.net/honeybee/category/30619.html</link><description /><language>zh-cn</language><lastBuildDate>Thu, 10 Apr 2008 03:58:35 GMT</lastBuildDate><pubDate>Thu, 10 Apr 2008 03:58:35 GMT</pubDate><ttl>60</ttl><item><title>Paper Notes: Data-Intensive Supercomputing: The case for DISC</title><link>http://www.blogjava.net/honeybee/articles/191770.html</link><dc:creator>sun</dc:creator><author>sun</author><pubDate>Thu, 10 Apr 2008 02:21:00 GMT</pubDate><guid>http://www.blogjava.net/honeybee/articles/191770.html</guid><wfw:comment>http://www.blogjava.net/honeybee/comments/191770.html</wfw:comment><comments>http://www.blogjava.net/honeybee/articles/191770.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/honeybee/comments/commentRss/191770.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/honeybee/services/trackbacks/191770.html</trackback:ping><description><![CDATA[<span><span><span><span><span lang="EN-US" style="font-size: 12pt; mso-font-kerning: 0pt">Recently I have been studying DISC. The idea was inspired by Google's success with the server infrastructure it built to support search over the worldwide web. After reading <em>Data-Intensive Supercomputing: The case for DISC</em>, I believe an infrastructure like Google's could be built as a real system, and that system is DISC.<br />
<br />
A DISC prototype modeled on Google's infrastructure could be divided into two types of partitions: one for application development and the other for systems research.<br />
For <strong>the program development</strong> partitions, we can use existing software, such as the open-source code from the Hadoop project, to implement the file system and support for application programming.</span>
<p class="MsoNormal" style="text-align: left; mso-pagination: widow-orphan" align="left"><span lang="EN-US" style="font-size: 12pt; mso-font-kerning: 0pt">For <strong>the systems research</strong> partitions, we can create our own designs and study the different layers of hardware and system software required to get high performance and reliability (e.g., high-end hardware versus low-cost components).</span></p>
<p class="MsoNormal" style="text-align: left; mso-layout-grid-align: none" align="left"><span lang="EN-US" style="font-size: 12pt; mso-font-kerning: 0pt"><br />
The paper <em>Data-Intensive Supercomputing: The case for DISC</em> gave me an overall picture of a new form of high-performance computing facility, and many other aspects of it appeal to me as well. My notes on the paper follow:</span></span></span><span lang="EN-US"><span><span><span><span><span><span lang="EN-US"><span><span><span><span><span><span><span><span><span><br />
</span></span><br />
</span></span></span><br />
</p>
<p align="left"><span>Reading</span><span lang="EN-US"> the paper</span><span>:</span></p>
<h1><span lang="EN-US">Data-Intensive Supercomputing: The case for DISC</span></h1>
<p><strong><span lang="EN-US">Randal E. Bryant</span></strong><span lang="EN-US">, May 10, 2007, CMU-CS-07-128</span></p>
<p><strong><span lang="EN-US">&nbsp;</span></strong></p>
<p align="left"><span>Question: </span><em><span lang="EN-US">How can university researchers demonstrate the credibility of their work without having comparable computing facilities available?</span></em></p>
<h2><span lang="EN-US">1 Background</span></h2>
<p align="left"><span lang="EN-US">The paper describes </span><span lang="EN-US">a new form of high-performance computing facility (a <em>Data-Intensive Super Computer</em>) that places emphasis on <em>data</em>, rather than raw computation, as the core focus of the system.</span></p>
<p align="left"><span lang="EN-US">The author's inspiration for DISC comes from the server infrastructures that have been developed to support search over the worldwide web.</span></p>
<p align="left"><span lang="EN-US">This paper outlines the case for DISC as an important direction for large-scale computing systems.</span></p>
<h3><span lang="EN-US">1.1 Motivation</span></h3>
<p align="left"><span lang="EN-US">Example applications in which data plays the central role in the computation:</span></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Web search without language barriers. (</span></em><span lang="EN-US">No matter in which language they </span><span lang="EN-US">type the query<em>)</em></span></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Inferring biological function from genomic sequences.</span></em></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Predicting and modeling the effects of earthquakes</span></em><span lang="EN-US">.</span></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Discovering new astronomical phenomena from telescope imagery data.</span></em></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Synthesizing realistic graphic animations.</span></em></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Understanding the spatial and temporal patterns of brain behavior based on MRI data.<br />
</span></em></p>
<h2><span lang="EN-US"><br />
2 Data-Intensive Super Computing</span></h2>
<p><strong><em><u><span lang="EN-US">Conventional (Current) supercomputers:</span></u></em></strong></p>
<p><span lang="EN-US">are evaluated largely on the number of arithmetic operations they can supply each second to the application programs.</span></p>
<p align="left"><strong><span lang="EN-US">Advantage:</span></strong><span lang="EN-US"> well suited to problems in which highly structured data requires large amounts of computation.</span></p>
<p><strong><span lang="EN-US">Disadvantage: </span></strong></p>
<p><span lang="EN-US">1. This focus creates misguided priorities in the way these machines are designed, programmed, and operated;</span></p>
<p align="left"><span lang="EN-US">2. It disregards the importance of incorporating computation-proximate, fast-access data storage, while producing machines that are very difficult to program effectively;</span></p>
<p align="left"><span lang="EN-US">3. The range of computational styles is restricted by the system structure.</span></p>
<p align="left"><strong><em><u><span lang="EN-US">The key principles of DISC:</span></u></em></strong></p>
<p align="left"><span lang="EN-US"><span>1.<span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span><em><span lang="EN-US">Intrinsic, rather than extrinsic, data (the system itself collects and maintains the data). </span></em></p>
<p align="left"><span lang="EN-US"><span>2.<span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span><em><span lang="EN-US">High-level programming models for expressing computations over the data.</span></em></p>
<p align="left"><span lang="EN-US"><span>3.<span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span><em><span lang="EN-US">Interactive access.</span></em></p>
<p align="left"><span lang="EN-US"><span>4.<span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span><em><span lang="EN-US">Scalable mechanisms to ensure high reliability and availability (error detection and handling).<br />
</span></em></p>
<h2><span lang="EN-US"><br />
<br />
3 Comparison to Other Large-Scale Computer Systems</span></h2>
<h3><span lang="EN-US">3.1 Current Supercomputers</span></h3>
<h3><span lang="EN-US">3.2 Transaction Processing Systems</span></h3>
<h3><span lang="EN-US">3.3 Grid Systems<br />
</span></h3>
<h2><span lang="EN-US"><br />
<br />
4 Google: A DISC Case Study</span></h2>
<p><span lang="EN-US">1. The Google system actively maintains<span> cached copies</span> of every document it can find on the Internet.</span></p>
<p align="left"><span lang="EN-US">The system constructs complex<span> <em>index </em>structures</span>, <span>summarizing information</span> about the documents in forms that enable rapid identification of the documents most relevant to a particular query.</span></p>
<p align="left"><span lang="EN-US">When <u>a user submits a query</u>, the front end <u>servers</u> direct the query to one of the clusters, where several hundred processors work together to <u>determine the best matching </u>documents <u>based on the index structures</u>. The system then <u>retrieves the documents from their cached locations</u>, <u>creates brief summaries of the documents</u>, <u>orders them with the most relevant documents first</u>, and <u>determines which sponsored links should be placed on the page</u>.</span></p>
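<p align="left"><span lang="EN-US">The index-then-query flow described above can be illustrated with a toy inverted index. This is a minimal single-machine sketch under stated assumptions, not Google's implementation. The names build_index and search are hypothetical. Words are split on whitespace. Documents are ranked only by how many distinct query terms they contain, which is a simple stand-in for real relevance ranking.</span></p>

```python
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents matching any query term, best match first.

    Score = number of distinct query terms a document contains (a
    deliberately simple stand-in for relevance ranking); ties are broken
    by document id so the result is deterministic.
    """
    scores = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


docs = {
    "d1": "data intensive computing",
    "d2": "supercomputing with data",
    "d3": "web search engines",
}
index = build_index(docs)
print(search(index, "data computing"))  # ['d1', 'd2']
```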
<p align="left"><span lang="EN-US">2. The Google hardware design is based on a philosophy of using components that emphasize<u> low cost and low power over raw speed and reliability</u>. <u>Google keeps the hardware as simple as possible</u>.</span></p>
<p align="left"><span lang="EN-US">They<u> make extensive use of redundancy and software-based reliability</u>.</span></p>
<p align="left"><span lang="EN-US"><u>Failed</u> components are removed and replaced <u>without turning the system off.</u></span></p>
<p align="left"><u><span lang="EN-US">Google has significantly lower operating costs in terms of power consumption and human labor than do other data centers</span></u><span lang="EN-US">.</span></p>
<p align="left"><span lang="EN-US">3. <em>MapReduce:</em> a programming framework that supports powerful forms of computation performed in parallel over large amounts of data.</span><span lang="EN-US"> </span></p>
<p align="left"><span lang="EN-US">A MapReduce computation is specified by two functions: a <em>map </em>function that generates values and associated keys from each document, and a <em>reduction </em>function that describes how all the data matching each possible key should be combined.</span></p>
<p align="left"><span lang="EN-US">MapReduce can be used<u> to compute statistics about documents</u>, <u>to create the index structures</u> used by the search engine, and <u>to implement their PageRank algorithm for quantifying the relative importance of different web documents</u>.</span></p>
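<p align="left"><span lang="EN-US">The two-function model can be sketched on a single machine. The example below uses word counting, which is a common teaching example and is not taken from the paper. The names map_fn, reduce_fn and run_mapreduce are hypothetical. A real framework groups values by key (the shuffle step) across many machines; here that step uses an in-memory dictionary.</span></p>

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_fn(doc_id: str, text: str) -> Iterable[Tuple[str, int]]:
    """Emit (word, 1) for every word in the document."""
    for word in text.lower().split():
        yield word, 1


def reduce_fn(key: str, values: List[int]) -> int:
    """Combine all values emitted for one key; here, sum the counts."""
    return sum(values)


def run_mapreduce(docs: Dict[str, str],
                  mapper: Callable, reducer: Callable) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: apply the map function to each document independently.
    for doc_id, text in docs.items():
        for key, value in mapper(doc_id, text):
            # Shuffle: group intermediate values by key.
            groups[key].append(value)
    # Reduce phase: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}


docs = {"d1": "the web the data", "d2": "data centers"}
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'the': 2, 'web': 1, 'data': 2, 'centers': 1}
```

<p align="left"><span lang="EN-US">Other document statistics only need a different map or reduce function. The driver stays the same.</span></p>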
<p align="left"><span lang="EN-US">4. <em>BigTable:</em></span><span lang="EN-US"> a distributed data structure that provides capabilities similar to those seen in database systems.</span></p>
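<p align="left"><span lang="EN-US">Google's BigTable paper describes the data model as a sparse, sorted map indexed by row key, column key and timestamp. The toy class below mimics only that data model, in memory. ToyTable and its methods are hypothetical, and so are the row and column names. None of BigTable's distribution, persistence or real API is represented.</span></p>

```python
class ToyTable:
    """In-memory stand-in for a BigTable-style map keyed by
    (row key, column key, timestamp). Not distributed or persistent."""

    def __init__(self):
        # (row, column) -> list of (timestamp, value), oldest first
        self._cells = {}

    def put(self, row, column, timestamp, value):
        versions = self._cells.setdefault((row, column), [])
        versions.append((timestamp, value))
        versions.sort(key=lambda v: v[0])  # keep versions ordered by time

    def get(self, row, column):
        """Return the newest value stored in a cell, or None if it is empty."""
        versions = self._cells.get((row, column))
        return versions[-1][1] if versions else None

    def scan(self, start_row, end_row):
        """Yield (row, column, newest value) for rows in [start_row, end_row),
        in sorted key order."""
        for row, column in sorted(self._cells):
            if start_row <= row < end_row:
                yield row, column, self.get(row, column)


t = ToyTable()
t.put("com.example/", "contents:", 1, "old page")
t.put("com.example/", "contents:", 2, "new page")
t.put("org.sample/", "anchor:x", 1, "link text")
print(t.get("com.example/", "contents:"))  # new page
```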
<h2><span lang="EN-US"><br />
5 Possible Usage Model</span></h2>
<p align="left"><span lang="EN-US">The DISC operations could include user-specified functions in the style of Google&#8217;s MapReduce programming framework. As with databases, different users will be given different authority over what operations can be performed and what modifications can be made.<br />
<br />
&nbsp; </p>
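<p align="left"><span lang="EN-US">One way to picture per-user authority is a permission table that is checked before an operation is accepted. The paper does not specify a mechanism, so this sketch rests on assumptions. The user names, the operation names, PERMISSIONS, authorize and submit are all hypothetical.</span></p>

```python
# Hypothetical grants: which operations each user may perform.
PERMISSIONS = {
    "alice": {"query", "run_mapreduce"},
    "bob": {"query", "run_mapreduce", "update"},
}


def authorize(user: str, operation: str) -> bool:
    """True only if the user has been explicitly granted the operation."""
    return operation in PERMISSIONS.get(user, set())


def submit(user: str, operation: str) -> str:
    """Reject unauthorized requests before they reach the shared data."""
    if not authorize(user, operation):
        raise PermissionError(f"{user} may not perform {operation}")
    return f"{operation} accepted for {user}"


print(submit("bob", "update"))  # update accepted for bob
```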
<h2><span lang="EN-US">6 Constructing a General-Purpose DISC System</span></h2>
<p align="left"><span lang="EN-US">The open source project <em>Hadoop </em>implements capabilities similar to the Google file system and support for MapReduce. </span></p>
<p align="left"><strong><span lang="EN-US">Issues in constructing a general-purpose DISC system:</span></strong></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Hardware Design.</span></em></p>
<p align="left"><span lang="EN-US">There is a wide range of choices.</span></p>
<p align="left"><span lang="EN-US">We need to understand the tradeoffs between the different hardware configurations and how well the system performs on different applications.</span></p>
<p align="left"><span lang="EN-US">Google has made a compelling case for sticking with low-end nodes for web search applications, but the Google approach requires much more complex system software to overcome the limited performance and reliability of the components. It therefore might not be the most cost-effective solution for a smaller operation once personnel costs are considered.</span></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Programming Model.</span></em></p>
<p align="left"><span lang="EN-US">1. One important software concept for scaling parallel computing beyond 100 or so processors is to <u>incorporate error detection and recovery into the runtime system</u> and to <u>isolate programmers from both transient and permanent failures</u> as much as possible.</span></p>
<p align="left"><span lang="EN-US">Work on providing <u>fault tolerance in a manner invisible</u> to the application programmer <u>started in the context of grid-style computing</u>, but only <u>with the advent of MapReduce</u> and in recent work <u>by Microsoft has it become recognized as an important capability</u> for parallel systems.</span></p>
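<p align="left"><span lang="EN-US">Recovery can live in the runtime rather than in application code. The sketch below shows this as a wrapper that re-executes a failed task. It is a simplified illustration. The names run_with_retries and TransientError are hypothetical, and the fixed retry limit is an assumption. A production runtime such as Google's MapReduce would typically re-execute the task on a different machine rather than in the same process.</span></p>

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical error type for failures worth retrying (e.g. a lost worker)."""


def run_with_retries(task: Callable[[], T], max_attempts: int = 3) -> T:
    """Run task, re-executing it after a TransientError, up to max_attempts.

    Recovery happens here in the runtime layer, so application code only
    sees a failure if every attempt fails.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except TransientError as err:
            last_error = err  # remember the failure and try again
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


attempts = []


def flaky_task():
    attempts.append(1)
    if len(attempts) < 3:  # fail twice, then succeed
        raise TransientError("worker lost")
    return "result"


print(run_with_retries(flaky_task))  # result
```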
<p align="left"><span lang="EN-US">2. We <u>want programming models</u> that <u>dynamically adapt to the available resources</u> and that perform well <u>in a more asynchronous execution environment</u>.</span></p>
<p align="left"><span lang="EN-US">For example, Google&#8217;s implementation of MapReduce partitions a computation into a number of map and reduce tasks that are then scheduled dynamically onto a number of &#8220;worker&#8221; processors.</span></p>
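<p align="left"><span lang="EN-US">Dynamic scheduling can be sketched with a shared task queue. Each worker pulls the next task as soon as it becomes free, so faster workers naturally take on more tasks. This single-process sketch uses threads as stand-ins for worker processors. schedule_tasks is a hypothetical name, and this is not Google's scheduler.</span></p>

```python
import queue
import threading
from collections import Counter


def schedule_tasks(tasks, num_workers):
    """Run zero-argument callables on num_workers threads that pull work
    from a shared queue. Returns (results keyed by task index,
    Counter of how many tasks each worker ran)."""
    pending = queue.Queue()
    for index, task in enumerate(tasks):
        pending.put((index, task))

    results = {}
    done_by = Counter()
    lock = threading.Lock()

    def worker(name):
        while True:
            try:
                index, task = pending.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is finished
            value = task()
            with lock:  # protect the shared result structures
                results[index] = value
                done_by[name] += 1

    threads = [threading.Thread(target=worker, args=(f"worker-{n}",))
               for n in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results, done_by


results, done_by = schedule_tasks([lambda k=k: k * k for k in range(8)], 3)
print(sum(done_by.values()))  # 8
```

<p align="left"><span lang="EN-US">Tasks are pulled rather than pre-assigned. As a result, a slow worker delays only the task it is currently running, not a fixed share of the work.</span></p>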
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Resource Management.</span></em></p>
<p align="left"><span lang="EN-US">Problem: how to manage the computing and storage resources of a DISC system.</span></p>
<p align="left"><span lang="EN-US">We want it to be available in an interactive mode and yet able to handle very large-scale computing tasks.</span></p>
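<p align="left"><span lang="EN-US">The paper leaves this as an open problem. One hypothetical policy is a two-level priority queue that serves interactive requests ahead of long-running batch jobs. JobQueue, the job names and the INTERACTIVE/BATCH levels below are illustrative assumptions.</span></p>

```python
import heapq
import itertools

INTERACTIVE, BATCH = 0, 1  # lower number is served first (hypothetical policy)


class JobQueue:
    """Serve interactive jobs before batch jobs; FIFO within each class."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps arrival order

    def submit(self, name, kind):
        heapq.heappush(self._heap, (kind, next(self._counter), name))

    def next_job(self):
        """Return the name of the next job to run, or None if idle."""
        if not self._heap:
            return None
        _kind, _seq, name = heapq.heappop(self._heap)
        return name


q = JobQueue()
q.submit("genome-batch-run", BATCH)
q.submit("user-query-1", INTERACTIVE)
q.submit("index-rebuild", BATCH)
q.submit("user-query-2", INTERACTIVE)
print([q.next_job() for _ in range(5)])
# ['user-query-1', 'user-query-2', 'genome-batch-run', 'index-rebuild', None]
```

<p align="left"><span lang="EN-US">Under heavy interactive load, a real scheduler would also need to keep batch jobs from starving. One way is to raise their priority the longer they wait.</span></p>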
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Supporting Program Development.</span></em></p>
<p align="left"><u><span lang="EN-US">Developing parallel programs is difficult</span></u><span lang="EN-US">, both in terms of <u>correctness</u> and in terms of <u>performance</u>.</span></p>
<p align="left"><span lang="EN-US">As a consequence, we must <u>provide software development tools</u> that <u>allow correct programs to be written easily</u>, while also <u>enabling more detailed monitoring, analysis, and optimization of program performance</u>.</span></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">System Software.</span></em></p>
<p align="left"><span lang="EN-US">System software is required for a variety of tasks, including fault diagnosis and isolation, system resource control, and data migration and replication.</span></p>
<p align="left"><span lang="EN-US">&nbsp;</span></p>
<p align="left"><span lang="EN-US">Google and its competitors provide an existence proof that DISC systems can be implemented using available technology.</span><span lang="EN-US"> </span><span lang="EN-US">Some <u>additional topics</u> include:</span></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">How should the processors be designed for use in cluster machines?</span></em></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">How can we effectively support different scientific communities in their data management and applications?</span></em></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Can we radically reduce the energy requirements for large-scale systems? </span></em></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">How do we build large-scale computing systems with an appropriate balance of performance and cost? </span></em></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">How can very large systems be constructed given the realities of component failures and repair times? </span></em></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Can we support a mix of computationally intensive jobs with ones requiring interactive response?</span></em></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">How do we control access to the system while enabling sharing? </span></em></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Can we deal with bad or unavailable data in a systematic way?</span></em><span lang="EN-US"> </span></p>
<p align="left"><span lang="EN-US">&#8226; </span><em><span lang="EN-US">Can high performance systems be built from heterogenous components? </span></em></p>
<h2><span lang="EN-US"><br />
7 Turning Ideas into Reality</span></h2>
<h3><span lang="EN-US">7.1 Developing a Prototype System</span></h3>
<p align="left"><span lang="EN-US">Operate <u>two types of partitions</u>: some for <u>application development</u>, focusing on gaining experience with the different programming techniques, and others for <u>systems research</u>, studying fundamental issues in system design.</span></p>
<p align="left"><span lang="EN-US">For <strong>the program development</strong> partitions:</span></p>
<p align="left"><span lang="EN-US">Use available software, such as the open source code from the Hadoop project, to implement the file system and support for application programming.</span></p>
<p align="left"><span lang="EN-US">For<strong> the systems research</strong> partitions: </span></p>
<p align="left"><span lang="EN-US">Create our own design, studying the different layers of hardware and system software required to get high performance and reliability (e.g., high-end hardware versus low-cost components).</span></p>
<h3><span lang="EN-US">7.2 Jump Starting</span></h3>
<p align="left"><u><span lang="EN-US">Begin application development by renting </span></u><span lang="EN-US">much of the required computing infrastructure:</span></p>
<p align="left"><span lang="EN-US">1. Network-accessible storage: Amazon's <strong>Simple Storage Service (S3)</strong></span></p>
<p align="left"><span lang="EN-US">2. Computing cycles: Amazon's <strong>Elastic Compute Cloud (EC2)</strong></span></p>
<p align="left"><span lang="EN-US">Pricing at the time of writing: storage cost $0.15 per gigabyte per month (about $1,800 per terabyte per year), with additional costs for reading or writing the data. Computing cycles cost $0.10 per CPU hour (about $877 per year) on a virtual Linux machine.</span></p>
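<p align="left"><span lang="EN-US">The annual figures can be checked with a little arithmetic. The sketch rests on three assumptions: S3 storage was billed per gigabyte-month, which was Amazon's published unit at the time; a decimal terabyte holds 1,000 GB; and a year has 365.25 days. The function names are hypothetical, and transfer charges are ignored.</span></p>

```python
HOURS_PER_YEAR = 365.25 * 24  # 8,766 hours


def yearly_compute_cost(dollars_per_cpu_hour):
    """Cost of keeping one virtual machine busy for a whole year."""
    return dollars_per_cpu_hour * HOURS_PER_YEAR


def yearly_storage_cost_per_tb(dollars_per_gb_month, gb_per_tb=1000):
    """Cost of storing one terabyte for twelve months (transfer fees excluded)."""
    return dollars_per_gb_month * gb_per_tb * 12


print(round(yearly_compute_cost(0.10)))         # 877
print(round(yearly_storage_cost_per_tb(0.15)))  # 1800
```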
<p align="left"><u><span lang="EN-US">Renting problems:</span></u></p>
<p align="left"><span lang="EN-US">1. The performance of such a configuration is much less than that of a dedicated facility.</span></p>
<p align="left"><span lang="EN-US">2. There is no way to ensure that the S3 data and the EC2 processors will be in close enough proximity to provide high speed access.</span></p>
<p align="left"><span lang="EN-US">3. We would lose the opportunity to design, evaluate, and refine our own system.</span></p>
<h3><span lang="EN-US">7.3 Scaling Up</span></h3>
<h2><span lang="EN-US"><br />
8 Conclusion</span></h2>
<p align="left"><span lang="EN-US">1. We believe that DISC systems could change the face of scientific research worldwide.</span></p>
<p align="left"><span lang="EN-US">2. DISC will help realize the potential of all this data: the combination of sensors and networks to collect data and inexpensive disks to store it, together with the benefits derived from analyzing it.</span></p>
<p align="left"><span lang="EN-US">&nbsp;</span></p>
</span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span></span>
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/honeybee/" target="_blank">sun</a> 2008-04-10 10:21</div>]]></description></item><item><title>DISC (Data-Intensive Supercomputing)</title><link>http://www.blogjava.net/honeybee/articles/190844.html</link><dc:creator>sun</dc:creator><author>sun</author><pubDate>Fri, 04 Apr 2008 15:43:00 GMT</pubDate><guid>http://www.blogjava.net/honeybee/articles/190844.html</guid><wfw:comment>http://www.blogjava.net/honeybee/comments/190844.html</wfw:comment><comments>http://www.blogjava.net/honeybee/articles/190844.html#Feedback</comments><slash:comments>0</slash:comments><wfw:commentRss>http://www.blogjava.net/honeybee/comments/commentRss/190844.html</wfw:commentRss><trackback:ping>http://www.blogjava.net/honeybee/services/trackbacks/190844.html</trackback:ping><description><![CDATA[&nbsp;
<h2><span style="font-size: 12pt; line-height: 173%">Data-Intensive System (DIS)</span></h2>
<h3><span style="font-size: 10.5pt; line-height: 173%">System Challenges</span><span style="font-size: 10.5pt; line-height: 173%; font-family: 宋体">:</span></h3>
<p><strong>Data distributed over many disks</strong></p>
<p><strong>Compute using many processors</strong></p>
<p><strong>Connected by gigabit Ethernet (or equivalent)</strong></p>
<h3><span style="font-size: 10.5pt; line-height: 173%">System Requirements:</span></h3>
<p><strong>Lots of disks</strong></p>
<p><strong>Lots of processors</strong></p>
<p><strong>Located in close proximity</strong></p>
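<p>A back-of-the-envelope calculation shows why many disks and processors are needed. If each disk streams data at a fixed rate, the time to scan a data set shrinks in proportion to the number of disks read in parallel. The 50 MB/s per-disk rate and the 1 TB data size are hypothetical round numbers, not measurements, and scan_seconds is an illustrative name.</p>

```python
def scan_seconds(data_bytes, num_disks, bytes_per_sec_per_disk):
    """Time to read data_bytes spread evenly over num_disks that are all
    read in parallel (ignores network and seek overheads)."""
    return data_bytes / (num_disks * bytes_per_sec_per_disk)


TB = 10**12
DISK_RATE = 50 * 10**6  # hypothetical 50 MB/s sustained per disk

print(scan_seconds(TB, 1, DISK_RATE))    # 20000.0 (about 5.6 hours)
print(scan_seconds(TB, 100, DISK_RATE))  # 200.0 (under 4 minutes)
```

<p>The calculation assumes the processors sit close enough to the disks that the network never becomes the bottleneck. That is the purpose of the close-proximity requirement.</p>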
<h2><span style="font-size: 12pt; line-height: 173%">System Comparison: </span></h2>
<h3 style="margin-left: 36pt; text-indent: -36pt"><span style="font-size: 10.5pt; line-height: 173%">(i)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span><span style="font-size: 10.5pt; line-height: 173%">Data</span></h3>
<table style="border-right: medium none; border-top: medium none; border-left: medium none; border-bottom: medium none; border-collapse: collapse" cellspacing="0" cellpadding="0" border="1">
    <tbody>
        <tr>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: black 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: black 1pt solid; width: 213.05pt; padding-top: 0cm; border-bottom: black 1pt solid" width="284">
            <p style="text-align: center" align="center"><strong>Conventional &nbsp;Supercomputers</strong></p>
            </td>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: black 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 213.05pt; padding-top: 0cm; border-bottom: black 1pt solid" width="284">
            <p style="text-align: center" align="center"><strong>DISC</strong></p>
            </td>
        </tr>
        <tr>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: black 1pt solid; width: 213.05pt; padding-top: 0cm; border-bottom: black 1pt solid" valign="top" width="284">
            <p style="margin-top: 5.4pt"><strong>Data stored in separate repository</strong></p>
            <p style="margin-top: 1.9pt; text-indent: 10.5pt">No support for collection or management</p>
            <p style="margin-top: 5.4pt"><strong>Brought into system for computation</strong></p>
            <p style="margin-top: 1.9pt; text-indent: 10.5pt">Time-consuming</p>
            <p style="margin-top: 1.9pt; text-indent: 10.5pt">Limits interactivity</p>
            </td>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 213.05pt; padding-top: 0cm; border-bottom: black 1pt solid" valign="top" width="284">
            <p style="margin-top: 5.4pt"><strong>System collects and maintains data</strong></p>
            <p style="margin-top: 1.9pt; text-indent: 10.5pt">Shared, active data set</p>
            <p style="margin-top: 5.4pt"><strong>Computation colocated with storage</strong></p>
            <p style="margin-top: 1.9pt; text-indent: 10.5pt">Faster access </p>
            </td>
        </tr>
    </tbody>
</table>
<h3 style="margin-left: 36pt; text-indent: -36pt"><span style="font-size: 10.5pt; line-height: 173%">(ii)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span><span style="font-size: 10.5pt; line-height: 173%">Programming Models</span></h3>
<table style="border-right: medium none; border-top: medium none; border-left: medium none; border-bottom: medium none; border-collapse: collapse" cellspacing="0" cellpadding="0" border="1">
    <tbody>
        <tr>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: black 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: black 1pt solid; width: 217.05pt; padding-top: 0cm; border-bottom: black 1pt solid" width="289">
            <p style="text-align: center" align="center"><strong>Conventional &nbsp;Supercomputers</strong></p>
            </td>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: black 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 209.05pt; padding-top: 0cm; border-bottom: black 1pt solid" width="279">
            <p style="text-align: center" align="center"><strong>DISC</strong></p>
            </td>
        </tr>
        <tr>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: black 1pt solid; width: 217.05pt; padding-top: 0cm; border-bottom: black 1pt solid" valign="top" width="289">
            <p style="margin-top: 5.4pt"><strong>Programs described at very low level</strong></p>
            <p style="margin-top: 5.4pt; text-indent: 10.5pt">Specify detailed control of processing &amp; communications</p>
            <p style="margin-top: 5.4pt"><strong>Rely on small number of software packages</strong></p>
            <p style="margin-top: 5.4pt; text-indent: 10.5pt">Written by specialists</p>
            <p style="margin-top: 5.4pt; text-indent: 10.5pt">Limits classes of problems &amp; solution methods</p>
            </td>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 209.05pt; padding-top: 0cm; border-bottom: black 1pt solid" valign="top" width="279">
            <p style="margin-top: 5.4pt"><strong>Application programs written in terms of high-level operations on data</strong></p>
            <p style="margin-top: 5.4pt"><strong>Runtime system controls scheduling, load balancing, &#8230;</strong></p>
            </td>
        </tr>
    </tbody>
</table>
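<p>The DISC column can be illustrated in Python. The application states a high-level operation over the data (apply one function to every record) and leaves scheduling and load balancing to a runtime. Here the standard library's concurrent.futures.ThreadPoolExecutor stands in for a DISC runtime. high_level_map, parse_length and the record format are hypothetical.</p>

```python
from concurrent.futures import ThreadPoolExecutor


def parse_length(record: str) -> int:
    """Per-record work written by the application programmer."""
    return len(record.split())


def high_level_map(func, records, workers=4):
    """Apply func to every record; how the work is split across threads is
    left to the executor rather than coded by hand."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(func, records))  # results keep input order


records = ["a b c", "d e", "f"]
print(high_level_map(parse_length, records))  # [3, 2, 1]
```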
<h3 style="margin-left: 36pt; text-indent: -36pt"><span style="font-size: 10.5pt; line-height: 173%">(iii)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span><span style="font-size: 10.5pt; line-height: 173%">Interaction</span></h3>
<table style="border-right: medium none; border-top: medium none; border-left: medium none; border-bottom: medium none; border-collapse: collapse" cellspacing="0" cellpadding="0" border="1">
    <tbody>
        <tr>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: black 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: black 1pt solid; width: 203.85pt; padding-top: 0cm; border-bottom: black 1pt solid" width="272">
            <p style="text-align: center" align="center"><strong>Conventional &nbsp;Supercomputers</strong></p>
            </td>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: black 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 222.25pt; padding-top: 0cm; border-bottom: black 1pt solid" width="296">
            <p style="text-align: center" align="center"><strong>DISC</strong></p>
            </td>
        </tr>
        <tr>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: black 1pt solid; width: 203.85pt; padding-top: 0cm; border-bottom: black 1pt solid" valign="top" width="272">
            <p><strong>Main Machine: Batch Access</strong></p>
            <p style="text-indent: 10.3pt">Priority is to conserve machine resources</p>
            <p style="text-indent: 10.3pt">User submits job with specific resource requirements</p>
            <p style="text-indent: 10.3pt">Run in batch mode when resources available</p>
            <p><strong>Offline Visualization</strong></p>
            <p style="text-indent: 10.3pt">Move results to separate facility for interactive use</p>
            </td>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 222.25pt; padding-top: 0cm; border-bottom: black 1pt solid" valign="top" width="296">
            <p><strong>Interactive Access</strong></p>
            <p style="text-indent: 10.3pt">Priority is to conserve human resources</p>
            <p style="text-indent: 10.3pt">User action can range from simple query to complex computation</p>
            <p style="text-indent: 10.3pt">System supports many simultaneous users</p>
            <p style="text-indent: 20.6pt">Requires flexible programming and runtime environment</p>
            </td>
        </tr>
    </tbody>
</table>
<h3 style="margin-left: 36pt; text-indent: -36pt"><span style="font-size: 10.5pt; line-height: 173%">(iv)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span><span style="font-size: 10.5pt; line-height: 173%">Reliability</span></h3>
<table style="border-right: medium none; border-top: medium none; border-left: medium none; border-bottom: medium none; border-collapse: collapse" cellspacing="0" cellpadding="0" border="1">
    <tbody>
        <tr>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: black 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: black 1pt solid; width: 203.85pt; padding-top: 0cm; border-bottom: black 1pt solid" width="272">
            <p style="text-align: center" align="center"><strong>Conventional &nbsp;Supercomputers</strong></p>
            </td>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: black 1pt solid; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 222.25pt; padding-top: 0cm; border-bottom: black 1pt solid" width="296">
            <p style="text-align: center" align="center"><strong>DISC</strong></p>
            </td>
        </tr>
        <tr>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: black 1pt solid; width: 203.85pt; padding-top: 0cm; border-bottom: black 1pt solid" valign="top" width="272">
            <p><strong>&#8220;Brittle&#8221; Systems</strong></p>
            <p style="margin-top: 5.4pt; text-indent: 10.5pt">Main recovery mechanism is to recompute from most recent checkpoint</p>
            <p style="margin-top: 5.4pt; text-indent: 10.5pt">Must bring down system for diagnosis, repair, or upgrades</p>
            </td>
            <td style="border-right: black 1pt solid; padding-right: 5.4pt; border-top: medium none; padding-left: 5.4pt; padding-bottom: 0cm; border-left: medium none; width: 222.25pt; padding-top: 0cm; border-bottom: black 1pt solid" valign="top" width="296">
            <p><strong>Flexible Error Detection and Recovery</strong></p>
            <p style="margin-top: 5.4pt; text-indent: 10.5pt">Runtime system detects and diagnoses errors</p>
            <p style="margin-top: 5.4pt; text-indent: 10.5pt">Selective use of redundancy and dynamic recomputation </p>
            <p style="margin-top: 5.4pt; text-indent: 10.5pt">Replace or upgrade components while the system is running</p>
            <p style="margin-top: 5.4pt; text-indent: 10.5pt">Requires flexible programming model &amp; runtime environment</p>
            </td>
        </tr>
    </tbody>
</table>
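<p>To make the contrast concrete, here is a minimal Python sketch of the DISC-style recovery described above: when a task fails, only that task is re-executed on another worker, instead of rolling the whole job back to the most recent checkpoint. The worker functions, the <code>WorkerFailure</code> exception, and the retry policy (rotate to the next worker, up to three attempts) are hypothetical assumptions for illustration, not the paper's design.</p>

```python
from typing import Callable, Dict, List, Tuple


class WorkerFailure(Exception):
    """Raised by a simulated (hypothetical) worker that crashes while running a task."""


def run_with_reexecution(
    tasks: Dict[str, int],
    workers: List[Callable[[int], int]],
    max_attempts: int = 3,
) -> Tuple[Dict[str, int], List[Tuple[str, int]]]:
    """Run each task; on failure, re-run only that task on the next worker."""
    results: Dict[str, int] = {}
    failures: List[Tuple[str, int]] = []  # (task name, worker index) per failed attempt
    for i, (name, arg) in enumerate(tasks.items()):
        for attempt in range(max_attempts):
            w = (i + attempt) % len(workers)  # rotate to a different worker on retry
            try:
                results[name] = workers[w](arg)
                break  # task succeeded; no global rollback is needed
            except WorkerFailure:
                failures.append((name, w))
        else:
            raise RuntimeError(f"task {name} failed {max_attempts} times")
    return results, failures


def healthy(x: int) -> int:
    return x * x


def broken(x: int) -> int:
    raise WorkerFailure("simulated disk error")


results, failures = run_with_reexecution({"t0": 1, "t1": 2, "t2": 3, "t3": 4}, [healthy, broken])
print(results)   # {'t0': 1, 't1': 4, 't2': 9, 't3': 16}
print(failures)  # [('t1', 1), ('t3', 1)]
```

<p>In this run the broken worker fails tasks t1 and t3, and both are recomputed on the healthy worker while the other results are kept. The failure log is the kind of record a runtime system could use to diagnose the faulty component.</p>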
<h2><span style="font-size: 12pt; line-height: 173%">Comparison with Grid Computing</span></h2>
<h3><span style="font-size: 10.5pt; line-height: 173%">Grid: Distribute Computing and Data</span></h3>
<p style="margin-left: 36pt; text-indent: -36pt"><strong>(i)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></strong><strong>Computation: Distribute problem across many machines</strong></p>
<p style="text-indent: 31.5pt">Generally limited to problems that partition easily into independent subproblems</p>
<p style="margin-left: 36pt; text-indent: -36pt"><strong>(ii)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></strong><strong>Data: Support shared access to large-scale data sets</strong></p>
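<p>The Grid style of computation works well when a problem splits into pieces that need no communication with each other. The minimal Python sketch below illustrates that pattern under simplifying assumptions: threads stand in for remote grid machines, and the hypothetical subproblem (a sum of squares) is chosen only because its chunks are fully independent. A problem whose chunks must exchange intermediate data does not fit this pattern, which is the limitation noted above.</p>

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def partition(data: List[int], n_parts: int) -> List[List[int]]:
    """Split data into n_parts contiguous chunks whose sizes differ by at most one."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def solve_subproblem(chunk: List[int]) -> int:
    # Each chunk is processed with no communication with the others.
    return sum(x * x for x in chunk)


def grid_sum_of_squares(data: List[int], n_machines: int) -> int:
    chunks = partition(data, n_machines)
    # Threads stand in for remote grid machines in this sketch.
    with ThreadPoolExecutor(max_workers=n_machines) as pool:
        partial_results = list(pool.map(solve_subproblem, chunks))
    return sum(partial_results)  # combine the independent partial answers


print(partition(list(range(10)), 3))                # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(grid_sum_of_squares(list(range(1, 101)), 4))  # 338350
```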
<h3><span style="font-size: 10.5pt; line-height: 173%">DISC: Centralize Computing and Data</span></h3>
<p style="margin-left: 36pt; text-indent: -36pt"><strong>(i)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></strong><strong>Enables more demanding computational tasks</strong></p>
<p style="margin-left: 36pt; text-indent: -36pt"><strong>(ii)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></strong><strong>Reduces time required to get data to machines</strong></p>
<p style="margin-left: 36pt; text-indent: -36pt"><strong>(iii)<span style="font: 7pt 'Times New Roman'">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></strong><strong>Enables more flexible resource management</strong></p>
<h2><span style="font-size: 12pt; line-height: 173%">A Commercial DISC</span></h2>
<h3><span style="font-size: 10.5pt; color: red; line-height: 173%">Netezza Performance Server (NPS)</span></h3>
<p><strong><span style="color: red">Designed for &#8220;data warehouse&#8221; applications</span></strong></p>
<p style="text-indent: 21pt"><span style="color: red">Heavy-duty analysis of databases</span></p>
<p><strong><span style="color: red">Data distributed over up to 500 Snippet Processing Units</span></strong></p>
<p style="text-indent: 21pt"><span style="color: red">Each unit: disk storage, a dedicated processor, and an FPGA controller</span></p>
<p><strong><span style="color: red">User &#8220;programs&#8221; expressed in SQL</span></strong></p>
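<p>The idea of spreading a table over many units and running the same SQL on each can be sketched in Python with the standard <code>sqlite3</code> module. This is only an analogy under stated assumptions: each in-memory SQLite database plays the role of one Snippet Processing Unit, the <code>sales</code> table and its rows are hypothetical, and nothing here reflects how Netezza actually executes queries or uses its FPGAs.</p>

```python
import sqlite3
from typing import Dict, List, Tuple

Row = Tuple[str, float]


def make_units(rows: List[Row], n_units: int) -> List[sqlite3.Connection]:
    """Create one in-memory SQLite database per (hypothetical) unit; spread rows round-robin."""
    units = [sqlite3.connect(":memory:") for _ in range(n_units)]
    for unit in units:
        unit.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    for i, row in enumerate(rows):
        units[i % n_units].execute("INSERT INTO sales VALUES (?, ?)", row)
    return units


def total_by_region(units: List[sqlite3.Connection]) -> Dict[str, float]:
    """Scatter the same SQL to every unit, then merge the partial sums."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region"
    totals: Dict[str, float] = {}
    for unit in units:
        for region, partial in unit.execute(query):  # runs over the local shard only
            totals[region] = totals.get(region, 0.0) + partial
    return totals


rows = [("east", 10.0), ("west", 5.0), ("east", 2.5), ("west", 1.0), ("north", 4.0)]
print(total_by_region(make_units(rows, 2)))
```

<p>Merging partial results works here because <code>SUM</code> can be combined across shards; an average, for example, would need each unit to return both a sum and a count.</p>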
<h2><span style="font-size: 12pt; line-height: 173%">Constructing DISC</span></h2>
<h3><span style="font-size: 10.5pt; line-height: 173%">Hardware: Rent from Amazon</span></h3>
<p><strong>Elastic Compute Cloud (EC2)</strong></p>
<p style="text-indent: 21pt">Generic Linux cycles for $0.10 / hour ($877 / yr)</p>
<p><strong>Simple Storage Service (S3)</strong></p>
<p style="text-indent: 21pt">Network-accessible storage for $0.15 / GB / month ($1,800 / TB / yr)</p>
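<p>The yearly figures follow directly from the quoted 2008 prices. The short Python calculation below reproduces them, assuming a year of 365.25 days (8,766 hours; a 365-day year gives $876) and a decimal terabyte of 1,000 GB. With a binary terabyte of 1,024 GB the storage figure would be closer to $1,843.</p>

```python
HOURS_PER_YEAR = 24 * 365.25  # 8,766 hours, averaging in leap years


def ec2_cost_per_year(price_per_hour: float) -> float:
    """Cost of keeping one instance running for a whole year."""
    return price_per_hour * HOURS_PER_YEAR


def s3_cost_per_tb_year(price_per_gb_month: float, gb_per_tb: int = 1000) -> float:
    """Cost of storing one terabyte for twelve months."""
    return price_per_gb_month * gb_per_tb * 12


print(round(ec2_cost_per_year(0.10)))          # 877
print(round(s3_cost_per_tb_year(0.15)))        # 1800
print(round(s3_cost_per_tb_year(0.15, 1024)))  # 1843
```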
<h3><span style="font-size: 10.5pt; line-height: 173%">Software: Use Open Source</span></h3>
<p><strong>Hadoop Project</strong></p>
<p style="text-indent: 21pt">Open-source project providing a distributed file system (HDFS) and MapReduce</p>
<p style="text-indent: 21pt">Supported and used by Yahoo </p>
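<p>To give a feel for the programming model that Hadoop implements, here is a single-process Python sketch of the classic MapReduce word count. It is not Hadoop's Java API: the function names are hypothetical, and the map, shuffle, and reduce phases that Hadoop distributes across a cluster (reading input from its distributed file system) all run sequentially in one loop here.</p>

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_fn(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_fn(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(lines: Iterable[str]) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:                  # map phase (spread across machines in Hadoop)
        for key, value in map_fn(line):
            groups[key].append(value)   # shuffle: group values by key
    return dict(reduce_fn(k, v) for k, v in groups.items())  # reduce phase


print(run_mapreduce(["the cat", "The dog saw the cat"]))
# {'the': 3, 'cat': 2, 'dog': 1, 'saw': 1}
```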
<h3><span style="font-size: 10.5pt; line-height: 173%">Implementing System Software</span></h3>
<p><strong>Programming Support</strong></p>
<p style="text-indent: 21pt">Abstractions for computation &amp; data representation</p>
<p style="margin-left: 42pt; text-indent: 21pt"><strong><span style="color: red">E.g., Google: MapReduce &amp; BigTable </span></strong></p>
<p style="text-indent: 21pt">Usage models</p>
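<p>For the data-representation side, Google's BigTable paper describes its data model as a sparse, sorted map indexed by row key, column key, and timestamp. The toy Python class below sketches only that data model on a single machine. The class and method names are hypothetical, and it omits everything that makes BigTable a distributed, persistent system, such as splitting row ranges into tablets served by many machines.</p>

```python
from typing import Dict, Iterator, Optional, Tuple


class TinyTable:
    """Toy, single-machine sketch of a BigTable-style data model:
    a sorted map from (row key, column key, timestamp) to a value.
    Not distributed, persistent, or compressed."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[str, str], Dict[int, str]] = {}

    def put(self, row: str, column: str, timestamp: int, value: str) -> None:
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row: str, column: str, as_of: Optional[int] = None) -> Optional[str]:
        """Return the newest version written at or before `as_of` (latest if None)."""
        versions = self._cells.get((row, column), {})
        times = [t for t in versions if as_of is None or t <= as_of]
        return versions[max(times)] if times else None

    def scan(self, start_row: str, end_row: str) -> Iterator[Tuple[str, str, str]]:
        """Yield the latest (row, column, value) for rows in [start_row, end_row), in key order."""
        for row, column in sorted(self._cells):
            if start_row <= row < end_row:
                versions = self._cells[(row, column)]
                yield row, column, versions[max(versions)]


t = TinyTable()
t.put("com.cnn.www", "contents:", 3, "page v3")
t.put("com.cnn.www", "contents:", 5, "page v5")
t.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
print(t.get("com.cnn.www", "contents:"))           # page v5
print(t.get("com.cnn.www", "contents:", as_of=4))  # page v3
print(list(t.scan("com.", "com.z")))
```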
<p><strong>Runtime Support</strong></p>
<p style="text-indent: 21pt">Allocating processing and storage</p>
<p style="text-indent: 21pt">Scheduling multiple users</p>
<p style="text-indent: 21pt">Implementing programming model</p>
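<p>The runtime must also schedule multiple users, but no particular policy is prescribed here. As one simple, assumed policy, the Python sketch below interleaves jobs round-robin across users so that a user with a long queue cannot starve the others; the user and job names are hypothetical, and a real scheduler would also weigh job size, priority, and resource use.</p>

```python
from collections import deque
from typing import Dict, List, Tuple


def round_robin_schedule(queues: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (user, job) pairs in dispatch order, taking one job per user per turn."""
    pending = deque((user, deque(jobs)) for user, jobs in queues.items() if jobs)
    order: List[Tuple[str, str]] = []
    while pending:
        user, jobs = pending.popleft()
        order.append((user, jobs.popleft()))
        if jobs:  # user still has work: go to the back of the line
            pending.append((user, jobs))
    return order


print(round_robin_schedule({"alice": ["a1", "a2", "a3"], "bob": ["b1"], "carol": ["c1", "c2"]}))
# [('alice', 'a1'), ('bob', 'b1'), ('carol', 'c1'), ('alice', 'a2'), ('carol', 'c2'), ('alice', 'a3')]
```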
<p><strong>Error Handling</strong></p>
<p style="text-indent: 21pt">Detecting errors</p>
<p style="text-indent: 21pt">Dynamic recovery</p>
<p style="text-indent: 21pt">Identifying failed components</p>
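<p>A common way to detect errors and identify failed components is to have nodes send periodic heartbeats and to track how often tasks fail on each node. The Python sketch below assumes exactly that scheme, which the list above does not specify: a node is flagged if it has not reported within a timeout, or if tasks on it have failed a threshold number of times. The node names, timestamps, and thresholds are hypothetical.</p>

```python
from typing import Dict, List


def find_failed_nodes(
    last_heartbeat: Dict[str, float],
    task_failures: Dict[str, int],
    now: float,
    timeout: float,
    max_failures: int,
) -> List[str]:
    """Flag nodes that have gone silent or that keep failing tasks."""
    silent = {n for n, t in last_heartbeat.items() if now - t > timeout}
    flaky = {n for n, c in task_failures.items() if c >= max_failures}
    return sorted(silent | flaky)


heartbeats = {"node1": 100.0, "node2": 85.0, "node3": 98.0}  # seconds since start
task_failures = {"node1": 0, "node3": 3}
print(find_failed_nodes(heartbeats, task_failures, now=100.0, timeout=10.0, max_failures=3))
# ['node2', 'node3']
```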
<br><br><div align=right><a style="text-decoration:none;" href="http://www.blogjava.net/honeybee/" target="_blank">sun</a> 2008-04-04 23:43</div>]]></description></item></channel></rss>