全世界的屋顶


DISC (Data-Intensive Super Computing)

 

Data-Intensive System (DIS)

System Challenges:

  • Data distributed over many disks
  • Compute using many processors
  • Connected by gigabit Ethernet (or equivalent)

System Requirements:

  • Lots of disks
  • Lots of processors
  • Located in close proximity

System Comparison:

(i) Data

Conventional Supercomputers:

  • Data stored in separate repository
      • No support for collection or management
  • Brought into system for computation
      • Time consuming
      • Limits interactivity

DISC:

  • System collects and maintains data
      • Shared, active data set
  • Computation colocated with storage
      • Faster access
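One way to picture "computation colocated with storage" is a scheduler that sends each task to a node that already holds the data block it needs. The sketch below is a toy illustration only: the block-to-node layout, the node names, and the `assign_tasks` helper are all hypothetical and are not taken from any real DISC runtime.

```python
from collections import defaultdict

def assign_tasks(block_locations):
    """Assign each block's task to the least-loaded node that stores a replica.

    block_locations maps a block id to the list of nodes holding that block
    (a hypothetical layout used only for this illustration).
    """
    load = defaultdict(int)
    assignment = {}
    for block in sorted(block_locations):
        replicas = block_locations[block]
        # Choose the replica holder with the fewest tasks so far; ties go by name.
        node = min(replicas, key=lambda n: (load[n], n))
        assignment[block] = node
        load[node] += 1
    return assignment

blocks = {
    "b0": ["node1", "node2"],
    "b1": ["node1", "node3"],
    "b2": ["node2", "node3"],
    "b3": ["node1", "node2"],
}
plan = assign_tasks(blocks)
for block, node in plan.items():
    print(block, "->", node)
```

Every task runs where its data already lives, so no block has to be copied into the system before computation starts.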

(ii) Programming Models

Conventional Supercomputers:

  • Programs described at very low level
      • Specify detailed control of processing & communications
  • Rely on small number of software packages
      • Written by specialists
      • Limits classes of problems & solution methods

DISC:

  • Application programs written in terms of high-level operations on data
  • Runtime system controls scheduling, load balancing, …

(iii) Interaction

Conventional Supercomputers:

  • Main Machine: Batch Access
      • Priority is to conserve machine resources
      • User submits job with specific resource requirements
      • Run in batch mode when resources available
  • Offline Visualization
      • Move results to separate facility for interactive use

DISC:

  • Interactive Access
      • Priority is to conserve human resources
      • User action can range from simple query to complex computation
      • System supports many simultaneous users
      • Requires flexible programming and runtime environment

(iv) Reliability

Conventional Supercomputers:

  • “Brittle” Systems
      • Main recovery mechanism is to recompute from most recent checkpoint
      • Must bring down system for diagnosis, repair, or upgrades

DISC:

  • Flexible Error Detection and Recovery
      • Runtime system detects and diagnoses errors
      • Selective use of redundancy and dynamic recomputation
      • Replace or upgrade components while system running
      • Requires flexible programming model & runtime environment
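The difference between "recompute from the most recent checkpoint" and "dynamic recomputation" can be sketched as a runtime that re-runs only the pieces of work that failed. This is a minimal sketch under stated assumptions: the `run_with_retries` helper, the `flaky_sum` task, and the simulated fault are hypothetical and only illustrate the idea.

```python
def run_with_retries(partitions, task, max_attempts=3):
    """Run task on every partition, re-running only the partitions that fail."""
    results = {}
    attempts = {}
    pending = list(range(len(partitions)))
    for attempt in range(1, max_attempts + 1):
        failed = []
        for i in pending:
            attempts[i] = attempt
            try:
                results[i] = task(partitions[i], attempt)
            except RuntimeError:
                # Error detected: schedule a re-run of this partition only.
                failed.append(i)
        pending = failed
        if not pending:
            break
    if pending:
        raise RuntimeError(f"partitions {pending} failed after {max_attempts} attempts")
    return [results[i] for i in range(len(partitions))], attempts

def flaky_sum(partition, attempt):
    # Hypothetical task: simulate a transient fault on the first attempt
    # for the partition whose first element is 4.
    if partition[0] == 4 and attempt == 1:
        raise RuntimeError("simulated node failure")
    return sum(partition)

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
partials, attempts = run_with_retries(data, flaky_sum)
total = sum(partials)
print(partials, attempts, total)
```

Only the failed partition is computed twice; the results of the other partitions are kept, and nothing is restarted from a global checkpoint.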

Comparing with Grid Computing:

Grid: Distribute Computing and Data

  (i) Computation: Distribute problem across many machines
      • Generally only those with easy partitioning into independent subproblems
  (ii) Data: Support shared access to large-scale data set

DISC: Centralize Computing and Data

  (i) Enables more demanding computational tasks
  (ii) Reduces time required to get data to machines
  (iii) Enables more flexible resource management

A Commercial DISC

Netezza Performance Server (NPS)

  • Designed for “data warehouse” applications
      • Heavy-duty analysis of database
  • Data distributed over up to 500 Snippet Processing Units
      • Disk storage, dedicated processor, FPGA controller
  • User “programs” expressed in SQL
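To make "data distributed over processing units, programs expressed in SQL" concrete, here is a toy sketch that runs the same SQL aggregate on each slice of the data and then merges the partial results. It uses Python's standard `sqlite3` module purely as a stand-in; it is not the Netezza interface, and the `sales` table, its rows, and the `make_unit` helper are hypothetical.

```python
import sqlite3

def make_unit(rows):
    # One in-memory database stands in for one unit's local slice of the data.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn

# Two hypothetical units, each holding part of the table.
units = [
    make_unit([("east", 10), ("west", 5)]),
    make_unit([("east", 7), ("north", 3)]),
]

QUERY = "SELECT region, SUM(amount) FROM sales GROUP BY region"

# Each unit evaluates the query on its own slice; the partial sums are merged.
totals = {}
for unit in units:
    for region, partial in unit.execute(QUERY):
        totals[region] = totals.get(region, 0) + partial

print(totals)
```

The user writes only the SQL statement; how the work is split across units and how the partial answers are combined is left to the system.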

Constructing DISC

Hardware: Rent from Amazon

  • Elastic Compute Cloud (EC2)
      • Generic Linux cycles for $0.10 / hour ($877 / yr)
  • Simple Storage Service (S3)
      • Network-accessible storage for $0.15 / GB / month ($1800 / TB / yr)
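The yearly figures in parentheses follow from the hourly and monthly prices. The short calculation below reproduces them, assuming an average year of 365.25 days and a decimal terabyte (1 TB = 1000 GB); both conversion factors are assumptions, since the source does not state them.

```python
HOURS_PER_YEAR = 24 * 365.25   # assumption: average year, including leap days
MONTHS_PER_YEAR = 12
GB_PER_TB = 1000               # assumption: decimal terabyte

ec2_per_hour = 0.10       # dollars per instance-hour, from the text
s3_per_gb_month = 0.15    # dollars per GB per month, from the text

ec2_per_year = ec2_per_hour * HOURS_PER_YEAR
s3_per_tb_year = s3_per_gb_month * MONTHS_PER_YEAR * GB_PER_TB

print(f"EC2: ${ec2_per_year:.0f} per instance-year")
print(f"S3: ${s3_per_tb_year:.0f} per TB-year")
```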

Software: utilize open source

  • Hadoop Project
      • Open source project providing file system and MapReduce
      • Supported and used by Yahoo
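For readers new to MapReduce, the programming model can be sketched in a few lines of plain Python. This is not the Hadoop API; the `map_phase`, `shuffle`, `reduce_phase`, and `word_count` names are hypothetical, and the whole thing runs in one process just to show the three steps.

```python
from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by their key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["data on many disks", "many processors near the data"])
print(counts)
```

The application supplies only the map and reduce functions. In a real system the runtime would decide where each call runs, move the intermediate pairs between machines, and re-run work that fails.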

Implementing System Software

Programming Support

  • Abstractions for computation & data representation
      • E.g., Google: MapReduce & BigTable
  • Usage models

Runtime Support

  • Allocating processing and storage
  • Scheduling multiple users
  • Implementing programming model

Error Handling

  • Detecting errors
  • Dynamic recovery
  • Identifying failed components

Posted on 2008-04-04 23:43 by sun. Category: DISC

Copyright © sun