BlogJava - Snowdream - Post category: Laboratory

Close reading of a paper - Application-Level Isolation and Recovery with Solitude

ZelluX — Wed, 28 May 2008 07:23:00 GMT

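A file-system-based security scheme built on three mechanisms: running untrusted programs in isolation, taint tracking, and post-incident recovery.

My presentation:
http://docs.google.com/Presentation?id=dcjk4xx7_473cv5ddgc8

For lack of time, the slides do not cover the paper's solution for inter-process communication.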


Two papers I read recently

ZelluX — Tue, 20 May 2008 12:18:00 GMT

Summary: one paper introduces a brand-new Web architecture; the other covers methods for detecting virtual machines.


Reading notes - SubVirt: Implementing malware with virtual machines (2)

ZelluX — Tue, 06 May 2008 06:35:00 GMT

Summary: an attack technique that uses virtual machines (part 2)


Reading notes - SubVirt: Implementing malware with virtual machines (1)

ZelluX — Mon, 05 May 2008 13:53:00 GMT

Summary: an attack technique that uses virtual machines (part 1)


Streamware ppt

ZelluX — Wed, 16 Apr 2008 06:57:00 GMT

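To be productive you really do have to stay away from the network; lying in bed reading the paper, I understood it much faster.
I finally finished the slides for tonight's talk.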


Weekly Report

ZelluX — Wed, 26 Mar 2008 09:01:00 GMT

Trying out Google Document


DEBUG log - SPEC2006 470.lbm

ZelluX — Mon, 24 Mar 2008 13:16:00 GMT

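A program that uses the Lattice Boltzmann Method to simulate an incompressible fluid in 3D space; see the diagram at the bottom.
Porting this program was truly exhausting -_-b

Brook itself has quite a few defects and bugs, and I am not used to the coding style of scientific computing programs, so most of the time went into fixing bugs.

The most satisfying bug to squash (that is the only way to put it >,<):

Every cell has a flag value. Although its type is double, the program treats it as an integer through a MAGIC_CAST macro.
Initially every cell's flag is ~f, that is, a double whose bits 1~28 are all 1 and whose bits 29~32 are 0. According to the IEEE standard, that should be a NaN.
This is no problem on the CPU, but it breaks on the GPU: the GPU does not support this kind of cast, so any arithmetic on this double yields NaN.

Solution:
Before passing the data to the GPU, first convert these flag values into doubles the GPU can operate on. The simplest way is to convert them all to int (with truncation), take the bitwise complement, and then pass them to the GPU.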
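
The effect described above can be reproduced on the CPU. The following is only a sketch under stated assumptions: it reads "~f" as the bitwise complement of a hypothetical flag value 15, uses 64-bit integers in place of the program's MAGIC_CAST macro, and the names flag, cell, and safe are invented for illustration. It is not the code from 470.lbm or Brook.

```fortran
program flag_bits
  use, intrinsic :: iso_fortran_env, only: int64, real64
  use, intrinsic :: ieee_arithmetic, only: ieee_is_nan
  implicit none
  integer(int64) :: flag, bits
  real(real64)   :: cell, safe

  flag = 15_int64                 ! hypothetical flag value (0xF)
  bits = not(flag)                ! ~flag: exponent bits all 1, mantissa nonzero
  cell = transfer(bits, cell)     ! reinterpret the bits as a double, no conversion

  print *, 'raw cell is NaN: ', ieee_is_nan(cell)

  ! Host-side fix: recover the integer, undo the complement, and store it
  ! as an ordinary numeric double before handing the data to the GPU.
  safe = real(not(transfer(cell, bits)), real64)
  print *, 'safe cell value: ', safe
end program flag_bits
```

A bit pattern with an all-ones exponent and a nonzero mantissa is a NaN under IEEE 754, which is why arithmetic on the raw cell cannot recover the flag, while the converted value behaves like any other number.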


Reading notes

ZelluX — Sat, 15 Mar 2008 06:46:00 GMT

Summary: covers various papers, surveys, and workshop talks.


GP-GPU reading notes (5)

ZelluX — Fri, 15 Feb 2008 11:51:00 GMT

Mars: A MapReduce Framework on Graphics Processors
by Bingsheng He @ Hong Kong Univ. of Sci. & Tech.
Naga K. Govindaraju @ Microsoft Corp.
Qiong Luo, Tuyong Wang @ Sina Corp.

Key excerpts:
1. Introduction
Three challenges in implementing the MapReduce framework on the GPU:
First, the synchronization overhead in the run-time system of the framework must be low.
Second, a fine-grained load balancing scheme is required.
Third, the core tasks of MapReduce, including string processing, file manipulation and concurrent reads and writes, are unconventional to GPUs and must be handled efficiently.
Each thread is responsible for a Map or a Reduce task with a small number of key/value pairs as input.
Performance improvement: 1.5-16 times

2. Preliminary and Related Work
2.1. Graphics Processors
It is desirable to schedule the tasks between the CPU and the GPU to fully exploit their computation power.
Given a kernel program, the occupancy of the GPU is the ratio of active schedule units to the maximum number of schedule units supported on the GPU.
The GPU has a hardware feature called coalesced access to exploit the spatial locality of memory accesses among threads.

2.2. GPGPU
2.3. MapReduce
Map: (k1, v1) -> (k2, v2)*
Reduce: (k2, v2*) -> v3*
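
The two signatures above can be illustrated with a small serial sketch. It assumes integer keys and values and a made-up keying rule (remainder classes); it shows the Map/Reduce contract only, not how Mars schedules the work on a GPU.

```fortran
program mapreduce_sketch
  implicit none
  integer, parameter :: n = 8, nkeys = 3
  integer :: input(n) = (/ 3, 7, 1, 4, 9, 6, 2, 5 /)
  integer :: keys(n), vals(n), counts(nkeys)
  integer :: i, k

  ! Map: each input element independently emits one (key, value) pair.
  do i = 1, n
     keys(i) = mod(input(i), nkeys) + 1   ! hypothetical key: remainder class
     vals(i) = 1                          ! value: one occurrence
  end do

  ! Reduce: combine all values that share a key.
  do k = 1, nkeys
     counts(k) = sum(vals, mask = (keys == k))
  end do

  print *, counts
end program mapreduce_sketch
```

Because every Map iteration touches only its own element, the first loop is the part that maps naturally onto one thread per task.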

3. Design and Implementation
3.1. Design Goals
3.2. System Workflow and Configuration
3.3. APIs
3.4. Implementation Techniques
Based on this compilation information and the total computation resources on the GPU, we set the number of threads per thread group and the number of thread groups to achieve a high occupancy at run time.

4. Evaluation
4.1. Experimental Setup


GP-GPU reading notes (4)

ZelluX — Sun, 10 Feb 2008 08:13:00 GMT

4.2. Data Structures

The GPU Memory Model
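Data are usually stored in 2D textures: a 1D texture can hold very little, and current GPUs cannot efficiently write to a slice of a 3D texture.
Iteration
The stream programming model includes an implicit parallel traversal of the stream.
Generalized Arrays via Address Translation
The main data structures in GPGPU programming are random-access multidimensional containers, including sparse and dense arrays. Each structure defines a virtual grid domain, a physical grid domain, and an address translator that converts between the two.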

4.2.1. Dense Arrays
Multidimensional arrays are usually mapped to 1D first and then to 2D.
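
A minimal sketch of such an address translator, assuming a 4x3x2 virtual grid and a physical texture six texels wide (both sizes, and all variable names, are invented for illustration):

```fortran
program address_translation
  implicit none
  integer, parameter :: nx = 4, ny = 3, nz = 2   ! virtual 3D grid (assumed sizes)
  integer, parameter :: tex_w = 6                ! physical 2D texture width (assumed)
  integer :: i, j, k, lin, u, v

  i = 2; j = 3; k = 2            ! a virtual 3D address (1-based)

  ! Step 1: 3D -> 1D, zero-based linear offset.
  lin = (i - 1) + nx * ((j - 1) + ny * (k - 1))

  ! Step 2: 1D -> 2D texel coordinates (zero-based column u, row v).
  u = mod(lin, tex_w)
  v = lin / tex_w

  print *, 'linear offset:', lin, ' texel:', u, v
end program address_translation
```

For the address (2,3,2) this gives linear offset 21 and texel (3,3); the inverse translation applies the same divisions and remainders in the opposite order.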
4.2.2. Sparse Arrays
There are two kinds, static and dynamic, depending on whether the positions and number of nonzero elements change.
4.2.3. Adaptive Structures


GP-GPU reading notes (3)

ZelluX — Sat, 09 Feb 2008 05:14:00 GMT

4. GPGPU Techniques
4.1. Stream Operations
4.1.1. Map
Given a stream of data elements and a function, map will apply the function to every element in the stream.
4.1.2. Reduce
Sometimes a computation requires computing a smaller stream from a larger input stream, possibly to a single element stream. This type of computation is called a reduction. For example, computing the sum or maximum of all the elements in a stream.
On GPUs, reductions can be performed by alternately rendering to and reading from a pair of textures.
In other words, divide and conquer: keep swapping the input and output buffers, and each pass shrinks the data by a fixed ratio.
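
The ping-pong idea can be sketched on the CPU with two buffers standing in for the pair of textures. This is an illustration under assumptions (power-of-two length, a max reduction, invented names), not GPU code:

```fortran
program pingpong_reduce
  implicit none
  integer, parameter :: n = 8            ! assumed power of two
  real :: buf(n, 2)                      ! two "textures": columns 1 and 2
  integer :: src, dst, m, i

  buf(:, 1) = (/ 3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0 /)
  src = 1; dst = 2
  m = n
  do while (m > 1)
     m = m / 2
     ! One pass: each output element combines two input elements.
     do i = 1, m
        buf(i, dst) = max(buf(2*i - 1, src), buf(2*i, src))
     end do
     ! Swap roles: the output of this pass is the input of the next.
     src = 3 - src
     dst = 3 - dst
  end do
  print *, 'maximum =', buf(1, src)
end program pingpong_reduce
```

With eight elements the loop runs three passes (8 to 4 to 2 to 1), and the iterations of the inner loop within one pass are independent of each other.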
4.1.3. Scatter and Gather
If the write and read operations access memory indirectly, they are called scatter and gather respectively.
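
Fortran's vector subscripts give a compact way to picture the two operations. The arrays and index values below are made up for illustration:

```fortran
program gather_scatter
  implicit none
  integer :: idx(3) = (/ 4, 1, 3 /)
  real :: a(5) = (/ 10.0, 20.0, 30.0, 40.0, 50.0 /)
  real :: g(3), s(5)

  g = a(idx)          ! gather: indirect read, g = (40, 10, 30)
  s = 0.0
  s(idx) = g          ! scatter: indirect write, s = (10, 0, 30, 40, 0)
  print *, g
  print *, s
end program gather_scatter
```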
4.1.4. Stream Filtering
This stream filtering operation is essentially a nonuniform reduction.
4.1.5. Sort
Classic sorting algorithms are data-dependent and generally require scatter operations.
The main algorithms are all related to sorting networks; there is also an adaptive sort whose cost depends on how ordered the input sequence already is.
4.1.6. Search
4.2. Data Structures


GP-GPU reading notes (2)

ZelluX — Fri, 08 Feb 2008 08:05:00 GMT

2.4 GPU Program Flow Control
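The latest GPUs support several forms of branching, but because of their highly parallel nature these branches must be used with care.
2.4.1 Hardware Mechanisms for Flow Control
Three main implementations:
Predication: not a true data-dependent branch
MIMD branching
SIMD branching: all processors execute the same instruction at any time, so the branch choice should be the same at every point
2.4.2 Moving Branching Up The Pipeline
2.4.2.1 Static Branch Resolution
Static analysis to avoid branches inside loops. The example given solves a partial differential equation on a discrete spatial grid; I did not fully follow it, but roughly it splits the loop into two parts.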
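
One way to read "splitting the loop into two parts" is sketched below on a 1D grid. This is an assumed interpretation, not the survey's example: the boundary test is removed from the loop body by handling boundary and interior cells in separate pieces of code.

```fortran
program static_branch
  implicit none
  integer, parameter :: n = 6
  real :: u(n), a(n), b(n)
  integer :: i

  u = (/ 1.0, 2.0, 4.0, 8.0, 16.0, 32.0 /)

  ! Version A: the boundary test runs on every iteration.
  do i = 1, n
     if (i == 1 .or. i == n) then
        a(i) = u(i)                              ! boundary: keep value
     else
        a(i) = 0.5 * (u(i - 1) + u(i + 1))       ! interior: neighbour average
     end if
  end do

  ! Version B: the branch is resolved statically by splitting the domain.
  b(1) = u(1)
  b(n) = u(n)
  do i = 2, n - 1
     b(i) = 0.5 * (u(i - 1) + u(i + 1))
  end do

  print *, 'same result:', all(a == b)
end program static_branch
```

On a GPU the same split would correspond to issuing separate passes for interior and boundary regions, so that every fragment in a pass takes the same path.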
2.4.2.2 Pre-computation
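Sometimes the result of a branch is constant over a period of time or across several iterations. In that case it only needs to be recomputed when the result is known to change.
2.4.2.3 Z-Cull
Modern GPUs have a set of techniques for avoiding work on pixels that will never be seen; Z-cull is one of them. Put simply, Z-cull discards fragments that fail the depth test (covered along the Z axis). In fluid simulation, marking land-locked obstacle cells with a Z depth of 0 skips the computation for those cells.
2.4.2.4 Data-Dependent Looping With Occlusion Queries
Likewise a technique for avoiding work on invisible pixels.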

3 Programming Systems
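GPU architectures evolve so quickly that profiling and tuning have to be handled by the GPU vendors.
3.1 High-level Shading Languages
Cg, HLSL: very close to the underlying hardware
OpenGL Shading Language: has some features that do not map directly to hardware, such as integer support
Sh, Ashli, ...
3.2 GPGPU Languages and Libraries
The languages above all require the programmer to write code from the point of view of geometric primitives. The following systems try to abstract GPGPU functionality and hide the underlying GPU implementation.
Brook: the one I was dealing with a few weeks ago
Scout, Glift: never heard of either...
3.3 Debugging Tools
GPU debugging support is very limited. A debugger must be able to show debugging information for many pixels at a given moment. One printf-style approach is to render the values directly to the screen (which, for a GPGPU program, would presumably just garble the display >,<).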


GP-GPU reading notes (1)

ZelluX — Thu, 07 Feb 2008 08:31:00 GMT

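The lab's winter break assignment =_=
No.1
A Survey of General-Purpose Computation on Graphics Hardware
in EUROGRAPHICS 2005

1. Why GP-GPU?
1.1 Powerful and Inexpensive
High memory bandwidth: Nvidia GeForce 6800 Ultra - 35.2GB/sec
Strong compute: ATI X800 XT - 63GFLOPS, Intel Pentium4 SSE unit(3.7GHz) - 14.8GFLOPS
Cutting-edge process technology: the most recently announced GPU (as of the survey's publication) contains 300 million transistors and is built on a 0.11-micron process
Rapid growth: the GeForce 6800 has twice the throughput of the 5900. GPU compute capability has grown on average by 1.7x per year (pixels/second) and 2.3x per year (vertices/second), whereas the corresponding CPU figure under Moore's law is about 1.4x per year. Roughly speaking, GPU performance doubles every six months.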
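
As a quick check on these growth figures, an annual growth factor r corresponds to a doubling time of ln 2 / ln r years. The worked arithmetic below uses only the rates quoted above:

```latex
T_{\text{double}} = \frac{\ln 2}{\ln r}, \qquad
\begin{aligned}
r = 1.7 &: \; T \approx \frac{0.693}{0.531} \approx 1.31 \text{ years (pixels/second)} \\
r = 2.3 &: \; T \approx \frac{0.693}{0.833} \approx 0.83 \text{ years (vertices/second)} \\
r = 1.4 &: \; T \approx \frac{0.693}{0.336} \approx 2.06 \text{ years (CPU)}
\end{aligned}
```

Doubling every six months would mean r = 4 per year, so the six-month figure reads as a rough, generation-to-generation statement rather than a consequence of the averaged annual rates.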

1.2 Flexible and Programmable

1.3 Limitations and Difficulties
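The GPU's compute power comes from its highly specialized architecture, so many applications are not suited to it. Word processing is one example: it is dominated by memory traffic and is hard to parallelize.
Today's GPUs also lack some basic computing features, such as integer arithmetic. Many support only 32-bit floating point (the recent R670 instruction set apparently can handle the double type), which rules out a lot of scientific computing on the GPU.
Even for problems that do fit these characteristics, using the GPU in practice has its difficulties. The GPU programming model is very different, and efficient GPU programming is more than learning one more high-level language. To exploit the GPU today, the programmer needs both the relevant scientific computing knowledge and computer graphics knowledge. Even so, the performance gains on offer remain very tempting.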

1.4 GPGPU Today
http://gpgpu.org
Some applications of GPGPU include
Dense and sparse matrix multiplication (numerical computing)
Multigrid and conjugate-gradient solvers for systems of partial differential equations (numerical computing)
Ray tracing (image processing)
Photon mapping (image processing)
Fluid mechanics solvers (physical simulation)
Datamining operations (databases/data mining)

2. Overview of Programmable Graphics Hardware
2.1 Overview of the Graphics Pipeline
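Today's GPUs all use an architecture called the graphics pipeline. The pipeline is divided into stages, and in hardware each stage is implemented in a task-parallel machine organization.

2.2 Programmable Hardware
GPU vendors have turned the fixed-function pipeline into a more flexible, programmable one, mainly in the geometry stage and the fragment stage. The original fixed operations are replaced by user-defined vertex programs and fragment programs.
In general, these programmable stages read a limited number of 4-component vectors of 32-bit floats and output a limited number of 4-component vectors of 32-bit floats. Each programmable stage can access constant registers and can also read and write its own registers.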

2.3 Introduction to the GPU Programming Model
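A typical GPGPU program uses the fragment processor as its compute engine. The usual structure is:
a. The programmer identifies the parallel parts of the application. The application is split into independent parallelizable sections, each treated as a kernel and implemented as a fragment program. The input and output of each kernel are one or more data arrays, stored as textures in GPU memory. In stream terminology, the data in the textures form streams, and the kernel processes each stream element separately.
b. Before invoking a kernel, the computation range must be specified; the programmer does this by passing vertex data to the GPU. Note that the GPU's performance is limited when processing 1D arrays.
c. The rasterizer generates a fragment for each pixel.
d. Every fragment is processed by the same active kernel program. A fragment program can read from arbitrary global memory but can write only to the frame buffer location determined by the rasterizer. (I have not fully understood this part yet.)
e. The output of each fragment is a value or a vector of values. It can serve as the final result of the program or be saved as a texture for later computation; complex applications usually need multiple pipeline passes (multipass).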


Drafting a plan

ZelluX — Wed, 16 Jan 2008 03:58:00 GMT

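The remaining two weeks
I am mainly responsible for the Fortran -> IL part
The main questions

Once Fortran has been lowered to High WHIRL, how do we emit IL?
    1. Look at Brook and see whether its code can be reused
    2. Or try converting WHIRL directly to Brook IR and then call its routines to generate IL automatically?

How do we call CAL from Fortran?
    1. How does F77 call library functions?
    2. What is the call overhead?

Optimization-related papers; CC has already collected a few
    1. Alan Leung at the 6th Workshop on Compiler-Driven Performance
    2. RapidMind Development Platform
    3. LiquidSIMD

Other questions
    1. How do we control the tradeoff that decides whether to offload work to the GPU? Or should it be controlled dynamically?

That is all I can think of for now; one step at a time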


vectorization

ZelluX — Wed, 02 Jan 2008 11:07:00 GMT

Summary: http://wikipedia.answers.com/vectorization


Now I have to read Fortran, of all things

ZelluX — Sun, 16 Dec 2007 13:03:00 GMT

It involves optimizing some of the programs in spec2006, orz
Posting some material

Fortran guide

A quick-start guide to Fortran

Some advice on learning Fortran

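2006-8-6
I assume everyone has some knowledge of C; Fortran is actually not very different from C.
Here is what I consider a reasonable and quick way to learn Fortran.
When learning Fortran you will run into Fortran 77, Fortran 90, and so on. The two differ little; I suggest learning Fortran 90 or later, which is more free-form (this is what matters for ordinary use; the other advantages may not show), and it will also help later when you study Fortran packages.
Most people learn Fortran in order to program and compute, not for its own sake. So my advice is not to work through a detailed textbook from cover to cover as you would for C. Everyone already has a decent C background and little time to study this in depth, so it is better to read a short tutorial (I will attach some), learn the basic statements, and then start from the simplest programs. You will quickly get a feel for Fortran's format and can begin writing your own programs. I suggest the following order:
1. Write some programs with only input and output, then try combining input and output with files (reading data from and writing data to files);
2. Then learn conditionals and loops; a few examples are enough to master them;
3. Next, write subprograms, that is, procedure calls. By then, after reading my first example (PROGRAM A), you should be able to write a simple program with procedure calls. At this point the basics are done and you can write structurally complex programs;
4. Finally, learn how to compile multiple source files, and even mixed-language builds (for example, compiling C and Fortran sources together). I am not familiar with multi-file compilation, so I will leave that to comrade siriusbobo :-)
Looking things up when you hit a problem while programming is a good approach; there is no need to learn everything up front.
Of course, if you have the time and energy, I strongly recommend reading a textbook properly and not rushing; a solid foundation is always a good thing.
Fortran's advantage over C lies in its wealth of existing resources; C's advantages are perhaps that it is more concise and compiles more efficiently. For my everyday use, though, neither set of advantages and disadvantages shows. My own feeling is that Fortran is closer to ordinary scientific notation, somewhat more rigorous, easier to read without mistakes, and more natural; declaring variables and functions is also more convenient and flexible than in C. Take the use of an external procedure as an example: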
******************************************************************************
PROGRAM A
real z
read *,z
call f(z)
y=z
print *,y
end
subroutine f(x)
x=x**2
return
end
******************************************************************************
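You only need to add a "subroutine" program unit, and the main program can invoke it with "call". You can of course write several subroutines, and one subroutine can invoke another through "call".
For everyday learning, besides writing subroutines, the other commonly used feature is file reading and writing: read with "read" and write with "write", as follows: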
******************************************************************************
PROGRAM B
real x
open (1,file='in.dat',status='unknown')
open (2,file='out.dat',status='unknown')
read (1,100) x
100 format (1e12.7)
close(1)
write (2,200) x
200 format (1e15.8)
close(2)
end
******************************************************************************
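Using "*" selects the default form; for details, consult the help or reference material. A good habit is to keep a test program at hand to check what you have learned or what you suspect.
In the program above, "100" and "200" are statement labels. Labels exist to make jumps between statements easy and can implement loops and conditional control, but they are discouraged for the sake of structured programming. Jumping with a goto statement and a statement label looks like this: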
******************************************************************************
PROGRAM C
integer n
real z
n=0
read *,z
1 call f(z)
y=z
n=n+1
if (n<10) goto 1
print *,y
end
subroutine f(x)
x=x**2
return
end
******************************************************************************
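Jumps like this are common in F77 and rare from F90 onward, but statements such as "100 format (1e12.7)" are still used often. They describe the format of data being stored or read and can be placed anywhere in the program; for details, see the documentation.
About comments:
Fortran comments use "!" or "C". On Windows one generally compiles with "Compaq Visual Fortran", which has two formats: "Free Format", which produces ".f90" files, and "Fixed Format", which produces ".for" files. Only ".for" files accept both comment styles ("!" or "C"); ".f90" files accept only "!".
About the difficulties of learning:
Algorithms are indeed the soul of a language and the hardest part, but everyone has studied C and met plenty of algorithms. Whatever can be implemented in C is not hard to implement in Fortran, so this "soul" is not the main topic here.
The data types of constants, variables, and arrays, and the control of how those types are read and written, are where mistakes happen most often. Below I mainly cover what I think needs attention and the mistakes I have made or seen.
Like C, Fortran has integer (INTEGER), real (REAL), and double precision (REAL*8, REAL(8), or DOUBLE PRECISION) types, which matter a good deal in scientific computing. Taking reals as an example:
REAL is normally equivalent to REAL*4 or REAL(4), which is single precision;
double precision is written DOUBLE PRECISION in F77 and can be written REAL*8 or REAL(8) in F90. In high-precision computation, double precision variables are essential. An ordinary real constant can be written in decimal or exponent form, while a double precision constant is written in exponent form with the exponent letter E replaced by D, for example:
REAL: 100.0 or 1e2; in double precision it must be written 1D2
Because Fortran does not require every variable to be declared, a statement like the following is sometimes placed at the start of each program or subprogram:
IMPLICIT DOUBLE PRECISION(A-H,O-Z)
This means that variables whose names begin with A-H or O-Z are double precision by default (when not declared); the others (I-N) are integer, as follows: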
******************************************************************************
PROGRAM D
IMPLICIT DOUBLE PRECISION(A-H,O-Z)
J1=1D-2
J2=-0.5D-1
x=J1+J2
print *,x
end
******************************************************************************
PROGRAM E
implicit double precision (A-I,O-Z)
double precision a,i,e1,e2
data j2 /0.87450547081842D-3/
data j3 /-0.11886910646016D-4/
data j5 /-0.17242068505339D-5/
data j7 /0.10566966079622D-6/
write(*,*) "please input a"
read(*,*) a
write(*,*) "please input i"
read(*,*) i
e1=(j3*sin(i)/(2*a*j2)-5*j5*sin(i)*(1-7*sin(i)**2/2+21*sin(i)**4/8)&
&/(2*a**3*(2-5*sin(i)**2/2))+35*j7*sin(i)*(1-27*sin(i)**2/4+99&
&*sin(i)**4/8-429*sin(i)**6/64)/(3*a**5*(2-5*sin(i)**2/2)))
e2=-(j3*sin(i)/(2*a*j2)-5*j5*sin(i)*(1-7*sin(i)**2/2+21*sin(i)**4/8)&
&/(2*a**3*(2-5*sin(i)**2/2))+35*j7*sin(i)*(1-27*sin(i)**2/4+99&
&*sin(i)**4/8-429*sin(i)**6/64)/(3*a**5*(2-5*sin(i)**2/2)))
write(*,"(E9.2E3)") e1,e2
stop
end
******************************************************************************
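The first program prints 0.000000000000000E+000 rather than -0.04.
The second program, for any input a and i, does not produce the desired result but prints NAN and NAN. About this NAN error: sometimes, when an argument falls outside a function's domain, the program does not report an error at run time but prints NAN; those are the places to check first. There are other causes too, but I have not met many, so I can only share what I know.
Both programs fail because variables beginning with J are not double precision by default, yet they are assigned double precision values, so the result differs from what was expected. Declare the variables beginning with J as REAL*8, or change
implicit double precision (A-I,O-Z) to
implicit double precision (A-J,O-Z),
and you get the expected result. (Merely removing the implicit statement is not enough, since J would still default to integer.)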
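
A minimal corrected version of PROGRAM D, as a sketch: it uses implicit none with explicit declarations, which is one of several possible fixes and sidesteps the default typing rules entirely.

```fortran
program d_fixed
  implicit none
  double precision :: j1, j2, x

  j1 = 1d-2          ! 0.01, now stored as double precision
  j2 = -0.5d-1       ! -0.05
  x = j1 + j2
  print *, x         ! about -0.04
end program d_fixed
```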
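Arrays can be defined with DIMENSION. Note, however, that when no type declaration is given (and implicit none is not in effect), an array defined with DIMENSION whose name begins with a letter outside (A-H,O-Z) holds integer values. The same happens under a declaration such as implicit double precision (A-I,O-Z).
For example: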
******************************************************************************
PROGRAM F
dimension m(2)
m(1)=1.5
m(2)=2.5
print *,m(1),m(2)
end
******************************************************************************
The output is "1, 2" rather than "1.500000, 2.500000".
When m is changed to a in the program, the output is "1.500000, 2.500000".
So a better approach is to define arrays with REAL (REAL*8 also works):
******************************************************************************
PROGRAM G
real m(2)
m(1)=1.5
m(2)=2.5
print *,m(1),m(2)
end
******************************************************************************
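One more point: a variable can be assigned without being declared, but this leads to the problems seen in PROGRAM D-E above. I therefore recommend declaring non-integer variables when programming. It is tedious but less error-prone, and this kind of error can puzzle beginners for a long time.
When declaring variables you will often see two forms. Taking REAL as an example, there are
real m
and real:: m
The first form cannot assign an initial value in the declaration; it must be written like this: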
******************************************************************************
PROGRAM H
real m
m=1.0
print *,m
end
******************************************************************************
The second form can:
******************************************************************************
PROGRAM I
real:: m=1.0
print *,m
end
******************************************************************************

Some free Fortran compilers

Free Fortran Compilers

Source: http://www.thefreecountry.com/compilers/fortran.shtml
This page lists free Fortran compilers for various operating systems. Some of the compilers are compliant with the ANSI Fortran 77 specifications, others with Fortran 95, and so on. Some of them may also come complete with debuggers, editors and an integrated development environment (IDE).

If you need a book on Fortran, you may want to check out the selection of books available at Amazon.com.

Disclaimer

The information provided on this page comes without any warranty whatsoever. Use it at your own risk. Just because a program, book or service is listed here or has a good review does not mean that I endorse or approve of the program or of any of its contents. All the other standard disclaimers also apply.

Free Fortran Compilers and IDEs

Sun Studio Compilers and Tools: Sun Studio Compilers and Tools for Linux and Solaris OS on Sparc and x86/x64 platforms includes command line tools as well as a NetBeans-based IDE for developing, compiling and debugging C, C++ and Fortran programs. It also includes performance analysis tools.
Intel Fortran Compiler for Linux: The Intel Fortran Compiler for Linux is free for personal, non-commercial use (registration required). It features an optimizing compiler, the Intel Debugger (GUI and command-line), mixed language support (C and Fortran), full compliance with the ISO Fortran 95 standard, support for the evolving Fortran 2003 standard, multi-threaded application support (OpenMP and auto-parallelization), ability to handle big-endian data files, compatibility with various Linux tools (like make, gdb and Emacs), substantial compatibility with Compaq Visual Fortran, etc. The optimizing compiler supports interprocedural optimization, profile guided optimization, automatic vectorizer, etc.
G95: G95 is an open source Fortran 95 compiler. At the time this was written, most of the ISO Fortran 95 standard has been implemented. Platforms supported include Linux(x86, Intel IA64, AMD x86_64), Windows, Macintosh OS X, FreeBSD, Sparc Solaris and HP-UX.
Gfortran: gfortran is a Fortran 95 compiler. It runs on Linux and Windows (under cygwin).
Salford FTN95 Fortran 95 Compiler: Salford FTN95 is a Fortran 95 compiler that supports Fortran 77, Fortran 90 and Fortran 95. The compiler generates executables for Win32 (both Win32 console and GUI applications) and the Microsoft .NET framework. It comes with CHECKMATE, a tool that lets programmers check the correctness of their code at runtime. Also included is Plato 3 (an IDE), full source level debugging, documentation and examples. You may only generate code for your personal use on your home computer, and all executables will display a banner on execution.
Salford FTN77 PE ANSI Fortran 77 Compiler: The Salford FTN77 PE (Personal Edition) comes with a full optimising ANSI Fortran 77 compiler with support for various common extensions (including MIL-STD-1753), linker, libraries, make utility, librarian and a full screen debugger. The compiler has a built-in assembler for inline assembly, and the ability to link with code from other sources (such as C++, Fortran 90 and Fortran 95 code). It is free for personal use and for use by students. It supports Windows 95, 98 and NT.
Open Source Watcom / OpenWatcom Fortran Compiler: The Watcom (now OpenWatcom) Fortran 77 compiler is now available free of charge, complete with source code. This compiler, which generates code for Win32, Windows 3.1 (Win16), OS/2, Netware, MSDOS (16 and 32 bit), etc, was a well-known compiler some years back (until Sybase terminated it).
MinGW's G77 (GNU Fortran): This system comes with the GNU G77 Fortran compiler (among other things, including a C/C++ compiler), which you can use to generate Win32 executables from F77 code. Like many systems based on the GNU tools, Mingw32 comes complete with various programming tools, such as a program maintenance program (ie, make), text processing tools (sed, grep), lexical analyser generator (flex), parser generator (bison), etc.
DJGPP GNU G77 (Fortran 77) for MSDOS: This is a development system based on the well-known GNU compiler system that includes compilers for Fortran 77, C, C++, Objective C, etc. It generates 32 bit MSDOS executables that are Windows 95 long-filename-aware. It is a very complete system with IDEs, graphics libraries, lexical analyser generators (flex), parser generators (bison), text processing utilities (like grep, sed), a program maintenance utility (ie, make), a dos extender, and so on. The compiler, utilities and libraries come with source code.
f2j - Fortran to Java Compiler: f2j translates Fortran 77 source code to Java class files. It is distributed under the GNU GPL and runs on Linux, SunOS/Solaris.
F2C - Fortran to C Translator: This is a well-known Fortran to C converter that comes with source code. The site also includes pre-compiled binaries (executables) for MSDOS and Microsoft Windows, although these are by no means the only systems supported - the compiler works on Unix systems like BSD, Linux, etc. You have to compile the compiler yourself on those systems. Libraries containing the runtime support needed (together with the C source code) are also included. You need a C compiler to generate binaries from your Fortran sources.
FORCE Project - Fortran Compiler and Editor: FORCE is actually just an IDE for Fortran 77 that integrates the GNU Fortran 77 compiler (G77).
Emx/Rsx G77 (GNU Fortran): This is another GNU Fortran port. The RSX port compiles DOS extended console applications for Win32 and the EMX port generates MSDOS extended applications as well as OS/2 applications. The compiler supports the Fortran 77 syntax.
Lcc-Win32 Fortran Compiler: LCC-Win32 is primarily a free C compiler and its programming environment for Win32, but it also appears to have a Fortran compiler available for download from their website. It apparently compiles Fortran 77 code (with some common extensions) to C which is subsequently compiled by the C compiler to generate a Win32 native executable. The entire process is integrated seamlessly into the IDE so you might not even realise that intermediate C files were being generated (they are deleted automatically when they are no longer needed). The IDE supports syntax highlighting in C and Fortran.
Compaq Fortran for Linux Alpha: This Fortran compiler is for Linux Alpha systems only. It implements the full Fortran-95 language as well as a few language extensions. It comes with a debugger (ladebug), an extended maths library (the Compaq Extended Math Library, CXML) containing technical and scientific subroutines. The licence for the free version allows it to be used for personal and educational purposes, and prohibits its use in any commercial venture.

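My summary of basic Fortran usage

Author: gator

Contents:
I. Notes
II. Overview
III. Data types and basic input/output
IV. Flow control
V. Loops
VI. Arrays
VII. Functions
VIII. Files

I. Notes
Most of this document is my notes on Peng Guolun's "Fortran 95 Programming" (《Fortran 95 程序设计》). I read only up to chapter 9, mainly chapters 3~9, which cover the most basic usage (the book has 16 chapters). These notes mostly record the differences between Fortran and C that I noticed while reading, chiefly in syntax. I hope they help readers who know C but have never used Fortran. For a clearer picture, I recommend the book itself; the author writes very well and very clearly. With a C background, the first nine chapters go quickly; a day or two is enough. If you read this document patiently, you should be able to use the basic features without trouble. Also, since I had never used Fortran before and skimmed the book in a hurry to finish a document, I did not have time to think most things through and simply followed the author's explanations. These notes are therefore still armchair theory. If anything is wrong, please point it out. Thank you!
In the original post, code was set in blue; text after ! is a comment.

II. Overview
1. Terminology
Fortran = Formula Translator/Translation
The name tells you its character: it translates text close to mathematical notation into machine language. Indeed, IBM designed it from the start to ease numerical computation and scientific data processing, and its powerful array operations serve that goal. Fortran laid the foundation for the development of high-level languages. Today Fortran is widely used in scientific research and mechanical engineering.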

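2. Main versions of Fortran and their differences
Over its history Fortran has seen many versions. The ones in wide use today are Fortran 77 and Fortran 90. Fortran 90 adds many useful features to Fortran 77 and improves the source layout, so Fortran 90 is recommended for new code. Because much existing code exists only in Fortran 77, it is worth knowing the basics of 77, at least enough to read such programs. Some layout differences between 77 and 90 follow.
Fortran 77: fixed format; source file extension .f or .for
(1) A line beginning with C, c, or * is treated as a comment;
(2) The first six characters of each line cannot hold program code; they may be left blank, or columns 1~5 may hold a numeric statement label (used for formatted input/output and so on); columns 7~72 are the code area; everything from column 73 on is ignored;
(3) A long statement can be continued; column 6 of a continuation line must hold any character other than "0" or a blank.
Fortran 90: free format; extension .f90
(1) "!" introduces a comment;
(2) A line may hold up to 132 characters; statement labels go at the start of the line;
(3) "&" continues a line; it is placed at the end of the line being continued, and optionally also at the start of the next line.
Everything below discusses Fortran 90.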

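3. Some features of Fortran and differences from C
There are many, as the specific topics below show. Here are just a few.
(1) Case-insensitive
(2) No semicolon is needed at the end of a statement
(3) Extra blanks between keywords and other tokens carry no meaning
(4) Unlike C, Fortran does not use { }
(5) There are additional complex and logical data types. For example, the complex type:
complex :: a  ! how to declare a complex variable. Complex numbers are clearly convenient for scientific computing and meet engineering needs
a=(1.0,2.0)   ! a=1+2i
(6) There is an exponentiation operator (**). The exponent can be a real as well as an integer, e.g. square root and cube root:
a=4.0**0.5, a=8.0**(1.0/3.0).
(7) Arrays support whole-array operations, and subsets of elements are easy to operate on
(8) In some situations arrays of deferred size can be declared, a very practical feature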
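4. Basic structure of a Fortran program
First, the so-called "Hello Fortran" program.
program main          ! program start; main is the program's name and is entirely user-chosen
write(*,*) "Hello"    ! main program body
stop                  ! terminate the program
end [program [main]]  ! end closes the code and marks the end of the program unit. Text in [ ] is optional, here and below.
Now a more practical program to give a feel for the language. It computes the surface area of a cylinder from the base radius and the height, and shows some characteristic Fortran usage. The program is taken from Wikipedia, or rather from a page on www.answers.com that quotes Wikipedia. The site is worth a look; you can find plenty of interesting things there.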
program cylinder        ! give the main program a name
! Calculate the area of a cylinder.
! Declare variables and constants.
! constants=pi
! variables=radius squared and height
  implicit none    ! Require all variables to be explicitly declared
  ! This is almost always written; it is explained further below.
  integer :: ierr
  character :: yn
  real :: radius, height, area
  real, parameter :: pi = 3.1415926536   ! how a constant is declared
  interactive_loop: do       ! a do loop; Fortran loops can carry a label,
                             ! such as interactive_loop in front of this do
    ! Prompt the user for radius and height
    ! and read them.
    write (*,*) 'Enter radius and height.'    ! screen output
    read (*,*,iostat=ierr) radius,height      ! keyboard input; the value of iostat tells whether the read succeeded
    ! If radius and height could not be read from input,
    ! then cycle through the loop.
    if (ierr /= 0) then
      write(*,*) 'Error, invalid input.'
      cycle interactive_loop          ! cycle corresponds to continue in C
    end if
    ! Compute area.  The ** means "raise to a power."
    area = 2 * pi * (radius**2 + radius*height)     ! exponentiation is more convenient than in C
    ! Write the input variables (radius, height)
    ! and output (area) to the screen.
    ! "&" marks a continued line; this statement also shows formatted output
    write (*,'(1x,a7,f6.2,5x,a7,f6.2,5x,a5,f6.2)') &
      'radius=',radius,'height=',height,'area=',area
    yn = ' '
    yn_loop: do             ! another do loop, nested inside
      write(*,*) 'Perform another calculation? y[n]'
      read(*,'(a1)') yn
      if (yn=='y' .or. yn=='Y') exit yn_loop
      if (yn=='n' .or. yn=='N' .or. yn==' ') exit interactive_loop
    end do yn_loop       ! end of the nested do loop
  end do interactive_loop
end program cylinder
That is the main structure of a Fortran program. Typically there are also module sections before the main program and functions after it.

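III. Data types and basic input/output
1. Data types, declarations, and initial values
(1) integer: short integer kind=2, long integer kind=4
integer([kind=]2) :: a=3
Declared as integer :: a, it defaults to the long integer.
! "::" is required when declaring and initializing at once; it is also required when the type name is followed by an attribute; otherwise it can be omitted
! As for attributes, consider declaring a constant:
real, parameter :: pi=3.1415926   ! parameter is the attribute
(2) real: single precision kind=4 (default), double precision kind=8
real([kind=]8) :: a=3.0
There is also the exponent form: 1E10 is single precision, 1D10 is double precision
(3) complex: single and double precision
complex([kind=]4) b
(4) character
character([len=]10) c  ! len is the maximum length
(5) logical
logical*2 :: d=.true. (equivalent to logical(2)::d=.true.)
(6) User-defined type (type): similar to struct in C
Fortran 77 commonly uses the DATA statement to give variables initial values; it can initialize several variables at once
data  a,b,string  /1, 2.0, 'fortran'/
Unlike C, Fortran variables can be used without being declared, because they have a default type (governed by the implicit statement). By default, variables beginning with i, j, k, l, m, n are integer and the rest are real. To turn this off, put implicit none before the declarations. Peng Guolun recommends always using it.
Another difference in declarations is Fortran's "equivalence declaration":
integer a,b
equivalence(a,b)
This makes a and b share the same memory. It can save memory and sometimes shortens code, e.g. equivalence(a variable with a very long name such as one element of a 3D array, a), after which writing the program with a is much more concise.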

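2. Basic input/output
Input: read(*,*) a           ! read from the keyboard
Output: write(*,*) "text"    ! write to the screen. Fortran 77 uses 'text'; Fortran 90 generally accepts both " " and ' '
print *,"text"               ! screen output only
(*,*) written in full is (unit=*,fmt=*). unit is the input/output location, such as the screen or a file; fmt is the format. If both are *, the defaults described above apply. The * after print means output in the default format.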

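IV. Flow control
1. Operators
(1) Relational operators
==    /=    >    >=   <    <=        ! Fortran 90 form
.EQ.  .NE.  .GT.  .GE.  .LT.  .LE.   ! Fortran 77 form
(2) Logical operators that combine conditions
.AND.  .OR.  .NOT.  .EQV.  .NEQV.
! Only .NOT. takes a single operand; the others need an expression on each side (which may be a logical variable)
! .EQV.: true when both sides have the same logical value; .NEQV.: true when they differ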

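2. IF
(1) Basic:
if (logical expression) then
  ...
end if
If only one statement follows then, it can be written as
if (logical expression) ...     ! then and end if can be omitted
(2) Multiple conditions:
if (condition 1) then
  ...
else if (condition 2) then
  ...
else if (condition 3) then
  ...
else
  ...
end if
(3) Nesting:
if (logical expression) then
  if (logical expression) then
    if (logical expression) then
    else if (logical expression) then
      ...
    else
      ...
    end if
  end if
end if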

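(4) Arithmetic IF:
program example
implicit none
real c
write (*,*)  "input a number"
read (*,*) c
if(c) 10,20,30 ! 10, 20, and 30 are statement labels; control goes to label 10/20/30 when c is less than/equal to/greater than 0
10    write (*,*)  "A"
goto 40        ! goto can jump to any label, earlier or later, but overuse destroys the program's structure
20    write (*,*)  "B"
goto 40
30    write (*,*)  "C"
goto 40
40    stop
end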

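3. SELECT CASE
Similar to C's switch statement
select case (variable)
case (value 1)  ! e.g. case(1:5) runs this block when 1<=variable<=5
  ...           ! case(1,3,5) runs this block when the variable equals 1, 3, or 5
case (value 2)  ! values in parentheses must be integer, character, or logical constants, not real
  ...
case default
  ...
end select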
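
A small runnable instance of the construct, with an invented variable score and made-up ranges, may make the range syntax concrete:

```fortran
program grade
  implicit none
  integer :: score

  score = 72                       ! hypothetical input
  select case (score)
  case (90:100)
     print *, 'A'
  case (60:89)
     print *, 'pass'
  case (0:59)
     print *, 'fail'
  case default
     print *, 'out of range'
  end select
end program grade
```

Unlike C's switch there is no fall-through, so no break is needed after each case.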

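4. PAUSE, CONTINUE
pause suspends execution; pressing Enter resumes it
continue seems to have little use of its own; it can serve as a marker that closes off a block of code (for example, as the labelled last line of a Fortran 77 do loop)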

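V. Loops
1. DO
do counter=start, end, step   ! counter runs from start to end in increments of step;
  ...           ! each value of counter corresponds to one iteration. If step is omitted it is 1
  ...
  ...     ! the loop body needs no {}
  ...
end do
Fortran 77 does not terminate the loop with end do; it looks like this instead:
do label  counter=start, end, step   ! label is the statement label of the loop's last line
  ...
label       ...       ! this is the last line of the do loop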

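2. DO WHILE
do while (logical expression)
  ...
  ...
end do
Similar to C's while (logical expression) {...}.
The loop in the cylinder surface area program at the start probably also belongs here, although it is controlled by if statements inside the body. That evidently works, though I did not see it written that way in the book. It arguably belongs to the following kind.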

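3. I did not find a loop statement corresponding to C's do {...} while (logical expression);, but the following guarantees at least one iteration:
do while (.true.)
  ...
  ...
  if (logical expression) exit  ! exit is like break in C. C's continue is cycle in Fortran
end do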
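
A concrete sketch of this bottom-tested loop, using a plain do (equivalent here to do while (.true.)) and made-up variables i and total:

```fortran
program do_until
  implicit none
  integer :: i, total

  i = 0
  total = 0
  do
     i = i + 1
     total = total + i
     if (total >= 10) exit     ! test at the bottom, like C's do { } while (total < 10);
  end do
  print *, i, total
end program do_until
```

The body runs at least once because the exit test comes after the work; here the loop stops with i = 4 and total = 10.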

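4. A Fortran specialty: named loops
Written like this, mistakes are less likely:
outer:  do i=1,3
inner:  do j=1,3
...
end do inner
end do outer
This is also possible and very convenient:
loop1: do i=1,3
loop2: do j=1,3
if(i==3) exit loop1     ! exit terminates the whole loop loop1
if(j==2) cycle loop2    ! cycle skips the rest of this iteration of loop2 and starts its next iteration
write(*,*) i,j
end do loop2
end do loop1
There are also loop constructs used mainly for array operations; they are specific to Fortran and very practical.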

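VI. Arrays
1. Declaring arrays
Unlike C, Fortran writes array indices inside ( ), and a multidimensional array still uses a single pair of ( ), e.g.
integer a(5)    ! a 1D integer array
real :: b(3,6)  ! a 2D real array
The element type can be integer, real, character, logical, or a type. Up to 7 dimensions are allowed.
The array size must be a constant. Unlike C, however, Fortran also has a way to use arrays whose size is set at run time:
integer, allocatable :: a(:)   ! declare an array of deferred size
Once the required size has been determined by some means, use:
allocate(a(size))   ! allocate the memory
From then on the array behaves exactly like one declared in the ordinary way.
Unlike C, Fortran indices start at 1 by default, and this can be changed in the declaration:
integer a(-3:1)     ! indices are -3, -2, -1, 0, 1
integer b(2:3,-1:3) ! b(2~3,-1~3) are the usable elements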
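
Both ideas, run-time sizing and custom index bounds, can be tried in a few lines. The size n = 4 below is an arbitrary stand-in for a value that would normally be computed or read at run time:

```fortran
program alloc_demo
  implicit none
  integer, allocatable :: a(:)
  integer :: b(-3:1)
  integer :: n, i

  n = 4                             ! size known only at run time (assumed value)
  allocate(a(n))
  a = (/ (i*i, i = 1, n) /)         ! fill with squares: 1 4 9 16
  print *, a
  print *, lbound(b, 1), ubound(b, 1), size(b)   ! -3 1 5
  deallocate(a)
end program alloc_demo
```

The intrinsics lbound, ubound, and size report the declared bounds, which is handy when the lower bound is not 1.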

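2. How arrays are stored in memory
Unlike C, a Fortran array such as a(2,2) is stored in memory in the order a(1,1),a(2,1),a(1,2),a(2,2). The rule is that the first (leftmost) index varies fastest. This is called column major.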
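
The storage order can be observed directly, since reshape fills an array in memory order. A minimal sketch:

```fortran
program column_major
  implicit none
  integer :: a(2,2)

  a = reshape((/ 1, 2, 3, 4 /), (/ 2, 2 /))
  print *, a(1,1), a(2,1), a(1,2), a(2,2)   ! 1 2 3 4: memory order
  print *, a(1,:)                            ! first row: 1 3
end program column_major
```

A practical consequence is that nested loops over a 2D array touch memory contiguously when the innermost loop runs over the first index.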

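3. Initial values
(1) The most common ways:
integer a(5)
data  a  /1,2,3,4,5/
or integer :: a(5)=(/1,2,3,4,5/)
With integer :: a(5)=5, all five elements are 5
For integer :: a(2,2)=reshape((/1,2,3,4/),(/2,2/))
by the storage order of array elements, this is equivalent to a(1,1)=1,a(2,1)=2,a(1,2)=3,a(2,2)=4
(2) A Fortran specialty: the implied do loop. The examples make it clear.
integer a(5)
integer i
data (a(i),i=2,4)/2,3,4/    ! (a(i),i=2,4) loops i from 2 to 4 with the default step 1
This also works:
integer i
integer :: a(5)=(/1,(2,i=2,4),5/)   ! the five elements are 1, 2, 2, 2, 5
integer :: b(5)=(/(i, i=1,5)/)      ! the five elements are 1, 2, 3, 4, 5
Implied loops can also be nested
data ((a(i,j),i=1,2),j=1,2) /1,2,3,4/ ! a(1,1)=1,a(2,1)=2,a(1,2)=3,a(2,2)=4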

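4. Whole-array operations
Let a, b, and c be arrays of the same type, rank, and size
a=5           ! every element is set to 5
a=(/1,2,3/)   ! assuming a is 1D with three elements: a(1)=1,a(2)=2,a(3)=3
a=b           ! element-wise assignment; a, b, c must have the same rank and size, here and below
a=b+c
a=b-c
a=b*c
a=b/c
a=sin(b)      ! intrinsic functions can all be used this way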

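5. Operating on part of an array
a is a 1D array
a(3:5)=(/3,4,5/)   ! a(3)=3,a(4)=4,a(5)=5
a(1:5:2)=3         ! a(1)=3,a(3)=3,a(5)=3
a(3:)=5            ! a(3) and every later element are set to 5
a(1:3)=b(4:6)      ! forms like this need the same number of elements on both sides
a(:)=b(:,2)        ! a(1)=b(1,2),a(2)=b(2,2), and so on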
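
A short program exercising these section forms, with arbitrary sizes and values chosen for illustration:

```fortran
program sections
  implicit none
  integer :: a(6), b(3,2), i

  a = 0
  a(1:5:2) = 3                                   ! a = 3 0 3 0 3 0
  a(5:) = 9                                      ! a = 3 0 3 0 9 9
  b = reshape((/ (i, i = 1, 6) /), (/ 3, 2 /))   ! second column is 4 5 6
  a(1:3) = b(:, 2)                               ! a = 4 5 6 0 9 9
  print *, a
end program sections
```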

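6. WHERE
where looks like if but is used only for array assignment. Given two arrays a and b of the same type, rank, and size:
where(a<3)
b=a          ! elements of a that are less than 3 are assigned to the corresponding elements of b
end where
Another example: where(a(1:3)/=0)  c=a  ! end where is omitted because only one statement follows
! where constructs can be nested and, like do loops, can carry a name label.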
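
A minimal masked assignment, with made-up data, shows that elements failing the mask are left untouched:

```fortran
program where_demo
  implicit none
  integer :: a(5) = (/ 1, 5, 2, 7, 0 /)
  integer :: b(5)

  b = -1
  where (a < 3)
     b = a          ! only positions where a < 3 are assigned
  end where
  print *, b        ! 1 -1 2 -1 0
end program where_demo
```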

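7. FORALL
Somewhat like a for loop in C:
forall(triplet1[,triplet2 [,triplet3...]],mask)
where each triplet has the form i=2:6:2 and denotes a loop; if the last number is omitted the step is 1
For example:
forall(i=1:5,j=1:5,a(i,j)<10)
a(i,j)=1
end forall
Another example: forall(i=1:5,j=1:5,a(i,j)/=0) a(i,j)=1/a(i,j)
forall can also be nested, much like nested for loops in C.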
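
A small sketch with a 3x3 array and an index-based mask (the values are invented). Note that later Fortran standards mark forall as obsolescent in favour of do concurrent, so this is mainly useful for reading existing code:

```fortran
program forall_demo
  implicit none
  real :: a(3,3)
  integer :: i, j

  a = 0.0
  ! Off-diagonal elements only: the mask i /= j filters the index pairs.
  forall (i = 1:3, j = 1:3, i /= j) a(i,j) = 1.0 / real(i + j)
  ! Diagonal elements.
  forall (i = 1:3) a(i,i) = 1.0
  print '(3f8.4)', (a(i,:), i = 1, 3)
end program forall_demo
```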

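VII. Functions
Fortran has two kinds of procedures: subroutines and user-defined functions (function). A user-defined function is essentially a mathematical function: arguments are normally passed in and a function value is returned. A subroutine need not work this way and may have no return value. Argument types must match when passing arguments, just as in C.
1. Subroutines
Purpose: to separate out a frequently used piece of code with a specific job so that it can be called conveniently.
By convention subroutines are placed after the end of the main program.
Form:
subroutine name (parameter1, parameter2)
! Give the subroutine a meaningful name. Arguments can be passed, which provides a way to return values. The parentheses may also be left empty, meaning no arguments.
implicit none
integer:: parameter1, parameter2   ! the types of the received arguments must be declared
...                                ! from here on, writing the code is no different from the main program
...
return ! unlike in C, return carries no value; after the subroutine runs, control returns to the point of call and continues from there. It need not be at the end;
       ! it can appear elsewhere in the subroutine with the same effect, and code after an executed return is not run.
end [subroutine name]
Invocation: use the call statement directly; no declaration is needed. At the call site write:
call name(parameter1,parameter2)
Notes:
a. Subroutines can call one another, directly, just as the main program calls a subroutine.
b. Argument passing works differently from C. Fortran uses call by address/reference: the actual argument and the dummy argument in the subroutine share the same address, even if they have different names. So if the subroutine changes the value of a dummy argument, the actual argument changes too.
c. Variables defined inside each subroutine are independent, as in C. Statement labels are independent as well. Identical variable names or label numbers in different subroutines and the main program therefore do not interfere.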

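2. User-defined functions
The obvious difference from a subroutine is that a function must be declared in the main program before it can be used. The way it is invoked also differs. In addition, by convention a function does not change the values of its arguments; if argument values must be changed, a subroutine is customarily used.
Declaration: real, external :: function_name
User-defined functions are also usually placed after the main program.
Form:
function function_name(parameter1, parameter2)
implicit none
real:: parameter1, parameter2    ! declare the argument types; this is required
real::function_name         ! declare the return type; this is required
...
...
function_name=....    ! expression for the return value
return
end
The return type can also be declared directly like this, which is more concise:
real function function_name(parameter1, parameter2)
implicit none
real:: parameter1, parameter2   ! this is still required
...
...
function_name=....   ! expression for the return value
return
end
Invocation: function_name(parameter1,parameter2)
No call statement is needed.
User-defined functions can call one another; the callee must likewise be declared beforehand.
In short, a user-defined function must be declared before it is called, whereas a subroutine need not be.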
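
Putting the pieces together, a complete example with a hypothetical function circle_area might look like this. It follows the external-function style described above rather than the module style:

```fortran
program use_function
  implicit none
  real, external :: circle_area      ! declare the external function before use

  print *, circle_area(2.0)          ! about 12.566
end program use_function

real function circle_area(r)
  implicit none
  real :: r                          ! argument type must still be declared
  circle_area = 3.1415926 * r**2     ! assigning to the function name sets the result
  return
end function circle_area
```

The argument 2.0 is written as a real constant on purpose; passing the integer 2 to a real dummy argument through an implicit interface would not be converted.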

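3. Variables in procedures
(1) Mind the type correspondence. Fortran even lets you pass numeric constants, but you get the intended result only if the type matches the dummy argument. For example, call ShowReal(1.0) must use 1.0, not 1.
(2) Array arguments are passed by address, as in C, but the address need not be that of the first element; it can be the address of a specified element. For example, with an array a(5), call function(a) passes the address of a(1), while call function(a(3)) passes the address of a(3).
(3) For a multidimensional array argument, the opposite of C applies: the size of the last dimension may be omitted and the sizes of the other dimensions must be given. This follows from Fortran's column-major storage.
(4) In a procedure, an array that is a dummy argument can have its size given by a variable in the declaration, or even have no specified size. For example:
subroutine Array(num,size)
implicit none
integer:: size
integer num(size) ! an array whose size is determined by a passed argument. Very practical
...
...
return
end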
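(5) The save attribute: keeps the value of a procedure's variable after the call, so that on the next call the variable holds the value left from the previous one. Just add save in the declaration:
integer, save :: a=1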
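
A call counter is the classic use. In this sketch the subroutine name tick and the variable ncalls are invented for illustration:

```fortran
program save_demo
  implicit none
  integer :: i

  do i = 1, 3
     call tick()
  end do
end program save_demo

subroutine tick()
  implicit none
  integer, save :: ncalls = 0     ! initialised once, kept between calls
  ncalls = ncalls + 1
  print *, 'call number', ncalls
end subroutine tick
```

The initial value is applied only once, not on every call, so the three calls report 1, 2, and 3.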
（6）传递函数（包括自定义函数、库函数、子程序都是可以的）。类似于C中的函数指针需要在
主程序和调用函数的函数中都声明作为参数传递的函数。如
real, external :: function  ! user-defined function
real, intrinsic :: sin        ! library function
external sub                 ! subroutine
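(7) Interfaces for functions (interface): a block of program text. An interface is required in the following cases:
a. the function returns an array
b. arguments are passed by naming which argument they go to (keyword arguments)
c. the called function takes a variable number of arguments
d. pointer arguments are passed in
e. the function returns a pointer.
The usage is easy to follow with examples, but the examples are all long, so see the book.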
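
Points (2), (4), (5) and (6) can be tried together in one short program. This is only a sketch under assumed names: show, count_calls, apply and square are hypothetical helpers, not routines from the original notes.

```fortran
program args_demo
  implicit none
  real, external :: square       ! (6) function that will be passed as an argument
  integer :: a(5)
  integer :: i

  a = (/ (i, i = 1, 5) /)
  call show(a, 5)                ! (2) passes the address of a(1): show sees 1 2 3 4 5
  call show(a(3), 3)             ! (2) passes the address of a(3): show sees 3 4 5

  do i = 1, 3
    call count_calls()           ! (5) prints 1, then 2, then 3
  end do

  call apply(square, 3.0)        ! (6) apply evaluates square(3.0), i.e. 9.0
end program args_demo

subroutine show(num, n)
  implicit none
  integer :: n
  integer :: num(n)              ! (4) the array size comes from the argument n
  write(*,*) num
  return
end subroutine show

subroutine count_calls()
  implicit none
  integer, save :: calls = 0     ! (5) initialized once, value kept between calls
  calls = calls + 1
  write(*,*) calls
  return
end subroutine count_calls

subroutine apply(f, x)
  implicit none
  real, external :: f            ! (6) the received function is declared here as well
  real :: x
  write(*,*) f(x)
  return
end subroutine apply

real function square(x)
  implicit none
  real :: x
  square = x * x
  return
end function square
```

None of these cases needs an interface block, because every argument is a plain scalar, an explicit-size array or an external function.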

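4. Global variables

Their purpose needs no explanation. How they work: variables are matched by their relative position in the declaration, unlike C, where globals are matched by name.
If the main program declares:
integer :: a,b
common a,b   ! this is how global variables are defined
and a subroutine or user-defined function declares:
integer :: c,d
common c,d
then a and c share the same memory, and b and d share the same memory.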

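Having many global variables gets cumbersome. They can be grouped by hand: just put a block name after common in the declaration, for example common /group1/ a and common /group2/ b. That way not every global variable has to be listed on each use: declaring common /group1/ c is enough to make c refer to the same global variable as a.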
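
A minimal sketch of both forms, a blank common and a named block. The names common_demo, show_common, x and y are made up for the illustration.

```fortran
program common_demo
  implicit none
  integer :: a, b
  real :: x
  common a, b            ! blank common: matched purely by position
  common /group1/ x      ! named common block
  a = 1
  b = 2
  x = 3.0
  call show_common()
end program common_demo

subroutine show_common()
  implicit none
  integer :: c, d
  real :: y
  common c, d            ! c shares storage with a, d shares storage with b
  common /group1/ y      ! y shares storage with x
  write(*,*) c, d, y     ! shows the values 1, 2 and 3.0 set in the main program
  return
end subroutine show_common
```

Since matching is by position, swapping c and d in the subroutine's common statement would silently swap which value each one sees.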

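A block data program unit can be used. In the main program and in functions, the data statement mentioned earlier cannot be used directly to give global variables their initial values. They can be assigned values one by one; to use the data statement it has to be done like this:
block data [name]
implicit none
integer a,b,c
real d,e
common a, b, c
common /group1/ d,e
! note: the standard only permits initializing variables in named common blocks;
! initializing the blank common a, b, c here relies on a compiler extension
data a,b,c,d,e /1,2,3,4.0,5.0/
end [block data [name]]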

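5. Module
A module is not a function. It packages program units, usually grouping functions and variables with related purposes together. It is simple to use yet very convenient, and it keeps programs concise: for example, global variables no longer need a long declaration every time; put them in a module and just use it. Modules are usually written before the main program.
Form:
module module_name
……
……
end [module [module_name]]
Use: in the main program or a function, add the line use module_name before the declarations.
Functions in a module must come after the contains statement (write contains on a line of its own and start the functions below it; if there are several, all of them go after this one contains). Variables defined in the module can be used directly in the module's functions, the functions can call one another directly, and even user-defined functions in the module need no prior declaration when called.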
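
A minimal sketch of a module with one variable and one function. The names constants, pi and circle_area are hypothetical, chosen only for the example.

```fortran
module constants
  implicit none
  real :: pi = 3.14159             ! module variable, visible to everything that uses the module
contains
  real function circle_area(r)
    real :: r
    circle_area = pi * r * r       ! pi is used directly, without any declaration here
  end function circle_area
end module constants

program module_demo
  use constants                    ! must come before implicit none and the declarations
  implicit none
  write(*,*) circle_area(1.0)      ! no external declaration is needed for a module function
end program module_demo
```

Compared with the common blocks of the previous section, the variable is matched by name and its type is checked by the compiler.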

6. include can be placed anywhere it is needed and inserts another file (which normally has to be in the same directory). For example:

include 'funcion.f90'

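VIII. Files
1. Text files
Fortran has two ways of reading files, corresponding to two kinds of files:
Sequential access: used for text files
Direct access: used for binary files
Only the reading of text files is excerpted here. The general pattern is as follows.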
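character(len=20)::filenamein="in.txt", filenameout="out.txt"  ! file names
logical alive
integer::fileidin=10,fileidout=20
! 10 and 20 are the unit numbers given to the files. Any positive integer other than 1, 2, 5, 6
! will do, because 2 and 6 are the default output (screen) and 1 and 5 the default input (keyboard).
integer::error
real::in,out
! The next few lines check whether a file with the given name exists.
inquire(file=filenamein, exist=alive)  ! alive is set to .true. if the file exists
if(.NOT. alive) then
write(*,*) trim(filenamein), " doesn't exist."  ! trim removes the trailing blanks of
                                                ! filenamein so the output looks cleaner
stop
end if
open([unit=]fileidin, file=filenamein, status="old")
open([unit=]fileidout,file=filenameout[,status="new"])
! unit selects where input/output goes. An existing file must be opened with status="old", a new
! file with status="new"; if status is omitted it defaults to status="unknown", which overwrites
! an existing file or creates a new one……
read([unit=]fileidin, [fmt=]100,iostat=error )in    ! error=0 means the data was read correctly
100  format(1X,F6.3)
! Input and output follow a format, which can be written separately under a statement label or
! given directly inside read/write.
write([unit=]fileidout, "(1X,F6.3)")out
close(fileidin)
close(fileidout)
! 1X stands for one blank. F6.3 means a real value occupying 6 characters (including the decimal
! point), 3 of them after the decimal point.
! Also common: I3 for integers, 3 characters wide; A8 for character data, 8 characters wide.
! Use / for a new line.
Reading binary files works differently and is not covered here.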
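
The pattern above uses bracket notation for optional parts, so it does not compile as written. Below is a self-contained sketch of the same steps; the file name demo.txt and the variable names are assumptions for the example, and status="replace" is used so the program can be run repeatedly.

```fortran
program file_demo
  implicit none
  character(len=20) :: filename = "demo.txt"   ! hypothetical file name
  integer :: fileid = 10
  integer :: error
  logical :: alive
  real :: x

  ! write one number to a text file
  open(unit=fileid, file=filename, status="replace")
  write(fileid, "(1X,F6.3)") 3.142
  close(fileid)

  ! make sure the file exists before reading it back
  inquire(file=filename, exist=alive)
  if (.not. alive) then
    write(*,*) trim(filename), " doesn't exist."
    stop
  end if

  open(unit=fileid, file=filename, status="old")
  read(fileid, 100, iostat=error) x
100 format(1X,F6.3)
  if (error == 0) write(*,*) x      ! error == 0 means the read succeeded
  close(fileid)
end program file_demo
```

The same format, one blank followed by a six-character real field, is used for writing and for reading, so the value read back is the one that was written.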

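2. Internal files
Another very handy read/write feature is the internal file. This example makes it clear.
integer::a=1,b=2
character(len=20)::string
write(unit=string,fmt="(I2,'+',I2,'=',I2)")a,b,a+b
write(*,*)string
This prints " 1+ 2= 3" (each I2 field is two characters wide, so a single digit is padded with a blank). The reverse also works:
integer a
character(len=20)::string="123"
read(string,*)a
write(*,*)a
This prints 123.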

! End of the article.

Posted by ZelluX at 2007-12-16 21:03

Sampling

ZelluX — Fri, 14 Dec 2007 05:42:00 GMT

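The CAL sample programs contain many sample instructions. Here is a brief introduction found through Google:

Antialiasing

Making pixels smaller gives a finer image and reduces jagged edges to some degree, but as long as the pixels are large enough to be told apart from one another, jagged edges are unavoidable. Antialiasing generally works by sampling multiple points (note that these are "points", not "pixels"; the difference becomes clear below).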

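I. Theory and methods:

1. Oversampling:

(1) Method:

First, render the scene at a higher resolution than the display (the front buffer):

Suppose the current (front/back buffer) resolution is 800 × 600; the scene can first be rendered to a 1600 × 1200 render target (a texture).

Then derive the low-resolution result from the high-resolution render target:

The average color of each 2 × 2 block of pixels becomes the color of the final rendered pixel.

(2) Advantage: it noticeably reduces the distortion caused by aliasing.

(3) Disadvantages: it needs a larger buffer, and filling that buffer costs more performance;

several pixels have to be sampled for each output pixel, which lowers performance further.

Because of these drawbacks, D3D does not use this antialiasing method.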
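
The 2 × 2 averaging step can be sketched in a few lines. This is only an illustration on a tiny grayscale array, written in Fortran to match the other code in these notes; the array names hi and lo and the sizes are assumptions, not anything from D3D or CAL.

```fortran
program downsample_demo
  implicit none
  integer, parameter :: w = 2, h = 2        ! low-resolution size (tiny, for illustration)
  real :: hi(2*w, 2*h)                      ! hypothetical high-resolution grayscale image
  real :: lo(w, h)
  integer :: i, j

  ! fill the high-resolution image with a hard edge: 0 where the first index is 1, 1 elsewhere
  hi = 0.0
  hi(2:, :) = 1.0

  do j = 1, h
    do i = 1, w
      ! average of the 2 x 2 block that maps onto low-resolution pixel (i, j)
      lo(i, j) = sum(hi(2*i-1:2*i, 2*j-1:2*j)) / 4.0
    end do
  end do

  write(*,*) lo(:, 1)    ! 0.5 for the pixel straddling the edge, then 1.0
end program downsample_demo
```

The pixel that straddles the edge ends up with an intermediate value, which is exactly the smoothing that hides the jagged edge.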

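2. Multisampling:

(1) Method:

Each pixel is still sampled only once, but N points are taken inside each pixel (how many depends on the sampling pattern), and the final color of the pixel = the pixel's original color × number of points covered by the polygon / total number of sample points.

(2) Advantage: it reduces aliasing distortion without increasing the number of samples taken, and unlike oversampling it does not need a larger back buffer.

(3) Disadvantage: normally a polygon determines the color of a pixel only when it covers the pixel's center (in the pixel pipeline this typically means fetching the appropriate texture color and modulating it with the color output by the vertex pipeline). With multisampling, if the polygon covers some of the sample points but not the pixel center, the pixel color is still determined by that polygon. Texture addressing can then go wrong, and with a texture atlas this produces a different kind of artifact: wrong colors along polygon edges.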

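3. Centroid Sampling:

(1) Method:

To fix the texture addressing errors that multisampling causes with texture atlases, the color at the pixel center is no longer used as "the pixel's original color". Instead, the color at the center of those sample points of the pixel that the polygon covers is used. This guarantees that the point being shaded always lies inside the polygon (in other words, the texture address never falls outside the polygon).

(2) How to use it:

① Any Pixel Shader input with the COLOR semantic uses centroid sampling automatically;

② Append the _centroid modifier by hand to the semantic of a Pixel Shader input parameter, for example: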

float4 TexturePointCentroidPS( float4 TexCoord : TEXCOORD0_centroid ) : COLOR0
{
    return tex2D( PointSampler, TexCoord );
}

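(3) Note:

Centroid sampling is mainly meant for multisampling with texture atlases. When a whole texture maps onto a single polygon mesh, centroid sampling can introduce errors instead.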

Posted by ZelluX at 2007-12-14 13:42

Inter-Procedural Analysis resources (3)

ZelluX — Tue, 27 Nov 2007 07:24:00 GMT

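A tutorial on ORC (Open Research Compiler) that contains a good deal of IPA material:
http://www.blogjava.net/Files/zellux/ORC-PACT02-tutorial.rar

The second edition of the Dragon Book also seems to cover a lot about IPA optimizations and call graphs; time to chew through it.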

Posted by ZelluX at 2007-11-27 15:24

Inter-Procedural Analysis resources (2)

ZelluX — Mon, 26 Nov 2007 04:53:00 GMT

A paper from the High Performance Computing Tools Group, Computer Science Department, University of Houston:
Overview of the Open64 Compiler Infrastructure
VI.4. Interprocedural Analysis
Interprocedural Analysis (IPA) is performed in the following phases of Open64:
• Inliner phase
• IPA local summary phase
• IPA analysis phase
• IPA optimization phase
• IPA miscellaneous
By default the IPA does the function inlining in the inliner facility. The local summary phase is done in the IPL module and the analysis phase and optimization phase in the ipa-link module.
During the analysis phase, it does the following:
• IPA_Padding Analysis (common blocks Padding/Split Analysis)
• Construction of the Callgraph
Then it does space and multigot partitioning of the Callgraph. The partitioning algorithm takes into account whether it is doing partitioning for solving space or the multigot problem.
During the optimization phase the following phases are performed:
• IPA Global Variable Optimization
• IPA Dead function elimination
• IPA Interprocedural Alias Analysis
• IPA Cloning Analysis (It propagates information about formal parameters used as symbolic terms in array section summaries. This information is later used to trigger cloning.)
• IPA Interprocedural Constant propagation
• IPA Array_Section Analysis
• IPA Inlining Analysis
• Array section summaries arrays for the Dependence Analyzer of the Loop Nest Optimizer.

Posted by ZelluX at 2007-11-26 12:53

Inter-Procedural Analysis resources (1)

ZelluX — Sun, 25 Nov 2007 15:04:00 GMT

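I suddenly have to work on a related compiler optimization project, so I am posting some IPA material from overseas sites here first, since reaching foreign sites from the education network is inconvenient.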

GCC wiki:

Analysis and optimizations that work on more than one procedure at a time. This is usually done by making walking the Strongly Connected Components of the call graph, and performing some analysis and optimization across some set of procedures (be it the whole program, or just a subset) at once.

GCC has had a callgraph for a few versions now (since GCC 3.4 in the FSF releases), but the procedures didn't have control flow graphs (CFGs) built. The tree-profiling-branch in GCC CVS now has a CFG for every procedure built and accessible from the callgraph, as well as a basic IPA pass manager. It also contains in-progress interprocedural optimizations and analyses: interprocedural constant propagation (with cloning for specialization) and interprocedural type escape analysis.

IBM XL Fortran V10.1 for Linux:

Benefits of interprocedural analysis (IPA)

Interprocedural Analysis (IPA) can analyze and optimize your application as a whole, rather than on a file-by-file basis. Run during the link step of an application build, the entire application, including linked libraries, is available for interprocedural analysis. This whole program analysis opens your application to a powerful set of transformations available only when more than one file or compilation unit is accessible. IPA optimizations are also effective on mixed language applications.

Figure 2. IPA at the link step

The following are some of the link-time transformations that IPA can use to restructure and optimize your application:

Inlining between compilation units
Complex data flow analyses across subprogram calls to eliminate parameters or propagate constants directly into called subprograms.
Improving parameter usage analysis, or replacing external subprogram calls to system libraries with more efficient inline code.
Restructuring data structures to maximize access locality.

In order to maximize IPA link-time optimization, you must use IPA at both the compile and link step. Objects you do not compile with IPA can only provide minimal information to the optimizer, and receive minimal benefit. However when IPA is active on the compile step, the resulting object file contains program information that IPA can read during the link step. The program information is invisible to the system linker, and you can still use the object file and link without invoking IPA. The IPA optimizations use hidden information to reconstruct the original compilation and can completely analyze the subprograms the object contains in the context of their actual usage in your application.

During the link step, IPA restructures your application, partitioning it into distinct logical code units. After IPA optimizations are complete, IPA applies the same low-level compilation-unit transformations as the -O2 and -O3 base optimizations levels. Following those transformations, the compiler creates one or more object files and linking occurs with the necessary libraries through the system linker.

It is important that you specify a set of compilation options as consistent as possible when compiling and linking your application. This includes all compiler options, not just -qipa suboptions. When possible, specify identical options on all compilations and repeat the same options on the IPA link step. Incompatible or conflicting options that you specify to create object files, or link-time options in conflict with compile-time options can reduce the effectiveness of IPA optimizations.

Using IPA on the compile step only

IPA can still perform transformations if you do not specify IPA on the link step. Using IPA on the compile step initiates optimizations that can improve performance for an individual object file even if you do not link the object file using IPA. The primary focus of IPA is link-step optimization, but using IPA only on the compile-step can still be beneficial to your application without incurring the costs of link-time IPA.

Figure 3. IPA at the compile step

IPA Levels and other IPA suboptions

You can control many IPA optimization functions using the -qipa option and suboptions. The most important part of the IPA optimization process is the level at which IPA optimization occurs. Default compilation does not invoke IPA. If you specify -qipa without a level, or specify -O4, IPA optimizations are at level one. If you specify -O5, IPA optimizations are at level two.

Table 5. The levels of IPA (IPA level: behaviors)

qipa=level=0:
• Automatically recognizes standard library functions
• Localizes statically bound variables and procedures
• Organizes and partitions your code according to call affinity, expanding the scope of the -O2 and -O3 low-level compilation unit optimizer
• Lowers compilation time in comparison to higher levels, though limits analysis

qipa=level=1:
• Level 0 optimizations
• Performs procedure inlining across compilation units
• Organizes and partitions static data according to reference affinity

qipa=level=2:
• Level 0 and level 1 optimizations
• Performs whole program alias analysis which removes ambiguity between pointer references and calls, while refining call side effect information
• Propagates interprocedural constants
• Eliminates dead code
• Performs pointer analysis
• Performs procedure cloning
• Optimizes intraprocedural operations, using specifically: value numbering; code propagation and simplification; code motion, into conditions and out of loops; redundancy elimination techniques

IPA includes many suboptions that can help you guide IPA to perform optimizations important to the particular characteristics of your application. Among the most relevant to providing information on your application are:

lowfreq which allows you to specify a list of procedures that are likely to be called infrequently during the course of a typical program run. Performance can increase because optimization transformations will not focus on these procedures.
partition which allows you to specify the size of the regions within the program to analyze. Larger partitions contain more procedures, which result in better interprocedural analysis but require more storage to optimize.
threads which allows you to specify the number of parallel threads available to IPA optimizations. This can provide an increase in compilation-time performance on multi-processor systems.
clonearch which allows you to instruct the compiler to generate duplicate subprograms with each tuned to a particular architecture.

Using IPA across the XL compiler family

The XL compiler family shares optimization technology. Object files you create using IPA on the compile step with the XL C, C++, and Fortran compilers can undergo IPA analysis during the link step. Where program analysis shows that objects were built with compatible options, such as -qnostrict, IPA can perform transformations such as inlining C functions into Fortran code, or propagating C++ constant data into C function calls.

Posted by ZelluX at 2007-11-25 23:04