wueddie

The analysis of MOR(MXOR) instruction implementation in MMIXWare

The analysis of MOR(MXOR) instruction implementation in MMIXWare

-- A stupid way to understand the source code.

the implementation of MOR(MXOR) is in file: mmix-arith.w
436 octa bool_mult(y,z,xor)
437   octa y,z; /* the operands */
438   bool xor; /* do we do xor instead of or? */
439 {
440   octa o,x;
441   register tetra a,b,c;
442   register int k;
443   for (k=0,o=y,x=zero_octa;o.h||o.l;k++,o=shift_right(o,8,1))
444     if (o.l&0xff) {
445       a=((z.h>>k)&0x01010101)*0xff;
446       b=((z.l>>k)&0x01010101)*0xff;
447       c=(o.l&0xff)*0x01010101;
448       if (xor) x.h^=a&c, x.l^=b&c;
449       else x.h|=a&c, x.l|=b&c;
450     }
451   return x;
452 }

It takes me several hours to understand the details.

If we treat each octabyte as a matrix, each row corresponds to a byte, then
y MOR z = z (matrix_mulitiply) y

For a=((z.h>>k)&0x01010101)*0xff;
(z.h>>k)&0x01010101 will get the four last bit in (z.h>>k). depends on the bit in last row,
((z.h>>k)&0x01010101)*0xff will expand the bit (either 0 or 1) into the whole row.
e.g.
             ff
*     0x01010101
---------------
=           ff
         ff
         ff
     ff
----------------
=    ffffffff
(depending on the last bit in each row of z, the result could be #ff00ff00. #ff0000ff, etc.)

similarily, b=((z.l>>k)&0x01010101)*0xff; will expand the last bit in each byte into the
whole byte.

over all, after these two step, the z becomes the replication of it's last row, since k vary
from 0 to 7, it will loop on all the rows actually.

For c=(o.l&0xff)*0x01010101, it will get the last byte in o.l and populate it to other three byte.
since it will not only or/xor h but also l. it is not necessary populate it to o.h.

one example,
let (z.h>>k)&0x01010101 = 0x01000101, then a= 0xff00ffff;
let (z.l>>k)&0x01010101 = 0x01010001, then b= 0xffff00ff;
let (o.l&0xff)=0xuv, then c= 0xuvuvuvuv;
then a&c=0xuv00uvuv;
     b&c=0xuvuv00uv;

consider the elements [i,j] in result x. in this round, what value was accumalated in by operation
or(xor).
it is the jth bit in last byte of o.l & ith bit in last column of z.(do not consider looping now.)
in this round, the 64 combination of i and j, contirbute the value to the 64 bits in z.

Noticed that o loop on y from last byte to first byte. There are 8 loop/rounds, in another round.
say kth round.
the elements[i,j] will accumuate the jth bit in last (k + 1)th row & the jth bit in last (k+1)th
column.
that means the jth column in y multiply the ith row in z. it conform to the definiton for
z matrix_multiply y.

posted @ 2009-01-16 10:54 InPractice 阅读(276) | 评论 (0) | 编辑收藏

九连环游戏和数学

游戏和数学有密切的联系。最近在玩九连环，感受更深。

之所以开始玩九连环，是因为在高德纳的书中提到了格雷码和九连环的关系。为了理解生成格雷码的算法，特意买了九连环来玩。毕竟书上的
描述没有实际玩起来那么容易理解。

通过这个游戏，我不仅会解九连环了，而且掌握的生成格雷码的一种算法。

posted @ 2009-01-13 19:59 InPractice 阅读(400) | 评论 (0) | 编辑收藏

A piece of beautiful and trick bitwise operation code.

A detailed reading process of a piece of beautiful and trick bitwise operation code.

The following code is from MMIXWare, it is used to implement the Wyde difference between two octabyte.

    in file: "mmix-arith.w"
    423 tetra wyde_diff(y,z)
    424   tetra y,z;
    425 {
    426   register tetra a=((y>>16)-(z>>16))&0x10000;
    427   register tetra b=((y&0xffff)-(z&0xffff))&0x10000;
    428   return y-(z^((y^z)&(b-a-(b>>16))));
    429 }

It is hard to understand it without any thinking or verification, here is the process I used
to check the correctness of this algorithm.

let y = 0xuuuuvvvv;
     z = 0xccccdddd; (please note the [c]s may be different hex number.)

then y>>16 = 0x0000uuuu;
     z>>16 = 0x0000cccc;

then ((y>>16)-(z>>16)) = 0x1111gggg if #uuuu < #cccc or
     ((y>>16)-(z>>16)) = 0x0000gggg if #uuuu >= #cccc

so variable a = 0x00010000 if #uuuu < #cccc or
   variable a = 0x00000000 if #uuuu >= #cccc

similarly, we can get
   variable b = 0x00010000 if #vvvv < #dddd or
   variable b = 0x00000000 if #vvvv >= #dddd

for (b-a-(b>>16)))), there are four different result depending on the relation between a and b.
when #uuuu >= #cccc and #vvvv >= #dddd, (b-a-(b>>16)))) = 0x00000000;
when #uuuu >= #cccc and #vvvv < #dddd, (b-a-(b>>16)))) = 0x00001111;
when #uuuu < #cccc and #vvvv >= #dddd, (b-a-(b>>16)))) = 0x11110000;
when #uuuu < #cccc and #vvvv < #dddd, (b-a-(b>>16)))) = 0x11111111;
You can see that >= map to #0000 and < map to #1111

for y-(z^((y^z)&(b-a-(b>>16)))), when (b-a-(b>>16)))) is 0x00000000, z^((y^z)&(b-a-(b>>16))) is
z^((y^z)& 0) = z^0=z, so y-(z^((y^z)&(b-a-(b>>16))))=y-z.
similarily, when (b-a-(b>>16)))) is 0x11111111, z^((y^z)&(b-a-(b>>16))) is
z^((y^z)& 1) = z^(y^z)=y, so y-(z^((y^z)&(b-a-(b>>16))))=0.

when (b-a-(b>>16)))) is 0x11110000 or 0x11110000, we can treat the y and z as two separate wydes.
each wyde in the result is correct.

You may think it is a little stupid to verify such kind of details. but for my point of view,
without such detailed analysis, I can not understand the algorithm in the code. with the hard
work like this, I successfully understand it. The pleasure deserve the effort.

I am wondering how can the author discover such a genius algorithm.

posted @ 2009-01-06 16:04 InPractice 阅读(212) | 评论 (0) | 编辑收藏

儿子的梦话

昨天晚上，瑞瑞睡到十二点的时候，爬出被子，躺在外面。结果咳嗽得很厉害，吐了好几口。吃了枇杷膏很快就睡着了。

没过一会，他就开始做恶梦了。下面是他的梦话：
“爸爸的手不见了”
“爸爸掉下去了”
“我亲爱的爸爸不见了，我可怎么办阿”

估计是最近洪恩的GOGO学英语的DVD看多了。

posted @ 2008-11-19 10:34 InPractice 阅读(107) | 评论 (0) | 编辑收藏

Create a MMIX simulator in Java

After reading the <<MMIX: A RISC Computer for the New Millennium>>, I am ispired to Create a MMIX simulator in Java.
Donald Knuth already created a high quality MMIX simulater in C, why I still bother to creating a new one in Java.
First, I want to learn more about how the computer works. I think re-implement a simulator for MMIX can
help me gain a better understanding.
Second, I want to exercise my Java skills.

After about one month's work, I realize that I can not finish it by myself. I am looking for the help.
If you are interested in MMIX and know Java, Please give me a hand.

Currently I have finished most of the instructions, but some important and complex one are not completed
yet.
I have developed a few JUnit TestCase for some instructions, but it's way far from covering all the instructions (there are 256 instructions total).
Few of the sample MMIX program in Donald Knuth's MMIXware package, such as cp.mmo, hello.mmo can be
simulated successfully, but there are much more to support.

To help on this project, first you need the access to the current source code. It's hosted on Google
code. Please follow the steps below to access the source code.

Use this command to anonymously check out the latest project source code:
# Non-members may check out a read-only working copy anonymously over HTTP.
svn checkout http://mmix.googlecode.com/svn/trunk/ mmix-read-only

If you are willint to help, please comment on this blog with your email address.

posted @ 2008-11-07 14:39 InPractice 阅读(178) | 评论 (0) | 编辑收藏

My confusion about kernel and corresponding clarification.

There are many questions coming into my mind when I read the Linux kernel book and source code. As time goes by, I become more knowledgeable than before and can address those questions by myself, here is the first question addressed by myself.

Q: why kernel have to map the high memory in kernel space, why not just allocate the high memory and only map it in user process.

A: Because kernel also need to access the high memory before it returned the allocated memory to user process. For example, kernel must zero the page or initialized the page for security reason. Please refer to linux device driver page 9.

Q: why not let the clib zero the page or initialize it, it saves the kernel's effort and simplifies the kernel.

A: besides Requesting memory through clib, user program can also request memory through direct System call, in this situation, the security is not guaranteed, the information in memory will be leaked.

posted @ 2008-10-08 14:40 InPractice 阅读(159) | 评论 (0) | 编辑收藏

How to substitute text in the file.

9/26/2008 8:57AM
Today I want to research the different ways to substitute text in the file. For records, I written them down.

1. use Ultra Edit, it is super easy for a Windows user if you have Ultra Edit installed.
use Ctrl + R to get Replacement Wizard and follow you intuition.

2. use VI in Unix.
:s/xx/yy/
will replace xx with yy.

3. use filter, such as sed, and awk in Unix.
sed -e 's/xx/yy/g' file.in > file.out
replace xx with yy in all the lines. It seem sed will not change the original input file, so I redirect the out put to file.out

posted @ 2008-09-26 10:24 InPractice 阅读(83) | 评论 (0) | 编辑收藏

WYSIWYM vs WYSIWYG

1 WYSIWYM vs WYSIWYG

WYSIWYM stands for What You See is What You Mean; WYSIWYG stands for What You See is What You Get;

Microsoft -- Word is always considered as a example of WYSIWYG. Today I have a look at the tool named LyX, which is an example of WYSIWYM. From an end user's point of view, there are more similarity than difference between them.

They both display the the resulted layout on the fly; they both provide button to typeset the document.

The difference I can see between then is -- LyX use text file, while Word use binary file. But I don't think it matters.

In my humble opinion, the real difference between Word and LyX/LaTeX is as the following. In Word, you typeset in the lower level, you can control all the details but it also need more effort. In LyX/LaTex, you typeset in higher level, you only need to figure out the logic structure of the document. The resulted layout is not decided by you, you actually just share the layout developed by the expert. I think it is the key advantage of WYSIWYM.

posted @ 2008-08-15 15:24 InPractice 阅读(196) | 评论 (0) | 编辑收藏

Trouble shooting - Fail to send out mail from application server

Yesterday, we found that the application can not send mail successfully; the performance of the module using email feature is also very bad. I suspect it caused by that the mail server host name can not be resolved in the application server.

I executed the following command
host <mail server host name>
It shows a strange IP. It means it can not properly resolve the mail server host name

Then I execute the command below.
man host
The output tells me to resort to /etc/resolv.conf
open it with
vi /etc/resolv.conf

The context is as following:
nameserver <name server 1>
nameserver <name server 2>
update the config with correct DNS server IP.

Everything is OK.

P.S. It seems that the ping and host commands are different. For some host name, I can ping it but I can not host it.

posted @ 2008-08-13 17:57 InPractice 阅读(114) | 评论 (0) | 编辑收藏

The inelegance in Operating System

The reality is far from the idealism - the inelegance in Operating System

I am interested in Operating System, after I know more and more concepts, know more and more details, I realize that the reality is far from the idealism. The root cause is the history and to some extent, it is the back compatibility. we can not afford to make a brand new thing from scratch, we need to include many old things in any things.

Let me give some example about how the history make the current Operation System become complicated and inelegant.
1. DMA
DMA stands for Direct Memory Access, which is a way to improve the parallelism in computer system. Basically, with DMA, peripheral device can access main Memory simultaneously when CPU is running. but for historical reason, in X86 platform, some DMA device only have 24 bit address line. which limit the memory scope to 16M. since X86 platform is also lack of IO-MMU to remap the address, the memory can be used in DMA is [0,16M). It definitely complicated the memory management.

2. High Memory
Since Linux kernel has only 1G linear address space, it can not address all the 4G physical memory in 32 bit machine. This is actual a design issue in Linux for historical reason. it does not predict that some day, the physical memory will become so large. Later in order to support more than 1 G physical memory, CONFIG_HIGHMEM compile option was added. There are also other way to fix this problem, such as 4G kernel space v.s. 4G user space.

3. PAE
PAE stands for Physical Memory Extension, PAE make it possible to support up to 64G physical memory. but to me, it is just a temporary solution, does not deserve the effort. I even do not want to have a look on the corresponding document. It does not make too much sense. I prefer to directly move to 64 bit platform. 64 bit platform has its own problems though.

the above is just some inelegant in hardware. majorly cause by historical reason. I am wondering how can we keep up the quick development under the burden of history. maybe at some point, we finally need to throw away the history and move on with a brand new start.

posted @ 2008-08-06 10:58 InPractice 阅读(137) | 评论 (0) | 编辑收藏

仅列出标题

The analysis of MOR(MXOR) instruction implementation in MMIXWare

九连环游戏和数学

A piece of beautiful and trick bitwise operation code.

儿子的梦话

Create a MMIX simulator in Java

My confusion about kernel and corresponding clarification.

How to substitute text in the file.

WYSIWYM vs WYSIWYG

Trouble shooting - Fail to send out mail from application server

The inelegance in Operating System

导航

常用链接

留言簿(2)

随笔分类

随笔档案

搜索

最新评论

阅读排行榜

评论排行榜