wueddie

Basics of Virtual Memory Area

Virtual Memory Area

A Virtual Memory Area is also called a Memory Region in some books.

A process address space contains many memory areas. A run of contiguous addresses is split into separate memory areas wherever the access rights differ. For example, one Java process I looked at had 359 memory areas.

So the kernel needs an efficient way to insert into, remove from, and search the list of memory areas. The semantics of the find_area API are as follows.

return null if
    1. The list itself is empty.
    2. The list is not empty, and the address is beyond the end of the last memory area.

return the found area if
    1. The address is inside the region of one area.
    2. The address is not inside any area, but is not beyond the last area.
        This means it falls in a hole between areas; the area to the right of the hole is returned.
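
The semantics above can be sketched in user-space Java. This is only a toy model, not kernel code: the class name AreaListSketch, the Area type and the method names are made up for illustration, and a TreeMap keyed by each area's end address stands in for whatever structure the kernel really uses. Areas are assumed to be non-overlapping half-open ranges [start, end).

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of the find_area semantics; not kernel code.
final class AreaListSketch {
    // Hypothetical area type: covers the half-open range [start, end).
    static final class Area {
        final long start;
        final long end;
        Area(long start, long end) { this.start = start; this.end = end; }
    }

    // Areas keyed by their end address, so lookups are O(log n).
    private final TreeMap<Long, Area> byEnd = new TreeMap<>();

    void insert(Area area) { byEnd.put(area.end, area); }

    void remove(Area area) { byEnd.remove(area.end); }

    // Returns the first area whose end is above addr, or null if there is none.
    // The result either contains addr or is the area to the right of the hole.
    Area findArea(long addr) {
        Map.Entry<Long, Area> entry = byEnd.higherEntry(addr);
        return entry == null ? null : entry.getValue();
    }
}
```

A caller that needs to tell a hit from a hole can compare area.start with the address: if area.start is greater than the address, the address lies in the hole to the left of the returned area.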

posted @ 2008-08-05 17:34 InPractice

How the Linux kernel reduced the kernel stack size in 2.6

The kernel tries to use as few resources as possible. Here is an example: in kernel 2.4 the kernel stack is 8K; in kernel 2.6 it can be 4K if you enable that option at compile time.

Why would the kernel developers spend effort on such a feature when most PCs have more than 1 gigabyte of memory? I think it has something to do with the C10K problem, that is, serving ten thousand concurrent clients. Consider a system such as a web server that handles them with more than ten thousand processes (threads): saving 4K in every kernel stack adds up to 4K * 10K = 40M of memory in total, which is a big deal!

How is this possible? Originally the kernel mode stack of the current process was also used for interrupt handling and deferrable functions (softirqs), but these are not specific to any process. So in 2.6, with 4K stacks enabled, each CPU has its own hard IRQ stack and soft IRQ stack. The per-process kernel stack is used only while the process runs in kernel mode (system calls and exceptions), so the stack actually available to it did not shrink as much as the numbers suggest.
2.4     8K stack per process, shared by process kernel mode, exceptions and interrupts.
vs.
2.6     4K stack per process for kernel mode (system calls, exceptions)
        4K stack per CPU for hard IRQs
        4K stack per CPU for soft IRQs
Besides this, in the 8K stack of 2.4 the task_struct sits at the bottom of the stack and may cost about 1K. In the 4K stack of 2.6 only thread_info, which is about 50 bytes, sits at the bottom of the stack; the task_struct is allocated separately and reached through a pointer in thread_info.
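
Because the structure at the bottom of the stack sits at the lowest address of a stack whose size is a power of two, its address can be recovered from any stack pointer inside that stack by masking off the low bits, provided the stack is aligned to its own size. The arithmetic is sketched below in Java together with the saving estimate from above. The class and method names are made up for illustration, and the alignment is an assumption of the sketch, not something taken from the kernel source.

```java
// Illustration of the address arithmetic only; not kernel code.
final class KernelStackSketch {
    // Base of the stack area containing stackPointer. stackSize must be a
    // power of two and the stack must be aligned to stackSize (assumption).
    static long stackBase(long stackPointer, long stackSize) {
        return stackPointer & ~(stackSize - 1);
    }

    // Memory saved when every thread's kernel stack shrinks from oldSize to newSize.
    static long bytesSaved(long threads, long oldSize, long newSize) {
        return threads * (oldSize - newSize);
    }
}
```

For example, bytesSaved(10000, 8192, 4096) gives 40,960,000 bytes, which is the "40M" figure quoted above.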

posted @ 2008-08-01 09:26 InPractice

High level summary of my understanding on Linux Kernel Memory Management

Here is just a high-level summary of my understanding of Linux kernel memory management. I think it can help in getting a better understanding of the book Understanding the Linux Kernel.

It is said that memory management is the most complex subsystem in the Linux kernel, yet there are not many system calls for it. That is because most of the complex mechanisms, such as COW (Copy On Write) and on-demand paging, happen transparently to the user process. For a user process to successfully access a linear memory address, the following are necessary:
    The vm_area_struct (Virtual Memory Area, Memory Region) is set up correctly.
    Physical memory is allocated.
    The Page Global Directory, the Page Table, and the corresponding entries are correctly set up according to the Virtual Memory Area and the physical memory.

These three factors can be further simplified as
    Virtual memory
    Physical memory
    Mapping between virtual memory and physical memory.

From the user process's perspective, only virtual memory is visible. When a user process asks for memory, it gets virtual memory; physical memory may not be allocated yet. All three factors are managed by the kernel and can be thought of as three resources. The kernel needs to manage virtual memory not only in the user address space but also in the kernel address space.

When a user process tries to use its virtual memory but the physical memory is not allocated yet, a page fault exception happens. The kernel handles it, allocates the physical memory and sets up the mapping. The user process then re-executes the instruction and everything goes on smoothly. This is called on-demand paging.
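
The fault-then-map sequence can be illustrated with a toy model in Java. It is a sketch under simplifying assumptions, not how the kernel is implemented: the class and method names are made up, a HashMap stands in for the page tables, frames are handed out from a simple counter, and the check against the virtual memory areas is left out.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of on-demand paging; not how the kernel is implemented.
final class DemandPagingSketch {
    static final int PAGE_SHIFT = 12; // 4K pages

    // "Page table": virtual page number -> physical frame number.
    private final Map<Long, Long> pageTable = new HashMap<>();
    private long nextFreeFrame = 0;
    private int pageFaults = 0;

    // Translates a virtual address, allocating a frame on first touch.
    long access(long virtualAddress) {
        long page = virtualAddress >>> PAGE_SHIFT;
        Long frame = pageTable.get(page);
        if (frame == null) {
            // Page fault: allocate physical memory and set up the mapping,
            // after which the access is retried and succeeds.
            pageFaults++;
            frame = nextFreeFrame++;
            pageTable.put(page, frame);
        }
        long offset = virtualAddress & ((1L << PAGE_SHIFT) - 1);
        return (frame << PAGE_SHIFT) | offset;
    }

    int pageFaults() { return pageFaults; }
}
```

Touching the same page twice costs only one fault in this model; the second access finds the mapping already in place.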

Besides that, there are many more concepts, such as memory mapping and non-linear memory mapping. I will continue this article when I dig into the details.

posted @ 2008-07-29 12:20 InPractice

One Interesting Usage of PS command.

ps -H -A
shows the relationship between all the processes in a tree format. It is helpful when you want to explore the internals of UNIX.

init
   keventd
   ksoftirqd/0
   bdflush
   kswapd

We can see from the above that all the processes are children of init (directly or indirectly). In particular, the kernel threads are also children of the init process.
Process 0 is special; it is not displayed.

From the following:
sshd
    sshd
      sshd
        bash
          vim
            cscope
    sshd
      sshd
        bash
          ps
we can see how ssh works. I had actually created two ssh sessions to the server.

posted @ 2008-07-28 15:51 InPractice

Virtual memory allocation in Java

According to the following description in Xusage.txt:
-Xms<size> set initial Java heap size
-Xmx<size> set maximum Java heap size

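java -Xms512M should allocate at least 512M of memory for Java, but when checked with top on Linux, the RSS and SIZE values are far smaller than 512M. My understanding is that when Java requests memory from the operating system, it uses the mmap2 or old_mmap system call. Neither of these actually allocates physical memory; they only allocate virtual memory. So the pre-allocated memory is only backed by physical memory when it is actually used.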

posted @ 2008-07-17 11:21 InPractice

summary of regular expression grammar

There is not much grammar. Here is an incomplete summary for future reference.
metacharacters
.                                any character
|                                or
()                                grouping
[]                                character class
[^]                                negative character class

greedy quantifiers
?                                optional
*                                any amount
+                                at least one
lazy quantifiers
??
*?
+?
possessive quantifiers
?+
*+
++

position related
^                                start of the line
\A                                start of the input
$                                end of the line
\Z                                end of the input
\<                                start of the word
\>                                end of the word
\b                                start or end of the word

non-capturing group                (?:Expression)
non-capturing atomic group        (?>Expression)
positive lookahead                (?=Expression)
negative lookahead                (?!Expression)
positive lookbehind                (?<=Expression)
negative lookbehind             (?<!Expression)

\Q start quoting
\E end quoting

mode modifier
(?modifier)Expression(?-modifier)
valid modifier
i            case insensitive match mode
x             free spacing
s            dot matches all match mode
m            enhanced line-anchor match mode
(?modifier:Expression)

comments:
(?#Comments)
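
A few of the constructs above can be tried out with java.util.regex. The helper below is only a convenience wrapper written for this summary (the class and method names are made up); it returns the first match or null. Not every construct in the list is available in every regex flavor, so treat it as a quick way to experiment rather than a reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Small helper for trying out regex constructs with java.util.regex.
final class RegexSummarySketch {
    // Returns the first substring of input matched by regex, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group() : null;
    }
}
```

With the input <a><b>, the greedy pattern <.+> matches the whole string while the lazy pattern <.+?> matches only <a>. A possessive quantifier never gives characters back, so the pattern ".*+" (including the quotes) finds nothing in "abc": the closing quote is consumed by .*+ and cannot be matched afterwards.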

posted @ 2008-07-16 17:06 InPractice

Kernel memory mapping summary

Today I finally became clear about the relationship between
fixed mapping
permanent kernel mapping
temporary kernel mapping
noncontiguous memory area mapping
(I feel that most of these names are not appropriate; to some extent they mislead the reader.)

The 4G linear virtual address space is divided into two major parts.
kernel space mapping     [3G, 4G)
user space mapping        [0, 3G)

The kernel space mapping is divided into more pieces
linear mapping [3G, 3G + 896M)
non-linear mapping [3G + 896M + 8M, 4G)
1. Fixed mapping (a misleading name; it should be compile-time mapping, since the virtual address is decided at compile time.)
2. Temporary mapping
3. Permanent mapping
4. Noncontiguous memory area mapping (vmalloc area)

The following diagram is for reference.

FIXADDR_TOP            (=0xfffff000)

                    fixed_addresses (temporary kernel mapping is part of it)
                     #define __FIXADDR_SIZE (__end_of_permanent_fixed_addresses << PAGE_SHIFT)

FIXADDR_START        (FIXADDR_TOP - __FIXADDR_SIZE)

                    temp fixed addresses (used in boot time)
                     #define __FIXADDR_BOOT_SIZE     (__end_of_fixed_addresses << PAGE_SHIFT)

FIXADDR_BOOT_START    (FIXADDR_TOP - __FIXADDR_BOOT_SIZE)

                    Persistent kmap area (4M)

PKMAP_BASE            ( (FIXADDR_BOOT_START - PAGE_SIZE*(LAST_PKMAP + 1)) & PMD_MASK )

                     2*PAGE_SIZE

VMALLOC_END            (PKMAP_BASE-2*PAGE_SIZE) or (FIXADDR_START-2*PAGE_SIZE)

                     noncontiguous memory area mapping (Vmalloc area)

VMALLOC_START        (((unsigned long) high_memory + 2*VMALLOC_OFFSET-1) & ~(VMALLOC_OFFSET-1))

high_memory            3G + MIN (896M, physical memory size)
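
The address arithmetic in the diagram can be checked with a few lines of Java. This is only a sketch: the class and method names are made up, PAGE_OFFSET is assumed to be the usual 3G split, and the fixed mapping formula (slots counted downward from FIXADDR_TOP, one page each) is how I understand the kernel's fix_to_virt to work.

```java
// Address arithmetic for the 32-bit x86 layout above; not kernel code.
final class KernelLayoutSketch {
    static final long PAGE_OFFSET = 0xC0000000L;           // 3G, start of kernel space (assumed)
    static final int PAGE_SHIFT = 12;                      // 4K pages
    static final long FIXADDR_TOP = 0xFFFFF000L;
    static final long VMALLOC_OFFSET = 8L * 1024 * 1024;   // 8M gap before the vmalloc area
    static final long LOWMEM_LIMIT = 896L * 1024 * 1024;   // size of the linear mapping

    // Virtual address where the linear mapping ends.
    static long highMemory(long physicalBytes) {
        return PAGE_OFFSET + Math.min(LOWMEM_LIMIT, physicalBytes);
    }

    // Same rounding as the VMALLOC_START macro.
    static long vmallocStart(long highMemory) {
        return (highMemory + 2 * VMALLOC_OFFSET - 1) & ~(VMALLOC_OFFSET - 1);
    }

    // Virtual address of fixed mapping slot idx; slots grow downward from FIXADDR_TOP.
    static long fixToVirt(int idx) {
        return FIXADDR_TOP - ((long) idx << PAGE_SHIFT);
    }
}
```

With 896M or more of physical memory, highMemory is 0xF8000000 and vmallocStart is 0xF8800000, which matches the [3G + 896M + 8M, 4G) range given for the non-linear mapping.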

Below is an excerpt of the source code.

#ifdef CONFIG_X86_PAE
#define LAST_PKMAP 512
#else
#define LAST_PKMAP 1024
#endif

#define VMALLOC_OFFSET (8*1024*1024)
#define VMALLOC_START   (((unsigned long) high_memory + \
                        2*VMALLOC_OFFSET-1) & ~(VMALLOC_OFFSET-1))

#ifdef CONFIG_HIGHMEM
# define VMALLOC_END    (PKMAP_BASE-2*PAGE_SIZE)
#else
# define VMALLOC_END    (FIXADDR_START-2*PAGE_SIZE)
#endif

enum fixed_addresses {
        FIX_HOLE,
        FIX_VDSO,
        FIX_DBGP_BASE,
        FIX_EARLYCON_MEM_BASE,
#ifdef CONFIG_X86_LOCAL_APIC
        FIX_APIC_BASE, /* local (CPU) APIC) -- required for SMP or not */
#endif
#ifdef CONFIG_X86_IO_APIC
        FIX_IO_APIC_BASE_0,
        FIX_IO_APIC_BASE_END = FIX_IO_APIC_BASE_0 + MAX_IO_APICS-1,
#endif
#ifdef CONFIG_X86_VISWS_APIC
        FIX_CO_CPU,     /* Cobalt timer */
        FIX_CO_APIC,    /* Cobalt APIC Redirection Table */
        FIX_LI_PCIA,    /* Lithium PCI Bridge A */
        FIX_LI_PCIB,    /* Lithium PCI Bridge B */
#endif
#ifdef CONFIG_X86_F00F_BUG
        FIX_F00F_IDT,   /* Virtual mapping for IDT */
#endif
#ifdef CONFIG_X86_CYCLONE_TIMER
        FIX_CYCLONE_TIMER, /*cyclone timer register*/
#endif
#ifdef CONFIG_HIGHMEM
        FIX_KMAP_BEGIN, /* reserved pte's for temporary kernel mappings */
        FIX_KMAP_END = FIX_KMAP_BEGIN+(KM_TYPE_NR*NR_CPUS)-1,
#endif
#ifdef CONFIG_ACPI
        FIX_ACPI_BEGIN,
        FIX_ACPI_END = FIX_ACPI_BEGIN + FIX_ACPI_PAGES - 1,
#endif
#ifdef CONFIG_PCI_MMCONFIG
        FIX_PCIE_MCFG,
#endif
#ifdef CONFIG_PARAVIRT
        FIX_PARAVIRT_BOOTMAP,
#endif
        __end_of_permanent_fixed_addresses,
        /* temporary boot-time mappings, used before ioremap() is functional */
#define NR_FIX_BTMAPS   16
        FIX_BTMAP_END = __end_of_permanent_fixed_addresses,
        FIX_BTMAP_BEGIN = FIX_BTMAP_END + NR_FIX_BTMAPS - 1,
        FIX_WP_TEST,
        __end_of_fixed_addresses
};

posted @ 2008-07-16 17:05 InPractice

Scaling your JEE application part 2 - reading notes

scale up     -     scale vertically
scale out     -     scale horizontally

scale out
1. Use shared-nothing clustering architectures
    As the article points out, session failover cannot completely avoid errors when failures happen, and it damages performance and scalability.

2. Use scalable session replication mechanisms
    The most scalable one is paired-node replication; the least scalable solution is using a database as the session persistence storage.

3. Use collocated deployment instead of a distributed one.

4. Shared resources and services
    Database servers, JNDI trees, LDAP Servers, and external file systems can be shared by the nodes in the cluster.

5. Memcached
    Memcached's magic lies in its two-stage hash approach. It behaves as though it were a giant hash table, looking up key = value pairs. Give it a key, and set or get some arbitrary data. When doing a memcached lookup, first the client hashes the key against the whole list of servers. Once it has chosen a server, the client then sends its request, and the server does an internal hash key lookup for the actual item data.

6. Terracotta
    Terracotta extends the Java Memory Model of a single JVM to include a cluster of virtual machines such that threads on one virtual machine can interact with threads on another virtual machine as if they were all on the same virtual machine with an unlimited amount of heap.

7. Use unorthodox approaches to achieve high scalability
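
The two-stage hash described under Memcached (item 5) can be sketched as follows. This is a toy in-process model, not the real memcached client or protocol: the class and method names are made up, each "server" is just a HashMap, and the server is picked with a plain modulo over String.hashCode, which is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the two-stage hash lookup; not the real memcached client or protocol.
final class TwoStageHashSketch {
    // Each "server" is modeled as an in-process hash table.
    private final List<Map<String, String>> servers = new ArrayList<>();

    TwoStageHashSketch(int serverCount) {
        for (int i = 0; i < serverCount; i++) {
            servers.add(new HashMap<>());
        }
    }

    // Stage 1 (client side): hash the key against the list of servers.
    int serverIndex(String key) {
        return Math.floorMod(key.hashCode(), servers.size());
    }

    // Stage 2 (server side): ordinary hash table access on the chosen server.
    void set(String key, String value) {
        servers.get(serverIndex(key)).put(key, value);
    }

    String get(String key) {
        return servers.get(serverIndex(key)).get(key);
    }
}
```

Because every client computes the same server index for the same key, no server needs to know about the others, which is what makes the scheme scale out.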

posted @ 2008-07-09 11:43 InPractice

Hibernate exception caused by defining both a get method and an is method for the same property

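Today I ran into a strange Hibernate problem. (I am using Hibernate 2.1, which is fairly old; I do not know whether the problem exists in Hibernate 3.)
Below is the exception stack trace that was caught.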
java.lang.ClassCastException: java.lang.Boolean
at net.sf.hibernate.type.StringType.set(StringType.java:26)
at net.sf.hibernate.type.NullableType.nullSafeSet(NullableType.java:48)
at net.sf.hibernate.type.NullableType.nullSafeSet(NullableType.java:35)
at net.sf.hibernate.persister.EntityPersister.dehydrate(EntityPersister.java:393)
at net.sf.hibernate.persister.EntityPersister.insert(EntityPersister.java:466)
at net.sf.hibernate.persister.EntityPersister.insert(EntityPersister.java:442)
at net.sf.hibernate.impl.ScheduledInsertion.execute(ScheduledInsertion.java:29)
at net.sf.hibernate.impl.SessionImpl.executeAll(SessionImpl.java:2382)
at net.sf.hibernate.impl.SessionImpl.execute(SessionImpl.java:2335)
at net.sf.hibernate.impl.SessionImpl.flush(SessionImpl.java:2204)
The strange part is that the program ran fine on Tomcat on my local machine, but failed as soon as it was deployed to the Linux server.

After careful analysis, I found that the object being stored defines both a get method and an is method. An example:
public class FakePO {
    String goodMan;
    public String getGoodMan() {
        return goodMan;
    }
    public void setGoodMan(String goodMan) {
        this.goodMan = goodMan;
    }
    public boolean isGoodMan(){
        return "Y".equalsIgnoreCase(goodMan);
    }
}
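I suspected that the derived helper method isGoodMan() was causing the problem. Tracing through the Hibernate 2 source code, I found that Hibernate 2 accesses the PO through the reflection API as follows.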

private static Method getterMethod(Class theClass, String propertyName) {
        Method[] methods = theClass.getDeclaredMethods();
        for (int i=0; i<methods.length; i++) {
            // only carry on if the method has no parameters
            if ( methods[i].getParameterTypes().length==0 ) {
                String methodName = methods[i].getName();

                // try "get"
                if( methodName.startsWith("get") ) {
                    String testStdMethod = Introspector.decapitalize( methodName.substring(3) );
                    String testOldMethod = methodName.substring(3);
                    if( testStdMethod.equals(propertyName) || testOldMethod.equals(propertyName) ) return methods[i];

                }

                // if not "get" then try "is"
                /*boolean isBoolean = methods[i].getReturnType().equals(Boolean.class) ||
                    methods[i].getReturnType().equals(boolean.class);*/
                if( methodName.startsWith("is") ) {
                    String testStdMethod = Introspector.decapitalize( methodName.substring(2) );
                    String testOldMethod = methodName.substring(2);
                    if( testStdMethod.equals(propertyName) || testOldMethod.equals(propertyName) ) return methods[i];
                }
            }
        }
        return null;
    }
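Reading the code above carefully, you can see that Hibernate simply iterates over the methods declared by the class and checks whether a name matches the property name; it does not check whether the method's return type matches the property's type. So in our example either the get method or the is method may be returned, depending on the order of the method list, and that order is not guaranteed at all. This also explains why the problem only shows up on certain platforms.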
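
The ambiguity can be reproduced with plain reflection, without any Hibernate classes. In the sketch below, candidateGetters follows the same name-matching idea as the Hibernate 2 code above but collects every match instead of returning the first one. The getterFor method is a hypothetical stricter lookup that also checks the return type; it is my own illustration, not something Hibernate provides.

```java
import java.beans.Introspector;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Reproduces the ambiguity with plain reflection; no Hibernate classes involved.
final class GetterAmbiguitySketch {
    // Same shape as the FakePO class in the post.
    static class FakePO {
        String goodMan;
        public String getGoodMan() { return goodMan; }
        public void setGoodMan(String goodMan) { this.goodMan = goodMan; }
        public boolean isGoodMan() { return "Y".equalsIgnoreCase(goodMan); }
    }

    // All zero-argument get/is methods whose name maps to propertyName.
    static List<Method> candidateGetters(Class<?> type, String propertyName) {
        List<Method> result = new ArrayList<>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.getParameterTypes().length != 0) continue;
            String name = m.getName();
            String suffix = name.startsWith("get") ? name.substring(3)
                          : name.startsWith("is") ? name.substring(2) : null;
            if (suffix != null && Introspector.decapitalize(suffix).equals(propertyName)) {
                result.add(m);
            }
        }
        return result;
    }

    // Hypothetical stricter lookup: also require the expected return type.
    static Method getterFor(Class<?> type, String propertyName, Class<?> propertyType) {
        for (Method m : candidateGetters(type, propertyName)) {
            if (m.getReturnType().equals(propertyType)) return m;
        }
        return null;
    }
}
```

For FakePO and the property goodMan there are two candidates, so a lookup that returns the first match depends on the order in which getDeclaredMethods happens to list them. Asking for the String-typed getter removes the ambiguity. Renaming the helper method so that it no longer matches the property name would also avoid the problem.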

posted @ 2008-07-07 12:14 InPractice

Reading the Linux kernel source - implementation of the write system call

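I have recently been reading the implementation of the write system call. Some details are still unclear to me, but I now have a reasonable understanding of the overall mechanism. Here is a summary.
I assume the most common case here and do not consider Direct IO. From a global point of view, writing content to a file takes the following steps.
1. sys_write copies the content the user process wants to write into the kernel's page cache for the file. sys_write itself ends here.
2. The pdflush kernel thread (periodically, or when a kernel threshold is reached) flushes the dirty cache pages; in fact it only submits IO requests to the underlying driver.
3. The IO requests are not executed synchronously; they are scheduled by the underlying driver, which issues the DMA commands.
4. When the physical IO completes, an interrupt notifies the kernel, and the kernel updates the IO status.
I have to go put my son to bed now. I will refine each part of the implementation when I have time.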
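
Steps 1 and 2 above, where the write returns once the data is in the cache and the flush happens later, can be illustrated with a toy model in Java. Nothing here is kernel code: the class and method names are made up, a HashMap stands in for the page cache and a plain list stands in for the driver's request queue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model of buffered writes and delayed write-back; not kernel code.
final class WriteBackSketch {
    private final Map<Long, byte[]> pageCache = new HashMap<>(); // page index -> data
    private final TreeSet<Long> dirtyPages = new TreeSet<>();
    private final List<Long> submittedIo = new ArrayList<>();    // stands in for the driver queue

    // Step 1: the "write call" only copies into the cache and marks the page dirty.
    int write(long pageIndex, byte[] data) {
        pageCache.put(pageIndex, data.clone());
        dirtyPages.add(pageIndex);
        return data.length;
    }

    // Step 2: the "flusher" submits IO requests for all dirty pages.
    int flush() {
        int submitted = dirtyPages.size();
        submittedIo.addAll(dirtyPages);
        dirtyPages.clear();
        return submitted;
    }

    int dirtyCount() { return dirtyPages.size(); }

    List<Long> submittedIo() { return submittedIo; }
}
```

After two calls to write, the model has two dirty pages and nothing submitted; only flush moves them to the queue, which mirrors the point that sys_write finishes before any physical IO starts.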

The call path of sys_write. (My Linux kernel version is 2.6.24 and the file system is ext3.)
asmlinkage ssize_t sys_write(unsigned int fd, const char __user * buf, size_t count)

vfs_write(file, buf, count, &pos);

file->f_op->write(file, buf, count, pos);
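Here file->f_op is the table of function pointers initialized when the file is opened; for the ext3 file system the write function is do_sync_write.
The key points of its implementation are below.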
for (;;) {
300                 ret = filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos);
301                 if (ret != -EIOCBRETRY)
302                         break;
303                 wait_on_retry_sync_kiocb(&kiocb);
304         }
305
306         if (-EIOCBQUEUED == ret)
307                 ret = wait_on_sync_kiocb(&kiocb);
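filp->f_op->aio_write(&kiocb, &iov, 1, kiocb.ki_pos); is the core of the implementation; for ext3 the function pointer points to ext3_file_write.
Line 307 waits for the IO to complete. IO completion here only means that the request has entered the IO queue, not that the physical IO has completed.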

generic_file_aio_write(iocb, iov, nr_segs, pos);

__generic_file_aio_write_nolock(iocb, iov, nr_segs, &iocb->ki_pos);

generic_segment_checks(iov, &nr_segs, &ocount, VERIFY_READ);

generic_file_buffered_write(iocb, iov, nr_segs, pos,ppos,count,written);

generic_file_direct_IO(WRITE, iocb, iov, pos, *nr_segs);

The call sequence below this point is still very long and I cannot digest it all at once. This is only for my own reference.

posted @ 2008-06-02 21:43 InPractice
