小明思考

回文字符串的切割问题

摘要: 给定一个字符串s，分割s使得每个子串都是回文的（即倒过来和原字符串是一样的，如aba）
求最少的分割次数。阅读全文

posted @ 2013-04-11 11:24 小明阅读(4246) | 评论 (3) | 编辑收藏

TopCoder MatchMaker解法

问题描述：

Problem Statement

THIS PROBLEM WAS TAKEN FROM THE SEMIFINALS OF THE TOPCODER INVITATIONAL

TOURNAMENT

DEFINITION

Class Name: MatchMaker

Method Name: getBestMatches

Paramaters: String[], String, int

Returns: String[]

Method signature (be sure your method is public): String[]

getBestMatches(String[] members, String currentUser, int sf);

PROBLEM STATEMENT

A new online match making company needs some software to help find the "perfect

couples". People who sign up answer a series of multiple-choice questions.

Then, when a member makes a "Get Best Mates" request, the software returns a

list of users whose gender matches the requested gender and whose answers to

the questions were equal to or greater than a similarity factor when compared

to the user's answers.

Implement a class MatchMaker, which contains a method getBestMatches. The

method takes as parameters a String[] members, String currentUser, and an int

sf:

- members contains information about all the members. Elements of members are

of the form "NAME G D X X X X X X X X X X"

* NAME represents the member's name

* G represents the gender of the current user.

* D represents the requested gender of the potential mate.

* Each X indicates the member's answer to one of the multiple-choice

questions. The first X is the answer to the first question, the second is the

answer to the second question, et cetera.

- currentUser is the name of the user who made the "Get Best Mates" request.

- sf is an integer representing the similarity factor.

The method returns a String[] consisting of members' names who have at least sf

identical answers to currentUser and are of the requested gender. The names

should be returned in order from most identical answers to least. If two

members have the same number of identical answers as the currentUser, the names

should be returned in the same relative order they were inputted.

TopCoder will ensure the validity of the inputs. Inputs are valid if all of

the following criteria are met:

- members will have between 1 and 50 elements, inclusive.

- Each element of members will have a length between 7 and 44, inclusive.

- NAME will have a length between 1 and 20, inclusive, and only contain

uppercase letters A-Z.

- G can be either an uppercase M or an uppercase F.

- D can be either an uppercase M or an uppercase F.

- Each X is a capital letter (A-D).

- The number of Xs in each element of the members is equal. The number of Xs

will be between 1 and 10, inclusive.

- No two elements will have the same NAME.

- Names are case sensitive.

- currentUser consists of between 1 and 20, inclusive, uppercase letters, A-Z,

and must be a member.

- sf is an int between 1 and 10, inclusive.

- sf must be less than or equal to the number of answers (Xs) of the members.

NOTES

The currentUser should not be included in the returned list of potential mates.

EXAMPLES

For the following examples, assume members =

{"BETTY F M A A C C",

"TOM M F A D C A",

"SUE F M D D D D",

"ELLEN F M A A C A",

"JOE M F A A C A",

"ED M F A D D A",

"SALLY F M C D A B",

"MARGE F M A A C C"}

If currentUser="BETTY" and sf=2, BETTY and TOM have two identical answers and

BETTY and JOE have three identical answers, so the method should return

{"JOE","TOM"}.

If currentUser="JOE" and sf=1, the method should return

{"ELLEN","BETTY","MARGE"}.

If currentUser="MARGE" and sf=4, the method should return [].

Definition

Class:

MatchMaker

Method:

getBestMatches

Parameters:

String[], String, int

Returns:

String[]

Method signature:

String[] getBestMatches(String[] param0, String param1, int param2)

(be sure your method is public)

================================================================我的代码=============

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class MatchMaker {
    enum GENDER{MALE,FEMALE};

    //"NAME G D X X X X X X X X X X"
    private static class Member{
        String name;
        GENDER gender;
        GENDER mate;
        String[] answers;
        int index;
        int matched = 0;
    }

    String[] getBestMatches(String[] members, String currentUser, int sf){
        List<Member> allMembers = new ArrayList<Member>();
        Member cu = null;
        for(int i=0;i<members.length;++i){
            String m = members[i];
            String[] c = m.split(" ");
            Member mem = new Member();
            mem.name= c[0];
            mem.gender = c[1].equals("M")?GENDER.MALE:GENDER.FEMALE;
            mem.mate = c[2].equals("M")?GENDER.MALE:GENDER.FEMALE;
            mem.index = i;
            mem.matched = 0;
            String[] answers = mem.answers = new String[c.length-3];
            for(int j=3;j<c.length;++j){
                answers[j-3] = c[j];
            }
            allMembers.add(mem);
            if(c[0].equals(currentUser)){
                cu = mem;
            }
        }
        List<Member> matched = new ArrayList<Member>();
        if(cu!=null){
            for(Member mem:allMembers){
                if(mem!=cu && mem.gender==cu.mate){
                    for(int i=0;i<mem.answers.length;++i){
                        if(mem.answers[i].equals(cu.answers[i])){
                            ++mem.matched;
                        }
                    }
                    if(mem.matched>=sf){
                        matched.add(mem);
                    }
                }
            }

            Collections.sort(matched, new Comparator<Member>(){
                public int compare(Member ma, Member mb) {
                    if(ma.matched!=mb.matched){
                        return mb.matched - ma.matched;
                    }
                    return ma.index-mb.index;
                }
            });

            String[] result = new String[matched.size()];
            for(int i=0;i<result.length;++i){
                result[i] = matched.get(i).name;
            }
            return result;
        }
        return new String[0];
    }
}

posted @ 2013-04-02 14:04 小明阅读(484) | 评论 (0) | 编辑收藏

有点难度的java笔试题

以下是我在上一家公司出的java笔试题，有些难度，感兴趣的同学可以做做看。

---Question---

1.What is the output of the following program?

public class Foo {

public static void main(String[] args){

Map<byte[], String> m = new HashMap<byte[], String>();

byte[] key = "abcd".getBytes();

m.put(key, "abcd");

System.out.println(m.containsKey(key));

System.out.println(m.containsKey("abcd"));

System.out.println(m.containsKey("abcd".getBytes()));

}

a) true,true,false b)true,false,false c)true,true,true d) false,false,false e)Program throws an exception

2. What is the proper string filled in the following program?

Public class Foo {

public static void main(String[] args) {

String s=”1\\2\\3\\4”;

//split the string with “\”

String []result = s.split(“____”);

for(String r:result){

System.out.println(r);

}

a) \ b) \\ c) \\\ d)\\\\ e)\\\\\

3. What is the output of the following program?

public class Foo {

public static void main(String[] args) {

char[] c = new char[] { '1' };

String s = new String(c);

System.out.println("abcd" + c);

System.out.println("abcd" + s);

}

a) Compile error b)abcd1,abcd1 c) abcd49,abcd1 d) Program throws an exception e)none of above

4. Which class is threading safe which one object can be used between multi-threads without extra synchronized?

a) Vector b) HashMap c) ArrayList d)StringBuilder e)HashSet

5. What is the output of the following program?

public class Foo {

public static void main(String[] args) throws IOException {

ByteArrayOutputStream baos = new ByteArrayOutputStream();

byte[] b = new byte[]{(byte)0x0,(byte)0x1,(byte)0x2};

baos.write(b);

baos.write(0x0102);

byte[] result = baos.toByteArray();

ByteArrayInputStream bais = new ByteArrayInputStream(result);

System.out.println(bais.available());

}

a) 0 b) 1 c)4 d) 5 e) Program throws an exception

6. What is return value of function “calc”?

public class Foo {

public static int calc() throws IOException{

int ret = 0;

try{

++ret;

throw new IOException("try");

}

catch(IOException ioe){

--ret;

return ret;

}

finally{

++ret;

return ret;

}

a) 0 b) 1 c)2 d)3 e) throws an exception

7. What is the output of the following program?

public class Foo {

public static class Value {

private int value;

public int get(){

return value;

}

public void set(int v){

value = v;

}

public static class Values implements Iterable<Value>{

public Values(int capacity){

this.capacity = capacity;

}

int count =1 ;

int capacity;

Value v = new Value();

public Iterator<Value> iterator() {

return new Iterator<Value>(){

public boolean hasNext() {

return count<=capacity;

}

public Value next() {

v.set(count++);

return v;

}

public void remove() {

throw new UnsupportedOperationException();

}

};

}

public static void main(String[] args) {

Values vs = new Values(10);

Value result = null;

for(Value v:vs){

if(result == null){

result = v;

}

else{

result.set(result.get()+v.get());

}

System.out.println(result.get());

}

a) 20 b)40 c)45 d)55 e)throws NullpointerException

8. If add keyword “final” before a class member function, it means:

a) The method can’t access the non-final member variable.

b) The method can’t modify the member variable.

c) The method can’t be override by subclass.

d) The method is a thread-safe function.

e) The method can’t be accessed by other non-final function.

9. About java memory and garbage collector, which statement is correct?

a) Moving variable from locale to class will make GC more effectively.

b) When Full GC is executing, all the user threads will be paused.

c) If object A contains a reference of object B and object B contains a reference of object A, the two objects can’t be reclaimed by GC.

d) When a thread exits, all objects which created by that thread will be reclaimed

e) It is recommended that calling “System.gc()” to control the memory usage.

10. About java classpath and classloader, which statement is NOT correct?

a) User can specify the classpath by using the option “-cp” in Java command line.

b) If user doesn’t specify classpath, the JVM search the class from the current folder by default.

c) A JVM can load two different versions of a library.

d) To define customized class loader, it is possible to load class from internet at runtime.

11. Which data structure has best performance when remove an element from it?

a) Vector b)ArrayList c)LinkedList d)HashMap e)HashSet

12. Which is the correct way to convert bytes from charset “gb2312” to “utf-8”?

byte[] src , dst;

a) dst = new String(src,”utf-8”).getBytes(“gb2312”);

b) dst = new String(src,”gb2312”).getBytes(“utf-8”);

c) dst = new String(src,”utf-16”).getBytes();

d) dst = new String(src).getBytes();

e) None of above.

posted @ 2012-11-07 09:46 小明阅读(2596) | 评论 (3) | 编辑收藏

leveldb性能分析 - 随机写

摘要: 准备工作：1. 下载Snappy库Download source code from: http://code.google.com/p/snappy编译并安装./configure & make & sudo make install2. 编译leveldb自带的db_benchmake db_bench注意:在ubuntu 11.04上编译会出错，修改makefile:$(CX... 阅读全文

posted @ 2012-03-22 17:32 小明阅读(4234) | 评论 (0) | 编辑收藏

leveldb研究11- 流程分析：读数据

leveldb读数据

先看看ReadOptions有哪些参数可以指定：

// Options that control read operations
struct ReadOptions {
  // 是否检查checksum
  // Default: false
  bool verify_checksums;

  // 是否将此次结果放入cache
  // Default: true
  bool fill_cache;

  //是否指定snapshot,否则读取当前版本
  // Default: NULL
  const Snapshot* snapshot;

  ReadOptions()
      : verify_checksums(false),
        fill_cache(true),
        snapshot(NULL) {
  }
};

下面看看读取的详细过程：
查询memtable=>查询previous memtable(imm_)=>查询文件(缓冲）

Status DBImpl::Get(const ReadOptions& options,
                   const Slice& key,
                   std::string* value) {
  Status s;
  MutexLock l(&mutex_);
  SequenceNumber snapshot;
  //设置snapshot
  if (options.snapshot != NULL) {
    snapshot = reinterpret_cast<const SnapshotImpl*>(options.snapshot)->number_;
  } else {
    snapshot = versions_->LastSequence();
  }

  MemTable* mem = mem_;
  MemTable* imm = imm_;
  Version* current = versions_->current();
  mem->Ref();
  if (imm != NULL) imm->Ref();
  current->Ref();

  bool have_stat_update = false;
  Version::GetStats stats;

  // Unlock while reading from files and memtables
  {
    mutex_.Unlock();
    LookupKey lkey(key, snapshot);
    //先查询memtable
    if (mem->Get(lkey, value, &s)) {
      // Done
    } else if (imm != NULL && imm->Get(lkey, value, &s)) { //然后查询previous memtable:imm_
      // Done
    } else {
      //从文件中读取
      s = current->Get(options, lkey, value, &stats);
      have_stat_update = true;
    }
    mutex_.Lock();
  }

  //是否有文件需要被compaction,参见allowed_seek
  if (have_stat_update && current->UpdateStats(stats)) {
    MaybeScheduleCompaction();
  }
  mem->Unref();
  if (imm != NULL) imm->Unref();
  current->Unref();
  return s;
}

重点来看看从version中读取：

Status Version::Get(const ReadOptions& options,
                    const LookupKey& k,
                    std::string* value,
                    GetStats* stats) {
  Slice ikey = k.internal_key();
  Slice user_key = k.user_key();
  const Comparator* ucmp = vset_->icmp_.user_comparator();
  Status s;

  stats->seek_file = NULL;
  stats->seek_file_level = -1;
  FileMetaData* last_file_read = NULL;
  int last_file_read_level = -1;

  //从level0向高层查找，如果再低级level中查到，则不再查询
  std::vector<FileMetaData*> tmp;
  FileMetaData* tmp2;
  for (int level = 0; level < config::kNumLevels; level++) {
    size_t num_files = files_[level].size();
    //本层文件数为空，则返回
    if (num_files == 0) continue;

    // Get the list of files to search in this level
    FileMetaData* const* files = &files_[level][0];
    if (level == 0) {
      //level0特殊处理，因为key是重叠，所有符合条件的文件必须被查找
      tmp.reserve(num_files);
      for (uint32_t i = 0; i < num_files; i++) {
        FileMetaData* f = files[i];
        if (ucmp->Compare(user_key, f->smallest.user_key()) >= 0 &&
            ucmp->Compare(user_key, f->largest.user_key()) <= 0) {
          tmp.push_back(f);
        }
      }
      if (tmp.empty()) continue;

      std::sort(tmp.begin(), tmp.end(), NewestFirst);
      files = &tmp[0];
      num_files = tmp.size();
    } else {
      // 二分法查找，某个key只可能属于一个文件
      uint32_t index = FindFile(vset_->icmp_, files_[level], ikey);
      //没有查到
      if (index >= num_files) {
        files = NULL;
        num_files = 0;
      } else {
        tmp2 = files[index];
        if (ucmp->Compare(user_key, tmp2->smallest.user_key()) < 0) {
          // All of "tmp2" is past any data for user_key
          files = NULL;
          num_files = 0;
        } else {
          files = &tmp2;
          num_files = 1;
        }
      }
    }

    for (uint32_t i = 0; i < num_files; ++i) { //遍历本层符合条件的文件
      if (last_file_read != NULL && stats->seek_file == NULL) {
        //seek_file只记录第一个
        stats->seek_file = last_file_read;
        stats->seek_file_level = last_file_read_level;
      }

      FileMetaData* f = files[i];
      last_file_read = f;
      last_file_read_level = level;

      //从table cache中读取
      Iterator* iter = vset_->table_cache_->NewIterator(
          options,
          f->number,
          f->file_size);
      iter->Seek(ikey);
      const bool done = GetValue(ucmp, iter, user_key, value, &s);
      if (!iter->status().ok()) { //查找到
        s = iter->status();
        delete iter;
        return s;
      } else {
        delete iter;
        if (done) {
          return s;
        }
      }
    }
  }

  return Status::NotFound(Slice());  // Use an empty error message for speed
}

继续跟踪：TableCache

Iterator* TableCache::NewIterator(const ReadOptions& options,
                                  uint64_t file_number,
                                  uint64_t file_size,
                                  Table** tableptr) {
  if (tableptr != NULL) {
    *tableptr = NULL;
  }

  char buf[sizeof(file_number)];
  EncodeFixed64(buf, file_number);
  Slice key(buf, sizeof(buf));

  //从LRU cache中查找
  Cache::Handle* handle = cache_->Lookup(key);
  if (handle == NULL) {
    /加载文件
    std::string fname = TableFileName(dbname_, file_number);
    RandomAccessFile* file = NULL;
    Table* table = NULL;
    Status s = env_->NewRandomAccessFile(fname, &file);
    if (s.ok()) {
      s = Table::Open(*options_, file, file_size, &table);
    }

    if (!s.ok()) {
      assert(table == NULL);
      delete file;
      // We do not cache error results so that if the error is transient,
      // or somebody repairs the file, we recover automatically.
      return NewErrorIterator(s);
    }

    //插入Cache
    TableAndFile* tf = new TableAndFile;
    tf->file = file;
    tf->table = table;
    handle = cache_->Insert(key, tf, 1, &DeleteEntry);
  }

  Table* table = reinterpret_cast<TableAndFile*>(cache_->Value(handle))->table;
  //从Table对象中生成iterator
  Iterator* result = table->NewIterator(options);
  result->RegisterCleanup(&UnrefEntry, cache_, handle);
  if (tableptr != NULL) {
    *tableptr = table;
  }
  return result;
}

posted @ 2012-03-21 17:30 小明阅读(2780) | 评论 (0) | 编辑收藏

leveldb研究10- 流程分析：写数据

总体来说,leveldb的写操作有两个步骤，首先是针对log的append操作，然后是对memtable的插入操作。

影响写性能的因素有：
1. write_buffer_size
2. kL0_SlowdownWritesTrigger and kL0_StopWritesTrigger.提高这两个值，能够增加写的性能，但是降低读的性能

看看WriteOptions有哪些参数可以指定

struct WriteOptions {
  //设置sync=true,leveldb会调用fsync()，这会降低插入性能
  //同时会增加数据的安全性
  //Default: false
  bool sync;

  WriteOptions()
      : sync(false) {
  }
};

首先把Key,value转成WriteBatch

Status DB::Put(const WriteOptions& opt, const Slice& key, const Slice& value) {
  WriteBatch batch;
  batch.Put(key, value);
  return Write(opt, &batch);
}

接下来就是真正的插入了
这里使用了两把锁，主要是想提高并发能力，减少上锁的时间。
首先是检查是否可写，然后append log，最后是插入memtable
<db/dbimpl.cc>

Status DBImpl::Write(const WriteOptions& options, WriteBatch* updates) {
  Status status;
  //加锁
  MutexLock l(&mutex_);
  LoggerId self;
  //拿到写log的权利
  AcquireLoggingResponsibility(&self);
  //检查是否可写
  status = MakeRoomForWrite(false);  // May temporarily release lock and wait
  uint64_t last_sequence = versions_->LastSequence();
  if (status.ok()) {
    WriteBatchInternal::SetSequence(updates, last_sequence + 1);
    last_sequence += WriteBatchInternal::Count(updates);

    // Add to log and apply to memtable.  We can release the lock during
    // this phase since the "logger_" flag protects against concurrent
    // loggers and concurrent writes into mem_.
    {
      assert(logger_ == &self);
      mutex_.Unlock();
      //IO操作：写入LOG
      status = log_->AddRecord(WriteBatchInternal::Contents(updates));
      if (status.ok() && options.sync) {
        status = logfile_->Sync();
      }
      //插入memtable
      if (status.ok()) {
        status = WriteBatchInternal::InsertInto(updates, mem_);
      }
      mutex_.Lock();
      assert(logger_ == &self);
    }
    //设置新的seqence number
    versions_->SetLastSequence(last_sequence);
  }
  //释放写LOG锁
  ReleaseLoggingResponsibility(&self);
  return status;
}

写流量控制：
<db/dbimpl.cc>

Status DBImpl::MakeRoomForWrite(bool force) {
  mutex_.AssertHeld();
  assert(logger_ != NULL);
  bool allow_delay = !force;
  Status s;
  while (true) {
    if (!bg_error_.ok()) {
      // Yield previous error
      s = bg_error_;
      break;
    } else if (
        allow_delay &&
        versions_->NumLevelFiles(0) >= config::kL0_SlowdownWritesTrigger) {
      mutex_.Unlock();
      //如果level0的文件大于kL0_SlowdownWritesTrigger阈值，则sleep 1s，这样给compaction更多的CPU
      env_->SleepForMicroseconds(1000);
      allow_delay = false;  // Do not delay a single write more than once
      mutex_.Lock();
    } else if (!force &&
               (mem_->ApproximateMemoryUsage() <= options_.write_buffer_size)) {
      //可写
      break;
    } else if (imm_ != NULL) {
      // imm_:之前的memtable 没有被compaction，需要等待
      bg_cv_.Wait();
    } else if (versions_->NumLevelFiles(0) >= config::kL0_StopWritesTrigger) {
      // level0文件个数大于kL0_StopWritesTrigger,需要等待
      Log(options_.info_log, "waiting

\n");
      bg_cv_.Wait();
    } else {
      //生成新的额memtable和logfile，把当前memtable传给imm_
      assert(versions_->PrevLogNumber() == 0);
      uint64_t new_log_number = versions_->NewFileNumber();
      WritableFile* lfile = NULL;
      s = env_->NewWritableFile(LogFileName(dbname_, new_log_number), &lfile);
      if (!s.ok()) {
        break;
      }
      delete log_;
      delete logfile_;
      logfile_ = lfile;
      logfile_number_ = new_log_number;
      log_ = new log::Writer(lfile);
      imm_ = mem_;
      has_imm_.Release_Store(imm_);
      mem_ = new MemTable(internal_comparator_);
      mem_->Ref();
      force = false;   // Do not force another compaction if have room

// 发起compaction,dump imm_

      MaybeScheduleCompaction();
    }
  }
  return s;
}

posted @ 2012-03-21 14:41 小明阅读(3500) | 评论 (0) | 编辑收藏

leveldb研究9- 流程分析：打开数据库

摘要: leveldb 是通过Open函数来打开/新建数据库。Code highlighting produced by Actipro CodeHighlighter (freeware)http://www.CodeHighlighter.com/-->static Status Open(const Options& options, &nb... 阅读全文

posted @ 2012-03-20 16:02 小明阅读(3489) | 评论 (0) | 编辑收藏

leveldb研究8- 内存中的数据结构Memtable/SkipList

我们知道,leveldb在写数据的时候，除了log文件，都要在内存中写一份。

先看看SkipList【跳表】这个数据结构：

SkipList有如下特点：
1. 本质上一个排序好的链表
2. 分层，上层节点比下层的少，更具有跳跃性
3. 查询的复杂度是O(logn)

SkipList跟红黑树等还是比较容易实现和理解的，主要长处是比较容易实现Lock free和遍历。
我们来看看leveldb的实现
插入：

//插入一个新的key
template<typename Key, class Comparator>
void SkipList<Key,Comparator>::Insert(const Key& key) {
  //查找插入节点,prev为各层的前置节点
  Node* prev[kMaxHeight];
  Node* x = FindGreaterOrEqual(key, prev);

  // Our data structure does not allow duplicate insertion
  assert(x == NULL || !Equal(key, x->key));

  //生成随机高度
  int height = RandomHeight();
  if (height > GetMaxHeight()) {
    for (int i = GetMaxHeight(); i < height; i++) {
      prev[i] = head_;
    }
    //设置当前最大高度
    max_height_.NoBarrier_Store(reinterpret_cast<void*>(height));
  }

  //生成新节点
  x = NewNode(key, height);
  for (int i = 0; i < height; i++) {
    //设置新节点的各层的下一跳
    x->NoBarrier_SetNext(i, prev[i]->NoBarrier_Next(i));
    //设置前节点的next为当前节点，完成插入
    prev[i]->SetNext(i, x);
  }
}

查询：

template<typename Key, class Comparator>
typename SkipList<Key,Comparator>::Node* SkipList<Key,Comparator>::FindGreaterOrEqual(const Key& key, Node** prev)
    const {
  Node* x = head_;
  int level = GetMaxHeight() - 1; //从高层开始查找，依次到0 level
  while (true) {
    Node* next = x->Next(level);
    if (KeyIsAfterNode(key, next)) { //比next key 要大
      // Keep searching in this list
      x = next;
    } else { //比next key小，查找下一层
      //标记当前level的前置节点
      if (prev != NULL) prev[level] = x;
      if (level == 0) {
        return next;
      } else {
        level--;
      }
    }
  }
}

template<typename Key, class Comparator>
bool SkipList<Key,Comparator>::Contains(const Key& key) const {
  Node* x = FindGreaterOrEqual(key, NULL);
  if (x != NULL && Equal(key, x->key)) {
    return true;
  } else {
    return false;
  }
}

接着我们看看leveldb MemTable的实现，很简单了，基本是对SkipList访问一个封装
<db/memtable.h>

class MemTable {
public:
  explicit MemTable(const InternalKeyComparator& comparator);

  //增加引用计数
  void Ref() { ++refs_; }

  //减少引用计数
  void Unref() {
    --refs_;
    assert(refs_ >= 0);
    if (refs_ <= 0) {
      delete this;
    }
  }

  //内存使用量
  size_t ApproximateMemoryUsage();

  //遍历操作
  Iterator* NewIterator();

  //插入
  void Add(SequenceNumber seq, ValueType type,
           const Slice& key,
           const Slice& value);

  //查询
  bool Get(const LookupKey& key, std::string* value, Status* s);

private:
  ~MemTable();  // Private since only Unref() should be used to delete it

  //key compartor，用于排序
  struct KeyComparator {
    const InternalKeyComparator comparator;
    explicit KeyComparator(const InternalKeyComparator& c) : comparator(c) { }
    int operator()(const char* a, const char* b) const;
  };
  friend class MemTableIterator;
  friend class MemTableBackwardIterator;

  typedef SkipList<const char*, KeyComparator> Table;

  KeyComparator comparator_;
  int refs_; //引用计数
  Arena arena_; //内存分配器
  Table table_; //数据存放SkipList

  // No copying allowed
  MemTable(const MemTable&);
  void operator=(const MemTable&);
};

先看看插入
<db/memtable.cc>

void MemTable::Add(SequenceNumber s, ValueType type,
                   const Slice& key,
                   const Slice& value) {
  //数据结构：
  //1.internal key size : Varint32 (length of 2+3)
  //2.key data
  //3.SequenceNumber+Key type: 8 bytes
  //4 value size: Varint32
  //5 value data
  size_t key_size = key.size();
  size_t val_size = value.size();
  size_t internal_key_size = key_size + 8;
  const size_t encoded_len =
      VarintLength(internal_key_size) + internal_key_size +
      VarintLength(val_size) + val_size;
  char* buf = arena_.Allocate(encoded_len);
  char* p = EncodeVarint32(buf, internal_key_size);
  memcpy(p, key.data(), key_size);
  p += key_size;
  EncodeFixed64(p, (s << 8) | type);
  p += 8;
  p = EncodeVarint32(p, val_size);
  memcpy(p, value.data(), val_size);
  assert((p + val_size) - buf == encoded_len);
  table_.Insert(buf);
}

查询
<db/memtable.cc>

bool MemTable::Get(const LookupKey& key, std::string* value, Status* s) {
  Slice memkey = key.memtable_key();
  Table::Iterator iter(&table_);
  iter.Seek(memkey.data());
  if (iter.Valid()) {
    // entry format is:
    //    klength  varint32
    //    userkey  char[klength]
    //    tag      uint64
    //    vlength  varint32
    //    value    char[vlength]
    // Check that it belongs to same user key.  We do not check the
    // sequence number since the Seek() call above should have skipped
    // all entries with overly large sequence numbers.
    const char* entry = iter.key();
    uint32_t key_length;
    const char* key_ptr = GetVarint32Ptr(entry, entry+5, &key_length);
    if (comparator_.comparator.user_comparator()->Compare(
            Slice(key_ptr, key_length - 8),
            key.user_key()) == 0) {
      // Correct user key
      const uint64_t tag = DecodeFixed64(key_ptr + key_length - 8);
      switch (static_cast<ValueType>(tag & 0xff)) {
        case kTypeValue: {
          Slice v = GetLengthPrefixedSlice(key_ptr + key_length);
          value->assign(v.data(), v.size());
          return true;
        }
        case kTypeDeletion:
          *s = Status::NotFound(Slice());
          return true;
      }
    }
  }
  return false;
}

posted @ 2012-03-19 16:31 小明阅读(3065) | 评论 (0) | 编辑收藏

leveldb研究7-Version/VersionSet/VersionEdit

leveldb 使用 version 来保存数据库的状态。

先看看一个重要的数据结果，sst file的META info
<db/version_edit.h>

struct FileMetaData {
  int refs; //引用计数
  int allowed_seeks; //允许的seeks次数
  uint64_t number;//文件编号
  uint64_t file_size;  //文件大小
  InternalKey smallest;    //最小的key
  InternalKey largest;      //最大的key

  FileMetaData() : refs(0), allowed_seeks(1 << 30), file_size(0) { }
};

这里面有一个很有意思的字段: allowed_seeks,代表了可以seek的次数，为0的时候表示这个文件需要被compaction.如何设置seeks次数呢？文件大小除以16k，不到100算100。

f->allowed_seeks = (f->file_size / 16384);
if (f->allowed_seeks < 100) f->allowed_seeks = 100;

原因，请看leveldb的注释：

// We arrange to automatically compact this file after a certain number of seeks. Let's assume:
      //   (1) One seek costs 10ms
      //   (2) Writing or reading 1MB costs 10ms (100MB/s)
      //   (3) A compaction of 1MB does 25MB of IO:
      //         1MB read from this level
      //         10-12MB read from next level (boundaries may be misaligned)
      //         10-12MB written to next level
      // This implies that 25 seeks cost the same as the compaction
      // of 1MB of data. I.e., one seek costs approximately the
      // same as the compaction of 40KB of data. We are a little
      // conservative and allow approximately one seek for every 16KB
      // of data before triggering a compaction.

接下来看Version的定义，version其实就是一系列的SST file的集合。

class Version {
public:
  //生成iterator用于遍历
  void AddIterators(const ReadOptions&, std::vector<Iterator*>* iters);

  //根据key来查询，若没有查到，更新GetStats
  struct GetStats {
    FileMetaData* seek_file;
    int seek_file_level;
  };
  Status Get(const ReadOptions&, const LookupKey& key, std::string* val,
             GetStats* stats);

  //是否需要进行compaction
  bool UpdateStats(const GetStats& stats);

  //引用计算，避免在被引用时候删除
  void Ref();
  void Unref();

  //查询和key range有关的files
  void GetOverlappingInputs(
      int level,
      const InternalKey* begin,         // NULL means before all keys
      const InternalKey* end,           // NULL means after all keys
      std::vector<FileMetaData*>* inputs);

  //计算是否level对某个key range是否有overlap
  bool OverlapInLevel(int level,
                      const Slice* smallest_user_key,
                      const Slice* largest_user_key);

  //memtable output应该放到哪个level
  int PickLevelForMemTableOutput(const Slice& smallest_user_key,
                                 const Slice& largest_user_key);

  //某个level的文件个数
   int NumFiles(int level) const { return files_[level].size(); }

  // Return a human readable string that describes this version's contents.
  std::string DebugString() const;

private:
  friend class Compaction;
  friend class VersionSet;

  class LevelFileNumIterator;
  Iterator* NewConcatenatingIterator(const ReadOptions&, int level) const;

  VersionSet* vset_;            // VersionSet to which this Version belongs
  Version* next_;               // Next version in linked list
  Version* prev_;               // Previous version in linked list
  int refs_;                    // Number of live refs to this version

  //sst files
  std::vector<FileMetaData*> files_[config::kNumLevels];

  //下一个要被compaction的文件
  FileMetaData* file_to_compact_;
  int file_to_compact_level_;

  //compaction score:>1表示要compaction
  double compaction_score_;
  int compaction_level_;

  explicit Version(VersionSet* vset)
      : vset_(vset), next_(this), prev_(this), refs_(0),
        file_to_compact_(NULL),
        file_to_compact_level_(-1),
        compaction_score_(-1),
        compaction_level_(-1) {
  }

  ~Version();

  // No copying allowed
  Version(const Version&);
  void operator=(const Version&);
};

那VersionSet呢？VersionSet 是version组成一个双向循环链表。

class VersionSet{
//. . .
Env* const env_;
  const std::string dbname_;
  const Options* const options_;
  TableCache* const table_cache_;
  const InternalKeyComparator icmp_;
  uint64_t next_file_number_;
  uint64_t manifest_file_number_;
  uint64_t last_sequence_;
  uint64_t log_number_;

  WritableFile* descriptor_file_;
  log::Writer* descriptor_log_;
  Version dummy_versions_;  // Head of circular doubly-linked list of versions.
  Version* current_;        // == dummy_versions_.prev_

  //每层都有一个compact pointer用于指示下次从哪里开始compact,以用于实现循环compact
  std::string compact_pointer_[config::kNumLevels];
//. . .
}

VersionEdit是version对象的变更记录，用于写入manifest.这样通过原始的version加上一系列的versionedit的记录，就可以恢复到最新状态。

class VersionEdit {
public:
  VersionEdit() { Clear(); }
  ~VersionEdit() { }

  void Clear();

  void SetComparatorName(const Slice& name) {
    has_comparator_ = true;
    comparator_ = name.ToString();
  }
  void SetLogNumber(uint64_t num) {
    has_log_number_ = true;
    log_number_ = num;
  }
  void SetPrevLogNumber(uint64_t num) {
    has_prev_log_number_ = true;
    prev_log_number_ = num;
  }
  void SetNextFile(uint64_t num) {
    has_next_file_number_ = true;
    next_file_number_ = num;
  }
  void SetLastSequence(SequenceNumber seq) {
    has_last_sequence_ = true;
    last_sequence_ = seq;
  }
  void SetCompactPointer(int level, const InternalKey& key) {
    compact_pointers_.push_back(std::make_pair(level, key));
  }

  //添加meta file
  void AddFile(int level, uint64_t file,
               uint64_t file_size,
               const InternalKey& smallest,
               const InternalKey& largest) {
    FileMetaData f;
    f.number = file;
    f.file_size = file_size;
    f.smallest = smallest;
    f.largest = largest;
    new_files_.push_back(std::make_pair(level, f));
  }

  //删除特定的文件
  void DeleteFile(int level, uint64_t file) {
    deleted_files_.insert(std::make_pair(level, file));
  }

  //编码，解码：用于写入manifest
  void EncodeTo(std::string* dst) const;
  Status DecodeFrom(const Slice& src);

  std::string DebugString() const;

private:
  friend class VersionSet;

  typedef std::set< std::pair<int, uint64_t> > DeletedFileSet;

  std::string comparator_;
  uint64_t log_number_;
  uint64_t prev_log_number_;
  uint64_t next_file_number_;
  SequenceNumber last_sequence_;
  bool has_comparator_;
  bool has_log_number_;
  bool has_prev_log_number_;
  bool has_next_file_number_;
  bool has_last_sequence_;

  std::vector< std::pair<int, InternalKey> > compact_pointers_;
  DeletedFileSet deleted_files_;
  std::vector< std::pair<int, FileMetaData> > new_files_;
};

posted @ 2012-03-16 17:10 小明阅读(4015) | 评论 (0) | 编辑收藏

leveldb研究6- Level和Compaction

摘要: leveldb之所以使用level作为数据库名称，精华就在于level的设计。本质是一种归并排序算法。这样设计的好处主要是可以减少compaction的次数和每次的文件个数。Compaction为什么要compaction? compaction可以提高数据的查询效率，没有经过compaction，需要从很多SST file去查找，而做过compaction后，只需要从有限的SST文件去... 阅读全文

posted @ 2012-03-15 17:28 小明阅读(7964) | 评论 (0) | 编辑收藏

回文字符串的切割问题

TopCoder MatchMaker解法

有点难度的java笔试题

leveldb性能分析 - 随机写

leveldb研究11- 流程分析：读数据

leveldb研究10- 流程分析：写数据

leveldb研究9- 流程分析：打开数据库

leveldb研究8- 内存中的数据结构Memtable/SkipList

leveldb研究7-Version/VersionSet/VersionEdit

leveldb研究6- Level和Compaction

日历

常用链接

留言簿(5)

随笔分类

随笔档案

相册

My blogs

搜索

最新评论

阅读排行榜

评论排行榜