paulwong


Building a highly available sharded MongoDB cluster with replica sets: the Replica Sets + Sharding setup process

     Abstract: References:  http://mongodb.blog.51cto.com/1071559/740131  http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/#sharding-setup-shard-collection  Thanks to Mr.Sharp, who gave me a lot of...  Read full article

posted @ 2015-12-18 13:54 paulwong reads (942) | comments (0)

MongoDB Replication and Sharding

Replication: to guard against a single point of failure, several instances run at once holding the same data.

  • Classic master-slave: one master, many slaves. The master handles reads and writes; the slaves only keep backup copies of the master's data. There is no automatic failover, so if the master goes down, MongoDB as a whole stops working.
  • Replica sets: a dynamically elected primary with multiple secondaries. The primary is chosen by election; when it goes down, the remaining members elect a new primary.

Best suited to: heavily loaded, read-mostly workloads.

Sharding: the MongoDB equivalent of splitting tables across databases in an RDBMS; one collection is partitioned across several shards, and each shard is in turn replicated.

Best suited to: heavily loaded, write-mostly workloads. If MongoDB can no longer keep up with writes, move to this mode.

Setup: install the config servers, install the router (mongos), install the shard servers, then tell mongos to mount the shards.

If sharding is enabled only at the database level, different collections are placed on different shards, each collection occupying a single shard. Once a collection itself is sharded, its documents are distributed across all shards.
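As a rough sketch of that setup sequence, the shard-registration side looks something like this in the mongo shell on the router (replica-set names, hosts, and ports here are placeholders, not values from the original walkthrough):

```js
// on mongos: register each shard (each shard is itself a replica set)
sh.addShard("rs0/host1:27017,host2:27017")
sh.addShard("rs1/host3:27017,host4:27017")
// database-level sharding: each collection still lives on one shard
sh.enableSharding("mydb")
// collection-level sharding: this collection is spread across all shards
sh.shardCollection("mydb.mycoll", { _id: 1 })
```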











posted @ 2015-12-18 13:21 paulwong reads (557) | comments (0)

Android Application Architecture [translated]

The author of this article spent several years developing Android applications and went through two architecture overhauls: the first introduced RxJava, the second introduced MVP. Combining RxJava with MVP produced a loosely coupled architecture with simple code that is easy to test.

We have run into the same thing in our own work. Android's barrier to entry is low; when the app isn't planned clearly up front, developers can't anticipate coming changes, and the team's architectural experience is thin, you end up with a several-thousand-line Activity stuffed with private methods. Splitting out a few Fragments or wrapper classes doesn't fix it. The Activity becomes miserable to read, risky to change and painful to leave alone, and everyone blames the product team for poor planning and endless revisions...

This article solves that problem with a new architecture.


Android Application Architecture

Our journey from standard Activities and AsyncTasks to a modern MVP-based architecture powered by RxJava.

The main purpose of this article is to describe the transition from the traditional Activities-plus-AsyncTasks model to today's mainstream MVP-based, RxJava-powered reactive architecture.


Different parts of a software codebase should be independent, yet perfectly work together like a well-oiled machine — photo by Chester Alvarez.

Picture it: a loosely coupled architecture with a clear division of labour, whose parts combine perfectly, is a wonderful thing.
(They even credit the photographer when reposting an image; impressive copyright awareness.)

The Android dev ecosystem moves very quickly. Every week new tools are created, libraries are updated, blog posts are written and talks are given. If you go on holiday for a month, by the time you come back there will be a new version of the support library and/or Play Services.

In recent years the Android ecosystem has changed very quickly, from the platform APIs up through the various open-source libraries and tools; lose focus for a moment and you fall behind.

I’ve been making Android apps with the ribot team for over three years. During this time, the architecture and technologies we’ve used to build Android apps have been continuously evolving. This article will take you through this journey by explaining our learnings, mistakes and the reasoning behind these architectural changes.

I have been developing Android apps on the ribot team for over three years. As the company's technology kept evolving, we accumulated a lot of experience, mistakes, and stories behind our technology choices.

The old application architecture

The old times
Back in 2012 our codebases used to follow a basic structure. We didn’t use any networking library and AsyncTasks were still our friends. The diagram below shows approximately how the architecture was.

Back in 2012 our code was plain Android: no networking library, everything built on AsyncTasks.

The code was structured in two layers: the data layer that was in charge of retrieving/saving data from REST APIs and persistent data stores; and the view layer, whose responsibility was handling and displaying the data on the UI.
The APIProvider provides methods to enable Activities and Fragments to easily interact with the REST API. These methods use URLConnection and AsyncTasks to perform network calls in a separate thread and return the result to the Activities via callbacks.

The code was split into two layers, data and view. The data layer fetched data from the API and saved it to a persistent database; the view layer displayed that data in the UI. The APIProvider exposed methods so that an Activity or Fragment could easily drive the interaction. Technically, URLConnection and AsyncTasks implemented asynchronous network requests whose results came back through callbacks.

In a similar way, the CacheProvider contains methods that retrieve and store data from SharedPreferences or a SQLite database. It also uses callbacks to pass the result back to the Activities.

In the same way, the CacheProvider offered a set of methods that read data out of SharedPreferences or SQLite and returned it to the Activity.


The problems
The main issue with this approach was that the View layer had too many responsibilities. Imagine a simple common scenario where the application has to load a list of blog posts, cache them in a SQLite database and finally display them on a ListView. The Activity would have to do the following:

The main problem was that the view layer carried too much baggage. Take a blog list as an example: the app needs to display posts in a ListView and read data out of SQLite, so the Activity had to do the following:

  1. Call a method loadPosts(callback) in the APIProvider
  2. Wait for the APIProvider success callback and then call savePosts(callback) in the CacheProvider.
  3. Wait for the CacheProvider success callback and then display the posts on the ListView.
  4. Separately handle the two potential errors callback from the APIProvider and CacheProvider.
  1. Call the loadPosts method on the APIProvider, passing in a callback.
  2. When loadPosts succeeds, call the CacheProvider's savePosts method from inside that callback, passing savePosts another callback.
  3. When savePosts succeeds, refresh the ListView from inside its callback.
  4. Write separate code to handle the error callbacks of steps 2 and 3.
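The four steps above can be sketched in plain Java. APIProvider, CacheProvider, and the callback shapes below are hypothetical stand-ins for the classes the article names; the point is how quickly the nesting piles up:

```java
import java.util.ArrayList;
import java.util.List;

class CallbackHell {

    interface Callback<T> {
        void onSuccess(T result);
        void onError(Exception e);
    }

    static class APIProvider {
        void loadPosts(Callback<List<String>> cb) {
            cb.onSuccess(List.of("post-1", "post-2")); // pretend network call
        }
    }

    static class CacheProvider {
        void savePosts(List<String> posts, Callback<List<String>> cb) {
            cb.onSuccess(posts); // pretend SQLite write
        }
    }

    // What the Activity ends up doing: load -> save -> display, nested.
    static final List<String> shownOnListView = new ArrayList<>();

    static void loadAndDisplay() {
        new APIProvider().loadPosts(new Callback<List<String>>() {
            public void onSuccess(List<String> posts) {
                new CacheProvider().savePosts(posts, new Callback<List<String>>() {
                    public void onSuccess(List<String> cached) {
                        shownOnListView.addAll(cached); // step 3: display
                    }
                    public void onError(Exception e) { /* step 4: cache error */ }
                });
            }
            public void onError(Exception e) { /* step 4: network error */ }
        });
    }
}
```

Even in this toy version, displaying one list already needs two anonymous callbacks, and each extra asynchronous source adds another level of nesting.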

This is a very simple example. In a real case scenario the REST API will probably not return the data like the view needs it. Therefore, the Activity will have to somehow transform or filter the data before showing it. Another common case is when the loadPosts() method takes a parameter that needs to be fetched from somewhere else, for example an email address provided by the Play Services SDK. It’s likely that the SDK will return the email asynchronously using a callback, meaning that we now have three levels of nested callbacks. If we keep adding complexity, this approach will result into what is known as callback hell.

And that is still a simple example. In real scenarios the remote API often does not return exactly what the program needs, yet the Activity must finish processing the data before it can show a result. Another example: if loadPosts needs a parameter that itself comes back asynchronously from somewhere else, then keeping the data flowing correctly means a third level of callbacks. Add any more complexity and untangling those callbacks becomes truly painful.

In summary:
  • Activities and Fragments become very large and difficult to maintain.
  • Too many nested callbacks make the code ugly and difficult to understand, so it is painful to make changes or add new features.
  • Unit testing becomes challenging, if not impossible, because a lot of the logic lives within Activities or Fragments that are arduous to unit test.

In short, once the callbacks pile up, Activities and Fragments turn into a mess that nobody can bear to look at.

An impressive new architecture arrives

A new architecture driven by RxJava
We followed the previous approach for about two years. During that time, we made several improvements that slightly mitigated the problems described above. For example, we added several helper classes to reduce the code in Activities and Fragments and we started using Volley in the APIProvider. Despite these changes, our application code wasn’t yet test-friendly and the callback hell issue was still happening too often.

We endured that painful architecture for about two years, trying many things that only eased the mess a little. We swapped Volley into the APIProvider in place of the old AsyncTask-based networking, but honestly it made little difference.

It wasn’t until 2014 when we started reading about RxJava. After trying it on a few sample projects, we realised that this could finally be the solution to the nested callback problem. If you are not familiar with reactive programming you can read this introduction. In short, RxJava allows you to manage data via asynchronous streams and gives you many operators that you can apply to the stream in order to transform, filter or combine the data.

It wasn't until 2014 that we started looking into RxJava. After trying it in a few sample projects, we felt RxJava was the definitive cure for our nested callbacks. Put simply, RxJava lets you manage data as asynchronous streams, and its operators let you transform, filter, and combine Observables.

Taking into account the pains we experienced in previous years, we started to think about how the architecture of a new app would look. So we came up with this.

Drawing hard lessons from several years of pain, we came up with the new app architecture diagrammed below.


Similar to the first approach, this architecture can be separated into a data and view layer. The data layer contains the DataManager and a set of helpers. The view layer is formed by Android framework components like Fragments, Activities, ViewGroups, etc.

Like the first approach, this architecture is split into a data layer and a view layer. The data layer contains the DataManager and a set of helpers; the view layer contains Fragments, Activities, ViewGroups, and so on.

Helper classes (third column on diagram) have very specific responsibilities and implement them in a concise manner. For example, most projects have helpers for accessing REST APIs, reading data from databases or interacting with third party SDKs. Different applications will have a different number of helpers but the most common ones are:

The helpers mostly wrap third-party libraries so that a feature (calling an API, accessing the database) takes only a few clear lines of code. Different applications use different libraries, but the helpers boil down to things like the following:

  • PreferencesHelper: reads and saves data in SharedPreferences.
  • DatabaseHelper: handles accessing SQLite databases.
  • Retrofit services: perform calls to REST APIs. We started using Retrofit instead of Volley because it provides support for RxJava. It’s also nicer to use.
  • Reading and writing data in SharedPreferences.
  • Reading and writing the SQLite database.
  • Retrofit services, i.e. Square's HTTP client: we replaced Volley with Retrofit because it supports RxJava, and it is nicer to use.

Most of the public methods inside helper classes will return RxJava Observables.
The DataManager is the brain of the architecture. It extensively uses RxJava operators to combine, filter and transform data retrieved from helper classes. The aim of the DataManager is to reduce the amount of work that Activities and Fragments have to do by providing data that is ready to display and won’t usually need any transformation.

The two core concepts in RxJava are Observables (event sources) and Subscribers (observers). The public methods in the helper classes generally return RxJava Observables. The DataManager is the brain of the architecture: it makes heavy use of RxJava operators to combine, filter, and post-process the data the helpers return.

The code below shows what a DataManager method would look like. This sample method works as follows:

Here is an example of what a DataManager method does:

  1. Call the Retrofit service to load a list of blog posts from a REST API
  2. Save the posts in a local database for caching purposes using the DatabaseHelper.
  3. Filter the blog posts written today because those are the only ones the view layer wants to display.
  1. Call the Retrofit service to request a list of blog posts from the API.
  2. Use the DatabaseHelper to save the posts to the database.
  3. Filter out the posts written today and display them in the UI.
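A minimal sketch of such a pipeline, with a hypothetical Post class and helper stand-ins (none of these names come from the article's codebase). RxJava is not available here, so java.util.stream plays the role of the Observable operators; in the real architecture the same three steps would be chained with RxJava's map/filter instead of nested callbacks:

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class DataManagerSketch {

    static class Post {
        final String title;
        final LocalDate createdAt;
        Post(String title, LocalDate createdAt) {
            this.title = title;
            this.createdAt = createdAt;
        }
    }

    // Step 1: stand-in for the Retrofit call returning every post.
    static List<Post> loadPostsFromApi(LocalDate today) {
        return List.of(new Post("old post", today.minusDays(3)),
                       new Post("fresh post", today));
    }

    // Step 2: stand-in for the DatabaseHelper - caches each post it sees.
    static final List<Post> cache = new ArrayList<>();
    static Post savePost(Post p) { cache.add(p); return p; }

    // Steps 1-3 composed as one pipeline instead of nested callbacks.
    static List<String> loadTodayPosts(LocalDate today) {
        return loadPostsFromApi(today).stream()
                .map(DataManagerSketch::savePost)       // cache (side effect)
                .filter(p -> p.createdAt.equals(today)) // keep today's posts
                .map(p -> p.title)
                .collect(Collectors.toList());
    }
}
```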

Components in the view layer such as Activities or Fragments would simply call this method and subscribe to the returned Observable. Once the subscription finishes, the different Posts emitted by the Observable can be directly added to an Adapter in order to be displayed on a RecyclerView or similar.

The Observables emit a series of events and the Subscribers (for example Activities or Fragments) handle them; the emitted data can be added directly to an Adapter backing a recyclable view such as a RecyclerView.
(By the way: an Observable with no Subscriber emits no events at all.)

The last element of this architecture is the event bus. The event bus allows us to broadcast events that happen in the data layer, so that multiple components in the view layer can subscribe to these events. For example, a signOut() method in the DataManager can post an event when the Observable completes so that multiple Activities that are subscribed to this event can change their UI to show a signed out state.

The other module in this architecture is the event bus. The event bus lets the data layer broadcast events (not Android Broadcasts), and different modules register to receive the broadcast events they care about.

Why was this approach better?
RxJava Observables and operators remove the need for having nested callbacks.

Why is this approach so good? Because Observables and operators do away with that pile of obligatory callbacks.

The DataManager takes over responsibilities that were previously part of the view layer. Hence, it makes Activities and Fragments more lightweight.
Moving code from Activities and Fragments to the DataManager and helpers means that writing unit tests becomes easier.

The DataManager took over a great deal of code from the traditional architecture, leaving Activities and Fragments much more lightweight and making unit tests much simpler to write.

Clear separation of responsibilities and having the DataManager as the only point of interaction with the data layer, makes this architecture test-friendly. Helper classes or the DataManager can be easily mocked.

The DataManager became the single point of interaction with the data layer, and such a clean separation makes the code much easier to test.

What problems did we still have?
For large and very complex projects the DataManager can become too bloated and difficult to maintain.
Although view layer components such as Activities and Fragments became more lightweight, they still have to handle a considerable amount of logic around managing RxJava subscriptions, analysing errors, etc.

What problems did we still have?
- For very large and complex projects, the DataManager itself becomes bloated and hard to maintain.
- Although Activities and Fragments are now more lightweight, they still have to handle errors and exceptions at each subscription site.

Integrating the MVP pattern

Integrating Model View Presenter
In the past year, several architectural patterns such as MVP or MVVM have been gaining popularity within the Android community. After exploring these patterns on a sample project and article, we found that MVP could bring very valuable improvements to our existing approach. Because our current architecture was divided in two layers (view and data), adding MVP felt natural. We simply had to add a new layer of presenters and move part of the code from the view to presenters.

In the past few years, architectural patterns such as MVP and MVVM have become popular in the Android community. After investigating them, we found that MVP offered the most valuable improvement to our existing approach. Our two-layer view/data architecture fits MVP's View and Model naturally, so adding MVP felt like a natural step: we only had to add a presenter layer and move part of the code out of the views into the presenters.

The data layer remains as it was but it’s now called model to be more consistent with the name of the pattern.
Presenters are in charge of loading data from the model and calling the right method in the view when the result is ready. They subscribe to Observables returned by the data manager. Therefore, they have to handle things like schedulers and subscriptions. Moreover, they can analyse error codes or apply extra operations to the data stream if needed. For example, if we need to filter some data and this same filter is not likely to be reused anywhere else, it may make more sense to implement it in the presenter rather than in the data manager.

The old data layer is now the Model in MVP. The Presenter is responsible for loading data from the Model and, once it is loaded, calling the appropriate method on the Activity or ViewGroup. Presenters subscribe to the Observables that the DataManager exposes, so they handle schedulers and subscriptions, and can analyse error codes or apply extra operators to the stream.
For example, if we need a data filter that is unlikely to be reused anywhere else, that code belongs in the presenter rather than in the shared DataManager.

Below you can see what a public method in the presenter would look like. This code subscribes to the Observable returned by the dataManager.loadTodayPosts() method we defined in the previous section.

The presenter's public method subscribes to the Observable returned by the dataManager.loadTodayPosts() we defined in the previous section, which emits its data to the corresponding subscribers.

The mMvpView is the view component that this presenter is assisting. Usually the MVP view is an instance of an Activity, Fragment or ViewGroup.

The MVP "View" is not an Android View but an instance of a UI component such as an Activity, Fragment, or ViewGroup; when registering with the presenter, it passes its own instance in.

// Snippet from the Activity's onCreate:
if (presenter == null) {
    presenter = new Presenter1();
}
presenter.onTakeView(this);

Like the previous architecture, the view layer contains standard framework components like ViewGroups, Fragments or Activities. The main difference is that these components don’t subscribe directly to Observables. They instead implement an MvpView interface and provide a list of concise methods such as showError() or showProgressIndicator(). The view components are also in charge of handling user interactions such as click events and act accordingly by calling the right method in the presenter. For example, if we have a button that loads the list of posts, our Activity would call presenter.loadTodayPosts() from the onClick listener.

Unlike the previous architecture, the view layer (the Activities and friends) no longer subscribes directly to the events the Observables emit. Instead, an Activity implements a few simple methods for showing errors or progress (standardised by an interface), passes its own instance to the presenter for the relevant events, and the presenter invokes those methods.
User interaction, such as button clicks, is of course still handled in the Activity.
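A minimal sketch of this arrangement (all names here are hypothetical, not from the article's codebase): the "Activity" only implements a small MvpView interface, and the presenter decides which of its methods to call:

```java
import java.util.ArrayList;
import java.util.List;

class MvpSketch {

    interface MvpView {
        void showPosts(List<String> posts);
        void showError(String message);
    }

    static class Presenter {
        private MvpView view;

        void onTakeView(MvpView v) { this.view = v; }

        // In the real architecture this would subscribe to
        // dataManager.loadTodayPosts(); a plain list stands in here.
        void loadTodayPosts() {
            List<String> posts = List.of("today's post");
            if (posts.isEmpty()) view.showError("no posts");
            else view.showPosts(posts);
        }
    }

    // A fake "Activity" that records what the presenter tells it to show.
    static class FakeActivity implements MvpView {
        final List<String> shown = new ArrayList<>();
        public void showPosts(List<String> posts) { shown.addAll(posts); }
        public void showError(String message) { shown.add("error: " + message); }
    }
}
```

Because the view is just an interface, a unit test can hand the presenter a fake view and assert on what it was told to show.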

If you want to see a full working sample of this MVP-based architecture, you can check out our Android Boilerplate project on GitHub. You can also read more about it in the ribot’s architecture guidelines.

For more articles on MVP, search for the keywords "MVP Android".

Why is this approach better?

Why is this approach the best yet?

  • Activities and Fragments become very lightweight. Their only responsibilities are to set up/update the UI and handle user events. Therefore, they become easier to maintain.
  • We can now easily write unit tests for the presenters by mocking the view layer. Before, this code was part of the view layer so we couldn’t unit test it. The whole architecture becomes very test-friendly.
  • If the data manager is becoming bloated, we can mitigate this problem by moving some code to the presenters.
  • The amount of code in Activities and Fragments drops sharply; the logic all moves into the presenters, so the Activity only deals with UI concerns such as buttons.
  • Presenters can be unit tested on their own; you only need to test the methods each presenter exposes.
  • If the DataManager becomes bloated, its code can be split out into the individual presenters.

What problems do we still have?

Having a single data manager can still be an issue when the codebase becomes very large and complex. We haven’t reached the point where this is a real problem but we are aware that it could happen.

Having a single DataManager can still be an issue, especially once the project becomes very large. We haven't reached that point yet, though we know that some day it could happen.

It’s important to mention that this is not the perfect architecture. In fact, it’d be naive to think there is a unique and perfect one that will solve all your problems forever. The Android ecosystem will keep evolving at a fast pace and we have to keep up by exploring, reading and experimenting so that we can find better ways to continue building excellent Android apps.

No architecture is perfect or will solve all of your problems forever. The Android ecosystem changes too fast and too inconsistently, which forces us to keep exploring, reading, and experimenting to find ever better ways of building Android apps.

I hope you enjoyed this article and you found it useful. If so, don’t forget to click the recommend button. Also, I’d love to hear your thoughts about our latest approach.

I hope that after reading this you have some thoughts and suggestions about our latest approach.

[I did this translation in my spare time while researching new techniques, written in plain language for myself so that it is easy to look up later.]
Original article: https://medium.com/ribot-labs/android-application-architecture-8b6e34acda65
MVP intro: http://www.jcodecraeer.com/a/anzhuokaifa/androidkaifa/2015/0425/2782.html
RxAndroid:https://github.com/ReactiveX/RxAndroid
EventBus: https://github.com/greenrobot/EventBus

posted @ 2015-12-18 13:07 paulwong reads (653) | comments (0)

MongoDB monitoring and performance tuning

     Abstract: Monitoring MongoDB. MongoDB can be monitored and tuned through its profiler. Check whether profiling is enabled with db.getProfilingLevel(), which returns a level of 0, 1, or 2: 0 means off, 1 means log slow operations, 2 means log everything. Enable profiling with db.setProfilingLevel(level);  # level as above...  Read full article

posted @ 2015-12-16 18:50 paulwong reads (767) | comments (0)

How to remove burnt black residue from the inside of a pressure cooker?

http://www.360doc.com/content/11/0415/13/117643_109815383.shtml

http://wenda.tianya.cn/question/4b4edcf687745412

http://zhidao.baidu.com/question/456177515446176485.html

http://iask.sina.com.cn/b/6262165.html

http://baike.pcbaby.com.cn/qzbd/5691.html#ldjc4ta=baby_tbody2

http://www.xiaoqiaomen.cc/qingjieweisheng/258.html

http://home.19lou.com/forum-106-thread-6901352097188270-1-1.html

posted @ 2015-12-14 18:25 paulwong reads (479) | comments (0)

Deleting, inserting, and updating large numbers of records in MongoDB

In MongoDB, deleting a large number of records is very time-consuming, so it is generally better to let MongoDB remove them in the background: just put a TTL index on a date field.

@Indexed(expireAfterSeconds=180)
private Date deletedAt;

With the code above, a document whose deletedAt field has a value will be deleted by MongoDB 180 seconds later; if the field has no value, the document is not deleted.

Insert in bulk and update in small batches to avoid read timeouts:
private static final int BATCH_SIZE = 50000;

private <T> void insertAll(List<T> list) {
    if (null != list) {
        int total = list.size();
        int batchCount = (total + BATCH_SIZE - 1) / BATCH_SIZE; // ceiling division
        for (int i = 0; i < batchCount; i++) {
            int toIndex = Math.min((i + 1) * BATCH_SIZE, total);
            log.info("toIndex = " + toIndex);
            mongoTemplate1.insertAll(list.subList(i * BATCH_SIZE, toIndex));
        }
    }
}

Bulk update
import java.util.Date;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.core.query.Update;

import com.tcl.project7.boss.gameapplication.yearendactivities.bigwheelgame.valueobject.SingleUseRedeemCode;

public class SingleUseRedeemCodeRepositoryImpl implements SingleUseRedeemCodeRepositoryCustom{
    
    @Autowired
    private MongoTemplate mongoTemplate1;
    
    public void batchUpdateSingleUseRedeemCodeList(String bigWheelGameAwardId) {
        
        Query query = new Query();
        query.addCriteria(Criteria.where("bigwheelgameawardid").is(bigWheelGameAwardId));
        mongoTemplate1.updateMulti(
                                    query, 
                                    new Update().set("bigwheelgameawardid", "-1")
                                        .set("deletedat", new Date()), 
                                    SingleUseRedeemCode.class);
    }

}


Expire Data from Collections by Setting TTL

New in version 2.2.

This document provides an introduction to MongoDB’s “time to live” or TTL collection feature. TTL collections make it possible to store data in MongoDB and have the mongod automatically remove data after a specified number of seconds or at a specific clock time.

Data expiration is useful for some classes of information, including machine generated event data, logs, and session information that only need to persist for a limited period of time.

A special TTL index property supports the implementation of TTL collections. The TTL feature relies on a background thread in mongod that reads the date-typed values in the index and removes expired documents from the collection.

Procedures

To create a TTL index, use the db.collection.createIndex() method with the expireAfterSeconds option on a field whose value is either a date or an array that contains date values.

NOTE

The TTL index is a single field index. Compound indexes do not support the TTL property. For more information on TTL indexes, see TTL Indexes.

Expire Documents after a Specified Number of Seconds

To expire data after a specified number of seconds has passed since the indexed field, create a TTL index on a field that holds values of BSON date type or an array of BSON date-typed objects and specify a positive non-zero value in the expireAfterSeconds field. A document will expire when the number of seconds in the expireAfterSeconds field has passed since the time specified in its indexed field. [1]

For example, the following operation creates an index on the log_events collection’s createdAt field and specifies the expireAfterSeconds value of 3600 to set the expiration time to be one hour after the time specified by createdAt.

db.log_events.createIndex( { "createdAt": 1 }, { expireAfterSeconds: 3600 } ) 

When adding documents to the log_events collection, set the createdAt field to the current time:

db.log_events.insert( {    "createdAt": new Date(),    "logEvent": 2,    "logMessage": "Success!" } ) 

MongoDB will automatically delete documents from the log_events collection when the document's createdAt value [1] is older than the number of seconds specified in expireAfterSeconds.

[1] If the field contains an array of BSON date-typed objects, data expires if at least one of the BSON date-typed objects is older than the number of seconds specified in expireAfterSeconds.
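The expiry rule can be written out directly. This is a hypothetical helper for intuition, not MongoDB driver code: a document is eligible for removal once its indexed date plus expireAfterSeconds lies in the past.

```java
import java.util.Date;

class TtlCheck {
    // True once (indexed date + expireAfterSeconds) is not after "now".
    static boolean isExpired(Date indexedAt, long expireAfterSeconds, Date now) {
        return indexedAt.getTime() + expireAfterSeconds * 1000L <= now.getTime();
    }
}
```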

SEE ALSO

$currentDate operator

Expire Documents at a Specific Clock Time

To expire documents at a specific clock time, begin by creating a TTL index on a field that holds values of BSON date type or an array of BSON date-typed objects, and specify an expireAfterSeconds value of 0. For each document in the collection, set the indexed date field to a value corresponding to the time the document should expire. If the indexed date field contains a date in the past, MongoDB considers the document expired.

For example, the following operation creates an index on the log_events collection’s expireAt field and specifies the expireAfterSeconds value of 0:

db.log_events.createIndex( { "expireAt": 1 }, { expireAfterSeconds: 0 } ) 

For each document, set the value of expireAt to correspond to the time the document should expire. For instance, the following insert() operation adds a document that should expire at July 22, 2013 14:00:00.

db.log_events.insert( {    "expireAt": new Date('July 22, 2013 14:00:00'),    "logEvent": 2,    "logMessage": "Success!" } ) 

MongoDB will automatically delete documents from the log_events collection when the documents' expireAt value is older than the number of seconds specified in expireAfterSeconds, i.e. 0 seconds older in this case. As such, the data expires at the specified expireAt value.

posted @ 2015-12-11 15:03 paulwong reads (1796) | comments (0)

How to delete large amount of data of a MongoDB collection “quickly”

We have a db collection that is around 30 million documents, and I need to trim it down to keep only the documents created in the last month.

One approach would be to use the remove command with a condition on the created_at field (the collection already has an index on this field):

db.my_collection.remove({created_at: {$lte: new Date("11/01/2012")}});

But this approach will be very slow. A better way is to rename the current collection (for instance to "old_collection") using renameCollection, then perform a query-and-insert from "old_collection" into "my_collection":

db.my_collection.renameCollection("old_collection");
db.createCollection("my_collection");
db.my_collection.createIndex(...); // recreate the indexes for the collection
// copy docs from the old collection into the new collection
db.old_collection.find({created_at: {$gte: new Date("11/01/2012")}})
    .sort({_id: -1})
    .forEach(function(row) { db.my_collection.insert(row); });
// drop the old collection
db.old_collection.drop();

This approach is typically faster than running a bunch of removes on your data.

posted @ 2015-12-10 20:09 paulwong reads (540) | comments (0)

MongoDB Capped Collections

MongoDB capped collections are high-performance, fixed-size collections. Because the size is fixed, you can picture one as a circular queue: once the collection's space is used up, newly inserted documents overwrite the oldest documents at the head!


Creating a capped collection

We create a capped collection with createCollection, setting the capped option to true:

>db.createCollection("cappedLogCollection",{capped:true,size:10000})

You can also cap the number of documents by adding a max:1000 property:

>db.createCollection("cappedLogCollection",{capped:true,size:10000,max:1000})

To check whether a collection is capped:

>db.cappedLogCollection.isCapped()

To convert an existing collection into a capped collection, use:

>db.runCommand({"convertToCapped":"posts",size:10000})

This converts our existing posts collection into a capped collection.


Querying a capped collection

Documents in a capped collection are stored in insertion order, and by default queries return them in that order; use $natural to reverse the order:

>db.cappedLogCollection.find().sort({$natural:-1})

Capped collection characteristics

Inserts and updates are allowed, but an update must not grow a document beyond the collection's size or it fails. Individual documents cannot be deleted; drop() can be called to remove all rows, but after a drop the collection must be explicitly recreated.

On 32-bit machines a capped collection can hold at most about 482.5 MB; on 64-bit machines the limit is only the system's file-size limit.


Capped collection properties and uses

Properties

  • Property 1: inserts into a capped collection are extremely fast.
  • Property 2: queries in insertion order are extremely fast.
  • Property 3: inserting the newest data automatically evicts the oldest data.

Uses

  • Use 1: storing log messages.
  • Use 2: caching a small number of documents.
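The overwrite behaviour described above is essentially a bounded ring buffer. A small illustrative Java sketch (of the semantics only, not of how mongod implements capped collections):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class CappedSketch {
    private final Deque<String> docs = new ArrayDeque<>();
    private final int max;

    CappedSketch(int max) { this.max = max; }

    void insert(String doc) {
        if (docs.size() == max) docs.removeFirst(); // evict the oldest document
        docs.addLast(doc);
    }

    // Default query order: insertion order.
    List<String> findInInsertionOrder() { return new ArrayList<>(docs); }
}
```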

posted @ 2015-12-09 14:41 paulwong reads (463) | comments (0)

MongoDB Aggregation

http://www.runoob.com/mongodb/mongodb-aggregate.html
Aggregation in MongoDB (aggregate) is mainly used to process data (computing averages, sums, and so on) and return a computed result, somewhat like count(*) in SQL.


The aggregate() method

Aggregation in MongoDB uses the aggregate() method.

Syntax

The basic syntax of the aggregate() method is:

>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

Example

The collection contains the following data:

{
   _id: ObjectId(7df78ad8902c),
   title: 'MongoDB Overview',
   description: 'MongoDB is no sql database',
   by_user: 'w3cschool.cc',
   url: 'http://www.w3cschool.cc',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: ObjectId(7df78ad8902d),
   title: 'NoSQL Overview',
   description: 'No sql database is very fast',
   by_user: 'w3cschool.cc',
   url: 'http://www.w3cschool.cc',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: ObjectId(7df78ad8902e),
   title: 'Neo4j Overview',
   description: 'Neo4j is no sql database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
}

Now we count the number of articles written by each author in this collection, using aggregate():

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
   "result" : [
      { "_id" : "w3cschool.cc", "num_tutorial" : 2 },
      { "_id" : "Neo4j", "num_tutorial" : 1 }
   ],
   "ok" : 1
}

This example is similar to the SQL statement: select by_user, count(*) from mycol group by by_user.

In the example above, we group the documents by the by_user field and count how many documents share each by_user value.
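For intuition, that $group / $sum stage computes the same thing as a plain group-and-count; a hypothetical Java stand-in:

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class GroupSketch {
    // Equivalent of {$group: {_id: "$by_user", num_tutorial: {$sum: 1}}}:
    // group the by_user values and count each group.
    static Map<String, Long> countByUser(List<String> byUserValues) {
        return byUserValues.stream()
                .collect(Collectors.groupingBy(u -> u, Collectors.counting()));
    }
}
```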

Some common aggregation expressions, with examples:

  • $sum: computes a sum. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
  • $avg: computes an average. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
  • $min: gets the minimum value of the field across all documents. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
  • $max: gets the maximum value of the field across all documents. db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
  • $push: appends a value to an array in the result document. db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
  • $addToSet: appends a value to an array in the result document, without duplicates. db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
  • $first: gets the first document's value according to the grouping order. db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
  • $last: gets the last document's value according to the grouping order. db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])

The pipeline concept

In Unix and Linux, a pipe passes the output of the current command to the next command as its input.

MongoDB's aggregation pipeline passes documents from one stage to the next once the current stage has finished processing them; stages can be repeated.

Expressions process input documents and produce output. Expressions are stateless: they can only compute over the documents currently passing through the pipeline, not over other documents.

Some commonly used stages in the aggregation framework:

  • $project: reshapes the input documents; can rename, add, or remove fields, and can produce computed values or nested documents.
  • $match: filters the data, passing along only the documents that match; uses MongoDB's standard query operators.
  • $limit: limits the number of documents the aggregation pipeline returns.
  • $skip: skips a given number of documents in the pipeline and passes along the rest.
  • $unwind: splits an array field in a document into one document per array element.
  • $group: groups the documents in a collection, typically to compute aggregated results.
  • $sort: sorts the input documents and outputs them in order.
  • $geoNear: outputs documents ordered by proximity to a geographic point.

Pipeline stage examples

1. $project

db.article.aggregate(
    { $project : {
        title : 1,
        author : 1
    }}
);

The result then contains only the _id, title, and author fields. The _id field is included by default; to exclude it:

db.article.aggregate(
    { $project : {
        _id : 0,
        title : 1,
        author : 1
    }}
);

2. $match

db.articles.aggregate( [
    { $match : { score : { $gt : 70, $lte : 90 } } },
    { $group : { _id: null, count: { $sum: 1 } } }
] );

$match selects the records with a score greater than 70 and less than or equal to 90, then passes the matching records on to the $group stage.

3. $skip

db.article.aggregate( { $skip : 5 });

After the $skip stage, the first five documents are "filtered" out.

posted @ 2015-12-08 10:44 paulwong reads (544) | comments (0)

Using hashCode and equals correctly in Java

     Abstract: In this article I share my understanding of the hashCode and equals methods: their default implementations, how to override them correctly, and an implementation using the Apache Commons toolkit. Contents: usage of hashCode() and equals(); overriding the default implementations; overriding hashCode() and equals() with Apache Commons Lang; things to keep in mind; hashCode caveats when using an ORM...  Read full article

posted @ 2015-12-01 10:52 paulwong reads (416) | comments (0)
