hengheng123456789

  BlogJava :: 首页 :: 联系 :: 聚合  :: 管理
  297 Posts :: 68 Stories :: 144 Comments :: 0 Trackbacks

Mondrian and OLAP

   Mondrian 是使用java编写的OLAP引擎,它执行MDX语言描述的查询,可以从关系数据库中获取数据。

Online Analytical Processing (OLAP)

   OLAP是指实时地分析大数据量,与Online Transaction Processing (OLTP)不同。

Year

2000

2001

Growth

Product

Dollar sales

Unit sales

Dollar sales

Unit sales

Dollar sales

Unit sales

Total

$7,073

2,693

$7,636

3,008

8%

12%

Books

$2,753

824

$3,331

966

21%

17%

Fiction

$1,341

424

$1,202

380

-10%

-10%

Non-fiction

$1,412

400

$2,129

586

51%

47%

Magazines

$2,753

824

$2,426

766

-12%

-7%

— Greetings cards

$1,567

1,045

$1,879

1,276

20%

22%


从上表可以看到year和Product为dimensions (维度),measures 为'Unit sales' and 'Dollar sales'。

Layers of a Mondrian system

   Mondrian OLAP System 包含4个层:

   1、the presentation layer
         用于描述最终用户所看到的,其表现形式有很多,如:pivot表、pie、line和bar图、maps和动态图。由用户提问,OLAP服务器返回答案。
   2、the dimensional layer
         这一层解析、验证和执行MDX 查询,首先计算轴,再计算轴上所有单元的值。metadata用于描述空间模型,及空间模型怎样映射到关系模型上。
   3、the star layer
         它负责维护一个聚合的cache,aggregation 聚合是一些内存中的测量值(cells),及一些维度值。如果请求的cells不在cache中,则aggregation manager 向the storage layer发出请求。
   4、and the storage layer
         the storage layer为RDBMS。


What is MDX?

   'multi-dimensional expressions'为Mondrian执行的查询。
   下面为一个基本的查询:
SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
{[Product].members} ON ROWS
FROM [Sales]
WHERE [Time].[1997].[Q2]

What is a schema? 

 schema定义了一个多维数据库。它包含一个逻辑模型,由多个cube、hierarchies、members及一个到物理模型的映射组成。
 逻辑模型包含:cubes, dimensions, hierarchies, levels, and members.

  • A cube is a collection of dimensions and measures in a particular subject area.
  • A measure is a quantity that you are interested in measuring, for example, unit sales of a product, or cost price of inventory items.
  • A dimension is an attribute, or set of attributes, by which you can divide measures into sub-categories. For example, you might wish to break down product sales by their color, the gender of the customer, and the store in which the product was sold; color, gender, and store are all dimensions.
    例如:
    <Schema>
      <Cube name="Sales">
        <Table name="sales_fact_1997"/>
        <Dimension name="Gender" foreignKey="customer_id">
          <Hierarchy hasAll="true" allMemberName="All Genders" primaryKey="customer_id">
            <Table name="customer"/>
            <Level name="Gender" column="gender" uniqueMembers="true"/>
          </Hierarchy>
        </Dimension>
        <Dimension name="Time" foreignKey="time_id">
          <Hierarchy hasAll="false" primaryKey="time_id">
            <Table name="time_by_day"/>
            <Level name="Year" column="the_year" type="Numeric" uniqueMembers="true"/>
            <Level name="Quarter" column="quarter" uniqueMembers="false"/>
            <Level name="Month" column="month_of_year" type="Numeric" uniqueMembers="false"/>
          </Hierarchy>
        </Dimension>
        <Measure name="Unit Sales" column="unit_sales" aggregator="sum" formatString="#,###"/>
        <Measure name="Store Sales" column="store_sales" aggregator="sum" formatString="#,###.##"/>
        <CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales]-[Measures].[Store Cost]">
          <CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/>
        </CalculatedMember>
      </Cube>
    </Schema>

     
    MDX查询为:


    SELECT {[Measures].[Unit Sales], [Measures].[Store Sales]} ON COLUMNS,
      {[Time].[1997].[Q1].descendants} ON ROWS
    FROM [Sales]



    WHERE [Gender].[F]



    [Time][Measures].[Unit Sales][Measures].[Store Sales]
    [1997].[Q1]00
    [1997].[Q1].[Jan]00
    [1997].[Q1].[Feb]00
    [1997].[Q1].[Mar]00


  • A member is a point within a dimension determined by a particular set of attribute values. The gender hierarchy has the two members 'M' and 'F'. 'San Francisco', 'California' and 'USA' are all members of the store hierarchy.
  • A hierarchy is a set of members organized into a structure for convenient analysis. For example, the store hierarchy consists of the store name, city, state, and nation. The hierarchy allows you form intermediate sub-totals: the sub-total for a state is the sum of the sub-totals of all of the cities in that state, each of which is the sum of the sub-totals of the stores in that city.
  • A level is a collection of members which have the same distance from the root of the hierarchy.
  • A dimension is a collection of hierarchies which discriminate on the same fact table attribute (say, the day that a sale occurred).
  • mondrian.properties
       mondrian有一个配置文件可以定义它是如何运行的。


    a simple star schema

       

  • Cube [Sales] has two measures [Unit sales] and [Dollar sales]
  • Dimension [Product] has levels [All Products], [Manufacturer], [Brand], [Prodid]
  • Dimension [Time] has levels [All Time], [Year], [Quarter], [Month], [Day]
  • Dimension [Customer] has levels [All Customers], [State], [City], [Custid]
  • Dimension [Payment Method] has levels [All Payment Methods], [Payment Method]

  • posted on 2006-11-26 16:44 哼哼 阅读(1372) 评论(0)  编辑  收藏 所属分类: BI

    只有注册用户登录后才能发表评论。


    网站导航: