无为 (Wu Wei)

Through non-action, all can be done; through non-action, one reaches the greatest depth!


Business applications consist of programs that collect, create, modify, retrieve, and delete data, and programs that use, analyze, summarize, extract, or in other ways manipulate data. Data is the common thread that ties together the extensive corporate application portfolio. As data flows between users and is transformed into information, it can provide a current advantage in the form of superior operational systems and a future advantage in the form of superior analysis for planning. How the data asset is positioned is of vital long-term importance to the health of the enterprise.

Increasingly, corporations are recognizing that the purposeful management and leveraging of the corporate data asset must receive increased attention in the 1990s. In the 1970s, management attention was focused on hardware cost. During the 1980s, management's attention shifted to software as both a growing element of the I/T cost structure and the source of advantageous applications. In the 1990s, management will increasingly focus on data exploitation as the path to improved customer service, cooperation with suppliers, and the creation of new barriers for competitors.

Data Engineering theory (Data Engineering is the discipline that studies how to model, analyze, and design data for maximum utility) indicates that there are four generic data environments on which to build business applications. For a variety of technical and architectural reasons they are not equally advantageous. 

 

Dedicated File Architecture: Each application has a set of privately designed files. The data structure is tightly embedded with the application and the data files are owned by the application.

Closed Database Architecture: A database management system (DBMS) is used to provide a technological advantage over file systems (for example, views, security, atomicity, locking, and recovery), but distinct, separate, and independent databases are still designed for each application. The DBMS is used as a private and powerful file system, with the data remaining the proprietary property of the application. As with the Dedicated File Architecture, there is a high degree of data redundancy and frequently poor data administration. "Spaghetti-like" interfaces move data between the Closed Databases. Since these interfaces often have to convert, edit, and/or restructure data as it moves between proprietary definitions, they are often called "data scrubbers" or translators. "Data scrubbers" do not add value; they compensate for inadequate data administration.

Subject Database Architecture: Data is analyzed, modeled, structured, and stored based on its own internal attributes, independent of any specific application. Data is administered as a shareable resource through a data administration function that owns the data for all potential users. Extensive sharing of data occurs through application-sensitive views. Subject Databases run the day-to-day operations of the enterprise.

Decision Support Database Architecture: Databases are constructed for quick searching, retrieval, ad-hoc queries, and ease of use. The data is normally a periodic extract from a Subject Database or public information service. To minimize the number of extracts and to ensure data that is consistent in time and content, data is shared at the corporate, departmental, and local levels, not extracted per user. Data definitions are kept synchronized with the source databases to ensure that data from multiple subject database extracts can be inter-related without resorting to "data scrubbers." Decision Support Databases are used to analyze the enterprise.
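
The Subject Database idea, one application-independent definition of the data shared through application-sensitive views, can be sketched in a few lines. This is a minimal illustration and not part of the source text: it assumes Python's built-in sqlite3 module, and the customer table, the billing_customer and shipping_customer views, and the sample rows are hypothetical names chosen for the example.

```python
import sqlite3


def build_subject_database() -> sqlite3.Connection:
    """Create an in-memory subject database with one shared customer table.

    All table, view, and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One application-independent definition of the customer subject.
        CREATE TABLE customer (
            customer_id  INTEGER PRIMARY KEY,
            name         TEXT NOT NULL,
            region       TEXT NOT NULL,
            credit_limit REAL NOT NULL
        );

        -- Application-sensitive views: each application sees only what it needs.
        CREATE VIEW billing_customer AS
            SELECT customer_id, name, credit_limit FROM customer;
        CREATE VIEW shipping_customer AS
            SELECT customer_id, name, region FROM customer;
        """
    )
    conn.executemany(
        "INSERT INTO customer VALUES (?, ?, ?, ?)",
        [(1, "Acme", "East", 5000.0), (2, "Globex", "West", 12000.0)],
    )
    conn.commit()
    return conn


def view_columns(conn: sqlite3.Connection, view_name: str) -> list:
    """Return the column names a view exposes (view_name is trusted input here)."""
    cursor = conn.execute(f"SELECT * FROM {view_name}")
    return [column[0] for column in cursor.description]


if __name__ == "__main__":
    conn = build_subject_database()
    # A single update to the shared table is visible through every view,
    # so no interface has to copy or translate the data between applications.
    conn.execute("UPDATE customer SET name = 'Acme Corp' WHERE customer_id = 1")
    for view in ("billing_customer", "shipping_customer"):
        print(view, view_columns(conn, view))
        print(conn.execute(f"SELECT name FROM {view} WHERE customer_id = 1").fetchone())
```

Because both views read the same stored row, the redundancy and the "data scrubber" interfaces of the Closed Database Architecture have nothing to do in this arrangement. In a Closed Database design, each application would instead hold its own copy of the customer data under its own definition.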

The recommended data architecture is a mixture of the Subject Database and Decision Support Database environments: Subject Databases support "The Business Applications," and Decision Support Databases enable "The About-The-Business Applications." This dual database architecture is most advantageous for the following reasons:

Data quality, accessibility and sharing are maximized.
Unplanned data redundancy is eliminated.
Inter-application interaction is simplified.
Data standardization is assured.
Application life cycle productivity is maximized.
Development of new applications is accelerated through the reuse of in-place data resource.
Creation of centers of excellence in data management to protect the data asset is enabled.

Some data architects would prefer a single database environment in which both OLTP and decision support needs are fulfilled concurrently against a single database, thereby eliminating duplication and extraction altogether. It is our assessment that the two user communities have fundamentally different and incompatible requirements that preclude this option. Table 3 summarizes the major points of conflict. These dichotomies present a formidable barrier to a single database environment.

Table 3. Subject Database and Decision Support Database Dichotomies. (Source: Implementing Client/Server Computing, Bernard H. Boar, McGraw-Hill, 1993)

| Operational Environment: Subject Databases (The Business Applications) | Data Warehouse Environment: Decision Support Databases (The About-The-Business Applications) |
| --- | --- |
| Stores very detailed data | Stores detailed and/or summarized data |
| Stores entire subject database | Stores only data of interest |
| Requires to-the-last-transaction accuracy | Requires "as of" accuracy |
| Disciplined, highly structured, and planned transactions | Unstructured and ad-hoc transactions |
| Optimized for performance, efficiency, and availability | Optimized for flexibility and ease of use |
| Maintains rigorous data structures | Supports dynamic data structures |
| Runs the business | Analyzes the business |
| Emphasizes needs of all potential users | Emphasizes needs of each user |
| Short-running and engineered transactions | Potentially long-running and dynamically defined transactions |

When routine access to operational databases is given to decision support users, major problems can occur:

Performance: The unpredictable nature of ad-hoc queries conflicts with the operational systems' requirement for predictable response time. Predictable and guaranteed performance cannot be engineered into the system design if the transactions are not predictable.

Data Retention: Decision support applications often require longer retention of data for cumulative analysis than the operational systems, which need it only for the active business practice cycle. The growth in the size of the database can negatively impact performance, integrity, and the ability to meet any recoverability time constraints.

Logical Reasoning: Since the database changes dynamically with each transaction, information queries are non-repeatable and chained queries do not necessarily operate on the same set of data. Deductive reasoning against a stable data store is not feasible. A temporal database that maintains a time view of the data could resolve this problem but creates a new set of issues unrelated to the pressing operational needs.

We may summarize our views on data architecture as follows:

There are four generic ways to design and organize the corporation's data asset.
They are not equal.
A combination of the Subject Database and Decision Support Database environments is most advantageous. This is called the Dual Database Environment.
A single database environment from which both operational and decision support requirements are met is desirable but plagued by many practical problems that make it infeasible.

Key Points

  1. There are four generic data environments:
       Dedicated File Architecture
       Closed Database Architecture
       Subject Database Architecture
       Decision Support Database Architecture
  2. Major problems occur when routine access to operational databases is given to decision support users.
  3. The recommended data architecture, the Dual Database Environment, is a mixture of the Subject Database and Decision Support Database environments.

Data warehousing is the modern term used to describe a mature and robust decision support database environment consistent with the model in Figure 3. It implies that decision support databases have been carefully selected and designed to provide maximum utility and that a powerful set of tools has been provided to users to maximize their ability to leverage and exploit the captured operational data.

While data warehouses today are primarily single, physical databases with staged replication to departmental and personal databases, it should be anticipated that with the emergence of industrial grade distributed database management technology, data warehouses will become logical databases that transcend physically distributed decision support databases.
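
The case for a periodic extract, and the Logical Reasoning problem described above, can be illustrated with a small sketch. It is an assumption-laden example rather than anything from the source: it uses Python's built-in sqlite3 module with two in-memory databases, and the orders and orders_extract tables, the as_of label, and the sample values are all hypothetical.

```python
import sqlite3


def make_operational_db() -> sqlite3.Connection:
    """Create a hypothetical operational database holding live orders."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "East", 100.0), (2, "West", 250.0)],
    )
    conn.commit()
    return conn


def extract_snapshot(operational: sqlite3.Connection, as_of: str) -> sqlite3.Connection:
    """Copy the orders into a separate decision support database.

    Every extracted row is stamped with the as_of label, so analysts know
    which point in time the data describes ("as of" accuracy).
    """
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE orders_extract "
        "(order_id INTEGER, region TEXT, amount REAL, as_of TEXT)"
    )
    rows = operational.execute("SELECT order_id, region, amount FROM orders").fetchall()
    warehouse.executemany(
        "INSERT INTO orders_extract VALUES (?, ?, ?, ?)",
        [(*row, as_of) for row in rows],
    )
    warehouse.commit()
    return warehouse


def total(conn: sqlite3.Connection, table: str) -> float:
    """Sum the amount column of a table (table is trusted input in this sketch)."""
    return conn.execute(f"SELECT SUM(amount) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    operational = make_operational_db()
    warehouse = extract_snapshot(operational, as_of="1993-01-31")  # hypothetical date

    first_answer = total(operational, "orders")
    # A new operational transaction arrives between two analytical queries.
    operational.execute("INSERT INTO orders VALUES (3, 'East', 75.0)")
    operational.commit()
    second_answer = total(operational, "orders")

    print("operational:", first_answer, second_answer)        # the two answers differ
    print("warehouse:  ", total(warehouse, "orders_extract"))  # stable until the next extract
```

The same query run twice against the operational database gives two different answers, while the extract keeps returning the same total until it is refreshed. That stability is what makes chained queries and deductive reasoning workable. The price is that the extract is only accurate as of its label.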



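All articles bearing this mark are original works by the owner of this blog, Caoer (草儿). If you index, bookmark, or repost them, please credit the source and the original author. Thank you very much.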

Posted on 2006-06-24 13:42 by 草儿 (Caoer). Category: Data Warehouse