Wu Wei (无为)

Through non-action, anything can be done; through non-action, one reaches the greatest depth!

Data mining algorithms are the foundation from which mining models are created. But why use models? We build mining models for the same reason engineers build a prototype before the real thing. I still remember my differential calculus in this sense. We studied the subject as a prerequisite for engineering mathematics simply because, if you were to build a solid object, say a spherical tank, you need to know which mathematical model gives you what you want. The same holds for mining models.

To make this clearer, let me walk through the different mining model algorithms and their uses.

Microsoft Decision Trees
[决策树]

This algorithm supports both classification and regression and works well for predictive modeling. You can use this model to answer questions like "What causes customers to buy this product?", "Which of my existing customers do I need to focus on to generate more revenue?" and similar requirements.

In management science, decision trees are pictorial networks of alternative courses of action showing the possible outcomes of different choices, taking into account probabilities, costs and returns. Decision trees enable a manager to set out the consequences of each choice, making sure that all possibilities have been considered, and to assess both the likelihood of each possibility and its result in terms of cost and profit. In Microsoft SQL Server 2005 Data Mining, you are in effect automating all of these tasks.
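To make the idea of "automating" the choice concrete, here is a minimal sketch of how a decision tree might pick its first split, using information gain. The customer records and attribute names are made up for illustration, and this is a textbook criterion, not necessarily the scoring method the Microsoft algorithm uses.

```python
from collections import Counter
from math import log2

# Hypothetical customer records: (age_group, owns_home, bought_product)
customers = [
    ("young", "no", "no"),
    ("young", "yes", "no"),
    ("middle", "no", "yes"),
    ("middle", "yes", "yes"),
    ("senior", "no", "no"),
    ("senior", "yes", "yes"),
]
ATTRIBUTES = {"age_group": 0, "owns_home": 1}  # attribute name -> column index
TARGET = 2                                     # column we want to predict


def entropy(rows):
    """Shannon entropy of the target column, in bits."""
    counts = Counter(row[TARGET] for row in rows)
    total = len(rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(rows, column):
    """Entropy reduction obtained by splitting the rows on one column."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row)
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(rows) - remainder


def best_split(rows):
    """Name of the attribute with the highest information gain."""
    return max(ATTRIBUTES, key=lambda name: information_gain(rows, ATTRIBUTES[name]))


print(best_split(customers))
```

In this toy data set the age group separates buyers from non-buyers far better than home ownership does, so it becomes the root of the tree; a full tree builder would repeat the same step inside each branch.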

Microsoft Clustering
[群集]

This algorithm uses iterative techniques to group records from a data set into clusters containing similar characteristics. This model is used to answer questions like "How can I differentiate my customers?" You can also think in terms of groupings, such as the finding that affluent people in the US tend to own an older car and an older house, have saved or invested their money, and are married to one spouse (from the book The Millionaire Next Door by Dr. Thomas Stanley). With results like these, you can use the information to change your business approach (in this case, what can I do to be the next American millionaire? I ask because some of the author's findings are not applicable to the Philippine setting.)
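As an illustration of what "iterative" grouping looks like, here is a small k-means sketch. K-means is only one common clustering technique and is used here as an assumption; the customer figures and starting centers are invented.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on 2-D points; returns (centers, labels)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(len(centers)),
                key=lambda k: (p[0] - centers[k][0]) ** 2 + (p[1] - centers[k][1]) ** 2)
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append((sum(m[0] for m in members) / len(members),
                                    sum(m[1] for m in members) / len(members)))
            else:
                new_centers.append(centers[k])  # leave an empty cluster where it was
        if new_centers == centers:
            break  # nothing moved, so the grouping is stable
        centers = new_centers
    return centers, labels


# Hypothetical customers: (age, yearly spend in thousands)
customers = [(25, 2), (27, 3), (30, 2), (55, 20), (60, 22), (58, 19)]
centers, labels = kmeans(customers, centers=[customers[0], customers[3]])
print(labels)
```

The two steps repeat until the centers stop moving, which is the "iterative" part; the resulting labels are the customer segments.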

Microsoft Naïve Bayes
[贝叶斯]

This is another classification algorithm, similar to decision trees. Let's consider an example to explain it further. Let's say you are an isaw and barbecue vendor who happens to use a database to track your customers and their buying patterns (of course, aside from isaw, there is also pig's ear, chicken gizzard, balut, and so on; you name it, they are Filipino specialties, so they sell well). You gathered data from your customers such as gender, age, occupation, and probably waistline. You can train this algorithm on that data against a classification, say the item purchased. The algorithm (Naïve Bayes is hard to type, so I will just call it "the algorithm") can then estimate, from the data gathered, the probability that a given customer will purchase each item. So, if Erap drops by your barbecue stand, knowing his age, gender, occupation, and waistline, you can predict what he will buy from you (although Erap would probably wreck the algorithm's predictions, given how heavily he drinks and how many people he brings along when he does... he he).
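The barbecue stand example can be sketched in a few lines. The purchase log below is invented, only two customer attributes are used, and the add-one (Laplace) smoothing is my own assumption to avoid zero probabilities; it is a minimal illustration of the Naïve Bayes idea rather than the Microsoft implementation.

```python
from collections import Counter, defaultdict

# Hypothetical purchase log: (gender, age_group, item_bought)
sales = [
    ("male", "adult", "isaw"),
    ("male", "adult", "isaw"),
    ("male", "teen", "barbecue"),
    ("female", "adult", "barbecue"),
    ("female", "teen", "barbecue"),
    ("female", "adult", "isaw"),
    ("male", "adult", "barbecue"),
]


def train(rows):
    """Count how often each item, and each (feature value, item) pair, occurs."""
    class_counts = Counter(row[-1] for row in rows)
    feature_counts = defaultdict(Counter)
    for *features, item in rows:
        for index, value in enumerate(features):
            feature_counts[(index, item)][value] += 1
    return class_counts, feature_counts


def predict(model, features, values_per_feature=2):
    """Normalized P(item | features) under the naive independence assumption."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    scores = {}
    for item, count in class_counts.items():
        score = count / total  # prior P(item)
        for index, value in enumerate(features):
            # Add-one smoothing so an unseen value never gives probability 0.
            seen = feature_counts[(index, item)][value]
            score *= (seen + 1) / (count + values_per_feature)
        scores[item] = score
    norm = sum(scores.values())
    return {item: score / norm for item, score in scores.items()}


model = train(sales)
probabilities = predict(model, ("male", "adult"))
print(max(probabilities, key=probabilities.get))
```

Each attribute contributes its own conditional probability independently of the others, which is exactly the "naïve" assumption in the name.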

Microsoft Time Series
[时间序列]

This algorithm uses a linear regression decision tree approach to analyze time-related data. Using it, you can forecast how much revenue you will have next year based on past sales from the isaw and barbecue stand, or what your inventory levels will be next month; and if you open an additional branch, you can use the outcome of this model to predict its probable revenue. A similar prediction idea is behind a web page I saw where you simply key in your age, gender, occupation, and lifestyle and it predicts how many days you have left before you die. I don't really believe the results, because no one knows when you will definitely leave this planet, but it makes use of this concept.
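A minimal sketch of the forecasting idea, assuming the simplest possible case: each month's sales are regressed on the previous month's sales (a one-lag autoregression), and the fitted line is rolled forward. The tree part of the algorithm is left out entirely, and the sales figures are made up.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] = intercept + slope * x[t-1]."""
    previous, current = series[:-1], series[1:]
    n = len(previous)
    mean_prev = sum(previous) / n
    mean_curr = sum(current) / n
    covariance = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(previous, current))
    variance = sum((p - mean_prev) ** 2 for p in previous)
    slope = covariance / variance
    intercept = mean_curr - slope * mean_prev
    return intercept, slope


def forecast(series, steps):
    """Roll the fitted model forward, feeding each prediction back in."""
    intercept, slope = fit_ar1(series)
    value, predictions = series[-1], []
    for _ in range(steps):
        value = intercept + slope * value
        predictions.append(value)
    return predictions


# Hypothetical monthly sales that grow by exactly 10 units per month.
monthly_sales = [100, 110, 120, 130, 140, 150]
print(forecast(monthly_sales, 3))  # [160.0, 170.0, 180.0]
```

Because the toy series grows by a constant amount, the fit recovers "previous month plus 10" and the forecast simply continues the trend; real sales data would of course be noisier.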

Microsoft Association Rules
[关联规则]

This algorithm builds rules describing which items are most likely to appear together in a transaction. You normally see this on Amazon.com's website, where they cross-sell products. You will read, "Customers who bought this product also bought these…" followed by the recommendations. From a selling point of view, it gives you a basis for suggestive selling like what the fast-food people do ("Would you like to try our new apple flavored gravy?"). What they do is simply suggest. Implementing this algorithm makes the suggestions intelligent: because they are based on existing data, there is a higher probability that the customer will take them up.
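Here is a rough sketch of how such rules can be derived, restricted to pairs of items. The baskets and the two thresholds are hypothetical; support and confidence are the standard measures for association rules, but the exact scoring used by the Microsoft algorithm may differ.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per transaction.
transactions = [
    {"burger", "fries", "cola"},
    {"burger", "fries"},
    {"burger", "cola"},
    {"fries", "cola"},
    {"burger", "fries", "gravy"},
]


def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs."""
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / len(baskets)  # share of baskets holding both items
        if support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            # Confidence: of the baskets with the antecedent, how many also
            # contain the consequent.
            confidence = together / item_counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return sorted(rules)


for rule in pair_rules(transactions):
    print(rule)
```

With these baskets only burger and fries pass both thresholds, in both directions, so a customer ordering one of them would be offered the other.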

Microsoft Sequence Clustering
[序列群集]

This algorithm analyzes sequence-oriented data that contains discrete-valued series and is a hybrid of sequence and clustering algorithms. Usually the sequence attribute in the series holds a set of events with a specific order. By analyzing the transition between states of the sequence, the algorithm can predict future states in related sequences.  This answers the questions “How do I differentiate my customers?” or “How do I know which events caused the outage of my servers?”  
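The "transition between states" part can be illustrated with a first-order transition table: count how often each state follows another, then predict the most frequent successor. The event logs are invented, and the clustering half of the algorithm is not shown; this is only a sketch of the sequence analysis.

```python
from collections import Counter, defaultdict

# Hypothetical server event logs, each an ordered sequence of states.
event_logs = [
    ["ok", "high_load", "disk_full", "outage"],
    ["ok", "high_load", "ok"],
    ["ok", "high_load", "disk_full", "outage"],
    ["ok", "disk_full", "outage"],
]


def transition_probabilities(sequences):
    """Estimate P(next_state | current_state) from observed sequences."""
    counts = defaultdict(Counter)
    for sequence in sequences:
        for current, following in zip(sequence, sequence[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


def most_likely_next(model, state):
    """Most probable successor of a state, or None if it was never followed."""
    followers = model.get(state)
    return max(followers, key=followers.get) if followers else None


model = transition_probabilities(event_logs)
print(most_likely_next(model, "disk_full"))  # outage
```

Reading the table backwards also hints at causes: in these logs every outage was preceded by a full disk.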

Microsoft Neural Network
[神经网络]

A neural network is an interconnected group of artificial or biological neurons. Similar to the Microsoft Decision Trees algorithm provider, given each state of the predictable attribute, the algorithm calculates probabilities for each possible state of the input attribute. The algorithm provider processes the entire set of cases, iteratively comparing the predicted classification of the cases with their known actual classification. The errors from the first pass over the entire set of cases are fed back into the network and used to modify the network's performance for the next iteration, and so on. You can later use these probabilities to predict an outcome of the predicted attribute, based on the input attributes.

One of the major advantages of neural networks is that, theoretically, they are capable of approximating any continuous function, and thus the researcher does not need to have any hypotheses about the underlying model, or even, to some extent, about which variables matter. An important disadvantage, however, is that the final solution depends on the initial conditions of the network, and it is virtually impossible to "interpret" the solution in traditional, analytic terms, such as those used to build theories that explain phenomena. This algorithm helps you answer questions like "How long will this asset be in service before it is totally depreciated?" or "Will this customer, who is a recipient of a targeted mailing campaign, buy a product?"
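The feedback loop described above (predict, compare with the known answer, feed the error back) can be sketched with a tiny network trained by backpropagation. Everything here is an assumption made for illustration: the mailing campaign data, the three hidden units, the learning rate, and the fixed random seed. Changing the seed changes the starting weights, which is the dependence on initial conditions mentioned above.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(network, inputs):
    """Return (hidden activations, output) for one case."""
    w_hidden, w_out = network
    hidden = [sigmoid(w[-1] + sum(wi * x for wi, x in zip(w, inputs))) for w in w_hidden]
    output = sigmoid(w_out[-1] + sum(wi * h for wi, h in zip(w_out, hidden)))
    return hidden, output


def train(cases, hidden_units=3, epochs=2000, rate=0.5, seed=0):
    """Backpropagation on squared error; returns (network, mean error per epoch)."""
    rng = random.Random(seed)
    n_inputs = len(cases[0][0])
    # Every weight list ends with a bias term.
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)] for _ in range(hidden_units)]
    w_out = [rng.uniform(-1, 1) for _ in range(hidden_units + 1)]
    history = []
    for _ in range(epochs):
        total = 0.0
        for inputs, target in cases:
            hidden, output = forward((w_hidden, w_out), inputs)
            error = output - target
            total += error ** 2
            # Feed the error back: output layer first, then the hidden layer.
            delta_out = error * output * (1 - output)
            for j, h in enumerate(hidden):
                delta_hidden = delta_out * w_out[j] * h * (1 - h)
                w_out[j] -= rate * delta_out * h
                for i, x in enumerate(inputs):
                    w_hidden[j][i] -= rate * delta_hidden * x
                w_hidden[j][-1] -= rate * delta_hidden
            w_out[-1] -= rate * delta_out
        history.append(total / len(cases))
    return (w_hidden, w_out), history


# Hypothetical mailing campaign: (previous buyer, opened email) -> bought?
campaign = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
network, history = train(campaign)
print(history[0], history[-1])  # mean squared error: first pass vs. last pass
```

The learned weights are just lists of numbers, which also shows why the result is hard to interpret in analytic terms.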

Microsoft Linear Regression
[线性回归]

The Microsoft Linear Regression algorithm is a particular configuration of the Microsoft Decision Trees algorithm, obtained by disabling splits (the whole regression formula is built in a single root node). The algorithm supports the prediction of continuous attributes. One application I can think of is measuring how closely stress is related to heart attacks, or how closely corruption is related to people in the government. A business application that could take advantage of this algorithm is measuring how closely our marketing strategy is related to revenue for a particular product.
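For the marketing and revenue case, a single regression formula can be sketched with ordinary least squares. The figures are invented so that revenue follows spend exactly; the correlation coefficient is included as one way to express "how closely related" the two are.

```python
def linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x; also returns r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / (sxx * syy) ** 0.5
    return intercept, slope, correlation


# Hypothetical figures: marketing spend vs. revenue, both in thousands.
marketing_spend = [10, 20, 30, 40, 50]
revenue = [25, 45, 65, 85, 105]  # constructed as 5 + 2 * spend

intercept, slope, correlation = linear_regression(marketing_spend, revenue)
print(intercept, slope, correlation)
```

A correlation close to 1 means the line explains almost all of the variation; with real data it would be lower, and the slope tells you how much extra revenue to expect per unit of extra spend.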

Microsoft Logistic Regression
[逻辑回归]

This algorithm is a particular configuration of the Microsoft Neural Network algorithm, obtained by eliminating the hidden layer. The algorithm supports the prediction of both discrete and continuous attributes. One application is identifying the high-cost users of medical care. In countries like the US and Canada, where medical care is partly shouldered by the government, analysis is needed to determine which factors contribute to high costs so that the government can set policies on medical claims.
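With the hidden layer gone, what remains is a weighted sum of the inputs passed through a sigmoid. Here is a minimal sketch of that, trained by batch gradient descent; the patient records, learning rate, and number of passes are all hypothetical.

```python
import math


def predict(weights, features):
    """Probability of the positive class: sigmoid of a weighted sum plus bias."""
    z = weights[-1] + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, rate=0.05):
    """Batch gradient descent on log loss; returns weights with the bias last."""
    n_features = len(rows[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        gradient = [0.0] * (n_features + 1)
        for features, label in zip(rows, labels):
            error = predict(weights, features) - label
            for i, x in enumerate(features):
                gradient[i] += error * x
            gradient[-1] += error
        weights = [w - rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights


# Hypothetical patients: (doctor visits per year, chronic conditions)
patients = [(1, 0), (2, 0), (3, 1), (8, 2), (10, 3), (12, 2)]
high_cost = [0, 0, 0, 1, 1, 1]

weights = train_logistic(patients, high_cost)
print(predict(weights, (11, 3)), predict(weights, (1, 0)))
```

Unlike the network with a hidden layer, the fitted weights can be read directly: a positive weight on a factor means it pushes a patient toward the high-cost group.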



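All articles bearing this mark are original work by this blog's owner, Caoer (草儿). If you index, bookmark, or repost them, please credit the source and the original author. Thank you very much.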

posted on 2006-06-10 13:36 by 草儿 (Caoer). Category: BI and DM
