Using the K-Means Wizard

Data warehouse tips by Burleson Consulting

This is an excerpt from Dr. Ham's premier book "Oracle Data Mining: Mining Gold from your Warehouse".

The Wizard automatically trims outliers and imputes missing data, substituting the mean for numerical attributes and the mode for categorical attributes. It also normalizes numerical values using the Min/Max technique. You can change these defaults by clicking Advanced Settings when you finish the New Activity Wizard. In the Build tab, the default number of clusters is 10. K-means must be told how many clusters to build, in contrast to O-Cluster, which finds the number of clusters best suited to the dataset. We'll keep the default build settings and build the model.
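The default data preparation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Oracle's implementation: the outlier rule (clipping, or winsorizing, values to the mean plus or minus three standard deviations) is an assumption because the Wizard's exact cutoff is not given here, and the function names are hypothetical. Missing values are imputed first so that the mean and standard deviation can be computed.

```python
from statistics import mean, mode, pstdev

def impute(column, categorical):
    """Replace None with the mean (numeric) or the mode (categorical) of the present values."""
    present = [v for v in column if v is not None]
    fill = mode(present) if categorical else mean(present)
    return [fill if v is None else v for v in column]

def trim_outliers(column, k=3.0):
    """Clip numeric values to mean +/- k standard deviations (assumed rule, not Oracle's documented one)."""
    mu, sigma = mean(column), pstdev(column)
    lo, hi = mu - k * sigma, mu + k * sigma
    return [min(max(v, lo), hi) for v in column]

def min_max(column):
    """Rescale numeric values linearly to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

# Hypothetical numeric attribute with one missing value.
ages = [23, None, 41, 35, 29]
print(min_max(trim_outliers(impute(ages, categorical=False))))
```

Imputing before trimming is a design choice of this sketch; the Wizard may order these steps differently.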

After the Mining Activity completes, click Build results to view the clusters. This view shows every cluster in the tree, not just the final ones: Cluster #1 contains all 1044 cases, Cluster #2 is an intermediate cluster of 615 cases split from Cluster #1, and Cluster #4 was split from Cluster #3. Checking the Show Leaves Only box displays only the final clustering.
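The tree of intermediate and leaf clusters comes from building the model top-down with binary splits. The sketch below is a simplified bisecting k-means (repeatedly split the largest leaf with 2-means until k leaves exist), offered to illustrate that idea rather than as Oracle's actual algorithm. The node-numbering scheme, the deterministic seeding rule and all helper names are assumptions; `leaves_only` plays the role of the Show Leaves Only check box.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return [sum(c) / len(points) for c in zip(*points)]

def two_means(points, iters=10):
    """Split points into two groups with a few Lloyd iterations."""
    # Deterministic seeds: the point farthest from the centroid, then the point farthest from that one.
    c = centroid(points)
    a = max(points, key=lambda p: dist2(p, c))
    b = max(points, key=lambda p: dist2(p, a))
    cents = [a, b]
    groups = [[], []]
    for _ in range(iters):
        groups = [[], []]
        for p in points:
            groups[0 if dist2(p, cents[0]) <= dist2(p, cents[1]) else 1].append(p)
        if not groups[0] or not groups[1]:
            break  # degenerate split; the caller stops splitting
        cents = [centroid(g) for g in groups]
    return groups

def build_tree(points, k):
    """Grow a binary cluster tree top-down until there are k leaves (or no split is possible)."""
    nodes = {1: {"points": points, "children": []}}
    leaves = [1]
    next_id = 2
    while len(leaves) < k:
        # Split the leaf holding the most cases; stop if it cannot be split.
        target = max(leaves, key=lambda n: len(nodes[n]["points"]))
        if len(nodes[target]["points"]) < 2:
            break
        left, right = two_means(nodes[target]["points"])
        if not left or not right:
            break
        for part in (left, right):
            nodes[next_id] = {"points": part, "children": []}
            nodes[target]["children"].append(next_id)
            leaves.append(next_id)
            next_id += 1
        leaves.remove(target)
    return nodes

def leaves_only(nodes):
    """Analogue of 'Show Leaves Only': node ids that have no children."""
    return [n for n, node in nodes.items() if not node["children"]]

data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.1, 0.0]]
tree = build_tree(data, k=3)
print(leaves_only(tree))  # → [3, 4, 5]
```

As in the Build results view, node 1 always holds every case and the intermediate nodes remain in the tree after they are split.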

Next, highlight a cluster and click the Detail button to view a histogram of the cluster centroid attributes and their values. Keep in mind that clustering is an unsupervised data mining technique: there is no target attribute to predict. Still, if by serendipity the clustering algorithm split on the target attribute, we can learn more about what customers who purchased insurance have in common.
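What the Detail view summarizes can be approximated by counting, for each attribute, how often each value occurs among a cluster's members and taking the most frequent value as that attribute's centroid value. This sketch assumes the attributes are already binned or categorical; the row layout and the HOUSEHOLD_SIZE column are hypothetical, and the output is not the Data Miner display itself.

```python
from collections import Counter

def cluster_detail(rows, members, attributes):
    """Per-attribute value histogram and modal ('centroid') value for one cluster."""
    detail = {}
    for attr in attributes:
        hist = Counter(rows[i][attr] for i in members)
        centroid_value, _ = hist.most_common(1)[0]
        detail[attr] = {"histogram": dict(hist), "centroid": centroid_value}
    return detail

# Hypothetical cases; HOUSEHOLD_SIZE is an invented, already-binned column.
rows = [
    {"CARAVAN": 1, "HOUSEHOLD_SIZE": 3},
    {"CARAVAN": 1, "HOUSEHOLD_SIZE": 4},
    {"CARAVAN": 0, "HOUSEHOLD_SIZE": 3},
]
print(cluster_detail(rows, members=[0, 1, 2], attributes=["CARAVAN", "HOUSEHOLD_SIZE"]))
```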

Finding majority cohort values

Even though there is usually no "pure" sample of customers with the target value of interest, we may find a cohort of the population in which most members share that attribute value. To explore this possibility, select each of the leaf clusters, choose Detail, and highlight the CARAVAN attribute. As it turns out, three clusters, #11, #14 and #16, consist mostly of customers who carry mobile home (CARAVAN) insurance. Let's choose Cluster #16 and compare it with Cluster #18, a cohort of customers without CARAVAN insurance.
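Checking each leaf for a majority of CARAVAN = 1 amounts to computing the share of that target value in every leaf and keeping the leaves above one half. In this minimal sketch the sizes for #16 (84 cases, all 1) and #18 (95 cases, all 0) match the cluster details reported in this article; the compositions of #11 and #14 are invented for illustration, and the function name and threshold parameter are hypothetical.

```python
def majority_cohorts(leaf_targets, value=1, threshold=0.5):
    """Return (cluster_id, share) for leaves where `value` exceeds `threshold`, highest share first."""
    shares = {
        cid: sum(1 for t in targets if t == value) / len(targets)
        for cid, targets in leaf_targets.items()
        if targets
    }
    hits = [(cid, s) for cid, s in shares.items() if s > threshold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Target values per leaf cluster; #11 and #14 are made-up compositions.
leaf_targets = {11: [1, 1, 1, 0], 14: [1, 1, 0], 16: [1] * 84, 18: [0] * 95}
print(majority_cohorts(leaf_targets))
# cluster 16 first (share 1.0), then 11 (0.75), then 14 (about 0.67); 18 is excluded
```

A share of exactly 1.0, as for Cluster #16, marks a pure cohort.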

The cluster details show that all 84 customers in Cluster #16 have insurance, while all 95 customers in Cluster #18 do not. We didn't plan these divisions; the algorithm found these naturally occurring clusters in the dataset. In fact, running the clustering algorithm on the entire dataset of 5822 cases does not yield any cluster in which CARAVAN = 1 for every case.

We influenced this pure sample by stratifying the data so that the values 1 and 0 were more evenly distributed in the starting cluster, Cluster #1. You might use such subsets of the case dataset to define very homogeneous populations, or cohorts, of customers, hospital patients, sales executives or whatever else your business is investigating.
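Stratifying the input so that the target values are more evenly represented can be done by keeping every case with the rarer value and sampling a limited number of the others. The article does not state the exact ratio used, so `ratio` is a hypothetical parameter here, and the row layout and function name are invented; this is one simple way to stratify, not necessarily the one used to build the 1044-case sample.

```python
import random

def stratified_sample(rows, target, minority_value, ratio=1.0, seed=0):
    """Keep every minority-class row and draw at most ratio x that many rows from each other class."""
    rng = random.Random(seed)
    minority = [r for r in rows if r[target] == minority_value]
    quota = int(len(minority) * ratio)
    others = {}
    for r in rows:
        if r[target] != minority_value:
            others.setdefault(r[target], []).append(r)
    sample = list(minority)
    for group in others.values():
        sample.extend(rng.sample(group, min(quota, len(group))))
    rng.shuffle(sample)
    return sample

# Hypothetical data: 10 cases with CARAVAN = 1 and 90 with CARAVAN = 0.
rows = [{"id": i, "CARAVAN": 1 if i < 10 else 0} for i in range(100)]
sample = stratified_sample(rows, "CARAVAN", minority_value=1, ratio=2.0)
print(len(sample), sum(r["CARAVAN"] for r in sample))  # → 30 10
```

With `ratio=1.0` the two target values appear in equal numbers; larger ratios keep more of the majority class.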

Next, click through each attribute to find those whose values differ most between the two cohorts, as shown in the following examples. To review the values quickly, you can place the Cluster Detail windows side by side.

As these examples show, customers who purchased CARAVAN insurance differ clearly from those who did not in several attributes, including other insurance purchased, household size, number of children in the household, and amount spent on third-party insurance.
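Clicking through attributes to compare two cohorts can be approximated by scoring each attribute with a distance between its value distributions in the two clusters and sorting by that score. This sketch swaps in total variation distance (half the sum of absolute differences in value frequencies) as the score; that choice, the function names, and the CHILDREN and REGION columns are assumptions, not something the Data Miner interface computes.

```python
from collections import Counter

def distribution(values):
    """Relative frequency of each value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

def tv_distance(a, b):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    keys = set(a) | set(b)
    return 0.5 * sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)

def rank_attributes(cohort_a, cohort_b, attributes):
    """Sort attributes by how differently they are distributed in two cohorts."""
    scores = {
        attr: tv_distance(distribution(r[attr] for r in cohort_a),
                          distribution(r[attr] for r in cohort_b))
        for attr in attributes
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical cohorts standing in for Cluster #16 (insured) and Cluster #18 (not insured).
cohort_a = [{"CHILDREN": 2, "REGION": "N"}, {"CHILDREN": 2, "REGION": "S"}]
cohort_b = [{"CHILDREN": 0, "REGION": "N"}, {"CHILDREN": 0, "REGION": "S"}]
print(rank_attributes(cohort_a, cohort_b, ["CHILDREN", "REGION"]))
# → [('CHILDREN', 1.0), ('REGION', 0.0)]
```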


Copyright © 1996 -  2017

All rights reserved by Burleson

Oracle ® is the registered trademark of Oracle Corporation.
