Preface.
1. Introduction. 1.1 Overview. 1.2 Problem definition. 1.3 Data preparation. 1.4 Implementation of the analysis. 1.5 Deployment of the results. 1.6 Book outline. 1.7 Summary. 1.8 Further reading.
2. Definition. 2.1 Overview. 2.2 Objectives. 2.3 Deliverables. 2.4 Roles and responsibilities. 2.5 Project plan. 2.6 Case study. 2.6.1 Overview. 2.6.2 Problem. 2.6.3 Deliverables. 2.6.4 Roles and responsibilities. 2.6.5 Current situation. 2.6.6 Timetable and budget. 2.6.7 Cost/benefit analysis. 2.7 Summary. 2.8 Further reading.
3. Preparation. 3.1 Overview. 3.2 Data sources. 3.3 Data understanding. 3.3.1 Data tables. 3.3.2 Continuous and discrete variables. 3.3.3 Scales of measurement. 3.3.4 Roles in analysis. 3.3.5 Frequency distribution. 3.4 Data preparation. 3.4.1 Overview. 3.4.2 Cleaning the data. 3.4.3 Removing variables. 3.4.4 Data transformations. 3.4.5 Segmentation. 3.5 Summary. 3.6 Exercises. 3.7 Further reading.
4. Tables and graphs. 4.1 Introduction. 4.2 Tables. 4.2.1 Data tables. 4.2.2 Contingency tables. 4.2.3 Summary tables. 4.3 Graphs. 4.3.1 Overview. 4.3.2 Frequency polygrams and histograms. 4.3.3 Scatterplots. 4.3.4 Box plots. 4.3.5 Multiple graphs. 4.4 Summary. 4.5 Exercises. 4.6 Further reading.
5. Statistics. 5.1 Overview. 5.2 Descriptive statistics. 5.2.1 Overview. 5.2.2 Central tendency. 5.2.3 Variation. 5.2.4 Shape. 5.2.5 Example. 5.3 Inferential statistics. 5.3.1 Overview. 5.3.2 Confidence intervals. 5.3.3 Hypothesis tests. 5.3.4 Chi-square. 5.3.5 One-way analysis of variance. 5.4 Comparative statistics. 5.4.1 Overview. 5.4.2 Visualizing relationships. 5.4.3 Correlation coefficient (r). 5.4.4 Correlation analysis for more than two variables. 5.5 Summary. 5.6 Exercises. 5.7 Further reading.
6. Grouping. 6.1 Introduction. 6.1.1 Overview. 6.1.2 Grouping by values or ranges. 6.1.3 Similarity measures. 6.1.4 Grouping approaches. 6.2 Clustering. 6.2.1 Overview. 6.2.2 Hierarchical agglomerative clustering. 6.2.3 K-means clustering. 6.3 Associative rules. 6.3.1 Overview. 6.3.2 Grouping by value combinations. 6.3.3 Extracting rules from groups. 6.3.4 Example. 6.4 Decision trees. 6.4.1 Overview. 6.4.2 Tree generation. 6.4.3 Splitting criteria. 6.4.4 Example. 6.5 Summary. 6.6 Exercises. 6.7 Further reading.
7. Prediction. 7.1 Introduction. 7.1.1 Overview. 7.1.2 Classification. 7.1.3 Regression. 7.1.4 Building a prediction model. 7.1.5 Applying a prediction model. 7.2 Simple regression models. 7.2.1 Overview. 7.2.2 Simple linear regression. 7.2.3 Simple nonlinear regression. 7.3 K-nearest neighbors. 7.3.1 Overview. 7.3.2 Learning. 7.3.3 Prediction. 7.4 Classification and regression trees. 7.4.1 Overview. 7.4.2 Predicting using decision trees. 7.4.3 Example. 7.5 Neural networks. 7.5.1 Overview. 7.5.2 Neural network layers. 7.5.3 Node calculations. 7.5.4 Neural network predictions. 7.5.5 Learning process. 7.5.6 Backpropagation. 7.5.7 Using neural networks. 7.5.8 Example. 7.6 Other methods. 7.7 Summary. 7.8 Exercises. 7.9 Further reading.
8. Deployment. 8.1 Overview. 8.2 Deliverables. 8.3 Activities. 8.4 Deployment scenarios. 8.5 Summary. 8.6 Further reading.
9. Conclusions. 9.1 Summary of process. 9.2 Example. 9.2.1 Problem overview. 9.2.2 Problem definition. 9.2.3 Data preparation. 9.2.4 Implementation of the analysis. 9.2.5 Deployment of the results. 9.3 Advanced data mining. 9.3.1 Overview. 9.3.2 Text data mining. 9.3.3 Time series data mining. 9.3.4 Sequence data mining. 9.4 Further reading.
Appendix A Statistical tables. A.1 Normal distribution. A.2 Student’s t-distribution. A.3 Chi-square distribution. A.4 F-distribution.
Appendix B Answers to exercises.
Glossary.
Bibliography.
Index.