Nong Ye, ISBN 9780805840810, 0805840818
Table of contents:
Cover……Page 2
Contents……Page 7
Foreword……Page 20
Preface……Page 21
About the Editor……Page 25
Advisory Board……Page 27
Contributors……Page 29
I Methodologies of Data Mining……Page 33
INTRODUCTION……Page 35
PROBLEM DEFINITION……Page 36
Split Selection……Page 39
Data Access……Page 40
Tree Pruning……Page 47
Missing Values……Page 49
Split Selection……Page 52
Data Access……Page 53
SUMMARY……Page 54
REFERENCES……Page 55
2 Association Rules……Page 57
MARKET BASKET ANALYSIS……Page 58
ASSOCIATION RULE DISCOVERY……Page 59
The Apriori Algorithm……Page 60
The Power of the Frequent Item Set Strategy……Page 61
Lift……Page 63
ITEM SET DISCOVERY……Page 64
Closed Item Set Strategies……Page 65
TECHNIQUES FOR DISCOVERING ASSOCIATION RULES WITHOUT ITEM SET DISCOVERY……Page 67
APPLICATIONS OF ASSOCIATION RULE DISCOVERY……Page 68
SUMMARY……Page 69
REFERENCES……Page 70
3 Artificial Neural Network Models for Data Mining……Page 73
INTRODUCTION TO MULTILAYER FEEDFORWARD NETWORKS……Page 74
GRADIENT BASED TRAINING METHODS FOR MFN……Page 75
The Partial Derivatives……Page 76
Nonlinear Least Squares Methods……Page 77
Decision Tree Methods……Page 79
Discriminant Analysis Methods……Page 80
Multiple Partition Decision Tree……Page 81
A Growing MFN……Page 82
Experimental Conditions……Page 84
Quantitative Comparison Results of Classification Methods……Page 85
INTRODUCTION TO SOM……Page 87
The SOM Algorithm……Page 88
SOM Building Blocks……Page 89
Implementation of the SOM Algorithm……Page 90
CASE STUDY 2—DECODING MONKEY’S MOVEMENT DIRECTIONS FROM ITS CORTICAL ACTIVITIES……Page 91
Trajectory Computation from Motor Cortical Discharge Rates……Page 92
Train the SOM……Page 94
Average Testing Result Using the Leave-K-Out Method……Page 95
Closing Discussions on Case 2……Page 96
REFERENCES……Page 97
INTRODUCTION……Page 99
Variables Control Charts……Page 100
Attributes Control Charts……Page 113
Cumulative Sum Control Charts……Page 121
Exponentially Weighted Moving Average Control Charts……Page 125
Choice of Control Charting Techniques……Page 127
Average Run Length……Page 128
Hotelling T2 Control Chart……Page 130
Multivariate EWMA Control Charts……Page 133
REFERENCES……Page 134
5 Bayesian Data Analysis……Page 135
A Simple Example……Page 136
A More Complicated Example……Page 138
Hierarchical Models and Exchangeability……Page 141
Prior Distributions in Practice……Page 143
Model Selection……Page 145
Model Assessment……Page 146
Importance Sampling……Page 147
Markov Chain Monte Carlo (MCMC)……Page 148
An Example……Page 149
Importance Sampling for Analysis of Massive Data Sets……Page 150
BUGS and Models of Realistic Complexity via MCMC……Page 153
Bayesian Predictive Modeling……Page 157
Bayesian Descriptive Modeling……Page 159
SUMMARY……Page 160
REFERENCES……Page 161
6 Hidden Markov Processes and Sequential Pattern Mining……Page 165
Introduction to Hidden Markov Models……Page 166
The EM Algorithm……Page 168
MCMC Data Augmentation……Page 170
The Likelihood Recursion……Page 172
The Forward-Backward Recursions……Page 173
The Viterbi Algorithm……Page 174
A Numerical Example Illustrating the Recursions……Page 175
Fetal Lamb Movements……Page 176
The Business Cycle……Page 182
Stationary Distribution of dt……Page 185
Summary……Page 186
References……Page 187
7 Strategies and Methods for Prediction……Page 191
Guiding Examples……Page 192
Prediction Model Components……Page 193
Common Regression Loss Functions……Page 194
Common Classification Loss Functions……Page 195
Cox Loss Function for Survival Data……Page 198
Linear Models……Page 199
Linear Regression……Page 200
Classification……Page 201
Generalized Linear Model……Page 204
Nearest Neighbor and Kernel Methods……Page 206
Tree Models……Page 209
Smoothing, Basis Expansions, and Additive Models……Page 211
Neural Networks……Page 214
Support Vector Machines……Page 215
Boosting……Page 217
Availability of Software……Page 220
Summary……Page 221
References……Page 222
8 Principal Components and Factor Analysis……Page 225
Examples of Variation Patterns in Correlated Multivariate Data……Page 226
Representation and Illustration of Variation Patterns in Multivariate Data……Page 229
Principal Components Analysis……Page 230
Using Principal Components as Estimates of the Variation Patterns……Page 231
Capabilities and Limitations of PCA……Page 234
Methods for Factor Rotation……Page 235
The Classic Blind Source Separation Problem……Page 237
Blind Separation Principles……Page 238
Fourth-Order Blind Separation Methods……Page 240
Available Software……Page 243
References……Page 244
9 Psychometric Methods of Latent Variable Modeling……Page 247
Introduction……Page 248
The Basic Latent Class Model……Page 249
The Basic Finite Mixture Model……Page 253
The Basic Latent Trait Model……Page 256
The Basic Factor Analytic Model……Page 258
Extending the Basic Latent Class Model……Page 261
Extending the Basic Mixture Model……Page 264
Extending the Latent Trait Model……Page 265
Extending the Factor Analytic Model……Page 266
Hierarchical Structure in Transaction Data……Page 268
Individualized Mixture Models……Page 269
Experimental Results……Page 270
References……Page 273
Tools……Page 275
References……Page 276
10 Scalable Clustering……Page 279
Introduction……Page 280
Clustering Techniques: A Brief Survey……Page 281
Partitional Methods……Page 282
Hierarchical Methods……Page 287
Assessment of Results……Page 288
Visualization of Results……Page 290
Transactional Data Analysis……Page 291
Next Generation Clickstream Clustering……Page 292
Large Scale Remote Sensing……Page 293
Scalability to Large Number of Records or Patterns, N……Page 294
Scalability to Large Number of Attributes or Dimensions, d……Page 296
Sequence Clustering Techniques……Page 298
Case Study: Similarity Based Clustering of Market Baskets and Web Logs……Page 299
Similarity Measures: A Sampler……Page 302
Clustering Algorithms and Text Data Sets……Page 304
Comparative Results……Page 305
Acknowledgments……Page 306
References……Page 307
Introduction……Page 311
Euclidean Distances and Lp Norms……Page 313
General Transformations……Page 314
Dynamic Time Warping……Page 315
Longest Common Subsequence Similarity……Page 316
Piecewise Linear Representations……Page 319
Other Similarity Measures……Page 320
Indexing Techniques for Time Series……Page 321
Indexing Time Series When the Distance Function Is a Metric……Page 322
A Survey of Dimensionality Reduction Techniques……Page 324
Similar Time-Series Retrieval When the Distance Function Is Not a Metric……Page 331
Subsequence Retrieval……Page 333
References……Page 334
Introduction……Page 337
Reconstruction of Phase Space……Page 339
Computation of Dimension……Page 341
Detection of Unstable Periodic Orbits……Page 343
Computing Lyapunov Exponents from Time Series……Page 349
Time-Frequency Analysis of Time Series……Page 355
Analytic Signals and Hilbert Transform……Page 356
Method of EMD……Page 363
References……Page 370
13 Distributed Data Mining……Page 373
Introduction……Page 374
Related Research……Page 375
Data Distribution and Preprocessing……Page 376
Data Preprocessing……Page 377
Distributed Classifier Learning……Page 378
Collective Data Mining……Page 381
Distributed Association Rule Mining……Page 382
Distributed Clustering……Page 383
Privacy Preserving Distributed Data Mining……Page 384
Distributed Data Mining Systems……Page 385
Architectural Issues……Page 386
Components Maintenance……Page 388
Future Directions……Page 389
References……Page 390
II: MANAGEMENT OF DATA MINING……Page 395
14 Data Collection, Preparation, Quality, and Visualization……Page 397
How Data Relates to Data Mining……Page 398
The “10 Commandments” of Data Mining……Page 400
What You Need to Know about Algorithms Before Preparing Data……Page 401
Choosing the Right Data……Page 402
Assembling the Data Set……Page 403
Assaying the Data Set……Page 404
Assessing the Effect of Missing Values……Page 405
Why Data Needs Preparing: The Business Case……Page 406
Missing Values……Page 407
Representing Time: Absolute, Relative, and Cyclic……Page 408
Outliers and Distribution Normalization……Page 409
Ranges and Normalization……Page 410
Numbers and Categories……Page 411
Data Quality……Page 412
What Is Quality?……Page 414
Data Visualization……Page 416
Seeing Is Believing……Page 417
Absolute Versus Relative Visualization……Page 420
Summary……Page 423
Introduction……Page 425
Spreadsheet Files……Page 427
Historical Databases……Page 429
Relational Database……Page 430
Object-Oriented Database……Page 431
OLAP……Page 434
Data Warehouse……Page 435
Distributed Databases……Page 436
Summary……Page 438
References……Page 439
16 Feature Extraction, Selection, and Construction……Page 441
Introduction……Page 442
Concepts……Page 443
Algorithms……Page 444
Summary……Page 445
Concepts……Page 446
Algorithm……Page 447
An Example……Page 448
Concepts……Page 449
Algorithms and Examples……Page 450
Summary……Page 451
Some Applications……Page 452
Summary……Page 453
References……Page 454
17 Performance Analysis and Evaluation……Page 457
Training versus Testing……Page 458
Error Measurement……Page 459
Error from Regression……Page 460
Error from Conditional Density Estimation……Page 461
Precision, Recall, and the F Measure……Page 462
Confusion Tables……Page 463
Clustering Performance: Unlabeled Data……Page 464
Significance Testing……Page 465
Resampling and Cross-Validation……Page 467
Bootstrap……Page 468
Estimating Cost and Risk……Page 469
Interpretability……Page 470
References……Page 471
Introduction: Why There Are Security and Privacy Issues with Data Mining……Page 473
Privacy of Individual Data……Page 474
Fear of What Others May Find in Otherwise Releasable Data……Page 480
References……Page 483
Introduction……Page 485
XML for Data Mining Models……Page 486
SQL APIs……Page 488
Semantic Web……Page 489
Relationships……Page 490
References……Page 491
III: APPLICATIONS OF DATA MINING……Page 493
Introduction and Overview……Page 495
Methods……Page 496
Individual Learning……Page 499
Methods……Page 500
Distributions and Patterns of Individual Performance……Page 506
Other Areas……Page 508
References……Page 509
21 Mining Text Data……Page 513
Introduction……Page 514
Architecture of Text Mining Systems……Page 515
Text Categorization……Page 517
Semantic Tagging……Page 521
DIAL……Page 523
Development of IE Rules……Page 525
Auditing Environment……Page 531
Find……Page 532
Taxonomy Construction……Page 533
Soft Matching……Page 537
Anaphora Resolution……Page 538
Database Connectivity……Page 539
Definitions and Notations……Page 540
Category Connection Maps……Page 541
Relationship Maps……Page 542
Summary……Page 548
References……Page 549
22 Mining Geospatial Data……Page 551
Introduction……Page 552
Illustrative Examples and Application Domains……Page 553
Tests for Detecting Spatial Outliers……Page 554
Spatial Colocation Rules……Page 557
Illustrative Application Domains……Page 558
Colocation Rule Approaches……Page 559
An Illustrative Application Domain……Page 562
Problem Formulation……Page 564
Modeling Spatial Dependencies Using the SAR and MRF Models……Page 565
Logistic SAR……Page 566
MRF Based Bayesian Classifiers……Page 567
Clustering……Page 569
Categories of Clustering Algorithms……Page 571
K-Medoid: An Algorithm for Clustering……Page 572
Clustering, Mixture Analysis, and the EM Algorithm……Page 573
Summary……Page 576
References……Page 577
23 Mining Science and Engineering Data……Page 581
Introduction……Page 582
Motivation for Mining Scientific Data……Page 583
Data Mining in Astronomy……Page 584
Data Mining in Earth Sciences……Page 587
Data Mining in Nondestructive Testing……Page 589
Data Mining in Simulation Data……Page 590
Common Challenges in Mining Scientific Data……Page 593
Potential Solutions to Some Common Problems……Page 594
Data Registration……Page 596
De-Noising Data……Page 597
Object Identification……Page 598
Dimensionality Reduction……Page 599
Software for Scientific Data Mining……Page 600
References……Page 601
24 Mining Data in Bioinformatics……Page 605
Basic Molecular Biology……Page 606
Mining Methods in Protein Structure Prediction……Page 607
Mining Protein Contact Maps……Page 609
Mining Methodology……Page 610
How Much Information Is There in Amino Acids Alone?……Page 613
Using Local Structures for Contact Prediction……Page 614
Characterizing Physical, Protein-Like Contact Maps……Page 619
Generating a Database of Protein-Like Structures……Page 620
Mining Dense Patterns in Contact Maps……Page 621
Pruning and Integration……Page 622
Experimental Results……Page 623
Heuristic Rules for “Physicality”……Page 625
Rules for Pathways in Contact Map Space……Page 626
Summary……Page 627
References……Page 628
Introduction……Page 629
Data Types……Page 631
E-Commerce Data……Page 633
Data Preparation……Page 636
Data Aggregation……Page 637
Feature Preparation……Page 639
Pattern Discovery……Page 640
Robustness……Page 642
Deployment……Page 643
Strategic Questions……Page 644
Operational Questions……Page 645
Summary……Page 647
References……Page 648
26 Mining Computer and Network Security Data……Page 649
Intrusive Activities and System Activity Data……Page 650
Phases of Intrusions……Page 651
Data of System Activities……Page 652
Extraction and Representation of Activity Features for Intrusion Detection……Page 655
Features of System Activities……Page 656
Feature Representation……Page 657
Existing Intrusion Detection Techniques……Page 660
Hotelling’s T2 Test and Chi-Square Distance Test……Page 661
Data Source and Representation……Page 663
Testing Performance……Page 665
Summary……Page 666
References……Page 667
Introduction……Page 669
Related Works……Page 671
How to Discover the Number of Clusters: k……Page 673
K-Automatic Discovery Algorithm……Page 676
Experimental Results……Page 678
Data Sets……Page 679
Data Item Representation……Page 680
Evaluation Method……Page 681
Results and Analysis……Page 682
Summary……Page 686
References……Page 687
Introduction……Page 689
Hotelling T2 Control Charts……Page 690
MEWMA Charts……Page 692
Nonparametric Properties of the MEWMA Control Charts……Page 695
Summary……Page 699
References……Page 700
Author Index……Page 701
Subject Index……Page 713