Elementary Cluster Analysis: Four Basic Methods That (Usually) Work
| |
---|---|
Place / Publishing House: | Denmark : River Publishers, 2022. ©2022. |
Year of Publication: | 2022 |
Edition: | 1st ed. |
Language: | English |
Physical Description: | 1 online resource (518 pages) |
Table of Contents:
- Front Cover
- Elementary Cluster Analysis: Four Basic Methods that (Usually) Work
- Contents
- Preface
- List of Figures
- List of Tables
- List of Abbreviations
- Appendix A. List of Algorithms
- Appendix D. List of Definitions
- Appendix E. List of Examples
- Appendix L. List of Lemmas and Theorems
- Appendix V. List of Video Links
- I The Art and Science of Clustering
- 1 Clusters: The Human Point of View (HPOV)
- 1.1 Introduction
- 1.2 What are Clusters?
- 1.3 Notes and Remarks
- 1.4 Exercises
- 2 Uncertainty: Fuzzy Sets and Models
- 2.1 Introduction
- 2.2 Fuzzy Sets and Models
- 2.3 Fuzziness and Probability
- 2.4 Notes and Remarks
- 2.5 Exercises
- 3 Clusters: The Computer Point of View (CPOV)
- 3.1 Introduction
- 3.2 Label Vectors
- 3.3 Partition Matrices
- 3.4 How Many Clusters are Present in a Data Set?
- 3.5 CPOV Clusters: The Computer's Point of View
- 3.6 Notes and Remarks
- 3.7 Exercises
- 4 The Three Canonical Problems
- 4.1 Introduction
- 4.2 Tendency Assessment - (Are There Clusters?)
- 4.2.1 An Overview of Tendency Assessment
- 4.2.2 Minimal Spanning Trees (MSTs)
- 4.2.3 Visual Assessment of Clustering Tendency
- 4.2.4 The VAT and iVAT Reordering Algorithms
- 4.3 Clustering (Partitioning the Data into Clusters)
- 4.4 Cluster Validity (Which Clusters are "Best"?)
- 4.5 Notes and Remarks
- 4.6 Exercises
- 5 Feature Analysis
- 5.1 Introduction
- 5.2 Feature Nomination
- 5.3 Feature Analysis
- 5.4 Feature Selection
- 5.5 Feature Extraction
- 5.5.1 Principal Components Analysis
- 5.5.2 Random Projection
- 5.5.3 Sammon's Algorithm
- 5.5.4 Autoencoders
- 5.5.5 Relational Data
- 5.6 Normalization and Statistical Standardization
- 5.7 Notes and Remarks
- 5.8 Exercises
- II Four Basic Models and Algorithms
- 6 The c-Means (aka k-Means) Models
- 6.1 Introduction
- 6.2 The Geometry of Partition Spaces
- 6.3 The HCM/FCM Models and Basic AO Algorithms
- 6.4 Cluster Accuracy for Labeled Data
- 6.5 Choosing Model Parameters (c, m, ||·||A)
- 6.5.1 How to Pick the Number of Clusters c
- 6.5.2 How to Pick the Weighting Exponent m
- 6.5.3 Choosing the Weight Matrix (A) for the Model Norm
- 6.6 Choosing Execution Parameters (V0, ε, ||·||err, T)
- 6.6.1 Choosing Termination and Iterate Limit Criteria
- 6.6.2 How to Pick an Initial V0 (or U0)
- 6.6.3 Acceleration Schemes for HCM (aka k-Means) and FCM
- 6.7 Cluster Validity With the Best c Method
- 6.7.1 Scale Normalization
- 6.7.2 Statistical Standardization
- 6.7.3 Stochastic Correction for Chance
- 6.7.4 Best c Validation With Internal CVIs
- 6.7.5 Crisp Cluster Validity Indices
- 6.7.6 Soft Cluster Validity Indices
- 6.8 Alternate Forms of Hard c-Means (aka k-Means)
- 6.8.1 Bounds on k-Means in Randomly Projected Downspaces
- 6.8.2 Matrix Factorization for HCM for Clustering
- 6.8.3 SVD: A Global Bound for J1(U, V; X)
- 6.9 Notes and Remarks
- 6.10 Exercises
- 7 Probabilistic Clustering - GMD/EM
- 7.1 Introduction
- 7.2 The Mixture Model
- 7.3 The Multivariate Normal Distribution
- 7.4 Gaussian Mixture Decomposition
- 7.5 The Basic EM Algorithm for GMD
- 7.6 Choosing Model and Execution Parameters for EM
- 7.6.1 Estimating c With iVAT
- 7.6.2 Choosing Q0 or P0 in GMD
- 7.6.3 Implementation Parameters ε, ||·||err, T for GMD With EM
- 7.6.4 Acceleration Schemes for GMD With EM
- 7.7 Model Selection and Cluster Validity for GMD
- 7.7.1 Two Interpretations of the Objective of GMD
- 7.7.2 Choosing the Number of Components Using GMD/EM With GOFIs
- 7.7.3 Choosing the Number of Clusters Using GMD/EM With CVIs
- 7.8 Notes and Remarks
- 7.9 Exercises
- 8 Relational Clustering - The SAHN Models
- 8.1 Relations and Similarity Measures
- 8.2 The SAHN Model and Algorithms
- 8.3 Choosing Model Parameters for SAHN Clustering
- 8.4 Dendrogram Representation of SAHN Clusters
- 8.5 SL Implemented With Minimal Spanning Trees
- 8.5.1 The Role of the MST in Single Linkage Clustering
- 8.5.2 SL Compared to a Fitch-Margoliash Dendrogram
- 8.5.3 Repairing SL Sensitivity to Inliers and Bridge Points
- 8.5.4 Acceleration of the Single Linkage Algorithm
- 8.6 Cluster Validity for Single Linkage
- 8.7 An Example Using All Four Basic Models
- 8.8 Notes and Remarks
- 8.9 Exercises
- 9 Properties of the Fantastic Four: External Cluster Validity
- 9.1 Introduction
- 9.2 Computational Complexity
- 9.2.1 Using Big-Oh to Measure the Growth of Functions
- 9.2.2 Time and Space Complexity for the Fantastic Four
- 9.3 Customizing the c-Means Models to Account for Cluster Shape
- 9.3.1 Variable Norm Methods
- 9.3.2 Variable Prototype Methods
- 9.4 Traversing the Partition Landscape
- 9.5 External Cluster Validity With Labeled Data
- 9.5.1 External Paired-Comparison Cluster Validity Indices
- 9.5.2 External Best Match (Best U, or Best E) Validation
- 9.5.3 The Fantastic Four Use Best E Evaluations on Labeled Data
- 9.6 Choosing an Internal CVI Using Internal/External (Best I/E) Correlation
- 9.7 Notes and Remarks
- 9.8 Problems
- 10 Alternating Optimization
- 10.1 Introduction
- 10.2 General Considerations on Numerical Optimization
- 10.2.1 Iterative Solution of Optimization Problems
- 10.2.2 Iterative Solution of Alternating Optimization with (t, s) Schemes
- 10.3 Local Convergence Theory for AO
- 10.4 Global Convergence Theory
- 10.5 Impact of the Theory for the c-Means Models
- 10.6 Convergence for GMD Using EM/AO
- 10.7 Notes and Remarks
- 10.8 Exercises
- 11 Clustering in Static Big Data
- 11.1 The Jungle of Big Data
- 11.1.1 An Overview of Big Data
- 11.1.2 Scalability vs. Acceleration
- 11.2 Methods for Clustering in Big Data
- 11.3 Sampling Functions
- 11.3.1 Chunk Sampling
- 11.3.2 Random Sampling
- 11.3.3 Progressive Sampling
- 11.3.4 Maximin (MM) Sampling
- 11.3.5 Aggregation and Non-Iterative Extension of a Literal Partition to the Rest of the Data
- 11.4 A Sampler of Other Methods: Precursors to Streaming Data Analysis
- 11.5 Visualization of Big Static Data
- 11.6 Extending Single Linkage for Static Big Data
- 11.7 Notes and Remarks
- 11.8 Exercises
- 12 Structural Assessment in Streaming Data
- 12.1 Streaming Data Analysis
- 12.1.1 The Streaming Process
- 12.1.2 Computational Footprints
- 12.2 Streaming Clustering Algorithms
- 12.2.1 Sequential Hard c-Means and Sebestyen's Method
- 12.2.2 Extensions of Sequential Hard c-Means: BIRCH, CluStream, and DenStream
- 12.2.3 Model-Based Algorithms
- 12.2.4 Projection and Grid-Based Methods
- 12.3 Reading the Footprints: Hindsight Evaluation
- 12.3.1 When You Can See the Data and Footprints
- 12.3.2 When You Can't See the Data and Footprints
- 12.3.3 Change Point Detection
- 12.4 Dynamic Evaluation of Streaming Data Analysis
- 12.4.1 Incremental Stream Monitoring Functions (ISMFs)
- 12.4.2 Visualization of Streaming Data
- 12.5 What's Next for Streaming Data Analysis?
- 12.6 Notes and Remarks
- 12.7 Exercises
- References
- Index
- About the Author
- Back Cover.