Elementary Cluster Analysis: Four Basic Methods That (Usually) Work

Bibliographic Details
Place / Publishing House:Denmark : River Publishers, 2022.
©2022.
Year of Publication:2022
Edition:1st ed.
Language:English
Physical Description:1 online resource (518 pages)
Table of Contents:
  • Front Cover
  • Elementary Cluster Analysis: Four Basic Methods that (Usually) Work
  • Contents
  • Preface
  • List of Figures
  • List of Tables
  • List of Abbreviations
  • Appendix A. List of Algorithms
  • Appendix D. List of Definitions
  • Appendix E. List of Examples
  • Appendix L. List of Lemmas and Theorems
  • Appendix V. List of Video Links
  • I The Art and Science of Clustering
  • 1 Clusters: The Human Point of View (HPOV)
  • 1.1 Introduction
  • 1.2 What are Clusters?
  • 1.3 Notes and Remarks
  • 1.4 Exercises
  • 2 Uncertainty: Fuzzy Sets and Models
  • 2.1 Introduction
  • 2.2 Fuzzy Sets and Models
  • 2.3 Fuzziness and Probability
  • 2.4 Notes and Remarks
  • 2.5 Exercises
  • 3 Clusters: The Computer Point of View (CPOV)
  • 3.1 Introduction
  • 3.2 Label Vectors
  • 3.3 Partition Matrices
  • 3.4 How Many Clusters are Present in a Data Set?
  • 3.5 CPOV Clusters: The Computer's Point of View
  • 3.6 Notes and Remarks
  • 3.7 Exercises
  • 4 The Three Canonical Problems
  • 4.1 Introduction
  • 4.2 Tendency Assessment - (Are There Clusters?)
  • 4.2.1 An Overview of Tendency Assessment
  • 4.2.2 Minimal Spanning Trees (MSTs)
  • 4.2.3 Visual Assessment of Clustering Tendency
  • 4.2.4 The VAT and iVAT Reordering Algorithms
  • 4.3 Clustering (Partitioning the Data into Clusters)
  • 4.4 Cluster Validity (Which Clusters are "Best"?)
  • 4.5 Notes and Remarks
  • 4.6 Exercises
  • 5 Feature Analysis
  • 5.1 Introduction
  • 5.2 Feature Nomination
  • 5.3 Feature Analysis
  • 5.4 Feature Selection
  • 5.5 Feature Extraction
  • 5.5.1 Principal Components Analysis
  • 5.5.2 Random Projection
  • 5.5.3 Sammon's Algorithm
  • 5.5.4 Autoencoders
  • 5.5.5 Relational Data
  • 5.6 Normalization and Statistical Standardization
  • 5.7 Notes and Remarks
  • 5.8 Exercises
  • II Four Basic Models and Algorithms
  • 6 The c-Means (aka k-Means) Models
  • 6.1 Introduction
  • 6.2 The Geometry of Partition Spaces
  • 6.3 The HCM/FCM Models and Basic AO Algorithms
  • 6.4 Cluster Accuracy for Labeled Data
  • 6.5 Choosing Model Parameters (c, m, ||*||A)
  • 6.5.1 How to Pick the Number of Clusters c
  • 6.5.2 How to Pick the Weighting Exponent m
  • 6.5.3 Choosing the Weight Matrix (A) for the Model Norm
  • 6.6 Choosing Execution Parameters (V0, ε, ||*||err, T)
  • 6.6.1 Choosing Termination and Iterate Limit Criteria
  • 6.6.2 How to Pick an Initial V0 (or U0)
  • 6.6.3 Acceleration Schemes for HCM (aka k-Means) and FCM
  • 6.7 Cluster Validity With the Best c Method
  • 6.7.1 Scale Normalization
  • 6.7.2 Statistical Standardization
  • 6.7.3 Stochastic Correction for Chance
  • 6.7.4 Best c Validation With Internal CVIs
  • 6.7.5 Crisp Cluster Validity Indices
  • 6.7.6 Soft Cluster Validity Indices
  • 6.8 Alternate Forms of Hard c-Means (aka k-Means)
  • 6.8.1 Bounds on k-Means in Randomly Projected Downspaces
  • 6.8.2 Matrix Factorization for HCM for Clustering
  • 6.8.3 SVD: A Global Bound for J1 (U, V; X)
  • 6.9 Notes and Remarks
  • 6.10 Exercises
  • 7 Probabilistic Clustering - GMD/EM
  • 7.1 Introduction
  • 7.2 The Mixture Model
  • 7.3 The Multivariate Normal Distribution
  • 7.4 Gaussian Mixture Decomposition
  • 7.5 The Basic EM Algorithm for GMD
  • 7.6 Choosing Model and Execution Parameters for EM
  • 7.6.1 Estimating c With iVAT
  • 7.6.2 Choosing Q0 or P0 in GMD
  • 7.6.3 Implementation Parameters ε, ||*||err, T for GMD With EM
  • 7.6.4 Acceleration Schemes for GMD With EM
  • 7.7 Model Selection and Cluster Validity for GMD
  • 7.7.1 Two Interpretations of the Objective of GMD
  • 7.7.2 Choosing the Number of Components Using GMD/EM With GOFIs
  • 7.7.3 Choosing the Number of Clusters Using GMD/EM With CVIs
  • 7.8 Notes and Remarks
  • 7.9 Exercises
  • 8 Relational Clustering - The SAHN Models
  • 8.1 Relations and Similarity Measures
  • 8.2 The SAHN Model and Algorithms
  • 8.3 Choosing Model Parameters for SAHN Clustering
  • 8.4 Dendrogram Representation of SAHN Clusters
  • 8.5 SL Implemented With Minimal Spanning Trees
  • 8.5.1 The Role of the MST in Single Linkage Clustering
  • 8.5.2 SL Compared to a Fitch-Margoliash Dendrogram
  • 8.5.3 Repairing SL Sensitivity to Inliers and Bridge Points
  • 8.5.4 Acceleration of the Single Linkage Algorithm
  • 8.6 Cluster Validity for Single Linkage
  • 8.7 An Example Using All Four Basic Models
  • 8.8 Notes and Remarks
  • 8.9 Exercises
  • 9 Properties of the Fantastic Four: External Cluster Validity
  • 9.1 Introduction
  • 9.2 Computational Complexity
  • 9.2.1 Using Big-Oh to Measure the Growth of Functions
  • 9.2.2 Time and Space Complexity for the Fantastic Four
  • 9.3 Customizing the c-Means Models to Account for Cluster Shape
  • 9.3.1 Variable Norm Methods
  • 9.3.2 Variable Prototype Methods
  • 9.4 Traversing the Partition Landscape
  • 9.5 External Cluster Validity With Labeled Data
  • 9.5.1 External Paired-Comparison Cluster Validity Indices
  • 9.5.2 External Best Match (Best U, or Best E) Validation
  • 9.5.3 The Fantastic Four Use Best E Evaluations on Labeled Data
  • 9.6 Choosing an Internal CVI Using Internal/External (Best I/E) Correlation
  • 9.7 Notes and Remarks
  • 9.8 Problems
  • 10 Alternating Optimization
  • 10.1 Introduction
  • 10.2 General Considerations on Numerical Optimization
  • 10.2.1 Iterative Solution of Optimization Problems
  • 10.2.2 Iterative Solution of Alternating Optimization with (t, s) Schemes
  • 10.3 Local Convergence Theory for AO
  • 10.4 Global Convergence Theory
  • 10.5 Impact of the Theory for the c-Means Models
  • 10.6 Convergence for GMD Using EM/AO
  • 10.7 Notes and Remarks
  • 10.8 Exercises
  • 11 Clustering in Static Big Data
  • 11.1 The Jungle of Big Data
  • 11.1.1 An Overview of Big Data
  • 11.1.2 Scalability vs. Acceleration
  • 11.2 Methods for Clustering in Big Data
  • 11.3 Sampling Functions
  • 11.3.1 Chunk Sampling
  • 11.3.2 Random Sampling
  • 11.3.3 Progressive Sampling
  • 11.3.4 Maximin (MM) Sampling
  • 11.3.5 Aggregation and Non-Iterative Extension of a Literal Partition to the Rest of the Data
  • 11.4 A Sampler of Other Methods: Precursors to Streaming Data Analysis
  • 11.5 Visualization of Big Static Data
  • 11.6 Extending Single Linkage for Static Big Data
  • 11.7 Notes and Remarks
  • 11.8 Exercises
  • 12 Structural Assessment in Streaming Data
  • 12.1 Streaming Data Analysis
  • 12.1.1 The Streaming Process
  • 12.1.2 Computational Footprints
  • 12.2 Streaming Clustering Algorithms
  • 12.2.1 Sequential Hard c-Means and Sebestyen's Method
  • 12.2.2 Extensions of Sequential Hard c-Means: BIRCH, CluStream, and DenStream
  • 12.2.3 Model-Based Algorithms
  • 12.2.4 Projection and Grid-Based Methods
  • 12.3 Reading the Footprints: Hindsight Evaluation
  • 12.3.1 When You Can See the Data and Footprints
  • 12.3.2 When You Can't See the Data and Footprints
  • 12.3.3 Change Point Detection
  • 12.4 Dynamic Evaluation of Streaming Data Analysis
  • 12.4.1 Incremental Stream Monitoring Functions (ISMFs)
  • 12.4.2 Visualization of Streaming Data
  • 12.5 What's Next for Streaming Data Analysis?
  • 12.6 Notes and Remarks
  • 12.7 Exercises
  • References
  • Index
  • About the Author
  • Back Cover