Elementary Cluster Analysis : Four Basic Methods That (Usually) Work.

Bibliographic Details
Place / Publishing House: Denmark : River Publishers, 2022.
©2022.
Year of Publication: 2022
Edition: 1st ed.
Language: English
Online Access: https://ebookcentral.proquest.com/lib/oeawat/detail.action?docID=29156150
Physical Description: 1 online resource (518 pages)
id 50029156150
ctrlnum (MiAaPQ)50029156150
(Au-PeEL)EBL29156150
(OCoLC)1311313906
collection bib_alma
record_format marc
spelling Bezdek, James C.
Elementary Cluster Analysis : Four Basic Methods That (Usually) Work.
1st ed.
Denmark : River Publishers, 2022.
©2022.
1 online resource (518 pages)
text txt rdacontent
computer c rdamedia
online resource cr rdacarrier
Description based on publisher supplied metadata and other sources.
Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries.
Cluster analysis.
Cluster analysis--Data processing.
Electronic books.
Print version: Bezdek, James C. Elementary Cluster Analysis: Four Basic Methods That (Usually) Work. Denmark : River Publishers, c2022.
ProQuest (Firm)
https://ebookcentral.proquest.com/lib/oeawat/detail.action?docID=29156150 Click to View
language English
format eBook
author Bezdek, James C.
spellingShingle Bezdek, James C.
Elementary Cluster Analysis : Four Basic Methods That (Usually) Work.
author_facet Bezdek, James C.
author_variant j c b jc jcb
author_sort Bezdek, James C.
title Elementary Cluster Analysis : Four Basic Methods That (Usually) Work.
title_sub Four Basic Methods That (Usually) Work.
title_full Elementary Cluster Analysis : Four Basic Methods That (Usually) Work.
title_fullStr Elementary Cluster Analysis : Four Basic Methods That (Usually) Work.
title_full_unstemmed Elementary Cluster Analysis : Four Basic Methods That (Usually) Work.
title_auth Elementary Cluster Analysis : Four Basic Methods That (Usually) Work.
title_new Elementary Cluster Analysis :
title_sort elementary cluster analysis : four basic methods that (usually) work.
publisher River Publishers,
publishDate 2022
physical 1 online resource (518 pages)
edition 1st ed.
contents Front Cover -- Elementary Cluster Analysis: Four Basic Methods that (Usually) Work -- Contents -- Preface -- List of Figures -- List of Tables -- List of Abbreviations -- Appendix A. List of Algorithms -- Appendix D. List of Definitions -- Appendix E. List of Examples -- Appendix L. List of Lemmas and Theorems -- Appendix V. List of Video Links -- I The Art and Science of Clustering -- 1 Clusters: The Human Point of View (HPOV) -- 1.1 Introduction -- 1.2 What are Clusters? -- 1.3 Notes and Remarks -- 1.4 Exercises -- 2 Uncertainty: Fuzzy Sets and Models -- 2.1 Introduction -- 2.2 Fuzzy Sets and Models -- 2.3 Fuzziness and Probability -- 2.4 Notes and Remarks -- 2.5 Exercises -- 3 Clusters: The Computer Point of View (CPOV) -- 3.1 Introduction -- 3.2 Label Vectors -- 3.3 Partition Matrices -- 3.4 How Many Clusters are Present in a Data Set? -- 3.5 CPOV Clusters: The Computer's Point of View -- 3.6 Notes and Remarks -- 3.7 Exercises -- 4 The Three Canonical Problems -- 4.1 Introduction -- 4.2 Tendency Assessment - (Are There Clusters?) -- 4.2.1 An Overview of Tendency Assessment -- 4.2.2 Minimal Spanning Trees (MSTs) -- 4.2.3 Visual Assessment of Clustering Tendency -- 4.2.4 The VAT and iVAT Reordering Algorithms -- 4.3 Clustering (Partitioning the Data into Clusters) -- 4.4 Cluster Validity (Which Clusters are "Best"?) -- 4.5 Notes and Remarks -- 4.6 Exercises -- 5 Feature Analysis -- 5.1 Introduction -- 5.2 Feature Nomination -- 5.3 Feature Analysis -- 5.4 Feature Selection -- 5.5 Feature Extraction -- 5.5.1 Principal Components Analysis -- 5.5.2 Random Projection -- 5.5.3 Sammon's Algorithm -- 5.5.4 Autoencoders -- 5.5.5 Relational Data -- 5.6 Normalization and Statistical Standardization -- 5.7 Notes and Remarks -- 5.8 Exercises -- II Four Basic Models and Algorithms -- 6 The c-Means (aka k-Means) Models -- 6.1 Introduction.
6.2 The Geometry of Partition Spaces -- 6.3 The HCM/FCM Models and Basic AO Algorithms -- 6.4 Cluster Accuracy for Labeled Data -- 6.5 Choosing Model Parameters (c, m, ‖·‖A) -- 6.5.1 How to Pick the Number of Clusters c -- 6.5.2 How to Pick the Weighting Exponent m -- 6.5.3 Choosing the Weight Matrix (A) for the Model Norm -- 6.6 Choosing Execution Parameters (V0, ε, ‖·‖err, T) -- 6.6.1 Choosing Termination and Iterate Limit Criteria -- 6.6.2 How to Pick an Initial V0 (or U0) -- 6.6.3 Acceleration Schemes for HCM (aka k-Means) and FCM -- 6.7 Cluster Validity With the Best c Method -- 6.7.1 Scale Normalization -- 6.7.2 Statistical Standardization -- 6.7.3 Stochastic Correction for Chance -- 6.7.4 Best c Validation With Internal CVIs -- 6.7.5 Crisp Cluster Validity Indices -- 6.7.6 Soft Cluster Validity Indices -- 6.8 Alternate Forms of Hard c-Means (aka k-Means) -- 6.8.1 Bounds on k-Means in Randomly Projected Downspaces -- 6.8.2 Matrix Factorization for HCM for Clustering -- 6.8.3 SVD: A Global Bound for J1(U, V; X) -- 6.9 Notes and Remarks -- 6.10 Exercises -- 7 Probabilistic Clustering - GMD/EM -- 7.1 Introduction -- 7.2 The Mixture Model -- 7.3 The Multivariate Normal Distribution -- 7.4 Gaussian Mixture Decomposition -- 7.5 The Basic EM Algorithm for GMD -- 7.6 Choosing Model and Execution Parameters for EM -- 7.6.1 Estimating c With iVAT -- 7.6.2 Choosing Q0 or P0 in GMD -- 7.6.3 Implementation Parameters ε, ‖·‖err, T for GMD With EM -- 7.6.4 Acceleration Schemes for GMD With EM -- 7.7 Model Selection and Cluster Validity for GMD -- 7.7.1 Two Interpretations of the Objective of GMD -- 7.7.2 Choosing the Number of Components Using GMD/EM With GOFIs -- 7.7.3 Choosing the Number of Clusters Using GMD/EM With CVIs -- 7.8 Notes and Remarks -- 7.9 Exercises -- 8 Relational Clustering - The SAHN Models -- 8.1 Relations and Similarity Measures.
8.2 The SAHN Model and Algorithms -- 8.3 Choosing Model Parameters for SAHN Clustering -- 8.4 Dendrogram Representation of SAHN Clusters -- 8.5 SL Implemented With Minimal Spanning Trees -- 8.5.1 The Role of the MST in Single Linkage Clustering -- 8.5.2 SL Compared to a Fitch-Margoliash Dendrogram -- 8.5.3 Repairing SL Sensitivity to Inliers and Bridge Points -- 8.5.4 Acceleration of the Single Linkage Algorithm -- 8.6 Cluster Validity for Single Linkage -- 8.7 An Example Using All Four Basic Models -- 8.8 Notes and Remarks -- 8.9 Exercises -- 9 Properties of the Fantastic Four: External Cluster Validity -- 9.1 Introduction -- 9.2 Computational Complexity -- 9.2.1 Using Big-Oh to Measure the Growth of Functions -- 9.2.2 Time and Space Complexity for the Fantastic Four -- 9.3 Customizing the c-Means Models to Account for Cluster Shape -- 9.3.1 Variable Norm Methods -- 9.3.2 Variable Prototype Methods -- 9.4 Traversing the Partition Landscape -- 9.5 External Cluster Validity With Labeled Data -- 9.5.1 External Paired-Comparison Cluster Validity Indices -- 9.5.2 External Best Match (Best U, or Best E) Validation -- 9.5.3 The Fantastic Four Use Best E Evaluations on Labeled Data -- 9.6 Choosing an Internal CVI Using Internal/External (Best I/E) Correlation -- 9.7 Notes and Remarks -- 9.8 Problems -- 10 Alternating Optimization -- 10.1 Introduction -- 10.2 General Considerations on Numerical Optimization -- 10.2.1 Iterative Solution of Optimization Problems -- 10.2.2 Iterative Solution of Alternating Optimization with (t, s) Schemes -- 10.3 Local Convergence Theory for AO -- 10.4 Global Convergence Theory -- 10.5 Impact of the Theory for the c-Means Models -- 10.6 Convergence for GMD Using EM/AO -- 10.7 Notes and Remarks -- 10.8 Exercises -- 11 Clustering in Static Big Data -- 11.1 The Jungle of Big Data -- 11.1.1 An Overview of Big Data.
11.1.2 Scalability vs. Acceleration -- 11.2 Methods for Clustering in Big Data -- 11.3 Sampling Functions -- 11.3.1 Chunk Sampling -- 11.3.2 Random Sampling -- 11.3.3 Progressive Sampling -- 11.3.4 Maximin (MM) Sampling -- 11.3.5 Aggregation and Non-Iterative Extension of a Literal Partition to the Rest of the Data -- 11.4 A Sampler of Other Methods: Precursors to Streaming Data Analysis -- 11.5 Visualization of Big Static Data -- 11.6 Extending Single Linkage for Static Big Data -- 11.7 Notes and Remarks -- 11.8 Exercises -- 12 Structural Assessment in Streaming Data -- 12.1 Streaming Data Analysis -- 12.1.1 The Streaming Process -- 12.1.2 Computational Footprints -- 12.2 Streaming Clustering Algorithms -- 12.2.1 Sequential Hard c-Means and Sebestyen's Method -- 12.2.2 Extensions of Sequential Hard c-Means: BIRCH, CluStream, and DenStream -- 12.2.3 Model-Based Algorithms -- 12.2.4 Projection and Grid-Based Methods -- 12.3 Reading the Footprints: Hindsight Evaluation -- 12.3.1 When You Can See the Data and Footprints -- 12.3.2 When You Can't See the Data and Footprints -- 12.3.3 Change Point Detection -- 12.4 Dynamic Evaluation of Streaming Data Analysis -- 12.4.1 Incremental Stream Monitoring Functions (ISMFs) -- 12.4.2 Visualization of Streaming Data -- 12.5 What's Next for Streaming Data Analysis? -- 12.6 Notes and Remarks -- 12.7 Exercises -- References -- Index -- About the Author -- Back Cover.
isbn 9788770224246
callnumber-first Q - Science
callnumber-subject QA - Mathematics
callnumber-label QA278
callnumber-sort QA 3278.55
genre Electronic books.
genre_facet Electronic books.
url https://ebookcentral.proquest.com/lib/oeawat/detail.action?docID=29156150
illustrated Not Illustrated
dewey-hundreds 500 - Science
dewey-tens 510 - Mathematics
dewey-ones 519 - Probabilities & applied mathematics
dewey-full 519.53028557
dewey-sort 3519.53028557
dewey-raw 519.53028557
dewey-search 519.53028557
oclc_num 1311313906
work_keys_str_mv AT bezdekjamesc elementaryclusteranalysisfourbasicmethodsthatusuallywork
status_str n
ids_txt_mv (MiAaPQ)50029156150
(Au-PeEL)EBL29156150
(OCoLC)1311313906
carrierType_str_mv cr
is_hierarchy_title Elementary Cluster Analysis : Four Basic Methods That (Usually) Work.
marc_error Info : Unimarc and ISO-8859-1 translations identical, choosing ISO-8859-1. --- [ 856 : z ]
_version_ 1792331069675012097
fullrecord <?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim"><record><leader>08642nam a22004333i 4500</leader><controlfield tag="001">50029156150</controlfield><controlfield tag="003">MiAaPQ</controlfield><controlfield tag="005">20240229073849.0</controlfield><controlfield tag="006">m o d | </controlfield><controlfield tag="007">cr cnu||||||||</controlfield><controlfield tag="008">240229s2022 xx o ||||0 eng d</controlfield><datafield tag="020" ind1=" " ind2=" "><subfield code="a">9788770224246</subfield><subfield code="q">(electronic bk.)</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(MiAaPQ)50029156150</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(Au-PeEL)EBL29156150</subfield></datafield><datafield tag="035" ind1=" " ind2=" "><subfield code="a">(OCoLC)1311313906</subfield></datafield><datafield tag="040" ind1=" " ind2=" "><subfield code="a">MiAaPQ</subfield><subfield code="b">eng</subfield><subfield code="e">rda</subfield><subfield code="e">pn</subfield><subfield code="c">MiAaPQ</subfield><subfield code="d">MiAaPQ</subfield></datafield><datafield tag="050" ind1=" " ind2="4"><subfield code="a">QA278.55</subfield></datafield><datafield tag="082" ind1="0" ind2=" "><subfield code="a">519.53028557</subfield></datafield><datafield tag="100" ind1="1" ind2=" "><subfield code="a">Bezdek, James C.</subfield></datafield><datafield tag="245" ind1="1" ind2="0"><subfield code="a">Elementary Cluster Analysis :</subfield><subfield code="b">Four Basic Methods That (Usually) Work.</subfield></datafield><datafield tag="250" ind1=" " ind2=" "><subfield code="a">1st ed.</subfield></datafield><datafield tag="264" ind1=" " ind2="1"><subfield code="a">Denmark :</subfield><subfield code="b">River Publishers,</subfield><subfield code="c">2022.</subfield></datafield><datafield tag="264" ind1=" " ind2="4"><subfield code="c">©2022.</subfield></datafield><datafield tag="300" ind1=" " ind2=" "><subfield code="a">1 online resource (518 pages)</subfield></datafield><datafield tag="336" ind1=" " ind2=" "><subfield code="a">text</subfield><subfield code="b">txt</subfield><subfield code="2">rdacontent</subfield></datafield><datafield tag="337" ind1=" " ind2=" "><subfield code="a">computer</subfield><subfield code="b">c</subfield><subfield code="2">rdamedia</subfield></datafield><datafield tag="338" ind1=" " ind2=" "><subfield code="a">online resource</subfield><subfield code="b">cr</subfield><subfield code="2">rdacarrier</subfield></datafield><datafield tag="505" ind1="0" ind2=" "><subfield code="a">Front Cover -- Elementary Cluster Analysis: Four Basic Methods that (Usually) Work -- Contents -- Preface -- List of Figures -- List of Tables -- List of Abbreviations -- Appendix A. List of Algorithms -- Appendix D. List of Definitions -- Appendix E. List of Examples -- Appendix L. List of Lemmas and Theorems -- Appendix V. List of Video Links -- I The Art and Science of Clustering -- 1 Clusters: The Human Point of View (HPOV) -- 1.1 Introduction -- 1.2 What are Clusters? -- 1.3 Notes and Remarks -- 1.4 Exercises -- 2 Uncertainty: Fuzzy Sets and Models -- 2.1 Introduction -- 2.2 Fuzzy Sets and Models -- 2.3 Fuzziness and Probability -- 2.4 Notes and Remarks -- 2.5 Exercises -- 3 Clusters: The Computer Point of View (CPOV) -- 3.1 Introduction -- 3.2 Label Vectors -- 3.3 Partition Matrices -- 3.4 How Many Clusters are Present in a Data Set?
-- 3.5 CPOV Clusters: The Computer's Point of View -- 3.6 Notes and Remarks -- 3.7 Exercises -- 4 The Three Canonical Problems -- 4.1 Introduction -- 4.2 Tendency Assessment - (Are There Clusters?) -- 4.2.1 An Overview of Tendency Assessment -- 4.2.2 Minimal Spanning Trees (MSTs) -- 4.2.3 Visual Assessment of Clustering Tendency -- 4.2.4 The VAT and iVAT Reordering Algorithms -- 4.3 Clustering (Partitioning the Data into Clusters) -- 4.4 Cluster Validity (Which Clusters are "Best"?) -- 4.5 Notes and Remarks -- 4.6 Exercises -- 5 Feature Analysis -- 5.1 Introduction -- 5.2 Feature Nomination -- 5.3 Feature Analysis -- 5.4 Feature Selection -- 5.5 Feature Extraction -- 5.5.1 Principal Components Analysis -- 5.5.2 Random Projection -- 5.5.3 Sammon's Algorithm -- 5.5.4 Autoencoders -- 5.5.5 Relational Data -- 5.6 Normalization and Statistical Standardization -- 5.7 Notes and Remarks -- 5.8 Exercises -- II Four Basic Models and Algorithms -- 6 The c-Means (aka k-Means) Models -- 6.1 Introduction.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">6.2 The Geometry of Partition Spaces -- 6.3 The HCM/FCM Models and Basic AO Algorithms -- 6.4 Cluster Accuracy for Labeled Data -- 6.5 Choosing Model Parameters (c, m, ‖·‖A) -- 6.5.1 How to Pick the Number of Clusters c -- 6.5.2 How to Pick the Weighting Exponent m -- 6.5.3 Choosing the Weight Matrix (A) for the Model Norm -- 6.6 Choosing Execution Parameters (V0, ε, ‖·‖err, T) -- 6.6.1 Choosing Termination and Iterate Limit Criteria -- 6.6.2 How to Pick an Initial V0 (or U0) -- 6.6.3 Acceleration Schemes for HCM (aka k-Means) and FCM -- 6.7 Cluster Validity With the Best c Method -- 6.7.1 Scale Normalization -- 6.7.2 Statistical Standardization -- 6.7.3 Stochastic Correction for Chance -- 6.7.4 Best c Validation With Internal CVIs -- 6.7.5 Crisp Cluster Validity Indices -- 6.7.6 Soft Cluster Validity Indices -- 6.8 Alternate Forms of Hard c-Means (aka k-Means) -- 6.8.1 Bounds on k-Means in Randomly Projected Downspaces -- 6.8.2 Matrix Factorization for HCM for Clustering -- 6.8.3 SVD: A Global Bound for J1(U, V; X) -- 6.9 Notes and Remarks -- 6.10 Exercises -- 7 Probabilistic Clustering - GMD/EM -- 7.1 Introduction -- 7.2 The Mixture Model -- 7.3 The Multivariate Normal Distribution -- 7.4 Gaussian Mixture Decomposition -- 7.5 The Basic EM Algorithm for GMD -- 7.6 Choosing Model and Execution Parameters for EM -- 7.6.1 Estimating c With iVAT -- 7.6.2 Choosing Q0 or P0 in GMD -- 7.6.3 Implementation Parameters ε, ‖·‖err, T for GMD With EM -- 7.6.4 Acceleration Schemes for GMD With EM -- 7.7 Model Selection and Cluster Validity for GMD -- 7.7.1 Two Interpretations of the Objective of GMD -- 7.7.2 Choosing the Number of Components Using GMD/EM With GOFIs -- 7.7.3 Choosing the Number of Clusters Using GMD/EM With CVIs -- 7.8 Notes and Remarks -- 7.9 Exercises -- 8 Relational Clustering - The SAHN Models -- 8.1 Relations and Similarity Measures.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">8.2 The SAHN Model and Algorithms -- 8.3 Choosing Model Parameters for SAHN Clustering -- 8.4 Dendrogram Representation of SAHN Clusters -- 8.5 SL Implemented With Minimal Spanning Trees -- 8.5.1 The Role of the MST in Single Linkage Clustering -- 8.5.2 SL Compared to a Fitch-Margoliash Dendrogram -- 8.5.3 Repairing SL Sensitivity to Inliers and Bridge Points -- 8.5.4 Acceleration of the Single Linkage Algorithm -- 8.6 Cluster Validity for Single Linkage -- 8.7 An Example Using All Four Basic Models
-- 8.8 Notes and Remarks -- 8.9 Exercises -- 9 Properties of the Fantastic Four: External Cluster Validity -- 9.1 Introduction -- 9.2 Computational Complexity -- 9.2.1 Using Big-Oh to Measure the Growth of Functions -- 9.2.2 Time and Space Complexity for the Fantastic Four -- 9.3 Customizing the c-Means Models to Account for Cluster Shape -- 9.3.1 Variable Norm Methods -- 9.3.2 Variable Prototype Methods -- 9.4 Traversing the Partition Landscape -- 9.5 External Cluster Validity With Labeled Data -- 9.5.1 External Paired-Comparison Cluster Validity Indices -- 9.5.2 External Best Match (Best U, or Best E) Validation -- 9.5.3 The Fantastic Four Use Best E Evaluations on Labeled Data -- 9.6 Choosing an Internal CVI Using Internal/External (Best I/E) Correlation -- 9.7 Notes and Remarks -- 9.8 Problems -- 10 Alternating Optimization -- 10.1 Introduction -- 10.2 General Considerations on Numerical Optimization -- 10.2.1 Iterative Solution of Optimization Problems -- 10.2.2 Iterative Solution of Alternating Optimization with (t, s) Schemes -- 10.3 Local Convergence Theory for AO -- 10.4 Global Convergence Theory -- 10.5 Impact of the Theory for the c-Means Models -- 10.6 Convergence for GMD Using EM/AO -- 10.7 Notes and Remarks -- 10.8 Exercises -- 11 Clustering in Static Big Data -- 11.1 The Jungle of Big Data -- 11.1.1 An Overview of Big Data.</subfield></datafield><datafield tag="505" ind1="8" ind2=" "><subfield code="a">11.1.2 Scalability vs. Acceleration -- 11.2 Methods for Clustering in Big Data -- 11.3 Sampling Functions -- 11.3.1 Chunk Sampling -- 11.3.2 Random Sampling -- 11.3.3 Progressive Sampling -- 11.3.4 Maximin (MM) Sampling -- 11.3.5 Aggregation and Non-Iterative Extension of a Literal Partition to the Rest of the Data -- 11.4 A Sampler of Other Methods: Precursors to Streaming Data Analysis -- 11.5 Visualization of Big Static Data -- 11.6 Extending Single Linkage for Static Big Data -- 11.7 Notes and Remarks -- 11.8 Exercises -- 12 Structural Assessment in Streaming Data -- 12.1 Streaming Data Analysis -- 12.1.1 The Streaming Process -- 12.1.2 Computational Footprints -- 12.2 Streaming Clustering Algorithms -- 12.2.1 Sequential Hard c-Means and Sebestyen's Method -- 12.2.2 Extensions of Sequential Hard c-Means: BIRCH, CluStream, and DenStream -- 12.2.3 Model-Based Algorithms -- 12.2.4 Projection and Grid-Based Methods -- 12.3 Reading the Footprints: Hindsight Evaluation -- 12.3.1 When You Can See the Data and Footprints -- 12.3.2 When You Can't See the Data and Footprints -- 12.3.3 Change Point Detection -- 12.4 Dynamic Evaluation of Streaming Data Analysis -- 12.4.1 Incremental Stream Monitoring Functions (ISMFs) -- 12.4.2 Visualization of Streaming Data -- 12.5 What's Next for Streaming Data Analysis? -- 12.6 Notes and Remarks -- 12.7 Exercises -- References -- Index -- About the Author -- Back Cover.</subfield></datafield><datafield tag="588" ind1=" " ind2=" "><subfield code="a">Description based on publisher supplied metadata and other sources.</subfield></datafield><datafield tag="590" ind1=" " ind2=" "><subfield code="a">Electronic reproduction. Ann Arbor, Michigan : ProQuest Ebook Central, 2024. Available via World Wide Web. Access may be limited to ProQuest Ebook Central affiliated libraries. 
</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Cluster analysis.</subfield></datafield><datafield tag="650" ind1=" " ind2="0"><subfield code="a">Cluster analysis--Data processing.</subfield></datafield><datafield tag="655" ind1=" " ind2="4"><subfield code="a">Electronic books.</subfield></datafield><datafield tag="776" ind1="0" ind2="8"><subfield code="i">Print version:</subfield><subfield code="a">Bezdek, James C.</subfield><subfield code="t">Elementary Cluster Analysis: Four Basic Methods That (Usually) Work</subfield><subfield code="d">Denmark : River Publishers,c2022</subfield></datafield><datafield tag="797" ind1="2" ind2=" "><subfield code="a">ProQuest (Firm)</subfield></datafield><datafield tag="856" ind1="4" ind2="0"><subfield code="u">https://ebookcentral.proquest.com/lib/oeawat/detail.action?docID=29156150</subfield><subfield code="z">Click to View</subfield></datafield></record></collection>