Projection-Based Clustering Through Self-Organization and Swarm Intelligence: Combining Cluster Analysis with the Visualization of High-Dimensional Data.

Bibliographic Details
Place / Publishing House: Wiesbaden: Springer Fachmedien Wiesbaden GmbH, 2018.
©2018.
Year of Publication: 2018
Edition: 1st ed.
Language: English
Physical Description: 1 online resource (210 pages)
Table of Contents:
  • Intro
  • Acknowledgments
  • Table of contents
  • List of figures
  • List of tables
  • Zusammenfassung
  • Abstract
  • 1 Introduction
  • 2 Fundamentals
  • 2.1 Basic Definitions
  • 2.2 Concepts of Graph Theory Applied to Patterns
  • 2.3 Overview of Knowledge Discovery
  • 2.3.1 Feature Selection
  • 2.3.2 Preprocessing
  • 2.3.3 Feature Extraction
  • 2.3.3.1 Transformations
  • 2.3.3.2 Dimensionality Reduction
  • 2.3.4 Cluster Analysis
  • 2.3.5 An Approach to Knowledge Acquisition
  • 3 Approaches to Cluster Analysis
  • 3.1 Common Clustering Methods
  • 3.2 Structure of Natural Clusters
  • 3.2.1 Types of Structures Sought by Clustering Algorithms
  • 3.2.2 Quality of Clustering
  • 3.2.2.1 Heatmaps
  • 3.2.2.2 Silhouette plots
  • 3.3 Problems with Clustering Methods
  • 4 Methods of Projection
  • 4.1 Common Approaches
  • 4.1.1 Principal Component Analysis (PCA)
  • 4.1.2 Independent Component Analysis (ICA)
  • 4.1.3 Non-linear metric multidimensional scaling (MDS) techniques
  • 4.1.4 Curvilinear Component Analysis (CCA)
  • 4.1.5 t-Distributed Stochastic Neighbor Embedding (t-SNE)
  • 4.1.6 Neighborhood Retrieval Visualizer (NeRV)
  • 4.2 Emergent Self-Organizing Map (ESOM)
  • 4.2.1 Visualizations of SOMs
  • 4.2.2 Clustering with ESOM
  • 4.3 Types of Projection Methods
  • 5 Visualizing the Output Space
  • 5.1 Examples
  • 5.2 Structure Preservation
  • 5.3 Generating a Topographic Map from the Generalized U*-matrix
  • 5.3.1 Simplified ESOM
  • 5.3.2 U*-Matrix Calculation
  • 5.3.3 Topographic Map with Hypsometric Tints
  • 5.3.4 Limitations
  • 6 Quality Assessments of Visualizations
  • 6.1 Common Quality Measures (QMs)
  • 6.1.1 Classification Error (CE)
  • 6.1.2 C Measure
  • 6.1.3 Two Variants of the C Measure: Minimal Path Length and Minimal Wiring
  • 6.1.4 Force Approach Error
  • 6.1.5 König's Measure
  • 6.1.6 Local Continuity Meta-Criterion (LCMC)
  • 6.1.7 Mean Relative Rank Error (MRRE) and the Co-ranking Matrix
  • 6.1.8 Precision and Recall
  • 6.1.9 Rescaled Average Agreement Rate (RAAR)
  • 6.1.10 Stress and the Shepard Diagram
  • 6.1.11 Topographic Product
  • 6.1.12 Topographic Function (TF)
  • 6.1.13 Trustworthiness and Discontinuity (T&D)
  • 6.1.14 U-ranking
  • 6.1.15 Overall Correlations: Topological Index (TI) and Topological Correlation (TC)
  • 6.1.16 Zrehen's Measure
  • 6.2 Types of Quality Measures for Assessing Structure Preservation
  • 6.2.1 Theoretical Assessment of Quality Measures
  • 6.2.2 Practical Assessment of Quality Measures
  • 6.3 Introducing the Delaunay Classification Error (DCE)
  • 6.3.1 Summary
  • 7 Behavior-based Systems in Data Science
  • 7.1 Artificial Behavior Based on DataBots
  • 7.1.1 Swarm-Organized Projection (SOP)
  • 7.2 Swarm Intelligence for Unsupervised Machine Learning
  • 7.3 Missing Links: Emergence and Game Theory
  • 8 Databionic Swarm (DBS)
  • 8.1 Projection with Pswarm
  • 8.1.1 Motivation: Game Theory
  • 8.1.2 Symmetry Considerations
  • 8.1.3 Algorithm
  • 8.1.4 Data-driven Annealing Scheme
  • 8.1.5 Annealing Interval
  • 8.1.6 Convergence
  • 8.2 Comparing Pswarm with a Previously Developed Approach
  • 8.2.1 Neighborhood Definition
  • 8.2.2 Annealing Scheme
  • 8.2.3 Swarm Intelligence and Self-Organization
  • 8.3 Clustering on a Generalized U*-Matrix
  • 9 Experimental Methodology
  • 9.1 Data Sets
  • 9.1.1 Atom
  • 9.1.2 Chainlink
  • 9.1.3 EngyTime
  • 9.1.4 Golf Ball
  • 9.1.5 Hepta
  • 9.1.6 Iris
  • 9.1.7 Leukemia
  • 9.1.8 Lsun3D
  • 9.1.9 S-shape
  • 9.1.10 Swiss Banknotes
  • 9.1.11 Target
  • 9.1.12 Tetra
  • 9.1.13 Tetragonula
  • 9.1.14 Cuboid
  • 9.1.15 Two Diamonds
  • 9.1.16 Wine
  • 9.1.17 Wing Nut
  • 9.1.18 World Gross Domestic Product (World GDP)
  • 9.2 Parameter Settings
  • 9.2.1 Quality Measures (QMs)
  • 9.2.2 Projection Methods
  • 9.2.2.1 Swarm-Organized Projection (SOP)
  • 9.2.2.2 Pswarm
  • 9.2.3 Common clustering algorithms
  • 9.3 Gene Ontology (GO)
  • 9.3.1 Overrepresentation Analysis (ORA)
  • 9.3.2 Filtering via ABC Analysis
  • 10 Results on Pre-classified Data Sets
  • 10.1 Comparison with Given Classifications
  • 10.1.1 Recognition of the Absence of Clusters
  • 10.2 Evaluation of Projections Using the Delaunay Classification Error (DCE)
  • 10.3 Topographic Maps with Hypsometric Colors
  • 11 DBS on Natural Data Sets
  • 11.1 Types of Leukemia
  • 11.2 World Gross Domestic Product (World GDP)
  • 11.3 Tetragonula Bees
  • 12 Knowledge Discovery with DBS
  • 12.1 Hydrology
  • 12.1.1 Knowledge Acquisition and Prediction in the Hydrology Data Set
  • 12.2 Pain Genes
  • 12.2.1 Prior Knowledge
  • 12.2.2 Knowledge Acquisition in Clusters of Pain Genes
  • 13 Discussion
  • 14 Conclusion
  • References
  • Appendices
  • Supplement A: Evaluation of Common QMs
  • Supplement B: Wine Dataset Distance Distribution
  • Supplement C: Generalized Umatrix of Pswarm and SOP
  • Supplement D: DBS Visualizations of S-shape and uniform Cuboid
  • Supplement E: U-Matrix Visualizations of ESOM Projections
  • Supplement F: Statistical Tests in Hydrology
  • Supplement G: 3D Prints of Generalized Umatrix Visualizations of DBS
  • Supplement H: Contingency Table for Tetragonula Bees Clustering
  • Index