How accurate is K-means clustering as a predictor of subcategories in the metal-metalloid-nonmetal trend?
I started with a database containing some of the properties of the chemical elements: atomic number, atomic weight, electronegativity, ionization potential, atomic radius, atomic volume, boiling point, covalent radius, density, heat of fusion, heat of vaporization, melting point, and specific heat capacity. A quick cleaning of the data left me with a reduced dataset of 80 elements.
I applied Lloyd’s algorithm (the standard K-means procedure) to this dataset, fixing the number of clusters at 10. My hope was that K-means would be accurate enough to group together all elements that belong to the same subcategory of the metal-metalloid-nonmetal trend. In other words, I expected each cluster to match one of the following: alkali metals, alkaline earth metals, lanthanides, actinides, transition metals, post-transition metals, metalloids, polyatomic nonmetals, diatomic nonmetals, or noble gases.
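To make the procedure concrete, here is a minimal pure-Python sketch of Lloyd’s algorithm. It is not the code I ran on the element data: the function names, the random initialization, the fixed seed, and the toy two-dimensional points are all assumptions made for illustration. With real element properties you would probably also want to rescale the features first, since quantities such as boiling point and electronegativity live on very different scales.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, max_iter=100, seed=0):
    """Illustrative Lloyd's algorithm: alternate assignment and update steps.

    Initial centroids are k distinct points drawn at random (an assumption;
    other initializations such as k-means++ are common in practice).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)

    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no assignment changed, so the algorithm has converged
        labels = new_labels

        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]

    return labels, centroids


# Made-up toy data: two well-separated groups of three points each.
toy_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
toy_labels, toy_centroids = lloyd_kmeans(toy_points, k=2)
print(toy_labels)
```

On the element dataset each point would be the vector of the 13 properties of one element, and k would be 10.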
I included an interactive Bokeh diagram with the results. This diagram offers four possible projections of the data: isometric mapping (Isomap), principal component analysis, spectral embedding, and locally linear embedding. The tabs in the menu of the diagram let you switch among the four computed projections. In each projection, the chart shows elements that belong to the same cluster as points of the same color. Hovering over any point brings up a tooltip with both the name and the atomic number of the corresponding chemical element. Using the zoom or box-zoom tools to isolate individual clusters makes the results easier to read.
Note how the orange cluster gathers all the noble gases together, although it also includes some extra elements.
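One way to put a number on observations like this is cluster purity: for each cluster, find the most common subcategory and see what fraction of the cluster it accounts for. The sketch below is only an illustration of that idea. The function name is hypothetical, and the labels in the example are invented placeholders, not the actual output of my clustering.

```python
from collections import Counter, defaultdict


def cluster_purity(cluster_labels, true_categories):
    """Return per-cluster (majority category, purity) and the overall purity."""
    groups = defaultdict(list)
    for cluster, category in zip(cluster_labels, true_categories):
        groups[cluster].append(category)

    per_cluster = {}
    majority_total = 0
    for cluster, categories in groups.items():
        # Most frequent subcategory inside this cluster and its count.
        category, count = Counter(categories).most_common(1)[0]
        per_cluster[cluster] = (category, count / len(categories))
        majority_total += count

    # Overall purity: share of all points that sit in their cluster's majority.
    return per_cluster, majority_total / len(cluster_labels)


# Invented example: cluster 0 holds four noble gases plus one extra element,
# cluster 1 holds three alkali metals. These are NOT real results.
example_clusters = [0, 0, 0, 0, 0, 1, 1, 1]
example_categories = (["noble gas"] * 4
                      + ["diatomic nonmetal"]
                      + ["alkali metal"] * 3)

per_cluster, overall = cluster_purity(example_clusters, example_categories)
print(per_cluster)
print(overall)
```

A cluster that captures an entire subcategory but also picks up a few outsiders, like the orange one, would show a purity below 1 even though no member of the subcategory is missing.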