Astrophysics Source Code Library

Making codes discoverable since 1999

ASCL Code Record

[ascl:2301.004] HEADSS: HiErArchical Data Splitting and Stitching for non-distributed clustering algorithms

HEADSS (HiErArchical Data Splitting and Stitching) facilitates clustering at scale for algorithms that scale poorly with data volume or that are intrinsically non-distributed. It automates data splitting and stitching, allowing repeatable handling and removal of edge effects. Implemented in conjunction with scikit's HDBSCAN, the code achieves orders-of-magnitude reductions in single-node memory requirements for both non-distributed and distributed implementations; the distributed implementation offers similar order-of-magnitude reductions in total run time while recovering comparable accuracy. HEADSS also establishes a hierarchy of features by using a subset of clustering features to split the data.
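
The split-and-stitch idea described above can be illustrated with a minimal sketch. This is not the HEADSS API: the function names (split_regions, toy_cluster, split_cluster_stitch), the parameters, and the stitching rule (keep a cluster only if its centre along the splitting feature falls in the region's non-overlapping core) are assumptions made for illustration, and a toy single-linkage grouping stands in for HDBSCAN so that the example needs only the Python standard library.

```python
from math import dist


def split_regions(lo, hi, n, overlap):
    """Cut [lo, hi] into n core intervals, each extended by `overlap` on both sides."""
    width = (hi - lo) / n
    regions = []
    for i in range(n):
        core_lo = lo + i * width
        core_hi = lo + (i + 1) * width
        regions.append((core_lo, core_hi, core_lo - overlap, core_hi + overlap))
    return regions


def toy_cluster(points, eps):
    """Stand-in for a real clusterer such as HDBSCAN: link points closer than eps."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())


def split_cluster_stitch(points, n, overlap, eps, min_size=2):
    """Split on the first feature only, cluster each region, stitch by cluster centre."""
    xs = [p[0] for p in points]
    kept = []
    for idx, (core_lo, core_hi, ext_lo, ext_hi) in enumerate(
        split_regions(min(xs), max(xs), n, overlap)
    ):
        # Each region sees its core plus an overlap margin (hypothetical scheme).
        subset = [p for p in points if ext_lo <= p[0] <= ext_hi]
        for cluster in toy_cluster(subset, eps):
            if len(cluster) < min_size:
                continue
            cx = sum(p[0] for p in cluster) / len(cluster)
            last = idx == n - 1
            # Assumed stitching rule: a cluster belongs to the region whose core
            # contains its centre, so clusters cut by a margin are discarded here
            # and recovered whole by the neighbouring region.
            if core_lo <= cx and (cx < core_hi or last):
                kept.append(sorted(cluster))
    return kept


points = [
    (0.0, 0.0), (0.3, 0.0), (0.6, 0.0),                          # left cluster
    (4.6, 1.0), (4.8, 1.0), (5.1, 1.0), (5.3, 1.0), (5.5, 1.0),  # straddles the split
    (9.4, 0.0), (9.7, 0.0), (10.0, 0.0),                         # right cluster
]
clusters = split_cluster_stitch(points, n=2, overlap=1.0, eps=0.5)
```

In this sketch each region holds only part of the data, which is where a memory saving would come from, and the regions are independent of one another, so they could be processed on separate workers. Setting overlap to 0.0 in the call above makes the middle cluster break into two fragments at the region boundary, which is the kind of edge effect the overlap and stitching step are meant to remove.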

Code site:

