ASCL.net

Astrophysics Source Code Library

Making codes discoverable since 1999

Searching for codes credited to 'Crake, Dennis A.'

Tip! Refine or expand your search. Authors are sometimes listed as 'Smith, J. K.' instead of 'Smith, John' so it is useful to search for last names only. Note this is currently a simple phrase search.

[ascl:2301.004] HEADSS: HiErArchical Data Splitting and Stitching for non-distributed clustering algorithms

HEADSS (HiErArchical Data Splitting and Stitching) facilitates clustering at scale, unlike clustering algorithms that scale poorly with increased data volume or that are intrinsically non-distributed. HEADSS automates data splitting and stitching, allowing repeatable handling, and removal, of edge effects. Implemented in conjunction with scikit's HDBSCAN, the code achieves orders of magnitude reduction in single node memory requirements for both non-distributed and distributed implementations, with the latter offering similar order of magnitude reductions in total run times while recovering analogous accuracy. HEADSS also establishes a hierarchy of features by using a subset of clustering features to split the data.