ASCL.net

Astrophysics Source Code Library

Making codes discoverable since 1999

Searching for codes credited to 'Baron, Dalya'

Tip! Refine or expand your search. Authors are sometimes listed as 'Smith, J. K.' instead of 'Smith, John' so it is useful to search for last names only. Note this is currently a simple phrase search.

[ascl:1705.015] WeirdestGalaxies: Outlier Detection Algorithm on Galaxy Spectra

WeirdestGalaxies finds the weirdest galaxies in the Sloan Digital Sky Survey (SDSS) by using a basic outlier detection algorithm. It uses an unsupervised Random Forest (RF) algorithm to assign a similarity measure (or distance) between every pair of galaxy spectra in the SDSS. It then uses the distance matrix to find the galaxies that have the largest distance, on average, from the rest of the galaxies in the sample, and defined them as outliers.

[ascl:1903.009] PRF: Probabilistic Random Forest

PRF (Probabilistic Random Forest) is a machine learning algorithm for noisy datasets. The PRF is a modification of the long-established Random Forest (RF) algorithm, and takes into account uncertainties in the measurements (i.e., features) as well as in the assigned classes (i.e., labels). To do so, the Probabilistic Random Forest (PRF) algorithm treats the features and labels as probability distribution functions, rather than as deterministic quantities.

[ascl:2105.006] The Sequencer: Detect one-dimensional sequences in complex datasets

The Sequencer reveals the main sequence in a dataset if one exists. To do so, it reorders objects within a set to produce the most elongated manifold describing their similarities which are measured in a multi-scale manner and using a collection of metrics. To be generic, it combines information from four different metrics: the Euclidean Distance, the Kullback-Leibler Divergence, the Monge-Wasserstein or Earth Mover Distance, and the Energy Distance. It considers different scales of the data by dividing each object in the input data into separate parts (chunks), and estimating pair-wise similarities between the chunks. It then aggregates the information in each of the chunks into a single estimator for each metric+scale.