Solar activity classification based on Mg II spectra: towards classification on compressed data

Post by Ada Coda » Tue Sep 15, 2020 9:19 am

Abstract: Although large volumes of solar data are available for investigation and study, the vast majority of these data remain unlabeled and are therefore not amenable to modern supervised machine learning methods. A way to classify spectra accurately and automatically into categories related to the degree of solar activity is highly desirable and would assist and speed up future research in solar physics. At the same time, the large volume of raw observational data is a serious bottleneck for machine learning, requiring powerful computational resources that are not available to many laboratories. Additionally, transmitting raw data imposes restrictions on real-time observations and requires considerable bandwidth and energy from onboard solar observation systems. To address these issues, we propose a framework to classify solar activity on compressed data. To this end, we used a labeling scheme from a pre-existing vector quantization technique in conjunction with several machine learning algorithms to categorize Mg II spectra measured by NASA’s small explorer satellite IRIS into several groups characterizing solar activity. Our training data set is a human-annotated list of 85 IRIS observations containing 29,097 frames in total, or equivalently 9 million Mg II spectra. The annotated types of solar activity are active region, pre-flare activity, solar flare, sunspot, and quiet Sun. We used vector quantization to compress these data and reduce their complexity before training classifiers. Among the many classifiers tested, XGBoost produced the most accurate results on the compressed data, with a prediction rate above 95%, outperforming other machine learning methods such as convolutional neural networks, k-nearest neighbors, naive Bayes classifiers, and support vector machines.
A principal finding of this research is that classification performance on compressed and uncompressed data is comparable under our particular architecture, implying that large compression rates are possible with relatively little information loss.
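The abstract does not spell out the pipeline, so what follows is only a minimal NumPy sketch of the general idea of "classify on vector-quantized data", under assumptions that are not taken from the paper or its repository. The codebook is learned with plain k-means (Lloyd's algorithm), which may differ from the authors' pre-existing VQ technique. Each frame is summarized by a normalized histogram of its codeword indices, which is a hypothetical feature choice. A nearest-centroid classifier stands in for XGBoost so that the example needs only NumPy. The data are synthetic Gaussian line profiles, not IRIS Mg II spectra. All helper names (`learn_codebook`, `frame_features`, `compression_ratio`, etc.) are illustrative. The compression ratio counts each spectrum of n values at b bits as replaced by one ceil(log2 k)-bit index, ignoring the cost of storing the codebook.

```python
import math

import numpy as np


def quantize(spectra, codebook):
    """Return, for each spectrum (row), the index of its nearest codeword."""
    d2 = ((spectra[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def learn_codebook(spectra, k, n_iter=30, seed=0):
    """Learn a k-entry VQ codebook with plain Lloyd (k-means) iterations (assumed method)."""
    rng = np.random.default_rng(seed)
    codebook = spectra[rng.choice(len(spectra), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = quantize(spectra, codebook)
        for j in range(k):
            members = spectra[labels == j]
            if len(members) > 0:  # keep the old codeword if its cluster is empty
                codebook[j] = members.mean(axis=0)
    return codebook


def compression_ratio(n_wave, k, bits_per_value=32):
    """Raw bits per spectrum / bits per codeword index (codebook storage ignored)."""
    return n_wave * bits_per_value / max(1, math.ceil(math.log2(k)))


def frame_features(frame, codebook):
    """Compress a frame to codeword indices, then return their normalized histogram."""
    counts = np.bincount(quantize(frame, codebook), minlength=len(codebook))
    return counts / counts.sum()


def fit_nearest_centroid(X, y):
    """Stand-in classifier (NOT XGBoost): one mean feature vector per class."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])


def predict_nearest_centroid(model, X):
    classes, means = model
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[d2.argmin(axis=1)]


# Toy data: Gaussian "line profiles" of two widths plus noise (NOT real IRIS data).
rng = np.random.default_rng(1)
wave = np.linspace(-1.0, 1.0, 64)


def synthetic_frame(width, n_spectra=40):
    return np.exp(-(wave / width) ** 2) + 0.05 * rng.standard_normal((n_spectra, wave.size))


def make_set(n_frames_per_class):
    frames, labels = [], []
    for label, width in (("quiet", 0.2), ("active", 0.5)):
        for _ in range(n_frames_per_class):
            frames.append(synthetic_frame(width))
            labels.append(label)
    return frames, np.array(labels)


train_frames, y_train = make_set(20)
test_frames, y_test = make_set(10)

k = 8
all_train = np.vstack(train_frames)
codebook = learn_codebook(all_train, k)
X_train = np.stack([frame_features(f, codebook) for f in train_frames])
X_test = np.stack([frame_features(f, codebook) for f in test_frames])

model = fit_nearest_centroid(X_train, y_train)
accuracy = float((predict_nearest_centroid(model, X_test) == y_test).mean())

# Information loss: mean squared error between spectra and their codewords.
mse = float(((all_train - codebook[quantize(all_train, codebook)]) ** 2).mean())
ratio = compression_ratio(wave.size, k)
print(f"accuracy={accuracy:.2f}  ratio={ratio:.1f}x  mse={mse:.4f}")
```

Raising `k` generally lowers the quantization error but shrinks the compression ratio, which is the trade-off the abstract's main finding refers to. The printed numbers describe only this toy data and say nothing about the paper's results.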

Credit: Tsizh, Maksym; Ullmann, Denis

Site: https://github.com/DenisUllmann/Solar-a ... essed-data
Last edited by Ada Coda on Sun Sep 27, 2020 8:21 pm, edited 1 time in total.
Reason: Updated code entry.