Data modelling approaches to astronomical data - Mapping large spectral line data cubes to dimensional data models



Post by Ada Coda » Wed Nov 17, 2021 10:03 pm


Abstract: As a new generation of large-scale telescopes is expected to produce single data products ranging from hundreds of GB to multiple TB, different approaches to I/O-efficient data interaction and extraction need to be investigated and made available to researchers. This will become increasingly important as downloading and distributing TB-scale data products becomes unsustainable and researchers have to take their processing and analysis to the data. We present a methodology to extract three-dimensional spatial-spectral data from dimensionally modelled tables in Parquet format on a Hadoop system. The data is loaded into the Parquet tables from FITS cube files using a dedicated process. We compare the performance of extracting data using the Apache Spark parallel compute framework on top of the Parquet-Hadoop ecosystem with that of extracting data from the original source files on a shared file system. We find that the Spark-Parquet-Hadoop solution provides significant performance benefits, particularly in a multi-user environment. We present a detailed analysis of the single- and multi-user experiments conducted and discuss the benefits and limitations of the platform used for this study.
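
To make the idea of a dimensionally modelled cube concrete, here is a minimal sketch in plain Python. It is not the authors' schema or ETL code: the column names (`x`, `y`, `channel`, `flux`) and both helper functions are hypothetical, and an in-memory list of rows stands in for the Parquet tables. In the setup the abstract describes, the equivalent selection would presumably be a Spark filter over Parquet data rather than a Python loop.

```python
def cube_to_rows(cube):
    """Flatten cube[channel][y][x] into fact-table rows, one row per voxel.

    The row layout is an assumption made for illustration only.
    """
    rows = []
    for chan, plane in enumerate(cube):
        for y, line in enumerate(plane):
            for x, flux in enumerate(line):
                rows.append({"x": x, "y": y, "channel": chan, "flux": flux})
    return rows


def extract_subcube(rows, x_range, y_range, chan_range):
    """Select rows inside inclusive bounds and rebuild a nested-list sub-cube."""
    (x0, x1), (y0, y1), (c0, c1) = x_range, y_range, chan_range
    sub = [
        [[None] * (x1 - x0 + 1) for _ in range(y0, y1 + 1)]
        for _ in range(c0, c1 + 1)
    ]
    for r in rows:
        if x0 <= r["x"] <= x1 and y0 <= r["y"] <= y1 and c0 <= r["channel"] <= c1:
            sub[r["channel"] - c0][r["y"] - y0][r["x"] - x0] = r["flux"]
    return sub


# Toy cube: 3 spectral channels of 4x4 pixels, flux encodes its own position.
cube = [[[100 * c + 10 * y + x for x in range(4)] for y in range(4)] for c in range(3)]
rows = cube_to_rows(cube)

# Extract pixels x = 1..2, y = 0..1 over channels 1..2.
sub = extract_subcube(rows, (1, 2), (0, 1), (1, 2))
```

Storing one row per voxel lets a spatial-spectral cut-out be expressed as a range predicate on three columns, which is the kind of query a columnar format and a parallel engine can split across workers.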

Credit: Duniam, Geoff; Kitaeff, Vyacheslav V.; Wicenec, Andreas

Site: https://github.com/GeoffDuniam/FITS-Cod ... master/ETL
