Software in Astronomy Symposium Presentations, Part 5

This is the fifth in a series of posts on the six-session Software in Astronomy Symposium held on Wednesday and Thursday, April 3-4 at the 2018 EWASS/NAM meeting.

BLOCK 5: Machine Learning & Data Mining
Stephen Serjeant (Open University, UK) moderated the fifth session. This session presented different techniques to, for example, study noise in gravitational wave interferometers, select young stellar object candidates, and directly image exoplanets. David Cornu’s (UTINAM, FR) talk, titled Selection of Spitzer YSO candidates using deep learning classifier, included a short tutorial for creating an artificial neural network, showing how a neuron takes input vectors and updates the weights associated with them to learn the correlation and anti-correlation between various factors. Similarly, Carlos Alberto Gomez Gonzalez (U Grenoble Alpes) showed how supervised machine learning can be used to detect exoplanets in his presentation Data science for direct imaging of exoplanets. Other talks in this session included Deep learning to study the noise in gravitational wave interferometers by Massimiliano Razzano (INFN, IT), k-means clustering in galaxy feature data by Sebastian Turner (LJMU, UK), and Spatial inference of astronomical datasets with INLA, presented by Emille Ishida (COIN, FR) on behalf of Santiago Gonzalez Gaitan. Robert Lyon (UManchester, UK) finished this session with a presentation on Imbalanced learning in astronomy, and provided a Jupyter notebook containing a tutorial and examples. The presentations in this session were accessible even to those with no experience in data mining and machine learning, as the techniques used were explained quickly and well before the speakers moved on to how they enabled particular research.
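
For readers who have not seen this kind of tutorial, here is a minimal sketch of a single artificial neuron learning from data. It is not Cornu’s code: the toy dataset, the sigmoid activation, the learning rate, and the function names (train_neuron, predict) are all assumptions chosen for illustration. The first input is made to rise with the label and the second to fall with it, so the learned weights should end up with opposite signs.

```python
import math


def sigmoid(z):
    """Squash the weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Output of the neuron for one input vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_neuron(samples, labels, learning_rate=0.5, epochs=200):
    """Train one neuron with per-sample gradient updates (hypothetical helper)."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            # Each weight moves in proportion to the error and to its own input.
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Toy data (made up): feature 0 is high for class 1, feature 1 is high for class 0.
samples = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.9], [0.1, 0.8], [0.3, 0.7]]
labels = [1, 1, 1, 0, 0, 0]

weights, bias = train_neuron(samples, labels)
print("weights:", weights, "bias:", bias)
```

A positive weight indicates an input that rises with the target class and a negative weight one that falls with it. A full network of the kind used for classification stacks many such neurons in layers.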

Slides from this session

Selection of Spitzer YSO candidates using deep learning classifier by David Cornu (pdf)

Data science for direct imaging of exoplanets by Carlos Alberto Gomez Gonzalez (pdf)

k-means clustering in galaxy feature data by Seb Turner (pdf)

Spatial inference of astronomical datasets with INLA by Emille Ishida/Santiago Gonzalez Gaitan (pdf)

Imbalanced learning in astronomy by Rob Lyon (pptx)

Software in Astronomy Symposium Presentations, Part 4

This is the fourth in a series of posts on the six-session Software in Astronomy Symposium held on Wednesday and Thursday, April 3-4 at the 2018 EWASS/NAM meeting.

BLOCK 4: Open & Transparent Data Services
Astronomy leads most sciences in providing many open services, particularly data and ways to get access to and use data. This session, moderated by Andy Pollock (USheffield, UK), highlighted some of the new and ongoing services available not just to professional astronomers, but also to students and other interested parties. Debbie Baines (ESA, ES) opened the session with her presentation ESASky version 2: the next generation, which included a live demonstration of this incredible resource. ESASky allows searching for any astronomical object and viewing it in different wavelengths. It is very fast, too; Baines loaded images from Herschel, XMM, Chandra, and HST in her live demo. It offers relevant links to SIMBAD, such as to papers, and one can pull up research literature from ADS in ESASky, too. Rachael Ainsworth (UManchester, UK), originally scheduled to give the first presentation, followed Baines; her presentation, Open Science in Astronomy, gave an overview of openness in astronomy. She pointed out that astronomy is better at open science than many other fields, crediting, among other services, arXiv and GalaxyZoo. She covered some of the challenges and shared additional resources that support, encourage, or make open science possible. Jorge Palacios (IFAE, ES), in his talk Astronomy in a Big Data platform, discussed two services: Cosmohub, a web portal for interactive exploration and distribution of massive cosmological data, and SciPIC (Scientific pipeline at PIC), software for generating synthetic galaxy catalogs using DM simulations. In An Interactive Sky Map based on the Byurakan Plate Archive, Gor Mikayelyan (BAO, AM) shared that the NAS RA Byurakan Astrophysical Observatory in Armenia is making its plate archive, covering 1947-1991, available online in the BAO Observational Database, which will offer searching and selection and will allow downloading the plates in different formats. A sky map will show where plates are available.
After the last two talks, The ASI Cosmic Ray Database for charged particles data by Valeria Di Felice (SSDC/INFN, IT) and Using XML and semantic technologies in astroinformatics to manage data by Guy Beech (UHuddersfield, UK), Pollock opened the floor for discussion and asked whether people could do what they’d like to do as efficiently as they’d like. One answer from the audience was no, because the data are heterogeneous and different ways are required to access them. One pointed observation arising from the back-and-forth was that “radio astronomy is decades behind” in terms of software services, with “thousands of different file formats processed by thousands of different programs”.

Slides from this session

Open Science in Astronomy by Rachael Ainsworth (pdf)

Astronomy in a Big Data platform by Jorge Palacios (pdf)

An Interactive Sky Map based on the Byurakan Plate Archive by Gor Mikayelyan (pdf) | Text (pdf)

The ASI Cosmic Ray Database for charged particles data by Valeria Di Felice (pdf)

Using XML and semantic technologies in astroinformatics to manage data by Guy Beech (pdf) | Paper (pdf)

Software in Astronomy Symposium Presentations, Part 3

This is the third in a series of posts on the six-session Software in Astronomy Symposium held on Wednesday and Thursday, April 3-4 at the 2018 EWASS/NAM meeting.

BLOCK 3: Software packages for research
Amruta Jaodand (ASTRON, NL) moderated the third session of the Symposium, which featured talks on seven different software packages. The first presentation was by Sergio Martin (ESO, CL), on MADCUBA and SLIM: A lightweight software package for datacube handling and spectral line analysis. These software packages offer features for easier use of the ImageJ image processing framework; this framework is widely used in other disciplines for multidimensional images. SLIM provides synthetic spectra generation and automatic fitting to observed spectra; MADCUBA provides an interface to handle and manipulate multiple datacubes, processes large datasets quickly, and is scriptable. Alex Hamilton (UHull, UK) presented SunPy the Open Source Solar Physics Library, giving an introduction to the package, which collects, manages, and analyzes data from many solar data sources, and sharing information on its new features and capabilities.

As one of this session’s presenters was double-booked, we made adjustments to the order of presentations, so The next-generation cosmological code SWIFT was the next talk; this was given by Matthieu Schaller (LeidenU, NL). This package, due for release this summer, uses task-based parallelism for intra-node parallelization; testing has demonstrated that it is more than 30x faster than the code Gadget on representative cosmological problems while using fewer resources. The next talk had generated a lot of interest, as it is not a usual astronomy conference offering. Maisie Rashman (LJMU, UK) spoke about Developing and applying astronomical software for novel use in conservation biology. The team Rashman works on has developed a pipeline using astronomy techniques to identify and track animals; their goal is to create a fully automated system for species identification, population tracking, and combating poaching using drones. After Rashman, Shane Maloney (Trinity College, IE) presented xrayvision – a collection of image reconstruction methods for X-ray visibility observations. The xrayvision package is built atop other packages, including SunPy, and fills a need for an open-source solution for these observations. One of the project’s goals is to give people in poorer countries access to software for solar physics. The cleverly-named software package PampelMuse was presented by its author Sebastian Kamann (LJMU, UK) in his talk Crowded-field 3D spectroscopy with PampelMuse. This Python software performs PSF deblending on integral-field data; several thousand sources can be deblended simultaneously. The final talk of this packed and fast-paced session was by Matteo Bachetti (INAF, IT) on Stingray, HENDRICS and Dave: Spectral Timing for all. Stingray is an AstroPy affiliated package that merges timing and spectral analysis for X-ray spectral timing. The idea for the project arose in a meeting at Leiden University two years ago, where a team formed and started working on it; the code is under active development. By combining Stingray with the HENDRICS (for shell scripting) and Dave (a GUI atop Stingray) codes, the team will provide the community with a Python API (for the brave), a GUI to ease the learning curve, and shell scripting capabilities for batch processing, for advanced spectral timing with a correct statistical framework.

Slides from this session

MADCUBA and SLIM: A lightweight software package for datacube handling and spectral line analysis by Sergio Martin (pdf)

The next-generation cosmological code SWIFT by Matthieu Schaller (pdf)

Stingray, HENDRICS and Dave: Spectral Timing for all by Matteo Bachetti (pdf)

Software in Astronomy Symposium Presentations, Part 2

This is the second in a series of posts on the six-session Software in Astronomy Symposium held on Wednesday and Thursday, April 3-4 at the 2018 EWASS/NAM meeting.

BLOCK 2: Software publishing, impact, & credit
This session focused on using the available infrastructure to better reward software authors and on ways to count these valuable research objects. The software contributions that enable many of the results in astronomy are often neither recognized nor considered for reward or promotion. Unlike most of the other sessions in this Symposium, this session had only three short presentations and devoted the rest of the time to an open discussion and Q&A. The session and discussion period were moderated by Rein Warmels (ESO, DE).

Slide from Bianco’s presentation

Federica Bianco (NYU, US) presented Understanding the Impact of Your Research Software to open the session. She stated that one should always cite software used in research, but “it’s not always obvious how.” She discussed a finding of Howison and Bullard (2015) in their research on software citation in biology articles: open source software is cited more informally than proprietary software. This means that software authors are not accruing credit for their contributions in a way that academia rewards: formal citations. Bianco mentioned Force11, which has published software citation principles, and the now-completed Depsy project, which sought to provide not only citation information on software but also to measure the impact and use of code through other statistics, such as downloads, number of contributors and number of projects reusing the software. Among Bianco’s suggestions for fostering good citation practices for one’s own software were to get a DOI for it and give users instructions on how the software should be cited.

One of Smith’s slides

Keith Smith (Science, UK) spoke on Citation of data and software in astronomy: A journal editor’s perspective. Smith said that most scientific advances build on previous work, which requires reproducibility. Citations not only enhance reproducibility, they also assign credit. He provided guidelines for citing data and software, sharing bad, better, and good examples, and spoke of a virtuous cycle that will increase reproducibility in addition to the sharing of software and data. Smith mentioned the Center for Open Science Transparency and Openness Promotion guidelines, a policy framework for journals that was developed with journal and community input, and noted that though over 5,000 journals have signed onto them, none of the major astronomy journals have done so. Science’s policies require proper citation of data and software and release of data and software upon publication. Looking forward, Smith sees data and software citation becoming more common, as could be seen in a graph from the ASCL’s dashboard showing the increasing number of citations to its entries, and stated that journals have a role in improving reproducibility and proper citations through policies and through editor and referee awareness of changing community standards.

The last presentation of this session was by Alice Allen (ASCL, US), who spoke on Receiving Credit for Research Software. She discussed recent changes in astronomy and in other disciplines that make recognizing the contributions of software authors easier. These changes include new journals, both astro-specific and with a broader focus, specifically for software, policy changes for existing journals, and community resources; these resources include collaborative coding sites such as Bitbucket and GitHub and archival resources such as Figshare and Zenodo. Existing services such as the ASCL have been given new life and are growing; in its Next Generation project, arXiv is improving its support for linking data and code to research. Software citation is captured/tracked/counted by indexers such as ADS, Web of Science, and Google Scholar. Broader efforts to improve reproducibility, citation, and credit, such as CodeMeta, Force11, WSSSPE, the FAIR principles, and DataCite involve those from many disciplines; the sharing of ideas influences not just those involved in the efforts, but has a greater reach with their aspirational and practical goals and guidelines. Allen shared steps code authors can take to increase the probability of having their software cited correctly and steps researchers can take to improve their articles by including citations for the computational methods that enabled their research, and provided a link to resources mentioned in her talk.

Rein Warmels moderating

Rein Warmels then opened the floor for discussion. Someone asked whether GitHub would “be there forever?” The point was made that GitHub is not intended to be an archive, and that other services are, so use them to archive your code. On whether to release software, Smith stated that even horrible-to-read code is better than no code: “it’s hard, painful, and you may hate someone forever… but that’s better than nothing.” Science can provide software information as a supplement, so that is one way to ensure your software is available to support your research findings. In discussing software citations, Allen pointed out that ADS has not been able to automatically track citations to Zenodo DOIs, though that is expected to change soon. The issue of what exactly to cite also came up – should one cite all the dependencies needed to run a particular research code? Neither astronomy nor other disciplines have a way to handle this; at this time, the recommendation is to cite the research software you use well so others will know what your work relied on, and leave it to the software sites to identify the dependencies.

Slides from this session

Understanding the Impact of Your Research Software by Federica Bianco (pdf)

Citation of data and software in astronomy: A journal editor’s perspective by Keith Smith (pdf)

Receiving Credit for Research Software by Alice Allen (pdf)

Software in Astronomy Symposium Presentations (2018 EWASS/NAM)

This is the first in a series of posts on the six-session Software in Astronomy Symposium held on Wednesday and Thursday, April 3-4 at the 2018 EWASS/NAM meeting. Each of the six sessions focused on a different aspect of research software, covering not only specific software packages, but also computational techniques used in data mining and machine learning, open services, software development training and techniques, and getting credit and citations for computational methods. Several sessions included a free-form period in which participants could ask questions, discuss issues, and share information. The last session of the Symposium was a lively moderated discussion among attendees with particular interest in software publishing.

BLOCK 1: Software engineering and sustainability, education for better software, & the ecosystem around Python in astronomy
The first session set the stage for the Symposium, featuring a variety of topics of importance when discussing astronomy research software. Alice Allen (ASCL, US) moderated the session. In the inaugural talk, Software Engineers as Partners in Astronomy Software Development, John Wenskovitch (Virginia Tech, US) opened his presentation with a quote by computer scientist and professor Carole Goble, stating that software is “the most prevalent of all the instruments used in modern science.” This was reiterated by others throughout the symposium. Wenskovitch provided statistics on software use and development activities by academics, among these that 92% of academics use software and 38% spend at least 20% of their time developing software. Research software engineers (RSEs) provide guidance to researchers on software engineering and encourage the use of tools that can save academics time and effort in their development efforts. Wenskovitch suggests identifying and using the strengths of each, with the researcher bringing domain knowledge and expertise on the research itself, and the RSE bringing development experience and software engineering expertise. He provided suggestions for ensuring a fruitful partnership; these include using version control, scheduling time for regular and frequent communication, having a prioritized feature list, testing the code thoroughly using unit, regression, and usability tests, and documenting everything.

Mark Wilkinson (DiRAC HPC Facility, UK) spoke next, presenting Research Software Engineering – the DiRAC facility experience. The science requirements for DiRAC demand a 10- to 40-fold increase in computing power to stay competitive, and this increase cannot be delivered solely by hardware. Software vectorization and code efficiency are vital, and RSEs are increasingly important to help with, for example, code profiling, optimization, and porting. DiRAC’s three full-time RSEs are embedded in teams, their time allocated through a peer-review process. Wilkinson showed that the use of RSEs has paid off well for DiRAC, with, for one project, a factor of 10 speed-up from optimizing a particular code. The focus on software engineering continued with a talk on Software Engineering Training for Researchers delivered by David Perez-Suarez (UCL, UK). He presented information he had gathered by conducting a quick survey to learn, among other things, what software development training researchers had gotten. His recommendations for training include running or attending training taught by The Carpentries, asking that training be conducted in conjunction with a large conference, as the American Astronomical Society has been doing for several years, checking to see what software training might be offered by your university, creating your own study group, and contributing to an open source project. James Nightingale (DurhamU, UK) presented a very interesting talk on Test-driven Development in Astronomy. He convinced many in the room that using this technique for developing software will result in better software and less aggravation when coding. He stressed that test-driven development (TDD) is not a testing process, but a development process, and that the code coming out fully tested is a bonus. With TDD, the first task is not to write code, but to write a unit test and then run it to ensure it fails. Only after that do you write the code, and then test it. Through refactoring and testing code, you get instant feedback on whether the code’s functionality has changed, and code design becomes part of the development cycle.
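
The test-first cycle described above can be sketched in a few lines of Python. This is an illustration only, not an example from Nightingale’s talk: the magnitude-to-flux conversion, the function names (check_flux_ratio, flux_ratio_stub, flux_ratio, fails), and the use of plain assertions rather than a test framework are all assumptions.

```python
import math


# Step 1: write the test before any real code exists.
def check_flux_ratio(flux_ratio):
    # A difference of 5 magnitudes corresponds to a flux ratio of 100.
    assert math.isclose(flux_ratio(1.0, 6.0), 100.0)
    # Equal magnitudes mean equal fluxes.
    assert math.isclose(flux_ratio(3.0, 3.0), 1.0)


# Step 2: run the test against a placeholder and confirm that it fails.
def flux_ratio_stub(mag_a, mag_b):
    raise NotImplementedError("not written yet")


def fails(check, implementation):
    """Return True if the check fails for the given implementation."""
    try:
        check(implementation)
    except (AssertionError, NotImplementedError):
        return True
    return False


print("fails before implementation:", fails(check_flux_ratio, flux_ratio_stub))


# Step 3: only now write the code, then run the same test again.
def flux_ratio(mag_a, mag_b):
    """Flux of object A divided by flux of object B, from their magnitudes."""
    return 10 ** (-0.4 * (mag_a - mag_b))


print("fails after implementation:", fails(check_flux_ratio, flux_ratio))
```

In practice the checks would live in a test file run by a framework such as pytest or unittest, and would be rerun after every refactoring.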

The session then moved on to software sustainability with Bruce Berriman’s (Caltech/IPAC-NExScI, US) talk on Sustaining The Montage Image Mosaic Engine Since 2002. Montage has become increasingly robust and versatile over the years, is embedded in various archives and processing environments, and has been used in other disciplines as well as in astronomy. It has been cited more in information technology literature than in astronomy literature, though uptake of Montage was initially slow. Berriman made the point that design drives sustainability; all Montage releases inherit the design, and each module performs one task. He advocates listening to users and learning from their experiences, and shared his adage that “the grumpier the user, the more valuable the suggestions.”

The last two talks of this first session focused on Python, and covered the growing use of this language in astronomy, the reasons for this growth, the support that is available for the language, and information on one very popular package written in Python. Amruta Jaodand (ASTRON, NL) presented A Walk Through Python Ecosystem, starting with its early development in 1989 by Guido van Rossum at CWI in Amsterdam. The advantages of Python include simplicity and natural flow and an extensive, powerful standard library. Strengths of the language include the development of scientific, numerical, and statistical packages and its Python Package Index (PyPI), which enables module and package sharing. Jaodand shared some of the learning materials available for Python, including python4astronomers, and also a lovely Easter egg about Python that is too long to include here and is worth reading. One of the most important astronomy packages is AstroPy, and Jaodand’s talk was followed by The Astropy Project: A community Python library and ecosystem of astronomy packages, presented by Brigitta Sipocz (AstroPy, UK). AstroPy provides software for many common astronomy needs; in addition to the core library, there are many affiliated packages. All of these packages adhere to coding, testing, and documentation standards that have been developed by the AstroPy coding community. Sipocz also discussed the community, with members of the team having one or more of its many roles. The number of collaborators continues to grow; the community welcomes new members and labels packages that are particularly friendly for a new contributor to work on.

Slides from this session

Software Engineers as Partners in Astronomy Software Development by John Wenskovitch (pdf)

Research Software Engineering – the DiRAC facility experience by Mark Wilkinson (pdf)

Sustaining The Montage Image Mosaic Engine Since 2002 by Bruce Berriman (pdf)

Software Engineering Training for Researchers by David Perez-Suarez (Google doc) | blog post

Test-driven Development in Astronomy by James Nightingale (pdf)

A Walk Through Python Ecosystem by Amruta Jaodand (pdf)

April 2018 additions to the ASCL

Twenty-six codes were added to the ASCL in April 2018:

3DView: Space physics data visualizer
Agatha: Disentangling period signals from correlated noise in a periodogram framework
allantools: Allan deviation calculation
APPHi: Automated Photometry Pipeline for High Cadence, Large Volume Data
ASERA: A Spectrum Eye Recognition Assistant

AstroCV: Astronomy computer vision library
CAT-PUMA: CME Arrival Time Prediction Using Machine learning Algorithms
chroma: Chromatic effects for LSST weak lensing
DaCHS: Data Center Helper Suite
DESCQA: Synthetic Sky Catalog Validation Framework

DPPP: Default Pre-Processing Pipeline
EGG: Empirical Galaxy Generator
FastChem: An ultra-fast equilibrium chemistry
IMNN: Information Maximizing Neural Networks
ipole: Semianalytic scheme for relativistic polarized radiative transport

KSTAT: KD-tree Statistics Package
Lenstronomy: Multi-purpose gravitational lens modelling software package
LFlGRB: Luminosity function of long gamma-ray bursts
LFsGRB: Binary neutron star merger rate via the luminosity function of short gamma-ray bursts
NR-code: Nonlinear reconstruction code

orbit-estimation: Fast orbital parameters estimator
ProFound: Source Extraction and Application to Modern Survey Data
SMERFS: Stochastic Markov Evaluation of Random Fields on the Sphere
surrkick: Black-hole kicks from numerical-relativity surrogate models
UniDAM: Unified tool to estimate Distances, Ages, and Masses

ViSBARD: Visual System for Browsing, Analysis and Retrieval of Data

Resources mentioned in Receiving Credit for Research Software session at EWASS/NAM 2018

Journals

Journal of Open Research Software (JORS)

Astronomy and Computing (A&C)

Computational Astrophysics and Cosmology (ComAC)

SoftwareX

Journal of Open Source Software (JOSS)

Research Notes of the AAS

Change leaders and guidelines

Force11/Force11 Software Citation Principles

CodeMeta

Working toward Sustainable Software for Science: Practice and Experiences (WSSSPE)

FAIR principles

Social coding sites and archival services

Bitbucket

GitHub

Figshare

Zenodo

Other resources

Asclepias

arXiv/arXiv Next Generation

DataCite

EWASS/NAM Software in Astronomy Symposium

The EWASS/NAM Software in Astronomy Symposium gets underway at 9:00 AM today in Room 11A of the Liverpool ACC. This six-session Symposium includes presentations on:

  • Software engineering and sustainability, education for better software, and the ecosystem around Python in astronomy (Wednesday, 9:00 – 10:30 AM)
  • Software publishing, impact, and credit (Wednesday, 2:30 – 4:00 PM)
  • Software packages for research (Wednesday, 4:30 – 6:00 PM)
  • Open and Transparent Data Services (Thursday, 9:00 – 10:30 AM)
  • Machine Learning and Data Mining (Thursday, 2:30 – 4:00 PM)

The last session of the Symposium is a Software Publishing Special Interest Group meeting, and will take place on Thursday from 4:30 to 6:00 PM.

For more information on this session, including abstracts, check the interactive guide for Symposium S6a – S6f. See you there!

March 2018 additions to the ASCL

Fifteen codes were added in March 2018:

3D-PDR: Three-dimensional photodissociation region code
CIFOG: Cosmological Ionization Fields frOm Galaxies
DaMaSCUS-CRUST: Dark Matter Simulation Code for Underground Scatterings – Crust Edition
ExoCross: Spectra from molecular line lists
ExtLaw_H18: Extinction law code

FAST: Fitting and Assessment of Synthetic Templates
IMAGINE: Interstellar MAGnetic field INference Engine
Kadenza: Kepler/K2 Raw Cadence Data Reader
LWPC: Long Wavelength Propagation Capability
MulensModel: Microlensing light curves modeling

nanopipe: Calibration and data reduction pipeline for pulsar timing
optBINS: Optimal Binning for histograms
RAPTOR: Imaging code for relativistic plasmas in strong gravity
scarlet: Source separation in multi-band images by Constrained Matrix Factorization
SETI-EC: SETI Encryption Code

Citations over time

How much have things changed? The previous “big 4” journals that had citations to ASCL entries have been joined by AJ, and the percentage of citations from MNRAS has dropped a bit, but overall, the wedges of these two pie charts, one from October 2015 and the second from today, look remarkably similar.

At the time the 2015 pie chart was created, ASCL entries had been cited 465 times; today, ADS shows 2093 citations to ASCL entries. Seventeen percent of ASCL entries had been cited in October 2015, and as of today, over 29% of ASCL entries have citations.

Of course, there are other ways to cite software; the ASCL supports all citable methods, and ASCL entries include preferred citation information where possible.

Do we list how your software should be cited? If not, please let us know your preferred method and we will add it to the entry!