Seminar at Centre de Recherche Astrophysique de Lyon

I gave a seminar at the Centre de Recherche Astrophysique de Lyon on Friday, September 14, at the invitation of Mohammad Akhlaghi, a post-doc there. Mohammad is very interested in reproducibility and has done a lot of work on it, ensuring that his own work is reproducible and developing a reproducibility framework that others can adopt. The seminar took place on CRAL’s lovely historic campus at the Observatoire de Lyon in Saint-Genis-Laval. The title, abstract, and link to the slides are below.

Title: Make your code famous! (or at least discoverable).

Abstract: Source codes are increasingly important for the advancement of science in general and astrophysics in particular. Journal articles detail the general logic behind new results and ideas, but often the source codes that enable these results remain hidden from public view. In this presentation, I will discuss our recent study on the availability of source codes used for published research and how this affects the transparency and reproducibility of astro research. I will cover what the Astrophysics Source Code Library (ASCL) is, how to submit software to the resource, and the benefits of doing so. I will share what happens after software is submitted, how ASCL entries are indexed by ADS, the links between literature and software entries, and how an ASCL ID can be used for citing your code. I will cover good and bad ways to cite software, avenues for publishing software, and how journals are changing to include and recognize the contribution software makes to our discipline.

Slides (PDF)

August 2018 additions to the ASCL

Eleven codes were added to the ASCL in August 2018:

2DSF: Vectorized Structure Function Algorithm
Barycorrpy: Barycentric velocity calculation and leap second management
CPF: Corral Pipeline Framework
Fips: An OpenGL based FITS viewer
hfof: Friends-of-Friends via spatial hashing

hi_class: Horndeski in the Cosmic Linear Anisotropy Solving System
ImPlaneIA: Image Plane Approach to Interferometric Analysis
py-sdm: Support Distribution Machines
PyMieDap: Python Mie Doubling Adding Program
Robbie: Radio transients and variables detection workflow

rsigma: Resonant disturbance

ASCL poster at IAU 2018 General Assembly

ASCL poster for IAU 2018 meeting

Abstract: Astrophysics research relies on software and all robust science requires transparency and reproducibility, yet the computational methods used in our discipline are often not shared or are difficult to find. In recent preliminary research, 40% of the software used in the 2015 papers we examined did not offer source code, restricting the reproducibility of this research. The Astrophysics Source Code Library (ASCL) registers astrophysics research source codes that have been used in refereed research, benefiting the field in numerous ways, including increasing the discoverability of software and making the published research record more robust. With over 1,700 codes, the ASCL is the largest indexed resource for astronomy research codes in existence. This free online registry was established in 1999 and is indexed by ADS and Web of Science. ASCL registration allows your software to be cited on its own merits and provides a citation method that is trackable and accepted by all astronomy journals and journals such as Science and Nature. This presentation covers the benefits of registering astronomy research software with the ASCL, upcoming changes that will enable greater software discovery initially for NASA software and potentially for software funded by other organizations, changes to the ASCL and ADS that benefit researchers, and our research into software use in astronomy.

Alice Allen, Astrophysics Source Code Library/University of Maryland
Robert J. Nemiroff, Michigan Technological University
Peter J. Teuben, University of Maryland

Download poster

July 2018 additions to the ASCL

Thirty-three codes were added to the ASCL in July 2018:

AngPow: Fast computation of accurate tomographic power spectra
ARKCoS: Radial kernel convolution on the sphere
ASP: Ames Stereo Pipeline
BARYCORR: Python interface for barycentric RV correction
CAESAR: Compact And Extended Source Automated Recognition

CLASSgal: Relativistic cosmological large scale structure code
DAMOCLES: Monte Carlo line radiative transfer code
EVEREST: Tools for de-trending stellar photometry
GLS: Generalized Lomb-Scargle periodogram
HELIOS: Radiative transfer code for exoplanetary atmospheres

HII-CHI-mistry_UV: Oxygen abundance and ionization parameters for ultraviolet emission lines
HII-CHI-mistry: Oxygen abundance and ionization parameters for optical emission lines
kplr: Tools for working with Kepler data using Python
ktransit: Exoplanet transit modeling tool in python
LSC: Supervised classification of time-series variable stars

MAPPINGS V: Astrophysical plasma modeling code
MIDLL: Markwardt IDL Library
nfield: Stochastic tool for QFT on inflationary backgrounds
NRPy+: Code generator for Numerical Relativity
POLARIS: POLArized RadIation Simulator

POWER: Python Open-source Waveform ExtractoR
PUMA: Low-frequency radio catalog cross-matching
PyAutoLens: Strong lens modeling
pyqz: Emission line code
SENR: Simple, Efficient Numerical Relativity

SPEGID: Single-Pulse Event Group IDentification
SSMM: Slotted Symbolic Markov Modeling for classifying variable star signatures
TBI: Three-Body Integration
THOR: Global Circulation Model for planetary atmospheres
Warpfield: Winds And Radiation Pressure: Feedback Induced Expansion, colLapse and Dissolution

wdmerger: Simulate white dwarf mergers with CASTRO
xGDS: Exploration Ground Data Systems
ZBARYCORR: Barycentric redshift calculator

June 2018 additions to the ASCL

Thirty-two codes were added to the ASCL in June 2018:

ASPIC: Accurate Slow-roll Predictions for Inflationary Cosmology
BHDD: Primordial black hole binaries code
BRATS: Broadband Radio Astronomy ToolS
BWED: Brane-world extra dimensions
DirectDM-mma: Dark matter direct detection

DirectDM-py: Dark matter direct detection
EXO-NAILER: EXOplanet traNsits and rAdIal veLocity fittER
exoinformatics: Compute the entropy of a planetary system’s size-ordering
fcmaker: Creating ESO-compliant finding charts for Observing Blocks on p2

foxi: Forecast Observations and their eXpected Information
GLASS: Parallel, free-form gravitational lens modeling tool and framework
gsf: galactic structure finder
Indri: Pulsar population synthesis toolset
Keras: The Python Deep Learning library

LASR: Linear Algorithm for Significance Reduction
OMEGA: One-zone Model for the Evolution of GAlaxies
P2DFFT: Parallelized technique for measuring galactic spiral arm pitch angles
pile-up: Monte Carlo simulations of star-disk torques on hot Jupiters
pwv_kpno: Modeling atmospheric absorption

PyAMOR: AMmOnia data Reduction
PyMUSE: VLT/MUSE data analyzer
pyZELDA: Python code for Zernike wavefront sensors
QE: Quantum opEn-Source Package for Research in Electronic Structure, Simulation, and Optimization
RadFil: Radial density profile builder for interstellar filaments

RMextract: Ionospheric Faraday Rotation calculator
SpaghettiLens: Web-based gravitational lens modeling tool
Spheral++: Coupled hydrodynamical and gravitational numerical simulations
SpS: Single-pulse Searcher
SYGMA: Modeling stellar yields for galactic modeling

WDEC: White Dwarf Evolution Code
WiseView: Visualizing motion and variability of faint WISE sources

Linking literature and software

Most ASCL code entries have one or more links to articles that either describe or use the software in that entry. ADS ingests this information to associate the code with relevant literature. For example, the entry for 2LPTIC: 2nd-order Lagrangian Perturbation Theory Initial Conditions includes a link for an MNRAS paper in the “Appears in:” field:

Going to the ADS entry for this software shows that the code is associated with a paper under Associated Articles:

This, however, doesn’t tell you anything about the relationship between the article and the software. ADS and the ASCL have been working to improve this. The ASCL has been disambiguating these article links into Described in and Used in. At the ADS Hack Day event last month, ASCL provided ADS with disambiguated links for over 900 entries, and Carolyn Grant had these uploaded into ADS in a matter of minutes. (ADS folks are wizards, I tell you! Wizards! They work magic!!)

Currently, ASCL records appear the same, but for those records for which we have provided disambiguated article links, ADS displays them as Described in and Used in, as you can see in the 2-DUST: Dust radiative transfer code entry, for which the ASCL lists two papers:

For ASCL records in which the Appears in link(s) have not been disambiguated, there is no change in how they are displayed in ADS. We have over 700 entries with article links still to be disambiguated and we continue this work; ADS will be ingesting the changes with their regular weekly ingest of ASCL data.
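The difference between the two kinds of records can be sketched in a few lines of Python. This is only an illustration: the `CodeEntry` class, its field names, the `links_for_display` helper, and the placeholder article identifiers are all hypothetical and do not reflect the actual ASCL or ADS schemas.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CodeEntry:
    """Hypothetical, simplified stand-in for a code record (not the real schema)."""
    name: str
    appears_in: List[str] = field(default_factory=list)    # links not yet disambiguated
    described_in: List[str] = field(default_factory=list)  # articles that describe the code
    used_in: List[str] = field(default_factory=list)       # articles that use the code


def links_for_display(entry: CodeEntry) -> Dict[str, List[str]]:
    """Group an entry's article links under the labels a reader would see."""
    if entry.described_in or entry.used_in:
        grouped: Dict[str, List[str]] = {}
        if entry.described_in:
            grouped["Described in"] = list(entry.described_in)
        if entry.used_in:
            grouped["Used in"] = list(entry.used_in)
        return grouped
    # No disambiguated links yet: fall back to the single generic grouping.
    return {"Associated Articles": list(entry.appears_in)}


# Placeholder identifiers, not real bibcodes.
disambiguated = CodeEntry("example-code-1", described_in=["article-A"], used_in=["article-B"])
pending = CodeEntry("example-code-2", appears_in=["article-C"])
```

Here `links_for_display(disambiguated)` groups the two articles under Described in and Used in, while `links_for_display(pending)` keeps the single undifferentiated grouping, mirroring the unchanged display for records whose links have not yet been disambiguated.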

May 2018 additions to the ASCL

Thirty-two codes were added to the ASCL in May 2018:

3DCORE: Forward modeling of solar storm magnetic flux ropes for space weather prediction
AGAMA: Action-based galaxy modeling framework
Arcmancer: Geodesics and polarized radiative transfer library
ASTROPOP: ASTROnomical Polarimetry and Photometry pipeline
BCcodes: Bolometric Corrections and Synthetic Stellar Photometry

BinMag: Widget for comparing stellar observed with theoretical spectra
CUBE: Information-optimized parallel cosmological N-body simulation code
CubiCal: Suite for fast radio interferometric calibration
DeepMoon: Convolutional neural network trainer to identify moon craters
dftools: Distribution function fitting

EARL: Exoplanet Analytic Reflected Lightcurves package
exocartographer: Constraining surface maps and orbital parameters of exoplanets
GLACiAR: GaLAxy survey Completeness AlgoRithm
grid-model: Semi-numerical reionization code
HENDRICS: High ENergy Data Reduction Interface from the Command Shell

lcps: Light curve pre-selection
MontePython 3: Parameter inference code for cosmology
OSS: OSSOS Survey Simulator
PampelMuse: Crowded-field 3D spectroscopy
PoMiN: A Post-Minkowskian N-Body Solver

powerbox: Arbitrarily structured, arbitrary-dimension boxes and log-normal mocks
PROM7: 1D modeler of solar filaments or prominences
PyCBC: Gravitational-wave data analysis toolkit
PyCCF: Python Cross Correlation Function for reverberation mapping studies
PySE: Python Source Extractor for radio astronomical images

SNSEDextend: SuperNova Spectral Energy Distributions extrapolation toolkit
SP_Ace: Stellar Parameters And Chemical abundances Estimator
STARBLADE: STar and Artefact Removal with a Bayesian Lightweight Algorithm from Diffuse Emission
StarSmasher: Smoothed Particle Hydrodynamics code for smashing stars and planets
StePS: Stereographically Projected Cosmological Simulations

SWIFT: SPH With Inter-dependent Fine-grained Tasking
xspec_emcee: XSPEC-friendly interface for the emcee package

Software in Astronomy Symposium Presentations, Part 6

This is the sixth in a series of posts on the six-session Software in Astronomy Symposium held on Wednesday and Thursday, April 3-4 at the 2018 EWASS/NAM meeting.

BLOCK 6: Software Publishing Special Interest Group Meeting
This meeting-within-a-meeting was an opportunity for journal editors, publishers, referees, abstract services, and others associated with research software publication to discuss how best to include research software in the scholarly record, improve the sustainability and reproducibility of research articles, and share information on issues and possible solutions. Representatives from Science, Nature-Springer, MNRAS, Oxford University Press, and AAS Journals were among the journals and publishers attending. As the session was open to all, researchers and software authors also attended. The agenda had three main items on it: journal software policies, ratings for numerical reproducibility, and improving instructions for authors and referees. The session was moderated by Rein Warmels (ESO, DE) and Alice Allen (ASCL, US).

After introductions by all in the room, the first agenda item, journal policies on software, was opened for discussion. Keith Smith, associate editor for astronomy and planetary science for Science, shared that his journal is requiring that software that enables research results be shared. Editors from other journals stated that this would probably not work for them, though they are sympathetic to the importance of research software transparency. Chris Lintott, lead editor for Instrumentation, Software, Laboratory Astrophysics, and Data for AAS Journals, said that he rewrote the Journals’ software policies on his first day on the job to require formal citation of software. Smith pointed out the expectation is different for data; people are much more open about sharing observational data, and A&A (may) require it. Warmels noted a difference between data and software; the quality of observational data from an observatory is known. This is not the case with software. We cannot know the quality of the unreleased code. Amruta Jaodand asked whether publishing houses have software reviewers. The various astronomical societies’ journals do not peer-review code; there are a few journals that do perform code review of various depths, such as the Journal of Open Source Software and SoftwareX, both of which focus on research software across disciplines.

There was support for better software citation, but not for ratings of articles for numerical reproducibility. The idea of ratings for reproducibility led to a discussion about reproducibility itself and the issue of releasing software written for research. Adam Leary, senior publisher at Oxford University Press, said that the Journal of Biostatistics rates the reproducibility of its articles and that that journal has a reproducibility editor. The group discussed the workload this might put on reviewers along with other issues, which would slow down the review. But the real job would be for the authors! Brigitta Sipocz mentioned the need for a feedback loop, which triggered the question as to why one would want to put a lot of resources into reproducibility. Smith replied that there are cases in other sciences where whole bodies of work could not be reproduced! Warmels pointed out that a number of fake results were found by the community, not by reviewers.

Someone suggested pushing the community toward releasing software through funding councils. Lintott initially liked the idea, and stated that journals could enforce this by checking papers against funding/funders that require release. Allen found this an intriguing suggestion. Additional discussion raised several issues with this approach. The impracticality of implementing the idea became obvious when considering the time and resources it would take and the complexity of funding, as well as varied requirements of a large number of funding organizations.

Smith’s “virtuous cycle” slide

The discussion then turned to reasons researchers do not release their software, with one advantage of keeping code private stated as, “If you don’t give me this funding, this research will not be done.” We have to change the way we argue for funding, then… “because as the only person who can do this, keeping my code private IS an advantage.” Jaodand mentioned that researchers get less credit for software than for research results. Smith replied that until we build up a virtuous cycle of code release, something he had mentioned in his presentation the previous day, the answer may be getting the credit system working first.

Another disincentive for code release mentioned was the possibility of someone running software incorrectly and then publishing “this code doesn’t work.” Lintott said that we should look for this and get data on it, so we can answer the question, “How often has this happened?” He also suggested looking for the positive cases, where release has been good for a developer or developer team, and providing this data to code authors.

The next agenda item was improving instructions for authors and referees on software citation and treatment. According to Smith, Science’s instructions were improved by rewriting them to accord with the Center for Open Science’s Transparency and Openness Promotion (TOP) Guidelines. The guidelines are very helpful, and using them provides clear instructions. Greg Schwartz, data editor for AAS Journals, asked how we could better encourage authors to read instructions. Smith waggishly replied that these instructions exist so journal editors can point to them. It was suggested that journals standardize their instructions not only to help authors out but also to discourage what was referred to as “research tourism.” The TOP Guidelines were again brought up as a good tool to use for standardizing instructions. Someone asked about having a section for acknowledgements or statements for software. Smith pointed out the danger that some may think of that section as a substitute for formal citation. Allen agreed that software should have formal citations, and also stated her appreciation for the Software section that AAS Journals have added to their papers. The ASCL has long been interested in seeing such a software listing in research articles (in addition to, not as a substitute for, formal citation of software). Warmels returned the discussion to the idea of standardization of instructions, asking whether this can be done. Journal representatives said the various journals do get together to standardize where they can, and are due to do so again.

The discussion migrated to openness in general. Among the suggestions for moving the discipline to be more open were to “Advertise your openness!” and to include a slide in your presentations that says your work is open and reproducible; this lets your peers know that you value openness, and can help others think about working more openly. The point was made to not rely on policing for open practices as resources aren’t available to do so. The role of education was brought up, too: Researchers need to be taught how to make their data and software open.

The final agenda item was to decide whether an ongoing software publishing special interest group might be welcomed by those in the room; there was no support for this. Journals already have a method to share information amongst themselves and everyone is oversubscribed to meetings, groups, and conference calls. With that item settled, the meeting and Software in Astronomy Symposium concluded.

The ASCL thanks the Heidelberg Institute for Theoretical Studies for its generous ongoing support, which permitted two participants in this symposium to attend the EWASS/NAM meeting who would not have been able to do so without it.

Software in Astronomy Symposium Presentations, Part 5

This is the fifth in a series of posts on the six-session Software in Astronomy Symposium held on Wednesday and Thursday, April 3-4 at the 2018 EWASS/NAM meeting.

BLOCK 5: Machine Learning & Data Mining
Stephen Serjeant (Open University, UK) moderated the fifth session. This session presented different techniques to, for example, study noise in gravitational wave interferometers, select young stellar object candidates, and directly image exoplanets. David Cornu’s (UTINAM, FR) talk, titled Selection of Spitzer YSO candidates using deep learning classifier, included a short tutorial for creating an artificial neural network, showing how a small neuron takes input vectors and updates the weights associated with them to capture correlation and anti-correlation between various factors. Similarly, Carlos Alberto Gomez Gonzalez (U Grenoble Alpes, FR) showed how supervised machine learning can be used to detect exoplanets in his presentation Data science for direct imaging of exoplanets. Other talks in this session included Deep learning to study the noise in gravitational wave interferometers by Massimiliano Razzano (INFN, IT), k-means clustering in galaxy feature data by Sebastian Turner (LJMU, UK), and Spatial inference of astronomical datasets with INLA, presented by Emille Ishida (COIN, FR) on behalf of Santiago Gonzalez Gaitan. Robert Lyon (UManchester, UK) finished the session with a presentation on Imbalanced learning in astronomy and provided a Jupyter notebook containing a tutorial and examples. The presentations in this session were accessible even to those with no experience in data mining and machine learning, as the techniques used were explained quickly and well before moving on to how they enabled particular research.
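For readers who have not seen a neuron updating its weights, here is a minimal sketch. It is not the tutorial from Cornu’s talk: it is the classic perceptron learning rule applied to a toy logical-AND dataset, and the function names, learning rate, and number of epochs are assumptions chosen purely for illustration.

```python
from typing import List, Sequence, Tuple


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Fire (return 1) if the weighted sum of the inputs plus the bias is positive."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 0.1,
    epochs: int = 50,
) -> Tuple[List[float], float]:
    """Perceptron rule: nudge each weight in proportion to the error and its input."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)  # -1, 0, or +1
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


# Toy dataset: the neuron should fire only when both inputs are 1 (logical AND).
AND_SAMPLES = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND_LABELS = [0, 0, 0, 1]

weights, bias = train(AND_SAMPLES, AND_LABELS)
predictions = [predict(weights, bias, x) for x in AND_SAMPLES]
```

After training, `predictions` matches `AND_LABELS`. A single neuron like this can only separate classes with a straight line (or plane), which is why practical classifiers stack many of them into layers.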

Slides from this session

Selection of Spitzer YSO candidates using deep learning classifier by David Cornu (pdf)

Data science for direct imaging of exoplanets by Carlos Alberto Gomez Gonzalez (pdf)

k-means clustering in galaxy feature data by Seb Turner (pdf)

Spatial inference of astronomical datasets with INLA by Emille Ishida/Santiago Gonzalez Gaitan (pdf)

Imbalanced learning in astronomy by Rob Lyon (pptx)

Software in Astronomy Symposium Presentations, Part 4

This is the fourth in a series of posts on the six-session Software in Astronomy Symposium held on Wednesday and Thursday, April 3-4 at the 2018 EWASS/NAM meeting.

BLOCK 4: Open & Transparent Data Services
Astronomy leads most sciences in providing many open services, particularly data and ways to get access to and use data. This session, moderated by Andy Pollock (USheffield, UK), highlighted some of the new and ongoing services available to not just professional astronomers, but also to students and other interested parties. Debbie Baines (ESA, ES) opened the session with her presentation ESASky version 2: the next generation, which included a live demonstration of this incredible resource. ESASky allows searching for any astronomical object and offers viewing it in different wavelengths. It is very fast, too; Baines loaded images from Herschel, XMM, Chandra, and HST in her live demo. It offers relevant links to SIMBAD, such as to papers, and one can pull up research literature from ADS in ESASky, too. Rachael Ainsworth (UMelbourne, AU), originally scheduled to give the first presentation, followed Baines; her presentation, Open Science in Astronomy, gave an overview of openness in astronomy. She pointed out that astronomy is better at open science than many other fields, crediting, among other services, arXiv and GalaxyZoo. She covered some of the challenges and shared additional resources that support, encourage, or make open science possible. Jorge Palacios (IFAE, ES), in his talk Astronomy in a Big Data platform, discussed two services, Cosmohub, a web portal for interactive exploration and distribution of massive cosmological data, and SciPIC (Scientific pipeline at PIC), software for generating synthetic galaxy catalogs using DM simulations. In An Interactive Sky Map based on the Byurakan Plate Archive, Gor Mikayelyan (BAO, AM) shared that the NAS RA Byurakan Astrophysical Observatory in Armenia is making its plate archive, covering 1947-1991, available online in the BAO Observational Database, which will have the ability to search and select and will allow downloading the plates in different formats. A sky map will show where plates are available.

After the last two talks, The ASI Cosmic Ray Database for charged particles data by Valeria Di Felice (SSDC/INFN, IT) and Using XML and semantic technologies in astroinformatics to manage data by Guy Beech (UHuddersfield, UK), Pollock opened the floor for discussion and asked whether people could do what they’d like to do as efficiently as they’d like. One answer from the audience was no, because the data are heterogeneous and different ways are required to access them. One pointed observation arising from the back-and-forth was that “radio astronomy is decades behind” in terms of software services, with “thousands of different file formats processed by thousands of different programs”.

Slides from this session

Open Science in Astronomy by Rachael Ainsworth (pdf)

Astronomy in a Big Data platform by Jorge Palacios (pdf)

An Interactive Sky Map based on the Byurakan Plate Archive by Gor Mikayelyan (pdf) | Text (pdf)

The ASI Cosmic Ray Database for charged particles data by Valeria Di Felice (pdf)

Using XML and semantic technologies in astroinformatics to manage data by Guy Beech (pdf) | Paper (pdf)