EWASS/NAM Software in Astronomy Symposium

The EWASS/NAM Software in Astronomy Symposium gets underway at 9:00 AM today in Room 11A of the Liverpool ACC. This six-session Symposium includes presentations on:

  • Software engineering and sustainability, education for better software, and the ecosystem around Python in astronomy (Wednesday, 9:00 – 10:30 AM)
  • Software publishing, impact, and credit (Wednesday, 2:30 – 4:00 PM)
  • Software packages for research (Wednesday, 4:30 – 6:00 PM)
  • Open and Transparent Data Services (Thursday, 9:00 – 10:30 AM)
  • Machine Learning and Data Mining (Thursday, 2:30 – 4:00 PM)

The last session of the Symposium is a Software Publishing Special Interest Group meeting, and will take place on Thursday from 4:30 to 6:00 PM.

For more information on this session, including abstracts, check the interactive guide for Symposium S6a – S6f. See you there!

March 2018 additions to the ASCL

Fifteen codes were added in March 2018:

3D-PDR: Three-dimensional photodissociation region code
CIFOG: Cosmological Ionization Fields frOm Galaxies
DaMaSCUS-CRUST: Dark Matter Simulation Code for Underground Scatterings – Crust Edition
ExoCross: Spectra from molecular line lists
ExtLaw_H18: Extinction law code

FAST: Fitting and Assessment of Synthetic Templates
IMAGINE: Interstellar MAGnetic field INference Engine
Kadenza: Kepler/K2 Raw Cadence Data Reader
LWPC: Long Wavelength Propagation Capability
MulensModel: Microlensing light curves modeling

nanopipe: Calibration and data reduction pipeline for pulsar timing
optBINS: Optimal Binning for histograms
RAPTOR: Imaging code for relativistic plasmas in strong gravity
scarlet: Source separation in multi-band images by Constrained Matrix Factorization
SETI-EC: SETI Encryption Code

Citations over time

How much have things changed? The previous “big 4” journals that had citations to ASCL entries have been joined by AJ and the percentage of citations from MNRAS has dropped a bit, but overall, the wedges of these two piecharts, one from October, 2015 and the second from today, look remarkably similar.

At the time the 2015 piechart was created, ASCL entries had been cited 465 times; today, ADS shows 2093 citations to ASCL entries. Seventeen percent of ASCL entries had been cited in October 2015, and as of today, over 29% of ASCL entries have citations.

Of course, there are other ways to cite software; the ASCL supports all citable methods, and ASCL entries include preferred citation information where possible.

Do we list how your software should be cited? If not, please let us know your preferred method and we will add it to the entry!

February 2018 additions to the ASCL

Sixteen codes were added in February 2018:

AntiparticleDM: Discriminating between Majorana and Dirac Dark Matter
ARTIP: Automated Radio Telescope Image Processing Pipeline
astroplan: Observation planning package for astronomers
BHMcalc: Binary Habitability Mechanism Calculator

CMacIonize: Monte Carlo photoionisation and moving-mesh radiation hydrodynamics
collapse: Spherical-collapse model code
eqpair: Electron energy distribution calculator
FAC: Flexible Atomic Code

Glimpse: Sparsity based weak lensing mass-mapping tool
HiGal_SED_Fitter: SED fitting tools for Herschel Hi-Gal data
mrpy: Renormalized generalized gamma distribution for HMF and galaxy ensemble properties comparisons
PyOSE: Orbital sampling effect (OSE) simulator

runDM: Running couplings of Dark Matter to the Standard Model
venice: Mask utility
Verne: Earth-stopping effect for heavy dark matter
VISIBLE: VISIbility Based Line Extraction

December 2017 and January 2018 additions to the ASCL

Sixteen codes were added in December 2017:

Bitshuffle: Filter for improving compression of typed binary data
CosApps: Simulate gravitational lensing through ray tracing and shear calculation
draco: Analysis and simulation of drift scan radio data
FBEye: Analyzing Kepler light curves and validating flares

Flux Tube: Solar model
KDUtils: Kinematic Distance Utilities
LgrbWorldModel: Long-duration Gamma-Ray Burst World Model
MadDM: Computation of dark matter relic abundance

MPI_XSTAR: MPI-based parallelization of XSTAR program
Nyx: Adaptive mesh, massively-parallel, cosmological simulation code
photodynam: Photodynamical code for fitting the light curves of multiple body systems
Py-SPHViewer: Cosmological simulations using Smoothed Particle Hydrodynamics

QATS: Quasiperiodic Automated Transit Search
RODRIGUES: RATT Online Deconvolved Radio Image Generation Using Esoteric Software
SFoF: Friends-of-friends galaxy cluster detection algorithm
SgrbWorldModel: Short-duration Gamma-Ray Burst World Model

And twelve codes were added in January 2018:

BANYAN_Sigma: Bayesian classifier for members of young stellar associations
BOND: Bayesian Oxygen and Nitrogen abundance Determinations
cambmag: Magnetic Fields in CAMB
DecouplingModes: Passive modes amplitudes

DICE/ColDICE: 6D collisionless phase space hydrodynamics using a lagrangian tesselation
GABE: Grid And Bubble Evolver
Gnuastro: GNU Astronomy Utilities
hh0: Hierarchical Hubble Constant Inference

InitialConditions: Initial series solutions for perturbations in our Universe
iWander: Dynamics of interstellar wanderers
RadVel: General toolkit for modeling Radial Velocities
Stan: Statistical inference

Funding for the ASCL

The ASCL will receive funding for two years from NASA’s Astrophysics Data Analysis Program (ADAP) to improve the discoverability of NASA-funded astrophysics research software through the ASCL. The project will run under the direction of Dr. Peter Teuben, PI, and Alice Allen, Co-I, through the University of Maryland, College Park.

ASCL at the AAS231 Hack Together Day

The ASCL was well-represented at the AAS 231 Hack Together Day on Friday, January 12. Advisory Committee Chairman Peter Teuben worked on two hacks, one of which aims to give reviewers better guidance regarding software. Dashboard developer PW Ryan also worked on two hacks, both related to the ASCL and research we’re conducting. I mostly worked on ASCL tasks that have been backlogged, such as adding preferred citation information to ASCL entries. The ASCL currently has preferred citation information listed for 25% of our entries; we will be adding this information to more records in 2018 where we can find it, though I note that many code sites do not list a preferred citation on their download sites.

For one of his hacks, Ryan grabbed all the GitHub links in ASCL entries and then, using a Ruby gem that looks for licenses in GitHub repos, reported on the licensing information available. These results are preliminary, so please don’t take them as gospel, but it appears that a whopping 34% of these codes do not have licensing information in the repo. The most popular license is MIT, which does not surprise me, as Daniel Foreman-Mackey reported in the Special Session we held at AAS 225 that MIT was the most popular license across all GitHub repos that have licensing info.
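
For readers curious what a tally like this involves, here is a minimal Python sketch of the two steps described above: pulling GitHub repository links out of entry text and summarizing which licenses turn up. It is not Ryan's code (his hack used a Ruby gem). The function names, the sample text, and the `license_lookup` callable are all hypothetical stand-ins; a real run would need an actual license detector in place of the lookup.

```python
import re
from collections import Counter

# Matches http(s)://github.com/<owner>/<repo>; the repo part is cleaned up below.
GITHUB_REPO = re.compile(r"https?://github\.com/([\w.-]+)/([\w.-]+)", re.IGNORECASE)


def extract_repos(text):
    """Return a sorted list of unique 'owner/repo' strings found in text."""
    repos = set()
    for owner, repo in GITHUB_REPO.findall(text):
        repo = repo.rstrip(".")          # drop sentence-ending periods
        if repo.endswith(".git"):        # drop a clone-URL suffix
            repo = repo[:-4]
        repos.add(f"{owner}/{repo}")
    return sorted(repos)


def summarize_licenses(repos, license_lookup):
    """Return {license name: percent of repos}.

    license_lookup is a hypothetical callable that returns a license
    name for a repo, or None when no license information is found.
    """
    counts = Counter(license_lookup(repo) or "none found" for repo in repos)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Made-up entry text and a made-up lookup table, for illustration only.
sample = (
    "Code at https://github.com/example/alpha and "
    "http://github.com/example/beta.git, also https://github.com/example/alpha."
)
known = {"example/alpha": "MIT"}
repos = extract_repos(sample)
print(repos)
print(summarize_licenses(repos, known.get))
```

Passing the lookup in as a function keeps the link extraction testable without network access; the percentages it prints describe only the made-up sample, not ASCL entries.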

Report on the Astronomy Software Publishing Special Session at AAS231

On Thursday, January 11, the Astrophysics Source Code Library (ASCL) and Astronomical Data Group at the Flatiron Institute organized a Special Session at the 231st AAS meeting in National Harbor, MD on Astronomy Software Publishing: Community Roles and Services, the sixth in a series of software-focused sessions that the ASCL, sometimes with others, has organized at AAS meetings.

Peter Teuben (University of Maryland, and chair of the ASCL’s Advisory Committee) opened the session with a few words about the use of software in research articles and outlined the layout of the session. A talk by Matteo Cantiello set the scene on how we have reached the point where we are now. Four presentations by representatives of different journals on their software publication policies followed Cantiello’s talk, and they were followed by presentations by representatives of others with roles in publishing software: the software author, the data editor, the ADS, and the ASCL. The floor was then opened for discussion and Q&A. Teuben moderated the discussion and, at the end of it, turned the podium over to Robert Nemiroff of Michigan Technological University, a founder of the ASCL, for a summary and closing remarks.

Presentations
Some of the main points from each presentation are summarized below; the titles of each are links to the slides used by the presenters.

  • The Evolution of Software Publication in Astronomy, Matteo Cantiello (Flatiron Institute)
    Cantiello stated that the complexity of astrophysics requires computationally intensive models, making astronomy a digital science, and that astronomers have a rich computational environment available, allowing them to easily version, share, and deploy astronomy software. Despite this, software is often not shared, resulting in a reproducibility paradox: astronomers use computation to provide precise, accurate results, but research has become less transparent with the increase in the use of computational methods. Adding external links to papers to point to software is not a reliable solution to software sharing because of link rot. Despite progress both technologically and socially, the format of papers has changed very little in the last 400 years. He stated that astronomy now has an opportunity to rethink scientific papers as research repositories, with executable objects containing narrative, figures, data, and code.
  • Software papers and citation in the AAS Journals, Chris Lintott (AAS Journals)
    Until recently, the AAS journals’ policy on software was the one set in 1964, which stated that the “need for communication between astronomers interested in computation is already supplied by associations of users of automated computing machines.” The AAS journals changed their policies at the beginning of 2016, recognizing that if novel code is important to published research, then it is likely appropriate to describe it in such a paper. AAS journals are interested in disclosing software in a form that is currently recognized, the research article, so they now allow papers on code that can be short and descriptive and need not include scientific results. AAS formally recommends open source licensing but does not require it. AAS journals ask people to cite the software paper, as this is the currency the field cares about, and also ask people to cite the code. In addition, they request that people use the \software{} tag to create a software section in a paper; this is similar to the \facilities{} tag already in use. AAS Publishing continues to think about how to improve and is introducing the concept of ‘living’ papers, which can be updated with new sections and expanded author lists, so software authors don’t need to publish a new paper to give credit to those who have contributed to a new version of the software. Lintott encouraged those interested in living papers to contact him.
  • Software policies and guidelines at Nature, Leslie J. Sage (Nature)
    First, Sage explained the context in which Nature’s policy is created: Nature is driven by biologists, who live in a very different world from astronomers. Unlike astronomers, biologists live in a Windows world. Right now, two journals, Nature Methods and Nature Biotech, require code to be made available, and there are ongoing discussions about whether Nature should do this for other journals. There are formidable problems because of the issue of very specialized code, for example, code that is optimized to compile on a particular Beowulf cluster and may not compile anywhere else. There will be a call for public comment, and Sage hopes astronomers will provide input that is useful for astronomers within that context. Sage raised a number of points that warrant public discussion, such as a preference voiced by some to see detailed descriptions of the algorithms used rather than having the scripts published. Another point to consider for input is that though a lot of software has been made publicly available, all software is written with certain constraints and boundary conditions; people who are not aware of these constraints and conditions may drive the software beyond those limits, which raises the question of whether the results are physically meaningful.
  • SpringerNature data and software policies for astrophysics journals, Ramon Khanna (Springer)
    Springer is encouraging authors to attend to the transparency and reproducibility of the results presented in their articles. Authors can append relevant information on source code, or the full code itself, in an appendix of the paper, or use other methods to make this information available, such as alternative repositories (e.g., CDS, ASCL, Figshare). Springer would like the full data and code to be available. Khanna acknowledged some challenges, including that authors are often not willing to share their software and/or data, editors are often not willing or at least not determined enough to execute policy, and citation standards are unclear. One of the questions is how to execute this policy in the face of unwillingness from authors, editors, reviewers, etc. Questions also arise as to how software can be peer reviewed; this would require standards for documentation, for presenting how the results were obtained, for making data and software available, and for reviewing the source code itself. How can referees handle this effort? Khanna pointed out that in a field as advanced as astronomy, which already has some standards and domain resources such as archives, it’s not so much the publisher that should drive new standards, but the community itself.
  • Journal of Open Source Software (JOSS): Design and first-year review, Arfon M. Smith (STScI/JOSS)
    Smith stated that he created JOSS accidentally, out of frustration with the overhead of publishing papers about software, and acknowledged that software papers are a hack of the current system to provide a citable, creditable research object for software. JOSS (http://joss.theoj.org/) seeks to improve the quality of software; its peer review process is almost entirely about the software that’s submitted, and includes making sure that the documentation is sufficiently fleshed out, that the package includes automated tests, and that the software has an open source license so it can be reused. Smith said that for those with a well set up repository for their code, it should take about an hour to write a one-page paper for JOSS. The reviews are public on GitHub and accepted submissions appear on the JOSS site, which has published 200 papers online.
  • Lessons Learned through the Development and Publication of AstroImageJ, Karen Collins (Center for Astrophysics)
    Collins discussed her experience with publishing her software AstroImageJ, a data reduction and image display interface with analysis capabilities specialized for time series differential photometry. She developed the code over several years to support her research. She initially had no intention of releasing the code to the public, but her collaborators saw her plots and graphs and asked to use the software, which was posted to the university’s website to give team members access to it. She found her fellow KELT-FUN team members were an excellent focus group; they provided great feedback on the software before it was published, enabling her to add useful features to the software. Results using AstroImageJ started appearing in journals; she registered the software with the ASCL to give it a citable reference, and as usage (and support tasks) grew, she and others working on the code decided to submit a paper to the Astronomical Journal (AJ) to give the software good exposure to its potential userbase. This resulted in about 4K downloads of the software in the first year, and the paper is listed 4th on AJ’s most read list. Among the lessons learned in publishing AstroImageJ are to specify how your code is licensed and how it should be cited, make the source code easily accessible, and provide an easy way to install and update the software.
  • The roles of the AAS Journals’ Data Editors, August Muench (AAS Journals)
    Muench covered the data editors’ workflow for all submitted manuscripts. A quick review of 60%-90% of all submitted manuscripts is performed, with scripts run on the manuscripts to identify references to code by looking for such things as GitHub repositories to see whether their citations need to be reviewed. The editors make notes on the software, data, and figures for review by a scientific editor or the author, with recommendations for improving citations for these research artifacts. A subset of accepted articles, 15-20%, undergoes a more rigorous post-acceptance data review; this includes a review of tabular data, figures, and interactive elements in addition to software. If necessary, the data editors request that authors acquire DOIs or get preferred citations for the software used in the research. Muench mentioned that he uses ten keywords in his scripts to identify software, and ends up with a surprising number of articles that do not mention code at all. He stated that part of a data editor’s role in improving software and data citation is educating authors.
  • The role of the ADS in software discovery and citation, Alberto Accomazzi (NASA Astrophysics Data System)
    Accomazzi described what ADS does to promote software discovery and citation, but first he shared ADS’s traditional core responsibilities: to discover content, typically science papers, related to astronomy. Some years ago, the capability to track citations was introduced. As the expectations of the community have evolved, so have ADS’s policies, moving from ingesting records about scientific papers to records about scholarly works, including data catalogs, observing proposals, and other artifacts such as software. They have also evolved from tracking citations to articles to tracking citations to scholarly content. ADS has an interest in enabling linking so users can easily and uniquely identify the software that was used. Accomazzi covered how ADS ingestion works; for content to be considered for inclusion in ADS, it must be scholarly, related to astronomy, and published formally: not just on a website, but following an explicit editorial process. He also discussed how citations are tracked and what ADS needs to count a citation, going through several examples of what does and does not work for citation. The bottom line for software is to cite it by using a formal citation and a unique identifier; a URL to a website or a DOI in a footnote is not captured as a citation. ASCL, JOSS, and Zenodo are ways software can get a persistent identifier to use in a formal citation, and these citations can be tracked by ADS. Accomazzi also discussed how software may have several records in ADS. In the future, these records will be crosslinked, as will different versions of a software package, so that eventually ADS can provide cumulative metrics for all versions of that software product; like all citation data, this information will become publicly available through an API.
  • The Astrophysics Source Code Library: Supporting software publication and citation, Alice Allen (ASCL/UMD)
    Allen gave a brief overview of what the ASCL is, and stated that though entries in this citable online registry usually point to a software package’s download site, the ASCL can and does serve as a repository for those authors who want to deposit an archive file for their code. The ASCL assigns a DOI to software that it stores. She covered the three main reasons the ASCL exists: to make research more transparent, to improve communication about research computations, and to disseminate software of utility to others. Allen acknowledged that though there is other software that might be useful to astronomy, the ASCL focuses on that which has been used in refereed research or submitted for refereeing, in order to support the research record. ASCL editors take an active approach in looking for software in research papers and registering it; authors are encouraged to submit their own software, too, and submissions by authors increased 23% in 2017 over 2016. The ASCL supports software publication and citation in a number of ways, including providing a citation avenue for software and listing preferred citation information in ASCL entries. The ASCL has been online since 1999; it supports the Force11 software citation principles and was a party to developing them. It was also party to a Dagstuhl Manifesto, another cross-disciplinary effort that focused on steps members of a research community can take on their own. Among these steps are citing software properly (in a trackable way) and, when reviewing a paper, ensuring that it cites the software used in the research.

Discussion
After the presentations, Teuben commented that he thought journals could do a better job of instructing referees about software, so that they identify when code is involved in research and insist on citations to it. He hoped the discussion would touch on this, and then opened the floor to all. Discussion was lively and may be covered in more depth in a future post, but some of the major points were:

  • There’s still fear about releasing software, still resistance to doing so
  • Science is all about reproducibility; it’s not science if it’s not reproducible
  • Who should push for greater openness is an open question, with some wanting journals to do this, and others feeling it’s up to the astronomy community — us! — to enforce the standards we want
  • Astronomers are often not trained in software engineering techniques; greater education in this area would be helpful

"If software developers were well funded, it would be easier to get people to share their code."

Teuben brought the discussion to an end and turned the floor over to Robert Nemiroff (Michigan Technological University), who briefly summarized the presentations and discussion and closed the session.

My thanks to David W. Hogg and Peter Teuben for work on developing the session, to Peter for his excellent moderating, to Robert for closing the session, and to PW Ryan for serving as scribe. My thanks to Matteo, Chris, Leslie, Ramon, Arfon, Karen, Gus, and Alberto for their excellent presentations and participation, to the Astronomical Data Group at the Flatiron Institute for partnering with the ASCL, and to the Heidelberg Institute for Theoretical Studies, the University of Maryland College Park, and Michigan Technological University for supporting the ASCL.

Slides from Astronomy Software Publishing: Community Roles and Services

THURSDAY, 11 JANUARY 2018
Special Session: Astronomy Software Publishing: Community Roles and Services
10:00 am – 11:30 am
National Harbor 2

The Astrophysics Source Code Library (ASCL) and Astronomical Data Group at the Flatiron Institute organized a Special Session at the 231st AAS meeting in National Harbor, MD on Astronomy Software Publishing: Community Roles and Services. Click on a talk’s title to download its slides.


Matteo Cantiello (Flatiron Institute), The Evolution of Software Publication in Astronomy
Chris Lintott (AAS Journals), Software papers and citation in the AAS Journals
Leslie J. Sage (Nature), Software policies and guidelines at Nature
Ramon Khanna (Springer), SpringerNature data and software policies for astrophysics journals
Arfon M. Smith (STScI/JOSS), Journal of Open Source Software (JOSS): Design and first-year review
Karen Collins (Center for Astrophysics), Lessons Learned through the Development and Publication of AstroImageJ
August Muench (AAS Journals), The roles of the AAS Journals’ Data Editors
Alberto Accomazzi (NASA Astrophysics Data System), The role of the ADS in software discovery and citation
Alice Allen (ASCL/UMD), The Astrophysics Source Code Library: Supporting software publication and citation

ASCL research poster at AAS 231

Poster for Schroedinger's Code research paper showing results
Astronomers use software for their research, but how many of the codes they use are available as source code? We examined a sample of 166 papers from 2015 for clearly identified software use, then searched for source code for the software packages mentioned in these research papers. We categorized the software to indicate whether source code is available for download and whether there are restrictions to accessing it, and if source code was not available, whether some other form of the software, such as a binary, was. Over 40% of the source code for the software used in our sample was not available for download.

As URLs have often been used as proxy citations for software, we also extracted URLs from one journal’s 2015 research articles, removed those from certain long-term, reliable domains, and tested the remainder to determine what percentage of these URLs were still accessible in September and October, 2017.
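
The URL portion of that procedure can be sketched roughly as follows. This is an illustrative Python outline under stated assumptions, not the code used for the poster: the function names, the sample text, and the choice of `doi.org` as a "long-term, reliable" domain are hypothetical, and the accessibility check is passed in as a function so the sketch runs without touching the network.

```python
import re
from urllib.parse import urlparse

# Grab http(s) URLs up to whitespace or a bracket; punctuation is trimmed afterwards.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text):
    """Return unique URLs in order of first appearance."""
    urls = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;:'\"")    # strip trailing sentence punctuation
        if url not in urls:
            urls.append(url)
    return urls


def drop_trusted(urls, trusted_domains):
    """Remove URLs hosted on a trusted domain or one of its subdomains."""
    kept = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        trusted = any(host == d or host.endswith("." + d) for d in trusted_domains)
        if not trusted:
            kept.append(url)
    return kept


def percent_accessible(urls, is_accessible):
    """Percentage of urls for which the supplied check returns True."""
    if not urls:
        return 0.0
    reachable = sum(1 for url in urls if is_accessible(url))
    return 100.0 * reachable / len(urls)


# Made-up article text; the trusted-domain list is an assumption for illustration.
article = (
    "See http://example.org/code, https://doi.org/10.0000/xyz "
    "and http://example.net/tool."
)
candidates = drop_trusted(extract_urls(article), {"doi.org"})
print(candidates)
# A stand-in check; a real one would request each URL and inspect the response.
print(percent_accessible(candidates, lambda url: "example.org" in url))
```

In practice the stand-in check would be replaced by something that actually requests each URL, with timeouts and retries, since a single failed request does not prove a link is dead.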

P. Wesley Ryan, Astrophysics Source Code Library
Alice Allen, Astrophysics Source Code Library/University of Maryland
Peter Teuben, University of Maryland

Download poster
Download article pre-print