On Thursday, January 11, the Astrophysics Source Code Library (ASCL) and the Astronomical Data Group at the Flatiron Institute organized a Special Session, Astronomy Software Publishing: Community Roles and Services, at the 231st AAS meeting in National Harbor, MD. It was the sixth in a series of software-focused sessions that the ASCL, sometimes with others, has organized at AAS meetings.
Peter Teuben (University of Maryland and chair of the ASCL’s Advisory Committee) opened the session with a few words about the use of software in research articles and outlined the session’s structure. A talk by Matteo Cantiello set the scene, describing how we reached the point where we are now. Four presentations followed in which representatives of different journals described their policies on software publication, and these were followed by presentations from others with roles in publishing software: the software author, the data editor, the ADS, and the ASCL. The floor was then opened for discussion and Q&A. Teuben moderated the discussion and, at its end, turned the podium over to Robert Nemiroff of Michigan Technological University, a founder of the ASCL, for a summary and closing remarks.
Presentations
Some of the main points from each presentation are summarized below; the titles of each are links to the slides used by the presenters.
- The Evolution of Software Publication in Astronomy, Matteo Cantiello (Flatiron Institute)
Cantiello stated that the complexity of astrophysics requires computationally intensive models, making astronomy a digital science, and that astronomers have a rich computational environment that lets them easily version, share, and deploy astronomy software. Despite this, software is often not shared, resulting in a reproducibility paradox: astronomers use computation to obtain precise, accurate results, yet research has become less transparent as the use of computational methods has grown. Adding external links to software in papers is not a reliable solution because of link rot. Meanwhile, the format of scientific papers has changed very little in the last 400 years, despite both technological and social progress. He stated that astronomy now has an opportunity to rethink scientific papers as research repositories: executable objects containing narrative, figures, data, and code.
- Software papers and citation in the AAS Journals, Chris Lintott (AAS Journals)
Until recently, the AAS journals’ policy on software dated from 1964; it stated that the “need for communication between astronomers interested in computation is already supplied by associations of users of automated computing machines.” The AAS journals changed their policies at the beginning of 2016, recognizing that if novel code is important to published research, it is likely appropriate to describe it in a paper. The journals want software disclosed in a form the field already recognizes, the research article, so they now accept papers about code that can be short and descriptive and need not include scientific results. AAS formally recommends open source licensing but does not require it. The AAS journals ask authors to cite the software paper, since that is the currency the field cares about, and also to cite the code itself. In addition, they ask authors to use the \software{} tag to create a software section in a paper, similar to the \facilities{} tag already in use. AAS Publishing continues to look for ways to improve and is introducing the concept of ‘living’ papers, which can be updated with new sections and expanded author lists, so that software authors need not publish a new paper to credit those who have contributed to a new version of the software. Lintott encouraged those interested in living papers to contact him.
- Software policies and guidelines at Nature, Leslie J. Sage (Nature)
First, Sage explained the context in which Nature’s policy is created: Nature is driven by biologists, who live in a very different world from astronomers (unlike astronomers, biologists live in a Windows world). Right now, two journals, Nature Methods and Nature Biotechnology, require code to be made available, and there are ongoing discussions about whether Nature should extend this requirement to other journals. Very specialized code poses formidable problems; for example, code optimized to compile on a particular Beowulf cluster may not compile anywhere else. There will be a call for public comment, and Sage hopes astronomers will provide input that is useful to astronomers within that context. Sage raised a number of points that warrant public discussion, such as a preference voiced by some for detailed descriptions of the algorithms used rather than publication of the scripts themselves. Another point to consider is that although a lot of software has been made publicly available, all software is written with certain constraints and boundary conditions; users unaware of these may drive the software beyond those limits, raising the question of whether the results are physically meaningful.
- SpringerNature data and software policies for astrophysics journals, Ramon Khanna (Springer)
Springer encourages authors to ensure the transparency and reproducibility of the results presented in their articles. Authors may include relevant information on their source code, or the full code, in an appendix to the paper, or make it available by other means, such as repositories (e.g., CDS, ASCL, Figshare). Ideally, Springer would like the full data and code to be available. Khanna acknowledged some challenges: authors are often unwilling to share their software and/or data, editors are often unwilling, or at least not determined enough, to enforce policy, and citation standards are unclear. There is also the question of how software can be peer reviewed; doing so would require standards for documentation, for presenting how results were obtained, for making data and software available, and for reviewing the source code itself. How can referees handle this effort? Khanna pointed out that in a field as advanced as astronomy, which already has some standards and domain resources such as archives, it is not so much the publisher as the community itself that should drive new standards.
- Journal of Open Source Software (JOSS): Design and first-year review, Arfon M. Smith (STScI/JOSS)
Smith stated that he created JOSS accidentally, out of frustration with the overhead of publishing papers about software, and acknowledged that software papers are a hack of the current system to provide a citable, creditable research object for software. JOSS (http://joss.theoj.org/) seeks to improve the quality of software; its peer review process is almost entirely about the submitted software, and includes making sure that the documentation is sufficiently fleshed out, that the package includes automated tests, and that the software has an open source license so that it can be reused. Smith said that for authors whose code repository is already well set up, writing a one-page JOSS paper should take about an hour. Reviews are public on GitHub, and accepted submissions appear on the JOSS site, which has published 200 papers.
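JOSS reviewers assess documentation, tests, and licensing by hand, but an author could run a rough self-check before submitting. The sketch below only tests whether the corresponding files exist; the file-name conventions (README, a tests/ directory, LICENSE) and the function name `missing_items` are assumptions for illustration, not JOSS requirements, and passing the check says nothing about whether the documentation is actually sufficient.

```python
from pathlib import Path

# Hypothetical file-name conventions for a pre-submission self-check.
# These are assumptions, not JOSS rules; JOSS reviewers judge these points by hand.
CHECKS = {
    "documentation": ["README.md", "README.rst", "README.txt", "docs"],
    "automated tests": ["tests", "test"],
    "license": ["LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING"],
}


def missing_items(repo_dir):
    """Return the checklist items for which none of the candidate paths exist."""
    root = Path(repo_dir)
    return [
        item
        for item, candidates in CHECKS.items()
        if not any((root / name).exists() for name in candidates)
    ]
```

Calling `missing_items(".")` from a repository root returns the checklist items with nothing on disk to support them; an empty list means only that the basic pieces are present.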
- Lessons Learned through the Development and Publication of AstroImageJ, Karen Collins (Center for Astrophysics)
Collins discussed her experience publishing her software AstroImageJ, a data reduction and image display interface with analysis capabilities specialized for time-series differential photometry. She developed the code over several years to support her research and initially had no intention of releasing it to the public, but her collaborators saw her plots and graphs and asked to use the software, so it was posted to the university’s website to give team members access. Her fellow KELT-FUN team members proved an excellent focus group; their feedback before the software was published enabled her to add useful features. Results using AstroImageJ started appearing in journals, and she registered the software with the ASCL to give it a citable reference; as usage (and support tasks) grew, she and others working on the code decided to submit a paper to the Astronomical Journal (AJ) to give the software good exposure to its potential user base. This resulted in about 4,000 downloads of the software in the first year, and the paper is listed fourth on AJ’s most-read list. Among the lessons learned in publishing AstroImageJ: specify how your code is licensed and how it should be cited, make the source code easily accessible, and provide an easy way to install and update the software.
- The roles of the AAS Journals’ Data Editors, August Muench (AAS Journals)
Muench covered the data editors’ workflow for submitted manuscripts. The data editors quickly review 60% to 90% of all submitted manuscripts, running scripts that look for references to code, such as GitHub repositories, to see whether the manuscripts’ citations need to be reviewed. The editors make notes on the software, data, and figures, with recommendations for improving the citations of these research artifacts, for review by a scientific editor or the author. A subset of accepted articles, 15% to 20%, undergoes a more rigorous post-acceptance data review, covering tabular data, figures, and interactive elements in addition to software. If necessary, the data editors ask authors to obtain DOIs or preferred citations for the software used in the research. Muench mentioned that his scripts use ten keywords to identify software, and that a surprising number of articles turn out not to mention code at all. He stated that educating authors is part of a data editor’s role in improving software and data citation.
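Muench did not list his ten keywords, so the following is only an illustration of the kind of scan described: a minimal Python sketch that searches manuscript text for a few hypothetical software indicators (GitHub URLs, ASCL IDs, Zenodo DOIs, and the words "software" and "code"). The patterns and the function names `find_software_mentions` and `needs_citation_review` are assumptions, not the AAS data editors’ actual scripts.

```python
import re

# Hypothetical indicator patterns; the data editors' actual keywords were not given.
SOFTWARE_PATTERNS = {
    # Repository URL; the final \w keeps trailing sentence punctuation out of the match.
    "github": re.compile(r"github\.com/[\w-]+/[\w.-]*\w", re.IGNORECASE),
    "ascl": re.compile(r"ascl:\d{4}\.\d{3}", re.IGNORECASE),
    "zenodo": re.compile(r"10\.5281/zenodo\.\d+", re.IGNORECASE),
    "software": re.compile(r"\bsoftware\b", re.IGNORECASE),
    "code": re.compile(r"\bcode\b", re.IGNORECASE),
}


def find_software_mentions(text):
    """Return {indicator: [matched strings]} for every pattern that matches."""
    hits = {}
    for name, pattern in SOFTWARE_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits


def needs_citation_review(text):
    """Flag a manuscript for a human look if any software indicator appears."""
    return bool(find_software_mentions(text))
```

A scan this crude will produce false positives (for example, "code" used in a non-software sense), so it can only shortlist manuscripts for a human data editor, in line with the review-and-notes workflow described above.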
- The role of the ADS in software discovery and citation, Alberto Accomazzi (NASA Astrophysics Data System)
Accomazzi described what ADS does to promote software discovery and citation, but first outlined ADS’s traditional core responsibility: helping users discover content, typically science papers, related to astronomy. Some years ago, ADS added the capability to track citations. As the community’s expectations have evolved, so have ADS’s policies, moving from ingesting records about scientific papers to ingesting records about scholarly works, including data catalogs, observing proposals, and other artifacts such as software. ADS has likewise moved from tracking citations to articles to tracking citations to scholarly content in general. ADS has an interest in enabling linking so that users can easily and uniquely identify the software that was used. Accomazzi covered how ADS ingestion works: to be considered for inclusion in ADS, content must be scholarly, related to astronomy, and formally published, meaning not just posted on a website but having gone through an explicit editorial process. He also discussed how citations are tracked and what ADS needs in order to count a citation, going through several examples of what does and does not work. The bottom line for software is to cite it with a formal citation and a unique identifier; a URL to a website or a DOI in a footnote is not captured as a citation. The ASCL, JOSS, and Zenodo all provide ways for software to get a persistent identifier for use in a formal citation, and ADS can track these citations. Accomazzi also noted that a piece of software may have several records in ADS. In the future these records will be crosslinked, as will different versions of a software package, so that ADS can eventually provide cumulative metrics across all versions of a software product; like all citation data, this information will be made publicly available through an API.
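To make the distinction concrete, here is a minimal sketch that checks whether a reference-list entry carries one of the persistent identifiers mentioned (an ASCL ID, or a JOSS or Zenodo DOI). The patterns assume the usual formats (ascl:YYMM.NNN, and the DOI prefixes 10.21105 for JOSS and 10.5281 for Zenodo); the function `persistent_identifier` is hypothetical and is not how ADS itself parses citations. It only inspects reference-list entries, which mirrors the point that an identifier placed in a footnote is not captured.

```python
import re

# Assumed identifier formats; this is an illustration, not ADS's actual parser.
IDENTIFIER_PATTERNS = [
    ("ASCL", re.compile(r"\bascl:\d{4}\.\d{3}\b", re.IGNORECASE)),
    ("JOSS DOI", re.compile(r"\b10\.21105/joss\.\d+\b", re.IGNORECASE)),
    ("Zenodo DOI", re.compile(r"\b10\.5281/zenodo\.\d+\b", re.IGNORECASE)),
]


def persistent_identifier(reference):
    """Return (kind, identifier) for the first persistent ID found, else None.

    A bare website URL yields None: it is not a persistent identifier.
    """
    for kind, pattern in IDENTIFIER_PATTERNS:
        match = pattern.search(reference)
        if match:
            return kind, match.group(0)
    return None
```

An entry such as "Code available at https://example.org/mycode" returns `None`, which corresponds to the URL-only case that ADS does not count as a citation.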
- The Astrophysics Source Code Library: Supporting software publication and citation, Alice Allen (ASCL/UMD)
Allen gave a brief overview of the ASCL and stated that although entries in this citable online registry usually point to a software package’s download site, the ASCL can and does serve as a repository for authors who want to deposit an archive file of their code. The ASCL assigns a DOI to software that it stores. She covered the three main reasons the ASCL exists: to make research more transparent, to improve communication about research computations, and to disseminate software of utility to others. Allen acknowledged that although other software might be useful to astronomy, the ASCL focuses on software that has been used in refereed research or submitted for refereeing, in order to support the research record. ASCL editors actively look for software in research papers and register it; authors are encouraged to submit their own software too, and submissions by authors increased 23% in 2017 over 2016. The ASCL supports software publication and citation in a number of ways, including providing a citation avenue for software and listing preferred citation information in its entries. The ASCL has been online since 1999; it supports the Force11 software citation principles and took part in developing them. It also took part in a Dagstuhl Manifesto, another cross-disciplinary effort, which focused on steps members of a research community can take on their own. Among these steps are citing software properly, in a trackable way, and, when reviewing a paper, ensuring that it cites the software used in the research.
Discussion
After the presentations, Teuben commented that he thought journals could do a better job of instructing referees to identify when code is involved in research and to insist on citations to it. He hoped the discussion would touch on this, and then opened the floor to all. Discussion was lively and may be covered in more depth in a future post, but some of the major points were:
- There’s still fear about releasing software, still resistance to doing so
- Science is all about reproducibility; it’s not science if it’s not reproducible
- Who should push for greater openness is an open question, with some wanting journals to do this, and others feeling it’s up to the astronomy community — us! — to enforce the standards we want
- Astronomers are often not trained in software engineering techniques; greater education in this area would be helpful
Teuben brought the discussion to an end and turned the floor over to Robert Nemiroff (Michigan Technological University), who briefly summarized the presentations and discussion and closed the session.
My thanks to David W. Hogg and Peter Teuben for work on developing the session, to Peter for his excellent moderating, to Robert for closing the session, and to PW Ryan for serving as scribe. My thanks to Matteo, Chris, Leslie, Ramon, Arfon, Karen, Gus, and Alberto for their excellent presentations and participation, to the Astronomical Data Group at the Flatiron Institute for partnering with the ASCL, and to the Heidelberg Institute for Theoretical Studies, the University of Maryland College Park, and Michigan Technological University for supporting the ASCL.