I am involved in several efforts, in addition to the ASCL, to improve recognition and credit for software authors; one such effort is the FORCE11 Software Citation Implementation Working Group (SCIWG), in which several software registries and repositories are involved. These resources, along with others not part of the SCIWG, have formed a Repository Best Practice Task Force, which has held monthly conference calls this year to collaboratively develop a list of best practices for such resources. This has also been an excellent vehicle for enabling people who run these resources to share information about managing software registries and working with software authors, researchers, and journal editors to improve software citation.
Thanks to funding from the Sloan Foundation, members of this Task Force and representatives of other software resources are coming together in a Scientific Software Registry Collaboration Workshop. There we will demonstrate unique aspects of our respective services, discuss challenges and share solutions to common issues that arise in managing our resources, finalize a list of best practices for our resources, and work cooperatively to speed adoption of the CodeMeta and/or Citation File Format standards. The workshop has been organized by the Caltech Library and the ASCL, and takes place at the University of Maryland (College Park) this coming Wednesday and Thursday (November 13-14). It includes presentations by software registry managers and subject matter experts, break-out sessions for collaborative work, and group discussion.
I’m happy to say we are able to provide remote access to most of the plenary portions of the workshop through Webex; links on the workshop agenda identify the sessions available over Webex. As the workshop has an element of unconferencing, it’s possible that additional portions of the workshop will be suitable for Webex and if so, we will update the agenda accordingly. In addition, we will have someone live-scribing the event; a link to the Google Doc for these notes will be added to the agenda webpage before the workshop begins.
A major focus of this workshop is to discuss and finalize the best practices that have been identified so far in our monthly conference calls. A draft list of the practices (PDF) is available for download below; these are the practices we will be working on in break-out groups during the workshop. Links to the Google Docs we will be using for these break-out sessions are listed on the agenda, offering another way for anyone interested to follow the work being done in this meeting.
For a long time, I have wanted to meet with others doing work similar to what I do for the ASCL. I am very grateful to Tom Morrell, Mike Hucka, and Stephen Davison from the Caltech Library for partnering with me to organize this workshop, and to Josh Greenberg at the Sloan Foundation for thinking this workshop was a good idea and funding the project. My thanks to all of them!
Draft list of Best Practices for research software registries (pdf)
I’ve set a goal of bringing the number of entries missing preferred citation information to under 1000, though that may be just out of reach. When I started this process, there were 1284 entries without a preferred citation; so far I’ve examined the software sites and documentation for 150+ of these codes and found explicit citation information for just over 14% of them.
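As a rough check on why the goal may be out of reach: if the roughly 14% rate seen in the first 150+ entries held across all 1284, examining every entry would turn up explicit citations for only about 180 of them, leaving about 1100. Getting below 1000 would take a hit rate of roughly 22%. The sketch below does this arithmetic; the constant hit rate is an assumption, and the calculation ignores any other route by which an entry might gain a preferred citation.

```python
# Back-of-the-envelope projection for the "under 1000" goal.
# Assumption: the ~14% hit rate from the first 150+ entries applies to all of them.

START_MISSING = 1284    # entries without a preferred citation at the outset
GOAL = 1000             # target: fewer than this many entries missing one
SAMPLE_HIT_RATE = 0.14  # approximate share of examined entries with explicit info


def projected_remaining(total_missing: int, hit_rate: float) -> int:
    """Entries still missing a citation if every one is examined at hit_rate."""
    found = round(total_missing * hit_rate)
    return total_missing - found


def required_hit_rate(total_missing: int, goal: int) -> float:
    """Hit rate needed to bring the count strictly below goal."""
    needed = total_missing - (goal - 1)  # 1284 - 999 = 285 entries must be resolved
    return needed / total_missing


if __name__ == "__main__":
    print(projected_remaining(START_MISSING, SAMPLE_HIT_RATE))  # 1104
    print(f"{required_hit_rate(START_MISSING, GOAL):.1%}")      # 22.2%
```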
In general, we include a preferred citation in an ASCL record when a code’s site or documentation explicitly states what should be cited (“cite [code] with this [ASCL entry/article/DOI/etc.]”). We don’t assume that a paper listed under “References” or “Articles” is meant to be cited for the software. Some authors may intend that, but others list such papers because the code builds on others’ work or because the papers describe research that used the software.
In some cases, a code has no citations to its ASCL record but numerous citations (more than 25, let’s say) to a code description paper, even though its download site or repo does not specify how the software should be cited. Allowing this “apparent established practice” of citation to substitute for an explicit statement, and listing the description paper as the preferred citation, seems fair to me. It is also valuable to those who want to do the right thing by citing a software package but find no guidance on the code’s site for how to do so.
We very much prefer that authors provide explicit information on their preferred citation for their programming work, but where they don’t, and where there is an apparent established practice of citation, we will now list that citation method as the preferred citation in the ASCL entry. So far, this inferred information has been added to 15 ASCL entries.
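The decision rule described above can be sketched as a small function: an explicit statement from the authors always wins, and only when there is none do we fall back on the apparent established practice. This is an illustrative sketch, not the ASCL’s actual tooling; the `CodeRecord` fields and `preferred_citation` function are hypothetical, and the cutoff of 25 is the “let’s say” figure from the text rather than a fixed rule.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative cutoff only: the text says "more than 25, let's say", not a fixed rule.
CITATION_THRESHOLD = 25


@dataclass
class CodeRecord:
    """Hypothetical summary of what is known about one ASCL entry."""
    explicit_citation: Optional[str]  # citation the code's site/docs asks for, if any
    ascl_record_citations: int        # citations to the ASCL entry itself
    description_paper: Optional[str]  # a code description paper, if one exists
    description_paper_citations: int  # citations to that paper


def preferred_citation(rec: CodeRecord) -> Optional[str]:
    """Return the citation to list as preferred, or None if none qualifies."""
    # 1. An explicit statement from the authors always takes precedence.
    if rec.explicit_citation:
        return rec.explicit_citation
    # 2. Otherwise accept an "apparent established practice": the ASCL record
    #    is never cited, but the description paper is cited many times.
    if (
        rec.description_paper
        and rec.ascl_record_citations == 0
        and rec.description_paper_citations > CITATION_THRESHOLD
    ):
        return rec.description_paper
    # 3. No explicit or inferred preference: leave the field empty.
    return None


if __name__ == "__main__":
    # Hypothetical placeholder entry, not a real code or paper.
    rec = CodeRecord(None, 0, "description-paper-placeholder", 40)
    print(preferred_citation(rec))  # description-paper-placeholder
```

Checking the explicit statement first mirrors the stated preference: inferred citations fill gaps, but never override what the authors themselves ask for.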
Do you want to discuss different software citation methods before selecting a preferred method? Did I get your software’s preferred citation wrong or miss it entirely? If so, please let me know via email or the Suggest a change link at the bottom of your code’s ASCL entry.
In June, I was invited to participate in a one-day workshop as a member of an expert panel for The Open Source Software Health Index Project. The subject of software citation came up at lunch with other panel members, and someone suggested that because prestigious publications limit the number of references, citations to software may be dropped to make room for citations to articles. This surprised me, since I know that several highly regarded journals have published articles on the importance of research software, have edited their author guidelines to include more and better information on citing software properly, and have improved how citations to ASCL entries, for example, are treated to ensure their proper capture and tracking by indexers.
So I wrote to editors at a number of prestigious publications such as Nature and Science to ask whether their publications might consider exempting software citations from the reference limits. The prompt replies stated that there is no need to do so: there is room for essential references, and even if there are (soft) limits on the number of references in the main text in the print journal, they are unlimited in the online supplementary materials, the reference list appears in full on the website (the version that has the most readers), and all are picked up (or at least made available for ingestion) as citations in bibliographic databases.
Here is a case in point: this Science paper was printed with a limited number of references, but all 113 appear in the online version, and 92 of them were captured by ADS. Those not captured include one of the four software references, which is only a link to a website, along with other references that are likewise poorly formatted for tracking or point to resources ADS does not ingest.
I’m very pleased (and relieved!) to know that the commitment to citing code well carries over into practice, and that limits on references in print, where they occur, do not appear to inhibit or restrict software citation.
We were asked recently how many of our entries are attributed to one, two, or three authors. Would you guess that over a third of the codes in the ASCL (35%) have only one author? Codes attributed to one, two, or three authors, which we dubbed “short author list” codes, account for 68% of our entries. We ended up writing a short paper, published in Research Notes of the AAS (RNAAS), about authorship and citation numbers for team codes and short author list codes. It was a quick look, and we hope to dig deeper; if you’d like to do the same, you can download our public data in JSON and find the code we used for consolidating citations on GitHub.
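For anyone starting on that data, the two headline figures are simple shares of the author-count distribution. The sketch below assumes each record carries a list of author names under a hypothetical `authors` key; the actual ASCL JSON export may store authors differently (for example, as a single string), in which case a parsing step would be needed first. The toy records stand in for real data.

```python
import json
from collections import Counter


def author_count_shares(records):
    """Return (share with one author, share with one to three authors)."""
    # Assumption: each record has an "authors" list (hypothetical field name).
    counts = Counter(len(r["authors"]) for r in records)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no records to summarize")
    single = counts[1] / total
    short = (counts[1] + counts[2] + counts[3]) / total  # "short author list"
    return single, short


if __name__ == "__main__":
    # Toy data standing in for the JSON export; not real ASCL entries.
    sample = json.loads(
        '[{"authors": ["A"]}, {"authors": ["A", "B"]},'
        ' {"authors": ["A", "B", "C", "D"]}, {"authors": ["A"]}]'
    )
    single, short = author_count_shares(sample)
    print(f"{single:.0%} single-author, {short:.0%} short author list")
    # 50% single-author, 75% short author list
```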