The main events this past week were finishing up and submitting an article for the special issue of PeerJ Computer Science that I’ve previously mentioned, and prepping for and holding the monthly SciCodes meetings. Unfortunately, our paper was desk-rejected for being out of scope for the journal (yet in scope for the call for papers). This happened very quickly, which gave the author team some time before the week was out to determine what our options might be and how we would weigh them. The SciCodes meetings went well and I had a great chat with a possible new participant in the consortium; it was a very fruitful conversation. As is common, the week included curation, new entries, social media post scheduling, and correspondence. Sixteen records were curated, some of them as a result of scheduling seven daily code posts, and three new entries were staged. All in all, a busy week, with elation, disappointment, determination, and some whining from the ASCL Central cat, who, poor thing, is going to the kitty dentist on Monday for evaluation before his dental surgery later this month.
Meetings and writing, writing and meetings. Sure, Monday was officially a holiday, but I spent the day mostly on writing tasks; later in the week, the SciCodes team working on a paper met twice for writing sprints and discussion, and I attended the monthly FORCE11 Software Citation Implementation Working Group meeting. I also met with ASCL founder Robert Nemiroff and ASCL editors Kimberly DuPrie and Catherine Gosmeyer. Peter Teuben, chair of the ASCL’s Advisory Committee, usually attends the editor meetings as well but had a prior commitment. Most of the meeting was spent discussing updates to our procedures and communications; we also talked about upcoming conferences and got caught up on what (and how) we’re all doing. Still, we curated records and sent correspondence, usual tasks most weeks, and Peter submitted the final report for our NASA ADAP project.
August is over, and so is our hiatus from weekly posting. This past week, we added 21 new entries, curated 21 entries, and have so far sent 22 registration notices. Of the 25 codes added to the ASCL in August, seven had been submitted by their authors. We also staged “Today’s random code” social media posts through September 14, and shared a blog post, cross-posted on several other sites including Better Scientific Software, on best practices for entities such as the ASCL. I spent a lot of time this week on a paper that expands on the best practices, this written collaboratively with some of the SciCodes participants, and we expect to submit this paper soon to a special issue of PeerJ Computer Science. I also wrote a final report for our NASA ADAP project that will be submitted this coming week.
Twenty-five codes were added to the ASCL in August:
AMOEBA: Automated Gaussian decomposition
AUM: A Unified Modeling scheme for galaxy abundance, galaxy clustering and galaxy-galaxy lensing
AutoProf: Automatic Isophotal solutions for galaxy images
BOSS-Without-Windows: Window-free analysis of the BOSS DR12 power spectrum and bispectrum
caesar-rest: Web service for the caesar source extractor
CatBoost: High performance gradient boosting on decision trees library
catwoman: Transit modeling Python package for asymmetric light curves
Chemulator: Thermochemical emulator for hydrodynamical modeling
CMC-COSMIC: Cluster Monte Carlo code
Cosmic-CoNN: Cosmic ray detection toolkit
COSMIC: Compact Object Synthesis and Monte Carlo Investigation Code
DBSP_DRP: DBSP Data Reduction Pipeline
ELISa: Eclipsing binaries Learning Interactive System
ExoPlaSim: Exoplanet climate simulator
FIREFLY: Chi-squared minimization full spectral fitting code
HRK: HII Region Kinematics
iminuit: Jupyter-friendly Python interface for C++ MINUIT2
MAPS: Multi-frequency Angular Power Spectrum estimator
millennium-tap-query: Python tool to query the Millennium Simulation UWS/TAP client
NRDD_constraints: Dark Matter interaction with the Standard Model exclusion plot calculator
PIPS: Period detection and Identification Pipeline Suite
SORA: Stellar Occultation Reduction Analysis
StelNet: Stellar mass and age predictor
viper: Velocity and IP EstimatoR
WaldoInSky: Anomaly detection algorithms for time-domain astronomy
by Alejandra Gonzalez-Beltran, Alice Allen, Allen Lee, Daniel Garijo, Thomas Morrell, SciCodes Consortium
This post is cross-posted on the SciCodes website, the US Research Software Sustainability Institute blog, the UK Software Sustainability Institute blog, and the FORCE11 blog.
Software is a fundamental element of the scientific process, and cataloguing scientific software is helpful to enable software discoverability. During the years 2019-2020, the Task Force on Best Practices for Software Registries of the FORCE11 Software Citation Implementation Working Group worked to create Nine Best Practices for Scientific Software Registries and Repositories. In this post, we explain why scientific software registries and repositories are important, why we wanted to create a list of best practices for such registries and repositories, the process we followed, what the best practices include, and what the next steps for this community are.
Why are scientific software registries and repositories important?
Scientific software registries and repositories support identifying and finding software, provide information for software citation, foster long-term preservation and reuse of computational methods, and ultimately, improve research reproducibility and replicability.
Why did we write these guidelines?
Managers of scientific software registries and repositories have been working independently to run their services and provide useful information and tools to users in different communities. The Best Practices for Software Registries Task Force participants had different perspectives representing a heterogeneous set of resources, but came together for the common goal of creating a list of best practices for scientific software registries. These shared practices help to raise awareness of software as a research output, enable credit for software creators, and guide curators working on software catalogues through the steps to consider when setting up their software registries. In the longer term, we hope to improve the interoperability of the software metadata supported by different services.
The goals that we considered for writing the guidelines were:
- to have a minimal number of best practices, easy to adopt by repository managers
- to be broadly applicable to most or all of our resources
- to be descriptive on a meta level, not prescriptive, and focused on what the best practices should do or provide, not on what a suggested policy or element should specifically say.
What are the best practices?
Our guidelines, listed below, provide an overview of the key points to take into consideration when creating a software registry. They are:
- Provide a public scope statement (examples)
- Provide guidance for users
- Provide guidance to software contributors
- Establish an authorship policy (examples)
- Share your metadata schema (examples)
- Stipulate conditions of use (examples)
- State a privacy policy
- Provide a retention policy (examples)
- Disclose your end-of-life policy (examples)
Our pre-print offers more explanation about each guideline and a longer list of implementations that we found when we were doing our work on these practices.
What process did we follow to produce the guidelines?
Representatives from numerous software registries and repositories were involved in the FORCE11 Software Citation Implementation Working Group (SCIWG). Alice Allen proposed that we form a task force within the SCIWG for writing up some best practices for the registries and repositories, and with acceptance by the co-chairs of the SCIWG and interest from relevant people, the Task Force on Best Practices for Software Registries was formed. Initially, we gathered information from members of this Task Force to learn more about each resource and to identify some of our overlapping interests. We then identified potential best practices based on prior issues we experienced running our services and discussed what each potential practice might include or exclude.
Through iterative deliberations, we determined which of the potential practices were the most broadly applicable. With generous funding from the Alfred P. Sloan Foundation, we hosted a workshop for scientific registries and repositories, part of which was devoted to gathering final consensus around the Best Practices. The workshop included registries who were not part of the Task Force, resulting in a broader set of contributions to the final list.
What are the next steps for the group?
Our goal is to continue our efforts by implementing these practices more uniformly in our own registries and repositories and reducing the burdens of adoption. We have created SciCodes, a consortium of scientific software registries and repositories, which is now defining the next priorities to tackle, such as tracking the impact of good metadata, improving interoperability between registries, and making our metadata more discoverable by search engines and services such as Google Scholar, ORCID, and discipline indexes. We are also sharing tools and ideas in a series of presentations that are recorded and available for viewing on the SciCodes website, so please check them out!
Thirty codes were added to the ASCL in July:
AlignBandColors: Inter-color-band image alignment tool
ART: A Reconstruction Tool
Balrog: Astronomical image simulation
Chem-I-Calc: Chemical Information Calculator
cosmic_variance: Cosmic variance calculator
Kd-match: Correspondences of objects between two catalogs through pattern matching
KeplerPORTS: Kepler Planet Occurrence Rate Tools
light-curve: Light curve analysis toolbox
MCPM: Modified CPM method
nimbus: A Bayesian inference framework to constrain kilonova models
PlaSim: Planet Simulator
PMN-body: Particle Mesh N-body code
PyCactus: Post-processing tools for Cactus computational toolkit simulation data
PyROA: Modeling quasar light curves
ReionYuga: Epoch of Reionization neutral Hydrogen field generator
RePrimAnd: Recovery of Primitives And EOS framework
ROA: Running Optimal Average
shapelens: Astronomical image analysis and shape estimation framework
shear-stacking: Stacked shear profiles and tests based upon them
Skylens++: Simulation package for optical astronomical observations
Skymapper: Mapping astronomical survey data on the sky
snmachine: Photometric supernova classification
SpArcFiRe: SPiral ARC FInder and REporter
TRINITY: Dark matter halos, galaxies and supermassive black holes empirical model
Ah, the last week of the month, when most new code entries appear! Thirty codes were added to the ASCL this week; nine of them had been submitted by their authors or a user and the other twenty-one entries were the work of the three ASCL editors.
Yes, users can submit codes, and do! Sometimes they do so because they would like to cite the software and a good way to do so doesn’t already exist. The ASCL ID can be used to cite the code; these citations are picked up and tracked by indexers such as ADS and Web of Science. We welcome code submissions, and after we have assigned an ASCL ID, we send a registration notification email to one or more of the code authors.
In addition to adding/processing new entries and staging a few for future processing, fourteen existing entries were curated, most as a result of our daily random code activity. We’re also always checking our site links, and fixed a few that weren’t working.
I also spent a good bit of time this past week on research and writing and participated in a writing sprint; I will continue these activities this coming week.
Most of my work this week was preparatory and writing related: literature searches and reading. This week’s two SciCodes writing sprints involved a lot of discussion, and now that we’ve hashed out what we want to do and who’s going to do what, we will work mostly independently and meet on Fridays to go over our work together.
While I was busy with article work, other ASCL editors added six code entries to our staging area, and two entries were submitted by code authors. Twelve entries were curated, social media posts were created and published, and there was some technical work, too, to solve a small issue with the ASCL email accounts.
This week, we’ll be concentrating on adding code entries and assigning ASCL IDs.
Advisory Committee Chair Peter Teuben participated in the MODEST-21a AMUSE workshop this past week; he gave a talk on the first day on NEMO and its relation to AMUSE. He also updated the list of codes related to NEMO, including adding ASCL links to the entries that didn’t already have them. Three new code entries were staged, and, perhaps wholly or in part because of Peter’s participation in the AMUSE workshop, six new codes were submitted to the ASCL by their authors. Nine entries were curated, and about a week’s worth of social media posts were scheduled.
The SciCodes meetings were this past Thursday, so a good part of my activity this week focused on preparing for them, working on tasks on the SciCodes To Do list, and doing follow-up work afterward. We’ve scheduled a couple of writing sprints for next week for a paper we hope to submit to a special issue of PeerJ Computer Science, and as always, it seems, I am behind on my own writing, so I plan to work on that this coming week, too.
This week, six notification emails were sent to code authors, fifteen entries were curated, and three new entries were staged. Associate Editor Kimberly DuPrie maintains one of our link checkers and follows up on bad links. We (and by “we,” I mean primarily Kimberly) do a lot to find sites for software that has gone missing from where it used to be, and most weeks, including this one, she writes to one or more code authors asking for a good link to replace the bad.
Sometimes, a software author hasn’t realized the code’s site is down; other times, the author has changed institutions, so a previous site has been wiped out. As I’ve mentioned before, we have downloaded archive files of most of the codes listed in the ASCL; we also often download information related to these codes, including the code website’s HTML files and, where they exist, user manuals. This makes it easy for us to provide these artifacts to authors whose code sites have disappeared. Alternatively, we can create an archive file of the code and the additional information we have and offer it for download if a code author prefers to have the ASCL host the code.
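For readers curious what checking code links can look like in practice, here is a minimal sketch in Python. It is not the ASCL's actual link checker; the function names, the entry IDs, and the rule that any HTTP status of 400 or above counts as "bad" are all assumptions made for illustration.

```python
from __future__ import annotations

import urllib.error
import urllib.request
from typing import Callable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a URL (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code


def find_bad_links(
    links: dict[str, str],
    get_status: Callable[[str], int] = fetch_status,
) -> list[tuple[str, str, str]]:
    """Return (entry_id, url, reason) for each link that needs follow-up.

    `links` maps a hypothetical entry ID to the code's site URL. A link is
    reported when it cannot be reached or answers with status 400 or above.
    """
    bad = []
    for entry_id, url in links.items():
        try:
            status = get_status(url)
        except (OSError, ValueError) as err:
            # DNS failures, timeouts, refused connections, malformed URLs.
            bad.append((entry_id, url, f"unreachable: {err}"))
            continue
        if status >= 400:
            bad.append((entry_id, url, f"HTTP {status}"))
    return bad
```

Passing the status lookup in as `get_status` keeps the reporting logic testable without touching the network. Some servers reject `HEAD` requests, so a real checker would likely retry with `GET` before writing to an author.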
Other work this week was getting a bit of collaborative writing finalized, this with SciCodes participants, and talking with Robert Nemiroff on some ideas for the ASCL’s future. I tried to attend the FORCE11 Software Citation Implementation Working Group monthly call on Tuesday, but my location on that day had no cell service and wifi that was not up to the task of Zoom, alas; fortunately, the notes from the meeting are online.