Writing and organizing seemed to be this week’s theme. Melissa Harrison and I wrote and submitted a proposal for a dedicated working group session at FORCE2021 on behalf of the FORCE11 Software Citation Implementation Working Group and secured a number of speakers for lightning talks. I got a rejection notice on Wednesday for a paper I’d submitted in early September; based on feedback from the reviewers and the to-do list I’d started after submitting it, I edited the paper, intending to post it to arXiv. A couple of people encouraged me to submit it to another journal, however, so I did. I also worked on my ADASS poster and paper. Actual work on the ASCL itself included curating seven entries, processing one submission and assigning the code an ASCL ID, and staging three new entries.
As previously mentioned, we curate records in the ASCL in a number of ways. We ensure that every record gets looked at periodically by querying our database for records that have not been updated since the current year minus three, which this year means January 1, 2018. We’ve been busy looking at records and can now say that every record in the ASCL has been examined for health, curated in some way, or added since January 1, 2018. With that done, we will start checking entries that haven’t been updated since January 1, 2019, because curation never ends.
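The staleness check described above boils down to a date-cutoff query. Here is a minimal sketch of the idea in Python with SQLite; the table and column names (`codes`, `last_updated`, `ascl_id`) are hypothetical stand-ins, not the ASCL’s actual schema.

```python
import sqlite3
from datetime import date

def stale_records(conn, years_back=3):
    """Return records not updated since January 1 of (current year - years_back)."""
    cutoff = date(date.today().year - years_back, 1, 1).isoformat()
    # ISO date strings compare correctly as text, so a plain < works here.
    cur = conn.execute(
        "SELECT ascl_id, title FROM codes WHERE last_updated < ? ORDER BY ascl_id",
        (cutoff,),
    )
    return cur.fetchall()

# Tiny demonstration with an in-memory database and made-up entries.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE codes (ascl_id TEXT, title TEXT, last_updated TEXT)")
conn.executemany(
    "INSERT INTO codes VALUES (?, ?, ?)",
    [
        ("9999.001", "OldCode", "2016-05-01"),   # well past any cutoff
        ("9999.002", "FreshCode", "2099-01-02"), # recently touched
    ],
)
print(stale_records(conn))  # → [('9999.001', 'OldCode')]
```

Each year the cutoff rolls forward automatically, which matches the “current year minus three” rule: once every stale record has been reviewed, the same query with a newer cutoff restarts the cycle.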
This week, we also sent emails to authors of codes added in September and staged three new entries. I attended the FORCE11 Software Citation Implementation Working Group meeting on Tuesday, and later in the week, talked with several people about possible poster presentations at upcoming conferences.
One sad note: On September 27, ASCL Central became catless, alas. RIP, handsome little cat; it was a lovely 15 years.
Thirty codes were added to the ASCL this week, seven of which had been submitted by authors. Nineteen codes were curated, mostly through our work in creating the daily random code social media posts; we scheduled twenty-three posts. This coming week, we’ll be sending out registration notices for the new entries along with other usual correspondence, and I’ll be attending a FORCE11 Software Citation Implementation Working Group meeting on Tuesday.
Thirty codes were added to the ASCL in September:
alpconv: Calculating alp-photon conversion
BHJet: Semi-analytical black hole jet model
BiPoS1: Dynamical processing of the initial binary star population
DviSukta: Spherically Averaged Bispectrum calculator
eMCP: e-MERLIN CASA pipeline
Frankenstein: Flux reconstructor
gammaALPs: Conversion probability between photons and axions/axionlike particles
GLoBES: General Long Baseline Experiment Simulator
gphist: Cosmological expansion history inference using Gaussian processes
Healpix.jl: Julia-only port of the HEALPix library
HSS: The Hough Stream Spotter
HTOF: Astrometric solutions for Hipparcos and Gaia intermediate data
Menura: Multi-GPU numerical model for space plasma simulation
OSPREI: Sun-to-Earth (or satellite) CME simulator
pyFFTW: Python wrapper around FFTW
pyia: Python package for working with Gaia data
Rubble: Simulating dust size distributions in protoplanetary disks
ShapeMeasurementFisherFormalism: Fisher Formalism for Weak Lensing
SkyCalc_ipy: SkyCalc wrapper for interactive Python
SkyPy: Simulating the astrophysical sky
SNEWPY: Supernova Neutrino Early Warning Models for Python
Snowball: Generalizable atmospheric mass loss calculator
SNOwGLoBES: SuperNova Observatories with GLoBES
SoFiA 2: An automated, parallel HI source finding pipeline
STAR-MELT: STellar AccRetion Mapping with Emission Line Tomography
unpopular: Using CPM detrending to obtain TESS light curves
Varstar Detect: Variable star detection in TESS data
VOLKS2: VLBI Observation for transient Localization Keen Searcher
WeakLensingDeblending: Weak lensing fast simulations and analysis of blended objects
WimPyDD: WIMP direct-detection rates predictor
What a difference a week makes! Our paper has been reconsidered and is now with the journal for peer review. Productivity on the ASCL was lower than usual this week as I took a few days off, but still, nine code records were edited, daily random code posts were made to Facebook and Twitter, and three new entries were written and staged for consideration. I also noted minor changes to make on another paper that is undergoing peer review. This coming week will be all code entry work: vetting, writing, and curating.
The main events this past week were finishing up and submitting an article for the special issue of PeerJ Computer Science I’ve previously mentioned, and prepping for and holding the monthly SciCodes meetings. Unfortunately, our paper was desk-rejected as out of scope for the journal (though within scope for the call for papers). The rejection came very quickly, which at least gave the author team time to weigh our options before the week was out. The SciCodes meetings went well, and I had a great, very fruitful chat with a possible new participant in the consortium. As is common, the week included curation, new entries, social media post scheduling, and correspondence. Sixteen records were curated, some of them while scheduling seven daily code posts, and three new entries were staged. All in all, a busy week, with elation, disappointment, determination, and some whining from the ASCL Central cat, who, poor thing, is going to the kitty dentist on Monday for evaluation before his dental surgery later this month.
Meetings and writing, writing and meetings. Sure, Monday was officially a holiday, but I spent the day mostly on writing tasks; later in the week, the SciCodes team working on a paper met twice for writing sprints and discussion, and I attended the monthly FORCE11 Software Citation Implementation Working Group meeting. I also met with ASCL founder Robert Nemiroff and ASCL editors Kimberly DuPrie and Catherine Gosmeyer. Peter Teuben, chair of the ASCL’s Advisory Committee, usually attends the editor meetings as well but had a prior commitment. Most of the meeting was spent discussing updates to our procedures and communications; we also talked about upcoming conferences and got caught up on what (and how) we’re all doing. Even so, we curated records and sent correspondence, as we do most weeks, and Peter submitted the final report for our NASA ADAP project.
August is over, and so is our hiatus from weekly posting. This past week, we added 21 new entries, curated 21 entries, and have so far sent 22 registration notices. Of the 25 codes added to the ASCL in August, seven had been submitted by their authors. We also staged “Today’s random code” social media posts through September 14, and shared a blog post, cross-posted on several other sites including Better Scientific Software, on best practices for entities such as the ASCL. I spent a lot of time this week on a paper that expands on those best practices, written collaboratively with some of the SciCodes participants; we expect to submit it soon to a special issue of PeerJ Computer Science. I also wrote a final report for our NASA ADAP project, which will be submitted this coming week.
Twenty-five codes were added to the ASCL in August:
AMOEBA: Automated Gaussian decomposition
AUM: A Unified Modeling scheme for galaxy abundance, galaxy clustering and galaxy-galaxy lensing
AutoProf: Automatic Isophotal solutions for galaxy images
BOSS-Without-Windows: Window-free analysis of the BOSS DR12 power spectrum and bispectrum
caesar-rest: Web service for the caesar source extractor
CatBoost: High performance gradient boosting on decision trees library
catwoman: Transit modeling Python package for asymmetric light curves
Chemulator: Thermochemical emulator for hydrodynamical modeling
CMC-COSMIC: Cluster Monte Carlo code
Cosmic-CoNN: Cosmic ray detection toolkit
COSMIC: Compact Object Synthesis and Monte Carlo Investigation Code
DBSP_DRP: DBSP Data Reduction Pipeline
ELISa: Eclipsing binaries Learning Interactive System
ExoPlaSim: Exoplanet climate simulator
FIREFLY: Chi-squared minimization full spectral fitting code
HRK: HII Region Kinematics
iminuit: Jupyter-friendly Python interface for C++ MINUIT2
MAPS: Multi-frequency Angular Power Spectrum estimator
millennium-tap-query: Python tool to query the Millennium Simulation UWS/TAP client
NRDD_constraints: Dark Matter interaction with the Standard Model exclusion plot calculator
PIPS: Period detection and Identification Pipeline Suite
SORA: Stellar Occultation Reduction Analysis
StelNet: Stellar mass and age predictor
viper: Velocity and IP EstimatoR
WaldoInSky: Anomaly detection algorithms for time-domain astronomy
by Alejandra Gonzalez-Beltran, Alice Allen, Allen Lee, Daniel Garijo, Thomas Morrell, SciCodes Consortium
This post is cross-posted on the SciCodes website, the US Research Software Sustainability Institute blog, the UK Software Sustainability Institute blog, and the FORCE11 blog.
Software is a fundamental element of the scientific process, and cataloguing scientific software is helpful to enable software discoverability. During the years 2019-2020, the Task Force on Best Practices for Software Registries of the FORCE11 Software Citation Implementation Working Group worked to create Nine Best Practices for Scientific Software Registries and Repositories. In this post, we explain why scientific software registries and repositories are important, why we wanted to create a list of best practices for such registries and repositories, the process we followed, what the best practices include, and what the next steps for this community are.
Why are scientific software registries and repositories important?
Scientific software registries and repositories support identifying and finding software, provide information for software citation, foster long-term preservation and reuse of computational methods, and ultimately, improve research reproducibility and replicability.
Why did we write these guidelines?
Managers of scientific software registries and repositories have been working independently to run their services and provide useful information and tools to users in different communities. The Best Practices for Software Registries Task Force participants had different perspectives representing a heterogeneous set of resources, but came together for the common goal of creating a list of best practices for scientific software registries. These shared practices help to raise awareness of software as a research output, enable credit for software creators, and guide curators working on software catalogues through the steps to consider when setting up their software registries. In the longer term, we hope to improve the interoperability of the software metadata supported by different services.
The goals that we considered for writing the guidelines were:
- to have a minimal number of best practices, easy to adopt by repository managers
- to be broadly applicable to most or all of our resources
- to be descriptive on a meta level, not prescriptive, and focused on what the best practices should do or provide, not on what a suggested policy or element should specifically say
What are the best practices?
Our guidelines, listed below, provide an overview of the key points to take into consideration when creating a software registry. They are:
- Provide a public scope statement (examples)
- Provide guidance for users
- Provide guidance to software contributors
- Establish an authorship policy (examples)
- Share your metadata schema (examples)
- Stipulate conditions of use (examples)
- State a privacy policy (examples)
- Provide a retention policy (examples)
- Disclose your end-of-life policy (examples)
Our pre-print offers more explanation about each guideline and a longer list of implementations that we found when we were doing our work on these practices.
What process did we follow to produce the guidelines?
Representatives from numerous software registries and repositories were involved in the FORCE11 Software Citation Implementation Working Group (SCIWG). Alice Allen proposed that we form a task force within the SCIWG to write up best practices for registries and repositories, and with the approval of the SCIWG co-chairs and interest from those running such services, the Task Force on Best Practices for Software Registries was formed. Initially, we gathered information from members of this Task Force to learn more about each resource and to identify some of our overlapping interests. We then identified potential best practices based on issues we had experienced running our services and discussed what each potential practice might include or exclude.
Through iterative deliberations, we determined which of the potential practices were the most broadly applicable. With generous funding from the Alfred P. Sloan Foundation, we hosted a workshop for scientific registries and repositories, part of which was devoted to gathering final consensus around the best practices. The workshop included registries that were not part of the Task Force, resulting in a broader set of contributions to the final list.
What are the next steps for the group?
Our goal is to continue our efforts by implementing these practices more uniformly in our own registries and repositories and reducing the burdens of adoption. We have created SciCodes, a consortium of scientific software registries and repositories, which is now defining the next priorities to tackle, such as tracking the impact of good metadata, improving interoperability between registries, and making our metadata more discoverable by search engines and services such as Google Scholar, ORCID, and discipline indexes. We are also sharing tools and ideas in a series of presentations that are recorded and available for viewing on the SciCodes website, so please check them out!