Engineering Academic Software, Schloss Dagstuhl Day 3

The day started with a quick discussion about the afternoon; it is traditional for Schloss Dagstuhl seminars that Wednesday afternoons involve a social activity. It was determined on Tuesday that the activity was to be a hike some distance away from Dagstuhl with dinner after in another town, but several changes to these plans had to be ironed out and announced. After a few minutes spent on that, the morning session got underway and was furiously fast! This was an Open Mic, with participants having signed up while here to give short talks (ten minutes or less).

First up was Daniel Garijo on Software Metadata: Describing “dark software” in Geosciences. By “dark software,” he means that which is often hidden from view. He described the current state of the art for software description in geosciences and demonstrated Ontosoft.org, a semantic registry for scientific software, which currently includes information from several geosciences resources. As Ontosoft is not domain-specific, it has the capacity to expand into other fields as well. This is a very attractive and capable site. It uses a distributed approach to software registries and depends on crowdsourcing for metadata maintenance. The resource organizes software metadata using the OntoSoft ontology along six dimensions: identify software, understand and assess software, execute software, get support for the software, do research with the software, and update the software. Slideshare

Jurgen Vinju was next with Organising a research team around the research software around the research team in software engineering: Motivation, experiences, lessons. He talked about his experiences as the group leader of the SWAT (Software Analysis and Transformation) team at Centrum Wiskunde & Informatica (CWI), the national research institute for math and computer science in the Netherlands. SWAT is all about the source code and supporting programmers to create more efficient, maintainable software. They work to understand and control software complexity to enable more and better tools. He made the point that research teams “prioritise for academic output which is not software.” He showed UseTheSource, a resource developed by CWI with contributions from other institutes that houses open-source projects related to software language engineering and metaprogramming. This allows more efficient programming by automating tasks that are cumbersome or hard, and allows synergies between software engineers, researchers, and industry.
Tweet: A research team is not a software team. We have fewer resources. We need more investment in efficiency.

Dan Katz gave an overview of work done by the Force11 Software Citation Working Group; his presentation was titled Software Citation: Principles, Discussion, and Metadata. He covered the rationales for citing software, the WSSSPE and Force11 groups involved in developing software citation principles, the process used to develop them, and the six principles themselves. The principles focus on the importance of software; the need to credit and attribute the contributions software makes to research; the need to uniquely identify software in a persistent and specific way; and the requirement that citations enable access to the software and to associated information that informs its use. Katz brought up many of the discussions the WSSSPE and Force11 working groups had and their determinations, such as what software to cite, how to uniquely identify software, that peer review of software is important but not required for citation, and how publishers can help.
Tweet: "Check out force 11 for progress in software citation"

Each of the Open Mic sessions generated immediate discussion during the sessions and while the next presenter was setting up, and this session was no exception. Katz pointed out that a common practice is to publish and cite papers about software (“software papers”), but that the Importance principle of the Force11 Working Group calls for the citation of the software itself, “on the same basis as any other research product.” This was countered with a comment that people should cite software papers if the software authors have requested that method of citation. Katz replied that this could be done in addition to citing the software, as one of his slides stated. The presentation concluded with the next steps for the Force11 Software Citation Working Group (to finalize the principles, then publish and circulate them for endorsement) and the likelihood of a Software Citation Implementation Group being formed to work with institutions, researchers, publishers, and other interested parties to put the principles into practice.
Tweet: "It's more important to cite the software directly rather than a software paper"

The fourth Open Mic talk was by Katerena Kuksenok on Best Practices (by any other name). This interesting talk looked at intersections of the technical, social, and cognitive aspects of software engineering in research, and asked how the available community and skill resources could be leveraged. She brought together various elements raised in the workshop so far, including the different roles that had been identified, the need for software engineers to learn from scientists just as we hope researchers learn software engineering practices, and overcoming communication barriers. She referred back to a comment Mike Croucher had made in his talk on Monday, agreeing that software engineers should “do CS/SE with people not at them!”
Tweet: "Software advisors are elected. It's a role people create when ask you questions" (Katie Kuksenok)
Tweet: "User resistance: “I don’t want to use version control because I don’t want the world to see my terrible code.”"
Tweet: Mike Croucher "is s/w therapist/coach, helping scientists improve code...carefully; doesn't throw computer science at them!"

After Kuksenok’s talk, I presented Restoring reproducibility: Making scientist software discoverable. This presentation was a quick overview of the ASCL, its history and a few of the changes to our infrastructure, the lessons we learned from looking at what other astro code registries and repositories had done and what we did with those lessons, and some of the impact we have on the community. As with every other session, there was intermittent discussion, questions asked and answered, and conversation on the topic as I headed back to my chair and the next speaker set up. (PowerPoint slides, PDF)
Tweet: astrophysics source code library since 1999

Robert Haines was up next with A Short* History of Research Software Engineers in the UK (*and probably incomplete). Before there were Research Software Engineers (RSEs), there were RSEs going by other names, such as Post Doc and Research Assistant. These were the people in the lab who could code, and they fell “foul of publish or perish” because they were writing code rather than papers. RSEs might also have been hiding among those working in high performance computing or as research group admins. He is an example of someone who has always done RSE work, though he was not called an RSE until fairly recently. It was at a Software Sustainability Institute Collaborations Workshop in 2012 that there was a call to arms to recognize the contributions of those who write code rather than papers and are not purely researchers. They decided they needed a name, a union, and a policy campaign. He described the current environment, both the challenges and the positives, and shared that many people want to work in this field. Yes, discussion broke out in this session, too! It was remarkable how engaged everyone at the workshop was, and how often and easily discussion took place.
Tweet: "#dagstuhleas Robert Haines reports on the coming to life of the job of “Research Software Engineer”, with jobs, a union, etc."

Dan Katz made a very brief presentation and instigated more discussion on career paths when Robert Haines was finished. Then, after a brief coffee break, the morning Open Mic session continued with Ralf Lämmel’s presentation, intriguingly called Making a failing project succeed?!, about the 101Companies project. He called 101Companies a software chrestomathy, from chresto, meaning “useful,” and mathein, meaning “to learn.” He shared other chrestomathies, such as the Hello World Collection and the Evolution of a Haskell programmer. (One of the previous links will lead you to a song about a popular beverage.) 101Companies is a resource for learning more about software, for comparing technologies, and for programming education, and can serve as “a playground for student projects.” He discussed some of the challenges the project is having and some of the ways in which it is succeeding. PDF
Tweet: "101 is a knowledge resource for technological space travel (between all kinds of online spaces)"

The last Open Mic talk of the morning was by Ashish Gehani, who gave a quick overview of his work on software, including software to make data more manageable, and in particular the OCCAM (Object Culling and Concretization for Assurance Maximization) project.

The last agenda item for the morning was to discuss the manifesto that is one of the required outputs for this workshop. This discussion was led by James Howison, who shared the link for the Google Doc that was to become the manifesto, and which was discussed and created in tandem (and wild abandon) by many in the room during the time remaining before lunch. The manifesto is our public declaration, our own call to action. Our work is only beginning at Schloss Dagstuhl; we must put what we have discussed here into practice. We shared other manifestos (manifesti!), determined authorship as opt-in (by adding our names to the author list), and talked about but did not determine where this might be published. I found the creation of this document interesting and inspiring, very much in line with the philosophy of “be the change you want to see in the world.”
Tweet: "we discussed the #manifesto as genre in http://dx.doi.org/10.1109/ICSE.2015.179 … section III. http://press.princeton.edu/titles/8066.html … is a great #longread"
Tweet: "I was, uh, one of the authors of the EAS manifesto. The original EAS manifesto. Not the compromised second draft."
Tweet: "According to James Howison software as communication between people should be studied."
After getting a good start on the manifesto, we broke for a longer than usual lunch period, after which some took a long hike with a lakeside stop for a refreshing beverage, and some did other things. I took a much-needed nap and then noodled around for a bit in the music room, a lovely large, long room with wonderful acoustics and a recently tuned grand piano, two guitars, a cello, and a violin available. (I discovered later in the week that the violin case also holds a kazoo.) Scores for solo and ensemble music are stocked in a room at one end of the music room, the (small) door to which is watched over by cherubs. Most of the Schloss is modern in appearance; this is one of the few rooms that reveals the building’s history. I found plenty of music to amuse myself with, including a collection of Bach preludes and fugues from the WTC apparently edited by Bartók and in what to me was a confusing order, and Beethoven sonatas that at one time I knew how to butcher. Others reported having taken shorter walks than the one that was organized, listening to podcasts, trying out the bicycles available for guests, and also napping.

As you have likely surmised by now, the Twitter hashtag for this event was #dagstuhleas, and the Twitter feed offers more pictures and information about this workshop.

Engineering Academic Software, Schloss Dagstuhl Day 2

Tuesday started with Jeffrey Carver from the University of Alabama presenting What we have learned about using software engineering practices in scientific software. He and his team took a multi-pronged approach to studying scientific software, from conducting surveys and workshops to direct interactions and case studies. From survey work, his team was able to group problems scientists were having with their own software into four main areas: rework, performance, regression (testing), and forgetting bugs. From this, they could see what software engineering practices might help with solving the problems.

Case studies brought numerous lessons to light; they found that the use of higher-level languages was low, performance competes with other goals, and external software use can be seen as risky. Workshops highlighted some of the differences between scientist programmers and software engineers and their domains. Scientist developers often lack formal software engineering expertise but have deep knowledge of their domains and are often the main users of their software. Quality goals are different, too; scientists would rather software not run than return an incorrect result. This project demonstrated that there is a need to eliminate the stigma associated with software engineering and that software engineers need to understand domain constraints and specific problems.

Every presentation sparked lively Q&A and discussion, often throughout the presentation, and this one was no exception.
User stories beat data even for scientists

The next presentation was Engineering yt by Matthew Turk from NCSA at the University of Illinois at Urbana-Champaign. He provided context and information on this well-cited, community-developed project, and discussed how the community was built and its adoption of a code of conduct. YTEPs (yt Enhancement Proposals) provide a method to manage suggestions for improving yt. Communication methods within the community are well thought out. The challenges of creating and managing the community sparked a lot of discussion; large software projects can have many things go wrong.

Discussion among the group made Matt’s presentation run long, making it necessary to break for coffee before Matt’s talk was done and then return to it after the break. The group was very engaged throughout the day; fortunately, the schedule accommodated the frequent discussions in every presentation very well.

After Matt’s talk, Caroline Jay (University of Manchester) and Robert Haines (Software Sustainability Institute) presented Software as academic output. They discussed software’s role in research, when it can be a tool that enables research or the actual research itself, and how this is different depending on the discipline and the functionality of the software within the discipline and the role of the person using the software. They made the point that “Software isn’t a separate thing — software could exist without the paper; the paper couldn’t exist without the software.”
Oh, there was much more goodness in this presentation, which was interrupted by lunch, than I have time to report, including The Horror, as it was termed (the steps necessary for someone to replicate the computational work on one of the research projects this presentation covered), and Robert’s work on making the software for this computational work available in Docker. It also touched on the FAIR principles for computational research and academic software, and like the other presentations, generated lots of discussion, including conversations in the group on ethical considerations.

The last formal presentation of the day, before the breakout workgroups, was by Claude Kirchner (INRIA) on the Software Heritage Project. He covered the rationale for this project, which includes the inconsiderate or malicious loss of code and the desire to preserve “our technical and scientific knowledge.” The Software Heritage Project has set out to preserve all the software. Yes, you read that correctly: All the software. Fortunately, a version of the slides for this presentation is online so you can see them for yourself! The site is scheduled to go live next week and I look forward to seeing it.

After Claude’s presentation, we went into breakout sessions.
I joined a breakout session on establishing a standing award for scientific contributions made through software. The other breakout sessions were on creating a research software engineering handbook and academic software project topology. All groups reported back before the day’s session ended for dinner. Quite an informative, exciting, and productive day!

Engineering Academic Software at Schloss Dagstuhl

I’m at Schloss Dagstuhl – Leibniz Center for Informatics for a week-long workshop on Engineering Academic Software. Some of the questions we are tackling have been discussed elsewhere, which we are taking into consideration as we talk about them here, and new questions were not only part of the seminar’s original description, but are arising throughout the general and break-out sessions. I would say we’re at the end of the first day, but it continues even though it is past 10 PM, with a planned open and vibrant discussion on dogmas past and present. First up for discussion tonight was Agile project management; how do you feel about it? Is this a dogma that needs to be shot or embraced?

The hashtag to follow on Twitter is #dagstuhleas for the full-group discussions; the breakout sessions so far have been too intense for tweeting!

April and May 2016 additions to the ASCL

Twenty-eight codes were added to the ASCL in April and May 2016:

2-DUST: Dust radiative transfer code
ASTRiDE: Automated Streak Detection for Astronomical Images
BACCHUS: Brussels Automatic Stellar Parameter
CAMELOT: Cloud Archive for MEtadata, Library and Online Toolkit
CCSNMultivar: Core-Collapse Supernova Gravitational Waves

cluster-lensing: Tools for calculating properties and weak lensing profiles of galaxy clusters
DISCO: 3-D moving-mesh magnetohydrodynamics package
DNest3: Diffusive Nested Sampling
DUO: Spectra of diatomic molecules
FDPS: Framework for Developing Particle Simulators

grtrans: Polarized general relativistic radiative transfer via ray tracing
Halotools: Galaxy-Halo connection models
K2SC: K2 Systematics Correction
LAMBDAR: Lambda Adaptive Multi-Band Deblending Algorithm in R
libpolycomp: Compression/decompression library

magicaxis: Pretty scientific plotting with minor-tick and log minor-tick support
MARZ: Redshifting Program
MUSCLE: MUltiscale Spherical-ColLapse Evolution
OpenMHD: Godunov-type code for ideal/resistive magnetohydrodynamics (MHD)
PDT: Photometric DeTrending Algorithm Using Machine Learning

SAND: Automated VLBI imaging and analyzing pipeline
Shadowfax: Moving mesh hydrodynamical integration code
Surprise Calculator: Estimating relative entropy and Surprise between samples
The Tractor: Probabilistic astronomical source detection and measurement
TMBIDL: Single dish radio astronomy data reduction package

TRIPPy: Python-based Trailed Source Photometry
TTVFaster: First order eccentricity transit timing variations (TTVs)
zeldovich-PLT: Zel’dovich approximation initial conditions generator

Engineering Academic Software

I’ll be heading to Schloss Dagstuhl in June for a Perspectives Workshop on Engineering Academic Software. Questions the workshop will seek to address include:

  • How is academic software different from other software? What are its most pressing dimensions of quality?
  • Is the software we use and produce in an academic context sustainable? How can we ensure that the software continues to evolve and offer value after serving its initial purpose?
  • How can we adapt software engineering methods for the unique academic context without losing quality?
  • How can we balance domain knowledge and expertise with software engineering knowledge and expertise in an academic research team?
  • Do quality aspects of academic software apply to open data as well? How can well-engineered academic software together with open data make science more reproducible?

I look forward to tackling these and other questions with the other participants, and thank Carole Goble, James Howison, Claude Kirchner, and Oscar M. Nierstrasz for organizing the workshop.

March 2016 additions to the ASCL

Eighteen codes were added to the ASCL in March, 2016:

Asfgrid: Asteroseismic parameters for a star
CORBITS: Efficient Geometric Probabilities of Multi-Transiting Exoplanetary Systems
Dedalus: Flexible framework for spectrally solving differential equations
DiskJockey: Protoplanetary disk modeling for dynamical mass derivation
ellc: Light curve model for eclipsing binary stars and transiting exoplanets

EQUIB: Atomic level populations and line emissivities calculator
ExoPriors: Accounting for observational bias of transiting exoplanets
FAST-PT: Convolution integrals in cosmological perturbation theory calculator
fibmeasure: Python/Cython module to find the center of back-illuminated optical fibers in metrology images
gPhoton: Time-tagged GALEX photon events analysis tools

HIIexplorer: Detect and extract integrated spectra of HII regions
PyGSM: Python interface to the Global Sky Model
PolRadTran: Polarized Radiative Transfer Model Distribution
ROBAST: ROOT-based ray-tracing library for cosmic-ray telescopes
SILSS: SPHERE/IRDIS Long-Slit Spectroscopy pipeline

SMARTIES: Spheroids Modelled Accurately with a Robust T-matrix Implementation for Electromagnetic Scattering
tpipe: Searching radio interferometry data for fast, dispersed transients
VIP: Vortex Image Processing pipeline for high-contrast direct imaging of exoplanets

February 2016 additions to the ASCL

Twenty-one codes were added to the ASCL in February, 2016:

Automark: Automatic marking of marked Poisson process in astronomical high-dimensional datasets
Celestial: Common astronomical conversion routines and functions
CHIP: Caltech High-res IRS Pipeline
CLOC: Cluster Luminosity Order-Statistic Code
COLAcode: COmoving Lagrangian Acceleration code

DELightcurveSimulation: Light curve simulation code
DUSTYWAVE: Linear waves in gas and dust
FilTER: Filament Trait-Evaluated Reconstruction
GANDALF: Graphical Astrophysics code for N-body Dynamics And Lagrangian Fluids
IRSFRINGE: Interactive tool for fringe removal from Spitzer IRS spectra

k2photometry: Read, reduce and detrend K2 photometry
LensTools: Weak Lensing computing tools
LIRA: LInear Regression in Astronomy
LRGS: Linear Regression by Gibbs Sampling
mbb_emcee: Modified Blackbody MCMC

NuCraft: Oscillation probabilities for atmospheric neutrinos calculator
POPPY: Physical Optics Propagation in PYthon
pyraf-dbsp: Reduction pipeline for the Palomar Double Beam Spectrograph
TailZ: Redshift distributions estimator of photometric samples of galaxies
The Cannon: Data-driven method for determining stellar parameters and abundances from stellar spectra

ZAP: Zurich Atmosphere Purge

January 2016 additions to the ASCL

Twenty-one codes were added to the ASCL in January, 2016:

BASCS: Bayesian Separation of Close Sources
CosmicPy: Interactive cosmology computations
ctools: Cherenkov Telescope Science Analysis Software
Fit Kinematic PA: Fit the global kinematic position-angle of galaxies
Hyper-Fit: Fitting routines for multidimensional data with multivariate Gaussian uncertainties

ImpactModel: Black Hole Accretion Disk Impact Model
ISO: Isochrone construction
K2fov: Field of view software for NASA’s K2 mission
LACEwING: LocAting Constituent mEmbers In Nearby Groups
LIRA: Low-counts Image Reconstruction and Analysis

MATPHOT: Stellar photometry and astrometry with discrete point spread functions
Nulike: Neutrino telescope likelihood tools
Odyssey: Ray tracing and radiative transfer in Kerr spacetime
PARAVT: Parallel Voronoi Tessellation code
ProC: Process Coordinator

QDPHOT: Quick & Dirty PHOTometry
SAGE: Semi-Analytic Galaxy Evolution
SavGolFilterCov: Savitzky Golay filter for data with error covariance
SCOUSE: Semi-automated multi-COmponent Universal Spectral-line fitting Engine
TRADES: TRAnsits and Dynamics of Exoplanetary Systems

WzBinned: Binned and uncorrelated estimates of dark energy EOS extractor

AAS 227 Poster 348.01: Making your code citable with the Astrophysics Source Code Library

Image of poster on ASCL showing how it can be used to cite software and get currently untrackable DOIs tracked in ADS

The Astrophysics Source Code Library (ASCL, ascl.net) is a free online registry of codes used in astronomy research. With nearly 1,200 codes, it is the largest indexed resource for astronomy codes in existence. Established in 1999, it offers software authors a path to citation of their research codes even without publication of a paper describing the software, and offers scientists a way to find codes used in refereed publications, thus improving the transparency of the research. Citations using ASCL IDs are accepted by major astronomy journals and if formatted properly are tracked by ADS and other indexing services. The number of citations to ASCL entries increased sharply from 110 citations in January 2014 to 456 citations in September 2015. The percentage of code entries in ASCL that were cited at least once rose from 7.5% in January 2014 to 17.4% in September 2015. The ASCL’s mid-2014 infrastructure upgrade added an easy entry submission form, more flexible browsing, search capabilities, and an RSS feeder for updates. A Changes/Additions form added this past fall lets authors submit links for papers that use their codes for addition to the ASCL entry even if those papers don’t formally cite the codes, thus increasing the transparency of that research and capturing the value of their software to the community.

Download poster (jpg)