Category Archives: discussion

An Open Discussion on Astronomy Software Splinter Meeting at AAS 233

Splinter Meeting: An Open Discussion on Astronomy Software
2:00 pm – 3:30 pm
Room 4C-4, Washington State Convention Center

The Astrophysics Source Code Library (ASCL) has organized a Splinter Meeting at January’s AAS  meeting. The session, An Open Discussion on Astronomy Software, is offered in recognition of the ASCL’s 20th anniversary.

Though progress has been made on various fronts, there is still work to be done to improve how astronomers (and other scientists) design, write, share, publish, maintain, archive, and receive credit, recognition, and steady positions for software. This open discussion on software will cover issues, topics, and questions attendees would like addressed, with a panel of software authors to reflect on the topics along with attendees. The session could potentially cover topics such as the sustainability of core astronomical software, whether astronomy should have a Decadal Plan for software and whether publishing need to change, and if so, how? Please submit the issues and questions you would like to see addressed this Google document ( The panel members are:

Megan Ansdell, University of California Berkeley
Rory Barnes, University of Washington
C.E. Brasseur, Space Telescope Science Institute (@cebrasseur)
Tess Jaffe, University of Maryland/NASA Goddard Space Flight Center
Mario Juric, University of Washington (@mjuric)
Amanda Kepley, National Radio Astronomy Observatory (@aakepley)
Rocio Kiman, City University of New York (@rociokiman)

The meeting will be moderated by Alice Allen (ASCL/UMD) and will end with celebratory food (yes, there will be cake!!) for the ASCL’s 20th anniversary.

Engineering Academic Software, Schloss Dagstuhl Day 0 and 1

Now that I’ve written extensively about days 2-4, I am cycling back to give day 1 its due, but first will say the sharing started on Sunday June 19 as people arrived for both the week-long Engineering Academic Software Perspectives Workshop and the three-day Information-centric Networking and Security Seminar; cake and coffee is available upon arrival, which gives folks an opportunity to meet, and conversation between participants in both workshops flowed easily. A Workshop participants on the site of the old schlossgroup of us decided to walk to the old castle ruins on the hill — up many steps — and it was on this little jaunt that I learned firsthand about stinging nettles (having only read about them before) with Andrei Chiș providing most of the hands-on instruction. He and I experimented with different nettles to see which produced the greatest stinging/welts; oh, the things we do for science! Those who had been unwilling victims of the plants provided more data points, and our quick survey leads us to suspect that the more mature the plant, the greater its “don’t touch!” defenses.

On Monday morning, we started with lightning introductions. We had been asked to create two slides, one on our relevant background and another on our interests for future use; a list of workshop participants and intro slides are available online. Tweet: "“I’m a programmer that somebody made into a professor.” @jurgenvinjuNext came presentations; first up was Dan Katz to talk about WSSSPE (Working towards Sustainable Software for Science: Practice and Experience), pronounced “wispy.” He shared the history of the organization, noting that three large annual meetings had been held, with several smaller interim meetings also having taken place. WSSSPE’s progression was to first identify challenges regarding software and best practices for sustainability, then to discuss solutions and ways to enable change, and then at WSSSPE3, to take action and encourage people to work in groups to put into practice the identified solutions. Katz’s presentation included an overview of each of the WSSSPE working groups and the progress each group has made. Some of the working groups overlapped with efforts taking place elsewhere; the Software Credit Working Group, for example, shared much in common with Force11’s Software Citation Working Group, so the decision was made to work on combining the two groups (which was successful) and for members to work on the Software Citation Principles that were being developed (which was also successful).

Katz also shared lessons learned from WSSSPE3 — what had worked, what could have worked better, and what didn’t work. He outlined what is planned for WSSSPE4 (taking place this September in Manchester), listing two tracks for the event: Building a sustainable future for open-use research software, which will concentrate on defining the future of open-use scientific software and initiating plans to arrive at this future; and Practices and experiences in sustainable software, which will concentrate on improving current practices. Katz concluded his talk by sharing links to the reports for WSSSPE1, WSSSPE2, and WSSSPE3 and the social media sites. PDF

The next presentation, Supporting Research Software Engineering, was by Mike Croucher. His talk focused on his work as an Engineering and Physical Sciences Research Council (EPSRC) Research Software Engineering Fellow; he is one of only seven to be awarded this new fellowship. He helps scientists improve their software in various ways, such as making it faster, more reliable/robust/user friendly, and more sustainable. (Now that I’ve typed that, Harder, Better, Faster, Stronger is playing on the radio in my head.) This has to be done carefully, for as Croucher put it: Tweet: "Users are afraid I’m going to “do computer science” to them. @walkingrandomly treads carefully when repairing user code."The phrase “do computer science to them” was echoed throughout the rest of the workshop; this idea — acknowledgement of that fear — seemed to resonate with many.

Croucher shared some of the outreach and education activities he’s been involved with, one of which was a (gentle) self-paced R tutorial held in a café. Volunteer facilitators walked around to answer questions, clarify information, and unstick people who got stuck and the session was a rousing success, so much so that there are now requests and expectations that more will be held!

It was acknowledged at the beginning of the day that academic software faces many challenges; Croucher’s presentation covered some of them, and included this stark slide:

Software is not valued in academizfollowed by:

Tweet: One of the major reasons I dropped out of my PhD was because I didn't believe academia could properly value software contributionsHe also mentioned the lack of funding for software activities, that soft-money researchers are discriminated against in favor of tenure-track and tenured faculty, and other issues. Oooo, he got a great discussion going with these and other points! In the active discussion, Cecilia Aragon made the point that we need to stop calling software “infrastructure,” as software has intellectual content.
Tweet: "Comment from @craragon : stop calling software 'infrastructure' because plumbers aren't invited to coauthor science papers"Though there are challenges, it’s not all bad — things are improving. The Software Sustainability Institute is funded by several organizations, the UK’s Engineering and Physical Sciences Research Council (EPSRC) has recognized the importance of software through funding of the Research Software Engineers, and a Horizon 2020 project to provide “substantial funding” for open source maths research software. Croucher’s vision of the future includes core-funded research software engineers and a hope for tenure awarded on the basis of software contributions. He closed his excellent presentation with concrete steps for changing the perception of software engineering and leading a change in culture. Slides

After a coffee break, the last presentation of the morning was given by Christoph Becker on Sustainability design.  He covered the challenges of sustainability, and referred to the “sustainability debt” Sustainability debt model across realms and widening effectsthat is mostly unknown for most systems. The effects of sustainability (or lack thereof) can be considered from several angles; one way to look at this debt is across economic, technical social, environmental, and individual aspects, and whether it has an immediate, enabling, or structural effect. The concerns about sustainability have inspired the Karlskrona Manifesto for Sustainability Design, which seeks to address sustainability across different aspects and widening effects. The Manifesto identifies eleven “misperceptions and counterpoints”, seeks to correct or mitigate them, and educate and advocate for a constructive approach to enabling a paradigm shift. Tweet: "Software projects are full of present-future trade offs, yet SE hasn't learned from behavioral economics @ChriBecker at #dagstuhleas"Becker is particularly interested in studying how people decide on the trade-offs they make when designing software, and using the insight gleaned to develop and implement methods and tools for making better choices. PDF

After Becker’s talk, we broke for lunch, then went into four breakout sessions for a good part of the afternoon; the four selected by participants from the many that had been proposed were:

  • Academic software project typology
  • Examining sustainability for a particular project
  • Making the intellectual content of software visible
  • Empirical survey of software practices in a domain

After working in our breakout sessions, we came back together to report on our progress. Wow, was there a lot of discussion! Everyone was very engaged in listening to, commenting on, and discussing the reports from the different groups. It was a very exciting afternoon, and discussion continued right up until we were forced to break for dinner.

As previously reported, we had a discussion in the evening as well. Monday was an excellent start to an outstanding week!

Engineering Academic Software, Schloss Dagstuhl Day 4

The morning of Day 4 of the Engineering Academic Software workshop opened with the mighty James Howison talking about the outputs expected from our work this week; these include a report of the meeting, the manifesto mentioned in my previous blog post, a draft document offering guidance for tenure committees on evaluating software contributions, a draft workplan for writing a proposal to establish an award for software contributions, a table of contents for a research software engineering handbook, and a sustainability debt use case, these last four from breakout session work.

The two talks on this morning might well be billed the Battle of Cool Places to Work. First up was Cecilia Aragon from the University of Washington on eScience Institute Initiatives. This work grew out of the realization that people were drowning in data, leading to the Moore/Sloan Data Science Environment, a $37.8M initiative at UCB, NYU, and UW. Tweet: ""Big data is two orders of magnitudes larger than you are used to dealing with now" - @craragon"The eScience Institute has a multi-pronged approach set up around science theme areas with bridges to data science methodologies; this sets up a cycle wherein research needs generate new methodologies, which enable more science. Two new roles were established, Data Science Fellows and Data Scientists. They also set up education and training, including workshops and bootcamps for data science, such as Software Carpentry and Astro Hack Week and a seminar series. They offer a new MS in data science that is interdisciplinary, involving six departments, and innovative, with a social science component that includes a human-centered viewpoints and ethics. This MS program is designed for working professionals, provides a rigorous technical program in statistics and computer science, and has evening courses and allows full or part time attendance. Tweet: "UW's MSc in Data Science includes training in "software hygiene" and pragmatic parts of Soft Eng"They have set up guidelines for reproducibility and offer help to people to improve this aspect of their work; eScience institute data scientists and others participate in a “drop-in” office hours program. They foster working relations with their working space and culture; people sit side-by-side to work on a problem. They see sharing a physical space as essential for data science and growing research software collaborations.

Aragon discussed the integration of ethnography (a qualitative field-based technique originally from anthropology that enables study of underlying patterns and themes) and evaluation into a wide range of the data science environment. Ethnography research tries to answer questions such as Who does data science?, How are they networked?, and What forms of social interaction do they use? Ethnographers at UW work with members of the community to interpret observations and to provide feedback on what works and what doesn’t. She reports they have had a lot of  success with “applied ethnography”.

She also discussed their data science incubator program, which was the precursor to the Data Science for Social Good program. They looked for high-impact data-intensive science projects that would benefit from quarter-long sprints of expertise, and had projects outlive the incubator, getting advances in both the science and the software and generating publishable "Human centered data science" initiative by @craragon impressive in its interdisciplinary diversity. Great stuff to learn fromresults for both. One project was to try to solve problem of homelessness in Seattle. It involved bringing data about homelessness into more manageable form and analyzing it to see what worked, conducting analysis to identify predictors of permanent housing, and looking for successful outcomes. Another project, Open Sidewalks, created sidewalk maps for low-mobility citizens to show the curb cuts are.

Aragon discussed the marketing that they do; they talk to a lot of people, and this has helped with engagement. They actively look for ways to build relationships and collaboration.
factors contributing to collaborative dynamicsAs with all the talks, participants in the room were very engaged, asked questions, and discussed various points. Aragon was asked about career paths and the backgrounds of those in the MS data science program; she said there were forty students in the first cohort and that it was a very heterogenous class, with people from many disciplines. The ethnography work has been discussed in a paper by Tanweer, Fiore, and Aragon. I hope the slides for this talk are released! There was a lot in it that I have not captured here. Screen Shot 2016-06-27 at 6.32.45 PMWhen someone gives a talk about their organisation and you want to quit your job, leave the country and go join them.
The “organization envy,” as one person in the room put it, continued with Rob van Nieuwpoort‘s talk on the Netherlands eScience Center. The Netherlands eScience center receives 5.4M€/year in permanent funding.
Screen Shot 2016-06-27 at 6.34.54 PMTheir responsibilities is demand-driven for all sciences; competition for funding and services is within disciplines, not between disciplines. They fund path-finding projects; this program is similar to UW’s incubator projects program. They also receive in-kind funding for eScience research engineers; these are broadly oriented scientists.

The eScience Center recognized early on that they wanted to give research engineers career paths; they offer three different paths: managerial, technical, and research. Asked by Katz whether research engineers have to have academic appointments, van Nieuwpoort stated that some researcher engineers do have academic appointments, but not all do. Vinju asked about educational opportunities, to which the answer was that yes, there are educational opportunities, including workshop and other training; this topic came up again a bit later.

They foster a collaborative rather than a competitive environment, with their engineers fully integrated into the scientific work, and domain scientists recognized for their contributions to software development. Research software engineers are coauthors on science papers to which they have contributed, and when software methods published, domain scientists are recognized with coauthorship.

Returning to education, van Nieuwpoort stated that research software engineers like learn, so the eScience Center keeps them challenged and learning with courses, hackathons, and sprints, and by switching disciplines and technologies.

Through eStep, an eScience technology platform, the eScience Center serves the 99% of Rob action shotscientists in the Netherlands that aren’t at the eScience Center. eSTEP goals are to prevent fragmentation and duplication; to promote exchange and reuse of best practices; to represent NLeSC’s expertise and knowledge, and to improve the science state of art with fundamental science research. There are key expertises used in many projects and projects use number of methodologies. NLeSC generalizes software for use in eSTEP; they find or develop state of the art and “best of breed” technologies and software matching their expertise areas that can be made generic and overarching and integrate that technology into eSTEP.

Sustainability is important to them, as is preventing duplication and fragmentation; they seek to build software that is worth sustaining and enforce software engineering best practicesTweet: OH sustainability only matters for things that are worth sustaining :-). They use Software Carpentry and Data Carpentry to educate their partners, maintain a knowledge base, and (be still, my heart!) offer a searchable software repository. And more! Slides (PDF)

Tweet: lots of cool projects and ideas at :)

Of course there was discussion (and funding envy, too); Kevin Crowston pointed out that permanent funding, as Rob’s eScience center has, solves a lot of issues. After a short break, we worked together (all of us in the same Google document, which was a little wild) on the Manifesto, sometimes tweeting out comments and questions, up until lunch.

Tweet: Has anyone got any stories about researchers who shared their code and got publicly mocked because of low quality?Tweet: Trying to "look far into future" as @dagstuhl manifestos are meant to do. What will research software be in 10, 20 years?Tweet: "Just became aware of #CodeMeta “a Rosetta stone of software metadata”"After lunch, we went into breakout sessions; these included sessions on future research questions, the research software engineering handbook, and the Force11 software citation suggestions. After working in these breakouts, we reconvened to share and discuss the progress that was made, breaking only for dinner at 6:00 PM because we had to.

A very busy, exciting, interesting, informative, and productive day!

Engineering Academic Software, Schloss Dagstuhl Day 3

The day started with a quick discussion about the afternoon; it is traditional for Schloss Dagstuhl seminars that Wednesday afternoons involve a social activity. It was determined on Tuesday that the activity was to be a hike some distance away from Dagstuhl with dinner after in another town, but several changes to these plans had to be ironed out and announced. After a few minutes spent on that, the morning session got underway and was furiously fast! This was an Open Mic, with participants having signed up while here to give short talks (ten minutes or less).

First up was Daniel Garijo on Software Metadata: Describing “dark software” in Geosciences. By “dark software,” he means that which is often hidden from view. He described the current state of the art for software description in geosciences and demonstrated, a semantic registry for scientific software, which currently includes information from several geosciences resources. As Ontosoft is not domain-specific, it has the capacity to expand into other fields as well. This is a very attractive and capable site. It uses a distributed approach to software registries and depends on crowdsourcing for metadata maintenance. The resource organizes software metadata using the OntoSoft ontology along six dimensions: identify software, understand and assess software, execute software, get support for the software, do research with the software, and update the software. Slideshare

Jurgen Vinju was next with Organising a research team around the research software around the research team in software engineering: Motivation, experiences, lessons. He talked about his experiences as the group leader of the SWAT (Software Analysis and Transformation) team at Centrum Wiskunde and Informatica (CWI), the national research institute for math and computer science in the Netherlands. tweet showing image of Jurgen presenting his Open Mic talkSWAT is all about the source code and supporting programmers to create more efficient, maintainable software. They work to understand and control software complexity to enable more and better tools. He made the point that research teams “prioritise for academic output which is not software.” He showed UseTheSource, a resource developed by CWI with contributions from other institutes and housing open-source projects related to software language engineering and metaprogramming. This allows more efficient programming by automating tasks that are cumbersome or hard, and allows synergies between software engineers, researchers, and industry. PDF
Tweet: A research team s not a software team. We have fewer resources. We need more investment in efficiency.

Dan Katz gave an overview of work done by the Force11 Software Citation Working Group; his presentation was titled Software Citation: Principles, Discussion, and Metadata. He provided Tweet: "Check out force 11 for progress in software citation"rationales for citing software, information on the WSSSPE and Force11 groups involved in developing software citation principles and the process used to develop them, and then the six principles, which focus on the importance of software, the need to credit and attribute the contributions software makes to research and to be able to uniquely identify software in a persistent and specific way, and that citations should enable access to the software and associated information about the software that informs its use. Katz brought up many of the discussions the WSSSPE and Force11 working groups had and their determinations, such as what software to cite, how to uniquely identify software, that peer-review of software is important but not required for citation, and how publishers can help.
Tweet: "It's more important to cite the software directly rather than a software paper"Each of the Open Mic sessions generated immediate discussion during the sessions and while the next presenter was setting up, and this session was no exception. When Katz pointed out that a common practice is to publish and cite papers about software (“software papers”), but that the Importance principle of the Force11 Working Group calls for the citation of the software itself, “on the same basis as any other research product”, this was countered with a comment that people should cite software papers if the software authors have requested that method of citation. Katz stated that could be done in addition to citing to the software, as one of his slides stated. The presentation concluded with information on the next steps for the Force11 Software Citation Working Group — to finalize the principles, and publish and circulate them for endorsement — and the likelihood of a Software Citation Implementation Group being formed to work with institutions, researchers, publishers, and other interested parties to put the principles into practice. PDF

Tweet: ""Software advisors are elected. It's a role people create when ask you questions" Katie Kuksenok"The fourth Open Mic talk was by Katerena Kuksenok on Best Practices (by any other name). This interesting talk looked atTweet: "User resistance: “I don’t want to use version control because I don’t want the world to see my terrible code.”" intersections of the technical, social, and cognitive aspects of software engineering in research, and asked how the available community and skill resources could be leveraged. brought together various elements brought up through the workshop so far, including different roles that had been identified, the need for software engineers to learn from scientists just as we hope researchers learn software engineering practices, Tweet: Mike Croucher "is s/w therapist/coach, helping scientists improve code...carefully; doesn't throw computer science at them!"and overcoming communications barriers. She referred back to a comment Mike Croucher had made in his talk on Monday, agreeing that software engineers should “do CS/SE with people not at them!” PowerPoint

After Kuksenok’s talk, I presented Restoring reproducibility: Making scientist software discoverable. This presentation was a quick overview of the ASCL, its history and a few of the changes to our infrastructure, the lessons we learned from Tweet: astrophysics source code library since 1999looking at what other astro code registries and repositories had done and what we did with those lessons, and some of the impact we have on the community. As with every other session, there was intermittent discussion, questions asked and answered, and conversation on the topic as I headed back to my chair and the next speaker set up. PowerPoint PDF

Robert Haines was up next with A Short* History of Research Software Engineers in the UK (*and probably incomplete). Before there were Research Software Engineers (RSE), there were RSEs going by other names, such as Post Doc and Research Assistant. These were the people in the lab who could code, andTweet" "#dagsRobert Haines reports on the coming to life of the job of “Research Software Engineer”, with jobs, a union, etc." fell “foul of publish or perish” because they were writing code rather than papers. RSEs might also have been hiding as those working in high performance computing or as a research group admin. He is an example of someone who has always done RSE work, though was not called an RSE until fairly recently. It was at a Software Sustainability Institute Collaborations Workshop in 2012 that there was a call to arm to recognize the Screen Shot 2016-06-26 at 12.34.52 PM contributions of those who write code rather than papers and are not purely researchers. They decided they needed a name, to unionize, and a policy campaign. He described the current environment, both the challenges and the positives, and shared that many people want to work in this field. Yes, discussion broke out in this session, too! It was remarkable how engaged everyone at the workshop was, and how often and easily discussion took place. PDF

Ralf presentingDan Katz made a very brief presentation and instigated more discussion on career paths when Robert Haines was finished, then after a brief coffee break, the morning Open Mic session continued with Ralf Lämmel‘s presentation intriguingly called Making a failing project succeed?! about the 101Companies project. He called 101Companies a software chrestomathyfrom chresto, meaning “useful” and mathein, meaning “to learn.” He shared other chrestomathies, such as the Hello World Collection and the Evolution of a Haskell programmer. (One of the previous links will lead you to a song about a popular beverage.) 101Companies is a resource for learning Tweet: "101 is a knowledge resource for technological space travel (between all kinds of online spaces)"more about software, for comparing technologies, for programming education, and can serve as “a playground for student projects.” He discussed some of the challenges the project is having and some of the ways in which it is succeeding. PDF

The last Open Mic talk of the morning was by Ashish Gehani giving a quick overview of his work on software, including software to make data more manageable, particularly the OCCAM: Object Culling and Concretization for Assurance Maximization project.

The last agenda item for the morning was to discuss the manifesto that is one of the required Tweets: "we discussed the #manifesto as genre in … section III. … is a great #longread"outputs for this workshop. This discussion was led by James Howison, who shared the link for the Google Doc that was to become the manifesto, and which was discussed and created in tandem (and wild abandon) by many in the room duTweet: "I was, uh, one of the authors of the EAS manifesto. The original EAS manifesto. Not the compromised second draft."ring the time remaining before lunch. The manifesto is our public declaration, our own call to action. Our work is only beginning at Schloss Dagstuhl; we must put what we have discussed here into practice. We shared other manifestos (manifesti!), determined authorship as opt-in (by adding our names to the author list), and talked about but did not determine where this might be published. I found the creation of this document interesting and inspiring, very much in line with the philosophy of “be the change you want to see in the world.”
Tweet: "According to James Howison software as communication between people should be studied."
After getting a good start on the manifesto, we broke for a longer than usual lunch period, after which some took a long hike with a lakeside stop for a refreshing beverage, and some did other things. I took a much-needed nap and then noodled around for a bit in the music room, view of the music room looking toward the piano from the far end a lovely large, long room with wonderful acoustics and a recently-tuned grand piano, two guitars, a cello, and a violin available. (I discovered later in the week that the violin case also holds a kazoo.) small ornate doorway decorated with naked cherubs and a shield with 1743 on itScores for solo and ensemble music are stocked in a room at one end of the music room, the (small) door to which is watched over by cherubs. Most of the Schloss is modern in appearance; this is one of the few rooms that reveals the building’s history. I found plenty of music to amuse myself with, including a collection of Bach preludes and fugues from the WTC apparently edited by Bartók and in what to me was a confusing order, and Beethoven sonatas that at one time I knew how to butcher. Others reported having taken shorter walks than the one that was organized, listening to podcasts, trying out the bicycles available for guests, and also napping.

As you have likely surmised by now, the Twitter hashtag for this event was , and the Twitter feed offers more pictures and information about this workshop.

Engineering Academic Software, Schloss Dagstuhl Day 2

Tuesday started with Jeffrey Carver from the University of Alabama presenting What we have learned about using software engineering practices in scientific software. They took a multi-pronged approach to studying scientific software, from conducting surveys and workshop to direct interactions and case studies. From survey work, his team was able to group problems scientists were having with their own software into four main areas: rework, performance, regression (testing), and forgetting bugs. From this, they could see what software engineering practices might help with solving the problems.

Case studies brought numerous lessons to light; they found that the use of higher-level languages was low, performance competes with other goals, and external software use can be seen as risky. Workshops highlighted some of the differences between scientist programmers and software engineers and their domains. Scientist developers often lack formal software engineering expertise but have deep knowledge of their domains and are often the main users of their software. Quality goals are different, too; scientists would rather software not run than return an incorrect result. This project demonstrated that there is a need to eliminate the stigma associated with software engineering and that software engineers need to understand domain constraints and specific problems. PDF

Every presentation sparked lively Q&A and discussion, often throughout the presentation, and this one was no exception. User stories beat data even for scientistsScreen Shot 2016-06-22 at 7.49.08 PM

The next presentation was Engineering yt by Matthew Turk from NCSA at the University of Illinois at Urbana Champaign. He provided context and information on this well-cited community-developed project, discussed how the community was built, and its adoption of a code of conduct. YTEP, yt Enhancement Proposals, provide a method to manage suggestions for improving yt. Communication methods within the community are well thought out. The challenges of creating and managing the community sparked a lot of discussion; large software projects can have many things go wrong. tweet about change
Failure Modes

Discussion among the group made Matt’s presentation run long, making it necessary to break for coffee before Matt’s talk was done and then return to it after the break. The group was very engaged throughout the day; fortunately, the schedule accommodated the frequent discussions in every presention very well.

After Matt’s talk, Caroline Jay (University of Manchester) and Robert Haines (Software Sustainability Institute) presented Software as academic output. They discussed software’s role in research, when it can be a tool that enables research or the actual research itself, and how this is different depending on the discipline and the functionality of the software within the discipline and the role of the person using the software. They made the point that “Software isn’t a separate thing — software could exist without the paper; the paper couldn’t exist without the software.”
Daniel's tweetChristoph's tweetOh, there was much more goodness in this presentation, which was interrupted by lunch, than I have time to report, including The Horror, as it was termed — the steps necessary for someone to replicate the computational work on one of the research projects this presentation covered — and Robert’s work on making this computational work software available in Docker. It also touched on the FAIR principles for computational research and academic software, and like the other presentations, generated lots of discussion, including conversations in the group on ethical considerations. PDFDan Katz's tweetOscar's tweet

The last formal presentation of the day, before the breakout workgroups, was by Claude Kirchner (INRIA) on the Software Heritage Project. He covered the rationale for this project, which includes the inconsiderate or malicious loss of code and the desire to preserve “our technical and scientific knowledge.” The Software Heritage Project has set out to preserve all the software. Yes, you read that correctly: All the software. Fortunately, a version of the slides for this presentation are online so you can see them for yourself! The site is scheduled to go live next week and I look forward to seeing it.

After Claude’s presentation, we went into breakout sessions.
Christoph's tweet re breakout groupsI joined a breakout session on getting a standing award for scientific contributions through software created. The other breakout sessions were on creating a research software engineering handbook and academic software project typology. All groups reported back before the day’s session ended for dinner. Quite an informative, exciting, and productive day!

Engineering Academic Software at Schloss Dagstuhl

I’m at Schloss Dagstuhl – Leibniz Center for Informatics for a week-long workshop on Engineering Academic Software. Some of the questions we are tackling have been discussed elsewhere, which we are taking into consideration as we talk about them here, and new questions were not only part of the seminar’s original description, but are arising throughout the general and break-out sessions. I would say we’re at the end of the first day but it continues on though it is past 10 PM, with a planned open and vibrant discussion on dogmas past and present. First up for discussion tonight was Agile project management; how do you feel about it? Is this a dogma that needs to be shot or embraced?

The hashtag to follow on Twitter is #dagstuhleas for the full-group discussions; the breakout sessions so far have been too intense for tweeting!

Improving Software Citation and Credit BoF session slides

On Tuesday, October 27, the ASCL held a Birds of a Feather session at ADASS on Improving Software Citation and Credit. The session was opened with a brief presentation by Bruce Berriman, who reported on a Software Publishing Special Interest Group meeting held at the January 2015 AAS meeting and the ongoing work that has come out of that. I followed with a quick overview of other efforts to improve software credit and citation, not just in astronomy but across disciplines, after which Keith Shortridge moderated a lively discussion among the forty people present. The slides Bruce and I presented are now available online.

Previously, we shared resources for the session and the Google doc created during the session to capture some of the main points from the discussion.

Resources for ADASS Birds of a Feather session: Improving software citation and credit

The ASCL has organized a Birds of a Feather session (BoF) at ADASS to discuss improving software citation and credit to be held on Tuesday, October 27; the following links may be helpful for the discussion.

Astronomy software citation examples and ideas (working [Google] document arising from AAS SPSIG discussion)

Astronomy software indexing workshop

Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE)

Force11 Software Citation Working Group (Mission statement, member list, timeline, communications plan, etc. on GitHub)

Center for Open Science‘s Transparency and Openness Promotion (TOP) Guidelines


Google doc created during the BoF session; anyone with the link can comment.

Licensing Astrophysics Codes session at AAS 225

On Tuesday, January 6, the ASCL, AAS Working Group on Astronomical Software (WGAS), and the Moore-Sloan Data Science Environment at NYU sponsored a special session on software licenses, with support from the AAS. This subject was suggested as a topic of interest in the Astrophysics Code Sharing II: The Sequel session at AAS 223.

Frossie Economou from the LSST and chair of the WGAS opened the session with a few words of welcome and stressed the importance of licensing. I gave a 90-second overview of the ASCL before turning the podium over to Alberto Accomazzi from NASA/Astronomy Data System (ADS), who introduced the panel of speakers and later moderated the open discussion (opening slides), after which Frossie again took the podium for some closing remarks. The panel of six speakers discussed different licenses and shared considerations that arise when choosing a license; they also covered institutional concerns about intellectual property, governmental restrictions on exporting codes, concerns about software beyond licensing, and information on how much software is licensed and characteristics of that software. The floor was then opened for discussion and questions.

photo of audience at licensing session

Discussion period moderated by Alberto Accomazzi

Some of the main points from each presentation are summarized below, with links to the slides used by the presenters.

    • Copy-left and Copy-right, Jacob VanderPlas (eScience institute, University of Washington)
      Jake extolled everyone to always license codes, as in the US, copyright law defaults to “all privileges retained” unless otherwise specified. He pointed out that “free software” can refer to the freedoms that are available to users of the software. He covered the major differences between BSD/MIT-style “permissive” licensing and GPL “sticky” licensing while acknowledging that the difference between them can be a contentious issue.
      slides (PDF)
    • University tech transfer perspective on software licensing, Laura L. Dorsey (Center for Commercialization, University of Washington)
      Universities care about software licenses for a variety of reasons, Laura stated, which can include limiting the university’s risk, respecting IP rights, complying with funding obligations, and retaining academic and research use rights. She also covered factors software authors may care about, among them receiving attribution, controlling the software, and making money. She reinforced the importance of licensing code and discussed the common components of a software license.
      slides (PDF)
    • Relicensing the Montage Image Mosaic Engine, G. Bruce Berriman (Infrared Processing and Analysis Center, Caltech)
      In last year’s Astrophysics Code Sharing session, Bruce had discussed the limitations of the Caltech license under which the code Montage was licensed; since then, Montage has been relicensed to a BSD 3-Clause License. Following on the heels of Laura’s discussion and serving as a case study for institutional concerns regarding software,  Bruce related the reasons for and concerns about the relicensing, and discussed working with the appropriate office at Caltech to bring about this change.
      slides (PDF)
    • Export Controls on Astrophysical Simulation Codes, Daniel Whalen (Institute for Theoretical Astrophysics, University of Heidelberg)
      image of presentation slide

      Restricted algorithms; image by Adam M. Jacobs

      Dan’s presentation covered some of the government issues that arise from research codes, including why certain codes fall under export controls; a primary reason is to prevent the development of nuclear weapons.Dan also brought up how foreign intelligence agencies collect information and what specific simulations are restricted, and stated that Federal rules are changing, but slowly.
      slides (PDF)

    • Why licensing is just the first step, Arfon M. Smith (GitHub Inc.)
      Arfon went beyond licensing in his presentation to discuss open source and open collaborations, and how GitHub delivers on a “theoretical promise of open source.” He shared statistics on the growth of collaborative coding using GitHub, and demonstrated how a collaborative coding process can work and pointed out that through this exposed process, community knowledge is increased and shared. He challenged the audience to contemplate the many reasons for releasing a project and to ask themselves what kind of project they want to create.
      slides (PDF)
    • Licenses in the wild, Daniel Foreman-Mackey (New York University)
      First, I have to note that Dan made it through 41 slides in just over the six minutes allotted for his talk, covering about seven slides/minute; I don’t know whether to be more impressed with his presentation skills or the audience’s information-intake abilities!

      17% of GitHub repositories examined are licensed

      Percentage of licensed GitHub repos; image by Arfon Smith

      After declaring that he knows nothing about licensing, Dan showed us, and how, that he knows plenty about mining data and extracting information from it. From his “random” selection of 1.6 million GitHub repositories, he noted with some glee that 63 languages are more popular on GitHub than IDL is, the number of repositories with licenses have increased since 2012 to 17%, and that only 28,972 of the 1.6 million mentioned the license in the README file. Dan also determined the popularity of various licenses overall and by language and shared that information as well.
      slides (PDF)

Open Discussion
After Dan’s presentation, Alberto Accomazzi opened the floor for discussion. Takeaway points included:

  • Discuss licensing with your institution; it’s likely there is an office/personnel devoted to deal with these issues
  • This office is likely very familiar with issues you bring to it, including who to refer you to when the issues are outside their purview
  • “Friends don’t let friends write their own licenses.” IOW, select an existing license rather than writing your own
  • License your code
  • Let others know how you want your code cited/acknowledged

My thanks to David W. Hogg, Kelle Cruz, Matt Turk, and Peter Teuben for work — which started last March! — on developing the session, to Alberto for his excellent moderating and to Frossie for opening and closing it. My thanks also to the wonderful Jake, Laura, Bruce, Dan W, Arfon, and Dan F-M for presenting at this session, and to the Moore-Sloan Data Science Environment at NYU and AAS for their sponsorship.

Many resources on licensing, including excellent posts by Jake and Bruce, can be found here.

Software licensing resources

Below, a list of informative, interesting (or both!) writings about software licensing; the ASCL doesn’t necessarily agree with all positions in these articles, but we want to know what people are thinking even when we don’t agree with them.

EUDAT License Wizard

A Quick Guide to Software Licensing for the Scientist-Programmer
By Andrew Morin, Jennifer Urban, Piotr Sliz

Relicensing yt from GPLv3 to BSD
By Matthew Turk

Best Practices for Scientific Computing
Greg Wilson, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, Katy Huff, Ian M. Mitchell, Mark Plumbley, Ben Waugh, Ethan P. White, Paul Wilson

The Whys and Hows of Licensing Scientific Code
By Jake VanderPlas

Licensing your code
ASCL blog post lists the following:

Making Sense of Software Licensing
Choose a license
Open Source Initiative also offers information on licenses
White paper from the Software Freedom Law Center
Bruce Berriman’s post on relicensing Montage

The Gentle Art of Muddying the Licensing Waters
by Glyn Moody

STM open license suggestions and aftermath

Open Access Licensing
Don’t Muddy the “Open” Waters: SPARC Joins Call for STM Association to Rethink New Licenses
Global Coalition of Access to Research, Science and Education Organizations calls on STM to Withdraw New Model Licenses
STM response to ‘Global Coalition of Access to Research, Science and Education Organisations calls on STM to Withdraw New Model Licenses’
New “open” licenses aren’t so open

Interesting talk on ITAR
Discusses dual-use technologies, which is what codes are under ITAR. These are governed by the Wassenaar Arrangement. The countries that participate meet 3x/year to decide what restrictions to put on dual-use technologies. Dr. James Harrington was the speaker. Slides available on that page.