ADASS Birds of a Feather session

Several members of the ASCL’s Advisory Committee are panelists for a Birds of a Feather session that seeks answers to the following questions:

How do we ensure code release is recognized as an essential part of assuring reproducibility of research?

How can the community change the culture so developers will release their programs?

What can we do to ensure code authors receive credit for writing and releasing their software, and encourage them to release it even if it’s “messy” code?

How do we reduce expectations of support when a developer does not wish to (or cannot) take on that role after program release?

What role might journal publishers and funding agencies have in furthering code release, and how can the community influence them to take on that role?

How can universities be convinced to change policies which prohibit software publication?

Can funding agencies and publishers encourage documentation of programs, and if so, how?

Got answers? Ideas? Comments? Please share!

2 thoughts on “ADASS Birds of a Feather session”

  1. Alice Post author

    Below are Kim DuPrie’s notes from the BoF session; please feel free to weigh in!

    BOF notes:

    distinguish between algorithms and source code
    code needs to be rewritten for various reasons.
    – This shouldn’t prevent people from publishing code
    – When re-writing code it’s helpful to see original version

    making code available is difficult: need to document, make portable, support, etc.
    – Some people release code through MATLAB or R. Could our community use this?
     – We use many different languages, so this probably isn’t feasible

    Open source is good
    – Can learn a lot about how things work, especially if revision history is available
    – Even if the code doesn’t work on your system, you can reimplement if you have access to the code
    – Having the algorithm isn’t enough, you need to know how to implement it
    – Conversely, if the algorithm is available, someone may have a better way to implement it
    – GitHub is good because it lets you track history and submit tickets, and it encourages collaboration

    It’s good to release messy code, because it’s good for CS students to learn from
    – CS students can clean up code
    – People will find bugs you missed, and sometimes fix them as well
    – If we can identify codes that are heavily used, CS can concentrate on cleaning up that code
    – Encourage people to cite your code so others can see how heavily it’s used

    Open source is great for getting algorithms out there, but how can we account for validation of code?  (i.e. are there good test cases?)
    – published code (refereed) has to be tested and proven or else it won’t get published
    – perhaps we should have an FAQ saying what is required before making code available (i.e. X amount of testing, test suite, etc.)
    – Code written today may not work tomorrow (libraries disappear, OSes change, etc.). That’s why it’s important to publish algorithms as well.

    When you release messy code it becomes quite onerous
    – get lots of emails complaining it doesn’t work, or is too buggy: some people want to release messy code just to display the algorithm, but they get lots of email about the code
    – There’s an open source license called CRAPL by Matt Might that basically says don’t bother me with email

    If you want to publish code, as a project manager you should account for funding for this at the beginning of the project.
    – In a lot of cases there is no funding for this, so people don’t have the resources to make it available.
    – Project managers should help staff in their career development. Having them publish their code is a good way to do this. This would also make sure that the code is well written

    Publishing algorithms may not be enough, publishing a design pattern is better

    NSF requires people to publish data and encourages them to release software.

    There’s a difference between software written by Joe Blow to solve his problem, software written for a big project, and algorithms.  These 3 things shouldn’t all be treated the same way.

    If interested in usefulness and reproducibility, there’s no need to release code, just need a web service
    – The usefulness of open source is that the code can be improved by others

    Why do scientists write code? It should be done by IT people
    – There’s always someone who’s better at writing code. How far do you pass it off?
    – But some people are really really bad at writing code. In that case you should work with IT people, and then maybe you’ll be more willing to share the code with the community.
    – cost is a factor: may not have money for developers, or may choose to spend money on something else
    – might be too difficult to explain to IT people what code you want them to write
    – scientists feel they can do it all themselves: we should try to change this belief starting with undergraduates.
    – scientists sometimes don’t trust other people’s code

    Create a very high-level programming language that a scientist can easily use, where the guts are written by IT
    – How do we do this?

    How to encourage people to release code?
    – Use public source repositories like GitHub, etc., that make it easier to release code
    – people put more effort into code they release than code written for personal use. They may not want to release personal-use code because they have to create a makefile, tarball, etc.
    – even source code repositories can disappear after a while. This is a concern

    How can journals encourage people?
    – need to distinguish between big projects like ALMA and code written by an individual.
    – referee should ask the individual for the code and validation
    – develop guidelines for different types of papers

    Some code takes experience to run. Its developers are happy to let people use it as a black box, but if they publish the code they’re afraid people will use it incorrectly and that this will reflect badly on the developers.
    – ASCL does not require you to give us the code, we just put in pointers

  2. Pingback: Astrophysics Code Sharing II: The Sequel at AAS 223 – ASCL.net
