Data play: Social coding sites

I’ve posted before about where the codes are; here’s a pie chart that shows the relative use of GitHub, Google Code, Bitbucket, and SourceForge. Please note that because all the Starlink codes are in one GitHub repo, that repo is represented only once in the chart below. Want to do your own analysis? The site links (1080 of them at the moment, as some codes have more than one) are available here.

pie chart showing the relative use of GitHub, Google Code, Bitbucket, and SourceForge
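For anyone taking up that invitation, here is a minimal sketch of one way to tally the links by site, assuming you have the site links as a list of URL strings. The `SITES` mapping, the function names, and the example URLs are hypothetical, chosen for illustration; this is not the script used to make the chart. Exact duplicate URLs are counted once, in the spirit of counting the shared Starlink repo only once, and the shares are relative to links on these four sites only.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical mapping from host domain to the site name used in the chart.
SITES = {
    "github.com": "GitHub",
    "code.google.com": "Google Code",
    "bitbucket.org": "Bitbucket",
    "sourceforge.net": "SourceForge",
}


def site_for(url):
    """Return the social coding site a URL points to, or None if it is elsewhere."""
    host = urlparse(url).netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    for domain, name in SITES.items():
        # Match the domain itself or any subdomain (e.g. project.sourceforge.net).
        if host == domain or host.endswith("." + domain):
            return name
    return None


def site_shares(urls):
    """Percentage of social-coding-site links held by each site.

    Exact duplicate URLs are counted once; links to other hosts are ignored.
    """
    counts = Counter(
        name for name in (site_for(u) for u in set(urls)) if name is not None
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical example links, not taken from the ASCL data.
    links = [
        "https://github.com/example/repo",
        "https://github.com/example/repo",  # duplicate, counted once
        "http://code.google.com/p/example-code/",
        "https://bitbucket.org/someone/example",
        "http://example.edu/~someone/code.tar.gz",  # not a social coding site
    ]
    print(site_shares(links))
```

Links to other hosts (personal or institutional pages, for example) are ignored, so the percentages describe how the four sites compare with one another rather than what fraction of all codes they hold.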

ASCL visit to NIST

On Thursday, February 12, I visited the National Institute of Standards and Technology (NIST) in Gaithersburg, MD, to present a seminar titled “Restoring reproducibility: Making scientist software discoverable” to the research reproducibility users’ group there. My hosts were Chandler Becker and Robert Hanisch, and I also had the opportunity to talk with Jim Warren before the presentation; he asked excellent questions during the Q&A, too. Bob and I have often discussed (even argued about!) how much metadata the ASCL should maintain, and Jim’s questions were on this point.

After the presentation, I talked with Dan Wheeler, Kimberly Tryka, Andrea Medina-Smith, and Jonathan Guyer. Dan had excellent ideas for the ASCL; because we were standing by the conference room door, I didn’t have the opportunity to write them down, but I hope to continue the discussion via email. Kimberly, Andrea, and I talked about metadata, indexing software, and how the ASCL maintains its links to software download sites. We would like to find a way to share this kind of discussion with a larger community and have already started talking about how to do this via email. Jonathan and I talked generally about the ASCL and how change can occur in a community. After that, Chandler took me to the NIST museum (so cool!) and Bob showed me around a bit before my departure. I had a very interesting and thoroughly enjoyable afternoon!

The abstract and PowerPoint file for my presentation are below; the notes in the slides provide most of the text of my talk, though sometimes simply as bullet points.

Abstract: Source codes are increasingly important for the advancement of science in general and astrophysics in particular. Journal articles meant to detail the general logic behind new results and ideas often do not make the source codes that generated these results available, decreasing the transparency and integrity of the research. The Astrophysics Source Code Library (ASCL) is a registry of scientist-written software used in astronomy research. The challenges of creating and growing the resource will be covered by its current editor, who will also discuss specific steps the ASCL has taken to improve code discovery in astronomy and the effect this work is having within astronomy and more broadly in other research areas.

NIST presentation slides, February 12 (PowerPoint)

January 2015 additions to the ASCL

Licensing Astrophysics Codes session at AAS 225

On Tuesday, January 6, the ASCL, AAS Working Group on Astronomical Software (WGAS), and the Moore-Sloan Data Science Environment at NYU sponsored a special session on software licenses, with support from the AAS. This subject was suggested as a topic of interest in the Astrophysics Code Sharing II: The Sequel session at AAS 223.

Frossie Economou from the LSST, chair of the WGAS, opened the session with a few words of welcome and stressed the importance of licensing. I gave a 90-second overview of the ASCL before turning the podium over to Alberto Accomazzi from the NASA Astrophysics Data System (ADS), who introduced the panel of speakers (opening slides). The six panelists discussed different licenses and the considerations that arise when choosing one; they also covered institutional concerns about intellectual property, government restrictions on exporting codes, concerns about software beyond licensing, and how much software is licensed and what characterizes that software. Alberto then opened the floor for discussion and questions, and Frossie returned to the podium for closing remarks.

photo of audience at licensing session

Discussion period moderated by Alberto Accomazzi

Presentations
Some of the main points from each presentation are summarized below, with links to the slides used by the presenters.

    • Copy-left and Copy-right, Jacob VanderPlas (eScience Institute, University of Washington)
      Jake urged everyone to always license their code because, in the US, copyright law defaults to “all rights reserved” unless otherwise specified. He pointed out that “free software” can refer to the freedoms available to users of the software. He covered the major differences between BSD/MIT-style “permissive” licensing and GPL “sticky” licensing while acknowledging that the choice between them can be a contentious issue.
      slides (PDF)
    • University tech transfer perspective on software licensing, Laura L. Dorsey (Center for Commercialization, University of Washington)
      Laura stated that universities care about software licenses for a variety of reasons, including limiting the university’s risk, respecting IP rights, complying with funding obligations, and retaining academic and research use rights. She also covered factors software authors may care about, among them receiving attribution, controlling the software, and making money. She reinforced the importance of licensing code and discussed the common components of a software license.
      slides (PDF)
    • Relicensing the Montage Image Mosaic Engine, G. Bruce Berriman (Infrared Processing and Analysis Center, Caltech)
      In last year’s Astrophysics Code Sharing session, Bruce had discussed the limitations of the Caltech license under which Montage was released; since then, Montage has been relicensed under a BSD 3-Clause License. Following on the heels of Laura’s talk and serving as a case study for institutional concerns regarding software, Bruce related the reasons for and concerns about the relicensing, and described working with the appropriate office at Caltech to bring about this change.
      slides (PDF)
    • Export Controls on Astrophysical Simulation Codes, Daniel Whalen (Institute for Theoretical Astrophysics, University of Heidelberg)
      image of presentation slide

      Restricted algorithms; image by Adam M. Jacobs

      Dan’s presentation covered some of the government issues that research codes raise, including why certain codes fall under export controls; a primary reason is to prevent the development of nuclear weapons. Dan also discussed how foreign intelligence agencies collect information and which specific simulations are restricted, and stated that federal rules are changing, but slowly.
      slides (PDF)

    • Why licensing is just the first step, Arfon M. Smith (GitHub Inc.)
      Arfon went beyond licensing in his presentation to discuss open source and open collaborations, and how GitHub delivers on a “theoretical promise of open source.” He shared statistics on the growth of collaborative coding on GitHub, demonstrated how a collaborative coding process can work, and pointed out that this open process increases and spreads community knowledge. He challenged the audience to consider the many reasons for releasing a project and to ask themselves what kind of project they want to create.
      slides (PDF)
    • Licenses in the wild, Daniel Foreman-Mackey (New York University)
      First, I have to note that Dan made it through 41 slides in just over the six minutes allotted for his talk, about seven slides per minute; I don’t know whether to be more impressed with his presentation skills or the audience’s information-intake abilities!

      17% of GitHub repositories examined are licensed

      Percentage of licensed GitHub repos; image by Arfon Smith

      After declaring that he knows nothing about licensing, Dan showed us (and how!) that he knows plenty about mining data and extracting information from it. From his “random” selection of 1.6 million GitHub repositories, he noted with some glee that 63 languages are more popular on GitHub than IDL is, that the share of repositories with licenses has increased since 2012 to 17%, and that only 28,972 of the 1.6 million mention the license in the README file. Dan also determined the popularity of various licenses overall and by language and shared that information as well.
      slides (PDF)
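Dan’s README figure works out to 28,972 / 1,600,000, or roughly 1.8% of the repositories he examined. For readers curious how such counts might be produced, below is a minimal sketch of two simple checks, assuming you already have each repository’s top-level file names and README text. The regular expressions and function names are hypothetical heuristics for illustration, not the method Dan used; real license detection would need to handle many more file names and phrasings.

```python
import re

# Hypothetical heuristics; not the method used in the talk.
LICENSE_FILE = re.compile(r"^(licen[cs]e|copying)(\.[a-z]+)?$", re.IGNORECASE)
LICENSE_MENTION = re.compile(r"\blicen[cs](e|ed|ing)\b", re.IGNORECASE)


def has_license_file(filenames):
    """True if any top-level file looks like LICENSE, LICENCE, or COPYING."""
    return any(LICENSE_FILE.match(name) for name in filenames)


def readme_mentions_license(readme_text):
    """True if the README text uses a word such as 'license' or 'licensed'."""
    return bool(LICENSE_MENTION.search(readme_text or ""))


def summarize(repos):
    """Given (filenames, readme_text) pairs, return the percentages of repos
    with a license file and of repos whose README mentions a license."""
    total = with_file = with_mention = 0
    for filenames, readme_text in repos:
        total += 1
        with_file += has_license_file(filenames)
        with_mention += readme_mentions_license(readme_text)
    if total == 0:
        return 0.0, 0.0
    return 100.0 * with_file / total, 100.0 * with_mention / total


# Arithmetic from the talk: share of READMEs that mention the license.
print(f"{100 * 28_972 / 1_600_000:.1f}%")  # 1.8%
```

Keeping the two checks separate mirrors the distinction drawn in the talk between repositories that carry a license and those whose README actually mentions it.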

Open Discussion
After Dan’s presentation, Alberto Accomazzi opened the floor for discussion. Takeaway points included:

  • Discuss licensing with your institution; it likely has an office or staff devoted to dealing with these issues
  • This office is likely very familiar with the issues you bring to it, including whom to refer you to when an issue is outside its purview
  • “Friends don’t let friends write their own licenses.” In other words, select an existing license rather than writing your own
  • License your code
  • Let others know how you want your code cited/acknowledged

My thanks to David W. Hogg, Kelle Cruz, Matt Turk, and Peter Teuben for their work (which started last March!) on developing the session, to Alberto for his excellent moderating, and to Frossie for opening and closing it. My thanks also to the wonderful Jake, Laura, Bruce, Dan W, Arfon, and Dan F-M for presenting at this session, and to the Moore-Sloan Data Science Environment at NYU and the AAS for their sponsorship.

Resources
Many resources on licensing, including excellent posts by Jake and Bruce, can be found here.

ASCL poster at AAS

poster discussing ASCL enhancements, including one-click author search and multiple browsing options

Abstract: The Astrophysics Source Code Library (ASCL, ascl.net) is a free online registry of codes used in astronomy research. Indexed by ADS, it now contains nearly 1,000 codes and, with recent major changes, is better than ever! The resource has a new infrastructure that offers greater flexibility and functionality for users, including an easier submission process, better browsing, one-click author search, and an RSS feed for news. The new database structure is easier to maintain and offers new possibilities for collaboration. Come see what we’ve done!

Authors: Alice Allen (ASCL), Judy Schmidt (ASCL), Bruce Berriman (IPAC/Caltech), Kimberly DuPrie (ASCL/STScI), Robert J. Hanisch (NIST), Jessica D. Mink (SAO), Robert J. Nemiroff (MTU), Lior Shamir (LTU), Keith Shortridge (AAO), Mark B. Taylor (UBristol), Peter J. Teuben (UMD), John F. Wallin (MTSU)

December 2014 additions to the ASCL

Fourteen codes were added to the ASCL in December 2014:

BRUCE/KYLIE: Pulsating star spectra synthesizer
Cheetah: Starspot modeling code
CRPropa: Numerical tool for the propagation of UHE cosmic rays, gamma-rays and neutrinos
DAMIT: Database of Asteroid Models from Inversion Techniques
GeoTOA: Geocentric TOA tools
HMF: Halo Mass Function calculator
Hrothgar: MCMC model fitting toolkit
MMAS: Make Me A Star
PIAO: Python spherIcAl Overdensity code
SoFiA: Source Finding Application
SOPHIA: Simulations Of Photo Hadronic Interactions in Astrophysics
TraP: Transients discovery pipeline for image-plane surveys
URCHIN: Reverse ray tracer
UTM: Universal Transit Modeller

 

Software licensing resources

Below, a list of informative, interesting (or both!) writings about software licensing; the ASCL doesn’t necessarily agree with all positions in these articles, but we want to know what people are thinking even when we don’t agree with them.

EUDAT License Wizard
http://www.eudat.eu/news/eudat-license-wizard-guides-you-through-legal-maze
http://ufal.github.io/lindat-license-selector/

A Quick Guide to Software Licensing for the Scientist-Programmer
By Andrew Morin, Jennifer Urban, Piotr Sliz
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002598

Relicensing yt from GPLv3 to BSD
By Matthew Turk
http://blog.yt-project.org/post/Relicensing.html

Best Practices for Scientific Computing
By Greg Wilson, D. A. Aruliah, C. Titus Brown, Neil P. Chue Hong, Matt Davis, Richard T. Guy, Steven H. D. Haddock, Katy Huff, Ian M. Mitchell, Mark Plumbley, Ben Waugh, Ethan P. White, Paul Wilson
http://arxiv.org/abs/1210.0530v4

The Whys and Hows of Licensing Scientific Code
By Jake VanderPlas
http://www.astrobetter.com/the-whys-and-hows-of-licensing-scientific-code/

Licensing your code
ASCL blog post https://ascl.net/wordpress/?p=726 lists the following:

Making Sense of Software Licensing
Choose a license
Open Source Initiative also offers information on licenses
White paper from the Software Freedom Law Center
Bruce Berriman’s post on relicensing Montage

The Gentle Art of Muddying the Licensing Waters
By Glyn Moody
http://blogs.computerworlduk.com/open-enterprise/2014/08/the-gentle-art-of-muddying-the-licensing-waters/index.htm

STM open license suggestions and aftermath

Open Access Licensing
Don’t Muddy the “Open” Waters: SPARC Joins Call for STM Association to Rethink New Licenses
Global Coalition of Access to Research, Science and Education Organizations calls on STM to Withdraw New Model Licenses
STM response to ‘Global Coalition of Access to Research, Science and Education Organisations calls on STM to Withdraw New Model Licenses’
New “open” licenses aren’t so open

Interesting talk on ITAR
http://www.state.gov/e/stas/series/154211.htm
Discusses dual-use technologies, the category under which codes fall for ITAR purposes. These technologies are governed by the Wassenaar Arrangement, whose participating countries meet three times a year to decide what restrictions to put on dual-use technologies. Dr. James Harrington was the speaker; slides are available on that page.

AAS Software Events: The Short List

A short list, without the descriptions, other information, and Saturday-start bootcamp that the longer list has, because short is beautiful, too! Some events may require registration or charge a fee.

Astropy Tutorial, Sunday, 8:00-11:00 (Tutorial)
Location: 612 (Convention Center)

SciCoder@AAS: Intro to Databases for Astronomers, Sunday, 9:00-5:00 (Workshop)
Location: 607 (Convention Center)

Astrostatistics, Sunday, 9:30-6:00 (Workshop)
Location: 618/619 (Convention Center)

Collaborating Online with GitHub and Other Tools, Sunday, 12:00-5:00 (Workshop)
Location: 303 (Convention Center)

232. Licensing Astrophysics Codes: What You Need to Know, Tuesday, 2:00-3:30 (Special Session)
Location: 615 (Convention Center)

Software Publication Special Interest Group (SPSIG) Inaugural Meeting, Tuesday, 3:45-4:45 (Special Interest Group meeting)
Location: 615 (Convention Center)

Catalogs, Surveys, and Computation Posters, Wednesday, 9:00-5:30

315. Astroinformatics and Astrostatistics in Astronomical Research: Steps Towards Better Curricula, Wednesday, 10:00-11:30 (Special Session)
Location: 620 (Convention Center)

The SKA Telescope: Global Project, Revolutionary Science, Extreme Computing Challenges, Wednesday, 12:30-3:30 (Splinter Meeting)
Location: 4C-4 (Convention Center)

332. Catalogs/Surveys/Computation – UVOIR, Wednesday, 3:10-3:20 PM (Oral Session)
Location: 620 (Convention Center)

434. Computation, Data Handling and Other Matters Posters, Thursday, 9:00-2:00

Hack Day, Thursday, 10:00-7:00 (Workshop)
Location: 4C-2 (Convention Center)

Update: Where the codes are; also, a bit about citing software

This is an update on figures I’ve previously shared (most recently here). Currently, the ASCL indexes 977 codes. The percentages of these codes housed on social coding sites are:

GitHub: 8.1%
SourceForge: 4.2%
Code.Google: 2.8%
Bitbucket: 1.3%

This gives us 16.4% of codes listed on the ASCL housed on a public social coding site, an increase of 5.4 percentage points since February. Most of this growth came from GitHub (up from 4.2% in February), though the percentages for all four sites have increased.
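As a quick sanity check on these figures, the sketch below recomputes the derived values, reading the 5.4% increase as percentage points. Because the individual percentages are rounded, everything derived from them is approximate: the four shares sum to 16.4%, the February total comes out to about 11.0%, GitHub’s rise accounts for about 3.9 of the 5.4 points (roughly 72% of the growth), and 16.4% of 977 codes is about 160 codes.

```python
# Figures from the post (rounded to 0.1 percentage point).
shares = {"GitHub": 8.1, "SourceForge": 4.2, "Code.Google": 2.8, "Bitbucket": 1.3}
total_codes = 977
increase_since_feb = 5.4  # read as percentage points
github_feb = 4.2

total = round(sum(shares.values()), 1)
feb_total = round(total - increase_since_feb, 1)
github_gain = round(shares["GitHub"] - github_feb, 1)
github_fraction_of_gain = round(100 * github_gain / increase_since_feb)
approx_codes = round(total_codes * total / 100)

print(total, feb_total, github_gain, github_fraction_of_gain, approx_codes)
# 16.4 11.0 3.9 72 160
```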

As I said in February, I expect the percentage of codes on social coding sites to continue to grow, especially since GitHub’s use is increasing quickly in the community. One factor some credit for this increase is that GitHub has made it easy to push code to Zenodo for archiving and DOI minting, which provides another way to cite code.*

As mentioned in my previous post, how codes are cited varies. Software citation will be the main topic at Tuesday’s inaugural Software Publishing Special Interest Group meeting at AAS 225, which will be held at 3:45 PM in Room 615 of the Convention Center. If you are at AAS this week, you are welcome to attend; I hope to see you there!

 

*It was reported at .Astronomy6 that “some astro journals won’t even accept a DOI as a citation.” I don’t know which journals, and I hope someone will enlighten me; I would like to know the rationale for that stance and would gladly take this up with publishers.