
Creating and evaluating data management plans

I’m delighted to offer the following guest post by Jonathan Petters, Data Management Consultant, Johns Hopkins Data Management Services, and thank him very much for it!

In a recent discussion on preservation and sharing of research data, a few participants expressed their concern (paraphrased here) that “My research community doesn’t know how to create a quality data management plan” and “We don’t know how to evaluate data management plans.” The astronomy community explicitly requested a little guidance. We in Johns Hopkins University Data Management Services have developed a few resources, described below, that are of use in both developing and evaluating data management plans within all research disciplines, including astronomy.

Funding agencies have long encouraged and expected that data and code used in the course of funded research be made available to those in the research discipline. NSF is an important funder of astronomical research that has such expectations (and the agency I will focus on here). A few years ago NSF began requiring data management plans as part of research proposals, in part to aid in the dissemination and sharing of research data and code. Following a February 2013 Office of Science and Technology Policy memo, other US funding agencies are expected to follow suit with similar data management plan requirements, including the Department of Energy’s Office of Science.

What does NSF say about writing and evaluating quality data management plans? (A good overview of NSF data policies relevant for the AST community can be found in these slides from Daniel Katz, NSF.) In general the National Science Foundation (NSF) states that data management will be defined by “the communities of interest.” The NSF AST-specific policy further states “MPS Divisions will rely heavily on the merit review process in this initial phase to determine those types of plan that best serve each community and update the information accordingly.” Neither statement is especially prescriptive, and both can leave researchers unclear as to what they should do.

Creating a plan
While effective research data management certainly has community- and discipline-specific attributes, there ARE aspects of effective data management that are generalizable across research disciplines. It is around these general aspects that we in Johns Hopkins University Data Management Services (JHUDMS) devised our Data Management Planning Questionnaire. We work through this questionnaire with researchers at Johns Hopkins to help them create effective data management plans.

The Questionnaire is designed to comprehensively hit upon the important aspects of effective research data management (e.g. data inputs/outputs in the research, ethical/legal compliance, standards and formats used, intended sharing and preservation, PI restrictions on the use of the data).  By answering the applicable questions in the document, removing the questions/front matter and connecting the answers in each section into paragraphs, a researcher would be well on their way to a quality, well thought-out data management plan.
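The assembly step described above (answer the applicable questions, drop the questions and front matter, join the answers in each section into a paragraph) can be sketched in a few lines of Python. This is only an illustration, not a JHUDMS tool: the function name, the section names, and the sample answers are all hypothetical, and a real plan would still need editing by hand so the paragraphs read smoothly.

```python
def assemble_plan(answers):
    """Turn questionnaire answers into plan text, one paragraph per section.

    `answers` maps a section name to the list of answers given in that
    section; unanswered (empty) questions are skipped.
    """
    paragraphs = []
    for section, responses in answers.items():
        kept = [r.strip() for r in responses if r and r.strip()]
        if kept:  # drop sections where no question applied
            paragraphs.append(f"{section}. " + " ".join(kept))
    return "\n\n".join(paragraphs)


# Hypothetical section names and answers, for illustration only.
example_answers = {
    "Data products": [
        "The project will produce reduced spectra and the simulation code used to model them.",
        "",
    ],
    "Standards and formats": ["Spectra will be stored as FITS files."],
    "Ethical and legal compliance": [],
}

print(assemble_plan(example_answers))
```

Sections with no applicable answers (here, the empty ethical/legal section) simply drop out of the draft, which mirrors the advice to answer only the applicable questions.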

Two relevant side-notes:
1.)   For the Questionnaire we consider code and software tools as one ‘kind’ of research data; thus analysis or simulation codes used in the course of your proposed research should be included as a Data Product. While research code and research data generated or processed by code are clearly NOT the same, there are many similarities in managing the two. In both cases effective management should include consideration of documentation, licensing, formats, associated metadata, and upon what platform(s) the data or code could be shared.

2.)   Astronomy, like other disciplines, conducts a substantial amount of research through large collaborations (e.g. surrounding HST or SDSS data). In these cases it is typical for investments in research data infrastructure to be made, and for data policies/practices to be defined for those working with the data. Citing those policies and practices in a data management plan would be appropriate.

Screenshot of Reviewer Guide and Worksheet for Data Management Plans

Evaluating a plan
To help researchers evaluate data management plans for their quality, my colleagues developed the Reviewer Guide and Worksheet for Data Management Plans (dotx). This Guide and Worksheet is a complement to our Questionnaire; it is a handy checklist by which a grant reviewer can determine whether a researcher thoroughly considered the important aspects of research data management.

For those researchers saying to themselves, “The Questionnaire and Reviewer Guide are nice, but PLEASE just tell me what to do!!!”, I found two tweets from the code sharing session at the latest (223rd) AAS meeting in January to be quite relevant (h/t August Muench and Lucianne Walkowicz):

Who enforces software/data sharing in astronomy? YOU DO! WE DO! PEER REVIEW DOES! not snf/nasa #aas223 #astroCodeShare

It's UP TO YOU to include good data management plan as part of panel reviews. The community must enforce importance. #aas223 #astroCodeShare

I wholeheartedly agree with both tweets. It is up to the research community members to police and enforce the data management and sharing practices they would like to see in their community. That’s how peer review works! So the next time you review astronomical research proposals, look over the data management plans carefully and bring up relevant thoughts and concerns to the review panel.

Summing up
I hope the Data Management Planning Questionnaire and Reviewer Guide and Worksheet for Data Management Plans help you and other researchers in the astronomy community more fully develop expectations for data management and sharing practices. It’s likely your institution also has research data management personnel (like the JHUDMS at Hopkins) who are more than happy to help!

New papers to read

It’s not just astrophysics; other sciences are also grappling with issues surrounding software release, transparency of research, and collaboratively sharing codes.

The challenge of software licensing came up in the AAS 223 Special Session on code sharing; ASCL advisor Bruce Berriman followed up on this issue with a post on Astronomy Computing Today, and I’ve recently run across A Quick Guide to Software Licensing for the Scientist-Programmer, which also offers some guidance on this important issue.

Citations redux

I’ve recently learned that some citations to ASCL (and arXiv) entries are not caught by ADS because some BibTeX styles (.bst) don’t support the eprint field, which ADS uses when generating the BibTeX for ASCL and arXiv entries. The lack of support for the eprint field results in a citation that formats the ascl ID incorrectly; for ADS to be able to find and count the citation, the ascl ID needs to be formatted just as it appears in the code entry, e.g. ascl:1010.051 for NEMO. The arXiv site has a list of BibTeX styles that have been updated to support the eprint field, and Norman Gray’s nice urlbst code can add this functionality to existing .bst files.
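As a rough self-check before submitting a paper, one could scan the formatted reference list for correctly written ascl IDs. The sketch below is an assumption-laden illustration: the pattern (the prefix "ascl:" followed by four digits, a period, and three digits) is inferred only from the single example ascl:1010.051 above, and the function name is hypothetical. It says nothing about how ADS itself matches citations.

```python
import re

# Assumed ID shape, inferred from the example "ascl:1010.051":
# lowercase "ascl:", four digits, a period, three digits.
ASCL_ID = re.compile(r"\bascl:\d{4}\.\d{3}\b")


def find_ascl_ids(text):
    """Return every substring of `text` that looks like a well-formed ascl ID."""
    return ASCL_ID.findall(text)


good = "Teuben, P. 2010, Astrophysics Source Code Library, ascl:1010.051"
bad = "Teuben, P. 2010, Astrophysics Source Code Library, ascl 1010.051"

print(find_ascl_ids(good))  # ['ascl:1010.051']
print(find_ascl_ids(bad))   # []
```

An entry that comes back empty is worth a second look: the .bst style in use may not support the eprint field.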

(This information has been added to the Citing ASCL code entries page.)