Reusability

Note

These standards are designed to facilitate reuse of model code, which in principle supports reproducibility claims and verification of model results. Comments and suggestions are welcome and will be carefully considered by the OMF Working Groups and Membership. The goals and minimum implementation standards aim to capture concerns and practices among the members of OMF. Individual application domains may extend these standards to capture additional context relevant to their domain.

Overview of Reusability Standards

In this document we adopt the reuse terminology defined in the FAIR Principles for Research Software.

Reusability implicitly includes usability and focuses on the ability of humans and machines to execute, inspect, and understand the software so that it can be modified, built upon, or incorporated into other software.

Goals for Reusability Standards

A reusable computational model can be executed, understood, modified, built upon, or incorporated into other software.

Minimal Reusability Standards

The minimal standards are a set of guidelines that journals can adopt to ensure that submitted publications meet baseline reproducibility and reusability requirements.

Reusable computational models must:

meet OMF minimal standards for Accessibility and Documentation
have a clear and accessible open source, OSI approved license
include detailed metadata that facilitate reuse (e.g., input and output semantics, data types, units)
include detailed provenance on authorship and contributions
provide qualified information on all software and system dependencies with versions (operating system, software and system libraries)
provide clear instructions on how to execute the software
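
As an illustration of the metadata requirement above (input and output semantics, data types, units), the following is a minimal sketch of a machine-readable variable description. The class name, field names, and the two example variables are hypothetical and are not prescribed by OMF; a real model would more likely follow an established metadata schema from its own domain.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class VariableMetadata:
    """Describes one model input or output (hypothetical field names)."""
    name: str
    role: str          # "input" or "output"
    description: str   # what the variable means (its semantics)
    data_type: str     # e.g., "float", "integer", "string"
    unit: str          # unit of measurement, e.g., "mm/day"


# Hypothetical variables, used only to illustrate the record layout.
VARIABLES = [
    VariableMetadata("rainfall", "input",
                     "Daily precipitation per grid cell", "float", "mm/day"),
    VariableMetadata("population", "output",
                     "Number of agents alive at the end of a run", "integer", "count"),
]


def to_json(variables):
    """Serialize the variable descriptions to a JSON string."""
    return json.dumps([asdict(v) for v in variables], indent=2)


if __name__ == "__main__":
    print(to_json(VARIABLES))
```

Keeping such a record in a plain, open format next to the code lets both humans and machines check what a model expects and produces without reading the source.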

Ideal Reusability Standards

In order to meet the ideal standards, computational models should:

favor open file formats for data inputs and outputs (e.g., CSV, netCDF, GeoJSON, Parquet, Feather)
provide durable containerization recipes (i.e., archival quality container images)
include relevant output analyses, data pipelines, and/or workflows
include metadata on related research outputs (publications, other software, relationship)
use continuous integration services that run automated tests on the software
provide representative input data samples, along with the sampling methodology, for software with large compute or data requirements
adopt additional community-established, domain-specific standards
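
One way to provide a representative input sample together with its sampling methodology is sketched below, assuming simple random sampling with a fixed seed is appropriate for the data. The function names and the fields of the methodology record are hypothetical; stratified or domain-specific sampling may be a better fit for many models. The sample is written as CSV, one of the open formats listed above.

```python
import csv
import io
import random


def draw_sample(rows, fraction, seed):
    """Return a reproducible random sample plus a record of how it was drawn."""
    if not rows:
        raise ValueError("rows must not be empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    size = max(1, round(len(rows) * fraction))
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    # Sort the chosen indices so the sample keeps the original row order.
    indices = sorted(rng.sample(range(len(rows)), size))
    sample = [rows[i] for i in indices]
    methodology = {
        "method": "simple random sampling without replacement",
        "seed": seed,
        "fraction": fraction,
        "population_size": len(rows),
        "sample_size": size,
    }
    return sample, methodology


def to_csv(rows, fieldnames):
    """Serialize a list of dict rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical full input data set of 100 records.
    data = [{"id": i, "value": i * 2} for i in range(100)]
    sample, methodology = draw_sample(data, fraction=0.1, seed=42)
    print(methodology)
    print(to_csv(sample, ["id", "value"]))
```

Publishing the methodology record alongside the sample lets others regenerate the same subset from the full data set, or judge how representative it is.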

Cyberinfrastructure and Tools to Support Reusability Standards

Build Docker images from research code:

stencila/dockta https://github.com/stencila/dockta
ReproZip https://www.reprozip.org/
SciUnit https://github.com/scidash/sciunit
binder https://mybinder.org/
repo2docker https://repo2docker.readthedocs.io (used by binder)

Computational Archives:

OMF may consider developing scaffolding for common modeling frameworks that reduce friction of adoption

examples: https://github.com/uwescience/shablona and https://github.com/geodynamics/software_template
GitHub bot that can help improve compliance with minimal / ideal standards
cookiecutter project structure that supports best practices for reproducibility and reusability (e.g., Cookiecutter Data Science)

Examples and References

Lorena Barba’s reproducible workflow for computational fluid dynamics https://github.com/barbagroup/cloud-repro
https://carpentries-incubator.github.io/good-enough-practices/
http://www.practicereproducibleresearch.org/
Software Deposit Guidelines from SSI
Proposed Standards for Peer-Reviewed Publication of Computer Code
TODO: find or build example codebases that meet minimal and ideal standards

Issues / Errata

Dependencies on commercial or closed source products are acceptable so long as they are clearly qualified with version and operating system, e.g., MATLAB R2016b (Windows 10), AnyLogic 8.7 (Windows 10), ArcGIS 10.8.1 (macOS 10.15), NetLogo 6.2.1 (Ubuntu 20.04 LTS).
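
For dependencies that are installed as Python packages, a qualified string in the same "name version (operating system)" form can be generated rather than typed by hand. The sketch below uses only the Python standard library; the helper names are hypothetical, and commercial tools such as MATLAB would still need to be recorded manually.

```python
import platform
from importlib import metadata


def qualify(name, version, os_name, os_version):
    """Format a dependency as 'name version (os_name os_version)'."""
    return f"{name} {version} ({os_name} {os_version})"


def qualify_runtime():
    """Qualify the running Python interpreter against the current OS."""
    return qualify("Python", platform.python_version(),
                   platform.system(), platform.release())


def qualify_installed_package(package):
    """Qualify an installed package; raises PackageNotFoundError if it is absent."""
    return qualify(package, metadata.version(package),
                   platform.system(), platform.release())


if __name__ == "__main__":
    # Manually recorded example taken from the text above.
    print(qualify("MATLAB", "R2016b", "Windows", "10"))
    # Values detected from the environment running this script.
    print(qualify_runtime())
```

Note that platform.release() reports the kernel or OS release as seen by Python, which may differ from the marketing name of the operating system, so the output should be checked before it is published.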

Last modified 14.02.2024