Contact Us: wsb36[at]case.edu
Introduction:
The majority of accepted papers in biocomputing describe new computational approaches to relevant biological problems. Journals and conferences often require that software and source code be made available, yet trainees have few resources for maximizing the distribution and use of their software within the scientific community. Although the accepted standard is to release source code for new approaches in published work, system configuration issues, language and library version conflicts, and other implementation problems increasingly impede the broad distribution and availability of software tools. A variety of solutions to these problems exist, but the learning curve for applying them is steep. In this tutorial for the Pacific Symposium on Biocomputing, we will demonstrate tools and approaches for packaging and distributing published code.
What you will learn:
Our session co-chairs will provide hands-on experience creating the following: simple Docker containers that encapsulate code along with the libraries and software environments needed to run it (Dr. Brett Beaulieu-Jones); R and Python packages that contain validated source code and are distributed through widely available package repositories (Dr. Nicholas Wheeler); and Jupyter/Colab notebooks that combine analysis workflows with data visualization (Dr. Christian Darabos). In this highly interactive, informal session, participants are invited to bring laptops and work through each packaging process step by step. We hope that many PSB attendees with accepted papers will attend and package their own software for distribution on the PSB website. All participants will be provided with example code for use in the tutorial. For each packaging solution, we will give a brief overview and demonstration (approximately 15-20 minutes) followed by 20-25 minutes of supervised hands-on activity.
Workshop Organizers:
William S. Bush, Ph.D.
William S. Bush, Ph.D. is an Associate Professor in the Department of Population and Quantitative Health Sciences, and Assistant Director for Computational Methods in the Cleveland Institute for Computational Biology at Case Western Reserve University. Dr. Bush received his Ph.D. in Human Genetics from Vanderbilt University in 2008 and then continued as a post-doctoral fellow in the Neurogenomics Training Program at Vanderbilt. Dr. Bush was recently named a Mt. Sinai Health Care Foundation Scholar. As a human geneticist and bioinformatician, Dr. Bush’s research interests include understanding the functional impact of genetic variation, developing statistical and bioinformatics approaches for integrating functional genomics knowledge into genetic analysis, and using electronic medical records for translational research. Dr. Bush has attended PSB annually since 2010.
Brett Beaulieu-Jones, Ph.D.
Brett Beaulieu-Jones, Ph.D. is an Instructor in Biomedical Informatics in the Kohane lab at Harvard University. He received his Ph.D. from the Perelman School of Medicine at the University of Pennsylvania under the supervision of Dr. Jason Moore and Dr. Casey Greene. Dr. Beaulieu-Jones’ doctoral research focused on using machine learning methods to define phenotypes more precisely from large-scale biomedical data repositories, such as those contained in clinical records. He is currently performing large-scale data integration (genomic, therapeutic, imaging) both to better understand disease etiology and to provide precise therapeutic recommendations. His initial work focuses on developing targeted models of drug selection for patients with refractory epilepsy and on further developing machine learning methods that use longitudinal data to model how patients progress over time. Dr. Beaulieu-Jones has attended PSB since 2016.
Christian Darabos, Ph.D.
Christian Darabos, Ph.D. is the Assistant Director for Research Informatics in the Research, Teaching and Learning unit at Dartmouth College, where he supports the life-science informatics efforts of the scientific community. His research interests include network analysis and visualization of complex biomedical data and the development of machine learning techniques for large-scale datasets. Dr. Darabos has led, organized, and taught Software Carpentry workshops and a seminar series on Reproducible Research principles at Dartmouth and at higher education institutions around the country. He has attended PSB since 2011.
Nicholas Wheeler, Ph.D.
Nicholas Wheeler, Ph.D. is a Research Associate in the Institute for Computational Biology and the Bush lab at Case Western Reserve University. Dr. Wheeler is a macromolecular scientist and engineer by training with extensive expertise in using “big data” technologies for large-scale data aggregation and analysis. In the Bush lab, Dr. Wheeler manages genomic datasets and their associated metadata within a Spark/Hadoop cluster, using extensions to the open-source Hail platform for genomic analysis to ensure the standardization and reproducibility of experimental analyses. Over the course of his career, Dr. Wheeler has created, validated, and submitted multiple R and Python packages to public repositories. Dr. Wheeler has attended PSB since 2019.