Rapid storage and retrieval of genomic intervals from a relational database system using nested containment lists.


Wiley LK, Sivley RM, Bush WS.

Efficient storage and retrieval of genomic annotations based on range intervals is necessary, given the amount of data produced by next-generation sequencing studies. The indexing strategies of relational database systems (such as MySQL) greatly inhibit their use in genomic annotation tasks. This has led to the development of stand-alone applications that are dependent on flat-file libraries. In this work, we introduce MyNCList, an implementation of the NCList data structure within a MySQL database. MyNCList enables the storage, update and rapid retrieval of genomic annotations from the convenience of a relational database system. Range-based annotations of 1 million variants are retrieved in under a minute, making this approach feasible for whole-genome annotation tasks. Database URL: https://github.com/bushlab/mynclist.
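For readers unfamiliar with the NCList idea, the following is a minimal in-memory sketch in Python, not the MySQL implementation that MyNCList provides. It assumes half-open intervals `[start, end)`, and the names `Node`, `build_nclist`, and `query` are hypothetical, chosen only for illustration. The key property is that intervals contained in another interval are moved into that interval's sublist, so within any one list both starts and ends are in ascending order and a binary search can find the first candidate overlap.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    start: int
    end: int
    label: str
    children: list = field(default_factory=list)  # intervals contained in this one


def build_nclist(intervals):
    """Build a nested containment list from (start, end, label) tuples.

    Illustrative sketch only; intervals are assumed half-open [start, end).
    """
    top, stack = [], []
    # Sort by start ascending, then end descending, so containers come first.
    for start, end, label in sorted(intervals, key=lambda t: (t[0], -t[1])):
        node = Node(start, end, label)
        # Pop stack entries that do not contain the new interval.
        while stack and stack[-1].end < end:
            stack.pop()
        (stack[-1].children if stack else top).append(node)
        stack.append(node)
    return top


def _first_ending_after(nodes, pos):
    """Binary search: index of the first node whose end is greater than pos."""
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid].end > pos:
            hi = mid
        else:
            lo = mid + 1
    return lo


def query(nodes, qstart, qend):
    """Return labels of all intervals overlapping [qstart, qend)."""
    hits = []
    i = _first_ending_after(nodes, qstart)
    # Starts are ascending within a list, so stop at the first start >= qend.
    while i < len(nodes) and nodes[i].start < qend:
        hits.append(nodes[i].label)
        hits.extend(query(nodes[i].children, qstart, qend))
        i += 1
    return hits


# Hypothetical annotations, invented for this example.
annotations = [
    (100, 500, "geneA"),
    (120, 200, "exon1"),
    (300, 400, "exon2"),
    (450, 900, "geneB"),
    (150, 151, "snp"),
]
nclist = build_nclist(annotations)
print(query(nclist, 150, 151))
```

A relational version would need to store the sublist relationship in table columns and express the search as indexed queries; how MyNCList does this is documented in the repository linked above, and this sketch makes no claim about its schema.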

Posted in Publications, Will Bush.

Will Bush

William S. Bush, Ph.D., is a human geneticist and bioinformatician, and Assistant Professor within the Cleveland Institute for Computational Biology and the Department of Population and Quantitative Health Sciences at Case Western Reserve University.