Announcement: elPrep 2.0
We have just released elPrep 2.0, a new version of the high-performance tool for preparing SAM/BAM/CRAM files for variant calling in DNA sequencing pipelines, which we first released earlier this year. It can be used as a replacement for standard tools, such as SAMtools and Picard, for preparation steps such as sorting, marking duplicates, reordering contigs, and so on, while producing identical results. elPrep is designed as a multi-threaded application that runs entirely in memory, avoids repeated file I/O, and merges the computations of several preparation steps, reducing execution time by an order of magnitude. For example, on a 16-core server, we see a speedup of 10.5x when using elPrep compared to using a combination of SAMtools and Picard. elPrep is also a modular, extensible framework, where users can easily add more preparation steps that automatically take advantage of elPrep’s inherent parallelism and performance.
elPrep 2.0 adds a number of new features:
• elPrep 2.0 comes with two new tools: a split tool that splits SAM/BAM/CRAM files into smaller files according to chromosomal regions, and a merge tool that merges these smaller files back into larger files combining the information from all the chromosomal regions. Splitting and merging can be used to reduce the memory pressure on elPrep by allowing it to process regions separately.
• elPrep 2.0 also provides Python scripts to coordinate the conversion of BAM/CRAM files to and from SAM files using samtools, and the splitting and merging described above. This enables easier tweaking of certain aspects of elPrep, and integration of elPrep into existing sequencing pipelines.
• elPrep 2.0 also includes numerous bug fixes.
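The idea behind splitting and merging by chromosomal region can be illustrated with a minimal Python sketch. This is not elPrep's implementation or its command-line interface; the function names split_by_region and merge_regions are hypothetical and exist only for this illustration. The sketch relies only on two facts about the SAM text format: header lines start with "@", and the third tab-separated field of an alignment line (RNAME) names the reference sequence the read is aligned to.

```python
from collections import defaultdict


def split_by_region(sam_lines):
    """Group SAM alignment lines by reference name (RNAME, column 3).

    Hypothetical helper for illustration only; not part of elPrep.
    Header lines (starting with '@') are collected separately so they
    can be attached to every per-region chunk.
    Unmapped reads carry RNAME '*' and end up in their own group.
    """
    header = []
    regions = defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
        else:
            rname = line.split("\t")[2]
            regions[rname].append(line)
    return header, dict(regions)


def merge_regions(header, regions, contig_order):
    """Concatenate per-region chunks back into one list of SAM lines.

    Hypothetical helper for illustration only; not part of elPrep.
    Chunks are emitted in contig_order; any group not listed there
    (for example '*') is appended at the end so no read is dropped.
    """
    merged = list(header)
    for contig in contig_order:
        merged.extend(regions.get(contig, []))
    for contig, lines in regions.items():
        if contig not in contig_order:
            merged.extend(lines)
    return merged


if __name__ == "__main__":
    # Tiny made-up SAM content, used only to exercise the two helpers.
    sam = [
        "@HD\tVN:1.4",
        "@SQ\tSN:chr1\tLN:100",
        "@SQ\tSN:chr2\tLN:100",
        "r1\t0\tchr1\t10\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r2\t0\tchr2\t5\t60\t4M\t*\t0\t0\tACGT\tIIII",
        "r3\t0\tchr1\t20\t60\t4M\t*\t0\t0\tACGT\tIIII",
    ]
    header, regions = split_by_region(sam)
    for contig, lines in regions.items():
        print(contig, len(lines))
    print(len(merge_regions(header, regions, ["chr1", "chr2"])))
```

Because each per-region chunk can be processed on its own, only one chunk needs to be held in memory at a time, which is the memory-pressure benefit described above. For the actual split and merge tools and their options, please refer to the elPrep documentation.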
The core of elPrep is implemented in Common Lisp. Originally, it relied exclusively on the symmetric multi-processing features of LispWorks, but to support open-source development, elPrep 2.0 has been ported to SBCL, so it can now run on both LispWorks and SBCL. (However, performance on the 64-bit editions of LispWorks is generally better, and using servers with large amounts of RAM is also more convenient with LispWorks. On such servers, SBCL may require parameters to be tweaked before SBCL itself is compiled; please see the elPrep documentation for more details.)
elPrep has been developed at Imec Belgium at the ExaScience Life Lab (http://www.exascience.com), in collaboration with Intel and Janssen Pharmaceutica (Johnson & Johnson).
The open-source release is available at http://github.com/exascience/elprep with full end-user documentation. A demo, including test data, is available at http://github.com/exascience/elprep-demo. The API documentation at http://exascience.github.io/elprep/elprep-package/ provides details about the elPrep framework.
---
Charlotte Herzeel, PhD
Researcher at ExaScience Life Lab (via IMEC)
Address: Kapeldreef 75, 3001 Leuven, Belgium
_______________________________________________
Lisp Hug - the mailing list for LispWorks users
lisp-hug@lispworks.com
http://www.lispworks.com/support/lisp-hug.html