SeQuiLa: an elastic, fast and scalable SQL-oriented solution for processing and querying genomic intervals


Efficient processing of large-scale genomic datasets has recently become possible due to the application of ‘big data’ technologies in bioinformatics pipelines. We present SeQuiLa, a distributed, ANSI SQL-compliant solution for fast querying and processing of genomic intervals, available as an Apache Spark package. The proposed range join strategy is significantly (∼22×) faster than the default Apache Spark implementation and outperforms other state-of-the-art tools for genomic interval processing. The project is available online, and supplementary data are available at Bioinformatics online.
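To make the notion of a range join over genomic intervals concrete, the following is a minimal Python sketch of an overlap join between two interval sets. It is not SeQuiLa's implementation and does not use its API. It swaps in a simple sorted-start index with binary search purely for illustration. The names `Interval`, `build_index` and `range_join`, the sample data, and the closed-coordinate convention are all assumptions made for this example.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple

# (chromosome, start, end); closed coordinates are an assumption of this sketch.
Interval = Tuple[str, int, int]


def build_index(intervals: List[Interval]) -> Dict[str, Tuple[List[int], List[Interval]]]:
    """Group intervals by chromosome and sort each group by start position."""
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for iv in intervals:
        by_chrom[iv[0]].append(iv)
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort(key=lambda iv: iv[1])
        # Keep a parallel list of start positions for binary search.
        index[chrom] = ([iv[1] for iv in ivs], ivs)
    return index


def range_join(left: List[Interval], right: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return all (left, right) pairs on the same chromosome whose ranges overlap."""
    index = build_index(right)
    pairs = []
    for q in left:
        starts, ivs = index.get(q[0], ([], []))
        # Only intervals that start at or before q's end can overlap q.
        hi = bisect_right(starts, q[2])
        for cand in ivs[:hi]:
            # The candidate must also end at or after q's start.
            if cand[2] >= q[1]:
                pairs.append((q, cand))
    return pairs


# Hypothetical sample data: sequencing reads joined against target regions.
reads = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
targets = [("chr1", 150, 300), ("chr1", 201, 250), ("chr2", 10, 60), ("chr3", 1, 5)]
result = range_join(reads, targets)
```

In SQL terms, this corresponds to joining two tables on equal chromosome plus the overlap condition `a.start <= b.end AND a.end >= b.start`. The binary search here only prunes candidates by start position, so the inner filter can still scan many intervals; it is a sketch of the join semantics, not of an optimized strategy.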



  author = {Szmurło, Agnieszka and Wiewiórka, Marek and Gambin, Tomasz and Leśniewska, Anna and Stępień, Kacper and Borowiak, Mateusz and Okoniewski, Michał},
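  title = {SeQuiLa: an elastic, fast and scalable SQL-oriented solution for processing and querying genomic intervals},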
  year = {2018},
  month = nov,
  doi = {10.1093/bioinformatics/bty940},
  url = {},
  eprint = {},
  public = {yes}