Welcome to Galaxy at the Beer Lab!

Here we host kmer-SVM, a tool suite designed to aid in the analysis of next-generation sequencing (NGS) data. Our suite uses a support vector machine (SVM) with kmer sequence features to identify predictive combinations of short transcription factor binding sites that determine the tissue specificity of the original NGS assay. Information gained from kmer-SVM can serve as an additional source of confidence in genomic experiments by recovering known binding sites, and can also reveal novel sequence features and possible cooperative mechanisms to be tested experimentally.
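
To give a feel for what "kmer sequence features" means, here is a minimal sketch that counts the overlapping kmers in a DNA sequence. It is an illustration only, not the suite's own implementation: the helper name kmer_counts and the example sequence are assumptions, and the real tools may treat details such as reverse complements and normalization differently.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping kmer of length k in a DNA sequence.

    Hypothetical helper for illustration; not part of kmer-SVM.
    """
    seq = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Made-up example sequence.
counts = kmer_counts("ACGTACGT", 3)
for kmer, n in sorted(counts.items()):
    print(kmer, n)
```

Counts like these, collected for every positive and negative region, are the kind of feature vector an SVM can be trained on.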


kmer-SVM Modules

kmer-SVM comprises the following modules:


Sample Datasets and Workflows

We have provided several sample datasets to help you become familiar with our toolkit. They are in the Data Libraries section of this website (at the top of the page, click "Shared Data", then select "Data Libraries"). Click on a library name, then check the positive and negative datasets and click "Go" to import the library data into your history. When finished, click on "Galaxy/Beer Lab" at the top of the page to return to this page. The data can now be used in the kmer-SVM suite.

A typical workflow is diagrammed below. Briefly:

  1. Users will have a dataset in BED format, for which they create a negative dataset.
  2. FASTA files are generated for positive and negative regions.
  3. Positive and negative files are used for SVM training.
  4. Weight files are analyzed for transcription factor binding sites.
  5. Classifier accuracy is assessed by ROC curve.

Figure: Typical kmer-SVM workflow
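
As a rough illustration of the last step, the area under the ROC curve (AUC) can be computed directly from classifier scores: it equals the fraction of (positive, negative) pairs in which the positive sequence scores higher. The sketch below assumes you already have SVM scores for held-out positive and negative sequences. The function name roc_auc and the scores are made up for illustration, and this is not the code used by the kmer-SVM ROC tool.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve from two lists of classifier scores.

    Computed as the fraction of (positive, negative) pairs ranked
    correctly, with ties counted as half. Hypothetical helper.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Made-up scores for illustration only.
pos = [1.2, 0.8, 0.3]
neg = [0.5, -0.4, -1.1]
print(round(roc_auc(pos, neg), 3))
```

An AUC near 1.0 means the classifier separates positive from negative regions well, while a value near 0.5 is no better than chance.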


Introductory Tutorial

A tutorial describing this process in more detail can be found here.


User History Information

While anyone can use kmer-SVM without a user account, please note that all datasets belonging to anonymous users are deleted after 60 days. We encourage users who intend to pursue longer-term research with kmer-SVM to create a user account. To do so, click on 'User' in the blue bar at the top of the page and then 'Register'. Please note also that a dataset deleted by any user, registered or not, is permanently removed after 60 days.


This project is a collaboration between Christopher Fletez-Brant of the McCallion Lab at the McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, and Dongwon Lee of the Beer Lab in the Johns Hopkins University Department of Biomedical Engineering.

If you use kmer-SVM, please cite the following publications:


Source Code

kmer-SVM is also available as a standalone software suite. In particular, users wishing to apply kmer-SVM to unsupported genomes should investigate the use of the standalone suite. For source code and installation instructions, please see the install guide.


Contact Information

To report errors or difficulties with kmer-SVM, please get in touch with us. When contacting the kmer-SVM team, please include any error messages you encountered.