UTGB Toolkit - Track Catalog

Track Catalog in UTGB

UTGB has several built-in tracks. In order to use them, specify track class name in your view file:

-track
 -class: (track class name)

Available track class names

General Tracks

NavigatorTrack provides a navigation bar
RulerTrack is a ruler bar with range select support
ScaleBarTrack is a measure for seeing genome sequence length.
KeywordSearchTrack for adding keyword search feature

Tracks for displaying data files

ReadTrack for visualizing BED, SAM, BAM file data
RefSeqTrack, ChromosomeMapTrack for showing genome sequences written in FASTA files
WigTrack for Wiggle (WIG) files for drawing bar charts on genome coordinate.

ReadTrack for displaying BED/SAM/BAM format data

ReadTrack in UTGB can be used to display genes, annotation on genome coordinates and read data written in BED, SAM and BAM (http://samtools.sourceforge.net/) formats. To do so, first you need to import these files using utgb import command.

Importing BED files

BED is a common format for describing genes (including CDS, exon/intron regions) or locus annotations of genomes. You can obtain many useful BED files from UCSC's table browser, http://genome.ucsc.edu/cgi-bin/hgTables. For example, you can retrieve BED data of RefSeq gene sets by selecting refGene table and switch the output format to BED.

# create a BED database
$ utgb import db/refGene.bed

View configuration

After BED files are imported, edit your view file:

# Track for BED data
-track
 -name: RefSeq Genes
 -class: ReadTrack
 -property
   -path: db/refGene.bed

When importing a BED file, the utgb import command creates an SQLite database file (suffixed with .sqlite) to accelerate database searches. When SQLite DB files corresponding to the input BED files are found, ReadTrack issues database queries to reads.bed.sqlite file instead of reading raw BED files.

BED example

db/sample.bed

track name="Item,RGB,Demo2" 
chr1 20000 70000 itemB 200 - 22000 69500 0 4 4330,1000,5500,15000 0,5000,20000,35000
chr1 20000 60000 cloneB 900 - 20000 60000 0 2 4330,3990, 0,36010
chr1 10000 50000 itemA 960 + 11000 47000 0,255,255 2 15670,14880, 0,25120
chr1 10000 50000 itemA 960 + 11000 47000 0 2 15670,14880, 0,25120
chr1 10000 50000 itemC 960 + 11000 47000 0
chr1 10000 50000 itemD 960 + 11000 47000 0,255,0

chr22   20100000 20100100	first
chr22   20100200 20100300	second
chr22   20100400 20100500	thirds
chr7	127472363	127473530	Pos2	200
chr7	127472363	127473530	Pos2	200	+
chr7	127472363	127473530	Pos2	200	+	127472363	127473530	
chr7	127472363	127473530	Pos2	200	+	127472363	127473530	255,0,0

../clip/beddas.png

In BED files, the first six parameters (chr, start, end, name, score, strand) are required, and the other parameters are optional. By using RGB color fields, you can change the colors of BED entries.

Importing SAM files

SAM format is commonly used to describe alignment results of next-generation sequencer reads (e.g., Illumina, SOLiD, etc.). BAM format is a binary version of SAM. You can use the both formats in UTGB, but using BAM is recommended for the performance reason. To display read data in BAM files, you need .bai (BAM index) file to quickly locate file positions of the target window (chromosome name, start-end position). The bai files can be created when you import SAM files using UTGB, or samtools index command also can be used.

# create BAM (.bam) and BAM Index file (.bai) from the input SAM file
$ utgb import db/shortread.sam

Adding tracks for BED and SAM data

After importing BED or SAM files, edit your view file:

# Track for BAM data. In this example, db/shortread.bam.bai file must be present.
-track
 -name: Short Read
 -class: ReadTrack
 -property
   -path: db/shortread.bam

SAM example

Here is a screenshot of displaying a BAM file of Illumina's paired-end library data: clip/pe-read.png

ReadTrack Properties

ReadTrack has severl properties to customize the visualization.

-track
 -class: ReadTrack
 -property
  -path: db/sample.bam
  -wig path: db/sample.bam.wig.sqlite
  # pileup (draw individual reads) or coverage (shows read depth graph)
  -layout: pileup
  # Used when displaying read coverage
  -window funcion: MAX
  # pixel height of reads
  -read height: 2
  # horizontal margin between reads
  -read margin: 1
  # hide read name labels
  -show labels: false
  # When true, the lower the base quality, the more translucent the base color becomes
  -show base quality: true
  # Increase this number to show many reads (but consumes your browser memory and may slower the track drawing speed)
  -num reads to display: 1000

The wig path specifies wig database files of read depths, created from BAM files, which can be used to draw read depths when the number of reads contained in a query window are too many to display in the track.

WIG Data

To display bar graph data (e.g., read depth coverage, GC contents percentage data for each locus on genome), use WIG format http://genome.ucsc.edu/goldenPath/help/wiggle.html and WigTrack. First create an SQLite database file from a WIG file:

$ utgb import db/sample.wig

Then, add the following configuration to your view file:

# Track for WIG data
-track
 -name: bar graph 
 -class: WigTrack
 -property
  -path: db/sample.wig.sqlite
  # Windowing function: MAX, MIN, MEDIAN or AVG
  -window function: MAX
  # max, min value of the y scale
  -max value: 100
  -min value: 0
  # If true, adjust the y scale according to the displayed data
  -auto scale: true 
  # To use log scale, set this parameter to true
  -log scale: false
  # Graph color. red, green, blue (0-255) and alpha (0-1)
  -color: rgba(255,128, 128, 0.7)

Creating read depth WIG data from BAM files

You can create read depth (coverage) graph data from BAM files by using utgb readdepth command.

$ utgb readdepth (BAM file) > db/readdepth.wig

Wig files are large in general. To save the storage space you can directly create wig databases of read coverage as follows:

$ utgb readdepth (BAM file) | utgb -t WIG import -o db/readdepth.wig.sqlite

FASTA files

To display a genome sequence as a track, you need to create a sequence database from a FASTA file. Commonly used FASTA files of reference genome sequences are available from UCSC's web site. For example, the fasta file of S. cerevisiae is available from http://hgdownload.cse.ucsc.edu/goldenPath/sacCer3/bigZips/chromeFa.tar.gz.

To create databases of FASTA files, do utgb import:

$ utgb import -t FASTA db/chromFa.tar.gz -o db/sacCer3.sqlite

The -t option specifies the input file format. The utgb import accepts both compressed archives (.tar.gz) and raw fasta files (.fa, .fasta).

The above import command generates a compressed FASTA database (SQLite file) db/sacCer3.sqlite. Next, add a track entry to your config/view/default-view.silk file:

-track
 -name: Genome Sequence
 -class: RefSeqTrack
 -property
  -path: db/sacCer3.sqlite

This will add a new track named Genome Sequence to your genome browser.

Keyword Search Track

To add keyword search functionality, add the following track definition to your view:

-track
 -class: KeywordSearchTrack

To add keyword entries to the database from the read names in BED/SAM/BAM files, use utgb keyword command:

$ utgb keyword import -r ce6 (BED/SAM/BAM file)

-r option is used to associate sequence revision name and keywords contained in the input file. It is also possible to add your own keywords annotating loci in genome sequences. For details, see a help message in utgb keyword -help.

DAS Data

DAS track is for browsing DAS (Distributed Annotation System) data. Currently (utgb-shell version 1.2.3), browsing feature data is supported. Add the following Silk fragment to your view file, and edit the dasBaseURL property to specify the DAS server URL:

-track
 -class: DASTrack
 -name: DAS Track
 -pack: true
 -property
  -type: image
  -leftMargin: 100
  -dasBaseURL: http://www.ensembl.org/das/Homo_sapiens.NCBI36.transcript/
  -dasType: refGene

Screnshots