org.utgenome.format.agp
Class Assembly

java.lang.Object
  extended by org.utgenome.format.agp.Assembly

public class Assembly
extends Object

File Format: One feature of the AGP file is that column definitions change depending on whether the line is a component line or a gap line. Columns 1 through 5 each have a single definition; every later column has two definitions, selected by the value in column 5.

column content description
1 object This is the identifier for the object being assembled. This can be a chromosome, scaffold or contig. If the object is a chromosome and an accession.version identifier is not used to describe the object, then the naming convention is to precede the chromosome number with "chr" (if a chromosome) or "LG" (if a linkage group). For example: chr1. If the object is a contig or scaffold, then the identifier needs to be unique within the assembly.
2 object_beg The starting coordinate of the component/gap on the object in column 1. This is the location in the object's coordinate system, not the component’s.
3 object_end The ending coordinate of the component/gap on the object in column 1. This is the location in the object's coordinate system, not the component’s.
4 part_number The line count for the components/gaps that make up the object described in column 1.
5 component_type The sequencing status of the component. These typically correspond to keywords in the International Sequence Database (GenBank/EMBL/DDBJ) submission. Current acceptable values are:
  A=Active Finishing
  D=Draft HTG (often phase1 and phase2 are called Draft, whether or not they have the draft keyword).
  F=Finished HTG (phase 3)
  G=Whole Genome Finishing
  N=gap with specified size
  O=Other sequence (typically means no HTG keyword)
  P=Pre Draft
  U=gap of unknown size, typically defaulting to predefined values.
  W=WGS contig
6a component_id If column 5 not equal to N: This is a unique identifier for the sequence component contributing to the object described in column 1. Ideally this will be a valid accession.version identifier assigned by GenBank/EMBL/DDBJ. If the sequence has not been submitted to a public repository yet, a local identifier should be used.
6b gap_length If column 5 equal to N: This column represents the length of the gap.
7a component_beg If column 5 not equal to N: This column specifies the beginning of the part of the component sequence that contributes to the object in column 1 (in component coordinates).
7b gap_type If column 5 equal to N: This column specifies the gap type. The combination of gap type and linkage (column 8b) indicates whether the gap is captured or uncaptured. In some cases, the gap types are assigned a biological value (e.g. centromere).
Accepted values:
  fragment: a gap between two sequence contigs (also called a sequence gap).
  clone: a gap between two clones that do not overlap.
  contig: a gap between clone contigs (also called a "layout gap").
  centromere: a gap inserted for the centromere.
  short_arm: a gap inserted at the start of an acrocentric chromosome.
  heterochromatin: a gap inserted for an especially large region of heterochromatic sequence (may also include the centromere).
  telomere: a gap inserted for the telomere.
  repeat: an unresolvable repeat.

8a component_end If column 5 not equal to N: This column specifies the end of the part of the component that contributes to the object in column 1 (in component coordinates).
8b linkage If column 5 equal to N: This column indicates if there is evidence of linkage between the adjacent lines.
Values:
  yes
  no
9a orientation If column 5 not equal to N: This column specifies the orientation of the component relative to the object in column 1.
Values:
  + = plus
  - = minus
  0 (zero) = unknown
  na = irrelevant

By default, components with unknown orientation (0 or na) are treated as if they had + orientation.
9b   If column 5 equal to N: This column is empty; there is no filler. A tab should still be inserted after the 8th column so that all lines have 9 columns.
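The column table above can be sketched as a small parser that reads the five shared columns and then switches on column 5 to decide how columns 6 through 9 are interpreted. This is only a minimal illustration: AgpLineSketch, Record, parse and effectiveOrientation are hypothetical names and are not part of this package, the sample lines are made up, and treating U like N as a gap line is an assumption, since the table names only N.

```java
// Illustrative sketch only; AgpLineSketch is a hypothetical helper, not part of org.utgenome.
final class AgpLineSketch {

    /** One parsed AGP line. Fields that do not apply to the line kind keep their defaults. */
    static final class Record {
        String object;        // column 1
        long objectBeg;       // column 2
        long objectEnd;       // column 3
        int partNumber;       // column 4
        char componentType;   // column 5
        boolean gap;          // true when columns 6-9 use the "b" definitions

        String componentId;   // 6a
        long componentBeg;    // 7a
        long componentEnd;    // 8a
        String orientation;   // 9a: "+", "-", "0" or "na"

        long gapLength;       // 6b
        String gapType;       // 7b
        boolean linkage;      // 8b

        /** Unknown orientation (0 or na) is treated as plus, following the default rule. */
        String effectiveOrientation() {
            return "-".equals(orientation) ? "-" : "+";
        }
    }

    static Record parse(String line) {
        // A limit of -1 keeps the empty 9th column that gap lines carry after the final tab.
        String[] c = line.split("\t", -1);
        if (c.length != 9) {
            throw new IllegalArgumentException("expected 9 tab-separated columns, got " + c.length);
        }
        if (c[4].length() != 1 || "ADFGNOPUW".indexOf(c[4].charAt(0)) < 0) {
            throw new IllegalArgumentException("unknown component_type: " + c[4]);
        }
        Record r = new Record();
        r.object = c[0];
        r.objectBeg = Long.parseLong(c[1]);
        r.objectEnd = Long.parseLong(c[2]);
        r.partNumber = Integer.parseInt(c[3]);
        r.componentType = c[4].charAt(0);
        // The table names only N for the gap definitions; handling U the same way is an assumption.
        r.gap = r.componentType == 'N' || r.componentType == 'U';
        if (r.gap) {
            r.gapLength = Long.parseLong(c[5]);
            r.gapType = c[6];
            if (!c[7].equals("yes") && !c[7].equals("no")) {
                throw new IllegalArgumentException("linkage must be yes or no: " + c[7]);
            }
            r.linkage = c[7].equals("yes");
        } else {
            r.componentId = c[5];
            r.componentBeg = Long.parseLong(c[6]);
            r.componentEnd = Long.parseLong(c[7]);
            if (!c[8].equals("+") && !c[8].equals("-") && !c[8].equals("0") && !c[8].equals("na")) {
                throw new IllegalArgumentException("unknown orientation: " + c[8]);
            }
            r.orientation = c[8];
        }
        return r;
    }

    public static void main(String[] args) {
        // Made-up sample lines; "contig_1" stands in for a local identifier.
        Record comp = parse("chr1\t1\t100\t1\tW\tcontig_1\t1\t100\t0");
        Record gap = parse("chr1\t101\t150\t2\tN\t50\tfragment\tyes\t");
        System.out.println(comp.componentId + " " + comp.effectiveOrientation());
        System.out.println(gap.gapType + " " + gap.gapLength + " " + gap.linkage);
    }
}
```

Splitting with a negative limit matters here: without it, String.split drops trailing empty strings, and a well-formed gap line would appear to have only 8 columns.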
Extended comments:

Author:
leo

Constructor Summary
Assembly()
           
 
Method Summary
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

Assembly

public Assembly()


Copyright © 2007-2012 utgenome.org. All Rights Reserved.