SEQ, SEQUENCE
NAME
SEQ, SEQUENCE - manipulate the content of the sequence buffer.
SYNOPSIS
SEQ = three_letters_code
SEQ LOAD filename
SEQ READ filename
SEQ FROM structure_identifier
SEQ COPY
SEQ SAVE filename
SEQ SWN filename
SEQ RESET
DESCRIPTION
The command SEQ (long form: SEQUENCE) manipulates the content of the main
sequence buffer. Garlic maintains two sequence buffers: the main buffer and
the reference buffer. The main sequence buffer is used to prepare the average
hydrophobicity plot, the hydrophobic moment plot and the helical wheel plot,
and for other operations which require sequence information. The reference
sequence buffer is used for sequence comparison and other operations which
require two sequences.
Both buffers store the following sequence information:
(1) The number of residues.
(2) The sequence, as three-letter codes. Uppercase letters are used.
(3) Disulfide bond flag, if information about disulfide bonds is available.
(4) Residue serial numbers.
(5) Raw hydrophobicity values (replaced by average value for exotic residues).
In addition, the main sequence buffer contains the following information:
(6) The average hydrophobicity.
width.
(7) The hydrophobic moment.
As sequence information may be given independently of any structure, atomic
coordinates are not required for most sequence manipulation routines. Thus,
garlic may be used as a sequence analysis tool.
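The fields listed above can be pictured as a small record. The following
Python sketch is illustrative only and does not reproduce garlic's source.
The names SequenceBuffer, from_codes and SAMPLE_SCALE are hypothetical. The
three scale entries are sample Kyte-Doolittle values, and computing the
fallback as a plain mean of the scale entries is an assumption about how the
average value for exotic residues might be obtained.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical three-entry scale (sample Kyte-Doolittle values), illustration only.
SAMPLE_SCALE: Dict[str, float] = {"ALA": 1.8, "PHE": 2.8, "GLY": -0.4}


@dataclass
class SequenceBuffer:
    codes: List[str] = field(default_factory=list)             # (2) uppercase codes
    disulfide: List[bool] = field(default_factory=list)        # (3) disulfide flags
    serials: List[int] = field(default_factory=list)           # (4) serial numbers
    hydrophobicity: List[float] = field(default_factory=list)  # (5) raw values

    @property
    def residue_count(self) -> int:                            # (1) number of residues
        return len(self.codes)


def from_codes(text: str, scale: Dict[str, float]) -> SequenceBuffer:
    """Build a buffer from space-separated codes such as 'ala phe xyz'."""
    codes = [code.upper() for code in text.split()]
    # Assumption: the fallback for unrecognized codes is the mean of the scale.
    average = sum(scale.values()) / len(scale)
    return SequenceBuffer(
        codes=codes,
        disulfide=[False] * len(codes),
        serials=list(range(1, len(codes) + 1)),  # assumption: numbering starts at 1
        hydrophobicity=[scale.get(code, average) for code in codes],
    )
```

Here an unrecognized code such as XYZ is kept in the sequence and simply
receives the average value, so it does not cause an error.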
All versions of the command SEQ, except one, are used to manipulate the
content of the main sequence buffer. The only exception is SEQ COPY, which
copies the content of the main sequence buffer to the reference buffer. This
is the only way to store information in the reference buffer.
SEQ = three_letters_code
The command SEQ may be used with the keyword = (equal sign) to define a
sequence at the garlic command prompt. This is practical for defining a short
sequence fragment. The fragment may be used for a helical wheel plot, or to
locate the given sequence fragment in a structure which is being investigated.
The syntax:
SEQ = three_letters_code
Example:
seq = ala phe tyr trp asn
The sequence fragment will be converted to uppercase. The sequence is not
checked for exotic residues, so non-standard codes may be used. However,
the routine which assigns the hydrophobicity values will fail to recognize
them. The average hydrophobicity value (calculated for the current scale)
will be assigned to these residues. At present, 23 codes are recognized:
SEQ LOA filename
SEQ LOAD filename
SEQ REA filename
SEQ READ filename
The keyword LOAD (or READ) may be used to read a sequence from the specified
file. Garlic recognizes two types of input file formats: FASTA files
(one-letter code) and files which contain three-letter codes in a free
format.
If the input file contains the symbol > (greater than) in the first column of
the first useful line, the file is treated as one-letter protein code in
FASTA format. Empty lines are ignored. Lines beginning with the symbol
# (number sign) in the first column are treated as comments and are ignored
too. Thus, lines which are not empty and do not contain the symbol # in the
first column are treated as useful.
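The format test described above can be sketched in Python as follows. This
is a minimal illustration, not garlic's own routine: the function names
is_useful and looks_like_fasta are hypothetical, and treating a line that
contains only whitespace as empty is an assumption.

```python
def is_useful(line: str) -> bool:
    """A line is useful if it is not empty and has no '#' in the first column."""
    # Assumption: whitespace-only lines count as empty.
    return line.strip() != "" and not line.startswith("#")


def looks_like_fasta(text: str) -> bool:
    """Return True if the first useful line has '>' in the first column."""
    for line in text.splitlines():
        if is_useful(line):
            return line.startswith(">")
    return False  # no useful line at all
```

Only the first useful line decides the format, so a > symbol appearing
later in the file does not change the result.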
If the input file is not recognized as a FASTA file, it is expected to contain
three-letter codes in a free format. Empty lines and all lines which
contain # in the first column are ignored. All other lines are treated as
useful. Digits (serial numbers, for example) are ignored.
The following characters are treated as separators:
(1) space
(2) tab
(3) comma (,)
(4) semicolon (;)
(5) newline (line feed)
If the input file contains at least one bad code (a residue name which
consists of four letters, for example), reading will fail. The hard-coded
maximum number of residues is 50000, but it may be easily changed (see
MAXRESIDUES in the header file defines.h).
Example:
seq load sample.fasta
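The free-format rules for three-letter input can be sketched in Python as
shown below. This is an illustration under stated assumptions, not garlic's
parser: the function name parse_three_letter is hypothetical, digits are
simply deleted before splitting, and the only bad code detected is a name
longer than three letters. MAX_RESIDUES mirrors the documented default of
MAXRESIDUES.

```python
import re
from typing import List

MAX_RESIDUES = 50000  # documented default of MAXRESIDUES


def parse_three_letter(text: str) -> List[str]:
    """Parse free-format three-letter codes; raise ValueError on a bad code."""
    codes: List[str] = []
    for line in text.splitlines():  # newline acts as a separator
        if line.strip() == "" or line.startswith("#"):
            continue  # empty lines and comments are ignored
        # Assumption: digits are ignored by deleting them.
        cleaned = re.sub(r"[0-9]", "", line)
        # Separators: space, tab, comma and semicolon.
        for token in re.split(r"[ \t,;]+", cleaned):
            if token == "":
                continue
            if len(token) > 3:
                raise ValueError(f"bad residue code: {token!r}")
            if len(codes) >= MAX_RESIDUES:
                raise ValueError("too many residues")
            codes.append(token.upper())
    return codes
```

A single bad code aborts the whole read, which matches the statement that
reading fails if the file contains at least one bad code.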
SEQ FRO structure_identifier
SEQ FROM structure_identifier
The keyword FROM may be used to copy the sequence from the specified structure
to the main sequence buffer. Only selected residues are copied. A residue is
treated as selected if its first atom is selected. For proteins, this is
typically N (nitrogen). Residue insertion codes are ignored! Thus, the same
residue serial index (number) may appear more than once in the array of
residue serial numbers.
Example:
seq from 1
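The selection rule and its effect on serial numbers can be sketched in
Python as follows. The classes Atom and Residue and the function
copy_selected are hypothetical and only model the behaviour described
above; they do not reflect garlic's internal data structures.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Atom:
    name: str
    selected: bool


@dataclass
class Residue:
    code: str
    serial: int
    insertion_code: str
    atoms: List[Atom]


def copy_selected(residues: List[Residue]) -> Tuple[List[str], List[int]]:
    """Copy residues whose first atom is selected; insertion codes are dropped."""
    codes: List[str] = []
    serials: List[int] = []
    for residue in residues:
        if residue.atoms and residue.atoms[0].selected:
            codes.append(residue.code.upper())
            serials.append(residue.serial)  # insertion code is not stored
    return codes, serials
```

With residues 52 and 52A both selected, the serial number 52 appears twice
in the result, because the insertion code is not kept.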
SEQ COP
SEQ COPY
The command SEQ COPY copies the sequence from the main sequence buffer to
the reference buffer. This is the only way to initialize the reference buffer.
This command must be executed (i.e., the keyword COPY must be used) before
executing commands which require two sequences for proper operation. The main
sequence buffer may be initialized prior to SEQ COPY by using one of the
keywords described above (=, LOAD or FROM).
Example:
seq copy
SEQ SAV filename
SEQ SAVE filename
The command SEQ SAVE saves the sequence to the specified file. Ten codes
(each consisting of up to three letters) are written per line, separated by
spaces. Serial numbers are not included (see the SWN keyword).
Example:
seq save 9pap.seq
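The output layout can be sketched in Python as follows. The function name
format_sequence is hypothetical, and the exact spacing (a single space,
no padding of short codes) is an assumption; only the count of ten codes
per line comes from the description above.

```python
from typing import List


def format_sequence(codes: List[str], per_line: int = 10) -> str:
    """Return the sequence as text with per_line codes on each line."""
    lines = []
    for start in range(0, len(codes), per_line):
        # Assumption: codes are joined by one space, without padding.
        lines.append(" ".join(codes[start:start + per_line]))
    return "".join(line + "\n" for line in lines)
```

A sequence of twelve residues therefore produces two lines, with ten codes
on the first line and two on the second.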
SEQ SWN filename
The command SEQ SWN saves the sequence, with serial numbers, to the specified
file. Both residue names and serial numbers are written to the output file.
Insertion codes will be missing! Five serial numbers and residue names are
written per line, separated by spaces.
Example:
seq swn 9pap.seq
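A possible layout for this output is sketched below in Python. The function
name format_with_numbers is hypothetical. Writing each serial number
before its residue name, and using single spaces without column padding, are
assumptions; the description above fixes only that five pairs appear per
line.

```python
from typing import List


def format_with_numbers(serials: List[int], codes: List[str],
                        per_line: int = 5) -> str:
    """Return text with per_line (serial, name) pairs on each line."""
    if len(serials) != len(codes):
        raise ValueError("serials and codes must have the same length")
    # Assumption: the serial number precedes the residue name.
    pairs = [f"{serial} {code}" for serial, code in zip(serials, codes)]
    lines = []
    for start in range(0, len(pairs), per_line):
        lines.append(" ".join(pairs[start:start + per_line]))
    return "".join(line + "\n" for line in lines)
```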
SEQ RES
SEQ RESET
Reset (clear) the main sequence buffer. The command SEQ RESET sets the number
of residues in the main sequence buffer to zero. The storage is not freed, so
the buffer may be used again later.
Example:
seq reset
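The idea of clearing the buffer without freeing its storage can be sketched
in Python as follows. The class MainBuffer and its methods are hypothetical
and only model the behaviour described above: the residue count is set to
zero while the allocated storage stays in place.

```python
from typing import List


class MainBuffer:
    """Fixed-capacity buffer whose storage survives a reset."""

    def __init__(self, capacity: int) -> None:
        self.codes: List[str] = [""] * capacity  # storage allocated once
        self.count = 0                            # number of residues in use

    def load(self, codes: List[str]) -> None:
        if len(codes) > len(self.codes):
            raise ValueError("sequence exceeds buffer capacity")
        self.codes[:len(codes)] = codes           # overwrite in place
        self.count = len(codes)

    def reset(self) -> None:
        self.count = 0                            # storage is kept

    def sequence(self) -> List[str]:
        return self.codes[:self.count]
```

After reset the buffer reports an empty sequence, yet a later load reuses
the same storage.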
RELATED COMMANDS
PLOT prepares the average hydrophobicity and/or hydrophobic moment plot.
COMPARE compares two sequences. VENN draws a Venn diagram. WHEEL draws a
helical wheel plot. SEL SEQ selects portions of the structure which contain
the sequence stored in the main sequence buffer. To use any of these commands,
the main sequence buffer must first be initialized by using the command SEQ;
COMPARE requires both buffers to be initialized.