Generating RSCU Values
In order to analyse the pattern of codon usage in a genome, a set of
Relative Synonymous Codon Usage (RSCU) values is computed for each
gene. The RSCU value for a codon i is defined as follows:

RSCU_i = Obs_i / Exp_i

where Obs_i is the observed number of occurrences of codon i and
Exp_i is the expected number of occurrences of the same codon (the
number of times the relevant amino acid is present in the gene,
divided by the number of synonymous codons for that amino acid,
including i itself). In order to make the data more compatible with
the mathematical methods used, the log10 of each RSCU_i value is
taken, so that the resulting value is positive if the codon is used
more than expected in that gene, and negative if the codon is used
less than expected. Taking the values for each of the codons with
synonymous alternatives, each gene in the dataset is represented by
a vector of 59 values.
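
The calculation described above can be sketched in Python as follows. This is a minimal illustration under the standard genetic code, not RescueNet's own implementation. The function name log_rscu and the choice to return None for codons that never occur (the log10 of zero is undefined) are assumptions made for this example; the text does not say how RescueNet treats such codons.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group sense codons by amino acid; keep codons that have synonyms.
FAMILIES = {}
for codon, aa in CODE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)
VARIABLE = sorted(c for fam in FAMILIES.values() if len(fam) > 1 for c in fam)


def log_rscu(seq):
    """Return {codon: log10(Obs/Exp)} for the variable codons of one gene.

    Assumption: a codon that is never observed (or whose amino acid is
    absent from the gene) maps to None, because log10(0) is undefined.
    """
    seq = seq.upper().replace("U", "T")
    # Count complete in-frame codons; a trailing partial codon is ignored.
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    result = {}
    for codon in VARIABLE:
        family = FAMILIES[CODE[codon]]
        total = sum(counts[c] for c in family)  # amino acid occurrences
        obs = counts[codon]
        if total == 0 or obs == 0:
            result[codon] = None
            continue
        exp = total / len(family)               # expected count per codon
        result[codon] = math.log10(obs / exp)
    return result
```

For example, a gene fragment with three GCT codons and one GCC codon has four alanine codons in total, so Exp = 4 / 4 = 1 for each of the four alanine codons. GCT then scores log10(3), roughly 0.48, GCC scores 0, and the unused GCA and GCG are left undefined.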
>MG0001 DNA polymerase III beta sub (dnaN)
>MG0002 heat shock protein.
Figure 3.1: Example of FastA format
Option 1 on the main menu allows the user to convert FastA format
files into RSCU values. There is no defined limit to the size of
input file that RescueNet can handle. FastA sequence data is the
only acceptable form of input, and is defined as follows. On one
line is the name of the gene (or organism), preceded by a ‘>’
character (typed without the quotation marks). The sequence data
begins on the next line. Each subsequent sequence is introduced by
its own ‘>’ line.
If your data is in a different format, you can use Readseq by Don
Gilbert to reformat the data into FastA format. Readseq is available
from the IUBIO archive:
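
The FastA layout described above (a ‘>’ descriptor line followed by one or more lines of sequence) can be read with a few lines of Python. This is only a sketch: the function name read_fasta is hypothetical, the two records are made-up data for illustration, and nothing here reflects how RescueNet itself parses its input.

```python
def read_fasta(lines):
    """Yield (descriptor, sequence) pairs from an iterable of FastA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if name is not None:          # finish the previous record
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)           # sequence may span several lines
    if name is not None:
        yield name, "".join(chunks)


# Made-up example input, not taken from any real genome.
example = [
    ">geneA hypothetical example",
    "ATGGCT",
    "GCTTAA",
    ">geneB another example",
    "ATGAAATAG",
]
records = list(read_fasta(example))
```

Each record keeps the full descriptor line (minus the ‘>’) so that it can be carried through to the output alongside the values computed for that gene.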
The standard (Universal) Genetic Code is the default code used by
the RSCU value generator. However, ten other genetic codes are also
supported. To select one of these codes, enter the word
‘changecode’ in place of an input filename. The user will then be
presented with a list of supported genetic codes (mostly
mitochondrial) and asked to choose one. The chosen genetic code
remains in force until the program is terminated or until it is
changed again.
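
To illustrate why the choice of genetic code matters, the sketch below builds the standard code and derives the vertebrate mitochondrial code from it by overriding the four codons that differ (AGA and AGG become stops, ATA encodes methionine, TGA encodes tryptophan). The dictionary keys and the function name variable_codons are hypothetical, and the vertebrate mitochondrial code is used only as a well-known example; the text does not list which codes RescueNet offers.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order ('*' = stop).
STANDARD = dict(zip(
    ("".join(c) for c in product(BASES, repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))

# Vertebrate mitochondrial code: only the differing codons are overridden.
VERTEBRATE_MITO = {**STANDARD, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}

# Hypothetical lookup of selectable codes (names are illustrative only).
CODES = {"standard": STANDARD, "vertebrate_mito": VERTEBRATE_MITO}


def variable_codons(code):
    """Return the sense codons whose amino acid has synonymous alternatives."""
    families = {}
    for codon, aa in code.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    return sorted(c for fam in families.values() if len(fam) > 1 for c in fam)


n_standard = len(variable_codons(CODES["standard"]))
n_mito = len(variable_codons(CODES["vertebrate_mito"]))
```

Under the standard code this gives 59 variable codons. Under the vertebrate mitochondrial code it gives 60, because ATG and TGG each gain a synonym. How RescueNet sizes its output for non-standard codes is not stated here, so this count is an observation about the codes themselves rather than about the program.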
The output file from the RSCU value generator will hold (for each
gene) the descriptor line taken from the FastA file and a set of 59
log10 RSCU values, one value for each variable codon. Among these
values, positive numbers denote codons that are used more often
than expected, and negative numbers denote codons that are used
less often than expected.