3. Generating RSCU Values

To analyse the pattern of codon usage in a genome, a set of Relative Synonymous Codon Usage (RSCU) values is computed for each gene. The RSCU value for a codon i is defined as follows:

RSCUi = Obsi / Expi

where Obsi is the observed number of occurrences of codon i and Expi is the expected number of occurrences of the same codon if all synonymous codons were used equally (that is, the number of times the relevant amino acid is present in the gene divided by the number of synonymous codons for that amino acid, including i itself). To make the data more compatible with the mathematical methods used, the log10 of each RSCUi value is taken, so that the resulting value is positive if the codon is used more than expected in that gene and negative if it is used less than expected. Taking the RSCU values for each of the codons with synonymous alternatives, each gene in the dataset is represented by a vector of 59 values (under the standard genetic code, the 64 codons minus the three stop codons and the single codons for methionine and tryptophan).
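The calculation described above can be sketched in Python as follows. This is a minimal illustration under the standard genetic code, not RescueNet's own implementation. The names CODON_TABLE, FAMILIES and log_rscu are hypothetical. Returning None when a codon is never observed (where log10 is undefined) is an assumption made for this sketch, since the text does not say how that case is handled.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group codons into synonymous families; drop stop codons and the
# single-codon amino acids (Met, Trp), leaving the 59 variable codons.
_groups = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    _groups[aa].append(codon)
FAMILIES = {aa: cs for aa, cs in _groups.items() if aa != "*" and len(cs) > 1}


def log_rscu(sequence):
    """Return {codon: log10(RSCU)} for the 59 variable codons of one gene.

    Assumption for this sketch: a codon that is never observed gets None,
    because log10(0) is undefined.
    """
    seq = sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)  # occurrences of the amino acid
        expected = total / len(family)          # Exp_i under equal usage
        for codon in family:
            observed = counts[codon]            # Obs_i
            if observed == 0:
                values[codon] = None
            else:
                values[codon] = math.log10(observed / expected)
    return values


# Made-up example: glutamate coded three times by GAA and once by GAG.
example = log_rscu("GAAGAAGAGGAA")
print(round(example["GAA"], 3))  # 0.176  (used more than expected)
print(round(example["GAG"], 3))  # -0.301 (used less than expected)
```

In the example, glutamate occurs four times and has two synonymous codons, so each is expected twice. GAA is observed three times (RSCU 1.5) and GAG once (RSCU 0.5), which gives a positive and a negative log10 value respectively.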

>MG0001 DNA polymerase III beta sub (dnaN)

>MG0002 heat shock protein.

Figure 3.1: Example of FastA format files

Option 1 on the main menu allows the user to convert FastA format files into RSCU values. There is no defined limit to the size of input file that RescueNet can handle. FastA sequence data is the only acceptable form of input, and is defined as follows. On one line is the name of the gene (or organism), preceded by a ‘>’ (without the quotation marks). The sequence data begins on the next line. The start of the next sequence is identified by the next ‘>’. If your data is in a different format, you can use Readseq by Don Gilbert to reformat the data into FastA format. Readseq is available from the IUBIO archive.
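The FastA layout described above can be read with a short routine such as the following. This is a minimal Python sketch for illustration only, not the parser used by RescueNet. The function name read_fasta and the gene names and sequences in the example are made up.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FastA text lines.

    A line starting with '>' begins a new record; every following line,
    up to the next '>', is treated as sequence data for that record.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


# Made-up records, used only to show the layout.
text = """>geneA first example
ATGGCC
GAATAA
>geneB second example
ATGAAATAG
"""
records = list(read_fasta(text.splitlines()))
for name, sequence in records:
    print(name, len(sequence))
```

Sequence data that is wrapped over several lines is joined into a single string, so the line length used in the input file does not matter.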

The standard (Universal) Genetic Code is the default code used by the RSCU value generator. However, 10 other genetic codes are also supported. To select between these codes, enter the word ‘changecode’ instead of an input filename. The user will then be presented with a list of supported genetic codes (mostly mitochondrial) and asked to choose one. This will then remain the choice of genetic code until the program is terminated or until it is changed again.

The output file from the RSCU value generator holds, for each gene, the descriptor line taken from the FastA file and a set of 59 log-transformed RSCU values, one value for each variable codon. Within these numbers, positive values denote codons that are used more often than expected, and negative values denote codons that are used less often than expected.

