GMACS (Download source-code & documentation)

Familial binding profiles (FBPs) represent the average binding specificity for a group of structurally related DNA-binding proteins. Constructing such profiles allows novel motifs to be classified by their similarity to known families, helps to reduce redundancy in motif databases and in the output of de novo prediction algorithms, and can provide valuable insights into the evolution of binding sites. Many current approaches to automated motif clustering rely on progressive tree-based techniques and can suffer from so-called frozen sub-alignments, where motifs that are clustered early in the process remain "locked" in place despite the potential for better placement at a later stage. To avoid this scenario, we have developed a genetic-k-medoids approach which allows motifs to move freely between clusters at any point in the clustering process.

The first stage of the algorithm converts each motif to a K-mer Frequency Vector (KFV) and calculates the pairwise distances between these vectors using the cosine distance. The fitness of each candidate solution in the genetic algorithm is then calculated by performing one round of the k-medoids algorithm and evaluating the resulting clusters using the silhouette metric.
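The two stages above can be illustrated with a minimal Python sketch. It is not the GMACS source code: the function names, the choice of k = 2, the way a KFV is derived from a position weight matrix (summing the expected k-mer probabilities over every window of k columns), and the toy motifs are all assumptions made for illustration. A candidate solution is assumed here to be a list of medoid indices.

```python
from itertools import product
from math import sqrt

def kmer_frequency_vector(pwm, k=2):
    """Assumed KFV: expected frequency of each k-mer over all k-column windows.
    pwm is a list of columns, each a list of probabilities for A, C, G, T."""
    kmers = list(product(range(4), repeat=k))
    vec = [0.0] * len(kmers)
    for start in range(len(pwm) - k + 1):
        for i, kmer in enumerate(kmers):
            p = 1.0
            for offset, base in enumerate(kmer):
                p *= pwm[start + offset][base]
            vec[i] += p
    total = sum(vec)
    return [v / total for v in vec] if total else vec

def cosine_distance(u, v):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def kmedoids_round(dist, medoids):
    """One k-medoids round: assign each item to its nearest medoid, then
    move each medoid to the member with the smallest total in-cluster distance."""
    n = len(dist)
    labels = [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
              for i in range(n)]
    new_medoids = []
    for c, old in enumerate(medoids):
        members = [i for i in range(n) if labels[i] == c]
        if not members:  # empty cluster: keep the old medoid
            new_medoids.append(old)
            continue
        new_medoids.append(min(members,
                               key=lambda m: sum(dist[m][j] for j in members)))
    return labels, new_medoids

def mean_silhouette(dist, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for key, other in clusters.items() if key != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)

# Toy motifs (hypothetical): two A-rich and two G-rich, four columns each.
motifs = [
    [[0.90, 0.03, 0.03, 0.04]] * 4,
    [[0.85, 0.05, 0.05, 0.05]] * 4,
    [[0.05, 0.05, 0.85, 0.05]] * 4,
    [[0.03, 0.03, 0.90, 0.04]] * 4,
]
kfvs = [kmer_frequency_vector(m) for m in motifs]
dist = [[cosine_distance(u, v) for v in kfvs] for u in kfvs]

# Fitness of one candidate solution (medoids at motifs 0 and 2).
labels, medoids = kmedoids_round(dist, [0, 2])
fitness = mean_silhouette(dist, labels)
print(labels, medoids, round(fitness, 3))
```

In a genetic algorithm built on this sketch, `fitness` would be computed for every candidate set of medoids in the population, so that motifs can change cluster whenever a different medoid set scores better.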

For any questions, comments, or suggestions, please send an email to Pilib Ó Broin.