The Self-Organizing Map for Biological Regulatory Element Recognition and Ordering

SOMBRERO finds regulatory binding sites by using a neural network algorithm, the Self-Organizing Map (SOM), to find overrepresented motifs in a set of DNA sequences. The most popular current methods for finding overrepresented motifs use techniques from probability theory and statistical physics, such as Expectation Maximization or Gibbs sampling. Our application of the Self-Organizing Map instead frames motif identification as a clustering problem, an approach that appears to perform well on real genomic data.
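To illustrate what framing motif identification as a clustering problem can look like, here is a minimal Python sketch of a generic Self-Organizing Map over fixed-length subsequences (k-mers). It is not SOMBRERO's actual implementation: the function names, the 2x2 grid, the similarity score, and the learning-rate and neighbourhood schedules are all assumptions chosen for brevity.

```python
import math
import random

ALPHABET = "ACGT"

def kmers(sequences, k):
    """Yield every length-k substring of every sequence."""
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k]

def one_hot(kmer):
    """Encode a k-mer as k columns, each with a 1.0 at the observed base."""
    return [[1.0 if base == b else 0.0 for b in ALPHABET] for base in kmer]

def similarity(node, kmer):
    """Sum, over positions, of the node's weight for the observed base."""
    return sum(node[pos][ALPHABET.index(base)] for pos, base in enumerate(kmer))

def best_node(grid, kmer):
    """Return the grid coordinate of the node most similar to the k-mer."""
    return max(grid, key=lambda xy: similarity(grid[xy], kmer))

def train_som(sequences, k, rows=2, cols=2, epochs=20, seed=0):
    """Train a small SOM whose nodes are k-column base-frequency matrices.
    All hyperparameters here are illustrative assumptions."""
    rng = random.Random(seed)
    grid = {}
    for r in range(rows):
        for c in range(cols):
            node = []
            for _ in range(k):
                col = [rng.random() + 0.5 for _ in ALPHABET]
                total = sum(col)
                node.append([v / total for v in col])  # each column sums to 1
            grid[(r, c)] = node
    data = list(kmers(sequences, k))
    for epoch in range(epochs):
        frac = 1 - epoch / epochs              # decays from 1 towards 0
        rate = 0.5 * frac                      # learning rate
        radius = max(rows, cols) / 2 * frac    # neighbourhood radius
        rng.shuffle(data)
        for kmer in data:
            br, bc = best_node(grid, kmer)
            target = one_hot(kmer)
            for (r, c), node in grid.items():
                dist = math.hypot(r - br, c - bc)
                if dist > radius:
                    continue                   # outside the neighbourhood
                influence = math.exp(-dist * dist / (2 * max(radius, 0.5) ** 2))
                step = rate * influence
                for pos in range(k):
                    for b in range(len(ALPHABET)):
                        # Move the node a fraction of the way towards the k-mer.
                        node[pos][b] += step * (target[pos][b] - node[pos][b])
    return grid

def cluster(grid, sequences, k):
    """Assign every k-mer to its best-matching node."""
    clusters = {}
    for kmer in kmers(sequences, k):
        clusters.setdefault(best_node(grid, kmer), []).append(kmer)
    return clusters
```

In this sketch, a node that attracts many near-identical k-mers from different input sequences is a candidate overrepresented motif. A real motif finder would additionally need to handle both DNA strands, background composition, and a statistical measure of overrepresentation, none of which are modelled here.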

Most recently, SOMBRERO was extended so that it can be initialised using a SOM previously trained on a set of known transcription factor binding matrices. Initialising SOMBRERO with such prior knowledge biases the program towards finding known transcription factor binding motifs. We have recently shown that using prior knowledge in SOMBRERO's initialisation significantly improves accuracy when known motifs are present in the input data, without reducing accuracy in the discovery of novel motifs. SOMBRERO is the only existing motif-finder that allows the incorporation of entire transcription factor binding matrix databases as prior knowledge.
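The idea of biasing the map with prior knowledge can be sketched as follows. This is a deliberate simplification and an assumption on our part for illustration: SOMBRERO initialises from a SOM that has itself been trained on the known matrices, whereas the hypothetical `seed_grid` helper below just copies known matrices directly onto grid nodes and fills the remaining nodes with uniform base frequencies. The example matrix is a made-up toy, not taken from any database.

```python
ALPHABET = "ACGT"

def uniform_node(k):
    """A node with no prior preference: every base equally likely."""
    return [[0.25] * len(ALPHABET) for _ in range(k)]

def seed_grid(known_matrices, rows, cols, k):
    """Build a rows x cols grid of k-column nodes.

    Known matrices of width k are copied onto nodes in row-major order;
    any nodes left over start from uniform frequencies.
    """
    usable = [m for m in known_matrices if len(m) == k]
    grid = {}
    for index in range(rows * cols):
        r, c = divmod(index, cols)
        if index < len(usable):
            # Copy so that later training does not alter the prior itself.
            grid[(r, c)] = [list(col) for col in usable[index]]
        else:
            grid[(r, c)] = uniform_node(k)
    return grid

# Toy prior: a width-3 matrix favouring "TGA" (columns are A, C, G, T).
toy_prior = [
    [0.1, 0.1, 0.1, 0.7],
    [0.1, 0.1, 0.7, 0.1],
    [0.7, 0.1, 0.1, 0.1],
]
grid = seed_grid([toy_prior], rows=2, cols=2, k=3)
```

Nodes seeded this way already resemble known motifs before training begins, so matching subsequences in the input are drawn to them early, while the uniform nodes remain free to adapt to novel motifs.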

Availability: SOMBRERO is freely available from this link.

Citing SOMBRERO: Please cite SOMBRERO using either (or both) of the following citations:

  • S Mahony, A Golden, TJ Smith, PV Benos: "Improved detection of DNA motifs using a self-organized clustering of familial binding profiles." (2005) Bioinformatics 21(Suppl 1):i283-i291 (Proc. ISMB). Abstract, Full Text, Supporting Information.
  • S Mahony, D Hendrix, A Golden, TJ Smith, DS Rokhsar: "Transcription factor binding site identification using the Self-Organizing Map." (2005) Bioinformatics 21(9):1807-14. Abstract, Full Text, Supporting Information.

SOMBRERO is the result of a collaboration between Shaun Mahony (NUI Galway) and Dave Hendrix (UC Berkeley). The project was carried out at UC Berkeley under the supervision of Prof. Dan Rokhsar, at NUI Galway under the supervision of Prof. Terry Smith and Dr. Aaron Golden, and at the University of Pittsburgh in collaboration with Dr. Takis Benos.

