Applying the tricks of the astronomical trade to other contemporary problems, particularly problems in the biomedical sciences, is something I have been working on since my Ph.D. studies. The scale, modalities and complexity of the underlying biological data provide interesting opportunities to explore novel approaches in imaging and artificial intelligence - as long as you have a reasonable understanding of the underlying biology! I spent five years as a genetics professor in one of New York City’s medical schools, and whilst there, narrowed my interests to the intersection between physics, genomics, radiation oncology and imaging. I’m also interested in biodiscovery, specifically identifying novel secondary metabolites in previously unknown marine microorganism genomes.

Radiogenomics & Radiomics

The ability to molecularly profile tissues from a patient provides a powerful means of identifying biomarkers that can be used either to determine the likelihood of that patient suffering excessive toxicity as a consequence of radiotherapy, or to assess whether an individual might benefit from radiotherapy over other forms of cancer treatment. Recent work on defining radiosensitivity indices (RSI), which combine gene expression profiles with radiobiological concepts such as the linear quadratic model of cell survival against radiation dose, is opening up ways in which we can more effectively model radiotherapeutic response. Similarly, combining radiological data with both genomic and clinical data from the same patient offers a new biomarker descriptor space that is showing promise in the area of precision medicine.

Radio Sensitivity Index calculated using gene expression data for matched tumour/control samples taken from several patients - the higher the RSI, the more radioresistant the tissue. The trend in sensitivity between normal and cancer tissues is apparent.
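
As a rough illustration of how an RSI value can be plugged into the linear quadratic model, here is a minimal Python sketch. It rests on assumptions that are not stated above: that the RSI is read as the surviving fraction after a single 2 Gy dose, and that beta is fixed at an illustrative 0.05 per Gy squared. The function names and the example RSI values are hypothetical, chosen only for illustration.

```python
import math


def lq_survival(dose_gy, alpha, beta):
    """Surviving fraction after one dose under the linear quadratic model:
    S = exp(-alpha * D - beta * D^2)."""
    return math.exp(-alpha * dose_gy - beta * dose_gy ** 2)


def alpha_from_rsi(rsi, beta=0.05, dose_gy=2.0):
    """Solve the LQ model for alpha.

    Assumption: rsi is treated as the surviving fraction at dose_gy (2 Gy by
    default), and beta is an illustrative constant. Note that alpha comes out
    negative for very high rsi values with this choice of beta.
    """
    if not 0.0 < rsi < 1.0:
        raise ValueError("rsi must lie strictly between 0 and 1")
    return -math.log(rsi) / dose_gy - beta * dose_gy


def fractionated_survival(n_fractions, dose_per_fraction_gy, alpha, beta):
    """Surviving fraction after n equal fractions, assuming each fraction
    acts independently (no repopulation or incomplete repair)."""
    return lq_survival(dose_per_fraction_gy, alpha, beta) ** n_fractions


# Hypothetical RSI values for a matched normal/tumour pair (made-up numbers).
example_rsi = {"normal": 0.30, "tumour": 0.55}
beta = 0.05
for tissue, rsi in example_rsi.items():
    alpha = alpha_from_rsi(rsi, beta=beta)
    surviving = fractionated_survival(30, 2.0, alpha, beta)
    print(f"{tissue}: alpha={alpha:.3f} per Gy, survival after 30 x 2 Gy={surviving:.3e}")
```

Under these assumptions a higher RSI gives a smaller alpha and therefore a larger predicted surviving fraction, which matches the "higher RSI, more radioresistant" reading of the figure.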

Secondary Metabolite Biodiscovery

Microbes communicate and, when necessary, defend themselves by endogenously producing metabolites which they release into their immediate environment. The genes responsible sit together in localised regions of their genomes known as biosynthetic gene clusters (BGCs) - whilst the ‘payload’, i.e. the secondary metabolite that does the communication/damage, may vary, the molecular machinery to manufacture and release it is generally conserved, so it’s possible to search de novo for BGCs using sequence homology. The fun begins when confronted by a microbe that does something interesting - such as killing cancer cells - for which no genome exists… In this case you have to sequence the DNA from scratch and start a whole voyage of discovery to identify putative BGCs - then find a friend to validate your candidates at the bench. Fortunately this (the bench work anyway) is an active area of work here in NUI Galway.

A screengrab from the results of an antiSMASH search for putative biosynthetic gene clusters in an unknown marine microorganism that showed cancer-killing potential, and whose genome we sequenced and assembled at NUI Galway.

‘Unsupervised’ Unsupervised Learning

Nearly 20 years ago, when I first started exploring the use of neural networks to identify patterns in bacterial genomes, everybody hated them because they were ‘black boxes’. Then Google came along, neural networks became ‘cool’ (they always were) and now everyone is using them for this ‘new’ thing called Deep Learning (as distinct from Deep Thought). The thing is, nearly all of these neural networks are used in supervised learning, where examples with known outcomes are available to train the network. I’m interested in situations where we don’t have any known examples and are fairly shaky on the outcomes but we have lots of data, and so the neural network learns patterns by itself - the fun begins when asking what makes the patterns so special… this is unsupervised learning. The best-known variant of this type of neural network is the Self Organizing Map (SOM), but like all neural networks, its operation is defined by a whole clutch of user-defined parameters. I’m interested in building SOMs that use concepts from Cosmology and General Relativity to literally mould themselves to the high dimensional structure of the data - basically they self-learn and are parameter-free. It sounds crazy but some early forays show promise… I’ll be submitting a PhD project proposal based around this idea to the SFI Centre for Research Training (CRT) in Genomics Data Science, which NUI Galway is coordinating.

Two SOMs, each trained on a set of food 'vectors', with each food type corresponding to a 3-d vector [fat,carbs,protein] whose elements range from 0 to 255 - so it's easy to visualize. The SOM on the left is the vanilla version; the one on the right uses a form of self-learning - its boundaries are much clearer, and it took an order of magnitude less time to 'learn' the menu.
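
To make the 'clutch of user-defined parameters' concrete, here is a minimal NumPy sketch of a vanilla SOM of the kind shown on the left. It is not the self-learning variant, and it is not the code behind the figure: the grid size, iteration count, learning rate, neighbourhood width, linear decay schedules and food vectors are all assumptions picked for illustration.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Grid index (row, col) of the unit whose weight vector is closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return np.unravel_index(np.argmin(dists), dists.shape)


def train_som(data, grid_shape=(10, 10), n_iterations=2000,
              learning_rate=0.5, sigma=None, seed=0):
    """Train a vanilla SOM. Every keyword argument here is a user-chosen knob."""
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # Random initial weights spanning the range of the data.
    weights = rng.uniform(data.min(), data.max(), size=(rows, cols, data.shape[1]))
    # (row, col) coordinate of every unit, used for neighbourhood distances.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)
    if sigma is None:
        sigma = max(rows, cols) / 2.0
    for t in range(n_iterations):
        frac = t / n_iterations
        lr_t = learning_rate * (1.0 - frac)        # linearly decaying learning rate
        sigma_t = max(sigma * (1.0 - frac), 0.5)   # shrinking neighbourhood, floored
        x = data[rng.integers(len(data))]          # pick one training vector at random
        bmu = best_matching_unit(weights, x)
        dist_sq = np.sum((grid - np.array(bmu)) ** 2, axis=-1)
        h = np.exp(-dist_sq / (2.0 * sigma_t ** 2))  # Gaussian neighbourhood function
        weights += lr_t * h[..., None] * (x - weights)
    return weights


def quantization_error(weights, data):
    """Mean distance from each data vector to its best matching unit."""
    return float(np.mean([np.linalg.norm(weights[best_matching_unit(weights, x)] - x)
                          for x in data]))


# Made-up [fat, carbs, protein] vectors on a 0-255 scale, for illustration only.
foods = np.array([
    [230, 10, 20],   # e.g. something oily
    [20, 240, 15],   # e.g. something sugary
    [30, 20, 235],   # e.g. something lean and protein-rich
    [120, 120, 30],
    [40, 130, 120],
    [130, 30, 125],
], dtype=float)

som = train_som(foods)
print("quantization error:", quantization_error(som, foods))
```

Because each weight vector is a 3-d value in the 0 to 255 range, the trained `som` array can be cast to `uint8` and displayed directly as an RGB image, which is what makes this toy data set easy to visualize.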