Computational Biology internship

This summer I completed a nine-week internship at the Sainsbury Laboratory, a plant science research institute affiliated with Cambridge University. I worked in the Vroomans group, which studies the evolution of plant developmental processes using genetic algorithms.

The group uses a C++ model that simulates a population of tissues over many generations. In each generation, the cells of each tissue are given time to express proteins according to their genomes. These proteins diffuse between cells and interact with transcription factors to promote or inhibit the expression of other proteins. Each tissue is then given a fitness score based on how closely the pattern of a specific identity protein matches a target pattern. Tissues are then selected, with higher fitness giving a higher probability of selection, and mutated to form the next generation. Each run lasts thousands of generations on the university’s high-performance computing (HPC) cluster and takes several hours.

The section from my end-of-project poster explaining the model

Under the supervision of PhD student Pjotr van der Jagt, my project was to add dimerization reactions (in which two proteins associate and dissociate according to an equilibrium) to the group’s model and to study how they affected the pattern-forming mechanisms that evolved.

Initially I was nervous because I’d only used C++ for microcontrollers and had never worked with an HPC system at all. However, I quickly got used to writing and compiling code for a more powerful system, and Pjotr taught me the intricacies of the HPC cluster.

I implemented support for an unlimited number of predefined dimerization reactions, each combining specific proteins to form a new dimeric protein. The equilibrium constants of the reactions are mutated between generations so that they can evolve.

The main finding of the project was that adding dimerization reactions didn’t significantly affect fitness progression or genome characteristics. Despite this, agents used them a high proportion of the time across different runs, often with the dimer acting as a transcription factor that inhibits the expression of its own monomers, forming a negative feedback loop.

Since dimerization is used ubiquitously in plant development, we theorised that it may offer a benefit not captured by the model, such as buffering transcriptional noise, or that it may rely on processes the model doesn’t include, such as post-transcriptional regulation, to reach its full potential.

Because runs on the HPC took so long, I also had time to implement features that weren’t used in the project, such as autodimerization reactions and miRNA molecules that degrade the mRNA of target proteins (a type of post-transcriptional regulation), and to write documentation.

At the end of the project I created a poster summarising the research, which I presented at a poster session for summer interns from across the Sainsbury Laboratory, the Crop Science Centre and NIAB. You can see my poster here:

I enjoyed working in a research environment (particularly the hot chocolate machine) and learned a lot about cell biology, plant development and technologies that were new to me!
