A few days ago, a friend of mine, who is doing a post-doc in San Diego, had a problem. She had no idea whether it was a difficult question to answer or not. And as I was her only hope, she sent me an email asking for help.
She had a big list of genes differentially expressed between two conditions. She wanted a tool to help her make sense of that list, something quicker than looking up every gene on the NCBI website one by one. Something to easily group genes by signaling pathway, function and other stuff like that.
Gene Ontology (GO) provides a controlled vocabulary of terms describing the characteristics of genes and the processes they take part in. It consists of three different ontologies: biological process, cellular component and molecular function. Each is organized in levels, and the deeper you go, the more specialized your term is. For example: in the biological process ontology, you can find the tumor suppressor gene TP53 under a lot of terms. One of them is apoptosis. Apoptosis is contained in programmed cell death, which is in cell death, which is in death, which is in biological process. Each ontology is made up of a set of controlled terms, each nested in a broader one. You can easily find synonyms for each term (exact or approximate). But that doesn’t help us for a list of genes.
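To make the nesting concrete, here is a minimal Python sketch that walks the apoptosis example from a specific term up to the root. The `PARENT` table and the `ancestors` function are my own illustration, not part of any GO tool, and I am assuming a single parent per term, which is a simplification: in the real ontology a term can sit under several parents.

```python
# Toy parent table containing only the chain described in the text.
# Assumption: one parent per term (the real GO allows several).
PARENT = {
    "apoptosis": "programmed cell death",
    "programmed cell death": "cell death",
    "cell death": "death",
    "death": "biological process",
}


def ancestors(term, parent=PARENT):
    """Return the path from `term` up to the root, most specific term first."""
    path = [term]
    # Keep climbing as long as the last term has a known parent.
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


# Print the chain from the most specific term to the root.
print(" < ".join(ancestors("apoptosis")))
```

Every gene annotated with apoptosis is then implicitly annotated with all the broader terms on that path, which is what lets you group genes at whatever level of detail you want.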
Luckily, this isn’t the only interesting thing about GO. With this controlled vocabulary you can analyze a list of genes and see which GO terms they have in common. Tools are even listed directly on the GO website, in different categories. There are more than a hundred different tools, so I’m not gonna analyze them all, but I’ll explain a few that I know. First, the easy ones you can use online:
- AmiGO: the basic one, provided by the GO project itself. It’s the first one I use when looking for information.
- GeneMANIA: it provides interaction data and is a great visualization tool. It also gives you related genes (predicted, co-expressed, physically interacting…).
- g:Profiler: it resembles GeneMANIA, but provides information about domains and diseases related to your genes.
- GeneCoDis: doesn’t give you any network, but still provides pretty pictures and information about co-occurrence of annotations in your list.
Now the tools that are a little more complicated and need to be installed on your computer:
- Bioconductor: I mostly use the affy package in Bioconductor (to analyze high-throughput gene expression data). But since there is a great community behind Bioconductor, it’s not surprising at all that it has plenty of GO-related packages. I recommend those to people who don’t mind getting their hands dirty and who are already familiar with the R project.
- ErmineJ: examines gene sets in high-throughput gene expression data and gives you ranks for your genes. So it’s meant for big lists of genes.
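If you just want a feel for what "seeing the common GO terms in a list of genes" means, the core idea can be sketched in a few lines of Python. Everything here is hypothetical: the gene names (`geneA` to `geneD`), their annotations and the `common_terms` function are made up for illustration and do not come from GO or from any of the tools above. It is also only a raw count; as far as I know, real tools add a statistical test on top to decide whether a term is over-represented.

```python
from collections import Counter

# Hypothetical annotations: gene name -> set of GO terms.
# These are invented for the example, not real GO data.
ANNOTATIONS = {
    "geneA": {"apoptosis", "cell cycle"},
    "geneB": {"apoptosis"},
    "geneC": {"cell cycle", "DNA repair"},
    "geneD": {"apoptosis", "DNA repair"},
}


def common_terms(genes, annotations=ANNOTATIONS):
    """Count how many genes of the list carry each GO term."""
    counts = Counter()
    for gene in genes:
        # Genes without any known annotation are simply skipped.
        counts.update(annotations.get(gene, ()))
    return counts


counts = common_terms(["geneA", "geneB", "geneC", "geneD"])
# List the terms, most frequent first (ties sorted alphabetically).
for term, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
    print(f"{term}: {n} genes")
```

The term shared by the most genes floats to the top, and that is the kind of grouping my friend was after.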