Looking for data in the Arabidopsis mine, no canary needed

27 June 2013
If you search with Google, ads will usually appear that are suspiciously related to what you typed. To achieve this, clever informatics boys and girls use so-called text mining techniques. Based on your keywords, algorithms look for related content and display it on your screen. Nowadays these techniques are also being used by our clever VIB bio-informaticians. Sofie Van Landeghem from the group of Yves Van de Peer at the VIB Department of Plant Systems Biology, UGent, together with the group of Dirk Inzé, director of PSB, dug into the wealth of information hidden in the biomolecular literature mine. They came back with interesting data, which they have now published in The Plant Cell.

This research is actually a follow-up to the lab's previous text mining efforts. What is new?
While we have previously put much effort into the development of novel text mining techniques, this is the first time that these state-of-the-art methods have been applied to study global network topology and connectivity in a systems biology setting. The EVEX text mining resource that we used in this study is the result of a long-standing collaboration with the University of Turku (Finland). It holds textual interaction data for all organisms found in 22 million PubMed abstracts and 400,000 full-text PMC articles.

In the recent paper, we focused on the model plant Arabidopsis thaliana. We used not only our text mining data but also experimentally derived interactions and regulatory associations found in high-quality databases such as AGRIS, BIND, BioGRID, MINT and TAIR. This information was previously compiled in the integrative framework CORNET, also developed within our department. Even though CORNET is already a very comprehensive resource, we found that a good deal of information could still only be derived through text mining. Adding the textual data to the network thus improves our ability to find connections between functionally related genes.

You didn’t only assess the potential of text mining for plant biology research; through some case studies you also identified new associations within certain pathways. Can you name a few?
As an example case we chose the Arabidopsis cell cycle, which has been extensively studied in our department over the past years. Several wet-lab methods had previously identified 61 core cell cycle (CC) genes. Using our integrative network approach, we tried to identify additional candidate genes involved in the cell cycle by inspecting the direct neighbours of these 61 core CC genes. While we found that many interactions resulted from large-scale experimental studies, there were nevertheless a number of candidate genes that were not recorded in the public databases or experimental data, and could thus only be found through text mining. Some examples are the SOB2/ESR2 gene, which interacts with CYCD1;1, and the transcription factor SHORTROOT, which was found to regulate RBR1. Because the text mining data provide the original PMID and sentence, these connections can be inspected further and relevant literature on any topic can be retrieved quickly.
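
The neighbour inspection described above can be illustrated with a minimal Python sketch. It is not the authors' actual pipeline: the edge list, the evidence labels and the gene GENE_X are hypothetical, and only the two associations named in the answer (SOB2/ESR2 with CYCD1;1, SHORTROOT with RBR1) come from the text. For simplicity the sketch treats regulatory associations as undirected edges.

```python
from collections import defaultdict

# Hypothetical toy network: (gene_a, gene_b, evidence_source).
# Only the first two associations are taken from the article;
# GENE_X and the evidence labels are illustrative assumptions.
EDGES = [
    ("CYCD1;1", "SOB2/ESR2", "text_mining"),
    ("RBR1", "SHORTROOT", "text_mining"),
    ("CYCD1;1", "GENE_X", "experimental"),
    ("RBR1", "GENE_X", "text_mining"),
]

# A two-gene stand-in for the 61 core cell cycle genes.
CORE = {"CYCD1;1", "RBR1"}


def text_mining_only_neighbours(edges, core):
    """Return direct neighbours of core genes supported only by text mining."""
    sources = defaultdict(set)
    for a, b, source in edges:
        # Treat each edge as undirected and look at it from both ends.
        for gene, other in ((a, b), (b, a)):
            if gene in core and other not in core:
                sources[other].add(source)
    # Keep candidates whose every link to the core comes from text mining.
    return sorted(g for g, s in sources.items() if s == {"text_mining"})


candidates = text_mining_only_neighbours(EDGES, CORE)
print(candidates)
```

In this toy network GENE_X is excluded because one of its links to a core gene already has experimental support, which mirrors the distinction drawn above between database-recorded interactions and those found only in the literature.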

How do you see the future of text mining? Will it become routine for every wet-lab scientist or will it remain a more exclusive tool for bio-informaticians studying large-scale biological networks?
Within our lab we definitely strive to bring text mining closer to the wet-lab scientist. Together with the University of Turku, we are developing a web interface (http://www.evexdb.org) where you can browse text mining results for your favourite gene(s). Additionally, we are involved in several collaborations with wet-lab scientists to help them integrate the text mining information into their specific research projects. Finally, we try to familiarise people with text mining concepts and data through the organisation of workshops, such as the VIB training seminar on text mining at the end of May.

From left to right: Sofie Van Landeghem, Yves Van de Peer, Stefanie De Bodt, Dirk Inzé and Zuzanna Drebert