High-throughput plant functional genomics and advanced breeding methodologies underpin the global collaborative efforts to increase food production. Technological advances have vastly increased the amount of agricultural data that can be generated, stored and analysed. These data now constitute a highly valuable resource driving agricultural research.
Connecting multiple levels of data to crop genomes, other 'omic' resources, associated functions, and phenotype and population data resources is a key current challenge. A recent initiative used existing Australian infrastructure for agricultural data to build an access portal that allows users to query information across individual data collections beyond surface metadata. This approach joins previously unconnected data sets more rapidly across species and data type borders by finding common denominators in biological information (sequence) and experimental output (parameters). The novel AgriConnect platform aims to help connect genotypes to phenotypes across species boundaries in agricultural data collections in Australia.
AgriConnect is a data-hooking portal that links standardised data covering more than 14 crop and model species. The species currently included are: thale cress (Arabidopsis thaliana), banana (Musa acuminata), barley (Hordeum vulgare), Brachypodium species, canola (Brassica napus), eucalypt species, maize (Zea mays), mungbean (Vigna radiata, Vigna radiata sublobata, Vigna mungo), potato (Solanum tuberosum), rice (Oryza sativa), sorghum (Sorghum bicolor), soybean (Glycine max), tomato (Solanum lycopersicum), wheat (Triticum aestivum) and wine grape (Vitis vinifera). Using all data available for a species, together with gap filling at the data level, helps reveal general rules for plant behaviour that hold across crops. “Borrowing” data across species and data types yields a better understanding and estimation of crop behaviour and of proposed experimental outcomes for translation into agricultural practice.
The AgriConnect Data Hooking
The sequence-based hook
How does BLAST connect the data from all data sets? Connecting very different data requires a common denominator that can be used as a search criterion. For biological data, one such denominator is the biological code. Genomic, transcriptomic and protein sequences are often used for linking data about proteins, plant phenotypes and gene usage. Nucleic acid (gene, transcript) and amino acid (protein) sequences are related to each other, and translating one into the other offers additional links to data that are not yet linked.
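The relationship between nucleic acid and amino acid sequences can be illustrated with a minimal sketch. It assumes the standard genetic code and a coding sequence that starts in frame; the function name `translate` is chosen for illustration only and is not part of AgriConnect.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(nucleotides: str) -> str:
    """Translate an in-frame DNA (or RNA) coding sequence into a protein string.

    Translation stops at the first stop codon; trailing bases that do not
    form a complete codon are ignored.
    """
    dna = nucleotides.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAATAA"))  # MAK
```

A translated query of this kind can then be compared against protein-level data, which is one way a single nucleotide sequence can reach collections that only hold amino acid sequences.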
The AgriConnect sequence search lets users explore links derived from biological code. There, users can enter a nucleic acid or amino acid sequence. The submitted sequence is sent to each database and compared (using BLAST) with existing sequences from genes, transcripts, SNPs and proteins that have been linked to the data collections. Any match or near match is retrieved from each data set and returned to the central AgriConnect results view for investigation.
The returned hits are of mixed types from all connected data sets, allowing an unrestricted assessment of possible data connections beyond species and data type limits. This multi-lateral query lets the user search several data sets at once and assess the potential of each connected data set for further investigation, instead of assessing each data set individually.
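The fan-out-and-merge pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the portal's implementation: the collection names, record identifiers and sequences are invented, and Python's `difflib.SequenceMatcher` stands in for BLAST purely to keep the example self-contained.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Hit:
    collection: str   # data collection the hit came from
    record_id: str    # identifier within that collection
    record_type: str  # e.g. transcript, SNP, protein
    score: float      # similarity in [0, 1]; a stand-in for a BLAST score


# Hypothetical collections: record id -> (record type, sequence).
SEQUENCE_COLLECTIONS = {
    "wheat_expression": {
        "TaGene1": ("transcript", "ATGGCCAAAGGTCTGA"),
    },
    "rice_variants": {
        "OsSNP7": ("SNP", "ATGGCCAAAGGTCTGT"),
        "OsSNP9": ("SNP", "TTTTCCCCGGGGAAAA"),
    },
}


def similarity(a: str, b: str) -> float:
    """Crude similarity measure used here in place of a real BLAST comparison."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def federated_sequence_search(query, collections, min_score=0.8):
    """Send one query to every collection and merge matches and near matches."""
    hits = []
    for name, records in collections.items():
        for record_id, (record_type, sequence) in records.items():
            score = similarity(query, sequence)
            if score >= min_score:
                hits.append(Hit(name, record_id, record_type, score))
    # Best hits first, regardless of which collection or data type they come from.
    return sorted(hits, key=lambda hit: hit.score, reverse=True)


for hit in federated_sequence_search("ATGGCCAAAGGTCTGA", SEQUENCE_COLLECTIONS):
    print(hit)
```

The merged list mixes a transcript hit and a SNP hit from two different collections, which mirrors the mixed-type results view described above.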
The keyword-based hook
How does the keyword search connect the data from all data sets? To search by field of research, type of question or other categories, the user can use the keyword search in AgriConnect. In contrast to a data registry, where the keyword is compared only with the metadata (data set description and keywords), AgriConnect uses keywords as a common denominator within the data collections themselves.
With the AgriConnect keyword search, the user can search text-minable fields within each data collection to find data matching the keyword criteria. Each connected database offers access to text-minable annotations, data fields and descriptions, which yields a more comprehensive set of hits. Any keyword match from any connected data set is returned to the central AgriConnect results view for evaluation by the user.
As with the sequence search, the keyword results are of mixed types, giving the user the opportunity to explore integrating their data with experimental results, images, single protein annotations or curated functional data. This multi-lateral query option gives a clearer idea of the potential of integrating one or more of the data sets that sit behind the linked individual data portals.
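The difference between matching metadata and matching inside the collections can be sketched as follows. The collections, field names and records are hypothetical and serve only to illustrate searching every text-minable field of every record; they do not reflect the actual AgriConnect data model.

```python
# Hypothetical collections: each holds records with free-text fields.
KEYWORD_COLLECTIONS = {
    "phenotype_images": [
        {"id": "IMG-001", "caption": "Barley leaf under drought stress",
         "protocol": "Top-view RGB imaging"},
        {"id": "IMG-002", "caption": "Wheat spike at anthesis",
         "protocol": "Side-view RGB imaging"},
    ],
    "protein_annotations": [
        {"id": "PROT-17", "annotation": "Dehydrin, induced by drought and cold",
         "localisation": "cytosol"},
    ],
}


def keyword_search(keyword, collections):
    """Return every record field, in every collection, that contains the keyword."""
    needle = keyword.casefold()
    hits = []
    for name, records in collections.items():
        for record in records:
            for field, value in record.items():
                if field == "id" or not isinstance(value, str):
                    continue  # only free-text fields are text-minable here
                if needle in value.casefold():
                    hits.append(
                        {"collection": name, "id": record["id"], "field": field}
                    )
    return hits


for hit in keyword_search("Drought", KEYWORD_COLLECTIONS):
    print(hit)
```

In this toy example the keyword matches an image caption in one collection and a protein annotation in another, so a single query surfaces two different data types.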
Where to go from the results view?
A goal of the AgriConnect Portal is to offer users a large number of possible ways to connect their own data, based on the search queries they enter. While the queries go directly into each linked data set to retrieve hits, the results view offers both general and more specific information on the hit type, hit quality and data collection type for all connected data sets. The user can then assess what kind of data would be valuable for integration and follow links from the results view to obtain downloadable data or to run a more specific query directly through the data collection's own portal.
The best way to use AgriConnect is to keep an open mind and not limit the search, even though we have provided some options for doing so. Sometimes very different types of data can fill valuable gaps in knowledge across species or information levels, generating new hypotheses and fresh leads.