SoyKB helps improve a vital food source


Soybeans are second only to corn as a U.S. agricultural commodity. However, there has been limited soybean research over the past few decades, according to Dong Xu, James C. Dowell Professor and chair of the computer science department at the University of Missouri (MU) and one of the principal investigators for a project called SoyKB.

Five years ago, researchers affiliated with the Christopher S. Bond Life Sciences Center (Bond LSC) at MU set out to establish a resource to help researchers study soybeans more effectively. The Bond LSC includes researchers from MU’s Colleges of Agriculture, Food and Natural Resources; Arts & Science; Engineering; Human Environmental Sciences; Veterinary Medicine; and the School of Medicine.


The researchers developed the Soybean Knowledge Base (SoyKB), a comprehensive web resource that manages soybean genomics data and provides tools for researchers to search for and analyze soy gene families and traits. “SoyKB has molecular and protein data and many other types of data to help researchers analyze their own work and search for useful information,” said Xu.


SoyKB architecture. SoyKB has four modules, including the back-end MySQL database that incorporates and integrates all the soybean genomics and “-omics” data from various experiments. The database is designed to contain information on six different entities: genes/proteins, miRNAs/sRNAs, metabolites, SNPs, Plant Introduction (PI) lines and traits. The other three front-end modules are the web interface, genome browser and data integration. Image used with permission from Dong Xu.

“Soybean research has been gaining ground and has had a direct impact on crops and their resistance to disease, pests and weather,” added Xu. “SoyKB has improved the research, and this has a direct impact on agriculture and the public. Research is often pure discovery and exploration. We are not always sure what questions to ask. But we do know that investment in soybean research is paying dividends. SoyKB is an important component for many soybean studies.”

Xu is accustomed to using computers to analyze large amounts of data, such as DNA sequences in plants. Although Xu’s degree is in physics, his Ph.D. project involved molecular simulations of proteins, and that got him started in the field of bioinformatics. Xu’s lab has since developed software programs that assist with biological modeling.

“For the past 11 years here at Missouri, my collaborators have been plant scientists, so my research involves plants and crops,” said Xu. “Our goals with SoyKB include helping breeding practice and using data to develop better soybeans, higher yields, more nutrition, or special traits like drought resistance. The data are very complex, so it is very challenging to make SoyKB easy to use for ordinary biologists. It is also used for educational purposes in classrooms. High school teachers can use it to teach basic biology. It is also used in some courses here at MU.”

The Open Science Grid (OSG) and the Extreme Science and Engineering Discovery Environment (XSEDE) are critical partners for SoyKB. XSEDE supports the project’s job-submission infrastructure and workflows, and provides access to the OSG, which is an XSEDE service provider. “We rely on XSEDE and the OSG,” notes Xu. “Re-sequencing 300 to 1,000 lines of soybeans, with each line generating many gigabytes of data, requires enormous computing power. Our campus has a big computing farm, but it’s not enough for the work we are doing. It’s only possible through the OSG. We must have a lot of intensive computing for data analysis, and XSEDE provides us with a lot of resources.”

SoyKB has also developed a critical partnership with iPlant. In 2008, the National Science Foundation established iPlant to “develop cyberinfrastructure for life sciences research and democratize access to U.S. supercomputing capabilities.” Users with iPlant accounts can integrate their accounts with SoyKB. In turn, SoyKB is integrating its informatics tools with iPlant tools and making them available to iPlant users. In January 2014, website hosting for SoyKB migrated to iPlant. “We benefit quite a lot from iPlant hosting,” says Xu. “It makes the project more scalable and more reliable. Plus, the number of SoyKB users is growing and we had to keep up.”

Initial funding for the project came from the Missouri Soybean Merchandising Council. The project has since received financial support from the United Soybean Board, the National Center of Soybean Biotechnology, the National Science Foundation, the Department of Energy and the U.S. Department of Agriculture.

“We are extremely grateful to our partners,” added Xu. “I especially want to acknowledge Trupti Joshi, assistant research professor in computer science at MU and one of the principal investigators, who manages SoyKB.”

~ Greg Moore