State of Research Data Management at CIAT

CIAT has made great strides over the years in the management of research data and is one of the institutions taking the lead in data innovations around agriculture. CIAT co-leads the CGIAR Platform for Big Data in Agriculture and, in September 2017, hosted the first CGIAR Big Data in Agriculture Convention. One of the questions that has guided the CIAT data management team's reflections this year is “As leaders of the CGIAR platform on Big Data, are we walking the talk on research data management?” Now is an opportune time to reflect on this question and on the progress made in data management and open data.

CIAT has made progress on several fronts – Policies, Processes, People, Platforms, and Products – but collectively, we believe CIAT can deliver even more. The Organize module of the Big Data Platform is a fantastic opportunity to make further progress on research data management and open data.

Policies and guidelines: CIAT was one of the first centers to adopt the CGIAR open access and data management policy back in 2013. However, as a team, we realize that more guidelines and directives are required, primarily to support data management at the research project level. We are in the process of producing the needed documents and have teamed up with CCAFS and Stats4SD to update the excellent CCAFS data management support pack into a CIAT/CCAFS data management support pack. The updated pack will be ready for use within CIAT and CGIAR research programs in early 2018. Please stay tuned!

Processes: CIAT has traditionally undertaken data management at the program level. However, one recommendation that has come up in different conversations around research data management at CIAT is that we must also focus on where the rubber meets the road: data management at the project level. We are working on this, and there is still some way to go. The team has started contributing data management plans to selected project proposals, and we are piloting the implementation of data management and sharing plans for selected projects. In parallel, significant work is ongoing to improve internal processes such as staff induction, staff exit, project process automation and data publishing.

People: CIAT has a small data management coordination unit. As part of this team, we have also invested in improving statistical/biometric support capacity in CIAT regions, with the statisticians playing a dual role as data management focal points. Even though some research areas and programs have invested in data management staff, staff investments have to follow a similar path as the focus now shifts to include data management at the project level. Where viable, for example in large projects, CIAT is urging projects to create data management and sharing plans that include provisions for data management resources such as staff.

Platforms: CIAT has two major types of data management platforms: (a) platforms for managing (collecting, storing, querying and analyzing) day-to-day research data, which are usually internal to CIAT, for example Oracle databases and related applications for CIAT crop research programs, DAPAFS and the IBP Breeding Management System; and (b) platforms for publishing research data as international public goods, for example Dataverse and AgTrials. There is an urgent need to harmonize the platforms used for day-to-day data management; this is an opportunity to manage data more efficiently by putting related data together and managing related processes with one or two agreed-upon tools. The data management team is continually working with all CIAT research areas to advance this harmonization.

Products: We must get more data products out, and there is a lot more we can do. The trend is positive: for example, the number of fully open datasets published rose from only 8 in 2014 to 40 in 2016 and 39 last year. With the CGIAR Platform for Big Data in Agriculture, we have additional support to curate and publish more datasets. We then need to go a step further and continue building usable digital products – analytical tools, online databases, visualizations, portals and others – based on this data.

Dr. Andy Jarvis, co-pioneer of the CGIAR platform on Big Data and CIAT Director of the Decision and Policy Analysis (DAPA) Research Area, recently published a blog post, “By breathing new life into dormant data, we can see the future.” The post hits the nail on the head: research institutions need to free data from C: drives and data silos so that it can be used to accelerate the pace of agricultural research.