Improving the monitoring of animal genetic resources on National and International level

The Farm Animal Biodiversity Network (FABISnet) is a new biodiversity network for collecting data on domestic animal breeds from European countries. Data are collected at the National, Regional and Global level and can be transferred automatically between the levels. As the successor to the Animal Genetic Data Bank of the European Association for Animal Production (EAAP-AGDB) and the Domestic Animal Diversity Information System (DAD-IS) of the Food and Agriculture Organization (FAO), it has taken over their historic data and integrated them into a network of databases. However, the data are incomplete. For example, the general description of 57% of the registered European breeds is very sparse, and for more than 3900 breeds the population size and structure statistics are outdated. A set of 13 management support reports and a methodology for their application have been developed. The reports present summarized information about the degree of completeness of the breed descriptions by country, missing or incomplete population records, reminders for updating data and the status of data translation. Various monitoring and reporting tasks of the National and Regional Coordinators for management of animal genetic resources can be organized systematically using these reports. Such an organizational scheme can reduce the time spent completing data and improve the content of each database in the network.


Introduction
In the past, the efficient use of modern techniques of genetic evaluation and selection, along with achievements in biotechnology, has led to remarkable improvements in the performance of animal production. As a consequence, some populations and breeds fell behind as they came under pressure from their improved competitors. This issue was already addressed in 1979 (ANONYM, 1979), when the creation of a genebank was suggested. To monitor the status of farm animal genetic resources (AnGR), the database of the European Association for Animal Production (EAAP) was set up in Hanover (EAAP-AGDB, 1987). When FAO developed its worldwide Domestic Animal Diversity Information System (DAD-IS), the EAAP-AGDB served as a starting point for development. Within the EU-funded project "European Farm Animal Biodiversity Information System" (EFABIS), a new network was developed which will replace the old EAAP-AGDB as well as FAO's DAD-IS 2 (FAO, 1998) with a uniform software system (ROSATI et al., 2006).
The developed network is an exhaustive multilingual source of information about the characterization, conservation and utilization of AnGR in Europe. The software for operating the network was developed in such a general manner that it can be used to establish biodiversity networks in other regions of the world. The network operates on three levels: National, Regional and Global. Within the network, countries can establish their own Web-driven National farm animal biodiversity databases in one or more local languages, free of licensing costs, as was done for Poland (NRIAP, 2006). The subset of data required on the upper levels, in one of the official FAO languages, is automatically transferred from the National databases to the Regional one (EAAP, 2006) via the synchronization protocol developed in EFABIS (DUCHEV and GROENEVELD, 2006). Now that the new network has been implemented, the focus is on improving its data quality.
The databases in the network collect data for breeds of domesticated animals in more than 35 mammalian and avian species. Each database is organized in five sections: breed data, library of publications, references and Web links, image gallery and contacts data. The breed data section contains information about the origin and development of the breed, morphology and performance traits, utilization and conservation. An important part of the breed section is the population size and structure, which should be entered on a regular basis. This information is used, for example, in monitoring population dynamics and trends, as well as in estimating the degree of endangerment of the breed according to the FAO (SCHERF, 2000) and the EAAP criteria (SIMON and BUCHENAUER, 1993).
In this study the information requirements for successful management of domestic animal genetic resources data in the FABISnet are analysed. The use of the set of reports developed in EFABIS is presented and illustrated with examples. Several proposals are made on how to use this management support system worldwide in the FAO structure of National and Regional Coordinators to improve the quality of AnGR data.

Material and Methods
In this section, management problems in terms of data completeness are described for breed and population data on the one hand and for translation and synchronization on the other. This is followed by a proposal for a solution.

The Problem
The current content of the European database is based on the merger of the data from EAAP-AGDB with the National data stored in DAD-IS. In May 2006 the European data represented 5720 breeds from 28 species in 47 countries. Within each country, breed data are collected, summarized and entered, possibly by several persons. This is especially the case in countries with a large number of breeds, e.g. France (493), Germany (459) and the United Kingdom (578). The information required (more than 100 fields) is collected from various sources, such as breeding societies, farmers and scientific institutions, and is not always available at once. Therefore, data are probably entered in the database incrementally, which leads to gaps in the data. Moreover, the new network of multilanguage databases presents added challenges for the administration and monitoring of data translation and exchange.
General Breeds Data: The general breeds database block contains the description of the live and extinct breeds. The degree of completeness (DC) of the European breed descriptions is shown in Table 1. The DC was calculated as the ratio of the number of filled data fields to the number of all possible data fields per data block. Additionally, the DC was calculated across all blocks. More than half of the breeds (57%) have a DC of less than 10%, which means that almost no data are recorded for these breeds. The descriptions of another 2063 breeds have a DC of less than 30%. Only one breed has a DC of more than 60% in the description fields. Although 100% DC is impossible, such gaps indicate the need for a systematic management approach to improve data completeness.
Population Counts Data: In addition to the detailed breed description, information about the population size and structure over the years is also needed. Such time series are required to investigate patterns in population dynamics and extinction probabilities, as done by BENNEWITZ and MEUWISSEN (2005). A "long" time series of population data is also needed for the development of a uniform criterion for assessing the degree of endangerment of a breed, as done by GANDINI et al. (2004) and DUCHEV et al. (2006). In this regard, parameters like effective population size and rate of inbreeding, estimated from the number of breeding males and females, can be compared with the results of the population analysis (e.g. BIEDERMANN, 2004, for the Vorderwald cattle).
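As an illustration of how such parameters can be estimated from the number of breeding males and females, the following minimal sketch uses the standard textbook approximations Ne = 4 * Nm * Nf / (Nm + Nf) and dF = 1 / (2 * Ne). These formulas are an assumption here: the text does not state which estimators were used in the cited studies, and the function names and animal counts are hypothetical.

```python
def effective_population_size(males: int, females: int) -> float:
    """Approximate Ne from breeding males and females (assumed textbook formula)."""
    if males <= 0 or females <= 0:
        raise ValueError("both sexes need at least one breeding animal")
    return 4.0 * males * females / (males + females)


def inbreeding_rate(ne: float) -> float:
    """Approximate rate of inbreeding per generation, dF = 1 / (2 * Ne)."""
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    return 1.0 / (2.0 * ne)


# Hypothetical breed with 10 breeding males and 90 breeding females.
ne = effective_population_size(10, 90)  # 4 * 10 * 90 / 100 = 36.0
delta_f = inbreeding_rate(ne)           # 1 / 72 per generation
```

The example shows why the sex ratio matters: 100 breeding animals with only 10 males behave genetically like a much smaller population of 36.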
Here data are seriously lacking: as shown in Table 2, of the 5720 European breeds registered in the database, 1736 do not have any population records, and therefore their status of endangerment is unknown. Of the rest, 2208 breeds have population data for only a single year, 1731 for between two and nine years, and only 45 breeds for ten or more years. Thus, only a minor part of the database can be used for studying population dynamics.
Translations and Data Synchronization: National databases may contain textual descriptions in one or more of the National languages. However, data to be sent to the Global database must be in one of the official FAO languages: English, French, Spanish, Arabic or Chinese. Therefore, a mechanism is needed to monitor and manage the translation of data and to ensure that all new or updated data are translated.
Another area which should be monitored is the data exchanged among databases via synchronization. An example is the European database, which contains the English version of the World data. The synchronization between the levels is automatic, but there will be a certain period of inconsistency between the EAAP database and the National databases on the one hand, and between the EAAP database and the FAO database on the other. Therefore, the current content of each database needs to be checked for consistency.

The Solution
The problems described above can be solved by presenting the database managers with the appropriate tools for monitoring the database content and a schedule for the systematic organization of this monitoring.
Monitoring: As already stated above, the general breed description contains information about breed characterization, conservation and utilization. As most of the European breeds are already registered in the EAAP and FAO databases, the remaining task is the completion of this data block. To be able to fulfil this task, the persons responsible for data entry have to be informed about the degree of data completeness. They should be able to quickly find which items of the description are missing, so that they can make a targeted effort to fill the gaps.
The other important part of breed data contains the population counts. There are three tasks in this regard which require management support:
• Completing data for already reported years
• Reporting population data for previous years
• Regular data reports for the current year
In the population data area it is important to identify years for which demographic statistics are missing or incomplete and to have an overview of all population data entered for a certain year. Such information about gaps should be available by breed and also by year. This allows two different approaches to the task: completing data breed by breed, or year by year for all breeds.
Another area which requires monitoring is data translation. In the National databases, the data to be translated include breed descriptions, publications and reference information, image captions and contacts data. The situation in DAD-IS is similar: data entered in one language will be translated into the other official ones. This will probably be done by an external translator on a contract basis. In both cases the responsible persons have to monitor the translation process and keep track of what has been translated so far and what remains to be translated.
If a country has established a National Farm Animal Biodiversity Database, it may include public data from other databases via synchronization. For example, the publications and links to useful biodiversity Web pages can be automatically loaded from the FAO database into the National one. In such a case the person in charge has to be informed about the amount of 'foreign' data currently stored in the National database. The situation at the Regional and Global level is similar, as these supranational databases contain subsets of synchronized data from the National ones. Therefore, the managers of the Regional and Global databases should have a summary of the data loaded via synchronization. Moreover, they should keep track of the countries which have not synchronized their data for a long time.
Organizational Structure: Most European countries have officially nominated a National Coordinator for Management of Animal Genetic Resources (NC). The NC is responsible for reporting the National genetic resources to the international level and therefore she/he must monitor all national data entered in the database. In multilingual National databases, the translation process should always be monitored by the National Coordinator. This is an obligation of the NC, because she/he is responsible for the data presented to the Regional and Global level in an official language. The situation on the Global level is similar: the person monitoring translations should receive information on the current status on a regular basis. Thus she/he will know which National data are available to the world audience.
FAO has created the structure of a Regional Coordinator (RC) for the support and communication of NCs in a region. This structure seems ideally suited to also improve AnGR data from a region by having the RC take a moderating and encouraging role. In such a network the RC can organize the dissemination of feedback to countries about the amount and completeness of the country data. The RC can also help countries solve data improvement problems by sharing the experience of other countries in the region. Specifically, it is suggested that the European Regional Focal Point (ERFP) take over this task for Europe using the EAAP database. It is suggested that the ERFP (as a model for the other RCs in the world) invite the countries to update their data and draw attention to the gaps in the reporting of population statistics at regular intervals. To be efficient in this task, the RC should have an overview of the status of the breed data loaded from each country and should also monitor the presence of population data. If breed data are translated into one or more languages on the Regional level, we propose that the Regional Coordinator also monitor this process.

Results
In this section the 13 reports developed in EFABIS for monitoring database content are presented, along with work schedules for the routine tasks of the National and Regional Coordinators. The data used in the examples are taken from the European Regional database in April 2006 and are intended for illustration purposes only.

Monitoring
The reports for management support in the FABIS network can be divided into two groups according to their type. The first group contains Web reports, which are accessible only to the National or Regional Coordinators after logging in to the Web page of the respective database. These reports are generated on the fly and represent the actual status of the data in the database. For National Coordinators the reports are generated on the basis of their National data, whereas the Regional Coordinators see cumulated statistics over all countries from their region. The reports are created in tabular format and can be copied from the screen into spreadsheet software, e.g. OpenOffice or MS Office, for presentation. This has also been done in some of the examples in this section.
The second group contains reports in Portable Document Format (PDF), which are automatically generated and disseminated via e-mail to the NCs, the RC and other participants in the network. These reports are scheduled for automatic execution on a regular basis. For example, National Coordinators may receive an e-mail containing the report on the status of data translation once at the end of each year.
General Breeds Data: On a country level, the Web reports summarize the DC of the breed data in one place, allowing the NCs to rapidly find incomplete data or gaps. These reports are:
1. Degree of completeness of breed description (Web)
2. Presence of images per breed (Web)
3. Status of reporting by country (Web)
The most important report for completing the general breed description is the "Degree of completeness of breed description by screen". An example of this report is shown in Fig. 1. In this report the DC of the data loaded for each breed from one country is calculated. To interpret this report, the data entry process in a FABIS database needs to be understood. The "Entry data" section consists of five description blocks: "General", "Origin and Development", "Morphology", "Performance" and "Additional information". Each of these blocks represents a group of forms, e.g. the "General" block consists of four forms: "Names", "Other names", "Uses" and "Images". In the report "Degree of completeness of breed description by screen" the DC is calculated for each separate form as the ratio between the filled and all possible fields on this form. If a form is not applicable to a certain species, e.g. there is no "Eggs" form for cattle, then it is marked with "_" in the report. The values in the various groups are marked in three colours, depending on the class of completeness of data: "almost empty" if the DC is under 30%, "sparse" if it is between 30 and 60%, and "well filled" if the DC is greater than or equal to 60%. The red, orange and black colour scheme helps the NC to quickly spot the incomplete blocks of data.
However, this DC information is only useful together with knowledge of the data fields in the various blocks and should not be interpreted on its own. For example, the "Name" form consists of four data elements: "most common name of the breed", "language of the most common name", "description of the most common name" and "transboundary or brand name". Obviously, 100% DC is not always possible, e.g. indigenous breeds do not have a transboundary name. In any case, filling in the most common name, its language and its description covers three of the four fields and results in 75% DC for this sub-block.
In our example, which is an excerpt from the report for Switzerland, the "General" and "Origin and development" blocks are generally "well filled". One exception is the "Images" sub-block for cattle breeds, where no data are present at all. There may be two reasons: either there are no images in the database, or there are images but the image descriptions are missing. The "Presence of images per breed" report (Fig. 2) can be used to check whether any images are loaded for these breeds in the database.
On a global level the degree of completeness of data per region can be obtained from the "Status of reporting by country" report. The degree of completeness of data is calculated as the ratio between the filled and all possible data fields in all records entered from a single country. The chart produced on the basis of this report is shown in Fig. 3. Across all regions the DC of data averages 32%, ranging from 28% in the Latin America and the Caribbean region to 35% in the Near East and in Europe. From the same report the Regional Coordinator can obtain the DC of data per country in the respective region. An example for the region Europe is shown in Fig. 4. The average DC of data in Europe is 35%, ranging from a minimum of 27% in Bulgaria and in the Republic of Moldova up to 45% in Switzerland.
Reminder for entering population data (pdf)
From the report "Breeds with missing population records in the last 10 years" (Fig. 5) the NC can produce a list of years with no annual statistics. This list can be used to collect the missing data from the breeding organizations or farmers. The other issue to be addressed is improving the completeness of records already existing in the database. For this purpose the "Degree of completeness of the population records" report can be used. The example shown in Fig. 6 was created with data from this report. It presents several typical situations in the reporting of yearly data. The Rätisches Grauvieh and Braunvieh cattle breeds are two examples where data have been reported on a regular basis. In the case of the Braunvieh the degree of completeness in the last five years is over 60%, i.e. in the "well filled" zone. This is not the case for the Rätisches Grauvieh, where the DC is under 50%, reaching its minimum of 30% in 2003. The DC is also low for the CH-Warmblut horse in 1983, which in addition is an example of gaps in reporting: it has only two years with population data, 1983 and 1999. Such a gap of fifteen years between the two records, and seven years from the last record to the present, is too long, even taking into account the long generation interval in horses. A similar situation can be found in the Skudde sheep breed, where only data for the years 1998, 1999 and 2001 are recorded in the database. Here regular reporting is even more important, because this breed is in the endangered-maintained category according to FAO's criteria, with only 259 breeding females in 2001.
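The degree-of-completeness calculation per form and the three completeness classes described above can be sketched as follows. This is a minimal illustration, not the EFABIS implementation: the function names and the dictionary keys are hypothetical, and treating blank strings as empty fields is an assumption.

```python
from typing import Mapping, Optional


def degree_of_completeness(fields: Mapping[str, Optional[str]]) -> float:
    """Return the percentage of filled fields on one form.

    A field counts as empty if it is None or contains only whitespace
    (an assumption; the source does not define "filled").
    """
    if not fields:
        raise ValueError("a form needs at least one field")
    filled = sum(1 for value in fields.values()
                 if value is not None and str(value).strip())
    return 100.0 * filled / len(fields)


def completeness_class(dc: float) -> str:
    """Map a DC percentage to the three classes used in the report."""
    if dc < 30:
        return "almost empty"
    if dc < 60:
        return "sparse"
    return "well filled"


# Hypothetical "Name" form of an indigenous breed: no transboundary name.
name_form = {
    "most_common_name": "Skudde",
    "language": "German",
    "description": "illustrative description text",
    "transboundary_name": None,
}
dc = degree_of_completeness(name_form)  # 3 of 4 fields filled
label = completeness_class(dc)
```

Keeping the classification separate from the ratio makes it easy to apply the same thresholds to forms, blocks or whole countries.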
In the next report, "Population data entered for one year", all population data reported from a country for a chosen year are presented together. The NC can use this report to complete the population statistics for a certain year. Another use of this report is monitoring the process of loading data for the current year. As the report represents the actual status of the data, it can be printed or stored in a spreadsheet at regular intervals. Comparing two consecutive printouts shows the changes and data updates made in the meantime.
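The comparison of two consecutive snapshots of this report can be sketched as below. The snapshot layout (a mapping from a breed and field name to the stored value), the function name and all values are hypothetical and serve only to illustrate the idea.

```python
from typing import Dict, Tuple

# One snapshot of the report: (breed, field) -> value. Hypothetical layout.
Snapshot = Dict[Tuple[str, str], object]


def snapshot_changes(old: Snapshot, new: Snapshot) -> Dict[str, dict]:
    """List entries added, removed or changed between two snapshots."""
    added = {key: new[key] for key in new.keys() - old.keys()}
    removed = {key: old[key] for key in old.keys() - new.keys()}
    changed = {key: (old[key], new[key])
               for key in old.keys() & new.keys()
               if old[key] != new[key]}
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative values only, not taken from the database.
january = {("Skudde", "breeding_females"): 250,
           ("Skudde", "breeding_males"): 20}
june = {("Skudde", "breeding_females"): 259,
        ("Skudde", "breeding_males"): 20,
        ("Braunvieh", "breeding_females"): 1000}

changes = snapshot_changes(january, june)
```

Here `changes["changed"]` holds the corrected count for the Skudde females and `changes["added"]` the newly entered Braunvieh record.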
The report "Reminder for entering population data" (Tab. 3) follows our proposal that the Regional Coordinator should invite the NCs to update their data at regular intervals. The report contains a list of all breeds from a single country which have outdated population data. Each year the NCs will receive by e-mail the list of national breeds whose latest population data are older than 2-3 years, together with an invitation from the RC to update the database. A copy of the report can also be sent to the RC. The reporting of population data for the current year can also be monitored by the NC via the "Annual NC report" (Tab. 4). This report is intended to be sent automatically each year to the NCs who enter data in one of the network databases. The report contains information about the number of breeds per species, the degree of completeness of the population records, also calculated per species, and the trend in the degree of completeness in comparison to the previous year. All population data entered during the year are also included in this report. The population size and structure statistics can be monitored by the Regional Coordinator via the "Countries with missing population records for a year" report. An excerpt of this report for Europe for the year 1986 is shown in Fig. 7.
13. Last update date per country (Web)
The "Data received via synchronization" report contains information about the number of breeds, images and publications received from other databases. Other reports, like "Status of translation", inform the coordinators about the status of data which are not directly entered by them, e.g. translations of breed descriptions or publications into an official language. The last two Web reports, "Number of breeds records updated in a year" and "Last update date per country", are intended to be used by the RC to monitor the activity of the countries in updating their data.
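The selection of breeds for the "Reminder for entering population data" report described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the input layout and the default age threshold of three years are hypothetical (the text only says "older than 2-3 years"), and the Braunvieh year is invented for the example.

```python
from typing import Dict, List, Optional


def breeds_needing_update(last_record_year: Dict[str, Optional[int]],
                          current_year: int,
                          max_age_years: int = 3) -> List[str]:
    """Return breeds with no population record or with an outdated one.

    last_record_year maps a breed name to the year of its latest
    population record, or None if no record exists.
    """
    outdated = []
    for breed, year in last_record_year.items():
        if year is None or current_year - year > max_age_years:
            outdated.append(breed)
    return sorted(outdated)


latest = {
    "CH-Warmblut": 1999,  # last record year mentioned in the text
    "Skudde": 2001,       # last record year mentioned in the text
    "Braunvieh": 2005,    # hypothetical year, for illustration only
}
reminder_list = breeds_needing_update(latest, current_year=2006)
```

With these inputs the reminder would list the CH-Warmblut and the Skudde, but not the Braunvieh.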

Work Schedule
Readily available infrastructure is only useful for data improvement when embedded in a well-defined work schedule. Here we propose a workflow for the NC and RC in their management tasks.
National Coordinator: The first task of the National Coordinator is registering all National breeds in the database and completing their general description. The NC should print the "Degree of completeness of breed description" report to identify which breeds are already registered in the database and what the DC of their description is. Then she/he may send the breed description forms (available from the Web page) for completion to the breeding associations, institutions or farmers monitoring these breeds.
The other important task is completing the population data and keeping them up to date. The population data in Europe should be entered on a yearly or bi-yearly basis. Such an interval is reasonable, taking into account possible disease outbreaks and their consequences for the population size. There are two approaches to this task: completing data per breed or completing data per year. In the "per year" approach the NC chooses the year for which she/he wants to complete population data. Then she/he should print the "Population data entered for one year" report for this year and send the printed forms to the breeding associations, institutions or farmers for completion. Data from the returned forms can be entered directly in the database without any need for reformatting. With the "per breed" approach the NC first chooses a breed to work with. Then she/he should identify gaps in the reported data for this breed from the "Countries with missing population records for a year" report and the "Reminder for entering population data" report. From the report "Degree of completeness of the populations records" the NC will identify incomplete population records. As a result the NC will have a list of years for which she/he should enter data and also a list of years for which the data are incomplete.
Regional Coordinator: Here, several tasks should be organized by the Regional Coordinator on a yearly basis. These tasks will require one or two weeks of working time per year. At the beginning of each year the RC should send invitation e-mails to the countries to update their data. Each of these e-mails should contain the "Reminder for entering population data" report for the respective country. At the end of the year the RC should send annual feedback e-mails to the countries on their improvement in data completeness. For this purpose data from the "Status of reporting by country" report should be stored each year and the differences in data completeness between two consecutive years calculated. The proposed working schedule for the Regional Coordinator is shown in Table 5.
December each year: Invite the countries to update their data

Conclusions
The management of farm animal biodiversity data and their regular update can be a very time-consuming task, depending on the organization and the number of breeds to be monitored. Translating the data into several languages and distributing them across the network also adds to the complexity of the system and increases the information requirements of the data providers and database managers: the National, Regional and Global Coordinators. The system of tools developed in the EFABIS project to meet these requirements can be used by National Coordinators all over the world to reduce the time needed to discover gaps in the data and to fill them. The tools also help to prevent new gaps during regular data loading and give the National and Regional Coordinators a clear overview of the content, data updates, and translation and synchronization activities in the network. Involving the RC in the monitoring and updating process of countries is in line with FAO's intended function for the RFP and should help improve the data in DAD-IS.