Machine Learning as an aid to management decisions on high somatic cell counts in dairy farms

High somatic cell counts (SCC) are associated with mastitis infection in dairy herds worldwide. This work describes Machine Learning (ML) techniques designed to improve the information offered to farmers on animals producing high SCCs according to particular herd profiles. The analysed population included 71 dairy farms in Asturias (Northern Spain) and a total of 2,407 lactating cows. Four sources of information were available: a) a questionnaire survey describing the facilities, milking routines and management practices of the farms studied; b) dairy recording information; c) classification of the cows as healthy or suspected of being subclinically mastitic according to farmers' expertise; and d) positive or negative scores on the California Mastitis Test (CMT). The decimal logarithm of the SCC (linear score), lactation number, herd size, lactating cows per milker, milk urea concentration, number of clusters per milker and actual SCC are shown to be the most informative attributes for mimicking both farmers' expertise and CMT performance in order to identify animals producing persistently high SCCs in dairy herds. However, to improve the identification of cows suspected of being non-healthy, the system uses other information related to management and milking routines. Decision rules to predict CMT performance can provide useful additional information to farmers to improve the management of dairy herds included in milk recording programs.

Keywords: dairy cow, artificial intelligence (AI), milk recording program, somatic cell count

Introduction
Bulk milk for sale in the European Union is required to have a somatic cell count (SCC) of less than 400,000 cells/ml. Dairy market operators have introduced penalties and bonuses on farmers' returns according to SCC. Under these conditions, the design of new tools to help farmers limit SCC is demanded by groups specialised in producing high-quality milk and has become an important goal in dairy production (BALTAY, 2002; SKAZARD et al., 2003). The most reliable information farmers can use to limit SCC is derived from dairy recording. However, many farmers using milk recording do not use SCC records because of the complexity of the information management (MILLER et al., 1988). Usually they pay attention to a given threshold of individual SCCs, above which they assume a cow to be mastitic and apply a relevant diagnostic technique to confirm it. However, the SCC has certain shortcomings that make its interpretation difficult, such as the tremendous seasonal or age-dependent variation between SCCs with or without mastitis, even with properly conducted sampling and counting (NATZKE, 1978; BALTAY, 2002). Additionally, persistently high SCCs can be found in herds whose facilities and herd management cannot be distinguished from those of nearby herds showing normal SCCs (KHAITSA et al., 2000). Despite these shortcomings, the observation of SCC trends from one period to another serves to evaluate whether any progress has been made, or whether the situation is stable or deteriorating. The use of decision support systems (DSS) has been proposed to realise the full management benefit of SCC records (ALLORE et al., 1995). The application of Machine Learning (ML) techniques is therefore proposed for the development of a DSS able to manage all this information (HOGEVEEN et al., 1994).
The aim of this work is to highlight the possibilities of using ML techniques to deal with this task. They can be used to ascertain the major attributes affecting process performance and to obtain sound assessments, comparable to accepted human performance, when classical statistical tools are not available (GOYACHE et al., 2001).

Table 1 Mean and standard deviation of the main productive variables, at a test-day level, included in the initial training set in the full dataset and according to farm size: small (N=21), average (N=34) and large (N=16)

Material and methods

Data
A questionnaire survey was carried out by clinical veterinarians on 71 dairy farms in Asturias (Northern Spain) comprising a total of 2,407 Spanish Holstein cows. Asturias is a humid and mountainous region that produces 12.5% of Spain's milk. The farms were included in the same dairy recording scheme and in a milk quality improvement program. An overall description of the main variables used in the learning process is shown in Tables 1 and 2.

Table 2 Incidence (in percentage) of variables obtained from the questionnaire survey included in the initial training set, in the full dataset and according to farm size: small (N=21), average (N=34) and large (N=16)

For descriptive purposes, farms have been grouped according to their size into small, average and large farms (fewer than 30, from 30 to 49, and 50 or more lactating cows, respectively). Data concerning milk yield, fat, protein and urea contents, and SCC at a test-day level were obtained from dairy recording for each cow and farm included in the study (Table 1). Dairy records of every lactating cow on the farm were gathered when a given cow was considered suspect or a CMT was carried out.
When available, the data of the previous month's productive performance from a given cow and its herd, always at a test-day level, was included in the training set used as input for the ML procedure. Thus, a minimum of 2 and a maximum of 6 records per cow and farm were available, totalling 7,292 records. The number of records obtained from 684 cows classified as suspect in a given month was 2,293 (3.3 records per cow). To test the homogeneity of the productive and management conditions we performed Duncan's multiple range and Chi-square tests on continuous and categorical variables, respectively, using SAS® (1999). The information obtained from the survey included 13 variables describing farm facilities and the milking routine (Table 2).
Farmers were asked to classify their cows as healthy or suspected of being subclinically mastitic according to their own experience and to observed changes (usually very slight) in the milk (flakes, clots or serum) detected by pre-stripping cows before milking, changes in the udder (swelling, discoloration or hardness), or changes in the cow (fever, reduced appetite, diarrhoea or increased respiratory rate). Only animals with no observable clinical symptoms of mastitis were classified. A total of 684 cows (28.4% of the available cows) were classified as suspect. The CMT was carried out on these suspect cows by trained experts. CMT results were scored as 0 (negative), 1 (trace), and 2, 3 and 4 (the higher the score, the stronger the reaction). A total of 544 cows (22.6%) were considered CMT-positive, that is, at least one quarter was scored 1 or higher. The CMT was carried out at least one more time on CMT-positive cows. Consecutive CMTs were always performed at average intervals of four weeks.

Machine Learning system
The learning process was performed using the ML system C4.5 (QUINLAN, 1993). This algorithm produces decision trees formed by nodes holding the conditions to be satisfied and leaves labelled with one of the classes to be learned. C4.5 uses the 'gain ratio' as the criterion for splitting nodes. The basic idea is to split the current training set in such a way that the information required to classify the examples is minimised. The program can generate trees in iterative mode, starting with a randomly selected subset of the data (the window), generating a trial decision tree, adding some misclassified objects, and continuing until the trial decision tree correctly classifies all objects not in the window or until it appears that no gain is obtained. After each tree is generated, it is pruned in an attempt to simplify it. All trees produced, both pre- and post-simplification, are evaluated on the training data. The 'best' pruned tree (selected by the program if there is more than one trial) is saved in machine-readable form.
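The gain ratio criterion can be illustrated with a minimal Python sketch. It is not the C4.5 implementation: it handles only categorical attributes (C4.5 also splits numeric attributes on thresholds), and the function names and the toy CMT labels are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a categorical attribute: information gain / split information."""
    total = len(labels)
    partitions = {}
    for value, label in zip(values, labels):
        partitions.setdefault(value, []).append(label)
    # Expected entropy of the class after splitting on the attribute
    remainder = sum(len(part) / total * entropy(part) for part in partitions.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Toy data (hypothetical): one attribute separates the classes, the other does not
cmt = ["positive", "positive", "negative", "negative"]
informative = ["high", "high", "low", "low"]
uninformative = ["a", "b", "a", "b"]
print(gain_ratio(informative, cmt), gain_ratio(uninformative, cmt))
```

In this toy case the informative attribute reaches the maximum ratio and the uninformative one yields no gain, so a tree builder using this criterion would split on the former.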

Learning process and relevancy
A diagram describing the learning process is shown in Figure 1. The input of the learning system (in ML terms, the 'training set') included 37 attributes per farm: 13 from the questionnaire survey and 24 from dairy recording at cow and herd levels (12 corresponding to the previous month's production performance of the cow and herd), and two possible classes: suspect/healthy and CMT-positive/negative. To feed the ML system we first found the most 'relevant' set of attributes to represent computationally the problem to be solved. In ML terms, the most relevant attributes are those presenting the best ratio between the prediction error and the number of attributes used.
To carry out this process, a combination of ML tools successfully implemented previously (GOYACHE et al., 2001; DÍEZ et al., 2003) was used. First we used BETS (Best Examples in Training Sets; DEL COZ et al., 1999). BETS estimates the degree to which the value of a given attribute helps to decide a given class, giving an ordering of the attributes by relevancy that is useful for differentiating between relevant and irrelevant attributes across the initial training set. Secondly, we applied FA (Filtering Attributes; GOYACHE et al., 2001) to the relevant attributes selected by means of BETS. FA removes the least relevant attribute and, in a subsequent step, checks the usefulness of the resulting training set for carrying out the learning process.
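The removal step can be sketched in Python as a backward elimination over an attribute ranking. This is an illustrative assumption, not the FA tool itself. The ranking is taken as given (standing in for the BETS output), the error is a leave-one-out k-nearest-neighbour estimate, the `tolerance` parameter and all function names are hypothetical, and the toy records are invented.

```python
import math
from collections import Counter


def knn_loo_error(rows, labels, attrs, k=1):
    """Leave-one-out error of a k-nearest-neighbour classifier restricted to attrs."""
    errors = 0
    for i, row in enumerate(rows):
        point = [row[a] for a in attrs]
        dists = sorted(
            (math.dist(point, [other[a] for a in attrs]), j)
            for j, other in enumerate(rows)
            if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        if votes.most_common(1)[0][0] != labels[i]:
            errors += 1
    return errors / len(rows)


def filter_attributes(rows, labels, ranked_attrs, tolerance=0.0, k=1):
    """Try to drop attributes, least relevant first, while the error stays acceptable."""
    selected = list(ranked_attrs)  # assumed ordering: most relevant first
    current_error = knn_loo_error(rows, labels, selected, k)
    for attr in reversed(ranked_attrs):
        if len(selected) == 1:
            break  # always keep at least one attribute
        candidate = [a for a in selected if a != attr]
        error = knn_loo_error(rows, labels, candidate, k)
        # tolerance = 0 keeps a removal only if the error does not increase
        if error <= current_error + tolerance:
            selected, current_error = candidate, error
    return selected, current_error


# Toy records (hypothetical values): "noise" carries no class information
rows = [
    {"linear_score": 1.0, "noise": 5.0},
    {"linear_score": 1.2, "noise": 1.0},
    {"linear_score": 4.0, "noise": 5.1},
    {"linear_score": 4.3, "noise": 0.9},
]
labels = ["healthy", "healthy", "suspect", "suspect"]
selected, error = filter_attributes(rows, labels, ["linear_score", "noise"])
print(selected, error)
```

With zero tolerance a removal is kept only when the error does not rise. A small positive tolerance accepts a slightly worse error in exchange for fewer attributes.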
FA made rigorous screening of the attributes possible. If we remove a given attribute only when the resulting prediction error is equal to or less than that obtained with the training set including the attribute, we perform the so-called 'smooth screening'. However, if the selected training set is still too complex to be easily managed, we perform a 'hard screening', accepting a small increase in the error obtained by the resulting training set after removing the less relevant attributes. In either case, FA checks the efficiency of the selected training set by means of a nearest-neighbour system in which the number of neighbours used is calculated as a function of the number of examples. The combination of the aforementioned tools made it possible to obtain, first, an ordering of the attributes according to their relevancy and, secondly, the removal of the less relevant attributes. The obtained training sets were used as input for the C4.5 system.
The accuracy of the ML procedure was estimated by cross-validation. The training sets were divided into 10 folds. Each of these folds was successively used as a test set while the other 9 were used for training. The prediction function obtained by the ML procedure from the other 9 folds was applied to each example from the test fold, and the absolute difference with respect to the class of the example was then computed. The experiment was run 5 times, finally returning the average of the differences thus computed as a faithful estimate of the accuracy of the ML procedure acting on the whole training set when the prediction functions are applied to unseen cases.
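The evaluation protocol (10 partitions, each used once as a test set, the whole experiment repeated 5 times) can be sketched in Python as follows. For a two-class problem the absolute difference between predicted and actual class reduces to a misclassification count, which is what the sketch averages. The learner shown is a toy single-attribute nearest-neighbour stand-in, not C4.5, and all names and data are hypothetical.

```python
import random


def repeated_cv_error(examples, labels, train, n_folds=10, n_repeats=5, seed=0):
    """Average misclassification rate over n_repeats runs of n_folds cross-validation."""
    rng = random.Random(seed)
    indices = list(range(len(examples)))
    run_errors = []
    for _ in range(n_repeats):
        rng.shuffle(indices)  # a new random partition for every repetition
        folds = [indices[i::n_folds] for i in range(n_folds)]
        wrong = 0
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in indices if i not in held_out]
            predict = train(
                [examples[i] for i in train_idx],
                [labels[i] for i in train_idx],
            )
            wrong += sum(predict(examples[i]) != labels[i] for i in test_idx)
        run_errors.append(wrong / len(examples))
    return sum(run_errors) / n_repeats


def train_nearest_neighbour(train_x, train_y):
    """Toy stand-in learner (not C4.5): 1-nearest neighbour on one numeric value."""
    def predict(x):
        nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
        return train_y[nearest]
    return predict


# Toy data (hypothetical): two well-separated groups of linear scores
scores = [1.0 + 0.1 * i for i in range(10)] + [5.0 + 0.1 * i for i in range(10)]
classes = ["negative"] * 10 + ["positive"] * 10
cv_error = repeated_cv_error(scores, classes, train_nearest_neighbour)
print(cv_error)
```

Passing the learner as a function keeps the protocol independent of the classifier, so any procedure that returns a prediction function can be evaluated the same way.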

Results
The major attributes included in the training set are described in Tables 1 and 2. These herds are representative of dairy management and production structures in the north of Spain. The average number of lactating cows per farm was 42.3, but 30% of the studied farms had fewer than 30 lactating cows. Only 22% of the farms possessed more than 50 lactating cows. The Chi-square test did not show significant differences with respect to the size of the herd for the variables involved, except for the bedding system and the number of clusters per milker. Productive performance was consistent across herds. Duncan's multiple range test did not show significant differences for individual milk yield, fat, protein and milk urea concentration, individual SCCs or cows per milker according to the size of the dairies (Table 1). In addition, there were no significant differences in the distribution of heifers and multiparous cows across dairies, even when herd size was taken into consideration.
Table 3 shows the main results of the C4.5 performance when analysing the initial training set and the training sets resulting from the application of smooth or hard screening to the initial training set using the FA tool. In addition, Figure 2 shows the decision tree generated by C4.5 to predict CMT performance using just five attributes as input. The differences in classification of the examples between the learning system and the initial training set are consistent regardless of the target class (farmers' expertise or CMT) or the number of attributes used as input for the ML algorithm. The differences obtained for farmers' expertise ranged between 24.5% for the whole training set and 25.4% for the most reduced training set (6 attributes). The differences obtained for CMT performance ranged between 22.5% and 22.9%. The ML system assessed 1,518 records as suspect and 1,222 as CMT-positive, whereas the training set included 2,293 and 1,780 records in these categories, respectively.
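The gap between these record counts can be expressed as a share of the 7,292 available records. The short Python check below is one reading of the arithmetic that is consistent with the reported figures, not a calculation taken from the original analysis; the function name is hypothetical.

```python
TOTAL_RECORDS = 7292  # records in the training set, as reported in the text


def unflagged_share(in_training_set, flagged_by_system, total=TOTAL_RECORDS):
    """Percentage of all records in a category that the system did not flag."""
    return 100 * (in_training_set - flagged_by_system) / total


suspect_gap = unflagged_share(2293, 1518)  # suspect records
cmt_gap = unflagged_share(1780, 1222)      # CMT-positive records
print(f"{suspect_gap:.1f}% {cmt_gap:.1f}%")
```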
Consequently, the system did not consider 10.6% and 7.7% of the available records to be suspect or CMT-positive. However, the system considered at least one record from 818 cows to be suspect according to farmers' expertise and from 672 cows to be CMT-positive, while the training set includes only 684 suspect and 544 positive cows, respectively. Table 4 shows that the ML algorithm tended to classify as suspect or CMT-positive those records showing higher SCCs than the actual assessments. However, the percentages of cows classified as healthy or CMT-negative were virtually the same irrespective of whether the actual classification or that of the system was used.

Table 3 Differences in classification (in percentage) between farmers' expertise (healthy or suspect) or CMT performance (positive or negative) and learning system (C4.5) performance obtained by 10-fold cross-validation repeated 5 times over the training set described in the text

To classify a cow as suspected of being non-healthy, the ML system used here obtained results with 11 or 6 attributes (after a smooth or a hard screening, respectively) similar to those obtained using the full training set. To learn how to identify cows that are CMT-positive, the system used 15 and 5 attributes, respectively. The linear score (actual or previous), lactation number, herd size, lactating cows per milker, milk urea concentration, number of clusters per milker and actual SCC were the most informative attributes for mimicking both farmers' expertise and CMT performance in dairy herds.
However, when smooth screening of attributes is considered, the system presents substantial differences between learning how to identify suspect cows and learning how to identify CMT-positive cows. In order to identify cows suspected of being non-healthy, the ML system takes into account a number of management and milking routines (dry cow management, premilking teat sanitation and forestripping check), whereas in order to classify a record as CMT-positive, it basically takes production information of the cow and its herd from dairy recording. Farmers' expertise focused on identifying non-healthy animals by taking account of two main sources of information from a given cow: its age and the characteristics of its milk (SCC as linear score); the other information comes from the cow's environment, mainly the possible attention that milkers can offer to the cows, characterised as the number of lactating cows in a given month, the ratio of cows per milker and the number of clusters per milker. Milk urea concentration can reflect a potential stress of the cow due to feeding imbalance (RICHARD et al., 2001).

Discussion
The information obtained from the questionnaire survey highlights the fact that the management conditions of the dairies were quite homogeneous across herds (Tables 1 and 2). Most large farms had a loose housing system, while most small farms had a tied housing system. Regardless of farm size, the most frequent bedding system was rubber or concrete, and almost 95% of the dairies used some type of bedding sanitation. The number of milkers per farm was more dependent on the availability of family members than on herd size. Most herds seemed to follow sound mastitis control procedures: 70% milked the cows in a proper sequence; almost 85% checked the milk before milking, used paper towels to dry teats and applied some kind of system to disinfect teats before milking; and virtually all of the farms sanitised teats after milking.
To dry the cows, an abrupt cessation of milking was preferred, and 95% of the farms administered some kind of therapy to dry cows.
Human expertise and CMT performance are expected to present substantial errors. However, the performance of our system is consistent with results reported in the literature. YANG et al. (2000), using test-day records and conformation trait information to predict clinical mastitis, obtained an overall predictive efficiency of 76.2% with their Artificial Neural Network (ANN) based system; in other words, it has an average error of 23.8%, which is similar to the errors reported in Table 3. The probabilities of diagnosing the mastitis bacteriological status of dairy herds by means of ANNs, using test-day records and information on management practices, ranged from 57 to 71% in a study by HEALD et al. (2000).
To learn how to mimic CMT performance, the system characterises the environment of a given cow in a different way than when mimicking farmers' expertise. The productive performance of a given cow in a given herd seems to be sufficient to predict CMT-positiveness accurately. These factors can characterise the possible stress affecting the cow and leading to an increase in the incidence of mastitis (GIESECKE, 1985). The system clearly uses both SCC and linear score information to predict CMT results and, to a lesser degree, to mimic farmers' expertise. SCC and its logarithmic transformation seem to provide different information. Probably, the linear score shows a linear relationship with other factors affected by subclinical mastitis only within a given range of SCC (HORTET and SEEGERS, 1998). ML procedures can take into account different sources of information regardless of their non-linear relationships (GOYACHE et al., 2001). Although decision rules obtained from the available training set should be tested in different productive conditions, the techniques presented in this paper can be generalised.
Dairy recording organisations usually provide farmers with useful information to identify animals expected to produce persistently high SCCs. This information is often based on the definition of an SCC threshold above which a given animal is highlighted as possibly being subclinically mastitic. ML-based systems fall outside these considerations because their assessments are obtained by taking into account the individual profile of a given dairy in given production conditions.

Conclusions
We conclude that ML techniques can be used by dairy organisations to give recommendations for limiting SCC based on the particular production conditions of a given farm. A scenario in which robot milking gradually replaces conventional milking will require a different concept of herd management, particularly with regard to the possibilities of computerised monitoring, analysis and control of individual animals in a manner transparent to the user (SPAHR and MALTZ, 1997). The implementation of knowledge-based tools will help to deal with this task.