Effects of threshold choice on the results of gene expression profiling , using microarray analysis , in a model feeding experiment with rats

Global gene expression studies using microarray technology are widely employed to identify biological processes which are influenced by a treatment e.g. a specific diet. Affected processes are characterized by a significant enrichment of differentially expressed genes (functional annotation analysis). However, different choices of statistical thresholds to select candidates for differential expression will alter the resulting candidates list. This study was conducted to investigate the effect of applying a False Discovery Rate (FDR) correction and different fold change thresholds in statistical analysis of microarray data on diet-affected biological processes based on a significantly increased proportion of differentially expressed genes. In a model feeding experiment with rats fed genetically modified food additives, animals received a supplement of either lyophilized inactivated recombinant VP60 baculovirus (rBV-VP60) or lyophilized inactivated wild type baculovirus (wtBV). Comparative expression profiling was done in spleen, liver and small intestine mucosa. We demonstrated the extent to which threshold choice can affect the biological processes identified as significantly regulated and thus the conclusion drawn from the microarray data. In our study, the combined application of a moderate fold change threshold (FC≥1.5) and a stringent FDR threshold (q≤0.05) exhibited high reliability of biological processes identified as differentially regulated. The application of a stringent FDR threshold of q≤0.05 seems to be an essential prerequisite to reduce considerably the number of false positives. Microarray results of selected differentially expressed molecules were validated successfully by using real-time RT-PCR.


Introduction
DNA chip (microarray) technology is a modern and powerful method for performing global gene expression analyses and allows a characterization of changes in the multigene expression pattern, providing clues about regulatory mechanisms, cellular functions and biochemical pathways (LOCKHART & WINZELER 2000).In correspondence, the examination of large scale gene expression changes could be a sensitive and useful method for physiological characterization of animals, like e.g. for studying physiological side effects of genetically modified (GM) foods (LIU-STRATTON et al. 2004) in the light of known impacts of macro-and micronutrients as well as dietary bioactive molecules on gene expression (DE CATERINA & MADONNA 2004).Consequently, DNA chip technology could contribute to an increased probability of detecting unexpected physiological side effects due to genetically modified food or food components.
However, methods that are used for data analysis of microarray experiments can have a profound influence on interpretation of the results (QUACKENBUSH 2001).For the determination of genes that show significant expression changes between experimental groups due to a diet, a t-test is usually carried out.According to literature p-values without multiplicity adjustment or False Discovery Rate (FDR) corrected p-values with different stringencies (e.g. from 0.05-0.2) are used in combination with different fold changes (≥1.3 to 2.0) as cut-off thresholds to identify genes, which are »differentially expressed« between the corresponding experimental groups (BOOTH et al. 2004, RIEGER and CHU 2004, RUIZ-BALLESTEROS et al. 2005, ZHU et al. 2005, HAN et al. 2006, CHOI et al. 2007, INGRAM et al. 2007).
It is comprehensible that using different thresholds and filter criteria can have a profound influence on the list of differentially expressed genes (DEGs) (PAN et al. 2005).These lists of DEGs in turn are often the base for the identification of biological processes and thus for the conclusion to be drawn from the results.
In a model feeding experiment physiological effects of a transgenic food additive, consisting of lyophilized inactivated recombinant VP60 baculovirus (rBV-VP60), was investigated in rats.
We intended to study the effect of FDR correction using different threshold stringencies (without FDR: p≤0.05;with FDR: q≤0.2,q≤0.1,q≤0.05)and of fold change filters (without FC, FC≥1.5, FC≥2.0) on the biological processes identified as being dietaffected.Expression data of selected genes were verified by quantification of transcript levels using real-time RT-PCR.

Animals, diets and experimental design
For the feeding trial 13 young male Wistar rats (Charles River Laboratories, Germany) were used.The animals were housed in groups for 42 days.For the experiment the rats were divided into two groups (6 rats in the control group, 7 rats in the test group).A commercial basic diet (ALTROMIN Standard-Diet 1310) supplemented with 15 % of lyophilized potatoes was fed to both groups.In addition, both groups received as fodder additive 20 ml of a solution consisting of corresponding amounts (approximately 10 6,0 TCID 50 per ml) of lyophilized and ethylenimin-inactivated baculovirus.The control group was fed a non-transgenic wildtype baculovirus (wtBV) and the test group was given the transgenic baculovirus (rBV-VP60: ~1 μg VP60/ ml; VP60 is a major structural protein of a calicivirus that causes the rabbit hemorrhagic disease) which was propagated and handled according to HAMMER (2006).The baculovirus additives were fed on days 0, 2, 4 and 21, 23, 25.Both diets were isoenergetic and isonitrogenous.Animals had free access to the food and water.
At the end of the experiments animals were slaughtered and tissue samples of spleen, liver and small intestine mucosa were collected, immediately frozen in liquid nitrogen and stored at −80 °C until isolation of RNA.

RNA extraction
Comparative expression analyses of the rats fed food additives with rBV-VP60 and wtBV, respectively, were done in spleen, small intestine mucosa and liver.After homogenization in Trizol reagent the total RNA was extracted from tissue samples using RNeasy Easy Kit (QIAGEN) according to manufacturers' instructions.Expression analyses were carried out using representative group specific RNA pools of liver, spleen and small intestine mucosa.RNA pools of each tissue and group were made up of equal amounts of RNA aliquots of the individual undiluted RNA samples.

Microarray analysis
For genome wide expression profiling we used the rat 10K oligo-array (MWG Biotech AG), which allows the detection of expression levels of 9 802 transcripts.A dual labeling system was applied (cy3 and cy5 dyes).Probe preparation and chip hybridization were carried out essentially according to manufacturers' protocol (MWG Biotech AG).Each hybridization was carried out in triplicate.
All chips were scanned with a 428TM Scanner (Affymetrix) at 532 nm (cy3) and 635 nm (cy5).Fluorescence intensity of each spot was measured and a quality check was carried out using ImaGene 5 software (BioDiscovery Inc.).Each signal had to pass a number of quality criteria including consistency of signal, minimum intensity level (cy3signal or cy5-signal ≥150) and minimum signal-to-background ratio ([mean signal of spot mean signal of background] / standard deviation [SD] of background signal ≥2).Intensity values for each chip were normalized using the Lowess routine of GeneSight 3.0 (BioDiscovery Inc.).Genes that passed these criteria were included in further statistical analysis.

Real-time RT-PCR
Synthesis of first strand cDNA was performed with MMLV-RT (Promega, Madison, USA) and poly-T primers using 2 μg pooled total RNA of the corresponding feeding group.
Quantitative analysis of PCR products was carried out in the LightCycler instrument (Roche, Mannheim, Germany) according to optimized PCR protocols as described by SCHWERIN et al. (2006).Gene specific primers, annealing temperature and fluorescence acquisition temperature used are given in Table 1.For all assays a standard curve was generated using DNA standard dilutions (10 2 , 10 3 , 10 4 , 10 5 and 10 6 copies) of the corresponding PCR product.The copy number of the housekeeping gene Gapdh was measured to normalize for equal RNA amounts.The mRNA abundance was analyzed in triplicate.

Statistical analysis
Single gene analysis: For microarray studies the expression signals of RNA pools of the control and the test group for each tissue were compared.Fold changes were calculated for each gene (signal intensity of rBV-VP60 / signal intensity of wtBV).Significance of differential expression was analyzed by t-test of SAS (2002).To correct for multiple testing and to control the False Discovery Rate (FDR) we used the concept of STOREY and TIBSHIRANI (2003) and the software Q-value which calculates a corresponding q-value for the ordered p-values.Several FDR thresholds (No FDR: p≤0.05,FDR: q≤0.05/ 0.1/ 0.2) and fold change filters (no FC, FC≥1.5/ 2.0) were tested to identify differentially expressed genes and biological processes, respectively.Functional annotation analysis: Based on Gene Ontology (GO) annotation (http://www.geneontology.org)biological processes with a significantly increased number of differentially expressed genes were arrogated to have an Ease Score ≤ 0.05.The Ease Score is implemented in the DAVID software which calculates the enrichment with respect to the total number of genes assayed and annotated within the classification system.The Ease Score is a conservative adjustment of the fisher exact probability, taking into account the overestimation of categories assigned with only few genes (HOSACK et al. 2003).Not for all genes a functional annotation was available.In spleen 3 738, in small intestine mucosa 3 600 and in liver 4 022 genes were GO annotated.Thus, for the identification of biological processes only these molecules could be included.
Real-time RT-PCR: Transcript levels of selected molecules were determined using realtime RT-PCR and the two experimental groups were compared with t-test of SAS (2002).Significance was assumed at p≤0.05.

Effect of a FDR correction on the number of identified differentially expressed genes and on biological processes with a significant enrichment of DEGs in liver, spleen and small intestine mucosa
We studied the influence of applying different thresholds, on the number of genes identified as differentially expressed and on biological processes identified as being dietaffected.Transcript levels were determined in spleen, small intestine mucosa and liver of rats fed either rBV-VP60, a genetically modified (GM) food additive or wtBV, a non-GM additive.Of 9 802 total probes on chip 8 800, 8 339 and 9 320 in spleen, small intestine and liver, respectively (see Table 2) passed the quality criteria and could be included in further analyses.Expression levels of both feeding groups were compared.
Firstly, we tested the impact of an application of a FDR correction using different significance thresholds (q≤0.05,q≤0.1, q≤0.2) on the quantity of genes identified as differentially expressed between both feeding groups.The number of differentially expressed genes was comparable in all three tissues (spleen, 2 377, liver, 2 754, small intestine mucosa, 1 608 genes) when applying a t-test (p≤0.05)without FDR correction.However, the number of expected false positives was very high (spleen: 440 genes, small intestine mucosa: 417 genes, liver: 466 genes).In contrast, when applying a t-test plus FDR correction and a commonly used threshold (q≤0.05), the number of DEGs was reduced quite differently in the three tissues.Whereas, in the liver and the spleen 95 % and 39 % of the genes, respectively, were still identified as differentially expressed, in intestine less than 1 % of the genes were left as differentially expressed, indicating that most if not all DEGs identified in intestine only by t-test were false positive ones.When using FDR correction with lower stringency (q≤0.1 and 0.2), the number of DEGs increased.At these FDR rates the quantity of DEGs was even higher than after t-test without FDR correction.However, along with more DEGs, an increased amount of false positives has to be expected.Based on the differentially expressed genes recognized in liver and spleen, diet-affected biological processes (GO annotation) with a significant enrichment (Ease Score≤0.05) of DEGs were identified by means of DAVID software.In Table 3a and 3b biological processes of spleen and liver with a significant over-representation of DEGs are shown in dependence on the FDR threshold that was applied.An application of a FDR threshold considerably modified type and number of significantly enriched biological processes.Table 3 shows clearly that the decision to accept higher rates of false positives (10 % and 20 %) can change the results dramatically.Therefore, together with FDR thresholds, fold change filters were additionally applied based on the assumption that among genes showing low expression differences the probability of false positive genes is higher.Small intestine was excluded from the further study because only few genes remained differentially expressed after FDR correction.

Effect of a combined application of FDR correction based thresholds and fold change filters on the number differentially expressed genes and on biological processes with a significant enrichment of DEGs in liver and spleen
We tested a combined application of different FC filters and FDR thresholds on the number of DEGs.Moreover we examined the impact of FC filters on the pattern of significantly enriched biological processes when a stringent FDR threshold (q≤0.05), which is commonly used, was applied.In Table 4 the number of differentially expressed genes identified by using different thresholds is shown for the tissues spleen and liver.When applying FC filters of ≥1.5 or ≥2.0 the number of DEGs was reduced considerably at all FDR rates and in both tissues.
Without FDR correction the application of a FC≥1.5 lead to a reduction of DEGs from 2 377 to 248 in spleen and from 2 754 to 153 in liver, which constitute 10 % and 6 % of the DEGs found without FC filter.A FC filter of ≥2 lead to a further reduction of DEGs; only 3 % and 1 %, respectively remained significant in spleen and liver.With a FDR correction (q≤0.05) the influence of a FC filter of ≥1.5 was just as severe.The number of DEGs decreased from 917 to 157 genes (17 %) in spleen and from 2 620 to 149 genes (6 %) in liver.When raising FC-filter stringency to ≥2 only 5 % of the DEGs in spleen and 1 % in liver remained.The impact of a FC cut-off on the number of significantly diet-affected biological processes was likewise strong.After applying a FDR threshold with a more stringent q≤0.05 in combination with a moderate FC ≥1.5 filter, 1 and 7 biological processes were identified as differentially regulated in spleen and liver, respectively (see Table 5a, b).
Table 5 Influence of applying different »fold change« filters on identified biological processes exhibiting a significant enrichment of differentially expressed genes.Different gene expression is referring to spleen (a) and liver (b) of rats fed rBV-VP60 in comparison with rats fed wtBV and to DEGs with q-values ≤0.05.

Einfluss der Anwendung verschiedener »Verhältnis«-Schwellenwerte auf die Identifizierung von biologischen Prozessen, die eine signifikante Anreicherung von different exprimierten Genen zeigen. Differente Expression bezieht sich auf den Vergleich zwischen rBV-VP60 gefütterten und wtBV gefütterten Tieren in den Geweben
Milz und Leber und auf Gene mit q-Werten ≤0.05 In liver most differentially expressed genes were involved in »nucleic acid metabolism«, »intracellular signaling cascade« and »regulation of cellular metabolism«, whereas in spleen they belonged to »lipid metabolism«.With a FC filter of ≥2 and q-value ≤0.05only one process, namely »lipid catabolism« remained significant in spleen, whereas in liver no significant process was found due to the drastically shortened list of DEGs (compare Table 4).

Quantitative expression analysis using real-time RT-PCR
In order to confirm results obtained from the microarray study we performed real-time RT-PCR for selected molecules.Transcription levels of 9 genes (liver: 2, spleen: 7), which were identified as differentially expressed between both feeding groups in microarray analysis (FC ≥1.5, q≤0.5), were re-analyzed by real-time RT-PCR.In Table 6 (spleen) and 7 (liver) relative transcript levels (mRNA abundance of rBV-VP60 fed rats/ mRNA abundance of wtBV fed rats) of these genes measured by microarray analysis are compared with those measured by real-time RT-PCR.For 7 genes (Pnlip, Pla2gl, Ctrb, Ela1, Add2, Rara, Cebpb) significant different transcript levels were observed between both feeding groups after real-time RT-PCR confirming results of microarray analysis.For the genes Gp5 and F3 microarray results could not be validated.Transcript levels measured by real-time RT-PCR between both feeding groups were not significantly different for these two candidates.

Discussion
The present study aimed to investigate the effect of applying FDR correction thresholds and FC filters with different stringencies to a microarray dataset used for functional annotation analysis, as a prerequisite for a reliable data interpretation.In a model feeding experiment with rats fed either lyophilized inactivated recombinant VP60 baculovirus (rBV-VP60) or lyophilized inactivated wild type baculovirus (wtBV) food additives, we used DNA chip technology to identify significantly affected biological processes.Based on the genes recognized as differentially expressed and on gene ontology (GO) annotation (http://www.geneontology.org),biological processes with a significantly increased number of differentially expressed genes were identified using a modified Fisher exact test (Ease Score ≤0.05), which is implemented in the DAVID software used.
To identify treatment related differences in gene expression, t-statistic and fold change based filtering is commonly used.Besides, it is recommended to correct for high numbers of false positives, which result from thousands of simultaneous t-tests (multiple testing problem) in a typical microarray experiment (ALLISON et al. 2006).A FDR based approach can be applied to adjust for this problem.FC cut-offs with different stringencies are widely used in combination with t-statistics to define relevant expression changes.However, the question which fold change or which FDR threshold should be used remains unanswered from literature.Fold change thresholds of ≥ 2 and a FDR correction with q≤0.05 are common.Nevertheless, fold change filters of 1.3 or 1.5 and FDR thresholds of q≤0.1 and q≤0.2 can be found in literature (BOOTH et al. 2004, RIEGER & CHU 2004, RUIZ-BALLESTEROS et al. 2005, ZHU et al. 2005, HAN et al. 2006, CHOI et al. 2007, INGRAM et al. 2007).
By means of the model feeding experiment with rats gene expression changes, possibly caused by the GM diet, were analyzed in spleen, small intestine mucosa and liver.The number of genes identified as differentially expressed without using any thresholds except p≤0.05 was comparable in all three tissues analyzed (spleen: 2 377, small intestine mucosa: 1 608, liver: 2 754).Applying an FDR threshold with a widely used level of significance (q≤0.05) to our dataset reduced the number of DEGs quite unequally in the three tissues.Whereas in liver a considerable number (95 %) of the genes were still identified as differentially expressed, in spleen only 39 % and in small intestine only 1 % of the genes exceeded the threshold indicating a very high proportion of false positives in small intestine and partly in spleen.Lower stringencies (FDR q≤0.1 or 0.2) increased the number of DEG candidates, even exceeding the number of genes identified as differentially expressed after t-test without FDR correction.However, different inconsistent biological processes identified as significantly regulated support the assumption of a high proportion of false positives among the DEGs when using no or low stringent FDR conditions.The criteria for determining which FDR level to use as threshold should surely depend on the purpose and context of the experiment: If the purpose is to find as many DEGs as possible (maximizing sensitivity) higher rates of expected false positives are probably acceptable.If the objective is to identify a small number of truly differentially expressed genes (maximizing specificity) for further studies, a stringent threshold may be appropriate (PAWITAN et al. 2005, CHEN et al. 2007).
For our data we concluded that a FDR threshold of 5 % would be most appropriate because a high reliability and a small amount of expected false positives were aimed.According to CHOE et al. (2005) many false positive candidates may occur at less stringent FDR thresholds.This may happen, because basic t-statistics are prone to false positives resulting from artificially low standard deviations, especially when only limited numbers of replicates are available, as it was the case in the present experiment.For experiments with large number of replicates variances may be estimated more correctly.In this case it might not be essential to apply a very stringent FDR of 5 %.The MAQC Consortium (2006) has likewise recognized the unstable nature of the variance estimate in the t-statistic measure in the context of their investigations.Consequently they proposed the combined application of test statistics plus a fold change ranking for the identification of DEGs.
In consequence, we additionally applied fold change cut-offs to our data, based on the assumption that among genes showing low expression differences the probability of false positive genes is higher.The combined application of FDR (5 %) and fold change thresholds (FC ≥1.5, FC ≥2.0) in data analysis did significantly reduce the numbers of DEGs.In our study, a combination of FDR threshold (5 %) with a moderate fold change cut-off of FC ≥1.5 exhibited highest reliability regarding biological processes identified as differentially regulated.A FC of ≥2 was probably too stringent, because no significant biological process was identified in liver due to the low number of remaining DEGs with a FC ≥2.In correspondence, it is recommended to apply smaller fold change cut-offs than ≥2 to involve not only excessive changes in fewer genes but also smaller adjustments of large gene sets (BEN-SHAUL et al. 2005).To ensure a high reliability of functional annotation analysis (identified biological processes by a significant enrichment of differentially expressed genes) we applied combined FDR and FC thresholds.According to our results, there is no alternative to an application of a stringent FDR threshold (q≤0.05) in order to reduce considerably the number of false positives.However, the decision, which fold change filter to use, should be taken in the context of the experiment to ensure a high stringency (further reduction of number of false positives) but also a critical number of remaining DEGs for the final Fisher's exact test.For this study a FC of ≥1.5 seems to fulfill these requests most appropriately.
Our results demonstrate the strong impact of threshold choice for defining DEGs on the results of functional annotation analysis.PAN et al. (2005) showed similar effects in their study.Their work was related to the identification of DEGs that comprise »signatures«, which are associated with tumour types and functional categories.By means of varying thresholds they produced »signatures« of different size and exhibited strong effects on the identification of functional categories.In correspondence to our study, it was concluded that microarray based gene signatures should be identified routinely by using a range of threshold conditions to assess the »sensitivity« of biological conclusions to threshold choice and the overall robustness of the conclusion.
As to be seen from our results the above discussed issues about threshold choice are not only a problem that scientists are facing when working in cancer or human related research.It is likewise of great importance in agricultural studies and, since microarray technology is more and more used in this field, the general need of such sensitivity analyses should be emphasized.
To summarize, it becomes clear that threshold choice considerably affects the fundamental outcome of microarray experiments.Probably, if thresholds shall be applied, they have to be chosen for each experiment individually and have to be taken with regard to the distinct experimental conditions and the objective of the study.In this context, a testing of a range of threshold conditions as proposed by PAN et al. (2005) should perhaps be considered as sensitivity analysis.However, suitable quality criteria are still missing, because we do not know the »truth«.That means that the decision, which thresholds are used to ensure a low number of false positives but also a critical number of remaining DEGs for the frequency statistics, will be always empirical.In contrast to GOEMAN et al. (2004), MANSMANN and MEISTER (2005) and NAM and KIM (2008) who discussed threshold-free functional annotation approaches (gene set enrichment) we propose the application of a stringent FDR threshold of q≤0.05 as an essential prerequisite to reduce significantly number of false positives.Although skipping a fold change cut-off probably increases the risk for high numbers of false positives (CHOE et al. 2005), as indicated by the large amount of DEGs that remained at all FDR levels in liver and spleen, results of functional annotation analysis with or without the use of a FC filter showed essential agreement.In the liver and the spleen the most significant biological process identified using a conservative FDR threshold (q≤0.05) in combination with a FC of ≥1.5 (liver, »nucleic acid metabolism«; spleen, »lipid metabolism«) was also identified as significantly regulated using the FDR threshold without FC filter.

Table 3
Influence of applying different »False Discovery Rate« (FDR) thresholds on biological processes exhibiting a significantly increased number of differentially expressed genes.Different gene expression is referring to spleen (a) and liver (b) of rats fed rBV-VP60 in comparison with rats fed wtBV.Einfluss der Anwendung verschiedener »False Discovery Rate«-(FDR)-Grenzwerte auf die Identifizierung von biologischen Prozessen, die eine signifikante Anreicherung von different exprimierten Genen zeigen.

Table 4
Number of differentially expressed genes, identified with different »False Discovery Rate« (FDR) and »fold change« (FC) thresholds in spleen and liver of rats fed rBV-VP60 in comparison with rats fed wtBV.

Table 6
Relative transcript levels (mRNA abundance of rBV-VP60 fed animals/mRNA abundance of wtBV fed animals) of selected genes measured in the spleen of rats with microarray technology and real-time RT-PCR.Relative Transkriptniveaus (mRNA-Menge der rBV-VP60 gefütterten Tiere/mRNA-Menge der wtBV gefütterten Tiere) ausgewählter Gene in der Milz von Ratten nach Messung Microarray Technologie und Echtzeit-RT-PCR.
1 FDR corrected p-value of t-test, 2 p-value of t-test, RTL relative transcript level Table7Relative transcript levels (mRNA abundance of rBV-VP60 fed animals / mRNA abundance of wtBV fed animals) of selected genes measured in the liver of rats with microarray technology and real-time RT-PCR.Relative Transkriptniveaus (mRNA-Menge der rBV-VP60 gefütterten Tiere/mRNA-Menge der wtBV gefütterten Tiere) ausgewählter Gene in der Leber von Ratten nach Messung mit Microarray Technologie und Echtzeit-RT-PCR.1 FDR corrected p-value of t-test, 2 p-value of t-test, RTL relative transcript level