Dep. of Agron., Kansas State Univ., Manhattan, KS 66506
* Corresponding author (amfeyer{at}ksu.edu).
Received for publication March 24, 2003.
| ABSTRACT |
|---|
|
|
|---|
Abbreviations: CPT, cultivar performance trial; df, degrees of freedom; DYA, differential yielding ability; SE, standard error.
| INTRODUCTION |
|---|
|
|
|---|
Sample-to-population inference to predict the magnitude of yield differences in future seasons has posed statistical and practical problems. Literature on experimental design has contributed to precision in a single trial, but Fisher (1951) and others (Yates and Cochran, 1938; Salmon, 1951; Kempton and Talbot, 1988) point out the limited utility of single-trial results, stressing the need for trials across multiple location-seasons. Salmon (1951) highlighted the dilemma of waiting many seasons for statistical significance to occur when a producer needs to make decisions in a shorter time frame.
Patterson (1997) described different statistical methods used in cultivar testing. Talbot (1984) estimated variance components involving cultivars, locations, years, and their interactions for 19 crops. Acceptance probabilities were used to assess the risk of underestimating the potential performance of new cultivars. Kempton and Talbot (1988) reviewed the contribution of statistics to national cultivar testing schemes in the UK and called for statistics "to meet the need for greater precision in predicting future variety performance."
In the USA, most new wheat cultivars are tested in trials administered by state land grant universities. Entries in CPTs are a combination of cultivars, some from private sources but most from public programs. Trials are conducted on both producer land and university research stations. In any one season, entries represent the expert judgment of wheat breeders and agronomic specialists. Trials can represent the first season for some entries but can be one of many seasons for others.
We sought statistical methods to summarize yields from sets of CPTs and to aid the process of selection for future trials. We developed models using reasonable assumptions that would address the problem of unbalanced data, partially remove unwanted environmental effects from entry comparisons, compare cultivars that were present for different time periods, have simple computation procedures, and be easy to update annually. Most importantly, we wanted methods that were applicable to probabilistic inference for future years.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Database
Data used to test assumptions and apply our statistical method were wheat yields for entries in CPTs conducted at multiple locations over multiple seasons in Oklahoma, Colorado, Kansas, and Nebraska (fall-planted wheat) and North Dakota and Minnesota (spring-planted wheat). Ecogeographic regions were contiguous USDA Agricultural Statistics Districts that had similar precipitation patterns within their borders. Yields were reported as averages over replications located at experiment stations or producer fields.
Observational Units: Populations and Random Samples
The physical unit of observation was a seed lot with a given genetic makeup, and the measure of interest was grain yield harvested from the sample in a CPT. A complex set of environmental factors had a major influence on yield. These factors included (i) weather, whose effects were present from planting to harvest; (ii) soil properties; (iii) pests (air- and soil-borne diseases, insects, and weeds); and (iv) interactions of weather elements with soil properties and pests (Feyerherm and Paulsen, 1981, 1986; Feyerherm et al., 1992; Karathanasis et al., 1980). The combined effect of these factors acted in a complex fashion on grain yield. This supported our assumption that environmental effects on the yield of an entry in a CPT were equivalent to a random draw from a population of environmental effects present in a region within and among seasons. These uncontrollable environmental effects concealed genetic differences in yielding ability among cultivars. The need for repetitive sampling was met by trials at different locations over multiple seasons. Trials at two locations per season for three seasons provided six observations for analysis and inference concerning genetic effects on yield for cultivars entered in all six trials.
Check Cultivars and Differential Yielding Ability
Large variation in environmental effects from season to season and smaller, but still large, variance among locations within a season make detecting genetic yield differences difficult. Relatively small variance among entry yields within a CPT led to use of check (control) cultivars as targets for remaining entries to compete against. Within a trial, the mean yield of checks was the standard against which all other entries were measured. The difference in yield between a cultivar and the standard was its DYA value for that trial.
To be a check within an ecogeographic region, a cultivar had to be present in all or almost all trials for five or more seasons, demonstrate good yielding ability, and prove popular with producers as noted in USDA surveys. For a noncheck cultivar, only those DYA values from trials where both the cultivar and check or checks appeared were included in an analysis. Even at the expense of loss of precision due to smaller sample size, it may be informative to run analyses for a number of different checks. Patterson (1997) described this procedure as the method of direct differences using control calculations.
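The DYA calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' program: the function name `dya_values` and the dictionary layout are assumptions, averaging over whichever checks are present in a trial is an assumption, and the yields are three rows taken from the Appendix data with the `CHECK` column treated as the single check.

```python
from statistics import mean

def dya_values(trials, checks, cultivar):
    """Return {trial_key: DYA} for one noncheck cultivar.

    trials: {trial_key: {entry_name: yield or None}} (hypothetical layout)
    checks: names of the check cultivars for the region
    """
    out = {}
    for key, yields in trials.items():
        check_yields = [yields[c] for c in checks if yields.get(c) is not None]
        entry_yield = yields.get(cultivar)
        # A trial contributes only if both the cultivar and a check appear.
        if entry_yield is None or not check_yields:
            continue
        # DYA = entry yield minus the mean yield of the checks in that trial.
        out[key] = entry_yield - mean(check_yields)
    return out

# Three location-seasons from the Appendix; None marks a missing entry.
trials = {
    ("CAR", 1998): {"CHECK": 3470.10, "HJ98": 3443.20, "REEDER": None},
    ("CAR", 1999): {"CHECK": 3201.10, "HJ98": 3611.33, "REEDER": 3698.75},
    ("MAN", 2000): {"CHECK": 3604.60, "HJ98": None, "REEDER": 4055.18},
}
reeder = dya_values(trials, ["CHECK"], "REEDER")
hj98 = dya_values(trials, ["CHECK"], "HJ98")
```

Trials where the cultivar is missing simply drop out, which is how unbalanced data are handled one cultivar at a time.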
Statistical Models and Parameters
We chose the Model II random effects statistical model to divide a DYA value into fixed and random parts (Snedecor and Cochran, 1980). The DYA value for cultivar (v) at location (j) in season (i) was modeled as:

$$D_{ij}(v) = \mu_D(v) + S_i(v) + L_{ij}(v) \qquad [1]$$

where
$\mu_D(v)$ = population mean DYA value for cultivar $v$
$S_i(v)$ = random environmental effects for season $i$, which are normally and independently distributed with mean zero and variance $\sigma^2_S$
$L_{ij}(v)$ = random environmental effects for location $j$ in season $i$, which are normally and independently distributed with mean zero and variance $\sigma^2_L$
The $D_{ij}(v)$ values were normally and independently distributed with mean $\mu_D(v)$ and variance $\sigma^2_S + \sigma^2_L$ for cultivar $v$. The $S_i(v)$ and $L_{ij}(v)$ values were assumed to be independent. The random behavior of environmental effects gave credence to the independence assumptions.
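To make the structure of Eq. [1] concrete, the following Python sketch draws DYA values for one cultivar: one season effect shared by every location in that season, plus a separate location-within-season effect. The function name `simulate_dya` and all parameter values are hypothetical and chosen only for illustration; the three-season, two-location layout echoes the example given earlier in the text.

```python
import random

def simulate_dya(mu_d, sigma_s, sigma_l, n_seasons, n_locations, rng):
    """Draw D_ij = mu_D + S_i + L_ij for one cultivar (Eq. [1]).

    Returns a list of seasons, each a list of location-level DYA values.
    """
    data = []
    for _ in range(n_seasons):
        s_i = rng.gauss(0.0, sigma_s)  # season effect, shared by all locations
        season = [
            mu_d + s_i + rng.gauss(0.0, sigma_l)  # location-within-season effect
            for _ in range(n_locations)
        ]
        data.append(season)
    return data

rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Hypothetical parameter values (not estimates from the paper).
sample = simulate_dya(mu_d=100.0, sigma_s=150.0, sigma_l=200.0,
                      n_seasons=3, n_locations=2, rng=rng)
```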
Estimators, Standard Errors, and Inference
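If we define

$$\bar{D}_i(v) = \frac{1}{n}\sum_{j=1}^{n} D_{ij}(v) \qquad [2]$$

$$\bar{D}(v) = \frac{1}{N}\sum_{i=1}^{N} \bar{D}_i(v) \qquad [3]$$

then $\bar{D}_i(v)$ and $\bar{D}(v)$ are unbiased estimates of $\mu_D(v)$. The population variance of $\bar{D}(v)$ is

$$\operatorname{Var}[\bar{D}(v)] = \frac{\sigma^2_S}{N} + \frac{\sigma^2_L}{nN} \qquad [4]$$

The standard error of $\bar{D}(v)$ is estimated by

$$SE[\bar{D}(v)] = \left\{\frac{\sum_{i=1}^{N}[\bar{D}_i(v) - \bar{D}(v)]^2}{N(N-1)}\right\}^{1/2} \qquad [5]$$

For the model described by Eq. [1], the quantity $[\bar{D}(v) - \mu_D(v)]/SE[\bar{D}(v)]$ is distributed as Student's t with (N − 1) degrees of freedom (df) and was used for tests of significance and confidence intervals.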
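The two-stage estimate described above (season means first, then an unweighted mean over seasons with N − 1 df) can be sketched in Python as follows. The function name `two_stage_dya` is an assumption; the input values are REEDER minus CHECK differences computed from the Appendix data for 1999 and 2000. The sketch stops at the t statistic and df and does not compute p values.

```python
from math import sqrt
from statistics import mean, stdev

def two_stage_dya(seasons):
    """Two-stage estimate for one cultivar.

    seasons: list of lists, one inner list of DYA values per season.
    Returns (d_bar, se, t, df) with df = N - 1, where N is the number of seasons.
    """
    season_means = [mean(s) for s in seasons]    # Stage 1: mean over locations
    n_seasons = len(season_means)
    if n_seasons < 2:
        raise ValueError("need at least two seasons for an SE over seasons")
    d_bar = mean(season_means)                   # Stage 2: unweighted mean of season means
    se = stdev(season_means) / sqrt(n_seasons)   # SE from season-to-season spread
    t = d_bar / se                               # statistic for testing mu_D = 0
    return d_bar, se, t, n_seasons - 1

# REEDER minus CHECK, from the Appendix data
reeder = [
    [497.65, 517.83, 497.65, 154.68, 255.55, 60.52],  # 1999, six locations
    [383.32, 450.58, 235.38, 147.95, -121.05],        # 2000, five locations
]
d_bar, se, t, df = two_stage_dya(reeder)
```

With only two seasons the test has a single df, which illustrates why additional seasons matter more than additional locations for this kind of inference.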
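For N = 1 (a single season) and $n \geq 2$,

$$t = \frac{\bar{D}_i(v) - \mu_D(v)}{\left\{\sum_{j=1}^{n}[D_{ij}(v) - \bar{D}_i(v)]^2 / [n(n-1)]\right\}^{1/2}} \qquad [6]$$

is distributed as t with (n − 1) df. One can draw inferences about $\mu_D(v)$, but the conclusions relate to the ith season only. The variance of $D_{ij}(v)$ is $\sigma^2_S + \sigma^2_L$. Unless $\sigma^2_S = 0$, Eq. [6] would underestimate population standard errors.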
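The reason a single-season standard error falls short can be written out from Eq. [1]. The short derivation below is a sketch under the model's independence assumptions, not a result quoted from the paper.

```latex
% Within season i the season effect S_i(v) is common to all n locations,
% so the location-to-location spread reflects only \sigma^2_L.
\begin{align*}
\bar{D}_i(v) &= \mu_D(v) + S_i(v) + \frac{1}{n}\sum_{j=1}^{n} L_{ij}(v) \\
\operatorname{Var}\!\left[\bar{D}_i(v) \mid S_i(v)\right] &= \frac{\sigma^2_L}{n}
  && \text{(what the within-season SE estimates)} \\
\operatorname{Var}\!\left[\bar{D}_i(v)\right] &= \sigma^2_S + \frac{\sigma^2_L}{n}
  && \text{(variance across the population of seasons)}
\end{align*}
```

The gap between the last two lines is $\sigma^2_S$, which vanishes only when there is no season-to-season variation.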
Computations
The SAS software (SAS Inst., 1985) was used to calculate and produce a compact output of sample statistics (see Appendix).
The $D_{ij}(v)$ values (v = 1, 2,..., V; i = 1, 2,..., N; j = 1, 2,..., $n_i$) were input to PROC MEANS in the first stage, and the $\bar{D}_i(v)$ values were output and input into a second-stage PROC MEANS. The first-stage output contained $\bar{D}_i(v)$, $SE[\bar{D}_i(v)]$, t, and P values for all cultivars for each season. The second-stage output had $\bar{D}(v)$, $SE[\bar{D}(v)]$, t, and P values over N seasons.
If $n_i = n$ for all seasons, identical results for $\bar{D}(v)$ and $SE[\bar{D}(v)]$ were output by PROC NESTED. For unequal n, we replaced n by $n_i$ in Eq. [2] and [6] for Stage 1. In Stage 2, Eq. [3] produced unweighted means. If $\mu_D(v) = 0$, then $\bar{D}(v)/SE[\bar{D}(v)]$ was approximately distributed as t with (N − 1) df. Robustness of the t distribution when PROC MEANS was applied in two stages was examined by model simulation of Eq. [1] and its properties.
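A robustness check of this kind can be sketched as a small Monte Carlo experiment in Python. This is an illustrative sketch, not the authors' simulation: the function name `rejection_rate`, the variance parameters, the choice of N = 5 seasons, and the numbers of locations are all hypothetical. It simulates Eq. [1] with a population mean DYA of zero, applies the two-stage t statistic, and counts how often it exceeds the two-sided 5% critical value for 4 df.

```python
import random
from math import sqrt
from statistics import stdev

T_CRIT_DF4 = 2.776  # two-sided 5% critical value of Student's t with 4 df

def rejection_rate(locations_per_season, sigma_s, sigma_l, reps, seed):
    """Share of simulated data sets with |t| > T_CRIT_DF4 when mu_D = 0.

    locations_per_season: n_i for each of N = 5 seasons (may be unequal).
    """
    if len(locations_per_season) != 5:
        raise ValueError("T_CRIT_DF4 assumes N = 5 seasons")
    rng = random.Random(seed)
    n_seasons = len(locations_per_season)
    rejections = 0
    for _ in range(reps):
        season_means = []
        for n_i in locations_per_season:
            s_i = rng.gauss(0.0, sigma_s)           # season effect
            draws = [s_i + rng.gauss(0.0, sigma_l) for _ in range(n_i)]
            season_means.append(sum(draws) / n_i)   # Stage 1: season mean
        d_bar = sum(season_means) / n_seasons       # Stage 2: unweighted mean
        se = stdev(season_means) / sqrt(n_seasons)
        if abs(d_bar / se) > T_CRIT_DF4:
            rejections += 1
    return rejections / reps

# Hypothetical settings: equal versus unequal numbers of locations per season.
rate_equal = rejection_rate([4, 4, 4, 4, 4], 150.0, 200.0, reps=4000, seed=1)
rate_unequal = rejection_rate([2, 6, 3, 5, 4], 150.0, 200.0, reps=4000, seed=1)
```

If the t approximation holds, both rates should sit near the nominal 0.05, up to Monte Carlo error.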
| RESULTS AND DISCUSSION |
|---|
|
|
|---|
If $\sigma^2_S = 0$, only a single error term was needed, and a single-stage PROC MEANS analysis with nN observations would suffice. If $\sigma^2_S > 0$, collapsing the data and not discriminating between locations and seasons leads to an underestimate of the true SE of $\bar{D}(v)$ and too many significant results. The expected value of the underestimate can be deduced by comparing the first and last lines of Table 1. Dividing the expected mean square of each line by nN and subtracting gives a difference proportional to $\sigma^2_S$. If N = 1, the underestimate is $\sigma^2_S$, as expected.
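The size of this underestimate can be worked out for a balanced design. The derivation below is a sketch that assumes the usual expected mean squares for a two-stage nested layout, with a seasons line and a total line; these are standard results and are not copied from Table 1.

```latex
\begin{align*}
E[\mathrm{MS}_{\text{seasons}}] &= \sigma^2_L + n\,\sigma^2_S \\
E[\mathrm{MS}_{\text{total}}] &= \sigma^2_L + \frac{n(N-1)}{nN-1}\,\sigma^2_S \\
\frac{E[\mathrm{MS}_{\text{seasons}}]}{nN} - \frac{E[\mathrm{MS}_{\text{total}}]}{nN}
  &= \frac{\sigma^2_S}{N} - \frac{(N-1)\,\sigma^2_S}{N(nN-1)}
   = \frac{n-1}{nN-1}\,\sigma^2_S
\end{align*}
```

Under these assumptions the difference reduces to $\sigma^2_S$ when N = 1, in agreement with the statement above.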
$\sigma^2_S > 0$ for a group of cultivars for fall-planted wheat in Oklahoma and spring-planted wheat in North Dakota and Minnesota. Treating yields from n locations within seasons over N seasons as a single set of nN environments instead of a two-stage sample not only underestimates the population $SE[\bar{D}(v)]$ values but also inflates the df [(nN − 1) vs. (N − 1)], further biasing t tests for many cultivars.
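The contrast between the collapsed and two-stage analyses can be seen on a small example. In the Python sketch below, the function names and the DYA values are hypothetical; the data are constructed to have a strong season effect and little spread among locations within a season.

```python
from math import sqrt
from statistics import mean, stdev

def collapsed_se(seasons):
    """SE and df when all location-seasons are pooled as one sample of nN values."""
    values = [d for season in seasons for d in season]
    return stdev(values) / sqrt(len(values)), len(values) - 1

def two_stage_se(seasons):
    """SE and df computed from the season means only."""
    season_means = [mean(s) for s in seasons]
    return stdev(season_means) / sqrt(len(season_means)), len(season_means) - 1

# Hypothetical DYA values: three seasons, two locations each.
seasons = [[310.0, 290.0], [120.0, 80.0], [-40.0, -60.0]]
naive_se, naive_df = collapsed_se(seasons)
staged_se, staged_df = two_stage_se(seasons)
```

For these values the collapsed analysis reports a smaller SE with 5 df, while the two-stage analysis reports a larger SE with 2 df, so the collapsed t test is optimistic on both counts.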
) and p values decreased as values of $\bar{D}(v)$ drifted toward their population µD values.
in the statewide column in Table 6. Except for Ivan, the remaining five of the top six cultivars were significant at the p < 0.10 level. Conversely, for the bottom three, a µD < 0 conclusion was reasonable.
Regional analyses are usually run because entries vary across regions and a genetic × region interaction may exist (Feyerherm et al., 1992). Inspection of $\bar{D}(v)$ values in Table 6 suggests that Russ, Parshall, and Ingot may have larger µD values in the east than the west. Two-sample t tests (the east-west difference in $\bar{D}(v)$ divided by the square root of its estimated variance) for the three cultivars gave 2.02, 2.23, and 1.88 with 10, 2, and 4 df, respectively. Resulting p values were 0.07, 0.15, and 0.13. Based on data from the North Dakota Agricultural Statistics Service, Russ was the most popular spring wheat cultivar in 2000 and 2001, with much larger percentages in the east than west. Additional seasons are needed to track performance of Parshall and Ingot across regions. Oxen, the third most popular cultivar in 2001, had quite uniform performance across the state.
Table 7 shows results of a DYA analysis for fall-planted wheat in portions of Colorado, Kansas, and Nebraska. Cultivar 2137 was the chosen check because it appeared in all three interstate regions and was more prevalent in trials than any other cultivar in the 5-yr period. Its superior yielding ability was a good target for other cultivars to match. Only Trego (p = 0.07) in the North Central region and Alliance (p = 0.17) in the Northwest may outyield (µD > 0) 2137.
| CONCLUSIONS |
|---|
|
|
|---|
While an equal number of locations (n) per season for all entries provides exact significance tests, it is not necessary. Varying n leads to approximate, but close, tests. Varying the number of entries is immaterial because the DYA analysis is performed separately on each cultivar.
Placing the same check cultivar in all trials is important to maximize sample size. Without checks, a trial is lost for a DYA analysis. However, our method accommodates multiple analyses. Despite potential reduction in sample size, performing analyses using different checks and comparing selected pairs of cultivars may be instructive.
Applications were shown for elite wheat trials, but the DYA method might well be applied to wheat nurseries devoted to prerelease trials and to other crops. Besides yield, it could be applied to other characters (e.g., protein and test weight) where biotic and abiotic factors play an important role in creating random environmental effects.
| APPENDIX |
|---|
|
|
|---|
OPTIONS LS=72;
/* Yields by location and season; a period (.) marks a missing entry */
DATA A ; INPUT LOCATION $ SEASON CHECK HJ98 REEDER;
CARDS;
CAR 1998 3470.10 3443.20 .
CAR 1999 3201.10 3611.33 3698.75
CAR 2000 2952.28 3497.00 3335.60
MAN 1999 2636.20 2965.73 3154.03
MAN 2000 3604.60 . 4055.18
NLZ 1999 3073.33 3375.95 3570.98
NLZ 2000 2851.40 . 3086.78
SEL 1999 2864.85 2716.90 3019.53
SEL 2000 3369.23 . 3517.18
SHD 1999 921.33 . 1176.88
WIS 1998 2723.63 2515.15 .
WIS 1999 1822.48 2118.38 1883.00
WIS 2000 2057.85 1956.98 1936.80
;
/* DYA = cultivar yield minus check yield; one record per cultivar per trial */
DATA B ; SET A ;
DYA=REEDER-CHECK; CULTIVAR='REEDER'; OUTPUT;
DYA=HJ98-CHECK; CULTIVAR='HJ98'; OUTPUT;
KEEP CULTIVAR SEASON LOCATION DYA ;
/* Drop trials where the cultivar or the check was missing */
DATA C ; SET B; IF DYA=. THEN DELETE;
PROC SORT DATA=C ; BY CULTIVAR SEASON LOCATION;
PROC PRINT ROUND; VAR CULTIVAR SEASON LOCATION DYA;
/* Stage 1: statistics over locations within each season */
PROC MEANS DATA=C NOPRINT ; BY CULTIVAR SEASON;
VAR DYA; OUTPUT OUT=NEW N=NDYA MEAN=DBAR STD=SDDYA STDERR=SEDBAR T=TDBAR PRT=PDBAR;
/* Seasons with a single location are excluded from Stage 2 */
DATA NEW1; SET NEW; DROP _TYPE_ _FREQ_; IF NDYA=1 THEN DELETE;
PROC PRINT ROUND;
PROC SORT DATA=NEW1 ; BY CULTIVAR ;
/* Stage 2: statistics over season means, with (N - 1) df */
PROC MEANS DATA=NEW1 NOPRINT; BY CULTIVAR;
VAR DBAR; OUTPUT OUT=NEW2 N=NDBAR MEAN=DBAR2 STD=SDDBAR STDERR=SEDBAR2 T=TDBAR2 PRT=PDBAR2;
DATA NEW3; SET NEW2; DROP _TYPE_ _FREQ_;
/* List cultivars from highest to lowest mean DYA */
PROC SORT; BY DESCENDING DBAR2; PROC PRINT DATA=NEW3 ROUND;
RUN;
| NOTES |
|---|
|
|
|---|
| REFERENCES |
|---|
|
|
|---|