Published in Agron. J. 95:1105-1120 (2003).
© American Society of Agronomy
677 S. Segoe Rd., Madison, WI 53711 USA

PRECISION AGRICULTURE

Classification of Crop Yield Variability in Irrigated Production Fields

A. Dobermann (a,*), J. L. Ping (a), V. I. Adamchuk (b), G. C. Simbahan (a), and R. B. Ferguson (a)

(a) Dep. of Agron. and Hortic., Univ. of Nebraska, P.O. Box 830915, Lincoln, NE 68583-0915
(b) Dep. of Biol. Syst. Eng., Univ. of Nebraska, P.O. Box 830726, Lincoln, NE 68583-0726

* Corresponding author (adobermann2{at}unl.edu).

Received for publication January 21, 2003.

    ABSTRACT
Crop yield maps reflect stable yield patterns and annual random yield variation. Procedures for classifying a sequence of yield maps to delineate yield zones were evaluated in two irrigated maize (Zea mays L.) fields. Yield classes were created using empirically defined yield categories or through hierarchical or nonhierarchical cluster analysis techniques. Cluster analysis was conducted using average yield (MY), average yield and its standard deviation (MS), or all individual years (AY) as input variables. All methods were compared based on the average yield variability accounted for (RVc). Methods in which yield was empirically classified into three or four classes accounted for less than 54% of the yield variability observed and failed to delineate high-yielding areas. Six to seven yield classes established by cluster analysis of MY accounted for 60 to 66% of the yield variability. Differences among cluster analysis methods were small for MY as data source. However, fuzzy-k-means clustering had lower RVc than other methods if used with the MS or AY data. The spatial fragmentation of yield class maps increased in the order MY < MS < AY. Univariate cluster analysis of mean relative yield measured for at least 5 yr should be used for yield classification in irrigated fields where six to seven classes appear to provide sufficient resolution of the yield variability observed. More research should be conducted to develop methods that result in spatially coherent yield zones and to understand differences between rainfed and irrigated environments in the importance of mapping yield goals for crop management.

Abbreviations: AY, yields in all individual years • CV, coefficient of variation • Dv, fractal dimension • ISODATA, Iterative Self-Organizing Data Analysis • KME, k-means cluster analysis • MS, mean and standard deviation of yield • MY, mean yield • RVc, average yield variability across years accounted for by the classification • RVj, proportion of yield variability in one year accounted for by the classification • SD, standard deviation • SSCM, site-specific crop management • WAR, hierarchical cluster analysis using Ward's method


    INTRODUCTION
Georeferenced on-the-go yield mapping using combine-mounted yield monitors has become one of the most widely used precision-farming tools. Yield monitors generate spatially dense data at relatively low cost, potentially allowing characterization of the spatial and temporal yield variability. However, the analysis and interpretation of yield map data have lagged behind yield monitor adoption by farmers. As more yield monitors are used and multiple-year yield data accumulate, there is increasing concern about how to process and interpret these data for site-specific crop management (SSCM).

On a field average basis, grain yield measured by yield monitors and certified electronic scales agrees within 2 to 5% (Doerge, 1997). With careful calibration and operation, yield monitors are sensitive to changes in yield although a variable time delay exists and the grain flow through a combine resembles a diffusive process (Arslan and Colvin, 2002). Individual data points on a yield map represent grain mixed from a certain area, and some uncertainty is associated with the exact size and geographical location of this area as well as measurement error. For the same location, this uncertainty is likely to vary from year to year because of different combine travel paths. Therefore, a single-year yield map is useful for interpretation of possible causes of yield variation but may be of limited value for more strategic SSCM decisions over medium- to long-term periods.

Procedures must be developed to correct or eliminate recognizable errors of yield monitor measurement and integrate multiyear sequences of yield maps. Here, we assume that a sequence of corrected and interpolated yield maps, which need to be classified to delineate areas with different yield expectation within a field, has been obtained. Such classification will result in a map of past yield performance. With multiple years of georeferenced yield data, repeating patterns and their more stable natural causes may be separated from random variation in each year, providing a basis for spatially varying yield goals and other SSCM decisions.

Interpretation and classification of multiple-year yield maps has often involved empirical criteria or decisions on how many yield classes should be formed. Blackmore (2000) proposed an empirical classification in which the sample mean and the coefficient of variation (CV) were used to classify yield into groups such as high yielding and stable, low yielding and stable, and unstable. Pringle et al. (2003) proposed an "Opportunity Index" for identifying fields with the greatest overall potential for SSCM, which they calculated from the magnitude of yield variation, its spatial structure, and empirical "thresholds" for both. Lark and Stafford (1997, 1998) used fuzzy-k-means clustering for pattern recognition in multiple-year yield maps. Taylor et al. (2001) attempted to create yield goal maps by aggregating 3 to 7 yr of maize yield data into larger cells, calculating average past yields for different periods, and comparing the different yield goals with the actual yields obtained. They concluded that there was a greater opportunity for classifying consistently high-yielding areas than consistently low-yielding areas based on the mean relative yield and temporal standard deviation (SD).

These as well as other classification methods have not been evaluated using uniform data sets and statistical criteria that express how well spatial and temporal yield variability are accounted for. The objective of our study was to compare different procedures for classifying multiple-year continuous yield maps into categories or zones of different average yield and its variability among years.


    MATERIAL AND METHODS
Study Sites and Data Collection
Yield monitor data were obtained from two production fields in Nebraska from 1996 through 2001. Field A was located near Clay Center, NE (40°30'24'' N, 98°5'5'' W), and the crop sequence from 1996 to 2001 was maize–soybean [Glycine max (L.) Merr.]–maize–maize–soybean–maize. The total field area was 62.7 ha, including a circular center-pivot–irrigated area (53.5 ha), three corner areas with partial furrow irrigation (6.9 ha), and a nonirrigated area (2.3 ha) in the southwest corner. Field A had four soil series (Soil Survey Staff, 1999): Butler (fine, smectitic, mesic Vertic Argiaquoll), Fillmore (fine, smectitic, mesic Vertic Argialboll), Crete (fine, smectitic, mesic Pachic Argiustoll), and Hastings (fine, smectitic, mesic Udic Argiustoll). The dominant soil in Field A was Hastings, which occupied about 80% of the total field area. Field A was flat, with an average slope of 0 to 1%, and moderately well to well drained. Crops in this field were grown under ridge tillage with rows in the east–west direction.

Field B was located near Cairo, NE (40°58'43.5'' N, 98°35'36.5'' W). Continuous maize was grown from 1996 to 2001, except for soybean grown in the south half of the field in 2000. The total field area was 62 ha, all under ridge tillage with furrow irrigation, with furrows and water flow in the west–east direction. Soil series at this site included Hall (fine-silty, mixed, mesic Pachic Argiustoll) and Wood River (fine, smectitic, mesic Typic Natrustoll). The majority of Field B is gently sloping or flat. Wood River soils occupied about 55% of the total area, mainly in the eastern half. More fertile Hall soils are mostly found in the western half. An eroded ridge with a slope of 3 to 7% crosses the entire field in a southwest to northeast direction.

At both sites, maize was typically planted from mid- to late April at a density of 7.4 to 7.7 plants m⁻². Soybean planting was done in mid-May with seeding densities of 35 to 40 seeds m⁻². Two to four different maize hybrids or soybean varieties were grown in each year in different parts of the field. Both crops were fully irrigated, and nutrients were applied based on routine soil testing and standard recommendations. In general, the quality of crop management and yield levels was high at both sites.

Grain yields were measured from 1996 to 2001 using an eight-row combine equipped with a DGPS receiver and an Ag Leader PF 3000 yield monitor (Ag Leader Technol., Ames, IA). Yield monitors were calibrated following standard procedures, and logging intervals were 1 or 2 s. At Site A, yield map data for 1999 were discarded because they were incomplete and affected by errors in position recordings.

Data Preprocessing
Raw data obtained from the yield monitor (.yld files) were processed using SMS Basic v. 2.0 (Ag Leader Technol., Ames, IA) with a constant grain flow delay of 12 s. Advanced export format files obtained from SMS were then further processed through a cleaning algorithm. The algorithm deleted the following erroneous values: (i) header status up, (ii) start and end pass delays (8 s) for both headlands and stop-and-go segments within the field, (iii) short segments (<12 data points), (iv) frequency distribution outliers (values outside mean ± 3 SD), (v) co-located yield records caused by global positioning system (GPS) drift, and (vi) local neighborhood outliers. The latter were removed based on a local neighborhood test performed for each yield data point following the movement of the combine through the field. For each location, a yield estimate was computed by inverse distance interpolation within a moving window that included the three preceding and three succeeding yield records in the same swath as well as yield records within a radius of twice the swath width in the direction perpendicular to combine travel. The 99% confidence interval of the estimate was obtained. If the actual yield value was outside this interval, the data point was discarded, assuming that it was an outlier that is unlikely to represent true yield variability because it was not spatially correlated with its immediate neighborhood. Depending on the site and year, the cleaning algorithm removed about 10 to 20% of the original yield monitor records.
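Two of the simpler cleaning steps, removal of short segments (iii) and of frequency distribution outliers (iv), can be sketched as follows. This is a minimal illustration and not the authors' algorithm: the function names and the representation of a segment as a plain list of yield values are assumptions made here, and the local neighborhood test (vi) is not reproduced.

```python
from statistics import mean, stdev


def drop_short_segments(segments, min_points=12):
    """Keep only harvest segments with at least `min_points` yield records.

    `segments` is a hypothetical list of lists, one inner list per
    continuous combine pass segment.
    """
    return [seg for seg in segments if len(seg) >= min_points]


def drop_global_outliers(yields, n_sd=3.0):
    """Remove values outside mean +/- n_sd * SD of the supplied records."""
    m = mean(yields)
    s = stdev(yields)
    lower, upper = m - n_sd * s, m + n_sd * s
    return [y for y in yields if lower <= y <= upper]


if __name__ == "__main__":
    records = [10.0] * 20 + [100.0]          # one implausible spike
    print(len(drop_global_outliers(records)))
    print(len(drop_short_segments([[1.0] * 5, [1.0] * 12, [1.0] * 30])))
```

Because the mean and SD are themselves inflated by extreme values, a single pass of this filter may leave some outliers in place; whether the original algorithm iterated the test is not stated in the text.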

To eliminate yield variation caused by different crops or cultivars, each data point was normalized by dividing it by the average of the corresponding cultivar (or hybrid) and/or crop for a given field and year. The resulting yields were the relative percentage yield as used by Blackmore (2000) and indicate how the yield at each point differs relative to the mean of the field. The normalized point yield data were then interpolated to a 4- by 4-m grid using ordinary kriging (Minasny et al., 2002). This resulted in interpolated yield maps for each site-year as well as maps of the MY, its SD, and the CV for each grid cell. Descriptive statistics were computed for both the cleaned original point yield data and the normalized and interpolated yield maps. In addition, the fractal dimension (Dv) was calculated using the semivariogram method proposed by Burrough (1983). Fractal dimension can be interpreted as an index for the overall type of spatial variability (Anderson et al., 1998). A higher value indicates large noise or short-distance variation, whereas a smaller number indicates more spatially structured, smoother variation over larger distances.
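The normalization step and the per-cell summary statistics can be illustrated with a short sketch. It assumes that relative yield is expressed as a ratio (1.0 equals the group mean) and that the SD is the sample SD across years; the function names and input layout are hypothetical, and kriging is not shown.

```python
from statistics import mean, stdev


def normalize_by_group(yields, groups):
    """Divide each yield by the mean yield of its cultivar/crop group."""
    by_group = {}
    for y, g in zip(yields, groups):
        by_group.setdefault(g, []).append(y)
    group_mean = {g: mean(v) for g, v in by_group.items()}
    return [y / group_mean[g] for y, g in zip(yields, groups)]


def cell_statistics(relative_yields):
    """Return (MY, SD, CV in %) for one grid cell across years."""
    my = mean(relative_yields)
    sd = stdev(relative_yields)   # sample SD; an assumption of this sketch
    cv = 100.0 * sd / my
    return my, sd, cv


if __name__ == "__main__":
    # Two hybrids, "a" and "b", harvested in the same field and year.
    print(normalize_by_group([10, 12, 8, 20, 30], ["a", "a", "a", "b", "b"]))
    print(cell_statistics([0.9, 1.0, 1.1]))
```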

Yield Classification Procedures
Yield classification was performed using empirical methods as well as hierarchical and nonhierarchical cluster analysis techniques (Table 1). Except for the group of empirical methods, all other methods were performed for all 18 combinations of three different sets of input data and six levels of class numbers. Input data were either MY (univariate classification), MS (bivariate classification), or AY (multivariate classification). The number of classes ranged from three to eight in steps of one.


Table 1. Procedures used for classification of multiple-year yield map data.

 
Empirical Yield Classification
The four empirical procedures included one published method (Blackmore, 2000) and three classification protocols proposed by the authors, which were based on frequency distribution characteristics and presumed expert knowledge about yield and its temporal stability with regard to potential SSCM decisions.

MCV-3: Three yield classes were arbitrarily defined (Blackmore, 2000) using the maps of mean relative yield and its CV among years at each site. Each grid cell was allocated to one of three yield classes: high-yielding and stable (yield > field MY and CV < 30%), low-yielding and stable (yield < field MY and CV < 30%), or unstable (CV >= 30%).
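The MCV-3 rule reduces to two comparisons per grid cell, sketched below. The function name is hypothetical. The text does not say how a cell whose mean equals the field MY exactly is treated; this sketch assigns it to the low-yielding class, which is an assumption.

```python
def classify_mcv3(cell_mean, cell_cv, field_mean, cv_limit=30.0):
    """Assign one grid cell to an MCV-3 class.

    cell_mean  : mean relative yield of the cell across years
    cell_cv    : temporal CV of the cell in percent
    field_mean : field MY on the same scale as cell_mean
    """
    if cell_cv >= cv_limit:
        return "unstable"
    if cell_mean > field_mean:
        return "high-yielding and stable"
    # Includes the tie cell_mean == field_mean (assumption of this sketch).
    return "low-yielding and stable"


if __name__ == "__main__":
    print(classify_mcv3(1.08, 9.0, 1.0))
    print(classify_mcv3(0.85, 12.0, 1.0))
    print(classify_mcv3(1.10, 35.0, 1.0))
```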

MSD-3: Three yield classes were arbitrarily defined using the maps of mean relative yield and its SD among years at each site. Each grid cell was allocated to one of three yield classes: high-yielding and stable (yield >= field MY and SD <= field mean SD), low-yielding and stable (yield < field MY and SD <= field mean SD), or unstable (SD > field mean SD).

MSD-4: Four yield classes were arbitrarily defined using the maps of mean relative yield and its SD among years at each site. Each grid cell was allocated to either: high-yielding and stable (yield > 66% percentile and SD <= 66% percentile), medium-yielding and stable (yield within 33 to 66% percentile and SD <= 66% percentile), low-yielding and stable (yield < 33% percentile and SD <= 66% percentile), or unstable (SD > 66% percentile).

T90-3 and T60-3: Three yield classes were arbitrarily defined using a t test with 90 or 60% probability. Assume that the field is composed of m grid cells for which yield was measured in n years. Yield Y_ij corresponds to the ith cell and jth year, and Y_j is the average field yield in a particular year. To test whether the yield in a particular cell is higher or lower than the field average, relative yield differences, y_ij, were calculated as:

[1]

In this case, y_ij indicates the percentage of yield compared with the average in a given year. If y_ij > 0, the yield was higher than the average; if y_ij < 0, the yield was lower than the average. To judge the yield potential for a particular cell i, it is necessary to find out whether the average cell yield across multiple years (y_i) is significantly different from 0:

[2]

The statistical comparison is then based on the mean cell yield (y_i) and the corresponding SD (s_i). If the absolute value of a positive mean is large and s_i is small, the yield potential for that cell is significantly large. If the absolute value of a negative mean is large and s_i is small, the yield potential for that cell is significantly small. However, it is difficult to draw a conclusion if s_i is large or the absolute value of the mean is small. The t statistic is calculated for each cell using:

[3]

If y_i is different from 0, t_i is large and a t-distribution table (t_stat) can be used to see if t_i is large enough to claim significance. A desired probability (here 60 or 90%) and the degrees of freedom (df = n - 1) must be specified. The decision about class membership is then made using: High: if t_i > t_stat and y_i > 0, then the yield is significantly higher than the field average.
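The t-test procedure can be sketched as below. Several details are assumptions because Eq. [1] to [3] and the full decision rule are not reproduced in this text: the relative difference is taken as a percentage of the annual field mean, the t statistic as the absolute mean difference divided by its standard error, and cells that fail the test are labeled "variable" (the class name used in the Results). The "low" rule mirrors the stated "high" rule. The critical value `t_crit` must be supplied from a t table by the user; all function names are hypothetical.

```python
from math import inf, sqrt
from statistics import mean, stdev


def relative_differences(cell_yields, field_means):
    """Percentage difference of a cell's yield from the field mean, per year."""
    return [100.0 * (y - f) / f for y, f in zip(cell_yields, field_means)]


def classify_by_t_test(cell_yields, field_means, t_crit):
    """Return 'high', 'low', or 'variable' for one grid cell."""
    diffs = relative_differences(cell_yields, field_means)
    n = len(diffs)
    y_bar = mean(diffs)
    s = stdev(diffs)
    if s == 0:
        # Identical differences in every year: no temporal variation.
        t = inf if y_bar != 0 else 0.0
    else:
        t = abs(y_bar) / (s / sqrt(n))
    if t > t_crit:
        return "high" if y_bar > 0 else "low"
    return "variable"


if __name__ == "__main__":
    # Five years, df = 4; 2.132 is the tabulated two-sided 90% value.
    field = [10.0, 11.0, 10.5, 11.0, 10.0]
    print(classify_by_t_test([12.0, 13.0, 12.5, 13.5, 12.0], field, 2.132))
    print(classify_by_t_test([12.0, 8.0, 11.0, 9.0, 10.0], [10.0] * 5, 2.132))
```

Lowering the required probability lowers `t_crit`, so fewer cells fall into the "variable" class, which is consistent with the difference between the T90-3 and T60-3 maps reported in the Results.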

Yield Classification by Cluster Analysis
Ward's minimum variance method (SAS Inst., 1999) was used for hierarchical cluster analysis of yield data. This method agglomerates the individual objects into a hierarchy of clusters until a single cluster contains all entities. At each step, the within-cluster sum of squares for the given number of clusters is minimized over all partitions obtainable by merging two clusters from the previous generation (Johnson and Wichern, 1998; SAS Inst., 1999).

Nonhierarchical or dynamic clustering is recommended for populations that lack an inherent hierarchical structure (Webster and Oliver, 1990). We used the k-means method (SAS Inst., 1999), in which the multidimensional data set is divided into k clusters, and an item is assigned to the cluster whose centroid (mean) is nearest in terms of Euclidean distance. Reassignments take place, and each iteration reduces the least-squares criterion until convergence is achieved (Johnson and Wichern, 1998; SAS Inst., 1999).
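For the univariate (MY) case, the assign-and-update loop of k-means can be sketched in a few lines. This is a generic illustration rather than the SAS procedure used in the study: the quantile-based initialization and the function name are assumptions, and in one dimension Euclidean distance reduces to an absolute difference.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar values into k groups; return (labels, centroids)."""
    ordered = sorted(values)
    n = len(ordered)
    # Start centroids at evenly spaced quantiles (assumption of this sketch).
    centroids = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by absolute difference.
        new_labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        if new_labels == labels:
            break  # converged: no item changed cluster
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


if __name__ == "__main__":
    mean_yield = [0.50, 0.55, 0.60, 1.00, 1.05, 1.10, 1.40, 1.45, 1.50]
    labels, centroids = kmeans_1d(mean_yield, 3)
    print(labels, centroids)
```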

The Iterative Self-Organizing Data Analysis (ISODATA) classification technique was designed for image classification based on spectral distance. It iteratively classifies the pixels, redefines the criteria for each class, and classifies again so that the spectral distance patterns gradually emerge (ERDAS, 1999). The ISODATA algorithm is similar to k-means clustering, but it allows for dynamic changes in the number of cluster centroids through splitting and merging of clusters (Jensen, 1996). Grid yield data were converted into ERDAS image data format using ArcGIS Spatial Analyst 8.2 (ESRI, Redlands, CA) and then processed with Spatial Modeler in ERDAS Imagine 8.5 (Leica Geosystems, Atlanta, GA) to perform the ISODATA classification.

Fuzzy-k-means clustering is an extension of the normal, crisp-k-means clustering method to account for uncertainties associated with class boundaries and class membership. As in k-means clustering, the iterative procedure minimizes the within-class sum of squares, but each object (or cell on a map) is assigned a continuous class membership value ranging from 0 to 1 in all classes, rather than a single class membership value of 0 or 1 used in the normal k-means clustering method (De Gruijter and McBratney, 1988). Fuzzy-k-means clustering was conducted using the FuzME program (Minasny and McBratney, 2002) with Mahalanobis distance and a fuzzy exponent of 1.2. Each cell was assigned to a single yield category based on the highest fuzzy membership value at this particular location.
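The difference between crisp and fuzzy membership, and the final "hardening" to a single class, can be illustrated as follows. The sketch uses the standard fuzzy-k-means membership formula for fixed centroids with scalar data and absolute distance; the study used FuzME with Mahalanobis distance, which is not reproduced here, and the function names are hypothetical.

```python
def fuzzy_memberships(value, centroids, fuzzy_exponent=1.2):
    """Membership of one value in each class; memberships sum to 1."""
    dists = [abs(value - c) for c in centroids]
    on_centroid = [d == 0 for d in dists]
    if any(on_centroid):
        # Value coincides with a centroid: full membership goes there.
        share = 1.0 / sum(on_centroid)
        return [share if hit else 0.0 for hit in on_centroid]
    power = 2.0 / (fuzzy_exponent - 1.0)
    return [
        1.0 / sum((d_c / d_k) ** power for d_k in dists) for d_c in dists
    ]


def harden(memberships):
    """Index of the class with the highest membership value."""
    return max(range(len(memberships)), key=memberships.__getitem__)


if __name__ == "__main__":
    u = fuzzy_memberships(1.0, [0.8, 1.1, 1.4])
    print(u, harden(u))
```

With an exponent as close to 1 as 1.2, memberships are nearly crisp, so most cells receive a membership close to 1 in their nearest class.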

Evaluation of Yield Classification Results
Several statistics were computed to assess the results of different yield classification procedures in terms of (i) yield variance accounted for and (ii) spatial agreement between two maps of yield classes. To compare the effectiveness of the different classification methods in explaining the yield variance in each year j, we used the complement of the relative variance, denoted as RVj (Webster and Oliver, 1990):

$$ RV_j = 1 - \frac{S_W^2}{S_T^2} \tag{4} $$
where $S_W^2$ is the within-class variance and $S_T^2$ is the total variance, both estimated by postclassification analysis of variance (ANOVA) for a particular year j. Similar to the R2 value of a regression, RVj is a measure of the proportion of variance accounted for by the classification. A perfect classification would result in zero within-class variance and an RVj of 1. For each yield classification method (combination of clustering method, data source, and number of classes), one-way ANOVA was conducted for each individual yield year based on the assigned yield class membership values. An RVj value was then computed for each individual yield map year, and an average value (RVc) was computed across the five or six yield years at each site. Thus, RVc used in the context of this paper refers to the average yield variability accounted for by the classification during the time period studied for each site. Methods were then ranked according to the RVc. In addition, the range of the RVj in individual years at each site was used to assess how consistently a classification method performed in terms of accounting for annual yield variations. An ideal yield classification method would have an RVc close to 1 and a small range of the RVj among individual years, i.e., it would be able to explain a large proportion of the yield variability within each year. An ANOVA of the RVc values was conducted to test for differences among classification methods, number of classes, choice of data source, and years. Other criteria used were differentiation among classes in terms of mean class yields and within-class CVs.
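The RVj and RVc statistics can be computed from a class map and the annual yield maps as sketched below. The sketch assumes the usual one-way ANOVA estimators (within-class mean square with N - k degrees of freedom and total variance with N - 1); the text does not state the exact estimators, and the function names are hypothetical.

```python
def rv_year(yields, classes):
    """Complement of the relative variance for one year (RVj)."""
    n = len(yields)
    grand_mean = sum(yields) / n
    groups = {}
    for y, c in zip(yields, classes):
        groups.setdefault(c, []).append(y)
    k = len(groups)
    ss_total = sum((y - grand_mean) ** 2 for y in yields)
    ss_within = 0.0
    for members in groups.values():
        class_mean = sum(members) / len(members)
        ss_within += sum((y - class_mean) ** 2 for y in members)
    s2_total = ss_total / (n - 1)
    s2_within = ss_within / (n - k)
    return 1.0 - s2_within / s2_total


def rv_across_years(yields_by_year, classes):
    """Return (RVc, range of RVj) for one class map and several years."""
    values = [rv_year(y, classes) for y in yields_by_year]
    return sum(values) / len(values), max(values) - min(values)


if __name__ == "__main__":
    class_map = [0, 0, 1, 1]
    years = [[1.0, 1.0, 2.0, 2.0], [1.0, 2.0, 3.0, 4.0]]
    print(rv_across_years(years, class_map))
```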

The spatial agreement between two different maps of yield classes was evaluated using the weighted Kappa index of agreement for categorical data (Kw), which was defined by Cohen (1968) as

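$$ K_w = \frac{\sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,p_{ij} - \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,p_{i\cdot}\,p_{\cdot j}}{1 - \sum_{i=1}^{k}\sum_{j=1}^{k} w_{ij}\,p_{i\cdot}\,p_{\cdot j}} \tag{5} $$
where $p_{ij}$ represents the proportion of observations that have been classified as belonging to class i by the first classification method and to class j by the second classification method, $p_{i\cdot}$ and $p_{\cdot j}$ are the corresponding row and column marginal proportions, and $w_{ij}$ (i = 1, 2, ..., k; j = 1, 2, ..., k) is the Fleiss–Cohen weight. The $w_{ij}$ was defined by Fleiss and Cohen (1973) as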

$$ w_{ij} = 1 - \frac{(C_i - C_j)^2}{(C_c - C_1)^2} \tag{6} $$
where $C_i$, $C_j$, $C_c$, and $C_1$ are the scores of class i, j, c, and 1, respectively. The $w_{ij}$ is restricted to 0 <= $w_{ij}$ < 1 for i ≠ j, with $w_{ii}$ = 1 and $w_{ij}$ = $w_{ji}$. The Kw ranges from 0 to 1, with 1 as perfect map agreement. Kappa statistics were computed for all possible map comparisons of clustering methods, with six yield classes in each method.
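A weighted Kappa with Fleiss–Cohen weights can be computed from two class maps as sketched below. The sketch converts cell counts to proportions and uses the class labels themselves as scores, ordered from lowest to highest; both choices, and the function name, are assumptions. It also assumes that the two maps are not both assigned entirely to a single class, which would make the denominator zero.

```python
def weighted_kappa(map_a, map_b, scores):
    """Weighted Kappa between two class maps given ordered class scores."""
    k = len(scores)
    index = {s: i for i, s in enumerate(scores)}
    n = len(map_a)
    # Joint proportions p[i][j] of cells in class i (map A) and j (map B).
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(map_a, map_b):
        p[index[a]][index[b]] += 1.0 / n
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    # Fleiss-Cohen weights based on squared score differences.
    span = (scores[-1] - scores[0]) ** 2
    w = [
        [1.0 - (scores[i] - scores[j]) ** 2 / span for j in range(k)]
        for i in range(k)
    ]
    observed = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    a = [1, 1, 2, 2, 3, 3]
    b = [1, 2, 2, 2, 3, 3]
    print(weighted_kappa(a, a, [1, 2, 3]))
    print(weighted_kappa(a, b, [1, 2, 3]))
```

Because the weights depend on squared score differences, a disagreement between adjacent yield classes is penalized much less than a disagreement between the lowest and highest class.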


    RESULTS
Spatial and Temporal Yield Variability
Average maize yields at Site A ranged from 10.8 to 13.3 Mg ha⁻¹ (Table 2). Maximum yields in each year were 16.1 to 20.0 Mg ha⁻¹, which is equivalent to the simulated climatic yield potential for this environment (Dobermann et al., 2003). Relative spatial yield variability in each year was modest, with CVs ranging from 12 to 18% for maize and 20 to 23% for soybean. Fractal dimensions varied from 1.81 to 1.90, indicating that much of the yield variation occurred over shorter distances. Yields were generally higher and less variable in the center-pivot–irrigated area compared with partially or nonirrigated pivot corners (Fig. 1). About 70% of the field had a CV of relative yield across years of 10% or less. Areas with high temporal yield variability (CV > 30%) accounted for only 2.3% of the entire field. Areas with low average yield and large temporal yield variability were found in (i) narrow headlands on the eastern and western side (machine traffic), (ii) the partially gravity-irrigated northeast and southeast corners, and (iii) the nonirrigated southwest corner.


Table 2. Summary statistics of spatial yield variability at Sites A (Clay Center, NE) and B (Cairo, NE) before and after standardization and interpolation.

 


Fig. 1. Maps of mean, standard deviation (SD), and coefficient of variation (CV) of relative yield at Sites A (Clay Center) and B (Cairo). Light colors show high-yielding areas; dark colors show low-yielding areas with high yield variability among years.

 
Average maize yields at Site B ranged from 10.3 to 13.2 Mg ha⁻¹, and maximum yields in each year were 16.3 to 20.1 Mg ha⁻¹ (Table 2). Spatial yield variability was modest, with CVs ranging from 14 to 19%, but compared with Site A, the overall yield range at Site B was larger. Fractal dimensions varied from 1.73 to 1.81, indicating that yield variation was somewhat more spatially structured and occurred over longer distances than at Site A, mostly related to changes in elevation across the field. High-yielding areas were located in the northwest and central-eastern parts of the field (Fig. 1). Low-yielding areas occurred in (i) narrow headlands on the eastern and western side due to machine traffic; (ii) the eroded, sloping ridge crossing the field from southwest to northeast; (iii) a poorly drained area in the northeast corner; and (iv) the southeast corner where subsoil clay excavated from adjacent roadwork was disposed. The lowest-yielding areas with relative yield ranges of 0.08 to 0.45 accounted for just 1% of the entire field and were located along the eastern edge and the northeast corner. About 79% of the field had a CV of 10% or less across years. Areas with a temporal yield CV of greater than 30% accounted for 2.3% of the entire field.

Even after elaborate cleaning of the yield monitor raw data, frequency distributions of yield remained skewed to the left at both sites due to significant proportions of low-yielding areas. Similar observations were made in other studies (Stafford et al., 1996; Taylor et al., 2001). Medians were slightly larger than the means (Table 2). Data standardization and interpolation slightly reduced the CVs for spatial variability in each year but tended to increase Dv (Table 2). Interpretation of Dv as a measure of spatial yield variability is problematic because it mainly represents the rate of change of variation with area (Pringle et al., 2003). The relatively narrow ranges of Dv found in our study (1.81–1.90 for the interpolated data) and those reported by Pringle et al. (2003) for 20 different crop fields in Australia (1.73–1.99) suggest that Dv may not be sensitive enough to describe relatively small differences in magnitudes of variability with regard to SSCM opportunities.

At both sites, linear correlation coefficients of yields measured in different years ranged from 0.38 to 0.74 but were mostly greater than 0.60 (Table 3). In general, the strong yield correlations suggest that irrigation reduced spatial as well as temporal yield variability among years, resulting in relatively stable yield patterns over time.


Table 3. Linear correlation coefficients between yields in different years at each site. Correlations were calculated for standardized and interpolated yield maps prepared for each year.

 
Yield Variability Assessed through Empirical Classification Methods
Empirical classification methods performed worse than cluster analysis techniques in terms of RVc (Table 4). On average, empirical methods accounted for 36% of the yield variability at Site A and 32% at Site B, but differences occurred among the four empirical methods.


Table 4. Average yield variance accounted for by the classification (RVc) as affected by different classification methods, data sources used, number of yield classes, and years.

 
Blackmore's method (MCV-3) had an RVc of 0.45 at Site A (range of RVj from 0.30 to 0.60 in individual years) and 0.54 at Site B (range of RVj from 0.42 to 0.69). It resulted in three yield classes with distinctively different means (Table 5) but little differentiation of the resulting yield class map (Fig. 2). At both sites, 2.3% of the field area was classified as unstable (CV > 30%), mostly along the eastern and western headlands. Between 23 and 30% fell into the low and stable class, which mainly represented pivot corners at Site A and eroded soils and not fully irrigated parts of Site B (Fig. 2). At Site A, about 75% of the field was classified as high and stable, and this class covered almost the entire pivot-irrigated circle (Fig. 2) even though average yield varied within that area (Fig. 1). Similarly, at Site B, the same class accounted for 68% of the field.


Table 5. Mean relative yield, SD, CV, and the proportional area of yield classes delineated from mean yields.

 


Fig. 2. Maps of three crop yield classes formed by empirical classification procedures: (i) based on mean and CV of yield (MCV-3, Blackmore, 2000), (ii) based on mean and standard deviation of yield (MSD-3), (iii) using t test at 90% probability based on mean and standard deviation of yield difference (T90-3), and (iv) using t test at 60% probability based on mean and standard deviation of yield difference (T60-3). Light colors show high-yielding areas; dark colors show low-yielding areas with high yield variability among years.

 
Methods MSD-3 and MSD-4 performed worst, with an RVc across all years of 0.16 (MSD-3) or 0.20 to 0.23 (MSD-4). The MSD-3 method allocated more than 50% of each field to the high and stable yield class, and MYs were the same for the low-yielding and stable and unstable classes (Table 5). The MSD-4 method resulted in a more even spread of yield classes in terms of MYs and area covered (Table 5), but the resulting yield class maps contained much noise and potential misclassifications (Fig. 2). In MSD-4, for example, 30 to 33% of the field was classified as unstable (Table 5), which included some areas with high average yield that also had somewhat higher SD of yield across years (compare Fig. 1 and 2).

The t-test–based procedures accounted for 48 to 50% of yield variability at Site A and 35 to 37% at Site B (Table 5), indicating that results obtained with this method depend on the type of spatial yield variability at a particular site. Moreover, yield class allocation depended on the choice of an acceptable probability for the t test. Using the relatively strict criterion of 90% probability, almost 53% of the field area at both sites was classified as variable (Table 5 and Fig. 2). Relaxing the probability criterion to 60% resulted in very different yield class maps, in which most of the area (54–61%) was classified as high, resembling the maps obtained with the MCV-3 method (Fig. 2).

Yield Variability Assessed through Cluster Analysis
If used with the optimal choice of input data and number of classes, yield classes established by cluster analysis techniques accounted for more than 60% of the yield variability at Site A and more than 65% at Site B (Fig. 3). The RVc of all methods was below 0.50 in only 2 out of 11 crop field data sets (Table 4, 1996 and 1998 at Site A). Based on the average RVc of all 18 possible combinations of data sources and number of yield classes, the ranking of cluster analysis methods was Ward's method (WAR) = ISODATA classification = k-means cluster analysis (KME) > fuzzy-k-means clustering at both sites (Table 4). On average, across all methods and years, RVc was similar for all three data sources at Site A but decreased in the order MY > MS > AY at the more variable Site B (Table 4). Although Table 4 provides an overall ANOVA summary of method performance, more differentiation is required to understand how individual clustering methods were affected by the choice of input data and the number of classes.



Fig. 3. Average yield variance accounted for by the classification (RVc) as a function of data sources used and the number of classes selected: (i) hierarchical cluster analysis using Ward's method (WAR), (ii) nonhierarchical cluster analysis using k means (KME), (iii) unsupervised nonhierarchical ISODATA clustering method (ISO), and (iv) nonhierarchical fuzzy-k-means cluster analysis (FUZ).

 
If univariate classification was performed on MY data, the choice of a particular clustering method had little effect on how much yield variability was accounted for. The RVc for methods WAR, KME, ISODATA classification, and fuzzy-k-means clustering in combination with MY data and three to eight classes ranged from 0.59 to 0.61 at Site A and 0.63 to 0.66 at Site B, and differences among methods were not statistically significant. Similar observations were made for the WAR, KME, and ISODATA classification methods applied to MS or AY data. However, fuzzy-k-means clustering had significantly lower RVc than the other methods if used with the MS or AY data (Fig. 3). For example, using MS data, RVc of three to eight classes was 0.58 to 0.60 for methods WAR, KME, and ISODATA classification at Site A compared with 0.48 for fuzzy-k-means clustering. Using AY data, the RVc of three to eight classes was 0.59 to 0.63 for methods WAR, KME, and ISODATA classification at Site A compared with 0.42 for fuzzy-k-means clustering. Similar results were found at Site B.
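The univariate case described above can be sketched in a few lines. The example below assumes that relative yield is the cell yield divided by the field mean of the same year, and it uses a plain Lloyd-type k-means on the single variable MY with a deterministic starting configuration. Both the normalization and the initialization are assumptions made for illustration; they are not necessarily those of the software used in this study, and the function names are hypothetical.

```python
from statistics import fmean


def mean_relative_yield(yields_by_year):
    """Per-cell average of yield divided by that year's field mean (assumed MY)."""
    relative = [[y / fmean(year) for y in year] for year in yields_by_year]
    return [fmean(cell) for cell in zip(*relative)]


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's k-means on one variable; returns (labels, centers).

    Initial centers are taken at evenly spaced positions of the sorted data,
    an arbitrary choice made here so that the result is reproducible.
    """
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each cell goes to the nearest class center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center becomes the mean of its member cells.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(fmean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical mean relative yields for seven grid cells.
my = [0.40, 0.42, 0.98, 1.00, 1.02, 1.30, 1.32]
labels, centers = kmeans_1d(my, k=3)
```

With one input variable the classes are simply intervals of MY, which is consistent with the observation that the choice of clustering method mattered little for this data source.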

Hierarchical cluster analysis using Ward's method and nonhierarchical ISODATA classification were the only methods for which RVc increased using AY data compared with MY or MS, particularly at six or more yield classes (Fig. 3). Maximum RVc achieved with both methods was 0.68 to 0.69 at Site A and 0.72 at Site B at eight yield classes. Little sensitivity of RVc to the choice of data source was shown by k-means clustering, but the fuzzy-k-means method was very sensitive to both choice of input data and the number of classes selected (Fig. 3). In fuzzy-k-means clustering, RVc increased only slightly with increasing number of classes and use of MY data but rose steeply with AY data use. Using AY and fewer than six (Site A) or seven (Site B) classes, RVc values were less than 0.45 at both sites. Even at six to eight classes, RVc achieved with MS or AY data remained lower than that achieved with MY data in the fuzzy-k-means clustering method.

The RVc values increased with the number of classes (Fig. 3). At Site A, increasing the number of yield classes beyond six did not significantly increase RVc, whereas at Site B, six to seven yield classes resulted in the highest RVc. On average, across all method combinations tested, six yield classes accounted for 61 to 63% of the yield variability (Table 4).

Methods also differed in their ability to account for yield variation in each individual year, as expressed by the range of RVj at each site among the 5 to 6 yr of yield map data analyzed. Irrespective of the classification method or number of classes, ranges of RVj among years were smaller at Site B than at Site A (Fig. 4). This may reflect yield variability that is more spatially structured and temporally consistent at Site B, which, unlike Site A, had no nonirrigated areas. Except for the fuzzy-k-means clustering–AY method, RVj ranges were not affected by the number of classes in the range of five to seven yield classes (Fig. 4). For some methods (e.g., WAR–AY, ISODATA classification–AY, fuzzy-k-means clustering–MS, and fuzzy-k-means clustering–AY), the minimum RVj in a particular year increased with slightly increasing mean RVj (=RVc) and number of classes, but the differences were mostly small.



Fig. 4. Ranges (min.–max. bars) and means (circles) of yield variance accounted for by the classification at Sites A and B. The width of the bars indicates how well a particular classification method performed in accounting for the yield variation in each of the five (Site A) or six (Site B) different years. Methods shown: hierarchical cluster analysis using Ward's method (WAR), nonhierarchical cluster analysis using k means (KME), unsupervised nonhierarchical Iterative Self-Organizing Data Analysis (ISODATA) clustering method (ISO), and nonhierarchical fuzzy-k-means cluster analysis (FUZ). For each method, values are shown for three different data sources [mean yield (MY), mean and standard deviation of yield (MS), and yields in all years (AY)] and five to seven classes.

 
For most methods, six yield classes provided an acceptable compromise in terms of high RVc, narrow RVj range, and practical class interpretation. Maps of six yield classes for all clustering methods and data sources are shown in Fig. 5 (Site A) and Fig. 6 (Site B). Table 5 shows the descriptive yield class statistics for MY data and six yield classes. All clustering techniques shown in Table 5 resulted in high RVc (0.61–0.69) and yield classes with significantly different mean relative yields, but the ranges of class means and class proportions of the total field area differed among methods. At both sites, the ISODATA method produced the most even area distribution among the yield classes, in which the largest class accounted for 32 to 37% of the field area (Fig. 5 and 6). In contrast, in most other methods, the largest single yield class accounted for more than 40% of the field area, up to 68% for KME–MY-6 at Site A. Within-class CVs of yield ranged from about 1 to 10% for the four highest-yielding classes in each method. Due to lower means and sometimes also higher SD, CVs were in the 6 to 45% range for the two lowest-yielding classes, which occupied from 1 to 18% of the whole field, mostly along headlands and in nonirrigated parts of the fields.



Fig. 5. Maps of yield classes at Site A (Clay Center) formed by hierarchical and nonhierarchical clustering procedures as affected by the choice of input data: (i) hierarchical cluster analysis using Ward's method and six classes (WAR-6), (ii) nonhierarchical cluster analysis using k means and six classes (KME-6), (iii) unsupervised nonhierarchical Iterative Self-Organizing Data Analysis (ISODATA) clustering method using six classes (ISO-6), and (iv) nonhierarchical fuzzy-k-means cluster analysis using six classes (FUZ-6). For each method, maps are shown for three different data sources [mean yield (MY), mean and standard deviation of yield (MS), and yields in all years (AY)]. Light colors show high-yielding areas; dark colors show low-yielding areas with high yield variability among years.

 


Fig. 6. Maps of yield classes at Site B (Cairo) formed by hierarchical and nonhierarchical clustering procedures as affected by the choice of input data: (i) hierarchical cluster analysis using Ward's method and six classes (WAR-6), (ii) nonhierarchical cluster analysis using k means and six classes (KME-6), (iii) unsupervised nonhierarchical Iterative Self-Organizing Data Analysis (ISODATA) clustering method using six classes (ISO-6), and (iv) nonhierarchical fuzzy-k-means cluster analysis using six classes (FUZ-6). For each method, maps are shown for three different data sources [mean yield (MY), mean and standard deviation of yield (MS), and yields in all years (AY)]. Light colors show high-yielding areas; dark colors show low-yielding areas with high yield variability among years.

 
Spatial Agreement among Yield Classes
A high average RVc and small range among years do not mean that a yield class map is useful for SSCM because the map may be too fragmented or affected by artifacts in the yield data. At issue is (i) whether the yield classes mapped were spatially consistent across different method choices and (ii) whether the classes formed represented spatially contiguous areas that would be large enough for discrete management decisions. Except for the KME method, spatial fragmentation generally increased in the order MY < MS < AY as data source. Visually, this can be seen as greater scattering in many maps as the data source is changed from MY to MS or AY (Fig. 5 and 6).

The maps of yield classes (Fig. 5 and 6) showed relatively small differences between using MY and MS data for the classification. Kappa coefficients describing the spatial agreement among the maps of yield classes ranged from 0.56 to 0.94 when the WAR, KME, ISODATA classification, and fuzzy-k-means clustering methods were used in combination with either MY or MS data (Table 6). However, for all methods, map agreement was generally poorer between MY and AY or MS and AY as data sources because maps produced with AY data tended to show artifacts that were related to yield monitor data in a single year. For example, at Site A, the yield map of 1996 showed a rectangular high-yielding zone in the southeastern quarter of the field, mainly due to high yield mapped there in 1996. This area also stands out as a distinct high-yield class unit in WAR–AY-6, ISODATA classification–AY-6, or fuzzy-k-means clustering–AY-6 but not to the same extent when MY data were used for the classification (Fig. 5).


Table 6. Weighted Kappa coefficients (Kw) describing the spatial agreement among categorical maps of six yield classes at Sites A and B generated by hierarchical [Ward's method (WAR)] and nonhierarchical [ISODATA (ISO), k means (KME), and fuzzy k means (FUZ)] clustering techniques. For each clustering method, the upper part of the table shows Kw among the three different data sources used [mean yield (MY), mean yield and standard deviation (MS), and all years (AY)]. For each data source, the lower part of the table shows the Kw among different clustering methods. All Kw values were significant at p < 0.0001.

 
Map agreement among classification methods was best when the classification was based on MY data only. Kappa coefficients ranged from 0.69 to 0.98 when MY data were used with the WAR, KME, or fuzzy-k-means clustering methods. However, using ISODATA yield classification resulted in maps that agreed less with those produced by any of the other methods (Kw = 0.45–0.72) because this method resulted in a more even proportional spread of yield classes (Table 5).
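For readers who wish to reproduce this kind of map comparison, the sketch below computes Cohen's weighted kappa for two class maps defined on the same grid. It assumes linear disagreement weights and classes numbered from lowest to highest yield; the weighting scheme behind the Kw values in Table 6 is described in Material and Methods and may differ. The function name is hypothetical.

```python
def weighted_kappa(map_a, map_b, n_classes):
    """Cohen's weighted kappa with linear disagreement weights (assumed scheme).

    map_a, map_b: integer class labels (0 .. n_classes - 1), one per grid cell,
    ordered from the lowest- to the highest-yielding class.
    """
    n = len(map_a)
    # Observed joint proportions of class pairs across all grid cells.
    joint = [[0.0] * n_classes for _ in range(n_classes)]
    for a, b in zip(map_a, map_b):
        joint[a][b] += 1.0 / n
    row = [sum(joint[i]) for i in range(n_classes)]
    col = [sum(joint[i][j] for i in range(n_classes)) for j in range(n_classes)]

    span = max(n_classes - 1, 1)
    observed = 0.0   # weighted observed disagreement
    expected = 0.0   # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            weight = abs(i - j) / span
            observed += weight * joint[i][j]
            expected += weight * row[i] * col[j]
    if expected == 0:
        return 1.0  # both maps place every cell in the same single class
    return 1.0 - observed / expected


# Hypothetical flattened class maps for the same six grid cells.
kw_identical = weighted_kappa([0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2], 3)
kw_unrelated = weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 2)
```

Because the weights grow with the distance between class ranks, two maps that differ by one adjacent class are penalized less than maps that confuse the lowest with the highest class.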


    DISCUSSION
Yield Classification Methods
None of the empirical yield classification methods resulted in RVc values that were close to those obtained with most cluster analysis techniques, and the results were sensitive to subjective decisions that also affected the spatial fragmentation of yield classes. The advantage of empirical classification methods is their simplicity and the ability to establish criteria based on expert knowledge. However, such results are not necessarily useful with regard to the yield variability accounted for if the number of classes differentiated is too small. Of the empirical methods tested, Blackmore's method based on cell MYs and cell CVs performed best in terms of RVc and small spatial fragmentation. However, due to the limited number of yield classes defined, this method may not result in more detailed differentiation of the higher-yielding, most profitable areas in a field. It may be suitable for varying inputs according to two or three abruptly changing yield categories, but it provides less opportunity for more continuous variable-rate management based on yield performance, particularly in the better parts of a field. Fine-tuning of the class criteria or increasing the number of yield classes similar to that used in cluster analysis could improve empirical methods.
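An empirical rule of the kind discussed here, based on the mean and CV of relative yield in each cell, might look as follows. The thresholds and class names are illustrative defaults chosen for this sketch, not the criteria applied in this study, and the function name is hypothetical.

```python
from statistics import fmean, pstdev


def classify_mean_cv(cell_yields, mean_threshold=1.0, cv_threshold=0.3):
    """Assign one grid cell to a class from its relative yields across years.

    mean_threshold and cv_threshold are illustrative values only.
    """
    mean_yield = fmean(cell_yields)
    # Temporal coefficient of variation of the cell (population SD / mean).
    cv = pstdev(cell_yields) / mean_yield if mean_yield > 0 else float("inf")
    if cv >= cv_threshold:
        return "unstable"
    if mean_yield >= mean_threshold:
        return "high and stable"
    return "low and stable"


# Hypothetical relative yields of three cells over three years.
examples = {
    "cell_1": classify_mean_cv([1.10, 1.20, 1.15]),
    "cell_2": classify_mean_cv([0.80, 0.85, 0.90]),
    "cell_3": classify_mean_cv([0.20, 1.40, 0.90]),
}
```

The sketch makes the limitation noted above visible: every cell with a mean at or above the threshold and a low CV falls into one class, so differences among the better parts of a field are not resolved unless further thresholds are added.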

Cluster analysis procedures such as WAR, KME, ISODATA classification, or fuzzy-k-means clustering produced more consistent results in terms of high RVc and more aggregated yield class patterns if used as univariate classification of mean relative cell yields or, to a lesser extent, based on bivariate classification of mean and SD of cell yields. If used with individual year yield data, techniques such as WAR and fuzzy-k-means clustering were more sensitive to individual years or extremes in the input data than KME or ISODATA classification, which was reflected in low RVc (fuzzy-k-means clustering at both sites), artifacts in the yield class maps (WAR and fuzzy-k-means clustering at Site A), or high spatial fragmentation (fuzzy-k-means clustering at Sites A and B; Fig. 5 and 6). Although the ISODATA method performed well in terms of a high RVc, the resulting maps were most fragmented, and this method required image-processing software. Using multivariate cluster analysis on multiple-year data (AY) bears the risk that the resulting yield classes may be affected by unusual events occurring in individual years or errors associated with the yield-mapping procedure. Therefore, such methods must be used with care and, preferably, only with larger time series of yield maps (>5 yr) in which individual years exert less weight on the classification result. At both sites, WAR, KME, and fuzzy-k-means clustering in combination with MY data gave similar results and can be recommended for further use.

Six yield classes provided an acceptable solution in terms of high RVc and are probably sufficient for practical purposes of delineating larger yield goal zones within a field. Of those, four classes had gradual differences in MY but generally low within-class yield variability. Within such discrete yield goal zones, management inputs could be varied more continuously based on soil variation. In addition, each field had two low-yielding classes, which represented marginal field areas that also had the largest spatial–temporal yield variability. Such zones must be managed differently from the core irrigated field area.

Similar to empirical yield classification, the application of cluster analysis techniques also involves a number of empirical choices such as classification method, similarity or distance measure, number of classes, or fuzzy exponent (in fuzzy-k-means clustering only). The influence of such choices on yield classification requires further study. Compared with empirical or standard clustering techniques, continuous (fuzzy) classification offers the additional advantage that uncertainties about class membership and the class boundaries can be mapped based on the fuzzy membership values in different classes (Burrough et al., 1997).

The clustering procedures compared here focused on maximizing the variance between classes and minimizing the variance within classes without constraints to form larger, uniform patches for management. Consequently, the resulting yield class maps showed much spatial fragmentation. At issue is whether other methods such as spatially constrained multivariate classification (Oliver and Webster, 1989) or the application of postclassification spatial-filtering techniques may further improve the results in terms of generating spatially aggregated, finite management elements without significant decrease in RVc. More research should also be conducted to study the effect of different interpolation grid size on patterns of yield classes.
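One simple form of postclassification spatial filtering is a majority (mode) filter, sketched below for a class map stored as a list of rows. This is only one possible filter, assumed here for illustration; it was not evaluated in this study, and its effect on RVc would have to be checked by recomputing RVc on the filtered map. The function name is hypothetical.

```python
from collections import Counter


def majority_filter(grid):
    """Return a copy of `grid` smoothed with a 3 x 3 majority filter.

    A cell is changed only when a single class is strictly the most frequent
    in its neighbourhood; edge cells use the neighbours that exist.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    smoothed = [row[:] for row in grid]
    for r in range(n_rows):
        for c in range(n_cols):
            window = [grid[rr][cc]
                      for rr in range(max(0, r - 1), min(n_rows, r + 2))
                      for cc in range(max(0, c - 1), min(n_cols, c + 2))]
            counts = Counter(window)
            top = max(counts.values())
            winners = [label for label, count in counts.items() if count == top]
            if len(winners) == 1:
                smoothed[r][c] = winners[0]
    return smoothed


# Hypothetical class map with one isolated cell of class 2.
class_map = [[1, 1, 1],
             [1, 2, 1],
             [1, 1, 1]]
filtered = majority_filter(class_map)
```

Isolated cells are absorbed by the surrounding class, which reduces fragmentation at the cost of some within-class variance.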

Yield Classification in Irrigated and Rainfed Systems
The consistency of yield patterns for a given field depends on the particular site characteristics and crop management measures such as irrigation, which can reduce the interannual yield variation. In rainfed agriculture, crop yield variability from year to year is often large and driven by soil moisture, often in relation to topography and soil texture (Timlin et al., 1998). Stafford et al. (1996) reported low consistency of normalized winter barley (Hordeum vulgare L.) yield data for four successive years. Using 6 yr of yield data for a maize–soybean rotation field, Jaynes and Colvin (1997) concluded that at least 10 yr of yield data would be required to characterize spatio-temporal yield patterns. Lamb et al. (1997) found poor spatial consistency of rainfed maize yields over a 5-yr period and concluded that grain yield maps may be of little use for site-specific fertilizer recommendations or may require databases longer than the normally recommended 5 yr. Our study was conducted on irrigated land with relatively high yield stability. In the two fields studied, irrigation caused distinct yield differences compared with nonirrigated areas, but the temporal yield variability within the irrigated area was relatively small.

The value of mapping yield goals as a basis for varying production inputs such as N fertilizer has been questioned because many studies have shown little relationship between yields and economically optimum N rates or plant population densities (Doerge, 2002; Bullock et al., 1998). However, much of the previous research has been conducted under rainfed conditions and with varying only one input (e.g., N or plant density) at a time. Given the uncertainties about spatio-temporal yield variability due to climate and soil moisture, yield goal–based approaches to SSCM may have limited potential in rainfed environments. However, for high-yielding irrigated environments in which yields are less limited by water supply and approach potential ceilings, more quantitative approaches to plant nutrient management are required to fine-tune input use (Dobermann and Cassman, 2002). Recent studies on site-specific nutrient management in irrigated rice (Oryza sativa L.) systems of Asia have demonstrated the potential for such approaches (Dobermann et al., 2002b), but their use requires knowledge of yield potential and yield goals. Varying both plant density and N according to differences in the attainable yield potential is likely to be a key management option for exploiting the yield potential of irrigated maize (Dobermann et al., 2002a).

In summary, because crop yield response to production inputs is more predictable, mapping yield classes for spatially varying yield goals within a field is likely to be more important under irrigated conditions than in rainfed agriculture. More research is needed to better understand differences between rainfed and irrigated environments in the length of yield-mapping periods required for yield class mapping. Spatial variation in crop yield measured with a yield monitor is mainly a function of variation in climate, soil productivity, field management, and measurement error. If the latter is small and mostly random, and if climatic effects on yield are minimized due to irrigation, only a few years (about 5 yr) of yield map data may be required for a reliable yield classification. Under rainfed conditions, yield classification procedures may require a long time series of yield maps (perhaps at least 5–10 yr) to accurately predict expected yields and their probabilities. Therefore, crop modeling should complement yield classification and its interpretation for site-specific decision-making in such environments, provided that the available crop ecosystem models can accurately predict the yield potential and the interactions among major yield-determining factors.


    ACKNOWLEDGMENTS
 
We thank Lyle VonSpreckelsen (V6 Farms, Clay Center, NE) and Arnie Hinkson (Hinkson Land Tech, Wood River, NE) for providing the yield monitor data used in this study. Funding for this research was provided through the USDA-CSREES/NASA program on Application of Geospatial and Precision Technologies (AGPT, Grant no. 2001-52103-11303) and the U.S. Department of Energy: (i) EPSCoR program, Grant no. DE-FG-02-00ER45827, and (ii) Office of Science, Biological and Environmental Research Program (BER), Grant no. DE-FG03-00ER62996.


    NOTES
Contribution of the Nebraska Agric. Exp. Stn. Scientific J. Ser. Paper no. 14009.


    REFERENCES



