Agronomy Journal Journal of Natural Resources and Life Sciences Education
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


This Article
Right arrow Abstract Freely available
Right arrow Figures Only
Right arrow Full Text (PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via ISI Web of Science (23)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Jaynes, D. B.
Right arrow Articles by James, D. E.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Jaynes, D. B.
Right arrow Articles by James, D. E.
Agricola
Right arrow Articles by Jaynes, D. B.
Right arrow Articles by James, D. E.
Related Collections
Right arrow Maize
Right arrow Nutrient Management
Right arrow Production Agriculture
Right arrow Site-Specific Analysis
Agronomy Journal 95:574-586 (2003)
© 2003 American Society of Agronomy

CORN

Cluster Analysis of Spatiotemporal Corn Yield Patterns in an Iowa Field

D. B. Jaynes*, T. C. Kaspar, T. S. Colvin and D. E. James

USDA-ARS, National Soil Tilth Lab., 2150 Pammel Dr., Ames, IA 50011

* Corresponding author (jaynes{at}nstl.gov)

Received for publication December 13, 2001.

    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 NOTES
 RESULTS
 SUMMARY
 REFERENCES
 
Crop yields are frequently heterogeneous across space and time. We performed this study to determine if cluster analysis could be used to decipher the temporal and spatial patterns of corn (Zea mays L.) yield within a field. Nonhierarchal cluster analysis was applied to 6 yr of corn yield data collected for 224 yield plots on a regular grid on the southern half of a 32-ha field. We were able to group the yield observations into five temporal yield patterns or clusters. The clusters were not randomly distributed across the field but instead formed contiguous areas roughly equivalent to landscape positions. Cluster membership was determined primarily by yield differences in years with growing season precipitation greater than the 40-yr average. A multiple discriminant analysis was used to predict the spatial occurrence of the clusters from easily determined field attributes: soil electrical conductivity, elevation, slope, and plan and profile curvature. The multiple discriminant functions were unable to distinguish between the two clusters located on the lowest portions of the landscape. Because of similar temporal yield patterns in these two clusters, they were combined and the multiple discriminant analysis repeated for four clusters. Using a holdout sample approach, we achieved 76 and 80% success rates in classifying the yield plots into the correct yield clusters. If response curves for inputs such as N prove to be unique for the different yield clusters, then clustering of multiple-year yield data may prove an effective method for determining management zones within fields.

Abbreviations: AS, aspect • DEM, digital elevation model • DGPS, differential global positioning system • GIS, geographic information systems • GPS, global positioning system • EC, soil electrical conductivity • EL, elevation • PL, plan curvature • PR, profile curvature • SC, surface soil color • SL, slope • UTM, Universal Transverse Mercator


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 NOTES
 RESULTS
 SUMMARY
 REFERENCES
 
A MAJOR OBSTACLE to applying precision agriculture practices to optimize crop production is identifying management zones within a field. This is a major challenge because grain yields typically do not vary only across space, but the spatial patterns vary among years as well (Eghball and Varvel, 1997; Lamb et al., 1997). For example, Jaynes and Colvin (1998) reported that rank correlations among 3 yr of corn yield data from 224 small plots within a 16-ha field ranged from 0.54 to -0.09. For 3 yr of yield data for soybean [Glycine max (L.) Merr.] grown in rotation with the corn on the same plots, they reported rank correlations of 0.52 to 0.03. Similarly, Lamb et al. (1997) found that the pattern in corn grain yield from one year accounted for only 4 to 42% of the spatial pattern of corn grain yield in subsequent years. Much of this temporal variation in yield pattern is due to dynamic interactions among climate, landscape, soils, plants, and management practices.

Lack of temporal stability in yield patterns makes site-specific prediction of yield from previous yield observations or from intrinsic soil properties suspect (Eghball and Varvel, 1997). For example, simple statistical relationships can be developed between spatial yield patterns for a given year and soil properties such as texture, apparent electrical conductivity, organic matter content, or terrain attributes (Yang et al., 1998). However, the relationships are not the same among years (Jaynes et al., 1995b; Halvorson and Doll, 1991), unless the yield data is divided into subsets based on growing season climatic conditions or field attributes (Kaspar et al., 2003; Timlin et al., 1998). For example, Kaspar et al. (2003) were able to develop a multiple-regression equation based on terrain attributes that accounted for 78% of the yield variability for 6 yr of corn grown on a 16-ha field by restricting the analysis to the 4 yr having below-normal seasonal precipitation. When they included the 2 yr with above-normal growing season precipitation, only 26% of the yield variability could be accounted for by multiple regression.

Rather than trying to predict specific yields within a field, we may be more successful in identifying areas within a field that behave similarly among years. Most studies have approached this problem by first identifying areas of similar soil properties or terrain attributes and then testing for reduction in yield variability or yield response to inputs within these areas (Fraisse et al., 2001; Fleming et al., 2000). These methods rely on using general knowledge of crop production to identify the important yield-limiting properties to include in the analysis, and the results are often valid for individual years only. An alternative approach is to analyze long-term yield data to search for similar patterns of yield across years within a field (Lark and Stafford, 1997; Boydell and McBratney, 2002). For farming by management zones to be practical, one would like to identify a limited number of temporal yield patterns within a field, and the areas of the field exhibiting similar patterns should form spatially simple, contiguous areas.

Cluster analysis of multiyear yield data may be an effective tool for identifying temporal yield patterns. Cluster analysis consists of a three-part process. In the first step, a clustering algorithm is used to partition all members of a population into one of several clusters or groups such that the sum of squares for members within a cluster is minimized while the sum of squares between members of different clusters is maximized. After partitioning the data into clusters, the second step attempts to understand and characterize the nature of each cluster by examining population statistics of the members of each cluster and, in the case for management zones, the spatial distribution of the cluster members. This interpretation step can help us gain valuable insights as to the specific characteristics of the cluster members that determine cluster membership. The third step, termed profiling, is used to relate the occurrence of specific clusters to auxiliary data such as terrain or soil properties not used in the original cluster partitioning. Profiling can help determine why a yield cluster exists at a particular location in a field and is a powerful step in cluster analysis as profiling information can be used to predict the occurrence of yield clusters in areas of the field where yield data is lacking.

The auxiliary data used in the profiling step should include variables relevant to crop growth and yield. Preferably, the variables would be easy to measure or readily available so that the spatial distribution of yield clusters could be extrapolated to other locations. Remotely sensed variables are ideally suited for this purpose because they may already be available or can be obtained quickly and inexpensively. For example, georeferenced, color infrared digital aerial photography of bare soil for much of the USA is available from the U.S. Geological Survey and third-party venders. Alternatively, photographs of fallow fields can be taken with standard color film or slides and surface soil color (SC) easily georeferenced (Chen et al., 2000). Digital coverages of bare soil have proven to be useful in determining the areal distribution of soil properties such as soil organic carbon and may be effective in profiling areas belonging to a specific yield cluster (Varvel et al., 1999; Chen et al., 2000).

Terrain attributes have also been shown to be valuable in delineating soil properties important to crop production (Afyuni et al., 1993; Brubaker et al., 1993; Moore et al., 1993; Timlin et al., 1998; Kravchenko et al., 2000) and for differentiating areas of contrasting yield (Halvorson and Doll, 1991; Yang et al., 1998; Kaspar et al., 2003). A digital elevation model (DEM) for a field can be developed rapidly with centimeter accuracy using kinematic differential global positioning system (DGPS) equipment and a tractor or all-terrain vehicle (Clark and Lee, 1998). With a DEM, primary terrain attributes such as slope (SL), aspect (AS), plan curvature (PL), and profile curvature (PR) can be calculated with readily available software. Thus, terrain attributes may serve as practical auxiliary data in profiling cluster membership within a field.

When collecting elevation (EL) data with a roving vehicle, apparent soil electrical conductivity (EC) can be measured easily at the same time (Jaynes et al., 1993; Sudduth et al., 1999). Soil electrical conductivity has been shown to be highly correlated with soil properties such as clay content (Williams and Hoey, 1987), water content (Kachanoski et al., 1988), soil salinity (Rhoades and Corwin, 1981), soil organic matter content (Jaynes et al., 1995b), and cation exchange capacity (McBride et al., 1990)—all properties that can influence crop yield.

These field attributes appear well suited as auxiliary data for the profiling stage of cluster analysis as they are either readily available or easily measured and have been shown to correlate with yield or soil properties known to affect yield. Other field variables such as soil pH, topsoil depth, and texture also could be used but would require much more time and labor for sampling and laboratory analysis and thus would not be conducive to large-scale, low-cost applications. Ordinate and class variables such as soil type could be helpful in the profiling process, but we will limit this analysis to continuous variables so that multiple discriminant statistical methods can be used in the profiling process.

The objective for this study was to examine if the three steps of cluster analysis—partitioning, interpretation, and profiling—could provide useful interpretations of multiyear corn yield data. In the profiling step, auxiliary data were limited to EC, SC, EL, and the primary terrain attributes derivable from EL because these field attributes can be easily and intensively measured over large areas such as a farm in a relatively short time and would therefore be well suited for routine application.


    MATERIALS AND METHODS
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 NOTES
 RESULTS
 SUMMARY
 REFERENCES
 
The study was conducted on a 32-ha field in central Iowa composed of soils in the Clarion (fine-loamy, mixed, mesic Typic Hapludolls)–Nicollet (fine-loamy, mixed, mesic, Aquic Hapludolls)–Webster (fine-loamy, mixed, mesic, Typic Haplaquolls) association, which have formed in calcareous glacial till. The field was characterized by low-relief swell and swale topography representative of the Des Moines lobe of the Cary substage of glaciation. Detailed information on the soils can be found in Steinwand and Fenton (1995) and Steinwand et al. (1996). Field management was typical for Iowa and has been described in detail in Karlen and Colvin (1992) and Colvin et al. (1997). The field has been in a 2-yr rotation of corn and soybean since 1957. For the past 16 yr, tillage has been by chisel plow and field cultivator, and a 76-cm row spacing has been used. Fertilizer management consisted of at least 168, 38, and 93 kg ha-1 N, P, and K, respectively, applied in the fall after soybean since 1981. Historically, the field had been managed as north and south halves, with crops alternating on each half and rows oriented east to west with our intensive observations limited to the south half. Beginning in 2000, the farmer switched the field to a single soybean crop grown in rotation with corn and reoriented the rows to north to south. In response, the scale of our intensive observations has expanded to include the entire 32-ha field.

We measured corn and soybean yields on the south 16 ha of the field for 11 consecutive years starting in 1989. Only the yield data in years when corn was grown (1989, 1991, 1993, 1995, 1997, and 1999) were considered in this study. Details of the harvest method used and a further description of the yield data for 1989–1995 can be found in Colvin et al. (1997) and Jaynes and Colvin (1998). Grain yield was measured along eight east–west trending transects spaced 48.8 m apart with a combine modified with a weigh hopper mounted inside the combine grain storage tank (Colvin, 1990). Yields from the three-row-wide transects were harvested by driving the combine for a measured distance and then stopping to weigh the harvested grain and measure its moisture content. While the position and length of each line were consistent from year to year, the exact length of each harvest plot varied and averaged slightly over 12 m. Yield data were linearly interpolated to correspond to harvest plots of uniform 12.1-m length, resulting in 28 harvest plots per transect. Thus, 8 by 28, or 224, yield values were measured on a regular grid each year (Fig. 1a) . Corn yields were adjusted and reported at 155 g/kg water content.



View larger version (74K):
[in this window]
[in a new window]
 
Fig. 1. Distribution of (a) 224 yield plots, (b) elevation, (c) 0 to 255 gray-scale soil color, and (d) apparent electrical conductivity within a 32-ha field.

 
Rainfall during the May through August growing season was used as an indicator of plant available water (Kaspar et al., 2003). Daily weather data from an Iowa State University research farm located 7 km from the study area were used (R. Carlson, personal communication, 2000). The weather data from this location were used also to compute the 40-yr average precipitation for the growing season.

Field Attributes
Elevation and position measurements were made for the 32-ha field in fall of 2000 after soybean harvest with a kinematic DGPS receiver (Ashtech Z Surveyor, Magellan Corp., Santa Clara, CA1) mounted on an all-terrain vehicle. Readings were logged every 1 s as the vehicle moved across the field at approximately 4 m s-1, giving measurements about every 4 m. North-to-south transects were driven approximately 8.7 m apart across the field. This data was supplemented by collecting additional EL data by driving along ridges and swales to minimize interpolation errors in the subsequent terrain model. A base-station global positioning system (GPS) receiver, located at a bench mark on the eastern edge of the field, was used to differentially correct the roving GPS receiver. The ground control locations were referenced to a Universal Transverse Mercator (UTM) projection (Zone 15, North American Datum 1983). Elevation values were estimated in height above the ellipsoid (m). Position measurements were reliably within ±0.03 m horizontally and ±0.06 m vertically for this equipment.

The EL data produced a digital terrain model comprising >10 000 points, which was used to generate a DEM for the field. The data were used to estimate EL for an 8- by 8-m regular grid across the field using Surfer 7.0 (Golden Software, Golden, CO) gridding software and a linear, isotropic variogram model. The grid was then smoothed and a 2- by 2-m DEM created by invoking the cubic spline smoothing operation of Surfer 7.0. The coarser grid followed by smoothing was used to reduce the roughness in the DEM caused by small errors in the DGPS EL measurements. The primary terrain attributes—EL (m), SL (the rate of maximum change in EL to surrounding grid cells, in degrees), PL (curvature of the surface perpendicular to the direction of SL, m-1; PL < 0 for curvatures that are concave upwards), PR (curvature of the surface in the direction of the SL, m-1; PR < 0 for curvatures concave upwards), and AS (direction of SL, in degrees, measured from north minus 180°)—were then calculated for each 2-m grid cell of the DEM using the Arc/Info geographic information system (GIS) software CURVATURE command (Arc/Info 1998, Environ. Syst. Res. Inst., Redlands, CA).

Soil electrical conductivity was measured on the transects at the same time EL measurements were made using an EM-38 electromagnetic induction EC meter (Geonics, Mississauga, ON, Canada). The EM-38 was attached to a fiberglass trailer in the vertical dipole position and pulled across the field with an all-terrain vehicle (Jaynes et al., 1993, 1995a). The meter was set at a height of 8 cm above the ground. Soil electrical conductivity was measured and recorded with the position data giving >8500 spatially registered EC measurements. A 2-m grid cell coverage of EC for the field was created from the survey data using the Arc/Info GIS TOPOGRID command and an isotropic, gaussian variogram model.

A color photograph of the field was taken from an airplane at an altitude of 1500 m on 16 May 2001 using ASA 100 film. The bare-soil photograph was taken after planting when most of the surface residue was no longer visible because of decomposition and incorporation. A 0 to 255 gray scale digital image of the photograph was created using a digitizer (ScanMaker III, Microtek, Redondo Beach, CA), with a resulting resolution of 0.029 m2 per pixel. The digital image was georeferenced using the Georeference tool in TNTmips (MicroImages, Lincoln, NE) and control targets of known location set around the perimeter of the field. The resulting georeferenced image was then rectified to the UTM projection using the Plane Projective Model in TNTmips. The digital image was resampled to create a 2-m grid cell coverage of SC for the field.

The 224 yield plots were digitized as polygons and overlaid on the EC, SC, and terrain coverages. The area within each of the 224 yield plot polygons was converted to a raster format with a 0.0625-m2 resolution. The values of the field attributes for each 0.0625-m2 yield polygon raster were taken from the underlying 2-m grid cells of the appropriate coverage. The mean value for each attribute within each of the 224 yield plots was calculated by arithmetic averaging of the values for all 0.0625-m2 rasters that fell within the specific yield plot polygon.

Statistical Analysis
Univariate statistics for corn yield for each of the 6 yr were calculated for standardized yields. Yields were standardized in anticipation of subsequent studies combining corn yield data with yield data of other crops grown on this field and to remove the larger influence that crops with greater yield variances would have on cluster formation (Hair et al., 1987). A global median and interquartile range for corn yield were computed using data from all crop years. The median and interquartile statistics were used as robust estimates of population statistics rather than the mean and standard deviation because of outliers caused by yields equal to zero in some yield plots in years when standing water drowned out the corn plants. All yields were standardized by subtracting the global median from each yield plot value and dividing by the global interquartile range. Distributions of yield and field properties were tested for normality using the Kolmogorov–Smirnov statistic. Because many distributions failed the test for normality and were not readily transformable into normal distributions, Spearman rank correlation coefficients were calculated for all variables.

Cluster Analysis
The 6 yr of standardized corn yield data were used in a disjoint nonhierarchical cluster analysis using PROC FASTCLUS (SAS Inst., 2000). Cluster analysis is a data reduction procedure where the 224 multiyear yield patterns for the yield plots are reduced to a few manageable types. The clustering algorithm attempts to group the yield plots into clusters with high internal homogeneity and high external or between-group heterogeneity. The clustering algorithm groups each yield plot such that it belongs to one and only one cluster. A k-means model was used where the cluster centers are based on a least sum-of-squares estimation. Cluster analysis was used repeatedly to divide the yield plots into two, three, four, five, six, seven, and eight clusters to determine the optimum number of clusters to use. We considered eight clusters to be the maximum number of yield areas of practical use as management zones. The cluster algorithm was run for a maximum of 10 iterations in each case or until the change in all cluster means between iterations was 0. A pseudo-F statistic and the cubic clustering criterion (Milligan and Cooper, 1985) provided by the FASTCLUS procedure were used as indicators of optimum cluster number.

The spatial autocorrelation of the resulting clusters was determined using Moran's I statistic (Moran, 1950; Upton and Fingleton, 1985). Moran's I is similar in concept to correlation and ranges from -1, where members of different clusters would be evenly interspersed like the colored squares of a checkerboard, to 0, for a completely random distribution, to 1, where members of a cluster would be grouped closely together in space. Moran's I was calculated using the Excel 1997/2000 Visual Basic routine written by Sawada (1999).

Multiple Discriminant Analysis
To help interpret the characteristics of the resulting clusters, a canonical multiple discriminant analysis (CANDISC, SAS Inst., 2000) was used to summarize the influence of the different crop years in determining the yield clusters. Multiple discriminant analysis is appropriate to use when the dependent variable (clusters) is categorical and the independent variables are continuous (Hair et al., 1987). Given a set of clusters, multiple canonical discrimination can derive composites of the variables (linear combinations of the 6 yr of corn yield data in this case) that summarize the differences between clusters. A maximum of n - 1 canonical composites can be computed for n number of clusters. The canonical composites were used to examine the effect of individual years in determining clusters by examining the correlations or loadings between the canonical composites and the yearly yield data (Hair et al., 1987).

For the profiling stage of the cluster analysis, multiple discriminant analysis was repeated using the field attributes as independent continuous data and yield clusters as the categorical data. First, the yield data from the 224 plots were randomly divided into calibration and validation subsets such that each subset had similar numbers of members from each cluster. Multiple discriminant functions were developed for data in the calibration subset. The accuracy of the developed functions was tested by using the functions to predict cluster membership of the plots in the validation data subset and comparing predictions with known membership. Calibrating and validating the discriminating functions on two different subsets of the data eliminates the upward bias of testing the functions on the same data set for which they were calibrated (Hair et al., 1987).

After splitting the data, a stepwise multiple discriminant analysis (STEPDISC, SAS Inst., 2000) was performed on the calibration subset to reveal which field attributes contributed significantly toward classifying yield plots into clusters. Values for EC, SC, EL, SL, PR, PL, and AS averaged for each yield plot were included in the analysis. The stepwise discriminant process is analogous to stepwise regression. In the first step, the field attribute that contributed most to discriminating between the clusters was brought into the discriminant model. In the second step, the attribute that contributed most to the discriminating power of the model, that was not already in the model, was entered into the model if it exceeded a preset significance level for entry. Before each new attribute was entered into the model, the method tested the attributes already in the model. If the attribute that contributed the least to the model failed to satisfy a preset significance level for remaining in the model, it was removed. The stepwise process continues until all attributes in the model meet the criterion to stay and none of the remaining attributes meet the criterion to enter the model. A moderate significance level (P = 0.15) was used for attributes both entering and leaving the model because this level was midrange of the levels found to perform best in the forward selection methods (Costanza and Afifi, 1979).

Field attributes found to be significant in the stepwise discriminant process were used to develop linear functions that discriminated between clusters. The calibration subset was used to develop a set of functional criteria based on the field attributes that best discriminated between the clusters (DISCRIM, SAS Inst., 2000). These discriminant functions can be used to categorize any location in the field into the proper yield cluster based on the values of field attributes at that location. The accuracy of the functions was tested by predicting cluster membership of the yield plots in the validation subset and comparing with known membership. The calibration–validation process was then repeated by reversing the roles of the two data subsets (double cross validation), giving two estimates for the accuracy of multiple discriminant functions in predicting proper cluster membership.

After evaluating the accuracy of the multiple discriminant functions, the yield data subsets were combined and a final set of multiple discriminant functions computed using all 224 yield plots. Canonical multiple discriminant analysis was used to evaluate the correlations or loadings between the field attributes and the canonical composites that best summarized the differences between the clusters. These canonical composites were first rotated to redistribute the variance using varimax rotation (SAS Inst., 2000). Rotation essentially preserves the original structure and reliability of the discriminant model while making the interpretation of the loadings easier (Hair et al., 1987). The loadings were then stretched by multiplying them by their univariate F value. Stretching the loadings gave a better indication of the importance of the field attribute in determining cluster membership. The stretched loadings for each attribute were plotted as vectors starting from the origin. Cluster means were also rotated, stretched, and plotted with the attribute loading vectors, resulting in a simple graphical representation of the importance of each attribute in distinguishing between clusters.

In a final step, the multiple discriminant functions developed using all 224 yield plots were used to partition the entire 32-ha field into yield clusters. Every 2-m grid cell within the field was assigned a cluster membership based on the field attribute values within the cell. Yield cluster assignment was thus possible even for areas where yield measurements had not been taken.


    RESULTS
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 NOTES
 RESULTS
 SUMMARY
 REFERENCES
 
Average corn yields exceeded 9 Mg ha-1 every year except 1993 when record rainfall and a cool summer depressed yields throughout central Iowa (Table 1). On average, 1997 had the highest yield of 10.27 Mg ha-1; however, the maximum yield of 13.78 Mg ha-1 within a single yield plot occurred in 1991. Yields in 1989 and 1995 were the most uniform, as indicated by smaller standard deviations. Wet weather in 1993 and 1999 caused water to pond at the surface and drown corn plants within a closed depression or pothole in the field resulting in no yields from several of the yield plots. Wet weather in April of 1991 also caused waterlogged conditions within some plots and greatly reduced yields in other plots. However, while the average yield in 1993 was depressed compared with other years, average yields in 1991 and 1999 were comparable to the other years. Yield distributions in every year exhibited negative skewness, indicating a tailing in the distributions toward lower yields, and failed the Kolmogorov–Smirnov test for normality.


View this table:
[in this window]
[in a new window]
 
Table 1. Univariate statistics for corn grain yield within 224 yield plots, and growing season precipitation (May–August) by year.

 
Spearman rank correlation coefficients for yields indicated that correlations between years were never strong (Table 2). Considering only the years when growing season precipitation was less than the 40-yr average (1989, 1991, 1995, and 1997), correlations among years were between 0.42 and 0.54 and significant (P < 0.01). However, the magnitude of these correlations indicates that linear predictions of yields in one year from yields in another year would explain at most 29% of the variation in corn yield. Correlations between these dry-year yields and the yields in 1993 and 1999, when growing season precipitation exceeded the 40-yr average, were very low and often negative, indicating that yield patterns were much different between wet and dry years. Similar to the dry years, the correlation between yields in the wet years was fairly strong (r = 0.51), but yield patterns in 1993 still explained no more than 26% of the yield variation in 1999.


View this table:
[in this window]
[in a new window]
 
Table 2. Spearman rank correlation coefficients, r, between corn grain yield by year and the field attributes—electrical conductivity (EC), soil color (SC), slope (SL), profile curvature (PR), plan curvature (PL), elevation (EL), and aspect (AS)—averaged for 224 yield plots.

 
Elevation across the 32-ha field ranged over about 4.5 m (Fig. 1b). This degree of vertical relief was typical of the gently rolling landscape of the Des Moines lobe soils (Andrews and Dideriksen, 1981) and was also reflected by the SL never exceeding 5°. Prominent EL features within the field included a closed depression or pothole in the south-central portion of the field and hills in the center and southeastern edge of the field (Fig. 1b). Digitized gray-scale SC ranged from 42 to 196 across the field, with the darker SCs represented by smaller values (Fig. 1c). Darker soils occurred on lower areas of the field where water, sediment, organic matter, and clay typically accumulate. Lighter colors were especially prominent in isolated areas on the hilltops, perhaps due to erosion of topsoil. Soil electrical conductivity ranged from 9.9 to 67.9 mS m-1, which we have found to be typical for soils in the area (Fig. 1d). Soil electrical conductivity was higher in the lower portions of the landscape where more poorly drained soils predominate and lower near the hilltops where well-drained soils are common (Jaynes, 1996).

Average values for these field attributes and PR, PL, and AS were computed for the 224 yield plots in the south half of the field (Fig. 1a), and their univariate statistics are summarized in Table 3. The ranges for the field attributes within the yield plots were nearly as large as the ranges for the entire field, indicating that the yield plots in the southern half of the field captured most of the variation present within the 32-ha field. None of the field attributes passed the Kolmogorov–Smirnov test for a normal distribution.


View this table:
[in this window]
[in a new window]
 
Table 3. Univariate statistics for the field attributes—electrical conductivity (EC), soil color (SC), slope (SL), profile curvature (PR), plan curvature (PL), elevation (EL), and aspect (AS)—averaged for the 224 yield plots.

 
The field attributes SC, SL, and EL exhibited strong positive correlations with each other (Table 2). This was compatible with the assumption that clay and organic matter have accumulated in the nearly-level, lower ELs of the field. Soil electrical conductivity had a strong negative correlation with SC, SL, and EL, indicating that high-EC values were typically measured in areas with darker soil color (lower color value) and with lower SLs and ELs, which was a direct result of EC being determined primarily by the water and clay content in these soils. Profile curvature, PL, and AS did not exhibit strong correlations with any of the other attributes. Correlations between the field attributes and yearly yield primarily ranged between 0.3 and 0.6. All of the field properties except PR had positive and negative correlations with yield in at least one year. Correlations between PR and yield were negative every year, but PR never accounted for more than 12% of the yield variation in any year. This lack of temporal stability in correlation between yield and field properties illustrates the difficulty in using simple statistical relationships to explain spatial yield patterns across years.

Cluster Analysis and Interpretation
The disjoint clustering routine was used to divide the yield plot data into two, three, four, five, six, seven, and eight clusters. Local maximum values for the pseudo-F statistic occurred for three and five clusters. The cubic clustering criteria also had the highest value at five clusters (not shown). Dividing the yield data into more than five clusters tended to create a number of clusters with only a few members, which is not a desirable outcome for clustering. Thus, the 224 yield plots were each assigned to one of five clusters using the least-squares criterion. Figure 2 shows the spatial distribution of the yield plots assigned to each of the five clusters and illustrates that the clusters were not randomly distributed across the field, but rather the yield plots assigned to the same cluster appeared to group together in space. The Moran's I statistic for the cluster distribution was 0.61 (P < 0.001), confirming that the clusters tended to group together within the field. This spatial correspondence was not a direct result of the clustering algorithm but rather must reflect the spatial correlation of the underlying properties and processes controlling yield. Temporal yield patterns clustered into spatially homogenous areas is encouraging for the application of precision farming techniques.



View larger version (53K):
[in this window]
[in a new window]
 
Fig. 2. Elevation and spatial distribution of the 224 yield plots in the southern half of the 32-ha field assigned to the five yield clusters.

 
The clusters appeared to be associated with landscape position. The fewest number of yield plots, seven, were grouped into Cluster 2, and these occurred all within the pothole (Fig. 1b). In the wet years, these plots experienced ponding of surface water for extended periods, which drowned or damaged the crop and resulted in much reduced (or no) grain yield. But in dry years, these plots had some of the highest yields measured for the field. This is illustrated in Fig. 3 where the average yields for the yield plots in each cluster, transformed back into their original measurement units, are plotted by year.



View larger version (18K):
[in this window]
[in a new window]
 
Fig. 3. Average corn yield by year for each cluster. Years when growing season precipitation was less than the 40-yr average are identified as dry and those greater than the 40-yr average as wet.

 
Nine of the 224 plots were assigned to Cluster 4. Yields for this cluster were comparatively low in every year (Fig. 3). The plots in this cluster were located near the top of the two highest hills, in roughly what Schumacher et al. (1999) termed shoulder positions. Visually, the soil in this shoulder cluster appeared sandier, rockier, and lighter in color than the rest of the field, which may explain the lower yields.

Yield plots comprising Cluster 1 were similar to the Cluster 2 pothole plots, with very good yields in the dry years and lower yields in the wet years (Fig. 3). But the wet-year yields for plots in this cluster were not as low as for Cluster 2 plots. The plots in this cluster tended to occur on nearly-level areas at lower ELs with open surface drainage and approximated the toeslope positions of the landscape. This toeslope cluster contained 18 of the 224 yield plots.

Cluster 3 contained nearly half of the 224 yield plots. Cluster 3 plots tended to be on gentle SLs on lower-EL or footslope positions. These plots tended to be at slightly higher ELs than the Cluster 1 plots and were not stressed as much by excessive soil wetness in wet years but apparently had adequate water in dry years. Plots in this footslope cluster had good yields every year compared with plots in other clusters.

Finally, Cluster 5 comprised 70 yield plots that tended to be distributed on the higher-EL, linear backslope positions within the field. Average yields within this backslope cluster were the best in wet years compared with the other clusters but were lower than yields in Clusters 1, 2, and 3 in dry years.

The four canonical composites were computed from the yield data to summarize the differences between the five clusters. Most of the difference in clusters (93%) was explained by the first two canonical composites (Table 4). Yields from 1993 and 1999, the two wet years, had the highest loadings on the first canonical composite, which accounted for 55% of the cluster separation. Thus, corn yield response in wet years appeared to be the principal determinant of the temporal yield pattern within the field. The second canonical composite accounted for about 38% of the variation and was most heavily weighted by yields in the dry years of 1989, 1991, 1995, and 1997 (Table 4). Thus, there was a significant yield response difference among clusters in dry years as well. The third canonical composite accounted for about 6% of the variation and was positively loaded by yields in 1989 and 1995 and negatively loaded by 1991 yields, all dry years. The negative loadings from 1991 yields may be due in part to the very wet March and April of that year even though the total growing season precipitation was below average. Thus, the canonical composite reflects a refinement of the simplistic categorization of wet and dry years represented by the first two composites. The fourth canonical composite was not significant and contributed little to the cluster separation.


View this table:
[in this window]
[in a new window]
 
Table 4. Canonical composites of the 6 yr of corn yield data that best summarize the differences between the five yield clusters, the proportion of cluster difference accounted for by each composite, the pseudo-F statistic, probability that canonical composite is not correlated with clusters, and the correlation or loading of the yearly yield data on each composite.

 
Multiple Discriminant Analysis and Profiling
Results of the stepwise discriminant analysis on the first data subset indicated that EC, PL, SL, and EL contributed significantly (P <= 0.08) to summarizing the differences between the five clusters (Table 5). Soil electrical conductivity had the highest partial R2 and entered the model first. Soil electrical conductivity was followed in order by PL, SL, and EL, indicating the magnitude of the partial R2 of the attribute before entering the model. For the final model, PL had the largest partial R2 of all attributes in the model. The large reduction in R2 for EC between entering the model and the final model was due to the correlation between EC and other attributes in the model.


View this table:
[in this window]
[in a new window]
 
Table 5. Summary of stepwise discriminant analysis using a split data set of the yield plot data divided into five clusters. (No attributes were removed at P = 0.15 after entering analysis for either data set.)

 
For the second data subset, the same attributes plus PR were significant and entered the model in the same order as in the first subset, with PR entering last (P < 0.05). Surface soil color may not have been important in distinguishing the clusters in either data subset (P > 0.15) because of the strong correlations between SC and two of the attributes included in the models, EC and SL (Table 2). The results of the stepwise multiple discriminant analysis agree with the findings by Kaspar et al. (2003), who found that SL, EL, PR, and PL were significant attributes in a stepwise multiple regression predicting corn yields in this field for the dry years, and reflect the overall importance of terrain on crop growth and yield (Yang et al., 1998).

Multiple discriminant functions were calculated independently for Data Subsets 1 and 2 using the five field attributes identified as significant in either of the stepwise analyses of the two subsets—EC, PL, SL, EL, and PR. Testing the accuracy of the multiple discriminant functions calibrated with Subset 1 on Subset 2 showed that the functions could not accurately classify yield plots into either Cluster 1 or Cluster 2, the pothole and toeslope clusters (results not shown). Only 3 out of 18 yield plots were correctly assigned to Cluster 1 and one out of seven to Cluster 2. Classification of yield plots into the other clusters however, exceeded 80% accuracy. Clusters 1 and 2 were contiguous (Fig. 2) and differed primarily by the severity of yield depression in wet years (Fig. 3). None of the field attributes used accurately captured the enclosed drainage area common to the pothole clusters and were thus inadequate for distinguishing yield plots that would drown out and those that would not. Results for the multiple discriminant analysis when the roles of the subsets were reversed were similar, with poor classification results for Clusters 1 and 2.

Because the discriminating functions could not assign yield plots to Clusters 1 and 2 reliably and because these clusters had very similar temporal yield patterns, the two clusters were merged into a single toeslope cluster (Cluster 1&2) and the multiple discriminant analysis rerun. Repeating the stepwise discriminant analysis with the yield plots divided into four clusters instead of five resulted in the same field attributes (EC, PL, SL, EL, and PR) being significant, and they were used to develop new multiple discriminant functions.

Testing the discriminating functions developed for Data Subset 1 on Subset 2 showed an improvement in classifying yield plots into combined Cluster 1&2, but less than half of the yield plots in this cluster were identified correctly (Table 6). The multiple discriminant functions accurately classified yield plots that were in Cluster 3 in 82% of the cases. Eighty percent of the yield plots in Cluster 4 were identified correctly while yield plots in Cluster 5 were correctly identified in 80% of the cases. Results for the discriminating functions developed for Data Subset 2 were similar, with 31% of the yield plots assigned to Cluster 1&2 in the holdout sample correctly identified, 87% correctly identified for Cluster 3, 100% for Cluster 4, and 71% for Cluster 5. Yield plots that were in Cluster 1&2 and incorrectly identified were always identified as belonging to Cluster 3. Yield plots in Cluster 5 that were misclassified were primarily classified as belonging to Cluster 3 also. Conversely, the most common misclassification of yield plots in Cluster 3 was into Cluster 5. Classification of Cluster 4 yield plots was the most reliable, with only one yield plot out of nine misclassified. Thus, the multiple discriminant functions had the least success in identifying proper membership for plots in the lower, nearly-level areas of the field.


View this table:
[in this window]
[in a new window]
 
Table 6. Percentages for predicting cluster affiliation for yield plots assigned to clusters in one data set based on multiple discriminant analysis calibrated using the other data set.

 
Overall, the discriminating functions developed from the split-data subsets classified the yield plots correctly 81 and 75% of the time. These classification percentages are high but are best evaluated against the maximum- and proportional-chance criteria (Hair et al., 1987) to gage the utility of the multiple discriminant functions. The maximum-chance criterion is equivalent to the classification percentage when all of the yield plots are classified into the single cluster with the greatest number of members (Cluster 3). The maximum-chance criteria was 54% for both data subsets. The proportional-chance criterion was computed by summing the squares of the fractional membership in each cluster and was 40% for both subsets. Thus, the classification success percentages for the two sets of discriminate functions exceeded both chance criteria by >20%, indicating that the field attributes were useful for classifying yield plots into the correct clusters but that additional information is needed to accurately classify all yield plots.

A final set of multiple discriminant functions was calculated by using all 224 yield plots in the calibration process. The canonical composites computed for the field attributes showed that only the first two composites were significantly correlated with the cluster means. The most important composite (Composite I) accounted for 82% of the discrimination between clusters while Composite II contributed 16% to the discrimination between clusters. Ideally, the yield plots would separate into four distinct clusters when plotted against the canonical composites of the retained field attributes. Figure 4 shows that there was separation in the clusters but that the separation was not perfect, i.e., the clusters were not compact and there was considerable overlap among cluster members. Cluster 4 separates the best of the four clusters, whereas there was considerable overlap between Cluster 1&2 and Cluster 3, illustrating that the multiple discriminant functions based on these field attributes were not consistently capable of assigning plots to these clusters. Figure 4 illustrates that the greatest separation of clusters was provided by the first canonical composite, with the second composite contributing little to the separation. Potentially, other field attributes, such as soil chemical properties, could have been included in the analysis to better separate the clusters along a second canonical composite.



View larger version (22K):
[in this window]
[in a new window]
 
Fig. 4. Cluster assignment of the 224 yield plots plotted against the first two canonical composites determined for the field attributes.

 
The two canonical composites of the retained field attributes were rotated using the orthogonal varimax rotation and were used to plot the stretched cluster centers and loading vectors of the field attributes shown in Fig. 5 (Hair et al., 1987). This type of plot is a simple graphical representation of the importance of the canonical composites in separating the clusters and of each field attribute in distinguishing the clusters. The loading vectors also point to the clusters having the highest mean level of that attribute. Figure 5 indicates that Composite I primarily distinguished the toeslope (1&2) cluster from the footslope (3) cluster from the backslope and shoulder clusters (4 and 5). Soil electrical conductivity had the strongest loading on Composite I as indicated by the length and direction of its vector. Elevation was also important for this composite. Thus, EC and EL were the primary attributes that distinguish among three of the four clusters.



View larger version (9K):
[in this window]
[in a new window]
 
Fig. 5. Stretched loading vectors for soil electrical conductivity (EC), soil color (SC), elevation (EL), plan curvature (PL) profile curvature (PR), and slope (SL), and cluster means plotted on the first two rotated canonical composites. The PR vector is obscured by other vectors.

 
The SC loading was included in this plot even though it was found not to be significant in the stepwise discriminant analysis. Figure 5 illustrates why this attribute was not retained in the discriminant analysis as the SC vector was nearly parallel to EL, but of smaller magnitude, and was nearly opposite in direction to EC but again of lesser magnitude. Thus, for this field, SC contributed little additional information beyond that contributed by EC and EL.

Along Composite II, there was little separation between Clusters 1&2, 3, and 5, but Composite II was effective in separating Cluster 5 from Cluster 4. None of the attributes loaded heavily on Composite II although SL had the highest loading on this composite. Plan curvature was noteworthy also because it loaded much more heavily on Composite II than on Composite I. Surprisingly, PR loaded evenly on the two composites and was not particularly effective in distinguishing shoulder plots from backslope plots. Lack of heavy loading on Composite II illustrated that none of the field attributes were dominate in differentiating the shoulder (4) cluster from backslope (5) cluster.

Directions of the loading vectors indicated that EC had its highest values for the toeslope (1&2) and footslope (3) clusters while EL was lowest for these clusters. This is consistent with earlier work that showed the highest EC values in areas where water and clay accumulate (Jaynes et al., 1995b). The steepest SLs and greatest PLs occurred for the shoulder (4) cluster. Schumacher et al. (1999) modeled tillage-induced erosion and showed that shoulder positions typically experience the greatest tillage-induced soil loss, which can reduce yield. Thus, erosion may partly explain the lower yields for the shoulder cluster in this field as well.

Prediction of Yield Cluster Location
Using the 2-m cell coverages of EC, EL, SL, PL, and PR and the multiple discriminant functions developed from the transect plots, yield cluster affiliation was calculated for every 2-m grid cell within the 32-ha field. As with the yield plot locations, the clusters formed simple spatial patterns across the field (Moran's I = 0.94, P < 0.001), roughly aligned with landscape positions rather than in a random spatial pattern (Fig. 6) . As noted above, the multiple discriminant functions were only successful in identifying Cluster 1&2 locations about half of the time. Comparing Fig. 6 with Fig. 2, it was obvious that the functions were not properly identifying the known extent of Cluster 1&2 in the pothole region in the south-central portion of the field but rather identified much of the region as belonging to Cluster 3. Other than the large pothole in the southern half of the field, the toeslope cluster (1&2) extended over only a limited area of the field, principally in drainage ways and areas where surface water collects. These areas produced good corn yields in drier years, but yields were severally reduced in wetter years. Cluster 4, the shoulder cluster, also extended over a limited portion of the field, primarily in isolated hillslope and sideslope locations, and represented areas of consistently low yields. Cluster 3, the footslope cluster, comprised the greatest area of the field and represented the area within the field where corn yields were consistently high for most years. Cluster 5, the backslope cluster, occupied the higher ground, primarily along two large ridges and represented areas of somewhat lower yield expectations for drier years but good yields in wetter years.



View larger version (77K):
[in this window]
[in a new window]
 
Fig. 6. Spatial distribution of the four yield clusters (Cluster 1&2, toeslope; Cluster 3, footslope; Cluster 4, shoulder; Cluster 5, backslope) across the 32-ha field as predicted from the multiple discriminant analysis and significant field attributes.

 

    SUMMARY
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 NOTES
 RESULTS
 SUMMARY
 REFERENCES
 
We used cluster analysis to decipher temporal and spatial patterns in 6 yr of corn yield measurements. Cluster analysis reduced the temporal yield patterns for 224 yield plots into five clusters with similar temporal yield patterns. The clusters were not randomly distributed across the field but instead formed contiguous areas. Spatial contiguity is not a criteria of the clustering algorithm, and its presence likely reflects spatial structure in the soil–weather–crop interactions that determine yield. Interpretation of the temporal yield clusters showed that they were primarily determined by yields in years when growing season precipitation was greater than average. Secondarily, the clusters were determined by yield differences in years of below-average growing season rainfall. The clusters were roughly equivalent to landscape position. Clusters on the lower portions of the landscape were adversely affected by wetter-than-average rainfall. Clusters slightly higher in the landscape had consistently higher yields, whereas clusters on the upper portions of the landscape had consistently lower yields. The rough equivalency between cluster and landscape positions and the sensitivity of behavior within clusters to seasonal rainfall indicates that available moisture may be the dominant factor determining yield behavior in this well-managed, nonirrigated field.

In the profiling step of the cluster analysis, multiple discriminant analysis was used to test for correlations between the yield clusters and the field attributes EC, SC, and EL and the primary terrain attributes calculated from EL: SL, PL, PR, and AS. All field attributes except SC and AS were significant in discriminating between the clusters. High correlations between EC, EL, and SC eliminated SC from contributing significantly to the discriminant analysis. The first attempt at discriminant analysis showed that the field attributes were unable to discriminate between the cluster containing yield plots within a closed depression or pothole and the cluster comprised of plots from other low areas of the field. Because of similar temporal yield patterns for these two clusters, they were combined and the discriminant analysis repeated for four clusters.

Calibrating the multiple discriminant functions on half of the 224 yield plots and validating against the other half of the data, we achieved a 76% success rate in classifying the yield plots into the correct yield cluster. Reversing the roles of the two data subsets gave an 80% success rate in classifying the yield plots into the correct yield cluster. The functions performed worse for predicting membership in the combined Cluster 1&2 compared with the other three clusters. Soil electrical conductivity was the principle field attribute that discriminated between three of the four clusters. Slope and PL were important attributes for discriminating Cluster 4 from Cluster 5. That 20% of the yield plots were misclassified with these field attributes indicates that the yield clusters were not perfectly aligned with terrain. Other terrain or soil attributes will be required to completely identify yield cluster membership.

Using the multiple discriminant functions calibrated with all 224 yield plots in the southern half of the field and field attributes averaged for 2-m grid cells across the 32-ha field, we were able to partition the entire field into four yield clusters—even areas where yield data had not been collected. With the ability to partition the entire field into a few yield clusters, the interpretation of what the clusters may mean for managing the field remains. Yield potential was not the same among the clusters, with the shoulder cluster consistently lower than the footslope and backslope clusters and the toeslope cluster having extremely variable yields depending on growing season precipitation. A question that needs to be answered is whether the identified yield clusters can be equated with management zones, i.e., subregions within a field that express a homogenous combination of yield-limiting functions for which a single rate of a specific crop input is appropriate (Doerge, 1999). If response curves for various inputs such as N are unique for the different yield clusters, then varying input rates may be a cost-effective strategy for corn production on this field.


    ACKNOWLEDGMENTS
 
We are grateful to T.C. Nelsen for suggesting cluster analysis for the interpretation of yield data; to J.D. Cook, K.E. Heikens, and R.O. Hartwig for assistance with field and harvest operations; to D. Meek, T. Green, and L. Hendrickson for reviewing earlier versions of this paper; and to S. Snyder for his cooperation and access to his field.


    NOTES
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 NOTES
 RESULTS
 SUMMARY
 REFERENCES
 
1 Names are necessary to report factually on available data; however, the USDA neither guarantees nor warrants the standard of the product, and the use of the name by USDA implies no approval of the product to the exclusion of others that may also be suitable. Back


    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 NOTES
 RESULTS
 SUMMARY
 REFERENCES
 




This article has been cited by other articles:


Home page
Agron. J.Home page
J. Sawchik and A. P. Mallarino
Variability of Soil Properties, Early Phosphorus and Potassium Uptake, and Incidence of Pests and Weeds in Relation to Soybean Grain Yield
Agron. J., September 8, 2008; 100(5): 1450 - 1462.
[Abstract] [Full Text] [PDF]


Home page
Agron. J.Home page
X. Huang, L. Wang, L. Yang, and A. N. Kravchenko
Management Effects on Relationships of Crop Yields with Topography Represented by Wetness Index and Precipitation
Agron. J., September 8, 2008; 100(5): 1463 - 1471.
[Abstract] [Full Text] [PDF]


Home page
Crop Sci.Home page
C. L. Williams, M. Liebman, J. W. Edwards, D. E. James, J. W. Singer,