Agronomy Journal Journal of Natural Resources and Life Sciences Education
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


This Article
Right arrow Abstract Freely available
Right arrow Figures Only
Right arrow Full Text (PDF) Free
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via ISI Web of Science (12)
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Lark, R. M.
Right arrow Articles by Wheeler, H. C.
Right arrow Search for Related Content
PubMed
Right arrow Articles by Lark, R. M.
Right arrow Articles by Wheeler, H. C.
Agricola
Right arrow Articles by Lark, R. M.
Right arrow Articles by Wheeler, H. C.
Related Collections
Right arrow Biometrics
Right arrow Field-Scale Studies
Right arrow Experiment Design
Right arrow Statistics
Published in Agron. J. 95:1093-1104 (2003).
© American Society of Agronomy
677 S. Segoe Rd., Madison, WI 53711 USA

PRECISION AGRICULTURE

A Method to Investigate Within-Field Variation of the Response of Combinable Crops to an Input

R. M. Lark* and H. C. Wheeler

Silsoe Res. Inst., Wrest Park, Silsoe, Bedford, UK MK45 4HS

* Corresponding author (murray.lark{at}bbsrc.ac.uk).

Received for publication October 1, 2001.

    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 1. Experiments Assuming Scale...
 2. Experiments to Determine...
 3. Experiments to Model...
 METHOD
 CASE STUDY
 RESULTS AND DISCUSSION
 CONCLUSIONS
 APPENDIX 1
 APPENDIX 2
 REFERENCES
 
Precision agriculture is based on the hypothesis that the optimum rate of inputs to a crop varies spatially within fields. Evidence for this hypothesis is scarce due to the practical and theoretical difficulties of designing appropriate experiments. This paper proposes a procedure for testing the hypothesis of precision agriculture for crops that may be harvested with a combine harvester equipped with a yield monitor. An input is applied according to a randomized block design. Yield monitor data may be treated as a convolution of yield with a function that characterizes the smoothing effect of processes in the combine on the mass flow rate at the sensor. The input rates, determined by the experimental design, are transformed using the combine's smoothing function and a preselected yield response function. The parameters of the response function for the whole field or a local neighborhood can be estimated from these transformed rates and the yield monitor data. A null hypothesis, that the spatial variation in one of these parameters (that determines the local optimum rate) may be attributed to random yield variation about a uniform response function, may be tested. A wheat crop (Triticum aestivum cv. Consort) was treated with varying rates of N fertilizer in a case study in the south of England. Analysis of the yield data showed that the observed variation in the response could not be explained as random fluctuation around the field-scale response function. The economic optimum rate of N varied from zero to greater than 200 kg ha-1.

Abbreviations: ML, maximum likelihood • OLS, ordinary least squares • UK, United Kingdom • SEM, spatial error model


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 1. Experiments Assuming Scale...
 2. Experiments to Determine...
 3. Experiments to Model...
 METHOD
 CASE STUDY
 RESULTS AND DISCUSSION
 CONCLUSIONS
 APPENDIX 1
 APPENDIX 2
 REFERENCES
 
SOILS AND THE YIELD of crops vary within fields. While it does not follow necessarily that the response of the crop to inputs (such as fertilizer) also varies at these scales, this is a plausible hypothesis (Swinton et al., 2002). This hypothesis has given rise to the precision agriculture concept that the rate at which inputs are applied to a crop should be varied within fields to match local requirements. If this is correct, then there may be economic benefits from spatially variable application of the input. There may also be environmental benefits.

It is biologically plausible that the response of a crop to various inputs is not uniform within fields, but it requires experimental evidence because precision agriculture is only feasible if the optimum rate of an input varies substantially at spatial scales at which variable application is practicable. The challenge for experimenters is to develop methods to estimate efficiently and robustly the spatial variation of the response to an input at within-field scales. This is a difficult problem, and this paper proposes a solution.

Research on the precision agriculture concept has been undertaken for more than a decade, so we next consider those experimental methods that have been used and how far they are suitable to test the hypothesis that we stated above. Broadly these experiments fall into three classes.


    1. Experiments Assuming Scale Invariance at Subfield Scales
 TOP
 ABSTRACT
 INTRODUCTION
 1. Experiments Assuming Scale...
 2. Experiments to Determine...
 3. Experiments to Model...
 METHOD
 CASE STUDY
 RESULTS AND DISCUSSION
 CONCLUSIONS
 APPENDIX 1
 APPENDIX 2
 REFERENCES
 
A conventional field trial to estimate a fertilizer response function is based on small subfield plots but is designed, nevertheless, to estimate a response function at field scale, with block, covariate, and residual effects treated as additive. When response functions for different fields are examined, we often find that the field response depends on soil properties (e.g., Shepherd, 1993). Observations such as this provide the basis for rules to predict the field optimum rate of an input from field-specific information such as an analysis of a composite soil sample (e.g., Kachanoski and Fairchild, 1995). We might assume that the relationship between soil properties and the optimum rate of an input is scale invariant, i.e., it does not depend on the size of the region to which the soil information corresponds and for which the prediction of an optimum rate is required. If this assumption is valid (at least for whole- and subfield scales), then the rules for predicting the field optimum rate may be used to prescribe rates at within-field scales. This assumption lies behind some experiments to evaluate the precision agriculture concept.

The treatment in these experiments may be a uniform rate of application of the input or a variable rate determined by a field-scale rule (e.g., Mulla et al., 1992; Hergert and Ferguson 1997; Haahr et al., 1999; Schwarz et al., 2001). If there are real spatial variations in the optimum rate of an input, variable application should give larger yields and/or reduced total inputs without yield loss. When there is no significant benefit from variable treatment in such experiments (as has often been found, e.g., Ferguson et al., 1997), the possible inferences are severely limited. The within-field variation of optimum rates may be small, or the field-scale rules for prescribing rates may not hold at within-field scales.

The assumption of scale invariance for the effect of soil or other factors on the response of a crop to an input is at least questionable. Many hydrological and soil processes and ecological phenomena are often scale dependent over this range of scales (e.g., Bierkens and van der Gaast, 1998; Wagenet, 1998; Lark, 2000a; Kachanoski and Fairchild, 1995), so it would be surprising if responses to inputs do not behave similarly. If we want to quantify the spatial variability of the response of crops to an input at within-field scales, then we need an experimental method for investigating crop responses at that scale that does not invoke the assumption of scale invariance.


    2. Experiments to Determine Response Functions within Subregions of the Field
 TOP
 ABSTRACT
 INTRODUCTION
 1. Experiments Assuming Scale...
 2. Experiments to Determine...
 3. Experiments to Model...
 METHOD
 CASE STUDY
 RESULTS AND DISCUSSION
 CONCLUSIONS
 APPENDIX 1
 APPENDIX 2
 REFERENCES
 
In these experiments, the field is divided into subregions on some criterion, and then separate response functions are estimated within each subregion. Østergaard (1995) did N response experiments on wheat within four subregions of each of two fields and found significant differences in the optimum N rates. Peters et al. (1999) similarly showed differences between N response functions for wheat in three contrasting soil map units within one field. Bongiovanni and Lowenberg-DeBoer (2001a)(2001b) compared the N response functions of corn between topographic units in fields. They detected significant differences between the response functions in the different units.

These experiments do not assume scale invariance, and they use practical subdivisions of the field that can be used for management. This is consistent with the robust and practical management zone approach to precision agriculture (e.g., Franzen et al., 2002). However, investigations of within-field variation by these methods assumes that the subdivision of the field accounts for the key variations in factors that determine the local response. If we wish to design and analyze experiments to quantify within-field variation of the crop response within a field, independent of prior assumptions about which factors drive this variation, then another approach is needed.


    3. Experiments to Model Crop Response Functions as Functions of Soil and Other Variables
 TOP
 ABSTRACT
 INTRODUCTION
 1. Experiments Assuming Scale...
 2. Experiments to Determine...
 3. Experiments to Model...
 METHOD
 CASE STUDY
 RESULTS AND DISCUSSION
 CONCLUSIONS
 APPENDIX 1
 APPENDIX 2
 REFERENCES
 
Hurley et al. (2001) did experiments to measure the value of soil tests and other information for variable application of N. Nitrogen was applied at controlled rates in a randomized block design across a field. Yield monitor data were then divided into local subplots within which one aggregated yield value was recorded for sites at each rate of N. A quadratic response function was postulated for each subplot. This cannot be estimated precisely, but its parameters were assumed to be a linear function of local soil test and other variables, and this function was assumed uniform across the field. It was possible to estimate parameters of this overall function and to use it to assess the value of different kinds of information. A similar method was used by Swinton et al. (2002).

This is a powerful method for identifying and calibrating relationships to predict local crop requirements. The method relies on the assumption that parameters of the local response function may be modeled as linear functions of the ancillary soil or other variables.

The preceding discussion shows that some developments in experimental method are needed to allow empirical estimation of local response functions at within-field scales without prior assumptions about how these depend on soil or other factors. The key requirements are as follows.

  1. It is necessary to map the variation in response to an input continuously in space so that the sensitivity with which differences in the response can be detected is not limited by the choice or the internal uniformity of subregions.
  2. It is necessary to estimate the standard errors of the parameters of the local response function, both to quantify the precision of the estimates and so that the effects of error in the estimates on derived quantities (such as the optimum response rate) can be obtained without bias.

To meet these requirements, we need practicable experimental designs that provide replicated and localized observations of the crop response. There is a conflict between these two requirements if we are constrained to use yield data aggregated over plots—like the 30- by 300-foot (approximately 9- by 91-m) plots used by Swinton et al. (2002). If treatments are replicated with plots of this size, then the group of plots from which a local estimate of the response function is obtained will cover a substantial area and so are no longer localized.

In principle, localized response functions could be estimated in a practical experiment if yield data are collected with a conventional combine yield monitor after the input is applied variably with variable-rate technology. Many local yield estimates could be obtained corresponding to different rates of the input, and a local response function could be estimated. However, data from yield monitors have limitations. These are discussed in more detail below, but in short, the monitor yield for a location in space is not in fact a point measurement of yield but rather is a function of the crop yield along the combine harvester's trajectory through that location (Lark et al., 1997). This smoothes the measured fluctuations in yield, as may arise in response to experimentally varied rates of inputs, and so the local responsiveness of the crop to the input will be underestimated.

Pringle et al. (1999) estimated local response functions for a wheat crop from yield monitor data. Nitrogen was applied to plots at one of a few discrete rates. The yield data were then deconvolved (Lark et al., 1997; Whelan and McBratney, 1997), which in principle, reverses the smoothing effect. The deconvolved yield data for each level of the input were then analyzed separately using geostatistical methods to estimate the response of the crop at each level of the input at a grid of target sites across the field. At each target site, an estimate of yield corresponding to each level of the input was obtained. A response function was fitted to these estimates at each target site.

This is an ingenious strategy, but it has problems. It relies on deconvolution to reconstruct the true yield values, the disadvantages of which are discussed below. The method also assumes that the yields at any level of the input are intrinsically stationary random variables (Webster and Oliver, 2001), which will not always be plausible although it may be possible to limit this assumption to one of local stationarity if the data are sufficiently dense. Finally, it is not clear how to correctly estimate the standard error of the locally fitted model parameters.

This paper presents an alternative method. This uses a treatment map for the applied input, generated by an appropriate experimental design. The key step is a nonlinear transformation and a spatial filtering of this treatment map so that the parameters of the underlying relationship between crop yield and the input can be obtained by estimation from the yield monitor data and the transformed and filtered data on the input rates. The parameters are then estimated within a local moving window to obtain a local response function.

In the next section of this paper, the technique and its application in a case study are described in detail.


    METHOD
 TOP
 ABSTRACT
 INTRODUCTION
 1. Experiments Assuming Scale...
 2. Experiments to Determine...
 3. Experiments to Model...
 METHOD
 CASE STUDY
 RESULTS AND DISCUSSION
 CONCLUSIONS
 APPENDIX 1
 APPENDIX 2
 REFERENCES
 
The proposed method uses two technologies: variable-rate application of an input following a treatment map and yield monitoring to measure the crop's response. It uses two models for analysis of the data. The first is a response function for the crop and input. The second is a convolution model of how yield monitor data depend on the actual yield of the crop. First we describe the two models and how they are combined for the analysis of the experimental data.

Response Functions
The functions considered in this paper are for the response of a crop to N fertilizer, but the principle could be applied to other inputs. Various functions are used for this purpose, and we focus on the exponential response function:

[1]
where y is crop yield, n is the rate of applied N, and a, b, and R are model parameters. This particular model was used in the case study because exponential response functions are the most widely used for analysis of N trials in the United Kingdom (UK) (George, 1984; Shepherd, 1993). However, the method is by no means limited to this response function, and others could be used.

The parameter a represents an asymptotic yield, so biologically meaningful values of the b parameter are negative, and values of the R parameter are positive and smaller than 1. In fact, a fixed value of R (0.99) is commonly used in the UK for analysis of conventional N trials with little reduction in the goodness of fit (George, 1984; Shepherd, 1993).

If a simple exponential response function has been fitted to some data, then the economically optimum N rate, no, is estimated without bias by

[2]
where k is the ratio of the unit cost of N to the unit price of the crop and 2b is the estimation variance of the parameter b. The second term in Eq. [2] is a correction for the bias due to this uncertainty (assuming that the parameter R is fixed), see Appendix 1 for the derivation.

Note that if there is any evidence in an experiment that crop yield has been reduced at high rates of the input (e.g., due to lodging of the crop at large N rates), then an additional term should be added to the model: + cn. A negative value of c implies that yield is reduced with increasing n at large values.

The Convolution Model of Data from Yield monitors
A yield monitoring system comprises a sensing device that measures grain flow at some point within the combine harvester and records simultaneously the combine harvester's location according to the global positioning system (GPS). Many yield monitors, including the system used in the case study, measure mass flow (units of mass of grain per second). For a given combine speed and cut width, mass flow can be converted directly to yield on the ground. Thus, the monitor yield may be written as a function of location in space i, Y(i), or of distance x along the combine harvester's path from some arbitrary origin, Y(x). There are corresponding true yields on the ground that we denote by y(i) and y(x), respectively.

Imagine a batch of grains labeled in some way that are cut simultaneously at the combine header. Some of these grains may be rapidly separated from chaff and drop quickly through the combine harvester's sieves to the elevator and so to the sensor. Other grains may take longer to separate—they may bounce off the walkers and eventually drop through onto the sieves and pass through with the returns. In short, there will be a statistical distribution of transition times from cutter bar to sensor. Figure 1 shows a hypothetical distribution function of transition times. It has two important features, first the interval ts. This represents the time between cutting a section of the crop and measuring the flow rate to which that section makes the largest contribution. Any yield monitoring system must have a working value of ts, a time delay used to correct the location associated with any yield estimate. However, no commercial system of which we are aware attempts to account for the second aspect of this distribution, its width or dispersion (Whelan and McBratney, 2002). This dispersion is the variation in transition times for individual grains. Because of this dispersion, the monitor yield Y(x) is a function of the actual yield on the ground over a finite distance along the combine harvester's path. As a result, the variation in monitor yields is smoother than the true variation on the ground, and any variations in yield that we might induce by experimentally varying input rates will be damped when we measure them with a yield monitor. The contrast between the yields at different rates of the input is therefore reduced because the monitor yields are a function of true yields at sites where different rates have been applied, and we will therefore underestimate how responsive the crop is to that input.



View larger version (11K):
[in this window]
[in a new window]
 
Fig. 1. Notional probability distribution for transition times for a grain from cutter bar to flow sensor. The time shift ts and the dispersion of this process (the variability of the transition times) are illustrated.

 
To overcome this problem, we need a workable model of how the monitor yields Y(x) and true yields y(x) are related. In this paper, we use the convolution model that we have previously applied to the problem (Lark et al., 1997)

The mathematics of convolution are treated in standard textbooks such as that of Bracewell (1986)(p. 185 et seq.) We must assume that monitor yields Y(x) and true yields y(x) are related by an operator that is linear and shift invariant. Linearity means that the operator preserves addition or scalar multiplication, i.e., if we analyzed the true yields into additive components, the operator may be applied to each in turn, and the results will sum to the monitor yields. Shift invariance requires that if there are two peaks in the true yields, separated by a shift, then the same shift will separate the corresponding peaks in the monitor yields. If these conditions hold, then we can write the following:

[3]
where (j * y)x denotes the convolution of the yields on the ground with the function j({delta}); this latter function is called the impulse response function because it is the notional response of the system to an instantaneous input and it describes the smoothing effect of the variation in transition times through the combine harvester ({delta} is a small increment of distance x, not to be confused with Dirac's impulse function). Equation [3] is equivalent to saying that Y(x) is a linear filtering of y(x) with a smoothing function, l({delta}), that is uniquely related to j({delta}), l({delta}) = j(-{delta}),

[4]

The key assumption in this paper is that the convolution model describes the formation of monitor yields. Is it reasonable? We cannot expect the process to be linear and shift invariant under all conditions. Marked changes in speed (Arslan and Colvin, 2002a) or direction or other changes of momentum on rough ground will cause nonlinear behavior. However, in experimental conditions, speed can be kept as uniform as possible, and we can set the experiment up so that measurements can be made while traveling more or less in a straight line. If the combine is used to cut nonexperimental crop before entering the experimental material, then dead spaces in the machine, where grain can accumulate, can reach a steady state, and this will also make the behavior of the system more linear (Lark et al., 1997). There is experimental evidence for the convolution model. Lark et al. (1997) analyzed data from continuous measurements of monitor yield on passes through a crop and adjacent measurements of yield made by augering into sample sacks all the grain cut over 4-m increments. It was shown that the relationship between these measurements was consistent with the monitor yields being the linear filtering of the true yields on the ground. Whelan and McBratney (2000) showed that the flow of marked grain through the combine could be characterized by an impulse response function that could be modeled in a physically meaningful way. Other studies have found the convolution model to be useful and have applied it in practice (Pringle et al., 1999; Whelan and McBratney, 2002; Tøgersen and Waagepetersen, 2002; Arslan and Colvin, 2002b).

Combining the Models
The convolution model can be applied by deconvolution of the monitor data to reconstruct true yields by reversing the filtering process described in Eq. [4] (Pringle et al., 1999). We noted above that deconvolution has disadvantages. Mathematically, it is an ill-posed problem because the solution does not vary continuously with perturbations of the data. In practice, this means that noise and other error is inflated by the process, and the deconvolved data may contain artifacts imposed by the way the data are sampled. For this reason, we propose an alternative.

Because the input has been applied in accordance with a treatment map, we know the applied rate at any location and can write it as a function of location, n(x). This assumes that the input is applied without error. In the experiments reported here, the errors were kept as small as possible, but the possibility that the actual rates received by the crop deviate from the treatment map should not be ignored. Errors may arise from unpredictable factors such as blockages in machinery, positioning error, and drift of material due to wind. There will also be systematic errors due to the dynamics of the application system, which cannot change application rates instantaneously.

We assume that there is a mathematical relationship between the yield and the input rate at any location:

[5]
where f denotes some function with parameters contained in {theta}. Our problem is that we know n(x) but not y(x), only the monitor yields Y(x). However, if f is a linear function (more generally an affine function) then

[6]
where the function f and its parameters {theta} are common to Eq. [5] and [6] and N(x) is a filtering of the applied input rates along the combine path with the same filtering function l(x) that we assume relates the true yields to the monitor yields.


[7]

This follows because an affine relationship (of which a linear relationship is a special case) between two variables is unaffected if both are transformed by the same linear operator (Appendix 2). The problem is to find a linear form of the response function. Polynomial response functions lend themselves to this problem; to use a quadratic model, it would be necessary only to convolve n(x) and n2(x) with the filter l(x). We have not used polynomials in the case study on N response because they are not widely used in the UK (George, 1984). Instead, we have used the exponential response function (Eq. [1]), but this response function is linear with respect to the parameters a and b but not parameter R. A solution to this problem is to assume that only a and b are spatially variable, but R is fixed. If we can estimate R for the whole field, then we may transform the treatment map by a linear filtering of Rn(x), denoted by R* n(x):

[8]

The overall value of R may be found by computing the filtered transformed values of the input variable (with Eq. [8]) for many values of R and finding that for which R* n(x) and the monitor yield Y(x) are most strongly correlated.

The assumption that R is not spatially variable is not ideal, but it is consistent with the practice of specifying a fixed value of R across a set of experiments (George, 1984; Shepherd, 1993). As reported by George (1984) in the analysis of plot experiments from a wide range of conditions across the UK, the parameter R was in the range 0.985 to 0.990, and in no case did perturbing this parameter have a significant effect on the goodness of fit of the response function. An alternative to using a fixed value of R is to make the transformation in Eq. [8] for many values of R and then fit local response models by selecting the best locally fitting value. This would be computationally expensive, and obtaining the estimation variance of R would be even more so. The option of using a fixed R was followed in the case study.

Estimating Local Response Functions
The procedure above will generate a spatially dense set of yield monitor data, Y(x), and corresponding convolved transformed values of the experimentally controlled input R* n(x). For convenience, we now express the location of the data by the coordinate vector i: Y(i) and R* n(i)

The local response function at a location k can be estimated from the yield monitor data and filtered transformed input data within a local window using the m nearest-neighboring points. It is proposed that this is done at locations on a grid over the experimental field to generate a set of local response functions. We assume that the filtered transformed input values are known (as is conventional in analysis of fertilizer trials), so the parameters a and b of the response function may be estimated by regression. However, ordinary least squares (OLS) regression is not suitable for this purpose, because the residuals from the fitted regression within the window are very likely to be spatially correlated, so OLS estimates of the parameters are likely to be biased, and estimates of their variance are certainly biased (Cressie, 1993).

There are two general ways of solving this problem. The first is to use spatial autoregressive (SAR) models of which spatial error models (SEMs) are most appropriate for the present problem where we wish to estimate the parameters of a prespecified model (Anselin, 2001). In the SEM, the residuals from the model are treated as an autoregressive random variable. This model has been used for the estimation of local response functions (Bongiovanni and Lowenberg-DeBoer, 2001a, 2001b). The alternative approach, followed here, is to assume that the residuals from the model are realizations of a random function that may be characterized by its variogram. Thus, it is assumed that the spatial covariance is a continuous function of the distance between two points (and also, perhaps, the direction) while in SEMs, the covariance is treated as a relationship between discrete points. Parameters of the variogram are estimated along with the model parameters by maximum likelihood (ML) (Cook and Pocock, 1983; Lark, 2000b). Hurley et al. (2001) used this method. The SEM is computationally efficient and can incorporate a variety of spatial interactions. The ML method was selected because it offers a well-established basis for making inferences about the fitted models and comparing alternatives and the random function model of the error variable encompasses autoregressive processes (that have an exponential variogram) but also other forms of spatial variability. It is also easy to simulate the error process of the ML model, which is necessary for reasons discussed below.

If the m values R*n(i) within the local window about k are contained in the second column of the m x 2 matrix X that has the value 1 in each element of the first column, and the m monitor yields from the local window are in vector y, then estimates of coefficients a and b of Eq. [1] are contained in the column vector obtained by

[9]
where A is the autocorrelation matrix of the residuals and the superscript T denotes the transpose of the matrix. The residual variance is estimated by

and the covariance matrix of the parameters {Xi} is estimated by

[11]

These estimates are all conditional on the autocorrelation matrix A. This may be written in terms of the variogram of the residuals. Consider the simple isotropic variogram

[12]
where h is a lag distance, co is the nugget variance, and r is the distance parameter of a variogram function g that describes the spatial dependence of the error variance component c1. The variogram function must be a bounded and authorized spatial model such as the exponential or spherical (Webster and Oliver, 2001). If observation i and j within a neighborhood are separated by lag distance hi,j, then the autocorrelation of their errors may be written as

[13]
where the parameter s is the spatial dependence ratio of the error variance, i.e., the ratio c1/(co + c1).

It is necessary to chose a model (e.g., spherical or exponential) to use in Eq. [13]. Lark (2000b) found that the results of the analysis are not very sensitive to the choice of variogram model. Here we specified the class of variogram function, which was chosen for the generalized least-squares fitting of the whole-field response function (see below). The parameters s and r in Eq. [13] are then estimated numerically by finding the values that maximize the log-likelihood of the model, which is achieved when L' is maximized

[14]

This procedure will generate a set of estimates of the parameters a and b for overlapping windows, along with the estimation variances of both parameters, conditional on the selected overall value of R. The estimation variances of the parameters correctly describe the uncertainty with which they have been estimated within each window and so may be used to compute an unbiased estimate of the locally optimum N rate using Eq. [2] for each window in turn. Because the local estimates of the parameters at neighboring points on the grid are estimated from overlapping subsets of the data, they and their estimation variances are not independent, and they should not be treated as such in any model assembled over the whole field. For example, analysis following Hurley et al. (2001) to calibrate predictive relationships between these local parameters and measured soil properties must account for the imposed correlation between neighboring response functions.

Is the Variation among the Local Response Functions Significant?
From the method described so far, parameters of a response function may be estimated within local windows. Reconsider now the original scientific problem of testing the hypothesis that the optimum rate of an input is variable within the field. Given a fixed value of R, this variation depends on b for the exponential response function. Consider a situation where a single response function holds across the field, but various other factors independent of the applied N rate cause yield variation. After completing the analysis above, variation in the b parameter would be observed from window to window simply because of estimation error arising from random variation in yield. A test is needed to decide the strength of the evidence from an experiment for real underlying variation in the response to an input. The null hypothesis that forms the basis for the test developed in this paper is:

Given the choice of an exponential response function, and conditional on the specification of a fixed value of the R parameter, the variation of the estimated b parameter among the local windows (as measured by its sample standard deviation, sb) arises from random variation in yield about a whole-field response function that describes the N response at all points in the field.

Under the null hypothesis, variation about a whole-field function arises from factors that will be uncorrelated with the applied rate of N. As a result, while the locally estimated values of b will vary, this variation will be small because deviations in yield from those expected under the uniform response will not depend systematically on the local applied rate of N. However, if under an alternative hypothesis, the underlying response function is variable across the field, then some of the variation about the whole-field function arises from differences between the local response functions, and this variation will depend on the locally applied rate of N and can be accounted for by variation in locally fitted values of the b parameter. As a result, we would expect the observed value of sb to be larger when the alternative hypothesis is true than when the null hypothesis is true. The practical problem is to obtain a sample distribution of sb under the null hypothesis that can be compared with the value of sb measured in an actual experiment. In particular, we must account for the fact that local values of b will be correlated under the null hypothesis if they are estimated from overlapping subsets of the data. The distribution of sb under the null hypothesis will depend on this correlation and so on the particular decisions that have been made about the neighborhood from which the local response functions are estimated. It is necessary to obtain a sample distribution of sb under the null hypothesis that accounts for this properly. This is achieved by the following procedure:

A whole-field response function is fitted

[15]
using all of the yield monitor data and filtered transformed N rates. The ML method cannot be used because the numerical solution of this requires many inversions of the m x m autocorrelation matrix, and the computational cost is prohibitive when m is more than a few hundred (cf. Mardia and Marshall, 1984). Instead, generalized least-squares regression was used (Cressie, 1993). The model was first fitted to all of the data by OLS, and then the variogram of the residuals was estimated and a variogram model fitted to these estimates by weighted least squares (Webster and Oliver, 2001). This variogram model was then used to specify an autocorrelation matrix for all of the residuals (Eq. [13]) that was then used to obtain estimates of the model parameters and their variances with Eq. [9] to [11]. We are therefore assuming that the errors from the whole-field response function are stationary in the variance, that is, their variability is independent of position.

The standard deviation sb will depend, among other factors, on the spatial distribution of the data, the size of the grid on which the local models are estimated, and how far the subsets of data used to estimate these models overlap. This imposes a correlation between neighboring estimates of b. This must be accounted for when computing a sampling distribution of the statistic. The most general way to do this is by bootstrap resampling in which we generate many realizations of the statistic sb under the null hypothesis. This is done by the following steps:

  1. For each yield monitor datum, Y(i), compute the expected yield according to the whole-field response function with parameters estimated by generalized least squares:

    [16]

  2. An error variable, {epsilon}(i), is now generated. This is a random normal variable with mean of zero and the same variogram as that fitted to the residuals and used to obtain the generalized least-squares estimate of the whole-field response function. It therefore corresponds to the null hypothesis. The errors with the specified variogram were simulated by the LU decomposition method, details of which are given by Deutsch and Journel (1992). A value of the error variable is generated for all locations of yield monitor data i. This is done many times. The yield for location i in the jth simulation is

    [17]

  3. Using a full set of simulated yields, the local response functions are then estimated in exactly the same way as with the real data, at the same grid nodes and using the same neighbors.
  4. The standard deviation of the b parameters fitted to the jth set of simulated yields, sb,j, is then computed.
  5. Steps 2 through 4 are reiterated for j = 1,2,...,M realizations to generate M values of the standard deviation sb. The empirical distribution function of these values, approximated by a histogram, is a bootstrapping estimator of the distribution function of this statistic under the null hypothesis.

The null hypothesis may be rejected if the experimental value of sb is larger than some specified percentile of the bootstrapped sample distribution. If the null hypothesis is accepted, then this suggests that, at least under the seasonal conditions applying during the experiment, there is no reason to assume that the response of the crop to the input is spatially variable. We would be unlikely to find soil or other factors driving such variation, and spatially variable application is unlikely to give any benefits. If the null hypothesis is rejected, then the local response functions do appear to differ. There is a scientific basis for applying variable rates although the actual benefits are not necessarily large.

The remit of this paper is to present a method for experimental determination of within-field variations of a crop's response to inputs. The information generated from the analyses above would allow us to explore the practical implications of this variability for crop management. For example, we could use local response functions to estimate the expected gross margin for a crop under uniform application of inputs at different rates and to compare these margins with those obtained when the local optimum rate is applied. Similarly, we could estimate the gross margins expected if input rates are varied between management zones in the field—defined from soil maps or yield maps, for example. While these comparisons use experimental results that are retrospective, and so exclude the uncertainty about the actual local optimum rates in a particular season, they would be a first step beyond the scientific questions about variability in crop performance to agronomic questions about the potential benefits of variable as opposed to uniform management. For example, how large are the potential benefits of varying inputs at finer scales than broad management zones? Is the economic benefit of variable-rate management primarily a consequence of reducing inputs where the response is poor or increasing them elsewhere?


    CASE STUDY
 TOP
 ABSTRACT
 INTRODUCTION
 1. Experiments Assuming Scale...
 2. Experiments to Determine...
 3. Experiments to Model...
 METHOD
 CASE STUDY
 RESULTS AND DISCUSSION
 CONCLUSIONS
 APPENDIX 1
 APPENDIX 2
 REFERENCES
 
Crop Establishment, Variable Application, and Harvesting
A winter wheat crop (cultivar Consort) was drilled on Bypass field at Silsoe Research Institute on 6 Oct. 2000. Bypass field is 11.6 ha on a sandy clay loam soil formed on Quaternary drift over the Gault Clay (Cretaceous).

The field had 24-m-spaced tramlines (wheelings) and was divided into 467 plots that were surveyed manually and marked with canes. Each plot was 12 m wide (perpendicular to the tramlines), with a tramline along one side (where the plot edge is in the center of the tramline), and 15 m long in the direction of the tramlines. The plots, as laid out, encompassed almost all of the field. The position of each plot was surveyed very accurately using a laser theodolite.

The plots were then grouped into blocks of five contiguous plots (and a metablock with the remaining plots). Each plot within a block was then randomly allocated to one of five levels of N fertilizer, 0,1,...,4. The actual rate for each level was selected in consultation with the Soil Science Department at Rothamsted Experimental Station. Table 1 shows the N rates and the division of these into two or three split dressings.


View this table:
[in this window]
[in a new window]
 
Table 1. Experimental N rates.

 
Nitrogen fertilizer was applied to these plots as a liquid product using the Micron patch sprayer designed at Silsoe Research Institute (Miller et al., 1997). The N rate was changed at the marker cane for each plot by the control system in response to a manual signal. This system was operated by one of us (HCW) rather than by the tractor driver. This, and the dynamic responsiveness of the system (Miller et al., 1997), meant that good control of the fertilizer application was achieved. All operations on these fields other than the N application were conducted uniformly and in accordance with normal local commercial practice following agronomic advice.

The experimental crop was harvested with a Massey Fergusson 32 combine, equipped with the AGCO (Duluth, GA) MF yield meter, which records a point yield estimate at approximately 2-s intervals, and Fieldstar mapping system, version 3.12. The cutter bar of the combine is 4.88 m wide (16 ft), so two full, nonoverlapping passes could be made through each plot parallel to and not overlapping the tramlines. The fields were each harvested in a single operation. After four passes across the headlands at each end of the field, one pass was made through every plot and then a second pass was made through every plot, such that the two passes through any plot were in opposite directions. Combine speed was kept as uniform as possible. All passes were very nearly straight, and each ended at a headland where the combine turned before making another pass. A total of 3919 point estimates of yield were obtained.

All plots were examined during harvest by one of us (HCW) from the cab of the combine harvester. There was no evidence of lodging anywhere in the field, nor of any damage to the crop by heavy rain or pests.

Data Analysis
The analysis of the data requires a linear filter, l(x) (see Eq. [4]), that can be used to process the treatment map. Previously, we have demonstrated how the impulse response function of a combine and yield monitor configuration, j(x), may be estimated from yield monitor data recorded while the combine moves into or out of a crop so that the input is an (unknown) step function. (Lark et al., 1997). If x is distance into the pass from some arbitrary zero and Y(x) is the corresponding yield monitor output, then we showed that Y(x) is well fitted by a Gompertz function

[18]
where d is the system delay expressed in terms of distance traveled by the combine, {kappa} is a shape parameter, y1 is the mean yield before the crop edge, and y1 + y2 is the mean yield after the edge. When these parameters have been estimated and the delay, d, is separately corrected, the corresponding estimate of the impulse response function is

[19]
where z is the distance along the combine path relative to the corrected position, and the maximum of the function is at z = 0. This method was used again here but only with data from entry into the crop because of momentum changes on exit. Data were used from 10 passes of the same combine and yield-mapping system into a nonexperimental wheat crop with no variable treatments applied. The combine was driven into each pass into the crop over stubble from the headland (at least a 20-m run with no input into the system), maintaining as uniform a speed as possible. Because the headlands had first been cut, the combine moves across a sharp edge into the uncut crop. For modeling, we extracted the data for the first 30 to 40 m into the crop. The Gompertz function was then fitted to these data by least squares, estimating a common value of {kappa} for all the passes.

In further analysis of the data, it was assumed that the shift, ts, was correctly accounted for by the yield-mapping system's default correction. This assumption was tested using Lark et al.'s (1997) method for estimating the shift. The correlation was computed between the monitor yields from a pass through a set of plots and the monitor yield nearest to each in the second pass through the same plots. The second pass was in the opposite direction to the first. The correlations were then computed between each datum and the second, third, etc., nearest point in the second pass, offset in each direction. It was found that the maximum correlation was between the data in the first pass and the nearest datum in the second. This shows that the shift had been corrected adequately.

The linear filter must be obtained from the impulse response function given in Eq. [19]. From convolution theory, l({delta}) = j(-{delta}), where {delta} is a small increment so

[20]
where l(z) is the weight attributed to the true yield at position xo + z on the combine harvester's path to determine the monitor yield attributed to position xo. The value of {kappa} is the one estimated from the yield data.

For practical purposes, this continuous filter function must be replaced by a set of discrete weights. Weights were calculated for 1-m increments along the combine path. The ith weight, at position xo + zi m, is wzi,

[21]

A set of weights was calculated; the function l(z) does not reach zero for finite z, so the filter must be truncated. We retained all increments for which wzi was 10% or more of the maximum w0. This gave a filter 22 m long with 23 weights wzi, zi = -15 m, -14m,..., 7 m. The weights were then normalized to sum to 1:

[22]

The normalized weights are shown in Fig. 2 .



View larger version (10K):
[in this window]
[in a new window]
 
Fig. 2. Normalized weights, w'zi, in the discrete linear filter (see Eq. [22]). The position along the path relative to the nominal position of a yield datum is denoted by zi (m).

 
For any monitor yield datum, Y(x), the applied N rate can be found for positions x + zi, where zi = -15 m, -14 m,..., 7 m, with the exception of data nearer than 15 m from the entrance of the combine into a pass through experimental plots or closer than 7 m to the exit. For any given value of the parameter R of the response function, it is therefore possible to compute a discrete approximation to the filtered transformed N rate:

[23]

This was done for all of the yield monitor data setting R = 0.900, 0.901,..., 0.999. The correlation over all of the field between Y(x) and *n(x) was then computed for each of these values of R, and the value of R for which the correlation was strongest (largest absolute value) was selected for use in all further analyses.

The whole-field response function was estimated by generalized least squares. Because there was not evidence of lodging in the crop, we used the simple exponential response function in Eq. [1]. To assess the impact of the convolution process, the whole-field response function was also estimated, by generalized least squares, from the yield monitor data and the corresponding N rate at each position, n(x); the R parameter was estimated separately (i.e., it was not set to the overall value obtained with the method in the previous paragraph).

Local exponential response functions were estimated at nodes of a square grid, interval of 10 m, using the nearest 90 yield monitor data and filtered transformed N values. On average, these data were within 34 m of the grid node. An exponential variogram model was used for the error variance in the ML estimation because this model gave the best fit in the generalized least-squares estimation for the whole-field response function.

The standard deviation, sb, of the local b parameters over the whole grid was calculated, and the bootstrapping method was applied, with 100 iterations of the simulation and estimation, to obtain an empirical distribution function for sb under the null hypothesis of a uniform underlying response function.


    RESULTS AND DISCUSSION
 TOP
 ABSTRACT
 INTRODUCTION
 1. Experiments Assuming Scale...
 2. Experiments to Determine...
 3. Experiments to Model...
 METHOD
 CASE STUDY
 RESULTS AND DISCUSSION
 CONCLUSIONS
 APPENDIX 1
 APPENDIX 2
 REFERENCES
 
Figure 3 shows the correlation, {rho}, between the monitor yields Y(x) and *n(x) for different values of R. This was strongest for R = 0.988, which was used in all further analyses. The parameters of the whole-field response function obtained using the filtered transformed N rates and that obtained ignoring the convolution problem are presented in Table 2. Note that in both cases, the fitted value of b is significantly different from zero with large t ratios. The error variance is somewhat larger in the latter case. The R parameters of the model are the same although they were estimated independently. However, the b parameters differ between the models, and this is apparent in Fig. 4 where the models are plotted. The model fitted using the filtered transformed input values implies that the crop is more responsive to N, and as a result, the economic optimum rate is significantly larger. This indicates how the smoothing effect of grain flow through the combine can result in underestimation of the yield differences between treatments if it is not adequately accounted for by the data analysis.



View larger version (9K):
[in this window]
[in a new window]
 
Fig. 3. Correlation ({rho}) between monitor yields Y(x) and filtered transformed N rate *n(x) for different values of the response function parameter R (see Eq. [8]). Note that there are many data points that are not shown for neatness. The graph is formed by a straight line joining each point.

 

View this table:
[in this window]
[in a new window]
 
Table 2. Whole-field N response functions.

 


View larger version (11K):
[in this window]
[in a new window]
 
Fig. 4. Overall response function fitted using (bold line) monitor yields Y(x) and filtered transformed N rate *n(x) and (thin line) monitor yields Y(x) and nominal N rate n(x).

 
The estimated variogram for the residuals from the whole-field response function was fitted by an exponential model with a nugget of 0.4 t2 ha-2, a spatially structured variance of 1.3 t2 ha-2, and a distance parameter of 138 m. In Table 3 are statistics of the parameters of the variograms for residuals from the locally fitted response functions. Note that overall the residual from the locally fitted models has a smaller variance and the distance parameter is smaller than the corresponding variogram for the whole-field response function. This is what we would expect if there is spatial variation in the N response of the crop, variation which will contribute to substantial, spatially structured error about a whole-field response function.


View this table:
[in this window]
[in a new window]
 
Table 3. Statistics of local (exponential) variogram parameters.

 
The economic optimum N rate at each grid node was derived using Eq. [2]. These rates are plotted in Fig. 5 where a symbol represents the node on which it is centered. The optimum rates show considerable variation, with a maximum of 225 kg ha-1. Over about a quarter of the field, the optimum rate is less than 100 kg ha-1, and there are a few grid nodes where the economic optimum rate is zero. The variation in the optimum N rate is spatially structured with, for example, much of the area requiring 150 kg ha-1 or more occurring in a few discrete blocks. The average value of the correction factor for the optimum rate was only -4 kg ha-1, indicating that the uncertainty of these local estimates is small.



View larger version (65K):
[in this window]
[in a new window]
 
Fig. 5. Local optimum N rate at grid nodes across Bypass field. The coordinates are in meters and are relative to the datum of the United Kingdom Ordnance Survey.

 
The histogram in Fig. 6 shows the bootstrapping approximation to the frequency distribution of sb for this particular experiment under the null hypothesis that the whole-field response function applies at all nodes of the grid. The actual value of sb is also shown (0.69). This exceeds the maximum value obtained in the bootstrapping (0.46) by nearly twice the range of the bootstrapped values. On this evidence, we may confidently reject the null hypothesis and assume that the variation in Fig. 4 represents real underlying variation in the response of this crop to N.



View larger version (13K):
[in this window]
[in a new window]
 
Fig. 6. Histogram of bootstrapped estimates of the standard deviation of the b parameter, Sb, under the null hypothesis for grid nodes on Bypass field. The standard deviation of the b values fitted to the experimental data was 0.69; this value is indicated on the abscissa by the impulse symbol (arrow).

 
The goal of this paper was to present a novel method for experimentally measuring the variability of the requirements of the crop for an input. A method has been developed and applied, but further development is needed. Seven particular areas are important. The first is the analysis of such experiments for more than one season. It will be necessary to identify variation in input requirements that is temporally stable and so predictable and manageable. This will be particularly important if the method is to be adapted for use in on-farm experiments.

Second, other response models may be incorporated. Polynomial response functions such as the quadratic could easily be used and would have the advantage in this procedure of being linear in their parameters. One problem is how to select among different models. Because the local model fitting is done by ML, it is possible to compare different models for each neighborhood by computing the Akaike information criterion (Lark, 2001). This will identify the most parsimonious local model, i.e., the model that is no more complex than is justified by the data. However, making an overall model selection on the basis of locally fitted models (that are not independent) will require some statistical development.

Third, a wider range of experimental designs should be considered. In the reported experiments, randomized block designs were used, but it would be possible to use systematic designs for the treatment map, such as Pringle et al.'s (1999) chequer board. These designs might be simpler to apply in practice, particularly for on-farm experiments, but they have statistical problems. Experimental designs with fewer zero-rate plots will also be preferable in a commercial context, and further work is needed to find a design that gives the best balance between the quality of the resulting information and the opportunity costs of suboptimal fertilizer applications.

Fourth, the bootstrapping method for generating a sample distribution of sb could be developed. In particular, it would be interesting to explore ways to relax the assumption that errors of the whole-field model are stationary in the variance.

Fifth, the analytical method relies on the key assumption that the monitor yields are a convolution of true yields on the ground. This has some support, but further work is needed to develop a more general method that might be based on the mechanistic modeling of Maertens et al. (2001).

Sixth, the scientific method proposed here could be developed into a procedure for on-farm experiments to provide information on variable response for use in management. The key development needed for this is an economic assessment of the value of the information that is generated and, in particular, the value of improved precision of the local estimates of response parameters. It is also necessary to assess the costs of the procedure, in particular the costs arising from suboptimal application of inputs to zero or non-zero plots. It will then be possible to design experiments in which the costs and benefits to the farmer are optimized.

Finally, if this method is to be widely used, particularly on farms, then it will be necessary to account for error in the application of inputs. There are two aspects to this. First, there will be a systematic error due to the dynamics of the control system, which cannot change application rates instantaneously. Usually the control characteristics of the applicator should have been quantified by the manufacturer, but they can be described experimentally using tracers. This information could then be used to transform the nominal treatment map to a corrected map that better reflects the rate of application across the field. Second, there will be unpredictable errors, perhaps better characterized as random. These should be characterized (again, perhaps, by experiments using tracers), and this information could then be incorporated into the procedure for fitting local and whole-field response functions.

With a sound methodology, we may return to the biological and agronomic problems. What is most needed is a general understanding of how the spatial variation of various factors within fields determines the variation of the requirements of the crop for key inputs. The reported analysis using the exponential response function may help with this. The exponential response function posits a maximum yield, a, that assumes that the input of concern is not limiting and a reduction below this yield—described by the R and b parameters—due to the limiting effect of the input when it is applied at a certain rate. This implies that spatial variation of the a parameter across a field may indicate the spatial variation of limiting factors other than those of the input while variation in the b parameter indicates variation in factors of soil and site that determine how efficiently the crop is able to use the input. This hypothesis offers a systematic way of investigating the effects of a variety of soil and other factors on the use of an input by a crop, perhaps following Hurley et al. (2001) by modeling these parameters as functions of different soil variables.


    CONCLUSIONS
 TOP
 ABSTRACT
 INTRODUCTION
 1. Experiments Assuming Scale...
 2. Experiments to Determine...
 3. Experiments to Model...
 METHOD
 CASE STUDY
 RESULTS AND DISCUSSION
 CONCLUSIONS
 APPENDIX 1
 APPENDIX 2
 REFERENCES
 
To conclude, it has been shown how local response functions can be estimated from designed fertilizer experiments harvested using commercial yield mapping. Appropriate analysis is essential for this to account for the limitations inherent in the yield data. The results show that experiments in which the yield monitor data are analyzed directly with no account of the smoothing effect of the combine can seriously underestimate the responsiveness of the crop to inputs. It has also been shown how a null hypothesis that a whole-field response function can account for the observed variation may be tested. The methodology proposed here must now be applied in experiments aimed at increasing our understanding of why N response varies spatially and how the local response function might be predicted.