a Crop and Soil Sciences, Cornell Univ., Ithaca, NY 14853
b Dep. of Mathematics and Dep. of Statistical Science, Cornell Univ., Ithaca, NY 14853
* Corresponding author (gwf2{at}cornell.edu).
Received for publication February 12, 2002.
| ABSTRACT |
|---|
Abbreviations: ANOVA, analysis of variance; LC, lack of correlation; LCS, lack of positive correlation weighted by the standard deviations of the measurements and simulations; MSD, mean squared deviation; MSEP, mean squared error of prediction; NU, nonunity slope; SB, squared bias; SDSD, difference in the magnitudes of fluctuation between the measurements and simulations.
| INTRODUCTION |
|---|
Here the convention is to use model outputs as predictors for actual measurements, symbolized by X and Y, respectively. One simple statistic for assessing a model's merit is the correlation coefficient (r) between X and Y. Another common analysis is linear regression of Y on X to check whether the intercept (a) is near 0 and the slope (b) is near 1. Wallach and Goffinet (1989) observed that agricultural and ecological researchers use diverse statistical analyses for model evaluation, but many of these analyses fail to quantify directly the predictive accuracy of a model, even when that is the researchers' explicit objective. This confusion persists. For instance, see the 10 papers from a symposium on "Crop Modeling and Genomics" published recently in this journal (Agronomy Journal 95:4–113). That symposium illustrates the frequent use of correlation and regression for model evaluation.
However, Kobayashi and Salam (2000) present cogent reasons why the correlation coefficient and linear regression are not entirely satisfactory for model evaluation and suggest that MSD and its components are often more informative. Further developing those findings, a different partitioning of MSD components has the advantage of yielding distinct components with straightforward meanings.
| COMPONENTS OF MEAN SQUARED DEVIATION |
|---|
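The mean squared deviation (MSD) between N model outputs $X_n$ and the corresponding measurements $Y_n$ is

$$\mathrm{MSD} = \frac{1}{N}\sum_{n=1}^{N}(X_n - Y_n)^2 \qquad [1]$$

Let $\bar{X}$ and $\bar{Y}$ be the means. Also, let $x_n = X_n - \bar{X}$ and $y_n = Y_n - \bar{Y}$ be the deviations from the means.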
The partitioning of MSD suggested by Kobayashi and Salam (2000) has three components [also see Xie et al. (2001) and Ewert et al. (2002)]. Using their notation, the first component is SB, which arises from these two means being unequal,

$$\mathrm{SB} = (\bar{X} - \bar{Y})^2 \qquad [2]$$
The (population) standard deviation of the simulations, $SD_s$, is $(\sum x_n^2/N)^{0.5}$, and likewise the measurements have an $SD_m$ of $(\sum y_n^2/N)^{0.5}$. Accordingly, their second component, the difference in the magnitudes of fluctuation between the measurements and simulations (SDSD), arises from these two standard deviations being unequal,

$$\mathrm{SDSD} = (SD_s - SD_m)^2 \qquad [3]$$
Third and finally, there is lack of positive correlation weighted by the standard deviations (LCS),

$$\mathrm{LCS} = 2\,SD_s\,SD_m\,(1 - r) \qquad [4]$$
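As a check that these three components do sum to MSD, the following short derivation may help. It uses only the definitions already given, namely the deviations $x_n$ and $y_n$ from the means, the population standard deviations $SD_s$ and $SD_m$, and the correlation coefficient written as $r = \sum x_n y_n / (N\,SD_s\,SD_m)$.

$$
\begin{aligned}
\mathrm{MSD} &= \frac{1}{N}\sum_n (X_n - Y_n)^2
 = (\bar{X} - \bar{Y})^2 + \frac{1}{N}\sum_n (x_n - y_n)^2 \\
 &= \mathrm{SB} + SD_s^2 + SD_m^2 - 2\,r\,SD_s\,SD_m \\
 &= \mathrm{SB} + (SD_s - SD_m)^2 + 2\,SD_s\,SD_m\,(1 - r) \\
 &= \mathrm{SB} + \mathrm{SDSD} + \mathrm{LCS}.
\end{aligned}
$$

The cross term between $(\bar{X} - \bar{Y})$ and $(x_n - y_n)$ vanishes in the first line because the deviations sum to zero.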
However, Kobayashi and Salam (2000) note two problems with their MSD components. First, SDSD and LCS "are not entirely independent" since both involve SDs and SDm. Consequently, their meanings are confounded and unclear. Second, it is difficult to use regression parameters (a, b, and r) in combination with MSD parameters (SB, SDSD, and LCS) for understanding data sets because these parameters "are not explicitly related to each other." A third problem could also be mentioned, namely that their MSD components have no interpretation in terms of ANOVA (except, of course, for SB). Accordingly, a better partitioning of MSD was sought.
We retain SB since it is standard fare in statistics and has an entirely clear meaning. Incidentally, as a special case, note that if the slope b = 1, then SB > 0 if and only if the intercept $a \neq 0$.
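Our second component is the mean square for NU,

$$\mathrm{NU} = (1 - b)^2 \times \frac{1}{N}\sum_n x_n^2 \qquad [5]$$

where b is the slope of the least-squares regression of Y on X, namely $b = \sum x_n y_n / \sum x_n^2$. Obviously, NU > 0 if and only if $b \neq 1$.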
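And our third component is the mean square for LC,

$$\mathrm{LC} = (1 - r^2) \times \frac{1}{N}\sum_n y_n^2 \qquad [6]$$

where $r^2$ is the square of the correlation, namely $r^2 = (\sum x_n y_n)^2 / (\sum x_n^2 \sum y_n^2)$. Obviously, LC > 0 if and only if $r^2 \neq 1$.
| INTERPRETATION OF COMPONENTS |
|---|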
These components have a simple geometric interpretation that reinforces the meanings of their names, as shown in Fig. 1. The first panel shows the relationship of perfect equality, Y = X, for which MSD = 0. Departure from perfect equality is possible in exactly three ways as considered next: translation, rotation, and scatter.

The second panel (Fig. 1) shows translation, SB resulting from $\bar{X} \neq \bar{Y}$ (which in this special case with b = 1 emerges from $a \neq 0$), for which SB = 1. The third panel (Fig. 1) shows rotation, NU resulting from $b \neq 1$, for which NU = 2. And finally, the fourth panel (Fig. 1) shows scatter, LC resulting from $r^2 \neq 1$ because of errors $\varepsilon = (1, -1, 0)$, for which LC = 0.6667.
The resulting MSD for any combination of these three problems is simply additive. The model values are always the same, X = (-1, -1, 2). Combining NU and LC gives the data $Y = 2X + \varepsilon$ = (-1, -3, 4), resulting in MSD = 2.6667 with components NU = 2 and LC = 0.6667. Likewise, combining all three problems gives $Y = 1 + 2X + \varepsilon$ = (0, -2, 5), resulting in MSD = 3.6667 with the expected three components.
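The additivity can be verified algebraically in a few lines. The sketch below restates the needed definitions so that it stands alone: $b = \sum x_n y_n / \sum x_n^2$ is the regression slope, $r^2 = (\sum x_n y_n)^2 / (\sum x_n^2 \sum y_n^2)$ is the squared correlation, and consequently $r^2 \sum y_n^2 = b^2 \sum x_n^2$.

$$
\begin{aligned}
\mathrm{NU} + \mathrm{LC}
 &= \frac{(1 - b)^2 \sum x_n^2}{N} + \frac{(1 - r^2) \sum y_n^2}{N} \\
 &= \frac{\sum x_n^2 - 2b\sum x_n^2 + b^2 \sum x_n^2}{N} + \frac{\sum y_n^2 - b^2 \sum x_n^2}{N} \\
 &= \frac{\sum x_n^2 - 2\sum x_n y_n + \sum y_n^2}{N}
 = \frac{1}{N}\sum_n (x_n - y_n)^2 .
\end{aligned}
$$

Since $\mathrm{MSD} = (\bar{X} - \bar{Y})^2 + \frac{1}{N}\sum_n (x_n - y_n)^2$, it follows that MSD = SB + NU + LC. For the last example above, this is 1 + 2 + 0.6667 = 3.6667.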
Incidentally, MSD itself also has a geometric interpretation. It is the mean square of the deviations around the 1:1 line in a plot of model predictions X against measured values Y, these deviations being reckoned as usual in the direction of the ordinate Y (although reckoning instead in the direction of the abscissa X necessarily gives identical deviations and the same result for MSD). And the square root of MSD is the (population) standard deviation of these deviations around the 1:1 line. For comparison, the root mean square error reported so commonly for regression analyses is the analogous (sample) standard deviation around the regression line (rather than the 1:1 line), having deviations reckoned in the direction of the ordinate for the usual regression of Y on X.
Because they are distinct and additive, our MSD components have simple and clear meanings. For comparison, the SB component is identical in the Kobayashi and Salam (2000) partitioning of MSD and in our partitioning, their SDSD component is roughly similar to our NU, and their LCS is roughly similar to our LC. But their SDSD and LCS are difficult to interpret. For example, for the fourth panel in Fig. 1 with LC = 0.6667, the Kobayashi and Salam (2000) analysis splits this simple LC into two components, SDSD = 0.0479 and LCS = 0.6188, so neither of these values has any obvious meaning. Likewise, when NU = 2 and LC = 0.6667 are combined as described in a previous paragraph, another awkward split occurs with SDSD = 2.3400 and LCS = 0.3267, which again has no transparent interpretation.
Another difference between the Kobayashi and Salam (2000) components of MSD and ours is that their components are unchanged by reversing X and Y, whereas ours do change. For example, consider reversing the axes in the last panel of Fig. 1, resulting in model X = (0, -2, 2) and data Y = (-1, -1, 2). Unlike the original regression with its slope of 1, this regression has a slope of b = 0.75, and since $b \neq 1$, consequently NU > 0, namely 0.1667. Furthermore, this new regression lies closer to its data points, resulting in the smaller LC = 0.5000. But given the symmetry in X and Y in Eq. [2], obviously SB is unchanged, namely 0. By contrast, all three of the Kobayashi and Salam (2000) components are symmetric and consequently unchanged. Again, reversal of X and Y has changed the slope and has changed the deviations from the regression line, and yet SDSD and LCS fail to reflect these important changes. Whether X or Y is the model, and correspondingly whether Y or X is the data, ought to matter for MSD components (except SB).
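The numerical examples in this section can be reproduced with a few lines of code. The following is a minimal sketch in plain Python, not code from the original study. The function name `msd_components` and the dictionary keys are illustrative choices. The formulas are those of Eq. [1] through [6], with population (divide by N) standard deviations, and the sketch assumes that neither X nor Y is constant.

```python
from math import sqrt

def msd_components(x, y):
    """Return MSD and both partitions; x = model outputs, y = measurements.

    Hypothetical helper for illustration; assumes x and y are not constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of squares and cross-products of the deviations from the means.
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    b = sxy / sxx                               # slope of the regression of y on x
    r = sxy / sqrt(sxx * syy)                   # correlation coefficient
    sd_s, sd_m = sqrt(sxx / n), sqrt(syy / n)   # population standard deviations
    return {
        "MSD": sum((u - v) ** 2 for u, v in zip(x, y)) / n,   # Eq. [1]
        "SB": (mean_x - mean_y) ** 2,                         # Eq. [2]
        "NU": (1 - b) ** 2 * sxx / n,                         # Eq. [5]
        "LC": (1 - r ** 2) * syy / n,                         # Eq. [6]
        "SDSD": (sd_s - sd_m) ** 2,                           # Eq. [3]
        "LCS": 2 * sd_s * sd_m * (1 - r),                     # Eq. [4]
    }

# Data sets quoted in the text (model values first, then data values).
cases = {
    "scatter (fourth panel)": ([-1, -1, 2], [0, -2, 2]),
    "rotation + scatter": ([-1, -1, 2], [-1, -3, 4]),
    "fourth panel, axes reversed": ([0, -2, 2], [-1, -1, 2]),
}
for label, (x, y) in cases.items():
    parts = msd_components(x, y)
    print(label, {name: round(value, 4) for name, value in parts.items()})
```

In each case SB + NU + LC and SB + SDSD + LCS both equal MSD up to floating-point rounding. Reversing the axes changes NU and LC but leaves SDSD and LCS as they were.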
Besides their regression and geometric interpretations, our components also have an ANOVA interpretation. Consider the linear regression
$$Y_n - X_n = a + (b - 1)X_n + \varepsilon_n \qquad [7]$$

where $\varepsilon_n$ is the error. The uncorrected total sum of squares, divided by N, is MSD. The type I or sequential sum of squares for the intercept, divided by N, is SB. The type I sum of squares for X, divided by N, is NU. And finally, the sum of squares for error, divided by N, is LC. Because of these facts, our partitioning of MSD has another nice property. Under the assumptions that $(X_n, \varepsilon_n)$ are independently and identically distributed and that $X_n$ and $\varepsilon_n$ are independent, it can be shown that SB, NU, and LC are statistically independent (whereas SDSD and LCS are not). These independencies can be used to construct various F tests and t tests.
| EXAMPLE OF WHEAT MODELS |
|---|
The winner among these five wheat yield models according to MSD is AFRCWHEAT2. Coincidentally, this model also has the smallest LC component of MSD (as well as the smallest NU, although not the smallest SB). This model also achieves the highest correlation (0.9707). And for the usual regression of Y on X, this model has the intercept closest to zero (0.3293), as well as the slope closest to one (1.0079). Hence, several considerations agree on AFRCWHEAT2 as being the best of these models for the seven environments used in this study.
Mean squared deviation components are best regarded as complements to regression parameters, rather than as replacements for them. Most pointedly, NU > 0 indicates that $b \neq 1$, but NU does not distinguish between b > 1 and b < 1, so knowing the slope provides important additional insight. This same verdict applies to the statistic that NU replaces, namely SDSD.
| DISCUSSION |
|---|
The central claim of Kobayashi and Salam (2000) is that MSD is better suited for model selection than are regression and correlation. A necessary precondition for this claim to be true is that MSD and r rank models differently, at least sometimes; otherwise there is no difference in practice. For the present example of five wheat yield models, however, recall that AFRCWHEAT2 has both the smallest MSD and the highest r, so MSD and regression approaches select the same winner. So, when would MSD and r give different rankings?
From Eq. [6], LC and r2 are related inversely, so model rankings by (low) LC and rankings by (high) r2 are identical. Accordingly, different winners will be selected if and only if one model has the lowest MSD but a different model has the lowest LC component of MSD (or equivalently, the highest r), which can happen if the lowest LC happens to be accompanied by relatively high SB or NU or both. Every difference and any potential superiority that MSD has over regression for purposes of model evaluation are due precisely to MSD considering not only LC but also SB and NU.
A special but important case arises when a model's problems with SB and NU are relatively easy to fix or reduce, but not problems with LC. Then MSD ranks the models well as regards their current merits, but LC (or its correlate r) gives a better ranking of the potential merits of the models after relatively easily fixed defects have been corrected. Unfortunately, failure to distinguish between current and potential merits could result in breezy dismissal of a promising model.
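A small constructed example may make the distinction concrete. The sketch below uses two hypothetical models and hypothetical measurements invented purely for illustration (they are not the wheat models). Model A is perfectly correlated with the data but biased, whereas model B is unbiased but scattered. The helper name `msd_and_r` is likewise an illustrative choice.

```python
from math import sqrt

def msd_and_r(x, y):
    """Return (MSD, r) for model outputs x and measurements y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    msd = sum((u - v) ** 2 for u, v in zip(x, y)) / n
    return msd, sxy / sqrt(sxx * syy)

# Hypothetical measurements and model outputs, for illustration only.
y = [1.0, 2.0, 3.0, 4.0]
models = {
    "A": [3.0, 4.0, 5.0, 6.0],   # constant bias of +2, perfect correlation
    "B": [1.5, 1.5, 3.5, 3.5],   # no bias, slope of 1, some scatter
}

scores = {name: msd_and_r(x, y) for name, x in models.items()}
best_by_msd = min(scores, key=lambda name: scores[name][0])
best_by_r = max(scores, key=lambda name: scores[name][1])
print(best_by_msd, best_by_r)   # prints "B A"
```

Here model A has MSD = 4, all of it SB, with r = 1, whereas model B has MSD = 0.25, all of it LC, with r of about 0.89. MSD prefers B on current merits, while r (equivalently LC) prefers A, whose only defect is an easily corrected bias.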
In the present context of the data Y having the special role of being the standard of comparison, MSD is related to the mean squared error of prediction (MSEP), which is a standard statistic for assessing predictive accuracy (Wallach and Goffinet, 1989). By definition, MSEP is the mean squared difference between model predictions and the true values, which can be estimated from available empirical data by means of proper calculations even though the true values are inaccessible theoretical quantities. Hence, MSEP reflects but one kind of imperfection, the model's, and it is clearly the single most informative statistic regarding a model's predictive accuracy. By contrast, MSD has two sources of discrepancies between model X and data Y, namely that X is not perfect, and neither is Y.
In the special but moderately frequent case that the errors in Y are small relative to those in X, MSD is a close surrogate for MSEP. Otherwise, MSD needs to be discounted for the discrepancies originating from the validation data Y to estimate the errors in the model X itself, MSEP (Gauch, 1992, p. 134–153, especially p. 140; Gauch, 2003, p. 303–312). In either case, the relationship between MSD and MSEP explains in part the inherent importance of MSD, especially when the data Y have relatively small errors. Even if the magnitude of the errors in the validation data cannot be quantified accurately (because of lack of replication) so that MSEP values cannot be calculated, and even though corresponding MSEP and MSD values will differ, nevertheless the rankings of models by MSD can be expected to be similar to those that would emerge from MSEP. In such situations, MSD values (or more precisely, MSD rankings) are still useful for selecting the most predictively accurate model.
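One simple way to see the relationship is the following sketch, which rests on assumptions not stated in the text. Suppose each measurement is $Y_n = T_n + e_n$, where $T_n$ is the (unobservable) true value and $e_n$ is a measurement error with mean zero and variance $\sigma_e^2$ that is independent of the model error $X_n - T_n$. The symbols $T_n$, $e_n$, and $\sigma_e^2$ are introduced here only for illustration.

$$
\begin{aligned}
E\big[(X_n - Y_n)^2\big]
 &= E\big[(X_n - T_n)^2\big] - 2\,E\big[(X_n - T_n)\,e_n\big] + E\big[e_n^2\big] \\
 &= E\big[(X_n - T_n)^2\big] + \sigma_e^2 .
\end{aligned}
$$

Under these assumptions the expected MSD exceeds MSEP by the error variance of the validation data, so MSD approaches MSEP as $\sigma_e^2$ becomes small. When all models are compared against the same data, the same $\sigma_e^2$ is added for every model, which is consistent with MSD rankings resembling MSEP rankings.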
There are interesting relationships between regression and MSD components (beyond the transparent connections already noted between NU and b and between LC and r). Consider the least-squares regression estimator for the dependent variable $Y_n$, which is ordinarily denoted by $\hat{Y}_n$, but here is more conveniently denoted by $Z_n$, namely $Z_n = a + bX_n$. The vector, Z, opens up two new possibilities for MSD calculations: Z could replace Y or else X.
First, for the MSD comparison of X and Z, replacement of Y by Z is equivalent geometrically to projecting points vertically onto the regression line. Accordingly, all scatter has been eliminated, so r2 = 1 and hence LC = 0. But SB and NU remain the same as they were for the original comparison of X and Y.
Second, for the MSD comparison of Z and Y, replacement of X by Z is equivalent geometrically to projecting points horizontally onto the 1:1 line of equality. The regression line is automatically the 1:1 line, so a = 0 and b = 1 and hence SB = 0 and NU = 0. But LC remains the same.
Note that these two replacements have clear and simple consequences for MSD components, which are exact opposites. The first replacement could be of special interest when the LC component of MSD is large, whereas the second could be of interest when the SB or NU component or both components are large.
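Both replacements can be checked numerically. The sketch below reuses the three-point example with X = (-1, -1, 2) and Y = (0, -2, 5) from earlier in the text. The function names `partition` and `regression_estimates` are illustrative, and the sketch assumes that the fitted slope is not zero, so that Z is not constant.

```python
def partition(x, y):
    """Return (SB, NU, LC) for model outputs x against data y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    syy = sum((v - mean_y) ** 2 for v in y)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    b = sxy / sxx                    # slope of the regression of y on x
    r2 = sxy ** 2 / (sxx * syy)      # squared correlation
    return (mean_x - mean_y) ** 2, (1 - b) ** 2 * sxx / n, (1 - r2) * syy / n

def regression_estimates(x, y):
    """Return Z_n = a + b * X_n from the least-squares regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return [a + b * u for u in x]

x = [-1.0, -1.0, 2.0]
y = [0.0, -2.0, 5.0]
z = regression_estimates(x, y)       # here a = 1 and b = 2

sb_xy, nu_xy, lc_xy = partition(x, y)   # original comparison of X and Y
sb_xz, nu_xz, lc_xz = partition(x, z)   # Y replaced by Z: scatter removed
sb_zy, nu_zy, lc_zy = partition(z, y)   # X replaced by Z: bias and slope fixed

print("X vs Y:", round(sb_xy, 4), round(nu_xy, 4), round(lc_xy, 4))
print("X vs Z:", round(sb_xz, 4), round(nu_xz, 4), round(lc_xz, 4))
print("Z vs Y:", round(sb_zy, 4), round(nu_zy, 4), round(lc_zy, 4))
```

For this example the original components are SB = 1, NU = 2, and LC = 0.6667. Replacing Y by Z keeps SB and NU and drives LC to zero, whereas replacing X by Z drives SB and NU to zero and keeps LC, up to floating-point rounding.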
The usual reason for interest in Z, related to this first replacement, is that regression estimates are often closer to the true values than are the imperfect data Y. Basically, accuracy and efficiency are gained because parsimonious models can reduce noise and because regressions are based on more data than just one individual datum (Gauch, 1993, 2003, p. 291–296; Gruber, 1998).
But this second replacement is also interesting. Applying the trivial transformation a + bXn to model outputs Xn can be regarded as a patch to the model that automatically eliminates the SB and NU components of MSD. For instance, for the five wheat yield models shown in Fig. 2, on average this simple patch reduces MSD to only 46% of the original values.
The real issue meriting more research, however, is not how much this patch helps for model outputs and data used in constructing this patch, but rather how much this patch would help for other (or new) model outputs and data not used in constructing this patch. Needless to say, merely patching a model and properly refining a model are two different matters. But sometimes a quick patch may have some utility, especially if a validation study has proven that it improves predictive accuracy. Furthermore, awareness of MSD components may assist and accelerate subsequent model refinement.
In conclusion, MSD and its components are useful for model evaluation, including assessing predictive accuracy (which implies special interest in the 1:1 line of equality). But MSD measures discrepancies between model predictions and validation observations, whereas the more direct measure of predictive accuracy is MSEP, which reflects discrepancies between model predictions and true values. Accordingly, MSD is more valuable when the errors in the validation data Y are rather small compared with the errors in the model predictions X, in which case MSD approaches MSEP. Regression parameters are also informative, particularly since the slope b is closely related to the NU component of MSD and the squared correlation r2 is closely related to LC, as Eq. [5] and [6] show.
| ACKNOWLEDGMENTS |
|---|
| REFERENCES |
|---|