Second Edition
John O. Rawlings
Sastry G. Pantula
David A. Dickey
CONTENTS
PREFACE vii
1 REVIEW OF SIMPLE REGRESSION 1
1.1 The Linear Model and Assumptions . . . . . . . . . . . . . 2
1.2 Least Squares Estimation . . . . . . . . . . . . . . . . . . . 3
1.3 Predicted Values and Residuals . . . . . . . . . . . . . . . . 6
1.4 Analysis of Variation in the Dependent Variable . . . . . . . 7
1.5 Precision of Estimates . . . . . . . . . . . . . . . . . . . . . 11
1.6 Tests of Significance and Confidence Intervals . . . . . . . . 16
1.7 Regression Through the Origin . . . . . . . . . . . . . . . . 21
1.8 Models with Several Independent Variables . . . . . . . . . 27
1.9 Violation of Assumptions . . . . . . . . . . . . . . . . . . . 28
1.10 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
1.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2 INTRODUCTION TO MATRICES 37
2.1 Basic Definitions . . . . . . . . . . . . . . . . . . . . . . . . 37
2.2 Special Types of Matrices . . . . . . . . . . . . . . . . . . . 39
2.3 Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . 40
2.4 Geometric Interpretations of Vectors . . . . . . . . . . . . . 46
2.5 Linear Equations and Solutions . . . . . . . . . . . . . . . . 50
2.6 Orthogonal Transformations and Projections . . . . . . . . 54
2.7 Eigenvalues and Eigenvectors . . . . . . . . . . . . . . . . . 57
2.8 Singular Value Decomposition . . . . . . . . . . . . . . . . . 60
2.9 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
2.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
3 MULTIPLE REGRESSION IN MATRIX NOTATION 75
3.1 The Model . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.2 The Normal Equations and Their Solution . . . . . . . . . . 78
3.3 The Ŷ and Residuals Vectors . . . . . . . . . . . . . . . . 80
3.4 Properties of Linear Functions of Random Vectors . . . . . 82
3.5 Properties of Regression Estimates . . . . . . . . . . . . . . 87
3.6 Summary of Matrix Formulae . . . . . . . . . . . . . . . . . 92
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
4 ANALYSIS OF VARIANCE AND QUADRATIC FORMS 101
4.1 Introduction to Quadratic Forms . . . . . . . . . . . . . . . 102
4.2 Analysis of Variance . . . . . . . . . . . . . . . . . . . . . . 107
4.3 Expectations of Quadratic Forms . . . . . . . . . . . . . . . 113
4.4 Distribution of Quadratic Forms . . . . . . . . . . . . . . . 115
4.5 General Form for Hypothesis Testing . . . . . . . . . . . . . 119
4.5.1 The General Linear Hypothesis . . . . . . . . . . . . 119
4.5.2 Special Cases of the General Form . . . . . . . . . . 121
4.5.3 A Numerical Example . . . . . . . . . . . . . . . . . 122
4.5.4 Computing Q from Differences in Sums of Squares . 126
4.5.5 The R-Notation to Label Sums of Squares . . . . . . 129
4.5.6 Example: Sequential and Partial Sums of Squares . . 133
4.6 Univariate and Joint Confidence Regions . . . . . . . . . . . 135
4.6.1 Univariate Confidence Intervals . . . . . . . . . . . . 135
4.6.2 Simultaneous Confidence Statements . . . . . . . . . 137
4.6.3 Joint Confidence Regions . . . . . . . . . . . . . . . 139
4.7 Estimation of Pure Error . . . . . . . . . . . . . . . . . . . 143
4.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
5 CASE STUDY: FIVE INDEPENDENT VARIABLES 161
5.1 Spartina Biomass Production in the Cape Fear Estuary . . 161
5.2 Regression Analysis for the Full Model . . . . . . . . . . . . 162
5.2.1 The Correlation Matrix . . . . . . . . . . . . . . . . 164
5.2.2 Multiple Regression Results: Full Model . . . . . . . 165
5.3 Simplifying the Model . . . . . . . . . . . . . . . . . . . . . 167
5.4 Results of the Final Model . . . . . . . . . . . . . . . . . . . 170
5.5 General Comments . . . . . . . . . . . . . . . . . . . . . . . 177
5.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
6 GEOMETRY OF LEAST SQUARES 183
6.1 Linear Model and Solution . . . . . . . . . . . . . . . . . . . 184
6.2 Sums of Squares and Degrees of Freedom . . . . . . . . . . 189
6.3 Reparameterization . . . . . . . . . . . . . . . . . . . . . . . 192
6.4 Sequential Regressions . . . . . . . . . . . . . . . . . . . . . 196
6.5 The Collinearity Problem . . . . . . . . . . . . . . . . . . . 197
6.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
7 MODEL DEVELOPMENT: VARIABLE SELECTION 205
7.1 Uses of the Regression Equation . . . . . . . . . . . . . . . 206
7.2 Effects of Variable Selection on Least Squares . . . . . . . . 208
7.3 All Possible Regressions . . . . . . . . . . . . . . . . . . . . 210
7.4 Stepwise Regression Methods . . . . . . . . . . . . . . . . . 213
7.5 Criteria for Choice of Subset Size . . . . . . . . . . . . . . . 220
7.5.1 Coefficient of Determination . . . . . . . . . . . . . . 220
7.5.2 Residual Mean Square . . . . . . . . . . . . . . . . . 222
7.5.3 Adjusted Coefficient of Determination . . . . . . . . 222
7.5.4 Mallows’ Cp Statistic . . . . . . . . . . . . . . . . . . 223
7.5.5 Information Criteria: AIC and SBC . . . . . . . . . 225
7.5.6 “Significance Levels” for Choice of Subset Size . . . 226
7.6 Model Validation . . . . . . . . . . . . . . . . . . . . . . . . 228
7.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
8 POLYNOMIAL REGRESSION 235
8.1 Polynomials in One Variable . . . . . . . . . . . . . . . . . . 236
8.2 Trigonometric Regression Models . . . . . . . . . . . . . . . 245
8.3 Response Curve Modeling . . . . . . . . . . . . . . . . . . . 249
8.3.1 Considerations in Specifying the Functional Form . . 249
8.3.2 Polynomial Response Models . . . . . . . . . . . . . 250
8.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 262
9 CLASS VARIABLES IN REGRESSION 269
9.1 Description of Class Variables . . . . . . . . . . . . . . . . . 270
9.2 The Model for One-Way Structured Data . . . . . . . . . . 271
9.3 Reparameterizing to Remove Singularities . . . . . . . . . . 273
9.3.1 Reparameterizing with the Means Model . . . . . . . 274
9.3.2 Reparameterization Motivated by Στi = 0 . . . . . 277
9.3.3 Reparameterization Motivated by τt =0 . . . . . . . 279
9.3.4 Reparameterization: A Numerical Example . . . . . 280
9.4 Generalized Inverse Approach . . . . . . . . . . . . . . . . . 282
9.5 The Model for Two-Way Classified Data . . . . . . . . . . . 284
9.6 Class Variables to Test Homogeneity of Regressions . . . . 288
9.7 Analysis of Covariance . . . . . . . . . . . . . . . . . . . . . 294
9.8 Numerical Examples . . . . . . . . . . . . . . . . . . . . . . 300
9.8.1 Analysis of Variance . . . . . . . . . . . . . . . . . . 301
9.8.2 Test of Homogeneity of Regression Coefficients . . . 306
9.8.3 Analysis of Covariance . . . . . . . . . . . . . . . . . 307
9.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 316
10 PROBLEM AREAS IN LEAST SQUARES 325
10.1 Nonnormality . . . . . . . . . . . . . . . . . . . . . . . . . . 326
10.2 Heterogeneous Variances . . . . . . . . . . . . . . . . . . . . 328
10.3 Correlated Errors . . . . . . . . . . . . . . . . . . . . . . . . 329
10.4 Influential Data Points and Outliers . . . . . . . . . . . . . 330
10.5 Model Inadequacies . . . . . . . . . . . . . . . . . . . . . . . 332
10.6 The Collinearity Problem . . . . . . . . . . . . . . . . . . . 333
10.7 Errors in the Independent Variables . . . . . . . . . . . . . 334
10.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
10.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
11 REGRESSION DIAGNOSTICS 341
11.1 Residuals Analysis . . . . . . . . . . . . . . . . . . . . . . . 342
11.1.1 Plot of e Versus Ŷ . . . . . . . . . . . . . . . . . . . 346
11.1.2 Plots of e Versus Xi . . . . . . . . . . . . . . . . . . 350
11.1.3 Plots of e Versus Time . . . . . . . . . . . . . . . . . 351
11.1.4 Plots of ei Versus ei−1 . . . . . . . . . . . . . . . . . 354
11.1.5 Normal Probability Plots . . . . . . . . . . . . . . . 356
11.1.6 Partial Regression Leverage Plots . . . . . . . . . . . 359
11.2 Influence Statistics . . . . . . . . . . . . . . . . . . . . . . . 361
11.2.1 Cook’s D . . . . . . . . . . . . . . . . . . . . . . . . 362
11.2.2 DFFITS . . . . . . . . . . . . . . . . . . . . . . . . . 363
11.2.3 DFBETAS . . . . . . . . . . . . . . . . . . . . . . . 364
11.2.4 COVRATIO . . . . . . . . . . . . . . . . . . . . . . 364
11.2.5 Summary of Influence Measures . . . . . . . . . . . . 367
11.3 Collinearity Diagnostics . . . . . . . . . . . . . . . . . . . . 369
11.3.1 Condition Number and Condition Index . . . . . . . 371
11.3.2 Variance Inflation Factor . . . . . . . . . . . . . . . 372
11.3.3 Variance Decomposition Proportions . . . . . . . . . 373
11.3.4 Summary of Collinearity Diagnostics . . . . . . . . . 377
11.4 Regression Diagnostics on the Linthurst Data . . . . . . . . 377
11.4.1 Plots of Residuals . . . . . . . . . . . . . . . . . . . 378
11.4.2 Influence Statistics . . . . . . . . . . . . . . . . . . . 388
11.4.3 Collinearity Diagnostics . . . . . . . . . . . . . . . . 391
11.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 392
12 TRANSFORMATION OF VARIABLES 397
12.1 Reasons for Making Transformations . . . . . . . . . . . . . 397
12.2 Transformations to Simplify Relationships . . . . . . . . . . 399
12.3 Transformations to Stabilize Variances . . . . . . . . . . . . 407
12.4 Transformations to Improve Normality . . . . . . . . . . . . 409
12.5 Generalized Least Squares . . . . . . . . . . . . . . . . . . . 411
12.5.1 Weighted Least Squares . . . . . . . . . . . . . . . . 414
12.5.2 Generalized Least Squares . . . . . . . . . . . . . . . 417
12.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 426
12.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
13 COLLINEARITY 433
13.1 Understanding the Structure of the X-Space . . . . . . . . . 435
13.2 Biased Regression . . . . . . . . . . . . . . . . . . . . . . . 443
13.2.1 Explanation . . . . . . . . . . . . . . . . . . . . . . . 443
13.2.2 Principal Component Regression . . . . . . . . . . . 446
13.3 General Comments on Collinearity . . . . . . . . . . . . . . 457
13.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
13.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459
14 CASE STUDY: COLLINEARITY PROBLEMS 463
14.1 The Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 463
14.2 Multiple Regression: Ordinary Least Squares . . . . . . . . 467
14.3 Analysis of the Correlational Structure . . . . . . . . . . . . 471
14.4 Principal Component Regression . . . . . . . . . . . . . . . 479
14.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . 482
14.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 483
15 MODELS NONLINEAR IN THE PARAMETERS 485
15.1 Examples of Nonlinear Models . . . . . . . . . . . . . . . . 486
15.2 Fitting Models Nonlinear in the Parameters . . . . . . . . . 494
15.3 Inference in Nonlinear Models . . . . . . . . . . . . . . . . . 498
15.4 Violation of Assumptions . . . . . . . . . . . . . . . . . . . 507
15.4.1 Heteroscedastic Errors . . . . . . . . . . . . . . . . . 507
15.4.2 Correlated Errors . . . . . . . . . . . . . . . . . . . . 509
15.5 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . 509
15.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 511
16 CASE STUDY: RESPONSE CURVE MODELING 515
16.1 The Ozone–Sulfur Dioxide Response Surface (1981) . . . . . 517
16.1.1 Polynomial Response Model . . . . . . . . . . . . . . 520
16.1.2 Nonlinear Weibull Response Model . . . . . . . . . . 524
16.2 Analysis of the Combined Soybean Data . . . . . . . . . . . 530
16.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 543
17 ANALYSIS OF UNBALANCED DATA 545
17.1 Sources of Imbalance . . . . . . . . . . . . . . . . . . . . . 546
17.2 Effects of Imbalance . . . . . . . . . . . . . . . . . . . . . . 547
17.3 Analysis of Cell Means . . . . . . . . . . . . . . . . . . . . . 549
17.4 Linear Models for Unbalanced Data . . . . . . . . . . . . . 553
17.4.1 Estimable Functions with Balanced Data . . . . . . 554
17.4.2 Estimable Functions with Unbalanced Data . . . . . 558
17.4.3 Least Squares Means . . . . . . . . . . . . . . . . . . 564
17.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
18 MIXED EFFECTS MODELS 573
18.1 Random Effects Models . . . . . . . . . . . . . . . . . . . . 574
18.2 Fixed and Random Effects . . . . . . . . . . . . . . . . . . . 579
18.3 Random Coefficient Regression Models . . . . . . . . . . . . 584
18.4 General Mixed Linear Models . . . . . . . . . . . . . . . . . 586
18.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
19 CASE STUDY: ANALYSIS OF UNBALANCED DATA 593
19.1 The Analysis of Variance . . . . . . . . . . . . . . . . . . . 596
19.2 Mean Square Expectations and Choice of Errors . . . . . . 607
19.3 Least Squares Means and Standard Errors . . . . . . . . . . 610
19.4 Mixed Model Analysis . . . . . . . . . . . . . . . . . . . . . 615
19.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
A APPENDIX TABLES 621
REFERENCES 635
AUTHOR INDEX 647
SUBJECT INDEX 650