Stat 423 Section 02 Spring 2020 Name ______________________________________

Exam 3 (100 points) ID Number __________________________

Part I. Workout Problems. Show solution in support of your answers. Unsupported answers will not receive full

credit. (61 points)

1. A $2^{6-2}$ fractional factorial involving factors A, B, C, D, E, and F is to be run. Practitioners have these two sets of generators in mind:

Design 1 Generators: E=ABD and F=ACD

Design 2 Generators: E=ABCD and F=ABD

a. Consider Design 1. Which treatments in this experiment will have both factors A and B at their high (+)

levels? [6 pts]

b. Consider Design 1. Derive its defining relation and determine its resolution. [8 pts]

c. The defining relation for Design 2 is I=CEF=ABDF=ABCDE. Which design (1 or 2) is better? Explain briefly

and give at least one reason for your choice. [3 pts]
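The word arithmetic behind Problem 1 can be sketched in a few lines of Python. This is only an illustration under the usual convention that squared letters cancel when effect words are multiplied (so each generator such as E=ABD contributes a word such as ABDE). The helper names `multiply`, `defining_relation`, `resolution`, and `design1_runs_with_a_and_b_high` are hypothetical and do not come from the exam.

```python
from itertools import combinations, product


def multiply(word_a, word_b):
    """Multiply two effect words; any letter appearing twice cancels (AB * BC = AC)."""
    return "".join(sorted(set(word_a) ^ set(word_b)))


def defining_relation(generator_words):
    """All products of one or more generator words, shortest words first."""
    words = set()
    for size in range(1, len(generator_words) + 1):
        for combo in combinations(generator_words, size):
            word = ""
            for w in combo:
                word = multiply(word, w)
            words.add(word)
    return sorted(words, key=lambda w: (len(w), w))


def resolution(words):
    """Resolution = length of the shortest word in the defining relation."""
    return min(len(w) for w in words)


def design1_runs_with_a_and_b_high():
    """Design 1 runs (E=ABD, F=ACD) with A and B both at +1, as lowercase labels."""
    runs = []
    for c, d in product((-1, 1), repeat=2):
        a = b = 1
        levels = {"a": a, "b": b, "c": c, "d": d, "e": a * b * d, "f": a * c * d}
        runs.append("".join(name for name, lvl in levels.items() if lvl == 1))
    return runs


# E = ABD -> I = ABDE, F = ACD -> I = ACDF
design1 = defining_relation(["ABDE", "ACDF"])
# E = ABCD -> I = ABCDE, F = ABD -> I = ABDF
design2 = defining_relation(["ABCDE", "ABDF"])

print("Design 1:", design1, "resolution", resolution(design1))
print("Design 2:", design2, "resolution", resolution(design2))
print("A and B high:", design1_runs_with_a_and_b_high())
```

For Design 2 the sketch reproduces the words CEF, ABDF, and ABCDE quoted in part (c), which is a useful sanity check on the multiplication rule.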

2. A $2^{4-1}$ fractional factorial was conducted to study the effects of four factors on the bond strength of an integrated circuit mounted on metallized glass substrate. The four factors (and their levels) that engineers identified as potentially important determiners of bond strength are listed in the table below.

Factor Levels

A – Adhesive Type D2A (−) vs. H-1-E (+)

B – Conductor Material Copper (−) vs. Nickel (+)

C – Cure Time at 90°C 90 min (−) vs. 120 min (+)

D – Deposition Material Tin (−) vs. Silver (+)

Let �& = main effect of A, �’= main effect of B, �( = main effect of C, �) = main effect of D, and � = interaction

effect. Summary statistics and the results of the Yates algorithm for computing fitted effects are given below.

                                                                Yates Algorithm
Treatment   Replication   Sample Variance   Sample Mean   Cycle 1   Cycle 2   Cycle 3   Fitted Effect
                          $s^2$             $\bar{y}$
(1)         5             2.452             73.48         157.36    314.54    650.84    81.355
ad          5             4.233             83.88         157.18    336.30    7.84      0.980
bd          5             0.647             81.58         166.60    4.42      2.92      0.365
ab          5             26.711            75.60         169.70    3.42      2.08      0.260
cd          5             0.503             87.06         10.40     −0.18     21.76     2.720
ac          5             8.562             79.54         −5.98     3.10      −1.00     −0.125
bc          5             1.982             79.38         −7.52     −16.38    3.28      0.410
abcd        5             3.977             90.32         10.94     18.46     34.84     4.355

a. The replications and the sample variances of the 8 treatment combinations are given in the 2nd and 3rd columns, respectively, of the table above. Compute �(0.05) for judging whether a fitted effect is statistically significant at the $\alpha = 0.05$ level. Note that the sum of the variances is 49.067. [8 pts]

b. The generator and defining relation were D=ABC and I=ABCD, respectively. If you have no answer in (a), use

�(�. ��) = �. ���.

i. Based on your answer in (a), is the fitted effect 0.980 statistically significant? [2 pts]

Select one: NO YES

ii. What sum of effects does the fitted effect 0.980 estimate? Your answer should be a sum of

subscripted/superscripted Greek letters (e.g., �# + �##

+,). [4 pts]
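One way to approach the cutoff in part (a) is sketched below. It assumes the common two-sided margin for a fitted effect in a replicated two-level study, $t \cdot s_P \cdot \frac{1}{2^{p-q}} \sqrt{\sum 1/n_i}$, with $s_P$ the pooled standard deviation. Both the formula choice and the table value $t_{0.975,\,32} \approx 2.037$ are assumptions not stated in the exam, and the function names are hypothetical.

```python
import math


def pooled_variance(ns, variances):
    """Pooled sample variance: weighted average of cell variances by (n_i - 1)."""
    numerator = sum((n - 1) * v for n, v in zip(ns, variances))
    denominator = sum(n - 1 for n in ns)
    return numerator / denominator


def effect_margin(ns, variances, t_quantile):
    """Margin t * s_P * (1 / number of cells) * sqrt(sum of 1/n_i) for a fitted effect."""
    s_p = math.sqrt(pooled_variance(ns, variances))
    return t_quantile * s_p * math.sqrt(sum(1 / n for n in ns)) / len(ns)


ns = [5] * 8  # five replications in each of the 8 treatment combinations
variances = [2.452, 4.233, 0.647, 26.711, 0.503, 8.562, 1.982, 3.977]

# Assumed t-table value: upper 2.5% point with sum(n_i - 1) = 32 degrees of freedom.
T_975_32DF = 2.037

margin = effect_margin(ns, variances, T_975_32DF)
print(f"pooled variance = {pooled_variance(ns, variances):.4f}")
print(f"margin = {margin:.4f}")
print("0.980 exceeds margin:", abs(0.980) > margin)
```

With equal replication the pooled variance is simply the average of the eight sample variances, which is why the exam supplies their sum.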

3. The diameter $x$ of a tree at breast height (in cm, relatively easy to measure) is used to predict the height $y$ of a tree (in m, difficult to measure). Summary data on $n = 36$ white spruce trees (in British Columbia) are given below.

$\sum x = 655.1$, $\sum x^2 = 12711.47$, $\sum y = 644.7$, $\sum y^2 = 11824.45$,

$\sum xy = 12112.34$, $S_{xx} = 790.4697$, $SST = S_{yy} = 278.9475$, $\bar{x} = 18.1972$, $\bar{y} = 17.9083$.

a. Do some calculations to show that the least-squares line is $\hat{y} = 9.1468 + 0.4815x$. [10 pts]

b. Compute the sample correlation $r$ between $x$ and $y$. Give a quick interpretation. [6 pts]

Interpretation:

c. Construct an interval with 95% confidence for the height of a new spruce tree with a breast height diameter $x = 19$ cm. Plug in numbers in a formula and do not simplify. Use $n = 36$, $\bar{x} = 18.1972$, $S_{xx} = 790.4697$, $s^2 = MSE = 2.815$. [8 pts]

Problem 3 (continued).

d. A scatterplot of the data and $SSE$ values for the linear and quadratic model fits are given below. Also, the total sum of squares for either model is $SST = 278.9475$. Which of the two models provides a better description of the data? Explain briefly. In your explanation, use both graphical AND numeric results. [6 pts]
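The summary sums in Problem 3 are enough to check the quoted quantities numerically. The sketch below uses the standard shortcut formulas $S_{xx} = \sum x^2 - (\sum x)^2/n$, $S_{yy} = \sum y^2 - (\sum y)^2/n$, and $S_{xy} = \sum xy - (\sum x)(\sum y)/n$. The variable names are illustrative only.

```python
import math

# Summary data for the n = 36 white spruce trees.
n = 36
sum_x, sum_x2 = 655.1, 12711.47
sum_y, sum_y2 = 644.7, 11824.45
sum_xy = 12112.34

# Corrected sums of squares and cross products.
s_xx = sum_x2 - sum_x**2 / n
s_yy = sum_y2 - sum_y**2 / n  # total sum of squares
s_xy = sum_xy - sum_x * sum_y / n

# Least-squares slope and intercept.
slope = s_xy / s_xx
intercept = sum_y / n - slope * sum_x / n

# Sample correlation, and error sum of squares for the straight-line fit.
r = s_xy / math.sqrt(s_xx * s_yy)
sse_linear = s_yy - slope * s_xy
mse = sse_linear / (n - 2)

print(f"S_xx = {s_xx:.4f}, S_yy = {s_yy:.4f}, S_xy = {s_xy:.4f}")
print(f"y-hat = {intercept:.4f} + {slope:.4f} x")
print(f"r = {r:.4f}, SSE = {sse_linear:.3f}, MSE = {mse:.3f}")
```

The computed error sum of squares and mean square agree with the values the exam quotes for the linear model (95.703 and 2.815).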

Part II. Multiple Choice. Circle the letter of the correct/best answer. (39 points)

1. Which of the following statements is NOT true?

A. The simple linear regression model is $y = \beta_0 + \beta_1 x + \varepsilon$ where $\varepsilon$ is a random variable that is normally distributed with mean 0 and variance $\sigma^2$.

B. In simple linear regression, the independent variable $x$ is also referred to as the predictor or explanatory variable.

C. The goal of least-squares regression is to find the curve that maximizes the sum of the squared distances

between the curve and the data points.

D. A first step in a regression analysis involving two variables is to construct a scatter plot.

2. In fitting $y = \beta_0 + \beta_1 x + \varepsilon$ through data, (1.7, 2.5) is a 90% confidence interval for $\beta_1$. What is a 90% confidence interval for the mean change in $y$ when we reduce $x$ by 0.65?

A. (−1.625, −1.105)

B. (1.05, 1.85)

C. (1.105, 1.625)

D. (2.35, 3.15)

3. Which of the following is/are TRUE about the correlation coefficient $r$ between $x$ and $y$?

A. For the simple linear regression, $100\% \times r^2 = R^2$ where $R^2$ is the coefficient of determination (in %).

B. A correlation of $r = -0.87$ is weaker than a correlation of $r = 0.25$.

C. The correlation $r$ is a measure of the strength of the linear relationship between $x$ and $y$.

D. If $r = -0.1$, and we convert $x$ (in inches) to centimeters (1 in = 2.54 cm), then the correlation becomes $2.54 \times (-0.1) = -0.254$.

E. Both (A) and (C).

For Part I, Problem 3(d):

Model                                                       SSE
$y = \beta_0 + \beta_1 x + \varepsilon$                     95.703
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$       63.007

[Figure: scatterplot of Height y versus Breast-Height Diameter x]

4. Is $y = \beta_0 \cdot \beta_1^{x}$ intrinsically linear? If yes, what is the appropriate transformation to obtain a linear model?

Recall: $\log(ab) = \log(a) + \log(b)$, $\log(a^b) = b \cdot \log(a)$

A. No.

B. Yes, $\log(y) = \log(\beta_0) + \log(\beta_1) \cdot x$

C. Yes, $\log(y) = \log(\beta_0) + \beta_1 \cdot \log(x)$

D. Yes, $\log(y) = \log(\beta_0) + \beta_1 \cdot x$

For Problems 5 to 8: A study investigated the effects of $x_1$ = Seal Temperature, $x_2$ = Cooling Bar Temperature, and $x_3$ = % Polyethylene Additive on the seal strength $y$. The three models in the first column of the table below were fit to the data.

There were $n = 20$ observations, and the total sum of squares (for all 3 models) is $SST = 82.17$ (total df = 19).

5. What is $SSE$ for Model (1)?

A. 30.96

B. 51.21

C. 21.36

D. 60.81

6. What is $R^2_{adj}$ for Model (2)?

A. 49.42%

B. 76.66%

C. 23.34%

D. 84.03%

7. What is the F statistic for testing $H_0: \{\beta_1 = \beta_2 = \cdots = \beta_9 = 0\}$ versus $H_a: \{H_0 \text{ is false}\}$ with Model (3)?

A. 6.59

B. 9.69

C. 3.23

D. 5.36

8. In the fit of Model (2), we get $\hat{\beta}_8 = -0.5$ with standard error $s_{\hat{\beta}_8} = 0.3552$ and find that the P-value is 0.1827 for testing $H_0: \beta_8 = 0$ versus $H_a: \beta_8 \neq 0$. What are the $t$ test statistic and conclusion at the $\alpha = 0.10$ significance level?

A. $t = -1.41$. There is NO significant interaction between $x_1$ and $x_3$.

B. $t = 1.41$. The predictor $x_8$ has NO significant effect on the response $y$.

C. $t = -0.84$. There is NO significant interaction between $x_1$ and $x_3$.

D. $t = -1.41$. There is significant interaction between $x_1$ and $x_3$.

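Model    $R^2$    $R^2_{adj}$    $SSE$

(1) $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$    37.68%    25.99%    ?

(2) $y = \beta_0 + \beta_1 x_1 + \beta_3 x_3 + \beta_4 x_1^2 + \beta_5 x_2^2 + \beta_6 x_3^2 + \beta_8 x_1 x_3 + \varepsilon$    84.03%    ?    13.1231

(3) $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_1^2 + \beta_5 x_2^2 + \beta_6 x_3^2 + \beta_7 x_1 x_2 + \beta_8 x_1 x_3 + \beta_9 x_2 x_3 + \varepsilon$    85.57%    72.58%    11.8593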
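The quantities asked for in Problems 5 to 8 all follow from standard regression identities: $SSE = (1 - R^2)\,SST$, $R^2_{adj} = 1 - \frac{SSE/(n-k-1)}{SST/(n-1)}$, $F = \frac{(SST - SSE)/k}{SSE/(n-k-1)}$, and $t = \hat{\beta}/s_{\hat{\beta}}$, where $k$ is the number of predictor terms. The sketch below applies them to the table entries. The function names are hypothetical, and the predictor counts (6 for Model (2), 9 for Model (3)) are read off the model equations.

```python
def sse_from_r2(r2, sst):
    """Error sum of squares implied by R^2 (as a proportion) and SST."""
    return (1 - r2) * sst


def adjusted_r2(sse, sst, n, k):
    """Adjusted R^2 for a model with k predictor terms fit to n observations."""
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


def overall_f(sse, sst, n, k):
    """F statistic for testing that all k slope coefficients are zero."""
    return ((sst - sse) / k) / (sse / (n - k - 1))


def t_statistic(estimate, std_error):
    """t statistic for testing that a single coefficient is zero."""
    return estimate / std_error


N, SST = 20, 82.17

sse_model1 = sse_from_r2(0.3768, SST)              # Model (1): R^2 = 37.68%
adj_r2_model2 = adjusted_r2(13.1231, SST, N, k=6)  # Model (2): 6 predictor terms
f_model3 = overall_f(11.8593, SST, N, k=9)         # Model (3): 9 predictor terms
t_beta8 = t_statistic(-0.5, 0.3552)                # x1*x3 coefficient in Model (2)

print(f"SSE, Model (1):         {sse_model1:.2f}")
print(f"Adjusted R^2, Model (2): {100 * adj_r2_model2:.2f}%")
print(f"Overall F, Model (3):   {f_model3:.2f}")
print(f"t for beta_8:           {t_beta8:.2f}")
```

As a consistency check, applying `adjusted_r2` to Model (3) gives about 72.58%, matching the table entry.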

9. Which of the following is not true about $2^{p-q}$ fractional factorial studies?

A. The loss of information and ambiguity (confounding) can be held to a minimum by careful planning and

wise analysis.

B. A loss of information is usually expected because we are unable to observe responses at all of the $2^p$ factor combinations.

C. If two effects are aliased or confounded together, it means that we can discuss their significance together

but not apart from each other.

D. None of the above.

10. A fitted multiple regression model is $\hat{y} = 10 - 4x_1 + 3x_2$. If $x_1$ is decreased by 2, while holding $x_2$ fixed, then we can expect $y$

A. to increase by 8

B. to decrease by 6

C. to increase by 6

D. to decrease by 8

E. remain the same

11. Suppose that the least-squares line is $\hat{y} = -2.12 + 15.75x$. If the $F$ test statistic for testing $H_0: \beta_1 = 0$ against $H_a: \beta_1 \neq 0$ is $F = 2.1$ (from the ANOVA table), what is the $t$ test statistic for testing the same hypotheses?

A. $t = 1.45$

B. $t = -4.41$

C. $t = -1.45$

D. $t = 4.41$

12. Which of the following statements is true?

A. Model 1 with more predictor terms may not necessarily be better than Model 2 with fewer predictor terms even though Model 1’s coefficient of multiple determination $R^2$ is larger.

B. To balance the cost of using more parameters against the gain in the coefficient of multiple determination $R^2$, many statisticians use $R^2_{adj}$ = {the adjusted $R^2$}.

C. An objective of regression analysis is to find a model that is simple (relatively few parameters) and provides

a good fit to the data.

D. All of the above.

13. A study investigated the effects of three explanatory variables $x_1$, $x_2$, and $x_3$ on the response $y$. The model $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$ provided a good $R^2$ value. Which of the following is NOT appropriate in assessing the (statistical) significance of the relationship between $x_3$ and $y$?

A. a $t$ test of $H_0: \beta_3 = 0$ versus $H_a: \beta_3 \neq 0$

B. a prediction interval

C. a confidence interval for $\beta_3$

D. the sample correlation between $x_3$ and $y$

E. a comparison of $R^2_{adj}$ values for $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$ and $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$