Problem Set 5

(due March 25, 2004)

  1. Extending Problem 1 from Homework Assignment 2, suppose that the five randomly selected families had the following assets and numbers of children, in addition to their annual incomes and savings (dollar amounts in thousands):
    Family   Income   Savings   Assets   Children
               I         S         W         N
    ---------------------------------------------
      A         8        .6        12        5
      B        11       1.2         6        3
      C         9       1.0         6        1
      D         6        .7         3        3
      E         6        .3        18        4
    
    1. Use Stata to estimate the regression of S on I and W.
    2. Use the matrix capability in Stata to confirm your estimates by obtaining:
      1. X'X (where X includes both income and assets)
      2. (X'X)^-1
      3. X'y
      4. b
    3. Recompute the regression of S on I alone; compare the coefficient for Income in the bivariate regression to the coefficient in the multiple regression.
    4. For a family with assets of $5 thousand and income of $8 thousand, what would you predict their savings to be?
    5. If a family had a $2 thousand increase in income, while assets remained constant, what would you expect to be the change in savings?
    6. If income increased by $1 thousand, and assets by $3 thousand, how much would you expect savings to change?
    7. What is the multiple correlation of S on I and W?
      1. How does it compare to the simple correlation of S on I?
      2. What proportion of variance is explained by:
        1. I alone?
        2. I and W?
        3. by the addition of W (after I)?
        4. left unexplained by W and I?
    8. Construct a 95% interval estimate for:
      1. b0
      2. b1
      3. b2
    9. Evaluate the following hypotheses:
      1. β1 = 0 (income does not affect savings)
      2. β2 = 0 (wealth does not affect savings)
      3. β1 = 0 and β2 = 0 simultaneously
      4. β1 = .10
      5. β1 = -5β2
    10. For a family with an income of $10 thousand and assets of $15 thousand, what are the point estimate and the 95% interval estimate ("prediction interval") for savings?
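
A minimal Stata sketch of the setup for parts 1 and 2 of Problem 1, offered as one possible approach rather than the required one. It enters the table above, runs the regression, and rebuilds the coefficients from the normal equations b = (X'X)^-1 X'y. The variable name `one` and the matrix names `X`, `y`, `XX`, `XXi`, `Xy`, and `b` are illustrative choices, not names given in the assignment.

```stata
* Enter the five families' data from the table above
clear
input str1 family I S W N
A  8  .6 12 5
B 11 1.2  6 3
C  9 1.0  6 1
D  6  .7  3 3
E  6  .3 18 4
end

* Part 1: regression of savings on income and assets
regress S I W

* Part 2: build X (with a leading column of ones for the constant) and y
generate byte one = 1
mkmat one I W, matrix(X)
mkmat S, matrix(y)

* Normal-equation pieces
matrix XX  = X'*X
matrix XXi = invsym(XX)
matrix Xy  = X'*y
matrix b   = XXi*Xy

* Display each piece; b is ordered (constant, I, W), whereas
* regress reports the constant last
matrix list XX
matrix list XXi
matrix list Xy
matrix list b
```

The entries of `b` should match the coefficients reported by `regress` up to rounding; a mismatch usually means the column of ones was left out of X.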


  2. Add number of children (N) to the analysis above.
    1. What is the change in the coefficient of determination?
    2. Compare the coefficient estimates for W and I in the new equation to those you obtained in Problem 1.


  3. A study of the salaries of several hundred professors at a large American university in 1969 yielded the following multiple regression equation (for the sake of brevity, several terms have been omitted):
                  S = 230B +   18A +    100E +   490D +   190Y +   50T + . . .

    standard error    (86)     (8)      (28)     (60)     (17)     (370)
    t ratio           (  )     (  )     (  )     (  )     (  )     (   )
    95% Int. Est.     (  )     (  )     (  )     (  )     (  )     (   )
    
         where
            S = the professor's annual salary (dollars)
            B = number of books written by the professor
            A = number of ordinary articles
            E = number of excellent articles
            D = number of Ph.D.'s supervised
            Y = number of years of experience
            T = teaching scores as measured by student
                evaluations (coded 0 for below the median,
                100 for above the median)
    
    1. Fill in the brackets below the equation.
    2. Answer true or false; where false, correct it; where debatable or unclear, clarify and support your own point of view:
      1. The coefficient of B is estimated to be 230. Other social scientists might collect other samples from the same population and calculate other estimates. The distribution of these estimates would be centered around the population value of 230 obtained from the original sample. Therefore, the estimator is called unbiased.
      2. If there were no prior reason to believe that T affects S, it is reasonable to accept the null hypothesis that its coefficient is zero, thereby dropping it from the equation. This will, fortunately, make the equation briefer.
      3. Repeat (ii), substituting Y for T.
    3. For someone who knows no statistics, briefly summarize the influences on professors' incomes, indicating where strong evidence exists and where it does not.
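
As a reminder of the arithmetic behind the brackets in Problem 3, the standard relations can be written as follows. The numbers in the last line are hypothetical (a coefficient of 10 with a standard error of 4) and are not taken from the salary equation; the critical value 1.96 assumes the normal approximation, which is reasonable with several hundred observations.

```latex
% t ratio for coefficient j
t_j = \frac{b_j}{\mathrm{SE}(b_j)}

% 95% interval estimate for the population coefficient
\beta_j = b_j \pm t_{.025}\,\mathrm{SE}(b_j), \qquad t_{.025} \approx 1.96

% Hypothetical illustration: b = 10, SE = 4
t = \frac{10}{4} = 2.5, \qquad \beta = 10 \pm 1.96(4) = 10 \pm 7.84
```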


  4. Identify a simple regression problem using the data set that will form the basis of your term paper.
    1. Carry out a simple (bivariate) regression.
    2. Examine the results for the following problems:
      1. Nonlinearity
      2. Outliers
      3. Heteroscedasticity
      4. Nonnormality
    3. For any of the problems that you detect, discuss what you might do to correct for them.
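
The checks listed in Problem 4 might be sketched in Stata along the following lines. This is only an illustration: it uses Stata's bundled `auto` data, with `price` and `weight` standing in for the term-paper variables, and the new variable names `rstud`, `lev`, and `res` are arbitrary. The cutoff of 2 for studentized residuals is a common rule of thumb, not a requirement of the assignment.

```stata
* Stand-in data: Stata's bundled auto dataset (not the term-paper data)
sysuse auto, clear

* Simple (bivariate) regression
regress price weight

* Nonlinearity: scatterplot with a lowess smooth over the raw data
twoway (scatter price weight) (lowess price weight)

* Outliers: studentized residuals and leverage for each observation
predict rstud, rstudent
predict lev, leverage
list make price weight rstud lev if abs(rstud) > 2

* Heteroscedasticity: residual-versus-fitted plot
rvfplot, yline(0)

* Nonnormality: normal quantile plot and Shapiro-Wilk test of the residuals
predict res, residuals
qnorm res
swilk res
```

Graphical checks are used here because they tend to be more informative than a single test statistic; a fan-shaped residual-versus-fitted plot, for example, suggests heteroscedasticity and may point toward a transformation of the dependent variable.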

 

Bert Kritzer, 608-263-2277, Kritzer@PoliSci.Wisc.Edu
Last modified, March 4, 2004