Saturday, May 18, 2019

Econometrics Chapter Summaries Essay

2) Basic Ideas of Linear Regression: The Two-Variable Model

In this chapter we introduced some fundamental ideas of regression analysis. Starting with the key concept of the population regression function (PRF), we developed the concept of the linear PRF. This book is primarily concerned with linear PRFs, that is, regressions that are linear in the parameters regardless of whether or not they are linear in the variables. We then introduced the idea of the stochastic PRF and discussed in detail the nature and role of the stochastic error term u. A PRF is, of course, a theoretical or idealized construct because, in practice, all we have is a sample (or samples) from some population. This necessitated the discussion of the sample regression function (SRF). We then considered the question of how we actually go about obtaining the SRF. Here we discussed the popular method of ordinary least squares (OLS) and presented the appropriate formulas to estimate the parameters of the PRF. We illustrated the OLS method with a fully worked-out numerical example as well as with several practical examples. Our next task is to find out how good the SRF obtained by OLS is as an estimator of the true PRF. We undertake this important task in Chapter 3.

3) The Two-Variable Model: Hypothesis Testing

In Chapter 2 we showed how to estimate the parameters of the two-variable linear regression model. In this chapter we showed how the estimated model can be used for the purpose of drawing inferences about the true population regression function. Although the two-variable model is the simplest possible linear regression model, the ideas introduced in these two chapters are the foundation of the more involved multiple regression models that we will discuss in subsequent chapters.
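For concreteness, the two-variable OLS estimators discussed in the Chapter 2 summary can be computed by hand. This is a minimal sketch with made-up data, not an example from the text:

```python
# Two-variable OLS by the textbook formulas:
#   b2 = S_xy / S_xx  (slope),  b1 = ybar - b2 * xbar  (intercept),
# where S_xy and S_xx are sums of cross products and squares in deviation form.
# The data below are invented purely for illustration.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
s_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
s_xx = sum((xi - xbar) ** 2 for xi in x)
b2 = s_xy / s_xx           # slope estimate
b1 = ybar - b2 * xbar      # intercept estimate
print(round(b2, 3), round(b1, 3))  # → 1.99 0.05
```

For these five observations the fitted SRF is approximately Y = 0.05 + 1.99X; any statistics package reproduces the same two numbers from the same formulas.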
As we will see, in many ways the multiple regression model is a straightforward extension of the two-variable model.

4) Multiple Regression: Estimation and Hypothesis Testing

In this chapter we considered the simplest of the multiple regression models, namely, the three-variable linear regression model: one dependent variable and two explanatory variables. Although in many ways a straightforward extension of the two-variable linear regression model, the three-variable model introduced several new concepts, such as partial regression coefficients, adjusted and unadjusted multiple coefficient of determination, and multicollinearity. Insofar as estimation of the parameters of the multiple regression model is concerned, we still worked within the framework of the classical linear regression model and used the method of ordinary least squares (OLS). The OLS estimators of multiple regression, as in the two-variable model, possess several desirable statistical properties summed up in the Gauss-Markov property of best linear unbiased estimators (BLUE). With the assumption that the disturbance term follows the normal distribution with zero mean and constant variance σ², we saw that, as in the two-variable case, each estimated coefficient in the multiple regression follows the normal distribution with a mean equal to the true population value and the variances given by the formulas developed in the text. Unfortunately, in practice, σ² is not known and has to be estimated. The OLS estimator of this unknown variance is σ̂², the residual sum of squares divided by the degrees of freedom. But if we replace σ² by σ̂², then, as in the two-variable case, each estimated coefficient of the multiple regression follows the t distribution, not the normal distribution. The knowledge that each multiple regression coefficient follows the t distribution with d.f.
equal to (n - k), where k is the number of parameters estimated (including the intercept), means we can use the t distribution to test statistical hypotheses about each multiple regression coefficient individually. This can be done on the basis of either the t test of significance or the confidence interval based on the t distribution. In this respect, the multiple regression model does not differ much from the two-variable model, except that proper allowance must be made for the d.f., which now depend on the number of parameters estimated. However, when testing the hypothesis that all partial slope coefficients are simultaneously equal to zero, the individual t testing referred to earlier is of no help. Here we should use the analysis of variance (ANOVA) technique and the attendant F test. Incidentally, testing that all partial slope coefficients are simultaneously equal to zero is the same as testing that the multiple coefficient of determination R² is equal to zero. Therefore, the F test can also be used to test this latter but equivalent hypothesis. We also discussed the question of when to add a variable or a group of variables to a model, using either the t test or the F test. In this context we also discussed the method of restricted least squares.

5) Functional Forms of Regression Models

In this chapter we considered models that are linear in parameters, or that can be rendered as such with suitable transformation, but that are not necessarily linear in variables. There are a variety of such models, each having special applications. We considered five major types of nonlinear-in-variable but linear-in-parameter models, namely:

1. The log-linear model, in which both the dependent variable and the explanatory variable are in logarithmic form.
2. The log-lin, or growth, model, in which the dependent variable is logarithmic but the independent variable is linear.
3. The lin-log model, in which the dependent variable is linear but the independent variable is logarithmic.
4. The reciprocal model, in which the dependent variable is linear but the independent variable enters in reciprocal (inverse) form.
5. The polynomial model, in which the independent variable enters with various powers.

Of course, there is nothing that prevents us from combining the features of one or more of these models. Thus, we can have a multiple regression model in which the dependent variable is in log form and some of the X variables are also in log form, but some are in linear form. We studied the properties of these various models in terms of their relevance in applied research, their slope coefficients, and their elasticity coefficients. We also showed with several examples the situations in which the various models could be used. Needless to say, we will come across several more examples in the remainder of the text. In this chapter we also considered the regression-through-the-origin model and discussed some of its features. It cannot be overemphasized that in choosing among the competing models, the overriding objective should be the economic relevance of the various models and not merely the summary statistics, such as R². Model building requires a proper balance of theory, availability of the appropriate data, a good understanding of the statistical properties of the various models, and the elusive quality that is called practical judgment. Since the theory underlying a topic of interest is never perfect, there is no such thing as a perfect model. What we hope for is a reasonably good model that will balance all these criteria.
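A defining property of the log-linear (double-log) model is that its slope coefficient is the elasticity of Y with respect to X. A small sketch, using an invented exact power relationship Y = 3·X^0.8 so that the fitted slope recovers the elasticity 0.8 exactly:

```python
import math

# In the log-log model ln(Y) = b1 + b2*ln(X), the slope b2 is the elasticity
# of Y with respect to X. Data generated from Y = 3 * X**0.8 (invented), so
# the deviation-form OLS slope on the logs should come out at 0.8.
xs = [1.0, 2.0, 4.0, 8.0, 16.0]
ys = [3.0 * x ** 0.8 for x in xs]
lx = [math.log(x) for x in xs]
ly = [math.log(y) for y in ys]
n = len(lx)
mx = sum(lx) / n
my = sum(ly) / n
b2 = (sum((a - mx) * (b - my) for a, b in zip(lx, ly))
      / sum((a - mx) ** 2 for a in lx))
print(round(b2, 6))  # → 0.8
```

With real data the relationship is not exact, of course, but the interpretation is the same: a 1 percent change in X is associated with a b2 percent change in Y.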
Whatever model is chosen in practice, we have to pay careful attention to the units in which the dependent and independent variables are expressed, for the interpretation of regression coefficients may hinge upon units of measurement.

6) Dummy Variable Regression Models

In this chapter we showed how qualitative, or dummy, variables taking values of 1 and 0 can be introduced into regression models alongside quantitative variables. As the various examples in the chapter showed, dummy variables are essentially a data-classifying device in that they divide a sample into various subgroups based on qualities or attributes (sex, marital status, race, religion, etc.) and implicitly run individual regressions for each subgroup. Now if there are differences in the responses of the dependent variable to the variation in the quantitative variables in the various subgroups, they will be reflected in the differences in the intercepts or slope coefficients of the various subgroups, or both. Although it is a versatile tool, the dummy variable technique has to be handled carefully. First, if the regression model contains a constant term (as most models usually do), the number of dummy variables must be one less than the number of classifications of each qualitative variable. Second, the coefficient attached to the dummy variables must always be interpreted in relation to the control, or benchmark, group, that is, the group that is assigned the value of zero. Finally, if a model has several qualitative variables with several classes, introduction of dummy variables can consume a large number of degrees of freedom (d.f.). Therefore, we should weigh the number of dummy variables to be introduced into the model against the total number of observations in the sample. In this chapter we also discussed the possibility of committing a specification error, that is, of fitting the wrong model to the data.
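The benchmark-group interpretation is easiest to see in the simplest case: a regression on a single 0/1 dummy. OLS then reproduces the two group means, with the intercept equal to the benchmark (D = 0) mean and the dummy coefficient equal to the difference in means. The wage figures below are invented for illustration:

```python
# Regressing wage on a single dummy D (say, 1 = union member, 0 = not):
#   wage = b1 + b2*D + u
# The OLS intercept b1 is the mean wage of the D = 0 (benchmark) group,
# and b2 is the difference between the two group means. Made-up data.
wage = [10.0, 12.0, 11.0, 15.0, 16.0, 17.0]
d    = [0,    0,    0,    1,    1,    1]
n = len(d)
db = sum(d) / n
wb = sum(wage) / n
b2 = (sum((di - db) * (wi - wb) for di, wi in zip(d, wage))
      / sum((di - db) ** 2 for di in d))
b1 = wb - b2 * db
print(b1, b2)  # → 11.0 5.0
```

The benchmark group's mean wage is 11 and the D = 1 group's mean is 16, so the dummy coefficient of 5 is exactly the gap between the two groups.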
If intercepts as well as slopes are expected to differ among groups, we should build a model that incorporates both the differential intercept and slope dummies. In this case a model that introduces only the differential intercepts is likely to lead to a specification error. Of course, it is not always easy a priori to find out which is the true model. Thus, some amount of experimentation is required in a concrete study, especially in situations where theory does not provide much guidance. The topic of specification error is discussed further in Chapter 7. In this chapter we also briefly discussed the linear probability model (LPM), in which the dependent variable is itself binary. Although the LPM can be estimated by ordinary least squares (OLS), there are several problems with a routine application of OLS. Some of the problems can be resolved easily and some cannot. Therefore, alternative estimating procedures are needed. We mentioned two such alternatives, the logit and probit models, but we did not discuss them in view of the somewhat advanced nature of these models (but see Chapter 12).

7) Model Selection: Criteria and Tests

The major points discussed in this chapter can be summarized as follows:

1. The classical linear regression model assumes that the model used in empirical analysis is correctly specified.
2. The term "correct specification of a model" can mean several things, including:
   a. No theoretically relevant variable has been excluded from the model.
   b. No unnecessary or irrelevant variables are included in the model.
   c. The functional form of the model is correct.
   d. There are no errors of measurement.
3. If a theoretically relevant variable(s) has been excluded from the model, the coefficients of the variables retained in the model are generally biased as well as inconsistent, the error variance and the standard errors of the OLS estimators are biased, and, as a result, the conventional t and F tests are of questionable value.
4. Similar consequences ensue if we use the wrong functional form.
5. The consequences of including irrelevant variable(s) in the model are less serious, in that estimated coefficients still remain unbiased and consistent, the error variance and standard errors of the estimators are correctly estimated, and the conventional hypothesis-testing procedure is still valid. The major penalty we pay is that estimated standard errors tend to be relatively large, which means the parameters of the model are estimated rather imprecisely. As a result, confidence intervals tend to be somewhat wider.
6. In view of the potential seriousness of specification errors, in this chapter we considered several diagnostic tools to help us find out whether we have a specification error problem in any concrete situation. These tools include a graphical examination of the residuals and more formal tests, such as MWD and RESET.

Since the search for a theoretically correct model can be exasperating, in this chapter we also considered several practical criteria that we should keep in mind in this search, such as (1) parsimony, (2) identifiability, (3) goodness of fit, (4) theoretical consistency, and (5) predictive power. As Granger notes, in the ultimate analysis model building is probably both an art and a science. A sound knowledge of theoretical econometrics and the availability of an efficient computer program are not enough to ensure success.

8) Multicollinearity: What Happens If Explanatory Variables Are Correlated?

An important assumption of the classical linear regression model is that there are no exact linear relationships, or multicollinearity, among the explanatory variables. Although cases of exact multicollinearity are rare in practice, situations of near-exact or high multicollinearity occur frequently. In practice, therefore, the term multicollinearity refers to situations where two or more variables are highly linearly related. The consequences of multicollinearity are as follows.
In cases of perfect multicollinearity we cannot estimate the individual regression coefficients or their standard errors. In cases of high multicollinearity individual regression coefficients can be estimated and the OLS estimators retain their BLUE property. But the standard errors of one or more coefficients tend to be large in relation to their coefficient values, thereby reducing t values. As a result, based on estimated t values, we may conclude that the coefficient with the low t value is not statistically different from zero. In other words, we cannot assess the marginal or individual contribution of the variable whose t value is low. Recall that in a multiple regression the slope coefficient of an X variable is the partial regression coefficient, which measures the (marginal or individual) effect of that variable on the dependent variable, holding all other X variables constant. However, if the objective of a study is to estimate a group of coefficients fairly accurately, this can be done so long as collinearity is not perfect. In this chapter we considered several methods of detecting multicollinearity, pointing out their pros and cons. We also discussed the various remedies that have been proposed to solve the problem of multicollinearity and noted their strengths and weaknesses. Since multicollinearity is a feature of a given sample, we cannot foretell which method of detecting multicollinearity or which remedial measure will work in any given concrete situation.

9) Heteroscedasticity: What Happens If the Error Variance Is Nonconstant?

A critical assumption of the classical linear regression model is that the disturbances u_i all have the same (i.e., homoscedastic) variance. If this assumption is not satisfied, we have heteroscedasticity. Heteroscedasticity does not destroy the unbiasedness property of the OLS estimators, but these estimators are no longer efficient. In other words, the OLS estimators are no longer BLUE.
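One common way to quantify the standard-error inflation that collinearity causes, beyond the detection methods mentioned in the multicollinearity summary, is the variance-inflation factor, VIF = 1/(1 - R_j²), where R_j² comes from regressing one explanatory variable on the other(s). This is a generic sketch with invented data, not an example from the text:

```python
# VIF for one regressor in a two-regressor model: regress X2 on X3, take the
# R-squared of that auxiliary regression, and compute VIF = 1 / (1 - R^2).
# A VIF well above 10 is a common (rough) warning sign of troublesome
# collinearity. The data are invented so that X3 closely tracks X2.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x3 = [1.1, 2.0, 2.9, 4.2, 5.0]
n = len(x2)
m2 = sum(x2) / n
m3 = sum(x3) / n
s23 = sum((a - m2) * (b - m3) for a, b in zip(x2, x3))
s22 = sum((a - m2) ** 2 for a in x2)
s33 = sum((b - m3) ** 2 for b in x3)
r2 = s23 ** 2 / (s22 * s33)   # R-squared of the auxiliary regression
vif = 1.0 / (1.0 - r2)
print(round(r2, 4), round(vif, 1))  # → 0.9948 193.3
```

Here the two regressors are almost perfectly collinear, so the variance of either slope estimate is inflated by a factor of roughly 193 relative to the no-collinearity case.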
If the heteroscedastic variances σ_i² are known, then the method of weighted least squares (WLS) provides BLUE estimators. Despite heteroscedasticity, if we continue to use the usual OLS method not only to estimate the parameters (which remain unbiased) but also to establish confidence intervals and test hypotheses, we are likely to draw misleading conclusions, as in the NYSE Example 9.8. This is because the estimated standard errors are likely to be biased, and therefore the resulting t ratios are likely to be biased, too. Thus, it is important to find out whether we are faced with the heteroscedasticity problem in a specific application. There are several diagnostic tests of heteroscedasticity, such as plotting the estimated residuals against one or more of the explanatory variables, the Park test, the Glejser test, or the rank correlation test (see Problem 9.13). If one or more diagnostic tests reveal that we have the heteroscedasticity problem, remedial measures are called for. If the true error variance σ_i² is known, we can use the method of WLS to obtain BLUE estimators. Unfortunately, knowledge of the true error variance is rarely available in practice. As a result, we are forced to make some plausible assumptions about the nature of heteroscedasticity and to transform our data so that in the transformed model the error term is homoscedastic. We then apply OLS to the transformed data, which amounts to using WLS. Of course, some skill and experience are required to obtain the appropriate transformations. But without such a transformation, the problem of heteroscedasticity is insoluble in practice. However, if the sample size is reasonably large, we can use White's procedure to obtain heteroscedasticity-corrected standard errors.

10) Autocorrelation: What Happens If Error Terms Are Correlated?

The major points of this chapter are as follows:

1. In the presence of autocorrelation the OLS estimators, although unbiased, are not efficient. In short, they are not BLUE.
2. Assuming the Markov first-order autoregressive, or AR(1), scheme, we pointed out that the conventionally computed variances and standard errors of the OLS estimators can be seriously biased.
3. As a result, standard t and F tests of significance can be seriously misleading.
4. Therefore, it is important to know whether there is autocorrelation in any given case. We considered three methods of detecting autocorrelation:
   a. graphical plotting of the residuals
   b. the runs test
   c. the Durbin-Watson d test
5. If autocorrelation is found, we suggest that it be corrected by appropriately transforming the model so that in the transformed model there is no autocorrelation. We illustrated the actual mechanics with several examples.

11) Simultaneous Equation Models

In contrast to the single-equation models discussed in the preceding chapters, in simultaneous equation regression models what is a dependent (endogenous) variable in one equation appears as an explanatory variable in another equation. Thus, there is a feedback relationship between the variables. This feedback creates the simultaneity problem, rendering OLS inappropriate for estimating the parameters of each equation individually. This is because the endogenous variable that appears as an explanatory variable in another equation may be correlated with the stochastic error term of that equation. This violates one of the critical assumptions of OLS: that the explanatory variables be either fixed (nonrandom) or, if random, uncorrelated with the error term. Because of this, if we use OLS, the estimates we obtain will be biased as well as inconsistent. Besides the simultaneity problem, a simultaneous equation model may have an identification problem. An identification problem means we cannot uniquely estimate the values of the parameters of an equation. Therefore, before we estimate a simultaneous equation model, we must find out whether an equation in such a model is identified.
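The simultaneity bias just described can be demonstrated with a toy Keynesian system; all numbers here are invented for illustration:

```python
import random

# Toy simultaneous system (invented parameters):
#   C = a + b*Y + u   (consumption function, true b = 0.6)
#   Y = C + I         (income identity, I exogenous)
# Solving gives Y = (a + I + u) / (1 - b), so Y is correlated with u and a
# naive OLS fit of the consumption function is biased (upward, in this setup).
random.seed(42)
a, b = 10.0, 0.6
n = 20000
I = [random.gauss(50, 10) for _ in range(n)]
u = [random.gauss(0, 5) for _ in range(n)]
Y = [(a + i + e) / (1 - b) for i, e in zip(I, u)]
C = [a + b * y + e for y, e in zip(Y, u)]
my = sum(Y) / n
mc = sum(C) / n
b_ols = (sum((y - my) * (c - mc) for y, c in zip(Y, C))
         / sum((y - my) ** 2 for y in Y))
print(round(b_ols, 3))  # noticeably above the true value of 0.6
```

For these parameter values the probability limit of the OLS slope works out to about 0.68 rather than 0.6, and a large simulated sample lands close to that; no amount of extra data removes the bias, which is exactly why OLS is inappropriate here.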
One cumbersome method of finding out whether an equation is identified is to obtain the reduced form equations of the model. A reduced form equation expresses a dependent (or endogenous) variable solely as a function of exogenous, or predetermined, variables, that is, variables whose values are determined outside the model. If there is a one-to-one correspondence between the reduced form coefficients and the coefficients of the original equation, then the original equation is identified. A shortcut to determining identification is via the order condition of identification. The order condition counts the number of equations in the model and the number of variables in the model (both endogenous and exogenous). Then, based on whether some variables are excluded from an equation but included in other equations of the model, the order condition decides whether an equation in the model is underidentified, exactly identified, or overidentified. An equation in a model is underidentified if we cannot estimate the values of the parameters of that equation. If we can obtain unique values of the parameters of an equation, that equation is said to be exactly identified. If, on the other hand, the estimates of one or more parameters of an equation are not unique in the sense that there is more than one value for some parameters, that equation is said to be overidentified. If an equation is underidentified, it is a dead-end case: there is not much we can do, short of changing the specification of the model (i.e., developing another model). If an equation is exactly identified, we can estimate it by the method of indirect least squares (ILS). ILS is a two-step procedure. In step 1, we apply OLS to the reduced form equations of the model; in step 2, we retrieve the original structural coefficients from the reduced form coefficients. ILS estimators are consistent; that is, as the sample size increases indefinitely, the estimators converge to their true values.
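The two ILS steps can be sketched on a toy Keynesian system, C = a + b·Y + u with the identity Y = C + I, in which the consumption function is exactly identified. All numbers are invented:

```python
import random

# Indirect least squares on a toy system (invented parameters, true b = 0.6):
#   C = a + b*Y + u,  Y = C + I  (I exogenous)
# Step 1: OLS on the reduced form  Y = a/(1-b) + I/(1-b) + u/(1-b),
#         whose slope is pi = 1/(1-b) and whose error is uncorrelated with I.
# Step 2: invert the correspondence to recover the structural slope, b = 1 - 1/pi.
random.seed(7)
a, b = 10.0, 0.6
n = 20000
I = [random.gauss(50, 10) for _ in range(n)]
u = [random.gauss(0, 5) for _ in range(n)]
Y = [(a + i + e) / (1 - b) for i, e in zip(I, u)]
mi = sum(I) / n
my = sum(Y) / n
pi = (sum((i - mi) * (y - my) for i, y in zip(I, Y))
      / sum((i - mi) ** 2 for i in I))
b_ils = 1.0 - 1.0 / pi   # consistent estimate of the structural slope
print(round(b_ils, 3))   # close to the true 0.6
```

Unlike a direct OLS fit of the consumption function, this estimate converges to the true value as the sample grows, which is the consistency property claimed above.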
The parameters of an overidentified equation can be estimated by the method of two-stage least squares (2SLS). The basic idea behind 2SLS is to replace the explanatory variable that is correlated with the error term of the equation in which that variable appears with a variable that is not so correlated. Such a variable is called a proxy, or instrumental, variable. 2SLS estimators, like the ILS estimators, are consistent estimators.

12) Selected Topics in Single Equation Regression Models

In this chapter we discussed several topics of considerable practical importance. The first topic we discussed was dynamic modeling, in which time, or lag, explicitly enters into the analysis. In such models the current value of the dependent variable depends upon one or more lagged values of the explanatory variable(s). This dependence can be due to psychological, technological, or institutional reasons. These models are generally known as distributed lag models. Although the inclusion of one or more lagged terms of an explanatory variable does not violate any of the standard CLRM assumptions, the estimation of such models by the usual OLS method is generally not recommended because of the problem of multicollinearity and the fact that every additional coefficient estimated means a loss of degrees of freedom. Therefore, such models are usually estimated by imposing some restrictions on the parameters of the models (e.g., that the values of the successive lag coefficients decline from the first coefficient onward). This is the approach adopted by the Koyck, the adaptive expectations, and the partial (or stock) adjustment models. A unique feature of all these models is that they replace all lagged values of the explanatory variable with a single lagged value of the dependent variable. Because of the presence of the lagged value of the dependent variable among the explanatory variables, the resulting model is called an autoregressive model.
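The restriction behind the Koyck scheme is that the lag weights decline geometrically, b_k = b_0·λ^k, which is also what separates the short-run (impact) effect from the long-run effect. A quick arithmetic sketch with invented numbers:

```python
# Koyck scheme: lag weights b_k = b0 * L**k with 0 < L < 1 (numbers invented).
# The impact (short-run) multiplier is b0; the long-run multiplier is the sum
# of all lag weights, which by the geometric series equals b0 / (1 - L).
b0, L = 0.4, 0.75
weights = [b0 * L ** k for k in range(200)]   # first 200 lag coefficients
short_run = weights[0]
long_run = b0 / (1 - L)                       # geometric-series sum
print(short_run, round(sum(weights), 6), long_run)
```

Here the impact effect of a unit change in X is 0.4, while the cumulative long-run effect is 1.6; summing even a couple hundred of the declining weights reproduces the closed-form value.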
Although autoregressive models achieve economy in the estimation of distributed lag coefficients, they are not free from statistical problems. In particular, we have to guard against the possibility of autocorrelation in the error term, because in the presence of autocorrelation and a lagged dependent variable as an explanatory variable, the OLS estimators are biased as well as inconsistent. In discussing the dynamic models, we pointed out how they help us to assess the short- and long-run impact of an explanatory variable on the dependent variable. The next topic we discussed related to the phenomenon of spurious, or nonsense, regression. Spurious regression arises when we regress a nonstationary random variable on one or more nonstationary random variables. A time series is said to be (weakly) stationary if its mean, variance, and covariances at various lags are not time dependent. To find out whether a time series is stationary, we can use the unit root test. If the unit root test (or other tests) shows that the time series of interest is stationary, then the regression based on such time series may not be spurious. We also introduced the concept of cointegration. Two or more time series are said to be cointegrated if there is a stable, long-term relationship between them even though individually each may be nonstationary. If this is the case, regression involving such time series may not be spurious. Next we introduced the random walk model, with or without drift. Several financial time series are found to follow a random walk; that is, they are nonstationary in their mean value, their variance, or both. Variables with these characteristics are said to follow stochastic trends. Stock prices are a prime example of a random walk: it is hard to tell what the price of a stock will be tomorrow just by knowing its price today. The best guess about tomorrow's price is today's price plus or minus a random error term (or shock, as it is called).
If we could predict tomorrow's price fairly accurately, we would all be millionaires! The next topic we discussed in this chapter was the dummy dependent variable model, where the dependent variable can take values of either 1 or 0. Although such models can be estimated by OLS, in which case they are called linear probability models (LPM), this is not the recommended procedure, since probabilities estimated from such models can sometimes be negative or greater than 1. Therefore, such models are usually estimated by the logit or probit procedures. In this chapter we illustrated the logit model with concrete examples. Thanks to excellent computer packages, estimation of logit and probit models is no longer a mysterious or forbidding task.
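As a closing sketch of why the logit is preferred to the LPM: the logit passes the linear index b1 + b2·X through the logistic function, which squeezes any value into the (0, 1) interval, whereas the LPM's fitted line can wander outside it. The coefficients below are invented for illustration, not estimates from the chapter:

```python
import math

# The logistic function maps any real-valued linear index z = b1 + b2*x into
# a probability strictly between 0 and 1, which is what the logit model fits.
# Coefficients are hypothetical, chosen only to illustrate the mapping.
def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

b1, b2 = -4.0, 0.08
for x in (0, 50, 100):           # e.g. three income levels
    z = b1 + b2 * x
    print(x, round(z, 2), round(logistic(z), 3))
```

The three indexes -4, 0, and 4 map to probabilities of roughly 0.018, 0.5, and 0.982; no matter how extreme X becomes, the fitted probability never leaves (0, 1).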
