Basic statistical models can be run in a fairly standard way with the Statsmodels package, as already mentioned in this answer.
This chapter uses essentially three datasets, CEOSAL1.raw, WAGE1.raw and VOTE1.raw, and studies a few simple linear regression models.
Models using CEOSAL1.raw:
Wooldridge considers the following regression models: \(salary = \beta_0 + \beta_1 roe\) and \(\log(salary) = \beta_0 + \beta_1 \log(sales)\).
# Examples 2.3, 2.6 and 2.8
import numpy as np  # used inside the patsy formula (np.log)
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
import patsy as ps

if __name__ == '__main__':
    # Load the whitespace-delimited raw file (it has no header row)
    df = pd.read_csv('/home/daniel/Documents/Projetos/Prorum/Python For Econometrics/DataSets/Txt/CEOSAL1.raw', sep=r'\s+', header=None)
    df.columns = ['salary', 'pcsalary', 'sales', 'roe', 'pcroe', 'ros', 'indus', 'finance', 'consprod', 'utility', 'lsalary', 'lsales']

    # Level-level model: salary on roe
    y, X = ps.dmatrices('salary ~ roe', data=df, return_type='dataframe')
    model = sm.OLS(y, X)  # Describe model
    results = model.fit()  # Fit model
    print(results.summary())

    # Fitted values and residuals (both are pandas Series)
    fittedValues = results.fittedvalues
    residualValues = results.resid

    # Collect data, fitted values and residuals in one table
    dfResults = pd.concat([df['roe'], df['salary'], fittedValues, residualValues], axis=1, keys=['roe', 'salary', 'salaryhat', 'uhat'])

    # Plot the data and the fitted line
    fig = plt.figure()
    ax = fig.add_subplot(111)
    x = df['roe']
    ax.plot(x, y, 'o', label="data")
    ax.plot(x, fittedValues, 'r--.', label="OLS")
    ax.legend(loc='best')
    ax.set_xlabel('roe')
    ax.set_ylabel('salary')
    plt.show()

    # Elasticity: log-log model of salary on sales
    y2, X2 = ps.dmatrices('np.log(salary) ~ np.log(sales)', data=df, return_type='dataframe')
    model2 = sm.OLS(y2, X2)  # Describe model
    results2 = model2.fit()  # Fit model
    print(results2.summary())
Results of the first model:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 salary   R-squared:                       0.013
Model:                            OLS   Adj. R-squared:                  0.008
Method:                 Least Squares   F-statistic:                     2.767
Date:                Tue, 16 Feb 2016   Prob (F-statistic):             0.0978
Time:                        16:41:07   Log-Likelihood:                -1804.5
No. Observations:                 209   AIC:                             3613.
Df Residuals:                     207   BIC:                             3620.
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept    963.1913    213.240      4.517      0.000       542.790  1383.592
roe           18.5012     11.123      1.663      0.098        -3.428    40.431
==============================================================================
Omnibus:                      311.096   Durbin-Watson:                   2.105
Prob(Omnibus):                  0.000   Jarque-Bera (JB):            31120.901
Skew:                           6.915   Prob(JB):                         0.00
Kurtosis:                      61.158   Cond. No.                         43.3
==============================================================================
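
To read the table above, the two coefficients can be plugged into the fitted line to predict salary for a given roe. The following is only a minimal sketch: the coefficients are copied by hand from the summary, and the helper name `predict_salary` and the value roe = 30 are illustrative assumptions, not part of the original script.

```python
# Coefficients copied by hand from the first summary table.
INTERCEPT = 963.1913
SLOPE_ROE = 18.5012


def predict_salary(roe):
    """Fitted salary for a given return on equity (hypothetical helper)."""
    return INTERCEPT + SLOPE_ROE * roe


# roe = 30 is an arbitrary illustrative value.
print(round(predict_salary(30), 3))
```

Inside the script itself, the same numbers should be available as `results.params['Intercept']` and `results.params['roe']`, so there is no need to hard-code them.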

Results of the second model:
==============================================================================
Dep. Variable:         np.log(salary)   R-squared:                       0.211
Model:                            OLS   Adj. R-squared:                  0.207
Method:                 Least Squares   F-statistic:                     55.30
Date:                Tue, 16 Feb 2016   Prob (F-statistic):           2.70e-12
Time:                        16:41:07   Log-Likelihood:                -152.50
No. Observations:                 209   AIC:                             309.0
Df Residuals:                     207   BIC:                             315.7
Df Model:                           1
Covariance Type:            nonrobust
=================================================================================
                    coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------
Intercept         4.8220      0.288     16.723      0.000         4.254     5.390
np.log(sales)     0.2567      0.035      7.436      0.000         0.189     0.325
==============================================================================
Omnibus:                       84.151   Durbin-Watson:                   1.860
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              403.831
Skew:                           1.507   Prob(JB):                     2.04e-88
Kurtosis:                       9.106   Cond. No.                         70.0
==============================================================================
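
In the log-log model the slope is an elasticity: a 1% increase in sales is associated with roughly a 0.2567% increase in salary. The sketch below compares that rule of thumb with the exact change implied by the constant-elasticity form. The coefficient is copied by hand from the table above, and the function name `pct_change_in_salary` is a made-up helper for illustration.

```python
# Coefficient on np.log(sales), copied by hand from the second summary table.
ELASTICITY = 0.2567


def pct_change_in_salary(pct_change_in_sales):
    """Exact % change in salary implied by log(salary) = b0 + b1*log(sales).

    Hypothetical helper: scales sales by (1 + pct/100) and applies the
    constant-elasticity relation salary ~ sales ** b1.
    """
    ratio = 1.0 + pct_change_in_sales / 100.0
    return 100.0 * (ratio ** ELASTICITY - 1.0)


# For small changes the exact figure stays close to ELASTICITY * pct.
for pct in (1.0, 10.0):
    print(pct, round(pct_change_in_salary(pct), 4), round(ELASTICITY * pct, 4))
```

The approximation and the exact value drift apart as the change in sales grows, which is why the elasticity reading is usually stated for small percentage changes.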
Models using WAGE1.raw:
Wooldridge considers the following models: \(wage=\beta_0 + \beta_1 educ\) and \(\log(wage)=\beta_0 + \beta_1 educ\).
import numpy as np  # used inside the patsy formula (np.log)
import statsmodels.api as sm
import pandas as pd
import patsy as ps

# Examples 2.4 and 2.10
if __name__ == '__main__':
    # Load the whitespace-delimited raw file (it has no header row)
    df = pd.read_csv('/home/daniel/Documents/Projetos/Prorum/Python For Econometrics/DataSets/Txt/WAGE1.raw', sep=r'\s+', header=None)
    df.columns = ['wage', 'educ', 'exper', 'tenure', 'nonwhite', 'female', 'married', 'numdep', 'smsa', 'northcen', 'south', 'west', 'construc', 'ndurman', 'trcommpu', 'trade',
                  'services', 'profserv', 'profocc', 'clerocc', 'servocc', 'lwage', 'expersq', 'tenursq']

    # Level-level model: wage on educ
    y, X = ps.dmatrices('wage ~ educ', data=df, return_type='dataframe')
    model = sm.OLS(y, X)  # Describe model
    results = model.fit()  # Fit model
    print(results.summary())

    # Incorporating nonlinearities: log-level model
    y2, X2 = ps.dmatrices('np.log(wage) ~ educ', data=df, return_type='dataframe')
    model2 = sm.OLS(y2, X2)  # Describe model
    results2 = model2.fit()  # Fit model
    print(results2.summary())
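
In the log-level model, \(100 \cdot \beta_1\) is read as the approximate percentage change in wage for one more year of education. The sketch below shows that reading next to the exact figure \(100 \cdot (e^{\beta_1} - 1)\). The function name `pct_effect_log_level` is hypothetical, and the coefficient value is a placeholder chosen for illustration, not the estimate from WAGE1; in the script above the estimate should be available as `results2.params['educ']`.

```python
import math


def pct_effect_log_level(beta, delta_x=1.0):
    """Approximate and exact % change in y when log(y) = b0 + beta*x.

    Hypothetical helper: returns (100*beta*delta_x, 100*(exp(beta*delta_x) - 1)).
    """
    approx = 100.0 * beta * delta_x
    exact = 100.0 * (math.exp(beta * delta_x) - 1.0)
    return approx, exact


# Placeholder coefficient, NOT the WAGE1 estimate.
beta_educ = 0.08
approx, exact = pct_effect_log_level(beta_educ)
print(round(approx, 2), round(exact, 2))
```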
Model using VOTE1.raw:
Wooldridge considers the following model: \(voteA=\beta_0 + \beta_1 shareA\).
import statsmodels.api as sm
import pandas as pd
import patsy as ps

# Examples 2.5 and 2.9
if __name__ == '__main__':
    # Load the whitespace-delimited raw file (it has no header row)
    df = pd.read_csv('/home/daniel/Documents/Projetos/Prorum/Python For Econometrics/DataSets/Txt/VOTE1.raw', sep=r'\s+', header=None)
    df.columns = ['state', 'district', 'democA', 'voteA', 'expendA', 'expendB', 'prtystrA', 'lexpendA', 'lexpendB', 'shareA']

    # Level-level model: voteA on shareA
    y, X = ps.dmatrices('voteA ~ shareA', data=df, return_type='dataframe')
    model = sm.OLS(y, X)  # Describe model
    results = model.fit()  # Fit model
    print(results.summary())
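
All the regressions above have a single regressor, so the estimates that `sm.OLS` reports can be cross-checked with the textbook closed-form formulas: slope = sample covariance of x and y over sample variance of x, and intercept = mean(y) - slope * mean(x). The following is a minimal pure-Python sketch of that calculation; the function name `simple_ols` and the toy data are assumptions made for illustration and do not come from the datasets.

```python
def simple_ols(x, y):
    """Closed-form simple regression of y on x (hypothetical helper).

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares around the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R-squared = 1 - SSR/SST
    fitted = [intercept + slope * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)
    ssr = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return intercept, slope, 1.0 - ssr / sst


# Toy data lying exactly on the line y = 1 + 2x (made up for this example).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 11.0]
intercept, slope, r_squared = simple_ols(x, y)
print(intercept, slope, r_squared)
```

Passing `df['shareA']` and `df['voteA']` as lists to the same function should reproduce, up to rounding, the coefficients and R-squared in the corresponding summary.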