Basic statistical models can be run in a very conventional way with the Statsmodels package, as already mentioned in this answer.
This chapter uses four datasets (GPA1.raw, WAGE1.raw, 401K.raw and CRIME1.raw) to study some linear regression models with more than one regressor; the previous chapter, by contrast, deals only with simple linear regression models.
Models using GPA1.raw:
Wooldridge estimates two models: \(colGPA=\beta_0 + \beta_1 hsGPA + \beta_2 ACT\) and \(colGPA=\beta_0 + \beta_1 ACT\)
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
import patsy as ps
# Examples 3.1 and 3.4
if __name__ == '__main__':
    # Whitespace-delimited file without a header row
    df = pd.read_csv('/home/daniel/Documents/Projetos/Prorum/Python For Econometrics/DataSets/Txt/GPA1.raw', sep=r'\s+', header=None)
    df.columns = ['age','soph','junior','senior','senior5','male','campus','business','engineer','colGPA','hsGPA','ACT','job19','job20','drive','bike','walk','voluntr','PC','greek','car','siblings','bgfriend',
                  'clubs','skipped','alcohol','gradMI','fathcoll','mothcoll']
    # Multiple Regression
    y, X = ps.dmatrices('colGPA ~ hsGPA + ACT', data=df, return_type='dataframe')
    model = sm.OLS(y, X)  # Describe Model
    results = model.fit()  # Fit model
    print(results.summary())
    # Simple Regression
    y, X = ps.dmatrices('colGPA ~ ACT', data=df, return_type='dataframe')
    model = sm.OLS(y, X)  # Describe Model
    results = model.fit()  # Fit model
    print(results.summary())
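The column names can also be passed to `read_csv` through its `names` parameter, which avoids the separate `df.columns` assignment. A minimal sketch, assuming a whitespace-delimited file without a header row; the three-column in-memory sample below is made up for illustration and is not the GPA1 data:

```python
import io
import pandas as pd

# Made-up sample standing in for a whitespace-delimited .raw file (not real GPA1 rows)
sample = io.StringIO("3.0 3.1 21\n3.4 3.6 24\n2.9 3.0 26\n")

# names= labels the columns at read time; sep=r'\s+' splits on any run of whitespace
df = pd.read_csv(sample, sep=r'\s+', header=None, names=['colGPA', 'hsGPA', 'ACT'])
print(df.shape)
print(df.dtypes)
```

With a real file, the `io.StringIO` object would simply be replaced by the file path.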
Result of model 1:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 colGPA   R-squared:                       0.176
Model:                            OLS   Adj. R-squared:                  0.164
Method:                 Least Squares   F-statistic:                     14.78
Date:                Wed, 17 Feb 2016   Prob (F-statistic):           1.53e-06
Time:                        09:42:13   Log-Likelihood:                -46.573
No. Observations:                 141   AIC:                             99.15
Df Residuals:                     138   BIC:                             108.0
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      1.2863      0.341      3.774      0.000         0.612     1.960
hsGPA          0.4535      0.096      4.733      0.000         0.264     0.643
ACT            0.0094      0.011      0.875      0.383        -0.012     0.031
==============================================================================
Omnibus:                        3.056   Durbin-Watson:                   1.885
Prob(Omnibus):                  0.217   Jarque-Bera (JB):                2.469
Skew:                           0.199   Prob(JB):                        0.291
Kurtosis:                       2.488   Cond. No.                         298.
==============================================================================
Result of model 2:
                            OLS Regression Results
==============================================================================
Dep. Variable:                 colGPA   R-squared:                       0.043
Model:                            OLS   Adj. R-squared:                  0.036
Method:                 Least Squares   F-statistic:                     6.207
Date:                Wed, 17 Feb 2016   Prob (F-statistic):             0.0139
Time:                        09:42:13   Log-Likelihood:                -57.177
No. Observations:                 141   AIC:                             118.4
Df Residuals:                     139   BIC:                             124.3
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.4030      0.264      9.095      0.000         1.881     2.925
ACT            0.0271      0.011      2.491      0.014         0.006     0.049
==============================================================================
Omnibus:                        3.174   Durbin-Watson:                   1.909
Prob(Omnibus):                  0.205   Jarque-Bera (JB):                2.774
Skew:                           0.248   Prob(JB):                        0.250
Kurtosis:                       2.525   Cond. No.                         209.
==============================================================================
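The jump in the ACT coefficient between the two outputs (0.0094 with hsGPA, 0.0271 without it) is an instance of a general in-sample identity: the simple-regression slope equals the multiple-regression slope plus the omitted variable's coefficient times the slope from regressing the omitted variable on the included one. A sketch that checks this identity with NumPy on synthetic data; the names `x1`, `x2`, the helper `ols`, and the data-generating numbers are illustrative assumptions, not the GPA1 data:

```python
import numpy as np

def ols(y, *cols):
    """Return OLS coefficients [intercept, slopes...] via least squares."""
    X = np.column_stack([np.ones(len(y))] + list(cols))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)                       # plays the role of the included regressor
x2 = 0.5 * x1 + rng.normal(size=n)            # omitted regressor, correlated with x1
y = 1.0 + 0.3 * x1 + 0.8 * x2 + rng.normal(size=n)

b0, b1, b2 = ols(y, x1, x2)   # multiple regression of y on x1 and x2
_, b1_simple = ols(y, x1)     # simple regression of y on x1 only
_, delta = ols(x2, x1)        # auxiliary regression of x2 on x1

# Identity: b1_simple equals b1 + b2 * delta up to floating-point error
print(b1_simple, b1 + b2 * delta)
```

In the GPA1 case, ACT would correspond to `x1` and hsGPA to `x2`.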
Models using WAGE1.raw:
Wooldridge estimates two models: \(\log(wage)=\beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure\) and \(\log(wage)=\beta_0 + \beta_1 educ\)
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
import patsy as ps
# Ex. 3.2 and Ex. 3.6
if __name__ == '__main__':
    # Whitespace-delimited file without a header row
    df = pd.read_csv('/home/daniel/Documents/Projetos/Prorum/Python For Econometrics/DataSets/Txt/WAGE1.raw', sep=r'\s+', header=None)
    df.columns = ['wage','educ','exper','tenure','nonwhite','female','married','numdep','smsa','northcen','south','west','construc','ndurman','trcommpu','trade',
                  'services','profserv','profocc','clerocc','servocc','lwage','expersq','tenursq']
    # Multiple regression (np.log is evaluated by patsy, so numpy must be imported as np)
    y, X = ps.dmatrices('np.log(wage) ~ educ + exper + tenure', data=df, return_type='dataframe')
    model = sm.OLS(y, X)  # Describe Model
    results = model.fit()  # Fit model
    print(results.summary())
    # Simple regression
    y, X = ps.dmatrices('np.log(wage) ~ educ', data=df, return_type='dataframe')
    model = sm.OLS(y, X)  # Describe Model
    results = model.fit()  # Fit model
    print(results.summary())
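In log-level models like these, a slope coefficient \(\beta\) is usually read as an approximate \(100\cdot\beta\) percent change in wage per one-unit change in the regressor; the exact figure is \(100\cdot(e^{\beta}-1)\). A small sketch of the conversion; the helper name `pct_effect` and the coefficient value 0.08 are hypothetical and are not taken from the WAGE1 output:

```python
import math

def pct_effect(beta):
    """Exact percent change in y per one-unit change in x when log(y) = ... + beta * x."""
    return 100.0 * (math.exp(beta) - 1.0)

beta = 0.08  # hypothetical coefficient, for illustration only
print(100.0 * beta)       # approximate percent change
print(pct_effect(beta))   # exact percent change
```

The two numbers stay close for small coefficients and drift apart as the coefficient grows.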
Model using 401K.raw:
Wooldridge estimates the model \(prate=\beta_0 + \beta_1 mrate +\beta_2 age\)
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
import patsy as ps
# Ex. 3.3
if __name__ == '__main__':
    # Whitespace-delimited file without a header row
    df = pd.read_csv('/home/daniel/Documents/Projetos/Prorum/Python For Econometrics/DataSets/Txt/401K.raw', sep=r'\s+', header=None)
    df.columns = ['prate','mrate','totpart','totelg','age','totemp','sole','ltotemp']
    y, X = ps.dmatrices('prate ~ mrate + age', data=df, return_type='dataframe')
    model = sm.OLS(y, X)  # Describe Model
    results = model.fit()  # Fit model
    print(results.summary())
Models using CRIME1.raw:
Wooldridge estimates two models: \(narr86=\beta_0 + \beta_1 pcnv + \beta_2 ptime86 + \beta_3 qemp86\) and \(narr86=\beta_0 + \beta_1 pcnv + \beta_2 avgsen + \beta_3 ptime86 + \beta_4 qemp86\)
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
import patsy as ps
# Example 3.5
if __name__ == '__main__':
    # Whitespace-delimited file without a header row
    df = pd.read_csv('/home/daniel/Documents/Projetos/Prorum/Python For Econometrics/DataSets/Txt/CRIME1.raw', sep=r'\s+', header=None)
    df.columns = ['narr86','nfarr86','nparr86','pcnv','avgsen','tottime','ptime86','qemp86','inc86','durat','black','hispan','born60','pcnvsq','pt86sq','inc86sq']
    # Model 1
    y, X = ps.dmatrices('narr86 ~ pcnv + ptime86 + qemp86', data=df, return_type='dataframe')
    model = sm.OLS(y, X)  # Describe Model
    results = model.fit()  # Fit model
    print(results.summary())
    # Model 2
    y, X = ps.dmatrices('narr86 ~ pcnv + avgsen + ptime86 + qemp86', data=df, return_type='dataframe')
    model = sm.OLS(y, X)  # Describe Model
    results = model.fit()  # Fit model
    print(results.summary())
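When comparing two nested models like these, it helps to remember that R-squared cannot decrease when a regressor is added, so a small rise by itself says little about the added variable; adjusted R-squared penalizes the extra parameter. A sketch on synthetic data using the formula interface (the data are randomly generated, the coefficients are arbitrary, and `avgsen` is unrelated to the outcome by construction; this is not the CRIME1 data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 500
df = pd.DataFrame({
    'pcnv': rng.uniform(0, 1, size=n),
    'ptime86': rng.integers(0, 12, size=n),
    'qemp86': rng.integers(0, 5, size=n),
    'avgsen': rng.exponential(1.0, size=n),   # irrelevant regressor by assumption
})
df['narr86'] = (0.7 - 0.15 * df['pcnv'] - 0.03 * df['ptime86']
                - 0.1 * df['qemp86'] + rng.normal(0, 0.8, size=n))

small = smf.ols('narr86 ~ pcnv + ptime86 + qemp86', data=df).fit()
large = smf.ols('narr86 ~ pcnv + avgsen + ptime86 + qemp86', data=df).fit()

# R-squared of the larger model is never below that of the nested smaller one
print(small.rsquared, large.rsquared)
# Adjusted R-squared may move in either direction
print(small.rsquared_adj, large.rsquared_adj)
```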