
Introductory Econometrics - Jeffrey M. Wooldridge: How can the examples from Chapter 3 be replicated using Python?


1 Answer

answered Feb 17, 2016 by danielcajueiro (5,251 points)

Basic statistical models can be run in a very standard way using the Statsmodels package, as already mentioned in this answer.
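
As an alternative to building the design matrices with patsy and passing them to sm.OLS (as in the scripts below), the formula interface of Statsmodels accepts the same formula strings directly. A minimal sketch, assuming GPA1.raw has already been loaded into a DataFrame df with the column names used below:

import statsmodels.formula.api as smf

# smf.ols parses the formula with patsy internally and adds the intercept,
# so the design matrices do not need to be built by hand
results = smf.ols('colGPA ~ hsGPA + ACT', data=df).fit()
print(results.summary())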

This chapter uses four datasets (GPA1.raw, WAGE1.raw, 401K.raw, and CRIME1.raw) to study several linear regression models; the difference from the previous chapter is that the previous chapter deals only with simple linear regression models.

Models using GPA1.raw:

Wooldridge estimates two models: \(colGPA=\beta_0 + \beta_1 hsGPA + \beta_2 ACT\) and \(colGPA=\beta_0 + \beta_1 ACT\).

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
import patsy as ps

# Examples 3.1 and 3.4

if __name__ == '__main__':

    df = pd.read_csv('/home/daniel/Documents/Projetos/Prorum/Python For Econometrics/DataSets/Txt/GPA1.raw',delim_whitespace=True,header=None)
    df.columns=['age','soph','junior','senior','senior5','male','campus','business','engineer','colGPA','hsGPA','ACT','job19','job20','drive','bike','walk','voluntr','PC','greek','car','siblings','bgfriend', 
    'clubs','skipped','alcohol','gradMI','fathcoll','mothcoll']


    # Multiple Regression
    y,X = ps.dmatrices('colGPA ~ hsGPA + ACT',data=df, return_type='dataframe')
    model = sm.OLS(y,X) # Describe Model
    results = model.fit() # Fit model
    print(results.summary())

    # Simple Regression
    y,X = ps.dmatrices('colGPA ~ ACT',data=df, return_type='dataframe')
    model = sm.OLS(y,X) # Describe Model
    results = model.fit() # Fit model
    print(results.summary())

Results of Model 1:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 colGPA   R-squared:                       0.176
Model:                            OLS   Adj. R-squared:                  0.164
Method:                 Least Squares   F-statistic:                     14.78
Date:                Wed, 17 Feb 2016   Prob (F-statistic):           1.53e-06
Time:                        09:42:13   Log-Likelihood:                -46.573
No. Observations:                 141   AIC:                             99.15
Df Residuals:                     138   BIC:                             108.0
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      1.2863      0.341      3.774      0.000         0.612     1.960
hsGPA          0.4535      0.096      4.733      0.000         0.264     0.643
ACT            0.0094      0.011      0.875      0.383        -0.012     0.031
==============================================================================
Omnibus:                        3.056   Durbin-Watson:                   1.885
Prob(Omnibus):                  0.217   Jarque-Bera (JB):                2.469
Skew:                           0.199   Prob(JB):                        0.291
Kurtosis:                       2.488   Cond. No.                         298.
==============================================================================

Results of Model 2:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                 colGPA   R-squared:                       0.043
Model:                            OLS   Adj. R-squared:                  0.036
Method:                 Least Squares   F-statistic:                     6.207
Date:                Wed, 17 Feb 2016   Prob (F-statistic):             0.0139
Time:                        09:42:13   Log-Likelihood:                -57.177
No. Observations:                 141   AIC:                             118.4
Df Residuals:                     139   BIC:                             124.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.4030      0.264      9.095      0.000         1.881     2.925
ACT            0.0271      0.011      2.491      0.014         0.006     0.049
==============================================================================
Omnibus:                        3.174   Durbin-Watson:                   1.909
Prob(Omnibus):                  0.205   Jarque-Bera (JB):                2.774
Skew:                           0.248   Prob(JB):                        0.250
Kurtosis:                       2.525   Cond. No.                         209.
==============================================================================
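
The ACT coefficient changes from 0.0271 in the simple regression to 0.0094 once hsGPA is included. One way to connect the two tables is the standard relationship between simple and multiple regression slopes from Chapter 3: the simple-regression slope equals the multiple-regression slope plus the hsGPA coefficient times the slope from regressing hsGPA on ACT. A short numeric check, assuming the df built in the script above:

import statsmodels.api as sm
import patsy as ps

# Multiple regression: colGPA on hsGPA and ACT
y, X = ps.dmatrices('colGPA ~ hsGPA + ACT', data=df, return_type='dataframe')
res_multi = sm.OLS(y, X).fit()

# Auxiliary regression of the extra regressor: hsGPA on ACT
y_aux, X_aux = ps.dmatrices('hsGPA ~ ACT', data=df, return_type='dataframe')
delta1 = sm.OLS(y_aux, X_aux).fit().params['ACT']

# Simple-regression slope recovered from the omitted-variable algebra:
# tilde_beta_ACT = beta_ACT + beta_hsGPA * delta1 (about 0.0271, as in the table above)
print(res_multi.params['ACT'] + res_multi.params['hsGPA'] * delta1)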

Models using WAGE1.raw:

Wooldridge estimates two models: \(\log(wage)=\beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure\) and \(\log(wage)=\beta_0 + \beta_1 educ\).

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
import patsy as ps

# Ex. 3.2 and Ex. 3.6

if __name__ == '__main__':

    df = pd.read_csv('/home/daniel/Documents/Projetos/Prorum/Python For Econometrics/DataSets/Txt/WAGE1.raw',delim_whitespace=True,header=None)
    df.columns=['wage','educ','exper','tenure','nonwhite','female','married','numdep','smsa','northcen','south','west','construc','ndurman','trcommpu','trade',\
    'services','profserv','profocc','clerocc','servocc','lwage','expersq','tenursq']

    # Multiple regression
    y,X = ps.dmatrices('np.log(wage) ~ educ + exper + tenure',data=df, return_type='dataframe')
    model = sm.OLS(y,X) # Describe Model
    results = model.fit() # Fit model
    print(results.summary())

    # Simple regression
    y,X = ps.dmatrices('np.log(wage) ~ educ',data=df, return_type='dataframe')
    model = sm.OLS(y,X) # Describe Model
    results = model.fit() # Fit model
    print(results.summary())
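
Besides printing the full summary, the fitted results object exposes the individual quantities directly, which is handy when only a few numbers are needed. A short illustration, assuming results holds the multiple regression fitted above:

# Coefficients, standard errors and R-squared of the log(wage) regression
print(results.params)     # Intercept, educ, exper and tenure
print(results.bse)        # corresponding standard errors
print(results.rsquared)   # R-squared of the regression
# In this log-level model, the educ coefficient is roughly the proportional
# return to one more year of schooling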

Model using 401K.raw:

Wooldridge estimates the model \(prate=\beta_0 + \beta_1 mrate + \beta_2 age\).

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
import patsy as ps

# Ex. 3.3

if __name__ == '__main__':

    df = pd.read_csv('/home/daniel/Documents/Projetos/Prorum/Python For Econometrics/DataSets/Txt/401K.raw',delim_whitespace=True,header=None)
    df.columns=['prate','mrate','totpart','totelg','age','totemp','sole','ltotemp']

    y,X = ps.dmatrices('prate ~ mrate + age',data=df, return_type='dataframe')
    model = sm.OLS(y,X) # Describe Model
    results = model.fit() # Fit model
    print(results.summary())
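
The same regression can also be run without patsy by building the regressor matrix directly from the DataFrame; sm.add_constant supplies the intercept column that dmatrices would otherwise create. A minimal sketch, assuming the df loaded above:

# prate regressed on mrate and age, with the matrices built by hand
X = sm.add_constant(df[['mrate', 'age']])
results = sm.OLS(df['prate'], X).fit()
print(results.summary())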

Models using CRIME1.raw:

Wooldridge estimates two models: \(narr86=\beta_0 + \beta_1 pcnv + \beta_2 ptime86 + \beta_3 qemp86\) and \(narr86=\beta_0 + \beta_1 pcnv + \beta_2 avgsen + \beta_3 ptime86 + \beta_4 qemp86\).

import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import pandas as pd
import patsy as ps

# Example 3.5

if __name__ == '__main__':

    df = pd.read_csv('/home/daniel/Documents/Projetos/Prorum/Python For Econometrics/DataSets/Txt/CRIME1.raw',delim_whitespace=True,header=None)
    df.columns=['narr86','nfarr86','nparr86','pcnv','avgsen','tottime','ptime86','qemp86','inc86','durat','black','hispan','born60','pcnvsq','pt86sq','inc86sq']


    # Model 1
    y,X = ps.dmatrices('narr86 ~ pcnv + ptime86 + qemp86',data=df, return_type='dataframe')
    model = sm.OLS(y,X) # Describe Model
    results = model.fit() # Fit model
    print(results.summary())

    # Model 2
    y,X = ps.dmatrices('narr86 ~ pcnv + avgsen + ptime86 + qemp86',data=df, return_type='dataframe')
    model = sm.OLS(y,X) # Describe Model
    results = model.fit() # Fit model
    print(results.summary())
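
Since the second specification only adds avgsen, it can be useful to keep both fitted objects and compare them instead of overwriting results. A small sketch, assuming the df loaded above:

import statsmodels.api as sm
import patsy as ps

# Refit the two specifications, keeping both results objects
y1, X1 = ps.dmatrices('narr86 ~ pcnv + ptime86 + qemp86', data=df, return_type='dataframe')
y2, X2 = ps.dmatrices('narr86 ~ pcnv + avgsen + ptime86 + qemp86', data=df, return_type='dataframe')
res1 = sm.OLS(y1, X1).fit()
res2 = sm.OLS(y2, X2).fit()

# Compare the R-squared of the two fits and the estimated avgsen effect
print(res1.rsquared, res2.rsquared)
print(res2.params['avgsen'])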