2009.4.6
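Now that generalized linear models (GLMs) are in everyday use, interactions between one explanatory variable and another are being used more often too. In R, for example, an interaction is specified with nothing more than * or : in the model formula, so it is easy to work with.
So how is an interaction actually handled? Let's look at R's glm function as an example. The answer: it is the product (multiplication) of the explanatory variables.
First, the data (n=20). Below, x01 and x02 are the explanatory variables and y01t is the response variable.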
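Written out as equations, the claim looks roughly like this. This is only a sketch: the symbols $x_1$, $x_2$, and the $\beta$ coefficients are notation introduced here for illustration, and it assumes numeric explanatory variables and the identity link (the default for the gaussian family).

```latex
\begin{align*}
E[y] &= \beta_0 + \beta_{12}\,(x_1 x_2)
  && \text{interaction only (\texttt{x1:x2})} \\
E[y] &= \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12}\,(x_1 x_2)
  && \text{main effects plus interaction (\texttt{x1*x2})}
\end{align*}
```

In both cases the interaction contributes a single coefficient multiplying the elementwise product of the two variables.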
> x01
[1] 0.7452506 1.6278334 1.3070735 1.0267938 2.7319448 2.3952411 0.1893610
[8] 3.8487420 3.0717137 2.5531784 2.0608768 2.4838042 2.9895419 2.8983844
[15] 0.8401353 1.2294717 2.0002388 1.8784546 2.1792990 1.8727374
> x02
[1] -0.8687907 0.2168528 -2.3573511 -4.0435968 -1.9759199 -1.4642176
[7] -1.0799398 -2.2423776 1.0092726 -1.5163328 -2.2651893 -1.4782773
[13] -2.2331311 -3.2363845 -2.1090825 -3.1086167 -2.9875158 -1.4077831
[19] -0.9521418 -1.1867783
> y01t
[1] 0.4587932 0.7647846 1.1178820 0.8554710 -1.7343197 -0.5432930
[7] -0.2574776 0.9133137 0.3155327 0.4516872 -0.6668922 -0.3319111
[13] 0.7316391 -0.2585008 -0.6691648 0.5402623 -0.6621848 -0.2750970
[19] -0.6651810 0.6239368
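For anyone who wants to follow along, the sketch below re-enters the printed values as R vectors and inspects the design matrix that the formula x01:x02 produces. The values are transcribed from the output above, so they carry only the printed precision; the names X and same are introduced here for illustration.

```r
# Data transcribed from the printed vectors above (n = 20)
x01 <- c(0.7452506, 1.6278334, 1.3070735, 1.0267938, 2.7319448, 2.3952411,
         0.1893610, 3.8487420, 3.0717137, 2.5531784, 2.0608768, 2.4838042,
         2.9895419, 2.8983844, 0.8401353, 1.2294717, 2.0002388, 1.8784546,
         2.1792990, 1.8727374)
x02 <- c(-0.8687907, 0.2168528, -2.3573511, -4.0435968, -1.9759199, -1.4642176,
         -1.0799398, -2.2423776, 1.0092726, -1.5163328, -2.2651893, -1.4782773,
         -2.2331311, -3.2363845, -2.1090825, -3.1086167, -2.9875158, -1.4077831,
         -0.9521418, -1.1867783)
y01t <- c(0.4587932, 0.7647846, 1.1178820, 0.8554710, -1.7343197, -0.5432930,
          -0.2574776, 0.9133137, 0.3155327, 0.4516872, -0.6668922, -0.3319111,
          0.7316391, -0.2585008, -0.6691648, 0.5402623, -0.6621848, -0.2750970,
          -0.6651810, 0.6239368)

# Design matrix built from the right-hand side ~ x01:x02
X <- model.matrix(~ x01:x02)
print(colnames(X))

# Compare the interaction column with the elementwise product
same <- isTRUE(all.equal(unname(X[, "x01:x02"]), x01 * x02))
print(same)
```

If same comes out TRUE, the column that glm sees for x01:x02 is just x01 * x02, which is what the fits below go on to confirm.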
First, the case with the interaction term only:
> summary(glm(y01t~x01:x02,family=gaussian))
Call:
glm(formula = y01t ~ x01:x02, family = gaussian)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.72604  -0.59330  -0.01384   0.58178   1.07421
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.11277    0.26448   0.426    0.675
x01:x02      0.02243    0.05875   0.382    0.707
(Dispersion parameter for gaussian family taken to be 0.5787262)
Null deviance: 10.501 on 19 degrees of freedom
Residual deviance: 10.417 on 18 degrees of freedom
AIC: 49.712
Number of Fisher Scoring iterations: 2
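And the same model with the product of the two variables supplied as a single explanatory variable via I():
> summary(glm(y01t~I(x01*x02),family=gaussian))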
Call:
glm(formula = y01t ~ I(x01 * x02), family = gaussian)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.72604  -0.59330  -0.01384   0.58178   1.07421
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.11277    0.26448   0.426    0.675
I(x01 * x02)  0.02243    0.05875   0.382    0.707
(Dispersion parameter for gaussian family taken to be 0.5787262)
Null deviance: 10.501 on 19 degrees of freedom
Residual deviance: 10.417 on 18 degrees of freedom
AIC: 49.712
Number of Fisher Scoring iterations: 2
The two are exactly the same.
Next, the case with both explanatory variables (main effects) and the interaction term:
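Rather than comparing the two summaries by eye, the fits can be compared programmatically. The following is a minimal sketch using the data transcribed from this post; the object names (fit_colon, fit_prod, and so on) are made up for this example.

```r
# Data transcribed from the post (n = 20)
x01 <- c(0.7452506, 1.6278334, 1.3070735, 1.0267938, 2.7319448, 2.3952411,
         0.1893610, 3.8487420, 3.0717137, 2.5531784, 2.0608768, 2.4838042,
         2.9895419, 2.8983844, 0.8401353, 1.2294717, 2.0002388, 1.8784546,
         2.1792990, 1.8727374)
x02 <- c(-0.8687907, 0.2168528, -2.3573511, -4.0435968, -1.9759199, -1.4642176,
         -1.0799398, -2.2423776, 1.0092726, -1.5163328, -2.2651893, -1.4782773,
         -2.2331311, -3.2363845, -2.1090825, -3.1086167, -2.9875158, -1.4077831,
         -0.9521418, -1.1867783)
y01t <- c(0.4587932, 0.7647846, 1.1178820, 0.8554710, -1.7343197, -0.5432930,
          -0.2574776, 0.9133137, 0.3155327, 0.4516872, -0.6668922, -0.3319111,
          0.7316391, -0.2585008, -0.6691648, 0.5402623, -0.6621848, -0.2750970,
          -0.6651810, 0.6239368)

# Fit once with the interaction operator and once with an explicit product
fit_colon <- glm(y01t ~ x01:x02, family = gaussian)
fit_prod  <- glm(y01t ~ I(x01 * x02), family = gaussian)

# Coefficient names differ ("x01:x02" vs "I(x01 * x02)"), so drop them first
coef_equal <- isTRUE(all.equal(unname(coef(fit_colon)), unname(coef(fit_prod))))
dev_equal  <- isTRUE(all.equal(deviance(fit_colon), deviance(fit_prod)))
aic_equal  <- isTRUE(all.equal(AIC(fit_colon), AIC(fit_prod)))
print(c(coef = coef_equal, deviance = dev_equal, AIC = aic_equal))
```

Only the coefficient labels should differ between the two fits, which is why the names are stripped before comparing.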
> summary(glm(y01t~x01*x02,family=gaussian))
Call:
glm(formula = y01t ~ x01 * x02, family = gaussian)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.71094  -0.59507  -0.03846   0.58345   1.06288
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.21115    0.82850  -0.255    0.802
x01          0.12585    0.36487   0.345    0.735
x02         -0.18907    0.38706  -0.488    0.632
x01:x02      0.09811    0.17392   0.564    0.580
(Dispersion parameter for gaussian family taken to be 0.6413707)
Null deviance: 10.501 on 19 degrees of freedom
Residual deviance: 10.262 on 16 degrees of freedom
AIC: 53.412
Number of Fisher Scoring iterations: 2
> summary(glm(y01t~x01+x02+I(x01*x02),family=gaussian))
Call:
glm(formula = y01t ~ x01 + x02 + I(x01 * x02), family = gaussian)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.71094  -0.59507  -0.03846   0.58345   1.06288
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.21115    0.82850  -0.255    0.802
x01           0.12585    0.36487   0.345    0.735
x02          -0.18907    0.38706  -0.488    0.632
I(x01 * x02)  0.09811    0.17392   0.564    0.580
(Dispersion parameter for gaussian family taken to be 0.6413707)
Null deviance: 10.501 on 19 degrees of freedom
Residual deviance: 10.262 on 16 degrees of freedom
AIC: 53.412
Number of Fisher Scoring iterations: 2
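The interaction term and a new explanatory variable formed as the product of the two explanatory variables give the same result.
Next, the case with one main effect and the interaction term: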
> summary(glm(y01t~x01+x01:x02,family=gaussian))
Call:
glm(formula = y01t ~ x01 + x01:x02, family = gaussian)
Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.7202086  -0.5906895  -0.0002825   0.5830039   1.0658866
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.13221    0.42860   0.308    0.761
x01         -0.01311    0.22331  -0.059    0.954
x01:x02      0.02047    0.06900   0.297    0.770
(Dispersion parameter for gaussian family taken to be 0.6126447)
Null deviance: 10.501 on 19 degrees of freedom
Residual deviance: 10.415 on 17 degrees of freedom
AIC: 51.708
Number of Fisher Scoring iterations: 2
> summary(glm(y01t~x01+I(x01*x02),family=gaussian))
Call:
glm(formula = y01t ~ x01 + I(x01 * x02), family = gaussian)
Deviance Residuals:
       Min          1Q      Median          3Q         Max
-1.7202086  -0.5906895  -0.0002825   0.5830039   1.0658866
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.13221    0.42860   0.308    0.761
x01          -0.01311    0.22331  -0.059    0.954
I(x01 * x02)  0.02047    0.06900   0.297    0.770
(Dispersion parameter for gaussian family taken to be 0.6126447)
Null deviance: 10.501 on 19 degrees of freedom
Residual deviance: 10.415 on 17 degrees of freedom
AIC: 51.708
Number of Fisher Scoring iterations: 2
And the other combination of one main effect and the interaction term:
> summary(glm(y01t~x02+x01:x02,family=gaussian))
Call:
glm(formula = y01t ~ x02 + x01:x02, family = gaussian)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.69410  -0.58283   0.03360   0.59910   1.08609
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.05105    0.32077   0.159    0.875
x02         -0.08498    0.23602  -0.360    0.723
x02:x01      0.04802    0.09315   0.515    0.613
(Dispersion parameter for gaussian family taken to be 0.6081312)
Null deviance: 10.501 on 19 degrees of freedom
Residual deviance: 10.338 on 17 degrees of freedom
AIC: 51.56
Number of Fisher Scoring iterations: 2
> summary(glm(y01t~x02+I(x01*x02),family=gaussian))
Call:
glm(formula = y01t ~ x02 + I(x01 * x02), family = gaussian)
Deviance Residuals:
     Min        1Q    Median        3Q       Max
-1.69410  -0.58283   0.03360   0.59910   1.08609
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   0.05105    0.32077   0.159    0.875
x02          -0.08498    0.23602  -0.360    0.723
I(x01 * x02)  0.04802    0.09315   0.515    0.613
(Dispersion parameter for gaussian family taken to be 0.6081312)
Null deviance: 10.501 on 19 degrees of freedom
Residual deviance: 10.338 on 17 degrees of freedom
AIC: 51.56
Number of Fisher Scoring iterations: 2
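So, whether the explanatory variables were the interaction term alone, one main effect plus the interaction term, or both main effects plus the interaction term, the results were in every case exactly the same as with the product.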
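The four comparisons can also be run in a loop. The sketch below does so on simulated stand-in data (generated with rnorm, NOT the values from this post) to suggest that the equivalence does not depend on this particular data set. It assumes numeric explanatory variables; the helper same_fit and the data frame d are hypothetical names introduced for this example.

```r
# Simulated stand-in data (assumption: NOT the values used in this post)
set.seed(1)
d <- data.frame(x01 = rnorm(20, mean = 2), x02 = rnorm(20, mean = -1.5))
d$y01t <- rnorm(20)

# Each pair: formula with the interaction operator, and its explicit-product twin
pairs <- list(
  list(y01t ~ x01:x02,       y01t ~ I(x01 * x02)),
  list(y01t ~ x01 * x02,     y01t ~ x01 + x02 + I(x01 * x02)),
  list(y01t ~ x01 + x01:x02, y01t ~ x01 + I(x01 * x02)),
  list(y01t ~ x02 + x01:x02, y01t ~ x02 + I(x01 * x02))
)

# Hypothetical helper: TRUE if both formulas give the same coefficients and deviance
same_fit <- function(f_int, f_prod, data) {
  a <- glm(f_int,  family = gaussian, data = data)
  b <- glm(f_prod, family = gaussian, data = data)
  isTRUE(all.equal(unname(coef(a)), unname(coef(b)))) &&
    isTRUE(all.equal(deviance(a), deviance(b)))
}

results <- vapply(pairs, function(p) same_fit(p[[1]], p[[2]], d), logical(1))
print(results)
```

The label R prints for the interaction (x01:x02 or x02:x01, as in the output above) depends on the order in which the variables appear in the formula, so the comparison strips coefficient names and relies on position instead.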