DFFITS

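DFFITS is a diagnostic statistic used in regression analysis to show how influential a single data point is. It was proposed in 1980 by Belsley, Kuh and Welsch in their book Regression Diagnostics: Identifying Influential Data and Sources of Collinearity ^[1].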

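DFFITS is the change in the predicted (fitted) value at a point when that point is left out of the regression, called DFFIT, divided by the estimated standard deviation of the fit at that point. This Studentization is the 'S' in the name.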

[math]DFFITS = {\widehat{y_i} - \widehat{y_{i(i)}} \over s_{(i)} \sqrt{h_{ii}}}[/math]

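Here [math]\widehat{y_i}[/math] and [math]\widehat{y_{i(i)}}[/math] are the predictions for point i with and without that point included in the regression, [math]s_{(i)}[/math] is the standard error estimated without the point in question, and [math]h_{ii}[/math] is the leverage of the point.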
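The definition can be followed literally by refitting the regression once per point. The sketch below is only an illustration under stated assumptions: it uses ordinary least squares via NumPy, the function name `dffits_leave_one_out` and the demo data are hypothetical, and the design matrix is assumed to have full column rank with more than p + 1 rows.

```python
import numpy as np

def dffits_leave_one_out(X, y):
    """DFFITS for each row, computed by refitting without that row.

    X is an (n, p) design matrix (include a column of ones for an intercept);
    y is a length-n response vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Leverages h_ii: diagonal of the hat matrix X (X'X)^{-1} X'.
    h = np.einsum("ij,ij->i", X @ np.linalg.inv(X.T @ X), X)

    # Fitted values from the regression that includes every point.
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    yhat_full = X @ beta_full

    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        Xi, yi = X[keep], y[keep]
        beta_i, *_ = np.linalg.lstsq(Xi, yi, rcond=None)
        # s_(i): residual standard error of the fit without point i,
        # which has (n - 1) - p residual degrees of freedom.
        resid_i = yi - Xi @ beta_i
        s_i = np.sqrt(resid_i @ resid_i / (n - 1 - p))
        # Change in the prediction at point i, scaled by s_(i) * sqrt(h_ii).
        out[i] = (yhat_full[i] - X[i] @ beta_i) / (s_i * np.sqrt(h[i]))
    return out

# Hypothetical demo data: a straight line with one high-leverage point.
x_demo = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
y_demo = np.array([0.1, 0.9, 2.2, 2.8, 15.0])
X_demo = np.column_stack([np.ones_like(x_demo), x_demo])
print(dffits_leave_one_out(X_demo, y_demo))
```

Refitting n times is wasteful for large data sets, but it mirrors the formula term by term, which makes it a useful reference when checking faster implementations.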

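DFFITS closely resembles the externally Studentized residual; in fact it equals that residual multiplied by [math]\sqrt{h_{ii}/(1-h_{ii})}[/math] ^[2]. When the errors are normally distributed, the externally Studentized residual follows Student's t distribution with degrees of freedom equal to the residual degrees of freedom minus one. The DFFITS value at a point is therefore distributed as that same t distribution scaled by the point's leverage factor [math]\sqrt{h_{ii}/(1-h_{ii})}[/math]. Consequently, DFFITS is expected to be small at low-leverage points, while its distribution widens without bound as the leverage approaches 1.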
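The relationship to the externally Studentized residual allows DFFITS to be computed from a single fit. The sketch below is a minimal illustration, not a reference implementation: the function name `dffits_closed_form` is hypothetical, and the leave-one-out variance uses the standard deletion identity SSE_(i) = SSE - e_i^2 / (1 - h_ii), which is not stated in the text above and is an added assumption.

```python
import numpy as np

def dffits_closed_form(X, y):
    """Return (dffits, t, h) from one least-squares fit.

    t is the externally Studentized residual and h the leverage, so that
    dffits = t * sqrt(h / (1 - h)).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Leverages: diagonal of the hat matrix.
    h = np.einsum("ij,ij->i", X @ np.linalg.inv(X.T @ X), X)

    # Ordinary residuals from the full fit.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta

    # Leave-one-out error variance s_(i)^2 via the deletion identity
    # (assumed here; it avoids refitting the model n times).
    s2_i = (e @ e - e**2 / (1.0 - h)) / (n - p - 1)

    # Externally Studentized residuals, then the leverage-factor scaling.
    t = e / np.sqrt(s2_i * (1.0 - h))
    return t * np.sqrt(h / (1.0 - h)), t, h
```

Because the leverage factor grows without bound as h approaches 1, a point with leverage near 1 can have a very large DFFITS even when its Studentized residual is moderate.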

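For a perfectly balanced experimental design (such as a factorial design or a balanced partial factorial design), the leverage at every point is [math]p/n[/math], the number of parameters divided by the number of points. In the normal case, the DFFITS values are then distributed as [math]\sqrt{p \over n-p} \approx \sqrt{p \over n}[/math] times a t variable. For this reason, the authors of the book recommend checking as possible outliers those points whose DFFITS exceeds [math]2\sqrt{p \over n}[/math] in absolute value.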
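As a small numerical illustration of the balanced case and the cutoff, the sketch below builds a replicated 2x2 factorial design with main effects only and checks that every leverage equals p/n. The design, the helper names `dffits_cutoff` and `flag_influential`, and the sample DFFITS values are all hypothetical choices made for the example.

```python
import numpy as np

# Replicated 2x2 factorial design, coded -1/+1, main effects plus intercept.
levels = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
design = np.vstack([levels, levels])
X = np.column_stack([np.ones(len(design)), design])
n, p = X.shape  # n = 8 points, p = 3 parameters

# In this balanced design every leverage h_ii equals p / n.
h = np.einsum("ij,ij->i", X @ np.linalg.inv(X.T @ X), X)
leverage_is_constant = np.allclose(h, p / n)

def dffits_cutoff(n, p):
    """Rule-of-thumb threshold 2 * sqrt(p / n)."""
    return 2.0 * np.sqrt(p / n)

def flag_influential(dffits_values, n, p):
    """Indices of points whose |DFFITS| exceeds the threshold."""
    values = np.asarray(dffits_values, dtype=float)
    return np.flatnonzero(np.abs(values) > dffits_cutoff(n, p))

# Hypothetical DFFITS values, used only to show the flagging step.
sample = np.array([0.1, -2.0, 1.0, 1.3])
print(leverage_is_constant, dffits_cutoff(n, p), flag_influential(sample, n, p))
```

The comparison uses the absolute value because DFFITS carries the sign of the residual, so a large negative value is as noteworthy as a large positive one.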

A closely related quantity is Cook's distance.

References

[1] Belsley, David A.; Kuh, Edwin; Welsch, Roy E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons. ISBN 0-471-05856-4.
[2] Montgomery, Douglas C.; Peck, Elizabeth A. (1992). "Appendix C.4". Introduction to Linear Regression Analysis (2nd ed.). New York: John Wiley & Sons. pp. 504-505. ISBN 0-471-53387-4.
