[Programming] Factor setup in R

Board: Statistics | Author: gsuper (統計的巴比倫塔) | Time: 2010/03/31 15:47 | Votes: 1 (1 push, 0 boo, 24 →)

25 comments, 5 participants, thread 1/1

I read a paper and wanted to reproduce the author's data analysis, but he only provides a function for unpaired data, so I emailed him to ask how to adapt it for paired data. After more than ten emails back and forth, the paired-data function was finally finished (in the end he wrote it; I ran it and reported the results back to him).

Now the problem: he cannot reproduce his own analysis. The paper reports 400, but I get more than 2000 (even a difference of 300 would count as a big gap).

His explanation is below, but I don't really follow it. Could someone with a deep understanding of R explain what the highlighted line means?

(Supplement: this builds a linear model before running the two-way ANOVA, using anova(lm()). The linear model in the unpaired function has 2 factors and one interaction term; the paired function adds one more factor for the pairing.)

----------------------------------------------------------------
I found out that using R for anova with more than 2 factors generate p-values depending on how factors enter into model. See examples below. That's an example for my other dataset. But it applies to the paired sample data analysis. Basically, we need to add a block effect in the two-way anova model to account for that effect. So, the results of the second case study in my paper might not be accurate.

### poster's note: the block effect is the new pair factor ###

Although there are some minor issues, but the interaction effect reported by R is still correct, this leads to the corrected pooling of probesets whenever applicable. This again proves that power of consolidation. I suggest you not to use the per gene model for the paired samples in R. If you could implement it in SAS or other softwares which will give you the right TYPE III test, that should be fine. Sorry for all the confusion.

--------------------example-------------------------------------
> anova(lm(y~as.factor(fcid)+genotype+time, data=y))
Analysis of Variance Table

Response: y
                Df Sum Sq Mean Sq F value    Pr(>F)
as.factor(fcid)  4 4.0412 1.01029  5.5987 0.0034300 **
genotype         1 0.1275 0.12748  0.7065 0.4105584
time             4 6.2609 1.56522  8.6740 0.0003138 ***
Residuals       20 3.6090 0.18045
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

> anova(lm(y~time+as.factor(fcid)+genotype, data=y))
Analysis of Variance Table

Response: y
                Df Sum Sq Mean Sq F value    Pr(>F)
time             4 9.1197 2.27993 12.6347 2.741e-05 ***
as.factor(fcid)  4 1.1806 0.29515  1.6356    0.2044
genotype         1 0.1292 0.12919  0.7159    0.4075
Residuals       20 3.6090 0.18045
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

--
His conclusion seems to be that when a model has more than 2 factors, R's results become inaccurate, and that the cause is that the order in which the factors are placed inside lm() has an unpredictable effect. Have I misunderstood?

---------------------------------------------------
Here is my data. Normal_1 and Tumor_1 are tissue from the same patient, so they are paired.

        Normal_1  Normal_2  Normal_3 | Tumor_1  Tumor_2  Tumor_3
------------------------------------------------------------
block1|    10        20        30        40       50       60
block2|    70        80        90       100      110      120
block3|   130       140       150       160      170      180

I convert it to the following format to run the 2-way ANOVA:

anova(lm(tmp~trt+v+trt*v+block , data=data))

tmp  trt  v  block
--------------------------
 10   N   1    1
 20   N   1    2
 30   N   1    3
 40   T   1    1
 50   T   1    2
 60   T   1    3
 70   N   2    1
 80   N   2    2
 90   N   2    3
100   T   2    1
110   T   2    2
120   T   2    3
130   N   3    1
140   N   3    2
150   N   3    3
160   T   3    1
170   T   3    2
180   T   3    3

--
※ Origin: 批踢踢實業坊 (ptt.cc) ◆ From: 140.113.239.247
※ Edited: gsuper, from 140.113.239.247 (9 edits, 03/31 15:47 to 16:16)
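The two points the post turns on can be sketched in R. This is an illustration under stated assumptions, not the paper author's code. Part 1 rebuilds the long-format table from the post and checks how lm() would code v and block when they are stored as plain numbers versus wrapped in factor(). Part 2 uses a small made-up unbalanced data set (the names a, b, y and the numbers are hypothetical, not from the post or the paper) to show that anova(lm()) reports sequential (Type I) sums of squares, which change with term order when the design is unbalanced, while drop1() tests each term after all the others and so does not depend on the order.

```r
## Part 1: the long-format table from the post, rebuilt by hand.
dat <- data.frame(
  tmp   = seq(10, 180, by = 10),
  trt   = rep(rep(c("N", "T"), each = 3), times = 3),
  v     = rep(1:3, each = 6),
  block = rep(1:3, times = 6)
)

## As typed in the post, v and block are numeric, so each enters the
## model as a single slope (1 df) rather than as a 3-level factor.
X_num <- model.matrix(~ trt * v + block, data = dat)

## Wrapping them in factor() gives each 2 df (3 levels minus 1).
dat_f <- transform(dat, v = factor(v), block = factor(block))
X_fac <- model.matrix(~ trt * v + block, data = dat_f)

print(colnames(X_num))  # numeric coding
print(colnames(X_fac))  # factor coding

## The toy tmp values are exactly additive, so no model is fitted to
## them here; only the design matrices are inspected.

## Part 2: hypothetical unbalanced data (NOT from the post).
set.seed(1)
a <- factor(c(rep("a1", 8), rep("a2", 4)))
b <- factor(c(rep("b1", 6), rep("b2", 2), rep("b1", 1), rep("b2", 3)))
y <- rnorm(12) + (a == "a2") + 2 * (b == "b2")

fit_ab <- lm(y ~ a + b)  # a entered first
fit_ba <- lm(y ~ b + a)  # a entered last

## Sequential (Type I) sum of squares for a: depends on its position.
ss_ab <- anova(fit_ab)["a", "Sum Sq"]
ss_ba <- anova(fit_ba)["a", "Sum Sq"]
print(c(a_first = ss_ab, a_last = ss_ba))

## Marginal test for a (a adjusted for b): same in both orderings.
m_ab <- drop1(fit_ab, test = "F")["a", "Sum of Sq"]
m_ba <- drop1(fit_ba, test = "F")["a", "Sum of Sq"]
print(c(from_ab = m_ab, from_ba = m_ba))
```

The order dependence is therefore a property of sequential sums of squares on unbalanced data, not a sign that R computes anything inaccurately. In a fully balanced layout such as the toy table in the post, the terms are orthogonal and the order should make no difference. For models with interactions, drop1() only drops terms that respect marginality, so it is not a full substitute for SAS-style Type III tables; the Anova() function in the car package with type = "III" is one commonly used route, and it needs suitable contrasts to be meaningful.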

→

bmka

03/31 21:17, , 1^F

→

bmka

03/31 21:18, , 2^F

→

gsuper

03/31 21:38, , 3^F

→