On the use of careful study design for molecular biomarker discovery and molecular classification

Date: 2017-07-14
William Hill Academic Seminar
Title: On the use of careful study design for molecular biomarker discovery and molecular classification
Speaker: Li-Xuan Qin, Associate Member, Memorial Sloan Kettering Cancer Center
Time: 14:00-15:00, Thursday, July 25, 2017
Location: Room 311, Wang Ke-Zhen Building, Peking University
Abstract:
Purpose: Reproducibility of scientific experimentation has become a major concern, due to the perception that many published molecular studies cannot be replicated. Careful study design (based on statistical principles such as blocking, stratification, and randomization) has the potential to improve the quality of molecular data and the reproducibility of the scientific inference drawn from the data. However, it has rarely been used in practice. We set out to demonstrate the logistical feasibility of careful study design in molecular studies and its scientific benefits for discovering molecular biomarkers and developing molecular classifiers.
Methods: We conducted a microRNA study of endometrial tumors (n=96) and ovarian tumors (n=96), using uniform handling and blocked randomization in the array-to-sample-group assignment to prevent handling effects. We then profiled the same set of tumors a second time with no blocking, randomization, or uniform handling. For molecular biomarker discovery, we assessed empirical evidence of differential expression between the two tumor types in each study, and also conducted simulation studies based on ‘virtual re-hybridization’ to evaluate the benefits of various forms of study design in the presence of handling effects. For molecular classification, we examined the validity of the cross-validation technique for error estimation, and its dependence on balanced array-to-sample assignment, using virtual re-hybridizations based on the paired datasets.
Results: There was moderate and asymmetric differential expression (351/3,523, 10%) between endometrial and ovarian tumors in the first dataset (which was carefully designed). Handling effects were observed in the second dataset, and 1,934 markers (55%) were called differentially expressed (DE), among which 181 were deemed DE (181/351, 53%) and 1,749 non-DE (1,749/1,934, 90%) in the first dataset. Normalization improved the detection of true positive markers but was still associated with a false positive rate as high as 50%. In the simulation study, when randomization was applied to all samples at once or within each of multiple batches balanced in sample groups (that is, stratification), blocking improved the true positive rate (TPR) from 0.95 to 0.97 and reduced the false positive rate (FPR) from 0.02 to 0.002; when sample batches were unbalanced, randomization within each batch was associated with a 0.92 TPR and a 0.10 FPR regardless of blocking. For the problem of molecular classification, our study showed that (1) cross-validation tended to under-estimate the error rate when the data possessed confounding handling effects, (2) depending on the relative amount of handling effects, normalization may further worsen the under-estimation of the error rate, and (3) balanced assignment of arrays to comparison groups (via blocking or stratification) allowed cross-validation to provide an unbiased error estimate.
Conclusion: Through empirical and simulated studies, we showed that balanced array assignment can effectively improve the accuracy of detecting disease markers. We also showed that balanced array assignment can restore the validity of cross-validation for error estimation in molecular classification. Careful study design based on blocking, stratification, and randomization should be used to more fully reap the benefits of genomics technologies.
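Illustration (not part of the speaker's abstract): the blocked randomization described in the Methods can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the procedure used in the study: the block size of 8 arrays, the sample IDs, the seed, and the function name `blocked_randomization` are all hypothetical. Each block receives an equal number of samples from each tumor type, and the order within a block is shuffled.

```python
import random
from collections import Counter


def blocked_randomization(samples, block_size, seed=0):
    """Assign (sample_id, group) pairs to blocks of arrays.

    Every block holds the same number of samples from each group
    (blocking), and which samples land in which block, as well as
    their order inside the block, is random (randomization).
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)

    # Split the samples by comparison group.
    by_group = {}
    for sample_id, group in samples:
        by_group.setdefault(group, []).append((sample_id, group))
    groups = sorted(by_group)

    # This sketch only handles the fully balanced case.
    per_group = block_size // len(groups)
    if per_group * len(groups) != block_size:
        raise ValueError("block_size must be a multiple of the number of groups")
    sizes = {len(members) for members in by_group.values()}
    if len(sizes) != 1 or next(iter(sizes)) % per_group != 0:
        raise ValueError("groups must be equal in size and fill whole blocks")

    # Randomize which samples of each group go to which block.
    for members in by_group.values():
        rng.shuffle(members)

    n_blocks = len(by_group[groups[0]]) // per_group
    blocks = []
    for b in range(n_blocks):
        block = []
        for group in groups:
            block.extend(by_group[group][b * per_group:(b + 1) * per_group])
        rng.shuffle(block)  # randomize array position within the block
        blocks.append(block)
    return blocks


# Hypothetical sample IDs: 96 endometrial and 96 ovarian tumors.
samples = [(f"E{i:02d}", "endometrial") for i in range(96)]
samples += [(f"O{i:02d}", "ovarian") for i in range(96)]

blocks = blocked_randomization(samples, block_size=8, seed=2017)
for block in blocks[:2]:
    print(Counter(group for _, group in block), [sid for sid, _ in block])
```

Because every block contains the same mix of tumor types, any handling effect tied to a block is spread evenly across the two comparison groups rather than being confounded with tumor type.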
Speaker Bio:
Dr. Qin develops statistical methods for high-throughput data analysis problems to address clinically important questions in cancer research. Within this framework, her current research primarily focuses on the use of statistical experimental design, exploratory data analysis, and mixture models for cancer biomarker discovery, and on genomic and translational studies of soft tissue sarcoma. She serves as the Corresponding Co-Director of the Biostatistics and Bioinformatics Core in the Soft Tissue Sarcoma Specialized Program of Research Excellence, and received an NIH R01 grant to develop efficient statistical methods for analyzing microRNA array and sequencing data.
All faculty and students are welcome to attend!