Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9424
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 陳秀熙 | |
dc.contributor.author | Kuang-Yi Chang | en |
dc.contributor.author | 張光宜 | zh_TW |
dc.date.accessioned | 2021-05-20T20:21:57Z | - |
dc.date.available | 2016-10-03 | |
dc.date.available | 2021-05-20T20:21:57Z | - |
dc.date.copyright | 2011-10-03 | |
dc.date.issued | 2011 | |
dc.date.submitted | 2011-08-11 | |
dc.identifier.citation | References
Albert, J. H., & Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422), 669-679. Allen, M. J., & Yen, W. M. (1979). Introduction to Measurement Theory. Monterey, CA: Brooks/Cole Publishing. Amtmann, D., Cook, K. F., Jensen, M. P., Chen, W. H., Choi, S., Revicki, D., . . . Lai, J.-S. (2010). Development of a PROMIS item bank to measure pain interference. Pain, 150(1), 173-182. Andrich, D. (1988). Rasch Models for Measurement. Newbury Park, CA: Sage Publications. Aronson, S., Butler, A., Subhiyah, R., Buckingham Jr, R. E., Cahalan, M. K., Konstandt, S., . . . Thys, D. (2002). Development and analysis of a new certifying examination in perioperative transesophageal echocardiography. Anesthesia & Analgesia, 95(6), 1476-1482. Baker, F. B., & Kim, S. H. (2004). Item Response Theory: Parameter Estimation Techniques (2nd ed.). New York: Marcel Dekker. Best, N., Cowles, M. K., & Vines, K. (1996). CODA: Convergence diagnosis and output analysis software for Gibbs sampling output, version 0.3. Cambridge: MRC Biostatistics Unit. Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick (Eds.), Statistical Theories of Mental Test Scores (pp. 395-479). Reading: Addison-Wesley. Boeckstyns, M. E. (1987). Development and construct validity of a knee pain questionnaire. Pain, 31(1), 47-52. Bond, T. G., & Fox, C. M. (2007). Applying the Rasch Model: Fundamental Measurement in the Human Sciences (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates. Borsting, E., Chase, C. H., & Ridder, W. H., 3rd. (2007). Measuring visual discomfort in college students. Optometry & Vision Science, 84(8), 745-751. Bould, M. D., Crabtree, N. A., & Naik, V. N. (2009). Assessment of procedural skills in anaesthesia. British Journal of Anaesthesia, 103(4), 472-483. Byrne, A. J., & Greaves, J. D. (2001). 
Assessment instruments used during anaesthetic simulation: review of published studies. British Journal of Anaesthesia, 86(3), 445-450. Casella, G., & Berger, R. L. (2002). Statistical Inference (2nd ed.). Pacific Grove, CA: Duxbury/Thomson Learning. Chang, K. Y., Tsou, M. Y., Chan, K. H., Chang, S. H., Tai, J. J., & Chen, H. H. (2010). Item analysis for the written test of Taiwanese board certification examination in anaesthesiology using the Rasch model. British Journal of Anaesthesia, 104(6), 717-722. Clauser, B. E., Margolis, M. J., & Case, S. M. (2006). Testing for licensure and certification in the professions. In R. L. Brennan (Ed.), Educational Measurement (4th ed., pp. 701-728). Westport: Praeger Publishers. Cohen, J. (1983). The Cost of Dichotomization. Applied Psychological Measurement, 7(3), 249-253. Cook, C. E., Richardson, J. K., Pietrobon, R., Braga, L., Silva, H. M., & Turner, D. (2006). Validation of the NHANES ADL scale in a sample of patients with report of cervical pain: factor analysis, item response theory analysis, and line item validity. Disability & Rehabilitation, 28(15), 929-935. Crocker, L. M., & Algina, J. (1986). Introduction to Classical and Modern Test Theory. New York: Holt, Rinehart, and Winston, Inc. Cronbach, L. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16(3), 297-334. Davidson, M. (2008). Rasch analysis of three versions of the Oswestry Disability Questionnaire. Manual therapy, 13(3), 222-231. de Ayala, R. (2008). The Theory and Practice of Item Response Theory. New York: Guilford Press. de la Torre, J. (2008). Multidimensional scoring of abilities: The ordered polytomous response case. Applied Psychological Measurement, 32(5), 355-370. Decruynaere, C., Thonnard, J. L., & Plaghki, L. (2007). Measure of experimental pain using Rasch analysis. European Journal of Pain, 11(4), 469-474. Decruynaere, C., Thonnard, J. L., & Plaghki, L. (2009). 
How many response levels do children distinguish on faces scales for pain assessment? European Journal of Pain, 13(6), 641-648. Embretson, S. E., & Reise, S. P. (2000). Item Response Theory for Psychologists. Mahwah, NJ: Lawrence Erlbaum Associates. Fisher, W. P. (1992). Reliability statistics. Rasch Measurement Transactions, 6, 238. Fleiss, J., Levin, B., & Paik, M. (2003). Statistical Methods for Rates and Proportions (3rd ed.). Hoboken, NJ: Wiley-Interscience. Fox, J. P. (2010). Bayesian Item Response Modeling: Theory and Applications. New York: Springer. Garibaldi, R. A., Subhiyah, R., Moore, M. E., & Waxman, H. (2002). The in-training examination in internal medicine: an analysis of resident performance over time. Annals of Internal Medicine, 137(6), 505-510. Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7(4), 457-472. Geweke, J. (1991). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments. In J. Bernardo, J. Berger, A. Dawid & A. Smith (Eds.), Bayesian Statistics (Vol. 4, pp. 169-194). Oxford: Clarendon Press. Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Markov Chain Monte Carlo in Practice. London: Chapman & Hall. Gill, J. (2008). Bayesian Methods: A Social and Behavioral Sciences Approach (2nd ed.). Boca Raton: Chapman & Hall/CRC. Greaves, J. D. (1997). Anaesthesia and the competence revolution. British Journal of Anaesthesia, 79(5), 555-557. Gregory, R. J. (2007). Psychological Testing: History, Principles, and Applications (5th ed.). Boston: Pearson/Allyn and Bacon. Hambleton, R. K., & Swaminathan, H. (1985). Item Response Theory: Principles and Applications. Boston: Kluwer-Nijhoff Publishing. Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. Newbury Park: Sage Publications. Heidelberger, P., & Welch, P. D. (1981). 
A spectral method for confidence interval generation and run length control in simulations. Communications of the ACM, 24(4), 233-245. Heidelberger, P., & Welch, P. D. (1983). Simulation run length control in the presence of an initial transient. Operations Research, 31(6), 1109-1144. Houston, P., Kearney, R. A., & Savoldelli, G. (2006). The oral examination process - gold standard or fool's gold. Canadian Journal of Anaesthesia, 53(7), 639-642. Kaplan, R. M., & Saccuzzo, D. P. (2009). Psychological Testing: Principles, Applications, and Issues. Belmont, CA: Wadsworth Cengage Learning. Kim, J. S., & Bolt, D. M. (2007). Estimating item response theory models using Markov chain Monte Carlo methods. Educational Measurement: Issues and Practice, 26(4), 38-51. Kolen, M. J., & Brennan, R. L. (2004). Test Equating, Scaling, and Linking: Methods and Practices. New York: Springer. Kuder, G., & Richardson, M. (1937). The theory of the estimation of test reliability. Psychometrika, 2(3), 151-160. Lai, J. S., Dineen, K., Reeve, B. B., Von Roenn, J., Shervin, D., McGuire, M., . . . Cella, D. (2005). An item response theory-based pain item bank can enhance measurement precision. Journal of Pain and Symptom Management, 30(3), 278-288. Linacre, J. M. (1994). Sample size and item calibration stability. Rasch Measurement Transactions, 7(4), 328. Linacre, J. M. (1997). KR-20 or Rasch reliability: Which tells the 'truth'? Rasch Measurement Transactions, 11(3), 580-581. Linacre, J. M. (2002). Optimizing rating scale category effectiveness. Journal of Applied Measurement, 3(1), 85-106. Lord, F. M. (1974). Estimation of latent ability and item parameters when there are omitted responses. Psychometrika, 39(2), 247-264. Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale: Lawrence Erlbaum Associates. Lunn, D. J., Thomas, A., Best, N., & Spiegelhalter, D. J. (2000). 
WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing, 10(4), 325-337. Lunz, M., & Bashook, P. (2008). Relationship between candidate communication ability and oral certification examination scores. Medical Education, 42(12), 1227-1233. Lynch, S. M. (2007). Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. New York: Springer. Masters, G. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-174. McDonald, R. P. (1999). Test Theory: A Unified Treatment. Mahwah, NJ: Lawrence Erlbaum Associates. Merbitz, C., Morris, J., & Grip, J. C. (1989). Ordinal scales and foundations of misinference. Archives of Physical Medicine and Rehabilitation, 70(4), 308-312. Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159-176. Murphy, K. R., & Davidshofer, C. O. (2005). Psychological Testing: Principles and Applications. Upper Saddle River, NJ: Pearson/Prentice Hall. Ntzoufras, I. (2009). Bayesian Modeling Using WinBUGS. Hoboken, NJ: Wiley. Nunnally, J. C., & Bernstein, I. H. (1994). Psychometric theory. New York: McGraw-Hill. O'Malley, K. J., Suarez-Almazor, M., Aniol, J., Richardson, P., Kuykendall, D. H., Moseley, J. B., Jr., & Wray, N. P. (2003). Joint-specific multidimensional assessment of pain (J-MAP): Factor structure, reliability, validity, and responsiveness in patients with knee osteoarthritis. Journal of Rheumatology, 30(3), 534-543. O'Neill, T. R., Marks, C. M., & Reynolds, M. (2005). Re-evaluating the NCLEX-RN passing standard. Journal of Nursing Measurement, 13(2), 147-165. Ostini, R., & Nering, M. L. (2006). Polytomous Item Response Theory Models. Thousand Oaks, CA: Sage Publications. Page, S. J., Shawaryn, M. A., Cernich, A. N., & Linacre, J. M. (2002). Scaling of the revised Oswestry low back pain questionnaire. Archives of Physical Medicine and Rehabilitation, 83(11), 1579-1584. 
Patz, R. J., & Junker, B. W. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2), 146-178. Pesudovs, K., & Noble, B. A. (2005). Improving subjective scaling of pain using Rasch analysis. Journal of Pain, 6(9), 630-636. Raftery, A. E., & Lewis, S. M. (1992a). How many iterations in the Gibbs sampler? In J. Bernardo, J. Berger, A. Dawid & A. Smith (Eds.), Bayesian Statistics (Vol. 4, pp. 763-773). Oxford: Clarendon Press. Raftery, A. E., & Lewis, S. M. (1992b). One long run with diagnostics: Implementation strategies for Markov chain Monte Carlo. Statistical Science, 7(4), 493-497. Rasch, G. (1960). Probabilistic Models for Some Intelligence and Attainment Tests (Vol. 1). Copenhagen: Danmarks paedagogiske Institut. Reise, S. P., & Haviland, M. G. (2005). Item response theory and the measurement of clinical change. Journal of Personality Assessment, 84(3), 228-238. Revicki, D. A., Chen, W. H., Harnam, N., Cook, K. F., Amtmann, D., Callahan, L. F., . . . Keefe, F. J. (2009). Development and psychometric analysis of the PROMIS pain behavior item bank. Pain, 146(1-2), 158-169. Samejima, F. (1997). Graded response model. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of Modern Item Response Theory (pp. 85-100). New York: Springer. Schumacker, R. E. (2004). Rasch Measurement: The Dichotomous Model. In E. V. Smith & R. M. Smith (Eds.), Introduction to Rasch Measurement: Theory, Models and Applications (pp. 226-253). Maple Grove: JAM Press. Sheu, C., Chen, C., Su, Y., & Wang, W. (2005). Using SAS PROC NLMIXED to fit item response theory models. Behavior Research Methods, 37(2), 202-218. Smith Jr, E. V. (2001). Evidence for the reliability of measures and validity of measure interpretation: a Rasch measurement perspective. Journal of Applied Measurement, 2(3), 281-311. Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). 
Bayesian Measures of Model Complexity and Fit. Journal of the Royal Statistical Society. Series B (Statistical Methodology), 64(4), 583-639. Spiegelhalter, D. J., Thomas, A., Best, N., & Gilks, W. (1996). BUGS 0.5: Bayesian inference using Gibbs sampling manual (version ii). Cambridge: MRC Biostatistics Unit. Spiegelhalter, D. J., Thomas, A., Best, N., & Lunn, D. (2003). WinBUGS User Manual, Version 1.4. Cambridge: MRC Biostatistics Unit. Swaminathan, H., & Gifford, J. (1985). Bayesian estimation in the two-parameter logistic model. Psychometrika, 50(3), 349-364. Swaminathan, H., & Gifford, J. (1986). Bayesian estimation in the three-parameter logistic model. Psychometrika, 51(4), 589-601. Tesio, L., Granger, C. V., & Fiedler, R. C. (1997). A unidimensional pain/disability measure for low-back pain syndromes. Pain, 69(3), 269-278. Thomeé, R., Grimby, G., Wright, B. D., & Linacre, J. M. (1995). Rasch analysis of Visual Analog Scale measurements before and after treatment of Patellofemoral Pain Syndrome in women. Scandinavian Journal of Rehabilitation Medicine, 27(3), 145-151. Valderas, J. M., & Alonso, J. (2008). Patient reported outcome measures: a model-based classification system for research and clinical practice. Quality of Life Research, 17(9), 1125-1135. Varni, J. W., Stucky, B. D., Thissen, D., Dewitt, E. M., Irwin, D. E., Lai, J. S., . . . Dewalt, D. A. (2010). PROMIS pediatric pain interference scale: An item response theory analysis of the pediatric pain item bank. Journal of Pain. von Eye, A., & Mun, E. Y. (2005). Analyzing Rater Agreement: Manifest Variable Methods. Mahwah, NJ: Lawrence Erlbaum Associates. White, L. J., & Velozo, C. A. (2002). The use of Rasch measurement to improve the Oswestry classification scheme. Archives of Physical Medicine and Rehabilitation, 83(6), 822-831. Wolfe, F. (2003). Pain extent and diagnosis: development and validation of the regional pain scale in 12,799 patients with rheumatic disease. 
Journal of Rheumatology, 30(2), 369-378. Wright, B. D., Linacre, J. M., Gustafson, J. E., & Martin-Löf, P. (1994). Reasonable mean-square fit values. Rasch Measurement Transactions, 8(3), 370. Wright, B. D., & Masters, G. N. (1982). Rating Scale Analysis. Chicago: MESA Press. Wright, B. D., & Stone, M. H. (1979). Best Test Design: Rasch Measurement. Chicago: MESA Press. Yang, S. C., Tsou, M. Y., Chen, E. T., Chan, K. H., & Chang, K. Y. (2011). Statistical item analysis of the examination in anesthesiology for medical students using the Rasch model. Journal of the Chinese Medical Association, 74(3), 125-129. Yen, W. M., & Fitzpatrick, A. R. (2006). Item response theory. In R. L. Brennan (Ed.), Educational Measurement (4th ed., pp. 111-153). Westport, CT: Praeger Publishers. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/9424 | - |
dc.description.abstract | 專科醫師測驗的目的在於評估考生是否足以勝任執業的最低要求。雖然專科醫師測驗對醫療照護品質而言相當重要,但是目前仍缺乏相關研究對專科醫師測驗結果進行詳盡的試題分析。事實上測驗中的項目反應隱含豐富且有價值的測驗訊息值得進行更多的相關研究。有鑑於此,本研究的主要目的為利用項目反應模式針對2007至2010年的台灣麻醉專科醫師筆試測驗進行廣泛的項目反應分析。
這四個年度的麻醉專科醫師筆試測驗均為100道單選題,應考人數介於34至37人之間。本研究採用兩種不同分析策略進行項目反應分析,先利用最大概似估計法估計模式參數與測驗信度,再將貝氏項目反應分析應用在更複雜模式的參數估計、不同模式的模式比較、評估共變數對考生能力的影響與多階層項目反應分析。 研究結果顯示這四個年度台灣麻醉專科醫師筆試測驗的信度介於0.71至0.75之間。兩種估計方式都可以得到單參數項目反應模式的考生能力與試題難度參數。但是在估計更複雜的雙參數與三參數模式時,最大概似估計法會遭遇無法收斂的問題。而貝氏法所得到的三參數模式估計結果顯示有過度參數化的疑慮,因此將所有猜題參數設為相等重新進行分析後發現這個共同參數的值接近於0。模式比較結果有利於採用單參數項目反應模式。而所收集到諸如考生年齡、性別與其訓練中心地理位置等變項對考生能力皆無顯著影響,階層項目反應分析結果顯示來自於同一中心考生彼此間的能力有相關性存在。 本研究證實了針對台灣麻醉專科醫師筆試測驗所進行的項目反應分析可以為將來的命題提供有用的資訊,而貝氏項目反應分析的彈性與多功能性對台灣麻醉專科醫師測驗的試題分析具有重大價值。 | zh_TW |
dc.description.abstract | Board certification examinations for medical specialists aim to evaluate whether an examinee meets the minimum requirements for clinical practice. Although board certification examinations are of paramount importance to the quality of medical care, there is still a lack of thorough investigations focusing on item response analyses of board certification examinations in a medical specialty. Item responses in a test are influenced by examinee ability and item difficulty, which warrant an in-depth statistical analysis. Therefore, the major goal of this thesis was to conduct comprehensive item response analyses of the written tests of the Taiwanese board certification examinations in anesthesiology from 2007 to 2010 using a series of item response theory models.
Data were derived from the one hundred single-best-answer multiple-choice items included in each certification examination. The number of examinees ranged from 34 to 37 per year over the four years. Two analytical strategies were applied to the item response analyses of the written tests of the Taiwanese board certification examinations in anesthesiology. The maximum likelihood estimation (MLE) method was used first to estimate the examinee ability and item difficulty parameters and to evaluate test reliability based on the one-parameter logistic (1-PL) model, also known as the Rasch model. Bayesian item response analyses were then applied to more complicated item response models, including the two-parameter logistic (2-PL, considering item discrimination) and three-parameter logistic (3-PL, additionally considering a guessing parameter) models. The Bayesian approach was also used to assess the effects of covariates such as age, gender, and geographic area on examinee ability. A Bayesian multilevel model was adopted to accommodate hierarchical data arising from the correlation of item responses within the same training center. The test reliability of the written tests ranged between 0.71 and 0.75 over the four years. Both analytical approaches could estimate the examinee ability and item difficulty parameters of the one-parameter logistic item response model, but the MLE method encountered convergence problems during parameter estimation of the 2-PL and 3-PL item response models. The Bayesian 3-PL model without restrictions on the guessing parameters showed signs of overparameterization. The common guessing parameter in the restricted 3-PL model with the Bayesian approach was close to 0 in all the certification examinations in anesthesiology held during the four-year study period. Model comparisons based on the deviance information criterion provided evidence in favor of the 1-PL model. 
The effects of examinee characteristics such as gender, age, and location of training center on examinee ability were not statistically significant. The application of the multilevel Bayesian model to the hierarchical data revealed a correlation between the ability levels of examinees from the same training center, although the effect of training center on examinee ability was not salient. This thesis demonstrates that item response analyses of the written tests of the Taiwanese board certification examinations can provide useful information for future test development. The flexibility and versatility of Bayesian item response analyses are of great value for the analysis of the written tests of the Taiwanese board certification examinations in anesthesiology. | en |
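The two-stage strategy in the abstract (maximum likelihood first, Bayesian extensions second) can be illustrated for the first stage with a small simulation. The sketch below is a hedged illustration, not the thesis's code: it assumes simulated responses with dimensions loosely matching those described (about 35 examinees, 100 dichotomous items), generates answers from a 1-PL (Rasch) model, and recovers item difficulties by joint maximum likelihood via gradient ascent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated test loosely matching the setting above: 35 examinees, 100
# dichotomous items.  All values are illustrative, not the real exam data.
n_persons, n_items = 35, 100
theta_true = rng.normal(0.0, 1.0, n_persons)   # examinee abilities
b_true = rng.normal(0.0, 1.0, n_items)         # item difficulties

def irf(theta, b):
    """1-PL (Rasch) item response function: probability of a correct answer."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

# Draw a 0/1 response matrix (persons x items).
X = (rng.random((n_persons, n_items)) < irf(theta_true, b_true)).astype(float)

# Joint maximum likelihood by alternating gradient ascent.  The gradient of
# the log-likelihood is the residual matrix X - P, summed over items for each
# ability and (negated) summed over persons for each difficulty.
theta_hat = np.zeros(n_persons)
b_hat = np.zeros(n_items)
lr = 0.02
for _ in range(3000):
    resid = X - irf(theta_hat, b_hat)
    theta_hat += lr * resid.sum(axis=1)
    b_hat -= lr * resid.sum(axis=0)
    b_hat -= b_hat.mean()          # identifiability: centre difficulties at 0

r = np.corrcoef(b_hat, b_true)[0, 1]
print(f"correlation between true and estimated difficulties: {r:.2f}")
```

With only around 35 examinees per item, as in the examinations described above, difficulty estimates carry sizable standard errors, so the recovered difficulties correlate with, but do not equal, the generating values.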
dc.description.provenance | Made available in DSpace on 2021-05-20T20:21:57Z (GMT). No. of bitstreams: 1 ntu-100-D95842008-1.pdf: 5611602 bytes, checksum: 478de5268b500fd043d7ac1333c067f3 (MD5) Previous issue date: 2011 | en |
dc.description.tableofcontents | Contents
Certification by the Oral Defense Committee i
Acknowledgements ii
Abstract (Chinese) iii
Abstract iv
1. Introduction 1
2. Literature Review 4
2.1 Applications of IRT in the field of anesthesiology 4
2.1.1 Applications of IRT in examinations in anesthesiology 4
2.1.2 Application of IRT in pain measurement 8
2.2 Item response theory 15
2.2.1 Assumptions 15
2.2.2 Various item response models 15
2.2.2.1 Dichotomous item response model 15
2.3 Bayesian approach to item response analysis 30
2.3.1 Brief overview of Bayesian inference 30
2.3.2 Implementation of Bayesian approaches 32
2.3.3 Bayesian item response analysis 39
3. Methods 41
3.1 Data sources 41
3.2 Variables 42
3.3 Parameter estimation 44
3.4 Analytical approach 45
3.4.1 Traditional approach 45
3.4.2 Bayesian approach 47
4. Results 51
4.1 Characteristics of examinees and results of the examinations 51
4.2 Traditional item response analysis 51
4.2.1 One-parameter logistic item response analysis (the Rasch model) 51
4.3 Bayesian item response analysis 60
4.3.1 Parameter estimation 60
4.3.2 Model comparisons 68
4.3.3 Effects of covariates on examinee ability 69
4.3.4 Multilevel item response model 71
4.3.5 Convergence diagnostics 75
5. Discussion 77
6. Conclusion 82
References 83
Appendix 90
Appendix 1 WinBUGS code for the 1-PL item response model 90
Appendix 2 WinBUGS code for the 2-PL item response model 91
Appendix 3 WinBUGS code for the 3-PL item response model 92
Appendix 4 Item distribution map of the certification examinations held in 2008, 2009 and 2010 93
Appendix 5 WinBUGS code for the evaluation of covariate effects 96
Appendix 6 WinBUGS code for multilevel item response model 97 | |
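The appendices contain the thesis's WinBUGS model code, which is not reproduced in this record. As a hedged stand-in on simulated data (all sizes and tuning constants below are assumptions, not values from the thesis), the Python sketch shows the same two ingredients in miniature: posterior sampling for a 1-PL model via Metropolis-within-Gibbs with standard normal priors, and the deviance information criterion (DIC) used for the model comparisons.

```python
import numpy as np

rng = np.random.default_rng(1)

# Small simulated 1-PL dataset so the sampler runs in seconds.
n_persons, n_items = 15, 30
theta_true = rng.normal(0, 1, n_persons)
b_true = rng.normal(0, 1, n_items)
eta_true = theta_true[:, None] - b_true[None, :]
X = (rng.random((n_persons, n_items)) < 1 / (1 + np.exp(-eta_true))).astype(float)

def loglik(theta, b):
    """Elementwise Bernoulli-logit log-likelihood (persons x items)."""
    eta = theta[:, None] - b[None, :]
    return X * eta - np.log1p(np.exp(eta))

# Metropolis-within-Gibbs: given the difficulties, persons contribute
# independent likelihood factors, so each ability gets its own accept/reject
# decision (and symmetrically for the items).
theta = np.zeros(n_persons)
b = np.zeros(n_items)
dev, th_sum, b_sum, n_kept = [], 0.0, 0.0, 0
for it in range(6000):
    prop = theta + 0.5 * rng.normal(size=n_persons)
    logr = (loglik(prop, b).sum(axis=1) - 0.5 * prop**2) \
         - (loglik(theta, b).sum(axis=1) - 0.5 * theta**2)
    theta = np.where(np.log(rng.random(n_persons)) < logr, prop, theta)

    prop = b + 0.5 * rng.normal(size=n_items)
    logr = (loglik(theta, prop).sum(axis=0) - 0.5 * prop**2) \
         - (loglik(theta, b).sum(axis=0) - 0.5 * b**2)
    b = np.where(np.log(rng.random(n_items)) < logr, prop, b)

    if it >= 2000 and it % 5 == 0:           # burn-in, then thinning
        dev.append(-2 * loglik(theta, b).sum())
        th_sum, b_sum, n_kept = th_sum + theta, b_sum + b, n_kept + 1

# DIC = Dbar + pD with pD = Dbar - D(posterior means)
# (Spiegelhalter et al., 2002).
d_bar = np.mean(dev)
d_hat = -2 * loglik(th_sum / n_kept, b_sum / n_kept).sum()
p_d = d_bar - d_hat
dic = d_bar + p_d
print(f"Dbar={d_bar:.1f}  pD={p_d:.1f}  DIC={dic:.1f}")
```

In the thesis the corresponding DIC values, computed by WinBUGS for the 1-PL, 2-PL, and 3-PL fits, are compared directly: the model with the smallest DIC is preferred, which is how the evidence in favor of the 1-PL model was obtained.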
dc.language.iso | en | |
dc.title | 麻醉學專科筆試項目反應分析 | zh_TW |
dc.title | Item Response Analysis for Data on Written Examinations in Anesthesiology | en |
dc.type | Thesis | |
dc.date.schoolyear | 99-2 | |
dc.description.degree | Ph.D. | |
dc.contributor.oralexamcommittee | 呂炳榮,劉宏輝,于承平,張淑惠,鄭宗記,戴政 | |
dc.subject.keyword | 麻醉學,貝氏法,專科醫師甄審,項目反應模式,最大概似法,筆試測驗 | zh_TW |
dc.subject.keyword | Anesthesiology, Bayesian approach, board certification examination, item response model, maximum likelihood, written examination | en |
dc.relation.page | 98 | |
dc.rights.note | Authorized (open access worldwide) | |
dc.date.accepted | 2011-08-11 | |
dc.contributor.author-college | 公共衛生學院 | zh_TW |
dc.contributor.author-dept | 流行病學與預防醫學研究所 | zh_TW |
Appears in Collections: | Institute of Epidemiology and Preventive Medicine
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-100-1.pdf | 5.48 MB | Adobe PDF | View/Open |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.