http://dx.doi.org/10.3743/KOSIM.2019.36.1.191

The Stream of Uncertainty in Scientific Knowledge using Topic Modeling  

Heo, Go Eun (Department of Library and Information Science, Yonsei University)
Publication Information
Journal of the Korean Society for Information Management / v.36, no.1, 2019, pp. 191-213
Abstract
Scientific knowledge is obtained through research. Researchers work through the uncertainty of science to establish certainty in scientific knowledge; in other words, uncertainty is an essential stage that must be passed through to obtain scientific knowledge. Existing studies have mostly examined hedging from a linguistic perspective or, in computational linguistics, manually constructed corpora annotated with uncertainty words. They have only been able to identify the characteristics of uncertainty in a particular research field on the basis of simple frequency. This study therefore examines how patterns of scientific knowledge, as expressed through uncertainty words, change over time in the biomedical literature, where the claims made in sentences play an important role. To this end, biomedical propositions are analyzed using semantic predications provided by UMLS, and DMR topic modeling, a method useful for identifying patterns within disciplines, is applied to trace the trends of entity-based topics with uncertainty. The results confirm that, as research develops over time, uncertainty in scientific knowledge follows a decreasing pattern.
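As a rough illustration of what tracking uncertainty over time can look like, the sketch below computes, for each year, the share of sentences that contain an uncertainty cue word. It is not the study's method: it does not implement DMR topic modeling or use UMLS semantic predications, and the cue list, the function name `uncertainty_share_by_year`, and the toy sentences are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical cue words; the study's actual cue list is not given in the abstract.
UNCERTAINTY_CUES = {"may", "might", "suggest", "possible", "likely"}


def uncertainty_share_by_year(records):
    """Return {year: share of sentences containing at least one cue word}.

    `records` is an iterable of (year, sentence) pairs.
    """
    totals = defaultdict(int)
    hedged = defaultdict(int)
    for year, sentence in records:
        # Lowercase each whitespace-separated token and strip common punctuation.
        tokens = {tok.strip(".,;:()").lower() for tok in sentence.split()}
        totals[year] += 1
        if tokens & UNCERTAINTY_CUES:
            hedged[year] += 1
    return {year: hedged[year] / totals[year] for year in sorted(totals)}


# Toy data, invented for illustration only.
toy = [
    (2000, "Drug A may inhibit protein B."),
    (2000, "Gene C might cause disease D."),
    (2010, "Drug A inhibits protein B."),
    (2010, "Gene C may affect disease D."),
]
shares = uncertainty_share_by_year(toy)
print(shares)
```

A falling share across years in such a table would correspond to the decreasing pattern the abstract reports, although the study derives its trend from topic-level modeling and not from raw counts like these.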
Keywords
text mining; uncertainty; DMR topic modeling; semantic predication; trend analysis;