• Title/Summary/Keyword: literature mining

Search Result 227, Processing Time 0.021 seconds

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics
    • /
    • v.2 no.2
    • /
    • pp.99-106
    • /
    • 2004
  • In this paper we introduce PubMiner, an intelligent machine learning based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning based data mining techniques for mining useful biological information such as protein­protein interaction from the massive literature. The system recognizes biological terms such as gene, protein, and enzymes and extracts their interactions described in the document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity that were collected from the related public databases to infer more interactions from the original interactions. An inferred interaction from the interaction analysis and native interaction are provided to the user with the link of literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The evaluation of inference proceeded using the protein interaction data of S. cerevisiae (bakers yeast) from MIPS and SGD.

A bio-text mining system using keywords and patterns in a grid environment

  • Kwon, Hyuk-Ryul;Jung, Tae-Sung;Kim, Kyoung-Ran;Jahng, Hye-Kyoung;Cho, Wan-Sup;Yoo, Jae-Soo
    • Proceedings of the Korea Society for Industrial Systems Conference
    • /
    • 2007.02a
    • /
    • pp.48-52
    • /
    • 2007
  • As huge amount of literature including biological data is being generated after post genome era, it becomes difficult for researcher to find useful knowledge from the biological databases. Bio-text mining and related natural language processing technique are the key issues in the intelligent knowledge retrieval from the biological databases. We propose a bio-text mining technique for the biologists who find Knowledge from the huge literature. At first, web robot is used to extract and transform related literature from remote databases. To improve retrieval speed, we generate an inverted file for keywords in the literature. Then, text mining system is used for extracting given knowledge patterns and keywords. Finally, we construct a grid computing environment to guarantee processing speed in the text mining even for huge literature databases. In the real experiment for 10,000 bio-literatures, the system shows 95% precision and 98% recall.

  • PDF

Biomedical Ontologies and Text Mining for Biomedicine and Healthcare: A Survey

  • Yoo, Ill-Hoi;Song, Min
    • Journal of Computing Science and Engineering
    • /
    • v.2 no.2
    • /
    • pp.109-136
    • /
    • 2008
  • In this survey paper, we discuss biomedical ontologies and major text mining techniques applied to biomedicine and healthcare. Biomedical ontologies such as UMLS are currently being adopted in text mining approaches because they provide domain knowledge for text mining approaches. In addition, biomedical ontologies enable us to resolve many linguistic problems when text mining approaches handle biomedical literature. As the first example of text mining, document clustering is surveyed. Because a document set is normally multiple topic, text mining approaches use document clustering as a preprocessing step to group similar documents. Additionally, document clustering is able to inform the biomedical literature searches required for the practice of evidence-based medicine. We introduce Swanson's UnDiscovered Public Knowledge (UDPK) model to generate biomedical hypotheses from biomedical literature such as MEDLINE by discovering novel connections among logically-related biomedical concepts. Another important area of text mining is document classification. Document classification is a valuable tool for biomedical tasks that involve large amounts of text. We survey well-known classification techniques in biomedicine. As the last example of text mining in biomedicine and healthcare, we survey information extraction. Information extraction is the process of scanning text for information relevant to some interest, including extracting entities, relations, and events. We also address techniques and issues of evaluating text mining applications in biomedicine and healthcare.

Research Trends on Literature Reviews in Scopus Journals by Authors from Indonesia, Japan, South Korea, Vietnam, Singapore, and Malaysia: A Bibliometric Analysis from 2003 to 2022

  • Prakoso Bhairawa Putera;Amelya Gustina
    • Asian Journal of Innovation and Policy
    • /
    • v.12 no.3
    • /
    • pp.304-322
    • /
    • 2023
  • Text data mining ('big data methods') is one of the most widely used approaches during the COVID-19 pandemic. In particular, text data mining on Scopus databases or Web of Science (WoS). Text data mining is widely used to collect literature for later bibliometric analysis, and in the end, it becomes a literature review article. Therefore, in this article, we reveal the trend of publication of literature reviews in Scopus journals from Indonesia, Japan, South Korea, Vietnam, Singapore, and Malaysia. This article describes two essential parts, namely 1) a comparison of international publication trends and subject area of literature review publications, and 2) a comparison of Top 5 for Authors, Affiliation, Source Title, and Collaboration Country.

Predicting stock price direction by using data mining methods : Emphasis on comparing single classifiers and ensemble classifiers

  • Eo, Kyun Sun;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.11
    • /
    • pp.111-116
    • /
    • 2017
  • This paper proposes a data mining approach to predicting stock price direction. Stock market fluctuates due to many factors. Therefore, predicting stock price direction has become an important issue in the field of stock market analysis. However, in literature, there are few studies applying data mining approaches to predicting the stock price direction. To contribute to literature, this paper proposes comparing single classifiers and ensemble classifiers. Single classifiers include logistic regression, decision tree, neural network, and support vector machine. Ensemble classifiers we consider are adaboost, random forest, bagging, stacking, and vote. For the sake of experiments, we garnered dataset from Korea Stock Exchange (KRX) ranging from 2008 to 2015. Data mining experiments using WEKA revealed that random forest, one of ensemble classifiers, shows best results in terms of metrics such as AUC (area under the ROC curve) and accuracy.

A Study on Research Trend Analysis and Topic Class Prediction of Digital Transformation using Text Mining

  • Lee, JeeYoung
    • International journal of advanced smart convergence
    • /
    • v.8 no.2
    • /
    • pp.183-190
    • /
    • 2019
  • In the era of the Fourth Industrial Revolution, digital transformation, which means changes in all industrial structures, politics, economics and society as well as IT technology, is an important issue. It is difficult to know which research topic is being studied because digital transformation is being studied in various fields. Convergence research is possible because a research topic is studied in various fields such as computer science area and Decision science area. However, it is difficult to know the specific research status of the research topic. In this study, eight research topics were derived using the topic modeling technique of text mining for abstract of academic literature and the trend of each topic was analyzed. We also proposed to create a Topic-Word Proportions Table in the LDA based Topic modeling process to predict the topic of new literature. The results of this study are expected to contribute to advanced convergence research on topic of digital transformation. It is expected that the literature related to each research topic will be grasped and contribute to the design of a new convergence research.

Association Analysis of Reactive Oxygen Species-Hypertension Genes Discovered by Literature Mining

  • Lim, Ji Eun;Hong, Kyung-Won;Jin, Hyun-Seok;Oh, Bermseok
    • Genomics & Informatics
    • /
    • v.10 no.4
    • /
    • pp.244-248
    • /
    • 2012
  • Oxidative stress, which results in an excessive product of reactive oxygen species (ROS), is one of the fundamental mechanisms of the development of hypertension. In the vascular system, ROS have physical and pathophysiological roles in vascular remodeling and endothelial dysfunction. In this study, ROS-hypertension-related genes were collected by the biological literature-mining tools, such as SciMiner and gene2pubmed, in order to identify the genes that would cause hypertension through ROS. Further, single nucleotide polymorphisms (SNPs) located within these gene regions were examined statistically for their association with hypertension in 6,419 Korean individuals, and pathway enrichment analysis using the associated genes was performed. The 2,945 SNPs of 237 ROS-hypertension genes were analyzed, and 68 genes were significantly associated with hypertension (p < 0.05). The most significant SNP was rs2889611 within MAPK8 (p = $2.70{\times}10^{-5}$; odds ratio, 0.82; confidence interval, 0.75 to 0.90). This study demonstrates that a text mining approach combined with association analysis may be useful to identify the candidate genes that cause hypertension through ROS or oxidative stress.

Enhanced Hybrid Privacy Preserving Data Mining Technique

  • Kundeti Naga Prasanthi;M V P Chandra Sekhara Rao;Ch Sudha Sree;P Seshu Babu
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.6
    • /
    • pp.99-106
    • /
    • 2023
  • Now a days, large volumes of data is accumulating in every field due to increase in capacity of storage devices. These large volumes of data can be applied with data mining for finding useful patterns which can be used for business growth, improving services, improving health conditions etc. Data from different sources can be combined before applying data mining. The data thus gathered can be misused for identity theft, fake credit/debit card transactions, etc. To overcome this, data mining techniques which provide privacy are required. There are several privacy preserving data mining techniques available in literature like randomization, perturbation, anonymization etc. This paper proposes an Enhanced Hybrid Privacy Preserving Data Mining(EHPPDM) technique. The proposed technique provides more privacy of data than existing techniques while providing better classification accuracy. The experimental results show that classification accuracies have increased using EHPPDM technique.

BIOLOGY ORIENTED TARGET SPECIFIC LITERATURE MINING FOR GPCR PATHWAY EXTRACTION (GPCR 경로 추출을 위한 생물학 기반의 목적지향 텍스트 마이닝 시스템)

  • KIm, Eun-Ju;Jung, Seol-Kyoung;Yi, Eun-Ji;Lee, Gary-Geunbae;Park, Soo-Jun
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2003.10a
    • /
    • pp.86-94
    • /
    • 2003
  • Electronically available biological literature has been accumulated exponentially in the course of time. So, researches on automatically acquiring knowledge from these tremendous data by text mining technology become more and more prosperous. However, most of the previous researches are technology oriented and are not well focused in practical extraction target, hence result in low performance and inconvenience for the bio-researchers to actually use. In this paper, we propose a more biology oriented target domain specific text mining system, that is, POSTECH bio-text mining system (POSBIOTM), for signal transduction pathway extraction, especially for G protein-coupled receptor (GPCR) pathway. To reflect more domain knowledge, we specify the concrete target for pathway extraction and define the minimal pathway domain ontology. Under this conceptual model, POSBIOTM extracts interactions and entities of pathways from the full biological articles using a machine learning oriented extraction method and visualizes the pathways using JDesigner module provided in the system biology workbench (SBW) [14]

  • PDF

Identifying Research Trends in Big data-driven Digital Transformation Using Text Mining (텍스트마이닝을 활용한 빅데이터 기반의 디지털 트랜스포메이션 연구동향 파악)

  • Minjun, Kim
    • Smart Media Journal
    • /
    • v.11 no.10
    • /
    • pp.54-64
    • /
    • 2022
  • A big data-driven digital transformation is defined as a process that aims to innovate companies by triggering significant changes to their capabilities and designs through the use of big data and various technologies. For a successful big data-driven digital transformation, reviewing related literature, which enhances the understanding of research statuses and the identification of key research topics and relationships among key topics, is necessary. However, understanding and describing literature is challenging, considering its volume and variety. Establishing a common ground for central concepts is essential for science. To clarify key research topics on the big data-driven digital transformation, we carry out a comprehensive literature review by performing text mining of 439 articles. Text mining is applied to learn and identify specific topics, and the suggested key references are manually reviewed to develop a state-of-the-art overview. A total of 10 key research topics and relationships among the topics are identified. This study contributes to clarifying a systematized view of dispersed studies on big data-driven digital transformation across multiple disciplines and encourages further academic discussions and industrial transformation.