• Title/Summary/Keyword: Text Mining

Inferring Undiscovered Public Knowledge by Using Text Mining Analysis and Main Path Analysis: The Case of the Gene-Protein 'brings_about' Chains of Pancreatic Cancer (텍스트마이닝과 주경로 분석을 이용한 미발견 공공 지식 추론 - 췌장암 유전자-단백질 유발사슬의 경우 -)

  • Ahn, Hyerim;Song, Min;Heo, Go Eun
    • Journal of the Korean BIBLIA Society for Library and Information Science
    • /
    • v.26 no.1
    • /
    • pp.217-231
    • /
    • 2015
  • This study aims to infer the gene-protein 'brings_about' chains of pancreatic cancer reported in pancreatic cancer research by constructing a gene-protein interaction network for the disease. The chains can help uncover publicly unknown knowledge that could be developed into empirical studies of the causes of pancreatic cancer. We applied a novel approach that grafts text mining and main path analysis onto Swanson's ABC model in order to expand intermediate concepts to multiple levels and extract the most significant path. We carried out text mining analysis on the full texts of pancreatic cancer research papers published over the last ten years and extracted gene-protein entities and relations. The 'brings_about' network was built from bio relations represented by bio verbs, and main path analysis was then applied to the network. We found the main direct 'brings_about' path of pancreatic cancer, which includes 14 nodes and 13 arcs. Nine arcs were confirmed as actual relations that appear in the related research, while the other four arose during the network transformation process required for main path analysis. We believe that combining text mining analysis with main path analysis can be a useful tool for inferring undiscovered knowledge in situations where either the starting or the ending point is unknown.
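
As a rough illustration of main path analysis on a directed relation network, the sketch below weights each arc by its search path count (SPC) and then picks the source-to-sink path with the largest total weight. It assumes the network is already acyclic and uses a global search. The abstract does not say which weighting or search variant the authors used, and the node labels `A` to `G` are placeholders rather than genes or proteins from the study.

```python
from collections import defaultdict
from functools import lru_cache


def search_path_counts(edges):
    """Return {(u, v): SPC weight} for an acyclic directed network.

    SPC(u, v) = (# paths from any source to u) * (# paths from v to any sink).
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    @lru_cache(maxsize=None)
    def from_sources(n):
        preds = pred.get(n, [])
        return 1 if not preds else sum(from_sources(p) for p in preds)

    @lru_cache(maxsize=None)
    def to_sinks(n):
        succs = succ.get(n, [])
        return 1 if not succs else sum(to_sinks(s) for s in succs)

    return {(u, v): from_sources(u) * to_sinks(v) for u, v in edges}


def main_path(edges):
    """Return (total SPC, path) for the heaviest source-to-sink path."""
    spc = search_path_counts(edges)
    succ = defaultdict(list)
    targets = set()
    for u, v in edges:
        succ[u].append(v)
        targets.add(v)

    @lru_cache(maxsize=None)
    def best_from(n):
        succs = succ.get(n, [])
        if not succs:  # sink: nothing left to traverse
            return (0, (n,))
        return max(
            (spc[(n, s)] + best_from(s)[0], (n,) + best_from(s)[1])
            for s in succs
        )

    sources = {u for u, _ in edges} - targets
    score, path = max(best_from(s) for s in sources)
    return score, list(path)


# Hypothetical toy network (placeholder labels, not data from the paper).
toy_edges = [
    ("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E"),
    ("D", "F"), ("E", "F"), ("E", "G"), ("G", "F"),
]
score, path = main_path(toy_edges)
print(score, " -> ".join(path))
```

A local (greedy) search that follows the heaviest outgoing arc at each node is a common alternative. It tends to produce ties at the last branching node, which is why the global variant is used here.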

A Technical Approach for Suggesting Research Directions in Telecommunications Policy

  • Oh, Junseok;Lee, Bong Gyou
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.8 no.12
    • /
    • pp.4467-4488
    • /
    • 2014
  • Bibliometric analysis is widely used for understanding research domains, trends, and knowledge structures in a particular field. It has mainly been used in information science and is now being applied to other academic fields. This paper describes an analysis of academic literature for classifying research domains and suggesting unexplored research areas in telecommunications policy. Application software was developed to retrieve Thomson Reuters' Web of Knowledge (WoK) data via web services and to conduct text mining analysis on the contents and citations of publications. We used three text mining techniques: Keyword Extraction Algorithm (KEA) analysis, co-occurrence analysis, and citation analysis. R was used to visualize the term frequencies and the co-occurrence network among publications. We found that research on policies related to social communication services and the distribution of telecommunications infrastructure, as well as more practical and data-driven analyses, has been conducted in the most recent decade. The citation analysis showed that publications in telecommunications policy generally receive citations, but most are not highly cited. Although recent publications were not highly cited, the productivity of papers in terms of citations increased in the most recent ten years compared with research before 2004. In addition, infrastructure distribution methods and inequity and gaps appeared as topics in important references. We propose that new research domains are needed, since the results imply that the decline of political approaches to technical problems is an issue in past research and that research on policies for new technologies is insufficient in the field of telecommunications. This research is significant as the first bibliometric analysis using abstracts and citation data in telecommunications, and for its development of software that combines web services and text mining techniques. Further research will be conducted with Big Data techniques and additional text mining techniques.
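
The co-occurrence analysis mentioned in this abstract can be sketched as counting how often two keywords are attached to the same publication. The example below is a minimal illustration, not the authors' software (which used R for visualization and WoK web services for retrieval). The keyword lists are invented for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(doc_keywords):
    """Count unordered keyword pairs that appear in the same document."""
    pairs = Counter()
    for keywords in doc_keywords:
        # sorted(set(...)) removes duplicates and gives each pair a stable order
        for a, b in combinations(sorted(set(keywords)), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical keyword lists, one per publication.
docs = [
    ["policy", "broadband", "regulation"],
    ["policy", "broadband"],
    ["spectrum", "policy"],
]
pairs = cooccurrence(docs)
for (a, b), n in pairs.most_common():
    print(f"{a} -- {b}: {n}")
```

The resulting pair counts can serve as edge weights for a co-occurrence network.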

Text Mining Analysis on the Research Field of the Coastal and Ocean Engineering Based on the SCOPUS Bibliographic Information (해안해양공학 연구 분야의 SCOPUS 서지정보 Text Mining 분석)

  • Lee, Gi Seop;Cho, Hong Yeon;Han, Jae Rim
    • Journal of Korean Society of Coastal and Ocean Engineers
    • /
    • v.30 no.1
    • /
    • pp.19-28
    • /
    • 2018
  • The development and computerization of bibliometrics has led to the accumulation of a vast number of research papers, which makes it difficult to review all of the related papers published worldwide when conducting a study. However, advances in natural language processing techniques have made trend analysis of published research papers easier. In this study, text mining analysis was carried out using the statistical computing language R, based on the bibliographic information in the SCOPUS database for the field of coastal and ocean engineering. As expected, the term 'wave' predominates, and the terms 'numerical model', 'numerical simulation', and 'experimental study' confirm that numerical analysis and hydraulic experiments are still dominant. In addition, recent use of the term 'wave energy', related to marine energy, was observed. It was also quantitatively confirmed that connections between 'wave' and 'height' or 'energy' are the most frequent, and the results suggest that higher-resolution analysis by detailed field and period will be possible in the future.

Probabilistic filtering for a biological knowledge discovery system with text mining and automatic inference (텍스트 마이닝 및 자동 추론 기반 생물학 지식 발견 시스템을 위한 확률 기반 필터링)

  • Lee, Hee-Jin;Park, Jong-C.
    • Journal of the Korea Society of Computer and Information
    • /
    • v.17 no.2
    • /
    • pp.139-147
    • /
    • 2012
  • In this paper, we discuss the structure of a biological knowledge discovery system based on text mining and automatic inference. Given a set of biology documents, the system produces a new hypothesis in an integrated manner. The text mining module of the system first extracts 'event' information of predefined types from the documents. The inference module then produces a new hypothesis based on the extracted results. Such an integrated system can use more up-to-date and diverse information than other automatic knowledge discovery systems. However, for such an integrated system to succeed, the precision of the text mining module is crucial, as any hypothesis based on even a single false positive is very likely to be erroneous. We propose a probabilistic filtering method that filters out false positives from the extraction results. The proposed method outperforms an occurrence-based baseline method.

Using the PubAnnotation ecosystem to perform agile text mining on Genomics & Informatics: a tutorial review

  • Nam, Hee-Jo;Yamada, Ryota;Park, Hyun-Seok
    • Genomics & Informatics
    • /
    • v.18 no.2
    • /
    • pp.13.1-13.6
    • /
    • 2020
  • The prototype version of the full-text corpus of Genomics & Informatics has recently been archived in a GitHub repository. The full-text publications of volumes 10 through 17 are also directly downloadable from PubMed Central (PMC) as XML files. During the Biomedical Linked Annotation Hackathon 6 (BLAH6), we experimented with converting, annotating, and updating 301 PMC full-text articles of Genomics & Informatics using PubAnnotation, a system that provides a convenient way to add PMC publications based on PMCID. Thus, this review aims to provide a tutorial overview of practicing the iterative task of named entity recognition with the PubAnnotation/PubDictionaries/TextAE ecosystem. We also describe developing a conversion tool between the Genia tagger output and the JSON format of PubAnnotation during the hackathon.
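
To give a feel for the kind of conversion described here, the sketch below turns tagger-style output into a PubAnnotation-style JSON document. It is not the tool from the paper. It assumes one token per line with five tab-separated columns (word, base form, POS tag, chunk tag, named-entity tag in IOB2 notation) and that every token appears verbatim in the source text. The `text`/`denotations`/`span`/`obj` layout is likewise an assumption about the target format.

```python
import json


def tagger_to_pubannotation(text, tagged_lines):
    """Convert IOB2-tagged token lines into a PubAnnotation-style dict."""
    entities = []   # finished entities as [begin, end, label]
    current = None  # entity being extended, or None
    cursor = 0      # search position in `text`, so repeated words map in order

    for line in tagged_lines:
        word, _base, _pos, _chunk, ne = line.split("\t")
        begin = text.index(word, cursor)  # assumes the token occurs verbatim
        end = begin + len(word)
        cursor = end

        if ne.startswith("B-"):
            if current:
                entities.append(current)
            current = [begin, end, ne[2:]]
        elif ne.startswith("I-") and current:
            current[1] = end  # extend the open entity
        else:
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)

    return {
        "text": text,
        "denotations": [
            {"id": f"T{i}", "span": {"begin": b, "end": e}, "obj": label}
            for i, (b, e, label) in enumerate(entities, start=1)
        ],
    }


# Hypothetical sentence and tags, written by hand for illustration.
sentence = "IL-2 gene expression requires NF-kappa B activation."
tagged = [
    "IL-2\tIL-2\tNN\tB-NP\tB-DNA",
    "gene\tgene\tNN\tI-NP\tI-DNA",
    "expression\texpression\tNN\tI-NP\tO",
    "requires\trequire\tVBZ\tB-VP\tO",
    "NF-kappa\tNF-kappa\tNN\tB-NP\tB-protein",
    "B\tB\tNN\tI-NP\tI-protein",
    "activation\tactivation\tNN\tI-NP\tO",
    ".\t.\t.\tO\tO",
]
doc = tagger_to_pubannotation(sentence, tagged)
print(json.dumps(doc, indent=2))
```

Character offsets are recomputed from the original text because the token lines carry no position information.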

A Multilevel Project-Oriented Risk-Mining Framework for Overseas Construction Projects

  • Son, JeongWook;Lee, JeeHee;Yi, June-Seong
    • International conference on construction engineering and project management
    • /
    • 2015.10a
    • /
    • pp.39-40
    • /
    • 2015
  • As the international construction market grows, the importance of risk management in international construction projects is increasingly emphasized. Unfortunately, current risk management practice does not deal sufficiently with project risks. Although many risk analysis techniques have been introduced, most focus on unexpected risks external to the project, such as country conditions and the owner's financial standing. Because such external risks are difficult to manage and to act on preemptively, we need to concentrate on risks inherent in the project. Based on this premise, this paper proposes a project-oriented risk mining approach that can detect and extract project risk factors automatically before they materialize. The study presents a methodology for extracting potential risks contained in the owner's project requirements and project tender documents using state-of-the-art data analysis methods such as text mining. The project-oriented risk mining approach is expected to reflect project characteristics effectively in project risk management and could provide construction firms with valuable business intelligence.

An Ensemble Approach for Cyber Bullying Text messages and Images

  • Zarapala Sunitha Bai;Sreelatha Malempati
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.11
    • /
    • pp.59-66
    • /
    • 2023
  • Text mining (TM) is widely used to find patterns in text documents. Cyber-bullying refers to abusing a person on an online or offline platform. It has become increasingly dangerous for people who use social networking sites (SNS). Cyber-bullying takes many forms, such as text messages, morphed images, and morphed videos, and preventing this kind of abuse on online SNS is very difficult. Finding accurate text mining patterns improves the detection of cyber-bullying on any platform. Cyber-bullying has developed alongside online SNS, which are used to send defamatory statements, to bully other people verbally, or to abuse them in front of other SNS users. Deep learning (DL) is a significant domain used to extract and learn quality features dynamically from low-level text inclusions. In this scenario, convolutional neural networks (CNN) are used for training on text data, images, and videos. CNN is a very powerful approach to training on these types of data and has achieved better text classification. In this paper, an ensemble model is introduced that integrates term frequency (TF)-inverse document frequency (IDF) and a deep neural network (DNN) with advanced feature-extraction techniques to classify bullying text, images, and videos. The proposed approach also focuses on reducing training time and memory usage, which helps improve classification.
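
The TF-IDF weighting named in this abstract can be illustrated in a few lines. The sketch below uses one common variant (relative term frequency times the natural log of inverse document frequency). The abstract does not specify which variant the authors used, and the token lists are invented for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    tf  = term count / document length
    idf = ln(number of documents / documents containing the term)
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized messages.
messages = [
    ["you", "are", "great"],
    ["you", "are", "stupid", "stupid"],
    ["great", "game"],
]
scores = tf_idf(messages)
top_term = max(scores[1], key=scores[1].get)
print(top_term, round(scores[1][top_term], 3))
```

Terms that are frequent in one message but rare across the collection receive the highest weights. In a pipeline like the one described, such weights could serve as input features for a downstream classifier.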

A Content Analysis for Website Usefulness Evaluation: Utilizing Text Mining Technique

  • Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Internet Computing and Services
    • /
    • v.16 no.4
    • /
    • pp.71-81
    • /
    • 2015
  • With the increasing influence of online media, company websites have become important communication channels between companies and customers. Companies use their websites as a marketing tool for a variety of purposes, including enhancing their image and selling products or services. Many researchers have examined the criteria, methods, and tools for website evaluation, but most have focused on usability. Prior content analyses have focused not on text content but on website components, an approach likely to produce subjective evaluations. This study attempts to objectively evaluate company websites by utilizing text mining. We analyze the usefulness of company websites by presenting visualized outputs from a business perspective, allowing practitioners to easily understand the results of the website evaluation and use them in decision making. To demonstrate our method empirically, we selected a company with a number of affiliates in Korea and analyzed the text content of their websites to assess their usefulness using natural language processing and graphics packages in R. Practitioners can easily employ our objective evaluation method, and researchers can use it to gain a new perspective on website evaluation.

Korean Consumers' Political Consumption of Japanese Fashion Products (국내 소비자의 일본 패션제품에 대한 정치적 소비 연구)

  • Choi, Yeong-Hyeon;Lee, Kyu-Hye
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.44 no.2
    • /
    • pp.295-309
    • /
    • 2020
  • In 2019, Japan announced trade regulations against Korean products; consequently, sales of Japanese products in Korea dropped due to a boycott by Korean consumers. This study measured Korean consumers' political consumption behavior toward Japanese fashion products. Unstructured text data were collected from online media sources and consumer-posted sources such as blogs and SNS, and text mining techniques and semantic network analysis were used to process the data. The results identified boycotting of Japanese fashion products and buycotting of alternative products and Korean brands as forms of consumers' political consumption. Two brand cases were investigated in detail. Online text data before and after the political action were compared, and significant changes in consumption as well as emotional expressions were identified. Product-related industry sectors were identified in terms of the political consumption of fashion: the liquor, automobile, and tourism sectors were closely linked to the fashion sector in terms of boycotting. More "boycott" and "buycott" fashion brands (reflected in consumer attitudes and feelings) were detected in consumer-driven texts than in media-driven sources.

Analyzing Research Trends of Food Tourism Using Text Mining Techniques (텍스트마이닝 기법을 활용한 국내 음식관광 연구 동향 분석)

  • Shin, Seo-Young;Lee, Bum-Jun
    • Journal of the Korean Society of Food Culture
    • /
    • v.35 no.1
    • /
    • pp.65-78
    • /
    • 2020
  • The objective of this study was to review and evaluate the growing body of food tourism research and thereby identify its trends. Using text mining techniques, this paper identified trends in the food tourism literature published from 2004 to 2018. The study reviewed 201 articles in the KCI database that include the words 'food' and 'tourism' in their abstracts. The word cloud analysis showed that the research subjects were predominantly 'Festival', 'Region', 'Culture', and 'Tourist', with slight differences in frequency by time period. Based on main path analysis, we extracted meaningful paths among the domestically published cited references, resulting in a total of 12 networks from 2004 to 2018. The text network analysis, visualized as a sociogram, indicated that words with high centrality showed both similarities and differences across time periods in the food tourism literature. This study offers a new perspective for understanding the overall flow of the relevant research.