• Title/Summary/Keyword: TextMining

BIOLOGY ORIENTED TARGET SPECIFIC LITERATURE MINING FOR GPCR PATHWAY EXTRACTION (GPCR 경로 추출을 위한 생물학 기반의 목적지향 텍스트 마이닝 시스템)

  • Kim, Eun-Ju;Jung, Seol-Kyoung;Yi, Eun-Ji;Lee, Gary-Geunbae;Park, Soo-Jun
    • Proceedings of the Korean Society for Bioinformatics Conference / 2003.10a / pp.86-94 / 2003
  • Electronically available biological literature has accumulated exponentially over time, so research on automatically acquiring knowledge from this vast body of data with text mining technology has become increasingly active. However, most previous research is technology oriented and not well focused on a practical extraction target, which results in low performance and makes the systems inconvenient for bio-researchers to actually use. In this paper, we propose a more biology-oriented, target-domain-specific text mining system, the POSTECH bio-text mining system (POSBIOTM), for signal transduction pathway extraction, especially for the G protein-coupled receptor (GPCR) pathway. To reflect more domain knowledge, we specify the concrete target for pathway extraction and define a minimal pathway domain ontology. Under this conceptual model, POSBIOTM extracts the interactions and entities of pathways from full biological articles using a machine-learning-oriented extraction method and visualizes the pathways using the JDesigner module provided in the Systems Biology Workbench (SBW) [14].

Fake News Detection for Korean News Using Text Mining and Machine Learning Techniques (텍스트 마이닝과 기계 학습을 이용한 국내 가짜뉴스 예측)

  • Yun, Tae-Uk;Ahn, Hyunchul
    • Journal of Information Technology Applications and Management / v.25 no.1 / pp.19-32 / 2018
  • Fake news is defined as news articles that are intentionally and verifiably false and could mislead readers. The spread of fake news may provoke anxiety, chaos, fear, or irrational decisions among the public. Thus, detecting fake news and preventing its spread has become a very important issue in our society. However, due to the huge amount of fake news produced every day, it is almost impossible for humans to identify it manually. In this context, researchers have tried over the past years to develop automated fake news detection methods using artificial intelligence techniques. Unfortunately, no prior study has proposed an automated fake news detection method for Korean news. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news content to be analyzed is converted to quantified values using various text mining techniques (topic modeling, TF-IDF, and so on). In the second step, classifiers are trained using the values produced in the first step. As classifiers, machine learning techniques such as multiple discriminant analysis, case-based reasoning, artificial neural networks, and support vector machines can be applied. To validate the effectiveness of the proposed method, we collected 200 Korean news items from Seoul National University's FactCheck (http://factcheck.snu.ac.kr), which provides detailed analysis reports from about 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important as well as which classifiers are effective in detecting Korean fake news.
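
The first step of the method above (turning news text into quantified values) can be illustrated with a minimal TF-IDF sketch. This is not the authors' implementation: the weighting variant (relative term frequency times log(N / df)), the function name `tf_idf`, and the three toy headlines are all assumptions made for illustration. The resulting vectors are the kind of features a classifier could be trained on in the second step.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = (term count / document length) * log(N / document frequency).
    This is one common variant; the paper does not say which it used.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "minister denies report".split(),
    "report confirms claim".split(),
    "minister confirms visit".split(),
]
vectors = tf_idf(docs)
```

Terms that occur in fewer documents receive larger weights, so in the first headline "denies" (one document) outweighs "report" (two documents).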

A Study on Monitoring Method of Citizen Opinion based on Big Data : Focused on Gyeonggi Local Currency (Gyeonggi Money) (빅데이터 기반 시민의견 모니터링 방안 연구 : "경기지역화폐"를 중심으로)

  • Ahn, Soon-Jae;Lee, Sae-Mi;Ryu, Seung-Ei
    • Journal of Digital Convergence / v.18 no.7 / pp.93-99 / 2020
  • Text mining is a big data analysis method that extracts meaningful information from large-scale unstructured text data. In this study, text mining was used to monitor citizens' opinions on the policies and systems being implemented. We collected 5,108 newspaper articles and 748 online cafe posts related to 'Gyeonggi Local Currency' and performed frequency analysis, TF-IDF analysis, association analysis, and word tree visualization analysis. The results show that many news articles dealt with the purpose of introducing local currency, the benefits provided, and the method of use, whereas content related to the actual use of local currency appeared in the online cafe posts. The news acted as an informant promoting local currency in order to revitalize it, while the online cafe posts consisted of the opinions of citizens who are local currency users. SNS and text mining are expected to help promote various policies effectively, as well as local currency.

Analysis of patterns in meteorological research and development using a text-mining algorithm (텍스트 마이닝 알고리즘을 이용한 기상청 연구개발분야 과제의 추세 분석)

  • Park, Hongju;Kim, Habin;Park, Taeyoung;Lee, Yung-Seop
    • The Korean Journal of Applied Statistics / v.29 no.5 / pp.935-947 / 2016
  • This paper analyzes patterns in meteorological research and development using a text-mining algorithm as the method for analyzing unstructured data. To analyze the text data, we define a list of terms related to meteorological research and development, construct a time series of term-document matrices through data preprocessing, and identify terms that show upward or downward patterns over time. The proposed methodology is applied to multi-year plans funded by Korea Meteorological Administration research and development programs from 2011 to 2015.
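
As a rough illustration of identifying upward or downward term patterns, the sketch below computes, for each term, the share of documents per year that mention it and then fits a least-squares slope to that series. The abstract does not specify the trend statistic, so the slope criterion, the function names, and the toy data are assumptions.

```python
def yearly_term_share(docs_by_year, terms):
    """For each term, the share of documents per year that mention it."""
    years = sorted(docs_by_year)
    series = {t: [] for t in terms}
    for year in years:
        docs = docs_by_year[year]
        for t in terms:
            hits = sum(1 for doc in docs if t in doc)
            series[t].append(hits / len(docs))
    return years, series


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Hypothetical data: each document is the set of predefined terms it contains.
docs_by_year = {
    2011: [{"radar", "forecast"}, {"radar"}],
    2012: [{"radar", "satellite"}, {"satellite"}],
    2013: [{"satellite"}, {"satellite", "forecast"}],
}
years, series = yearly_term_share(docs_by_year, ["radar", "satellite"])
trends = {term: slope(years, shares) for term, shares in series.items()}
```

A positive slope marks a term whose share of documents grows over the years, and a negative slope marks a declining one; with this toy data "satellite" rises and "radar" falls.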

Relevant Analysis on User Choice Tendency of Intelligent Tourism Platform under the Background of Text mining

  • Liu, Zi-Yang;Liao, Kai;Guo, Zi-Han
    • Journal of the Korea Society of Computer and Information / v.24 no.9 / pp.119-125 / 2019
  • The purpose of this study is to identify the factors related to users' choice tendency toward the Intelligent Tourism platform through big data analysis. The findings will help enterprises position themselves accurately and make improvements based on user feedback in the future tourism market, so as to win users' preference and achieve long-term market competitiveness. This study takes the Intelligent Tourism platform as the independent variable and user choice tendency as the dependent variable, and explores the factors relating the two. It makes use of text mining and text analysis in the R language, and uses the SPSS and AMOS statistical analysis tools to carry out the empirical analysis. According to the analysis results, the conclusions are as follows: service quality has a significant positive correlation with user choice tendency; service quality has a significant positive correlation with tourism trust; tourism trust has a significant positive correlation with user choice tendency; service quality has a significant positive correlation with user experience; and user experience has a significant positive correlation with user choice tendency.

Trend Analysis of Thyroid Cancer Research in Korea with Text Mining Techniques

  • Lee, Tae-Gyeong;Heo, Seong-Min;Shin, Seung-Hyeok;Yang, Ji-Yeon
    • Journal of the Korea Society of Computer and Information / v.23 no.12 / pp.153-161 / 2018
  • In this paper, we propose a text-centered approach to identify the research trend of thyroid cancer in Korea. We combine statistical analysis, text mining, and machine learning techniques with our clinical insights to find associations between terms and to discover informative clusters of literature. The incidence of thyroid cancer in Korea increased rapidly in the 2000s, which fueled the debate regarding overdiagnosis, but recently the number of patients undergoing surgery has decreased significantly due to conscious reform efforts from various circles. We analyzed the abstracts and keywords of related research papers from DBpia. We found that most papers in the 1980s were case reports, and that some papers in the 1990s discussed the early detection of thyroid cancer by mass screening. While many papers in the 2000s focused on different diagnostic techniques and the detection of small cancers, many in the 2010s placed more emphasis on patients' quality of life. There was an apparent change in the topics of thyroid cancer research over the past decades. The results of this study can serve as a reference guide for current and future research directions.

A Text Mining Approach to the Comparative Analysis of the Blockchain Issues : South Korea and the United States (텍스트 마이닝을 활용한 블록체인 이슈 분석 : 한국과 미국)

  • Shon, Saeah;Jeon, Byeong-Jin;Kim, Hee-Woong
    • Journal of Information Technology Services / v.18 no.1 / pp.45-61 / 2019
  • Blockchain technology, which enables transparent transactions among individuals without central control, opens up diverse business possibilities. Blockchain is also expected to have a ripple effect across society, including finance, manufacturing, distribution, and the public sector. Previous studies on blockchain deal with its functional features and its application to industrial and public fields. For a new technology such as blockchain, it is necessary to understand how society perceives it in order to create an environment for technological development, but research on this is lacking. Therefore, this study aims to derive implications for industrial and policy direction by analyzing issues related to blockchain in South Korea and the US through text mining. From these two countries, we collected blockchain-related text data from online communities and internet articles. We then performed co-occurrence analysis and topic modeling on each dataset. As a result, we found commonalities and differences in the keywords and topics extracted from social media in the two countries. Based on them, we offer suggestions for building a sound blockchain ecosystem and directions for future research.
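
Co-occurrence analysis, mentioned above, can be sketched as counting how many documents contain each pair of keywords. The abstract does not describe the window or unit the authors used, so treating the whole document as the co-occurrence unit is an assumption, as are the function name and the toy posts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count, for each unordered keyword pair, the documents containing both."""
    counts = Counter()
    for doc in docs:
        # set() so repeated words in one document count once;
        # sorted() so each pair has a single canonical order.
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical keyword lists extracted from three posts.
docs = [
    ["blockchain", "bitcoin", "price"],
    ["blockchain", "regulation"],
    ["bitcoin", "price", "regulation"],
]
pairs = cooccurrence(docs)
```

Running the same function separately on each country's corpus and comparing the highest-count pairs is one simple way to surface shared and differing issues.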

Predicting numeric ratings for Google apps using text features and ensemble learning

  • Umer, Muhammad;Ashraf, Imran;Mehmood, Arif;Ullah, Saleem;Choi, Gyu Sang
    • ETRI Journal / v.43 no.1 / pp.95-108 / 2021
  • Application (app) ratings are feedback provided voluntarily by users and serve as important evaluation criteria for apps. However, these ratings can often be biased owing to insufficient or missing votes. Additionally, significant differences have been observed between numeric ratings and user reviews. This study aims to predict the numeric ratings of Google apps using machine learning classifiers. It exploits numeric app ratings provided by users as training data and returns authentic mobile app ratings by analyzing user reviews. An ensemble learning model is proposed for this purpose that considers term frequency/inverse document frequency (TF/IDF) features. Three TF/IDF features, including unigrams, bigrams, and trigrams, were used. The dataset was scraped from the Google Play store, extracting data from 14 different app categories. Biased and unbiased user ratings were discriminated using TextBlob analysis to formulate the ground truth, from which the classifier prediction accuracy was then evaluated. The results demonstrate the high potential for machine learning-based classifiers to predict authentic numeric ratings based on actual user reviews.
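
The unigram, bigram, and trigram features mentioned above are built from contiguous word sequences. The sketch below only shows how such n-grams can be extracted and counted from one review; the function names and the sample review are hypothetical, and the TF/IDF weighting and ensemble model from the paper are not reproduced.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-word sequences in tokens, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokens, max_n=3):
    """Count all n-grams for n = 1..max_n (unigrams, bigrams, trigrams)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


# Hypothetical user review, tokenized by whitespace.
review = "app keeps crashing on startup".split()
features = ngram_counts(review)
```

A five-word review yields five unigrams, four bigrams, and three trigrams, so longer n-grams capture phrases such as "keeps crashing" that single words miss.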

An Analysis of Key Elements for FinTech Companies Based on Text Mining: From the User's Review (텍스트 마이닝 기반의 자산관리 핀테크 기업 핵심 요소 분석: 사용자 리뷰를 바탕으로)

  • Son, Aelin;Shin, Wangsoo;Lee, Zoonky
    • The Journal of Information Systems / v.29 no.4 / pp.137-151 / 2020
  • Purpose: Domestic asset management fintech companies are expected to grow by leaps and bounds with the implementation of the "Data bills." Despite the market fever, however, academic research is insufficient. Therefore, we analyze user reviews of asset management fintech companies to derive the strengths of the services provided so far and the points that need to be complemented, and to identify the key elements of asset management fintech companies. Design/methodology/approach: To analyze large amounts of review text data, this study applied text mining techniques. Bank Salad and Toss, domestic asset management application services, were selected for the study. App reviews were crawled from the online app store and preprocessed using natural language processing techniques. Topic Modeling and Aspect-Sentiment Analysis were used as analysis methods. Findings: Topic Modeling derived 7 topics each for Bank Salad and Toss, covering function, usage, stability, and marketing. Sentiment Analysis showed that users responded positively to function-related topics, but negatively to usage-related topics and stability topics. Through this, we were able to extract the key elements that asset management fintech companies need.

A Study on the Characteristics of Amekaji Fashion Trends Using Big Data Text Mining Analysis (빅데이터 텍스트 마이닝 분석을 활용한 아메카지 패션 트렌드 특징 고찰)

  • Kim, Gihyung
    • Journal of Fashion Business / v.26 no.3 / pp.138-154 / 2022
  • The purpose of this study is to identify the characteristics of domestic American casual fashion trends using big data text mining analysis. A total of 108,524 posts and 2,038,999 extracted keywords related to American casual fashion from Naver and Daum over the past 5 years were collected and refined with the Textom program, and frequency analysis, word cloud, N-gram, centrality analysis, and CONCOR analysis were performed. In the frequency analysis, 'vintage', 'style', 'daily look', 'coordination', 'workwear', and 'men's wear' appeared as the main keywords. The main nationality of the representative brands was Japanese, followed by American, Korean, and others. The CONCOR analysis yielded four clusters: "general American casual trend", "vintage taste", "direct sales mania", and "American styling". The results showed that Japanese American casual clothes are influenced by American casual clothes, and that American casual fashion in Korea, which has been reinterpreted, is completed with various coordination and creative styles such as workwear, street, military, and classic, focusing on items and brands. Looks were worn and shared on social networks, and the study also confirmed the existence of an active consumer group and the market potential for obtaining genuine products, ranging from second-hand transactions for limited edition vintages to individual transactions. The significance of this study is that it presents the characteristics of American casual fashion trends academically, based on online text data that the public actually uses, since the trend has been spread by the public.