• Title/Abstract/Keywords: Text-mining Analysis

Search results: 1,221 items (processing time: 0.035 seconds)

PubMiner: Machine Learning-based Text Mining for Biomedical Information Analysis

  • Eom, Jae-Hong;Zhang, Byoung-Tak
    • Genomics & Informatics
    • /
    • Vol. 2, No. 2
    • /
    • pp.99-106
    • /
    • 2004
  • In this paper we introduce PubMiner, an intelligent machine learning-based text mining system for mining biological information from the literature. PubMiner employs natural language processing techniques and machine learning-based data mining techniques to mine useful biological information, such as protein-protein interactions, from the massive literature. The system recognizes biological terms such as genes, proteins, and enzymes and extracts the interactions described in a document through natural language processing. The extracted interactions are further analyzed with a set of features of each entity, collected from related public databases, to infer more interactions from the original ones. Inferred interactions from the interaction analysis and native interactions are provided to the user with links to the literature sources. The performance of entity and interaction extraction was tested with selected MEDLINE abstracts. The inference was evaluated using the protein interaction data of S. cerevisiae (baker's yeast) from MIPS and SGD.

Investigation of the Possibility of Research on Medical Classics Applying Text Mining - Focusing on the Huangdi's Internal Classic -

  • 배효진;김창업;이충열;신상원;김종현
    • 대한한의학원전학회지
    • /
    • Vol. 31, No. 4
    • /
    • pp.27-46
    • /
    • 2018
  • Objectives: In this paper, we investigated the applicability of text mining to Korean Medical Classics and suggest that researchers of Medical Classics utilize this methodology. Methods: We applied text mining to the Huangdi's Internal Classic, a seminal text of Korean Medicine, and visualized networks that represent the connectivity of terms and documents based on vector similarity. We then compared this outcome with the prior knowledge generated through conventional qualitative analysis and examined whether our methodology could accurately reflect the keywords of documents, clusters of terms, and relationships between documents. Results: In the term network, we confirmed that Qi played a key role and that the theory development based on the relativity between Yin and Yang was reflected. In the document network, Suwen and Lingshu are quite distinct from each other due to their differences in description form and topic. Suwen also showed high similarity between adjacent chapters. Conclusions: This study revealed that text mining can yield significant findings that correspond to prior knowledge about the Huangdi's Internal Classic. Text mining can be used in a variety of research fields covering medical classics, literature, and medical records. In addition, visualization tools can be utilized for educational purposes.
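
The "networks based on vector similarity" described in this abstract can be illustrated with a minimal sketch. It assumes cosine similarity over raw term counts and a fixed edge threshold; the abstract does not state which similarity measure or weighting the authors used, and the chapter names and tokens below are made up.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_edges(docs, threshold):
    """Return (doc_x, doc_y, similarity) for every pair at or above threshold."""
    vectors = {name: Counter(tokens) for name, tokens in docs.items()}
    names = sorted(vectors)
    edges = []
    for i, x in enumerate(names):
        for y in names[i + 1:]:
            score = cosine_similarity(vectors[x], vectors[y])
            if score >= threshold:
                edges.append((x, y, round(score, 3)))
    return edges


# Hypothetical, already-tokenized chapters (not taken from the paper).
docs = {
    "chapter_a": ["qi", "yin", "yang", "qi"],
    "chapter_b": ["qi", "yin", "yang"],
    "chapter_c": ["needle", "channel"],
}
edges = similarity_edges(docs, threshold=0.5)
```

The resulting edge list can be handed to any graph drawing tool; documents sharing no terms (here `chapter_c`) end up unconnected.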

The Adaptive SPAM Mail Detection System using Clustering based on Text Mining

  • Hong, Sung-Sam;Kong, Jong-Hwan;Han, Myung-Mook
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 8, No. 6
    • /
    • pp.2186-2196
    • /
    • 2014
  • Spam mail is one of the most common mail dysfunctions and may cause psychological damage to internet users. As internet usage increases, the amount of spam mail has also gradually increased. Indiscriminate sending, in particular, occurs when spam mail is sent using smart phones or tablets connected to wireless networks. Spam mail accounts for approximately 68% of mail traffic; however, the true percentage is believed to be considerably higher. In order to analyze and detect spam mail, we introduce a technique based on spam mail characteristics and text mining; in particular, spam mail is detected through linguistic analysis and language processing. Existing spam mail is analyzed, and hidden spam signatures are extracted using text clustering. Our proposed method utilizes a text mining system to improve the detection and error detection rates for existing spam mail and to respond to new spam mail types.

Research on Methods for Processing Nonstandard Korean Words on Social Network Services

  • 이종화;레환수;이현규
    • 한국산업정보학회논문지
    • /
    • Vol. 21, No. 3
    • /
    • pp.35-46
    • /
    • 2016
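  • Social network services (SNS), online services that build networks of people who share particular interests or activities, and blogs, spaces where users can freely post text, photos, and videos according to their own interests, have become established as a social phenomenon for presenting and expressing oneself. Research on text mining, opinion mining, and semantic analysis is actively under way to find meaningful information, value, and patterns by analyzing the text that users freely write on SNS and blogs. Keyword-based studies have also been carried out to improve researchers' efficiency. However, most of these studies show many limitations with respect to Korean orthography. This study aims to overcome the limitations of dictionary-based mining techniques by building a database of words that are difficult to identify during mining: strange "alien language" whose roots are hard to find, indiscriminately used slang, and hard-to-understand internet language such as Hangul emoticons. We conducted a case analysis on blogs consisting of subjective opinions on a specific topic and found that extracting nonstandard words using Unicode is useful for text mining.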
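
One way Unicode can help locate nonstandard Korean is sketched below. The abstract does not describe the authors' actual procedure, so this is an assumption: it flags runs of standalone jamo (for example ㅋㅋㅋ or ㅠㅠ), which sit in the Hangul Compatibility Jamo block (U+3130 to U+318F) rather than among the composed Hangul syllables (U+AC00 to U+D7A3). The function names and sample sentence are illustrative only.

```python
import re

# Standalone consonants/vowels (Hangul Compatibility Jamo, U+3130 to U+318F).
JAMO_RUN = re.compile(r"[\u3130-\u318F]+")


def extract_jamo_runs(text):
    """Return every run of standalone jamo found in the text."""
    return JAMO_RUN.findall(text)


def flag_nonstandard_tokens(text):
    """Return whitespace-separated tokens that contain standalone jamo."""
    return [tok for tok in text.split() if JAMO_RUN.search(tok)]


sample = "오늘 너무 웃겨 ㅋㅋㅋ 슬퍼ㅠㅠ"
runs = extract_jamo_runs(sample)
flagged = flag_nonstandard_tokens(sample)
```

Tokens flagged this way could be collected into a candidate list for a nonstandard-word dictionary; slang written entirely in complete syllables would need a different signal.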

Violation Pattern Analysis for Good Manufacturing Practice for Medicine using t-SNE Based on Association Rule and Text Mining

  • 이준오;손소영
    • 품질경영학회지
    • /
    • Vol. 50, No. 4
    • /
    • pp.717-734
    • /
    • 2022
  • Purpose: The purpose of this study is to effectively detect simultaneously occurring violations of Good Manufacturing Practice that were concealed by drug manufacturers. Methods: In this study, we present a framework for analyzing regulatory violation patterns using Association Rule Mining (ARM), Text Mining, and t-distributed Stochastic Neighbor Embedding (t-SNE) to increase the effectiveness of on-site inspection. Results: A number of simultaneous violation patterns were discovered by applying Association Rule Mining to FDA inspection data collected from October 2008 to February 2022. Among them were 'concurrent violation patterns' derived from the similar regulatory ranges of two or more regulations. These patterns do not help to predict violations that appear simultaneously but belong to different regulations. Those unnecessary patterns were excluded by applying t-SNE based on text mining. Conclusion: Our proposed approach enables the recognition of simultaneous violation patterns during on-site inspection. It is expected to decrease detection time by increasing the likelihood of finding intentionally concealed violations.
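
The core of Association Rule Mining, as used in this abstract to find violations that occur together, can be sketched with support and confidence over item pairs. This is a simplified illustration restricted to two-item rules, not the authors' pipeline; the regulation labels and thresholds are hypothetical.

```python
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Return rules (antecedent, consequent, support, confidence) for item pairs."""
    sets = [set(t) for t in transactions]
    n = len(sets)
    items = sorted(set().union(*sets))

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(1 for t in sets if itemset <= t) / n

    rules = []
    for a, b in combinations(items, 2):
        s_ab = support({a, b})
        if s_ab < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = s_ab / support({x})
            if confidence >= min_confidence:
                rules.append((x, y, s_ab, confidence))
    return rules


# Each inner list is one hypothetical inspection and the rules it cited.
inspections = [["A", "B"], ["A", "B", "C"], ["A"], ["B", "C"]]
rules = pair_rules(inspections, min_support=0.5, min_confidence=0.8)
```

Here the only surviving rule reads "if C is cited, B is cited too", which is the kind of concurrent pattern an inspector could act on.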

An Enhanced Text Mining Approach using Ensemble Algorithm for Detecting Cyber Bullying

  • Z.Sunitha Bai;Sreelatha Malempati
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 23, No. 5
    • /
    • pp.1-6
    • /
    • 2023
  • Text mining (TM) is widely used to process unstructured text documents and the data found in various domains. Text mining is also referred to as text classification. It is popular in applications such as movie reviews, product reviews on e-commerce websites, sentiment analysis, topic modeling, and cyberbullying detection in social media messages. Cyberbullying is the abuse of someone with insulting language; personal abuse, sexual harassment, and other types of abuse fall under cyberbullying. Several existing systems have been developed to detect bullying words based on their context in social networking sites (SNS), which have become a platform for bullying. In this paper, an Enhanced Text Mining approach using an Ensemble Algorithm (ETMA) is developed to solve several problems of traditional algorithms and to improve the accuracy, processing time, and quality of the result. ETMA is used to analyze bullying text on social networking sites such as Facebook and Twitter. ETMA is applied to a synthetic dataset collected from various data sources, consisting of 5k messages labeled as bullying or non-bullying. The performance is analyzed in terms of Precision, Recall, F1-Score, and Accuracy.
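
The four evaluation measures named at the end of this abstract follow standard definitions, sketched below for a binary bullying / non-bullying task. The label encoding (1 for bullying) and the toy predictions are assumptions for illustration; they are not results from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Made-up labels: 1 = bullying, 0 = non-bullying.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

On imbalanced message collections, accuracy alone can look high while recall on the bullying class stays low, which is why all four figures are usually reported together.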

A Technical Approach for Suggesting Research Directions in Telecommunications Policy

  • Oh, Junseok;Lee, Bong Gyou
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 8, No. 12
    • /
    • pp.4467-4488
    • /
    • 2014
  • Bibliometric analysis is widely used for understanding research domains, trends, and knowledge structures in a particular field. It has mainly been used in information science and is now being applied to other academic fields. This paper describes an analysis of the academic literature for classifying research domains and for suggesting unexplored research areas in telecommunications policy. Application software was developed for retrieving Thomson Reuters' Web of Knowledge (WoK) data via web services. It was also used for conducting text mining analysis on the contents and citations of publications. We used three text mining techniques: Keyword Extraction Algorithm (KEA) analysis, co-occurrence analysis, and citation analysis. In addition, R software was used for visualizing the term frequencies and the co-occurrence network among publications. We found that research on policies related to social communication services and the distribution of telecommunications infrastructure, as well as more practical and data-driven analyses, has been conducted in the recent decade. The citation analysis showed that the publications generally received citations, but most of them did not receive many citations in telecommunications policy. However, although recent publications did not receive many citations, the productivity of papers in terms of citations increased in the most recent ten years compared with research before 2004. Also, the distribution methods of infrastructure, and inequity and gaps, appeared as topics in important references. We propose the necessity of new research domains, since the analysis results imply that the decrease of political approaches to technical problems is an issue in past research. Also, research on policies for new technologies in the field of telecommunications is insufficient.
This research is significant as the first bibliometric analysis using abstracts and citation data in telecommunications, as well as for the development of software that provides web service and text mining functions. Further research will be conducted with Big Data techniques and additional text mining techniques.
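
The co-occurrence analysis mentioned in this abstract can be sketched as counting how often two keywords are attached to the same publication. The authors used their own software and R for this step; the Python version below is only an illustration, and the keyword lists are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count unordered keyword pairs that appear in the same publication."""
    counts = Counter()
    for keywords in keyword_lists:
        # De-duplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical keywords for three publications.
publications = [
    ["broadband", "regulation", "spectrum"],
    ["broadband", "regulation"],
    ["spectrum", "auction"],
]
counts = cooccurrence_counts(publications)
strongest_pair, strongest_count = counts.most_common(1)[0]
```

The pair counts are the edge weights of a co-occurrence network; pairs that never appear point to keyword combinations with no publications yet.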

Comparison and Analysis of Domestic and Foreign Sports Brands Using Text Mining and Opinion Mining Analysis

  • 김재환;이재문
    • 한국콘텐츠학회논문지
    • /
    • Vol. 18, No. 6
    • /
    • pp.217-234
    • /
    • 2018
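  • This study conducted a big data analysis of domestic and foreign sports goods brands. To this end, text mining, TF-IDF, opinion mining, and interest graph analyses were performed using Textom, a social matrix program, and MISP, a fashion data analysis platform. To examine recent perceptions of sports brands, the study period was limited to the one year from January 1, 2017 to December 31, 2017. The analysis made it possible to identify, first, the products that represent each brand; second, the marketing that represents each brand; third, the words commonly extracted across the brands; and fourth, the positive and negative sentiment toward each brand.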
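
TF-IDF, one of the measures applied in this study, can be sketched in its plain textbook form: relative term frequency multiplied by the natural log of (number of documents / documents containing the term). Which TF-IDF variant Textom or MISP applies is not stated in the abstract, so the formula and the toy tokens below are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document using tf * ln(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in docs:
        term_freq = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


# Hypothetical tokenized posts about three brands.
docs = [
    ["shoes", "running", "shoes"],
    ["shoes", "jacket"],
    ["jacket", "hiking"],
]
scores = tf_idf(docs)
top_term_doc0 = max(scores[0], key=scores[0].get)
```

A term that is frequent in one brand's posts but rare elsewhere scores high, which is how brand-specific products can surface, while words shared by every brand score zero.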

Quantitative Text Mining for Social Science: Analysis of Immigrant in the Articles

  • 이수정;최두영
    • 한국콘텐츠학회논문지
    • /
    • Vol. 20, No. 5
    • /
    • pp.118-127
    • /
    • 2020
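  • This study describes recent trends in quantitative text analysis in the social sciences, including cases that call for caution when conducting such analysis. In particular, a case study was conducted on the keywords "migration" (이주) and "immigration" (이민) as used in academic journals and the press over the three years from 2017 to 2019. For this purpose, we used quantitative text analysis based on natural language processing (NLP), which has recently attracted attention in the social sciences. Quantitative text analysis is an area of data science that converts documents into structured data in order to discover and test hypotheses; it enables data modeling and visualization and has been widely adopted in the social sciences, particularly because it can structure unstructured data. Accordingly, this study used quantitative text analysis to perform a statistical analysis of research papers and press articles with the keywords "migration" and "immigration" and interpreted the resulting conclusions.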
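
The step of converting documents into structured data, which this abstract treats as the basis of quantitative text analysis, is commonly done with a document-term matrix. The sketch below assumes already-tokenized text and simple counts; the abstract does not specify the tooling used, and the tokens are made up.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, rows) where rows[i][j] counts vocabulary[j] in docs[i]."""
    vocabulary = sorted({token for tokens in docs for token in tokens})
    rows = []
    for tokens in docs:
        counts = Counter(tokens)
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


# Hypothetical tokenized articles.
articles = [
    ["migration", "policy", "migration"],
    ["immigration", "policy"],
]
vocabulary, matrix = document_term_matrix(articles)
```

Once text is in this tabular form, ordinary statistical tools (frequency comparison, correlation, clustering) can be applied to it.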

Competitive intelligence in Korean Ramen Market using Text Mining and Sentiment Analysis

  • Kim, Yoosin;Jeong, Seung Ryul
    • 인터넷정보학회논문지
    • /
    • Vol. 19, No. 1
    • /
    • pp.155-166
    • /
    • 2018
  • These days, online media such as blogospheres, online communities, and social networking sites provide vast amounts of user-generated content (UGC) from which to discover market intelligence and business insight. Businesses have long been interested in consumers and constantly need approaches for identifying consumers' opinions and competitive advantage in a competing market. Analyzing consumers' opinions about oneself and one's rivals can help decision makers gain an in-depth and fine-grained understanding of the human and social behavioral dynamics underlying the competition. To carry out a comparison study of rival products and companies, we performed a competitive analysis using text mining on online UGC for two popular and competing ramens, a market leader and a market follower, in the Korean instant noodle market. Furthermore, to overcome the lack of a Korean sentiment lexicon, we developed a domain-specific sentiment dictionary of Korean texts. We gathered 19,386 blog posts and forum messages, developed the Korean sentiment dictionary, and defined a taxonomy for categorization. In the context of our study, we employed sentiment analysis to present consumers' opinions and statistical analysis to demonstrate the differences between the competitors. Our results show that the sentiment revealed by text mining clearly differentiates the two rival noodles and convincingly confirms that one is a market leader and the other is a follower. In this regard, we expect this comparison to help business decision makers understand the rich, in-depth competitive intelligence hidden in social media.
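
Dictionary-based sentiment analysis of the kind this abstract describes can be sketched as summing word polarities per post and comparing the share of positive posts between two products. The lexicon entries, English tokens, and brand names below are hypothetical stand-ins; the authors' Korean sentiment dictionary and taxonomy are not reproduced here.

```python
# Hypothetical domain lexicon: +1 for positive words, -1 for negative words.
HYPOTHETICAL_LEXICON = {"delicious": 1, "chewy": 1, "bland": -1, "salty": -1}


def score_post(tokens, lexicon):
    """Sum the polarity of every token found in the lexicon."""
    return sum(lexicon.get(token, 0) for token in tokens)


def positive_share(posts, lexicon):
    """Fraction of posts whose summed polarity is above zero."""
    if not posts:
        return 0.0
    positive = sum(1 for tokens in posts if score_post(tokens, lexicon) > 0)
    return positive / len(posts)


# Made-up tokenized posts for two competing products.
brand_x_posts = [["delicious", "chewy"], ["salty"], ["delicious", "salty", "chewy"]]
brand_y_posts = [["bland"], ["delicious", "bland"]]

share_x = positive_share(brand_x_posts, HYPOTHETICAL_LEXICON)
share_y = positive_share(brand_y_posts, HYPOTHETICAL_LEXICON)
```

Whether a gap between the two shares is meaningful would still need a statistical test on real data, which this sketch does not attempt.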