• Title/Summary/Keyword: Text Mining Method

Search Result 451, Processing Time 0.025 seconds

Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts (소셜 텍스트의 주요 정보 추출을 위한 로지스틱 회귀 앙상블 기법)

  • Kim, So Hyeon;Kim, Han Joon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.5
    • /
    • pp.279-284
    • /
    • 2017
  • Currenty, in the era of big data, text mining and opinion mining have been used in many domains, and one of their most important research issues is to extract significant information from social media. Thus in this paper, we propose a logistic regression ensemble method of finding the main body text from blog HTML. First, we extract structural features and text features from blog HTML tags. Then we construct a classification model with logistic regression and ensemble that can decide whether any given tags involve main body text or not. One of our important findings is that the main body text can be found through 'depth' features extracted from HTML tags. In our experiment using diverse topics of blog data collected from the web, our tag classification model achieved 99% in terms of accuracy, and it recalled 80.5% of documents that have tags involving the main body text.

A Study on Plagiarism Detection and Document Classification Using Association Analysis (연관분석을 이용한 효과적인 표절검사 및 문서분류에 관한 연구)

  • Hwang, Insoo
    • The Journal of Information Systems
    • /
    • v.23 no.3
    • /
    • pp.127-142
    • /
    • 2014
  • Plagiarism occurs when the content is copied without permission or citation, and the problem of plagiarism has rapidly increased because of the digital era of resources available on the World Wide Web. An important task in plagiarism detection is measuring and determining similar text portions between a given pair of documents. One of the main difficulties of this task is that not all similar text fragments are examples of plagiarism, since thematic coincidences also tend to produce portions of similar text. In order to handle this problem, this paper proposed association analysis in data mining to detect plagiarism. This method is able to detect common actions performed by plagiarists such as word deletion, insertion and transposition, allowing to obtain plausible portions of plagiarized text. Experimental results employing an unsupervised document classification strategy showed that the proposed method outperformed traditionally used approaches.

A Study on the Trend Analysis Based on Personal Information Threats Using Text Mining (텍스트 마이닝을 활용한 개인정보 위협기반의 트렌드 분석 연구)

  • Kim, Young-Hee;Lee, Taek-Hyun;Kim, Jong-Myoung;Park, Won-Hyung;Koo, Kwang-Ho
    • Convergence Security Journal
    • /
    • v.19 no.2
    • /
    • pp.29-38
    • /
    • 2019
  • For that reason, trend research has been actively conducted to identify and analyze the key topics in large amounts of data and information. Also personal information protection field is increasing activities in order to identify prospects and trends in advance for preemptive response. However, only research based on technology such as trends in information security field and personal information protection solution is broadly taking place. In this study, threat-based trends in personal information protection field is analyzed through text mining method. This will be the key to deduct undiscovered issues and provide visibility of current and future trends. Policy formulation is possible for companies handling personal information and for that reason, it is expected to be used for searching direction of strategy establishment for effective response.

Using Text-mining Method to Identify Research Trends of Freshwater Exotic Species in Korea (텍스트마이닝 (text-mining) 기법을 이용한 국내 담수외래종 연구동향 파악)

  • Do, Yuno;Ko, Eui-Jeong;Kim, Young-Min;Kim, Hyo-Gyeom;Joo, Gea-Jae;Kim, Ji Yoon;Kim, Hyun-Woo
    • Korean Journal of Ecology and Environment
    • /
    • v.48 no.3
    • /
    • pp.195-202
    • /
    • 2015
  • We identified research trends for freshwater exotic species in South Korea using text mining methods in conjunction with bibliometric analysis. We searched scientific and common names of freshwater exotic species as searching keywords including 1 mammal species, 3 amphibian-reptile species, 11 fish species, 2 aquatic plant species. A total of 245 articles including research articles and abstracts of conference proceedings published by 56 academic societies and institutes were collected from scientific article databases. The search keywords used were the common names for the exotic species. The $20^{th}$ century (1900's) saw the number of articles increase; however, during the early $21^{st}$ century (2000's) the number of published articles decreased slowly. The number of articles focusing on physiological and embryological research was significantly greater than taxonomic and ecological studies. Rainbow trout and Nile tilapia were the main research topic, specifically physiological and embryological research associated with the aquaculture of these species. Ecological studies were only conducted on the distribution and effect of large-mouth bass and nutria. The ecological risk associated with freshwater exotic species has been expressed yet the scientific information might be insufficient to remove doubt about ecological issues as expressed by interested by individuals and policy makers due to bias in research topics with respect to freshwater exotic species. The research topics of freshwater exotic species would have to diversify to effectively manage freshwater exotic species.

Using Text Mining for the Analysis of Research Trends Related to Laws Under the Ministry of Oceans and Fisheries (텍스트 마이닝을 활용한 해양수산부 법률 관련 연구동향 분석연구)

  • Hwang, Kyu Won;Lee, Moon Suk;Yun, So Ra
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.28 no.4
    • /
    • pp.549-566
    • /
    • 2022
  • Recently, artificial intelligence (AI) technology has progressed rapidly, and industries using this technology are significantly increasing. Further, analysis research using text mining, which is an application of artificial intelligence, is being actively developed in the field of social science research. About 125 laws, including joint laws, have been enacted under the Ministry of Oceans and Fisheries in various sectors including marine environment, fisheries, ships, fishing villages, ports, etc. Research on the laws under the Ministry of Oceans and Fisheries has been progressively conducted, and is steadily increasing quantitatively. In this study, the domestic research trends were analyzed through text mining, targeting the research papers related to laws of the Ministry of Oceans and Fisheries. As part of this research method, first, topic modeling which is a type of text mining was performed to identify potential topics. Second, co-occurrence network analysis was performed, focusing on the keywords in the research papers dealing with specific laws to derive the key themes covered. Finally, author network analysis was performed to explore social networks among authors. The results showed that key topics have been changed by period, and subjects were explored by targeting Ship Safety Law, Marine Environment Management Law, Fisheries Law, etc. Furthermore, in this study, core researchers were selected based on author network analysis, and the tendency for joint research performed by authors was identified. Through this study, changes in the topics for research related to the laws of the Ministry of Oceans and Fisheries were identified up to date, and it is expected that future research topics will be further diversified, and there will be growth of quantitative and qualitative research in the field of oceans and fisheries.

Text Document Categorization using FP-Tree (FP-Tree를 이용한 문서 분류 방법)

  • Park, Yong-Ki;Kim, Hwang-Soo
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.11
    • /
    • pp.984-990
    • /
    • 2007
  • As the amount of electronic documents increases explosively, automatic text categorization methods are needed to identify those of interest. Most methods use machine learning techniques based on a word set. This paper introduces a new method, called FPTC (FP-Tree based Text Classifier). FP-Tree is a data structure used in data-mining. In this paper, a method of storing text sentence patterns in the FP-Tree structure and classifying text using the patterns is presented. In the experiments conducted, we use our algorithm with a #Mutual Information and Entropy# approach to improve performance. We also present an analysis of the algorithm via an ordinary differential categorization method.

Establishment of ITS Policy Issues Investigation Method in the Road Section applied Textmining (텍스트마이닝을 활용한 도로분야 ITS 정책이슈 탐색기법 정립)

  • Oh, Chang-Seok;Lee, Yong-taeck;Ko, Minsu
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.15 no.6
    • /
    • pp.10-23
    • /
    • 2016
  • With requiring circumspections using big data, this study attempts to develop and apply the search method for audit issues relating to the ITS policy or program. For the foregoing, the auditing process of the board of audit and inspection was converged with the theoretical frame of boundary analysis proposed by William Dunn as an analysis tool for audit issues. Moreover, we apply the text mining technique in order to computerize the analysis tool, which is similar to the boundary analysis in the concept of approaching meta-problems. For the text mining analysis, specific model we applied the antisymmetry-symmetry compound lexeme-based LDA model based on the Latent Dirichlet Allocation(LDA) methodologies proposed by David Blei. The several prime issues were founded through a case analysis as follows: lack of collection of traffic information by the urban traffic information system, which is operated by the National Police Agency, the overlapping problems between the Ministry of Land, Infrastructure and Transport and the Advanced Traffic Management System and fabrication of the mileage on digital tachograph.

A study on integrating and discovery of semantic based knowledge model (의미 기반의 지식모델 통합과 탐색에 관한 연구)

  • Chun, Seung-Su
    • Journal of Internet Computing and Services
    • /
    • v.15 no.6
    • /
    • pp.99-106
    • /
    • 2014
  • Generation and analysis methods have been proposed in recent years, such as using a natural language and formal language processing, artificial intelligence algorithms based knowledge model is effective meaning. its semantic based knowledge model has been used effective decision making tree and problem solving about specific context. and it was based on static generation and regression analysis, trend analysis with behavioral model, simulation support for macroeconomic forecasting mode on especially in a variety of complex systems and social network analysis. In this study, in this sense, integrating knowledge-based models, This paper propose a text mining derived from the inter-Topic model Integrated formal methods and Algorithms. First, a method for converting automatically knowledge map is derived from text mining keyword map and integrate it into the semantic knowledge model for this purpose. This paper propose an algorithm to derive a method of projecting a significant topic map from the map and the keyword semantically equivalent model. Integrated semantic-based knowledge model is available.

Quantitative Analysis of Research Trends in Korean E-Government Using Text Mining and Network Analysis Methods (국내 전자정부 연구동향에 대한 정량적 분석: 텍스트 마이닝과 네트워크 분석 기법을 중심으로)

  • Lee, Soo-In;Shin, Shin-Ae;Kang, Dong-Seok;Kim, Sang-Hyun
    • Informatization Policy
    • /
    • v.25 no.4
    • /
    • pp.84-107
    • /
    • 2018
  • The existing research on domestic e-government trends in Korea has weaknesses in that it depends only on qualitative research methods. Therefore, a quantitative analysis was conducted through this study as of September 2018 based on the data from 1996 to 2017. A total of seven research topics were derived from text mining, of which the network centrality of the framework and public policy effect were identified as highly significant. The results of this study provide academic and policy implications for the development of e-government. including that using a quantitative analysis method instead of a qualitative method contributes to ensuring relative objectivity and diversity of learning.

Co-occurrence Based Drug-disease Relationship Inference with Genes as Mediators (유전자를 중간 매개로 고려한 동시발생 기반의 약물-질병 관계 추론)

  • Shin, Sangwon;Sin, Yeeun;Jang, Giup;Yoo, Youngmi
    • The Journal of Korean Institute of Information Technology
    • /
    • v.16 no.11
    • /
    • pp.1-9
    • /
    • 2018
  • Drug repositioning is to discover new uses of drugs. Text mining derives knowledge from unstructured text. We propose a method to predict new drug-disease relationships by taking into account the rate of frequency of genes simultaneously measured in disease-gene and gene-drug. Co-occurrence of drug-gene and gene-disease in the biological literature is counted and calculate the rate of the gene for each drug and disease. Weights of drug-disease relationships are calculated using the average of the rates of genes that are measured and used to measure the accuracy for each disease. In measuring drug-disease relationships, a more accurate identification of relationships was shown by measuring the frequency on a sentence and considering multiple relationships than existing method.