• Title/Summary/Keyword: Text network

Search Result 1,111, Processing Time 0.026 seconds

Analysis of Keywords in national river occupancy permits by region using text mining and network theory (텍스트 마이닝과 네트워크 이론을 활용한 권역별 국가하천 점용허가 키워드 분석)

  • Seong Yun Jeong
    • Smart Media Journal
    • /
    • v.12 no.11
    • /
    • pp.185-197
    • /
    • 2023
  • This study was conducted using text mining and network theory to extract useful information for application for occupancy and performance of permit tasks contained in the permit contents from the permit register, which is used only for the simple purpose of recording occupancy permit information. Based on text mining, we analyzed and compared the frequency of vocabulary occurrence and topic modeling in five regions, including Seoul, Gyeonggi, Gyeongsang, Jeolla, Chungcheong, and Gangwon, as well as normalization processes such as stopword removal and morpheme analysis. By applying four types of centrality algorithms, including stage, proximity, mediation, and eigenvector, which are widely used in network theory, we looked at keywords that are in a central position or act as an intermediary in the network. Through a comprehensive analysis of vocabulary appearance frequency, topic modeling, and network centrality, it was found that the 'installation' keyword was the most influential in all regions. This is believed to be the result of the Ministry of Environment's permit management office issuing many permits for constructing facilities or installing structures. In addition, it was found that keywords related to road facilities, flood control facilities, underground facilities, power/communication facilities, sports/park facilities, etc. were at a central position or played a role as an intermediary in topic modeling and networks. Most of the keywords appeared to have a Zipf's law statistical distribution with low frequency of occurrence and low distribution ratio.

A Chinese Spam Filter Using Keyword and Text-in-Image Features

  • Chen, Ying-Nong;Wang, Cheng-Tzu;Lo, Chih-Chung;Han, Chin-Chuan;Fana, Kuo-Chin
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2009.01a
    • /
    • pp.32-37
    • /
    • 2009
  • Recently, electronic mail(E-mail) is the most popular communication manner in our society. In such conventional environments, spam increasingly congested in Internet. In this paper, Chinese spam could be effectively detected using text and image features. Using text features, keywords and reference templates in Chinese mails are automatically selected using genetic algorithm(GA). In addition, spam containing a promotion image is also filtered out by detecting the text characters in images. Some experimental results are given to show the effectiveness of our proposed method.

  • PDF

Analysis of Aviation Safety Management Issues using Text Mining (Text Mining 기법을 활용한 항공안전관리 이슈 분석)

  • Moonjin Kwon;Jang Ryong Lee
    • Journal of the Korean Society for Aviation and Aeronautics
    • /
    • v.31 no.4
    • /
    • pp.19-27
    • /
    • 2023
  • In this study, a total of 2,584 domestic research papers with the keywords "Aviation Safety" and "Aviation Accidents" were subjected to Text Mining analysis. Various text mining techniques, including keyword frequency analysis, word correlation analysis, network analysis, and topic modeling, were applied to examine the research trends in the field of aviation safety. The results revealed a significant increase in research using the keyword "Aviation Safety" since 2015, with over 300 papers published annually. Through keyword frequency analysis, it was observed that "Aircraft" was the most frequently mentioned term, followed by "Drones" and "Unmanned Aircraft." Phi coefficients were calculated for words closely related to "Aircraft," "Aviation," "Drones," and "Safety." Furthermore, topic modeling was employed to identify 12 distinct topics in the field of aviation safety and aviation accidents, allowing for an in-depth exploration of research trends.

CR-M-SpanBERT: Multiple embedding-based DNN coreference resolution using self-attention SpanBERT

  • Joon-young Jung
    • ETRI Journal
    • /
    • v.46 no.1
    • /
    • pp.35-47
    • /
    • 2024
  • This study introduces CR-M-SpanBERT, a coreference resolution (CR) model that utilizes multiple embedding-based span bidirectional encoder representations from transformers, for antecedent recognition in natural language (NL) text. Information extraction studies aimed to extract knowledge from NL text autonomously and cost-effectively. However, the extracted information may not represent knowledge accurately owing to the presence of ambiguous entities. Therefore, we propose a CR model that identifies mentions referring to the same entity in NL text. In the case of CR, it is necessary to understand both the syntax and semantics of the NL text simultaneously. Therefore, multiple embeddings are generated for CR, which can include syntactic and semantic information for each word. We evaluate the effectiveness of CR-M-SpanBERT by comparing it to a model that uses SpanBERT as the language model in CR studies. The results demonstrate that our proposed deep neural network model achieves high-recognition accuracy for extracting antecedents from NL text. Additionally, it requires fewer epochs to achieve an average F1 accuracy greater than 75% compared with the conventional SpanBERT approach.

Automatic Construction of Reduced Dimensional Cluster-based Keyword Association Networks using LSI (LSI를 이용한 차원 축소 클러스터 기반 키워드 연관망 자동 구축 기법)

  • Yoo, Han-mook;Kim, Han-joon;Chang, Jae-young
    • Journal of KIISE
    • /
    • v.44 no.11
    • /
    • pp.1236-1243
    • /
    • 2017
  • In this paper, we propose a novel way of producing keyword networks, named LSI-based ClusterTextRank, which extracts significant key words from a set of clusters with a mutual information metric, and constructs an association network using latent semantic indexing (LSI). The proposed method reduces the dimension of documents through LSI, decomposes documents into multiple clusters through k-means clustering, and expresses the words within each cluster as a maximal spanning tree graph. The significant key words are identified by evaluating their mutual information within clusters. Then, the method calculates the similarities between the extracted key words using the term-concept matrix, and the results are represented as a keyword association network. To evaluate the performance of the proposed method, we used travel-related blog data and showed that the proposed method outperforms the existing TextRank algorithm by about 14% in terms of accuracy.

Enhancing the Performance of Blog Retrieval by User Tagging and Social Network Analysis (사용자 태그와 중심성 지수를 이용한 블로그 검색 성능 향상에 관한 연구)

  • Kim, Eun-Hee;Chung, Young-Mee
    • Journal of the Korean Society for information Management
    • /
    • v.27 no.1
    • /
    • pp.61-77
    • /
    • 2010
  • Blogs are now one of the major information resources on the web. The purpose of this study is to enhance the performance of blog retrieval by means of user assigned tags and trackback information. To this end, retrieval experiments were performed with a dataset of 4,908 blog pages together with their associated trackback URLs. In the experiments, text terms, user tags, and network centrality values based on trackbacks were variously combined as retrieval features. The experimental results showed that employing user tags and network centrality values as retrieval features in addition to text words could improve the performance of blog retrieval.

Author Identification Using Artificial Neural Network (Artificial Neural Network를 이용한 논문 저자 식별)

  • Jung, Jisoo;Yoon, Ji Won
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.26 no.5
    • /
    • pp.1191-1199
    • /
    • 2016
  • To ensure the fairness, journal reviewers use blind-review system which hides the author information of the journal. Even though the author information is blinded, we could identify the author by looking at the field of the journal or containing words and phrases in the text. In this paper, we collected 315 journals of 20 authors and extracted text data. Bag-of-words were generated after preprocessing and used as an input of artificial neural network. The experiment shows the possibility of circumventing the blind review through identifying the author of the journal. By the experiment, we demonstrate the limitation of the current blind-review system and emphasize the necessity of robust blind-review system.

Big Data Analysis of the Women Who Score Goal Sports Entertainment Program: Focusing on Text Mining and Semantic Network Analysis.

  • Hyun-Myung, Kim;Kyung-Won, Byun
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.15 no.1
    • /
    • pp.222-230
    • /
    • 2023
  • The purpose of this study is to provide basic data on sports entertainment programs by collecting data on unstructured data generated by Naver and Google for SBS entertainment program 'Women Who Score Goal', which began regular broadcast in June 2021, and analyzing public perceptions through data mining, semantic matrix, and CONCOR analysis. Data collection was conducted using Textom, and 27,911 cases of data accumulated for 16 months from June 16, 2021 to October 15, 2022. For the collected data, 80 key keywords related to 'Kick a Goal' were derived through simple frequency and TF-IDF analysis through data mining. Semantic network analysis was conducted to analyze the relationship between the top 80 keywords analyzed through this process. The centrality was derived through the UCINET 6.0 program using NetDraw of UCINET 6.0, understanding the characteristics of the network, and visualizing the connection relationship between keywords to express it clearly. CONCOR analysis was conducted to derive a cluster of words with similar characteristics based on the semantic network. As a result of the analysis, it was analyzed as a 'program' cluster related to the broadcast content of 'Kick a Goal' and a 'Soccer' cluster, a sports event of 'Kick a Goal'. In addition to the scenes about the game of the cast, it was analyzed as an 'Everyday Life' cluster about training and daily life, and a cluster about 'Broadcast Manipulation' that disappointed viewers with manipulation of the game content.

A Study on the Current Status of Supply Chain Risks after COVID-19: Focusing on Network Analysis (코로나19 이후 공급사슬 리스크에 대한 현황연구: 네트워크 분석을 중심으로)

  • EuiBeom Jeong;Keontaek Oh
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.28 no.4
    • /
    • pp.77-92
    • /
    • 2023
  • In this study, keyword network analysis was performed based on global and domestic journals, and network text analysis was conducted for news and articles to examine major issues and research trends on supply chain risks after COVID-19. As a result of analyzing the supply chain risk, after COVID-19 which was relatively insufficient in previous studies, research trends and topics such as supply chain risk recovery, response and public welfare, which are different from previous previous studies, were found in global and domestic journals, news and articles and it was possible to suggest practical strategies and insights for supply chain risk strategies for firms.

A Multi-Class Classifier of Modified Convolution Neural Network by Dynamic Hyperplane of Support Vector Machine

  • Nur Suhailayani Suhaimi;Zalinda Othman;Mohd Ridzwan Yaakub
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.11
    • /
    • pp.21-31
    • /
    • 2023
  • In this paper, we focused on the problem of evaluating multi-class classification accuracy and simulation of multiple classifier performance metrics. Multi-class classifiers for sentiment analysis involved many challenges, whereas previous research narrowed to the binary classification model since it provides higher accuracy when dealing with text data. Thus, we take inspiration from the non-linear Support Vector Machine to modify the algorithm by embedding dynamic hyperplanes representing multiple class labels. Then we analyzed the performance of multi-class classifiers using macro-accuracy, micro-accuracy and several other metrics to justify the significance of our algorithm enhancement. Furthermore, we hybridized Enhanced Convolution Neural Network (ECNN) with Dynamic Support Vector Machine (DSVM) to demonstrate the effectiveness and efficiency of the classifier towards multi-class text data. We performed experiments on three hybrid classifiers, which are ECNN with Binary SVM (ECNN-BSVM), and ECNN with linear Multi-Class SVM (ECNN-MCSVM) and our proposed algorithm (ECNNDSVM). Comparative experiments of hybrid algorithms yielded 85.12 % for single metric accuracy; 86.95 % for multiple metrics on average. As for our modified algorithm of the ECNN-DSVM classifier, we reached 98.29 % micro-accuracy results with an f-score value of 98 % at most. For the future direction of this research, we are aiming for hyperplane optimization analysis.