• Title/Abstract/Keywords: semantic mining

213 results (processing time: 0.021 s)

Using Ontologies for Semantic Text Mining

  • 유은지;김정철;이춘열;김남규
    • 한국정보시스템학회지:정보시스템연구 / Vol. 21, No. 3 / pp. 137-161 / 2012
  • The increasing interest in big data analysis using various data mining techniques indicates that many commercial data mining tools now need to be equipped with fundamental text analysis modules. The most essential prerequisite for accurate analysis of text documents is an understanding of the exact semantics of each term in a document. The difficulties in understanding the exact semantics of terms are mainly attributable to homonym and synonym problems, which are long-standing problems in the natural language processing field. Some major text mining tools provide a thesaurus to address these problems, but a thesaurus cannot resolve complex synonym problems and does not address homonym problems at all. In this paper, we propose a semantic text mining methodology that uses ontologies to improve the quality of text mining results by resolving the semantic ambiguity caused by homonym and synonym problems. We evaluate the practical applicability of the proposed methodology by performing a classification analysis to predict customer churn using real transactional data and Q&A articles from the "S" online shopping mall in Korea. The experiments revealed that the prediction model produced by our proposed semantic text mining method outperformed the model produced by traditional text mining on prediction accuracy measures such as the response, captured response, and lift.
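
The evaluation measures named in the abstract above (response, captured response, lift) can be illustrated with a minimal sketch. The definitions used here are the common ones from churn and campaign modeling and are an assumption, not taken from the paper; the function name `top_fraction_metrics` and its parameters are hypothetical.

```python
def top_fraction_metrics(scores, labels, fraction=0.1):
    """Return (response, captured_response, lift) for the top `fraction`
    of customers ranked by predicted churn score.

    Assumed definitions (not confirmed by the paper):
      response          = share of actual churners inside the top group
      captured_response = share of all churners that fall in the top group
      lift              = response / overall churn rate
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equal in length")
    # Rank customers from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * fraction))
    top_hits = sum(label for _, label in ranked[:k])
    total_hits = sum(labels)
    response = top_hits / k
    captured = top_hits / total_hits if total_hits else 0.0
    base_rate = total_hits / len(labels)
    lift = response / base_rate if base_rate else 0.0
    return response, captured, lift


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
    labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]  # 1 = churned
    print(top_fraction_metrics(scores, labels, fraction=0.2))
```

A lift above 1 for the top group means the model concentrates churners there more than random selection would.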

Big Data Analysis of the Women Who Score Goal Sports Entertainment Program: Focusing on Text Mining and Semantic Network Analysis.

  • Hyun-Myung, Kim;Kyung-Won, Byun
    • International Journal of Internet, Broadcasting and Communication / Vol. 15, No. 1 / pp. 222-230 / 2023
  • The purpose of this study is to provide basic data on sports entertainment programs by collecting unstructured data from Naver and Google on the SBS entertainment program 'Women Who Score Goal', which began regular broadcast in June 2021, and analyzing public perceptions through data mining, a semantic matrix, and CONCOR analysis. Data were collected using Textom, yielding 27,911 items accumulated over the 16 months from June 16, 2021 to October 15, 2022. From the collected data, 80 key keywords related to 'Kick a Goal' were derived through simple frequency and TF-IDF analysis. Semantic network analysis was then conducted to analyze the relationships among these top 80 keywords. Centrality was derived with the UCINET 6.0 program, and NetDraw in UCINET 6.0 was used to characterize the network and to visualize the connections between keywords. CONCOR analysis was conducted to derive clusters of words with similar characteristics based on the semantic network. The analysis identified a 'Program' cluster related to the broadcast content of 'Kick a Goal' and a 'Soccer' cluster related to the sport featured in 'Kick a Goal'. Beyond the scenes of the cast's games, it also identified an 'Everyday Life' cluster about training and daily life, and a 'Broadcast Manipulation' cluster about the manipulation of game content that disappointed viewers.
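
The TF-IDF keyword ranking mentioned in the abstract above can be sketched as follows. This is a minimal illustration using one common TF-IDF variant (relative term frequency times log inverse document frequency); the exact weighting Textom applies is not stated in the abstract, so the formula, the function name `tfidf_keyword_scores`, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def tfidf_keyword_scores(documents):
    """Sum TF-IDF weights per term over a list of tokenized documents.

    tf(term, doc) = count of term in doc / number of tokens in doc
    idf(term)     = log(N / number of docs containing term)
    This is one common variant, assumed here for illustration.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = Counter()
    for doc in documents:
        if not doc:
            continue
        counts = Counter(doc)
        for term, count in counts.items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            scores[term] += tf * idf
    return scores


if __name__ == "__main__":
    docs = [
        ["goal", "soccer", "program"],
        ["goal", "training"],
        ["goal", "manipulation", "program"],
    ]
    # Print terms from highest to lowest summed TF-IDF weight.
    for term, score in tfidf_keyword_scores(docs).most_common():
        print(f"{term}: {score:.4f}")
```

A term that occurs in every document (here `goal`) gets weight zero under this variant, which is why simple frequency and TF-IDF are often reported side by side.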

KNN-based Image Annotation by Collectively Mining Visual and Semantic Similarities

  • Ji, Qian;Zhang, Liyan;Li, Zechao
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 11, No. 9 / pp. 4476-4490 / 2017
  • The aim of image annotation is to determine labels that can accurately describe the semantic information of images. Many approaches have been proposed to automate the image annotation task while achieving good performance. However, in most cases, the semantic similarities of images are ignored. To address this, we propose a novel Visual-Semantic Nearest Neighbor (VS-KNN) method that collectively explores visual and semantic similarities for image annotation. First, for each label, visual nearest neighbors of a given test image are constructed from training images associated with this label. Second, each neighboring subset is determined by mining the semantic similarity and the visual similarity. Finally, the relevance between the images and labels is determined based on maximum a posteriori estimation. Extensive experiments were conducted using three widely used image datasets. The experimental results show the effectiveness of the proposed method in comparison with state-of-the-art methods.

Semantic Trajectory Based Behavior Generation for Groups Identification

  • Cao, Yang;Cai, Zhi;Xue, Fei;Li, Tong;Ding, Zhiming
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 12, No. 12 / pp. 5782-5799 / 2018
  • With the development of GPS and the popularity of mobile devices with positioning capability, collecting massive amounts of trajectory data is feasible and easy. The daily trajectories of moving objects convey a concise overview of their behaviors. Different social roles have different trajectory patterns. Therefore, we can identify users or groups based on similar trajectory patterns by mining implicit life patterns. However, most existing studies on mining daily trajectories focus on the spatial and temporal analysis of raw trajectory data and miss the essential semantic information or behaviors. In this paper, we propose a novel trajectory semantics calculation method to identify groups that have similar behaviors. In our model, we first propose a fast and efficient approach for extracting stay regions from daily trajectories, then generate semantic trajectories by enriching the stay regions with semantic labels. To measure the similarity between semantic trajectories, we design a semantic similarity measure based on spatial and temporal similarity factors. Furthermore, a pruning strategy is proposed to reduce tedious calculations and comparisons. We have conducted extensive experiments on a real trajectory dataset from the Geolife project, and the experimental results show that our proposed method is both effective and efficient.
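
The stay-region extraction step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not describe the authors' own "fast and efficient" approach, so this swaps in the classic distance-and-time-threshold stay-point detection instead; the function names, the 200 m and 20 minute thresholds, and the sample coordinates are all assumptions.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    radius = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def stay_points(track, max_dist_m=200.0, min_stay_s=1200.0):
    """Detect stay points in a time-ordered track of (lat, lon, t_seconds).

    A stay point is a run of consecutive fixes that all lie within
    `max_dist_m` of the run's first fix and span at least `min_stay_s`.
    Returns a list of (mean_lat, mean_lon, arrive_t, leave_t).
    """
    stays = []
    i, n = 0, len(track)
    while i < n:
        j = i + 1
        # Extend the run while fixes remain close to the anchor fix i.
        while j < n and haversine_m(track[i][0], track[i][1],
                                    track[j][0], track[j][1]) <= max_dist_m:
            j += 1
        if track[j - 1][2] - track[i][2] >= min_stay_s:
            group = track[i:j]
            mean_lat = sum(p[0] for p in group) / len(group)
            mean_lon = sum(p[1] for p in group) / len(group)
            stays.append((mean_lat, mean_lon, track[i][2], track[j - 1][2]))
            i = j  # continue after the detected stay
        else:
            i += 1
    return stays


if __name__ == "__main__":
    track = [
        (39.9000, 116.4000, 0),
        (39.9001, 116.4001, 600),
        (39.9000, 116.4002, 1200),
        (39.9001, 116.4000, 1800),
        (39.9500, 116.4500, 2400),  # far away: the stay has ended
    ]
    print(stay_points(track))
```

Each detected stay point could then be matched against points of interest to attach a semantic label, which is the enrichment step the abstract describes.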

Developing an User Location Prediction Model for Ubiquitous Computing based on a Spatial Information Management Technique

  • Choi, Jin-Won;Lee, Yung-Il
    • Architectural research / Vol. 12, No. 2 / pp. 15-22 / 2010
  • Our prediction model is based on the development of a "Semantic Location Model." It embodies geometrical and topological information, which can increase prediction efficiency and make the prediction model easy to manipulate. Data mining is used to extract the inhabitant's location patterns as they are generated day by day. As a result, the self-learning system will be able to semantically predict the inhabitant's location in advance. This context-aware system provides a key component of the ubiquitous computing environment. First, we explain the semantic location model and data mining methods. Then the location prediction model for the ubiquitous computing system is described in detail. Finally, the prototype system is introduced to demonstrate and evaluate our prediction model.

Deep Learning Framework with Convolutional Sequential Semantic Embedding for Mining High-Utility Itemsets and Top-N Recommendations

  • Siva S;Shilpa Chaudhari
    • Journal of information and communication convergence engineering / Vol. 22, No. 1 / pp. 44-55 / 2024
  • High-utility itemset mining (HUIM) is a dominant technology that enables enterprises to make real-time decisions in areas including supply chain management, customer segmentation, and business analytics. However, classical support value-driven Apriori solutions are limited and unable to meet real-time enterprise demands, especially for large amounts of input data. This study introduces a model for top-N high-utility itemset mining in real-time enterprise applications. Unlike traditional Apriori-based solutions, the proposed convolutional sequential embedding metrics-driven cosine-similarity-based multilayer perceptron learning model leverages global and contextual features, including semantic attributes, for enhanced top-N recommendations over sequential transactions. MATLAB-based simulations of the model on diverse datasets yielded a precision of 0.5632, a mean absolute error (MAE) of 0.7610, a hit rate (HR)@K of 0.5720, and a normalized discounted cumulative gain (NDCG)@K of 0.4268. The average MAE across different datasets and latent dimensions was 0.608. Additionally, the model achieved cumulative accuracy and precision of 97.94% and 97.04%, respectively, surpassing existing state-of-the-art models. This affirms the robustness and effectiveness of the proposed model in real-time enterprise scenarios.

Grid Management System and Information System for Semantic Grid Middleware

  • Kim, Hyung-Lae;Han, Byong-John;Jeong, In-Yong;Jeong, Chang-Sung
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 4, No. 6 / pp. 1080-1097 / 2010
  • A well-organized and easy-to-use Grid management system is very important for executing various Grid applications and managing a Grid computing environment. Moreover, an information system that supports the Grid management system by providing various kinds of information about the Grid environment is one of the most interesting issues in the Grid middleware area. Effective cooperation between a Grid management system and an information system can yield a novel Grid middleware system. In particular, a Grid management system based on a service-oriented architecture is flexible and extensible for providing various types of Grid services. In addition, an information system based on a data mining process that covers different kinds of domains, such as users, resources, and applications, can make the Grid management system more precise and efficient. In this paper, we propose a semantic Grid middleware system that combines a Grid management system with a semantic information system.

A study on integrating and discovery of semantic based knowledge model

  • 전승수
    • 인터넷정보학회논문지 / Vol. 15, No. 6 / pp. 99-106 / 2014
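  • Recently, methods have been proposed for generating and analyzing efficient semantic knowledge models using natural and formal language processing, artificial intelligence algorithms, and related techniques. Such semantic knowledge models are used to build efficient decision making trees and to analyze systematic problem solving paths for specific situations. In particular, in the analysis of various complex systems and social networks, they serve as the basis for static indicator generation and regression analysis, trend analysis through behavioral models, and simulation models that support macro-level forecasting. However, most knowledge models are built by manually modeling specific indicators or refined data for use in analysis. In this paper, we present a method and a formal algorithm that use text mining to generate, from massive unstructured information, the topic factors and relation nodes that make up a knowledge model, and to integrate them. To this end, we first describe how to convert the keyword map derived through text mining into an equivalent knowledge map and integrate it into a semantic knowledge model. We also propose a method for projecting a meaningful topic map from the keyword map and an algorithm for deriving a semantically equivalent model.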

Korean Consumers' Political Consumption of Japanese Fashion Products

  • 최영현;이규혜
    • 한국의류학회지 / Vol. 44, No. 2 / pp. 295-309 / 2020
  • In 2019, Japan announced trade regulations against Korean products; consequently, the sales of Japanese products in Korea dropped due to a boycott by Korean consumers. This study measured Korean consumers' political consumption behavior toward Japanese fashion products. Unstructured text data were collected from online media sources and consumer-posted sources such as blogs and SNS. Text mining techniques and semantic network analysis were used to process the unstructured data. The results identified boycotting of Japanese fashion products and buycotting of alternative products and Korean brands as forms of consumers' political consumption. Two brand cases were investigated in detail. Online text data before and after the political action were compared, and significant changes in consumption as well as emotional expressions were identified. Product-related industry sectors were identified in terms of the political consumption of fashion: the liquor, automobile, and tourism sectors were closely linked to the fashion sector in terms of boycotting. More "boycott" and "buycott" fashion brands (reflected in consumer attitudes and feelings) were detected in consumer-driven texts than in media-driven sources.

Semi-Automatic Ontology Generation about XML Documents using Data Mining Method

  • 구미숙;황정희;류근호;홍장의
    • 정보처리학회논문지D / Vol. 13D, No. 3 / pp. 299-308 / 2006
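  • Recently, standardization work based on XML data has been under way for exchanging documents such as web documents and public documents, so the number of XML documents is growing. To improve the efficiency of information retrieval over such XML documents, the Semantic Web has emerged, built on ontologies that add semantic elements. However, conventional manual ontology construction is costly and time-consuming, so this paper proposes a method that semi-automatically builds an ontology from a set of XML documents in similar domains using an association rule algorithm from data mining. To build an ontology for a specific domain, the proposed method uses a data mining algorithm to determine the ontology domain level for automatic ontology generation, that is, to set the domain scope automatically: the form and concept level of the required data and how many concepts to use. Association rules are applied to the tags of XML documents to find frequently occurring patterns, and pairs of related concepts are extracted to set the domain scope for automatic ontology generation. For ontology construction, an ontology-based Semantic Web engine was implemented using XML Topic Maps, one of the ontology languages, and TM4J, an open-source topic map engine.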
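
The idea of applying association rules to XML tags, as described in the abstract above, can be sketched as follows. This is a simplified illustration restricted to pairwise rules scored by support and confidence; the abstract does not name the specific algorithm or thresholds, so the function names, the threshold values, and the sample documents are assumptions. Each document's set of tags is treated as one transaction.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations


def tag_sets(xml_docs):
    """Turn each XML document string into the set of tag names it contains."""
    return [{element.tag for element in ET.fromstring(doc).iter()} for doc in xml_docs]


def tag_association_rules(xml_docs, min_support=0.5, min_confidence=0.8):
    """Find pairwise rules lhs -> rhs between XML tags.

    support(lhs, rhs)     = share of documents containing both tags
    confidence(lhs -> rhs) = docs containing both / docs containing lhs
    Returns a sorted list of (lhs, rhs, support, confidence).
    """
    transactions = tag_sets(xml_docs)
    n_docs = len(transactions)
    if n_docs == 0:
        return []
    tag_count = Counter(tag for tags in transactions for tag in tags)
    pair_count = Counter(
        pair for tags in transactions for pair in combinations(sorted(tags), 2)
    )
    rules = []
    for (a, b), count in pair_count.items():
        support = count / n_docs
        if support < min_support:
            continue  # the pair is not frequent enough
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / tag_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    docs = [
        "<book><title/><author/></book>",
        "<book><title/><author/><isbn/></book>",
        "<article><title/></article>",
    ]
    for lhs, rhs, support, confidence in tag_association_rules(docs):
        print(f"{lhs} -> {rhs}  support={support:.2f} confidence={confidence:.2f}")
```

Tag pairs that survive both thresholds are candidates for related concepts in the ontology; a human reviewer would still confirm them, which is what makes the process semi-automatic.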