• Title/Summary/Keyword: 토픽 모델 (topic model)

Technique for Concurrent Processing Graph Structure and Transaction Using Topic Maps and Cassandra (토픽맵과 카산드라를 이용한 그래프 구조와 트랜잭션 동시 처리 기법)

  • Shin, Jae-Hyun
    • KIPS Transactions on Software and Data Engineering / v.1 no.3 / pp.159-168 / 2012
  • Relations have become an important factor in new IT environments such as SNS, cloud, and Web 3.0, and these relations generate transactions. However, existing relational and graph databases do not process both the graph structures that represent relationships and the transactions. In this paper, we propose a technique that can concurrently process graph structures and transactions in a scalable complex network system. The proposed technique simultaneously stores and navigates graph structures and transactions using the Topic Maps data model. Topic Maps is an ontology language for implementing the semantic web (Web 3.0) and has been used as a navigator of information through associations among information resources. The architecture of the proposed technique was designed and implemented using Cassandra, a column-oriented NoSQL database, so that it can handle up to Big Data-level data through distributed processing. Finally, experiments compared storage and querying in a typical RDBMS, Oracle, and in the proposed technique, using the same data source and the same queries. The results show that the proposed technique can express relationships without joins and can adequately substitute for the role of an RDBMS.

Comparative Study of User Reactions in OTT Service Platforms Using Text Mining (텍스트 마이닝을 활용한 OTT 서비스 플랫폼별 사용자 반응 비교 연구)

  • Soonchan Kwon;Jieun Kim;Beakcheol Jang
    • Journal of Internet Computing and Services / v.25 no.3 / pp.43-54 / 2024
  • This study employs text mining techniques to compare user responses across various Over-The-Top (OTT) service platforms. The primary objective of the research is to understand user satisfaction with OTT service platforms and contribute to the formulation of more effective review strategies. The key questions addressed in this study involve identifying prominent topics and keywords in user reviews of different OTT services and comprehending platform-specific user reactions. TF-IDF is utilized to extract significant words from positive and negative reviews, while BERTopic, an advanced topic modeling technique, is employed for a more nuanced and comprehensive analysis of intricate user reviews. The results from TF-IDF analysis reveal that positive app reviews exhibit a high frequency of content-related words, whereas negative reviews display a high frequency of words associated with potential issues during app usage. Through the utilization of BERTopic, we were able to extract keywords related to content diversity, app performance components, payment, and compatibility, by associating them with content attributes. This enabled us to verify that the distinguishing attributes of the platforms vary among themselves. The findings of this study offer significant insights into user behavior and preferences, which OTT service providers can leverage to improve user experience and satisfaction. We also anticipate that researchers exploring deep learning models will find our study results valuable for conducting analyses on user review text data.
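
The TF-IDF step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the plain, unsmoothed weighting tf × log(N/df); the abstract does not say which variant the authors used, and the reviews below are hypothetical toy data. The BERTopic stage is not covered here.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document (each doc is a list of tokens)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical toy reviews, not data from the study.
reviews = [
    "great content great shows".split(),
    "app crashes during payment".split(),
    "great app".split(),
]
weights = tf_idf(reviews)
```

In this toy corpus, "crashes" outweighs "app" in the second review because "app" also appears in another review, which is the kind of contrast the abstract reports between content-related and usage-problem words.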

Target Extraction Based on HITS Graph for Opinion Bias Detection in Twitter (트윗 문서에서 의견 바이어스 탐지를 위한 HITS 그래프 기반 핵심 자질 추출)

  • Kwon, A-Rong;Lee, Kyung-Soon
    • Annual Conference on Human and Language Technology / 2012.10a / pp.58-61 / 2012
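  • In this paper, we propose a method that uses a HITS graph to extract key features (targets) for detecting opinion bias among Twitter users. The proposed extraction method has the advantage of finding features that a person could not extract by hand. To verify that the proposed key feature extraction is effective for bias detection, we evaluated it on four topics, and the proposed model outperformed the existing model.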
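
For readers unfamiliar with HITS, the following is a minimal sketch of the standard hub/authority power iteration. The graph construction is an assumption made only for illustration (tweets pointing to the candidate terms they mention); the abstract does not describe how the authors build their graph.

```python
import math

def hits(edges, iterations=50):
    """Run HITS power iteration on a directed edge list; return (hubs, authorities)."""
    nodes = sorted({n for edge in edges for n in edge})
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of nodes that point to it.
        auths = {n: sum(hubs[src] for src, dst in edges if dst == n) for n in nodes}
        norm = math.sqrt(sum(v * v for v in auths.values())) or 1.0
        auths = {n: v / norm for n, v in auths.items()}
        # Hub score: sum of the authority scores of nodes it points to.
        hubs = {n: sum(auths[dst] for src, dst in edges if src == n) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hubs.values())) or 1.0
        hubs = {n: v / norm for n, v in hubs.items()}
    return hubs, auths

# Hypothetical graph: tweets t1..t3 point to the candidate terms they mention.
edges = [("t1", "policy"), ("t2", "policy"), ("t3", "policy"), ("t3", "tax")]
hubs, auths = hits(edges)
best_target = max(auths, key=auths.get)
```

Terms with the highest authority scores would be the candidates kept as key features under this assumed setup.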

Meta-Model Transformations for Efficient Storing DDS Topics (효율적인 DDS 토픽 저장을 위한 메타 모델 변환 방법)

  • Lee, Hyun-Woo;Yim, Hyung-Jun;Choi, Hoon;Kim, Jum-Su;Lee, Kyu-Chul
    • Proceedings of the Korean Information Science Society Conference / 2011.06c / pp.123-126 / 2011
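  • Recent combat systems must deliver data in real time among many devices simultaneously, and the Data Distribution Service (DDS) defined by the OMG (Object Management Group) is well suited as communication middleware that meets this requirement. DDS implementations include RTI's NDDS, PrismTech's OpenSplice, and Chungnam National University's ReTicom. Of these, NDDS and OpenSplice support data persistence, whereas ReTicom does not yet. To solve this while also guaranteeing real-time behavior, ReTicom is being implemented using a main-memory-based object-relational database. Because the IDL that defines DDS object model data and the object-relational database differ in data types and structures, a meta-model transformation method that converts data types and structures between IDL and the object-relational database must be provided. To address this, we propose a method that first converts IDL into an XML schema, whose structure is easy to grasp, and then converts that schema into the data types and structures of the object-relational database.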

Multitask Transformer Model-based Fintech Customer Service Chatbot NLU System with DECO-LGG SSP-based Data (DECO-LGG 반자동 증강 학습데이터 활용 멀티태스크 트랜스포머 모델 기반 핀테크 CS 챗봇 NLU 시스템)

  • Yoo, Gwang-Hoon;Hwang, Chang-Hoe;Yoon, Jeong-Woo;Nam, Jee-Sun
    • Annual Conference on Human and Language Technology / 2021.10a / pp.461-466 / 2021
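  • In this study, we used a semi-automatic language data augmentation method (Semi-automatic Symbolic Propagation: SSP) based on the DECO (Dictionnaire Electronique du COreen) Korean electronic dictionary and LGG (Local-Grammar Graph) to effectively generate annotated training data for the NLU (Natural Language Understanding) component of a fintech CS (Customer Service) chatbot. On this basis, we implemented a fintech CS chatbot NLU system using the DIET (Dual Intent and Entity Transformer) architecture provided by the open-source RASA framework. Following the 32 topic types, 38 core events, and 10 discourse-element configurations identified in the fintech domain from real data, the DECO-LGG data generation module effectively produces high-quality annotated training data for question and complaint speech acts. We trained DIET, an end-to-end multitask transformer model that jointly handles intent classification and named entity recognition for slot filling, on these data and obtained F1-scores of 0.931 (Intent)/0.865 (Slot/Entity) for DIET-only and 0.951 (Intent)/0.901 (Slot/Entity) for DIET+KoBERT. These results demonstrate the effectiveness of DECO-LGG-based SSP-generated data as training data and the superior performance of the KoBERT-based DIET model.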

A Study on the Fraud Detection in an Online Second-hand Market by Using Topic Modeling and Machine Learning (토픽 모델링과 머신 러닝 방법을 이용한 온라인 C2C 중고거래 시장에서의 사기 탐지 연구)

  • Dongwoo Lee;Jinyoung Min
    • Information Systems Review / v.23 no.4 / pp.45-67 / 2021
  • As the transaction volume of the C2C second-hand market grows, the number of frauds, in which sellers seek unfair gains by sending products different from those specified or by not sending them at all, is also increasing. This study explores a model that can identify frauds in the online C2C second-hand market by examining transaction postings. To this end, the study collected 145,536 field data records from an actual C2C second-hand market. The model is built from characteristics of the postings, such as the topic and linguistic characteristics of the product description, together with characteristics of products, postings, sellers, and transactions. The constructed model is then trained with the machine learning algorithm XGBoost. The final analysis results show that fraudulent postings contain less information, which is also less specific, have fewer nouns and images, have a higher ratio of numbers and white space, and are shorter than genuine postings. Also, while genuine postings focus on product information in their nouns, delivery information in their verbs, and actions in their adjectives, fraudulent postings do not show those characteristics. This study shows that various features can be extracted from postings written for C2C second-hand transactions and used to construct an effective fraud detection model. The proposed model can also be considered for and applied to other C2C platforms. Overall, the model proposed in this study can be expected to help suppress and prevent fraudulent behavior in online C2C markets.

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately determine customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who usually judge the positive, neutral, or negative meanings of frequently used words in the collected text documents. To research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (either positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the total numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that carry positive or negative meaning toward the product. As a motivating example, given a product such as the Sonata made by Hyundai Motors, customers often want to see a summary note that includes the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service.' Such information will enable customers to make good choices when they purchase brand-new vehicles.
In addition, automobile makers will be able to figure out the preference and positive/negative points for new models on the market. In the near future, the weak points of the models can be improved with the help of the sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because the sentiment analysis ignores aspects, only a summary note with the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus is reported to customers and car makers. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across domains, a sentiment lexicon appropriate to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address these problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly.
Furthermore, by reinforcing topic semantics, we can improve the accuracy of the product reputation mining algorithm beyond that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed the top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
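
Steps (3) and (4) of the lexicon-based baseline described in this abstract can be sketched as follows. This is a rough illustration, not the authors' implementation: the lexicon, the sentences, and the rule of summing word scores to decide polarity are all assumptions.

```python
def polarity(sentence, lexicon):
    """Label a sentence 'positive', 'negative', or 'neutral' by summing word scores."""
    score = sum(lexicon.get(word, 0) for word in sentence.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

def polarity_ratios(sentences, lexicon):
    """Return (positive ratio, negative ratio) over all sentences in the collection."""
    labels = [polarity(s, lexicon) for s in sentences]
    total = len(labels) or 1  # guard against an empty collection
    return labels.count("positive") / total, labels.count("negative") / total

# Hypothetical sentiment lexicon and review sentences.
lexicon = {"sleek": 1, "smooth": 1, "noisy": -1, "slow": -1}
sentences = [
    "The design is sleek",
    "The engine is noisy",
    "Smooth ride but slow service",
    "It has four doors",
]
pos_ratio, neg_ratio = polarity_ratios(sentences, lexicon)
```

The third sentence shows the limitation the abstract points out: without aspects, a positive remark about the ride and a negative remark about the service cancel out into a single neutral label.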

Experimental Study for Effective Combination of Opinion Features (효과적인 의견 자질 결합을 위한 실험적 연구)

  • Han, Kyoung-Soo
    • Journal of the Korean Society for Information Management / v.27 no.3 / pp.227-239 / 2010
  • Opinion retrieval retrieves items that are topically relevant to the user's information need and that contain opinions about the topic. This paper aims to find a method for representing the user's information need for effective opinion retrieval and to analyze methods for combining opinion features through various experiments. The experiments are carried out in the inference network framework using the Blogs06 collection and 100 TREC test topics. The results show that our suggested representation method, based on a hidden 'opinion' concept, is effective, and that a compact model with a very small opinion lexicon shows performance comparable to that of the previous model on the same test data set.

Study on Text Analysis of the Liquefied Natural Gas Carriers Dock Specification for Development of the Ship Predictive Maintenance Model (선박예지정비모델 개발을 위한 LNG 선박 도크 수리 항목의 텍스트 분석 연구)

  • Hwang, Taemin;Youn, Ik-Hyun;Oh, Jungmo
    • Journal of the Korean Society of Marine Environment & Safety / v.27 no.1 / pp.60-66 / 2021
  • The growing importance of maintenance is driving the adoption of maintenance strategies across various industries. The maritime industry is among them, but it has been slow to change how it selects and applies maintenance strategies and largely retains older ones. Ships in particular still rely on previously used strategies. Given the conditions at sea, ships need a new maintenance strategy. A ship predictive maintenance model predicts the breakdown or malfunction of machinery to secure time for maintenance through preventive actions and treatments, thereby avoiding maintenance-related hazards. This study applied text analysis to a Liquefied Natural Gas (LNG) carrier dock indent document, and the analysis results were interpreted against the original document. The inter-relational patterns observed from the frequency of common maintenance combinations among different parts and equipment in ships will be applied to the development of ship predictive maintenance.

A School-tailored High School Integrated Science Q&A Chatbot with Sentence-BERT: Development and One-Year Usage Analysis (인공지능 문장 분류 모델 Sentence-BERT 기반 학교 맞춤형 고등학교 통합과학 질문-답변 챗봇 -개발 및 1년간 사용 분석-)

  • Gyeongmo Min;Junehee Yoo
    • Journal of The Korean Association For Science Education / v.44 no.3 / pp.231-248 / 2024
  • This study developed a chatbot for first-year high school students, employing open-source software and the Korean Sentence-BERT model for AI-powered document classification. The chatbot uses the Sentence-BERT model to find the six Q&A pairs most similar to a student's query and presents them in a carousel format. The initial dataset, built from online resources, was refined and expanded based on student feedback and usability throughout the operational period. By the end of the 2023 academic year, the chatbot integrated a total of 30,819 datasets and recorded 3,457 student interactions. Analysis revealed that students tended to use the chatbot when prompted by teachers during classes and primarily during self-study sessions after school, with an average of 2.1 to 2.2 inquiries per session, mostly via mobile phones. Text mining identified student input terms encompassing not only science-related queries but also aspects of school life such as assessment scope. Topic modeling using BERTopic, based on Sentence-BERT, categorized 88% of student questions into 35 topics, shedding light on common student interests. A year-end survey confirmed the efficacy of the carousel format and the chatbot's role in addressing curiosities beyond integrated science learning objectives. This study underscores the importance of developing chatbots tailored for student use in public education and highlights their educational potential through long-term usage analysis.
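
The retrieval step described in this abstract (returning the six stored Q&A pairs most similar to a query) can be sketched with cosine similarity over sentence embeddings. The three-dimensional vectors and Q&A identifiers below are hypothetical stand-ins for real Sentence-BERT embeddings, and cosine similarity is an assumed choice of metric that the abstract does not confirm.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(query_vec, qa_vectors, k=6):
    """Return the ids of the k stored Q&A pairs most similar to the query."""
    ranked = sorted(
        qa_vectors,
        key=lambda qa_id: cosine(query_vec, qa_vectors[qa_id]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical embeddings; a real system would obtain them from a sentence encoder.
qa_vectors = {
    "photosynthesis": [0.9, 0.1, 0.0],
    "exam_scope": [0.0, 0.2, 0.9],
    "respiration": [0.8, 0.3, 0.1],
}
query = [1.0, 0.2, 0.0]
results = top_k(query, qa_vectors, k=2)
```

With k=6, as in the study, the ranked ids would feed the carousel; k=2 is used here only because the toy store holds three entries.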