• Title/Summary/Keyword: Topic Clustering

Station Extension Algorithm Considering Destinations to Solve Illegal Parking of E-Scooters

  • Jeongeun, Song;Yoon-Ah, Song;ZoonKy, Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.2 / pp.131-142 / 2023
  • In this paper, we propose a new station selection algorithm to solve the illegal parking problem of shared electric scooters and improve service quality. Shared electric scooters have recently attracted attention as a first- and last-mile means of transport between public transportation and final destinations, offering one solution to urban transportation problems. As the shared electric scooter market has grown rapidly, the problems caused by electric scooters have become serious. In this study, text data are therefore collected to understand the nature of the problem, the problems related to shared scooters are examined from the perspectives of pedestrians and users through LDA topic modeling, and a station extension algorithm is built on the results. Some parking lots have already been installed, but their locations differ from the areas where scooters are actually towed. We therefore propose an algorithm that places stations where actual tow density is high, using a mixed clustering technique that applies K-means after primary clustering by DBSCAN and reflects the 'current state of electric scooter tow in Seoul'.
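
The two-stage clustering described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`dbscan`, `kmeans`, `propose_stations`), the parameters `eps`, `min_pts`, and `stations_per_cluster`, and the toy coordinates are all assumptions, since the abstract does not state how the parameters or the number of stations per cluster were chosen.

```python
import math
import random

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:  # j is a core point, so keep expanding
                queue.extend(nb)
    return labels

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iteration; returns k centers as coordinate tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[idx].append(p)
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
    return centers

def propose_stations(tow_points, eps, min_pts, stations_per_cluster):
    """Stage 1: DBSCAN finds dense tow areas. Stage 2: K-means places stations in each."""
    labels = dbscan(tow_points, eps, min_pts)
    stations = []
    for cid in sorted(set(labels) - {-1}):  # noise points get no station
        members = [p for p, label in zip(tow_points, labels) if label == cid]
        k = min(stations_per_cluster, len(members))
        stations.extend(kmeans(members, k))
    return stations

# Hypothetical tow locations: two dense areas and one isolated tow.
tow_points = [(0, 0), (0, 1), (1, 0), (1, 1),
              (10, 10), (10, 11), (11, 10), (11, 11),
              (50, 50)]
print(propose_stations(tow_points, eps=1.5, min_pts=3, stations_per_cluster=1))
```

Running DBSCAN first discards isolated tows as noise, so K-means only positions stations inside areas that are already known to be dense.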

How does the General Public Understand Science and Technology Issues?: A Case on the Nuclear Power Issue Using Topic Modeling Approach (과학기술이슈에 대한 일반인의 인식분석: 토픽모델링을 활용한 원자력발전 사례)

  • Choi, Hyundo;Ahn, Jongwuk
    • Journal of Technology Innovation / v.23 no.4 / pp.151-175 / 2015
  • The general public is a key stakeholder in the science and technology domain. However, traditional approaches require substantial effort and resources to analyze how the general public understands science and technology issues. We applied topic modeling, a form of text clustering, to texts about nuclear power posted in an online space in order to explore the general public's thoughts on the issue. This study investigates the extent to which macro-level events influence the general public's understanding of science and technology issues and whether these changes in understanding are sustained over time. It also examines the possibility of applying topic modeling to narrow the perception gap between the general public and experts through near-real-time monitoring of public interests and perceptions regarding science and technology issues.

WV-BTM: A Technique on Improving Accuracy of Topic Model for Short Texts in SNS (WV-BTM: SNS 단문의 주제 분석을 위한 토픽 모델 정확도 개선 기법)

  • Song, Ae-Rin;Park, Young-Ho
    • Journal of Digital Contents Society / v.19 no.1 / pp.51-58 / 2018
  • As the number of users and the volume of data on SNS have increased explosively, research based on SNS big data has become active. In social mining, Latent Dirichlet Allocation (LDA), a typical topic modeling technique, is used to identify the similarity of texts in large volumes of unclassified SNS text and to extract trends from them. However, LDA has difficulty deducing high-level topics from short-sentence data because infrequent word occurrences make the data semantically sparse. The biterm topic model (BTM) addressed this limitation of LDA by using combinations of two words. However, BTM also has a limitation: because it is influenced more by the high-frequency word in each combined pair, it cannot calculate weights that account for the relation of the words to each subject. In this paper, we propose a technique that improves the accuracy of the existing BTM by reflecting semantic relations between words.
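
The "combination of two words" that BTM works with can be illustrated by a small sketch. It only extracts and counts unordered word pairs (biterms) from short texts; it does not perform topic inference and does not implement the word-vector weighting proposed in the paper. The function names, the whitespace tokenization, and the sample posts are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def biterms(tokens):
    """Return every unordered word pair (biterm) that co-occurs in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]

def corpus_biterm_counts(docs):
    """Count biterms over a whole corpus of short texts."""
    counts = Counter()
    for doc in docs:
        # Assumption: lowercasing and whitespace splitting stand in for real tokenization.
        counts.update(biterms(doc.lower().split()))
    return counts

# Hypothetical short posts.
posts = ["apple banana", "banana apple cherry"]
for pair, n in corpus_biterm_counts(posts).most_common():
    print(pair, n)
```

Because the counts are pooled over the whole corpus rather than kept per document, word co-occurrence statistics stay usable even when each individual text is only a few words long.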

A Study of Personalized Retrieval System through Society of Korean Journal Articles of Science and Technology (개인화 검색시스템에 관한 연구 - 과학기술학회마을을 중심으로 -)

  • Kim, Kwang-Young;Kwak, Seung-Jin
    • Journal of Korean Library and Information Science Society / v.41 no.1 / pp.149-165 / 2010
  • In this research, we analyze the general services provided by the Society of Korean Journal Articles of Science and Technology. Based on this analysis, personalized retrieval services suited to the article service were developed: a personalized retrieval system based on users' keywords, an author navigation system, an automatic topic recommendation system based on authors' keywords, and an automatic recommendation system for similar users. Personalized service methods suited to the Society's article service are also considered through a user survey.

Dynamic Text Categorizing Method using Text Mining and Association Rule

  • Kim, Young-Wook;Kim, Ki-Hyun;Lee, Hong-Chul
    • Journal of the Korea Society of Computer and Information / v.23 no.10 / pp.103-109 / 2018
  • In this paper, we propose a dynamic document classification method that breaks away from existing classification methods, whose artificial categorization rules are defined by suppliers, and instead changes its categorization rules according to users' needs or social trends. The core of the method is that it creates classification criteria in real time using topic modeling techniques, without standardized category rules, so users are not forced into unnecessary frames. It can also retrieve details through relevance analysis, which calculates relationships between words that are difficult to grasp from word frequency alone. Rather than logical and systematic documents, the proposed method is more effective for situation analysis and information retrieval over unstructured data that do not fit existing classification categories, such as VOC (Voice of Customer), SNS posts, and customer reviews from Internet shopping malls, and it responds flexibly to users' needs. In addition, suppliers do not need to select classification rules, and misclassifications require no manual correction, which reduces unnecessary workload.

Analysis of Research Trends in Homomorphic Encryption Using Bibliometric Analysis (서지통계학적 분석을 이용한 동형 암호의 연구경향 분석)

  • Akihiko Yamada;Eunsang Lee
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.4 / pp.601-608 / 2023
  • Homomorphic encryption is a promising technology that has been extensively researched in recent years. It allows computations to be performed on encrypted data, without the need to decrypt it. In this paper, we perform bibliometric analysis to objectively and quantitatively analyze the research trends of homomorphic encryption technology using 6,047 homomorphic encryption papers from the Scopus database. Specifically, we analyze the number of papers by year, keyword co-occurrence, topic clustering, changes in related keywords over time, and country of homomorphic encryption research institutions. Our analysis results provide strategic directions for research and application of homomorphic encryption and can be a great help for subsequent research and industrial applications.

An Efficient Directional MAC Protocol for Vehicular Ad-hoc Networks (차량 Ad-hoc에서 효율적인 메시지 전달을 위한 지향성 MAC 프로토콜)

  • Ji, Soonbae;Kim, Junghyun;You, Cheolwoo
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.4 / pp.9-16 / 2015
  • Quick and safe message transmission is an important research topic in vehicular ad hoc networks (VANETs). Most studies assume that the periodic broadcast of beacon frames between vehicles increases driver safety. In this paper, we propose a medium access control (MAC) protocol and location-based clustering for VANETs to support reliable data transfer. In our proposal, the cluster head (CH) manages access and allocates resources for the nodes. Simulation confirms that the proposal reduces the transmission delay and the signal collision rate.

Empirical Comparison of Word Similarity Measures Based on Co-Occurrence, Context, and a Vector Space Model

  • Kadowaki, Natsuki;Kishida, Kazuaki
    • Journal of Information Science Theory and Practice / v.8 no.2 / pp.6-17 / 2020
  • Word similarity is often measured to enhance system performance in the information retrieval field and other related areas. This paper reports on an experimental comparison of values for word similarity measures that were computed based on 50 intentionally selected words from a Reuters corpus. There were three targets, including (1) co-occurrence-based similarity measures (for which a co-occurrence frequency is counted as the number of documents or sentences), (2) context-based distributional similarity measures obtained from a latent Dirichlet allocation (LDA), nonnegative matrix factorization (NMF), and Word2Vec algorithm, and (3) similarity measures computed from the tf-idf weights of each word according to a vector space model (VSM). Here, a Pearson correlation coefficient for a pair of VSM-based similarity measures and co-occurrence-based similarity measures according to the number of documents was highest. Group-average agglomerative hierarchical clustering was also applied to similarity matrices computed by individual measures. An evaluation of the cluster sets according to an answer set revealed that VSM- and LDA-based similarity measures performed best.
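
Two of the measure families compared in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's procedure: the abstract does not say which co-occurrence coefficient was used, so Jaccard over document sets is assumed here, and the tf-idf variant (raw term frequency times log(N/df)), the function names, and the three toy documents are likewise assumptions.

```python
import math
from collections import Counter

def word_vectors(docs):
    """Represent each word as a vector of tf-idf weights, one entry per document."""
    term_freqs = [Counter(d.lower().split()) for d in docs]
    n = len(docs)
    doc_freq = Counter(w for tf in term_freqs for w in tf)
    return {
        w: [tf[w] * math.log(n / doc_freq[w]) for tf in term_freqs]
        for w in doc_freq
    }

def cosine(u, v):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def jaccard_cooccurrence(docs, w1, w2):
    """Co-occurrence similarity counted over documents (Jaccard is an assumption)."""
    d1 = {i for i, d in enumerate(docs) if w1 in d.lower().split()}
    d2 = {i for i, d in enumerate(docs) if w2 in d.lower().split()}
    union = d1 | d2
    return len(d1 & d2) / len(union) if union else 0.0

# Hypothetical mini-corpus.
docs = ["oil price rises", "oil price falls", "bank rates rise"]
vecs = word_vectors(docs)
print(cosine(vecs["oil"], vecs["price"]), jaccard_cooccurrence(docs, "oil", "price"))
print(cosine(vecs["oil"], vecs["bank"]), jaccard_cooccurrence(docs, "oil", "bank"))
```

Both measures are driven by which documents a word appears in, which is consistent with the abstract's finding that VSM-based and document-level co-occurrence measures correlate most strongly.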

A Sentiment Classification Approach of Sentences Clustering in Webcast Barrages

  • Li, Jun;Huang, Guimin;Zhou, Ya
    • Journal of Information Processing Systems / v.16 no.3 / pp.718-732 / 2020
  • Sentiment analysis and opinion mining are challenging tasks in natural language processing. Many sentiment analysis and opinion mining applications focus on product reviews, social media reviews, forums, and microblogs, whose reviews are topic-similar and opinion-rich. In this paper, we analyze the sentiments of sentences from online webcast reviews that scroll across the screen, which we call live barrages. Unlike social media comments or product reviews, the topics in live barrages are more fragmented, and there are many invalid comments that must be removed in the preprocessing phase. To extract evaluative sentiment sentences, we propose a novel approach that clusters the barrages from the same commenter to solve the problem of information being scattered across individual barrages. The method contains two subtasks: in the data preprocessing phase, we cluster the sentences from the same commenter and remove unavailable sentences; we then use a semi-supervised machine learning approach, the naïve Bayes algorithm, to analyze the sentiment of the barrage. According to our experimental results, this method performs well in analyzing the sentiment of online webcast barrages.

Interest-based Customer Segmentation Methodology Using Topic Modeling (토픽 분석을 활용한 관심 기반 고객 세분화 방법론)

  • Hyun, Yoonjin;Kim, Namgyu;Cho, Yoonho
    • Journal of Information Technology Applications and Management / v.22 no.1 / pp.77-93 / 2015
  • As the range of customer choice becomes more diverse, the average life span of companies' products and services is becoming shorter. Most companies strive to maximize revenue by understanding customers' needs and providing customized products and services. However, determining each individual customer's needs imposes a significant burden in time and cost. An alternative is therefore to group customers into categories based on certain criteria and establish a marketing strategy tailored to each group. Customer segmentation and customer clustering of this kind are performed using demographic and behavioral information. Demographic information includes sex, age, and income level, while behavioral information is usually identified indirectly through customers' purchase and search histories. However, companies' behavioral information about customers is limited, because it is usually obtained from the limited data a customer leaves on a company's website, and the pattern a customer shows on one particular site might not represent that customer's general tendencies. In this study, therefore, a customer's interests are identified from that customer's access records for external news rather than from the pattern shown on a particular site, and we propose a methodology that performs customer segmentation on this basis. In addition, the main issues are extracted through a topic analysis of approximately 3,000 Internet news articles, an experiment applying customer segmentation is performed, and the applicability of the proposed methodology is analyzed.