• Title/Summary/Keyword: Cluster Approach


Derivation of Digital Music's Ranking Change Through Time Series Clustering (시계열 군집분석을 통한 디지털 음원의 순위 변화 패턴 분류)

  • Yoo, In-Jin;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.171-191
    • /
    • 2020
  • This study focused on digital music, one of the most valuable cultural assets in modern society and one that occupies a particularly important position in the Korean Wave. Digital music data were collected from the "Gaon Chart," a well-established music chart in Korea, tracking the ranking changes of songs that entered the chart over 73 weeks. Patterns with similar characteristics were then derived through time series cluster analysis, followed by a descriptive analysis of the notable features of each pattern. The research process is as follows. First, in the data collection stage, time series data were collected to track the ranking changes of digital music. In the data processing stage, the collected data were matched with rankings over time, and song titles and artist names were cleaned. The analysis then proceeded sequentially in two stages: exploratory analysis and explanatory analysis. The data collection period was limited to the period before the 'music bulk buying' phenomenon, a reliability issue affecting music rankings in Korea; specifically, 73 weeks from the week of December 31, 2017 to January 6, 2018 (the first week) through the week of May 19 to May 25, 2019. The analysis targets were limited to digital music released in Korea, collected from the "Gaon Chart." Unlike the private music charts serviced in Korea, the Gaon Chart is approved by government agencies and therefore carries greater public confidence than ranking information provided by other services. The collected data comprise, for every song that entered the top 100 within the collection period, the period and ranking, song title, artist name, album name, Gaon index, production company, and distribution company. In total, 7,300 top-100 chart entries were identified over the 73 weeks. Because songs frequently remain on the chart for two or more weeks, duplicates were removed during pre-processing: duplicated songs were identified, counted, and located through a duplicate check, and then deleted, yielding a list of 742 unique songs for analysis out of the 7,300 chart entries. A total of 16 patterns were then derived through time series cluster analysis of the ranking changes. From these, two representative patterns were identified: 'Steady Seller' and 'One-Hit Wonder'. The two patterns were further subdivided into five patterns in consideration of each song's survival period on the chart and its ranking. The important characteristics of each pattern are as follows. First, the artist's superstar effect and the bandwagon effect were strong in the one-hit wonder pattern; when consumers choose digital music, they are strongly influenced by these two effects.
Second, the Steady Seller pattern identified songs that consumers have chosen over a very long period, and the patterns of the most frequently selected songs were examined in light of consumer needs. Contrary to popular belief, the steady seller (mid-term) pattern, not the one-hit wonder pattern, received the most choices from consumers. Particularly noteworthy is the 'climbing the chart' phenomenon, which runs counter to the typical pattern and was confirmed within the steady seller pattern. This study focuses on the change in music rankings over time, a relatively neglected area, centering on digital music. In addition, it attempts a new approach to music research by subdividing ranking-change patterns rather than predicting the success or ranking of individual songs.
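
The clustering step described in this abstract can be illustrated with a short sketch. This is a minimal, hypothetical example assuming the weekly rank data sit in a long-format table with `song_id`, `week`, and `rank` columns; the abstract does not name the exact clustering algorithm, so Ward-linkage hierarchical clustering stands in for it.

```python
# A minimal sketch of time series clustering on weekly chart-rank trajectories.
# Column names and the choice of Ward linkage are illustrative assumptions.
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical input: one row per (song, week, rank) observation.
chart = pd.DataFrame({
    "song_id": ["A", "A", "B", "B", "C", "C"],
    "week":    [1, 2, 1, 2, 1, 2],
    "rank":    [5, 3, 80, 95, 40, 12],
})

# Pivot to one trajectory per song; weeks outside the top 100 are
# filled with 101 so every trajectory has the same length.
traj = chart.pivot(index="song_id", columns="week", values="rank").fillna(101)

# Hierarchical clustering (Ward linkage, Euclidean distance) into k patterns.
Z = linkage(traj.values, method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")  # k = 2 here; the paper derives 16

print(pd.Series(labels, index=traj.index, name="pattern"))
```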

A Study on the Emotional Reaction to the Interior Design - Focusing on the Worship Space in the Church Buildings - (실내공간 구성요소에 의한 감성반응 연구 - 기독교 예배공간 강단부를 중심으로 -)

  • Lee, Hyun-Jeong;Lee, Gyoo-Baek
    • Archives of design research
    • /
    • v.18 no.4 s.62
    • /
    • pp.257-266
    • /
    • 2005
  • The purpose of this study is to investigate the psychological reaction to images of the worship space in church buildings, to quantify the contribution of the stimulation elements causing such reactions, and finally to provide basic data for realizing emotionally resonant worship spaces in church architecture. To this end, 143 Christians were surveyed to analyze the relationship between 23 emotional expressions extracted for the worship space and 32 images of worship spaces. The combined data were described as a two-dimensional dispersion using Quantification Theory III. The analysis found that 'simplicity-complexity' of the image formed the horizontal axis (x-axis) and 'creativity' of the image the vertical axis (y-axis). In addition, to quantitatively extract the causal relationship between the emotional reaction values and their stimulation elements, the author derived four emotional word groups through cluster analysis based on similarity: 'simple' and 'sublime' along the x-axis, and 'typical' and 'creative' along the y-axis. Quantification Theory I was then applied, with the total value of the equivalent emotional words as the dependent variable and the emotional stimulation elements of the worship space as the independent variables. Nine emotional stimulation elements were selected, including the colors and shapes of the wall and ceiling, the shapes and finish of the floor materials, window shapes, and the use of symbolic elements, and 31 subcategories were defined to analyze their contribution to the emotional reaction. As a result, the color and finish of the wall were found to be the most effective elements on the subjects' emotional reaction, while the symbolic elements and the color of the wall were found to be the least effective. The present study is expected to help increase the emotional satisfaction of users and to support spatial designs that satisfy the types and purposes of the space.
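
The word-grouping step can be sketched as follows, assuming the two-dimensional coordinates produced by Quantification Theory III are already available. The word list and coordinates below are placeholders, not the study's data, and agglomerative clustering stands in for whatever cluster analysis the authors used.

```python
# A minimal sketch of grouping emotional words by similarity in the
# two-dimensional space from Quantification Theory III. Coordinates are
# placeholders; only the number of groups (four) follows the abstract.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

words = ["simple", "sublime", "typical", "creative", "complex", "solemn"]
coords = np.array([
    [-1.2, 0.1], [-0.9, 0.8], [0.2, -1.1],
    [0.4, 1.3], [1.1, -0.2], [-0.7, 0.6],
])

groups = AgglomerativeClustering(n_clusters=4).fit_predict(coords)
for w, g in zip(words, groups):
    print(w, "-> group", g)
```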


Regional Differences of Proteins Expressing in Adipose Depots Isolated from Cows, Steers and Bulls as Identified by a Proteomic Approach

  • Cho, Jin Hyoung;Jeong, Jin Young;Lee, Ra Ham;Park, Mi Na;Kim, Seok-Ho;Park, Seon-Min;Shin, Jae-Cheon;Jeon, Young-Joo;Shim, Jung-Hyun;Choi, Nag-Jin;Seo, Kang Seok;Cho, Young Sik;Kim, MinSeok S.;Ko, Sungho;Seo, Jae-Min;Lee, Seung-Youp;Chae, Jung-Il;Lee, Hyun-Jeong
    • Asian-Australasian Journal of Animal Sciences
    • /
    • v.29 no.8
    • /
    • pp.1197-1206
    • /
    • 2016
  • Adipose tissue in the loin muscle area of beef cattle, as a marbling factor, is directly associated with beef quality. To elucidate whether the properties of proteins involved in depot-specific adipose tissue are sex-dependent, we analyzed protein expression in intramuscular adipose tissue (IMAT) and omental adipose tissue (OMAT) from Hanwoo cows, steers, and bulls (Korean native beef cattle) by liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomic analysis, quantitative polymerase chain reaction (PCR), and western blot analysis. Two adipose depots (intramuscular and omental) were collected from cows (n = 7), steers (n = 7), and bulls (n = 7). LC-MS/MS revealed a total of 55 and 35 proteins in IMAT and OMAT, respectively. Of the 55 proteins identified, 44, 40, and 42 proteins were confirmed to be differentially expressed in the IMAT of cows, steers, and bulls, respectively. In the OMAT of cows, steers, and bulls, 33, 33, and 22 proteins, respectively, were confirmed to be differentially expressed. Tropomyosin (TPM) 1, TPM2, and TPM3, key factors closely associated with muscle development, were subjected to verification by quantitative PCR and western blot analysis in the IMAT and OMAT of Hanwoo cows, steers, and bulls. Both mRNA and protein levels of TPM1, TPM2, and TPM3 in IMAT were lower in bulls than in cows or steers, suggesting that they are positively correlated with marbling score and quality grade. Our results may aid in regulating marbling development and improving meat quality grades in beef cattle.

A Study on the Exploratory Spatial Data Analysis of the Distribution of Longevity Population and the Scale Effect of the Modifiable Areal Unit Problem(MAUP) (장수 인구의 분포 패턴에 관한 탐색적 공간 데이터 분석과 수정 가능한 공간단위 문제(MAUP)의 Scale Effect에 관한 연구)

  • Choi, Don-Jeong;Suh, Yong-Cheol
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.16 no.3
    • /
    • pp.40-53
    • /
    • 2013
  • Most existing domestic studies identifying the distribution of the longevity population and its influencing factors have taken a confirmatory approach. Furthermore, most studies on this topic have simply used their own definition of the spatial unit of analysis or employed arbitrary spatial units according to data availability. Such approaches cannot sufficiently reflect the spatial characteristics of the longevity phenomenon and are exposed to the Modifiable Areal Unit Problem (MAUP). This research performed Exploratory Spatial Data Analysis (ESDA) to identify spatial autocorrelation in the distribution of the longevity population and investigated whether the MAUP arises, in terms of the scale effect, using spatial population data in Korea. We used Si-Gun-Gu and Eup-Myeon-Dong as two different spatial units at which regional longevity indicators were measured, and applied the Getis-Ord Gi* statistic to detect spatial hot spots and cold spots. The results show statistically significant spatial autocorrelation as well as spatial hot spots and cold spots of regional longevity at both the Si-Gun-Gu and Eup-Myeon-Dong levels. This implies that the MAUP does exist in studies of the spatial distribution of the longevity population, and demand for longevity research will inevitably increase. In addition, there were apparent differences in the global spatial autocorrelation and local spatial clusters calculated at the two spatial units (Si-Gun-Gu versus Eup-Myeon-Dong), which can be seen as the scale effect of the MAUP. The findings show that any study on this topic can produce misleading results when the MAUP and spatial autocorrelation are not explicitly considered.
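
The Getis-Ord Gi* statistic used in this study can be written out directly. The sketch below is a plain-NumPy illustration with hypothetical regional values and a toy binary weights matrix; a real analysis would derive the weights from the actual Si-Gun-Gu or Eup-Myeon-Dong boundaries (for example with a spatial analysis library such as PySAL).

```python
# A minimal NumPy sketch of the Getis-Ord Gi* statistic for hot/cold spot
# detection. The longevity values and binary spatial weights are illustrative.
import numpy as np

def getis_ord_gi_star(x, w):
    """Return the Gi* z-score for every region.

    x : (n,) array of regional longevity indicators
    w : (n, n) spatial weights; row i includes region i itself (the * in Gi*).
    """
    n = len(x)
    x_bar = x.mean()
    s = np.sqrt((x ** 2).mean() - x_bar ** 2)
    wx = w @ x                      # weighted local sums
    w_sum = w.sum(axis=1)           # sum of weights per region
    w_sq = (w ** 2).sum(axis=1)
    num = wx - x_bar * w_sum
    den = s * np.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
    return num / den

# Five hypothetical regions on a line; each neighbors itself and its adjacents.
x = np.array([12.0, 14.0, 30.0, 29.0, 11.0])
w = np.array([[1, 1, 0, 0, 0],
              [1, 1, 1, 0, 0],
              [0, 1, 1, 1, 0],
              [0, 0, 1, 1, 1],
              [0, 0, 0, 1, 1]], dtype=float)

z = getis_ord_gi_star(x, w)
print(np.round(z, 2))   # large positive z -> hot spot, large negative -> cold spot
```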

A Case Study on the Community-based Elderly Care Services Provided by the Social Economy Network in Gwangjin-Gu, Seoul (사회적경제 조직의 지역사회 돌봄 네트워킹 가능성에 대한 비판적 고찰: 서울시 광진구 노인돌봄 클러스터 사례연구)

  • Kim, HyoungYong;Han, EunYoung
    • 한국노년학
    • /
    • v.38 no.4
    • /
    • pp.1057-1081
    • /
    • 2018
  • This study analyzed the case of the elderly care cluster in Gwangjin-gu to explore the potential of the social economy as a provider of community-based social services. 'Community-based' refers to an approach in which community organizations build a voluntary and collaborative network to enhance their collective problem-solving capacity. The social economy, which emphasizes people, labor, community, and democratic principles, is therefore well positioned to contribute to community-based social services. This study analyzed the social economy network using, as its analytical framework, four characteristics of the social economy suggested by the OECD community economy and employment program. The results are as follows. First, the social economy would find it difficult to supply community-based social services through network cooperation because of large variation in community identity, investment in new products, and labor protection. Second, community residents are not yet consumers of the social economy, and its products remain market products only for organizations within the social economy. Creating good services that meet residents' needs also requires community development approaches, which highlights the importance of community spaces where local residents and the social economy meet. Third, public support such as purchasing support has weakened the ecosystem of the social economy by blurring the distinction between the public economy and the social economy. On the other hand, public investment in community infrastructure indirectly helps the social economy communicate with residents and promotes sound supply and consumption. In the end, community-based social services need a platform where the social economy and residents meet, and this type of public investment can create the ecosystem of the social economy.

Deep Learning OCR based document processing platform and its application in financial domain (금융 특화 딥러닝 광학문자인식 기반 문서 처리 플랫폼 구축 및 금융권 내 활용)

  • Dongyoung Kim;Doohyung Kim;Myungsung Kwak;Hyunsoo Son;Dongwon Sohn;Mingi Lim;Yeji Shin;Hyeonjung Lee;Chandong Park;Mihyang Kim;Dongwon Choi
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.143-174
    • /
    • 2023
  • With the development of deep learning technologies, Artificial Intelligence powered Optical Character Recognition (AI-OCR) has evolved to accurately read multiple languages from various forms of images. For the financial industry, where a large number of diverse documents are processed manually, the potential of AI-OCR is great. In this study, we present the configuration and design of an AI-OCR solution for the financial industry and discuss the platform construction along with application cases. Since the use of financial domain data is prohibited under the Personal Information Protection Act, we developed a deep learning-based data generation approach and used it to train the AI-OCR models. The models are trained for image preprocessing, text recognition, and language processing, and are configured as a microservice-architected platform to process a broad variety of documents. We demonstrated the AI-OCR platform by applying it to the financial domain tasks of document sorting, document verification, and typing assistance. The demonstrations confirm increased work efficiency and convenience.
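
The three-stage pipeline named in the abstract (image preprocessing, text recognition, language post-processing) can be sketched generically. The open-source pytesseract and Pillow libraries below are stand-ins for the authors' proprietary AI-OCR models, and the input file name is hypothetical.

```python
# A generic sketch of an OCR pipeline with the three stages the abstract names.
# pytesseract/Pillow are open-source stand-ins, not the authors' AI-OCR models.
from PIL import Image, ImageOps
import pytesseract

def preprocess(path: str) -> Image.Image:
    """Grayscale, auto-contrast, and binarize the scanned document image."""
    img = Image.open(path).convert("L")
    img = ImageOps.autocontrast(img)
    return img.point(lambda p: 255 if p > 150 else 0)

def recognize(img: Image.Image) -> str:
    """Recognize Korean and English text (requires the 'kor' language pack)."""
    return pytesseract.image_to_string(img, lang="kor+eng")

def postprocess(text: str) -> str:
    """Minimal language post-processing: strip empty lines and extra spaces."""
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)

if __name__ == "__main__":
    document = preprocess("sample_invoice.png")   # hypothetical input file
    print(postprocess(recognize(document)))
```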

SKU recommender system for retail stores that carry identical brands using collaborative filtering and hybrid filtering (협업 필터링 및 하이브리드 필터링을 이용한 동종 브랜드 판매 매장간(間) 취급 SKU 추천 시스템)

  • Joe, Denis Yongmin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.77-110
    • /
    • 2017
  • Recently, consumption patterns have rapidly diversified and become individualized through the web and Internet-based mobile devices. As this happens, the efficient operation of offline stores, a traditional distribution channel, has become more important. To raise both the sales and profits of stores, stores need to supply and sell the most attractive products to consumers in a timely manner. However, there is a lack of research on which SKUs, out of many products, can increase sales probability and reduce inventory costs. In particular, when a company sells products through multiple stores across multiple locations, recommending SKUs that appeal to customers would help increase each store's sales and profitability. In this study, we propose applying recommender systems such as collaborative filtering and hybrid filtering, which have been used for personalized recommendation, to store-level SKU recommendation for a distribution company that sells the same brand through multiple stores across countries and regions. We calculated the similarity between stores using the purchase data for each store's handled items, applied collaborative filtering based on each store's SKU-level sales history, and finally recommended individual SKUs to each store. In addition, stores were classified into four clusters through Principal Component Analysis (PCA) and cluster analysis using store profile data. The recommendation system was then implemented as a hybrid filtering method that applies collaborative filtering within each cluster, and the performance of both methods was measured on actual sales data. Most existing recommendation systems have been studied in the context of recommending items such as movies and music to individual users, and industrial applications have also become popular. However, there has been little research on recommending SKUs to individual stores by applying these recommendation systems, which have mainly been used for personalization services, to the store units of distributors handling the same brand. Whereas existing recommendation methods operate in the individual domain, this study expands the scope beyond the individual domain to the store units of a distribution company selling the same brand SKUs through multiple stores across countries and regions, and proposes a corresponding recommendation method. Moreover, whereas existing recommendation systems have been limited to online settings, this study applies data mining techniques to develop an algorithm suited to offline stores rather than to individual online users. The significance of this study is that a personalization recommendation algorithm is applied to multiple sales outlets handling the same brand, a meaningful result is derived, and a concrete methodology that actual companies can build and operate as a system is proposed. It is also meaningful as the first attempt to expand the research area of recommendation systems, which has focused on the personalization domain, to the sales stores of a company handling the same brand.
Using the 2014 sales data (periods 05 to 03), the top 100 SKUs by store sales volume were narrowed to 52 SKUs recommended by collaborative filtering and by the hybrid filtering method, and the performance of the two recommendation methods was compared by aggregating the resulting sales. The two methods were compared because collaborative filtering applied offline is defined as the reference model, against which the hybrid filtering model, which reflects the characteristics of offline stores, is evaluated. The proposed method showed higher performance than the existing recommendation method and was validated using actual sales data from large Korean apparel companies. In this study, we propose a method to efficiently extend recommendation systems from the individual level to the group (store) level, which is of value both for the theoretical framework and in practice.
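
The store-to-store collaborative filtering step can be sketched as follows. The store names, SKU names, and sales figures are hypothetical; the paper additionally segments stores into four clusters via PCA and applies the same logic within each cluster (hybrid filtering), which this sketch omits.

```python
# A minimal sketch of store-to-store collaborative filtering over a
# store x SKU sales matrix. Names and quantities are hypothetical.
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Rows: stores, columns: SKUs, values: sales quantities.
sales = pd.DataFrame(
    [[10, 0, 3, 0],
     [8, 1, 0, 2],
     [0, 5, 4, 6]],
    index=["store_A", "store_B", "store_C"],
    columns=["sku_1", "sku_2", "sku_3", "sku_4"],
)

sim = pd.DataFrame(cosine_similarity(sales), index=sales.index, columns=sales.index)

def recommend_skus(store: str, top_n: int = 2) -> list[str]:
    """Score unhandled SKUs by similarity-weighted sales of the other stores."""
    weights = sim.loc[store].drop(store)
    scores = sales.drop(index=store).mul(weights, axis=0).sum()
    unhandled = sales.columns[sales.loc[store] == 0]
    return scores[unhandled].sort_values(ascending=False).head(top_n).index.tolist()

print(recommend_skus("store_A"))
```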

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.1-23
    • /
    • 2013
  • To discover significant social issues, such as unemployment, economic crisis, and social welfare, that urgently need to be solved in modern society, researchers have conventionally collected opinions from professional experts and scholars through online or offline surveys. However, such a method is not always effective. Because of the expense involved, a large number of survey replies are seldom gathered, and in some cases it is hard to find experts dealing with specific social issues, so the sample set is often small and possibly biased. Furthermore, for a given social issue, several experts may reach totally different conclusions because each expert has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which ones are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects social issue keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 to July 2012. Our proposed system consists of (1) collecting and extracting text from the news articles, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models, whose goal is to best match paragraphs to each topic. Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". In this example, it is non-trivial to understand what happened to the unemployment problem in our society; looking only at social keywords, we have no idea of the detailed events occurring in society. To tackle this, we developed a matching algorithm that computes the probability of a paragraph given a topic, relying on (i) the topic terms and (ii) their probability values. Given a set of text documents, we segment each document into paragraphs and, in parallel, extract a set of topics from the documents using LDA. Based on our matching process, each paragraph is assigned to the topic it best matches, so that each topic ends up with several best-matched paragraphs. For example, suppose there are a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., "Up to 300 workers lost their jobs at XXX company in Seoul"). In this case, we can grasp detailed information about the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time.
Therefore, through our matching process and keyword visualization, researchers will be able to detect social issues easily and quickly. Using this prototype system, we detected various social issues appearing in our society and demonstrated the effectiveness of the proposed methods in our experiments. Note that the proof-of-concept system is also available at http://dslab.snu.ac.kr/demo.html.
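
The topic extraction and paragraph matching described above can be sketched with scikit-learn's LDA implementation. The toy documents and parameters are placeholders, and the matching step here simply assigns each paragraph to its highest-probability topic, which approximates rather than reproduces the paper's generative matching algorithm.

```python
# A minimal sketch of LDA topic extraction plus paragraph-to-topic matching.
# Toy English texts and parameters are placeholders for the Korean news corpus.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "unemployment layoff jobs factory workers",
    "welfare pension elderly care support",
    "unemployment economy crisis jobs",
    "elderly welfare benefits pension",
]
paragraphs = [
    "Up to 300 workers lost their jobs at a factory.",
    "The pension reform will expand elderly welfare benefits.",
]

vec = CountVectorizer()
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Each topic is a distribution over terms; a human annotator would label these.
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-3:][::-1]]
    print(f"Topic {k}: {top}")

# Match each paragraph to its best topic.
P = lda.transform(vec.transform(paragraphs))
for text, dist in zip(paragraphs, P):
    print(f"{text!r} -> Topic {dist.argmax()}")
```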

An Expert System for the Estimation of the Growth Curve Parameters of New Markets (신규시장 성장모형의 모수 추정을 위한 전문가 시스템)

  • Lee, Dongwon;Jung, Yeojin;Jung, Jaekwon;Park, Dohyung
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.4
    • /
    • pp.17-35
    • /
    • 2015
  • Demand forecasting is the activity of estimating the quantity of a product or service that consumers will purchase over a certain period of time. Developing precise forecasting models is considered important, since corporations can make strategic decisions about new markets based on the future demand estimated by such models. Many studies have developed market growth curve models, such as the Bass, Logistic, and Gompertz models, which estimate future demand when a market is in its early stage. Among them, the Bass model, which explains demand in terms of two types of adopters, innovators and imitators, has been widely used in forecasting. Such models require sufficient demand observations to produce reliable results. In the beginning of a new market, however, observations are not sufficient for the models to precisely estimate the market's future demand. For this reason, demand inferred from that of the most adjacent markets is often used as a reference. Reference markets can be those whose products are developed with the same categorical technologies: a market's demand may be expected to follow a pattern similar to that of a reference market when the adoption pattern of a product is determined mainly by the technology related to the product. However, this process does not always ensure satisfactory results, because judging the similarity between markets depends on intuition and/or experience. There are two major drawbacks that human experts cannot effectively handle in this approach: the abundance of candidate reference markets to consider, and the difficulty of calculating the similarity between markets. First, there can be too many markets to consider when selecting reference markets. Mostly, markets in the same category of an industrial hierarchy can serve as reference markets because they are usually based on similar technologies; however, markets can be classified into different categories even if they are based on the same generic technologies, so markets in other categories also need to be considered as potential candidates. Second, even domain experts cannot consistently calculate the similarity between markets with their own qualitative standards. This inconsistency implies missing adjacent reference markets, which may lead to imprecise estimation of future demand. Even when no reference markets are missing, the new market's parameters can hardly be estimated from the reference markets without quantitative standards. For these reasons, this study proposes a case-based expert system that helps experts overcome these drawbacks in discovering reference markets. First, this study proposes the use of the Euclidean distance measure to calculate the similarity between markets. Based on their similarities, markets are grouped into clusters, and then missing markets with the characteristics of each cluster are searched for. Potential candidate reference markets are extracted and recommended to users. After iterating these steps, definite reference markets are determined according to the user's selection among the candidates, and finally the new market's parameters are estimated from the reference markets. Two techniques are used in this procedure: the clustering data mining technique and the content-based filtering of recommender systems. The proposed system implemented with these techniques can determine the most adjacent markets based on whether a user accepts the candidate markets.
Experiments involving five ICT experts were conducted to validate the usefulness of the system. In the experiments, the experts were given a list of 16 ICT markets whose parameters were to be estimated. For each market, the experts estimated its growth curve model parameters first by intuition and then with the system. The comparison of the experimental results shows that the estimated parameters are closer to the actual parameters when the experts use the system than when they estimate without it.
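
The two core computations named in the abstract, Euclidean similarity between markets and clustering to surface candidate reference markets, can be sketched as follows. The market names, feature vectors, and number of clusters are illustrative assumptions.

```python
# A minimal sketch of finding candidate reference markets by Euclidean
# distance within a market cluster. Features, names, and k are illustrative.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances

# Hypothetical market features (e.g., normalized price, adoption speed, ...).
markets = pd.DataFrame(
    [[0.2, 0.8, 0.5], [0.3, 0.7, 0.6], [0.9, 0.1, 0.2], [0.85, 0.2, 0.3]],
    index=["tablet", "smartphone", "smart_tv", "set_top_box"],
)

# Group markets into clusters, then rank candidates within the new
# market's cluster by Euclidean distance.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(markets)

def candidate_references(new_market: str, top_n: int = 2) -> list[str]:
    same_cluster = markets.index[labels == labels[markets.index.get_loc(new_market)]]
    dists = pairwise_distances(markets.loc[[new_market]], markets.loc[same_cluster])[0]
    order = np.argsort(dists)
    return [m for m in same_cluster[order] if m != new_market][:top_n]

print(candidate_references("tablet"))   # e.g., ['smartphone']
```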

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies, and there is a continuous demand in various fields for product-level market information. However, such information has generally been provided at the industry level or for broad categories based on classification standards, making it difficult to obtain specific and appropriate information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, data on product information are collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data are embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data of the extracted products are summed to estimate the market size of each product group. As experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training. We optimized the training parameters and then applied a vector dimension of 300 and a window size of 15 in further experiments. We employed the index words of the Korean Standard Industrial Classification (KSIC) as a product name dataset to cluster product groups more efficiently: product names similar to the KSIC index words were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application, since it can resolve unmet needs for detailed market size information in the public and private sectors.
Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reporting by private firms. The limitation of this study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by imposing a proper order on the preprocessed dataset or by combining another measure, such as Jaccard similarity, with Word2Vec. The product group clustering method could also be replaced with other types of unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect them to further improve the performance of the basic model conceptually proposed in this study.
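
The bottom-up estimation procedure can be sketched with gensim's Word2Vec. The product corpus and sales figures are toy data; only the vector dimension (300) and window size (15) follow the abstract, and the KSIC index word and similarity threshold here are hypothetical.

```python
# A minimal sketch of the bottom-up estimation: embed product-name tokens with
# Word2Vec, collect tokens similar to a KSIC index word above a cosine
# threshold, and sum the sales of products containing those tokens.
from gensim.models import Word2Vec

# Tokenized product names reported by individual companies, with their sales.
products = [
    (["electric", "kettle"], 120),
    (["stainless", "kettle"], 80),
    (["electric", "rice", "cooker"], 340),
    (["pressure", "rice", "cooker"], 210),
    (["bluetooth", "speaker"], 95),
]

# Note: with such a tiny corpus the learned similarities are not meaningful;
# this only demonstrates the mechanism.
model = Word2Vec([name for name, _ in products],
                 vector_size=300, window=15, min_count=1, epochs=200)

def market_size(index_word: str, threshold: float = 0.3) -> int:
    """Sum sales of products whose name tokens are similar to the KSIC index word."""
    similar = {index_word} | {w for w, s in model.wv.most_similar(index_word, topn=10)
                              if s >= threshold}
    return sum(sales for name, sales in products if similar & set(name))

print(market_size("kettle"))
```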