• Title/Summary/Keyword: CLuster Approach

Developing the Strategies of Redesigning the Role of Retail Stores Using Cluster Analysis: The Case of Mongolian Retail Company (클러스터링을 통한 유통매장의 역할 재설계 전략 수립: 몽골유통사를 대상으로)

  • Tsatsral Telmentugs;KwangSup Shin
    • The Journal of Bigdata / v.8 no.1 / pp.131-156 / 2023
  • The traditional retail industry has changed significantly over the past decade due to mobile and online technologies, and this change has been accompanied by a shift in consumers' purchasing patterns. Despite the rise of online shopping, there are still specific product categories, such as "Processed food" in Mongolia, for which traditional shopping remains the preferred purchase method. To prepare for the future of retail, firms need to closely analyze the performance of their offline stores and plan their next actions in a new multi-channel environment. Retailers must integrate diverse channels into their operations to stay relevant and adjust to the shifting market. In this research, we analyzed performance data of offline stores, such as sales, profit, and amount of sales, using a clustering approach. From the clustering, we derived several distinct insights by comparing the circumstances and performance of retail stores. For certain retail stores, we propose three different strategies based on store characteristics: a fulfillment hub store between online and offline channels, an experience store that extends customers' time on the premises, and a merger of two unrelated channels that could complement each other to increase traffic. The proposed strategies may enhance user experience and profit at the same time.
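
The abstract does not say which clustering algorithm was used, so the following is only a minimal sketch that assumes plain k-means on min-max scaled metrics. The store names, the figures, the reading of "amount of sales" as a sales count, and the choice of `k=2` are all invented for illustration.

```python
import random

# Hypothetical store records: (sales, profit, sales_count). All values are invented.
stores = {
    "S1": (120.0, 30.0, 900.0),
    "S2": (115.0, 28.0, 880.0),
    "S3": (125.0, 33.0, 950.0),
    "S4": (40.0, 5.0, 300.0),
    "S5": (45.0, 6.0, 320.0),
    "S6": (38.0, 4.0, 280.0),
}


def min_max_scale(rows):
    """Rescale every column to [0, 1] so no single metric dominates the distance."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


names = list(stores)
scaled = min_max_scale([stores[name] for name in names])
assignments, _ = kmeans(scaled, k=2)
labels = dict(zip(names, assignments))
print(labels)
```

On this toy data the high-performing and low-performing stores fall into separate clusters. Real store data would need a justified choice of `k` and of the features to include.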

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.93-107 / 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. 
Our approach for user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence, and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the results of issue clustering from SAS with those of network analysis. The experimental dataset came from a website ranking site and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles from 5,000 panelists over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results, obtained with the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data. 
To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
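
The network-merging phase can be pictured with a small sketch. The abstract does not state the exact merge operation, so this assumes the common choice of multiplying the user-article and article-issue weights and summing over shared articles. The users, articles, issues, and weights are hypothetical.

```python
from collections import defaultdict

# Hypothetical two-mode networks (all identifiers and weights are invented).
# user_article[user][article] = number of visits taken from the access logs.
user_article = {
    "u1": {"a1": 2, "a2": 1},
    "u2": {"a2": 3},
}
# article_issue[article][issue] = topic weight from the topic analysis step.
article_issue = {
    "a1": {"economy": 1.0},
    "a2": {"economy": 0.5, "sports": 0.5},
}


def merge_two_mode(user_article, article_issue):
    """Combine user-article and article-issue links into user-issue weights.

    Equivalent to multiplying the two incidence matrices: the weight of
    (user, issue) is the sum over articles of visits * topic weight.
    """
    user_issue = defaultdict(lambda: defaultdict(float))
    for user, articles in user_article.items():
        for article, visits in articles.items():
            for issue, weight in article_issue.get(article, {}).items():
                user_issue[user][issue] += visits * weight
    return {user: dict(issues) for user, issues in user_issue.items()}


user_issue = merge_two_mode(user_article, article_issue)
print(user_issue)
```

Each issue's column of user weights in the merged network can then serve as the profile that a clustering step such as PAM compares.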

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining studies have focused on applications of the second step. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve that quality by preserving the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be directly applied to a variety of operations and traditional analysis techniques, unstructured text must first go through a structuring task that transforms the original document into a form that the computer can understand. Mapping arbitrary objects into a space of a specific dimension while maintaining their algebraic properties, in order to structure text data, is called "embedding". Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into one vector, is the most widely used. However, the traditional document embedding method represented by doc2Vec generates a vector for each document using all the words included in the document. 
As a result, the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional document embedding schemes usually map each document into a single corresponding vector. Therefore, it is difficult to accurately represent a complex document with multiple subjects as a single vector using the traditional approach. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of the traditional document embedding methods. This study targets documents that explicitly separate body content and keywords. For a document without keywords, the method can be applied after extracting keywords through various analysis methods. However, since this is not the core subject of the proposed method, we describe the process of applying the proposed method to documents that predefine keywords in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. First, all text in a document is tokenized and each token is represented as an N-dimensional real-valued vector through word embedding. After that, to overcome the limitation that traditional document embedding is affected by miscellaneous words as well as core words, the vectors corresponding to the keywords of each document are extracted to form a set of keyword vectors for each document. Next, clustering is conducted on the set of keywords for each document to identify the multiple subjects included in the document. Finally, a multi-vector is generated from the vectors of the keywords constituting each cluster. Experiments on 3,147 academic papers revealed that the single vector-based traditional approach cannot properly map complex documents because of interference among subjects in each vector. 
With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.
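
Steps (3) to (5) of the method can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract names no clustering algorithm, so a greedy cosine-threshold grouping stands in for it, and the cluster mean stands in for the generated vector. The 2-dimensional keyword vectors and the `threshold` value are made up.

```python
import math

# Hypothetical keyword vectors for one document (a real pipeline would take
# these from a trained word embedding; the 2-D values here are invented).
keyword_vectors = {
    "clustering": [0.9, 0.1],
    "k-means": [0.8, 0.2],
    "retail": [0.1, 0.9],
    "store": [0.2, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def multi_vector(keyword_vectors, threshold=0.8):
    """Group keywords by similarity and emit one vector per group.

    Assumption: a keyword joins the first cluster whose centroid is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster is a list of (keyword, vector) pairs
    for keyword, vec in keyword_vectors.items():
        for cluster in clusters:
            centroid = mean_vector([v for _, v in cluster])
            if cosine(vec, centroid) >= threshold:
                cluster.append((keyword, vec))
                break
        else:
            clusters.append([(keyword, vec)])
    return [([k for k, _ in c], mean_vector([v for _, v in c])) for c in clusters]


document_vectors = multi_vector(keyword_vectors)
for keywords, vector in document_vectors:
    print(keywords, vector)
```

The toy document ends up with two vectors, one per subject, instead of a single averaged vector that would sit between the two subjects.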

Exploring Changes in Science PCK Characteristics through a Family Resemblance Approach (가족유사성 접근을 통한 과학 PCK 변화 탐색)

  • Kwak, Youngsun
    • Journal of the Korean Society of Earth Science Education / v.15 no.2 / pp.235-248 / 2022
  • With changes in the future educational environment, such as the rapid decline of the school-age population and the expansion of students' choice of curriculum, changes are also required in pedagogical content knowledge (PCK), the expertise of science teachers. In other words, the categories constituting the existing 'consensus-PCK' and the characteristics of 'science PCK' are not fixed, so more categories and characteristics can be added. The purpose of this study is to explore the potential areas of science PCK required to cope with changes in the future educational environment, in the form of 'Family Resemblance Science PCK' (Family Resemblance-PCK, hereafter), through Wittgenstein's family resemblance approach. For this purpose, in-depth interviews were conducted with three focus groups. In the focus group interviews, participants discussed how the science PCK required of science teachers in future schools in 2030-2045 will change due to changes in future society and the educational environment. Qualitative analysis was performed on the in-depth interviews, and semantic network analysis was performed on the interview text to analyze the characteristics of 'Family Resemblance-PCK' that differentiate it from the existing 'consensus-PCK'. In the results, the characteristics of Family Resemblance-PCK, which are newly required along with changes in the role expectations of science teachers, were examined by PCK area. The semantic network analysis found that Family Resemblance-PCK expands its boundaries from the existing consensus-PCK, which is the starting point, and that new PCK elements were added. 
Looking at the aspects of Family Resemblance-PCK, [AI-Convergence Knowledge-Contents-Digital], [Community-Network-Human Resources-Relationships], [Technology-Exploration-Virtual Reality-Research], [Self-Directed Learning-Collaboration-Community], etc., form distinct network clusters, and it is expected that future science teacher expertise will be formed and strengthened around these PCK areas. Based on the research results, changes in the professionalism of science teachers in future schools and corresponding measures are proposed as a conclusion.

Color-related Query Processing for Intelligent E-Commerce Search (지능형 검색엔진을 위한 색상 질의 처리 방안)

  • Hong, Jung A;Koo, Kyo Jung;Cha, Ji Won;Seo, Ah Jeong;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.109-125 / 2019
  • As interest in intelligent search engines increases, various studies have been conducted to intelligently extract and utilize product-related features. In particular, when users search for goods in e-commerce search engines, the 'color' of a product is an important feature that describes the product. Therefore, it is necessary to deal with the synonyms of color terms in order to produce accurate results for users' color-related queries. Previous studies have suggested a dictionary-based approach to process synonyms for color features. However, the dictionary-based approach has the limitation that it cannot handle unregistered color-related terms in user queries. To overcome this limitation of the conventional methods, this research proposes a model that extracts RGB values from an internet search engine in real time and outputs similar color names based on the designated color information. First, a color term dictionary for basic color search was constructed, containing color names and the R, G, and B values of each color from the Korean color standard digital palette program and the Wikipedia color list. The dictionary was made more robust by adding 138 color names, converted from English color names to foreign words in Korean, with their corresponding RGB values. As a result, the final color dictionary includes a total of 671 color names and corresponding RGB values. The proposed method starts with the specific color that a user searched for. Then, the presence of the searched color in the built-in color dictionary is checked. If the color exists in the dictionary, its RGB values in the dictionary are used as the reference values of the retrieved color. If the searched color does not exist in the dictionary, the top-5 Google image search results for the searched color are crawled and average RGB values are extracted from a certain middle area of each image. 
To extract the RGB values from images, a variety of approaches were attempted, since simply averaging the RGB values of the center area of the images has limits. As a result, clustering the RGB values in a certain area of the image and taking the average value of the cluster with the highest density as the reference values showed the best performance. Based on the reference RGB values of the searched color, the RGB values of all the colors in the previously constructed color dictionary are compared. Then a color list is created with colors within the range of ±50 for each of the R, G, and B values. Finally, using the Euclidean distance between the above results and the reference RGB values of the searched color, the color with the highest similarity from up to five colors becomes the final outcome. To evaluate the usefulness of the proposed method, we performed an experiment in which 300 color names and their corresponding RGB values were obtained through questionnaires. They were used to compare the RGB values obtained from four different methods, including the proposed method. The average Euclidean distance in CIE-Lab using our method was about 13.85, a relatively low distance compared to 3088 for the case using the synonym dictionary only and 30.38 for the case using the dictionary with the Korean synonym website WordNet. The variant of the proposed method without clustering showed an average Euclidean distance of 13.88, which implies that the DBSCAN clustering of the proposed method can reduce the Euclidean distance. This research suggests a new color synonym processing method based on RGB values that combines the dictionary method with real-time synonym processing for new color names. This method overcomes the limitation of the dictionary-based approach, which is the conventional synonym processing method. 
This research can contribute to improving the intelligence of e-commerce search systems, especially in the color search feature.
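
The final matching step (the ±50 per-channel filter followed by Euclidean ranking) is simple enough to sketch. The five-entry dictionary below is only an illustrative stand-in for the 671-entry dictionary described above, and the function name `similar_colors` and the reference value are hypothetical.

```python
import math

# Illustrative stand-in for the paper's color dictionary.
COLOR_DICT = {
    "red": (255, 0, 0),
    "crimson": (220, 20, 60),
    "tomato": (255, 99, 71),
    "navy": (0, 0, 128),
    "orange": (255, 165, 0),
}


def similar_colors(reference, color_dict, tolerance=50, limit=5):
    """Return up to `limit` color names, most similar first.

    A color is a candidate only if each of its R, G, and B values lies within
    +/- `tolerance` of the reference; candidates are then ranked by Euclidean
    distance in RGB space.
    """
    candidates = [
        (name, rgb)
        for name, rgb in color_dict.items()
        if all(abs(c - r) <= tolerance for c, r in zip(rgb, reference))
    ]
    candidates.sort(key=lambda item: math.dist(item[1], reference))
    return [name for name, _ in candidates[:limit]]


# Hypothetical reference RGB, e.g. obtained from image search for an unknown term.
reference_rgb = (240, 10, 30)
matches = similar_colors(reference_rgb, COLOR_DICT)
print(matches)
```

Here "tomato" is excluded by the channel filter because its G value differs by 89, even though it is a reddish color. This shows how the ±50 window narrows the candidate list before distances are compared.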

Derivation of Digital Music's Ranking Change Through Time Series Clustering (시계열 군집분석을 통한 디지털 음원의 순위 변화 패턴 분류)

  • Yoo, In-Jin;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.171-191 / 2020
  • This study focused on digital music, which is the most valuable cultural asset in modern society and occupies a particularly important position in the flow of the Korean Wave. Digital music was collected based on the "Gaon Chart," a well-established music chart in Korea. Through this, the changes in the ranking of the music that entered the chart over 73 weeks were collected. Afterwards, patterns with similar characteristics were derived through time series cluster analysis. Then, a descriptive analysis was performed on the notable features of each pattern. The research process suggested by this study is as follows. First, in the data collection process, time series data was collected to check the ranking change of digital music. Subsequently, in the data processing stage, the collected data was matched with the rankings over time, and the music title and artist name were processed. The analysis is then performed sequentially in two stages, consisting of exploratory analysis and explanatory analysis. The data collection period was limited to the period before 'the music bulk buying phenomenon', a reliability issue related to music ranking in Korea. Specifically, it spans 73 weeks, from the first week (December 31, 2017 to January 6, 2018) to the last week (May 19, 2019 to May 25, 2019). The analysis targets were limited to digital music released in Korea. Unlike the private music charts serviced in Korea, the Gaon Chart is approved by government agencies and has basic reliability. Therefore, it can be considered to have more public confidence than the ranking information provided by other services. The contents of the collected data are as follows. 
Data on the period and ranking, the name of the music, the name of the artist, the name of the album, the Gaon index, the production company, and the distribution company were collected for the music that entered the top 100 on the music chart within the collection period. Through data collection, 7,300 chart entries in the top 100 were identified over the total of 73 weeks. Because digital music frequently stays on the chart for more than two weeks, duplicates were removed in pre-processing: the number and location of duplicated music were checked through a duplicate check function, and the duplicates were then deleted to form the data for analysis. Through this, a list of 742 unique songs was secured from the 7,300 entries. A total of 16 patterns were then derived through time series cluster analysis of the ranking changes. Based on these patterns, two representative patterns were identified: 'Steady Seller' and 'One-Hit Wonder'. Furthermore, the two patterns were subdivided into five patterns in consideration of the survival period of the music and the music ranking. The important characteristics of each pattern are as follows. First, the artist's superstar effect and bandwagon effect were strong in the One-Hit Wonder pattern. Therefore, when consumers choose digital music, they are strongly influenced by the superstar effect and the bandwagon effect. Second, through the Steady Seller pattern, we identified music that has been chosen by consumers for a very long time. In addition, we examined the patterns of the music most selected by consumers. Contrary to popular belief, the 'Steady Seller: mid-term' pattern, not the One-Hit Wonder pattern, received the most choices from consumers. 
Particularly noteworthy is that the 'Climbing the Chart' phenomenon, which is contrary to the existing pattern, was confirmed through the Steady Seller pattern. This study focuses on the change in the ranking of music over time, a topic that has been relatively neglected in research on digital music. In addition, a new approach to music research was attempted by subdividing the pattern of ranking change rather than predicting the success and ranking of music.
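
The pre-processing described above collapses weekly chart rows (73 weeks × 100 ranks = 7,300 rows) into one rank trajectory per unique song (742 in the study). A minimal sketch of that collapse, using invented rows and a hypothetical `rank_trajectories` helper, might look like this. It covers only deduplication and trajectory building, not the time series clustering itself, whose algorithm the abstract does not specify.

```python
from collections import defaultdict

# Hypothetical weekly chart rows: (week, rank, title, artist). All values invented.
chart_rows = [
    (1, 1, "Song A", "Artist X"),
    (1, 2, "Song B", "Artist Y"),
    (2, 1, "Song B", "Artist Y"),
    (2, 2, "Song A", "Artist X"),
    (3, 1, "Song B", "Artist Y"),
    (3, 2, "Song C", "Artist Z"),
]


def rank_trajectories(rows):
    """Collapse repeated chart rows into one week -> rank series per song.

    A song is identified by (title, artist), so repeated appearances across
    weeks become points on a single trajectory instead of duplicate records.
    """
    trajectories = defaultdict(dict)
    for week, rank, title, artist in rows:
        trajectories[(title, artist)][week] = rank
    return {song: dict(sorted(weeks.items())) for song, weeks in trajectories.items()}


trajectories = rank_trajectories(chart_rows)
weeks_on_chart = {song: len(weeks) for song, weeks in trajectories.items()}

for song, series in trajectories.items():
    print(song, series, "weeks on chart:", weeks_on_chart[song])
```

The resulting series, together with a survival measure such as `weeks_on_chart`, are the kind of inputs a time series clustering step would compare.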

A Study on the Emotional Reaction to the Interior Design - Focusing on the Worship Space in the Church Buildings - (실내공간 구성요소에 의한 감성반응 연구 - 기독교 예배공간 강단부를 중심으로 -)

  • Lee, Hyun-Jeong;Lee, Gyoo-Baek
    • Archives of design research / v.18 no.4 s.62 / pp.257-266 / 2005
  • The purpose of this study is to investigate the psychological reaction to images of the worship space in church buildings, to quantify the contribution of the stimulation elements causing such reactions, and finally to suggest basic data for realizing emotional worship spaces in church architecture. For this, 143 Christians were surveyed to analyze the relationship between 23 emotional expressions extracted from the worship space and 32 images of the worship space. The combined data was plotted in two dimensions using quantification theory III. The analysis found that 'simplicity-complexity' of the image constituted the horizontal axis (the x-axis) and 'creativity' of the image the vertical axis (the y-axis). In addition, to quantitatively extract the causal relationship between the value of the emotional reaction and its stimulation elements, the authors identified four emotional word groups by cluster analysis based on similarity, namely simple and sublime for the x-axis and typical and creative for the y-axis. Quantification theory I was then applied, with the total value of equivalent emotional words as the criterion variable and the emotional stimulation elements of the worship space as the independent variables. Nine specific emotional stimulation elements were selected, including the colors and shapes of the wall and the ceiling, the shapes and finish of the floor materials, window shapes, and the use of symbolic elements. Furthermore, 31 subcategories were chosen to analyze their contribution to the emotional reaction. As a result, the color and finish of the wall were found to be the most effective elements in the subjects' emotional reaction, while the symbolic elements and the color of the wall were found to be the least effective. The present study is expected to help increase the emotional satisfaction of users and to support spatial design that satisfies the types and purposes of the space.

Regional Differences of Proteins Expressing in Adipose Depots Isolated from Cows, Steers and Bulls as Identified by a Proteomic Approach

  • Cho, Jin Hyoung;Jeong, Jin Young;Lee, Ra Ham;Park, Mi Na;Kim, Seok-Ho;Park, Seon-Min;Shin, Jae-Cheon;Jeon, Young-Joo;Shim, Jung-Hyun;Choi, Nag-Jin;Seo, Kang Seok;Cho, Young Sik;Kim, MinSeok S.;Ko, Sungho;Seo, Jae-Min;Lee, Seung-Youp;Chae, Jung-Il;Lee, Hyun-Jeong
    • Asian-Australasian Journal of Animal Sciences / v.29 no.8 / pp.1197-1206 / 2016
  • Adipose tissue in the loin muscle area of beef cattle, as a marbling factor, is directly associated with beef quality. To elucidate whether the properties of proteins involved in depot-specific adipose tissue are sex-dependent, we analyzed protein expression in intramuscular adipose tissue (IMAT) and omental adipose tissue (OMAT) from cows, steers, and bulls of Hanwoo (Korean native beef cattle) by liquid chromatography-tandem mass spectrometry (LC-MS/MS)-based proteomic analysis, quantitative polymerase chain reaction (PCR), and western blot analysis. Two different adipose depots (i.e., intramuscular and omental) were collected from cows (n = 7), steers (n = 7), and bulls (n = 7). LC-MS/MS revealed a total of 55 and 35 proteins in IMAT and OMAT, respectively. Of the 55 proteins identified, 44, 40, and 42 proteins were confirmed to be differentially expressed in IMAT of cows, steers, and bulls, respectively. In OMAT of cows, steers, and bulls, 33, 33, and 22 were confirmed to be differentially expressed, respectively. Tropomyosin (TPM) 1, TPM2, and TPM3 were subjected to verification by quantitative PCR and western blot analysis in IMAT and OMAT of Hanwoo cows, steers, and bulls as key factors closely associated with muscle development. Both mRNA and protein levels of TPM1, TPM2, and TPM3 in IMAT were lower in bulls than in cows or steers, suggesting that they are positively correlated with marbling score and quality grade. Our results may aid the regulation of marbling development and the improvement of meat quality grades in beef cattle.

A Study on the Exploratory Spatial Data Analysis of the Distribution of Longevity Population and the Scale Effect of the Modifiable Areal Unit Problem(MAUP) (장수 인구의 분포 패턴에 관한 탐색적 공간 데이터 분석과 수정 가능한 공간단위 문제(MAUP)의 Scale Effect에 관한 연구)

  • Choi, Don-Jeong;Suh, Yong-Cheol
    • Journal of the Korean Association of Geographic Information Studies / v.16 no.3 / pp.40-53 / 2013
  • Most existing domestic studies that identify the distribution of the longevity population and its influencing factors have taken a confirmatory approach. Furthermore, most studies on this topic have simply used their own definition of the spatial unit of analysis or employed arbitrary spatial units according to data availability. These research approaches cannot sufficiently reflect the spatial characteristics of the longevity phenomenon and are exposed to the Modifiable Areal Unit Problem (MAUP). This research performed Exploratory Spatial Data Analysis (ESDA) to identify the spatial autocorrelation of the distribution of the longevity population and investigated whether the MAUP arises, in terms of the scale effect, using spatial population data in Korea. We measured regional longevity indicators at two different spatial units, Si_Gun_Gu and Eup_Myeon_Dong. Then, we applied Getis-Ord Gi* to investigate the existence of spatial hot spots and cold spots. The results of our analysis show that there exist statistically significant spatial autocorrelation and spatial hot spots and cold spots of regional longevity at both the Si_Gun_Gu and Eup_Myeon_Dong levels. This result implies that the modifiable areal unit problem does exist in studies of the spatial patterns of longevity population distribution. The demand for longevity research will inevitably increase. In addition, there were apparent differences in the global spatial autocorrelation and local spatial clusters calculated for the different spatial units, Si_Gun_Gu and Eup_Myeon_Dong, and this can be seen as the scale effect of the MAUP. The findings of our analysis show that any study on this topic can produce misleading results when the modifiable areal unit problem and spatial autocorrelation are not explicitly considered.
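
For readers unfamiliar with the hot spot statistic, the sketch below computes the standard Getis-Ord Gi* z-score on a toy example. The seven regions arranged in a line, their indicator values, and the binary contiguity weights (each region neighbors itself and its adjacent regions) are assumptions made for illustration; the abstract does not describe the weights used for the Si_Gun_Gu or Eup_Myeon_Dong units.

```python
import math


def getis_ord_gi_star(values, weights):
    """Return the Gi* z-score of every region.

    values  : list of n observations (e.g. a regional longevity indicator)
    weights : n x n spatial weights where weights[i][i] is nonzero, which is
              what distinguishes Gi* from Gi
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        w = weights[i]
        sum_w = sum(w)
        sum_w2 = sum(x * x for x in w)
        numerator = sum(wij * xj for wij, xj in zip(w, values)) - mean * sum_w
        denominator = s * math.sqrt((n * sum_w2 - sum_w ** 2) / (n - 1))
        # The denominator is zero if all values are equal or a region
        # neighbors every region; report 0.0 in that degenerate case.
        scores.append(numerator / denominator if denominator else 0.0)
    return scores


# Hypothetical indicator values for seven regions arranged in a line.
values = [10, 9, 8, 5, 2, 1, 1]
n = len(values)
# Binary contiguity: a region is linked to itself and its immediate neighbors.
weights = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(n)] for i in range(n)]

scores = getis_ord_gi_star(values, weights)
for region, z in enumerate(scores):
    print(region, round(z, 2))
```

Large positive scores mark hot spots and large negative scores mark cold spots. Recomputing the statistic after aggregating the same values into coarser units is one way to observe the scale effect discussed above.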

A Case Study on the Community-based Elderly Care Services Provided by the Social Economy Network in Gwangjin-Gu, Seoul (사회적경제 조직의 지역사회 돌봄 네트워킹 가능성에 대한 비판적 고찰: 서울시 광진구 노인돌봄 클러스터 사례연구)

  • Kim, HyoungYong;Han, EunYoung
    • 한국노년학 / v.38 no.4 / pp.1057-1081 / 2018
  • This study analyzed the case of the elderly care cluster in Gwangjin-gu to explore the possibilities of the social economy as a provider of community-based social services. 'Community-based' refers to an approach in which community organizations build a voluntary and collaborative network to enhance collective problem-solving abilities. Therefore, it is very likely that the social economy, which emphasizes people, labor, community, and democratic principles, can contribute to community-based social services. This study analyzed the social economy network using, as an analysis framework, four characteristics of the social economy suggested by the OECD community economy and employment program. The results of this study are as follows. First, the social economy would hardly supply community-based social services through network cooperation because of large variation in community identity, investment in new products, and labor protection. Second, community users are not the consumers of the social economy, and the products of the social economy remain market products only for the organizations within the social economy. In order to create good services that meet the needs of residents, community development approaches are required at the same time. This points to the importance of community space where local residents and the social economy meet. Third, public support such as purchasing support has weakened the ecosystem of the social economy by blurring the distinction between the public economy and the social economy. On the other hand, public investment in community infrastructure is an indirect aid that helps the social economy communicate with residents and promotes good supply and consumption. In the end, community-based social services need a platform where the social economy and the people meet. This type of public investment can create the ecosystem of the social economy.