• Title/Summary/Keyword: clustering patterns

The Efficient Feature Extraction of Handwritten Numerals in GLVQ Clustering Network (GLVQ클러스터링을 위한 필기체 숫자의 효율적인 특징 추출 방법)

  • Jeon, Jong-Won;Min, Jun-Yeong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.2 no.6
    • /
    • pp.995-1001
    • /
    • 1995
  • A typical pattern recognition system consists of pre-processing, feature extraction, and classification or recognition. In classification, when widely varying patterns exist in the same category, clustering is needed to group similar patterns. There are two approaches to clustering: statistical approaches, such as the k-means and ISODATA algorithms, and neural network approaches, such as T. Kohonen's LVQ (Learning Vector Quantization). Nikhil R. Pal et al. proposed GLVQ (Generalized LVQ) in 1993. This paper suggests efficient feature extraction methods for handwritten numerals in a GLVQ clustering network. We use handwritten numeral data from 21 writers (i.e., 200 patterns) and compare the proportion of misclassified patterns for each feature extraction method. As a result, when the projection combination method is used, the classification rate is 98.5%.

Classification of Seoul Metro Stations Based on Boarding/ Alighting Patterns Using Machine Learning Clustering (기계학습 클러스터링을 이용한 승하차 패턴에 따른 서울시 지하철역 분류)

  • Min, Meekyung
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.18 no.4
    • /
    • pp.13-18
    • /
    • 2018
  • In this study, we classify Seoul metro stations according to boarding and alighting patterns using machine learning techniques. The target data are the hourly numbers of boarding and alighting passengers for every day from 2008 to 2017 at 233 subway stations, provided by the public data portal. A Gaussian mixture model (GMM) and K-means clustering are used as the machine learning techniques to classify the subway stations. The distributions of passengers' boarding and alighting times can be modeled by the Gaussian mixture model. The K-means clustering algorithm is then used for unsupervised learning on the data obtained by GMM modeling. As a result, Seoul metro stations are classified into four groups according to their boarding and alighting patterns. The results of this study can serve as basic knowledge for analyzing the characteristics of Seoul subway stations from economic, social, and cultural perspectives. The method can also be applied to public data and big data in other areas that require clustering.

Curriculum Mining Analysis Using Clustering-Based Process Mining (군집화 기반 프로세스 마이닝을 이용한 커리큘럼 마이닝 분석)

  • Joo, Woo-Min;Choi, Jin Young
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.38 no.4
    • /
    • pp.45-55
    • /
    • 2015
  • In this paper, we consider curriculum mining as an application of process mining in the domain of education. The basic objective of curriculum mining is to construct a registration pattern model from logs of registration data. However, students' subject registration patterns are highly unstructured and complicated, forming a so-called spaghetti model, because they contain many different cases and a high diversity of behaviors. It is therefore typically difficult to develop and analyze registration patterns. In the literature, there has been an effort to handle this issue by clustering based on the features and behaviors of students. However, such data are not easy to obtain in general, since they are private and qualitative. Therefore, in this paper, we propose a new framework for curriculum mining that applies K-means clustering based on subject attributes to solve the problems caused by the unstructured process model. Specifically, we divide the subject attribute data into two parts: categorical and numerical. The categorical attributes are subject name, class classification, and research field, while the numerical attributes are ABEEK goal and semester information. For the categorical attributes, we suggest a method to quantify them using binarization. To determine the number of clusters for K-means clustering, we applied the elbow method using the R-squared value, which represents the ratio of variance explained by a given number of clusters. The performance of the suggested method was verified using a log of student registration data from 'A university' in terms of simplicity and fitness, which are the typical performance measures for a process model obtained in process mining.
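
The binarization and R-squared steps named in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that binarization means one-hot encoding and that R-squared is computed as 1 - WSS/TSS (within-cluster over total sum of squares). The function names, the toy subjects, and the hand-written cluster labels are all hypothetical; in practice the labels would come from K-means runs for each candidate number of clusters.

```python
def binarize(values):
    """One-hot encode a categorical column (assumed reading of 'binarization')."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def r_squared(points, labels):
    """Share of total variance explained by a clustering: 1 - WSS / TSS."""
    dim = len(points[0])

    def centroid(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(dim)]

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    grand = centroid(points)
    tss = sum(sq_dist(p, grand) for p in points)

    # Group points by cluster label, then sum squared distances to each centroid.
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    wss = 0.0
    for rows in clusters.values():
        c = centroid(rows)
        wss += sum(sq_dist(p, c) for p in rows)
    return 1.0 - wss / tss


# Hypothetical toy subjects: one categorical and one numerical attribute.
research_field = ["AI", "DB", "AI", "OR"]
semester = [1, 2, 1, 4]
points = [onehot + [s] for onehot, s in zip(binarize(research_field), semester)]

# Hand-written labelings standing in for K-means output at k = 1, 2, 3.
candidates = {1: [0, 0, 0, 0], 2: [0, 0, 0, 1], 3: [0, 1, 0, 2]}
scores = {k: r_squared(points, labels) for k, labels in candidates.items()}
```

Under the elbow method, one would pick the number of clusters after which the gain in `scores[k]` levels off.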

Ozone Pollution Patterns and the Relation to Meteorological Conditions in the Greater Seoul Area (수도권지역 오존오염 패턴과 기상학적 특성)

  • Oh In-Bo;Kim Yoo-Keun;Hwang Mi-Kyoung
    • Journal of Korean Society for Atmospheric Environment
    • /
    • v.21 no.3
    • /
    • pp.357-365
    • /
    • 2005
  • The typical patterns of surface $O_3$ pollution and their dependence on meteorology were studied in the Greater Seoul Area (GSA) during the warm season (April-September) from 1998 to 2002. To classify the $O_3$ pollution patterns, a two-stage clustering technique (average linkage followed by k-means) was applied to daily maximum $O_3$ concentrations obtained from 53 monitoring sites during high $O_3$ events (118 days). The clustering technique identified four statistically distinct $O_3$ pollution patterns representing different horizontal distributions and levels of $O_3$ in the GSA. The prevailing pattern (93 days, $49.5\%$) distinctly showed a gradient of $O_3$ concentrations going from west to east in the GSA. Very high $O_3$ concentrations throughout the GSA (24 days, $12.8\%$) were also found as a significant pattern of severe $O_3$ pollution. To understand the characteristics of the $O_3$ pollution patterns, the relationship between the patterns and meteorological conditions was analyzed using both synoptic charts and surface/upper air data. Each pattern was closely associated with surface winds interacting with the synoptic background flow, which allowed $O_3$ and its precursors to be transported and accumulated. In particular, the timing and inland penetration of the sea breeze were found to play a very important role in determining $O_3$ distributions.

Movie Recommendation Using Co-Clustering by Infinite Relational Models (Infinite Relational Model 기반 Co-Clustering을 이용한 영화 추천)

  • Kim, Byoung-Hee;Zhang, Byoung-Tak
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.24 no.4
    • /
    • pp.443-449
    • /
    • 2014
  • Users' preferences for movies are observable outcomes of various factors related to user attributes and movie features. For movie recommendation, methods for analyzing the relations among users, movies, and preference patterns are essential. As a relational analysis tool, we focus on the Infinite Relational Model (IRM), which was introduced as a tool for multiple concept search. We show that IRM-based co-clustering on preference patterns and movie descriptors can be used as a first tool for movie recommender methods, especially content-based filtering approaches. By introducing well-defined tag sets for movies and performing three-way co-clustering on a movie-rating matrix and a movie-tag matrix, we discovered various explainable relations among users and movies. We suggest various uses of IRM-based co-clustering, especially for incremental and dynamic recommender systems.

Shoppers' Shopping Path Pattern Analysis using RFID Data (RFID 데이터를 이용한 고객 쇼핑 동선 패턴 분석)

  • Yang, Seungjoon;Jung, In-Chul;Kwon, Young S.
    • Journal of Information Technology Services
    • /
    • v.11 no.sup
    • /
    • pp.61-74
    • /
    • 2012
  • As the retail industry has been challenged by stiff competition, retailers have become more interested in better understanding consumers' in-store behavior to gain and sustain competitive advantage. Consumers' shopping paths provide valuable clues to understanding their in-store behavior, which has been a long-standing research issue in business. This study explores shopping path patterns in a grocery store using RFID technology and a clustering method. To this end, we designed an RFID system, affixing active RFID tags to the bottom of grocery carts. The tags emit signals that are received by receptors installed at various locations throughout the store. The RFID system provides the time and location of each cart while consumers shop around the store. The point-of-sale data are matched with the cart movement records to provide a complete picture of each shopping path. To find distinctive patterns in consumers' shopping paths, we propose a distance-index matrix built using Dijkstra's method and a normalization method for clustering, in order to handle the problem of measuring similarity among shopping paths that arises from the spatial nature of consumer movement in a grocery store. After analyzing the RFID data obtained in one of the grocery stores of a major Korean retailer, we successfully identified several distinctive shopping path patterns, which provide valuable implications for store management.
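
The abstract above does not spell out how the distance-index matrix is built, so the following is only one plausible reading: compute shortest walking distances between store zones with Dijkstra's algorithm, then scale them to [0, 1]. The store layout, the zone names, the edge weights, and the choice of dividing by the largest distance are all hypothetical assumptions for illustration.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distance from source to every node of a weighted graph."""
    dist = {node: float("inf") for node in graph}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry; a shorter path was already found
        for nbr, weight in graph[node].items():
            nd = d + weight
            if nd < dist[nbr]:
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return dist


# Hypothetical store layout: zones as nodes, aisle lengths (metres) as weights.
store = {
    "entrance": {"produce": 4.0, "bakery": 6.0},
    "produce": {"entrance": 4.0, "dairy": 5.0},
    "bakery": {"entrance": 6.0, "dairy": 2.0},
    "dairy": {"produce": 5.0, "bakery": 2.0, "checkout": 3.0},
    "checkout": {"dairy": 3.0},
}

zones = sorted(store)
# All-pairs shortest distances, one Dijkstra run per zone.
matrix = {a: dijkstra(store, a) for a in zones}

# Assumed normalization: divide by the largest pairwise distance.
longest = max(matrix[a][b] for a in zones for b in zones)
normalized = {a: {b: matrix[a][b] / longest for b in zones} for a in zones}
```

Using path distance through the aisles instead of straight-line distance reflects that carts cannot cut through shelving, which is the spatial constraint the abstract alludes to.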

Sales Pattern and Related Product Attributes of T-shirts (티셔츠 상품의 판매패턴과 연관된 상품속성)

  • Chae, Jin Mie;Kim, Eun Hie
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.44 no.6
    • /
    • pp.1053-1069
    • /
    • 2020
  • This study examined the relationship between sales patterns and product attributes in order to propose a sales forecasting approach for fashion products. We analyzed sales data for 537 T-shirt SKUs of a domestic sports brand using the SAS program. The sales patterns of fashion products fluctuated and were influenced by exogenous factors; therefore, we removed the influence of the exogenous factors that regression analysis identified, namely price discounts and holiday effects. In addition, it was difficult to predict sales from the sales patterns of the same product, since fashion products are released as new products every year. Therefore, a forecasting model was proposed that uses the sales patterns of related product attributes, with the attributes treated as descriptive variables. We classified sales patterns using K-means clustering in order to explain the relationship between sales patterns and product attributes, and created a decision tree classifier using attributes as input and sales patterns as output. As a result, the sales patterns of T-shirts were clustered into six types characterized by the shape of their peak and slope. In the proposed sales pattern prediction model, each sales pattern was also associated with a combination of product attributes and their values.

Genre Pattern based User Clustering for Performance Improvement of Collaborative Filtering System (협업적 여과 시스템의 성능 향상을 위한 장르 패턴 기반 사용자 클러스터링)

  • Choi, Ja-Hyun;Ha, In-Ay;Hong, Myung-Duk;Jo, Geun-Sik
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.11
    • /
    • pp.17-24
    • /
    • 2011
  • A collaborative filtering system builds clusters of users and then recommends preferred items to a user based on the clustering results. However, building user clusters is time consuming, and rebuilding them after users evaluate films and give feedback is not simple. In this paper, genre patterns are used in a movie recommendation system in order to simplify user clustering and reduce the time needed to rebuild it. A frequent pattern network is used to extract users' preferred genre patterns, and user clusters are built from the extracted patterns. Collaborative filtering is then applied to the neighboring users in the resulting clusters to recommend movies to the user. When user feedback is received, traditional collaborative filtering has to search all neighboring users again and rebuild the clustering. With frequent pattern networks, by contrast, collaborative filtering is applied through user clustering based on genre patterns, so the search time needed when rebuilding the user clustering can be reduced. With the proposed genre-pattern-based user clustering, the time spent re-establishing user clusters after feedback is reduced, while recommendation performance remains similar to that of traditional collaborative filtering systems.

A Study on Performance Evaluation of Clustering Algorithms using Neural and Statistical Method (클러스터링 성능평가: 신경망 및 통계적 방법)

  • 윤석환;신용백
    • Journal of the Korean Professional Engineers Association
    • /
    • v.29 no.2
    • /
    • pp.71-79
    • /
    • 1996
  • This paper evaluates the clustering performance of a neural network method and a statistical method. The algorithms used in this paper are GLVQ (Generalized Learning Vector Quantization) as the neural method and the k-means algorithm as the statistical clustering method. To compare the two methods, we calculate Rand's c statistic. As a result, the mean c value obtained with GLVQ is higher than that obtained with the k-means algorithm, while the standard deviation of the c value is lower. The experimental data sets were Fisher's IRIS data and patterns extracted from handwritten numerals.
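
For readers unfamiliar with the comparison measure, the sketch below computes a pair-agreement score, assuming that "Rand's c statistic" refers to Rand's index: the fraction of point pairs that two partitions treat the same way (both together or both apart). The function name and the toy labels are hypothetical and are not data from the paper.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two partitions agree."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # A pair agrees when both partitions join it or both split it.
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical toy labels: a reference partition and a clustering result.
truth = [0, 0, 0, 1, 1, 1]
found = [0, 0, 1, 1, 2, 2]
score = rand_index(truth, found)
```

The score depends only on which points are grouped together, so relabeling clusters (for example swapping 0 and 1) leaves it unchanged; a value of 1 means the two partitions are identical.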

Customer Behavior Pattern Discovery by Adaptive Clustering Based on Swarm Intelligence

  • Dai, Weihui
    • Journal of Information Technology Applications and Management
    • /
    • v.17 no.1
    • /
    • pp.127-139
    • /
    • 2010
  • Customer behavior pattern discovery is the foundation for conducting customer-oriented services and service management. However, the composition, needs, interests, and experience of customers may change continuously, which makes it difficult to refine a stable description of their consistent behavior patterns. This paper presents a new method for discovering behavior patterns from a changing collection of customers. It was originally inspired by the swarm intelligence of ant colonies. Through adaptive clustering, typical behavior patterns that reflect the characteristics of related customer clusters can be extracted dynamically and adaptively.
