• Title/Summary/Keyword: Hybrid Clustering

Scalable Collaborative Filtering Technique based on Adaptive Clustering (적응형 군집화 기반 확장 용이한 협업 필터링 기법)

  • Lee, O-Joun;Hong, Min-Sung;Lee, Won-Jin;Lee, Jae-Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.2
    • /
    • pp.73-92
    • /
    • 2014
  • An Adaptive Clustering-based Collaborative Filtering Technique is proposed to solve the fundamental problems of collaborative filtering: the cold-start, scalability, and data sparsity problems. Previous collaborative filtering techniques make recommendations based on the predicted preference of a user for a particular item, using a similar-item subset and a similar-user subset built from users' preferences for items. For this reason, if the density of the user preference matrix is low, the reliability of the recommendation system decreases rapidly, and creating the similar-item and similar-user subsets becomes more difficult. In addition, as the scale of the service increases, the time needed to create these subsets increases geometrically, and the response time of the recommendation system increases with it. To solve these problems, this paper suggests a collaborative filtering technique that actively adapts conditions to the model and adopts the concepts of a context-based filtering technique. This technique consists of four major methodologies. First, items and users are clustered according to their feature vectors, and an inter-cluster preference between each item cluster and user cluster is then assumed. With this method, the run-time for creating a similar-item or similar-user subset can be reduced, the reliability of the recommendation system can be made higher than when only user preference information is used to create these subsets, and the cold-start problem can be partially solved. Second, recommendations are made using the previously composed item and user clusters and the inter-cluster preference between each item cluster and user cluster. 
In this phase, a list of items is made for a user by examining the item clusters in order of the magnitude of the inter-cluster preference of the user cluster to which the user belongs, and selecting and ranking the items according to the predicted or recorded user preference information. With this method, the model-creation phase bears the highest load of the recommendation system, which minimizes the load at run-time. Therefore, the scalability problem can be addressed, and a large-scale recommendation system can be served with highly reliable collaborative filtering. Third, the missing user preference information is predicted using the item and user clusters. Using this method, the problem caused by the low density of the user preference matrix can be mitigated. Existing studies used either an item-based prediction or a user-based prediction. In this paper, Hao Ji's idea, which uses both an item-based prediction and a user-based prediction, was improved. The reliability of the recommendation service can be improved by combining the predictive values of both techniques according to the condition of the recommendation model. By predicting the user preference based on the item or user clusters, the time required to predict the user preference can be reduced, and missing user preferences can be predicted at run-time. Fourth, the item and user feature vectors are made to learn from subsequent user feedback. This phase applies normalized user feedback to the item and user feature vectors. This method can mitigate the problems caused by adopting the concepts of context-based filtering, such as item and user feature vectors based on the user profile and item properties. These problems arise from the limitations of quantifying the qualitative features of items and users. 
Therefore, the elements of the user and item feature vectors are made to match one to one, and when user feedback on a particular item is obtained, it is applied to each feature vector using the opposite one. The method was verified by comparing its performance with existing hybrid filtering techniques. Two measures were used for verification: MAE (Mean Absolute Error) and response time. The MAE results confirmed that this technique improves the reliability of the recommendation system. The response-time results showed that this technique is suitable for a large-scale recommendation system. This paper suggested an Adaptive Clustering-based Collaborative Filtering Technique with high reliability and low time complexity, but it had some limitations. The technique focused on reducing the time complexity; hence, an improvement in reliability was not expected. Future work will improve this technique using rule-based filtering.
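
The first and third steps of the abstract above (inter-cluster preference, then filling in missing preferences from the clusters) and the MAE measure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cluster assignments are assumed to be given, the inter-cluster preference is assumed to be a plain average of known ratings, and all names (`inter_cluster_preference`, `predict`, the `default` fallback of 3.0) are hypothetical.

```python
from collections import defaultdict


def inter_cluster_preference(ratings, user_cluster, item_cluster):
    """Average the known ratings for every (user cluster, item cluster) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for (user, item), rating in ratings.items():
        key = (user_cluster[user], item_cluster[item])
        sums[key] += rating
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predict(user, item, ratings, user_cluster, item_cluster, prefs, default=3.0):
    """Return the recorded rating if present, else the cluster-level average.

    `default` is an assumed fallback for cluster pairs with no ratings at all.
    """
    if (user, item) in ratings:
        return ratings[(user, item)]
    return prefs.get((user_cluster[user], item_cluster[item]), default)


def mean_absolute_error(predicted, actual):
    """MAE: mean of the absolute differences between predictions and truth."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Toy data (invented for illustration only).
ratings = {("u1", "i1"): 5.0, ("u2", "i1"): 4.0, ("u1", "i2"): 1.0, ("u3", "i3"): 2.0}
user_cluster = {"u1": 0, "u2": 0, "u3": 1}
item_cluster = {"i1": 0, "i2": 1, "i3": 1}

prefs = inter_cluster_preference(ratings, user_cluster, item_cluster)
missing = predict("u2", "i2", ratings, user_cluster, item_cluster, prefs)
error = mean_absolute_error([4.5, 1.0], [4.0, 2.0])
```

Because the cluster-level averages are computed once when the model is built, a run-time prediction reduces to a dictionary lookup, which is the load-shifting idea the abstract describes.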

Integrating Color, Texture and Edge Features for Content-Based Image Retrieval (내용기반 이미지 검색을 위한 색상, 텍스쳐, 에지 기능의 통합)

  • Ma Ming;Park Dong-Won
    • Science of Emotion and Sensibility
    • /
    • v.7 no.4
    • /
    • pp.57-65
    • /
    • 2004
  • In this paper, we present a hybrid approach which incorporates color, texture and shape in content-based image retrieval. Colors in each image are clustered into a small number of representative colors. The feature descriptor consists of the representative colors and their percentages in the image. A similarity measure similar to the cumulative color histogram distance measure is defined for this descriptor. The co-occurrence matrix, a statistical method, is used for texture analysis. An optimal set of five statistical functions is extracted from the co-occurrence matrix of each image, in order to render the feature vector for each image maximally informative. The edge information captured within edge histograms is extracted after a pre-processing phase that performs color transformation, quantization, and filtering. The features thus extracted were stored within feature vectors and later compared with an intersection-based method. The content-based retrieval system is shown to be effective in terms of retrieval and scalability through experimental results and precision-recall analysis.
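
The abstract does not give the formula of its intersection-based comparison. As a hedged illustration, the following Python sketch uses the common normalized histogram intersection (sum of bin-wise minima), which is an assumption here; the function names and the sample histograms are hypothetical.

```python
def normalize(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    return [value / total for value in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of bin-wise minima of the normalized histograms."""
    return sum(min(a, b) for a, b in zip(normalize(h1), normalize(h2)))


# Toy 4-bin histograms (invented for illustration only).
query = [2, 2, 0, 0]
candidate = [1, 1, 1, 1]
similarity = histogram_intersection(query, candidate)
```

Identical histograms score 1, histograms with no overlapping bins score 0, so candidates can be ranked by this value for each feature (color, texture, edge) before the scores are combined.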

Analysis of Energy Consumption and Processing Delay of Wireless Sensor Networks according to the Characteristic of Applications (응용프로그램의 특성에 따른 무선센서 네트워크의 에너지 소모와 처리 지연 분석)

  • Park, Chong Myung;Han, Young Tak;Jeon, Soobin;Jung, Inbum
    • Journal of KIISE
    • /
    • v.42 no.3
    • /
    • pp.399-407
    • /
    • 2015
  • Wireless sensor networks are used to collect and process data from the surrounding environment for various applications. Since wireless sensor nodes operate with low computing power, limited battery capacity, and low network bandwidth, their architecture model greatly affects the performance of applications. If applications have high computational complexity or require real-time processing, the centralized architecture in wireless sensor networks incurs a delay in data processing. Conversely, if applications only perform simple data collection over a long period, the distributed architecture wastes battery energy in the wireless sensors. In this paper, the energy consumption and processing delay are analyzed in centralized and distributed sensor networks. In addition, we propose a new hybrid architecture for wireless sensor networks. With the proposed method, the network has the optimal number of wireless sensors for the characteristics of its applications.

Discrimination of cultivation ages and cultivars of ginseng leaves using Fourier transform infrared spectroscopy combined with multivariate analysis

  • Kwon, Yong-Kook;Ahn, Myung Suk;Park, Jong Suk;Liu, Jang Ryol;In, Dong Su;Min, Byung Whan;Kim, Suk Weon
    • Journal of Ginseng Research
    • /
    • v.38 no.1
    • /
    • pp.52-58
    • /
    • 2014
  • To determine whether Fourier transform (FT)-IR spectral analysis combined with multivariate analysis of whole-cell extracts from ginseng leaves can be applied as a high-throughput discrimination system of cultivation ages and cultivars, a total of 480 leaf samples belonging to 12 categories corresponding to four different cultivars (Yunpung, Kumpung, Chunpung, and an open-pollinated variety) and three different cultivation ages (1 yr, 2 yr, and 3 yr) were subjected to FT-IR. The spectral data were analyzed by principal component analysis and partial least squares-discriminant analysis. A dendrogram based on hierarchical clustering analysis of the FT-IR spectral data on ginseng leaves showed that leaf samples were initially segregated into three groups in a cultivation age-dependent manner. Then, within the same cultivation age group, leaf samples were clustered into four subgroups in a cultivar-dependent manner. The overall prediction accuracy for discrimination of cultivars and cultivation ages was 94.8% in a cross-validation test. These results clearly show that FT-IR spectra of ginseng leaves combined with multivariate analysis can be applied as an alternative tool for discriminating ginseng cultivars and cultivation ages. Therefore, we suggest that this result could be used as a rapid and reliable F1 hybrid seed-screening tool for accelerating the conventional breeding of ginseng.

A Novel Approach towards use of Adaptive Multiple Kernels in Interval Type-2 Possibilistic Fuzzy C-Means (적응적 Multiple Kernels을 이용한 Interval Type-2 Possibilistic Fuzzy C-Means 방법)

  • Joo, Won-Hee;Rhee, Frank Chung-Hoon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.24 no.5
    • /
    • pp.529-535
    • /
    • 2014
  • In this paper, we propose a hybrid approach towards multiple kernels interval type-2 possibilistic fuzzy C-means (IT2PFCM-MK) based on interval type-2 possibilistic fuzzy C-means (IT2PFCM) and possibilistic fuzzy C-means using multiple kernels (PFCM-MK). In the case of noisy data or overlapping cluster prototypes, fuzzy C-means gives poor performance in comparison to possibilistic fuzzy C-means (PFCM). Moreover, to address the uncertainty associated with the fuzzifier parameter m, interval type-2 possibilistic fuzzy C-means (IT2PFCM) is used. Most of the practical data available are complex and non-linearly separable. In such cases, using Gaussian kernels proves helpful. Therefore, in order to overcome all these issues, we have integrated multiple kernels possibilistic fuzzy C-means (PFCM-MK) into interval type-2 possibilistic fuzzy C-means (IT2PFCM) and propose the idea of multiple kernels based interval type-2 possibilistic fuzzy C-means (IT2PFCM-MK).
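
Two ingredients named in the abstract above, the fuzzifier m and a Gaussian kernel, can be sketched in Python as follows. This is a simplified illustration and not the proposed IT2PFCM-MK algorithm: it uses the standard fuzzy C-means membership formula (no possibilistic typicality term), a single Gaussian kernel with an assumed width `sigma`, and the common interval type-2 convention of taking the lower and upper memberships from two fuzzifiers `m1` and `m2`. All function names are hypothetical.

```python
import math


def gaussian_kernel_distance(x, v, sigma=1.0):
    """Kernel-induced distance sqrt(2 * (1 - K(x, v))) for a Gaussian kernel K."""
    squared = sum((a - b) ** 2 for a, b in zip(x, v))
    kernel = math.exp(-squared / (2.0 * sigma ** 2))
    return math.sqrt(2.0 * (1.0 - kernel))


def fcm_memberships(distances, m):
    """Standard FCM memberships of one point, given its distance to each prototype.

    Assumes every distance is strictly positive and m > 1.
    """
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]


def interval_memberships(distances, m1, m2):
    """Lower and upper memberships obtained from two fuzzifiers m1 and m2."""
    u1 = fcm_memberships(distances, m1)
    u2 = fcm_memberships(distances, m2)
    lower = [min(a, b) for a, b in zip(u1, u2)]
    upper = [max(a, b) for a, b in zip(u1, u2)]
    return lower, upper


# One point at distances 1.0 and 2.0 from two prototypes (invented values).
crisp_m = fcm_memberships([1.0, 2.0], m=2.0)
lower, upper = interval_memberships([1.0, 2.0], m1=2.0, m2=3.0)
```

The gap between `lower` and `upper` is what represents the uncertainty in the choice of m; a full implementation would also need type reduction and prototype updates, which are omitted here.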

Probabilistic Reinterpretation of Collaborative Filtering Approaches Considering Cluster Information of Item Contents (항목 내용물의 클러스터 정보를 고려한 협력필터링 방법의 확률적 재해석)

  • Kim, Byeong-Man;Li, Qing;Oh, Sang-Yeop
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.9
    • /
    • pp.901-911
    • /
    • 2005
  • With the development of e-commerce and the proliferation of easily accessible information, information filtering has become a popular technique to prune large information spaces so that users are directed toward those items that best meet their needs and preferences. While many collaborative filtering systems have succeeded in capturing the similarities among users or items based on ratings to provide good recommendations, there are still some challenges for them to be more efficient, especially the user bias problem, the non-transitive association problem and the cold start problem. These three problems make it harder to capture accurate similarities among users or items. In this paper, we provide probabilistic model approaches for UCHM and ICHM, which were suggested to solve these problems, in hopes of achieving better performance. In this probabilistic model, objects (users or items) are classified into groups and predictions are made for users considering the Gaussian distribution of user ratings. Experiments on a real-world data set illustrate that our proposed approach is comparable with others.

The Horizon Run 5 Cosmological Hydrodynamical Simulation: Probing Galaxy Formation from Kilo- to Giga-parsec Scales

  • Lee, Jaehyun;Shin, Jihey;Snaith, Owain N.;Kim, Yonghwi;Few, C. Gareth;Devriendt, Julien;Dubois, Yohan;Cox, Leah M.;Hong, Sungwook E.;Kwon, Oh-Kyoung;Park, Chan;Pichon, Christophe;Kim, Juhan;Gibson, Brad K.;Park, Changbom
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.45 no.1
    • /
    • pp.38.2-38.2
    • /
    • 2020
  • Horizon Run 5 (HR5) is a cosmological hydrodynamical simulation which captures the properties of the Universe on a Gpc scale while achieving a resolution of 1 kpc. This enormous dynamic range allows us to simultaneously capture the physics of the cosmic web on very large scales and account for the formation and evolution of dwarf galaxies on much smaller scales. Inside the simulation box, we zoom in on a high-resolution cuboid region with a volume of 1049 × 114 × 114 Mpc³. The subgrid physics chosen to model galaxy formation includes radiative heating/cooling, reionization, star formation, supernova feedback, chemical evolution tracking the enrichment of oxygen and iron, the growth of supermassive black holes and feedback from active galactic nuclei (AGN) in the form of a dual jet-heating mode. For this simulation we implemented a hybrid MPI-OpenMP version of the RAMSES code, specifically targeted at modern many-core, many-thread parallel architectures. For the post-processing, we extended the Friends-of-Friends (FoF) algorithm and developed a new galaxy finder to analyse the large outputs of HR5. The simulation successfully reproduces many observations, such as the cosmic star formation history, connectivity of galaxy distribution and stellar mass functions. The simulation also indicates that hydrodynamical effects on small scales impact galaxy clustering up to very large scales near and beyond the baryonic acoustic oscillation (BAO) scale. Hence, caution should be taken when using that scale as a cosmic standard ruler: one needs to carefully understand the corresponding biases. The simulation is expected to be an invaluable asset for the interpretation of upcoming deep surveys of the Universe.
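
For readers unfamiliar with the Friends-of-Friends idea mentioned above, here is a minimal Python sketch of the basic algorithm: particles closer than a linking length are "friends", and groups are the connected components of that relation. It is the textbook version with a brute-force O(n²) pair search, not the extended finder developed for HR5; the function name, the linking length, and the sample points are hypothetical.

```python
import math


def friends_of_friends(points, linking_length):
    """Group points into connected components linked by pairs within linking_length."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force pair search; real codes use trees or grids instead.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= linking_length:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy particle positions (invented for illustration only).
particles = [
    (0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0),
    (5.0, 5.0, 5.0), (5.4, 5.0, 5.0),
    (10.0, 10.0, 10.0),
]
halos = friends_of_friends(particles, linking_length=0.6)
```

Note that particles 0 and 2 are farther apart than the linking length but still end up in the same group through particle 1, which is the defining "friend of a friend" behaviour.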

Synoptic-Scale Meteorological Clustering Analysis of Volcanic Ash Inflow into the Korean Peninsula Following the Eruption of Mt. Baekdu

  • Da Eun Chae;Hearim Jeong;Soon-Hwan Lee
    • Journal of Environmental Science International
    • /
    • v.33 no.8
    • /
    • pp.591-604
    • /
    • 2024
  • To investigate the frequency and trajectories of volcanic ash from Mt. Baekdu reaching the Korean Peninsula, a forward trajectory analysis was conducted using the Hybrid Single-Particle Lagrangian Integrated Trajectory (HYSPLIT) model. Through a cluster analysis of air parcel trajectories, we identified the main pathways of the volcanic ash from Mt. Baekdu entering the Korean Peninsula and analyzed the synoptic meteorological conditions on those days. The frequency of volcanic ash reaching the Korean Peninsula was 82 times at an altitude of 1000 m and 70 times at 2000 m, with an increasing trend from 2016 to 2022. This increase is attributed to the weakening of westerly winds and the strengthening of north-south winds due to global warming. Five and three trajectory clusters were classified at 1000 m and 2000 m, respectively. At a starting altitude of 1000 m, most air parcels originating from Mt. Baekdu entered the Korean Peninsula under weather conditions (C2, C3) where the pressure gradient from the northwest to the southeast was small, resulting in weak northerly winds. C2 and C3 showed shorter trajectories, which occurred in all seasons, except summer. At a starting altitude of 2000 m, air parcels mostly passed over the Korean Peninsula in a synoptic pattern similar to that at 1000 m in altitude; however, the air parcels had simpler paths and less frequent inflow. C2, at a starting altitude of 2000 m, originates from Mount Baekdu, crosses the center of the Korean Peninsula, and continues to the central region. At a starting altitude of 1000 m, volcanic ash can enter the Korean Peninsula when there is no strong low-pressure system to the southeast of the Korean Peninsula, whereas at 2000 m, volcanic ash can enter the Korean Peninsula when the Siberian high-pressure system is weak.

SKU recommender system for retail stores that carry identical brands using collaborative filtering and hybrid filtering (협업 필터링 및 하이브리드 필터링을 이용한 동종 브랜드 판매 매장간(間) 취급 SKU 추천 시스템)

  • Joe, Denis Yongmin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.77-110
    • /
    • 2017
  • Consumption patterns have recently diversified and individualized rapidly through Internet-based web and mobile devices. As this happens, the efficient operation of the offline store, which is a traditional distribution channel, has become more important. In order to raise both sales and profits, stores need to supply and sell the most attractive products to consumers in a timely manner. However, there is a lack of research on which SKUs, out of many products, can increase sales probability and reduce inventory costs. In particular, if a company sells products through multiple stores across multiple locations, recommending SKUs that appeal to customers would help increase the sales and profitability of those stores. In this study, recommender system techniques (collaborative filtering and hybrid filtering), which have been used for personalized recommendation, are proposed as a store-level SKU recommendation method for a distribution company that handles a homogeneous brand through multiple sales stores by country and region. We calculated the similarity between stores using the purchase data for the items each store handles, applied collaborative filtering to each store's sales history by SKU, and finally recommended individual SKUs to each store. In addition, the stores are classified into four clusters through PCA (Principal Component Analysis) and cluster analysis using the store profile data. The recommendation system is implemented with a hybrid filtering method that applies collaborative filtering within each cluster, and the performance of both methods is measured on actual sales data. Most existing recommendation systems have been studied in the context of recommending items such as movies and music to users. In practice, industrial applications have also become popular. 
In the meantime, there has been little research on recommending SKUs for each store by applying these recommendation systems, which have been mainly dealt with in the field of personalization services, to the store units of distributors handling similar brands. If the recommendation method of the existing recommendation methodology was 'the individual field', this study expanded the scope of the store beyond the individual domain through a plurality of sales stores by country and region and dealt with the store unit of the distribution company handling the same brand SKU while suggesting a recommendation method. In addition, if the existing recommendation system is limited to online, it is recommended to apply the data mining technique to develop an algorithm suitable for expanding to the store area rather than expanding the utilization range offline and analyzing based on the existing individual. The significance of the results of this study is that the personalization recommendation algorithm is applied to a plurality of sales outlets handling the same brand. A meaningful result is derived and a concrete methodology that can be constructed and used as a system for actual companies is proposed. It is also meaningful that this is the first attempt to expand the research area of the academic field related to the existing recommendation system, which was focused on the personalization domain, to a sales store of a company handling the same brand. From 05 to 03 in 2014, the number of stores' sales volume of the top 100 SKUs are limited to 52 SKUs by collaborative filtering and the hybrid filtering method SKU recommended. We compared the performance of the two recommendation methods by totaling the sales results. 
The reason for comparing the two recommendation methods is that the recommendation method of this study is defined as the reference model, in which offline collaborative filtering is applied to demonstrate higher performance than the existing recommendation method. The results of this model are compared with the hybrid filtering method, which is a model that reflects the characteristics of the offline store view. The proposed method showed a higher performance than the existing recommendation method. The proposed method was validated using actual sales data from large Korean apparel companies. In this study, we propose a method to extend the recommendation system from the individual level to the group level and to approach it efficiently. In addition to the theoretical framework, this is of great value.
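
The store-level collaborative filtering described in this abstract, and its hybrid variant restricted to stores in the same cluster, can be sketched in Python as below. The abstract does not state the similarity measure or the scoring rule, so cosine similarity and similarity-weighted sales are assumptions, as are all names (`recommend_skus`, `cluster_of`) and the toy sales figures.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sales vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_skus(target, sales, top_n=2, cluster_of=None):
    """Rank SKUs the target store does not carry by similarity-weighted sales elsewhere.

    If `cluster_of` is given, only stores in the target's cluster are used,
    which mimics the hybrid (cluster-then-filter) variant.
    """
    scores = {}
    for store, vector in sales.items():
        if store == target:
            continue
        if cluster_of is not None and cluster_of[store] != cluster_of[target]:
            continue
        similarity = cosine_similarity(sales[target], vector)
        for sku, quantity in enumerate(vector):
            if sales[target][sku] == 0 and quantity > 0:
                scores[sku] = scores.get(sku, 0.0) + similarity * quantity
    return sorted(scores, key=lambda sku: scores[sku], reverse=True)[:top_n]


# Units sold per SKU index for three stores (invented for illustration only).
sales = {"A": [10, 0, 5, 0], "B": [8, 2, 4, 0], "C": [0, 9, 0, 7]}
plain = recommend_skus("A", sales)
hybrid = recommend_skus("A", sales, cluster_of={"A": 0, "B": 0, "C": 1})
```

In this toy data, store C shares no SKUs with store A, so its sales contribute nothing to the plain ranking; the hybrid variant excludes it outright by cluster membership.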

Bankruptcy Type Prediction Using A Hybrid Artificial Neural Networks Model (하이브리드 인공신경망 모형을 이용한 부도 유형 예측)

  • Jo, Nam-ok;Kim, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.3
    • /
    • pp.79-99
    • /
    • 2015
  • The prediction of bankruptcy has been extensively studied in the accounting and finance field. It can have an important impact on lending decisions and the profitability of financial institutions in terms of risk management. Many researchers have focused on constructing a more robust bankruptcy prediction model. Early studies primarily used statistical techniques such as multiple discriminant analysis (MDA) and logit analysis for bankruptcy prediction. However, many studies have demonstrated that artificial intelligence (AI) approaches, such as artificial neural networks (ANN), decision trees, case-based reasoning (CBR), and support vector machines (SVM), have been outperforming statistical techniques since the 1990s for business classification problems because statistical methods have some rigid assumptions in their application. In previous studies on corporate bankruptcy, many researchers have focused on developing a bankruptcy prediction model using financial ratios. However, few studies suggest the specific types of bankruptcy. Previous bankruptcy prediction models have generally been interested in predicting whether or not firms will become bankrupt. Most of the studies on bankruptcy types have focused on reviewing the previous literature or performing a case study. Thus, this study develops a model using data mining techniques for predicting the specific types of bankruptcy as well as the occurrence of bankruptcy in Korean small- and medium-sized construction firms in terms of profitability, stability, and activity index, so that firms can take steps to prevent bankruptcy in advance. We propose a hybrid approach using two artificial neural networks (ANNs) for the prediction of bankruptcy types. The first is a back-propagation neural network (BPN) model using supervised learning for bankruptcy prediction and the second is a self-organizing map (SOM) model using unsupervised learning to classify bankruptcy data into several types. 
Based on the constructed model, we predict the bankruptcy of companies by applying the BPN model to a validation set that was not utilized in the development of the model. This allows for identifying the specific types of bankruptcy by using bankruptcy data predicted by the BPN model. We calculated the average of selected input variables through statistical test for each cluster to interpret characteristics of the derived clusters in the SOM model. Each cluster represents bankruptcy type classified through data of bankruptcy firms, and input variables indicate financial ratios in interpreting the meaning of each cluster. The experimental result shows that each of five bankruptcy types has different characteristics according to financial ratios. Type 1 (severe bankruptcy) has inferior financial statements except for EBITDA (earnings before interest, taxes, depreciation, and amortization) to sales based on the clustering results. Type 2 (lack of stability) has a low quick ratio, low stockholder's equity to total assets, and high total borrowings to total assets. Type 3 (lack of activity) has a slightly low total asset turnover and fixed asset turnover. Type 4 (lack of profitability) has low retained earnings to total assets and EBITDA to sales which represent the indices of profitability. Type 5 (recoverable bankruptcy) includes firms that have a relatively good financial condition as compared to other bankruptcy types even though they are bankrupt. Based on the findings, researchers and practitioners engaged in the credit evaluation field can obtain more useful information about the types of corporate bankruptcy. In this paper, we utilized the financial ratios of firms to classify bankruptcy types. It is important to select the input variables that correctly predict bankruptcy and meaningfully classify the type of bankruptcy. In a further study, we will include non-financial factors such as size, industry, and age of the firms. 
Thus, we can obtain realistic clustering results for bankruptcy types by combining qualitative factors and reflecting the domain knowledge of experts.