• Title/Abstract/Keywords: Two-step Clustering

Search results: 85 items; processing time: 0.022 s

The cluster-indexing collaborative filtering recommendation

  • Park, Tae-Hyup;Ingoo Han
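    • 한국지능정보시스템학회:학술대회논문집 (Proceedings of the Korea Intelligent Information Systems Society Conference) / 2003 Spring Conference of the Korea Intelligent Information Systems Society / pp.400-409 / 2003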
  • Collaborative filtering (CF) recommendation is a knowledge-sharing technology that distributes opinions and facilitates contact between people with similar interests in a networked society. The main concerns for CF algorithms are prediction accuracy, response time, data sparsity, and scalability. In general, efforts to improve prediction algorithms and efforts to reduce response time have been pursued separately. We propose a three-step CF recommendation model, composed of profiling, inferring, and predicting steps, that considers prediction accuracy and computing speed simultaneously. The model combines a CF algorithm with two machine learning processes, SOM (Self-Organizing Map) and CBR (Case-Based Reasoning), by changing an unsupervised clustering problem into a supervised user-preference reasoning problem, which is a novel approach in the CF recommendation field. This paper demonstrates the utility of CF recommendation based on SOM cluster-indexing CBR by validating it against control algorithms on an open dataset of user preferences.
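The cluster-indexing idea in this abstract can be illustrated with a minimal sketch. It assumes cluster labels are already available and predicts a rating by averaging only over peers in the same cluster, which is why indexing by cluster can shorten response time. It does not implement SOM or CBR; the function name and data layout are hypothetical.

```python
def predict_rating(ratings, clusters, user, item):
    """Predict `user`'s rating of `item` from peers in the same cluster.

    ratings:  {user: {item: rating}}  (hypothetical layout)
    clusters: {user: cluster_id}      (assumed to be precomputed, e.g. by a SOM)
    Returns None when no peer in the cluster has rated the item.
    """
    own_cluster = clusters[user]
    # Only users indexed under the same cluster are searched.
    peers = [
        u for u, c in clusters.items()
        if c == own_cluster and u != user and item in ratings[u]
    ]
    if not peers:
        return None
    return sum(ratings[u][item] for u in peers) / len(peers)
```

A plain mean over cluster peers is the simplest possible predictor; a similarity-weighted average would be a natural refinement.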


Noisy Band Removal Using Band Correlation in Hyperspectral Images

  • Huan, Nguyen Van;Kim, Hak-Il
    • 대한원격탐사학회지 (Korean Journal of Remote Sensing) / Vol. 25, No. 3 / pp.263-270 / 2009
  • Noisy band removal is a crucial step before spectral matching, since noisy bands can distort the typical shape of spectral reflectance and degrade the matching results. This paper proposes a statistical noisy band removal method for hyperspectral data that uses the correlation coefficient between two bands. The correlation coefficient measures the strength and direction of a linear relationship between two random variables. If each band of the hyperspectral data is treated as a random variable, the correlation between two signal bands is high, whereas a noisy band produces a low correlation because it is poorly correlated and has no consistent direction. The unsupervised k-nearest neighbor clustering method is implemented with three well-accepted spectral matching measures, namely ED, SAM, and SID, to validate the proposed method. This paper also proposes a hierarchical scheme for combining those measures. Finally, a separability assessment based on the between-class and within-class scatter matrices is performed to evaluate the applicability of the proposed method. The paper also compares the spectral matching measures. Experimental results on a 228-band hyperspectral dataset show that while the SAM measure is rather resistant to noise, the SID measure is more sensitive to it.
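The band-correlation idea can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes a cube of shape (rows, cols, bands), scores each band by its strongest absolute correlation with an adjacent band, and uses a hypothetical threshold of 0.5.

```python
import numpy as np

def flag_noisy_bands(cube, threshold=0.5):
    """Return indices of bands that correlate poorly with their neighbours.

    cube: array of shape (rows, cols, bands).
    threshold: hypothetical cutoff, not taken from the paper.
    """
    n_bands = cube.shape[-1]
    # Treat each band as a random variable: one row per band, one column per pixel.
    pixels = cube.reshape(-1, n_bands).T
    corr = np.corrcoef(pixels)
    noisy = []
    for b in range(n_bands):
        neighbours = [n for n in (b - 1, b + 1) if 0 <= n < n_bands]
        # A clean band next to a noisy one still matches its other neighbour,
        # so the best neighbouring correlation is used as the score.
        score = max(abs(corr[b, n]) for n in neighbours)
        if score < threshold:
            noisy.append(b)
    return noisy
```

One limitation of this scoring rule is that a clean first or last band whose only neighbour is noisy would also be flagged.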

Determinants of Consumer Preference by Type of Accommodation: Two-Step Cluster Analysis (이단계 군집분석에 의한 농촌관광 편의시설 유형별 소비자 선호 결정요인)

  • 박덕병;윤유식;이민수
    • 마케팅과학연구 (Journal of Korean Academy of Marketing Science) / Vol. 17, No. 3 / pp.1-19 / 2007
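  • This study presents a method for classifying the accommodation facilities offered to rural tourism visitors into types and for identifying which visitor characteristics are associated with a preference for which facility type, together with the results of the analysis. First, rural tourism accommodation facilities were classified using two-step cluster analysis. Two-step cluster analysis was chosen because traditional clustering methods cannot be applied when the clustering variables include categorical variables. The study shows that two-step cluster analysis is very useful for classifying rural tourism accommodation facilities measured with categorical variables. A multinomial logit model was then used to identify the socio-demographic and travel characteristics of rural tourism visitors that affect the probability of preferring a particular facility type. That is, a distinguishing feature of this study is that the multinomial logit model identifies the consumer characteristics that affect the probability of preferring a particular facility type relative to the type set as the reference category (the general farmhouse type).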
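For reference, the multinomial logit model with a reference category is usually written in the standard textbook form below. The notation is an assumption, not taken from the paper: $j = 0$ denotes the reference facility type (here the general farmhouse type), $x_i$ the visitor's characteristics, and $\beta_j$ the coefficients for type $j$.

```latex
P(y_i = j \mid x_i) = \frac{\exp(x_i^{\top}\beta_j)}{1 + \sum_{k=1}^{J-1} \exp(x_i^{\top}\beta_k)},
\quad j = 1, \dots, J-1,
\qquad
P(y_i = 0 \mid x_i) = \frac{1}{1 + \sum_{k=1}^{J-1} \exp(x_i^{\top}\beta_k)}

\ln \frac{P(y_i = j \mid x_i)}{P(y_i = 0 \mid x_i)} = x_i^{\top}\beta_j
```

The second line shows why each coefficient is read relative to the reference type: a positive element of $\beta_j$ raises the log-odds of preferring type $j$ over the reference.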


Online Video Synopsis via Multiple Object Detection

  • Lee, JaeWon;Kim, DoHyeon;Kim, Yoon
    • 한국컴퓨터정보학회논문지 (Journal of the Korea Society of Computer and Information) / Vol. 24, No. 8 / pp.19-28 / 2019
  • This paper proposes an online video summarization algorithm based on multiple object detection. As crime has risen with recent rapid urbanization, public demand for safety has grown and the installation of surveillance cameras such as closed-circuit television (CCTV) has increased in many cities. However, retrieving and analyzing the huge amount of video data from numerous CCTVs takes a great deal of time and labor. As a result, there is increasing demand for intelligent video recognition systems that can automatically detect and summarize the various events captured on CCTV. Video summarization generates a synopsis video from a long original video so that users can watch it in a short time. The proposed method is divided into two stages. The object extraction step detects objects in the video and extracts the specific objects desired by the user. The video summary step creates the final synopsis video from the objects extracted in the object extraction step. Whereas existing methods do not consider the interaction between objects in the original video when generating the synopsis video, the proposed method uses a new object clustering algorithm that effectively preserves those interactions in the synopsis video. This paper also proposes an online optimization method that can efficiently summarize the large number of objects appearing in long videos. Finally, experimental results show that the proposed method outperforms the existing video synopsis algorithm.

Efficient Graph-Based Two-Stage Superpixel Generation Method (효율적인 그래프 기반 2단계 슈퍼픽셀 생성 방법)

  • 박상현
    • 한국정보통신학회논문지 (Journal of the Korea Institute of Information and Communication Engineering) / Vol. 23, No. 12 / pp.1520-1527 / 2019
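  • In computer vision, superpixel methods are widely used in preprocessing to reduce computation by simplifying an image while preserving its characteristics. However, superpixel generation typically produces superpixels of uniform size and shape based on pixel values rather than taking the characteristics of the image into account. This paper proposes a method that generates superpixels with the image characteristics taken into account, to suit the application. The proposed method consists of two stages. In the first stage, the image is over-segmented so that its boundary information is well preserved. In the second stage, the over-segmented superpixels are merged on the basis of similarity to produce the desired number of superpixels. The shape of the superpixels is controlled by limiting their maximum size. Experimental results show that superpixels generated by the proposed method preserve boundary information more accurately than those generated by existing methods.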
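The second (merging) stage described above can be sketched as a greedy merge over a region adjacency graph. This is a minimal illustration under stated assumptions: similarity is taken to be the difference in mean intensity, which the abstract does not specify, and the function and parameter names are hypothetical.

```python
def merge_superpixels(means, sizes, edges, target_count, max_size):
    """Greedily merge the most similar adjacent regions.

    means, sizes: dicts keyed by region id (output of an over-segmentation).
    edges: set of frozenset({a, b}) pairs marking adjacent regions.
    Stops at target_count regions, or earlier if every remaining merge
    would exceed max_size.
    """
    means, sizes, edges = dict(means), dict(sizes), set(edges)
    while len(means) > target_count:
        candidates = [
            (abs(means[a] - means[b]), a, b)
            for a, b in (sorted(e) for e in edges)
            if sizes[a] + sizes[b] <= max_size  # size cap controls shape
        ]
        if not candidates:
            break
        _, a, b = min(candidates)
        total = sizes[a] + sizes[b]
        # The merged region keeps id a and a size-weighted mean.
        means[a] = (means[a] * sizes[a] + means[b] * sizes[b]) / total
        sizes[a] = total
        del means[b], sizes[b]
        # Re-attach b's neighbours to a and drop the collapsed edge.
        relinked = set()
        for e in edges:
            e = frozenset(a if r == b else r for r in e)
            if len(e) == 2:
                relinked.add(e)
        edges = relinked
    return means, sizes
```

Rescanning all edges on every merge keeps the sketch short; a priority queue would be the usual choice for real images.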

Development of a Dynamic Forecasting Model for Short-Term Power Demand Using a Radial Basis Function Network (Radial Basis 함수를 이용한 동적 - 단기 전력수요예측 모형의 개발)

  • 민준영;조형기
    • 한국정보처리학회논문지 (Transactions of the Korea Information Processing Society) / Vol. 4, No. 7 / pp.1749-1758 / 1997
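  • Power demand forecasting can be divided, by forecasting horizon, into medium- and long-term demand forecasting and short-term load forecasting. Existing short-term load forecasting has mainly used multilayer perceptrons trained with the back-propagation algorithm, but this approach takes a long time to train and can fall into local minima during training, so that learning cannot continue. To address these problems, this paper proposes a dynamic short-term load forecasting model based on the Radial Basis Function (RBF) network. An RBF network has a single hidden layer and learns in a feed-forward manner. To train the proposed model, hourly loads are clustered and the cluster centers are used as the hidden layer of the RBF network; after training, forecasts are made hour by hour, treating the pattern to be forecast as one unit. Previous studies mainly used the statistical K-Means method or Kohonen's LVQ (Learning Vector Quantization) for clustering, whereas this paper uses the GLVQ (Generalized LVQ) algorithm of Pal et al., which shows smaller deviation in pattern classification than other algorithms. The input data are 72 hours of load data for each of March 1-3, June 1-3, July 1-3, September 1-3, and November 1-3, 1995, and the forecast target is the 24 hours of the 4th day of each month. In the experiment, using days 1 through 3 of each month as training data and forecasting the load one hour ahead over 24 hours gave a mean error rate of 1.3795%.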
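The structure described above (cluster the load patterns, use the cluster centers as RBF hidden units, then fit the output layer) can be sketched as follows. This is a minimal illustration, not the paper's model: plain k-means stands in for GLVQ, the output weights are fitted by least squares, and the width, the number of centers, and all function names are assumptions.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain Lloyd k-means (a stand-in for the GLVQ used in the paper)."""
    # Deterministic start: k evenly spaced samples.
    centers = X[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers

def rbf_features(X, centers, width):
    """Gaussian hidden-layer activations plus a bias column."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * width ** 2))
    return np.hstack([phi, np.ones((len(X), 1))])

def fit_rbf(X, y, k=6, width=1.0):
    """Cluster centers become hidden units; output weights by least squares."""
    centers = kmeans(X, k)
    weights, *_ = np.linalg.lstsq(rbf_features(X, centers, width), y, rcond=None)
    return centers, weights

def predict_rbf(X, centers, weights, width=1.0):
    return rbf_features(X, centers, width) @ weights
```

With hourly loads in an array `load`, one possible setup is to use the two most recent hours as input, `X = np.column_stack([load[:-2], load[1:-1]])`, and the next hour as target, `y = load[2:]`. The width should be chosen on the scale of the load values.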


Underdetermined Blind Source Separation from Time-delayed Mixtures Based on Prior Information Exploitation

  • Zhang, Liangjun;Yang, Jie;Guo, Zhiqiang;Zhou, Yanwei
    • Journal of Electrical Engineering and Technology / Vol. 10, No. 5 / pp.2179-2188 / 2015
  • Much recent research has addressed the challenging problem of Blind Source Separation (BSS) in the underdetermined case. The widely used “two-step” method first estimates the mixing matrix and then extracts the sources. To estimate the mixing matrix, conventional algorithms such as Single-Source-Points (SSPs) detection exploit only the sparsity of the original signals. This paper proposes a new underdetermined mixing matrix estimation method for time-delayed mixtures that exploits prior information at the receiver. The prior information is extracted from the specific structure of the complex-valued mixing matrix and is used to derive a special criterion for determining the SSPs. After the SSPs are selected, Agglomerative Hierarchical Clustering (AHC) is used to automatically cluster, suppress, and estimate all the elements of the mixing matrix. Finally, a convex-model-based subspace method is applied for signal separation. Simulation results show that the proposed algorithm estimates the mixing matrix and extracts the original source signals with higher accuracy, especially in low-SNR environments, and does not need to know the number of sources beforehand, which makes it more reliable in real non-cooperative environments.

A Bibliometric Approach for Department-Level Disciplinary Analysis and Science Mapping of Research Output Using Multiple Classification Schemes

  • Gautam, Pitambar
    • Journal of Contemporary Eastern Asia / Vol. 18, No. 1 / pp.7-29 / 2019
  • This study describes an approach for comparative bibliometric analysis of scientific publications related to (i) individual or several departments comprising a university, and (ii) broader integrated subject areas using multiple disciplinary schemes. It uses a custom dataset of scientific publications (ca. 15,000 articles and reviews, published during 2009-2013 and recorded in the Web of Science Core Collection) with author affiliations to the research departments of a comprehensive university that are dedicated to science, technology, engineering, mathematics, and medicine (STEMM). The dataset was first subjected to department-level and discipline-level analyses using the newly available KAKEN-L3 classification (based on the MEXT/JSPS Grants-in-Aid system), hierarchical clustering and correspondence analysis to decipher the major departmental and disciplinary clusters, and visualization of the department-discipline relationships using two-dimensional stacked bar diagrams. The next step involved the creation of subsets covering integrated subject areas and a comparative analysis of departmental contributions to a specific area (medical, health and life science) using several disciplinary schemes: Essential Science Indicators (ESI) 22 research fields, SCOPUS 27 subject areas, OECD Frascati 38 subordinate research fields, and KAKEN-L3 66 subject categories. To illustrate the effective use of science mapping techniques, the same subset for the medical, health and life science area was subjected to network analyses of keyword co-occurrences, bibliographic coupling of the publication sources, and co-citation of sources in the reference lists. The science mapping approach demonstrates ways to extract information on the prolific research themes, the journals most frequently used for publishing research findings, and the knowledge base underlying the research activities covered by the publications concerned.

Comparison of genome-wide association and genomic prediction methods for milk production traits in Korean Holstein cattle

  • Lee, SeokHyun;Dang, ChangGwon;Choy, YunHo;Do, ChangHee;Cho, Kwanghyun;Kim, Jongjoo;Kim, Yousam;Lee, Jungjae
    • Asian-Australasian Journal of Animal Sciences / Vol. 32, No. 7 / pp.913-921 / 2019
  • Objective: The objectives of this study were to compare the informative regions identified by two genome-wide association study (GWAS) approaches and to determine the accuracy and bias of the direct genomic value (DGV) for milk production traits in Korean Holstein cattle, using two genomic prediction approaches: single-step genomic best linear unbiased prediction (ss-GBLUP) and Bayesian Bayes-B. Methods: Records on production traits, namely adjusted 305-day milk (MY305), fat (FY305), and protein (PY305) yields, were collected from 265,271 first-parity cows. After quality control, 50,765 single-nucleotide polymorphism genotypes were available for analysis. In GWAS for ss-GBLUP (ssGWAS) and Bayes-B (BayesGWAS), the proportion of genetic variance for each 1-Mb genomic window was calculated and used to identify informative genomic regions. Accuracy of the DGV was estimated by five-fold cross-validation with random clustering. As a measure of accuracy for DGV, we also assessed the correlation between DGV and deregressed estimated breeding value (DEBV). The bias of DGV for each method was obtained by determining regression coefficients. Results: A total of nine and five significant windows (1 Mb) were identified for MY305 using ssGWAS and BayesGWAS, respectively. Using ssGWAS and BayesGWAS, we also detected multiple significant regions for FY305 (12 and 7) and PY305 (14 and 2), respectively. Both single-step DGV and Bayes DGV showed moderate accuracy ranges for the MY305 (0.32 to 0.34), FY305 (0.37 to 0.39), and PY305 (0.35 to 0.36) traits. The mean biases of DGVs determined using the single-step and Bayesian methods were $1.50 \pm 0.21$ and $1.18 \pm 0.26$ for MY305, $1.75 \pm 0.33$ and $1.14 \pm 0.20$ for FY305, and $1.59 \pm 0.20$ and $1.14 \pm 0.15$ for PY305, respectively. Conclusion: From the bias perspective, we believe that genomic selection based on Bayesian approaches would be more suitable than ss-GBLUP in Korean Holstein populations.

Spatial Influence on Acupoints Network Derived from the Chapter on Acupuncture & Moxibustion in "Beijiqianjinyaofang" ("비급천금요방(備急千金要方)" 침구편(鍼灸篇)으로 구성한 경혈(經穴) 네트워크에 공간적 위치 변수가 미치는 영향)

  • 김민욱;양승범;안성훈;손인철;김재효
    • Korean Journal of Acupuncture / Vol. 29, No. 3 / pp.431-440 / 2012
  • Objectives: Network science has recently become a popular topic in various scientific fields, and many studies have reported that it gives meaningful results when studying the characteristics of complex systems. In this study, based on network theory, we built an acupoint network from the data on combined acupoints that appear in "Beijiqianjinyaofang". We focused on identifying the distinctive roles of remote and local combinations in the network. Furthermore, we aimed to assess the possibility of numerical and quantitative applications in acupuncture research. Methods: Based on the examples of combined acupoints in "Beijiqianjinyaofang", the network consisted of 291 nodes and 2,431 links. The spatial distances between combined acupoints were calculated with a human dummy model. We removed links step by step in three cases (remote, local, and random) and observed the resulting changes by calculating path lengths, similarity indices, and clustering coefficients. Cluster analysis was also carried out. Results: The network had a small number of remote links and a large number of local links. The two kinds of links had distinct characteristics: whereas the local links formed clusters of nearby nodes, the remote links increased the correlation between the clusters. Conclusions: These results suggest that the acupoint network increases the connectivity between the distal parts and the trunk of the human body and enables various combinations of acupoints. This finding shows that the mechanism of combined acupoints can be interpreted meaningfully by applying network theory in acupuncture research.
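The link-removal analysis in the Methods can be illustrated with a minimal sketch that computes the average clustering coefficient of an undirected network before and after a link is removed. The adjacency-dictionary layout and function names are assumptions, and the abstract does not say which definition of the clustering coefficient the study used; the usual local (Watts-Strogatz) definition is shown.

```python
from itertools import combinations

def average_clustering(adj):
    """Mean local clustering coefficient of an undirected graph.

    adj maps each node to the set of its neighbours.
    Nodes with fewer than two neighbours contribute 0.
    """
    total = 0.0
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count links among the node's neighbours.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)

def remove_link(adj, u, v):
    """Return a copy of adj without the undirected link u-v."""
    new = {n: set(nb) for n, nb in adj.items()}
    new[u].discard(v)
    new[v].discard(u)
    return new
```

Calling `remove_link` repeatedly on links ordered by spatial distance, and recomputing `average_clustering` after each step, would give the kind of step-by-step curve the Methods describe.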