• Title/Abstract/Keywords: Clustering behavior

Search results: 183 items (processing time 0.023 s)

Abnormal Behavior Recognition Based on Spatio-temporal Context

  • Yang, Yuanfeng;Li, Lin;Liu, Zhaobin;Liu, Gang
    • Journal of Information Processing Systems / Vol. 16, No. 3 / pp. 612-628 / 2020
  • This paper presents a new approach for detecting abnormal behaviors in complex surveillance scenes, where anomalies are subtle and difficult to distinguish because of the intricate correlations among the behaviors of multiple objects. Specifically, a cascaded probabilistic topic model is proposed for learning the spatial context of local behavior and the temporal context of global behavior in two stages. In the first stage, unlike existing approaches that use either optical flows or complete trajectories, spatio-temporal correlations between the trajectory fragments in video clips are modeled by a latent Dirichlet allocation (LDA) topic model based on Markov random fields to obtain the spatial context of local behavior in each video clip. The local behavior topic categories are then obtained with the spectral clustering algorithm. Using a dictionary constructed through this local behavior topic clustering, the second-stage LDA topic model learns the correlations of global behaviors and their temporal context. An abnormal behavior recognition method is then developed based on the learned spatio-temporal context of behaviors. It adopts a top-down strategy and consists of two stages: recognition of anomalous video clips, followed by recognition of anomalous behavior within each video clip. The evaluation covers both the validity of the spatio-temporal context learning for local behavior topics and the abnormal behavior recognition itself. The results show that the proposed approach significantly improves abnormal behavior recognition in complex surveillance scenes.

Optimizing Clustering and Predictive Modelling for 3-D Road Network Analysis Using Explainable AI

  • Rotsnarani Sethy;Soumya Ranjan Mahanta;Mrutyunjaya Panda
    • International Journal of Computer Science & Network Security / Vol. 24, No. 9 / pp. 30-40 / 2024
  • Building an accurate 3-D spatial road network model has become an active area of research and is regarded as a new paradigm for developing smart roads and intelligent transportation systems (ITS). Such models can help public and private road operators improve road mobility and eco-routing, leading to better traffic flow, lower carbon emissions, and improved road safety. Handling large-scale 3-D road network data makes it challenging to obtain the accurate elevation information needed to better estimate CO2 emissions and to route vehicles accurately in an Internet of Vehicles (IoV) scenario. Clustering and regression techniques are found suitable for recovering the missing elevation information for some points in a 3-D spatial road network dataset, which is expected to give the public a better eco-routing experience. In addition, Explainable Artificial Intelligence (xAI) has recently drawn researchers' attention because it makes models more interpretable, transparent, and comprehensible, enabling the design of efficient models chosen according to users' requirements. The 3-D road network dataset, comprising the spatial attributes (longitude, latitude, altitude) of North Jutland, Denmark, and collected from the publicly available UCI repository, is preprocessed through feature engineering and scaling to ensure optimal accuracy for the clustering and regression tasks. K-Means clustering and regression using a Support Vector Machine (SVM) with a radial basis function (RBF) kernel are employed for the 3-D road network analysis. Silhouette scores and the number of clusters are used to measure cluster quality, whereas error metrics such as MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) are used to evaluate the regression method. To improve the interpretability of the clustering and regression models, SHAP (SHapley Additive exPlanations), a powerful xAI technique, is employed. Extensive experiments show that SHAP analysis validated the importance of latitude and altitude in predicting longitude, particularly in the four-cluster setup, providing critical insights into model behavior and feature contributions, with an accuracy of 97.22% and strong performance metrics across all classes (MAE of 0.0346 and MSE of 0.0018). The ten-cluster setup, by contrast, while faster in SHAP analysis, presented challenges in interpretability due to increased clustering complexity. Hence, the hybrid model combining K-Means clustering with K=4 and SVM demonstrated superior performance and interpretability, highlighting the importance of careful cluster selection to balance model complexity and predictive accuracy.
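
The silhouette score mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' code: the function name `silhouette_score`, the toy points, and the use of Euclidean distance are assumptions made for illustration, and the sketch expects at least two clusters with distinct points.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a coefficient of 0
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)

# Hypothetical toy data: two tight groups far apart.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across the groups
print(good > bad)
```

A score near 1 suggests compact, well-separated clusters, while a negative score suggests that points sit closer to another cluster than to their own.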

A New Approach to Spatial Pattern Clustering based on Longest Common Subsequence with application to a Grocery

  • 정인철;권영식
    • 산업공학 / Vol. 24, No. 4 / pp. 447-456 / 2011
  • Identifying the major movement patterns of shoppers on the selling floor has been a longstanding issue in the retailing industry. With the advent of RFID technology, it has become easier to collect movement data for an individual shopper. Most previous studies used traditional clustering techniques to identify the major movement patterns of customers. However, because of spatial constraints (aisle layout or other physical obstructions in the store), standard clustering methods are not directly applicable to movement data such as shopping paths: the data must be adjusted in advance for the analysis, which is time-consuming and causes data distortion. To alleviate these problems, we propose a new approach to spatial pattern clustering based on the longest common subsequence (LCSS). Experimental results using real data obtained from a grocery store in Seoul show that the proposed method performs well in finding hot spots and dead spots as well as the major path patterns of customer movements.
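
The LCSS idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that each shopping path is recorded as a sequence of zone labels, and the zone names and the normalization by the shorter path length are hypothetical choices.

```python
def lcss_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def lcss_similarity(a, b):
    """LCSS length scaled to [0, 1] by the shorter path (an assumed normalization)."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b) / min(len(a), len(b))

# Hypothetical paths through store zones.
path_1 = ["entrance", "produce", "dairy", "bakery", "checkout"]
path_2 = ["entrance", "dairy", "snacks", "checkout"]
print(lcss_length(path_1, path_2), lcss_similarity(path_1, path_2))
```

Because the subsequence only needs to preserve order, two paths that visit the same zones with different detours in between still count as similar, which is one reason a measure of this kind can suit paths constrained by aisle layout.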

Metro Station Clustering based on Travel-Time Distributions

  • 공인택;김동윤;민윤홍
    • 한국전자거래학회지 / Vol. 27, No. 2 / pp. 193-204 / 2022
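  • Smart transit card data is a representative type of mobility data that can be used to analyze public transportation usage behavior and to develop policy. As one such study, this paper addresses the problem of classifying metro stations using metro usage patterns. Because previous papers on metro station clustering considered only traffic volume among usage behaviors, this paper proposes, as a complementary method, clustering that takes travel time into account. Passengers at each station were divided into those departing during the morning commute, arriving during the morning commute, departing during the evening commute, and arriving during the evening commute; the travel times of each group were then modeled with a Weibull distribution, and the estimated shape parameters were defined as the station's feature values. The feature vectors were then clustered using the K-means clustering method. The experiments showed that clustering stations by travel time not only yields results similar to those of previous studies but also enables finer-grained clustering.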
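
The feature-extraction step described in the abstract above (fit a Weibull distribution to travel times and keep the shape parameter) might look like the sketch below. The abstract does not say how the parameters were estimated, so the maximum-likelihood fit by bisection, the function name `weibull_shape_mle`, and the synthetic travel times are all assumptions.

```python
import math
import random

def weibull_shape_mle(samples, lo=0.01, hi=100.0, iters=100):
    """Maximum-likelihood estimate of the Weibull shape parameter by bisection."""
    xs = [x for x in samples if x > 0]
    top = max(xs)
    xs = [x / top for x in xs]  # the shape does not depend on the scale
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def score(k):
        # Likelihood equation for the shape; it increases with k and is zero at the MLE.
        weights = [x ** k for x in xs]
        weighted = sum(w * l for w, l in zip(weights, logs)) / sum(weights)
        return weighted - 1.0 / k - mean_log

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Synthetic travel times in minutes (scale 30, shape 2), for illustration only.
rng = random.Random(0)
travel_times = [rng.weibullvariate(30.0, 2.0) for _ in range(2000)]
shape = weibull_shape_mle(travel_times)
print(round(shape, 2))
```

Repeating the fit for the four passenger groups of a station would give a four-element feature vector per station, which could then be passed to any K-means implementation.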

High Risk Groups in Health Behavior Defined by Clustering of Smoking, Alcohol, and Exercise Habits: National Health and Nutrition Examination Survey

  • 강기원;성주헌;김창엽
    • Journal of Preventive Medicine and Public Health / Vol. 43, No. 1 / pp. 73-83 / 2010
  • Objectives: We investigated the clustering of selected lifestyle factors (cigarette smoking, heavy alcohol consumption, lack of physical exercise) and identified the population characteristics associated with increasing lifestyle risks. Methods: Data on lifestyle risk factors, sociodemographic characteristics, and history of chronic diseases were obtained from 7,694 individuals ≥20 years of age who participated in the 2005 Korea National Health and Nutrition Examination Survey (KNHANES). Clustering of lifestyle risks was assessed by comparing the observed prevalence of multiple risks with the prevalence expected from the marginal exposure prevalence of the three selected risk factors. The prevalence odds ratio was adopted as the measure of clustering. Multiple correspondence analysis, Kendall tau correlation, Mann-Whitney analysis, and ordinal logistic regression analysis were conducted to identify variables that increase lifestyle risks. Results: In both men and women, increased lifestyle risks were associated with clustering of: (1) cigarette smoking and excessive alcohol consumption, and (2) smoking, excessive alcohol consumption, and lack of physical exercise. Patterns of clustering for physical exercise were different from those for cigarette smoking and alcohol consumption. Increased unhealthy clustering was found among men 20-64 years of age with mild or moderate stress, and among women 35-49 years of age who were never married, had mild stress, and had an increased body mass index (>30 kg/m²). Conclusions: Health promotion efforts should focus on addressing the lack of physical exercise while considering individual characteristics including gender, age, employment activity, and stress levels.
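
The comparison of observed and expected prevalence described in the abstract above can be illustrated with a short sketch. It computes a simple observed-to-expected ratio under independence, which is a simplification of the prevalence odds ratio the authors report; the function name and the ten records are hypothetical.

```python
def observed_expected_ratio(records):
    """Compare the joint prevalence of all risks with the product of the marginals.

    records: list of tuples of 0/1 flags, one flag per risk factor.
    """
    n = len(records)
    n_factors = len(records[0])
    marginals = [sum(r[i] for r in records) / n for i in range(n_factors)]
    observed = sum(1 for r in records if all(r)) / n
    expected = 1.0
    for p in marginals:
        expected *= p  # prevalence of all risks together if they were independent
    if expected == 0:
        raise ValueError("at least one risk factor never occurs")
    return observed, expected, observed / expected

# Hypothetical flags: (smoker, heavy drinker, physically inactive).
records = (
    [(1, 1, 1)] * 3 + [(1, 1, 0)] + [(1, 0, 0)]
    + [(0, 0, 1)] * 2 + [(0, 0, 0)] * 3
)
observed, expected, ratio = observed_expected_ratio(records)
print(observed, expected, ratio)
```

A ratio above 1 means the three risks occur together more often than independence would predict, which is the sense of "clustering" used in the abstract.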

Determining on Model-based Clusters of Time Series Data

  • 전진호;이계성
    • 한국콘텐츠학회논문지 / Vol. 7, No. 6 / pp. 22-30 / 2007
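  • Most real-world systems, such as those in economics, the stock market, and medicine, are dynamic and exhibit complex phenomena. A typical way to understand such systems is to build and analyze a model of the system's behavior. This study investigates a method for forming optimal clusters of time series data generated by real-world dynamic systems. First, we verify the usefulness of the Bayesian information criterion (BIC) approximation as a criterion for determining the number of clusters, and propose a way to improve search efficiency by identifying the correlation between data size and the BIC value; we also compare model-based and similarity-based methodologies for the clustering process. Experiments on real time series data (stock prices) confirmed that the BIC approximation measure accurately estimates the partition size according to the size of the data, and that the model-based methodology yields better clustering results than the similarity-based approach.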
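
As a rough illustration of choosing the number of clusters with BIC, as described in the abstract above, the sketch below uses the common form BIC = p ln(n) - 2 ln(L), where lower is better. The abstract does not give the exact approximation or sign convention the authors used, and the log-likelihoods and parameter counts here are invented for the example.

```python
import math

def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion in the 'lower is better' convention."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood

def select_by_bic(candidates, n_samples):
    """candidates: dict mapping cluster count -> (log_likelihood, n_params)."""
    scores = {k: bic(ll, p, n_samples) for k, (ll, p) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores

# Hypothetical fits of models with 1 to 4 clusters to 200 series.
candidates = {
    1: (-520.0, 2),
    2: (-450.0, 5),
    3: (-445.0, 8),
    4: (-443.0, 11),
}
best_k, scores = select_by_bic(candidates, n_samples=200)
print(best_k)
```

In this toy example the likelihood keeps improving as clusters are added, but the penalty term outweighs the small gains beyond two clusters.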

Combined Artificial Bee Colony for Data Clustering

  • 강범수;김성수
    • 산업경영시스템학회지 / Vol. 40, No. 4 / pp. 203-210 / 2017
  • Data clustering is one of the most difficult and challenging problems and can be formally considered a particular kind of NP-hard grouping problem. The K-means algorithm is one of the most popular and widely used clustering methods because it is easy to implement and very efficient. However, it is prone to becoming trapped in a local optimum, and on large data sets its solutions vary widely with the initial values. Therefore, an efficient computational intelligence method is needed to find the global optimal solution of the data clustering problem within limited computational time. The objective of this paper is to propose a combined artificial bee colony (CABC) method that uses K-means for initialization and finalization to find an optimal solution effectively for the data clustering optimization problem. The artificial bee colony (ABC) is an algorithm motivated by the intelligent behavior exhibited by honeybees when searching for food. The performance of ABC is better than or similar to that of other population-based algorithms, with the added advantage of employing fewer control parameters. The proposed CABC method provides a near-optimal solution within reasonable time by balancing converged and diversified searches. Experiments and analysis on clustering problems demonstrate that CABC is competitive with previous partitioning approaches in terms of solution quality. We validate the performance of CABC against previous studies using the Iris, Wine, Glass, Vowel, and Cloud datasets from the UCI machine learning repository. In our simulations, the proposed KABCK (K-means+ABC+K-means) performs better than ABCK (ABC+K-means), KABC (K-means+ABC), ABC, and K-means.
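
The sensitivity of K-means to its starting point, which motivates the method in the abstract above, can be seen in a small sketch. It implements plain K-means (Lloyd's iterations) only, not ABC or CABC; the function name, the four toy points, and the two initializations are assumptions chosen to make the effect visible.

```python
import math

def kmeans(points, centers, iters=100):
    """Plain K-means from the given initial centers; returns (centers, SSE)."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each center to the mean of its group; keep it if the group is empty.
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    return centers, sse

# Hypothetical data: the corners of a wide rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
_, sse_good = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])  # splits left / right
_, sse_bad = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])   # stuck on top / bottom
print(sse_good, sse_bad)
```

Both runs converge, but the second one stops at a much worse local optimum, which is the kind of outcome a global search heuristic is meant to avoid.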

A Dual-layer Energy Efficient Distributed Clustering Algorithm for Wireless Sensor Networks

  • 여명호;김유미;유재수
    • 한국정보과학회논문지:데이타베이스 / Vol. 35, No. 1 / pp. 84-95 / 2008
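  • Wireless sensor networks have recently come into use as a platform for a variety of applications. By deploying wireless sensors and forming a sensor network, information on the behavior, state, and location of objects within an area can be obtained remotely. Because sensor nodes generally run on limited batteries, an energy-efficient data collection mechanism is essential for extending the lifetime of a sensor network. This paper proposes a new clustering scheme that distributes the energy consumption of the cluster head. We first analyze the energy consumption associated with each role of the cluster head and separate the cluster into two layers, one for collection and one for transmission. We then elect a sensor node responsible for each layer, so that the energy consumption of a single cluster head is spread across two sensor nodes. To demonstrate the merits of the proposed scheme, we compared its performance with that of existing clustering schemes through simulation. The results show that network lifetime improves by 10% to 40% over the existing algorithms.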

Implementation of the Arrangement Algorithm for Autonomous Mobile Robots

  • 김장현;공성곤
    • 대한전기학회:학술대회논문집 / 대한전기학회 1998 Summer Conference Proceedings, G / pp. 2186-2188 / 1998
  • In this paper, the fundamental rules governing the group-intelligence "arrangement" behavior of multiple autonomous mobile robots are represented by a small number of fuzzy rules. Complex lifelike behavior is regarded as the result of local interactions between simple individuals under a small number of fundamental rules. The fuzzy rules for arrangement are generated by clustering the input-output data obtained from the arrangement algorithm. Simulation shows that the fuzzy rules successfully realize the fundamental rules of flocking group behavior.

Analysis of Factors Affecting the Smoking Rates Gap between Regions and Evaluation of Relative Efficiency of Smoking Cessation Projects: Using Clustering Analysis and Data Envelopment Analysis

  • 김희년;이다호;정지윤;구여정;정형선
    • 보건행정학회지 / Vol. 30, No. 2 / pp. 199-210 / 2020
  • Background: Given the importance of smoking cessation programs in controlling regional disparities in smoking behavior in Korea, this study aims to reveal the variation in smoking rates across 229 provinces and its determinants. It then evaluates the relative efficiency of the smoking cessation programs while taking regional characteristics into account. Methods: The main data sources are the Korean Statistical Information Service and a national survey on the expenditure of public health centers. Multivariate regression was performed to identify the determinants of regional variation in smoking rates. Based on the results of the regression model, clustering analysis was conducted to group the 229 regions by their characteristics, yielding three clusters. Relative efficiency scores were calculated using data envelopment analysis (DEA). Results from the pooled model, which scores the relative efficiency of all 229 provinces in one model, were compared with those of the cluster-separated models fitted to each cluster. Results: First, the maximum variation in the smoking rate was 16.9 percentage points. Second, the sex ratio, the proportion of the elderly, and high-risk drinking behavior play a significant role in the regional variation of smoking. Third, population and the proportion of the elderly are the main variables for clustering. Fourth, the relative efficiency results differed between the pooled model and the cluster-separated model, especially for cluster 2. Conclusion: This study identified regional variation in smoking rates and its determinants at the regional level. The discrepancy in DEA results between the models implies that regional features must be considered when regional evaluations are performed, especially for the programs of public health centers.