• Title/Summary/Keyword: Unsupervised algorithm


A Machine Learning Approach for Knowledge Base Construction Incorporating GIS Data for Land Cover Classification of Landsat ETM+ Image (지식 기반 시스템에서 GIS 자료를 활용하기 위한 기계 학습 기법에 관한 연구 - Landsat ETM+ 영상의 토지 피복 분류를 사례로)

  • Kim, Hwa-Hwan;Ku, Cha-Yang
    • Journal of the Korean Geographical Society
    • /
    • v.43 no.5
    • /
    • pp.761-774
    • /
    • 2008
  • Integration of GIS data and human expert knowledge into digital image processing has long been acknowledged as a necessity for improving remote sensing image analysis. We propose an inductive machine learning algorithm for GIS data integration and a rule-based classification method for land cover classification. The proposed method is tested on a land cover classification of a Landsat ETM+ multispectral image and GIS data layers including elevation, aspect, slope, distance to water bodies, distance to road network, and population density. Decision trees and production rules for land cover classification are generated by the C5.0 inductive machine learning algorithm with 350 stratified random point samples. The production rules are used for land cover classification integrated with unsupervised ISODATA classification. Results show that GIS data layers such as elevation, distance to water bodies, and population density can be effectively integrated for rule-based image classification. The intuitive production rules generated by inductive machine learning are easy to understand. The proposed method demonstrates how various GIS data layers can be integrated with remotely sensed imagery in a framework of knowledge base construction to improve land cover classification.
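
A minimal sketch of the rule-induction step described above, using scikit-learn's DecisionTreeClassifier as a stand-in for C5.0; the layer names, placeholder samples, and class codes are illustrative assumptions, not the paper's data:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training table: one row per stratified random sample point.
# Columns mimic the GIS layers named in the abstract (units assumed).
feature_names = ["elevation_m", "aspect_deg", "slope_deg",
                 "dist_water_m", "dist_road_m", "pop_density"]
rng = np.random.default_rng(0)
X = rng.random((350, len(feature_names)))   # placeholder samples
y = rng.integers(0, 5, 350)                 # placeholder land-cover codes

# Induce a tree and print it as human-readable production rules,
# which could then gate or refine an unsupervised ISODATA result.
tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10).fit(X, y)
print(export_text(tree, feature_names=feature_names))
```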

Automatic Estimation of Threshold Values for Change Detection of Multi-temporal Remote Sensing Images (다중시기 원격탐사 화상의 변화탐지를 위한 임계치 자동 추정)

  • 박노욱;지광훈;이광재;권병두
    • Korean Journal of Remote Sensing
    • /
    • v.19 no.6
    • /
    • pp.465-478
    • /
    • 2003
  • This paper presents two methods for automatic estimation of threshold values in unsupervised change detection of multi-temporal remote sensing images. The proposed methods consist of two analytical steps. The first step is to compute the parameters of a 3-component Gaussian mixture model from difference or ratio images. The second step is to determine a threshold value using the Bayesian rule for minimum error. The first method, an extended version of Bruzzone and Prieto's method (2000), applies an Expectation-Maximization algorithm to estimate the parameters of the Gaussian mixture model. The second method is based on an iterative thresholding algorithm that successively alternates thresholding and estimation of the model parameters. The effectiveness and applicability of the proposed methods were illustrated by two experiments and one case study, including synthetic data sets and KOMPSAT-1 EOC images. The experiments demonstrate that the proposed methods can effectively estimate the model parameters and that the determined threshold value yields the minimum overall error.
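A minimal sketch of the first idea, assuming a difference image as input: a three-component Gaussian mixture is fitted by EM (via scikit-learn), and the threshold is taken where the highest-mean ("change") component becomes the most probable class, as an approximation of the minimum-error Bayes rule. Function and variable names are illustrative, not from the paper:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_threshold(diff_image, n_components=3):
    """Fit a 3-component Gaussian mixture to difference values and return the
    value where the highest-mean (change) component first becomes the most
    probable class (approximate minimum-error Bayes threshold)."""
    x = diff_image.reshape(-1, 1)
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(x)
    change = np.argmax(gmm.means_.ravel())           # component with largest mean
    grid = np.linspace(x.min(), x.max(), 1000).reshape(-1, 1)
    post = gmm.predict_proba(grid)
    # First grid value at which the change component dominates all others.
    idx = np.argmax(post[:, change] >= post.max(axis=1))
    return float(grid[idx, 0])
```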

Structural Shape Estimation Based on 3D LiDAR Scanning Method for On-site Safety Diagnostic of Plastic Greenhouse (비닐 온실의 현장 안전진단을 위한 3차원 LiDAR 스캔 기법 기반 구조 형상 추정)

  • Seo, Byung-hun;Lee, Sangik;Lee, Jonghyuk;Kim, Dongsu;Kim, Dongwoo;Jo, Yerim;Kim, Yuyong;Lee, Jeongmin;Choi, Won
    • Journal of The Korean Society of Agricultural Engineers
    • /
    • v.66 no.5
    • /
    • pp.1-13
    • /
    • 2024
  • In this study, we applied an on-site diagnostic method for estimating the structural safety of a plastic greenhouse. A three-dimensional light detection and ranging (3D LiDAR) sensor was used to scan the greenhouse and extract point cloud data (PCD). Differential thresholds of the color index were applied to partitions of the raw PCD to separate steel frames from plastic films. Additionally, the K-means algorithm was used to convert the steel frame PCD into the nodes of unit members. These nodes were subsequently transformed into structural shape data. To verify greenhouse shape reproducibility, the member lengths of the scan and blueprint models were compared with measurements along the X-, Y-, and Z-axes. The error of the scan model was 2%-3%, whereas that of the blueprint model was 5.4%. At a maximum snow depth of 0.5 m, the scan model revealed asymmetric horizontal deflection and extreme bending stress, which indicated that even minor shape irregularities could result in critical failures in extreme weather. The safety factor for bending stress in the scan model was 18.7% lower than that in the blueprint model. This indicates that precise shape estimation is crucial for safety diagnostics. Future studies should focus on developing an automated process based on supervised learning to ensure the widespread adoption of greenhouse safety diagnostics.
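
A minimal sketch of the K-means step in the abstract, assuming the steel-frame point cloud has already been separated and the number of member nodes is known from the frame design (both assumptions, not stated outputs of the paper):

```python
import numpy as np
from sklearn.cluster import KMeans

def frame_points_to_nodes(frame_pcd: np.ndarray, n_nodes: int) -> np.ndarray:
    """Collapse a steel-frame point cloud (N x 3 array of XYZ coordinates)
    into n_nodes representative node positions using K-means centroids."""
    km = KMeans(n_clusters=n_nodes, n_init=10, random_state=0).fit(frame_pcd)
    return km.cluster_centers_   # candidate node coordinates for the structural model
```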

Improving the Retrieval Effectiveness by Incorporating Word Sense Disambiguation Process (정보검색 성능 향상을 위한 단어 중의성 해소 모형에 관한 연구)

  • Chung, Young-Mee;Lee, Yong-Gu
    • Journal of the Korean Society for Information Management
    • /
    • v.22 no.2 s.56
    • /
    • pp.125-145
    • /
    • 2005
  • This paper presents a semantic vector space retrieval model incorporating a word sense disambiguation algorithm in an attempt to improve retrieval effectiveness. Nine Korean homonyms are selected for the sense disambiguation and retrieval experiments. A total of approximately 120,000 news articles comprise the raw test collection, and 18 queries including homonyms as query words are used for the retrieval experiments. A Naive Bayes classifier and the EM algorithm, representing supervised and unsupervised learning algorithms respectively, are used for the disambiguation process. The Naive Bayes classifier achieved 92% disambiguation accuracy, while the clustering performance of the EM algorithm was 67% on average. The retrieval effectiveness of the semantic vector space model incorporating the Naive Bayes classifier was 39.6% precision, an improvement of about 7.4%. However, the retrieval effectiveness of the EM algorithm-based semantic retrieval was 3% lower than the baseline retrieval without disambiguation. It is worth noting that the performances of disambiguation and retrieval depend on the distribution patterns of the homonyms to be disambiguated as well as the characteristics of the queries.
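
A minimal sketch of the supervised disambiguation step, using a bag-of-words Naive Bayes classifier over sense-tagged contexts; the toy English "bank" example and sense labels are illustrative stand-ins for the paper's Korean homonyms:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical sense-tagged contexts for one homonym.
contexts = ["river bank flood water", "bank loan interest account",
            "bank deposit savings", "grassy bank of the stream"]
senses = ["GEO", "FIN", "FIN", "GEO"]

# Bag-of-words features + Naive Bayes sense classifier.
wsd = make_pipeline(CountVectorizer(), MultinomialNB())
wsd.fit(contexts, senses)
print(wsd.predict(["interest rate at the bank"]))   # -> ['FIN'] with this toy data
```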

Performance Comparison of Anomaly Detection Algorithms: in terms of Anomaly Type and Data Properties (이상탐지 알고리즘 성능 비교: 이상치 유형과 데이터 속성 관점에서)

  • Jaeung Kim;Seung Ryul Jeong;Namgyu Kim
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.229-247
    • /
    • 2023
  • With the increasing emphasis on anomaly detection across various fields, diverse anomaly detection algorithms have been developed for various data types and anomaly patterns. However, the performance of anomaly detection algorithms is generally evaluated on publicly available datasets, and the specific performance of each algorithm on anomalies of particular types remains unexplored. Consequently, selecting an appropriate anomaly detection algorithm for a specific analytical context poses challenges. In this paper, we therefore investigate the types of anomalies and various attributes of data, and propose approaches that can assist in selecting an appropriate anomaly detection algorithm based on this understanding. Specifically, this study compares the performance of anomaly detection algorithms for four types of anomalies: local, global, contextual, and clustered anomalies. Through further analysis, the impact of label availability, data quantity, and dimensionality on algorithm performance is examined. Experimental results demonstrate that the most effective algorithm varies depending on the type of anomaly, and certain algorithms exhibit stable performance even in the absence of anomaly-specific information. Furthermore, for some types of anomalies, the performance of unsupervised anomaly detection algorithms was lower than that of supervised and semi-supervised learning algorithms. Lastly, we found that the performance of most algorithms is more strongly influenced by the type of anomaly when the data quantity is relatively scarce or abundant. Additionally, at higher dimensionality, performance was excellent in detecting local and global anomalies, while it was lower for clustered anomaly types.
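
A minimal sketch of the label-availability comparison on synthetic global anomalies, contrasting an unsupervised detector (Isolation Forest) with a supervised baseline; the data generation and the choice of these two algorithms are assumptions for illustration, not the paper's experimental setup:

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data: a dense normal cluster plus a small cluster of global anomalies.
rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(950, 2))
anomal = rng.normal(6, 1, size=(50, 2))
X = np.vstack([normal, anomal])
y = np.r_[np.zeros(950), np.ones(50)]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Unsupervised: Isolation Forest scores computed without labels.
iso = IsolationForest(random_state=0).fit(X_tr)
auc_unsup = roc_auc_score(y_te, -iso.score_samples(X_te))

# Supervised: a classifier trained on the anomaly labels.
clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
auc_sup = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"unsupervised AUC={auc_unsup:.3f}, supervised AUC={auc_sup:.3f}")
```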

Automatic Clustering on Trained Self-organizing Feature Maps via Graph Cuts (그래프 컷을 이용한 학습된 자기 조직화 맵의 자동 군집화)

  • Park, An-Jin;Jung, Kee-Chul
    • Journal of KIISE: Software and Applications
    • /
    • v.35 no.9
    • /
    • pp.572-587
    • /
    • 2008
  • The Self-organizing Feature Map (SOFM), an unsupervised neural network, is a very powerful tool for data clustering and visualization in high-dimensional data sets. Although the SOFM has been applied to many engineering problems, it requires clustering similar weights into one class on the trained SOFM as a post-processing step, which is performed manually in many cases. Traditional clustering algorithms, such as k-means, on the trained SOFM do not yield satisfactory results, especially when clusters have arbitrary shapes. This paper proposes automatic clustering on a trained SOFM, which can deal with arbitrary cluster shapes and be globally optimized by graph cuts. When using graph cuts, the graph must have two additional vertices, called terminals, and the weights between the terminals and the vertices of the graph are generally set based on data obtained manually from users. The proposed method automatically sets these weights based on mode-seeking on a distance matrix. Experimental results demonstrated the effectiveness of the proposed method in texture segmentation. In the experiments, the proposed method improved precision rates compared with traditional clustering algorithms, as it can deal with arbitrary cluster shapes based on graph-theoretic clustering.
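
A minimal sketch of the post-processing step, assuming a SOM codebook has already been trained elsewhere. SpectralClustering is used here as a generic graph-based stand-in for the paper's graph-cut formulation, and it requires the cluster count up front, unlike the automatic mode-seeking approach described above:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def cluster_som_codebook(weights: np.ndarray, n_clusters: int) -> np.ndarray:
    """Group a trained SOM codebook (grid_h x grid_w x dim weight array) into
    n_clusters using a graph-based clustering of the weight vectors."""
    h, w, dim = weights.shape
    flat = weights.reshape(h * w, dim)
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="nearest_neighbors",
                                n_neighbors=8,
                                random_state=0).fit_predict(flat)
    return labels.reshape(h, w)   # cluster label per SOM grid cell
```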

Dimensionality Reduction Methods Analysis of Hyperspectral Imagery for Unsupervised Change Detection of Multi-sensor Images (이종 영상 간의 무감독 변화탐지를 위한 초분광 영상의 차원 축소 방법 분석)

  • PARK, Hong-Lyun;PARK, Wan-Yong;PARK, Hyun-Chun;CHOI, Seok-Keun;CHOI, Jae-Wan;IM, Hon-Ryang
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.22 no.4
    • /
    • pp.1-11
    • /
    • 2019
  • With the development of remote sensing sensor technology, it has become possible to acquire satellite images with various kinds of spectral information. In particular, since a hyperspectral image is composed of continuous, narrow spectral bands, it can be used effectively in various fields such as land cover classification, target detection, and environmental monitoring. Change detection techniques using remote sensing data are generally performed through differences of data with the same dimensions, which makes them difficult to apply to heterogeneous sensors with different dimensions. In this study, we developed a change detection method applicable to a hyperspectral image and a high spatial resolution satellite image with different dimensions, and confirmed the applicability of change detection between heterogeneous images. For the change detection method, the dimensionality of the hyperspectral image was reduced using correlation analysis and principal component analysis, and change vector analysis (CVA) was used as the change detection algorithm. The ROC curve and the AUC were calculated using reference data to evaluate change detection performance. Experimental results show that change detection performance is higher when using an image generated by adequate dimensionality reduction than when using the original hyperspectral image.
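
A minimal sketch of the PCA-plus-CVA idea, assuming co-registered inputs and ignoring the radiometric normalization and the correlation-based band selection that a real heterogeneous-sensor workflow would also need; the resulting magnitude image would then be thresholded and evaluated against reference data:

```python
import numpy as np
from sklearn.decomposition import PCA

def cva_magnitude(hyper_t1: np.ndarray, multi_t2: np.ndarray) -> np.ndarray:
    """Reduce a hyperspectral cube (rows x cols x B bands) to the band count of
    a multispectral image (rows x cols x b), then compute the per-pixel
    change-vector magnitude between the two dates."""
    r, c, B = hyper_t1.shape
    b = multi_t2.shape[2]
    reduced = PCA(n_components=b).fit_transform(hyper_t1.reshape(-1, B))
    diff = reduced.reshape(r, c, b) - multi_t2
    return np.sqrt((diff ** 2).sum(axis=2))   # CVA change magnitude per pixel
```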

PDA Personalized Agent System (PDA용 개인화 에이전트 시스템)

  • 표석진;박영택
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2002.11a
    • /
    • pp.345-352
    • /
    • 2002
  • Because the volume of information directly increases time and communication costs, users of the wireless Internet want a personalized agent that provides services according to their interests, delivers customized information, and predicts information in a knowledge-based manner. This paper builds a PDA personalized agent system for such wireless Internet users. The system combines a profile-based agent engine with a knowledge-based approach that uses user profiles. It monitors the actions a user performs on web pages to identify documents of interest, analyzes the documents obtained through information retrieval, and delivers them grouped by each user's interests. To analyze the monitored documents effectively, Cobweb, an unsupervised clustering machine learning method, is employed; this conceptual clustering groups the retrieved information by the user's areas of interest. The clustering results are fed back into machine learning to generate a profile of the user's documents of interest, and the profile is converted into rules that drive the service. Because these rules are derived from monitoring the user, they are updated periodically. The proposed system targets news documents produced by Internet newspapers and webzines, performs retrieval using user information, classifies the retrieved results, and delivers them to the user via a PDA or mobile phone.
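
A minimal sketch of the document-grouping step, with TF-IDF features and KMeans standing in for Cobweb, which builds a concept hierarchy and has no scikit-learn implementation; the toy documents and cluster count are assumptions for illustration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical monitored news documents to be grouped by user interest.
docs = ["stock market falls on rate fears", "home team wins league final",
        "central bank raises interest rates", "star striker signs new contract"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)   # documents grouped into two interest clusters
```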


Vegetation Cover Type Mapping Over The Korean Peninsula Using Multitemporal AVHRR Data (시계열(時系列) AVHRR 위성자료(衛星資料)를 이용한 한반도 식생분포(植生分布) 구분(區分))

  • Lee, Kyu-Sung
    • Journal of Korean Society of Forest Science
    • /
    • v.83 no.4
    • /
    • pp.441-449
    • /
    • 1994
  • The two reflective channels (red and near-infrared spectrum) of advanced very high resolution radiometer (AVHRR) data were used to classify primary vegetation cover types in the Korean Peninsula. From the NOAA-11 satellite data archive of 1991, 27 daytime scenes with relatively minimal cloud coverage were obtained. After initial radiometric calibration, the normalized difference vegetation index (NDVI) was calculated for each of the 27 data sets. Four or five daily NDVI data were then overlaid for each of six months from February to November, and the maximum NDVI value was retained at every pixel location to make a monthly composite. The six bands of monthly NDVI composites were nearly cloud free and were used for the computer classification of vegetation cover. Based on the temporal signatures of different vegetation cover types, which were generated by an unsupervised block clustering algorithm, every pixel was classified into one of six cover type categories. The classification result was evaluated by both qualitative interpretation and quantitative comparison with existing forest statistics. Considering frequent data acquisition, low data cost and volume, and large area coverage, AVHRR data are believed to be effective for vegetation cover type mapping at the regional scale.
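
A minimal sketch of the NDVI and maximum-value compositing steps, assuming the red and near-infrared channels are already calibrated reflectance arrays and the daily NDVI layers for one month are stacked along the first axis:

```python
import numpy as np

def ndvi(red: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Normalized difference vegetation index from the two reflective channels."""
    return (nir - red) / np.clip(nir + red, 1e-6, None)

def monthly_composite(daily_ndvi: np.ndarray) -> np.ndarray:
    """Maximum-value composite: daily_ndvi has shape (days, rows, cols);
    keeping the per-pixel maximum suppresses cloud-contaminated values."""
    return daily_ndvi.max(axis=0)
```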


Probabilistic reduced K-means cluster analysis (확률적 reduced K-means 군집분석)

  • Lee, Seunghoon;Song, Juwon
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.6
    • /
    • pp.905-922
    • /
    • 2021
  • Cluster analysis is an unsupervised learning technique used for discovering clusters when there is no prior knowledge of group membership. K-means, one of the commonly used cluster analysis techniques, may fail when the number of variables becomes large. In such high-dimensional cases, it is common to perform tandem analysis: K-means cluster analysis after reducing the number of variables using dimension reduction methods. However, there is no guarantee that the reduced dimension reveals the cluster structure properly. Principal component analysis may mask the structure of clusters, especially when variables unrelated to the cluster structure have large variances. To overcome this, techniques that perform dimension reduction and cluster analysis simultaneously have been suggested. This study proposes probabilistic reduced K-means, a transition of reduced K-means (De Soete and Carroll, 1994) into a probabilistic framework. Simulations show that the proposed method performs better than tandem clustering or clustering without any dimension reduction. When the number of variables is larger than the number of samples in each cluster, probabilistic reduced K-means shows better formation of clusters than non-probabilistic reduced K-means. In an application to a real data set, it revealed similar or better cluster structure compared to other methods.
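
A minimal sketch of the non-probabilistic reduced K-means idea that the proposal builds on (not the probabilistic version itself): instead of running PCA and K-means in tandem, the subspace and the clustering are fitted alternately. The alternating scheme and variable names are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def reduced_kmeans(X, n_clusters, n_components, n_iter=20, seed=0):
    """Alternating sketch of (non-probabilistic) reduced K-means: cluster in a
    q-dimensional subspace, then refit the orthonormal loading matrix A to the
    current cluster structure, and repeat."""
    X = X - X.mean(axis=0)
    # Initialize the loadings with the leading principal components.
    A = np.linalg.svd(X, full_matrices=False)[2][:n_components].T
    for _ in range(n_iter):
        scores = X @ A                                    # project onto subspace
        km = KMeans(n_clusters, n_init=10, random_state=seed).fit(scores)
        M = km.cluster_centers_[km.labels_]               # reduced-space centroids
        U, _, Vt = np.linalg.svd(X.T @ M, full_matrices=False)
        A = U @ Vt                                        # orthonormal update of A
    return km.labels_, A
```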