• Title/Summary/Keyword: 2-Step Clustering


Parallel Processing of K-means Clustering Algorithm for Unsupervised Classification of Large Satellite Imagery (대용량 위성영상의 무감독 분류를 위한 K-means 군집화 알고리즘의 병렬처리)

  • Han, Soohee
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.35 no.3 / pp.187-194 / 2017
  • The present study introduces a method to parallelize the k-means clustering algorithm for fast unsupervised classification of large satellite imagery. As a representative algorithm for unsupervised classification, k-means clustering is usually applied as a preprocessing step before supervised classification, but it can show clear advantages under parallel processing because of its high computational intensity and low need for human intervention. Parallel processing code was developed using multi-threading based on OpenMP. In experiments, a PC with an 8-core CPU was used. A 7-band, 30 m resolution image from LANDSAT 8 OLI and an 8-band, 10 m resolution image from Sentinel-2A were tested. Parallel processing ran six times faster than sequential processing when using 10 classes. To check the consistency of parallel and sequential processing, the cluster centers, the numbers of pixels classified into each class, and the classified images were compared, and the results were identical. The present study is meaningful because it proves that the performance of large satellite image processing can be significantly improved by parallel processing. It also shows that parallel processing is easy to implement with multi-threading based on OpenMP, but must be carefully designed to avoid false sharing.
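The computational structure the abstract describes can be sketched as follows. This is a minimal Python/numpy sketch, not the paper's OpenMP/C implementation: the pixel-to-center assignment step is the embarrassingly parallel part that the paper distributes across threads.

```python
import numpy as np

def assign_labels(pixels, centers):
    # Distance from every pixel vector to every class center; this step
    # dominates the run time and is embarrassingly parallel (the paper
    # distributes it across OpenMP threads, one chunk of pixels each).
    d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(pixels, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign_labels(pixels, centers)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):              # keep the old center if a class empties
                centers[j] = members.mean(axis=0)
    return centers, labels
```

When parallelizing the per-class accumulation, per-thread partial sums (merged afterwards) avoid the false-sharing problem the abstract warns about.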

Segmentation of Cooperatives' Mutuality Bank for Effective Risk Management using Factor Analysis and Cluster Analysis

  • Cho, Yong-Jun;Ko, Seoung-Gon
    • Journal of the Korean Data and Information Science Society / v.19 no.3 / pp.831-844 / 2008
  • Since cooperatives differ widely in management environment and characteristics, it is necessary to group similar cooperatives for effective risk management of the cooperatives' mutuality bank. This paper is a preliminary study toward a guideline for effective risk management of cooperatives with different management strategies. For this purpose, we propose a way to group the members of the cooperatives' mutuality bank. Thirty continuous variables related to the cooperatives' management status are considered, and six factors are extracted from them through factor analysis, with empirical consideration to avoid incorrect grouping and to enhance practical interpretation. Based on the six extracted factors and three additional categorical variables, six representative groups are derived by two-step cluster analysis. These findings are useful for executing discriminatory risk management and other management strategies for a mutuality bank and similar institutions.
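The first stage of the pipeline above, reducing ~30 correlated variables to a few factors, can be sketched in numpy. This uses principal-component extraction from the correlation matrix as a simple stand-in for the paper's factor analysis; the resulting scores (together with the categorical variables) would then feed the two-step cluster analysis.

```python
import numpy as np

def factor_scores(X, n_factors):
    # Standardize, form the correlation matrix, and project onto the
    # leading eigenvectors -- a principal-component stand-in for the
    # factor extraction described in the abstract.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    corr = np.corrcoef(Z, rowvar=False)
    vals, vecs = np.linalg.eigh(corr)            # eigenvalues ascending
    loadings = vecs[:, ::-1][:, :n_factors]      # top factors first
    return Z @ loadings
```

Scores from distinct eigenvectors are mutually uncorrelated, which keeps the subsequent clustering from double-counting one underlying dimension.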


How to Compute the Smallest / Largest Eigenvalue of a Symmetric Matrix

  • Baik, Ran
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.3 no.2 / pp.37-49 / 1999
  • In this paper we develop a general homotopy method, called the Group Homotopy method, to solve the symmetric eigenproblem. The Group Homotopy method overcomes notable drawbacks of the existing homotopy method, namely (i) the possibility of breakdown or a slow rate of convergence in the presence of clustered eigenvalues, and (ii) the absence of any definite criterion for choosing a step size that guarantees convergence. We also obtain good approximations of the largest eigenvalue of a symmetric matrix from the Lanczos algorithm, and apply them as good initial points for the largest eigenproblem of a very large symmetric matrix.
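The Lanczos step mentioned above can be sketched compactly: a short Lanczos run tridiagonalizes the symmetric matrix, and the extreme eigenvalues of the small tridiagonal matrix (the Ritz values) approximate the extreme eigenvalues of the original. A minimal dense-matrix sketch with full reorthogonalization, not the paper's implementation:

```python
import numpy as np

def lanczos_extreme(A, m=30, seed=0):
    # m-step Lanczos tridiagonalization of symmetric A; the extreme
    # eigenvalues of the tridiagonal T approximate those of A.
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    q = rng.normal(size=n); q /= np.linalg.norm(q)
    Q = np.zeros((n, m)); alpha = np.zeros(m); beta = np.zeros(m - 1)
    Q[:, 0] = q
    for j in range(m):
        w = A @ Q[:, j]
        alpha[j] = Q[:, j] @ w
        w -= alpha[j] * Q[:, j]
        if j > 0:
            w -= beta[j - 1] * Q[:, j - 1]
        w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)   # full reorthogonalization
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            if beta[j] < 1e-12:                     # invariant subspace found
                alpha, beta = alpha[:j + 1], beta[:j]
                break
            Q[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    ev = np.linalg.eigvalsh(T)
    return ev[0], ev[-1]    # (smallest, largest) Ritz values
```

For genuinely large sparse matrices, only the matrix-vector product `A @ q` is needed, which is what makes the approach attractive as a source of initial points.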


A Study on a Real Time Freight Delivery Planning for Supply Center based on GIS (GIS기반의 실시간 통합화물운송시스템 계획에 관한 연구)

  • 황흥석;김호균;조규성
    • Korean Management Science Review / v.19 no.2 / pp.75-89 / 2002
  • In the fast-paced environment of information technology and improving customer service, the design of logistics systems increasingly aims at customer-centric service and delivery performance through e-logistics systems. The fundamental design issue in delivery system planning is optimizing the system for minimum cost and maximum throughput and service level. This study concerns the development of an integrated delivery system model with customer-responsive service levels for DCM (Demand Chain Management). We used a two-step approach. First, we formulated supply center facility planning as a stochastic set-covering problem and assigned customers to supply centers using a clustering algorithm. Second, we developed vehicle delivery planning for a supply center based on GIS, GIS-VRP. We also developed a GUI-type computer program for the proposed method using GIS and a Geo-Database of the Busan area. The computational results showed that the proposed method was very effective on a set of test problems.
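The first step, choosing supply centers so every customer is covered, has a classic greedy sketch. This is a deterministic stand-in for the paper's stochastic set-covering formulation, with hypothetical site/customer names:

```python
def greedy_set_cover(customers, candidates):
    # candidates: {site: set of customers reachable within the service level}.
    # Greedy rule: repeatedly open the site covering the most
    # still-uncovered customers.
    uncovered = set(customers)
    opened = []
    while uncovered:
        site = max(candidates, key=lambda s: len(candidates[s] & uncovered))
        if not candidates[site] & uncovered:
            break            # remaining customers unreachable from any site
        opened.append(site)
        uncovered -= candidates[site]
    return opened
```

Customers would then be assigned to their nearest opened center (the clustering step), before routing each center's deliveries with the GIS-VRP component.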

Optimizing Speed for Adaptive Local Thresholding Algorithm Using Dynamic Programming

  • Due Duong Anh;Hong Du Tran Le;Duan Tran Duc
    • Proceedings of the IEEK Conference / summer / pp.438-441 / 2004
  • Image binarization using a global threshold value [3] is fast, but usually produces undesired binary images when the source images are of poor quality. In such cases, adaptive local thresholding algorithms [1][2][3] are used to obtain better results, and the algorithm proposed by A. E. Savakis, which chooses the local threshold using foreground and background clustering [1], is one of the best thresholding algorithms. However, it runs slowly because it recomputes the threshold value for each central pixel in a local MxM window. In this paper, we present a dynamic programming approach to calculating the local threshold value that eliminates many redundant computations and significantly improves execution speed. Experiments show that our proposed improvement runs more than ten times faster than the original algorithm.
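The dynamic-programming idea generalizes to any windowed statistic: a 2-D prefix sum (integral image) makes every MxM window sum available in O(1) from four lookups, so nothing is recomputed per pixel. A minimal sketch using plain local-mean thresholding as a stand-in for Savakis's foreground/background clustering criterion:

```python
import numpy as np

def local_threshold(img, M=3):
    # Integral image S: S[i, j] = sum of the padded image over [0:i, 0:j].
    # Any MxM window sum is then four lookups instead of M*M additions.
    h, w = img.shape
    pad = M // 2
    P = np.pad(img.astype(float), pad, mode="edge")
    S = np.zeros((h + 2 * pad + 1, w + 2 * pad + 1))
    S[1:, 1:] = P.cumsum(axis=0).cumsum(axis=1)
    win = S[M:M + h, M:M + w] - S[:h, M:M + w] - S[M:M + h, :w] + S[:h, :w]
    return img > win / (M * M)      # binarize against the local mean
```

The same prefix-sum trick applied to the squared image also yields local variances in O(1), which is what lets clustering-based criteria be accelerated the same way.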


A Geometric Constraint Solver for Parametric Modeling

  • Jae Yeol Lee;Kwangsoo Kim
    • Korean Journal of Computational Design and Engineering / v.3 no.4 / pp.211-222 / 1998
  • Parametric design is an important modeling paradigm in CAD/CAM applications, enabling efficient design modifications and variations. One of the major issues in parametric design is to develop a geometric constraint solver that can handle a large set of geometric configurations efficiently and robustly. In this paper, we propose a new approach to geometric constraint solving that employs a graph-based method for ruler-and-compass constructible configurations and a numerical method for non-constructible ones, combining the advantages of both. The solving process consists of two phases: 1) a planning phase and 2) an execution phase. In the planning phase, a sequence of construction steps is generated by clustering the constrained geometric entities and reducing the constraint graph step by step. In the execution phase, each construction step is evaluated to determine the geometric entities, using both approaches. By combining the efficiency of the graph-based constructive approach with the universality of the numerical approach, the proposed solver maximizes efficiency, robustness, and extensibility.
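A core test behind the clustering step in graph-based solvers is degrees-of-freedom counting. This is a simplified 2-D counting rule only, not the paper's solver: each point carries 2 DOF, and a rigid cluster retains 3 (x/y translation plus rotation), so a cluster is structurally well-constrained when the constraint count equals 2n - 3.

```python
def constraint_status(n_points, n_constraints):
    # Laman-style structural count for a 2-D cluster of points joined
    # by independent constraints (e.g., distances). It flags candidate
    # rigid clusters during constraint-graph reduction.
    dof = 2 * n_points - 3
    if n_constraints < dof:
        return "under-constrained"
    if n_constraints > dof:
        return "over-constrained"
    return "well-constrained"
```

For example, three points joined by three pairwise distances form a rigid triangle, the canonical ruler-and-compass constructible cluster.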


Shallow P+-n Junction Formation and the Design of Boron Diffusion Simulator (박막 P+-n 접합 형성과 보론 확산 시뮬레이터 설계)

  • 김재영;이충근;김보라;홍신남
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.17 no.7 / pp.708-712 / 2004
  • Shallow $p^+-n$ junctions were formed by ion implantation and dual-step annealing. Dopants were implanted into crystalline substrates using BF$_2$ ions, and annealing was performed with a rapid thermal processor and a furnace. The FA+RTA annealing sequence exhibited better junction characteristics than the RTA+FA thermal cycle in terms of junction depth and sheet resistance. A new simulator was designed to model boron diffusion in silicon. Its model accounts for nonequilibrium diffusion, reactions of point defects and defect-dopant pairs considering their charge states, and dopant inactivation through a boron clustering reaction. Using appropriate initial and boundary conditions, the coupled diffusion equations are solved successfully, and the simulator reproduces experimental data.
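The numerical core of such a simulator is a discretized diffusion equation. A minimal sketch of an explicit finite-difference solve of Fick's second law, dC/dt = D d²C/dx², in arbitrary units; the paper's model couples additional equations for point defects, defect-dopant pairs, and boron clustering on top of this:

```python
import numpy as np

def diffuse_1d(C0, D, dt, dx, steps):
    # Explicit scheme: C_i += r * (C_{i+1} - 2*C_i + C_{i-1}),
    # stable only for r = D*dt/dx^2 <= 1/2.
    r = D * dt / dx ** 2
    assert r <= 0.5, "explicit scheme is unstable for r > 1/2"
    C = np.asarray(C0, float).copy()
    for _ in range(steps):
        C[1:-1] = C[1:-1] + r * (C[2:] - 2 * C[1:-1] + C[:-2])
        C[0], C[-1] = C[1], C[-2]     # zero-flux boundaries
    return C
```

Production simulators use implicit schemes for stiff coupled systems, but the update above shows the basic structure being solved at each time step.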

Segmentation of Multispectral MRI Using Fuzzy Clustering (퍼지 클러스터링을 이용한 다중 스펙트럼 자기공명영상의 분할)

  • 윤옥경;김현순;곽동민;김범수;김동휘;변우목;박길흠
    • Journal of Biomedical Engineering Research / v.21 no.4 / pp.333-338 / 2000
  • In this paper, an automated segmentation algorithm is proposed for MR brain images, using T1-weighted, T2-weighted, and PD images complementarily. The proposed algorithm is composed of three steps. In the first step, cerebrum images are extracted by applying a cerebrum mask to the three input images. In the second step, outstanding clusters representing the inner tissues of the cerebrum are chosen among three-dimensional (3D) clusters, which are determined by intersecting the densely distributed parts of the 2D histograms in the 3D space formed by three optimal-scale images. An optimal-scale image is obtained by applying scale-space filtering to each 2D histogram and searching the graph structure, and it best describes the shape of the densely distributed pixels of the 2D histogram. In the final step, cerebrum images are segmented by the FCM algorithm, initialized with the centroids of the outstanding clusters. The proposed method selects the initial cluster centroids accurately, and segmentation with multispectral analysis yields better results than single-spectrum analysis.
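The final step's algorithm can be sketched generically. Fuzzy C-Means differs from hard k-means in that every sample (voxel) receives a degree of membership in each cluster; the paper seeds it with the outstanding-cluster centroids, while this self-contained sketch starts from random memberships:

```python
import numpy as np

def fcm(X, c, m=2.0, iters=50, seed=0):
    # Standard FCM alternation: fuzzify memberships (U**m), update
    # centroids as membership-weighted means, then refresh memberships
    # from inverse distances.
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)   # standard FCM update
    return centers, U
```

Good initialization matters because FCM only converges to a local optimum, which is exactly why the paper derives initial centroids from the histogram analysis.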


Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • According to the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining studies have focused on applications in the second step. However, with the recognition that the structuring process substantially influences the quality of the results, various embedding methods have been actively studied to preserve the meaning of words and documents when representing text data as vectors. Unlike structured data, which can be used directly in a variety of operations and traditional analysis techniques, unstructured text must first be transformed into a form the computer can process; mapping arbitrary objects into a vector space while maintaining their algebraic properties is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as demand for document embedding grows rapidly, many algorithms have been developed to support it. Among them, doc2Vec, which extends word2Vec and embeds each document into a single vector, is the most widely used. However, traditional document embedding methods represented by doc2Vec generate a vector for each document from the entire text of the document, so the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional embedding schemes usually map each document to a single vector, which makes it difficult to represent a complex document with multiple subjects accurately. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. This study targets documents that explicitly separate body content and keywords; for a document without keywords, the method can be applied after extracting keywords through other analysis methods, although, since that is not the core of the proposal, we present the method for documents with predefined keywords. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. Specifically, all text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to avoid the influence of miscellaneous words, the vectors corresponding to each document's keywords are extracted to form a set of keyword vectors per document. Next, clustering is conducted on each document's keyword set to identify the multiple subjects included in the document. Finally, a multi-vector is generated from the keyword vectors constituting each cluster. Experiments on 3,147 academic papers revealed that the single-vector traditional approach cannot properly map complex documents because of interference among subjects within each vector, whereas the proposed multi-vector method vectorizes complex documents more accurately by eliminating this interference.
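Steps (3)-(5) of the pipeline above can be sketched in miniature: take the word vectors of a document's keywords, cluster them, and emit one vector per subject cluster instead of a single whole-document vector. Plain k-means is used here as the clustering stand-in (the abstract does not prescribe a specific variant), with toy 2-D vectors in place of real word embeddings:

```python
import numpy as np

def multi_vector_embedding(keyword_vecs, n_subjects, iters=20, seed=0):
    # Cluster the document's keyword vectors and return each cluster
    # centroid as one subject-level embedding of the document.
    V = np.asarray(keyword_vecs, dtype=float)
    rng = np.random.default_rng(seed)
    centers = V[rng.choice(len(V), n_subjects, replace=False)].copy()
    for _ in range(iters):
        labels = np.linalg.norm(V[:, None] - centers[None], axis=2).argmin(axis=1)
        for j in range(n_subjects):
            if (labels == j).any():
                centers[j] = V[labels == j].mean(axis=0)
    return centers     # one embedding vector per discovered subject
```

A downstream search or classification task would then compare a query vector against all of a document's subject vectors and take the best match, which is what removes the interference between unrelated subjects.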

Nano Technology Trend Analysis Using Google Trend and Data Mining Method for Nano-Informatics (나노 인포매틱스 기반 구축을 위한 구글 트렌드와 데이터 마이닝 기법을 활용한 나노 기술 트렌드 분석)

  • Shin, Minsoo;Park, Min-Gyu;Bae, Seong-Hun
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.4 / pp.237-245 / 2017
  • Our research aims to predict recent trends and leading technologies for the future, and to provide optimal Nano technology trend information, by analyzing Nano technology trends. In today's global market, users' needs and the technologies that meet them change in real time, and Nano technology likewise needs measures to reduce cost and enhance efficiency in order not to fall behind; trend analysis that uses search data can serve both goals. This research consists of four steps: we collect data and select keywords in step 1, detect trends based on frequency and visualize them in step 2, and perform analysis using data mining in step 3. The research examines changes of trend from three perspectives. First, it analyzes changes of trend in terms of major classifications, the 30 Nano technologies, and the keywords that make up each technology. Second, it provides real-time information: because search data is updated continuously, trend analysis based on it can track the changing market situation as it evolves. Third, through comparative analysis it is possible to establish useful corporate policy and strategy by understanding the trends of the United States, which has relatively advanced Nano technology. Therefore, trend analysis using search data, as in this research, can suggest a proper policy direction that responds to market changes in real time, serve as reference material, and help reduce cost.