• Title/Summary/Keyword: 군집 수 최적화 (optimizing the number of clusters)

Search Result 128, Processing Time 0.032 seconds

The Optimization of Near Duplicate Detection Using Representative Unigram Grouping (대표 Unigram 군집화를 통한 유사중복문서 검출 최적화)

  • Kwon, Young-Hyun;Yun, Do-Hyun;Ahn, Young-Min
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.291-293 / 2012
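  • As the use of SNS and blogs grows, documents are frequently copied and reproduced, and near-duplicate detection in large document collections has become a major issue. To address this problem for Korean documents, this paper proposes a method that quickly detects near-duplicate documents in a document set while maintaining quality. The proposed algorithm clusters documents by the high-frequency unigram tokens that represent each document, thereby minimizing the number of comparisons. Experiments on 760,000 documents showed that duplicates can be detected within a dozen or so seconds while maintaining an average recall of 0.88 relative to the existing method. We expect the approach to be applicable to large-scale search systems and to near-duplicate detection in large image and video collections.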
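
The grouping idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the choice of `k` top tokens as the group key, the Jaccard measure, and the threshold are all assumptions made for illustration (Korean text would normally need a morphological analyzer).

```python
from collections import Counter, defaultdict
from itertools import combinations


def representative_key(tokens, k=3):
    """Return the k most frequent unigram tokens as a sorted tuple (the group key)."""
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically, so ties are deterministic.
    top = sorted(counts, key=lambda t: (-counts[t], t))[:k]
    return tuple(sorted(top))


def jaccard(a, b):
    """Jaccard similarity between two token collections (assumed similarity measure)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0


def near_duplicates(docs, k=3, threshold=0.8):
    """Group documents by representative key, then compare only within each group."""
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        tokens = text.lower().split()  # assumption: whitespace tokenization
        groups[representative_key(tokens, k)].append((doc_id, tokens))
    pairs = []
    for members in groups.values():
        for (id_a, tok_a), (id_b, tok_b) in combinations(members, 2):
            if jaccard(tok_a, tok_b) >= threshold:
                pairs.append((id_a, id_b))
    return sorted(pairs)
```

Because pairwise comparison happens only inside a group, documents whose representative tokens differ are never compared, which is where the reduction in comparisons comes from.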

Design and Implementation of Distributed In-Memory DBMS-based Parallel K-Means as In-database Analytics Function (분산 인 메모리 DBMS 기반 병렬 K-Means의 In-database 분석 함수로의 설계와 구현)

  • Kou, Heymo;Nam, Changmin;Lee, Woohyun;Lee, Yongjae;Kim, HyoungJoo
    • KIISE Transactions on Computing Practices / v.24 no.3 / pp.105-112 / 2018
  • As data sizes increase, a single database is no longer enough to handle the current volume of tasks. Because data is partitioned and stored across multiple databases, analysis should also support parallelism to increase efficiency. However, traditional analysis requires data to be transferred out of the database to the nodes where the analytic service runs, and the user is required to know both the database and the analytic framework. In this paper, we propose an efficient way to perform the K-means clustering algorithm inside a distributed column-based database and a relational database. We also suggest an efficient way to optimize the K-means algorithm within a relational database.
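
One common way to run K-means over partitioned data without moving the rows is to have each partition return only per-centroid sums and counts. The sketch below shows that pattern in plain Python. It is an assumption about the general technique, not the in-database function described in the paper, and the names `partial_stats`, `merge_and_update`, and `distributed_kmeans` are hypothetical.

```python
def partial_stats(points, centroids):
    """Per-partition step: assign each point to its nearest centroid and
    return per-centroid coordinate sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(
            range(len(centroids)),
            key=lambda j: sum((p[d] - centroids[j][d]) ** 2 for d in range(dim)),
        )
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def merge_and_update(partials, centroids):
    """Coordinator step: combine the partial statistics into new centroids."""
    dim = len(centroids[0])
    new_centroids = []
    for j, old in enumerate(centroids):
        total = sum(counts[j] for _, counts in partials)
        if total == 0:
            new_centroids.append(list(old))  # keep the centroid of an empty cluster
            continue
        new_centroids.append(
            [sum(sums[j][d] for sums, _ in partials) / total for d in range(dim)]
        )
    return new_centroids


def distributed_kmeans(partitions, centroids, iterations=10):
    """Run a fixed number of K-means iterations over partitioned data."""
    for _ in range(iterations):
        # In a real system each call would run where its partition is stored.
        partials = [partial_stats(part, centroids) for part in partitions]
        centroids = merge_and_update(partials, centroids)
    return centroids
```

Only `k` sums and `k` counts leave each partition per iteration, so the traffic does not grow with the number of rows.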

A Mesh Partitioning Using Adaptive Vertex Clustering (적응형 정점 군집화를 이용한 메쉬 분할)

  • Kim, Dae-Young;Kim, Jong-Won;Lee, Hae-Young
    • Journal of the Korea Computer Graphics Society / v.15 no.3 / pp.19-26 / 2009
  • In this paper, a new adaptive vertex clustering method using a KD-tree is presented for 3D mesh partitioning. Vertex clustering is used to divide a huge 3D mesh into several partitions for various kinds of mesh processing. Octree-based clustering and K-means clustering are currently the leading techniques. However, octree-based methods divide space uniformly, so the partitioned meshes differ in size and contain non-uniform numbers of vertices. K-means clustering produces uniformly partitioned meshes but is slow because of its many iterations and optimizations. Therefore, we propose using a KD-tree to efficiently partition meshes into parts with a uniform number of vertices. The bounding box of the given mesh is adaptively subdivided according to the number of vertices it contains, along a dynamically determined axis. As a result, the partitioned meshes are compact and have uniformly distributed vertices.
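
The KD-tree subdivision can be sketched as a recursive median split. The abstract does not say how the axis is chosen, so splitting along the longest bounding-box axis is an assumption here, and `kd_partition` is a hypothetical name.

```python
def kd_partition(vertices, max_vertices):
    """Recursively split vertices at the median of the longest bounding-box axis
    until every partition holds at most max_vertices vertices."""
    if max_vertices < 1:
        raise ValueError("max_vertices must be at least 1")
    if len(vertices) <= max_vertices:
        return [vertices]
    dim = len(vertices[0])
    # Assumption: the split axis is the one along which the bounding box is longest.
    extents = [
        max(v[d] for v in vertices) - min(v[d] for v in vertices) for d in range(dim)
    ]
    axis = extents.index(max(extents))
    ordered = sorted(vertices, key=lambda v: v[axis])
    mid = len(ordered) // 2  # median split keeps the two halves balanced
    return kd_partition(ordered[:mid], max_vertices) + kd_partition(
        ordered[mid:], max_vertices
    )
```

Because each split is at the median, sibling partitions differ by at most one vertex, in contrast to a uniform spatial subdivision such as an octree.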

Relationship between Phytoplankton Community and Water Quality in Lakes in Jeonnam using SOM (SOM을 이용한 전남 호소의 식물플랑크톤 군집과 수질 관계 분석)

  • Cho, Hyeon Jin;Na, Jeong Eun;Jung, Myoung Hwa;Lee, Hak Young
    • Korean Journal of Ecology and Environment / v.50 no.1 / pp.148-156 / 2017
  • In this study, we analyzed the relationship between the phytoplankton community and physicochemical factors in 12 lakes located in Jeollanam-do, based on data surveyed from March to November 2014. In total, 297 species of phytoplankton were identified, including 98 Bacillariophyceae, 148 Chlorophyceae, 23 Cyanophyceae and 28 other phytoplankton taxa. The standing crops ranged from 124 to $59{,}148\ \mathrm{cells\;mL^{-1}}$ and were highest in August, when Cyanophycean cells increased. The self-organizing map (SOM) was optimized to a $9 \times 6$ grid and was classified into 5 clusters based on the similarity of environmental factors and phytoplankton indices. The SOM results showed that phytoplankton communities had a positive relationship with water temperature, SS, DO, BOD, TP and Chl-a, but only a weak relationship with pH, TN, $\mathrm{NH_3}$-N, $\mathrm{NO_3}$-N, $\mathrm{PO_4}$-P and conductivity. Pearson's correlation coefficients between environmental factors and phytoplankton communities gave results similar to those of the SOM.

Extracting Typical Group Preferences through User-Item Optimization and User Profiles in Collaborative Filtering System (사용자-상품 행렬의 최적화와 협력적 사용자 프로파일을 이용한 그룹의 대표 선호도 추출)

  • Ko Su-Jeong
    • Journal of KIISE:Software and Applications / v.32 no.7 / pp.581-591 / 2005
  • Collaborative filtering systems have problems involving sparsity and the provision of recommendations by making correlations between only two users' preferences. These systems recommend items based only on preferences, without taking into account the contents of the items. As a result, the accuracy of recommendations depends on the data from user-rated items. When users rate items, it cannot be expected that all users do so earnestly, and this brings down the accuracy of recommendations. This paper proposes a collaborative recommendation method for extracting typical group preferences using user-item matrix optimization and user profiles in collaborative filtering systems. The method excludes unproven users by using entropy based on data from user-rated items, groups users into clusters after generating user profiles, and then extracts typical group preferences. The proposed method generates collaborative user profiles by using association word mining to reflect the contents of items as well as preferences for them, and groups users into clusters based on the profiles by using the vector space model and the K-means algorithm. To compensate for the shortcoming of providing recommendations using correlations between only two users' preferences, the proposed method extracts typical preferences of groups using entropy theory. The typical preferences are extracted by combining user entropies with item preferences. The recommender system using typical group preferences solves the problem caused by recommendations based on preferences rated incorrectly by users and reduces the time needed to retrieve the most similar users in groups.

Efficient Uncertainty Analysis of TOPMODEL Using Particle Swarm Optimization (입자군집최적화 알고리듬을 이용한 효율적인 TOPMODEL의 불확실도 분석)

  • Cho, Huidae;Kim, Dongkyun;Lee, Kanghee
    • Journal of Korea Water Resources Association / v.47 no.3 / pp.285-295 / 2014
  • We applied the ISPSO-GLUE method, which integrates Isolated-Speciation-based Particle Swarm Optimization (ISPSO) with the Generalized Likelihood Uncertainty Estimation (GLUE) method, to the uncertainty analysis of the Topography Model (TOPMODEL) and compared its performance with that of the GLUE method. When we performed the same number of model runs for both methods, we were able to identify the point where the performance of ISPSO-GLUE exceeded that of GLUE, after which ISPSO-GLUE kept improving steadily while GLUE did not. When we compared the 95% uncertainty bounds of the two methods, their general shapes and trends were very similar, but those of ISPSO-GLUE enclosed about 5.4 times more observed values than those of GLUE did. This means that ISPSO-GLUE requires far fewer parameter samples to generate better-performing uncertainty bounds. Compared to ISPSO-GLUE, GLUE overestimated uncertainty in the recession limb following the maximum peak streamflow. For this recession period, GLUE needs to find more behavioral models to reduce the uncertainty. ISPSO-GLUE can be a promising alternative to GLUE because its uncertainty bounds were quantitatively superior to those of GLUE, and computationally expensive hydrologic models in particular are expected to benefit greatly from this feature.

Comparative Study of Optimization Algorithms for Designing Optimal Aperiodic Optical Phased Arrays for Minimal Side-lobe Levels (비주기적 광위상배열에서 Side-lobe Level이 최소화된 구조 설계를 위한 최적화 알고리즘의 비교 연구)

  • Lee, Bohae;Ryu, Han-Youl
    • Korean Journal of Optics and Photonics / v.33 no.1 / pp.11-21 / 2022
  • We have investigated the optimal design of an aperiodic optical phased array (OPA) for use in light detection and ranging applications. Three optimization algorithms - particle-swarm optimization (PSO), a genetic algorithm (GA), and a pattern-search algorithm (PSA) - were employed to obtain the optimal arrangement of optical antennas comprising an OPA. The optimization was performed to obtain the minimal side-lobe level (SLL) of an aperiodic OPA at each steering angle, using the three optimization algorithms. It was found that PSO and GA exhibited similar results for the SLL of the optimized OPA, while the SLL obtained by PSA showed somewhat different features from those obtained by PSO and GA. For an OPA optimized at a steering angle <45°, the SLL value averaged over all steering angles increased as the angle of optimization decreased. However, when the angle of optimization was larger than 45°, low average SLL values of <13 dB were obtained for all three optimization algorithms. This implies that an OPA with high signal quality can be obtained when the arrangement of the optical antennas is optimized at a large steering angle.

Structural Design of Radial Basis Function-based Polynomial Neural Networks by Using Multiobjective Particle Swarm Optimization (다중목적 입자군집 최적화 알고리즘을 이용한 방사형 기저 함수 기반 다항식 신경회로망 구조 설계)

  • Kim, Wook-Dong;Oh, Sung-Kwun
    • Proceedings of the KIEE Conference / 2011.07a / pp.1966-1967 / 2011
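  • In this study, we propose a polynomial neural network (PNN) classifier that uses radial basis functions. With the PNN as its basic structure, the proposed model replaces the polynomial nodes of the first layer with multi-output radial basis functions, so that each node forms a radial basis function neural network (RBFNN). In the hidden layer of the RBFNN, fuzzy clustering is used to obtain fitness values that reflect the characteristics of the input data. Because the number of input variables and the polynomial order determine the performance of the proposed classifier, optimization is required. In this paper we use Multiobjective Particle Swarm Optimization (MoPSO) to account not only for model performance but also for model complexity and interpretability. The Iris data set was used to evaluate the proposed model as a pattern classifier.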

Generating Adaptive Fuzzy Classification Rules using An Efficient Evolutionary Algorithm (효율적인 진화알고리즘을 이용한 적응형 퍼지 분류 규칙 생성)

  • Ryu, Joung-Woo;Kim, Sung-Eun;Kim, Myung-Won
    • Proceedings of the Korean Information Science Society Conference / 2005.11b / pp.769-771 / 2005
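  • When data features are continuous and ambiguous, expressing classification rules as fuzzy rules is very useful and effective. In general, however, it is difficult to determine membership functions for imprecise data features. This paper proposes a method that uses an evolutionary algorithm to automatically generate effective fuzzy classification rules. The proposed method uses the evolutionary algorithm to generate membership functions optimized for both the accuracy and the comprehensibility of the rules. First, initial membership functions for the evolution are generated by supervised clustering. Evolutionary algorithms are effective at finding globally optimal solutions, but they are inefficient in terms of time. In model optimization problems in particular, the individual evaluation step takes a long time. We therefore propose an evolutionary method that reduces running time by dividing the full data set into several subsets and evaluating each individual on a randomly selected subset each time, rather than on the full data. To verify the validity of the proposed fuzzy classification rule generation method, we used data sets provided by UCI as experimental data, and the results confirmed that it is on average more effective than existing methods.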
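
The time-saving step, scoring each individual on a random subset instead of the full data, can be sketched as follows. This is an illustration only: the interleaved split, the function names, and the toy threshold rule that stands in for a fuzzy rule set are assumptions, not details from the paper.

```python
import random


def split_into_subsets(data, n_subsets):
    """Divide the data into n_subsets interleaved, roughly equal parts."""
    return [data[i::n_subsets] for i in range(n_subsets)]


def threshold_accuracy(threshold, samples):
    """Toy fitness: accuracy of the rule 'x >= threshold means class 1'."""
    hits = sum((x >= threshold) == bool(label) for x, label in samples)
    return hits / len(samples)


def evaluate_population(population, subsets, fitness, rng):
    """Score every individual on one randomly chosen subset, not the full data."""
    return [fitness(individual, rng.choice(subsets)) for individual in population]
```

With `n_subsets` parts, each evaluation touches roughly `1 / n_subsets` of the data, at the cost of a noisier fitness estimate.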

Effective Design of Pixel-type Frequency Selective Surfaces using an Improved Binary Particle Swarm Optimization Algorithm (개선된 이진 입자 군집 최적화 알고리즘을 적용한 픽셀 형태 주파수 선택적 표면의 효율적인 설계방안 연구)

  • Yang, Dae-Do;Park, Chan-Sun;Yook, Jong-Gwan
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.30 no.4 / pp.261-269 / 2019
  • This study investigates a method for flexibly designing pixel-type frequency selective surfaces (FSS) while considering factors such as polarization and incident angle. Among the various methods used to solve the discrete space problem when designing a pixel-type FSS, the binary particle swarm optimization (BPSO) algorithm is one of the most applicable techniques for determining the periodic structure pattern of an FSS. Therefore, a method of efficiently designing an FSS with roll-off band-pass characteristics using an improved BPSO algorithm is proposed. To address the convergence problem in designing a fitness function that guides particles toward the desired solution, a fitness function that takes "slope" as an input parameter is applied, which makes it easy to obtain an FSS with the desired roll-off characteristics.
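
For orientation, the sketch below shows standard binary PSO with a sigmoid transfer function, where each bit could stand for one pixel of the pattern. It is not the improved BPSO of the paper, and it does not model the "slope" fitness function. The parameter values and the name `binary_pso` are assumptions.

```python
import math
import random


def binary_pso(fitness, n_bits, n_particles=20, iterations=50, seed=0,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """Maximize fitness over bit lists with standard binary PSO."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(n_bits):
                # Velocity update: inertia plus pulls toward personal and global bests.
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-v_max, min(v_max, v))
                # The sigmoid maps the velocity to the probability that the bit is 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            fit = fitness(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit
```

In an FSS setting the `fitness` callable would wrap an electromagnetic simulation of the pixel pattern, which is far more expensive than the search loop itself.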