Search | Korea Science

A Study on Variable Selection Bias in Data Mining Software Packages (데이터마이닝 패키지에서 변수선택 편의에 관한 연구)

송문섭;윤영주
- The Korean Journal of Applied Statistics / v.14 no.2 / pp.475-486 / 2001
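This study compares the variable selection methods of the classification tree algorithms CART, CHAID, QUEST, and C4.5 as implemented in data mining packages. It is well known that the exhaustive search used by CART is biased; here, the bias and selection power of these algorithms in commercial packages are compared through a simulation study. The commercial packages used were CART, Enterprise Miner, AnswerTree, and Clementine. According to the limited simulation results of this paper, both C4.5 and CART show serious bias in variable selection, whereas CHAID and QUEST give relatively stable results.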

Real-Time Optical Flow Rendering (실시간 영상 생성을 위한 광학 흐름 요소 렌더링)

Park, Tae-Joon;Lee, Seungyong;Shin, Sung Yong
- Journal of the Korea Computer Graphics Society / v.4 no.2 / pp.15-28 / 1998
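Optical flow rendering was recently proposed as a new approach to image-based rendering. The method can generate high-quality images regardless of errors arising in stereo matching and, by comparing depth information, can composite images produced by conventional rendering with images generated from optical flow elements. However, because it needs one or more optical flow elements per pixel, the computational cost is high and image generation is slow. This paper proposes a way of constructing optical flow elements and a way of generating images from them in real time. Each optical flow element is made to correspond to an interval of pixels in the image, which reduces the total number of elements, and a filtering search is applied so that only the elements actually used for image generation are searched rather than all of them, which greatly reduces the total computation. Implemented on an SGI Indigo2 Impact workstation (R10000 CPU; 128 Mbytes), the proposed method generated images at more than 10 frames per second.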

Fast Codebook Search Method using Triangle Inequality for Vector Quantization (백터 양자화를 위한 삼각 부등식을 이용하는 빠른 코드북 탐색법)

김성재;안철웅;김승호
- Proceedings of the Korean Information Science Society Conference / 1998.10c / pp.526-528 / 1998
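Image data generally carries a large amount of information, which causes problems of storage space and transmission time. Image compression is used to address these problems, and vector quantization is one such technique. Vector quantization achieves a high compression ratio but is slow; within the total processing time, the largest share is spent searching the given codebook for the codevector that corresponds to each image block. This paper proposes a fast codebook search method based on the triangle inequality, which speeds up vector quantization by reducing the time spent on codebook search. The proposed method uses a lower bound derived from the triangle inequality as a criterion for skipping unnecessary computations, thereby increasing search speed. To evaluate the proposed method, 100 grayscale images of 256$\times$256 pixels with 256 levels were used, and the method more than doubled the search speed compared with the conventional full search.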
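The abstract does not give the exact bound it uses, so the following is only a minimal sketch of one common way to prune a codebook search with the triangle inequality. It assumes Euclidean distance and a single reference vector `ref` (a hypothetical choice, here left to the caller): since d(x, c) >= |d(x, ref) - d(c, ref)|, any codevector whose lower bound is no smaller than the best distance found so far can be skipped without computing its full distance. The function names are illustrative and are not taken from the paper.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def build_ref_distances(codebook, ref):
    """Precompute d(c_i, ref) once per codebook (assumed reference vector)."""
    return [dist(c, ref) for c in codebook]


def nearest_codevector(x, codebook, ref, ref_dists):
    """Return (index, distance, number_of_full_distance_computations)."""
    d_x_ref = dist(x, ref)
    best_i, best_d, computed = -1, math.inf, 0
    for i, c in enumerate(codebook):
        # Lower bound from the triangle inequality:
        # d(x, c) >= |d(x, ref) - d(c, ref)|.
        if abs(d_x_ref - ref_dists[i]) >= best_d:
            continue  # this codevector cannot beat the current best
        d = dist(x, c)
        computed += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, computed


def full_search(x, codebook):
    """Baseline: index of the nearest codevector by exhaustive search."""
    return min(range(len(codebook)), key=lambda i: dist(x, codebook[i]))
```

The pruned search returns the same codevector as the exhaustive search; only the number of full distance computations changes, and how much is saved depends on the data and on the choice of reference vector.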

The extended longest match strategy for efficient Korean analysis (효율적인 한국어 분석을 위한 확장된 최장일치법)

Lee, Gi-O;Lee, Keun-Yong;Lee, Yong-Seok
- Annual Conference on Human and Language Technology / 1996.10a / pp.255-261 / 1996
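Among Korean morphological analysis methods, the longest-match method is the best suited to analyzing Korean in one pass, as is done for English. However, the longest-match method generates a very large number of analysis candidates, and the resulting number of searches degrades system performance. In addition, most Korean morphological analyzers focus only on the morphemes themselves and do not consider the performance of the overall Korean analysis system, so their output is poorly suited as input to a parser. This paper proposes an extended longest-match method that integrates the lemma-restoration rules of morphological analysis with dictionary lookup, reducing the number of searches for over-analyzed candidates, and that supplies input suited to the parser so as to improve the performance of the whole system.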
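For orientation, the basic (unextended) longest-match idea can be sketched as a greedy dictionary segmenter: at each position, take the longest dictionary entry that matches. This is only an illustration of plain longest match under simplifying assumptions; it does not include the lemma-restoration rules or parser-oriented output that the paper's extended method adds, and the toy dictionary is hypothetical.

```python
def longest_match(text, dictionary):
    """Greedy left-to-right segmentation using the longest dictionary entry.

    Characters that start no dictionary entry are emitted as single tokens.
    """
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        # Try the longest possible candidate first, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + n]
            if candidate in dictionary:
                tokens.append(candidate)
                i += n
                break
        else:
            tokens.append(text[i])  # unknown character
            i += 1
    return tokens


# Toy dictionary (hypothetical): "학교" (school), "에서" (at/from), and prefixes.
toy_dictionary = {"학", "학교", "에", "에서"}
print(longest_match("학교에서", toy_dictionary))
```

Each position may trigger several dictionary lookups before a match is found, which is the search cost the abstract refers to.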

Parameter Calibrations of a Daily Rainfall-Runoff Model Using Global Optimization Methods (전역최적화 기법을 이용한 강우-유출모형의 매개변수 자동보정)

Kang, Min-Goo;Park, Seung-Woo;Im, Sang-Jun;Kim, Hyun-Jun
- Journal of Korea Water Resources Association / v.35 no.5 / pp.541-552 / 2002
Two global optimization methods, the SCE-UA method and the Annealing-Simplex (A-S) method, were compared with the Downhill Simplex method for calibrating a daily rainfall-runoff model, the Tank model. In the synthetic data study, the A-S method achieved a 100% success rate for all objective functions, and the SCE-UA method also consistently obtained good estimates. The Downhill Simplex method converged to the true values only when the initial guess was close to them. In the historical data study, the A-S and SCE-UA methods gave consistently good results regardless of the objective function. An objective function that puts more weight on low flows was also developed.
https://doi.org/10.3741/JKWRA.2002.35.5.541

Fast Block Matching Motion Estimation based on Multiple Local Search Using Spatial Temporal Correlation (시공간적 상관성을 이용한 국소 다중 탐색기반 고속 블록정합 움직임 추정)

조영창;남혜영;이태홍
- Journal of Korea Multimedia Society / v.3 no.4 / pp.356-364 / 2000
Block-based fast motion estimation algorithms use a fixed search pattern to reduce the number of search points, and they assume that the error in the mean absolute error space decreases monotonically toward the global minimum. Therefore, when a search region contains many local minima, they are likely to find a local minimum instead of the global minimum, and the result depends heavily on the initial search points. This situation is evident at motion boundaries. In this paper we define candidate regions within the search region using the motion information of neighboring blocks, and we propose the multiple local search method (MLSM), which searches for the solution throughout the candidate regions to reduce the chance of being trapped in a local minimum. In the MLSM we mark the candidate regions in a search point map and skip candidate regions that have already been visited, which reduces computation. In simulations the proposed method gives better results than other gradient-based methods, especially at motion boundaries. In terms of PSNR, the proposed method achieves estimation accuracy similar to that of full search with significantly fewer search points.
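As a point of reference for the comparison with full search, the baseline can be sketched as an exhaustive scan of every displacement in a square window, scored by mean absolute error (MAE). This is only a sketch of the full-search baseline, not of the MLSM itself; the function names, the frame layout (lists of rows), and the window convention are assumptions made for illustration.

```python
import math


def block_mae(cur, ref, bx, by, dx, dy, n):
    """MAE between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total / (n * n)


def full_search(cur, ref, bx, by, n, radius):
    """Try every displacement in [-radius, radius]^2; return ((dx, dy), mae)."""
    h, w = len(ref), len(ref[0])
    best_mv, best_err = (0, 0), math.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip displacements whose block would leave the reference frame.
            if not (0 <= by + dy and by + dy + n <= h
                    and 0 <= bx + dx and bx + dx + n <= w):
                continue
            err = block_mae(cur, ref, bx, by, dx, dy, n)
            if err < best_err:
                best_mv, best_err = (dx, dy), err
    return best_mv, best_err
```

Full search evaluates (2 * radius + 1)^2 candidate points per block, which is the cost that fast methods such as the one described above try to avoid.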

Regression Trees with Unbiased Variable Selection (변수선택 편향이 없는 회귀나무를 만들기 위한 알고리즘)

김진흠;김민호
- The Korean Journal of Applied Statistics / v.17 no.3 / pp.459-473 / 2004
It is well known that the exhaustive search algorithm suggested by Breiman et al. (1984) tends to select variables with relatively many possible splits as the splitting variable. We propose an algorithm that overcomes this variable selection bias and construct unbiased regression trees based on it. The proposed algorithm runs in two steps: it first selects a split variable and then determines a binary split rule based on that variable. Simulation studies were performed to compare the proposed algorithm with the CART (Classification and Regression Tree) of Breiman et al. (1984) in terms of degree of variable selection bias, variable selection power, and MSE (Mean Squared Error). We also illustrate the proposed algorithm with real data sets.
https://doi.org/10.5351/KJAS.2004.17.3.459

Optimization of the Parameter of Neuro-Fuzzy system using Particle Swarm Optimization (PSO를 이용한 뉴로-퍼지 시스템의 파라미터 최적화)

Kim Seung-Seok;Kim Yong-Tae;Kim Ju-Sik;Jeon Byeong-Seok
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2006.05a / pp.168-171 / 2006
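This paper performs parameter identification of a neuro-fuzzy system using Particle Swarm Optimization (PSO). The system is trained using the learning and swarming characteristics of PSO. Like a genetic algorithm, PSO uses random search; by having many individuals search around a single solution cluster, it aims to raise search performance near the optimum and thereby improve the learning performance of the whole model. The usefulness of the proposed technique is demonstrated through simulation.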
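The abstract does not specify the PSO variant or its settings, so the following is only a generic sketch of the standard global-best PSO update applied to a toy objective, not the neuro-fuzzy parameter identification itself. The inertia weight `w`, the acceleration coefficients `c1` and `c2`, the swarm size, and the clamping of positions to the bounds are all assumed values chosen for illustration.

```python
import random


def pso(objective, dim, bounds, n_particles=30, iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` with a basic global-best particle swarm."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                 # best position per particle
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position of the swarm

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(p):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(x * x for x in p)


best, best_val = pso(sphere, dim=2, bounds=(-5.0, 5.0))
```

In a setting like the one in the abstract, the objective would instead be the model's training error as a function of its parameters.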

Integration of Integer Programming and Neighborhood Search Algorithm for Solving a Nonlinear Optimization Problem (비선형 최적화 문제의 해결을 위한 정수계획법과 이웃해 탐색 기법의 결합)

Hwang, Jun-Ha
- Journal of the Korea Society of Computer and Information / v.14 no.2 / pp.27-35 / 2009
Integer programming is a very effective technique for finding optimal solutions to combinatorial optimization problems. However, its applicability is limited to linear models. In this paper, I propose an effective method for solving a nonlinear optimization problem by combining the powerful search performance of integer programming with the flexibility of neighborhood search algorithms. In the first phase, integer programming is run on a subproblem of the given problem that can be expressed in linear form. In the second phase, a neighborhood search algorithm is run on the whole problem, taking the result of the first phase as the initial solution. Experiments on a nonlinear maximal covering problem confirmed that this simple integration can produce far better solutions than a neighborhood search algorithm alone. This success is attributed primarily to the powerful performance of integer programming.
https://doi.org/10.9708/jksci.2009.14.2.027

An Index Interpolation-based Subsequence Matching Algorithm supporting Normalization Transform in Time-Series Databases (시계열 데이터베이스에서 인덱스 보간법을 기반으로 정규화 변환을 지원하는 서브시퀀스 매칭 알고리즘)

No, Ung-Gi;Kim, Sang-Uk;Hwang, Gyu-Yeong
- Journal of KIISE:Databases / v.28 no.2 / pp.217-232 / 2001
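This paper proposes a subsequence matching algorithm that supports the normalization transform in time-series databases. The normalization transform is useful for retrieving time-series data whose values follow similar patterns of relative change, regardless of the absolute Euclidean distance between the series. If an existing subsequence matching algorithm is applied directly to normalization-transform subsequence matching without extension, false dismissals occur: some subsequences that should be returned as query results are not found. Moreover, the existing whole matching algorithm that supports the normalization transform must build one index for every possible query sequence length, so the overhead in storage space and in data sequence insertion and deletion is severe. This paper solves these problems using index interpolation. Index interpolation builds indexes for only some of the cases that require one, spaced at suitable intervals, and uses them to search in every case that requires an index. The proposed algorithm builds an index for each of only a few query sequence lengths and then uses these indexes to search for query sequences of every possible length. We prove that no false dismissals occur in this process. At query time, the proposed algorithm selects the most appropriate of the existing indexes according to the length of the given query sequence and searches with it. The more indexes have been built, the better the search performance. By varying the number of indexes as needed, the trade-off between search performance and storage space can be adjusted flexibly. In experiments with indexes built for five lengths among query sequence lengths 256 to 512, the search performance of the proposed algorithm improved over sequential scan by a factor of 2.40 on average when the selectivity was $10^{-2}$ and by a factor of 14.6 on average when the selectivity was $10^{-5}$. Because search performance improves further as the selectivity decreases, the algorithm is judged to be highly useful in real database applications.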
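To illustrate why a normalization transform makes matching independent of absolute Euclidean distance, the sketch below assumes the common form of the transform (subtract the mean, divide by the standard deviation); the abstract itself does not state the formula, so treat that as an assumption. Two sequences that differ only by offset and scale are far apart in raw Euclidean distance but coincide after normalization.

```python
import math


def normalize(seq):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / n)
    if std == 0:
        raise ValueError("constant sequence cannot be normalized")
    return [(v - mean) / std for v in seq]


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


a = [1.0, 2.0, 3.0, 4.0]
b = [105.0, 110.0, 115.0, 120.0]   # same shape as a: b = 5 * a + 100

raw = euclidean(a, b)                              # large: offsets differ
normalized = euclidean(normalize(a), normalize(b)) # close to zero
```

Because the mean and standard deviation depend on the subsequence length, an index built for one length does not directly serve another, which is the difficulty that index interpolation addresses.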

