Search | Korea Science

An Algorithm for Sequential Sampling Method in Data Mining (데이터 마이닝에서 샘플링 기법을 이용한 연속패턴 알고리듬)

홍지명;김낙현;김성집
- Journal of Korean Society of Industrial and Systems Engineering, v.21 no.45, pp.101-112, 1998
Data mining, also referred to as knowledge discovery in databases, is the nontrivial extraction of implicit, previously unknown, and potentially useful information (such as knowledge rules, constraints, and regularities) from data in databases. The discovered knowledge can be applied to information management, decision making, and many other applications. This paper addresses a new data mining problem, discovering sequential patterns, and proposes finding all sequential patterns with a sampling method. Because databases are growing exponentially and transaction databases are updated frequently, sampling offers a fast way to reduce time and cost while still extracting trends in customer behavior. The method analyzes only a fraction of the database but can, in general, produce highly accurate results. The relaxation factor and the sample size can be adjusted to improve accuracy while minimizing execution time. The advantages of the proposed algorithm are shown by comparing its accuracy and efficiency with those of the AprioriAll algorithm.
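
The idea of testing sequential patterns on a sample with a relaxed support threshold can be sketched as follows. This is a minimal illustration, not the paper's algorithm: the function names are hypothetical, each customer sequence is simplified to a list of single items (AprioriAll works on itemsets), and the relaxation factor is assumed to be a multiplier in (0, 1] applied to the minimum support.

```python
import random


def is_subsequence(pattern, sequence):
    """Return True if the items of `pattern` occur in `sequence` in order."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is enforced.
    return all(item in it for item in pattern)


def sample_support(database, pattern, sample_size, seed=0):
    """Estimate the support of `pattern` from a simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(database, sample_size)
    hits = sum(is_subsequence(pattern, seq) for seq in sample)
    return hits / sample_size


def is_frequent_in_sample(database, pattern, min_support, relaxation,
                          sample_size, seed=0):
    """Accept a pattern if its sampled support clears the relaxed threshold.

    Assumption: `relaxation` in (0, 1] lowers the threshold so that patterns
    that are frequent in the full database are less likely to be missed.
    """
    relaxed_threshold = min_support * relaxation
    support = sample_support(database, pattern, sample_size, seed)
    return support >= relaxed_threshold
```

A smaller relaxation factor lowers the risk of missing truly frequent patterns but admits more false candidates, which is the accuracy versus execution-time trade-off the abstract describes.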

A Range-Based Monte Carlo Box Algorithm for Mobile Nodes Localization in WSNs

Li, Dan;Wen, Xianbin
- KSII Transactions on Internet and Information Systems (TIIS), v.11 no.8, pp.3889-3903, 2017
Fast and accurate localization of randomly deployed nodes is required by many applications in wireless sensor networks (WSNs). However, localizing mobile nodes in WSNs is more difficult than localizing static nodes, since node mobility brings more data. In this paper, we propose a Range-based Monte Carlo Box (RMCB) algorithm, which builds upon the Monte Carlo Localization Boxed (MCB) algorithm to improve localization accuracy. The algorithm uses the Received Signal Strength Indication (RSSI) ranging technique to build a sample box and adds a preset error coefficient in the sampling and filtering phases to increase the sampling success rate and the accuracy of valid samples. Moreover, a simplified Particle Swarm Optimization (sPSO) algorithm is introduced to generate new samples and avoid constantly repeating the sampling and filtering process. Simulation results indicate that the proposed RMCB algorithm reduces the location error by 24%, 14% and 14% on average compared to the MCB, Range-based Monte Carlo Localization (RMCL) and RSSI Motion Prediction MCB (RMMCB) algorithms, respectively, and is suitable for positioning scenarios that require high precision.
https://doi.org/10.3837/tiis.2017.08.007
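
The sample-box and filtering steps described in the abstract can be illustrated roughly as below. This is a sketch under stated assumptions, not the RMCB algorithm itself: the function names are hypothetical, the box is taken as the intersection of squares whose half-width is the estimated anchor distance, and the error coefficient is assumed to be a relative tolerance on that distance. The sPSO resampling step is not shown.

```python
import math
import random


def build_sample_box(anchors, distances):
    """Intersect the axis-aligned squares centred on each anchor.

    Assumption: each square has half-width equal to the ranged distance.
    """
    x_min = max(ax - d for (ax, ay), d in zip(anchors, distances))
    x_max = min(ax + d for (ax, ay), d in zip(anchors, distances))
    y_min = max(ay - d for (ax, ay), d in zip(anchors, distances))
    y_max = min(ay + d for (ax, ay), d in zip(anchors, distances))
    return x_min, x_max, y_min, y_max


def draw_valid_samples(anchors, distances, n_samples, error_coeff=0.2,
                       max_tries=10000, seed=0):
    """Sample uniformly in the box and keep points consistent with all ranges."""
    rng = random.Random(seed)
    x_min, x_max, y_min, y_max = build_sample_box(anchors, distances)
    if x_min > x_max or y_min > y_max:
        return []  # the squares do not overlap, so there is no box
    samples = []
    for _ in range(max_tries):
        if len(samples) == n_samples:
            break
        x = rng.uniform(x_min, x_max)
        y = rng.uniform(y_min, y_max)
        # Filtering: the distance to every anchor must match its ranged
        # distance within the relative tolerance `error_coeff`.
        if all(abs(math.hypot(x - ax, y - ay) - d) <= error_coeff * d
               for (ax, ay), d in zip(anchors, distances)):
            samples.append((x, y))
    return samples
```

A larger tolerance raises the fraction of accepted samples at the cost of looser position estimates, which is the trade-off a preset error coefficient would control.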

A Searching Algorithm for Minimum Bandpass Sampling Frequency in Simultaneous Down-Conversion of Multiple RF Signals

Bae, Jung-Hwa;Park, Jin-Woo
- Journal of Communications and Networks, v.10 no.1, pp.55-62, 2008
Bandpass sampling (BPS) for the direct down-conversion of RF bandpass signals has become an essential technique for software defined radio (SDR) because it minimizes dependence on radio frequency (RF) front-end hardware. This paper proposes an algorithm that finds the minimum BPS frequency for simultaneously down-converting multiple RF signals through full permutation over all the valid sampling ranges found for those signals. We also present a scheme for reducing the computational complexity that results from the large-scale permutation calculation involved in searching for the minimum BPS frequency. In addition, we investigate BPS frequencies that allow for a guard band between adjacent down-converted signals, which helps relax the severe requirements of practical implementations. The performance of the proposed method is compared with that of previously reported methods to demonstrate its effectiveness.
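
For a single real bandpass signal occupying [f_L, f_H], the valid sampling ranges mentioned above follow the well-known condition 2·f_H/n ≤ f_s ≤ 2·f_L/(n−1) for integers 1 ≤ n ≤ ⌊f_H/(f_H−f_L)⌋. The sketch below enumerates those ranges for one signal only; the function names are hypothetical, and the paper's permutation search over several signals and its guard-band handling are not reproduced.

```python
import math


def valid_bps_ranges(f_low, f_high):
    """List (n, fs_min, fs_max) for one bandpass signal in [f_low, f_high]."""
    bandwidth = f_high - f_low
    n_max = math.floor(f_high / bandwidth)
    ranges = []
    for n in range(1, n_max + 1):
        fs_min = 2.0 * f_high / n
        # n = 1 is ordinary Nyquist sampling, which has no upper limit.
        fs_max = math.inf if n == 1 else 2.0 * f_low / (n - 1)
        ranges.append((n, fs_min, fs_max))
    return ranges


def min_bps_frequency(f_low, f_high):
    """Smallest alias-free sampling frequency for a single signal."""
    return min(fs_min for _, fs_min, _ in valid_bps_ranges(f_low, f_high))


# A 5 MHz-wide signal at 20-25 MHz can be sampled at 10 MHz (n = 5).
print(min_bps_frequency(20.0, 25.0))  # 10.0
```

With several signals, a candidate frequency must fall inside a valid range of every signal and must also keep the down-converted bands from overlapping each other, which is why the search space grows combinatorially.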

Support Vector Machine based on Stratified Sampling

Jun, Sung-Hae
- International Journal of Fuzzy Logic and Intelligent Systems, v.9 no.2, pp.141-146, 2009
The support vector machine is a classification algorithm based on statistical learning theory. It has performed well in many data mining applications, but it has some problems. One of them is its heavy computing cost, which makes the support vector machine difficult to use in dynamic and online systems. To overcome this problem, we propose using stratified sampling from statistical sampling theory. Stratified sampling helps reduce the size of the training data. In our approach, classification accuracy is maintained even though the training set is small. We verify the improved performance experimentally using data sets from the UCI Machine Learning Repository.
https://doi.org/10.5391/IJFIS.2009.9.2.141
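
A minimal sketch of the data-reduction step, assuming the strata are simply the class labels and that the same sampling fraction is used in every stratum (the paper may define strata differently). The function name is hypothetical, and the reduced set would then be passed to whatever SVM trainer is in use, which is not shown here.

```python
import random
from collections import defaultdict


def stratified_sample(rows, labels, fraction, seed=0):
    """Draw the same fraction of rows from each class (stratum)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row, label in zip(rows, labels):
        strata[label].append(row)

    sampled_rows, sampled_labels = [], []
    for label, members in strata.items():
        # Keep at least one row per class so that no class disappears.
        k = max(1, round(fraction * len(members)))
        for row in rng.sample(members, k):
            sampled_rows.append(row)
            sampled_labels.append(label)
    return sampled_rows, sampled_labels
```

Because every class keeps its share of the data, the reduced training set preserves the class proportions that a plain random sample of the same size could distort.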

Level Selection Algorithm with Fixed Sampling Frequency for Modular Multilevel Converter (고정 샘플링 주파수에서의 모듈형 멀티레벨 컨버터 레벨 선택 알고리즘)

Kim, Chan-Ki;Park, Chang-Hwan;Kim, Jang-Mok
- The Transactions of the Korean Institute of Power Electronics, v.23 no.6, pp.415-423, 2018
This study presents a level selection algorithm with a fixed sampling frequency for modular multilevel converter (MMC) systems. In theory, the proposed method can increase the number of levels without limit while the sampling time remains the same. The proposed method, called cluster stream buffer (CSB), consists of several clusters, wherein each cluster is composed of 32 submodules that depend on the level of the submodules in the MMC system. To increase the level of the MMC system, additional clusters are used, and the sampling time between clusters is determined from the sampling time between levels needed to utilize all levels of the MMC system. This method is crucial in the control of MMC-type HVDC systems because it improves scalability and precision.
https://doi.org/10.6113/TKPE.2018.23.6.415

An importance sampling for a function of a multivariate random variable

Jae-Yeol Park;Hee-Geon Kang;Sunggon Kim
- Communications for Statistical Applications and Methods, v.31 no.1, pp.65-85, 2024
The tail probability of a function of a multivariate random variable is not easy to estimate by crude Monte Carlo simulation. When the function value rarely exceeds a threshold, accurate estimation of the corresponding probability requires a huge number of samples. When the explicit form of the cumulative distribution function of each component of the variable is known, the inverse transform likelihood ratio method is a directly applicable scheme for estimating the tail probability efficiently. The method is a type of importance sampling, and its efficiency depends on the choice of the importance sampling distribution. When the cumulative distribution of the multivariate random variable is represented by a copula and its marginal distributions, we develop an iterative algorithm to find the optimal importance sampling distribution and show that the algorithm converges. The performance of the proposed scheme is compared numerically with crude Monte Carlo simulation.
https://doi.org/10.29220/CSAM.2024.31.1.065
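
To illustrate why importance sampling helps with rare events, the sketch below estimates P(X > t) for a one-dimensional standard normal X. It is a generic mean-shift importance sampler, not the inverse transform likelihood ratio method or the copula-based iteration from the paper; the function names and the choice of N(t, 1) as the sampling distribution are assumptions made for the example.

```python
import math
import random


def tail_prob_crude(threshold, n, seed=0):
    """Crude Monte Carlo: fraction of N(0, 1) draws above the threshold."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) > threshold for _ in range(n)) / n


def tail_prob_importance(threshold, n, seed=0):
    """Importance sampling with draws from N(threshold, 1).

    Each draw above the threshold is weighted by the likelihood ratio
    phi(x) / phi(x - t) = exp(-t * x + t**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n


def tail_prob_exact(threshold):
    """Closed form for the standard normal tail, used as a reference."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))
```

For a threshold of 4 the exact probability is about 3.2e-5, so a crude run of 20,000 draws will usually see no exceedance or only one, while roughly half of the shifted draws land in the tail and contribute to the weighted estimate.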

Matrix completion based adaptive sampling for measuring network delay with online support

Meng, Wei;Li, Laichun
- KSII Transactions on Internet and Information Systems (TIIS), v.14 no.7, pp.3057-3075, 2020
End-to-end network delay plays a vital role in distributed services and is used to measure QoS (Quality-of-Service). It would be beneficial to know the delay between every pair of nodes, but this is not feasible in practice because active probing causes quadratic growth in overhead. Alternatively, using the measured network delays to estimate the unknown ones is an economical method. In this paper, we adopt state-of-the-art matrix completion technology to better estimate network delay from limited measurements. Although the number of measurements required for exact matrix completion is theoretically bounded, this bound is of limited help in practice. Therefore, we propose an online adaptive sampling algorithm for measuring network delay in which statistical leverage scores are used to select potential matrix elements. The basic principle is to sample the elements with larger leverage scores so as to keep the traits of important rows or columns in the matrix. The number of samples is decided adaptively by a proposed stopping condition. Simulation results based on a real delay matrix show that, compared with the traditional sampling algorithm, the proposed sampling algorithm provides better performance (smaller estimation error and less convergence pressure) at a lower cost (fewer samples and shorter processing time).
https://doi.org/10.3837/tiis.2020.07.018

Global sensitivity analysis improvement of rotor-bearing system based on the Genetic Based Latine Hypercube Sampling (GBLHS) method

Fatehi, Mohammad Reza;Ghanbarzadeh, Afshin;Moradi, Shapour;Hajnayeb, Ali
- Structural Engineering and Mechanics, v.68 no.5, pp.549-561, 2018
The Sobol method is a powerful variance decomposition technique in the field of global sensitivity analysis (GSA). This paper aims to increase the convergence speed of the extracted Sobol indices using a newly proposed sampling technique called genetic based Latin hypercube sampling (GBLHS). This technique is an improved version of restricted Latin hypercube sampling (LHS), and its optimization algorithm is inspired by the genetic algorithm. The new approach optimizes the minimax value of LHS arrays by manipulating array indices as chromosomes in a genetic algorithm. The improved Sobol method is used to perform factor prioritization and factor fixing for an uncertain, comprehensive high-speed rotor-bearing system. The finite element method is employed for rotor-bearing modeling, considering the Eshleman-Eubanks assumption and the effect of axial force on the rotor whirling behavior. The performance of the GBLHS technique is compared with Monte Carlo simulation (MCS), LHS, and optimized LHS (minimax criterion). The comparison demonstrates its capability to increase the convergence speed of the sensitivity indices and to reduce the computational time of the GSA.
https://doi.org/10.12989/sem.2018.68.5.549
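
As background for the sampling technique named above, a plain Latin hypercube sample on the unit cube can be generated as in the sketch below. It shows only the basic stratification; the genetic, minimax-based optimization of the LHS array that defines GBLHS is not reproduced, and the function name is hypothetical.

```python
import random


def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum.

    Each axis is cut into n_samples equal intervals and every interval is
    used exactly once along that axis.
    """
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # One uniform draw inside each interval [i/n, (i+1)/n).
        column = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffling pairs the intervals of different axes at random.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(column[i] for column in columns) for i in range(n_samples)]
```

The random pairing of intervals is the part an optimized LHS variant would replace, for example by searching over permutations of the column indices to improve a space-filling criterion.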

RANDOM SAMPLING AND RECONSTRUCTION OF SIGNALS WITH FINITE RATE OF INNOVATION

Jiang, Yingchun;Zhao, Junjian
- Bulletin of the Korean Mathematical Society, v.59 no.2, pp.285-301, 2022
In this paper, we mainly study the random sampling and reconstruction of signals living in the subspace V^p(𝚽, 𝚲) of L^p(ℝ^d), which is generated by a family of molecules 𝚽 located on a relatively separated subset 𝚲 ⊂ ℝ^d. The space V^p(𝚽, 𝚲) is used to model signals with finite rate of innovation, such as streams of pulses in GPS applications, cellular radio and ultra wide-band communication. The sampling set is independently and randomly drawn from a general probability distribution over ℝ^d. Under some proper conditions for the generators 𝚽 = {𝜙_λ : λ ∈ 𝚲} and the probability density function 𝜌, we first approximate V^p(𝚽, 𝚲) by a finite dimensional subspace V^p_N (𝚽, 𝚲) on any bounded domain. Then, we prove that the random sampling stability holds with high probability for all signals in V^p(𝚽, 𝚲) whose energy concentrates on a cube when the sampling size is large enough. Finally, a reconstruction algorithm based on random samples is given for signals in V^p_N (𝚽, 𝚲).
https://doi.org/10.4134/BKMS.b200916

CHAID Algorithm by Cube-based Sampling

Park, Hee-Chang;Cho, Kwang-Hyun
- Korean Data and Information Science Society: Conference Proceedings (한국데이터정보과학회:학술대회논문집), 2003.10a, pp.239-247, 2003
Decision tree algorithms are used extensively for data mining in many domains, such as retail target marketing, fraud detection, data reduction, and variable screening. CHAID (Chi-square Automatic Interaction Detector) is an exploratory method used to study the relationship between a dependent variable and a series of predictor variables. In this paper, we propose a CHAID algorithm based on cube-based sampling and evaluate its accuracy and speed as a function of the number of variables.
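
CHAID chooses splits by testing the association between a predictor and the dependent variable with a chi-square statistic. The sketch below computes the Pearson chi-square statistic for a contingency table of counts; it illustrates that building block only, with a hypothetical function name, and does not cover category merging, significance adjustment, or the cube-based sampling proposed in the paper. It assumes every row and column total is nonzero.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts.

    `table[i][j]` is the number of records with predictor category i and
    dependent-variable category j.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if predictor and outcome were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic
```

A predictor whose table gives a larger statistic, relative to its degrees of freedom, is a stronger candidate for the next split; computing the table on a sample instead of the full data is where a sampling scheme would reduce cost.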

