• Title/Abstract/Keywords: Number Data

Search results: 22,385 items (processing time: 0.045 s)

The Effect of the Number of Training Data on Speech Recognition

  • Lee, Chang-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • Vol. 28, No. 2E
    • /
    • pp.66-71
    • /
    • 2009
  • In practical applications of speech recognition, one fundamental question is how many training data should be provided for a specific task. Though plenty of training data would undoubtedly enhance system performance, collecting them is costly. It is therefore crucial to determine the smallest number of training data that affords a given level of accuracy. For this purpose, we investigate the effect of the number of training data on speaker-independent recognition of isolated words using FVQ/HMM. The results show that the error rate is roughly inversely proportional to the number of training data and grows linearly with the vocabulary size.

실루엣을 적용한 그룹탐색 최적화 데이터클러스터링 (Group Search Optimization Data Clustering Using Silhouette)

  • 김성수;백준영;강범수
    • 한국경영과학회지
    • /
    • Vol. 42, No. 3
    • /
    • pp.25-34
    • /
    • 2017
  • K-means is a popular and efficient data clustering method that uses only intra-cluster distance as its validity index and requires the number of clusters to be fixed in advance. For unsupervised data, K-means is of little use without a suitable number of clusters. This paper proposes Group Search Optimization (GSO) using Silhouette to find the optimal clustering solution, together with the number of clusters, for unsupervised data. Silhouette can serve as a validity index for deciding both the number of clusters and the optimal solution because it considers intra- and inter-cluster distances simultaneously. The performance of GSO using Silhouette is validated through several experiments and analyses of data sets.
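
The silhouette index mentioned in this abstract can be illustrated with a minimal sketch. It computes the standard mean silhouette coefficient, s(i) = (b - a) / max(a, b), for a given labeling. It does not reproduce the paper's GSO search, and the function name `silhouette_score` and the toy data are assumptions made for illustration.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the members of one cluster,
        # excluding i itself when it belongs to that cluster.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # common convention: s(i) = 0 for singleton clusters
        a = mean_dist(i, own)  # intra-cluster distance
        b = min(mean_dist(i, m) for other, m in clusters.items() if other != lab)
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (hypothetical): two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette_score(points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_score(points, [0, 1, 0, 1])   # groups split across the gap
```

A labeling that respects the natural groups scores close to 1, while one that cuts across them scores below 0, which is why the index can be used to compare candidate numbers of clusters.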

Data-Dependent Choice of Optimal Number of Lags in Variogram Estimation

  • Choi, Seung-Bae;Kang, Chang-Wan;Cho, Jang-Sik
    • 응용통계연구
    • /
    • Vol. 23, No. 3
    • /
    • pp.609-619
    • /
    • 2010
  • Geostatistical data, a type of spatial data, are analyzed in three stages: (1) variogram estimation, (2) model fitting for the estimated variograms, and (3) spatial prediction using the fitted variogram model. It is very important to estimate the variograms properly, as the first stage (i.e., variogram estimation) affects the next two stages. In general, the variogram is estimated with the moment estimator. To estimate the variogram, we have to decide the 'lag increment' or the 'number of lags'. However, there is no established rule for selecting the number of lags in estimating the variogram. The present paper proposes a method of choosing the optimal number of lags based on the PRESS statistic. To show the usefulness of the proposed method, we perform a small simulation study and present an empirical example with air pollution data from Korea.
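
To make the role of the 'number of lags' concrete, here is a minimal sketch of the classical moment (Matheron) variogram estimator, γ(h) = Σ (z_i - z_j)² / (2 |N(h)|), with pairs binned into equal-width lag classes. The PRESS-based selection proposed in the paper is not shown. The function name `empirical_variogram`, the bin convention, and the sample data are assumptions for illustration.

```python
import math


def empirical_variogram(coords, values, n_lags, max_dist):
    """Moment estimator of the semivariogram with n_lags equal-width bins.

    Bin k covers separation distances in (k * width, (k + 1) * width].
    Returns a list of (lag_center, gamma, n_pairs); gamma is None for
    bins that contain no pairs.
    """
    width = max_dist / n_lags
    sums = [0.0] * n_lags
    counts = [0] * n_lags
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(coords[i], coords[j])
            if h == 0 or h > max_dist:
                continue  # skip co-located pairs and pairs beyond the range
            k = min(math.ceil(h / width) - 1, n_lags - 1)
            sums[k] += (values[i] - values[j]) ** 2
            counts[k] += 1
    return [
        ((k + 0.5) * width, sums[k] / (2 * counts[k]) if counts[k] else None, counts[k])
        for k in range(n_lags)
    ]


# Hypothetical 1-D transect with a linear trend in the observed values.
coords = [(0.0,), (1.0,), (2.0,), (3.0,)]
values = [0.0, 1.0, 2.0, 3.0]
bins = empirical_variogram(coords, values, n_lags=3, max_dist=3.0)
```

Changing `n_lags` changes how many pairs fall into each bin: more lags give finer resolution but fewer pairs per estimate, which is the trade-off the abstract refers to.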

클러스터 분석을 위한 IRC기반 클러스터 개수 자동 결정 방법 (Systematic Determination of Number of Clusters Based on Input Representation Coverage)

  • 신미영
    • 전자공학회논문지CI
    • /
    • Vol. 41, No. 6
    • /
    • pp.39-46
    • /
    • 2004
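  • One of the important problems in cluster analysis is finding the appropriate number of clusters inherent in the given data. To determine this number systematically, this paper newly defines the concept of IRC (Input Representation Coverage) and uses it to present a method that automatically determines the number of clusters suited to the given data. To assess the usefulness and applicability of the method, analysis experiments were conducted on synthetic data, and they show that the proposed method can be very useful for finding the actual number of clusters inherent in the data.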

형법범죄 중 5대 범죄와 민간경비 간의 관계 (The relation between the five critical crime of criminal law and the private security services)

  • 주일엽;조광래
    • 시큐리티연구
    • /
    • No. 8
    • /
    • pp.361-377
    • /
    • 2004
  • This study examines the relationship between the five major crimes (homicide, robbery, rape, theft, and violence) and private security services. To this end, the study took as its subject the 2002 status of private security, specifically the number of companies and employees by area, along with the five major crimes by area. The research data are secondary data drawn from '2003 Crime Analysis' of the Supreme Public Prosecutors' Office and 'The private Security Related Data' of the National Police Agency. The selected data were analyzed by variable using the SPSS 10.0 statistical software. Each hypothesis was tested at the significance level $\alpha = .05$ using statistical techniques such as descriptive statistics, correlation, and regression. The results are as follows. First, the total number of the five major crimes significantly affects the number of security companies. Second, the number of security companies can be explained by the totals of each of the five major crimes, in the order of theft, robbery, violence, rape, and murder. Third, the total number of the five major crimes significantly affects the number of security employees. Fourth, the number of security employees can be explained by the totals of each of the five major crimes, in the order of theft, robbery, violence, rape, and murder.

K-means based Clustering Method with a Fixed Number of Cluster Members

  • Yi, Faliu;Moon, Inkyu
    • 한국멀티미디어학회논문지
    • /
    • Vol. 17, No. 10
    • /
    • pp.1160-1170
    • /
    • 2014
  • Clustering methods are very useful in many fields such as data mining, classification, and object recognition. Both supervised and unsupervised grouping approaches can classify a series of sample data with a predefined or automatically assigned number of clusters. However, they place no constraint on the number of elements in each cluster, so the cluster sizes produced by clustering schemes are usually arbitrary: some clusters possess a large number of elements whereas others have only a few members. In some areas such as logistics management, a fixed number of members is preferred for each cluster or logistic center. Consequently, it is necessary to design a clustering method that can automatically adjust the number of group elements. In this paper, a k-means based clustering method with a fixed number of cluster members is proposed. In the proposed method, the data samples are first clustered using the k-means algorithm. Then, the number of group elements is adjusted by employing a greedy strategy. Experimental results demonstrate that the proposed clustering scheme can classify data samples efficiently for a fixed number of cluster members.
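
The abstract does not spell out its greedy adjustment step, so the sketch below shows only one plausible variant and should not be read as the authors' algorithm. It assumes centroids are already available (for example, from a prior k-means run), sorts every point-centroid pair by distance, and assigns each point to the nearest centroid that still has free capacity. The name `balanced_assign` and the sample data are hypothetical.

```python
import math


def balanced_assign(points, centroids, size):
    """Greedily assign points to centroids so every cluster has `size` members.

    Assumes len(points) == size * len(centroids). Illustrative only; this is
    not necessarily the greedy strategy used in the paper.
    """
    if len(points) != size * len(centroids):
        raise ValueError("number of points must equal size * number of clusters")
    # All (distance, point index, cluster index) pairs, closest first.
    pairs = sorted(
        (math.dist(p, c), i, k)
        for i, p in enumerate(points)
        for k, c in enumerate(centroids)
    )
    labels = [None] * len(points)
    load = [0] * len(centroids)
    for _, i, k in pairs:
        if labels[i] is None and load[k] < size:
            labels[i] = k
            load[k] += 1
    return labels


# Hypothetical data: plain nearest-centroid assignment would put three
# points in the first cluster and only one in the second.
points = [(0, 0), (1, 0), (2, 0), (10, 0)]
centroids = [(1, 0), (10, 0)]
labels = balanced_assign(points, centroids, size=2)
```

Because total capacity equals the number of points, every point is guaranteed a slot. The price is that some points end up in a cluster other than their nearest one.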

필드 고장 요약 데이터를 활용한 미래 고장수의 예측 (Predicting the future number of failures based on the field failure summary data)

  • 백재욱;조진남
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 22, No. 4
    • /
    • pp.755-764
    • /
    • 2011
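  • Companies often use past field failure data to predict how many failures will occur in the field in the future. The desire for such predictions is especially strong when an unexpected failure mode is discovered in the field, because companies want to use them to gauge future warranty costs and to determine the number of spare parts needed to repair failed components quickly. This study uses summary data that can arise in a company to predict how many failures will occur in the field in the future, and examines what other data may arise besides such summary data and what analysis results can then be obtained.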

비파괴시험 자료를 적용한 콘크리트 기준강도의 통계적 추정 (Statistical Estimation of Specified Concrete Strength by Applying Non-Destructive Test Data)

  • 백인열
    • 한국안전학회지
    • /
    • Vol. 30, No. 1
    • /
    • pp.52-59
    • /
    • 2015
  • The aim of the paper is to introduce the statistical definition of the specified compressive strength of concrete to be used for safety evaluation of existing structures in domestic practice, and to present a practical method for obtaining the specified strength by utilizing non-destructive test data as well as a limited number of core test data. The statistical definition of the specified compressive strength of concrete in the design codes is reviewed, and consistent formulations to statistically estimate the specified strength for assessment are described. To avoid estimating an unrealistically small value of the specified strength because of the limited number of data, it is proposed that the information from the non-destructive test data be combined with that of the minimum core test data. The sample mean, standard deviation, and total number of concrete tests are obtained from the combined test data. The proposed procedures are applied to example test data composed of artificial numerical values and actual evaluation data collected from bridge assessment reports. The calculation results show that the proposed statistical estimation procedures yield reasonable values of the specified strength for assessment by applying the non-destructive test data in addition to the limited number of core test data.

Heat Transfer Correlation for the Forced Convective Flow on Single Circular Fin-tube Heat Exchanger

  • Kang Hie-Chan
    • International Journal of Air-Conditioning and Refrigeration
    • /
    • Vol. 14, No. 1
    • /
    • pp.14-18
    • /
    • 2006
  • This study investigates the heat transfer characteristics of the circular fin-tube heat exchanger. The paper presents experimental data for seven fin geometries. The correlation of Stasiulevicius agreed with the experimental data at high Reynolds numbers but not at low Reynolds numbers. The Nusselt number correlated well with the Graetz number and showed a transition near Gz=10. An empirical correlation proposed in the present study agreed well with the experimental data.

단일 원형휜-원형관에 대한 강제대류열전달 상관식 (Forced Convection Correlation for Single Circular Fin-tube Heat Exchanger)

  • 강희찬;강민철
    • 설비공학논문집
    • /
    • Vol. 16, No. 6
    • /
    • pp.584-588
    • /
    • 2004
  • This work investigates the heat transfer characteristics of the circular fin-tube heat exchanger. The paper presents experimental data for seven fin geometries. The correlation of Stasiulevicius agreed with the experimental data at high Reynolds numbers but not at low Reynolds numbers. The Nusselt number correlated well with the Graetz number and showed a transition near Gz=10. An empirical correlation proposed in the present work agreed well with the experimental data.