• Title/Summary/Keyword: Similar Data

A Comparison of Clustering Algorithm in Data Mining

  • Lee, Yung-Seop;An, Mi-Young
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.4
    • /
    • pp.725-736
    • /
    • 2003
  • To provide the information needed to make a decision, it is important to know the relationships or patterns among variables in a database. Grouping objects that have similar characteristics or patterns is called cluster analysis, one of the data mining techniques. This study compares several partitioning clustering algorithms based on the statistical distance or total variance within each cluster.
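The "total variance in each cluster" criterion mentioned in this abstract can be illustrated with a minimal sketch. It assumes squared Euclidean distance to the cluster centroid as the statistical distance; the function name `within_cluster_ss` and the toy data are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def within_cluster_ss(points, labels):
    """Total within-cluster sum of squared distances to each centroid.

    Assumption: squared Euclidean distance; the paper may use another measure.
    """
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")

    # Group the points by their assigned cluster label.
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        # Add the squared distance of every member to its centroid.
        total += sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        )
    return total


# Hypothetical toy data: two obvious groups of two points each.
points = [(0, 0), (0, 2), (10, 10), (10, 12)]
good = within_cluster_ss(points, [0, 0, 1, 1])  # compact clusters
bad = within_cluster_ss(points, [0, 1, 0, 1])   # mixed clusters
```

Under this criterion, a partition with a smaller total is considered tighter, so two partitions of the same data into the same number of clusters can be compared by their totals.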

Analysis Period of Input Data for Improving the Prediction Accuracy of Express-Bus Travel Times (고속버스 통행시간 예측의 정확도 제고를 위한 입력자료 분석기간 선정 연구)

  • Nam, Seung-Tae;Yun, Ilsoo;Lee, Choul-Ki;Oh, Young-Tae;Choi, Yun-Taik;Kwon, Kenan
    • International Journal of Highway Engineering
    • /
    • v.16 no.5
    • /
    • pp.99-108
    • /
    • 2014
  • PURPOSES : The travel times of expressway buses have been estimated using the travel time data between entrance tollgates and exit tollgates, which are produced by the Toll Collection System (TCS). However, the travel time data from TCS have a few critical problems. For example, the data include the travel times of trucks as well as those of buses. Therefore, expressway bus travel times estimated from TCS data may be incorrect, both implicitly and explicitly. The goal of this study is to improve the accuracy of expressway bus travel time estimation using DSRC-based travel times by identifying the appropriate analysis period of input data. METHODS : All expressway buses are equipped with Hi-Pass transponders, so the travel times of expressway buses alone can now be extracted using DSRC. This study therefore analyzed the operational characteristics as well as the travel time patterns of the expressway buses operating between Seoul and Daejeon. It then determined the most appropriate analysis period of input data for the expressway bus travel time estimation model in order to improve the accuracy of the model. RESULTS : In the feasibility analysis by analysis period, the overall MAPE values were found to be similar. However, the cases using similar volume patterns yielded better MAPE values than the other cases. CONCLUSIONS : The best input period was the one that uses the travel time patterns of days whose total expressway traffic volumes are similar to that of the day before the day for which the travel times of expressway buses must be estimated.
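The MAPE (mean absolute percentage error) used to compare the analysis periods in this abstract can be sketched as follows. This is the standard definition, not code from the paper; the function name `mape` and the travel times in the example are hypothetical.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Standard definition: 100 / n * sum(|(a_i - p_i) / a_i|).
    Assumes every actual value is non-zero (true for travel times).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")

    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical observed and estimated travel times, in minutes.
observed = [100.0, 200.0]
estimated = [110.0, 180.0]
score = mape(observed, estimated)
```

A lower MAPE means the estimates are closer to the observed travel times, which is the sense in which one input period "outperforms" another.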

A Broken Image Screening Method based on Histogram Analysis to Improve GAN Algorithm (GAN 알고리즘 개선을 위한 히스토그램 분석 기반 파손 영상 선별 방법)

  • Cho, Jin-Hwan;Jang, Jongwook;Jang, Si-Woong
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.4
    • /
    • pp.591-597
    • /
    • 2022
  • Recently, many studies have examined data augmentation as a way to build datasets efficiently. A representative data augmentation technique uses a Generative Adversarial Network (GAN), which generates data similar to real data by training a generator and a discriminator competitively. However, depending on the environment and the progress of GAN training, some of the generated images contain broken pixels; these images cannot be used as a dataset and increase the training time. In this paper, an algorithm was developed to screen out these damaged images by analyzing the histograms of the image data generated during the GAN training process. Compared with the images generated by the existing GAN, the ratio of damaged images was reduced by a factor of 33.3 (3,330%).

Productivity Measurement and Analysis on Factors in Steel Erection (철골세우기의 현장생산성 측정 및 영향요인 분석)

  • Lee, Ji-Yong;Huh, Young Ki;Ahn, Bang Ryul
    • Proceedings of the Korean Institute of Building Construction Conference
    • /
    • 2008.11a
    • /
    • pp.123-127
    • /
    • 2008
  • As buildings become taller and larger, the share of steel work has increased, which makes schedule planning and management more important. However, on actual construction sites, management relies more on a manager's construction experience than on productivity data accumulated in previous projects. Moreover, most existing studies have taken a theoretical approach rather than analyzing data collected directly on site. In this study, a steel-erection site was visited to collect productivity data. The study found significant disparities between the productivity of aboveground work and that of underground work. However, the productivities of the 'first node on ground' and the 'second node on ground' were estimated to be similar. The productivity data collected and the factors affecting productivity will help managers plan and control similar steel-erection work. This study will also be beneficial for those performing related studies.

A Novel Security Scheme with Message Level Security for Hybrid Applications

  • Ma, Suoning;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2016.04a
    • /
    • pp.215-217
    • /
    • 2016
  • With the popularity of smart devices, mobile applications are playing an increasingly important role in people's daily lives. These applications store various kinds of information, which greatly facilitates users' daily lives. However, the frequent transmission of data over the network also increases the risk of data leakage, so more and more developers have begun to focus on how to protect user data. Current mainstream development models include Native development, Web development, and Hybrid development. Hybrid development is based on JavaScript and HTML5; it has a cross-platform advantage similar to Web Apps and a good user experience similar to Native Apps. In this paper, based on the features of Hybrid applications, we propose a security scheme for the Hybrid development model that implements message-level data encryption to protect user information. Through performance evaluation, we found that in some scenarios the proposed security scheme performs better.

Voice Similarities between Sisters

  • Ko, Do-Heung
    • Speech Sciences
    • /
    • v.8 no.3
    • /
    • pp.43-50
    • /
    • 2001
  • This paper deals with voice similarities between sisters, who are supposed to share common physiological characteristics from a single biological mother. Nine pairs of sisters who are believed to have similar voices participated in this experiment. The speech samples obtained from one pair of sisters were excluded from the analysis because their perceptual score was relatively low. The words were measured both in isolation and in context, and the subjects were asked to read the text five times with an interval of about three seconds between readings. Recordings were made at natural speed in a quiet room. The data were analyzed for pitch and formant frequencies using CSL (Computerized Speech Lab) and PCQuirer. It was found that the data for initial vowels are much more similar and homogeneous than those for vowels in other positions. The acoustic data showed that voice similarities are strikingly high in both pitch and formant frequencies. It is assumed that the statistical data obtained from this experiment can be used as a guideline for modelling speaker identification and speaker verification.

Graphical Methods for the Sensitivity Analysis in Discriminant Analysis

  • Jang, Dae-Heung;Anderson-Cook, Christine M.;Kim, Youngil
    • Communications for Statistical Applications and Methods
    • /
    • v.22 no.5
    • /
    • pp.475-485
    • /
    • 2015
  • As in regression, many measures have been developed to detect influential data points in discriminant analysis. Many follow principles similar to those of the diagnostic measures used in linear regression, applied in the context of discriminant analysis. Here we focus on the impact on the predicted classification posterior probability when a data point is omitted. The new method is intuitive and easily interpretable compared to existing methods. We also propose a graphical display to show the individual movement of the posterior probability of other data points when a specific data point is omitted. This enables the summaries to capture the overall pattern of the change.

Accuracy and Stability of Temperature and Salinity from Autonomous Profiling CTD Floats (ARGO Float) (자동 수직물성관측 뜰개(ARGO Float)로 얻은 수온과 염분의 정확도와 안정도)

  • 오경희;박영규;석문식
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.9 no.4
    • /
    • pp.204-211
    • /
    • 2004
  • Autonomous profiling CTD floats are a useful tool for observing the oceans. We cannot, however, perform post-deployment calibration of the CTDs attached to the floats, so assessing the accuracy and stability of the profile data from the floats is one of the important issues in the delayed-mode quality control of the profiles. Variations in salinity at the intermediate level of the East Sea are comparable to the accuracy of salinity data required by the international Argo Program, which is 0.01. Therefore, we can assess the credibility of salinity data from the floats deployed in the East Sea using three independent methods while treating the East Sea as a salinity calibration bath. The methods used here are 1) comparison of high-quality CTD data and float data obtained at similar locations at similar times, 2) comparison of float data obtained at similar locations at similar times, and 3) investigation of the long-term stability and accuracy of salinity data from parking depths. All three methods show that, without any calibration, the salinity data satisfy the accuracy criterion of the Argo Program. Assuming that the intermediate-level temperature in the East Sea is as homogeneous as the salinity, we applied the three methods to temperature data. We found that the accuracy of the temperature readings is 0.01°C, which is about twice as large as the requirement of the Argo Program, 0.005°C. This does not mean that the temperature readings are inaccurate, because the intermediate-level temperature does vary spatially and temporally by more than the accuracy interval required by the Argo Program. If we take into account the variation in the intermediate-level temperature, the accuracy of temperature data from the floats is not significantly different from that proposed by the Argo Program. Therefore, one could use both temperature and salinity profiles from the floats assessed in this study without calibration.

Algorithmic Generation of Self-Similar Network Traffic Based on SRA (SRA 알고리즘을 이용한 Self-Similar 네트워크 Traffic의 생성)

  • Jeong HaeDuck J.;Lee JongSuk R.
    • The KIPS Transactions:PartC
    • /
    • v.12C no.2 s.98
    • /
    • pp.281-288
    • /
    • 2005
  • It is generally accepted that self-similar (or fractal) processes may provide better models for teletraffic in modern computer networks than Poisson processes. If this is not taken into account, it can lead to inaccurate conclusions about the performance of computer networks. Thus, an important requirement for conducting simulation studies of telecommunication networks is the ability to generate long synthetic stochastic self-similar sequences. A generator of pseudo-random self-similar sequences, based on the SRA (successive random addition) method, is implemented and analysed in this paper. The properties of this generator were studied experimentally in terms of its statistical accuracy and the time required to produce sequences of a given (long) length. The generator shows an acceptable level of accuracy of the output data (in the sense of the relative accuracy of the Hurst parameter) and is fast. The theoretical algorithmic complexity is O(n).

An Experimental Study on the Degree of Phonetic Similarity between Korean and Japanese Vowels (한국어와 일본어 단모음의 유사성 분석을 위한 실험음성학적 연구)

  • Kwon, Sung-Mi
    • MALSORI
    • /
    • no.63
    • /
    • pp.47-66
    • /
    • 2007
  • This study aims to explore the degree of phonetic similarity between Korean and Japanese vowels in terms of acoustic features by performing a speech production test on Korean speakers and Japanese speakers. For this purpose, the speech of 16 Japanese speakers was used as the Japanese speech data, and the speech of 16 Korean speakers as the Korean speech data. The findings on the degree of similarity of the 7 nearest equivalents of the Korean and Japanese vowels are as follows. First, Korean /i/ and /e/ displayed no significant differences in F1 and F2 from their counterparts, Japanese /i/ and /e/, and the distribution of F1 and F2 of Korean /i/ and /e/ in the distributional map completely overlapped with that of Japanese /i/ and /e/. Accordingly, Korean /i/ and /e/ were judged to be "identical." Second, Korean /a/, /o/, and /ɨ/ displayed a significant difference in either F1 or F2, but showed a great similarity in the distribution of F1 and F2 with Japanese /a/, /o/, and /ɯ/, respectively. Korean /a/, /o/, and /ɨ/, therefore, were categorized as very similar to Japanese vowels. Third, Korean /u/, which has the counterpart /ɯ/ in Japanese, showed a significant difference in both F1 and F2, and only half of the distribution overlapped. Thus, Korean /u/ was analyzed as a moderately similar vowel to Japanese vowels. Fourth, Korean /ʌ/ did not have a close counterpart in Japanese and was classified as "the least similar vowel."
