• Title/Summary/Keyword: unbalanced data (불균형 자료)


A Data Fusion Algorithm for Link Travel Time Estimation (링크 통행시간 추정을 위한 데이터 퓨젼 알고리즘의 개발)

  • 최기수;정연식
    • Journal of Korean Society of Transportation / v.16 no.2 / pp.177-195 / 1998
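  • One of the most important elements in implementing Intelligent Transport Systems (ITS) is the generation of traffic information. Traffic information is generated by analyzing and processing data collected from sources such as loop detectors, closed-circuit television (CCTV), probe vehicles, police, and traffic reporters. However, these sources do not collect complete data across the whole network at any given time: sources may be concentrated in some areas, while other areas yield no data at all. This spatial imbalance calls for techniques both to process large volumes of data arriving simultaneously and to handle areas where no data are collected. This paper reviews the problems that may arise at each stage of ITS deployment and first explains the general concept of data integration. It then introduces a fuzzy linear regression model and data fusion techniques for integrating the data available at a given time, and proposes a data fusion algorithm for generating a single, reliable piece of traffic information. The applicability of the proposed algorithm is then examined using hypothetical data. Once a traffic information collection environment is reasonably established, the algorithm is expected to be usable more widely, provided that validation of predicted values against measured data shows it to be reliable.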


Classification Analysis for Unbalanced Data (불균형 자료에 대한 분류분석)

  • Kim, Dongah;Kang, Suyeon;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.28 no.3 / pp.495-509 / 2015
  • We study the unbalanced classification problem, in which the proportions of the two groups differ significantly. It is usually more difficult to classify classes accurately in unbalanced data than in balanced data. If classification methods are applied to unbalanced data, most observations are likely to be assigned to the larger group, because doing so minimizes the misclassification loss. However, misclassifying the smaller group as the larger group can cause a bigger loss in most real applications. We compare several classification methods for unbalanced data using sampling techniques (up- and down-sampling). We also check the total loss of the different classification methods when an asymmetric loss is applied to simulated and real data. We use the misclassification rate, G-mean, ROC, and AUC (area under the curve) for the performance comparison.
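
The abstract above mentions up- and down-sampling and the G-mean; a minimal Python sketch of both follows. It is an illustration only, not the authors' code: the function names `g_mean` and `resample`, the `mode` and `seed` parameters, and the choice of plain random resampling are assumptions, and G-mean is taken in its common form as the square root of sensitivity times specificity.

```python
import math
import random


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def resample(rows, labels, mode="up", seed=0):
    """Balance classes by random up-sampling (mode="up") or down-sampling (mode="down")."""
    rng = random.Random(seed)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    sizes = [len(members) for members in groups.values()]
    target = max(sizes) if mode == "up" else min(sizes)
    out_rows, out_labels = [], []
    for label, members in groups.items():
        if len(members) < target:
            # Up-sampling: draw extra rows with replacement from the smaller class.
            extra = [rng.choice(members) for _ in range(target - len(members))]
            members = members + extra
        elif len(members) > target:
            # Down-sampling: keep a random subset of the larger class.
            members = rng.sample(members, target)
        out_rows.extend(members)
        out_labels.extend([label] * target)
    return out_rows, out_labels


# A classifier that always predicts the larger group looks accurate (90%)
# but has a G-mean of zero, because it never detects the smaller group.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(g_mean(y_true, y_pred))
```

The closing lines show why the misclassification rate alone can mislead on unbalanced data, which is the motivation for reporting G-mean alongside it.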

Study on Detection Technique for Cochlodinium polykrikoides Red tide using Logistic Regression Model under Imbalanced Data (불균형 데이터 환경에서 로지스틱 회귀모형을 이용한 Cochlodinium polykrikoides 적조 탐지 기법 연구)

  • Bak, Su-Ho;Kim, Heung-Min;Kim, Bum-Kyu;Hwang, Do-Hyun;Enkhjargal, Unuzaya;Yoon, Hong-Joo
    • The Journal of the Korea institute of electronic communication sciences / v.13 no.6 / pp.1353-1364 / 2018
  • This study proposes a method for detecting Cochlodinium polykrikoides red tide pixels in satellite images using a logistic regression model, a machine learning technique, under imbalanced data. Spectral profiles extracted from red tide, clear water, and turbid water were used as the training dataset. 70% of the entire dataset was extracted and used for model training, and the classification accuracy of the model was evaluated using the remaining 30%. White noise was added to the spectral profiles of the red tide class, which has relatively few samples compared with clear water and turbid water, and over-sampling was performed to address the unbalanced data problem. In the accuracy evaluation, the proposed algorithm showed about 94% classification accuracy.
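
The noise-based over-sampling described above could be sketched roughly as follows. The abstract does not state the noise level or how many synthetic profiles were generated, so `sigma`, `n_new`, the function name `oversample_with_noise`, and the sample reflectance values are all assumptions made for illustration.

```python
import random


def oversample_with_noise(profiles, n_new, sigma=0.01, seed=0):
    """Create n_new synthetic profiles by adding Gaussian white noise to
    randomly chosen minority-class profiles (sigma is an assumed noise level)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(profiles)
        # Perturb every spectral band independently with zero-mean noise.
        synthetic.append([value + rng.gauss(0.0, sigma) for value in base])
    return synthetic


# Hypothetical red tide spectral profiles (one value per band).
red_tide = [[0.012, 0.015, 0.021], [0.011, 0.016, 0.020]]
augmented = red_tide + oversample_with_noise(red_tide, n_new=8)
print(len(augmented))  # 10 profiles: 2 original + 8 synthetic
```

In practice the noise level would need to be small relative to the spectral differences between classes, otherwise the synthetic red tide profiles would overlap with clear or turbid water.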

Comparison of the Family Based Association Test and Sib Transmission Disequilibrium Test for Dichotomous Trait (이산형 형질에 대한 가족자료 연관성 검정법 FBAT와 형제 전달 불균형 연관성 검정법 S-TDT의 비교)

  • Kim, Han-Sang;Oh, Young-Sin;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics / v.23 no.6 / pp.1103-1113 / 2010
  • The family-based association test (FBAT), an extensively used approach, is compared with the sib transmission/disequilibrium test (S-TDT) and, in particular, with the adjusted S-TDT, which takes the covariance among related siblings into consideration and can provide a more sensitive test statistic for association. A simulation study comparing the three test statistics demonstrates that the type I error rates of all three tests are larger than the prespecified significance level and that the power of the FBAT is lower than that of the other two tests. More detailed studies are required to assess the influence of the conditions assumed in the FBAT on the efficiency of the test.

Estimation of Haplotype Proportions in Single Nucleotide Polymorphism Group Using EM Algorithm (EM 알고리듬을 이용한 단일염기변이 (SNP;SINGLE NUCLEOTIDE POLYMORPHISM)군의 일배체형 (HAPLOTYPE) 비율 추정)

  • 김선우;김종원;이경아
    • The Korean Journal of Applied Statistics / v.16 no.2 / pp.195-202 / 2003
  • Haplotype analysis of SNPs is very useful for the study of complex genetic diseases because of its low cost and high efficiency compared with the individual analysis of each SNP, and it is functionally important from a biological point of view. However, the gametic phase of haplotypes is usually unknown in an SNP group, which makes it difficult to predict haplotype proportions. In this study, haplotype proportions were estimated with the EM algorithm from diploid data of an SNP group in a solid tumor group and a normal group. From these results, linkage disequilibrium among the SNPs was analyzed.

Local Imbalance of Emergency Medical Services(EMS): Analyses on 119 EMS Activity Reports of Busan (구급서비스의 지역 불균형: 부산시 119 구급활동일지 분석)

  • Lee, Dalbyul
    • Journal of the Korean Association of Geographic Information Studies / v.23 no.3 / pp.161-173 / 2020
  • This study analyzed local imbalances in the supply and demand of emergency medical services (EMS) in Busan using the 119 emergency activity reports of the Busan Fire & Disaster Headquarters. The EMS activity report data for 2017 were converted into Jimgyegu units. The spatial distribution of the indicators representing the local imbalance of emergency demand and supply (number of reports, number of reports relative to the population, average coefficient of variation and outlier of on-site arrival time, and number of dispatches outside the jurisdiction) was analyzed using hot spot analysis, a GIS spatial statistics method. The analysis clearly distinguished the hot spot and cold spot areas in which both the supply of and demand for emergency services are concentrated. This means that the supply and demand of emergency services in Busan are locally unbalanced. In particular, the demand and supply of emergency services differed between the original downtown and its surrounding areas and the outskirts of Busan.

The Impact of the Supply Regulation on the Price in Farming Olive Flounder (출하량 조절이 양식 넙치가격에 미치는 영향)

  • Kang, Seokkyu
    • Environmental and Resource Economics Review / v.24 no.4 / pp.709-725 / 2015
  • This study analyzes the relationship between price and supply in the production-area market for farmed olive flounder. The data are daily prices and supply quantities covering the period from January 1, 2007 to June 30, 2013. Cointegration analysis and a vector error correction model are employed. The empirical results are summarized as follows. First, the price and the supply follow random walks and are integrated of order 1. Second, the price and the supply are cointegrated. Third, the vector error correction model suggests that the relationship between the price change ratio and the supply quantity change ratio is negative and that a feedback effect exists in the long run, but the disequilibrium between the price and the supply is corrected by the supply quantity. Finally, the vector error correction model suggests that the supply quantity leads the price in the short run. This indicates that a decrease (increase) in the supply quantity results in an increase (decrease) in the price.

Net Radiation Estimation Using Flux Tower Data and Integrated Hydrological Model: For the Seolmacheon and Chungmichen Watersheds (플럭스 타워 관측 자료 및 통합수문모형을 이용한 순복사량 산정: 설마천, 청미천 유역을 대상으로)

  • Kim, Daeun;Baek, JongJin;Jung, Sung-Won;Choi, Minha
    • Journal of Korea Water Resources Association / v.46 no.3 / pp.301-314 / 2013
  • Spatially heterogeneous solar radiation energy resulting from climate change gives rise to energy imbalance in the general ecological system, including water resources. To understand energy flow, flux towers are in operation throughout the world, and several flux towers are likewise used for observation in major domestic areas. In this study, downward shortwave radiation, downward longwave radiation, and net radiation, which play an important part in hydro-meteorology and ecology, were calculated with the proposed physical equations using flux data from the Seolmacheon and Cheongmicheon watersheds. The calculated and observed net radiation were then individually compared and validated. The results confirmed the applicability of physical methods where hydro-meteorological data are insufficient and their potential use for observed hydro-meteorological variables.

Evaluating the Imbalance of Green Space and Establishing its Management Zone Using Spatial Analysis - Focused on the Use of Green Space - (공간분석을 활용한 녹지의 불균형 평가 및 관리권역 설정 - 녹지의 이용적 측면을 중심으로 -)

  • Lee, Woo-Sung;Jung, Sung-Gwan
    • Journal of the Korean Association of Geographic Information Studies / v.15 no.2 / pp.126-138 / 2012
  • The purpose of this study is to evaluate the imbalance of green space using various spatial analysis methods and to establish management zones for green space based on service supply, from the perspective of its use, in Daegu. The total green space of Daegu is 48,936.1 ha, the second largest among the 7 metropolitan cities of Korea. According to the imbalance analysis of green space, the Gini coefficient based on area was not high, whereas the Gini coefficient based on population was high, above 0.6. According to an evaluation of the service supply of green space in Dalseo-gu, the area within about 100 m of large green spaces was supplied with more than 25 $m^2$ of green space per person. On the other hand, areas such as Sangin, Jukjeon, and Yongsan were hardly supplied with green space. Finally, a 'Rich zone', 'Fair zone', 'Poor zone', and 'Broken zone' could be established based on service supply to guide the management of green space. The findings of this study can be used as basic data for selecting the construction priority of new green spaces.
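
For readers unfamiliar with the Gini coefficient used above, here is a minimal sketch of the standard unweighted formula applied to a list of per-unit values. How the study weighted green space by area or population is not stated in the abstract, so the function name `gini` and the sample district values are purely hypothetical.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-based formula on sorted data, with ranks starting at 1.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical green space per person (m^2) in four districts.
print(gini([5, 5, 5, 5]))  # equal supply
print(gini([0, 0, 0, 1]))  # all green space concentrated in one district
```

With n units, the coefficient ranges from 0 (equal supply) to (n - 1) / n (everything in one unit), so a value above 0.6 indicates that green space is concentrated among a small share of the population.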

Classification Algorithm-based Prediction Performance of Order Imbalance Information on Short-Term Stock Price (분류 알고리즘 기반 주문 불균형 정보의 단기 주가 예측 성과)

  • Kim, S.W.
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.157-177 / 2022
  • Investors trade stocks while keeping a close watch, in real time, on the orders submitted by domestic and foreign investors through the Limit Order Book information, the so-called price current provided by securities firms. Is the order information released in the Limit Order Book useful for stock price prediction? This study analyzes whether order imbalances, which appear when investors' buy and sell orders are concentrated on one side during intraday trading, are significant predictors of future stock price movements. Using classification algorithms, this study improved the prediction accuracy of order imbalance information for the short-term price trend, that is, whether the day's closing price moves up or down. Day trading strategies are proposed using the price trends predicted by the classification algorithms, and their trading performance is analyzed empirically. The 5-minute KOSPI200 Index Futures data were analyzed for 4,564 days from January 19, 2004 to June 30, 2022. The results of the empirical analysis are as follows. First, order imbalance information has a significant impact on current stock prices. Second, the order imbalance information observed in the early morning has significant forecasting power for price trends from the early morning to the market close. Third, the Support Vector Machines algorithm showed the highest prediction accuracy for the day's closing price trend using order imbalance information, at 54.1%. Fourth, order imbalance information measured early in the day had higher prediction accuracy than order imbalance information measured later in the day. Fifth, the trading performance of the day trading strategies that used the classification algorithms' predictions of price trends was higher than that of the benchmark trading strategy. Sixth, except for the K-Nearest Neighbor algorithm, all investment strategies using the classification algorithms showed higher average total profits than the benchmark strategy. Seventh, the strategies using the predictions of the Logistic Regression, Random Forest, Support Vector Machines, and XGBoost algorithms outperformed the benchmark strategy on the Sharpe Ratio, which evaluates both profitability and risk. This study differs academically from existing studies in that it documents the economic value of the total buy and sell order volume information within the Limit Order Book information. The empirical results are also valuable to market participants from a trading perspective. Future studies should improve the performance of the trading strategy with more accurate price predictions by extending the approach to deep learning models, which have recently been actively studied for stock price prediction.