• Title/Summary/Keyword: Random indices

Approximation of π by financial historical data (금융시계열자료를 이용한 원주율값 π의 추정)

  • Jang, Dae-Heung;Uhm, TaeWoong;Yi, Seongbaek
    • Journal of the Korean Data and Information Science Society / v.28 no.4 / pp.831-841 / 2017
  • The irrational number ${\pi}$ is defined as the ratio of a circle's circumference to its diameter and is the same constant for every circle. This article approximates its value by Monte Carlo simulation of the famous Buffon's needle experiment and shows that the convergence is not always proportional to the sample size. We also run Monte Carlo simulations to examine the convergence of the ${\pi}$ values computed from random walk series with independent normal increments. Finally, we apply the theoretical derivation to various financial time series, such as KOSPI, stock prices of large Korean firms, global stock indices, and major foreign exchange rates. The historical data show that the log-transformed series follow a random walk process, but most of their first-lagged data do not follow a normal distribution. More importantly, the ${\pi}$ values computed from the ratio of the regression coefficients tend to converge to a constant, though unfortunately not to ${\pi}$. This result casts doubt on the efficient market hypothesis, and the degree to which the hypothesis holds can be related to the deviation of the estimated ${\pi}$ values.
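
The Buffon's needle experiment mentioned in this abstract can be illustrated with a minimal Monte Carlo sketch. It is not the authors' code: the function name `buffon_pi`, the needle length, the line spacing, and the seed are all assumptions chosen for illustration, and the sketch samples the needle angle using `math.pi`, which is a common simplification rather than a truly independent estimate.

```python
import math
import random

def buffon_pi(n_throws, needle_len=1.0, line_gap=1.0, seed=0):
    """Estimate pi with Buffon's needle (assumes needle_len <= line_gap)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        # Distance from the needle centre to the nearest line.
        centre = rng.uniform(0.0, line_gap / 2)
        # Acute angle between the needle and the parallel lines.
        angle = rng.uniform(0.0, math.pi / 2)
        # The needle crosses a line when its half-projection reaches the line.
        if centre <= (needle_len / 2) * math.sin(angle):
            hits += 1
    if hits == 0:
        raise ValueError("no crossings observed; increase n_throws")
    # P(cross) = 2 * needle_len / (pi * line_gap), solved for pi.
    return 2 * needle_len * n_throws / (line_gap * hits)

if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, buffon_pi(n))
```

Running the loop for increasing sample sizes gives a feel for how slowly, and how unevenly, the estimate approaches ${\pi}$.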

Classification of muscle tension dysphonia (MTD) female speech and normal speech using cepstrum variables and random forest algorithm (켑스트럼 변수와 랜덤포레스트 알고리듬을 이용한 MTD(근긴장성 발성장애) 여성화자 음성과 정상음성 분류)

  • Yun, Joowon;Shim, Heejeong;Seong, Cheoljae
    • Phonetics and Speech Sciences / v.12 no.4 / pp.91-98 / 2020
  • This study investigated the acoustic characteristics of the sustained vowel /a/ and sentence utterances produced by patients with muscle tension dysphonia (MTD), using cepstrum-based acoustic variables. Thirty-six women diagnosed with MTD and the same number of women with normal voices participated in the study, and the data were recorded and measured with ADSV™. The results showed that, among all the variables, cepstral peak prominence (CPP) and CPP_F0 were statistically significantly lower in the MTD group than in the control group. On the GRBAS scale, overall severity (G) was the most prominent feature of the voice quality of MTD patients, followed in order by roughness (R), breathiness (B), and strain (S). As these characteristics increased, a statistically significant negative correlation with CPP was observed. We then tried to classify the MTD and control groups using the CPP and CPP_F0 variables. Statistical modeling with a random forest machine learning algorithm yielded much higher classification accuracy (100% on training data and 83.3% on test data) in the sentence reading task, and CPP proved to play the more crucial role in both the vowel and sentence reading tasks.

Novel two-stage hybrid paradigm combining data pre-processing approaches to predict biochemical oxygen demand concentration (생물화학적 산소요구량 농도예측을 위하여 데이터 전처리 접근법을 결합한 새로운 이단계 하이브리드 패러다임)

  • Kim, Sungwon;Seo, Youngmin;Zakhrouf, Mousaab;Malik, Anurag
    • Journal of Korea Water Resources Association / v.54 no.spc1 / pp.1037-1051 / 2021
  • Biochemical oxygen demand (BOD) concentration, one of the important water quality indicators, is treated as a measured item in the ecological assessment of lakes and rivers. This investigation employed a novel two-stage hybrid paradigm (i.e., wavelet-based gated recurrent unit, wavelet-based generalized regression neural networks, and wavelet-based random forests) to predict BOD concentration at the Dosan and Hwangji stations, South Korea. These models were compared with the corresponding independent models (i.e., gated recurrent unit, generalized regression neural networks, and random forests). Diverse water quality and quantity indicators were used to develop the independent and two-stage hybrid models based on several input combinations (i.e., Divisions 1-5). The models were evaluated using three statistical indices: the root mean square error (RMSE), Nash-Sutcliffe efficiency (NSE), and correlation coefficient (CC). The results indicate that the two-stage hybrid models do not always improve on the predictive precision of the independent models. The DWT-RF5 model (RMSE = 0.108 mg/L) predicted BOD concentration more accurately than the other optimal models at Dosan station, and the DWT-GRNN4 model (RMSE = 0.132 mg/L) was the best for predicting BOD concentration at Hwangji station.
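
The three evaluation indices named in this abstract (RMSE, NSE, and CC) have standard definitions, sketched below. The function names and the sample values are assumptions for illustration only and are not taken from the paper; CC is taken here to be the Pearson correlation coefficient.

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def cc(obs, pred):
    """Pearson correlation coefficient between observed and predicted values."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    cov = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(obs, pred))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_pred = sum((p - mean_pred) ** 2 for p in pred)
    return cov / math.sqrt(var_obs * var_pred)

if __name__ == "__main__":
    observed = [1.2, 0.9, 1.5, 2.0]    # hypothetical BOD values in mg/L
    predicted = [1.1, 1.0, 1.4, 2.2]   # hypothetical model output
    print(rmse(observed, predicted), nse(observed, predicted), cc(observed, predicted))
```

RMSE carries the units of the data (mg/L here), whereas NSE and CC are dimensionless, which is why the abstract can quote RMSE values directly in mg/L.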

Developing Road Hazard Estimation Algorithms Based on Dynamic and Static Data (동적·정적 자료 기반 도로위험도 산정 알고리즘 개발)

  • Yang, Choongheon;Kim, Jinguk
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.19 no.4 / pp.55-66 / 2020
  • This study developed four algorithms and their associated indices that can quantify and qualify road hazards along roadways. Relevant raw data are first collected from commercial vehicles by camera and DTG. Processed data on potholes, road freezing, and fog are then generated by the integrated management system. The road hazard algorithms combine these data with road inventory data in the Data Sharing Platform. Four different road hazard algorithms and their associated indices were developed, depending on the processed data. To test the algorithms, an experiment based on passive DTGs attached to probe vehicles was performed at two test locations. The test routes were selected on the basis of historical data. Although there were limitations in using random data from commercial vehicles, hazardous roadway sections with fog, road freezing, and potholes were generated from actual historical data. No algorithm error was found in the entire test. Because this study provides road hazard information by section rather than by point, it can be of practical help to road users as well as road agencies.

Spatio-temporal Distribution Pattern of New Biotypes of Weedy Rice (Oryza sativa L.) in Selangor North-West Project, Malaysia

  • Baki, B.B.;M.M., Shakirin
    • Korean Journal of Weed Science / v.30 no.2 / pp.68-83 / 2010
  • Weedy rice (Oryza sativa L.) occurs sympatrically with other weeds and the rice crop in Malaysian rice granaries. We conducted field surveys in the 2006-2008 seasons in 7 farm blocks of Selangor's North-West Project, Malaysia, to list the new biotypes of weedy rice (NBWR) and assess their spatio-temporal distribution pattern based on quantitative and dispersion indices. No fewer than 16 accessions of NBWR were identified based on their special traits, viz. panicle type, pericarp colour, presence or absence of awn, seed type, and degree of grain shattering. Some NBWR accessions exhibited a combination of morphological traits including open panicle, grain with awns, red pericarp, short grain type, and varying degrees of grain shattering. Others mimic commercial rice with closed panicle, awnless grains, white pericarp, and long or short grain type. Invariably, the NBWRs mimic and stand as tall as cultivated rice, namely MR219, MR220, or MR235, and are morphologically on a par with the commercial rice varieties. Most accessions displayed grain shattering in excess of 50%, except Acc9 and Acc12. The seasonal dynamics of the prevalence of dominant NBWR accessions also differed significantly among farm blocks. While the Bagan Terap farm block, for example, did not record any measurable changes in the dominant NBWR accessions over seasons, the Sungai Leman farm block recorded measurable season-mediated changes. Sungai Leman started with NBWR Acc3, Acc4, Acc5, Acc7, Acc8, and Acc12 in season 1 of 2006/2007, but Acc3 and Acc5 were not measurably recorded in season 2 of 2007. In season 3 of 2007/2008, only Acc8 and Acc12 prevailed in the farm block. In the Sawah Sempadan farm block, season 3 of 2007/2008 showed a much reduced prevalence of NBWRs, leaving only Acc8 and Acc12.
Most accessions registered a clumped or under-dispersed spatial distribution pattern based on quantitative indices: variance-to-mean ratio (VMR) and Lloyd's patchiness values. The extent of infestation and the prevalence of dominant NBWR accessions registered both season- and farm-block-mediated differences. Most accessions showed VMR > 1, indicating a clumped or clustered spatial distribution, as exemplified by Acc3, Acc4, Acc7, Acc8, and Acc12 in all farm blocks throughout the three seasons. Some accessions had either a random or a uniform distribution in a few farm blocks. Acc8 had the highest population counts based on the importance value index, followed by Acc12, and both were the most dominant accessions, while Sawah Sempadan was the farm block worst infested by NBWR. These results are discussed in relation to current agronomic and weed management practices, water availability, and extension services in the granary.
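
The two dispersion indices named in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function names and the quadrat counts are hypothetical, the sample variance (n - 1 denominator) is assumed, and Lloyd's patchiness is computed as mean crowding divided by the mean, with mean crowding taken as m + (s²/m - 1).

```python
import statistics

def variance_to_mean_ratio(counts):
    """VMR = s^2 / m; > 1 suggests clumping, about 1 random, < 1 uniform."""
    return statistics.variance(counts) / statistics.mean(counts)

def lloyd_patchiness(counts):
    """Lloyd's patchiness = mean crowding / mean, where m* = m + (s^2 / m - 1)."""
    mean = statistics.mean(counts)
    mean_crowding = mean + (statistics.variance(counts) / mean - 1)
    return mean_crowding / mean

if __name__ == "__main__":
    clumped = [0, 0, 0, 10]   # hypothetical plants per quadrat, all in one quadrat
    uniform = [2, 2, 2, 2]    # hypothetical plants per quadrat, evenly spread
    print(variance_to_mean_ratio(clumped), lloyd_patchiness(clumped))
    print(variance_to_mean_ratio(uniform), lloyd_patchiness(uniform))
```

Both indices exceed 1 for the clumped counts and fall below 1 for the evenly spread counts, which matches the reading of VMR > 1 as a clumped distribution.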

Variation of Seasonal Groundwater Recharge Analyzed Using Landsat-8 OLI Data and a CART Algorithm (CART알고리즘과 Landsat-8 위성영상 분석을 통한 계절별 지하수함양량 변화)

  • Park, Seunghyuk;Jeong, Gyo-Cheol
    • The Journal of Engineering Geology / v.31 no.3 / pp.395-432 / 2021
  • Groundwater recharge rates vary widely by location and with time. They are difficult to measure directly and are thus often estimated using simulations. This study employed frequency and regression analysis and a classification and regression tree (CART) algorithm, a machine learning method, to estimate groundwater recharge. The CART algorithm considered the distribution of precipitation by subbasin (PCP), geomorphological data, indices of the relationship between vegetation and land use, and soil type. The geomorphological data considered were the digital elevation model (DEM), surface slope (SLOP), and surface aspect (ASPT), and the indices were the perpendicular vegetation index (PVI), normalized difference vegetation index (NDVI), normalized difference tillage index (NDTI), and normalized difference residue index (NDRI). The spatio-temporal distribution of groundwater recharge from the SWAT-MODFLOW program was classified into 4 groups and randomly sampled in R to train a model, and groundwater recharge was then predicted by CART considering the modified PVI, NDVI, NDTI, NDRI, PCP, and geomorphological data. To assess inter-rater reliability for the 4-group groundwater recharge, the Kappa coefficient, overall accuracy, and confusion matrix were calculated using K-fold cross-validation. The model obtained a Kappa coefficient of 0.3-0.6 and an overall accuracy of 0.5-0.7, indicating that the proposed model for estimating groundwater recharge with respect to soil type and vegetation cover is quite reliable.
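
The Kappa coefficient and overall accuracy reported in this abstract are both derived from a confusion matrix. The sketch below shows the standard Cohen's kappa calculation. The function names and the 2x2 matrix are hypothetical and are not taken from the paper, which works with more classes.

```python
def overall_accuracy(cm):
    """Share of samples on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def cohen_kappa(cm):
    """Cohen's kappa: agreement beyond what row/column totals give by chance."""
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(row[j] for row in cm) for j in range(len(cm))]
    observed = overall_accuracy(cm)
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    return (observed - expected) / (1 - expected)

if __name__ == "__main__":
    # Hypothetical matrix: rows are true classes, columns are predicted classes.
    matrix = [[20, 5],
              [10, 15]]
    print(overall_accuracy(matrix), cohen_kappa(matrix))
```

Kappa is always at or below the overall accuracy because it discounts the agreement expected by chance, which is why a Kappa range of 0.3-0.6 can sit alongside an accuracy range of 0.5-0.7.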

Genetic status of Acanthamoeba spp. Korean isolates on the basis of RAPD markers (RAPD 표지자 분석 에 의한 가시아메바속 한국분리주의 유전적 지위)

  • 홍용표;오승환
    • Parasites, Hosts and Diseases / v.33 no.4 / pp.341-348 / 1995
  • The genetic status of Acanthamoeba spp. was tested on the basis of random amplified polymorphic DNA (RAPD) marker analysis. Four previously established Acanthamoeba species, 4 Korean isolates of Acanthamoeba sp., and one American isolate of Acanthamoeba sp. were analyzed by RAPD-PCR using arbitrary decamer primers. Amplification products were fractionated by agarose gel electrophoresis and stained with ethidium bromide. Eighteen primers produced DNA amplification profiles revealing clear differences among the 4 species. Nine of them also produced DNA amplification profiles that included some isolate-specific amplification products. On the basis of the fragments amplified by the 18 primers, the pairwise similarity indices between A. culbertsoni and the other species (i.e. A. hatchetti, A. triangularis, A. polyphaga) were 0.300, 0.308, and 0.313, respectively. The similarity index between A. hatchetti and A. triangularis was 0.833. The mean similarity index among the 3 Korean isolates (YM-2, -3, -4) was 0.959, and that between them and 2 other species (A. hatchetti and A. triangularis) was 0.832. The mean similarity index between YM-5 and the other Korean isolates (YM-2, -3, -4) was 0.237. However, the similarity index between YM-5 and A. culbertsoni was 0.857, which suggests that YM-5 is genetically more similar to A. culbertsoni than to the other Korean isolates. The phenogram reconstructed by the UPGMA method revealed two groups: one consists of A. hatchetti, A. triangularis, and 3 Korean isolates (YM-2, -3, -4), and the other consists of A. culbertsoni, A. polyphaga, HOV, and YM-5.

Development of a Gangwon Province Forest Fire Prediction Model using Machine Learning and Sampling (머신러닝과 샘플링을 이용한 강원도 지역 산불발생예측모형 개발)

  • Chae, Kyoung-jae;Lee, Yu-Ri;Cho, Yong-ju;Park, Ji-Hyun
    • The Journal of Bigdata / v.3 no.2 / pp.71-78 / 2018
  • This study uses machine learning techniques to increase the accuracy of a forest fire prediction model. It used 14 years of data, from 2003 to 2016, from Gangwon-do, where forest fires were the most frequent. To reduce weather data errors, Gangwon-do was divided into nine areas and weather data from each area were used. However, dividing the forest fire forecast model into nine zones creates a large imbalance between the number of days on which fires occurred and the number on which they did not. Such imbalance can degrade model performance, so several sampling methods were applied to address it. To increase the accuracy of the model, five indices from the Canadian Forest Fire Weather Index (FWI) were used as derived variables. The modeling used logistic regression as the statistical method and random forest and xgboost as the machine learning methods. The final model for each zone was selected in consideration of accuracy, sensitivity, and specificity. The predictions for the nine zones correctly identified 80 of the 104 fires that occurred and 7,426 of the 9,758 non-fires. Overall accuracy was 76.1%.
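
The reported overall accuracy can be checked against the counts given in this abstract. The short sketch below assumes that "80 of the 104 fires" and "7426 of the 9758 non-fires" are the correctly predicted cases in each class. The function name `classification_summary` is hypothetical.

```python
def classification_summary(tp, fn, tn, fp):
    """Return sensitivity, specificity, and overall accuracy from raw counts."""
    return {
        "sensitivity": tp / (tp + fn),                 # share of fires detected
        "specificity": tn / (tn + fp),                 # share of non-fires detected
        "accuracy": (tp + tn) / (tp + fn + tn + fp),   # share of all cases correct
    }

if __name__ == "__main__":
    # Counts from the abstract, read as correct predictions per class.
    fires_total, fires_hit = 104, 80
    nonfires_total, nonfires_hit = 9758, 7426
    summary = classification_summary(
        tp=fires_hit,
        fn=fires_total - fires_hit,
        tn=nonfires_hit,
        fp=nonfires_total - nonfires_hit,
    )
    for name, value in summary.items():
        print(f"{name}: {value:.3f}")
```

Under that reading, the accuracy works out to 7,506 correct out of 9,862 cases, which rounds to the 76.1% quoted in the abstract.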

Characteristics of Dioscorea alata L. Introduced from Tropical and Subtropical Regions (도입 마(Dioscorea alata L.)의 특성 분석)

  • Chang, K.J.;Yoo, K.O.;Park, C.H.;Park, J.I.;Hong, K.H.;Park, J.H.
    • Journal of Practical Agriculture & Fisheries Research / v.3 no.1 / pp.48-69 / 2001
  • Many clones of the genus Dioscorea have been introduced from tropical and subtropical regions since 1997. Some morphological characteristics of 33 clones of water yam (Dioscorea alata L.) were investigated in the field. Total weight per stump ranged from 90 to 2,147 g with an average of 610 g, and tuber number per stump ranged from 1.3 to 4.7 with an average of 2.8. The color tones of the tuber flesh were sorted into 3 categories (white, pale brown, and pale purple), and those of the leaves into 3 categories (green, heavy green, and purplish green). The intraspecific genetic relationships of 19 variation types of the yam, classified by external morphological characteristics such as leaf and tuber shape, were assessed by DNA analysis using random and specific primers. Twenty-two out of 113 primers (100 random [10-mer] primers, two 15-mer [M13 core sequence and (GGAT)4 sequence]) were used in PCR amplification. Only 12 primers, however, successfully amplified DNA in all of the analyzed plants, yielding 93 randomly and specifically amplified DNA fragments. The analyzed taxa showed very high polymorphism (69 bands, 71.0%), allowing individual taxa to be identified by DNA fingerprinting. The proportion of monomorphic bands among the total amplified DNA bands of each primer was low, under 50%. Similarity indices between accessions were computed from the PCR (polymerase chain reaction) data, and the intraspecific variations were closely related genetically, with similarity levels ranging from 0.66 to 0.90.

Analysis of Future Land Use and Climate Change Impact on Stream Discharge (미래토지이용 및 기후변화에 따른 하천유역의 유출특성 분석)

  • Ahn, So Ra;Lee, Yong Jun;Park, Geun Ae;Kim, Seong Joon
    • KSCE Journal of Civil and Environmental Engineering Research / v.28 no.2B / pp.215-224 / 2008
  • The effect on streamflow of future land use change and of vegetation index information under climate change scenarios was assessed using the SLURP (Semi-distributed Land-Use Runoff Process) model. The model was calibrated and verified using 4 years (1999-2002) of daily observed streamflow data for the upstream watershed ($260.4km^2$) of the Gyeongan water level gauging station. The future land uses (2030, 2060, 2090) were predicted with the CA-Markov technique after comparing the 2004 Landsat land use with the 2004 CA-Markov land use derived from the 1996 and 2000 land use data. The future land use showed a tendency for forest and paddy to decrease while urban, grassland, and bare ground increased. The future vegetation indices (2030, 2060, 2090) were estimated by a linear regression equation between the monthly NDVI of NOAA AVHRR images and the monthly mean temperature over 5 years (1998-2002). Using CCCma CGCM2 simulation results based on the SRES A2 and B2 scenarios of the IPCC (2030s, 2060s, 2090s), downscaled by the Stochastic Spatio-Temporal Random Cascade Model (SST-RCM) technique, the model predicted a future runoff ratio of 13% to 34%, whereas the runoff ratio for 1999-2002 was 59%. On the other hand, land use change increased the runoff ratio by about 0.1% to 1%.