• Title/Abstract/Keywords: Representative Bias

Search results: 77

Convolution Neural Network for Prediction of DNA Length and Number of Species (Korean title: Convolutional Neural Network for Predicting DNA Length and the Number of Mixed Species)

  • Sunghee Yang; Yeone Kim; Hyomin Lee
    • Korean Chemical Engineering Research / v.62 no.3 / pp.274-280 / 2024
  • Machine learning techniques based on neural networks have been employed in various fields such as disease gene discovery and diagnosis, drug development, and prediction of drug-induced liver injury. Disease features can be investigated through the molecular information of DNA. In this study, we developed a neural network to predict the length of DNA and the number of DNA species in a mixture solution, both of which are representative molecular information of DNA. To address the time-consuming nature of gel electrophoresis, the conventional analysis method, we analyzed the dynamic data of a microfluidic concentrating device. The dynamic data were reconstructed into a spatiotemporal map, which reduced the computational cost required for training and prediction. We employed a convolutional neural network to analyze the spatiotemporal map with higher accuracy. As a result, we successfully performed single DNA length prediction as single-variable regression, simultaneous prediction of multiple DNA lengths as multivariable regression, and prediction of the number of DNA species in a mixture as binary classification. Additionally, based on the composition of the training data, we proposed a solution to the problem of prediction bias. This study could support medical diagnosis using optical measurement, such as liquid biopsy of cell-free DNA and cancer diagnosis.
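The abstract states that the device's dynamic (video) data were reconstructed into a spatiotemporal map to cut training cost, and that a CNN then analyzed the map. The sketch below illustrates that idea only under our own assumptions: the microfluidic channel runs along the image width, the map is the mean over a hypothetical band of channel rows, and a single fixed averaging kernel stands in for one learned convolution filter. The function names and sizes are hypothetical; this is not the authors' pipeline.

```python
import numpy as np


def spatiotemporal_map(frames: np.ndarray, rows: slice) -> np.ndarray:
    """Collapse a video of shape (T, H, W) into a 2-D map of shape (T, W).

    Each frame is averaged over the (assumed) channel rows in `rows`,
    leaving one intensity profile along the channel per time step.
    Stacking the profiles puts time on axis 0 and position on axis 1.
    """
    return frames[:, rows, :].mean(axis=1)


def conv2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Plain 2-D cross-correlation with 'valid' padding, as in a CNN layer."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


# Hypothetical video: 200 frames of 64 x 256 pixels.
rng = np.random.default_rng(0)
video = rng.random((200, 64, 256))
st_map = spatiotemporal_map(video, slice(28, 36))
print(video.size, "->", st_map.size)  # 3276800 -> 51200
features = conv2d_valid(st_map, np.ones((3, 3)) / 9.0)
print(features.shape)  # (198, 254)
```

The size reduction (every frame collapsed to one row) is what makes the map cheaper to train on than the raw video; a real CNN would learn many kernels rather than use one fixed average.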

A stratified random sampling design for paddy fields: Optimized stratification and sample allocation for effective spatial modeling and mapping of the impact of climate changes on agricultural system in Korea (Korean title: Stratified Random Sampling of Farmland Spatial Grid Data: Optimal Stratification and Sample Size of Korean Farmland for Spatial Modeling of Climate Change Impacts on Agricultural Systems)

  • Minyoung Lee; Yongeun Kim; Jinsol Hong; Kijong Cho
    • Korean Journal of Environmental Biology / v.39 no.4 / pp.526-535 / 2021
  • Spatial sampling design plays an important role in GIS-based modeling studies because it increases modeling efficiency while reducing the cost of sampling. In agricultural systems research, demand for modeling based on high-resolution spatial data to predict and evaluate climate change impacts is growing rapidly, and the need for and importance of spatial sampling design are increasing accordingly. The purpose of this study was to design spatial sampling of paddy fields in Korea (11,386 grids at 1 km spatial resolution) for use in agricultural spatial modeling. A stratified random sampling design was developed and applied to the 2030s, 2050s, and 2080s under two RCP scenarios (RCP 4.5 and RCP 8.5). Twenty-five weather characteristics and four soil characteristics were used as stratification variables. Stratification and sample allocation were optimized to ensure the minimum sample size under given precision constraints for 16 target variables such as crop yield, greenhouse gas emission, and pest distribution. The precision and accuracy of the sampling were evaluated through sampling simulations based on the coefficient of variation (CV) and relative bias, respectively. As a result, the paddy fields were optimally divided into 5 to 21 strata with 46 to 69 samples. The evaluation showed that the target variables were within the precision constraints (CV < 0.05, except for crop yield) with low bias (below 3%). These results can help reduce sampling cost and computation time while maintaining high predictive power, and the design is expected to be widely used as a representative sample grid in various agricultural spatial modeling studies.
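The abstract describes optimizing stratification and allocation under precision constraints, then checking precision (CV) and accuracy (relative bias) by repeated sampling simulation; it does not say which optimizer was used. As a minimal illustration, the sketch below swaps in textbook Neyman allocation (samples proportional to stratum size times stratum standard deviation) and then runs the CV / relative-bias simulation on the stratified mean. The population, strata, and all function names are hypothetical.

```python
import numpy as np


def neyman_allocation(sizes, sds, n):
    """Split n samples across strata in proportion to N_h * S_h (Neyman).

    Largest-remainder rounding keeps the total exactly equal to n.
    """
    weights = np.asarray(sizes, float) * np.asarray(sds, float)
    raw = n * weights / weights.sum()
    alloc = np.floor(raw).astype(int)
    leftover = n - alloc.sum()
    # Give the remaining samples to strata with the largest fractional parts.
    order = np.argsort(raw - alloc)[::-1]
    alloc[order[:leftover]] += 1
    return alloc


def simulate_cv_and_bias(values, labels, alloc, reps, rng):
    """Repeat stratified sampling; return (CV, relative bias) of the mean.

    `alloc[k]` is the sample size for the k-th label in np.unique(labels).
    """
    values = np.asarray(values, float)
    labels = np.asarray(labels)
    strata = np.unique(labels)
    true_mean = values.mean()
    estimates = np.empty(reps)
    for r in range(reps):
        total = 0.0
        for h, n_h in zip(strata, alloc):
            pop = values[labels == h]
            sample = rng.choice(pop, size=n_h, replace=False)
            # Stratified estimator: weight each stratum mean by N_h / N.
            total += len(pop) / len(values) * sample.mean()
        estimates[r] = total
    cv = estimates.std(ddof=1) / estimates.mean()
    rel_bias = (estimates.mean() - true_mean) / true_mean
    return cv, rel_bias


# Hypothetical two-stratum population of grid values.
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(10, 1, 1000), rng.normal(20, 3, 1000)])
labels = np.repeat([0, 1], 1000)
alloc = neyman_allocation([1000, 1000], [1, 3], 40)  # 10 and 30 samples
cv, rel_bias = simulate_cv_and_bias(values, labels, alloc, 500, rng)
print(alloc, round(cv, 4), round(rel_bias, 4))
```

The more variable stratum receives more samples, which is how a fixed precision target can be met with fewer total samples than simple random sampling would need.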

Accuracy Analysis of ADCP Stationary Discharge Measurement for Unmeasured Regions (Korean title: Accuracy Analysis of Discharge Estimation in Unmeasured Regions for Stationary ADCP Measurement)

  • Kim, Jongmin; Kim, Seojun; Son, Geunsoo; Kim, Dongsu
    • Journal of Korea Water Resources Association / v.48 no.7 / pp.553-566 / 2015
  • Acoustic Doppler Current Profilers (ADCPs) can measure three-dimensional velocity vectors and bathymetry concurrently, rapidly, and efficiently, which enables them to document hydrodynamic and morphologic data at higher spatial and temporal resolution than other contemporary instruments. However, ADCPs are inevitably limited by unmeasured regions near the bottom, the surface, and the edges of a given cross-section. The velocity in those unmeasured regions is usually extrapolated or assumed when calculating discharge, which inevitably affects the accuracy of the discharge assessment. This study scrutinized a conventional extrapolation method (i.e., the 1/6 power law) for estimating the unmeasured regions in order to evaluate the accuracy of ADCP discharge measurements. For the comparative analysis, we collected spatially dense velocity data using an ADV as well as a stationary ADCP in a real-scale straight river channel, and tested the applicability of the 1/6 power law alongside the logarithmic law, another representative velocity-profile law. As a result, the logarithmic law fitted the actual velocity measurements better than the 1/6 power law. In particular, the 1/6 power law tended to underestimate the velocity in the near-surface region and overestimate it in the near-bottom region. This finding indicates that the 1/6 power law may not adequately represent the actual flow regime, so the resulting discharge estimates for both the unmeasured top and bottom regions can be biased. Therefore, the logarithmic law should be considered as an alternative, especially for stationary ADCP discharge measurement. In addition, it was found that the ADCP should be operated at water depths of at least 0.6 m at the left and right edges to better estimate edge discharge.
In the future, a similar comparative analysis may be required for the moving-boat ADCP discharge measurement method, which is more widely used in the field.
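The comparison above amounts to fitting two velocity-profile laws to the measured ADCP bins and integrating each law over the unmeasured top and bottom layers. The sketch below shows that computation for one vertical, assuming z is height above the bed (m), velocities in m/s, and discharge per unit width (m²/s). The least-squares fits (fixed exponent 1/6 for the power law; linear regression on ln z for the log law) and all names are illustrative assumptions, not the study's processing code.

```python
import numpy as np

POWER_EXPONENT = 1.0 / 6.0  # the 1/6 power law


def fit_power_law(z, u, b=POWER_EXPONENT):
    """Least-squares coefficient a in u = a * z**b, with b held fixed."""
    zb = np.asarray(z, float) ** b
    return float(np.dot(u, zb) / np.dot(zb, zb))


def power_law_discharge(a, z1, z2, b=POWER_EXPONENT):
    """Unit discharge: integral of a * z**b from z1 to z2 (m^2/s)."""
    return a / (b + 1.0) * (z2 ** (b + 1.0) - z1 ** (b + 1.0))


def fit_log_law(z, u):
    """Fit u = A * ln(z) + B; A plays the role of u_* / kappa."""
    slope, intercept = np.polyfit(np.log(z), u, 1)
    return float(slope), float(intercept)


def log_law_discharge(slope, intercept, z1, z2):
    """Unit discharge from the fitted log law between heights z1 and z2.

    Below z0 = exp(-B / A) the fitted velocity would be negative, so the
    integral starts no lower than z0 (velocity taken as zero below it).
    Assumes slope > 0.
    """
    z0 = np.exp(-intercept / slope)
    lo, hi = max(z1, z0), z2
    if hi <= lo:
        return 0.0

    def antiderivative(z):
        return slope * (z * np.log(z) - z) + intercept * z

    return float(antiderivative(hi) - antiderivative(lo))


# Hypothetical vertical: depth 2.0 m, ADCP bins cover 0.3-1.5 m above bed.
H, z_bottom, z_top = 2.0, 0.3, 1.5
z = np.linspace(z_bottom, z_top, 13)
u = 0.25 * np.log(z / 0.01)  # synthetic log-shaped "measurements"

a = fit_power_law(z, u)
slope, intercept = fit_log_law(z, u)
print("top:   ", power_law_discharge(a, z_top, H),
      log_law_discharge(slope, intercept, z_top, H))
print("bottom:", power_law_discharge(a, 0.0, z_bottom),
      log_law_discharge(slope, intercept, 0.0, z_bottom))
```

Running both extrapolations on the same bins makes the direction of any disagreement in the top and bottom layers easy to inspect.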

A Meta-analysis of Ambient Air Pollution in Relation to Daily Mortality in Seoul, 1991-1995 (Korean title: A Study of the Association between Air Pollution and Premature Death in Seoul Using Meta-analysis, 1991-1995)

  • Dockery, Douglas W.; Kim, Chun-Bae; Jee, Sun-Ha; Chung, Yong; Lee, Jong-Tae
    • Journal of Preventive Medicine and Public Health / v.32 no.2 / pp.177-182 / 1999
  • Objectives: To reexamine the association between air pollution and daily mortality in Seoul, Korea, by applying meta-analysis to data from 1991 through 1995. Methods: A separate Poisson regression analysis was conducted for each district within the Seoul metropolitan area, regressing daily death counts on the levels of each ambient air pollutant, such as total suspended particulates (TSP), sulfur dioxide (SO$_2$), and ozone (O$_3$), while controlling for variability in weather conditions. We calculated a weighted mean of the district estimates and its standard error as the meta-analysis summary. Results: The p-value of the homogeneity test for each pollutant model was small (p<0.01) because of the large disparity among district-specific estimates. Therefore, all results reported here were estimated from the random-effects model. Using the weighted mean, the relative mortality for a 100 $\mu$g/m$^3$ increase in the 3-day moving average of TSP was 1.034 (95% CI 1.009-1.059). Mortality was estimated to increase by 6% (95% CI 3-10%) and 3% (95% CI 0-6%) for each 50 ppb increase in the 9-day moving average of SO$_2$ and the 1-hr maximum O$_3$, respectively. Conclusions: Like most air pollution epidemiologic studies, this meta-analysis cannot avoid measurement misclassification, since no personal measurements were taken. However, measurement bias can be expected to be smaller in district-specific estimates, since each monitoring station better represents the air quality of its matched district. Results similar to those of previous studies indicate that air pollution at current levels has health effects in many industrialized countries, including Korea.
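The Methods and Results describe pooling district-specific Poisson coefficients into a weighted mean, testing homogeneity, and falling back to a random-effects model. The abstract does not name the estimator; the sketch below assumes the widely used DerSimonian-Laird method, plus the standard conversion of a log-linear coefficient into a relative risk for a chosen pollutant increment. Function names are hypothetical and the inputs in the usage are toy values, not the study's estimates.

```python
import math


def random_effects_pool(estimates, std_errors):
    """DerSimonian-Laird random-effects pooling of district estimates.

    Returns (pooled estimate, its standard error, Cochran's Q, tau^2).
    Q is compared with a chi-square on k - 1 degrees of freedom to test
    homogeneity; a small p-value argues for the random-effects summary.
    """
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum_w
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    k = len(estimates)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-district variance
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), q, tau2


def relative_risk(beta, se, increment, z=1.96):
    """Relative risk and 95% CI for a pollutant increment.

    beta is a Poisson log-rate coefficient per unit of pollutant.
    """
    return (math.exp(beta * increment),
            math.exp((beta - z * se) * increment),
            math.exp((beta + z * se) * increment))


# Toy district coefficients (per ug/m^3) and standard errors.
pooled, se, q, tau2 = random_effects_pool([0.0002, 0.0005, 0.0003],
                                          [0.0001, 0.0001, 0.0002])
print(relative_risk(pooled, se, 100))  # RR and CI for a 100-unit increase
```

When Q is small relative to k - 1, tau² is clipped to zero and the result reduces to the fixed-effect weighted mean.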

The Impacts of Smoking Bans on Smoking in Korea (Korean title: The Effect of Strengthened Smoking-Ban Laws on Smoking)

  • Kim, Beomsoo; Kim, Ahram
    • KDI Journal of Economic Policy / v.31 no.2 / pp.127-153 / 2009
  • There is growing concern about the potentially harmful effects of second-hand or environmental tobacco smoke. As a result, workplace smoking bans have become more prevalent worldwide. In Korea, workplace smoking-ban policy became more restrictive in 2003, when the National Health Promotion Act was amended. The amended law requires all office buildings larger than 3,000 square meters (multi-purpose buildings larger than 2,000 square meters) to be smoke-free, so many indoor offices became non-smoking areas. Previous studies in other countries have often reached contradictory conclusions about the effects of workplace smoking bans on smoking behavior. In addition, no study in Korea has yet examined the causal impact of smoking bans on smoking behavior, and the situation in Korea might differ from that in other countries. Using the 2001 and 2005 Korea National Health and Nutrition Surveys, which are representative of the Korean population, we examine the impact of the law change on current smoking status and cigarettes smoked per day. Because the amended law affected the whole country at the same time and smoking rates were already declining before the legislation, the challenge is to isolate the true impact of the law. We therefore compare indoor occupations, which are constrained by the law change, with outdoor occupations, which are less affected. Since the data were collected before (2001) and after (2005) the law change for both the treatment group (indoor occupations) and the control group (outdoor occupations), we use a difference-in-differences method. We restrict our sample to working age (20 to 65), since this is the population relevant to the workplace smoking-ban policy.
We also restrict the sample to indoor occupations (executive or administrative, and administrative support) and outdoor occupations (sales and low-skilled workers), after dropping the unemployed and military personnel, for whom it is unclear whether they belong to the treatment or the control group. This classification is supported by responses to a question on workplace smoking-ban policy that appears only in the 2005 survey: sixty-eight percent of respondents in indoor occupations reported an office smoking-ban policy, compared with forty percent in outdoor occupations. The estimated impact on current smoking is a 4.1 percentage-point decline, and cigarettes per day show a statistically significant decline of 2.5 cigarettes per day. Given average consumption of sixteen cigarettes per day among smokers, this is a decline of about sixteen percent in daily consumption, which is substantial. We tested robustness by using the same sample across the two surveys and by using a Tobit model, and our results are robust to both checks. Our classification of treatment and control groups may contain measurement error, which would lead to attenuation bias; since we nonetheless find statistically significant impacts, our estimates might be a lower bound on the true effects. The magnitude of our findings is similar to that of previous studies that found significant impacts: previous estimates ranged from 1.37 to 3.9 cigarettes for cigarettes per day and from 1 to 7.8 percentage points for current smoking.
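The identification strategy above is a two-group, two-period difference-in-differences: the change among indoor (treated) occupations minus the change among outdoor (control) occupations, so that the nationwide downward trend cancels. The sketch below computes that 2×2 estimate from group means and reproduces the abstract's percentage arithmetic (2.5 / 16). With individual data the same number is the coefficient on a treated×post interaction in OLS; the authors' full specification (covariates, Tobit check) is not reproduced. Function names and the toy data are hypothetical.

```python
from statistics import mean


def diff_in_diff(outcomes):
    """2x2 difference-in-differences estimate from group-period means.

    `outcomes` maps (group, period) to a list of individual outcomes, with
    group in {"treated", "control"} and period in {"pre", "post"}.
    Subtracting the control group's change removes the common trend.
    """
    m = {key: mean(vals) for key, vals in outcomes.items()}
    treated_change = m[("treated", "post")] - m[("treated", "pre")]
    control_change = m[("control", "post")] - m[("control", "pre")]
    return treated_change - control_change


def percent_of_baseline(change, baseline):
    """Express an absolute change as a percentage of a baseline level."""
    return 100.0 * change / baseline


# Toy smoker indicators (1 = current smoker); not survey data.
toy = {
    ("treated", "pre"): [1, 1, 0, 0],
    ("treated", "post"): [1, 0, 0, 0],
    ("control", "pre"): [1, 1, 0, 0],
    ("control", "post"): [1, 0, 0, 0, 1, 1, 0, 0],
}
print(diff_in_diff(toy))  # -0.125

# Figures quoted in the abstract: 2.5 fewer cigarettes/day vs. 16/day.
print(percent_of_baseline(2.5, 16))  # 15.625
```

In the toy data both groups decline, but the treated group declines by 0.125 more, which is the part attributed to the policy.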

THE LUMINOSITY-LINEWIDTH RELATION AS A PROBE OF THE EVOLUTION OF FIELD GALAXIES

  • GUHATHAKURTA PURAGRA; ING KRISTINE; RIX HANS-WALTER; COLLESS MATTHEW; WILLIAMS TED
    • Journal of The Korean Astronomical Society / v.29 no.spc1 / pp.63-64 / 1996
  • The nature of distant faint blue field galaxies remains a mystery, despite the attention devoted to this subject over the last decade. Galaxy counts, particularly in the optical and near-ultraviolet bandpasses, have been shown to be well in excess of those expected in the 'no-evolution' scenario. This has usually been taken to imply that galaxies were brighter in the past, presumably due to a higher rate of star formation. More recently, redshift surveys of galaxies as faint as $B \approx 24$ have shown that the mean redshift of faint blue galaxies is lower than that predicted by standard evolutionary models (designed to fit the galaxy counts). The galaxy number count data and redshift data suggest that evolutionary effects are most prominent at the faint end of the galaxy luminosity function. While these data constrain the form of evolution of the overall luminosity function, they do not constrain evolution in individual galaxies. We are carrying out a series of observations as part of a long-term program aimed at a better understanding of the nature and amount of luminosity evolution in individual galaxies. Our study uses the luminosity-linewidth relation (Tully-Fisher relation) for disk galaxies as a tool to study luminosity evolution. Several studies of a related nature are being carried out by other groups. A specific experiment to test a 'no-evolution' hypothesis is presented here. We have used the AUTOFIB multifibre spectrograph on the 4-metre Anglo-Australian Telescope (AAT) and the Rutgers Fabry-Perot imager on the Cerro Tololo Inter-American Observatory (CTIO) 4-metre telescope to measure the internal kinematics of a representative sample of faint blue field galaxies in the redshift range z = 0.15-0.4.
The emission-line profiles of [OII] and [OIII] in a typical sample galaxy are significantly broader than the instrumental resolution (100-120 km s$^{-1}$), so the linewidth can be determined reliably. Detailed and realistic simulations based on the properties of nearby, low-luminosity spirals are used to convert the measured linewidth into an estimate of the characteristic rotation speed, making statistical corrections for the effects of inclination, non-uniform distribution of ionized gas, rotation curve shape, finite fibre aperture, etc. The (corrected) mean characteristic rotation speed for our distant galaxy sample is compared with the mean rotation speed of local galaxies of comparable blue luminosity and colour. The typical galaxy in our distant sample has a B-band luminosity of about 0.25 $L^\ast$ and a colour that corresponds to the Sb-Sd/Im range of Hubble types. Details of the AUTOFIB fibre spectroscopic study are described by Rix et al. (1996). Follow-up deep near-infrared imaging with the 10-metre Keck telescope + NIRC combination and high-angular-resolution imaging with the Hubble Space Telescope's WFPC2 are being used to determine the structural and orientation parameters of galaxies on an individual basis. This information is being combined with the spatially resolved CTIO Fabry-Perot data to study the internal kinematics of distant galaxies (Ing et al. 1996). The two main questions addressed by these (preliminary) studies are: 1. Do galaxies of a given luminosity and colour have the same characteristic rotation speed in the distant and local Universe? The distant galaxies in our AUTOFIB sample have a mean characteristic rotation speed of $\approx$70 km s$^{-1}$ after correction for measurement bias (Fig. 1); this is inconsistent with the characteristic rotation speed of local galaxies with comparable photometric properties (105 km s$^{-1}$) at the >99% significance level (Fig. 2).
A straightforward explanation for this discrepancy is that faint blue galaxies were about 1-1.5 mag brighter (in the B band) at $z \approx 0.25$ than their present-day counterparts. 2. What is the nature of the internal kinematics of faint field galaxies? The linewidths of these faint galaxies appear to be dominated by global disk rotation. The larger galaxies in our sample are about 2"-.5" in diameter, so one can gain direct insight into the nature of their internal velocity fields from the CTIO Fabry-Perot data, taken in $\approx$1" seeing. A montage of Fabry-Perot data is shown in Fig. 3. The linewidths are too large (by $5\sigma$) to be caused by turbulence in giant HII regions.
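The step from "70 versus 105 km s⁻¹" to "about 1-1.5 mag brighter" follows from the Tully-Fisher relation: if M = -s·log10(V) + const, galaxies of a given luminosity that rotate slower than local ones are brighter than local galaxies of the same rotation speed by s·log10(V_local / V_distant). The sketch below does that arithmetic and also shows the simple textbook deprojection W ≈ 2 V sin i for a single galaxy. The slope of 7.5 mag per dex is an assumed illustrative B-band value, not taken from the paper, and the authors' simulation-based statistical corrections are not reproduced.

```python
import math


def rotation_speed_from_linewidth(width, inclination_deg):
    """Textbook deprojection: a full linewidth W ~ 2 * V_rot * sin(i)."""
    return width / (2.0 * math.sin(math.radians(inclination_deg)))


def tf_magnitude_offset(v_distant, v_local, slope_mag_per_dex):
    """Brightening (mag) implied by a Tully-Fisher relation
    M = -slope * log10(V) + const, when distant galaxies of a given
    luminosity rotate at v_distant instead of v_local."""
    return slope_mag_per_dex * math.log10(v_local / v_distant)


# Mean speeds quoted in the abstract; the 7.5 mag/dex slope is assumed.
offset = tf_magnitude_offset(70.0, 105.0, 7.5)
print(round(offset, 2))  # 1.32
```

With this assumed slope the offset falls inside the quoted 1-1.5 mag range; a shallower or steeper slope shifts it proportionally.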

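Rainfall image DB construction for rainfall intensity estimation from CCTV videos: focusing on experimental data in a climatic environment chamber (Korean title: Development of a Methodology for Building an Appropriate Rainfall Image DB Based on Real-Environment Experimental Data for CCTV Video-Based Rainfall Intensity Estimation)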

  • Byun, Jongyun; Jun, Changhyun; Kim, Hyeon-Joon; Lee, Jae Joon; Park, Hunil; Lee, Jinwook
    • Journal of Korea Water Resources Association / v.56 no.6 / pp.403-417 / 2023
  • In this research, a methodology was developed for constructing an appropriate rainfall image database for estimating rainfall intensity from CCTV video. The database was constructed in the Large-Scale Climate Environment Chamber of the Korea Conformity Laboratories, which can control variables that are highly irregular and variable in real environments. A total of 1,728 scenarios were designed under five experimental conditions; of these, 36 scenarios with a total of 97,200 frames were selected. Rain streaks were extracted with the k-nearest neighbor algorithm by calculating the difference between each image and the background. To prevent overfitting, data whose pixel values exceeded a set threshold relative to each image's average pixel value were selected. The area with maximum pixel variability was located by shifting a window in 10-pixel steps and was set as the representative area (180×180) of the original image. After resizing to 120×120 as input for the convolutional neural network model, image augmentation was performed under unified shooting conditions. For 92% of the data, the absolute PBIAS was within 10%. The final results of this study have the potential to enhance the accuracy and efficacy of existing real-world CCTV systems through transfer learning.
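Two steps in this abstract are easy to make concrete: scanning a 180×180 window in 10-pixel steps to find the region of maximum pixel variability, and scoring estimates by PBIAS. The sketch below assumes "variability" means pixel variance within the window and uses PBIAS = 100·Σ(estimated − observed)/Σ(observed); some hydrology texts use the opposite sign, which does not affect the absolute-value criterion. Function names and the synthetic image are hypothetical, and the KNN background subtraction step is not shown.

```python
import numpy as np


def max_variability_window(image, size=180, stride=10):
    """Top-left corner (y, x) of the size x size window with the largest
    pixel variance, scanning in `stride`-pixel steps."""
    best_var, best_pos = -1.0, (0, 0)
    height, width = image.shape
    for y in range(0, height - size + 1, stride):
        for x in range(0, width - size + 1, stride):
            v = float(image[y:y + size, x:x + size].var())
            if v > best_var:
                best_var, best_pos = v, (y, x)
    return best_pos


def pbias(observed, estimated):
    """Percent bias of estimates relative to observations."""
    observed = np.asarray(observed, float)
    estimated = np.asarray(estimated, float)
    return 100.0 * np.sum(estimated - observed) / np.sum(observed)


def share_within(observed, estimated, tol=10.0):
    """Fraction of samples whose individual |PBIAS| is at most tol percent."""
    observed = np.asarray(observed, float)
    estimated = np.asarray(estimated, float)
    per_sample = 100.0 * (estimated - observed) / observed
    return float(np.mean(np.abs(per_sample) <= tol))


# Synthetic frame: flat background with one high-contrast 180 x 180 patch.
img = np.zeros((400, 400))
img[200:380, 100:280] = (np.indices((180, 180)).sum(axis=0) % 2) * 255.0
print(max_variability_window(img))  # (200, 100)
print(share_within([10.0, 10.0], [10.5, 12.0]))  # 0.5
```

The selected crop would then be resized to the CNN input size (120×120 in the abstract) before augmentation.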