• Title/Abstract/Keywords: Incomplete Dataset


The Colorectal Cancer Mortality-to-Incidence Ratio as a Potential Cancer Surveillance Measure in Asia

  • Sunkara, Vasu;Hebert, James R
    • Asian Pacific Journal of Cancer Prevention / v.17 no.9 / pp.4323-4326 / 2016
  • Background: The cancer mortality-to-incidence ratio (MIR) has been established as an important measure of health disparities in local and global settings. Past work has corroborated a link between the colorectal cancer MIR and the World Health Organization (WHO) Health System ranking. The literature further documents that many Asian countries have incomplete cancer registries and lack comprehensive colorectal cancer screening guidelines. Materials and Methods: The colorectal cancer MIR values for 23 Asian countries were calculated from data obtained from the 2012 GLOBOCAN database. The 2000 WHO Health System rankings were used as a proxy for health system infrastructure and responsiveness. A regression equation was calculated with the MIR as the dependent variable and the WHO Health System ranking as the independent variable. Predicted MIR values were then calculated from the regression results. Actual MIR values that deviated from the predicted MIR by more than 0.20 were removed as 'divergent' points, and the regression was then refitted. Goodness-of-fit for both regressions was assessed with R-squared. Results: Asian countries have a relatively wide colorectal cancer MIR range, from a minimum of 0.24 to a maximum of 0.86. For the full dataset, the adjusted R-squared value of the regression was 0.53. The equation was then used to calculate a predicted MIR, whereby two data points were identified as 'divergent' and removed. The adjusted R-squared for the edited dataset increased to 0.66. Conclusions: Asian countries have a marked range in their colorectal cancer MIR values, and there is a strong correlation with the WHO Health System ranking. These results corroborate the contribution of the MIR as a potentially robust tool for monitoring changes in colorectal cancer care in Asian nations.
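The fit, filter, and refit procedure described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the data are synthetic rather than GLOBOCAN values, and the 0.20 threshold is applied here to the absolute deviation, which is an assumption about how 'divergent' was defined.

```python
import numpy as np

# Synthetic illustration only: eight made-up (ranking, MIR) pairs, one of
# which (index 3) is deliberately placed far from the trend line.
RANK = np.array([10, 30, 50, 70, 90, 110, 130, 150], dtype=float)
MIR = np.array([0.28, 0.34, 0.40, 0.86, 0.52, 0.58, 0.64, 0.70])


def fit_line(x, y):
    """Ordinary least squares fit y = intercept + slope * x."""
    slope, intercept = np.polyfit(x, y, 1)  # highest power first
    return intercept, slope


def adjusted_r2(x, y, intercept, slope, n_predictors=1):
    """Adjusted R-squared of a fitted straight line."""
    predicted = intercept + slope * x
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n = len(y)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def refit_without_divergent(rank, mir, threshold=0.20):
    """Fit, drop points whose |actual - predicted| exceeds threshold, refit.

    Returns (adjusted R2 on full data, adjusted R2 on edited data, keep mask).
    """
    rank = np.asarray(rank, dtype=float)
    mir = np.asarray(mir, dtype=float)
    b0, b1 = fit_line(rank, mir)
    full = adjusted_r2(rank, mir, b0, b1)
    keep = np.abs(mir - (b0 + b1 * rank)) <= threshold
    b0_k, b1_k = fit_line(rank[keep], mir[keep])
    edited = adjusted_r2(rank[keep], mir[keep], b0_k, b1_k)
    return full, edited, keep
```

Calling `refit_without_divergent(RANK, MIR)` on the synthetic data drops the single planted outlier and yields a higher adjusted R-squared for the edited set, mirroring the direction of the change the abstract reports.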

Automatic Electronic Cleansing in Computed Tomography Colonography Images using Domain Knowledge

  • Manjunath, KN;Siddalingaswamy, PC;Prabhu, GK
    • Asian Pacific Journal of Cancer Prevention / v.16 no.18 / pp.8351-8358 / 2016
  • Electronic cleansing is an image post-processing technique in which tagged colonic content is subtracted from the colon in CT colonography (CTC) images. Post-processing artifacts include: 1) soft tissue degradation; 2) incomplete cleansing; 3) misclassification of polyps due to pseudo-enhanced voxels; and 4) pseudo soft tissue structures. The objective of the study was to subtract the tagged colonic content without losing the soft tissue structures. This paper proposes a novel adaptive method that addresses the first three problems using a multi-step algorithm. It uses a new edge model-based method that involves colon segmentation, a priori information on the Hounsfield units (HU) of different colonic contents at specific tube voltages, subtraction of the tagging materials, restoration of the soft tissue structures based on selective HU, removal of the air-contrast boundary, and a filter to clean minute particles from improperly tagged endoluminal fluids, which appear as noise. The main finding was that submerged soft tissue structures were fully preserved and the pseudo-enhanced intensities were corrected without any artifact. The method was implemented with multithreading for parallel processing on a high-performance computer. The technique was applied to a fecal-tagged dataset (30 patients) in which the tagging agent had not been completely removed from the colon. The results were then qualitatively validated by radiologists for image processing artifacts.

Review for time-dependent ROC analysis under diverse survival models (생존 분석 자료에서 적용되는 시간 가변 ROC 분석에 대한 리뷰)

  • Kim, Yang-Jin
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.35-47 / 2022
  • The receiver operating characteristic (ROC) curve was developed to quantify the ability of marker values (covariates) to classify a response variable and has been extended to survival data with diverse missing data structures. When survival data are treated as binary data (alive or dead) at each time point, evaluating the ROC curve at every time point yields a time-dependent ROC curve and a time-dependent area under the curve (AUC). In particular, a follow-up study involves changes in the cohort over time and incomplete data structures such as censoring and competing risks. In this paper, we review time-dependent ROC estimators in several contexts and perform simulations to check the performance of each estimator. We also analyze a dementia dataset to compare the prognostic power of markers.
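For orientation, one widely used way to write these time-dependent quantities is the cumulative/dynamic definition below. It is shown as an illustration only: the abstract does not say which definitions the review covers, and the symbols $M$ (marker), $T$ (event time), $c$ (threshold), and $t$ (evaluation time) are notation assumed here.

```latex
% Cumulative sensitivity and dynamic specificity at time t for threshold c
\mathrm{Se}(c, t) = P(M > c \mid T \le t), \qquad
\mathrm{Sp}(c, t) = P(M \le c \mid T > t)

% Time-dependent AUC: probability that a subject with an event by time t
% has a larger marker value than a subject still event-free at time t
\mathrm{AUC}(t) = P(M_i > M_j \mid T_i \le t,\; T_j > t)
```

With censoring, $T$ is not observed for every subject, so these probabilities cannot be computed by simple counting, which is why dedicated estimators are needed.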

Face inpainting via Learnable Structure Knowledge of Fusion Network

  • Yang, You;Liu, Sixun;Xing, Bin;Li, Kesen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.877-893 / 2022
  • With the development of deep learning, face inpainting has improved significantly in the past few years. Although image inpainting frameworks integrated with generative adversarial networks or attention mechanisms have enhanced the semantic understanding among facial components, problems in reconstructing corrupted regions still merit exploration, such as blurred edge structure, excessive smoothness, unreasonable semantic understanding, and visual artifacts. To address these issues, we propose a Learnable Structure Knowledge of Fusion Network (LSK-FNet), which learns prior knowledge through an edge generation network for image inpainting. The architecture involves two steps. First, structure information obtained by the edge generation network is used as prior knowledge for the face inpainting network. Second, the generated prior knowledge and the incomplete image are fed together into the face inpainting network to obtain the fused information. To improve inpainting accuracy, both gated convolution and region normalization are applied in the proposed model. We evaluate LSK-FNet qualitatively and quantitatively on the CelebA-HQ dataset. The experimental results demonstrate that LSK-FNet improves the edge structure and details of facial images. Our model surpasses the compared models on the L1, PSNR, and SSIM metrics. When the masked region is less than 20%, the L1 loss is reduced by more than 4.3%.
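Two of the metrics named in this abstract, L1 error and PSNR, have simple standard definitions, sketched below with NumPy. The function names are hypothetical, the 8-bit peak value of 255 is an assumption, and nothing here reflects the authors' evaluation code; SSIM is omitted because it needs a windowed implementation.

```python
import numpy as np


def l1_error(reference, restored):
    """Mean absolute pixel difference between two images of equal shape."""
    ref = np.asarray(reference, dtype=np.float64)
    res = np.asarray(restored, dtype=np.float64)
    return float(np.mean(np.abs(ref - res)))


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB; max_value=255 assumes 8-bit images."""
    ref = np.asarray(reference, dtype=np.float64)
    res = np.asarray(restored, dtype=np.float64)
    mse = np.mean((ref - res) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))
```

Lower L1 error and higher PSNR both indicate a restored image closer to the reference. Casting to float before subtracting avoids wrap-around on unsigned 8-bit arrays.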

Inferring the Transit Trip Destination Zone of Smart Card User Using Trip Chain Structure (통행사슬 구조를 이용한 교통카드 이용자의 대중교통 통행종점 추정)

  • SHIN, Kangwon
    • Journal of Korean Society of Transportation / v.34 no.5 / pp.437-448 / 2016
  • Previous studies have suggested a method for inferring transit trip destinations by constructing trip chains from incomplete (missing-destination) smart card data obtained from entry fare control systems. To explore the feasibility of this method, transit trip chains are constructed from pre-paid smart card tagging data collected in Busan on weekdays in October 2014 by tracing the card IDs, tagging times (boarding, alighting, transfer), and the trip linking distances between two consecutive transit trips in a daily sequence. Assuming that most trips in a transit trip chain are linked successively, the destination zone of each transit trip is inferred as the origin zone of the next linked trip. Applying the model to complete trips with observed OD reveals that about 82% of the inferred trip destinations match the observed trip destinations, and that the inference error, defined as the distance between the inferred and observed alighting stops, is minimized when the trip linking distance is less than or equal to 0.5 km. When the model is applied to incomplete trips with missing destinations, the overall destination missing rate decreases from 71.40% to 21.74%, and approximately 77% of the trips still missing a destination are single transit trips, for which the destination cannot be inferred. In addition, the model remarkably reduces the destination missing rate of multiple incomplete transit trips from 69.56% to 6.27%. Spearman's rank correlation and Chi-squared goodness-of-fit tests showed that the ranks of transit trips for each zone are not significantly affected by the inferred trips, but the transit trip distributions based only on the small number of complete trips differ significantly from those based on complete and inferred trips. Therefore, it is concluded that the model should be applicable for deriving realistic transit trip patterns in cities with incomplete smart card data.
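The core inference rule in this abstract (a trip's destination zone is taken to be the next linked trip's origin zone) can be sketched as below. The record layout and the function name are hypothetical, and the handling of the last trip of the day is an assumption: the abstract does not say how it is treated, so this sketch leaves it unresolved. The linking-distance check is omitted.

```python
from collections import defaultdict


def infer_destinations(trips):
    """Fill missing destination zones from the next trip's origin zone.

    trips: iterable of dicts with the (hypothetical) keys
      card_id, day, board_time, origin_zone, dest_zone (None if missing).
    Returns a list of new dicts with an added 'inferred_dest' key.
    """
    # Group trips into daily chains per card.
    chains = defaultdict(list)
    for trip in trips:
        chains[(trip["card_id"], trip["day"])].append(trip)

    result = []
    for chain in chains.values():
        ordered = sorted(chain, key=lambda t: t["board_time"])
        following_trips = ordered[1:] + [None]
        for current, following in zip(ordered, following_trips):
            inferred = current["dest_zone"]
            # Only infer when the destination is missing and a next trip exists;
            # single trips and last trips of the day stay unresolved here.
            if inferred is None and following is not None:
                inferred = following["origin_zone"]
            result.append({**current, "inferred_dest": inferred})
    return result
```

A chain with a single trip never receives an inferred destination under this rule, which is consistent with the abstract's remark that single transit trips account for most of the remaining missing destinations.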

Feature Extraction to Detect Hoax Articles (낚시성 인터넷 신문기사 검출을 위한 특징 추출)

  • Heo, Seong-Wan;Sohn, Kyung-Ah
    • Journal of KIISE / v.43 no.11 / pp.1210-1215 / 2016
  • Readership of online newspapers has grown with the proliferation of smart devices. However, fierce competition between Internet newspaper companies has resulted in a large increase in the number of hoax articles. Hoax articles are those whose title does not convey the content of the main story, giving readers a wrong impression of the contents. We note that hoax articles have certain characteristics, such as unnecessary celebrity quotations, a mismatch between the title and content, or incomplete sentences. Based on these, we extract and validate features to identify hoax articles. We build a large-scale training dataset by analyzing text keywords in replies to articles and extract five effective features. We evaluate the performance of a support vector machine classifier on the extracted features and observe 92% accuracy on our validation set. In addition, we present a selective bigram model to measure the consistency between the title and content, which can be used effectively to analyze short texts in general.
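To illustrate the general idea of scoring title and content consistency with bigrams, here is a plain bigram-overlap sketch. It is not the paper's selective bigram model, whose details the abstract does not give; the function names and the whitespace tokenization are assumptions made for illustration.

```python
def bigrams(text):
    """Set of adjacent lowercase word pairs from whitespace-split text."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def title_body_consistency(title, body):
    """Fraction of the title's bigrams that also occur in the body (0.0 to 1.0)."""
    title_bigrams = bigrams(title)
    if not title_bigrams:
        return 0.0  # a title with fewer than two words has no bigrams
    shared = title_bigrams & bigrams(body)
    return len(shared) / len(title_bigrams)
```

A low score suggests the title's phrasing never appears in the body, which is one possible signal of a mismatch; a real system would need language-appropriate tokenization, especially for Korean text.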

Oil Pipeline Weld Defect Identification System Based on Convolutional Neural Network

  • Shang, Jiaze;An, Weipeng;Liu, Yu;Han, Bang;Guo, Yaodan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.14 no.3 / pp.1086-1103 / 2020
  • The automatic identification and classification of image-based weld defects is a difficult task because of the complex texture of X-ray images of weld defects. Several deep learning methods for automatically identifying welds were proposed and tested. In this work, four different deep convolutional neural networks were evaluated and compared on a set of 1,631 images. Six types of weld defect are classified: concavity, undercut, bar defects, circular defects, unfused defects, and incomplete penetration. Another contribution of this paper is a CNN model, "RayNet", trained from scratch on the dataset. In the experiments, the parameters of the convolution operation and the size of the input image are compared and analyzed, classification results are given for each defect, and partial feature maps from feature extraction are shown. The classification accuracy reaches 96.5%, which is 6.6% higher than that of other existing fine-tuned models and also improves on traditional image processing methods, showing that a model trained from scratch can perform well on small-scale datasets. The proposed method can assist evaluators in classifying pipeline welding defects.

Variance Components and Genetic Parameters for Milk Production and Lactation Pattern in an Ethiopian Multibreed Dairy Cattle Population

  • Gebreyohannes, Gebregziabher;Koonawootrittriron, Skorn;Elzo, Mauricio A.;Suwanasopee, Thanathip
    • Asian-Australasian Journal of Animal Sciences / v.26 no.9 / pp.1237-1246 / 2013
  • The objective of this study was to estimate variance components and genetic parameters for lactation milk yield (LY), lactation length (LL), average milk yield per day (YD), initial milk yield (IY), peak milk yield (PY), days to peak (DP) and parameters (ln(a) and c) of the modified incomplete gamma function (MIG) in an Ethiopian multibreed dairy cattle population. The dataset was composed of 5,507 lactation records collected from 1,639 cows in three locations (Bako, Debre Zeit and Holetta) in Ethiopia from 1977 to 2010. Parameters for MIG were obtained from regression analysis of monthly test-day milk data on days in milk. The cows were purebred (Bos indicus) Boran (B) and Horro (H) and their crosses with different fractions of Friesian (F), Jersey (J) and Simmental (S). There were 23 breed groups (B, H, and their crossbreds with F, J, and S) in the population. Fixed and mixed models were used to analyse the data. The fixed model considered herd-year-season, parity and breed group as fixed effects, and residual as random. The single- and two-trait mixed animal repeatability models considered the fixed effects of herd-year-season and parity subclasses, breed as a function of cow H, F, J, and S breed fractions and general heterosis as a function of heterozygosity, and the random additive animal, permanent environment, and residual effects. For the analysis of LY, LL was added as a fixed covariate to all models. Variance components and genetic parameters were estimated using average information restricted maximum likelihood procedures. The results indicated that all traits were affected (p<0.001) by the considered fixed effects. High-grade $B \times F$ cows (3/16 B, 13/16 F) had the highest least squares means (LSM) for LY ($2{,}490 \pm 178.9$ kg), IY ($10.5 \pm 0.8$ kg), PY ($12.7 \pm 0.9$ kg), YD ($7.6 \pm 0.55$ kg) and LL ($361.4 \pm 31.2$ d), while B cows had the lowest LSM values for these traits. The LSM of LY, IY, YD, and PY tended to increase from the first to the fifth parity. Single-trait analyses yielded low heritability ($0.03 \pm 0.03$ and $0.08 \pm 0.02$) and repeatability ($0.14 \pm 0.01$ to $0.24 \pm 0.02$) estimates for LL, DP and parameter c. Medium heritability ($0.21 \pm 0.03$ to $0.33 \pm 0.04$) and repeatability ($0.27 \pm 0.02$ to $0.53 \pm 0.01$) estimates were obtained for LY, IY, PY, YD and ln(a). Genetic correlations between LY, IY, PY, YD, ln(a), and LL ranged from 0.59 to 0.99. Spearman's rank correlations between sire estimated breeding values for LY, LL, IY, PY, YD, ln(a) and c were positive (0.67 to 0.99, p<0.001). These results suggested that selection for IY, PY, YD, or LY would genetically improve lactation milk yield in this Ethiopian dairy cattle population.
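For readers less familiar with repeatability animal models, heritability and repeatability are conventionally derived from the three random variance components named in the abstract (additive animal, permanent environment, residual). The relations below are the textbook definitions, given as a sketch; the symbols $\sigma^2_a$, $\sigma^2_{pe}$, and $\sigma^2_e$ are notation assumed here, not taken from the paper.

```latex
% Heritability: share of phenotypic variance due to additive genetic effects
h^2 = \frac{\sigma^2_a}{\sigma^2_a + \sigma^2_{pe} + \sigma^2_e}

% Repeatability: share due to all permanent (genetic + environmental) effects
r = \frac{\sigma^2_a + \sigma^2_{pe}}{\sigma^2_a + \sigma^2_{pe} + \sigma^2_e}
```

Because the numerator of $r$ contains that of $h^2$ plus a non-negative term, $r \ge h^2$ always holds, which matches the pattern of the estimates reported above.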

Estimation of Reference Crop Evapotranspiration Using Backpropagation Neural Network Model (역전파 신경망 모델을 이용한 기준 작물 증발산량 산정)

  • Kim, Minyoung;Choi, Yonghun;O'Shaughnessy, Susan;Colaizzi, Paul;Kim, Youngjin;Jeon, Jonggil;Lee, Sangbong
    • Journal of The Korean Society of Agricultural Engineers / v.61 no.6 / pp.111-121 / 2019
  • Evapotranspiration (ET) of vegetation is one of the major components of the hydrologic cycle, and its accurate estimation is important for hydrologic water balance, irrigation management, crop yield simulation, and water resources planning and management. For agricultural crops, ET is often calculated in terms of a short or tall crop reference, such as well-watered, clipped grass (reference crop evapotranspiration, $ET_o$). The Penman-Monteith equation recommended by FAO (FAO 56-PM) has been accepted by researchers and practitioners as the sole $ET_o$ method. However, its accuracy is contingent on high-quality measurements of four meteorological variables, and its use has been limited by incomplete and/or inaccurate input data. Therefore, this study evaluated the applicability of a Backpropagation Neural Network (BPNN) model for estimating $ET_o$ from fewer meteorological variables than required by the FAO 56-PM. A total of six meteorological inputs (minimum temperature, average temperature, maximum temperature, relative humidity, wind speed and solar radiation) were divided into a series of input groups (combinations of one, two, three, four, five and six variables), and each combination of meteorological inputs was evaluated for its accuracy in estimating $ET_o$. The overall findings indicated that $ET_o$ could be reasonably estimated by the BPNN using fewer than all six meteorological variables. In addition, it was shown that a proper choice of neural network architecture could not only minimize the computational error but also maximize the relationship between dependent and independent variables. The findings of this study would be of use in instances where data availability and/or accuracy are limited.
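The input groups described in this abstract can be enumerated with the standard library, as in the sketch below. The variable names are hypothetical labels for the six inputs, and enumerating every subset is an assumption: the abstract does not state whether all possible combinations were evaluated.

```python
from itertools import combinations

# Hypothetical labels for the six meteorological inputs named in the abstract.
INPUTS = (
    "t_min",
    "t_avg",
    "t_max",
    "rel_humidity",
    "wind_speed",
    "solar_radiation",
)


def input_groups(variables=INPUTS):
    """All non-empty subsets of the inputs, ordered by group size."""
    groups = []
    for size in range(1, len(variables) + 1):
        groups.extend(combinations(variables, size))
    return groups
```

Six inputs give 63 non-empty subsets, so an exhaustive comparison would mean training and scoring one network per subset.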