• Title/Summary/Keyword: biological dataset


MarSel : LD based tagSNP Selection System for Large-scale SNP Haplotype Dataset (MarSel : 대용량 SNP 일배체형 데이터에 대한 연관불균형기반의 tagSNP 선택 시스템)

  • Kim Sang-Jun;Yeo Sang-Soo;Kim Sung-Kwon
    • The KIPS Transactions:PartA / v.13A no.1 s.98 / pp.79-86 / 2006
  • The tagSNP selection problem has recently been studied as a way to reduce the cost of association studies between human diversity and SNPs. The general approach is to partition all SNPs into appropriate blocks and then choose tagSNPs within each block. MarSel, the system presented in this paper, incorporates the concept of linkage disequilibrium (LD) to address the lack of biological meaning in existing block partitioning approaches. In most approaches, the contiguous regions, which recombinations have LD coefficient |D'| and then tagSNP selection step is performed. MarSel guarantees minimum tagSNP selection by using an entropy-based optimal selection algorithm when tagSNPs are chosen in each block, and it enables chromosome-level association studies through an efficient memory management technique when the input is a very large-scale dataset that existing systems cannot process.
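
The LD coefficient |D'| mentioned in the abstract can be illustrated with a minimal sketch. It assumes phased, biallelic haplotypes encoded as strings of '0' and '1'; the function name `d_prime` and the toy data are hypothetical and are not taken from MarSel.

```python
def d_prime(haplotypes, i, j):
    """Return |D'| between biallelic sites i and j.

    haplotypes: list of equal-length strings over {'0', '1'} (assumed encoding).
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n   # frequency of allele 1 at site i
    p_b = sum(h[j] == "1" for h in haplotypes) / n   # frequency of allele 1 at site j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    d = p_ab - p_a * p_b                             # raw disequilibrium D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # A monomorphic site gives d_max == 0; report 0.0 in that case.
    return abs(d) / d_max if d_max > 0 else 0.0


linked = ["00", "00", "11", "11"]       # the two sites always co-occur
unlinked = ["00", "01", "10", "11"]     # all four haplotypes equally frequent
print(d_prime(linked, 0, 1), d_prime(unlinked, 0, 1))
```

A block partitioning step would typically compare such pairwise values against a threshold; the threshold used by MarSel is not stated here.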

Phylogeny of Phellinus and Related Genera Inferred from Combined Data of ITS and Mitochondrial SSU rDNA Sequences

  • JEONG WON JIN;LIM YOUNG WOON;LEE JIN SUNG;JUNG HACK SUNG
    • Journal of Microbiology and Biotechnology / v.15 no.5 / pp.1028-1038 / 2005
  • To elucidate phylogenetic relationships of Phellinus and its related genera, nuclear internal transcribed spacer and mitochondrial small subunit ribosomal DNA sequences from 65 strains were determined and compared. The combined dataset of two sequences increased informative characters and led to the production of trees with higher levels of resolution. Phylogenetic analysis of the combined dataset revealed thirteen evolutionary lineages and several unresolved species that were together subdivided into two large clusters consisting of oligonucleate species and binucleate species. These results coincided with previous cytological, morphological, and molecular studies. It is newly recognized that the Phellinus linteus complex forms a sister clade to Inonotus, and that Fulvifomes is somehow related to Inocutis. The Phellinus linteus complex of dimitic perennial taxa made an independent clade from Inonotus and suggested that hyphal miticity and fruitbody permanence had enough phylogenetic significance to keep the complex within the traditional genus Phellinus. Taxa lacking setae were clustered into Fulvifomes, Phylloporia, Inocutis, and Fomitiporia, and the first three were closely related sister groups, but Fomitiporia was a genus distantly related to them. Several taxa with branched setae were shown among distantly related genera. Molecular evidence indicated that the ancestral nuclear type could be a binucleate feature, and that there might be parallel gains of branched setae and parallel losses of setae in the Hymenochaetales.
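
The statement that combining two sequence datasets increases the number of informative characters can be illustrated with a small sketch. It assumes the usual definition of a parsimony-informative site (at least two character states, each present in at least two taxa) and treats '-', '?', and 'N' as missing data; the function names and the toy alignments are hypothetical and are not the authors' data.

```python
from collections import Counter


def informative_sites(alignment):
    """Count parsimony-informative columns in a list of aligned sequences."""
    count = 0
    for column in zip(*alignment):
        # Ignore gap and ambiguity symbols (an assumption of this sketch).
        states = Counter(c for c in column if c not in "-?N")
        # Informative: two or more states that each occur in two or more taxa.
        if sum(1 for n in states.values() if n >= 2) >= 2:
            count += 1
    return count


def concatenate(*alignments):
    """Join per-taxon sequences from several alignments (same taxon order)."""
    return ["".join(parts) for parts in zip(*alignments)]


its = ["ACGT", "ACGA", "TCGA", "TCGT"]   # toy ITS alignment, 4 taxa
ssu = ["GGA", "GGA", "GCT", "ACT"]       # toy mt SSU alignment, same taxa
combined = concatenate(its, ssu)
print(informative_sites(its), informative_sites(ssu), informative_sites(combined))
```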

Performance Improvement of Convolutional Neural Network for Pulmonary Nodule Detection (폐 결절 검출을 위한 합성곱 신경망의 성능 개선)

  • Kim, HanWoong;Kim, Byeongnam;Lee, JeeEun;Jang, Won Seuk;Yoo, Sun K.
    • Journal of Biomedical Engineering Research / v.38 no.5 / pp.237-241 / 2017
  • Early detection of pulmonary nodules is important for the diagnosis and treatment of lung cancer. Recently, CT has been used as a screening tool for lung nodule detection, and computer-aided detection (CAD) systems have been reported to improve the accuracy of radiologists in detecting nodules on CT scans. A previous study proposed a method using a Convolutional Neural Network (CNN) in a lung CAD system, but that model was limited in accuracy by its sparse layer structure. Therefore, we propose a Deep Convolutional Neural Network to overcome this limitation. The proposed model consists of 14 layers, including 8 convolutional layers and 4 fully connected layers. The CNN model was trained and tested with 61,404 region-of-interest (ROI) patches of lung images, comprising 39,760 nodules and 21,644 non-nodules extracted from the Lung Image Database Consortium (LIDC) dataset. The model achieved a classification accuracy of 91.79%. To prevent overfitting, we trained the model with an augmented dataset and a regularization term in the cost function. With L1 and L2 regularization during training, we obtained accuracies of 92.39% and 92.52%, respectively, and 93.52% with data augmentation. In conclusion, we obtained an accuracy of 93.75% with L2 regularization and data augmentation combined.
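
The regularization term in the cost function can be sketched in a few lines. This is a generic illustration of L1 and L2 penalties, not the authors' training code; the function name `regularized_cost`, the weight values, and the coefficient `lam` are hypothetical.

```python
def regularized_cost(data_loss, weights, lam, kind="l2"):
    """Add an L1 or L2 weight penalty to a data loss (e.g. cross-entropy).

    data_loss: unregularized loss value for the current batch
    weights:   flat iterable of model weights
    lam:       regularization strength (hypothetical value in the example)
    """
    if kind == "l1":
        penalty = sum(abs(w) for w in weights)   # encourages sparse weights
    elif kind == "l2":
        penalty = sum(w * w for w in weights)    # discourages large weights
    else:
        raise ValueError("kind must be 'l1' or 'l2'")
    return data_loss + lam * penalty


weights = [1.0, -2.0]
print(regularized_cost(0.5, weights, 0.1, "l1"))
print(regularized_cost(0.5, weights, 0.1, "l2"))
```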

Comparison of External Information Performance Predicting Subcellular Localization of Proteins (단백질의 세포내 위치를 예측하기 위한 외부정보의 성능 비교)

  • Chi, Sang-Mun
    • Journal of KIISE:Software and Applications / v.37 no.11 / pp.803-811 / 2010
  • Since protein subcellular location and biological function are highly correlated, predicting protein subcellular localization can provide information about the function of a protein. To enhance prediction performance, many studies actively exploit external information beyond the amino acid sequence. This paper compares the predictive capabilities of amino acid sequence similarity, protein profile, gene ontology, motif, and textual information. In experiments using the PLOC dataset, whose proteins have less than 80% sequence similarity, sequence similarity information and gene ontology were effective, achieving a classification accuracy of 94.8%. In experiments using the BaCelLo IDS dataset, which has low sequence similarity of less than 30%, gene ontology gave the best prediction accuracies: 93.2% for animals and 86.6% for fungi.

Monitoring soil respiration using an automatic operating chamber in a Gwangneung temperate deciduous forest

  • Lee, Jae-Seok
    • Journal of Ecology and Environment / v.34 no.4 / pp.411-423 / 2011
  • This study was conducted to quantify soil $CO_2$ efflux using the continuous measurement method and to examine the applicability of an automatic continuous measurement system in a Korean deciduous broad-leaved forest. Soil respiration rate (Rs) was assessed through continuous measurements during the 2004-2005 full growing seasons using an automatic opening/closing chamber system in sections of a Gwangneung temperate deciduous forest, Korea. The study site was an old-growth natural mixed deciduous forest approximately 80 years old. The annual Rs for each full growing season, computed from the Rs values collected in each season with gaps filled using an exponential function of soil temperature (Ts) at 5-cm depth, was 2,738.1 g $CO_2$ $m^{-2}y^{-1}$ in 2004 and 3,355.1 g $CO_2$ $m^{-2}y^{-1}$ in 2005. The diurnal variation in Rs showed stronger correlations with Ts (r = 0.91, P < 0.001 in 2004, r = 0.87, P < 0.001 in 2005) and air temperature (Ta) (r = 0.84, P < 0.001 in 2004, r = 0.79, P < 0.001 in 2005) than with deep Ts during the spring season. However, the temperature functions derived from Ts at depths of 0, -2, -5, -10, and -20 cm revealed that the correlation coefficient decreased with increasing soil depth in the spring season, whereas it increased in the summer. Rs showed a weak correlation with precipitation (r = 0.25, P < 0.01) and soil water content (r = 0.28, P < 0.05). Additionally, the diurnal change in Rs revealed a higher correlation with Ta than with Ts. The $Q_{10}$ values from spring to winter were calculated from each season's dataset and were 3.2, 1.5, 7.4, and 2.7 in 2004 and 6.0, 3.1, 3.0, and 2.6 in 2005, thus showing high fluctuation within each season. The applicability of an automatic continuous system was demonstrated for collecting a high-resolution soil $CO_2$ efflux dataset under various environmental conditions.
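
The exponential temperature function and the $Q_{10}$ values can be related with a short sketch. It assumes the common form Rs = a·exp(b·Ts), for which $Q_{10}$ = exp(10·b); the abstract does not give the authors' exact functional form or fitting procedure, and the sample values below are synthetic.

```python
import math


def fit_exponential(ts, rs):
    """Least-squares fit of ln(Rs) = ln(a) + b * Ts; returns (a, b)."""
    n = len(ts)
    log_rs = [math.log(r) for r in rs]
    mean_t = sum(ts) / n
    mean_y = sum(log_rs) / n
    b = (sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, log_rs))
         / sum((t - mean_t) ** 2 for t in ts))
    a = math.exp(mean_y - b * mean_t)
    return a, b


def q10(b):
    """Factor by which Rs changes for a 10 degree C rise in temperature."""
    return math.exp(10.0 * b)


# Synthetic example: soil temperatures (deg C) and respiration rates that
# follow Rs = 2 * exp(0.1 * Ts) exactly.
ts = [5.0, 10.0, 15.0, 20.0]
rs = [2.0 * math.exp(0.1 * t) for t in ts]
a, b = fit_exponential(ts, rs)
print(round(a, 3), round(b, 3), round(q10(b), 3))
```

The same fitted function could be evaluated at recorded Ts values to fill gaps in the Rs series, which is the kind of gap filling the abstract describes.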

Development and Evaluation of D-Attention Unet Model Using 3D and Continuous Visual Context for Needle Detection in Continuous Ultrasound Images (연속 초음파영상에서의 바늘 검출을 위한 3D와 연속 영상문맥을 활용한 D-Attention Unet 모델 개발 및 평가)

  • Lee, So Hee;Kim, Jong Un;Lee, Su Yeol;Ryu, Jeong Won;Choi, Dong Hyuk;Tae, Ki Sik
    • Journal of Biomedical Engineering Research / v.41 no.5 / pp.195-202 / 2020
  • Needle detection in ultrasound images is sometimes difficult because of obstruction by fat tissue. Accurate needle detection using continuous ultrasound (CUS) images is a vital stage of treatment planning for tissue biopsy and brachytherapy. This study has two main goals. First, a new detection model, D-Attention Unet, is developed by combining the context information of 3D medical data and CUS images. Second, the D-Attention Unet model is compared with other models to verify its usefulness for needle detection in continuous ultrasound images. The continuous needle images acquired by ultrasound were converted into still images to build a dataset for evaluating the performance of the D-Attention Unet, and this dataset was used for training and testing. The proposed D-Attention Unet model performed better than the other three models (Unet, D-Unet, and Attention Unet), with a Dice Similarity Coefficient (DSC), Recall, and Precision of 71.9%, 70.6%, and 73.7%, respectively. In conclusion, the D-Attention Unet model provides accurate needle detection for US-guided biopsy or brachytherapy, facilitating the clinical workflow. In particular, research on combining image processing techniques with learning techniques is being actively pursued. If the proposed method is applied in this manner, it should become a more effective technique than before.
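
The three reported metrics can be computed from a predicted and a reference segmentation mask as in the sketch below. It assumes binary masks flattened to lists of 0/1 values; the function name `overlap_metrics` and the toy masks are hypothetical.

```python
def overlap_metrics(pred, truth):
    """Return (dice, recall, precision) for two flat binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)          # needle found
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)      # false alarm
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)      # missed needle
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return dice, recall, precision


pred = [1, 1, 0, 0, 1]
truth = [1, 0, 0, 1, 1]
print(overlap_metrics(pred, truth))
```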

Assessment of Classification Accuracy of fNIRS-Based Brain-computer Interface Dataset Employing Elastic Net-Based Feature Selection (Elastic net 기반 특징 선택을 적용한 fNIRS 기반 뇌-컴퓨터 인터페이스 데이터셋 분류 정확도 평가)

  • Shin, Jaeyoung
    • Journal of Biomedical Engineering Research / v.42 no.6 / pp.268-276 / 2021
  • Functional near-infrared spectroscopy-based brain-computer interfaces (fNIRS-based BCIs) have been receiving much attention. In practice, however, it is difficult to obtain a large amount of fNIRS data because of the inherent hemodynamic delay. For this reason, machine learning techniques may encounter problems caused by the high-dimensional feature vector, such as deteriorated classification accuracy. In this study, we employ elastic net-based feature selection, one of the embedded methods, and demonstrate its utility by analyzing the results. Using an fNIRS dataset obtained from 18 participants for classifying brain activation induced by mental arithmetic and the idle state, we calculated classification accuracies after performing feature selection while changing the parameter α (weight of lasso vs. ridge regularization). Grand averages of classification accuracy are 80.0 ± 9.4%, 79.3 ± 9.6%, 79.0 ± 9.2%, 79.7 ± 10.1%, 77.6 ± 10.3%, 79.2 ± 8.9%, and 80.0 ± 7.8% for α = 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, and 0.5, respectively, and are not statistically different from the grand average of classification accuracy estimated with all features (80.1 ± 9.5%). Thus, no difference in classification accuracy is found for any of the considered α values. In particular, for α = 0.5, statistically the same level of classification accuracy is achieved with only 16.4% of the total features. Since elastic net-based feature selection can be easily applied to other cases without complicated initialization and parameter fine-tuning, we expect that it can be actively applied to fNIRS data.
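
The role of α as the weight between lasso and ridge regularization can be made concrete with a small sketch. It assumes the glmnet-style penalty λ·(α·‖β‖₁ + (1 − α)/2·‖β‖₂²); the abstract does not state which parameterization the author used, and the function names and coefficient values are hypothetical.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty: alpha = 1 is pure lasso, alpha = 0 is pure ridge."""
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2)


def selected_fraction(beta, tol=1e-12):
    """Fraction of features whose fitted coefficient is non-zero."""
    return sum(1 for b in beta if abs(b) > tol) / len(beta)


beta = [0.0, 0.3, 0.0, 0.0]   # coefficients after a fit; zeros are dropped features
print(elastic_net_penalty([1.0, -2.0], lam=1.0, alpha=0.5))
print(selected_fraction(beta))
```

In an embedded method of this kind, the features kept for classification are those with non-zero coefficients, which is how a figure such as "16.4% of the total features" would be obtained.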

Threshold-based Pre-impact Fall Detection and its Validation Using the Real-world Elderly Dataset (임계값 기반 충격 전 낙상검출 및 실제 노인 데이터셋을 사용한 검증)

  • Dongkwon Kim;Seunghee Lee;Bummo Koo;Sumin Yang;Youngho Kim
    • Journal of Biomedical Engineering Research / v.44 no.6 / pp.384-391 / 2023
  • Among the elderly, a significant share of fatal injuries and deaths is attributed to falls. Therefore, a pre-impact fall detection system is necessary for injury prevention. In this study, a robust threshold-based algorithm was proposed for pre-impact fall detection, reducing false positives in highly dynamic daily-living movements. The algorithm was validated using public datasets (KFall and FARSEEING) that include real-world falls of elderly people. A 6-axis IMU sensor (Movella Dot, Movella, Netherlands) was attached to S2 of 20 healthy adults (aged 22.0±1.9 years, height 164.9±5.9 cm, weight 61.4±17.1 kg) to measure 14 activities of daily living and 11 fall movements at a sampling frequency of 60 Hz. A 5 Hz low-pass filter was applied to the IMU data to remove high-frequency noise. The sum vector magnitudes of acceleration and angular velocity, roll, pitch, and vertical velocity were extracted as the feature vector. With our experimental data, the proposed algorithm showed an accuracy of 98.3%, a sensitivity of 100%, a specificity of 97.0%, and an average lead time of 311±99 ms. When evaluated using the KFall public dataset, the accuracy on adult data improved to 99.5% compared with recent studies, and a specificity of 100% was achieved on the elderly data. When evaluated using FARSEEING real-world elderly fall data without separate segmentation, the algorithm showed a sensitivity of 71.4% (5/7).
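
The sum vector magnitude features and a threshold rule can be sketched as follows. The abstract does not give the authors' thresholds or decision logic, so the rule below (low acceleration magnitude together with high angular velocity magnitude) and all numeric values are hypothetical illustrations only.

```python
import math


def sum_vector_magnitude(x, y, z):
    """Euclidean norm of a 3-axis sensor reading."""
    return math.sqrt(x * x + y * y + z * z)


def detect_pre_impact(samples, acc_threshold, gyro_threshold):
    """Return the index of the first sample flagged as a fall, or None.

    samples: list of (ax, ay, az, gx, gy, gz); acceleration in g and angular
    velocity in deg/s (assumed units). A sample is flagged when acceleration
    magnitude drops below acc_threshold while angular velocity magnitude
    exceeds gyro_threshold (hypothetical rule).
    """
    for index, (ax, ay, az, gx, gy, gz) in enumerate(samples):
        acc = sum_vector_magnitude(ax, ay, az)
        gyro = sum_vector_magnitude(gx, gy, gz)
        if acc < acc_threshold and gyro > gyro_threshold:
            return index
    return None


samples = [
    (0.0, 0.0, 1.0, 0.0, 0.0, 0.0),      # standing still
    (0.0, 0.0, 0.9, 10.0, 0.0, 0.0),     # slight movement
    (0.3, 0.0, 0.4, 100.0, 50.0, 0.0),   # losing support while rotating
]
print(detect_pre_impact(samples, acc_threshold=0.6, gyro_threshold=100.0))
```

With the sampling rate known, the index returned by such a rule and the index of impact give the lead time that the abstract reports.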

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences (생물학적 데이터 서열들에서 빈번한 최대길이 연속 서열 마이닝)

  • Kang, Tae-Ho;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.15D no.2 / pp.155-162 / 2008
  • Biological sequences such as DNA sequences and amino acid sequences typically contain a large number of items. They have contiguous sequences that ordinarily consist of hundreds of frequent items. In biological sequence analysis (BSA), frequent contiguous sequence search is one of the most important operations. Many studies have addressed efficient mining of sequential patterns. Most of the existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since the algorithm expands sequential patterns from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. In recent years, the MacosVSpan algorithm was proposed, based on the idea of the prefixSpan algorithm, to significantly reduce its recursive process. However, the algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
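
What "maximal frequent contiguous sequences" means can be shown with a brute-force sketch. This is deliberately not the fixed-length spanning tree method proposed in the paper, nor MacosVSpan; it is a naive level-by-level enumeration on toy data, with support counted as the number of input sequences containing a substring (an assumption of this sketch).

```python
def frequent_contiguous(sequences, min_support, max_len):
    """Return all substrings contained in at least min_support sequences."""
    frequent = set()
    for length in range(1, max_len + 1):
        counts = {}
        for seq in sequences:
            # Use a set so each sequence contributes at most once per substring.
            seen = {seq[k:k + length] for k in range(len(seq) - length + 1)}
            for sub in seen:
                counts[sub] = counts.get(sub, 0) + 1
        level = {sub for sub, c in counts.items() if c >= min_support}
        if not level:
            break   # no frequent substring of this length, so none are longer
        frequent |= level
    return frequent


def maximal(frequent):
    """Keep only substrings not contained in a longer frequent substring."""
    return {s for s in frequent if not any(s != t and s in t for t in frequent)}


sequences = ["ACGTAC", "TACGTT", "GGACGT"]
print(maximal(frequent_contiguous(sequences, min_support=3, max_len=6)))
```

The enumeration visits every substring at every length, which is exactly the cost that the tree-based methods discussed in the abstract aim to avoid on long sequences.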

Total Intracranial Volume Measurement for Children by Using an Automatized Program (자동화 프로그램을 이용한 아동의 전체두개강내용적 평가)

  • Lee, Jeonghwan;Kim, Ji-Eun;Im, Sungjin;Ju, Gawon;Kim, Siekyeong;Son, Jung-Woo;Shin, Chul-Jin;Lee, Sang-Ick;Ghim, Hei-Rhee
    • Korean Journal of Biological Psychiatry / v.21 no.3 / pp.81-86 / 2014
  • Objectives: Total intracranial volume (TIV) is a major nuisance variable in neuroimaging research on interindividual differences in brain structure and function. The authors aimed to test the reliability of the atlas scaling factor (ASF) method for TIV estimation in FreeSurfer by comparing it with manual tracing as the reference method. Methods: The TIVs of 26 normal children and 26 children with attention-deficit hyperactivity disorder (ADHD) were obtained by FreeSurfer reconstruction and by manual tracing on T1-weighted images. Manual tracing was performed on every 10th slice of the MRI dataset from the midline of the sagittal plane by one researcher who was blinded to the clinical data. Another researcher independently performed manual tracing on 20 randomly selected datasets to verify interrater reliability. Results: The interrater reliability was excellent (intraclass correlation coefficient = 0.91, p < 7.1e-07). There were no significant differences in age or gender distribution between the normal and ADHD groups. No significant differences were found between the TIVs from the ASF method and manual tracing. A strong correlation was shown between the TIVs from the two methods (r = 0.90, p < 2.2e-16). Conclusions: The ASF method for TIV estimation using FreeSurfer showed good agreement with the reference method. The TIV from the ASF method can be used for correction in analyses of structural and functional neuroimaging studies not only with elderly subjects but also with children, even those with ADHD.
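
One way a volume can be estimated from areas traced on every 10th slice is sketched below. The abstract does not describe how the authors converted traced areas to volume, so the summation rule, the function name, and the numbers are assumptions for illustration only.

```python
def volume_from_traced_slices(areas_mm2, slice_thickness_mm, step=10):
    """Estimate a volume in cm^3 from areas traced on every `step`-th slice.

    Each traced area is taken to represent `step` consecutive slices, so the
    volume is sum(area) * slice thickness * step (an assumed estimator).
    """
    volume_mm3 = sum(areas_mm2) * slice_thickness_mm * step
    return volume_mm3 / 1000.0   # 1 cm^3 = 1000 mm^3


# Hypothetical traced areas (mm^2) on three sampled sagittal slices, 1 mm thick.
areas = [10000.0, 12000.0, 11000.0]
print(volume_from_traced_slices(areas, slice_thickness_mm=1.0))
```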