• Title/Summary/Keyword: biological dataset

Improving classification of low-resource COVID-19 literature by using Named Entity Recognition

  • Lithgow-Serrano, Oscar;Cornelius, Joseph;Kanjirangat, Vani;Mendez-Cruz, Carlos-Francisco;Rinaldi, Fabio
    • Genomics & Informatics
    • /
    • v.19 no.3
    • /
    • pp.22.1-22.5
    • /
    • 2021
  • Automatic document classification for highly interrelated classes is a demanding task that becomes more challenging when there is little labeled data for training. Such is the case of the coronavirus disease 2019 (COVID-19) clinical repository (a repository of classified and translated academic articles related to COVID-19 and relevant to clinical practice), where a 3-way classification scheme is being applied to COVID-19 literature. During the 7th Biomedical Linked Annotation Hackathon (BLAH7), we performed experiments to explore the use of named-entity recognition (NER) to improve the classification. We processed the literature with OntoGene's Biomedical Entity Recogniser (OGER) and used the resulting identified named entities (NEs) and their links to major biological databases as extra input features for the classifier. We compared the results with a baseline model without the OGER-extracted features. In these proof-of-concept experiments, we observed a clear gain in COVID-19 literature classification. In particular, the NE's origin was useful for classifying document types and the NE's type for clinical specialties. Due to the limitations of the small dataset, we can only conclude that our results suggest that NER would benefit this classification task. To estimate this benefit accurately, further experiments with a larger dataset would be needed.

Classification of Gripping Movement in Daily Life Using EMG-based Spider Chart and Deep Learning (근전도 기반의 Spider Chart와 딥러닝을 활용한 일상생활 잡기 손동작 분류)

  • Lee, Seong Mun;Pi, Sheung Hoon;Han, Seung Ho;Jo, Yong Un;Oh, Do Chang
    • Journal of Biomedical Engineering Research
    • /
    • v.43 no.5
    • /
    • pp.299-307
    • /
    • 2022
  • In this paper, we propose a pre-processing method that converts electromyography (EMG) sensor data into Spider Chart images for classifying gripping movements with a convolutional neural network (CNN). First, raw data for six hand gestures are collected from five test subjects using an 8-channel armband and converted into octagonal Spider Chart data, which are divided into several sliding windows and used for training. For the six hand gestures, the classification performance of the proposed pre-processing method is compared with that of existing methods. The deep learning model was trained by dividing the dataset into 70% for training, 15% for testing, and 15% for validation. For system performance evaluation, five-fold cross-validation was applied, using 80% of the entire dataset for training and 20% for testing. With Spider Chart pre-processing, the proposed method achieved 97% and 94.54% in cross-validation and general tests, respectively, outperforming the conventional methods.
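The conversion described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the normalisation by `max_value`, and the window and step sizes are all assumptions, since the abstract gives none of these details.

```python
import math

def spider_chart_vertices(channel_values, max_value=1.0):
    """Map one sample of N channel amplitudes to N polygon vertices.

    Channel k is drawn on an axis at angle 2*pi*k/N. The distance from the
    centre is the amplitude divided by max_value (an assumed scaling),
    clipped to 1. With 8 channels the axes form an octagon.
    """
    n = len(channel_values)
    vertices = []
    for k, value in enumerate(channel_values):
        radius = min(abs(value) / max_value, 1.0)
        angle = 2.0 * math.pi * k / n
        vertices.append((radius * math.cos(angle), radius * math.sin(angle)))
    return vertices

def sliding_windows(samples, window, step):
    """Split a sequence of samples into (possibly overlapping) windows."""
    return [samples[i:i + window]
            for i in range(0, len(samples) - window + 1, step)]
```

Each window of vertices could then be rasterised into an image for the CNN; that rendering step is left out here because the abstract does not describe it.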

Improving LTC using Markov Chain Model of Sensory Neurons and Synaptic Plasticity (감각 뉴런의 마르코프 체인 모델과 시냅스 가소성을 이용한 LTC 개선)

  • Lee, Junhyeok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.150-152
    • /
    • 2022
  • In this work, we propose a model based on the Liquid Time-constant Network (LTC) that considers the behavior and synaptic plasticity of sensory neurons. Four types of neuron connection structure were tested: the increasing number of neurons, the decreasing number, the decreasing number, and the decreasing number. We used a time series prediction dataset to test whether the modified model outperformed LTC. The experimental results show that modeling sensory neurons does not always improve performance, but that performance improves when learning rules are selected appropriately for the type of dataset. In addition, the neuron connection structure performed better when it had fewer than four layers.

Chronic Stress Evaluation using Neuro-Fuzzy (뉴로-퍼지를 이용한 만성적인 스트레스 평가)

  • ;;;;;;;Hiroko Takeuchi;Haruyuki Minamitani
    • Journal of Biomedical Engineering Research
    • /
    • v.24 no.5
    • /
    • pp.465-471
    • /
    • 2003
  • The purpose of this research was to evaluate chronic stress using physiological parameters. Wistar rats were exposed to sound stress for 14 days, and biosignals were acquired hourly. To develop a fuzzy inference system that can integrate physiological parameters, the parameters of the system were adjusted by the adaptive neuro-fuzzy inference system. In the training dataset, the inputs were the physiological parameters from the biosignals and the outputs were the target values from cortisol production. Physiological parameters were integrated using the fuzzy inference system, and then the 24-hour results were analyzed by the Cosinor method. Chronic stress was evaluated from the degree of circadian rhythm disturbance. Taking the degree of stress for the initial rest period as 1, the degree of stress after the 14-day sound stress increased to 1.37, and increased to 1.47 after the 7-day recovery period. That is, the rat was exposed to a 37% greater amount of stress by the 14-day sound and did not recover after the 7-day recovery period.
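For readers unfamiliar with the Cosinor method mentioned here, a single-component fit can be sketched as below. This assumes a 24-hour period and equally spaced samples covering a whole number of periods (which holds for hourly data over one day); under that assumption the least-squares solution has a closed form. The function name is hypothetical, and how the authors derived the degree of rhythm disturbance from the fit is not stated in the abstract.

```python
import math

def fit_cosinor(values, period=24.0, step=1.0):
    """Fit y(t) = mesor + amplitude * cos(2*pi*t/period + acrophase).

    Assumes samples are taken every `step` hours starting at t = 0 and span
    a whole number of periods, so the cosine and sine regressors are
    orthogonal and the least-squares estimates reduce to simple sums.
    """
    n = len(values)
    omega = 2.0 * math.pi / period
    mesor = sum(values) / n
    # Coefficients of the linearised model: mesor + beta*cos + gamma*sin
    beta = 2.0 / n * sum(y * math.cos(omega * i * step)
                         for i, y in enumerate(values))
    gamma = 2.0 / n * sum(y * math.sin(omega * i * step)
                          for i, y in enumerate(values))
    amplitude = math.hypot(beta, gamma)
    acrophase = math.atan2(-gamma, beta)  # radians
    return mesor, amplitude, acrophase
```

For unevenly sampled data or partial periods, a general linear least-squares fit on the cosine and sine terms would be needed instead of these sums.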

A Study on the Dataset Construction and Model Application for Detecting Surgical Gauze in C-Arm Imaging Using Artificial Intelligence (인공지능을 활용한 C-Arm에서 수술용 거즈 검출을 위한 데이터셋 구축 및 검출모델 적용에 관한 연구)

  • Kim, Jin Yeop;Hwang, Ho Seong;Lee, Joo Byung;Choi, Yong Jin;Lee, Kang Seok;Kim, Ho Chul
    • Journal of Biomedical Engineering Research
    • /
    • v.43 no.4
    • /
    • pp.290-297
    • /
    • 2022
  • During surgery, surgical instruments are sometimes accidentally left inside the body. Most of these are surgical gauze, so radiopaque gauze (X-ray gauze) is used to prevent gauze from being left in the body. This gauze comes in wire and pad types. If it is confirmed that gauze remains in the body, the gauze must be detected by a radiologist reading images taken with a mobile X-ray device. However, most operating rooms are equipped not with a mobile X-ray device but with C-Arm equipment, which produces poorer-quality images than mobile X-ray equipment, and reading the images takes time. In this study, we used C-Arm equipment to acquire gauze images, built a detection dataset, and selected an artificial intelligence detection model to compensate for the relatively low image quality and to assist radiology specialists in reading. mAP@50 and detection time were used as performance indicators. The results show that the two-class gauze detection dataset is more accurate, and that the YOLOv5 model achieves an mAP@50 of 93.4% with a detection time of 11.7 ms.
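The mAP@50 metric counts a predicted box as a correct detection when its intersection over union (IoU) with a ground-truth box is at least 0.5. A minimal sketch of that overlap test follows; the `(x1, y1, x2, y2)` corner format and the function names are assumptions for illustration, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # Width and height of the overlap are zero when the boxes are disjoint
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_match_at_50(predicted, ground_truth):
    """True when a prediction overlaps the ground truth with IoU >= 0.5."""
    return iou(predicted, ground_truth) >= 0.5
```

Average precision is then computed per class from the ranked matches and averaged over classes; that part is omitted here.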

A streamlined pipeline based on HmmUFOtu for microbial community profiling using 16S rRNA amplicon sequencing

  • Hyeonwoo Kim;Jiwon Kim;Ji Won Cho;Kwang-Sung Ahn;Dong-Il Park;Sangsoo Kim
    • Genomics & Informatics
    • /
    • v.21 no.3
    • /
    • pp.40.1-40.11
    • /
    • 2023
  • Microbial community profiling using 16S rRNA amplicon sequencing allows for taxonomic characterization of diverse microorganisms. While amplicon sequence variant (ASV) methods are increasingly favored for their fine-grained resolution of sequence variants, they often discard substantial portions of sequencing reads during quality control, particularly in datasets with large numbers of samples. We present a streamlined pipeline that integrates FastP for read trimming, HmmUFOtu for operational taxonomic unit (OTU) clustering, Vsearch for chimera checking, and Kraken2 for taxonomic assignment. To assess the pipeline's performance, we reprocessed two published stool datasets of normal Korean populations: one with 890 and the other with 1,462 independent samples. In the first dataset, HmmUFOtu retained 93.2% of over 104 million read pairs after quality trimming, discarding chimeric or unclassifiable reads, while DADA2, a commonly used ASV method, retained only 44.6% of the reads. Nonetheless, both methods yielded qualitatively similar β-diversity plots. For the second dataset, HmmUFOtu retained 89.2% of read pairs, while DADA2 retained a mere 18.4% of the reads. HmmUFOtu, being a closed-reference clustering method, facilitates merging separately processed datasets, with shared OTUs between the two datasets exhibiting a correlation coefficient of 0.92 in total abundance (log scale). While the first two dimensions of the β-diversity plot exhibited a cohesive mixture of the two datasets, the third dimension revealed the presence of a batch effect. Our comparative evaluation of ASV and OTU methods within this streamlined pipeline provides valuable insights into their performance when processing large-scale microbial 16S rRNA amplicon sequencing data. The strengths of HmmUFOtu and its potential for dataset merging are highlighted.

RNA-Seq De Novo Assembly and Differential Transcriptome Analysis of Korean Medicinal Herb Cirsium japonicum var. spinossimum

  • Roy, Neha Samir;Kim, Jung-A;Choi, Ah-Young;Ban, Yong-Wook;Park, Nam-Il;Park, Kyong-Cheul;Yang, Hee-sun;Choi, Ik-Young;Kim, Soonok
    • Genomics & Informatics
    • /
    • v.16 no.4
    • /
    • pp.34.1-34.9
    • /
    • 2018
  • Cirsium japonicum belongs to the Asteraceae or Compositae family and is a medicinal plant in Asia that has a variety of effects, including tumour inhibition, improved immunity with flavones, and antidiabetic and hepatoprotective effects. Silymarin is synthesized from 4-coumaroyl-CoA via both the flavonoid and phenylpropanoid pathways to produce the immediate precursors taxifolin and coniferyl alcohol. Then, the oxidative radicalization of taxifolin and coniferyl alcohol produces silymarin. We identified the expression of genes related to the synthesis of silymarin in C. japonicum in three different tissues, namely, flowers, leaves, and roots, through RNA sequencing. We obtained 51,133 unigenes from transcriptome sequencing by de novo assembly using Trinity v2.1.1, TransDecoder v2.0.1, and CD-HIT v4.6 software. The differentially expressed gene analysis revealed that the expression of genes related to the flavonoid pathway was higher in the flowers, whereas the phenylpropanoid pathway was more highly expressed in the roots. In this study, we established a global transcriptome dataset for C. japonicum. The data will be useful not only for focusing more deeply on the genes related to the production of medicinal metabolites, including flavonolignans, but also for studying the functional genomics for genetic engineering of C. japonicum.

The Development and Application of Multi-metric Water Quality Assessment Model for Reservoir Managements in Korea. (우리나라 인공호 관리를 위한 다변수 수질평가 모델의 개발 및 적용)

  • Lee, Hyun-Joon;An, Kwang-Guk
    • Korean Journal of Ecology and Environment
    • /
    • v.42 no.2
    • /
    • pp.242-252
    • /
    • 2009
  • The purpose of this study was to develop a Multi-metric Water Quality Assessment (MWQA) model and apply it to datasets sampled from the Paldang and Daechung reservoirs in 2008. The water quality data used in this study comprised 5-year datasets (2003-2007) from Korean reservoirs, obtained from the Ministry of Environment, Korea. The proposed MWQA model has 4 metrics, one for each of 4 variable types: chemical, physical, biological, and hydrological. These are, respectively, total phosphorus (TP) concentration, Secchi depth (SD), chlorophyll-a (Chl-a) concentration, and the ratio of inflow into the lake to outflow from it (input/output, I/O). First, we established the criteria for trophic boundaries. The boundary between the oligotrophic and mesotrophic categories was defined by the lower third of the cumulative distribution of the values. The mesotrophic-eutrophic boundary was defined by the upper third of the distribution. Second, each metric was assigned points: oligotrophic = 1, mesotrophic = 3, eutrophic = 5. The total score obtained from the metrics was then divided into 5 grades: Excellent, Good, Fair, Poor, and Very poor. When the proposed MWQA model was applied, the Paldang reservoir was graded "Fair" or "Poor" and the Daechung reservoir "Excellent" or "Good". The MWQA model proposed through these procedures will enable efficient reservoir management. Further studies on the number of metrics and their attributes are needed for accurate application of the new model.
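The scoring scheme in this abstract can be sketched as follows. The 1/3/5 points and the tercile boundaries come from the abstract; everything else is an assumption made for illustration: the function names, the `higher_is_cleaner` flag (used here for metrics such as Secchi depth, where larger values indicate clearer water), and above all the equal-width grade cutoffs, which the abstract does not specify.

```python
GRADES = ["Excellent", "Good", "Fair", "Poor", "Very poor"]

def metric_score(value, lower_third, upper_third, higher_is_cleaner=False):
    """Score one metric: oligotrophic = 1, mesotrophic = 3, eutrophic = 5.

    lower_third and upper_third are the tercile boundaries of the reference
    distribution. higher_is_cleaner is an assumed switch for metrics where
    large values mean better water quality.
    """
    if higher_is_cleaner:
        if value >= upper_third:
            return 1
        return 3 if value >= lower_third else 5
    if value <= lower_third:
        return 1
    return 3 if value <= upper_third else 5

def mwqa_grade(scores):
    """Map the summed metric scores to one of five grades.

    ASSUMPTION: the range from the minimum total (all 1s) to the maximum
    total (all 5s) is cut into five equal-width bins. The real cutoffs are
    not given in the abstract.
    """
    total = sum(scores)
    lowest, highest = len(scores), 5 * len(scores)
    fraction = (total - lowest) / (highest - lowest)
    return GRADES[min(int(fraction * 5), 4)]
```

With four metrics the total ranges from 4 to 20, so under the assumed cutoffs a reservoir scoring 3 on every metric (total 12) would fall in the middle grade.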

Spatio-temporal variabilities of nutrients and chlorophyll, and the trophic state index deviations on the relation of nutrients-chlorophyll-light availability

  • Calderon, Martha S.;An, Kwang-Guk
    • Journal of Ecology and Environment
    • /
    • v.39 no.1
    • /
    • pp.31-42
    • /
    • 2016
  • The objective of this study was to determine long-term temporal and spatial patterns of nutrients (nitrogen and phosphorus), suspended solids, and chlorophyll (Chl) in Chungju Reservoir, based on the 1992-2013 dataset, and then to develop empirical nutrient-Chl models for predicting eutrophication of the reservoir. Concentrations of total nitrogen (TN) and total phosphorus (TP) were largely affected by the intensity of the Asian monsoon and by the longitudinal structure of the riverine (Rz), transition (Tz), and lacustrine (Lz) zones. The system was nitrogen-rich, and phosphorus content in the water was relatively low, implying a P-limited system. Regression analysis for the empirical model, however, showed that Chl had a weak linear relation with TP or TN, and this was mainly associated with turbid, nutrient-rich inflows to the system. The weak relation was associated with non-algal light attenuation coefficients (Kna), which are inversely related to water residence time. Thus, values of Chl had a negative functional relation (R² = 0.25, p < 0.001) with non-algal light attenuation. The low chlorophyll at a given TP therefore indicated light limitation of phytoplankton growth, and total suspended solids (TSS) were highly correlated (R² = 0.94, p < 0.001) with non-algal light attenuation. The relations of the Trophic State Index (TSI) indicated that phosphorus limitation was weak [TSI (Chl) - TSI (TP) < 0; TSI (SD) - TSI (Chl) > 0] and the effects of zooplankton grazing were also minor [TSI (Chl) - TSI (TP) > 0; TSI (SD) - TSI (Chl) > 0].
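The TSI deviations quoted in brackets can be computed as in the sketch below. The abstract does not say which TSI formulation was used; this example assumes Carlson's (1977) equations, with Secchi depth in metres and Chl and TP in µg/L. The function names are hypothetical.

```python
import math

def tsi_sd(secchi_depth_m):
    """Carlson TSI from Secchi depth in metres (assumed formulation)."""
    return 60.0 - 14.41 * math.log(secchi_depth_m)

def tsi_chl(chl_ug_per_l):
    """Carlson TSI from chlorophyll concentration in ug/L."""
    return 9.81 * math.log(chl_ug_per_l) + 30.6

def tsi_tp(tp_ug_per_l):
    """Carlson TSI from total phosphorus concentration in ug/L."""
    return 14.42 * math.log(tp_ug_per_l) + 4.15

def tsi_deviations(secchi_depth_m, chl_ug_per_l, tp_ug_per_l):
    """Return (TSI(Chl) - TSI(TP), TSI(SD) - TSI(Chl)), as in the abstract."""
    chl = tsi_chl(chl_ug_per_l)
    return chl - tsi_tp(tp_ug_per_l), tsi_sd(secchi_depth_m) - chl
```

A negative first deviation means chlorophyll is lower than the phosphorus level alone would predict, and the sign of the second deviation is then used to judge which other factor is limiting algal growth.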

Integrative Analysis of Microarray Data with Gene Ontology to Select Perturbed Molecular Functions using Gene Ontology Functional Code

  • Kim, Chang-Sik;Choi, Ji-Won;Yoon, Suk-Joon
    • Genomics & Informatics
    • /
    • v.7 no.2
    • /
    • pp.122-130
    • /
    • 2009
  • A systems biology approach for identifying perturbed molecular functions is required to understand complex progressive diseases such as breast cancer. In this study, we analyze microarray data with Gene Ontology terms of molecular functions to select perturbed molecular functional modules in breast cancer tissues, based on the definition of the Gene Ontology Functional Code. The Gene Ontology comprises three structured vocabularies describing genes and their products in terms of their associated biological processes, cellular components, and molecular functions. The Gene Ontology is hierarchically organized as a directed acyclic graph. However, it is difficult to visualize the Gene Ontology as a directed tree, since a Gene Ontology term may have more than one parent and therefore multiple paths from the root. Therefore, we applied the definition of Gene Ontology codes, assigning one or more GO codes to each GO term, to visualize the hierarchical classification of GO terms as a network. The selected molecular functions could be considered perturbed molecular functional modules that putatively contribute to the progression of disease. We evaluated the method by analyzing a microarray dataset of breast cancer tissues, i.e., normal and invasive breast cancer tissues. Based on the integration approach, we selected several interesting perturbed molecular functions that are implicated in the progression of breast cancers. Moreover, these selected molecular functions include several known breast cancer-related genes. We conclude that the present strategy is capable of selecting perturbed molecular functions that putatively play roles in the progression of diseases, and that it improves the interpretability of GO terms based on the definition of Gene Ontology codes.
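The reason one GO term can need several codes is that a term with more than one parent is reachable from the root along more than one path. The sketch below enumerates those paths in a toy graph. It only illustrates the multiple-path property: the term names are made up (not real GO identifiers), and the actual encoding of a GO Functional Code is not described in the abstract.

```python
def paths_from_root(term, parents):
    """List every root-to-term path in a DAG given as child -> list of parents.

    A term with no parents is treated as the root. Each returned path could
    correspond to one code for the term (an assumed interpretation).
    """
    term_parents = parents.get(term, [])
    if not term_parents:
        return [[term]]
    paths = []
    for parent in term_parents:
        for path in paths_from_root(parent, parents):
            paths.append(path + [term])
    return paths

# Hypothetical toy ontology: "binding_x" has two parents.
toy_parents = {
    "root": [],
    "binding": ["root"],
    "catalysis": ["root"],
    "binding_x": ["binding", "catalysis"],
}
```

Here `paths_from_root("binding_x", toy_parents)` yields two paths, one through each parent, so the term would receive two codes under the assumed interpretation.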