• Title/Summary/Keyword: 2D Dataset

Factors influencing metabolic syndrome perception and exercising behaviors in Korean adults: Data mining approach (대사증후군의 인지와 신체활동 실천에 영향을 미치는 요인: 데이터 마이닝 접근)

  • Lee, Soo-Kyoung;Moon, Mikyung
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.12 / pp.581-588 / 2017
  • This study was conducted from July 2014 to December 2015 to determine which factors predict metabolic syndrome (MetS) perception and exercise by applying a machine learning classifier, the Extreme Gradient Boosting algorithm (XGBoost). Data were obtained from the Korean Community Health Survey (KCHS), which represents community-dwelling Korean adults aged 19 years and older, for 2009 to 2013. The dataset includes 370,430 adults. Outcomes were categorized as follows based on the perception of MetS and physical activity (PA): Stage 1 (no perception, no PA), Stage 2 (perception, no PA), and Stage 3 (perception, PA). Features common to all questionnaires over these 5 years were selected for modeling. Overall, there were 161 features, all categorical except for age and the visual analogue scale (EQ-VAS). We built the predictive model with the XGBoost algorithm in R and achieved a prediction accuracy of 0.735. The top 10 predictive factors in Stage 3 were: age, education level, attempt to control weight, EQ mobility, nutrition label checks, private health insurance, EQ-5D usual activities, anti-smoking advertising, EQ-VAS, education in health centers for diabetes, and dental care. In conclusion, the results showed that XGBoost can be used to identify factors influencing disease prevention and management using healthcare big data.

Mining Maximal Frequent Contiguous Sequences in Biological Data Sequences (생물학적 데이터 서열들에서 빈번한 최대길이 연속 서열 마이닝)

  • Kang, Tae-Ho;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.15D no.2 / pp.155-162 / 2008
  • Biological sequences such as DNA sequences and amino acid sequences typically contain a large number of items. They have contiguous sequences that ordinarily consist of hundreds of frequent items. In biological sequence analysis (BSA), frequent contiguous sequence search is one of the most important operations. Many studies have addressed efficient mining of sequential patterns. Most of the existing methods for mining sequential patterns are based on the Apriori algorithm. In particular, the prefixSpan algorithm is one of the most efficient sequential pattern mining schemes based on the Apriori algorithm. However, since the algorithm expands sequential patterns starting from frequent patterns of length 1, it is not suitable for biological datasets with long frequent contiguous sequences. In recent years, the MacosVSpan algorithm was proposed, based on the idea of the prefixSpan algorithm, to significantly reduce its recursive process. However, the algorithm is still inefficient for mining frequent contiguous sequences from long biological data sequences. In this paper, we propose an efficient method to mine maximal frequent contiguous sequences in large biological data sequences by constructing a spanning tree with a fixed length. To verify the superiority of the proposed method, we perform experiments in various environments. The experiments show that the proposed method is much more efficient than MacosVSpan in terms of retrieval performance.
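
To make the target of this abstract concrete, the following is a minimal brute-force sketch of what "maximal frequent contiguous sequences" means. It is not the fixed-length spanning tree method the authors propose, and it is not MacosVSpan; the function names, the toy sequences, and the definition of support (number of sequences containing the substring at least once) are assumptions made for illustration.

```python
from collections import defaultdict


def frequent_contiguous(sequences, min_support):
    """Return {substring: support} for every contiguous substring that
    occurs in at least `min_support` of the input sequences.

    Brute force, level by level: length 1, then 2, and so on. If no
    substring of length k is frequent, none of length k + 1 can be,
    so the loop stops there.
    """
    frequent = {}
    length = 1
    while True:
        support = defaultdict(int)
        for seq in sequences:
            # Use a set so each sequence counts once per substring.
            windows = {seq[i:i + length] for i in range(len(seq) - length + 1)}
            for w in windows:
                support[w] += 1
        level = {w: s for w, s in support.items() if s >= min_support}
        if not level:
            break
        frequent.update(level)
        length += 1
    return frequent


def maximal_only(frequent):
    """Keep only substrings not contained in a longer frequent substring."""
    return sorted(p for p in frequent
                  if not any(p != q and p in q for q in frequent))


# Toy DNA-like data (hypothetical, not from the paper).
seqs = ["ACGTAC", "TACGTA", "GGACGT"]
print(maximal_only(frequent_contiguous(seqs, min_support=3)))  # ['ACGT']
print(maximal_only(frequent_contiguous(seqs, min_support=2)))  # ['ACGTA', 'TAC']
```

The level-by-level scan is the length-1 expansion the abstract describes as unsuitable for long sequences, which is why the paper looks for a more efficient construction.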

Data-centric XAI-driven Data Imputation of Molecular Structure and QSAR Model for Toxicity Prediction of 3D Printing Chemicals (3D 프린팅 소재 화학물질의 독성 예측을 위한 Data-centric XAI 기반 분자 구조 Data Imputation과 QSAR 모델 개발)

  • ChanHyeok Jeong;SangYoun Kim;SungKu Heo;Shahzeb Tariq;MinHyeok Shin;ChangKyoo Yoo
    • Korean Chemical Engineering Research / v.61 no.4 / pp.523-541 / 2023
  • As accessibility to 3D printers increases, exposure to chemicals associated with 3D printing is becoming more frequent. However, research on the toxicity and harmfulness of chemicals generated by 3D printing is insufficient, and the performance of toxicity prediction using in silico techniques is limited by missing molecular structure data. In this study, a quantitative structure-activity relationship (QSAR) model based on a data-centric AI approach was developed to predict the toxicity of new 3D printing materials by imputing missing values in molecular descriptors. First, the MissForest algorithm was used to impute missing values in the molecular descriptors of hazardous 3D printing materials. Then, based on four different machine learning models (decision tree, random forest, XGBoost, SVM), a machine learning (ML)-based QSAR model was developed to predict the bioconcentration factor (Log BCF), octanol-air partition coefficient (Log Koa), and partition coefficient (Log P). Furthermore, the reliability of the data-centric QSAR model was validated through the Tree-SHAP (SHapley Additive exPlanations) method, one of the explainable artificial intelligence (XAI) techniques. The proposed MissForest-based imputation method expanded the available molecular structure data approximately 2.5-fold compared with the existing data. Based on the imputed dataset of molecular descriptors, the developed data-centric QSAR model achieved prediction performance of approximately 73%, 76%, and 92% for Log BCF, Log Koa, and Log P, respectively. Lastly, Tree-SHAP analysis demonstrated that the data-centric QSAR model achieved high prediction performance for toxicity information by identifying key molecular descriptors highly correlated with toxicity indices. Therefore, the proposed QSAR model based on the data-centric XAI approach can be extended to predict the toxicity of potential pollutants in emerging printing chemicals, chemical processes, and semiconductor or display processes.

Motion generation using Center of Mass (무게중심을 활용한 모션 생성 기술)

  • Park, Geuntae;Sohn, Chae Jun;Lee, Yoonsang
    • Journal of the Korea Computer Graphics Society / v.26 no.2 / pp.11-19 / 2020
  • When a character's pose changes, its center of mass (COM) also changes. The change of COM has distinctive patterns corresponding to various motion types such as walking, running, or sitting. Thus, the motion type can be predicted from COM movement. We propose a motion generator that uses the character's center of mass information. This generator can generate various motions without annotated action type labels. Thus, the dataset for training and running can be generated fully automatically. Our neural network model takes the motion history of the character and its center of mass information as inputs and generates a full-body pose for the current frame. The model is a simple Convolutional Neural Network (CNN) that performs 1D convolution to deal with time-series motion data.
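
As a rough illustration of the 1D convolution over time-series motion data mentioned in this abstract, the sketch below slides one filter along the frame axis. The function name, the single-channel COM-height input, and the kernel values are hypothetical; the paper's actual network layout, channel counts, and kernel sizes are not given in the abstract.

```python
def conv1d(frames, kernel, bias=0.0):
    """Valid 1D convolution along the time axis (one output channel).

    frames: list of T feature vectors, each of length C (one per frame)
    kernel: list of K weight vectors, each of length C
    returns: list of T - K + 1 scalars

    As in most deep learning libraries, this is a cross-correlation:
    the kernel is not flipped.
    """
    k = len(kernel)
    out = []
    for t in range(len(frames) - k + 1):
        acc = bias
        for dt in range(k):
            # Dot product between frame t + dt and the dt-th kernel tap.
            acc += sum(x * w for x, w in zip(frames[t + dt], kernel[dt]))
        out.append(acc)
    return out


# One channel (for example, COM height per frame); a 2-tap difference
# filter then responds to frame-to-frame change in that channel.
com_height = [[0.0], [1.0], [2.0], [4.0]]
print(conv1d(com_height, kernel=[[-1.0], [1.0]]))  # [1.0, 1.0, 2.0]
```

A learned filter of this shape can pick up temporal patterns in COM movement, which is the kind of cue the abstract says distinguishes walking, running, and sitting.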

Association between fatty liver disease and hearing impairment in Korean adults: a retrospective cross-sectional study

  • Da Jung Jung
    • Journal of Yeungnam Medical Science / v.40 no.4 / pp.402-411 / 2023
  • Background: We hypothesized that fatty liver disease (FLD) is associated with a high prevalence of hearing loss (HL) owing to metabolic disturbances. This study aimed to evaluate the association between FLD and HL in a large sample of the Korean population. Methods: We used a dataset of adults who underwent routine voluntary health checkups (n=21,316). Fatty liver index (FLI) was calculated using Bedogni's equation. The patients were divided into two groups: the non-FLD (NFLD) group (n=18,518, FLI <60) and the FLD group (n=2,798, FLI ≥60). Hearing thresholds were measured using an automatic audiometer. The average hearing threshold (AHT) was calculated as the pure-tone average at four frequencies (0.5, 1, 2, and 3 kHz). HL was defined as an AHT of >40 dB. Results: HL was observed in 1,370 (7.4%) and 238 patients (8.5%) in the NFLD and FLD groups, respectively (p=0.041). Compared with the NFLD group, the odds ratio for HL in the FLD group was 1.16 (p=0.040) and 1.46 (p<0.001) in univariate and multivariate logistic regression analyses, respectively. Linear regression analyses revealed that FLI was positively associated with AHT in both univariate and multivariate analyses. Analyses using a propensity score-matched cohort showed trends similar to those using the total cohort. Conclusion: FLD and FLI were associated with poor hearing thresholds and HL. Therefore, active monitoring of hearing impairment in patients with FLD may be helpful for early diagnosis and treatment of HL in the general population.
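
The definitions in this abstract can be written out as a short sketch: the average hearing threshold (AHT) as a four-frequency pure-tone average, hearing loss as AHT > 40 dB, and grouping by FLI cut-off 60. The function names and the sample audiogram are hypothetical. The crude odds ratio is recomputed only from the counts reported in the abstract, and it reproduces the univariate value of 1.16; the multivariate value of 1.46 depends on covariates not listed here.

```python
def average_hearing_threshold(thresholds_db):
    """Pure-tone average over 0.5, 1, 2, and 3 kHz (thresholds in dB)."""
    freqs_khz = (0.5, 1, 2, 3)
    return sum(thresholds_db[f] for f in freqs_khz) / len(freqs_khz)


def has_hearing_loss(aht_db):
    """HL is defined in the abstract as an AHT above 40 dB."""
    return aht_db > 40


def fld_group(fli):
    """FLD group if FLI >= 60, otherwise non-FLD (NFLD)."""
    return "FLD" if fli >= 60 else "NFLD"


def crude_odds_ratio(cases_a, n_a, cases_b, n_b):
    """Unadjusted odds ratio of group A relative to group B."""
    odds_a = cases_a / (n_a - cases_a)
    odds_b = cases_b / (n_b - cases_b)
    return odds_a / odds_b


# Hypothetical audiogram for one person.
aht = average_hearing_threshold({0.5: 30, 1: 40, 2: 45, 3: 55})
print(aht, has_hearing_loss(aht))  # 42.5 True

# Counts from the abstract: 238 of 2,798 (FLD) vs 1,370 of 18,518 (NFLD).
print(round(crude_odds_ratio(238, 2798, 1370, 18518), 2))  # 1.16
```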

Development of a Real-time Action Recognition-Based Child Behavior Analysis Service System (실시간 행동인식 기반 아동 행동분석 서비스 시스템 개발)

  • Chimin Oh;Seonwoo Kim;Jeongmin Park;Injang Jo;Jaein Kim;Chilwoo Lee
    • Smart Media Journal / v.13 no.2 / pp.68-84 / 2024
  • This paper describes the development of a system and algorithms that support high-quality welfare services by recognizing behavior development indicators (activity, sociability, danger) in children aged 0 to 2 years using action recognition technology. Action recognition targeted 11 behaviors, from lying down in 0-year-olds to jumping in 2-year-olds, using data obtained directly from actual videos provided for research purposes by three nurseries in the Gwangju and Jeonnam regions. A dataset of 1,867 actions from 425 video clips was built for these 11 behaviors, and an average recognition accuracy of 97.4% was achieved. Additionally, for real-world application, the Edge Video Analyzer (EVA), a behavior analysis device, was developed and implemented with a region-specific random frame selection-based PoseC3D algorithm, capable of recognizing actions in real time for up to 30 people in four-channel videos. The developed system was installed in three nurseries, tested by ten childcare teachers over a month, and evaluated through surveys, resulting in a perceived accuracy of 91 points and a service satisfaction score of 94 points.

Stream Health Assessments on Tributaries of Lake Paldang Using Index of Biological Integrity for Fish Community and Physical Habitat Parameters (어류 모델 메트릭과 물리적 서식지 변수를 이용한 팔당호 유입하천 하류부의 하천건강성 평가)

  • Choi, Myung-Jae;Park, Hae-Kyung;Lee, Jang-Ho;Yun, Seok-Hwan
    • Korean Journal of Ecology and Environment / v.42 no.3 / pp.280-289 / 2009
  • The fish communities and physical habitat conditions of fifteen tributaries of Lake Paldang were surveyed in spring and autumn 2008 to evaluate the ecological health of the streams. A total of 2,746 individuals belonging to 11 families, 31 genera, and 40 species were collected. Two species (Cottus koreanus, Gnathopogon strigatus) not previously reported in the Lake Paldang watershed were found for the first time. The most dominant species in the tributaries was Acheilognathus yamatsutae (19.9%), a Korean endemic species. The ecological health of the fifteen tributaries was evaluated using the index of biological integrity (IBI) model for fish community and the qualitative habitat evaluation index (QHEI). According to the IBI analysis, four streams (Siwoo-Stream, Jojong-Stream, Moonho-Stream and Mugab-Stream) were evaluated as being in "good" condition (B grade), Woosan-Stream was in "poor" condition (D grade), and the others were in "fair" condition (C grade). The QHEI values of four streams were grade "II", indicating "good" condition, and those of eleven streams were grade "III", indicating "fair" condition. On the whole, the IBI and QHEI datasets showed that the ecological health of Jojong-Stream has been well maintained compared to the other tributaries of Lake Paldang.

An Analysis on the Research Network Structure of Convergence Technologies in Government-sponsored Research Institutes (출연연구기관 융합기술 연구네트워크 구조 분석)

  • Kim, Hongyoung;Chung, Sunyang
    • Journal of Korea Technology Innovation Society / v.18 no.4 / pp.693-718 / 2015
  • This paper examines the presence of network structures among convergence technologies, focusing on national R&D projects performed by Government-sponsored Research Institutes (GRIs) in Korea. The dataset of convergence technology projects conducted by 24 GRIs over 3 years (2011-2013) is analysed using the network analysis method. In this paper, a convergence technology project is defined as a project that involves two or more technologies according to the intermediate classification of the National Standard Classification of S&T. The results confirm that convergence research at GRIs is performed more actively than convergence research across national R&D projects as a whole. Furthermore, the technological fields of GRIs' convergence projects are found to be much more varied. This paper also shows that in-house research is more active than collaborative research with external organizations. The network centrality analysis shows that convergence technologies can be classified, by their network centrality characteristics, into internally oriented technologies and externally oriented technologies. Convergence technologies are not simply a mixture of different technologies. Therefore, Korean GRIs should make more effort to create convergence research areas that could generate new technologies and industries more effectively than simple multidisciplinary technology research. From this perspective, policy suggestions on the role of GRIs in activating convergence research can be derived from the analysis of the status of convergence research and the networks of institutions.
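
The project definition and the network construction described in this abstract can be sketched as follows. This is an illustrative assumption rather than the authors' procedure: the abstract does not say which centrality measure was used, so normalized degree centrality stands in for it, and the technology field names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def is_convergence(fields):
    """A project counts as convergence if it spans 2 or more distinct fields."""
    return len(set(fields)) >= 2


def degree_centrality(projects):
    """Build a field co-occurrence network from convergence projects and
    return each field's normalized degree centrality: degree / (n - 1)."""
    neighbors = defaultdict(set)
    for fields in projects:
        if not is_convergence(fields):
            continue  # single-field projects add no edges
        for a, b in combinations(sorted(set(fields)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    return {f: len(nb) / (n - 1) for f, nb in neighbors.items()}


# Hypothetical projects, each listed with its technology fields.
projects = [
    ["bio", "ict"],
    ["ict", "materials"],
    ["ict", "energy"],
    ["energy"],  # not a convergence project
]
centrality = degree_centrality(projects)
print(max(centrality, key=centrality.get))  # ict
```

In this toy network every convergence project involves "ict", so it ends up linked to all other fields and has the highest centrality.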

A Social Network Analysis on the Research Trend of Korean Medicine (한의학 연구동향에 대한 사회연결망분석)

  • Kwon, Ki-Seok;Yi, Junhyeok;Lee, Juyeon;Chae, Sungwook;Han, Dong Seong
    • Journal of Korea Technology Innovation Society / v.17 no.2 / pp.334-354 / 2014
  • This study aims to analyze the research trend of Korean medicine based on social network analysis. To do this, a dataset was collected from the KCI (Korea Citation Index) database. From the results, we identified the longitudinal trends in the number of papers, journals, organizations, and key words in this field. Moreover, based on node centrality in the co-author network, we found a core journal (i.e. Korean Journal of Oriental Physiology and Pathology), a hub institution (i.e. Kyunghee University), and two main key words (i.e. anti-oxidation and acupuncture) in the research network. In conclusion, integrating field experts' tacit knowledge in Korean medicine studies with the results of the explicit social network analysis on the research trend, we put forward further policy implications with regard to R&D strategies in this field.

Application of SOM for the Detection of Spatial Distribution considering the Analysis of Basic Statistics for Water Quality and Runoff Data (수질 및 유량자료의 기초통계량 분석에 따른 공간분포 파악을 위한 SOM의 적용)

  • Jin, Young-Hoon;Kim, Yong-Gu;Roh, Kyong-Bum;Park, Sung-Chun
    • Journal of Korean Society on Water Environment / v.25 no.5 / pp.735-741 / 2009
  • To provide basic information for planning and carrying out environmental management such as Total Maximum Daily Loads (TMDLs), it is highly recommended to understand the spatial distribution of water quality and runoff data in the unit watersheds. Therefore, in the present study, we applied a Self-Organizing Map (SOM) to detect the characteristics of the spatial distribution of Biological Oxygen Demand (BOD) concentration and runoff data measured in the Yeongsan, Seomjin, and Tamjin River basins. For this purpose, the input dataset for the SOM was constructed with the mean, standard deviation, skewness, and kurtosis values of the respective data measured at the stations of 22 subbasins in the rivers. A $4 \times 4$ SOM array structure was selected by trial and error, and the best performance was obtained when the stations were classified into three clusters according to the basic statistics. Clusters 1 and 2 were distinguished primarily by the skewness and kurtosis of the runoff data, and cluster 3, which includes the basic statistics of the YB_B, YB_C, and YB_D stations, was clearly separated by the mean value of BOD concentration, showing the worst water quality among the three clusters. Consequently, the SOM-based methodology proposed in the present study can be considered highly applicable to detecting the spatial distribution of BOD concentration and runoff data, and it can be used effectively as a data analysis tool for other water quality items.
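
The SOM input described in this abstract (mean, standard deviation, skewness, and kurtosis per station) can be sketched as below. Several details are assumptions not stated in the abstract: population (biased) moments, non-excess kurtosis, and concatenating the BOD and runoff statistics into one 8-value vector per station. The function names and sample values are hypothetical, and the SOM training step itself is not shown.

```python
import math


def basic_statistics(values):
    """Return [mean, std, skewness, kurtosis] using population moments.

    Kurtosis here is the plain fourth standardized moment (a normal
    distribution gives 3), not excess kurtosis.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return [mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2]


def som_input_vector(bod_series, runoff_series):
    """One station's feature vector: 4 BOD statistics + 4 runoff statistics."""
    return basic_statistics(bod_series) + basic_statistics(runoff_series)


# Hypothetical measurements for a single station.
vector = som_input_vector([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.0, 3.0, 9.0])
print(len(vector))  # 8
```

With 22 subbasin stations this would give a 22 x 8 input matrix. Because BOD and runoff have different units, scaling each column before training is a common choice, though the abstract does not say whether the authors did so.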