• Title/Summary/Keyword: R Programming


Lowess and outlier analysis of biological oxygen demand on Nakdong main stream river (낙동강 본류 측정소들의 생물학적 산소요구량 수치에 대한 비모수적 회귀분석과 특이점분석)

  • Kim, Jong Tae
    • Journal of the Korean Data and Information Science Society / v.25 no.1 / pp.119-130 / 2014
  • This paper is based on the water information system of NIER, the National Institute of Environmental Research. We used monthly water quality data from January 2013 to August 2013 for measuring points A (nbA) through N (nbN), located along the Nakdong River main stream. Statistical analysis of BOD (biological oxygen demand) was carried out in R by month, year, and measuring point. Based on the BOD measured at these points, we used exploratory data analysis and locally weighted scatter plot smoother (Lowess) trend analysis, a non-parametric regression method, to analyze the long-term water quality trend and the distribution of water quality across measuring points. We also identified the periods and measuring points at which outliers were abundant. As a result, compared with the BOD measured at nbM, located downstream in Busan, the BOD measured at nbG in Daegu and nbI in Changwon, both along the midstream, indicated more severe water pollution.
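The Lowess trend analysis named in this abstract can be illustrated with a minimal sketch. The paper used R; the Python below is not the authors' code. It assumes a tricube kernel, local linear fits, and no robustness iterations, and the function names `tricube` and `lowess`, the `frac` parameter, and the sample series are all hypothetical, chosen only for illustration.

```python
def tricube(u):
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, otherwise 0."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess(x, y, frac=0.5):
    """Smooth y over x with locally weighted linear regression.

    frac is the share of points used as the neighbourhood of each fit.
    This sketch omits the robustness iterations of full Lowess.
    """
    n = len(x)
    k = min(n, max(2, int(round(frac * n))))  # neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest neighbour.
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        w = [tricube((xi - x0) / h) for xi in x]
        # Weighted least squares for y = a + b * (x - x0); the fit at x0 is a.
        d = [xi - x0 for xi in x]
        sw = sum(w)
        swd = sum(wi * di for wi, di in zip(w, d))
        swy = sum(wi * yi for wi, yi in zip(w, y))
        swdd = sum(wi * di * di for wi, di in zip(w, d))
        swdy = sum(wi * di * yi for wi, di, yi in zip(w, d, y))
        denom = sw * swdd - swd ** 2
        if abs(denom) < 1e-12:
            # Degenerate neighbourhood: fall back to a weighted mean.
            fitted.append(swy / sw)
        else:
            fitted.append((swdd * swy - swd * swdy) / denom)
    return fitted


if __name__ == "__main__":
    # Made-up monthly values for one hypothetical measuring point (not real BOD data).
    months = [1, 2, 3, 4, 5, 6, 7, 8]
    values = [2.1, 2.4, 3.0, 2.2, 2.8, 3.5, 2.9, 2.6]
    for m, s in zip(months, lowess(months, values, frac=0.5)):
        print(m, round(s, 3))
```

Because each local fit is linear, the smoother reproduces perfectly linear input exactly, which makes a convenient sanity check. A production analysis would more likely rely on an established implementation such as R's built-in `lowess` function.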

Factors influencing metabolic syndrome perception and exercising behaviors in Korean adults: Data mining approach (대사증후군의 인지와 신체활동 실천에 영향을 미치는 요인: 데이터 마이닝 접근)

  • Lee, Soo-Kyoung;Moon, Mikyung
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.12 / pp.581-588 / 2017
  • This study was conducted from July 2014 to December 2015 to determine which factors predict metabolic syndrome (MetS) perception and exercise by applying a machine learning classifier, the Extreme Gradient Boosting algorithm (XGBoost). Data were obtained from the Korean Community Health Survey (KCHS), representing community-dwelling Korean adults 19 years and older, from 2009 to 2013. The dataset includes 370,430 adults. Outcomes were categorized as follows based on the perception of MetS and physical activity (PA): Stage 1 (no perception, no PA), Stage 2 (perception, no PA), and Stage 3 (perception, PA). Features common to all questionnaires for the last 5 years were selected for modeling. Overall, there were 161 features, all categorical except for age and the visual analogue scale (EQ-VAS). We used the Extreme Gradient Boosting algorithm in R to build the predictive model and achieved a prediction accuracy of 0.735. The top 10 predictive factors in Stage 3 were: age, education level, attempt to control weight, EQ mobility, nutrition label checks, private health insurance, EQ-5D usual activities, anti-smoking advertising, EQ-VAS, education in health centers for diabetes, and dental care. In conclusion, the results showed that XGBoost can be used to identify factors influencing disease prevention and management using healthcare big data.

Development of a Gridded Simulation Support System for Rice Growth Based on the ORYZA2000 Model (ORYZA2000 모델에 기반한 격자형 벼 생육 모의 지원 시스템 개발)

  • Hyun, Shinwoo;Yoo, Byoung Hyun;Park, Jinyu;Kim, Kwang Soo
    • Korean Journal of Agricultural and Forest Meteorology / v.19 no.4 / pp.270-279 / 2017
  • Regional assessment of crop productivity using a gridded simulation approach could aid policy making and crop management. Still, little effort has been made to develop systems that allow gridded simulations of crop growth using the ORYZA2000 model, which has been used for predicting rice yield in Korea. The objectives of this study were to develop a series of data processing modules for creating input data files, running the crop model, and aggregating output files in a region of interest using gridded data files. These modules were implemented in C++ and R to make the best use of the features provided by these programming languages. In a case study, 13,000 input files in plain text format were prepared using daily gridded weather data with spatial resolutions of 1 km and 12.5 km for the period 2001-2010. Using the text files as inputs to the ORYZA2000 model, crop yield simulations were performed for each grid cell under a scenario of crop management practices. After output files were created for the grid cells that represent paddy rice fields in South Korea, the output files were aggregated into an output file in the netCDF format. The spatial pattern of simulated crop yield was relatively similar to the actual distribution of yields in Korea, although there were regional biases in crop yield. These differences appeared to result from uncertainties in the input data, e.g., transplanting date and cultivar in an area, as well as weather data. Our results indicate that the set of tools developed in this study would be useful for gridded simulation of different crop models. In further studies, it would be worthwhile to consider compatibility with a modeling interface library for integrated simulation of an agricultural ecosystem.

Text-Mining Analysis on the Interaction between the American Consumers Aged over 60 and Companion Pets Robots: Focused on Amazon Reviews for Joy For All Companion Pets (텍스트 마이닝을 활용한 미국 노년 소비자와 애완용 로봇 간 상호작용에 대한 분석: Joy For All Companion Pets에 대한 아마존 리뷰를 중심으로)

  • Chung, Yea-Eun;Lee, Yu Lim;Chung, Jae-Eun
    • Journal of Digital Convergence / v.19 no.10 / pp.469-489 / 2021
  • This study explores consumers' responses to socially assistive robotics using a text-mining method, focusing on Companion Pets from Hasbro, which provide emotional support. We conducted text frequency analysis and LDA analysis using R. The key findings are that 1) the most frequently used words referred to the mimicry of living pets and the appearance of the companion pets, 2) five topics were derived from the LDA analysis, with the keywords in each topic split between positive and negative, 3) user, product, and environment affect the interaction between consumers and companion pets, and 4) consumers with cognitive or physical difficulties use companion pets to replace living pets. This study provides an understanding of consumer responses to companion pets and offers practical implications that may improve the efficacy of use for consumers and the understanding of companion robots, which provide emotional support during COVID-19.

An exploratory study on consumers' responses to mobile payment service focused on Samsung Pay (텍스트 마이닝 기법을 이용한 모바일 간편결제 서비스에 대한 소비자 반응 분석: 삼성페이를 중심으로)

  • Jung, Minji;Lee, Yu Lim;Yoo, Chae Min;Kim, Ji Won;Chung, Jae-Eun
    • Journal of Digital Convergence / v.17 no.1 / pp.9-27 / 2019
  • The purpose of this study is to examine consumers' responses to mobile payment services using a text-mining technique, focusing on Samsung Pay because it is used in both online and offline transactions. We conducted text frequency analysis, text clustering analysis, and text network analysis using R. The major findings are as follows. First, the most frequently used keywords referred to the brand names of the mobile devices, the replacement of traditional wallets, and the unique functions of Samsung Pay. Second, there was a clear split between positive and negative responses at the macro level. Third, the replacement of traditional wallets played a major role in positive responses and in the continued use of mobile payment services. This study provides an in-depth understanding of consumer responses toward mobile payment services. It also offers practical implications that may help mobile payment marketers respond to consumer values and expectations, thus increasing consumer satisfaction.

Water Level Prediction on the Golok River Utilizing Machine Learning Technique to Evaluate Flood Situations

  • Pheeranat Dornpunya;Watanasak Supaking;Hanisah Musor;Oom Thaisawasdi;Wasukree Sae-tia;Theethut Khwankeerati;Watcharaporn Soyjumpa
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.31-31 / 2023
  • During December 2022, the northeast monsoon, which dominates the south and the Gulf of Thailand, brought significant rainfall to the lower southern region, causing flash floods, landslides, blustery winds, and rivers overflowing their banks. The Golok River in Narathiwat, which forms the border between Thailand and Malaysia, was also affected by the rainfall. In flood management, instruments for measuring precipitation and water level have become important for assessing and forecasting how situations develop and which areas are at risk. However, because such regions lie on international borders, the installed telemetry system cannot measure rainfall and water level over the entire area. This study aims to predict the water level 72 hours ahead and to evaluate the situation, providing information that supports the government in making water management decisions, publicizing them to relevant agencies, and warning citizens during crisis events. The research applies machine learning (ML) to water level prediction for the Golok River at the Lan Tu Bridge area, Sungai Golok Subdistrict, Su-ngai Golok District, Narathiwat Province, one of the major monitored rivers. The eXtreme Gradient Boosting (XGBoost) algorithm, a tree-based ensemble machine learning algorithm, was used to predict hourly water levels in the R programming language. Model training and testing used observed hourly rainfall from the STH010 station and hourly water level data from the X.119A station between 2020 and 2022 as the main prediction inputs. Furthermore, the model takes hourly spatial rainfall forecasts from the Weather Research and Forecasting and Regional Ocean Model System models (WRF-ROMs), provided by the Hydro-Informatics Institute (HII), as input, allowing it to predict the hourly water level in the Golok River. Evaluation of predictive performance with statistical metrics yielded an R-squared of 0.96, supporting the robustness of the forecasts. The results show that the predicted water level at the X.119A telemetry station (Golok River) declines steadily, consistent with the decrease in the predicted 72-hour rainfall from WRF-ROMs used as input. In short, the relationship between input and result can be used to evaluate flood situations. The data are provided as operational support to the Special Water Resources Management Operation Center in Southern Thailand for flood preparedness and response, helping it make informed water management decisions during crises and prevent loss and harm to citizens.


A Study on Practical Classes for Healthcare Administration Education Program Using Health and Medical Big Data (보건의료 빅데이터를 활용한 보건행정 교육프로그램 실무수업에 관한 고찰)

  • Ok-Yul Yang;Yeon-Hee Lee
    • Journal of the Health Care and Life Science / v.10 no.1 / pp.1-14 / 2022
  • This study examines the possibility of using big data-related education programs based on health and medical big data in departments related to health and medical administration. The paper examines health and medical big data from five perspectives. First, in addition to the aforementioned 'Health and Medical Big Data Open System', we examine the characteristics and application technologies of public big data disclosed by 'Korea Welfare Panel', 'Public Big Data', 'Seoul City Big Data', 'Statistical Office Big Data', etc. Second, we examine whether the applicable health and medical big data are appropriate for use as living data in regular subjects of health and medical administration and health information related departments of junior colleges. Third, we seek to select the most appropriate tool for classroom lectures from among existing statistical processing packages and programming languages. Fourth, using verified health and medical big data and appropriate tools, we test the possibility of producing graphs and similar output in class, along with the steps through to writing a report. Fifth, we describe the relative advantages of the R language, which can satisfy portability, installability, cost effectiveness, compatibility, and big data processing potential.

Expression of the Circadian Clock Genes in the Mouse Gonad (생쥐 생식소의 발달 단계에 따른 일주기성 유전자 발현에 관한 연구)

  • Chung Mi-Kyung;Choi Yoon-Jeong;Jung Kyenng-Hwa;Kim Eun-Ah;Chung Hyung-Min;Lee Sook-Hwan;Yoon Tae-Ki;Chai Young-Gyu
    • Development and Reproduction / v.8 no.1 / pp.57-64 / 2004
  • This study was carried out to examine the expression of the circadian clock genes in the mouse ovary and testis at different developmental stages. Expression of the Period1 (Per1), Period2 (Per2), Period3 (Per3), Cryptochrome1 (Cry1), Cryptochrome2 (Cry2), Clock Small, and Prokineticin1 and Prokineticin2 receptor (Prok1r, Prok2r) genes in the mouse ovary was explored by semiquantitative reverse transcription polymerase chain reaction (RT-PCR) according to developmental stage (post partum day; ppd 1, 7, 10, 21 and 35). Immunohistochemistry using a PER1 antibody was also performed. The clock genes showed differential expression patterns across the stages of mouse ovarian development (ppd 1, 7, 10, 21 and 35). In the ovary, the clock gene expression patterns changed markedly at ppd 7 and 10, the starting point of follicle growth. The clock genes were also highly expressed at ppd 7 and 10 in the mouse testis. Receptors for Prok2, the circadian output molecule of the SCN, were expressed in the ovary at ppd 7 and in the testis at ppd 1 and 7. Immunohistochemical analysis of PER1 showed positive signals in the cytoplasm of oocytes and granulosa cells. The level of PER1 expression was increased in spermatogonia and condensing spermatids. The expression of Per1 and the localization of PER1 showed similar patterns across the developmental stages in the ovary and testis. Taken together, the expression of clock genes was highly correlated with gonadal development and germ cell differentiation in mice. Therefore, in this study, circadian programming of the genes in the ovary and testis is strongly imposed across a wide range of core reproductive cycles and normal development of gametes. Although the existence of circadian genes has been clearly established, further studies providing direct evidence are required to understand the relationship between circadian genes and the regulation of gonadal differentiation and germ cell development.


Verification of Indicator Rotation Correction Function of a Treatment Planning Program for Stereotactic Radiosurgery (방사선수술치료계획 프로그램의 지시자 회전 오차 교정 기능 점검)

  • Chung, Hyun-Tai;Lee, Re-Na
    • Journal of Radiation Protection and Research / v.33 no.2 / pp.47-51 / 2008
  • Objective: This study analyzed errors due to rotation or tilt of the magnetic resonance (MR) imaging indicator during image acquisition for stereotactic radiosurgery. The error correction procedure of a commercially available stereotactic neurosurgery treatment planning program was verified. Materials and Methods: Software virtual phantoms were built with stereotactic images generated by a commercial programming language, Interactive Data Language (version 5.5). The thickness of an image slice was 0.5 mm, the pixel size was $0.5 \times 0.5$ mm, the field of view was 256 mm, and the image resolution was $512 \times 512$. The images were generated under the DICOM 3.0 standard so that they could be used with Leksell GammaPlan®. To verify the rotation error correction function of Leksell GammaPlan®, 45 measurement points were arranged in five axial planes. On each axial plane, there were nine measurement points along a square of length 100 mm. The center of the square was located on the z-axis, and one measurement point was also on the z-axis. The five axial planes were placed at z = -50.0, -30.0, 0.0, 30.0, and 50.0 mm. The virtual phantom was rotated by $3^{\circ}$ around one of the x, y, and z axes, around two of the three axes, and around all three axes. The errors in the positions of the rotated measurement points were measured with Leksell GammaPlan® and the correction function was verified. Results: The image registration error of the virtual phantom images was $0.1 \pm 0.1$ mm, which was within the requirement for stereotactic images. The maximum theoretical errors in the positions of the measurement points were 2.6 mm for a rotation around one axis, 3.7 mm for a rotation around two axes, and 4.5 mm for a rotation around three axes. The measured errors in position were $0.1 \pm 0.1$ mm for a rotation around a single axis and $0.2 \pm 0.2$ mm for rotations around two and three axes. These small errors verified that the rotation error correction function of Leksell GammaPlan® works correctly. Conclusion: A virtual phantom was built to verify software functions of a stereotactic neurosurgery treatment planning program. The error correction function of a commercial treatment planning program worked within the nominal error range. The virtual phantom of this study can be applied in many other fields to verify various functions of treatment planning programs.

Transcriptome Analysis of Longissimus Tissue in Fetal Growth Stages of Hanwoo (Korean Native Cattle) with Focus on Muscle Growth and Development (한우 태아기 6, 9개월령 등심 조직의 전사체 분석을 통한 근생성 및 지방생성 관여 유전자 발굴)

  • Jeong, Taejoon;Chung, Ki-Yong;Park, Woncheol;Son, Ju-Hwan;Park, Jong-Eun;Chai, Han-Ha;Kwon, Eung-Gi;Ahn, Jun-Sang;Park, Mi-Rim;Lee, Jiwoong;Lim, Dajeong
    • Journal of Life Science / v.30 no.1 / pp.45-57 / 2020
  • The prenatal period in livestock animals is crucial for meat production because the net increase in the number of muscle fibers is completed before birth. However, there has been no study of the mechanisms of muscle growth and development in Hanwoo during this period. Therefore, to find candidate genes involved in muscle growth and development during this period, mRNA expression data from the longissimus of Hanwoo at 6 and 9 months post-conceptional age (MPA) were analyzed. We independently identified differentially expressed genes (DEGs) using DESeq2 and edgeR, which are R software packages, and took the overlap of the two results as the final DEGs for downstream analysis. The DEGs were classified into several modules using WGCNA, and the modules' functions were then analyzed to identify modules involved in myogenesis and adipogenesis. Finally, hub genes were identified as those with the highest WGCNA module membership among the top 10% of genes by maximal clique centrality in the STRING network. A total of 913 DEGs specific to 6 MPA and 233 DEGs specific to 9 MPA were identified, and these were classified into five and two modules, respectively. The functions of two of the identified modules (one specific to 6 MPA and the other to 9 MPA) were found to be related to myogenesis and adipogenesis. One of the hub genes in the 6 MPA specific module was axin1 (AXIN1), known as an inhibitor of the Wnt signaling pathway; another was succinate-CoA ligase ADP-forming beta subunit (SUCLA2), known as a crucial component of the citrate cycle.