• Title/Abstract/Keyword: Data Set Records

Search results: 197 items (processing time: 0.03 seconds)

Using Missing Values in the Model Tree to Change Performance for Predict Cholesterol Levels (모델트리의 결측치 처리 방법에 따른 콜레스테롤수치 예측의 성능 변화)

  • Jung, Yong Gyu;Won, Jae Kang;Sihn, Sung Chul
    • Journal of Service Research and Studies / Vol. 2, No. 2 / pp.35-43 / 2012
  • Data mining is of interest in every field rather than in any one specific area, and it is applied heavily across many of them. It is used in decision-making, in analyzing data and correlations to uncover hidden relations, in finding actionable information, and in prediction. However, some data sets contain many missing values in their variables and do not have a large number of records. In this paper, missing values are handled in accordance with the model tree algorithm, and the approach is applied to predicting cholesterol values. For the performance analysis, an experiment is run for each missing-value treatment. Through this, an efficient alternative for handling missing data is presented.


Feasibility Study on Sampling Ocean Meteorological Data using Stratified Method (층화추출법에 의한 해양기상환경의 표본추출 타당성 연구)

  • Han, Song-I;Cho, Yong-Jin
    • Journal of Ocean Engineering and Technology / Vol. 28, No. 3 / pp.254-259 / 2014
  • The infrared signature of a ship is largely influenced by the ocean environment of the operating area, which is known to cause large changes in the signature. As a result, the weather conditions have to be clearly set for an analysis of infrared signatures. In principle, meteorological data would have to be analyzed for all the oceans where the ship is supposed to operate, but this is prohibitively costly and time-consuming because of the huge size of the data. Therefore, a standard set of environmental variables for infrared signature research is necessary. In this study, we compared and analyzed sampling methods to represent ocean data close to the Korean peninsula. To perform this research, we collected ocean meteorological records from the KMA (Korea Meteorological Administration) and sampled them in numerous ways, considering five variables that are known to affect the infrared signature. Specifically, a simple random sampling method over all the data and 1-D, 2-D, and 3-D stratified sampling methods were compared and analyzed by considering the mean square errors for each method.
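
The comparison described in this abstract (simple random sampling versus stratified sampling, judged by mean square error) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the synthetic three-stratum population, the sample size of 30, the proportional allocation, and the function names are hypothetical and are not taken from the paper, which worked with KMA records and up to three stratification variables.

```python
import random
import statistics


def srs_mean(values, n, rng):
    """Estimate the population mean from a simple random sample of size n."""
    return statistics.fmean(rng.sample(values, n))


def stratified_mean(strata, n, rng):
    """Estimate the population mean with proportional 1-D stratified sampling."""
    total = sum(len(s) for s in strata)
    estimate = 0.0
    for stratum in strata:
        # Allocate the sample to each stratum in proportion to its size.
        k = max(1, round(n * len(stratum) / total))
        weight = len(stratum) / total
        estimate += weight * statistics.fmean(rng.sample(stratum, k))
    return estimate


def mse(estimates, truth):
    """Mean square error of repeated estimates against the true value."""
    return statistics.fmean((e - truth) ** 2 for e in estimates)


rng = random.Random(0)
# Hypothetical population: three strata whose means differ strongly.
strata = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (5.0, 15.0, 25.0)]
population = [v for s in strata for v in s]
truth = statistics.fmean(population)

srs_mse = mse([srs_mean(population, 30, rng) for _ in range(500)], truth)
strat_mse = mse([stratified_mean(strata, 30, rng) for _ in range(500)], truth)
print(f"simple random MSE: {srs_mse:.4f}, stratified MSE: {strat_mse:.4f}")
```

When the strata differ mainly in their means, as in this synthetic population, the stratified estimator removes the between-stratum variance, so its mean square error comes out much lower than that of simple random sampling.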

A Study on the Research Trends of Archival Preservation Papers in Korea from 2000 to 2021 (국내 기록보존 연구동향 분석: 2000~2021년 학술논문을 중심으로)

  • Yonwhee, Na;Heejin, Park
    • Journal of Korean Society of Archives and Records Management / Vol. 22, No. 4 / pp.175-196 / 2022
  • This study aims to determine the research trends in archival preservation through keyword analysis, understand the current research status, and identify the research topics' changes over time. The degree and betweenness centrality analyses were conducted and visualized on 463 "archival preservation studies" articles published from 2000 to 2021 in various academic journals, using NetMiner 4.0. The collected research papers were divided into three time periods according to when they were published: the first period (2000-2007), the second period (2008-2014), and the third period (2015-2021). The subject keywords for the research papers on archival preservation in Korea that have influence and expandability are as follows. Across all periods, these were "electronic records" and "long-term preservation." In addition, if taken separately per period, the "OAIS reference model" and "electronic records" dominated the first and second periods, respectively, while the "records management standard table" and "long-term preservation" both dominated the third period. A conceptual framework and theory-oriented study for archival preservation, such as "digital preservation," "digitalization," and the "OAIS reference model," dominated the first period. During the second period, more research focused on procedures and practical applications related to conservation activities, such as "electronic record," "appraisal," and "DRAMBORA." In contrast, the majority of the research in the third period was on technical implementation according to the changes in the records management environment, such as "data set," "administrative information system," and "social media."

Applying Centrality Analysis to Solve the Cold-Start and Sparsity Problems in Collaborative Filtering (협업필터링의 신규고객추천 및 희박성 문제 해결을 위한 중심성분석의 활용)

  • Cho, Yoon-Ho;Bang, Joung-Hae
    • Journal of Intelligence and Information Systems / Vol. 17, No. 3 / pp.99-114 / 2011
  • Collaborative Filtering (CF) suffers from two major problems: sparsity and cold-start recommendation. This paper focuses on the cold-start problem for new customers with no purchase records and the sparsity problem for customers with very few purchase records. For this purpose, we propose a method for new customer recommendation that uses a combined measure based on three widely used centrality measures to identify the customers who are most likely to become neighbors of the new customer. To alleviate the sparsity problem, we also propose a hybrid approach that applies our method to customers with very few purchase records and CF to the other customers with sufficient purchases. To evaluate the effectiveness of our method, we conducted several experiments using a data set from a department store in Korea. The experimental results show that the combination of two measures makes better recommendations than either a single measure or the best-seller-based method, and that performance improves when the hybrid approach is applied.

Implementation of Ontology-based Service by Exploiting Massive Crime Investigation Records: Focusing on Intrusion Theft (대규모 범죄 수사기록을 활용한 온톨로지 기반 서비스 구현 - 침입 절도 범죄 분야를 중심으로 -)

  • Ko, Gun-Woo;Kim, Seon-Wu;Park, Sung-Jin;No, Yoon-Joo;Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science / Vol. 53, No. 1 / pp.57-81 / 2019
  • An ontology is a dictionary with a complex structure that defines the relationships among terms related to specific knowledge in a particular field. There have been attempts to construct various ontologies in Korea and abroad, but there has not been a case in which large-scale crime investigation records were constructed as an ontology and a service was implemented through that ontology. Therefore, this paper describes the process of constructing an ontology based on information extracted from unstructured crime investigation documents in the intrusion theft field, and of implementing an ontology-based search service and a crime spot recommendation service. To assess the performance of the search service, we tested Top-K accuracy, one of the accuracy measures for event search, and obtained a maximum accuracy of 93.52% on the experimental data set. In addition, we obtained a suitable clue field combination for the entire experimental data set, and we can calibrate the field location information in the database with an F1-measure of 76.19%.
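
Top-K accuracy, the search metric named in this abstract, is commonly defined as the share of queries whose correct item appears among the first K returned results. The sketch below follows that common definition; the function name, the toy case identifiers, and the one-target-per-query setup are hypothetical and are not taken from the paper.

```python
def top_k_accuracy(ranked_results, targets, k):
    """Fraction of queries whose target appears in the first k ranked results."""
    if len(ranked_results) != len(targets):
        raise ValueError("one target is required per query")
    hits = sum(
        1
        for ranked, target in zip(ranked_results, targets)
        if target in ranked[:k]
    )
    return hits / len(targets)


# Hypothetical search output: one ranked list of case IDs per query.
ranked = [
    ["case-a", "case-b", "case-c"],
    ["case-d", "case-e", "case-f"],
    ["case-g", "case-h", "case-i"],
]
targets = ["case-b", "case-f", "case-x"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, targets, k))
```

Because a larger K can only add hits, the score never decreases as K grows, which is why such results are usually reported together with the K that was used.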

Data-Driven Kinematic Control for Robotic Spatial Augmented Reality System with Loose Kinematic Specifications

  • Lee, Ahyun;Lee, Joo-Haeng;Kim, Jaehong
    • ETRI Journal / Vol. 38, No. 2 / pp.337-346 / 2016
  • We propose a data-driven kinematic control method for a robotic spatial augmented reality (RSAR) system. We assume a scenario where a robotic device and a projector-camera unit (PCU) are assembled in an ad hoc manner with loose kinematic specifications, which hinders the application of a conventional kinematic control method based on the exact link and joint specifications. In the proposed method, the kinematic relation between a PCU and joints is represented as a set of B-spline surfaces based on sample data rather than analytic or differential equations. The sampling process, which automatically records the values of joint angles and the corresponding external parameters of a PCU, is performed as an off-line process when an RSAR system is installed. In an on-line process, an external parameter of a PCU at a certain joint configuration, which is directly readable from motors, can be computed by evaluating the pre-built B-spline surfaces. We provide details of the proposed method and validate the model through a comparison with an analytic RSAR model with synthetic noises to simulate assembly errors.

Predictive Model of Optimal Continuous Positive Airway Pressure for Obstructive Sleep Apnea Patients with Obesity by Using Machine Learning (비만 폐쇄수면무호흡 환자에서 기계학습을 통한 적정양압 예측모형)

  • Kim, Seung Soo;Yang, Kwang Ik
    • Journal of Sleep Medicine / Vol. 15, No. 2 / pp.48-54 / 2018
  • Objectives: The aim of this study was to develop a model for predicting the optimal continuous positive airway pressure (CPAP) for obstructive sleep apnea (OSA) patients with obesity using machine learning. Methods: We retrospectively investigated the medical records of 162 OSA patients who had obesity [body mass index (BMI) ≥ 25] and had undergone a successful CPAP titration study. We randomly divided the data into a training set (90%) and a test set (10%). We built a random forest model and a least absolute shrinkage and selection operator (lasso) regression model to predict the optimal pressure using the training set, and then applied our models and previously reported equations to the test set. To compare the fit of the models, we used the correlation coefficient (CC) and the mean absolute error (MAE). Results: The random forest model showed the best performance {CC 0.78 [95% confidence interval (CI) 0.43-0.93], MAE 1.20}. The lasso regression model also showed an improved result [CC 0.78 (95% CI 0.42-0.93), MAE 1.26] compared to the Hoffstein equation [CC 0.68 (95% CI 0.23-0.89), MAE 1.34] and Choi's equation [CC 0.72 (95% CI 0.30-0.90), MAE 1.40]. Conclusions: Our random forest model and lasso model ($26.213 + 0.084 \times \text{BMI} + 0.004 \times \text{apnea-hypopnea index} + 0.004 \times \text{oxygen desaturation index} - 0.215 \times \text{mean oxygen saturation}$) showed improved performance compared to the previously reported equations. Further study of other subgroups or phenotypes of OSA is required.
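
The lasso equation quoted in this abstract, together with the two evaluation measures it names (correlation coefficient and mean absolute error), can be written out as a short sketch. The coefficients are the ones given in the abstract; the function names and the example patient values are hypothetical and chosen only to show the arithmetic.

```python
import math


def lasso_optimal_pressure(bmi, ahi, odi, mean_spo2):
    """Optimal CPAP pressure from the lasso equation quoted in the abstract.

    ahi: apnea-hypopnea index, odi: oxygen desaturation index,
    mean_spo2: mean oxygen saturation.
    """
    return 26.213 + 0.084 * bmi + 0.004 * ahi + 0.004 * odi - 0.215 * mean_spo2


def mean_absolute_error(actual, predicted):
    """Average absolute difference between paired values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pearson_cc(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical patient: BMI 30, AHI 40, ODI 35, mean oxygen saturation 93.
example = lasso_optimal_pressure(bmi=30, ahi=40, odi=35, mean_spo2=93)
print(round(example, 3))
```

For the hypothetical patient the terms are 26.213 + 2.52 + 0.16 + 0.14 - 19.995, which gives 9.038.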

Analysis of race time between Korean athletes participating in 2016 Seoul International Wheelchair Marathon (2016 서울 국제 휠체어 마라톤 경기대회에 참가한 국내 선수의 구간 및 기록변화 분석)

  • Kim, Seong-Ho;Kim, Sang-Hoon
    • Journal of Industrial Convergence / Vol. 18, No. 3 / pp.91-98 / 2020
  • The purpose of this study was to identify differences in records for each section of the wheelchair marathon and to provide information necessary for training and basic data for developing training methods. The study analyzed the records of a total of 5 foreign male athletes and 4 top domestic athletes who completed the 42.195 km full course among the participants at the 2016 Seoul International Wheelchair Marathon. Records for every 5 km section and finish records were used, with the first through ninth sections defined. For data processing, descriptive statistics (mean, SD) were computed with the statistical program SPSS 25.0. The following conclusions were drawn. For the winning W1 athlete, the 30 km to 35 km section was the fastest of all sections at 8 minutes and 43 seconds, a difference of 1 minute and 4 seconds from K1, the top-ranked domestic athlete. To rank higher in international competitions, training to adapt to a quick pace in the later stages is required. That is, the pace in the second half should be faster than in the first half, taking the half distance as the dividing point. In addition, in each section of the first half, it is necessary to develop a race pace suited to the average speed level for that section. Therefore, a fitness training program that can sustain the early part of the race and a training program that can cope with changes in pace in the second half should be applied.

A Study on the Knowledge Formation Process of Wikipedia in Korea through Big Data Analysis (빅데이터 분석을 통해 본 한국 위키피디아의 지식형성 과정에 관한 연구)

  • Lee, Jungyeoun;Jeon, Suhyeon
    • Journal of the Korean Society for Information Management / Vol. 37, No. 2 / pp.171-195 / 2020
  • This study analyzed the collaborative process as a time series by breaking down the edit log big data of Wikipedia Korea, a representative online collaboration community, from early 2002 to 2019. Analysis elements were extracted from the document edit records, formatted in standardized XML, and analyzed using Python and R. The analysis explained how editors contribute, the characteristics of the data contents, and the trend of document creation. It revealed active contribution by a small set of editors and loose participation by the majority. In addition, sociocultural characteristics that appear in online communities were also found in Wikipedia Korea. A new, diverse set of external resources is necessary to sustain the collective intelligence. The study suggests an effort to settle new editors into the Wikipedia community and openness through a circulation structure to avoid exclusiveness in the management group.

Prediction of Retail Beef Yield Using Parameters Based on Korean Beef Carcass Grading Standards

  • Choy, Yun-Ho;Choi, Seong-Bok;Jeon, Gi-Jun;Kim, Hyeong-Cheol;Chung, Hak-Jae;Lee, Jong-Moon;Park, Beom-Young;Lee, Sun-Ho
    • Food Science of Animal Resources / Vol. 30, No. 6 / pp.905-909 / 2010
  • Two sets of data on carcass traits and beef cut parameters were used to investigate the relationships between carcass and beef cut measurements, which can be used to make predictions of retail cut percentages. One set had a total of 1,141 measurements of Hanwoo cattle of three different sex origins, which were slaughtered in an abattoir located at the National Institute of Animal Science, RDA, Korea from 1996 to 2008. To develop prediction models for retail cut percentage with higher accuracies than the current model, another set consisting of a total of 13,389 records of carcass and beef cut traits were collected from 30 abattoirs and butcheries in Korea from 2008 to 2009. Bulls yielded heavier and leaner carcasses than steers. High correlation coefficients were estimated between amount of body fat and percent retail cut (-0.82) as well as between back fat thickness (BF) and percent retail cut (-0.62). The amount of retail cut, however, was highly correlated with body weight before slaughter (BW, 0.95) or with cold carcass weight (CWT, 0.94). Relationships between percent retail cut and measurable beef yield traits, BF, loin eye area (LEA) or CWT varied by sex class, which must be considered for development of a prediction model with high accuracy. Models of data for all breeds and sexes fit the effects of breed, sex, and interaction of abattoir by butchers, whereas models of data for each breed and sex fit the effect of interaction of abattoir by butcher only. Due to possible future changes in back fat control, we performed a log transformation of BF. Our new models fit better than the currently used model.