Search | Korea Science

Sparse Data Cleaning using Multiple Imputations

Jun, Sung-Hae;Lee, Seung-Joo;Oh, Kyung-Whan
- International Journal of Fuzzy Logic and Intelligent Systems / v.4 no.1 / pp.119-124 / 2004
Real-world data such as web log files tend to be incomplete, yet we must extract useful knowledge from them to support optimal decision making. Web log data contain much useful information, including hyperlink structure and the usage patterns of connected users. However, web data are too large to use effectively for knowledge discovery and, to make matters worse, they are very sparse. We address this sparsity by using a Markov chain Monte Carlo (MCMC) method for multiple imputation. This missing-value imputation converts sparse web data into complete data. Our approach may be a useful tool for discovering knowledge from sparse data sets. The sparser the data, the better MCMC imputation performed. We verified our approach in experiments on data from the UCI Machine Learning Repository.
https://doi.org/10.5391/IJFIS.2004.4.1.119
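
The multiple-imputation idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' MCMC procedure: it swaps in a much simpler scheme that draws each missing value from a normal distribution fitted to the observed values, then pools the per-imputation estimates of the mean with Rubin's rules. The function names, the choice of `m=5`, and the normal model are all assumptions made for illustration, and the scheme ignores uncertainty in the fitted parameters.

```python
import random
import statistics


def impute_once(values, rng):
    """Fill each missing entry (None) with a draw from a normal
    distribution fitted to the observed entries (a simplifying
    assumption, not the MCMC sampler described in the abstract)."""
    observed = [v for v in values if v is not None]
    mu = statistics.mean(observed)
    sigma = statistics.stdev(observed)
    return [rng.gauss(mu, sigma) if v is None else v for v in values]


def pool_means(values, m=5, seed=0):
    """Create m completed data sets and pool their means with Rubin's rules.

    Returns (pooled_estimate, total_variance)."""
    rng = random.Random(seed)
    estimates, within = [], []
    for _ in range(m):
        completed = impute_once(values, rng)
        estimates.append(statistics.mean(completed))
        # Sampling variance of the mean within this completed data set.
        within.append(statistics.variance(completed) / len(completed))
    q_bar = statistics.mean(estimates)   # pooled point estimate
    w = statistics.mean(within)          # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance
    total = w + (1 + 1 / m) * b          # Rubin's total variance
    return q_bar, total


if __name__ == "__main__":
    data = [1.0, None, 3.0, 2.0, None, 4.0]  # hypothetical sparse column
    print(pool_means(data))
```

The between-imputation term `b` is what distinguishes multiple imputation from filling each gap once: it carries the extra uncertainty caused by the missing values into the final variance.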

Efficient LSTM Configuration in IoT Environment (IoT 환경에서의 효율적인 LSTM 구성)

Lee, Jongwon;Hwang, Chulhyun;Lee, Sungock;Song, Hyunok;Jung, Hoekyung
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.345-346 / 2018
Internet of Things (IoT) data is collected in real time and is treated as highly reliable because of its high precision. However, IoT data is not always reliable: values are often incomplete for reasons such as sensor aging and failure, poor operating environments, and communication problems. We propose a methodology to address this problem. It implements multiple LSTM networks that individually process the data collected from each sensor, and a single LSTM network that batches the input data into an array. On this basis, we propose an efficient method for constructing LSTMs in an IoT environment.

Dual Generalized Maximum Entropy Estimation for Panel Data Regression Models

Lee, Jaejun;Cheon, Sooyoung
- Communications for Statistical Applications and Methods / v.21 no.5 / pp.395-409 / 2014
Data that are limited, partial, or incomplete give rise to what is known as an ill-posed problem. If such data are analyzed by traditional statistical methods, the results are unreliable and lead to erroneous interpretations. To overcome these problems, we propose a dual generalized maximum entropy (dual GME) estimator for panel data regression models based on an unconstrained dual Lagrange multiplier method. Monte Carlo simulations for panel data regression models with exogeneity, endogeneity, and/or collinearity show that the dual GME estimator outperforms several other estimators, such as least squares and instrumental-variable estimators, even in small samples. We believe that our dual GME procedure developed for the panel data regression framework will be useful for analyzing ill-posed and endogenous data sets.
https://doi.org/10.5351/CSAM.2014.21.5.395

Comparing Accuracy of Imputation Methods for Incomplete Categorical Data

Shin, Hyung-Won;Sohn, So-Young
- Proceedings of the Korean Statistical Society Conference / 2003.05a / pp.237-242 / 2003
Various estimation methods have been developed for imputing missing categorical data, including the modal category method, logistic regression, and association rules. In this study, we propose two imputation methods (neural network fusion and voting fusion) that combine the results of individual imputation methods. A Monte Carlo simulation is used to compare the performance of these methods. The five factors used to simulate the missing data are (1) the true model for the data, (2) data size, (3) noise size, (4) percentage of missing data, and (5) missing pattern. Overall, neural network fusion performed best; voting fusion outperformed the individual imputation methods but was inferior to neural network fusion. An additional analysis of real data confirms the simulation results.
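
As a rough illustration of the voting fusion idea in the abstract above, the sketch below takes the category proposed by each individual imputer for one missing cell and returns the majority choice. The abstract does not say how ties are resolved or how the individual imputers are implemented, so the tie-breaking rule (prefer the earliest-listed imputer) and the helper `modal_category` are assumptions for illustration only.

```python
from collections import Counter


def modal_category(observed):
    """Modal category imputation: the most frequent observed category."""
    return Counter(observed).most_common(1)[0][0]


def voting_fusion(candidates):
    """Majority vote over the values proposed by individual imputers.

    Ties go to the candidate that appears first in the list, which is
    an assumed rule rather than one taken from the paper."""
    counts = Counter(candidates)
    best = max(counts.values())
    for value in candidates:
        if counts[value] == best:
            return value


if __name__ == "__main__":
    column = ["red", "blue", "blue", "green", "blue"]  # hypothetical observed values
    proposals = [
        modal_category(column),  # modal category method
        "green",                 # stand-in for a logistic regression prediction
        "blue",                  # stand-in for an association rule prediction
    ]
    print(voting_fusion(proposals))  # prints "blue"
```

The neural network fusion method mentioned in the abstract would replace the fixed vote with a learned combiner; that is not sketched here.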

Diagnostic characteristics of supplemental laboratory criteria for incomplete Kawasaki disease in children with complete Kawasaki disease

Jun, Hyun Ok;Yu, Jeong Jin;Kang, So Yeon;Seo, Chang Deok;Baek, Jae Suk;Kim, Young-Hwue;Ko, Jae-Kon
- Clinical and Experimental Pediatrics / v.58 no.10 / pp.369-373 / 2015
Purpose: In 2004, the American Heart Association (AHA) published an algorithm for the diagnosis of incomplete Kawasaki disease (KD). The aim of the present study was to investigate the characteristics of the supplemental laboratory criteria in this algorithm. Methods: We retrospectively examined the medical records of 355 patients with KD who were treated with intravenous immunoglobulin (IVIG) during the acute phase of the disease. Laboratory data were obtained before the initial IVIG administration and up to 10 days after fever onset. In 106 patients, laboratory testing was performed more than twice. Results: The AHA supplemental laboratory criteria were fulfilled in 90 patients (25.4%), and the frequency of laboratory examination (odds ratio [OR], 1.981; 95% confidence interval [CI], 1.391-2.821; P<0.001) was a significant predictor of fulfillment. The fulfillment of AHA supplemental laboratory criteria was significantly associated with refractoriness to the initial IVIG administration (OR, 2.388; 95% CI, 1.182-4.826; P=0.013) and dilatation of coronary arteries (OR, 2.776; 95% CI, 1.519-5.074; P=0.001). Conclusion: Repeated laboratory testing increased the rate of fulfillment of the AHA supplemental laboratory criteria in children with KD.
https://doi.org/10.3345/kjp.2015.58.10.369

The Effects of Coordinative Locomotion Training Using the PNF Pattern on Walking in Patients with Spinal Cord Injury (PNF 패턴을 결합한 협응적 이동 훈련이 척수손상환자의 보행에 미치는 효과)

Hwang, Sang-Su;Maeng, Gwan-Cheol;Kim, Jin-In;Jung, Chang-Wook
- PNF and Movement / v.14 no.2 / pp.67-74 / 2016
Purpose: The purpose of this study was to examine the effects of coordinative locomotion training (CLT) on walking speed, walking endurance, and balance in patients with incomplete spinal cord injury. Methods: Ten subjects were randomly assigned to the CLT group (n = 5) or the treadmill (TM) group (n = 5). The CLT group performed PNF pattern exercises using the motions of the sprinter and skater for 30 minutes, while the TM group exercised on a treadmill for 30 minutes. Both groups performed these therapeutic interventions five days per week for four weeks. A 10 meter walking test, the Berg Balance Scale (BBS), and a 6 meter walking test were used to assess gait speed, balance, and gait endurance. The SPSS Ver. 18.0 statistical program was used for data processing. A Wilcoxon signed rank test was used to compare pre- and post-intervention performance, and a Mann-Whitney test was used for comparison between the groups. The significance level for statistical testing was set at 0.05. Results: Both groups showed significant improvements in the 10 meter walking test, Berg Balance Scale, and 6 meter walking test (P < 0.05). Conclusion: CLT improved the walking speed, walking endurance, and balance of patients with incomplete spinal cord injury. Thus, we suggest that CLT is a therapeutic intervention for patients with incomplete spinal cord injury.
https://doi.org/10.21598/JKPNFA.2016.14.2.67

A Study of Suppliers' Participation in Private Exchanges: Focusing on MRO Markets (MRO 시장에서의 공급자의 전용마켓 참여에 관한 연구)

Lim, Seong-bae;Kim, Sung-Kwan;Mitchell, Robert B.;Hong, Soon-Goo
- The Journal of Society for e-Business Studies / v.9 no.4 / pp.37-51 / 2004
Many B2B electronic markets (EMs) are struggling to survive because they have failed to attract enough participants. Thus, reaching a critical mass of participants is one of the key success factors for various types of EMs. The main purpose of this study is to investigate the factors that lead MRO (maintenance, repair, and operating) suppliers to participate in private exchanges (PEs), the buy-side EM. This paper introduces the characteristics of the PE according to the classification schemes introduced in previous studies of EM types. The literature on suppliers' adoption of inter-organizational information systems is reviewed, focusing on EDI adoption issues. Data analysis based on incomplete contract theory and social exchange theory is then presented. The results of this study show that the number of suppliers and subsidy are factors that influence suppliers' participation in PEs. Nonsignificant results relating to trust imply that suppliers who are invited to participate in a PE do not expect their off-line relationships with the buyer to be transferred to the PE.

Postoperative Clipping Status after a Pterional versus Interhemispheric Approach for High-Positioned Anterior Communicating Artery Aneurysms

Kim, Myungsoo;Kim, Byoung-Joon;Son, Wonsoo;Park, Jaechan
- Journal of Korean Neurosurgical Society / v.64 no.4 / pp.524-533 / 2021
Objective: When treating high-positioned anterior communicating artery (ACoA) aneurysms, pterional-transsylvian and interhemispheric approaches are both viable options, yet comparative studies of these two surgical approaches are rare. Accordingly, this retrospective study investigated the surgical results of both approaches. Methods: Twenty-four patients underwent a pterional approach (n=11) or interhemispheric approach (n=13), including a unilateral low anterior interhemispheric approach or bifrontal interhemispheric approach, for high-positioned ACoA aneurysms with an aneurysm dome height >15 mm and aneurysm neck height >10 mm, both measured from the level of the anterior clinoid process. The clinical and radiological data were reviewed to investigate the surgical results and risk factors for incomplete clipping. Results: The pterional patient group showed a significantly higher incidence of incomplete clipping than the interhemispheric patient group (p=0.031). Four patients (36.4%) who underwent a pterional approach showed a postclipping aneurysm remnant, whereas all the patients who underwent an interhemispheric approach showed complete clipping. In one case, the aneurysm remnant was obliterated by coiling, while follow-up of the other three cases showed that the remnants remained limited to the aneurysm base. A multivariate analysis revealed that a pterional approach for a large aneurysm with a diameter >8 mm was a statistically significant risk factor for incomplete clipping. Conclusion: For high-positioned ACoA aneurysms with a dome height >15 mm and neck height >10 mm above the level of the anterior clinoid process, a large aneurysm with a diameter >8 mm can be clipped more completely via an interhemispheric approach than via a pterional approach.
https://doi.org/10.3340/jkns.2020.0215

A simple and efficient data loss recovery technique for SHM applications

Thadikemalla, Venkata Sainath Gupta;Gandhi, Abhay S.
- Smart Structures and Systems / v.20 no.1 / pp.35-42 / 2017
Recently, compressive sensing based data loss recovery techniques have become popular for Structural Health Monitoring (SHM) applications. These techniques involve an encoding process that is onerous for the sensor node because of the random sensing matrices used in compressive sensing. In this paper, we present a model in which the sampled raw acceleration data are transmitted directly to the base station/receiver without any encoding at the transmitter. The incomplete acceleration data received after data losses can be reconstructed faithfully using compressive sensing based reconstruction techniques. An in-depth simulated analysis shows how random losses and continuous losses affect the reconstruction of acceleration signals (obtained from a real bridge). Along with a performance analysis for different simulated data losses (from 10 to 50%), the advantages of performing interleaving before transmission are also presented.
https://doi.org/10.12989/sss.2017.20.1.035
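
The benefit of interleaving before transmission, mentioned in the abstract above, can be shown with a small sketch. The abstract does not specify which interleaver the authors use, so the block interleaver here (write row-wise into rows of length `depth`, read column-wise), the function names, and the toy signal are all assumptions. The point it illustrates is that a continuous (burst) loss in the transmitted stream becomes a set of scattered losses after de-interleaving, which resembles the random-loss case.

```python
def interleave_order(n, depth):
    """Index order produced by writing n samples row-wise into rows of
    length `depth` and reading them out column by column."""
    return [i for c in range(depth) for i in range(c, n, depth)]


def interleave(samples, depth):
    """Reorder samples for transmission."""
    return [samples[i] for i in interleave_order(len(samples), depth)]


def deinterleave(received, depth):
    """Restore the original sample order; lost samples stay as None."""
    restored = [None] * len(received)
    for k, i in enumerate(interleave_order(len(received), depth)):
        restored[i] = received[k]
    return restored


def drop_burst(stream, start, length):
    """Simulate a continuous loss of `length` samples beginning at `start`."""
    return [None if start <= k < start + length else x
            for k, x in enumerate(stream)]


if __name__ == "__main__":
    signal = list(range(12))  # hypothetical acceleration samples
    received = drop_burst(interleave(signal, depth=4), start=3, length=3)
    restored = deinterleave(received, depth=4)
    lost = [i for i, x in enumerate(restored) if x is None]
    print(lost)  # -> [1, 5, 9]: three consecutive losses are now 4 samples apart
```

Reconstructing the missing samples themselves would still require a compressive sensing solver, which is outside the scope of this sketch.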

Application of data mining and statistical measurement of agricultural high-quality development

Yan Zhou
- Advances in nano research / v.14 no.3 / pp.225-234 / 2023
In this study, we use big data resources and statistical analysis to derive reliable guidance for achieving high-quality, high-yield agricultural production. To this end, soil type data, rainfall and temperature data, and annual wheat production were collected for a specific region. Using statistical methods, the acquired data were cleaned to remove incomplete and defective records. We then applied several machine learning classification methods to distinguish the different factors and their influence on final crop yield. The efficacy of the machine learning methods is assessed by comparing the models' predictions with actual crop yields using the correlation coefficient and mean squared error. The results show that the machine learning methods predict crop yield with high accuracy. Moreover, the random forest (RF) classification approach gave the best results among the classification methods used in this study.
https://doi.org/10.12989/anr.2023.14.3.225
