Search | Korea Science

A Performance Comparison of Machine Learning Library based on Apache Spark for Real-time Data Processing (실시간 데이터 처리를 위한 아파치 스파크 기반 기계 학습 라이브러리 성능 비교)

Song, Jun-Seok;Kim, Sang-Young;Song, Byung-Hoo;Kim, Kyung-Tae;Youn, Hee-Yong
- Proceedings of the Korean Society of Computer Information Conference
- /
- 2017.01a
- /
- pp.15-16
- /
- 2017
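With the arrival of the IoT era, large volumes of data are generated in real time, and interest is growing in distributed processing and machine learning as ways to process and use this data efficiently. Apache Spark is a distributed processing platform that supports RDD-based in-memory processing and integrates with a variety of machine learning libraries, and it has recently drawn attention as a next-generation big data analytics engine. This paper compares the data processing performance of MLlib, Apache Mahout, and SparkR, machine learning libraries that can be integrated with Apache Spark. To this end, we used the naive Bayes algorithm, a representative machine learning algorithm, and compared training time and prediction time to identify the machine learning library best suited to real-time data processing on Apache Spark.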
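
The comparison described above rests on two measurements, training time and prediction time, for a naive Bayes classifier. The following is a minimal sketch of that measurement using a small pure-Python multinomial naive Bayes. The class `TinyNaiveBayes`, the helper `timed`, and the toy data are all illustrative assumptions; none of them come from the paper, and the sketch does not use MLlib, Mahout, or SparkR.

```python
import math
import time
from collections import Counter, defaultdict


class TinyNaiveBayes:
    """Multinomial naive Bayes over token lists, with Laplace smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class
        self.token_counts = defaultdict(Counter)   # token frequencies per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Hypothetical toy data, not taken from the paper.
train_docs = [["sensor", "hot"], ["sensor", "cold"], ["alarm", "hot"], ["idle", "cold"]]
train_labels = ["alert", "normal", "alert", "normal"]

model, train_seconds = timed(TinyNaiveBayes().fit, train_docs, train_labels)
prediction, predict_seconds = timed(model.predict, ["alarm", "hot"])
```

Running the same `timed` wrapper around each library's training and prediction calls would give comparable figures, although a fair comparison would also need repeated runs and identical input data.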

Risk Factors of Unplanned Readmission to Intensive Care Unit (중환자실 환자의 비계획적 재입실 위험 요인)

Kim, Yu Jeong;Kim, Keum Soon
- Journal of Korean Clinical Nursing Research
- /
- v.19 no.2
- /
- pp.265-274
- /
- 2013
Purpose: The aim of this study was to determine the risk factors contributing to unplanned readmission to the intensive care unit (ICU) and to investigate a prediction model for unplanned readmission. Methods: We retrospectively reviewed the electronic medical records of 3,903 patients who had been discharged from ICUs in a university hospital in Seoul from January 2011 to April 2012. Results: The unplanned readmission rate was 4.8% (n=186). Nine variables differed significantly between the unplanned readmission and no readmission groups: age, clinical department, length of stay at 1st ICU, operation, use of a ventilator 24 hours a day, APACHE II score at ICU admission and at discharge, direct nursing care hours, and Glasgow Coma Scale total score at 1st ICU discharge. The clinical department, length of stay at 1st ICU, operation, and APACHE II score at ICU admission were the significant predictors of unplanned ICU readmission. The predictive model's area under the curve was .802 (p<.001). Conclusion: We identified the risk factors and the prediction model associated with unplanned ICU readmission. Better patient assessment tools and knowledge about risk factors could help reduce the unplanned ICU readmission rate and mortality.
https://doi.org/10.22650/JKCNR.2013.19.2.265

Message Latency-based Load Shedding Mechanism in Apache Kafka (아파치 카프카의 메시지 지연시간 기반 로드 쉐딩 메커니즘)

Kim, Hajin;Bang, Jiwon;Son, Siwoon;Choi, Mi-Jung;Moon, Yang-Sae
- Proceedings of the Korea Information Processing Society Conference
- /
- 2018.10a
- /
- pp.573-576
- /
- 2018
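Apache Kafka is a distributed message queuing platform that delivers data streams in real time. Kafka is used in most real-time processing applications, where it is typically placed between the source of a data stream and the real-time processing system (input) or between the real-time processing system and the destination of the processing results (output). Because it adopts distributed technology, Kafka can deliver large-volume data streams faster than other message queuing technologies. However, as the volume of data streams loaded into Kafka and the number of real-time processing applications increase, message latency inevitably becomes very high. To address this message latency problem, this paper proposes a load shedding engine for Kafka. Following the three essential decisions of load shedding, the proposed engine reduces latency by restricting the transmission of some messages at the Kafka producer when latency exceeds a threshold. In experiments with an actual real-time processing application, load shedding worked correctly for both single and multiple data streams, and latency repeatedly rose and fell rather than increasing continuously. This study is the first attempt to apply load shedding to real-time processing applications that manage data stream input and output with Kafka, and we believe it is meaningful work for future data stream processing.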
https://doi.org/10.3745/PKIPS.y2018m10a.573
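
The mechanism summarized above, in which the producer withholds some messages once latency passes a threshold, can be sketched as follows. This is an illustration under stated assumptions and not the authors' engine. The class `LatencyLoadShedder`, the `drop_ratio` parameter, and the `send` callback are hypothetical names; the abstract does not say how the messages to drop are chosen, so random dropping is assumed here. No Kafka client API is used.

```python
import random


class LatencyLoadShedder:
    """Drops a fraction of messages while observed latency exceeds a threshold."""

    def __init__(self, threshold_ms, drop_ratio, rng=None):
        self.threshold_ms = threshold_ms
        self.drop_ratio = drop_ratio          # assumed policy: drop this share at random
        self.rng = rng or random.Random(0)    # seeded so the sketch is repeatable
        self.latest_latency_ms = 0.0

    def observe(self, latency_ms):
        """Record the most recent measured message latency."""
        self.latest_latency_ms = latency_ms

    def should_send(self):
        """Always send below the threshold; above it, send with probability 1 - drop_ratio."""
        if self.latest_latency_ms <= self.threshold_ms:
            return True
        return self.rng.random() >= self.drop_ratio


def produce(messages, shedder, send):
    """Pass each message to `send` unless the shedder decides to drop it."""
    sent = 0
    for msg in messages:
        if shedder.should_send():
            send(msg)
            sent += 1
    return sent


outbox = []
shedder = LatencyLoadShedder(threshold_ms=100, drop_ratio=0.5)

shedder.observe(50)     # latency under the threshold: nothing is shed
sent_low = produce(range(100), shedder, outbox.append)

shedder.observe(250)    # latency over the threshold: about half is shed
sent_high = produce(range(100), shedder, outbox.append)
```

In a real deployment the `send` callback would wrap the producer's send call, and `observe` would be fed by whatever latency measurement the system exposes.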

PARAFAC Tensor Reconstruction for Recommender System based on Apache Spark (아파치 스파크에서의 PARAFAC 분해 기반 텐서 재구성을 이용한 추천 시스템)

Im, Eo-Jin;Yong, Hwan-Seung
- Journal of Korea Multimedia Society
- /
- v.22 no.4
- /
- pp.443-454
- /
- 2019
In recent years, there has been active research on recommender systems that consider three or more inputs in addition to users and goods, which turns the data into a multi-dimensional array, also known as a tensor. The main issue with using a tensor is that it has many missing values, making it sparse. To solve this, the tensor can be decomposed using a tensor decomposition algorithm into lower-dimensional arrays called factor matrices. The tensor is then reconstructed from the factor matrices so that the originally empty cells are filled with predicted values. This is called tensor reconstruction. In this paper, we propose a user-based Top-K recommender system using normalized PARAFAC tensor reconstruction. The method factorizes a tensor into factor matrices and then reconstructs the tensor from them. Before decomposition, the original tensor is normalized along each dimension to reduce overfitting. Using a real-world dataset, this paper demonstrates the processing of a large amount of data and implements a recommender system based on Apache Spark. In addition, this study confirms that recommendation performance improves through normalization of the tensor.
https://doi.org/10.9717/kmms.2019.22.4.443
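
The reconstruction step described above can be illustrated with a small sketch. For a three-way tensor, PARAFAC reconstruction computes each cell as a sum over rank components of the product of the matching factor-matrix entries. The sketch below assumes a user x item x context tensor and made-up rank-2 factors. The function names, the choice of third mode, and the data are assumptions for illustration; the paper's normalization step and its Spark implementation are not reproduced.

```python
def reconstruct(A, B, C):
    """Rebuild a 3-way tensor from PARAFAC factor matrices.

    X_hat[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r]
    """
    rank = len(A[0])
    return [
        [
            [sum(A[i][r] * B[j][r] * C[k][r] for r in range(rank)) for k in range(len(C))]
            for j in range(len(B))
        ]
        for i in range(len(A))
    ]


def top_k_items(X_hat, user, context, k, observed=()):
    """Rank items for one user in one context by predicted score, skipping observed items."""
    scores = [
        (X_hat[user][item][context], item)
        for item in range(len(X_hat[user]))
        if item not in observed
    ]
    # Highest score first; ties are broken by the lower item index.
    scores.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scores[:k]]


# Hypothetical rank-2 factors: 2 users x 3 items x 2 contexts.
A = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
C = [[1.0, 1.0], [0.5, 2.0]]

X_hat = reconstruct(A, B, C)
recommended = top_k_items(X_hat, user=0, context=0, k=2)
```

The nested loops are only suitable for tiny tensors; at the scale the abstract mentions, the same sum would be computed with a distributed or vectorized implementation.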

SaaS application mashup based on High Speed Message Processing

Chen, Zhiguo;Kim, Myoungjin;Cui, Yun
- KSII Transactions on Internet and Information Systems (TIIS)
- /
- v.16 no.5
- /
- pp.1446-1465
- /
- 2022
Diversified SaaS applications give users more choices according to their own preferences. However, the diversification of SaaS applications also makes it impossible for users to choose the best one. Furthermore, users cannot take advantage of functionality that spans SaaS applications. In this paper, we propose a platform that provides a SaaS mashup service by extracting interoperable service functions from SaaS-based applications deployed by independent vendors and by supporting a customized service recommendation function through log data binding in the cloud environment. The proposed SaaS mashup service platform consists of a SaaS aggregation framework and a log data binding framework. Each framework was implemented using Apache Kafka and rule matrix-based recommendation techniques. We present the theoretical basis for implementing the high-performance message-processing function using Kafka. The SaaS mashup service platform, which provides a new type of mashup service by linking SaaS functions based on the technology described above, allows users to combine the required service functions freely and to benefit from a rich service-utilization experience through the SaaS mashup function. The platform developed through this research on SaaS mashup service technology will enable a variety of flexible SaaS services and is expected to contribute to the development of the smart-contents industry and the open market.
https://doi.org/10.3837/tiis.2022.05.003

Network Traffic Measurement Analysis using Machine Learning

Hae-Duck Joshua Jeong
- Korean Journal of Artificial Intelligence
- /
- v.11 no.2
- /
- pp.19-27
- /
- 2023
In recent times, an exponential increase in Internet traffic has been observed as a result of the advancing development of the Internet of Things, mobile networks with sensors, and communication functions within various devices. Further, the COVID-19 pandemic has inevitably led to an explosion of social network traffic. Within this context, considerable attention has been drawn to research on network traffic analysis based on machine learning. In this paper, we design and develop a new machine learning framework for network traffic analysis that distinguishes normal from abnormal traffic. To achieve this, we combine well-known machine learning algorithms and network traffic analysis techniques. Using KDD CUP'99, one of the most widely used datasets, in the Weka and Apache Spark environments, we compare and investigate results obtained from time-series analysis of various aspects, including malicious code, feature extraction, data formalization, and network traffic measurement tool implementation. Experimental analysis showed that while both the logistic regression and the support vector machine algorithms performed well in the evaluation, the logistic regression algorithm performed better. The quantitative analysis results of our proposed machine learning framework show that this approach is reliable and practical, and the performance of the proposed system is compared with that reported in another paper. In addition, we determined that the framework developed in the Apache Spark environment exhibits a much faster processing speed than in Weka as more datasets are used to create and classify machine learning models.
https://doi.org/10.24225/kjai.2023.11.2.19

A study on data collection environment and analysis using virtual server hosting of Azure cloud platform (Azure 클라우드 플랫폼의 가상서버 호스팅을 이용한 데이터 수집환경 및 분석에 관한 연구)

Lee, Jaekyu;Cho, Inpyo;Lee, Sangyub
- Proceedings of the Korean Society of Computer Information Conference
- /
- 2020.07a
- /
- pp.329-330
- /
- 2020
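In this paper, we built a data collection environment using virtual server hosting on the Azure cloud platform and studied data analysis methods based on the automated machine learning (AutoML) that Azure provides. We installed LAMP (Linux, Apache, MySQL, PHP) in the virtual server hosting environment to build the data collection environment, and applied the collected data to Azure AutoML to perform automated machine learning. Azure AutoML is a process that automates the time-consuming, repetitive work of machine learning model development, saving time and resources when implementing machine learning solutions. In particular, when classifying, regressing on, and forecasting the collected data, AutoML ranks the machine learning models best suited to the data at hand based on the training score. As a result, code does not have to be designed from the earliest stage of developing the machine learning model needed for data analysis, and the model configuration and system can be designed before the full machine learning system is developed and implemented, so AutoML can be used very efficiently. In this paper, we studied the data collection environment needed for NPU (Neural Processing Unit) training, and we studied the selection of the most efficient algorithms for data classification, regression, and related tasks based on Azure AutoML.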

The Prognostic Role of B-type Natriuretic Peptide in Acute Exacerbation of Chronic Obstructive Pulmonary Disease (만성폐쇄성폐질환의 급성 악화시 예후 인자로서의 혈중 B-type Natriuretic Peptide의 역할)

Lee, Ji Hyun;Oh, So Yeon;Hwang, Iljun;Kim, Okjun;Kim, Hyun Kuk;Kim, Eun Kyung;Lee, Ji-Hyun
- Tuberculosis and Respiratory Diseases
- /
- v.56 no.6
- /
- pp.600-610
- /
- 2004
Background : The plasma B-type natriuretic peptide (BNP) concentration increases with the degree of pulmonary hypertension in patients with chronic respiratory disease. The aim of this study was to examine the prognostic role of BNP in the acute exacerbation of chronic obstructive pulmonary disease (COPD). Method : We selected 67 patients who were admitted to our hospital because of an acute exacerbation of COPD. Their BNP levels were checked on admission at the Emergency Department. Their medical records were analyzed retrospectively. The patients were divided into two groups according to their in-hospital mortality. The patients' medical history, comorbidity, exacerbation type, blood gas analysis, pulmonary function, APACHE II severity score and plasma BNP level were compared. Results : Multiple logistic regression analysis identified three independent predictors of mortality: $FEV_1$, APACHE II score and plasma BNP level. The decedents group showed a lower $FEV_1$ ($28{\pm}7$ vs. $37{\pm}15$%, p=0.005), a higher APACHE II score ($22.4{\pm}6.1$ vs. $15.8{\pm}4.7$, p=0.000) and a higher BNP level ($201{\pm}116$ vs. $77{\pm}80$ pg/mL, p=0.000) than the survivors group. When the BNP cut-off level was set to 88 pg/mL using the receiver operating characteristic curve, the sensitivity was 90% and the specificity was 75% in differentiating between the survivors and decedents. On Fisher's exact test, the odds ratio for mortality was 21.2 (95% CI 2.49 to 180.4) in the patients with a BNP level > 88 pg/mL. Conclusion : The plasma BNP level might be a predictor of mortality in an acute exacerbation of COPD, as are the $FEV_1$ and APACHE II score.
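
The sensitivity, specificity, and odds ratio quoted above all derive from a 2x2 table formed by applying a cut-off to a marker. The sketch below shows that arithmetic. The patient values are invented for illustration and are not the study's data; only the 88 pg/mL cut-off is taken from the abstract, and the function names are hypothetical.

```python
def confusion_counts(values, died, cutoff):
    """Count true/false positives and negatives, where 'positive' means value > cutoff."""
    tp = fn = fp = tn = 0
    for value, outcome in zip(values, died):
        positive = value > cutoff
        if outcome and positive:
            tp += 1
        elif outcome and not positive:
            fn += 1
        elif not outcome and positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def diagnostic_metrics(tp, fn, fp, tn):
    """Return (sensitivity, specificity, odds ratio) for a 2x2 table."""
    sensitivity = tp / (tp + fn)          # share of deaths flagged by the cut-off
    specificity = tn / (tn + fp)          # share of survivors not flagged
    odds_ratio = (tp * tn) / (fp * fn)    # undefined if fp or fn is zero
    return sensitivity, specificity, odds_ratio


# Made-up marker levels (pg/mL) and outcomes, NOT the study's data.
bnp = [120, 300, 95, 60, 40, 70, 90, 85, 30, 150]
died = [True, True, True, True, False, False, False, False, False, False]

tp, fn, fp, tn = confusion_counts(bnp, died, cutoff=88)
sensitivity, specificity, odds_ratio = diagnostic_metrics(tp, fn, fp, tn)
```

A confidence interval for the odds ratio, such as the one reported in the abstract, would need an additional step that this sketch omits.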

The Predictors of the Development of Acute Respiratory Distress Syndrome in the Patients with Acute Pancreatitis (급성 췌장염으로 내과계 중환자실에 입원한 환자들의 급성호흡곤란 증후군 발생에 연관된 인자에 관한 연구)

Yoo, Mi-Ran;Koh, Youn-Suck;Lim, Chae-Man;Lee, Moon-Gyu;Lee, Hong-Jae;Lee, Moo-Song;An, Jong-Jun;Lee, Sung-Koo;Kim, Myung-Hwan;Lee, Sang-Do;Kim, Woo-Sung;Kim, Dong-Soon;Kim, Won-Dong
- Tuberculosis and Respiratory Diseases
- /
- v.44 no.4
- /
- pp.861-870
- /
- 1997
Background : Though acute respiratory distress syndrome (ARDS) often occurs in the early stage of severe acute pancreatitis and contributes significantly to the mortality of the condition, the characteristics of the patients with acute pancreatitis who develop ARDS have not been fully identified. The objective of this investigation was to identify predictive factors that distinguish the group of patients with acute pancreatitis who would develop ARDS. Method : A retrospective analysis was done of 94 cases in 86 patients who were admitted to the Medical Intensive Care Unit with acute pancreatitis. ARDS developed in 13 of these cases (13.8%). The possible clinical factors related to its development were analyzed using univariate analysis and the $\chi^2$ test. Results : The risk of ARDS development was increased in the patients with abnormal chest X-ray findings at admission compared to the patients with a normal chest X-ray (p<0.05). The risk also increased with the severity index score on abdominal computed tomography at the time of admission (p<0.05). The higher the APACHE III score on the first day of admission, the greater the risk of ARDS development (p<0.01). Patients with a Murray's lung injury score of more than one point showed a higher risk of ARDS than patients with a score of 0 points. The patients with sepsis and the patients with more than three organ dysfunctions at admission had 3.5 times and 23.3 times higher risk of developing ARDS than the patients without sepsis and without organ failure, respectively (p<0.05, p<0.01). Conclusion : The risk of ARDS development would be higher in acute pancreatitis patients with an abnormal chest X-ray, a higher CT severity index, a higher APACHE III or Murray's lung injury score, accompanying sepsis, and more than three organ failures at admission.

The Early Prognosis of Burn Patients with Elevated Initial Arterial Carboxyhemoglobin Level (초기 동맥혈 Carboxyhemoglobin 농도가 높았던 화상 환자들의 예후지표에 관한 연구)

Choi, Chang Soon;Kim, Cheal Hong;Kim, Keun Sook;Lee, Tae-Yu;Chung, Youn Son;Eom, Kwang Seok;Park, Young Bum;Jang, Seung Hun;Kim, Dong Gyu;Park, Myung Jae;Lee, Myung Goo;Hyun, In-Gyu;Jung, Ki-Suck;Kim, Jong Hyun
- Tuberculosis and Respiratory Diseases
- /
- v.55 no.2
- /
- pp.188-197
- /
- 2003
Background : Smoke inhalation injury is an important determinant of mortality in burn patients. The early detection of inhalation injury in burn patients is important because the incidence of respiratory failure after inhalation injury is known to be high, with hypoxemia, pneumonia, and prolonged ventilatory support being commonplace. Acute carbon monoxide poisoning is one feature of smoke inhalation. The purpose of our study was to investigate the clinical characteristics of burn patients whose initial arterial carboxyhemoglobin (COHb) level was elevated, in order to assess the clinical impact of COHb on smoke inhalation injury. Methods : Among 1,416 burn patients admitted to our institution from August 1, 2001 to July 31, 2002, 39 patients whose initial arterial COHb level was more than 5% were included. We retrospectively compared a clinical scoring system for inhalation injury, percent total body surface area (%TBSA) burn, initial chest X-ray findings, APACHE II scores and SAPS II scores between survivors (n=27) and non-survivors (n=12). Results : COHb levels were $9.7{\pm}5.71$% and $10.3{\pm}8.81$% in survivors and non-survivors, respectively (p>0.05). Mean %TBSA burn in survivors and non-survivors was $16.6{\pm}17.8$% and $60.7{\pm}28.8$% (p<0.001). We did not find any difference in the clinical scoring system or initial chest X-ray findings between survivors and non-survivors. However, %TBSA burn, APACHE II and SAPS II scores were significantly higher in non-survivors than in survivors. Important factors associated with death were %TBSA burn, APACHE II scores, and SAPS II scores, and the most important factor in predicting mortality was %TBSA burn. Conclusion : Burn patients with an elevated initial arterial COHb level showed a poor prognosis, but further study is needed to determine the effect of COHb on prognosis in burn patients with accompanying smoke inhalation.

Search Result 355, Processing Time 0.026 seconds
