• Title/Summary/Keyword: Big data processing

Management of Distributed Nodes for Big Data Analysis in Small-and-Medium Sized Hospital (중소병원에서의 빅데이터 분석을 위한 분산 노드 관리 방안)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.05a / pp.376-377 / 2016
  • Performance of Hadoop, a distributed data processing framework for big data analysis, is affected by characteristics of each node in the distributed cluster, such as processing power and network bandwidth. This paper analyzes previous approaches to heterogeneous Hadoop clusters and presents several requirements for distributed node clustering in small- and medium-sized hospitals, taking the hospitals' computing environments into account.

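Editor's note: the abstract above attributes Hadoop performance variation to per-node processing power and network bandwidth. As a purely hypothetical illustration of weighting work by node capability, and not the clustering scheme the paper itself proposes, a minimal Python sketch might look like this (the node specs and scoring formula are invented):

```python
# Hypothetical sketch: distribute HDFS-style data blocks across heterogeneous
# nodes in proportion to a simple capability score (CPU cores x network bandwidth).
# Node specs and the scoring formula are illustrative only.

def capability_score(cores: int, bandwidth_gbps: float) -> float:
    """Combine processing power and network bandwidth into one score."""
    return cores * bandwidth_gbps

def assign_blocks(nodes: dict[str, tuple[int, float]], total_blocks: int) -> dict[str, int]:
    scores = {name: capability_score(c, bw) for name, (c, bw) in nodes.items()}
    total = sum(scores.values())
    # Proportional allocation, rounded down; leftover blocks go to the strongest nodes.
    alloc = {name: int(total_blocks * s / total) for name, s in scores.items()}
    leftover = total_blocks - sum(alloc.values())
    for name in sorted(scores, key=scores.get, reverse=True)[:leftover]:
        alloc[name] += 1
    return alloc

# Example: a small, heterogeneous cluster such as a hospital might operate.
cluster = {"node-a": (8, 10.0), "node-b": (4, 1.0), "node-c": (2, 1.0)}
print(assign_blocks(cluster, total_blocks=100))
```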

KOREAN TOPIC MODELING USING MATRIX DECOMPOSITION

  • June-Ho Lee;Hyun-Min Kim
    • East Asian mathematical journal / v.40 no.3 / pp.307-318 / 2024
  • This paper explores the application of matrix factorization, specifically CUR decomposition, in the clustering of Korean language documents by topic. It addresses the unique challenges of Natural Language Processing (NLP) in dealing with the Korean language's distinctive features, such as agglutinative words and morphological ambiguity. The study compares the effectiveness of Latent Semantic Analysis (LSA) using CUR decomposition with the classical Singular Value Decomposition (SVD) method in the context of Korean text. Experiments are conducted using Korean Wikipedia documents and newspaper data, providing insight into the accuracy and efficiency of these techniques. The findings demonstrate the potential of CUR decomposition to improve the accuracy of document clustering in Korean, offering a valuable approach to text mining and information retrieval in agglutinative languages.
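As a companion to the abstract above, the following minimal Python sketch contrasts rank-k truncated SVD with a norm-sampled CUR approximation on a random toy term-document matrix. The toy data and the sampling scheme are assumptions for illustration; they are not the authors' Korean Wikipedia pipeline.

```python
import numpy as np
from numpy.linalg import pinv, svd

rng = np.random.default_rng(0)
# Toy term-document matrix (rows: terms, columns: documents); illustrative only.
A = rng.random((100, 30))
k = 5

# Classical rank-k truncated SVD (LSA)
U, s, Vt = svd(A, full_matrices=False)
A_svd = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Simple CUR sketch: sample columns/rows with probability proportional to their
# squared norms, then compute the linking matrix U = C^+ A R^+.
col_p = (A ** 2).sum(axis=0) / (A ** 2).sum()
row_p = (A ** 2).sum(axis=1) / (A ** 2).sum()
cols = rng.choice(A.shape[1], size=k, replace=False, p=col_p)
rows = rng.choice(A.shape[0], size=k, replace=False, p=row_p)
C, R = A[:, cols], A[rows, :]
A_cur = C @ (pinv(C) @ A @ pinv(R)) @ R

print("SVD reconstruction error:", np.linalg.norm(A - A_svd))
print("CUR reconstruction error:", np.linalg.norm(A - A_cur))
```

A practical appeal of CUR is that the factors C and R are actual columns and rows of the term-document matrix, so they remain interpretable in terms of real documents and terms, which matters when clustering morphologically rich Korean text.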

Usefulness of RHadoop in Case of Healthcare Big Data Analysis (RHadoop을 이용한 보건의료 빅데이터 분석의 유효성)

  • Ryu, Wooseok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.10a / pp.115-117 / 2017
  • R has become a popular analytics platform as it provides powerful analytic functions as well as visualizations. However, its scalability is limited. As an alternative, the RHadoop package facilitates distributed processing of R programs on the Hadoop platform. This paper investigates the usefulness of the RHadoop package for analyzing healthcare big data that is openly available on the internet. To do this, the paper compares the analytic performance of R and RHadoop using the 2015 medical treatment records provided by the National Health Insurance Service. The results show that RHadoop effectively enhances the processing performance of healthcare big data compared with R.

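RHadoop's rmr2 package expresses R analyses as map and reduce functions executed over Hadoop. Purely to illustrate that map/shuffle/reduce pattern (in Python, not the paper's R code), here is a self-contained sketch that aggregates hypothetical treatment records by disease code; the record layout is an assumption, not the NHIS schema:

```python
# Minimal, local simulation of the map -> shuffle -> reduce pattern that
# RHadoop (via rmr2 / Hadoop Streaming) runs in a distributed fashion.
from collections import defaultdict

# Hypothetical records: (patient_id, disease_code, treatment_cost)
records = [
    ("p001", "J20", 12000),
    ("p002", "J20", 15000),
    ("p003", "E11", 34000),
]

def map_fn(record):
    """Emit (key, value) pairs: one count and cost per disease code."""
    _, disease_code, cost = record
    yield disease_code, (1, cost)

def reduce_fn(key, values):
    """Aggregate visit counts and total cost per disease code."""
    visits = sum(v[0] for v in values)
    total_cost = sum(v[1] for v in values)
    return key, {"visits": visits, "total_cost": total_cost}

# Shuffle phase: group mapped values by key.
groups = defaultdict(list)
for rec in records:
    for k, v in map_fn(rec):
        groups[k].append(v)

print(dict(reduce_fn(k, vs) for k, vs in groups.items()))
```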

Design of the Intelligent LBS Service : Using Big Data Distributed Processing System (빅데이터 분산처리 시스템을 활용한 지능형 LBS서비스의 설계)

  • Mun, Chang-Bae;Park, Hyun-Seok
    • The Journal of the Korea Contents Association / v.19 no.2 / pp.159-169 / 2019
  • Today, location-based services (LBS) are developing globally along with the advance of smartphones and IoT devices. The main purpose of this research is to provide users with the most efficient route information by analyzing big data on people who have traveled a variety of routes. The system gives users the feeling of receiving direct guidance from someone who has frequently used the route, because the system server analyzes people's route information in real time on top of a distributed processing system built on map information. In the future, the system can be developed further in combination with various LBS services, providing users with more precise and safer route information.

Automated Machine Learning-Based Solar PV Forecasting Considering Solar Position Information (태양 위치 정보를 고려한 AutoML 기반의 태양광 발전량 예측)

  • Jinyeong Oh;Dayeong So;Byeongcheon Lee;Jihoon Moon
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.322-323 / 2023
  • Solar photovoltaic (PV) generation, a sustainable energy source, is one of the most widely used renewable energy sources worldwide, and research on accurately forecasting PV output for efficient operation of PV systems has recently been very active. Building a PV output forecasting model requires not only weather and atmospheric variables but also irradiance information determined by the sun's position; however, few studies have used the sun's real-time position as an input variable. This paper therefore proposes computing the solar elevation and azimuth in real time from the timestamp and the location of the PV plant and using them as input variables. Various AutoML-based machine learning models were built to forecast PV output, and their performance was compared. The experiments confirm that including solar position information substantially improves forecasting performance compared with using environmental variables alone; for the Extra Trees model, adding solar position information reduced the MAE (mean absolute error) from 33.90 to 22.38.
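To make the proposed input features concrete, a minimal Python sketch for computing solar elevation and azimuth from the day of year, local solar time, and latitude is shown below. It uses standard textbook approximations (Cooper's declination formula, no equation-of-time or refraction correction) and illustrative coordinates; the paper's exact computation may differ.

```python
import math

def solar_position(day_of_year: int, solar_hour: float, latitude_deg: float):
    """Approximate solar elevation and azimuth in degrees.

    day_of_year: 1-365; solar_hour: local solar time in hours (0-24).
    Textbook approximations; no equation-of-time or refraction correction.
    """
    # Solar declination (Cooper's approximation)
    decl = math.radians(23.45) * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))
    lat = math.radians(latitude_deg)
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))  # negative before solar noon

    # Elevation angle
    sin_elev = (math.sin(lat) * math.sin(decl)
                + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    elev = math.asin(sin_elev)

    # Azimuth measured clockwise from north
    cos_az = (math.sin(decl) - sin_elev * math.sin(lat)) / (math.cos(elev) * math.cos(lat))
    az = math.degrees(math.acos(max(-1.0, min(1.0, cos_az))))
    if hour_angle > 0:  # afternoon: sun is west of due south
        az = 360.0 - az
    return math.degrees(elev), az

# Example: shortly after solar noon in mid-June at latitude 36.5 degrees (illustrative).
print(solar_position(day_of_year=170, solar_hour=12.5, latitude_deg=36.5))
```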

Personalized Exercise Routine Recommendation System for Individuals with Intellectual Disabilities (지적 장애인을 위한 개인화 운동 루틴 추천 시스템)

  • Jimin Lee;Dayeong So;Yerim Jeon;Eunjin (Jinny) Jo;Jihoon Moon
    • Proceedings of the Korea Information Processing Society Conference / 2023.05a / pp.366-367 / 2023
  • People with intellectual disabilities have few opportunities to learn exercises suited to their own body structure because of their restricted range of activity, and they require careful attention when exercising according to their individual health condition and body structure. This paper proposes a personalized exercise routine recommendation system for people with intellectual disabilities, aimed at raising awareness of the need for obesity management and increasing physical activity. To build the proposed system, we first analyzed data provided by the Korea Paralympic Committee, including health status, body information, and disability type and grade. When a user enters their information on the website, the system computes a TF-IDF vector, analyzes its cosine similarity with other users, and recommends an exercise routine. The proposed recommendation system is expected to improve awareness of personalized health care, help guarantee the right to health, and increase exercise efficiency for people with intellectual disabilities.
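A minimal scikit-learn sketch of the described TF-IDF plus cosine-similarity matching is shown below. The user profiles, routines, and textual features are fabricated stand-ins, not the Korea Paralympic Committee data used in the study.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user profiles described as text (health status, body info, disability grade)
# and the routine used by each existing user. Purely illustrative data.
existing_profiles = [
    "obesity grade2 limited-mobility upper-body",
    "normal-weight grade1 good-mobility full-body",
    "overweight grade3 wheelchair upper-body",
]
routines = ["seated band stretching", "walking and light aerobics", "wheelchair arm ergometer"]

new_profile = "obesity grade3 wheelchair upper-body"

vectorizer = TfidfVectorizer()
profile_matrix = vectorizer.fit_transform(existing_profiles)  # TF-IDF of known users
new_vec = vectorizer.transform([new_profile])                 # TF-IDF of the new user

# Cosine similarity against every existing user; recommend the best match's routine.
sims = cosine_similarity(new_vec, profile_matrix).ravel()
best = sims.argmax()
print(f"most similar user: {best}, similarity: {sims[best]:.2f}, routine: {routines[best]}")
```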

A Network Intrusion Security Detection Method Using BiLSTM-CNN in Big Data Environment

  • Hong Wang
    • Journal of Information Processing Systems / v.19 no.5 / pp.688-701 / 2023
  • Conventional network intrusion detection system (NIDS) methods cannot effectively measure the trend of intrusion-detection targets, which leads to low detection accuracy. In this study, an NIDS method based on a deep neural network in a big data environment is proposed. First, the overall framework of the NIDS model is constructed in two stages, with feature reduction and anomaly probability output at the core of the two stages. A convolutional neural network, comprising a down-sampling layer and a feature extractor built from a convolution layer, is then used, and the correlation of inputs is captured by introducing a bidirectional long short-term memory (BiLSTM). Finally, a pooling layer is added after the convolution layer to sample the required features according to different sampling rules, which improves the overall performance of the NIDS model. The proposed NIDS method is compared with three other methods in simulation experiments on two databases. The results demonstrate that the proposed model is superior to the other three NIDS methods on both databases in terms of precision, accuracy, F1-score, and recall, which are 91.64%, 93.35%, 92.25%, and 91.87%, respectively. The proposed algorithm is significant for improving the accuracy of NIDS.
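A minimal Keras sketch of a CNN-BiLSTM stack of the kind the abstract describes (a convolution layer for local features, a pooling layer for down-sampling, a bidirectional LSTM for sequence correlations, and a sigmoid anomaly-probability output) is shown below. The layer sizes, sequence length, and feature count are assumptions rather than the paper's reported configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_bilstm_cnn(seq_len: int, n_features: int) -> tf.keras.Model:
    inputs = layers.Input(shape=(seq_len, n_features))
    # Convolution layer as local feature extractor
    x = layers.Conv1D(64, kernel_size=3, padding="same", activation="relu")(inputs)
    # Pooling (down-sampling) layer
    x = layers.MaxPooling1D(pool_size=2)(x)
    # Bidirectional LSTM to capture correlations across the sequence
    x = layers.Bidirectional(layers.LSTM(64))(x)
    x = layers.Dropout(0.3)(x)
    # Anomaly probability output
    outputs = layers.Dense(1, activation="sigmoid")(x)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# Hypothetical dimensions: sequences of 100 steps with 41 features per step.
model = build_bilstm_cnn(seq_len=100, n_features=41)
model.summary()
```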

Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics (빅데이터 기반의 정성 정보를 활용한 부도 예측 모형 구축)

  • Jo, Nam-ok;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.22 no.2 / pp.33-56 / 2016
  • Many researchers have focused on developing bankruptcy prediction models using modeling techniques such as statistical methods, including multiple discriminant analysis (MDA) and logit analysis, or artificial intelligence techniques, such as artificial neural networks (ANN), decision trees, and support vector machines (SVM), to secure enhanced performance. Most of the bankruptcy prediction models in academic studies have used financial ratios as their main input variables. The bankruptcy of firms is associated with a firm's financial state and the external economic situation. However, the inclusion of qualitative information, such as the economic atmosphere, has not been actively discussed despite the fact that exploiting only financial ratios has some drawbacks. Accounting information, such as financial ratios, is based on past data, and it is usually determined one year before bankruptcy. Thus, a time lag exists between the point of closing financial statements and the point of credit evaluation. In addition, financial ratios do not contain environmental factors, such as external economic situations. Therefore, using only financial ratios may be insufficient for constructing a bankruptcy prediction model, because they essentially reflect past corporate internal accounting information while neglecting recent information. Thus, qualitative information must be added to the conventional bankruptcy prediction model to supplement accounting information. Due to the lack of an analytic mechanism for obtaining and processing qualitative information from various information sources, previous studies have made little use of qualitative information. Recently, however, big data analytics, such as text mining techniques, have been drawing much attention in academia and industry, with an increasing amount of unstructured text data available on the web. A few previous studies have sought to adopt big data analytics in business prediction modeling. Nevertheless, the use of qualitative information on the web for business prediction modeling is still deemed to be at an early stage, restricted to limited applications such as stock prediction and movie revenue prediction. Thus, it is necessary to apply big data analytics techniques, such as text mining, to various business prediction problems, including credit risk evaluation. Analytic methods are required for processing qualitative information represented in unstructured text form due to the complexity of managing and processing unstructured text data. This study proposes a bankruptcy prediction model for Korean small- and medium-sized construction firms using both quantitative information, such as financial ratios, and qualitative information acquired from economic news articles. The performance of the proposed method depends on how well qualitative information is transformed into quantitative information suitable for incorporation into the bankruptcy prediction model. We employ big data analytics techniques, especially text mining, as a mechanism for processing qualitative information. The sentiment index is provided at the industry level by extracting it from a large amount of text data, quantifying the external economic atmosphere represented in the media. The proposed method involves keyword-based sentiment analysis using a domain-specific sentiment lexicon to extract sentiment from economic news articles.
    The generated sentiment lexicon is designed to represent sentiment for the construction business by considering the relationship between an occurring term and the actual economic condition of the industry, rather than the inherent semantics of the term. The experimental results show that incorporating qualitative information based on big data analytics into the traditional bankruptcy prediction model based on accounting information is effective for enhancing predictive performance. The sentiment variable extracted from economic news articles had an impact on corporate bankruptcy prediction. In particular, a negative sentiment variable improved the accuracy of corporate bankruptcy prediction because the corporate bankruptcy of construction firms is sensitive to poor economic conditions. The bankruptcy prediction model using qualitative information based on big data analytics contributes to the field in that it reflects not only relatively recent information but also environmental factors, such as external economic conditions.
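To illustrate the keyword-based, lexicon-driven sentiment scoring described above, here is a toy Python sketch that averages per-article lexicon scores into an industry-level index. The lexicon entries and example articles are fabricated; the study builds its own construction-specific Korean lexicon from news data rather than using generic word sentiment.

```python
import re

# Fabricated domain lexicon: term -> sentiment weight (positive = favorable conditions).
lexicon = {
    "recovery": 1.0, "growth": 1.0, "boom": 1.5,
    "slump": -1.5, "default": -2.0, "downturn": -1.0,
}

def article_sentiment(text: str) -> float:
    """Sum lexicon weights over the tokens of one news article."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(tok, 0.0) for tok in tokens)

def industry_sentiment_index(articles: list[str]) -> float:
    """Average article sentiment as a simple industry-level index."""
    return sum(article_sentiment(a) for a in articles) / len(articles)

news = [
    "Construction orders show signs of recovery despite the broader downturn.",
    "Another mid-sized builder nears default as the housing slump deepens.",
]
print(round(industry_sentiment_index(news), 2))
```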

Big Data based Epidemic Investigation Support System using Mobile Network Data (이동통신 데이터를 활용한 빅데이터 기반 역학조사지원 시스템)

  • Lee, Min-woo;Kim, Ye-ji;Yi, Jae-jin;Moon, Kyu-hwan;Hwang, SeonBae;Jun, Yong-joo;Hahm, Yu-Kun
    • The Journal of Bigdata / v.5 no.2 / pp.187-199 / 2020
  • The World Health Organization declared COVID-19 a pandemic on March 11, 2020. South Korea recorded 27,000 confirmed cases, and more than 50 million cases were confirmed worldwide. The spread of COVID-19 made epidemiological investigation important once again. However, the large number of confirmed cases in Daegu and Gyeongbuk exposed the limitations of existing epidemiological investigation methods. The Korea Disease Control and Prevention Agency developed the Epidemiological Investigation Support System (EISS), which applies smart city data hub technology, and used it in epidemiological investigations. As part of EISS, the proposed system is a big data based epidemiological investigation support system that processes mobile carriers' network data. It handles tasks that were impossible for existing staff, such as processing abnormal values in mobile carrier data and creating hotspot regions where two or more people were in contact with an infected person. As a result, our system processes outliers in mobile network data in about 30 seconds and computes hotspots in about 10 minutes. As the first application of a big data system to epidemiological investigation, it demonstrates the practical usability of big data systems in this field.

A Study on Unstructured Text Data Post-processing Methodology Using a Stopword Thesaurus (불용어 시소러스를 이용한 비정형 텍스트 데이터 후처리 방법론에 관한 연구)

  • Won-Jo Lee
    • The Journal of the Convergence on Culture Technology / v.9 no.6 / pp.935-940 / 2023
  • Most text data collected through web scraping for artificial intelligence and big data analysis is large and unstructured, so a refinement process is required before analysis. The data becomes structured, analyzable data through a heuristic pre-processing refinement step and a machine-driven post-processing refinement step. In this study, the post-processing step uses the Korean dictionary and the stopword dictionary to extract vocabulary for frequency analysis and word cloud analysis. In this process, a "user-defined stopword thesaurus" is applied to efficiently remove stopwords that the stopword dictionary misses. We propose a methodology for applying this thesaurus and, through a case analysis with R's word cloud technique, examine the pros and cons of the proposed refinement method, which complements the shortcomings of the existing "stopword dictionary" approach. We present a comparative verification and discuss the effectiveness of applying the proposed methodology in practice.
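As a rough illustration of the described post-processing step (in Python rather than the author's R word cloud workflow), the sketch below removes a base stopword list plus a user-defined stopword thesaurus that groups variant forms, then counts term frequencies. All word lists are illustrative; the study itself works on Korean text.

```python
from collections import Counter
import re

# Base stopword dictionary plus a user-defined "stopword thesaurus": each entry maps a
# representative stopword to variant forms that should also be removed.
base_stopwords = {"the", "and", "of", "to", "is"}
stopword_thesaurus = {
    "etc": {"etc", "etcetera", "et", "cetera"},
    "figure": {"figure", "fig", "figs"},
}
expanded_stopwords = base_stopwords | set().union(*stopword_thesaurus.values())

def term_frequencies(text: str) -> Counter:
    """Tokenize, drop stopwords (base + thesaurus-expanded), and count terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in expanded_stopwords)

scraped = "Fig 1 and Fig 2 show the growth of unstructured text data, etc."
print(term_frequencies(scraped).most_common(5))
```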