• Title/Summary/Keyword: data preprocessing (데이터전처리)

Retransmission Number and ACK Path Setting Scheme for Efficient Data Transmission In Wireless Sensor Network (무선센서네트워크에서 에너지 효율적인 데이터 전송을 위한 재전송 횟수와 ACK 전송 경로 설정 기법)

  • Hwang, Boram;Shon, Minhan;Choo, Hyunseung
    • Proceedings of the Korea Information Processing Society Conference / 2011.11a / pp.671-672 / 2011
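  • In wireless sensor networks with unreliable and asymmetric wireless links, transmission schemes have been proposed that improve energy efficiency by avoiding the unnecessary data retransmissions that occur when an ACK message is lost even though the data itself was delivered successfully. One such scheme, Probability-based Data Forwarding (PDF), sets a threshold on the expected number of transmissions and stops retransmitting once the data has been retransmitted that many times, which reduces wasted energy. However, because PDF considers only the number of data transmissions, it still cannot guarantee reliable delivery of ACK messages. This paper therefore extends the energy-efficient PDF scheme by taking the asymmetric characteristics of wireless nodes into account: when the reverse link used to send the ACK message has low reliability, nodes with highly reliable reverse links are selected and the ACK message is forwarded to the sending node over multiple hops. This reduces unnecessary data transmissions, lowers energy waste, and extends the lifetime of the wireless sensor network.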
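
As a rough illustration of the ACK-path idea in this abstract, the sketch below compares the delivery probability of the direct reverse link with that of candidate multi-hop relay paths and picks the most reliable route. The function names, the 0.8 reliability threshold, and the assumption that per-hop reliabilities multiply independently are all hypothetical and are not taken from the paper.

```python
from math import prod


def path_reliability(link_reliabilities):
    """End-to-end delivery probability, assuming independent per-hop links."""
    return prod(link_reliabilities)


def choose_ack_path(direct_reliability, relay_paths, threshold=0.8):
    """Pick a route for the ACK message.

    direct_reliability: delivery probability of the one-hop reverse link.
    relay_paths: list of paths, each a list of per-hop reverse-link reliabilities.
    threshold: hypothetical cut-off below which the direct link counts as unreliable.

    Returns (route, probability, path), where route is "direct" or "relay".
    """
    direct = ("direct", direct_reliability, [direct_reliability])
    if direct_reliability >= threshold or not relay_paths:
        return direct
    best = max(relay_paths, key=path_reliability)
    best_p = path_reliability(best)
    # Fall back to the direct link if no relay path is actually better.
    if best_p <= direct_reliability:
        return direct
    return ("relay", best_p, best)
```

For example, `choose_ack_path(0.3, [[0.9, 0.9], [0.5, 0.99]])` selects the two-hop path `[0.9, 0.9]`, because its end-to-end probability is higher than both the direct link and the other candidate.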

A Study on Automation of Big Data Quality Diagnosis Using Machine Learning (머신러닝을 이용한 빅데이터 품질진단 자동화에 관한 연구)

  • Lee, Jin-Hyoung
    • The Journal of Bigdata / v.2 no.2 / pp.75-86 / 2017
  • In this study, I propose a method for automating the quality diagnosis of big data. Automation is needed because, as the Fourth Industrial Revolution becomes an issue, demand is growing for ever larger volumes of data to be generated and utilized, and data is growing rapidly. If diagnosing data quality takes a long time, the data may take a long time to become usable, or its quality may degrade. Decisions or predictions made from such low-quality data will in turn point in the wrong direction. To solve this problem, I have developed a machine learning model that automates diagnosis so that the quality of big data can be diagnosed and improved quickly. Machine learning is used to automate domain classification, which prevents errors that may occur during classification and reduces work time. Based on these results, I can contribute to improving data quality for big data utilization by continuing research on the importance of data conversion, learning methods for unlearned data, and the development of classification models for each domain.

KISTI-ML Platform: A Community-based Rapid AI Model Development Tool for Scientific Data (KISTI-ML 플랫폼: 과학기술 데이터를 위한 커뮤니티 기반 AI 모델 개발 도구)

  • Lee, Jeongcheol;Ahn, Sunil
    • Journal of Internet Computing and Services / v.20 no.6 / pp.73-84 / 2019
  • Machine learning as a service (MLaaS) has recently attracted much attention in almost all industries and research groups, mainly because building a productive service model requires no network servers, storage, or even data scientists, only the data itself. However, machine learning remains difficult for most developers, especially in the traditional sciences, where well-structured big data is scarce. Experimental and applied researchers rarely share their results with other researchers, so building big data in specific research areas is also a major challenge. In this paper, we introduce the KISTI-ML platform, a community-based rapid AI model development tool for scientific data. It offers a user-friendly online development environment in which machine learning beginners can use their own data to generate code automatically. Users can share datasets and their Jupyter interactive notebooks among authorized community members, including know-how such as data preprocessing for feature extraction, hidden network design, and other engineering techniques.

Pre-processing Method of Raw Data Based on Ontology for Machine Learning (머신러닝을 위한 온톨로지 기반의 Raw Data 전처리 기법)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo
    • Journal of the Korea Institute of Information and Communication Engineering / v.24 no.5 / pp.600-608 / 2020
  • Machine learning constructs an objective function from training data and predicts results for new data after checking the objective function against test data. In machine learning, input data is normalized during preprocessing. Numerical data is standardized using the mean and standard deviation of the input data. Nominal data, which is non-numerical, is converted into one-hot code form. However, this preprocessing alone cannot solve the problem. For this reason, we propose in this paper a method that uses ontology to normalize input data. The test data consists of received signal strength indicator (RSSI) values of Wi-Fi devices collected from a mobile device. Because these data include noise and heterogeneity problems, they are resolved through ontology.
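
The two baseline preprocessing steps named in this abstract, standardization of numerical data and one-hot encoding of nominal data, can be sketched as follows. This is a minimal illustration using only the Python standard library; the function names are hypothetical, and the ontology-based step proposed by the paper is not shown.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode nominal labels as one-hot vectors.

    Returns (categories, vectors); vector columns follow sorted category order.
    """
    categories = sorted(set(labels))
    index = {category: i for i, category in enumerate(categories)}
    vectors = [
        [1 if index[label] == j else 0 for j in range(len(categories))]
        for label in labels
    ]
    return categories, vectors
```

Applied to RSSI-like readings such as `[-70, -60, -50]`, `standardize` centers the values on zero, and `one_hot(["wifi", "lte", "wifi"])` yields one column per distinct label.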

Verification of the Suitability of Fine Dust and Air Quality Management Systems Based on Artificial Intelligence Evaluation Models

  • Heungsup Sim
    • Journal of the Korea Society of Computer and Information / v.29 no.8 / pp.165-170 / 2024
  • This study aims to verify the accuracy of the air quality management system in Yangju City using an artificial intelligence (AI) evaluation model. The consistency and reliability of fine dust data were assessed by comparing public data from the Ministry of Environment with data from Yangju City's air quality management system. To this end, we analyzed the completeness, uniqueness, validity, consistency, accuracy, and integrity of the data. Exploratory statistical analysis was employed to compare data consistency. The results of the AI-based data quality index evaluation revealed no statistically significant differences between the two datasets. Among AI-based algorithms, the random forest model demonstrated the highest predictive accuracy, with its performance evaluated through ROC curves and AUC. Notably, the random forest model was identified as a valuable tool for optimizing the air quality management system. This study confirms that the reliability and suitability of fine dust data can be effectively assessed using AI-based model performance evaluation, contributing to the advancement of air quality management strategies.

Fandom-Persona Design based on Social Network Analysis (소셜 네트워크 분석을 이용한 팬덤 페르소나 디자인)

  • Sul, Sanghun;Seong, Kihun
    • Journal of Internet Computing and Services / v.20 no.5 / pp.87-94 / 2019
  • This paper proposes a method for analyzing the unstructured consumer data accumulated on social networks in the era of the Fourth Industrial Revolution from the perspectives of service design and social psychology. First, the fandom phenomenon, which shows subjective and collective behavior in social network space rather than physical space, was defined from a data service perspective. The customer persona, which traditional service design analyzes at the personal level, was transformed into a collective-level fandom model, and social network analysis of consumers' big data was presented as an efficient way to pattern and visually analyze it. Consumer data collected through social listening were pre-processed column by column based on correlation, stability, missing values, and ID-ness. Based on these data, the company's brand strategy was divided into active and passive interventions, and the effect of this strategic attitude on the growth direction of the consumer fandom community was analyzed. To this end, the consumer fandom model was divided according to four brand strategies (stand-alone, decentralized, integrated, and centralized), and the shape of consumer fandom was proposed as a growth-model analysis technique that tracks changes over time.

An Algorithm for Stable Video Conference System (안정적인 화상회의 시스템을 위한 알고리즘)

  • Lee Moon-Ku
    • Journal of the Institute of Electronics Engineers of Korea CI / v.42 no.2 s.302 / pp.11-20 / 2005
  • In previous video conference systems, a conference with n participants requires bandwidth and memory on the order of n². This also increases traffic and, for voice data transmission, causes problems with who has the floor during a conference. In this paper, we propose a remote video conference algorithm that uses server-side video data buffering and a silence detection algorithm to resolve problems such as the heavy traffic caused by an increase in participants. The video data buffering algorithm does not broadcast from the server to the other clients; instead it uses two methods: a buffering method that receives compressed video data from clients, and an indexing method by which clients acquire the video data of other participants according to their bandwidth and network transmission speed. We apply a voice transmission algorithm and a channel management algorithm to the remote video conference system. The voice transmission algorithm uses silence detection, which does not send silent participants' voice data to the server. The channel management algorithm allocates the floor to participants who have priority. Given an average of 20 frames and 30 ms regardless of the number of participants, we conclude that the transmission of video and voice data is stable.
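
The abstract does not say how silence is detected. One common approach, assumed here purely for illustration, is an energy threshold on each audio frame: a frame whose root-mean-square amplitude falls below the threshold is treated as silence and is not sent to the server. The function names and the threshold of 500 (for 16-bit PCM samples) are hypothetical.

```python
from math import sqrt


def rms(frame):
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return sqrt(sum(sample * sample for sample in frame) / len(frame))


def is_silent(frame, threshold=500.0):
    """Treat a frame as silence when its RMS amplitude is below the threshold.

    The default threshold is a hypothetical value, not one from the paper.
    """
    return rms(frame) < threshold


def frames_to_send(frames, threshold=500.0):
    """Keep only frames that contain speech; silent frames are dropped."""
    return [frame for frame in frames if not is_silent(frame, threshold)]
```

Dropping silent frames at the client means that, in a conference where only a few participants speak at once, voice traffic to the server scales with the number of active speakers and not with the total number of participants.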

Study on data preprocessing methods for considering snow accumulation and snow melt in dam inflow prediction using machine learning & deep learning models (머신러닝&딥러닝 모델을 활용한 댐 일유입량 예측시 융적설을 고려하기 위한 데이터 전처리에 대한 방법 연구)

  • Jo, Youngsik;Jung, Kwansue
    • Journal of Korea Water Resources Association / v.57 no.1 / pp.35-44 / 2024
  • Research in dam inflow prediction has actively explored the utilization of data-driven machine learning and deep learning (ML&DL) tools across diverse domains. Enhancing not just the inherent model performance but also accounting for model characteristics and preprocessing data are crucial elements for precise dam inflow prediction. Particularly, existing rainfall data, derived from snowfall amounts through heating facilities, introduces distortions in the correlation between snow accumulation and rainfall, especially in dam basins influenced by snow accumulation, such as Soyang Dam. This study focuses on the preprocessing of rainfall data essential for the application of ML&DL models in predicting dam inflow in basins affected by snow accumulation. This is vital to address phenomena like reduced outflow during winter due to low snowfall and increased outflow during spring despite minimal or no rain, both of which are physical occurrences. Three machine learning models (SVM, RF, LGBM) and two deep learning models (LSTM, TCN) were built by combining rainfall and inflow series. With optimal hyperparameter tuning, the appropriate model was selected, resulting in a high level of predictive performance with NSE ranging from 0.842 to 0.894. Moreover, to generate rainfall correction data considering snow accumulation, a simulated snow accumulation algorithm was developed. Applying this correction to machine learning and deep learning models yielded NSE values ranging from 0.841 to 0.896, indicating a similarly high level of predictive performance compared to the pre-snow accumulation application. Notably, during the snow accumulation period, adjusting rainfall during the training phase was observed to lead to a more accurate simulation of observed inflow when predicted. This underscores the importance of thoughtful data preprocessing, taking into account physical factors such as snowfall and snowmelt, in constructing data models.
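
The NSE figures quoted in this abstract refer to the Nash-Sutcliffe efficiency, which compares the squared error of the simulated series with the variance of the observations: NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))². A value of 1 indicates a perfect match, and 0 means the model is no better than always predicting the observed mean. The sketch below is a plain-Python illustration; the function name is hypothetical and the code is not from the paper.

```python
def nash_sutcliffe_efficiency(observed, simulated):
    """Nash-Sutcliffe efficiency of a simulated series against observations."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the simulation.
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Spread of the observations around their own mean.
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined for a constant observed series")
    return 1.0 - error / spread
```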

A Study on the Clustering Method of Row and Multiplex Housing in Seoul Using K-Means Clustering Algorithm and Hedonic Model (K-Means Clustering 알고리즘과 헤도닉 모형을 활용한 서울시 연립·다세대 군집분류 방법에 관한 연구)

  • Kwon, Soonjae;Kim, Seonghyeon;Tak, Onsik;Jeong, Hyeonhee
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.95-118 / 2017
  • Recently, transactions in row housing and multiplex housing have become active, particularly in downtown areas, and platform services such as Zigbang and Dabang are growing. Row housing and multiplex housing nevertheless remain a blind spot for real estate information, which creates a social problem as changes in demand alter market size and deepen information asymmetry. In addition, the 5 or 25 districts used by the Seoul Metropolitan Government or the Korean Appraisal Board (hereafter, KAB) were drawn along administrative boundaries and have been used in existing real estate studies. Because they were zoned for urban planning, they are not a district classification designed for real estate research. Based on existing studies, this study found that Seoul's spatial structure needs to be redefined for estimating future housing prices. This study therefore attempted to classify areas without spatial heterogeneity by reflecting the property price characteristics of row housing and multiplex housing. In other words, simple division by existing administrative districts has been inefficient, so this study aims to cluster Seoul into new areas for more efficient real estate analysis. A hedonic model was applied to real transaction price data for row housing and multiplex housing, and the K-Means Clustering algorithm was used to cluster the spatial structure of Seoul. The data consisted of real transaction prices of row housing and multiplex housing in Seoul from January 2014 to December 2016 and the official land values of 2016, provided by the Ministry of Land, Infrastructure and Transport (hereafter, MOLIT). Data preprocessing followed these steps: removal of underground transactions, price standardization per area, and removal of real transaction cases above 5 and below -5. Preprocessing reduced the data from 132,707 cases to 126,759. The analysis was carried out in R. After preprocessing, the data model was constructed: first, K-Means Clustering was performed; then a regression analysis using the hedonic model and a cosine similarity analysis were conducted. Based on the constructed data model, we clustered Seoul on the basis of longitude and latitude and compared the result with the existing areas. The results indicated that the goodness of fit of the model was above 75% and that the variables used in the hedonic model were significant. In other words, the existing 5 or 25 administrative districts were re-divided into 16 districts. This study thus derived a clustering method for row housing and multiplex housing in Seoul, using the K-Means Clustering algorithm and the hedonic model, that reflects property price characteristics. The study also presents academic and practical implications, its limitations, and directions for future research. One academic implication is that clustering by property price characteristics improves on the areas used by the Seoul Metropolitan Government, KAB, and existing real estate research. Another is that, whereas existing real estate research focused mainly on apartments, this study proposes a method of classifying areas in Seoul using public Government 3.0 information (i.e., real data of MOLIT). One practical implication is that the results can serve as basic data for real estate research on row housing and multiplex housing. Another is that the study is expected to stimulate research on row housing and multiplex housing and to increase the accuracy of models of actual transactions. Future research should conduct various analyses to overcome the limitations of the threshold and pursue deeper investigation.
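
The clustering step on longitude and latitude can be illustrated with a small K-Means implementation (Lloyd's algorithm). The authors report using R; the Python sketch below is only an illustration, and the function names, the fixed random seed, and the sample coordinates are hypothetical.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [nearest(p, centroids) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical (longitude, latitude) pairs forming two well-separated groups.
sample_points = [
    (126.90, 37.50), (126.91, 37.51), (126.92, 37.50),
    (127.10, 37.60), (127.11, 37.61), (127.12, 37.60),
]
sample_centroids, sample_labels = kmeans(sample_points, k=2)
```

With 16 clusters and the full transaction data set, the same procedure would produce the kind of partition the abstract describes, although the result depends on initialization and on how price characteristics are combined with the coordinates, which the abstract does not specify.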