• Title/Summary/Keyword: 빅데이터 기법 (big data techniques)

Development of a Gangwon Province Forest Fire Prediction Model using Machine Learning and Sampling (머신러닝과 샘플링을 이용한 강원도 지역 산불발생예측모형 개발)

  • Chae, Kyoung-jae;Lee, Yu-Ri;Cho, Yong-ju;Park, Ji-Hyun
    • The Journal of Bigdata / v.3 no.2 / pp.71-78 / 2018
  • This study applies machine learning techniques to improve the accuracy of forest fire prediction models. It uses 14 years of data (2003 to 2016) from Gangwon-do, the province where forest fires were most frequent. To reduce errors in the weather data, Gangwon-do was divided into nine areas and the weather data from each area was used. However, splitting the prediction model into nine zones creates a large imbalance between fire-occurrence days and non-occurrence days, and such imbalance can degrade model performance. To address this, several sampling methods were applied. To further improve accuracy, five indices from the Canadian Forest Fire Weather Index (FWI) system were used as derived variables. Logistic regression was used as the statistical method, and random forest and XGBoost as the machine learning methods. The final model for each zone was selected by considering accuracy, sensitivity, and specificity. Across the nine zones, the models correctly predicted 80 of the 104 fires that occurred and 7,426 of the 9,758 non-fire cases, for an overall accuracy of 76.1%.
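The summary figures above can be checked by treating "fire" as the positive class. A minimal Python sketch (the function name is ours, for illustration only) that derives sensitivity, specificity, and overall accuracy from the reported counts:

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of actual fires that were predicted as fires
    specificity = tn / (tn + fp)  # share of non-fire cases predicted as non-fire
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all cases predicted correctly
    return sensitivity, specificity, accuracy


# Counts from the abstract: 80 of 104 fires and 7,426 of 9,758 non-fires.
sens, spec, acc = classification_metrics(tp=80, fn=104 - 80, tn=7426, fp=9758 - 7426)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
# prints: sensitivity=0.769 specificity=0.761 accuracy=0.761
```

Because fire days are rare, overall accuracy is dominated by the non-fire cases; reporting sensitivity separately shows how many of the 104 fires were actually caught.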

SAAnnot-C3Pap: Ground Truth Collection Technique of Playing Posture Using Semi Automatic Annotation Method (SAAnnot-C3Pap: 반자동 주석화 방법을 적용한 연주 자세의 그라운드 트루스 수집 기법)

  • Park, So-Hyun;Kim, Seo-Yeon;Park, Young-Ho
    • KIPS Transactions on Software and Data Engineering / v.11 no.10 / pp.409-418 / 2022
  • In this paper, we propose SAAnnot-C3Pap, a semi-automatic annotation method for obtaining ground truth for a player's posture. In the music domain, ground truth for two-dimensional joint positions has previously been obtained either with OpenPose, a 2D pose estimation method, or by manual labeling. Automatic annotation with OpenPose is fast but can produce inaccurate results, whereas manual labeling is labor-intensive. SAAnnot-C3Pap is a semi-automatic compromise between the two. The proposed approach consists of three main steps: extracting postures with OpenPose, correcting the erroneous parts of the extracted postures with Supervisely, and then analyzing and synchronizing the OpenPose and Supervisely results. With the proposed method, incorrect 2D joint positions detected by OpenPose could be corrected, the problem of detecting two or more people could be resolved, and ground truth for playing postures could be obtained. In the experiment, we compare and analyze the results of the automatic annotation method OpenPose and the proposed SAAnnot-C3Pap. The comparison showed that the proposed method improved posture information that OpenPose had collected incorrectly.
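The correction and synchronization steps can be pictured as a frame-by-frame merge of automatic and manual keypoints. This is a minimal Python sketch under our own assumptions, not the authors' implementation: keypoints are (x, y) tuples indexed by joint, frames are aligned by a shared frame index, the player is taken to be the detected person with the largest keypoint bounding box (one hypothetical way to handle multiple detected people), and any manually corrected joint overrides the OpenPose value. All function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Keypoint = Optional[Tuple[float, float]]  # None = joint not detected
Person = List[Keypoint]                   # keypoints indexed by joint id


def bbox_area(person: Person) -> float:
    """Area of the bounding box around a person's detected keypoints."""
    pts = [p for p in person if p is not None]
    if not pts:
        return 0.0
    xs = [x for x, _ in pts]
    ys = [y for _, y in pts]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def select_player(people: List[Person]) -> Person:
    """Assumption: the player is the largest detected person in the frame."""
    return max(people, key=bbox_area) if people else []


def merge_frame(auto_people: List[Person], corrections: Dict[int, Tuple[float, float]]) -> Person:
    """Replace automatic keypoints with manually corrected ones where available."""
    player = select_player(auto_people)
    return [corrections.get(joint, kp) for joint, kp in enumerate(player)]


def merge_sequence(
    auto_frames: Dict[int, List[Person]],
    corrected_frames: Dict[int, Dict[int, Tuple[float, float]]],
) -> Dict[int, Person]:
    """Synchronize both sources by frame index and merge each frame."""
    return {
        frame: merge_frame(people, corrected_frames.get(frame, {}))
        for frame, people in auto_frames.items()
    }
```

Keeping the OpenPose output as the base and overriding only corrected joints means annotators touch just the frames and joints that were wrong, which is the point of a semi-automatic workflow.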

A Study of Influencing Factors on World Handball Win-Loss using the Decision Tree Analysis (의사결정나무 분석을 통한 세계핸드볼 승패결정요인 분석)

  • Kim, Hyunchul
    • Journal of Digital Convergence / v.19 no.5 / pp.461-468 / 2021
  • The purpose of this study is to identify, from the official records of the 2019 Men's and Women's Handball World Championships, the key shooting variables that determine whether a team wins or loses. Records of 192 games played by men's and women's national teams from 24 countries were collected, differences in competition records between the winning and losing groups were tested, and the data were then analyzed with the decision tree method, a data mining technique. The analysis showed that the 9m shooting success rate and the Near shooting success rate were the most important factors for both men and women. Men won 83.3% of games when the 9m shooting success rate was 32.5% or higher and the Near shooting success rate was 67.5%, and women won 75% of games when the 9m shooting success rate was 75% or higher and the Near shooting success rate was 51%. For women, yellow cards were also an important variable in determining victory or defeat. In conclusion, the shooting factors that decide winning and losing were identified for both men and women, but follow-up studies are needed that consider the relationships between various record variables and performance in handball.
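A decision tree like the one described chooses, for each variable, the threshold that best separates wins from losses. The sketch below shows that split search on a single variable using Gini impurity (the CART criterion); the abstract does not state which split criterion the study used, and the toy values are invented for illustration only.

```python
from typing import List, Optional, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary labels (1 = win, 0 = loss)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_threshold(values: List[float], labels: List[int]) -> Tuple[Optional[float], float]:
    """Return (threshold, weighted impurity) of the best binary split 'value >= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_impurity, best_thr = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if impurity < best_impurity:
            best_impurity, best_thr = impurity, thr
    return best_thr, best_impurity


# Toy shooting success rates (%) and outcomes; not data from the study.
thr, imp = best_threshold([10, 20, 30, 50, 60, 70], [0, 0, 0, 1, 1, 1])
print(thr, imp)  # prints: 40.0 0.0
```

Applied recursively to each resulting subset and across all candidate variables, this search yields rules of the form reported above, such as a 9m shooting success rate of 32.5% or higher.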

Research on Training and Implementation of Deep Learning Models for Web Page Analysis (웹페이지 분석을 위한 딥러닝 모델 학습과 구현에 관한 연구)

  • Jung Hwan Kim;Jae Won Cho;Jin San Kim;Han Jin Lee
    • The Journal of the Convergence on Culture Technology / v.10 no.2 / pp.517-524 / 2024
  • This study aims to train and implement a deep learning model that combines website creation with artificial intelligence, in what has been called the AI revolution following the launch of the ChatGPT service. The model was trained on 3,000 collected web page images, processed according to a component and layout classification system. The work was divided into three stages. First, prior research on AI models was reviewed to select the most appropriate algorithm for the intended model. Second, suitable web page and paragraph images were collected, categorized, and processed. Third, the deep learning model was trained, and a serving interface was integrated to verify the model's actual outputs. The implemented model will be used to detect multiple paragraphs on a web page, analyze the number of lines, elements, and features in each paragraph, and derive meaningful data based on the classification system. This process is expected to evolve to enable more precise analysis of web pages. Furthermore, the development of precise analysis techniques is expected to lay the groundwork for research into AI that can automatically generate complete web pages.

Design of Secure Scheme based on Bio-information Optimized for Car-sharing Cloud (카 쉐어링 클라우드 환경에서 최적화된 바이오 정보 기반 보안 기법 설계)

  • Lee, Kwang-Hyoung;Park, Sang-Hyeon
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.11 / pp.469-478 / 2019
  • Car-sharing services have become established as a new form of public transportation, driven by greater convenience, growing awareness of practical consumption patterns, rising environmental consciousness, and the spread of smartphones following the economic crisis. As the market has grown, many people have started using these services, but security remains an issue: because an ID and password are required to log in when renting and controlling a vehicle, damage from compromised credentials is to be expected. The protocol proposed in this study uses biometric information to provide an optimized service with convenient yet strong authentication across multiple service-provider clouds, which register users' car big data through brokers. With the proposed techniques, exposure of biometric information can be reduced, and users can receive services from multiple service-provider clouds through a single broker. In addition, compared with existing car-sharing platforms, the proposed protocol reduces public key operations and session key storage on mobile devices by 20%, and because its convenient but strong authentication establishes a secure channel, communication can proceed securely. The techniques proposed in this study are expected to enhance secure communication and user convenience in future car-sharing cloud environments.
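The general idea of authenticating with a secret derived from biometric data, without ever transmitting the raw biometric, can be illustrated with a generic HMAC challenge-response exchange. This is not the protocol proposed in the paper, whose details the abstract does not give. It also assumes, unrealistically, that the biometric template is reproduced bit-for-bit on every reading; real systems need a fuzzy extractor or a similar error-tolerant scheme. Function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_key(template: bytes, salt: bytes) -> bytes:
    """Derive a 32-byte key from a biometric template.
    Assumption: the template is bit-for-bit stable (real readings are noisy)."""
    return hashlib.pbkdf2_hmac("sha256", template, salt, 100_000)


def make_challenge() -> bytes:
    """Verifier side (e.g. a broker): a fresh random nonce per login attempt."""
    return secrets.token_bytes(16)


def respond(key: bytes, challenge: bytes) -> bytes:
    """Device side: prove knowledge of the key without sending it or the template."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def verify(key: bytes, challenge: bytes, response: bytes) -> bool:
    """Verifier side: recompute the expected response and compare in constant time."""
    return hmac.compare_digest(respond(key, challenge), response)


salt = secrets.token_bytes(16)
enrolled_key = derive_key(b"example-template-bytes", salt)  # stored at enrollment
challenge = make_challenge()
print(verify(enrolled_key, challenge, respond(enrolled_key, challenge)))  # prints: True
```

Only the derived key is stored by the verifier, so a leak exposes a revocable key rather than the biometric itself; the fresh nonce prevents a captured response from being replayed.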

A Trend Analysis of Agricultural and Food Marketing Studies Using Text-mining Technique (텍스트마이닝 기법을 이용한 국내 농식품유통 연구동향 분석)

  • Yoo, Li-Na;Hwang, Su-Chul
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.10 / pp.215-226 / 2017
  • This study analyzed trends in agricultural and food marketing research from 1984 to 2015 using text mining. Text mining is a branch of big data analysis and an effective tool for objectively processing large amounts of information through categorization and trend analysis. Frequency analysis, topic analysis, and association rule analysis were conducted on the titles of agricultural and food marketing studies published in four journals and in research reports. The results showed that a total of 1,126 papers related to agricultural and food marketing could be categorized into six subjects. Research trends changed significantly around 2000: studies before the 2000s focused on farm-level and wholesale-level marketing, while studies after the 2000s mainly covered consumption, (processed) food, and exports and imports. Local food and school meals are new subjects that are increasingly being studied. Agricultural supply and demand was the only subject investigated in policy research, and interest in it faded after the 2000s. Many studies after the 2010s analyzed consumption, primarily consumption trends and consumer behavior.
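Association rule analysis on keyword data reduces to counting how often keywords co-occur. A minimal Python sketch, treating each study title as a set of keywords and scoring pairwise rules by support, confidence, and lift; the function name, the restriction to keyword pairs, and the sample keywords are illustrative assumptions, not the study's setup:

```python
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def pair_rules(docs: List[List[str]], min_support: float) -> List[Tuple[str, str, float, float, float]]:
    """Return (antecedent, consequent, support, confidence, lift) for keyword pairs."""
    n = len(docs)
    sets = [set(d) for d in docs]  # count each keyword once per title
    item_counts = Counter(w for s in sets for w in s)
    pair_counts = Counter(p for s in sets for p in combinations(sorted(s), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of titles containing both keywords
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]        # P(y | x)
            lift = confidence / (item_counts[y] / n)   # > 1 means positive association
            rules.append((x, y, support, confidence, lift))
    return rules


titles = [
    ["local food", "consumption"],
    ["local food", "consumption", "school meals"],
    ["wholesale", "marketing"],
    ["consumption"],
]
for rule in pair_rules(titles, min_support=0.5):
    print(rule)
```

With a support threshold of 0.5, only the "local food" and "consumption" pair survives; every title containing "local food" also contains "consumption", so that direction has confidence 1.0.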

Analysis of the Research Trends by Environmental Spatial-Information Using Text-Mining Technology (텍스트 마이닝 기법을 활용한 환경공간정보 연구 동향 분석)

  • OH, Kwan-Young;LEE, Moung-Jin;PARK, Bo-Young;LEE, Jung-Ho;YOON, Jung-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.20 no.1 / pp.113-126 / 2017
  • This study aimed to quantitatively analyze trends in environmental research that uses environmental geospatial information through text mining, one of the big data analysis technologies. A total of 869 papers published in the Republic of Korea were collected from the National Digital Science Library (NDSL) and analyzed. Following a classification scheme, keywords extracted from the papers were recategorized into 10 environmental fields, including "general environment", "climate", and "air quality", and 20 environmental geospatial information fields, including "satellite image", "digital map", and "disaster". Using the recategorized keywords, their frequencies and changes over time in the collected papers were analyzed, along with association rules between keywords. First, frequency analysis showed that "general environment" (40.85%) and "satellite image" (24.87%) had the highest frequencies among the environmental fields and the environmental geospatial information fields, respectively. Second, time series analysis of the environmental fields showed that the share of "climate" was high between 1996 and 2000, but that the share of "general environment" has increased since 2001. Among the environmental geospatial information fields, demand for "satellite image" was highest throughout the analysis period, and its share of use has gradually increased. Third, a total of 80 association rules were generated between environmental fields and environmental geospatial information fields. Among the environmental fields, "general environment" generated the largest number of association rules (17) with environmental geospatial information fields such as "satellite image" and "digital map".
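The time series analysis described above amounts to computing, for each period, each field's share of all keyword occurrences in that period. A minimal Python sketch with made-up records; the period labels and the function name are hypothetical:

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def shares_by_period(records: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """records: (period, field) pairs, one per recategorized keyword occurrence.
    Returns each field's share of all occurrences within its period."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for period, field in records:
        counts[period][field] += 1
    return {
        period: {field: c / sum(cnt.values()) for field, c in cnt.items()}
        for period, cnt in counts.items()
    }


records = [
    ("1996-2000", "climate"),
    ("1996-2000", "climate"),
    ("1996-2000", "general environment"),
    ("2001-2005", "general environment"),
]
print(shares_by_period(records))
```

Using shares rather than raw counts keeps periods comparable even though the number of published papers grows over time.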

Location Inference of Twitter Users using Timeline Data (타임라인데이터를 이용한 트위터 사용자의 거주 지역 유추방법)

  • Kang, Ae Tti;Kang, Young Ok
    • Spatial Information Research / v.23 no.2 / pp.69-81 / 2015
  • If the residential areas of SNS users can be inferred by analyzing SNS big data, this can offer an alternative for spatial big data research, which suffers from location sparsity and ecological fallacy. In this study, we developed a method that uses daily life activity patterns, identified from the timeline data of Twitter users, to infer their residential areas. We identified these patterns from users' movement patterns and from the region-referencing words users write in their tweets. The models based on movement and on text are named the daily movement pattern model and the daily activity field model, respectively, and we selected the variables to be used in each model. The dependent variable was coded 0 if the area from which a user mainly tweets is their home location (HL), and 1 otherwise. In a discriminant analysis, the hit ratios of the two models were 67.5% and 57.5%, respectively. We then tested both models on the timeline data of stress-related tweets. As a result, we inferred the residential areas of 5,301 of 48,235 users and obtained 9,606 stress-related tweets with a residential area, about 44 times the number of geo-tagged tweets. We believe this methodology can be used not only to secure more location data in SNS big data research but also to link SNS big data with regional statistics to analyze regional phenomena.
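The two-group discriminant analysis and its hit ratio can be sketched with Fisher's linear discriminant. This minimal NumPy version assumes equal priors (the cut-off is the midpoint of the projected class means), numeric predictor variables, and labels coded 0 (tweets mainly from the home location) and 1 otherwise, following the coding described above; it is not necessarily the exact discriminant procedure used in the study, and the toy data are invented.

```python
import numpy as np


def fit_fisher_lda(X: np.ndarray, y: np.ndarray):
    """Two-class Fisher discriminant: returns (weights, threshold)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    d0, d1 = X0 - m0, X1 - m1
    Sw = d0.T @ d0 + d1.T @ d1
    w = np.linalg.solve(Sw, m1 - m0)  # direction maximizing class separation
    threshold = float(w @ ((m0 + m1) / 2))  # equal-prior cut-off
    return w, threshold


def predict(X: np.ndarray, w: np.ndarray, threshold: float) -> np.ndarray:
    return (X @ w > threshold).astype(int)


def hit_ratio(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Share of cases classified into their actual group."""
    return float(np.mean(y_true == y_pred))


X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
w, thr = fit_fisher_lda(X, y)
print(hit_ratio(y, predict(X, w, thr)))
```

In practice the hit ratio should be computed on held-out users, since in-sample hit ratios are optimistic.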

Experiment and Implementation of a Machine-Learning Based k-Value Prediction Scheme in a k-Anonymity Algorithm (k-익명화 알고리즘에서 기계학습 기반의 k값 예측 기법 실험 및 구현)

  • Muh, Kumbayoni Lalu;Jang, Sung-Bong
    • KIPS Transactions on Computer and Communication Systems / v.9 no.1 / pp.9-16 / 2020
  • The k-anonymity scheme has been widely used to protect private information when big data is distributed to third parties for research purposes. When the scheme is applied, determining the optimal k value is difficult because many factors must be considered. Currently, k is determined almost manually by human experts relying on intuition, which can degrade anonymization performance and costs experts considerable time and money. To overcome this problem, a simple machine-learning-based approach has been proposed, and this paper describes the implementation of and experiments with that approach. In this work, a deep neural network (DNN) is implemented using the TensorFlow library and is trained and tested on an input dataset. The experimental results show that the training error follows the pattern typical of DNN training, whereas the validation error follows a pattern different from that of a typical training process. The advantage of the proposed approach is that it can reduce the time and cost for experts to determine the k value, because the determination can be done semi-automatically.
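Whatever method is used to choose k, a released table can be checked by grouping records on their quasi-identifiers: the smallest group size is the k the table actually achieves. A minimal Python sketch; the column names and records are hypothetical, and this only verifies k rather than predicting it as the paper's DNN does:

```python
from collections import Counter
from typing import Dict, List, Sequence


def achieved_k(rows: List[Dict[str, str]], quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest equivalence class over the quasi-identifier columns."""
    classes = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(classes.values()) if classes else 0


def satisfies_k_anonymity(rows: List[Dict[str, str]], quasi_identifiers: Sequence[str], k: int) -> bool:
    return achieved_k(rows, quasi_identifiers) >= k


# Hypothetical generalized records: age bands and truncated postal codes.
rows = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "cold"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "asthma"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]
print(achieved_k(rows, ["age", "zip"]))  # prints: 2
```

A larger k gives stronger protection but requires coarser generalization, which reduces the data's usefulness; that trade-off is why choosing k is hard.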

Concrete Crack Detection Inside Finishing Materials Using Lock-in Thermography (위상 잠금 열화상 기법을 이용한 콘크리트 마감재 내부 균열 검출)

  • Myung-Hun Lee;Ukyong Woo;Hajin Choi;Jong-Chan Kim
    • Journal of the Korea institute for structural maintenance and inspection / v.27 no.6 / pp.30-38 / 2023
  • As the number of old buildings subject to safety inspection increases, so does the burden on the designated institutions and management entities responsible for safety management. Appropriate safety inspection standards and technology are therefore essential when selecting buildings for inspection. Under the current safety inspection standards for old buildings, low scores are given when damage such as cracks in structural members cannot be confirmed because it is hidden by finishing materials. As a result, evaluations are underestimated regardless of the actual safety of the structure, increasing the number of aging buildings subject to safety inspection. This study therefore proposes thermal imaging, a non-destructive and non-contact inspection technique, to detect cracks beneath finishing materials. Concrete specimens were produced so that cracks under the finishing material could be observed with a thermal imaging camera, and thermal image data were measured while applying heat excitation to the concrete surface and the cracked area. The measurements confirmed that cracks 0.3 mm, 0.5 mm, and 0.7 mm wide beneath the finishing material could be observed, but the cracks were difficult to identify because peeling of the surface and of the wallpaper caused an uneven temperature distribution. When the amplitude and phase difference of the thermal image data were derived and analyzed, the 0.5 mm and 0.7 mm cracks could be measured clearly. Building on this study, we hope to improve the efficiency of field application and analysis by developing big-data-based deep learning techniques for diagnosing internal crack damage beneath finishing materials.
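The amplitude and phase analysis can be illustrated with single-frequency (lock-in) demodulation: each pixel's temperature history is correlated with a complex reference at the heating modulation frequency. A minimal NumPy sketch, assuming a sinusoidally modulated heat source, a frame stack whose first axis is time, and a recording that spans a whole number of modulation periods; the function name and parameters are illustrative and not taken from the paper:

```python
import numpy as np


def lockin_amplitude_phase(frames, fs: float, f_lock: float):
    """Per-pixel amplitude and phase at the modulation frequency f_lock (Hz).
    frames: array of shape (T, ...) sampled at fs frames per second."""
    frames = np.asarray(frames, dtype=float)
    n = frames.shape[0]
    t = np.arange(n) / fs
    # Complex reference, reshaped to broadcast over the pixel axes.
    ref = np.exp(-2j * np.pi * f_lock * t).reshape((n,) + (1,) * (frames.ndim - 1))
    ac = frames - frames.mean(axis=0)  # remove each pixel's mean temperature
    z = 2.0 * np.mean(ac * ref, axis=0)  # = A * exp(i * phi) for A*cos(2*pi*f*t + phi)
    return np.abs(z), np.angle(z)


# Synthetic single-pixel signal: 20 degC offset, 2 degC amplitude, 0.3 rad phase.
fs, f_lock = 100.0, 0.5
t = np.arange(2000) / fs  # 20 s = 10 full modulation periods
signal = 20 + 2 * np.cos(2 * np.pi * f_lock * t + 0.3)
amp, phase = lockin_amplitude_phase(signal, fs, f_lock)
print(round(float(amp), 3), round(float(phase), 3))  # prints: 2.0 0.3
```

A phase difference between a pixel over a crack and one over sound concrete is then simply the difference of their phase values. Phase images are commonly cited as less sensitive than raw temperature to uneven heating and surface emissivity, which is consistent with the clearer results reported above.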