• Title/Summary/Keyword: 비정형데이터 (unstructured data)

A Machine Learning-Based Vocational Training Dropout Prediction Model Considering Structured and Unstructured Data (정형 데이터와 비정형 데이터를 동시에 고려하는 기계학습 기반의 직업훈련 중도탈락 예측 모형)

  • Ha, Manseok;Ahn, Hyunchul
    • The Journal of the Korea Contents Association / v.19 no.1 / pp.1-15 / 2019
  • One of the biggest difficulties in the vocational training field is the dropout problem. A large number of students drop out during training, which wastes the state budget and hampers improvement of the youth employment rate. Previous studies have mainly analyzed the causes of dropout. The purpose of this study is to propose a machine learning-based model that predicts dropout in advance using various information about learners. In particular, this study aimed to improve the accuracy of the prediction model by considering not only structured data but also unstructured data. Unstructured data were analyzed using Word2vec and a Convolutional Neural Network (CNN), which are among the most popular text analysis technologies. Applying the proposed model to actual data from a domestic vocational training institute improved prediction accuracy by up to 20%. In addition, the support vector machine-based prediction model using both structured and unstructured data achieved a prediction accuracy in the upper 90% range.
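The abstract above does not say how the structured and unstructured inputs were combined. As a minimal sketch under that caveat, the following shows one common option, early fusion by feature concatenation: a text is reduced to the mean of its word vectors and appended to the structured features. The function names, the toy vectors, and the averaging step are all assumptions for illustration; the paper's actual Word2vec, CNN, and SVM pipeline is not reproduced here.

```python
from typing import Dict, List


def average_embedding(tokens: List[str],
                      vectors: Dict[str, List[float]],
                      dim: int) -> List[float]:
    """Mean of the word vectors of known tokens; all zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def fuse_features(structured: List[float],
                  tokens: List[str],
                  vectors: Dict[str, List[float]],
                  dim: int) -> List[float]:
    """Early fusion: structured features followed by the text embedding."""
    return list(structured) + average_embedding(tokens, vectors, dim)


# Hypothetical 2-dimensional word vectors standing in for a trained model.
toy_vectors = {"late": [1.0, 0.0], "absent": [0.0, 1.0]}

# Two structured features plus a short tokenized note about a learner.
row = fuse_features([0.5, 3.0], ["late", "absent", "unknown"], toy_vectors, dim=2)
print(row)
```

The fused row can then be passed to any classifier that accepts a fixed-length numeric vector.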

A Study on Evaluation Index of the Panelizing Optimization for Architectural Freeform Surfaces (비정형 파라메트릭 건축부재형성 및 BIM 데이터 변환 프로세스 모델에 관한 연구)

  • Ryu, Jeong-Won
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.1 / pp.287-294 / 2017
  • BIM technology has been used in the domestic AEC field since the mid-2000s. BIM has proved its worth in cutting-edge buildings, mega-buildings, and freeform buildings in particular. Many freeform buildings could not be completed because of the low level of construction technique. However, many successful cases emerged after the adoption of digital technology, including BIM, which encouraged architects to attempt freeform designs. Modeling software that can generate freeform shapes is usually unable to produce the efficient BIM data types used in the AEC industry. This study presents a process model for generating parametric freeform construction members and converting them to BIM data, and demonstrates a prototype system.

Development of Machine Learning-based Construction Accident Prediction Model Using Structured and Unstructured Data of Construction Sites (건설현장 정형·비정형데이터를 활용한 기계학습 기반의 건설재해 예측 모델 개발)

  • Cho, Mingeon;Lee, Donghwan;Park, Jooyoung;Park, Seunghee
    • KSCE Journal of Civil and Environmental Engineering Research / v.42 no.1 / pp.127-134 / 2022
  • Recently, policies and research aimed at preventing the increase in construction accidents have been actively pursued in the domestic construction industry. In previous studies, the prediction models developed to prevent construction accidents mainly used only structured data, so the various characteristics of construction sites were not sufficiently considered. Therefore, in this study, we developed a machine learning-based construction accident prediction model that takes the characteristics of construction sites into account by using both structured data and text-type unstructured data. For machine learning, 6,826 cases of construction accident data were collected from the Construction Safety Management Integrated Information (CSI). The Decision forest algorithm and the BERT language model were used to train on the structured and unstructured data, respectively. Analysis using both types of data confirmed a prediction accuracy of 95.41%, an improvement of about 20% over using only structured data. In conclusion, using unstructured data together with structured data effectively improved the performance of the predictive model, and more accurate prediction can be expected to reduce construction accidents.

Reproduction of drought index using news big data analysis (뉴스 빅데이터 분석을 활용한 가뭄지수 재생산)

  • Jung, Jin Hong;Park, Dong Hyeok;Ahn, Jae Hyun
    • Proceedings of the Korea Water Resources Association Conference / 2020.06a / pp.386-386 / 2020
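  • Drought arises from the combined action of various hydrometeorological factors such as precipitation, evapotranspiration, air temperature, and soil moisture, so accurately analyzing a drought event is very difficult. Drought is also defined from different perspectives depending on which factor is emphasized. A drought defined in terms of meteorological elements, that is, a prolonged dry spell caused by precipitation below the average for a given period, is called a meteorological drought; one defined in terms of the moisture needed for crop growth is called an agricultural drought; and an overall shortage of water supply sources such as streamflow and dam storage is called a hydrological drought. Various drought indices have been developed to interpret these diverse drought characteristics quantitatively. However, the drought indices developed so far are all computed from structured data. Recently, the use of unstructured data has been increasing rapidly, for example to compute indices or to support disaster management. This study therefore computed a drought index from unstructured data (news data), analyzed its correlation with existing drought indices, and then combined indices to propose a new approach to drought event analysis. The spatial scope is the Boryeong area, which suffered the greatest damage among the areas affected by the 2014-2015 drought in northwestern Chungnam, and the temporal scope is 2013-2016. The unstructured data were built by crawling Naver News articles; for reliability, duplicate articles with identical URLs and articles that did not contain the words '보령' (Boryeong) and '가뭄' (drought) were removed. From the resulting data, monthly frequencies were calculated and converted to standard scores (Z-scores) to obtain the drought index. To determine which drought type (meteorological, agricultural, or hydrological) the index reflects, it was correlated with existing drought indices, and a new drought event analysis was carried out by combining it with the most highly correlated index. The drought event analysis in this study could serve as a basis for analyses using unstructured data not only for drought but also in various other disaster fields.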
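The index construction described in this abstract (deduplicate by URL, filter by keyword, count articles per month, convert counts to Z-scores) can be sketched as follows. This is an illustrative reading, not the authors' code: the record layout (`url`, `date`, `text`), the function name, the requirement that both keywords appear, and the use of the population standard deviation are all assumptions.

```python
from collections import Counter
from statistics import mean, pstdev


def news_drought_index(articles, keywords=("보령", "가뭄")):
    """Monthly article counts converted to Z-scores.

    Assumptions: each article is a dict with 'url', 'date' (YYYY-MM-DD)
    and 'text'; an article is kept only if it contains every keyword.
    """
    seen_urls = set()
    monthly = Counter()
    for article in articles:
        if article["url"] in seen_urls:
            continue  # drop duplicates that share a URL
        seen_urls.add(article["url"])
        if not all(k in article["text"] for k in keywords):
            continue  # drop articles missing a keyword
        monthly[article["date"][:7]] += 1  # key by YYYY-MM

    months = sorted(monthly)
    counts = [monthly[m] for m in months]
    if not counts:
        return {}
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return {m: 0.0 for m in months}
    return {m: (c - mu) / sigma for m, c in zip(months, counts)}


# Made-up sample records for illustration only.
sample = [
    {"url": "u1", "date": "2015-01-10", "text": "보령 가뭄 심화"},
    {"url": "u2", "date": "2015-02-01", "text": "보령 가뭄"},
    {"url": "u2", "date": "2015-02-01", "text": "보령 가뭄"},      # duplicate URL
    {"url": "u3", "date": "2015-02-03", "text": "보령 댐 가뭄"},
    {"url": "u4", "date": "2015-02-05", "text": "가뭄 보령"},
    {"url": "u5", "date": "2015-03-01", "text": "보령 가뭄 대책"},
    {"url": "u6", "date": "2015-03-02", "text": "보령 지역 가뭄"},
    {"url": "u7", "date": "2015-03-05", "text": "서울 가뭄"},      # lacks '보령'
]
index = news_drought_index(sample)
print(index)
```

Months with no surviving articles are simply absent from this sketch; a real analysis over a fixed 2013-2016 window would probably want them counted as zero before standardizing.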


Criminal Profiling Using Hierarchical Clustering of Unstructured Data (비정형 데이터의 계층적 군집화를 이용한 범죄 프로파일링)

  • Kim, YongHoon;Chung, Mokdong
    • Proceedings of the Korea Information Processing Society Conference / 2016.04a / pp.335-338 / 2016
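  • Digital information is now stored on many kinds of media and used in diverse ways. In particular, analyzing and using crime-related unstructured data can provide useful material for criminal investigations. However, existing analysis and use of crime statistics has been limited to approaches based on structured data. This paper therefore analyzes, stores, and processes unstructured data that has gone unprocessed in investigation records and converts it into structured data so that it can be used as investigative material, which is expected to aid criminal profiling.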

Analysis of drama viewership related words through unstructured data collection (비정형데이터 수집을 통한 드라마 시청률 연관어 분석)

  • Kang, Sun-Kyoung;Lee, Hyun-Chang;Shin, Seong-Yoon
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.8 / pp.1567-1574 / 2017
  • In this paper, we analyzed structured and unstructured data in order to analyze drama ratings. For the structured data, 19 items were collected from four areas: drama information, person information, broadcasting information, and audience rating information of each broadcasting company. Unstructured data were collected from bulletin boards, pre-broadcast blogs, and post-broadcast blogs operated by each broadcasting company using a crawling technique. Comparing the differences across the four areas for each broadcaster in the collected structured data showed that the results were similar to each other. We then derived seven related words by analyzing the correlation of occurrence frequencies in the unstructured data collected from the bulletin boards and blogs of each broadcasting company. The derived related words were obtained through reliability analysis.
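The abstract does not specify how the correlation of occurrence frequencies was computed. One plausible reading, sketched below under that assumption, is to count each word per document and rank words by the Pearson correlation of their counts with those of a target word. The function names, the whitespace tokenization, and the sample posts are hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def related_words(docs, target, top_n=7):
    """Rank words by how strongly their per-document counts track the target's."""
    counts = [Counter(doc.split()) for doc in docs]  # naive whitespace tokens
    vocab = set().union(*counts) - {target}
    target_freq = [c[target] for c in counts]
    scores = {w: pearson(target_freq, [c[w] for c in counts]) for w in vocab}
    # Highest correlation first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Made-up posts for illustration only.
posts = [
    "ratings actor actor ratings",
    "actor ratings",
    "weather",
    "weather weather news",
]
top = related_words(posts, "ratings", top_n=1)
print(top)
```

Korean text would need a morphological tokenizer rather than `split()`, which is used here only to keep the sketch self-contained.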

Analysis of the Unstructured Traffic Report from Traffic Broadcasting Network by Adapting the Text Mining Methodology (텍스트 마이닝을 적용한 한국교통방송제보 비정형데이터의 분석)

  • Roh, You Jin;Bae, Sang Hoon
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.17 no.3 / pp.87-97 / 2018
  • The traffic accident reports generated by the Traffic Broadcasting Network (TBN) are unstructured data. They nonetheless have value as a kind of real-time traffic information, since they are reported, with times and locations, from the viewpoint of drivers and pedestrians who were on the road rather than that of the offender or victim in the accident. However, these reports, which constitute big data, have not commonly been applied to traffic accident analysis or traffic-related research. By adopting a text-mining technique, this study provides a clue for using them to assess the impacts of traffic accidents. Seven years of traffic reports were analyzed. The analysis made it possible to identify road names, accident locations, and times, and to identify the factors with the greatest influence on other drivers following an accident. The authors plan to combine unstructured accident data with traffic reports in further study.

A Study on the Utilization of Flood Damage Map with Crowdsourcing Data (크라우드 소싱 데이터를 적용한 홍수 피해지도 활용방안 연구)

  • Lee, Jeongha;Hwang, SeokHwan
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.310-310 / 2022
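  • With recent advances in communications, diverse data are being produced on the Web in real time and are being used across many industries. Recently, social network service (SNS) data have also been used in disaster situations, and unstructured data uploaded by individuals, who act as sensors, rather than conventional numerical measurements, are being used in various kinds of disaster monitoring. In particular, when a natural disaster such as a flood occurs, the web data uploaded by individuals include information such as population movement over time and simple location information, making it possible to monitor the actual extent of damage faster and with more varied information. The hydrological data generally used during floods are observed mainly on large rivers where large-scale damage is expected, and the observation areas and data volume are limited, so research that also uses unstructured data is needed. This study therefore builds a web crawler that extracts unstructured data from the Web, compares the extracted data with rainfall events and spatial patterns, and proposes ways to use a flood damage map that incorporates crowdsourced data.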


Study on the Application Methods of Big Data at a Corporation -Cases of A and Y corporation Big Data System Projects- (기업의 빅데이터 적용방안 연구 -A사, Y사 빅데이터 시스템 적용 사례-)

  • Lee, Jae Sung;Hong, Sung Chan
    • Journal of Internet Computing and Services / v.15 no.1 / pp.103-112 / 2014
  • In recent years, the rapid diffusion of smart devices and the growth of internet usage and social media have led to the constant production of huge amounts of valuable data, including personal information, buying patterns, and location information. IT and production infrastructure has also started to produce its own data with the spread of M2M (Machine-to-Machine) and IoT (Internet of Things). This study examines the effects of applying structured and unstructured big data in various business circumstances and, through case studies, aims to identify how a corporation can create value from such data. The results suggest that corporations seeking an optimal big data utilization plan could maximize the value they create by using unstructured and structured big data generated both inside and outside the corporation.

Analysis of related words of drama viewership through SNS unstructured data crawling (SNS 비정형데이터 크롤링을 통한 드라마 시청률의 연관어 분석)

  • Kang, Sun-Kyoung;Lee, Hyun-Chang;Shin, Seong-Yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.05a / pp.169-170 / 2017
  • In this paper, we analyze structured and unstructured data to understand which factors affect drama ratings. For the structured data, 19 items were collected from four areas: drama information, person information, broadcasting information, and audience rating information of each broadcasting company. To collect unstructured data, crawling techniques were used to gather posts from bulletin boards, pre-broadcast blogs, and post-broadcast blogs for each drama. The collected data showed that the differences by broadcasting time, start time, genre, and day of broadcast were similar among broadcasting companies.
