• Title/Summary/Keyword: data analytics


Collision Cause-Providing Ratio Prediction Model Using Natural Language Processing Analytics (자연어 처리 기법을 활용한 충돌사고 원인 제공 비율 예측 모델 개발)

  • Ik-Hyun Youn;Hyeinn Park;Chang-Hee Lee
    • Journal of the Korean Society of Marine Environment & Safety / v.30 no.1 / pp.82-88 / 2024
  • As the modern maritime industry advances rapidly through technological change, data processing technology has become a key driver of that development. Natural language processing is a technology that enables machines to understand and process human language. Using this methodology, we aim to develop a model that analyzes the rulings of the Marine Safety Tribunal, learns the cause-providing ratios of previously adjudicated ship collisions, and predicts those ratios when a new written judgment is entered. The model calculated the cause-providing ratios of an accident from the navigation rules applied at the time of the accident and the weights of the keywords that affect the ratios. On this basis, the accuracy of the developed model was analyzed and its practical applicability reviewed; the model could be used to help prevent the recurrence of collisions and to resolve disputes between parties involved in marine accidents.

Mining Intellectual History Using Unstructured Data Analytics to Classify Thoughts for Digital Humanities (디지털 인문학에서 비정형 데이터 분석을 이용한 사조 분류 방법)

  • Seo, Hansol;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.141-166 / 2018
  • Information technology improves the efficiency of humanities research. In humanities research, information technology can be used to analyze a given topic or document automatically, facilitate connections to other ideas, and increase our understanding of intellectual history. We suggest a method to identify and automatically analyze the relationships between arguments contained in unstructured data collected from humanities writings such as books, papers, and articles. Our method, which we call history mining, reveals relationships of influence between arguments and the philosophers who present them. We utilize several classification algorithms, including a deep learning method. To verify the performance of the proposed methodology, we selected philosophers associated with empiricism and rationalism as the sample and collected related writings or articles accessible on the internet. The performance of each classification algorithm was measured by Recall, Precision, F-Score, and Elapsed Time. DNN, Random Forest, and Ensemble performed better than the other algorithms. Using the selected classification algorithm, we classified the writings of specific philosophers as rationalist or empiricist and generated a history map that takes each philosopher's years of activity into account.
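
The Recall, Precision, and F-Score measures named in this abstract can be illustrated with a minimal sketch. The labels and predictions below are hypothetical and are not data from the paper; the function name `precision_recall_f1` is likewise an assumption made for illustration, and the F-Score shown is the common F1 variant.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute precision, recall, and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


# Hypothetical gold labels and classifier output for five documents.
y_true = ["rationalism", "empiricism", "rationalism", "empiricism", "rationalism"]
y_pred = ["rationalism", "rationalism", "rationalism", "empiricism", "empiricism"]

p, r, f = precision_recall_f1(y_true, y_pred, positive="rationalism")
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

Treating one school of thought as the positive class keeps the sketch short; a two-class comparison like the one described would typically report the measures for each class in turn.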

Medical Characteristics of the Elderly Pedestrian Inpatient in Traffic Accident (노인 보행자 운수사고 입원환자의 의료적 특성연구)

  • Park, Hye-Seon;Kim, Sang-Mi
    • Journal of Digital Convergence / v.17 no.12 / pp.345-352 / 2019
  • This study analyzes the factors affecting the length of stay of elderly pedestrian inpatients injured in traffic accidents. We used Korean National Hospital Discharge In-depth Injury data on patients discharged from 2012 to 2016. Statistically significant factors affecting the length of stay were admission route, Charlson Comorbidity Index (CCI), injured body part, operation, treatment result, hospital region, and number of hospital beds. The length of stay was shorter when the admission route was the outpatient department rather than the emergency room, when the result was not improved or death rather than improved, and when the bed size was 500-999 beds or over 1000 beds rather than 100-299 beds. In contrast, the length of stay was longer when the CCI score was 1-2 or over 3 rather than 0, when the injured part was a part other than the head/neck, when an operation was performed, and when the hospital was located in a province or metropolitan city rather than Seoul. This study seeks to understand the medical characteristics of these inpatients in order to help prevent pedestrian traffic accidents as the population ages. We hope the findings can serve as basic data for establishing policies that manage traffic safety and medical resources effectively in light of the characteristics of elderly people.

The Study of Developing Korean SentiWordNet for Big Data Analytics : Focusing on Anger Emotion (빅데이터 분석을 위한 한국어 SentiWordNet 개발 방안 연구 : 분노 감정을 중심으로)

  • Choi, Sukjae;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies / v.19 no.4 / pp.1-19 / 2014
  • Efforts to identify user perceptions embedded in big data are actively under way. These efforts try to score people's views on products, movies, and social issues by analyzing statements posted on Internet bulletin boards or SNS. This study therefore addresses the problem of how to identify emotional vocabulary and determine the degree of each emotional word. For the basic emotional vocabulary and its degrees, we used the results of previous studies; for the extended emotional vocabulary, we inferred words from dictionary glosses. The result is four emotional word lists: a basic emotional list, an extended stratum 1, level 1 list derived from the glosses of the basic vocabulary, an extended stratum 2, level 1 list derived from the glosses of non-emotional words, and an extended stratum 2, level 2 list derived from the glosses of those glosses. We obtained the emotional degrees by applying sentence weights and emphasis multiplier values on the basis of the basic emotional list. The experiments indicate that AND and OR sentences take the average degree of the words they contain as their weight, and that MULTIPLY sentences take a weight of 1.2 to 1.5 depending on the type of adverb. NOT sentences are assumed to take a degree obtained by reducing and reversing the original word's emotional degree. The emphasis multiplier is taken to be 2 for stratum 1 and 3 for stratum 2.
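
The sentence-level rules described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the example word degrees, and in particular the `reduction` factor of 0.5 for NOT sentences are hypothetical, since the abstract does not state how much the degree is reduced.

```python
def and_or_degree(word_degrees):
    """AND/OR sentence: average degree of the emotional words it contains."""
    return sum(word_degrees) / len(word_degrees)


def multiply_degree(word_degree, adverb_weight):
    """MULTIPLY sentence: scale a word's degree by an adverb weight.

    The abstract reports weights from 1.2 to 1.5 depending on the adverb type.
    """
    if not 1.2 <= adverb_weight <= 1.5:
        raise ValueError("adverb weight is expected to lie in [1.2, 1.5]")
    return word_degree * adverb_weight


def not_degree(word_degree, reduction=0.5):
    """NOT sentence: reduce and reverse the original word's degree.

    The reduction factor is an assumed placeholder, not a value from the paper.
    """
    return -word_degree * reduction


# Hypothetical anger degrees for illustration only.
print(and_or_degree([0.8, 0.4]))
print(multiply_degree(0.5, 1.2))
print(not_degree(0.8))
```

Keeping each sentence type in its own function mirrors the way the abstract lists the rules separately and makes the assumed reduction factor easy to replace.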

Prediction of Traffic Congestion in Seoul by Deep Neural Network (심층인공신경망(DNN)과 다각도 상황 정보 기반의 서울시 도로 링크별 교통 혼잡도 예측)

  • Kim, Dong Hyun;Hwang, Kee Yeon;Yoon, Young
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.18 no.4 / pp.44-57 / 2019
  • Various studies have sought to relieve traffic congestion in metropolitan cities through accurate traffic flow prediction. Most of them assume that past traffic patterns repeat in the future, and models based on this assumption fall short when irregular traffic patterns occur abruptly. As an alternative, approaches that predict traffic patterns through big data analytics and artificial intelligence have emerged. Specifically, deep learning algorithms such as RNN have been prevalent for predicting temporal traffic flow as a time series. However, these algorithms do not perform well in long-term prediction. In this paper, we take into account various external factors that may affect traffic flows. We model the correlation between multi-dimensional context information and temporal traffic speed patterns using deep neural networks. Our model, trained with traffic data from the TOPIS system of Seoul, Korea, can predict traffic speed on a specific date with accuracy reaching nearly 90%. We expect that the accuracy can be improved further by taking into account additional factors such as accidents and construction work.

A Study on the Perceptions and Current Practices in Estimating Risk Cost of Contractor's Construction Budget - Focused on Building Projects - (종합건설사 실행예산 편성 시 리스크 비용 산정에 관한 인식 및 실태에 관한 연구 - 건축공사를 중심으로 -)

  • Choi, Jeong Won;Kim, Han Soo
    • Korean Journal of Construction Engineering and Management / v.23 no.3 / pp.13-24 / 2022
  • Construction projects are exposed to various types of risks, and these risks tend to increase. Increasing risks call for greater attention from contractors to forecasting and dealing with them. One measure for dealing with contractors' risks is to forecast or estimate the risk cost and include it in the construction budget. Although various studies on risk cost exist, little attention has been paid to general contractors' perceptions and current practices in estimating the risk cost of a construction budget. The objective of the study is to identify and discuss key characteristics and implications based on a survey and analysis of general contractors' perceptions and current practices in estimating the risk cost of a construction budget. The study shows a gap between perception and practice: contractors rate the importance of risk cost highly, but the level of practice is relatively low. It suggests that historical cost data, guidelines, and corporate-level standard procedures, in addition to sufficient time for risk cost estimating, are required to improve current practice. It also argues that sophisticated estimating techniques, including bid data analytics, are needed despite their currently low adoption, and proposes further research and development on these techniques to increase their practicality.

Factors Affecting Falls of Demented Inpatients (치매 입원환자의 낙상 영향 요인)

  • Kim, Sang-Mi;Lee, Seong-A
    • 한국노년학 / v.39 no.2 / pp.231-240 / 2019
  • The study aimed to identify risk factors for falls, as well as hospitalization status by disease and demographic characteristics, among inpatients with dementia by examining the in-depth Injury Patient Surveillance System data collected by the Korea Centers for Disease Control and Prevention (KCDC). Adults over 60 years old who were diagnosed with dementia were included (n=1,732). Their data were analyzed after assignment to either a fall group or a non-fall group. STATA was used for the statistical analyses, which included frequency analysis, the chi-square (χ2) test, and logistic regression. Falls were experienced by 8.0% of the inpatients with dementia. The fall and non-fall groups differed significantly in age, Charlson Comorbidity Index (CCI), and bone density disorder. In the logistic regression analysis of factors affecting falls, adults over 80 were 2.386 times more likely to fall; relative to a reference CCI of 0, the risk of falls was 0.421 times lower; and relative to those without bone density disorder, the fall risk for those with bone density disorder was 3.581 times higher. We therefore expect that the fall-related factors identified in this study will be valuable for educating inpatients with dementia and their caregivers, and will serve as a reference that supports clinical professionals in making decisions on fall management for patients with dementia.

Conditional Generative Adversarial Network based Collaborative Filtering Recommendation System (Conditional Generative Adversarial Network(CGAN) 기반 협업 필터링 추천 시스템)

  • Kang, Soyi;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.157-173 / 2021
  • With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to easily find the information they seek. Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technology that is essential to business. Collaborative filtering is used in various fields with excellent performance because recommendations are made based on similar user interests and preferences. However, limitations do exist. Sparsity occurs when user-item preference information is insufficient, and is the main limitation of collaborative filtering. The evaluation values of the user-item matrix may be distorted by the popularity of a product, or there may be new users who have not yet rated any items. The lack of historical data to identify consumer preferences is referred to as data sparsity, and various methods have been studied to address these problems. However, most attempts to solve the sparsity problem are not optimal because they can only be applied when additional data such as users' personal information, social networks, or characteristics of items are included. Another problem is that real-world score data are mostly biased toward high scores, resulting in severe imbalances. One cause of this imbalanced distribution is purchasing bias: only users who rate a product highly purchase it, so those with low ratings are less likely to purchase products and thus do not leave negative product reviews. Due to these characteristics, unlike most users' actual preferences, reviews by users who purchase products are more likely to be positive.
Therefore, the actual rating data is over-learned in many classes with high incidence due to its biased characteristics, distorting the market. Applying collaborative filtering to these imbalanced data leads to poor recommendation performance due to excessive learning of biased classes. Traditional oversampling techniques for this problem are likely to cause overfitting because they repeat the same data, which acts as noise in learning and reduces recommendation performance. In addition, most existing pre-processing methods for data imbalance are designed and used for binary classes. Binary class imbalance techniques are difficult to apply to multi-class problems because they cannot model multi-class cases such as objects at cross-class boundaries or objects overlapping multiple classes. To solve this problem, research has been conducted on converting multi-class problems into binary class problems. However, simplifying multi-class problems can cause potential classification errors when combined with the results of classifiers learned from other sub-problems, resulting in loss of important information about relationships beyond the selected items. Therefore, it is necessary to develop more effective methods to address multi-class imbalance problems. We propose a collaborative filtering model that uses CGAN to generate realistic virtual data to populate the empty user-item matrix. The conditional vector y identifies distributions for minority classes and generates data reflecting their characteristics. Collaborative filtering then maximizes the performance of the recommendation system via hyperparameter tuning. This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering implementations while mitigating data imbalances arising from real data. Our model has superior recommendation performance over existing oversampling techniques and over existing real-world data with data sparsity.
SMOTE, Borderline SMOTE, SVM-SMOTE, ADASYN, and GAN were used as comparative models and we demonstrate the highest prediction accuracy on the RMSE and MAE evaluation scales. Through this study, oversampling based on deep learning will be able to further refine the performance of recommendation systems using actual data and be used to build business recommendation systems.
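
The RMSE and MAE scales used for the comparison in this abstract can be computed as in the sketch below. The ratings are hypothetical values chosen for illustration and do not come from the paper.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted ratings."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Hypothetical held-out ratings and model predictions.
actual = [5, 4, 3, 5]
predicted = [4, 4, 5, 5]

print(f"RMSE={rmse(actual, predicted):.3f} MAE={mae(actual, predicted):.3f}")
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason the two are often reported together when comparing recommenders.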

Effect of Emotional Elements in Personal Relationships on Multiple Personas from the Perspective of Teenage SNS Users (SNS 상의 대인관계에서 나타나는 감정적 요소와 청소년의 온라인 다중정체성 간의 영향관계)

  • Choi, Bomi;Park, Minjung;Chai, Sangmi
    • Information Systems Review / v.18 no.2 / pp.199-223 / 2016
  • As social networking services (SNS) become widely used tools for maintaining social relationships, people use SNS to express themselves online. Users are free to form multiple characters in SNS because of online anonymity. This phenomenon causes SNS users to easily demonstrate multiple personas that are different from their identities in the real world. Therefore, this study focuses on online multi-personas that establish multiple fake identities in the SNS environment. The main objective of this study is to investigate factors that affect online multi-personas. Fake online identities can have various negative consequences such as cyber bullying, cyber vandalism, or antisocial behavior. Since the boundary between the online and offline worlds is fading fast, these negative aspects of online behavior may influence offline behaviors as well. This study focuses on teenagers who often create multi-personas online. According to previous studies, personal identities are usually established during a person's youth. Based on data on 664 teenage users, this study identifies four emotional factors, namely, closeness with others, relative deprivation, peer pressure and social norms. According to data analysis results, three factors (except closeness with others) have positive correlations with users' multi-personas. This study contributes to the literature by identifying the factors that cause young people to form online multi-personas, an issue that has not been fully discussed in previous studies. From a practical perspective, this study provides a basis for a safe online environment by explaining the reasons for creating fake SNS identities.

A Trend Analysis and Policy proposal for the Work Permit System through Text Mining: Focusing on Text Mining and Social Network analysis (텍스트마이닝을 통한 고용허가제 트렌드 분석과 정책 제안 : 텍스트마이닝과 소셜네트워크 분석을 중심으로)

  • Ha, Jae-Been;Lee, Do-Eun
    • Journal of Convergence for Information Technology / v.11 no.9 / pp.17-27 / 2021
  • The aim of this research was to identify issues surrounding the work permit system and public perceptions of it, and to suggest ideas for government policy on the system. To this end, the research applied text mining to social data. Using Textom, it collected 1,453,272 texts from 6,217 online documents containing 'work permit system' published from January to December 2020, and performed text mining and social network analysis. Through keyword frequency analysis and degree centrality analysis, the research extracted the 100 most frequently mentioned keywords and identified job problems, the importance of the policy process, industrial competitiveness, and improvement of the living conditions of foreign workers as the major keywords. In addition, through semantic network analysis, the research identified a major area of awareness, 'employment policy', and various surrounding areas of awareness such as 'international cooperation', 'workers' human rights', 'law', 'recruitment of foreigners', 'corporate competitiveness', 'immigrant culture', and 'foreign workforce management'. Finally, the research suggested ideas worth considering when establishing government policies on the work permit system and conducting related research.
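
Keyword frequency and degree centrality, the two analyses named in this abstract, can be sketched in plain Python. The abstract does not say how Textom builds its network, so the within-document co-occurrence network used here is an assumption, and the tokenized documents and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_frequency(docs):
    """Count how often each keyword appears across all documents."""
    return Counter(word for doc in docs for word in doc)


def degree_centrality(docs):
    """Degree centrality in a keyword co-occurrence network.

    Two keywords are linked when they appear in the same document. Each
    keyword's degree is divided by (n - 1), where n is the number of keywords.
    """
    neighbors = {}
    for doc in docs:
        words = set(doc)
        for word in words:
            neighbors.setdefault(word, set())
        for a, b in combinations(words, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {word: 0.0 for word in neighbors}
    return {word: len(linked) / (n - 1) for word, linked in neighbors.items()}


# Hypothetical tokenized documents, not data from the study.
docs = [
    ["policy", "worker", "law"],
    ["worker", "rights"],
    ["policy", "worker"],
]

print(keyword_frequency(docs).most_common(2))
print(degree_centrality(docs))
```

Dividing by n - 1 follows the usual normalization for degree centrality, so a keyword linked to every other keyword scores 1.0.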