• Title/Summary/Keyword: Unstructured data analysis

A study on Utilization of Big Data Based on the Personal Information Protection Act (개인정보보호법에 기반한 빅데이터 활용 방안 연구)

  • Kim, Byung-Chul
    • Journal of Digital Convergence / v.12 no.12 / pp.87-92 / 2014
  • Big data has been noted as a possible solution to social problems and pending issues. At the same time, big data raises privacy problems, so big data use and privacy are in conflict. In this paper, we point out this issue and propose a privacy-based plan for using big data, drawing on case studies from advanced countries.

The Effect of Medical Service Design Thinking Teaching-learning on Empathic Problem Solving Ability: Convergence Analysis of Structured and Unstructured Data (의료서비스 디자인싱킹 교육의 공감적 문제해결능력 향상 효과: 정형 및 비정형 데이터 융복합 분석 중심으로)

  • Yoo, Jin-Yeong
    • Journal of Digital Convergence / v.18 no.6 / pp.311-321 / 2020
  • The purpose of this study is to verify the effect of Medical Service Design Thinking (MSDT), applied in an undergraduate SNS hospital marketing course, on the Empathic Problem Solving Ability (EPSA) of Freshman Preliminary Health Administrators (FPHA). A pre-post questionnaire survey was conducted on 39 freshmen in the Department of Health Administration at a college in Daegu after MSDT was applied for 15 weeks from September to December 2019. MSDT had a positive influence on the Empathic Imagine, Empathic Interest, and Empathic Awakening components of the FPHA's EPSA. In the analysis of key common words, the use of neutral and negative words was low, while the use of positive words was high. To systematically build empathic problem-solving job competency in the age of artificial intelligence, it is meaningful to develop a program for the freshman curriculum and to analyze structured and unstructured data to verify its effectiveness. Additional program development research is needed for application to theoretical subjects.

The Study of Chronic Kidney Disease Classification using KHANES data (국민건강영양조사 자료를 이용한 만성신장질환 분류기법 연구)

  • Lee, Hong-Ki;Myoung, Sungmin
    • Proceedings of the Korean Society of Computer Information Conference / 2020.01a / pp.271-272 / 2020
  • Data mining is known to be useful in the medical field when no evidence favoring a particular treatment option is available. The healthcare field collects huge volumes of structured and unstructured data in order to find unknown information or knowledge for effective diagnosis and clinical decision making. The 5,179 records analyzed were collected from the Korean National Health and Nutrition Examination Survey (KHANES) over two years. The data were split into training and test sets to fit the model and make predictions. We predicted chronic kidney disease (CKD) using data mining methods such as naive Bayes, logistic regression, CART, and an artificial neural network (ANN). The results identify significant features and data mining techniques for lifestyle factors related to CKD.

A study on Korean language processing using TF-IDF (TF-IDF를 활용한 한글 자연어 처리 연구)

  • Lee, Jong-Hwa;Lee, MoonBong;Kim, Jong-Weon
    • The Journal of Information Systems / v.28 no.3 / pp.105-121 / 2019
  • Purpose: One of the reasons for the expansion of information systems in the enterprise is the increased efficiency of data analysis. In particular, complex and unstructured data types such as video, voice, images, and conversations inside and outside social networks are increasing rapidly. The purpose of this study is to analyze customer needs from customer voices, i.e., text data, in the web environment. Design/methodology/approach: Previous studies have shown that extracting the words that characterize a sentence gives better results than simple frequency analysis. In this study of Korean text, we applied the TF-IDF method, which extracts important keywords from real sentences, rather than the TF method, a word extraction technique that represents sentences by simple frequency only. We visualized the results of the two techniques by cluster analysis and describe the differences. Findings: Applying the TF and TF-IDF techniques to Korean natural language processing, the study shows the value of moving from frequency analysis to semantic analysis, and Korean language processing researchers are expected to change their techniques accordingly.
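The contrast between TF and TF-IDF in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the comments are already tokenized (Korean text would first need a morphological analyzer, which is omitted here), it uses the common weighting tf × ln(N / df), and the sample tokens and function names are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(docs):
    """Raw term counts per document (the plain TF method)."""
    return [Counter(doc) for doc in docs]


def tf_idf(docs):
    """TF-IDF weight per term and document, using idf = ln(N / df)."""
    n_docs = len(docs)
    tf = term_frequencies(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in counts.items()}
        for counts in tf
    ]


# Hypothetical, already-tokenized customer comments (assumption for illustration).
docs = [
    ["delivery", "fast", "good"],
    ["delivery", "slow", "bad"],
    ["delivery", "price", "good"],
]
weights = tf_idf(docs)
```

Under plain TF, "delivery" counts as much as any other word in each comment. Under TF-IDF its weight drops to zero because it appears in every comment, while words specific to one comment, such as "slow", receive the highest weight.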

Study on the Methodology for Extracting Information from SNS Using a Sentiment Analysis (SNS 감성분석을 이용한 정보 추출 방법론에 관한 연구)

  • Hong, Doopyo;Jeong, Harim;Park, Sangmin;Han, Eum;Kim, Honghoi;Yun, Ilsoo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.16 no.6 / pp.141-155 / 2017
  • As SNS use becomes more active, many people post their thoughts about specific events on SNS in text form. As a result, SNS is used in fields such as finance and distribution for service satisfaction surveys and consumer monitoring. In the transportation field, however, there are few cases that use unstructured data analysis such as sentiment analysis. In this study, we developed a sentiment analysis methodology for transportation using highway VOC data, which is unstructured data collected by Korea Expressway Corporation. The methodology consists of morpheme analysis, sentiment dictionary construction, and sentiment classification of the collected unstructured data. The methodology was verified using highway-related tweet data. The analysis suggests that much of the highway-related information during the analysis period concerned construction and accidents. It also appears that users complained about delays caused by construction and accidents.
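The dictionary-based classification step in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: whitespace splitting stands in for morpheme analysis, and the lexicon entries, their scores, and the function name are hypothetical rather than taken from the paper.

```python
def classify_sentiment(tokens, lexicon):
    """Sum the lexicon scores of the tokens and map the total to a label."""
    score = sum(lexicon.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical sentiment dictionary for highway-related text (assumption).
lexicon = {"delay": -1, "accident": -1, "smooth": 1}

# Whitespace splitting is a placeholder for real morpheme analysis.
label = classify_sentiment("construction delay again".split(), lexicon)
```

A real pipeline would replace the whitespace split with a morphological analyzer and build the dictionary from domain data, but the scoring logic can stay this simple.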

A Meta Analysis of the Edible Insects (식용곤충 연구 메타 분석)

  • Yu, Ok-Kyeong;Jin, Chan-Yong;Nam, Soo-Tai;Lee, Hyun-Chang
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.10a / pp.182-183 / 2018
  • Big data analysis is the process of discovering meaningful correlations, patterns, and trends in large data sets stored in existing data warehouse management tools and creating new value. It also refers to technology that extracts new value from large volumes of structured and unstructured data and analyzes the results. Most big data analysis methods, such as data mining, machine learning, natural language processing, and pattern recognition, come from existing statistics and computer science. Global research institutes have identified big data as the most notable new technology since 2011.

The Frequency Analysis of Teacher's Emotional Response in Mathematics Class (수학 담화에서 나타나는 교사의 감성적 언어 빈도 분석)

  • Son, Bok Eun;Ko, Ho Kyoung
    • Communications of Mathematical Education / v.32 no.4 / pp.555-573 / 2018
  • The purpose of this study is to identify the emotional language of math teachers in math class using text mining techniques. For this purpose, we collected teachers' classroom discourse data from videos of excellent classes. The extracted unstructured data were analyzed in three stages: data collection, data preprocessing, and text mining analysis. The text mining analysis showed little emotional language in teachers' responses in mathematics class. This result allows inferences about the characteristics of mathematics classes with respect to the affective domain.

Understanding Facility Management on Tunnel through Text Mining of Precision Safety Diagnosis Data (터널시설물 점검진단 데이터의 텍스트마이닝 분석을 통한 유형별·지역별 중점 유지관리요소의 이해)

  • Seo, Jeong-eun;Oh, Jintak
    • Journal of Korean Association for Spatial Structures / v.21 no.3 / pp.85-92 / 2021
  • The purpose of this paper is to understand the key factors for efficient maintenance of rapidly aging facilities. To this end, safety inspection/diagnosis reports accumulated as unstructured data were collected, preprocessed, and analyzed using text mining. In the short term, the derived vulnerabilities of tunnel facilities can be used as inspection items that take into account the characteristics of individual facilities during regular and daily inspections. In the long term, if detailed specification information and other inspection results (safety, durability, and ease of use) are included in the analysis, the approach can serve as a stepping stone for supporting preemptive maintenance decision-making.

Rating and Comments Mining Using TF-IDF and SO-PMI for Improved Priority Ratings

  • Kim, Jinah;Moon, Nammee
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.11 / pp.5321-5334 / 2019
  • Data mining technology is frequently used in identifying the intention of users over a variety of information contexts. Since relevant terms are mainly hidden in text data, it is necessary to extract them. Quantification is required in order to interpret user preference in association with other structured data. This paper proposes rating and comments mining to identify user priority and obtain improved ratings. Structured data (location and rating) and unstructured data (comments) are collected and priority is derived by analyzing statistics and employing TF-IDF. In addition, the improved ratings are generated by applying priority categories based on materialized ratings through Sentiment-Oriented Point-wise Mutual Information (SO-PMI)-based emotion analysis. In this paper, an experiment was carried out by collecting ratings and comments on "place" and by applying them. We confirmed that the proposed mining method is 1.2 times better than the conventional methods that do not reflect priorities and that the performance is improved to almost 2 times when the number to be predicted is small.
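The SO-PMI score mentioned in the abstract above can be sketched as follows. This is a minimal sketch and not the paper's method: it assumes the usual formulation in which a word's orientation is its summed PMI with positive seed words minus its summed PMI with negative seed words, estimates probabilities from document-level co-occurrence, and adds a small smoothing constant to avoid taking the log of zero. The seed words, sample comments, smoothing value, and function name are all hypothetical.

```python
import math


def so_pmi(word, docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Orientation of `word`: PMI with positive seeds minus PMI with negative seeds."""
    n_docs = len(docs)
    doc_sets = [set(doc) for doc in docs]

    def count(*terms):
        # Number of documents containing all terms, smoothed to stay above zero.
        return sum(1 for d in doc_sets if all(t in d for t in terms)) + smoothing

    def pmi(a, b):
        # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), with P estimated as count / n_docs.
        return math.log2(count(a, b) * n_docs / (count(a) * count(b)))

    return sum(pmi(word, s) for s in pos_seeds) - sum(pmi(word, s) for s in neg_seeds)


# Hypothetical tokenized comments about places (assumption for illustration).
docs = [
    ["clean", "good"],
    ["clean", "good", "quiet"],
    ["noisy", "bad"],
    ["noisy", "bad", "dirty"],
]
clean_score = so_pmi("clean", docs, pos_seeds=["good"], neg_seeds=["bad"])
noisy_score = so_pmi("noisy", docs, pos_seeds=["good"], neg_seeds=["bad"])
```

In this toy data "clean" co-occurs only with the positive seed and gets a positive score, while "noisy" gets a negative one. How such scores feed into the priority-adjusted ratings is specific to the paper and is not reproduced here.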

Recent deep learning methods for tabular data

  • Yejin Hwang;Jongwoo Song
    • Communications for Statistical Applications and Methods / v.30 no.2 / pp.215-226 / 2023
  • Deep learning has made great strides in the field of unstructured data such as text, images, and audio. However, for tabular data analysis, machine learning algorithms such as ensemble methods still outperform deep learning. To match the predictive power of these machine learning algorithms, several deep learning methods for tabular data have been proposed recently. In this paper, we review the latest deep learning models for tabular data and compare their performance on several datasets. We also compare the latest boosting methods with these deep learning methods and suggest guidelines for users who analyze tabular datasets. In regression, machine learning methods are better than deep learning methods. For classification problems, however, deep learning methods perform better than machine learning methods in some cases.