• Title/Summary/Keyword: Unstructured data analysis


A Meta Analysis of Innovation Diffusion Theory based on Behavioral Intention of Consumer (혁신확산이론 기반 소비자 행위의도에 관한 메타분석)

  • Nam, Soo-Tai;Kim, Do-Goan;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2017.10a
    • /
    • pp.140-141
    • /
    • 2017
  • Big data analysis refers to the process of discovering meaningful new correlations, patterns, and trends in the large amounts of data stored in data warehouses, and of creating new value from them. It thus covers the effective analysis of the many kinds of big data that exist worldwide, such as social big data, machine-to-machine (M2M) sensor data, and corporate customer relationship management data. In the big data era, it has become more important to effectively analyze not only structured data that is well organized in databases, but also unstructured big data such as the web documents, e-mails, and social data generated explosively on the internet, on social network services, and in mobile environments. Meta-analysis, meanwhile, is a statistical method for synthesizing the literature from the quantitative results of many published empirical studies. We reviewed a total of 750 samples from 50 studies on topics related to innovation diffusion theory (IDT) published in Korea between 2000 and 2017.
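
As a rough illustration of the pooling step in a meta-analysis, the sketch below combines correlation coefficients from several studies using the Fisher z transformation with inverse-variance weights (a fixed-effect model). The abstract does not state which pooling model the authors used, so this is an assumption, and the function name `pooled_correlation` and the study values are made up for illustration.

```python
import math

def pooled_correlation(studies):
    """Fixed-effect pooled correlation from (r, n) pairs via Fisher's z."""
    weighted_sum = 0.0
    total_weight = 0.0
    for r, n in studies:
        z = math.atanh(r)   # Fisher z transformation of the correlation
        w = n - 3           # inverse of var(z) = 1 / (n - 3)
        weighted_sum += w * z
        total_weight += w
    z_bar = weighted_sum / total_weight
    se = 1.0 / math.sqrt(total_weight)
    # Back-transform the pooled z and its 95% interval to the r scale.
    ci = (math.tanh(z_bar - 1.96 * se), math.tanh(z_bar + 1.96 * se))
    return math.tanh(z_bar), ci

# Hypothetical (correlation, sample size) pairs, not taken from the paper.
example_studies = [(0.42, 120), (0.35, 200), (0.51, 85)]
r_pooled, (ci_low, ci_high) = pooled_correlation(example_studies)
print(f"pooled r = {r_pooled:.3f}, 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
```

Larger studies receive more weight, so the pooled estimate is pulled toward the correlation reported by the study with the most participants.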


A Development of LDA Topic Association Systems Based on Spark-Hadoop Framework

  • Park, Kiejin;Peng, Limei
    • Journal of Information Processing Systems
    • /
    • v.14 no.1
    • /
    • pp.140-149
    • /
    • 2018
  • Social data such as users' comments are unstructured in nature, and current technologies for analyzing such data are constrained by available storage space and processing time when fast storage and processing are required. It is also difficult to analyze user features at high speed from a huge amount of dynamically generated social data. To solve this problem, we design and implement a topic association analysis system based on the latent Dirichlet allocation (LDA) model. LDA does not require a training process and can therefore analyze social users' hourly interests in different topics easily. The proposed system is built on the Spark framework running on top of a Hadoop cluster. It achieves high-speed processing because access to the hard disk is minimized and all intermediate data are processed in main memory. In the performance evaluation, it takes about 5 hours to analyze the topics of about 1 TB of test social data (SNS comments). Moreover, by analyzing the associations among topics, we can track hourly changes in social users' interests in different topics.
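
To make the LDA step concrete, here is a minimal single-machine sketch of LDA fitted by collapsed Gibbs sampling. It is not the authors' Spark-Hadoop system and says nothing about how they distribute the work; the function name `lda_gibbs`, the hyperparameter values, and the toy comments are all assumptions chosen for illustration.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Tiny collapsed Gibbs sampler for LDA; returns per-document topic mixes."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]               # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                              # n(k)
    assignments = []
    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = assignments[d][i], word_id[w]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][wi] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][wi] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wi] += 1
                topic_total[k] += 1
    # Smoothed topic proportions for each document.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]

# Hypothetical tokenized comments standing in for SNS data.
comments = [
    "game team score win".split(),
    "team win match game".split(),
    "movie actor film scene".split(),
    "film scene actor movie".split(),
]
theta = lda_gibbs(comments, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Grouping comments by hour before calling such a function would give one topic mixture per hour, which is one simple way to follow how interests shift over time.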

Visualizing Unstructured Data using a Big Data Analytical Tool R Language (빅데이터 분석 도구 R 언어를 이용한 비정형 데이터 시각화)

  • Nam, Soo-Tai;Chen, Jinhui;Shin, Seong-Yoon;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2021.05a
    • /
    • pp.151-154
    • /
    • 2021
  • Big data analysis is the process of discovering meaningful new correlations, patterns, and trends in large volumes of data held in data stores, and of creating new value. Most big data analysis techniques therefore draw on methods from statistics and computer science, including data mining, machine learning, natural language processing, and pattern recognition. Using the R language, a big data tool, analysis results on pre-processed text data can be expressed through various visualization functions. The data in this study consist of 21 papers published in March 2021 in journals of the Korea Institute of Information and Communication Engineering. In the final analysis, the most frequently mentioned keyword was "Data", which ranked first with 305 occurrences. Based on these results, the limitations of the study and its theoretical implications are suggested.
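
The keyword-frequency step described above can be sketched as follows. The study itself uses R; this sketch is in Python, and the stopword list, the function names, and the sample texts are assumptions for illustration rather than the authors' pipeline.

```python
import re
from collections import Counter

# A deliberately small, hypothetical stopword list.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on", "with"}

def keyword_frequencies(texts, top_n=5):
    """Count lowercase word tokens across texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)

def text_bar_chart(freqs, width=30):
    """Render (word, count) pairs as a plain-text bar chart."""
    top = max(count for _, count in freqs)
    lines = []
    for word, count in freqs:
        bar = "#" * max(1, round(width * count / top))
        lines.append(f"{word:<12}{bar} {count}")
    return "\n".join(lines)

# Hypothetical paper abstracts, not the 21 papers analyzed in the study.
sample_texts = [
    "Big data analysis of sensor data with machine learning",
    "A model for data visualization and data mining",
    "Learning a network model for image data",
]
print(text_bar_chart(keyword_frequencies(sample_texts)))
```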


Topic Automatic Extraction Model based on Unstructured Security Intelligence Report (비정형 보안 인텔리전스 보고서 기반 토픽 자동 추출 모델)

  • Hur, YunA;Lee, Chanhee;Kim, Gyeongmin;Lim, HeuiSeok
    • Journal of the Korea Convergence Society
    • /
    • v.10 no.6
    • /
    • pp.33-39
    • /
    • 2019
  • As cyber attack methods are becoming more intelligent, incidents such as security breaches and international crimes are increasing. In order to predict and respond to these cyber attacks, the characteristics, methods, and types of attack techniques should be identified. To this end, many security companies are publishing security intelligence reports to quickly identify various attack patterns and prevent further damage. However, the reports that each company distributes are not structured, and the number of published intelligence reports is ever-increasing. In this paper, we propose a method to extract structured data from unstructured security intelligence reports. We also propose an automatic intelligence report analysis system that divides a large volume of reports into sub-groups based on their topics, making the report analysis process more effective and efficient.

Text-mining based Cause Analysis of Accidents at Workplaces in Korea (텍스트 마이닝 기법을 활용한 우리나라 산업재해의 원인분석)

  • Choi, Gi Heung
    • Journal of the Korean Society of Safety
    • /
    • v.37 no.3
    • /
    • pp.9-15
    • /
    • 2022
  • The analysis of the causes of accidents in workplaces where machines and tools are used is essential to improve the effectiveness and efficiency of safety prevention policies in places of employment in Korea. The causes of workplace accidents are not fully understood mainly due to difficulties in analyzing available descriptive information. This study focuses on the automated accident cause analysis in workplaces based on the accident abstracts found in industrial accident reports written in an unstructured descriptive format. The method proposed in this paper is based on text data mining and uses the keyword search function of Excel software to automate the analysis. The analysis results indicate that the primary reason for the frequency of accidents is related to technical aspects at a stage in which dangerous situations occur in the workplace. Accidents due to managerial causes are typically observed when danger exists in the workplace; however, managerial actions play a more important role in reducing accident severity. A small company tends to use unsafe machines and devices, leading to further accidents due to technical causes, whereas managerial causes are more conspicuous as the company grows. To preclude the occurrence of accidents due to inadequate knowledge, the implementation of safety management and the provision of safety education to elderly workers at the early stage of their employment are particularly important for small companies with fewer than 100 workers.
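
A keyword-search classification of accident abstracts, as described above, might look like the sketch below. The study performs this step with Excel's keyword search; the Python version here is only an analogue, and the cause categories, keyword lists, and sample abstracts are hypothetical.

```python
from collections import Counter

# Hypothetical keyword lists; the paper's actual search terms are not given.
CAUSE_KEYWORDS = {
    "technical": ["guard", "malfunction", "defect", "unsafe machine"],
    "managerial": ["supervision", "procedure", "inspection"],
    "educational": ["training", "untrained", "inexperienced"],
}

def classify_causes(abstract):
    """Return every cause category whose keywords appear in the abstract."""
    text = abstract.lower()
    return sorted(
        category
        for category, keywords in CAUSE_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    )

def tally_causes(abstracts):
    """Count how many abstracts mention each cause category."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(classify_causes(abstract))
    return counts

# Made-up accident abstracts for illustration only.
reports = [
    "Hand caught in press after the safety guard had been removed.",
    "No supervision on the night shift; an inexperienced worker "
    "skipped the lockout procedure.",
    "Conveyor malfunction during routine inspection.",
]
print(tally_causes(reports))
```

Plain substring matching is easy to automate but can misfire on negations or unrelated uses of a word, so the keyword lists usually need manual review against a sample of reports.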

XAI Research Trends Using Social Network Analysis and Topic Modeling (소셜 네트워크 분석과 토픽 모델링을 활용한 설명 가능 인공지능 연구 동향 분석)

  • Gun-doo Moon;Kyoung-jae Kim
    • Journal of Information Technology Applications and Management
    • /
    • v.30 no.1
    • /
    • pp.53-70
    • /
    • 2023
  • Artificial intelligence is no longer a matter of the distant future; it has become a familiar part of modern society. As artificial intelligence and machine learning have grown more advanced and complex, it has become difficult for people to grasp their structure and the basis for their decisions, because machine learning shows only results, not the whole process. As artificial intelligence has become more common, people have come to want explanations that let them trust it. Recognizing the necessity and importance of explainable artificial intelligence (XAI), this study examined trends in XAI research by applying social network analysis and topic modeling to IEEE publications from 2004, when the concept of artificial intelligence was defined, to 2022. Through social network analysis, the overall pattern of nodes can be found across a large number of documents, and the connections between keywords reveal the structure of their relationships; topic modeling can identify more objective topics by extracting keywords from unstructured data and grouping them into topics. Both analysis methods are suitable for trend analysis. The analysis found that the application of XAI is gradually expanding into various fields beyond machine learning and deep learning.
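
The keyword-network side of such an analysis can be sketched as a co-occurrence graph with degree centrality. This is a generic illustration, not the authors' procedure: the function names and the keyword lists per paper are assumptions.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_edges(keyword_lists):
    """Count how often each pair of keywords appears in the same paper."""
    edges = Counter()
    for keywords in keyword_lists:
        for a, b in combinations(sorted(set(keywords)), 2):
            edges[(a, b)] += 1
    return edges

def degree_centrality(edges):
    """Share of the other nodes that each keyword is directly linked to."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

# Hypothetical author keywords from three papers.
papers = [
    ["xai", "deep learning", "healthcare"],
    ["xai", "machine learning"],
    ["xai", "deep learning"],
]
edges = cooccurrence_edges(papers)
centrality = degree_centrality(edges)
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node:<18}{score:.2f}")
```

Keywords with high centrality sit at the hub of the literature, while edge counts show which keyword pairs tend to be studied together.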

Mining Intellectual History Using Unstructured Data Analytics to Classify Thoughts for Digital Humanities (디지털 인문학에서 비정형 데이터 분석을 이용한 사조 분류 방법)

  • Seo, Hansol;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.141-166
    • /
    • 2018
  • Information technology improves the efficiency of humanities research. In humanities research, information technology can be used to analyze a given topic or document automatically, facilitate connections to other ideas, and increase our understanding of intellectual history. We suggest a method to identify and automatically analyze the relationships between arguments contained in unstructured data collected from humanities writings such as books, papers, and articles. Our method, which is called history mining, reveals influential relationships between arguments and the philosophers who present them. We utilize several classification algorithms, including a deep learning method. To verify the performance of the proposed methodology, we selected philosophers associated with empiricism and rationalism and collected their related writings and articles accessible on the internet. The performance of the classification algorithms was measured by recall, precision, F-score, and elapsed time. DNN, Random Forest, and Ensemble showed better performance than the other algorithms. Using the selected classification algorithm, we classified the writings of specific philosophers as rationalist or empiricist and generated a history map that takes each philosopher's years of activity into account.
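
For reference, the evaluation measures named above can be computed as in this short sketch. The labels and predictions are invented for illustration and do not reproduce the paper's results; the function name `precision_recall_f1` is likewise an assumption.

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical gold labels and classifier output for five writings.
gold = ["rationalism", "rationalism", "empiricism", "empiricism", "rationalism"]
pred = ["rationalism", "empiricism", "empiricism", "rationalism", "rationalism"]
p, r, f = precision_recall_f1(gold, pred, positive="rationalism")
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```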

Using noise filtering and sufficient dimension reduction method on unstructured economic data (노이즈 필터링과 충분차원축소를 이용한 비정형 경제 데이터 활용에 대한 연구)

  • Jae Keun Yoo;Yujin Park;Beomseok Seo
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.2
    • /
    • pp.119-138
    • /
    • 2024
  • Text indicators are increasingly valuable in economic forecasting, but are often hindered by noise and high dimensionality. This study aims to explore post-processing techniques, specifically noise filtering and dimensionality reduction, to normalize text indicators and enhance their utility through empirical analysis. Predictive target variables for the empirical analysis include monthly leading index cyclical variations, BSI (business survey index) All industry sales performance, BSI All industry sales outlook, as well as quarterly real GDP SA (seasonally adjusted) growth rate and real GDP YoY (year-on-year) growth rate. This study explores the Hodrick and Prescott filter, which is widely used in econometrics for noise filtering, and employs sufficient dimension reduction, a nonparametric dimensionality reduction methodology, in conjunction with unstructured text data. The analysis results reveal that noise filtering of text indicators significantly improves predictive accuracy for both monthly and quarterly variables, particularly when the dataset is large. Moreover, this study demonstrated that applying dimensionality reduction further enhances predictive performance. These findings imply that post-processing techniques, such as noise filtering and dimensionality reduction, are crucial for enhancing the utility of text indicators and can contribute to improving the accuracy of economic forecasts.
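
The noise-filtering idea can be illustrated with a small pure-Python Hodrick-Prescott filter, which finds the trend minimizing the squared deviation from the series plus a penalty `lam` on the trend's squared second differences. The abstract does not report the smoothing parameter used, so `lam=1600` (the value conventionally used for quarterly data) is an assumption, as are the function name and the sample series.

```python
def hp_filter(y, lam=1600.0):
    """Return the HP trend of y by solving (I + lam * D'D) trend = y."""
    n = len(y)
    if n < 3:
        return [float(v) for v in y]
    # Build A = I + lam * D'D, where D takes second differences.
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    for r in range(n - 2):
        coeffs = ((r, 1.0), (r + 1, -2.0), (r + 2, 1.0))
        for i, ci in coeffs:
            for j, cj in coeffs:
                a[i][j] += lam * ci * cj
    b = [float(v) for v in y]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(a[row][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            if factor:
                for c in range(col, n):
                    a[row][c] -= factor * a[col][c]
                b[row] -= factor * b[col]
    # Back substitution.
    trend = [0.0] * n
    for row in range(n - 1, -1, -1):
        s = sum(a[row][c] * trend[c] for c in range(row + 1, n))
        trend[row] = (b[row] - s) / a[row][row]
    return trend

# Hypothetical noisy text indicator (e.g., a monthly sentiment score).
indicator = [0.2, 0.5, 0.1, 0.6, 0.4, 0.9, 0.5, 1.0, 0.8, 1.2]
smoothed = hp_filter(indicator, lam=1600.0)
noise = [obs - t for obs, t in zip(indicator, smoothed)]
print([round(t, 3) for t in smoothed])
```

The dense solver keeps the sketch dependency-free but scales poorly; for long series a banded solver or a library routine would be the practical choice.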

An Algorithms for Tournament-based Big Data Analysis (토너먼트 기반의 빅데이터 분석 알고리즘)

  • Lee, Hyunjin
    • Journal of Digital Contents Society
    • /
    • v.16 no.4
    • /
    • pp.545-553
    • /
    • 2015
  • While all data has value in itself, most data collected in the real world is random and unstructured. To extract useful information from the data, data transformation and analysis algorithms are needed, and data mining serves this purpose. Today, there is a need not only for a variety of data mining techniques but also for the computational capacity and rapid analysis time that huge volumes of data demand. Hadoop is commonly used to store huge volumes of data, and the MapReduce framework is one way to analyze data in Hadoop. In this paper, we developed a tournament-based MapReduce method for efficiently porting an algorithm developed on a single machine to the MapReduce framework. The proposed method can be applied to many analysis algorithms, and we show its usefulness by applying it to two frequently used data mining algorithms, k-means and k-nearest neighbor classification.
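
The abstract does not spell out how the tournament is organized, so the sketch below is only one possible reading, applied to k-nearest neighbor classification: each data partition nominates its local k nearest neighbors (a map-like step), and the candidate lists are then merged pairwise in rounds, like a tournament bracket, until one list remains (a reduce-like step). All function names and data are hypothetical, and the code runs on a single machine rather than on Hadoop.

```python
import heapq
import math
from collections import Counter

def local_knn(partition, query, k):
    """Map-like step: k nearest (distance, label) pairs in one partition."""
    scored = ((math.dist(point, query), label) for point, label in partition)
    return heapq.nsmallest(k, scored)

def tournament_merge(candidate_lists, k):
    """Reduce-like step: merge candidate lists pairwise, round by round."""
    current = list(candidate_lists)
    while len(current) > 1:
        next_round = []
        for i in range(0, len(current), 2):
            # An unpaired list at the end advances to the next round as is.
            other = current[i + 1] if i + 1 < len(current) else []
            next_round.append(heapq.nsmallest(k, current[i] + other))
        current = next_round
    return current[0] if current else []

def knn_classify(partitions, query, k=3):
    """Majority label among the k overall nearest neighbors."""
    local = [local_knn(part, query, k) for part in partitions]
    winners = tournament_merge(local, k)
    votes = Counter(label for _, label in winners)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points split across three partitions.
partitions = [
    [((0.0, 0.0), "a"), ((0.0, 1.0), "a")],
    [((5.0, 5.0), "b"), ((1.0, 0.0), "a")],
    [((5.0, 6.0), "b"), ((6.0, 5.0), "b")],
]
print(knn_classify(partitions, (0.2, 0.2)))
print(knn_classify(partitions, (5.5, 5.5)))
```

Because each round keeps only the best k candidates from a pair, the final list equals the global k nearest neighbors while no single step has to hold the whole dataset.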

Development of Data Visualization Tools for Land-Based Fish Farm Big Data Analysis System (육상 양식장 빅데이터 분석 시스템 개발을 위한 데이터 시각화 도구 개발)

  • Seoung-Bin Ye;Jeong-Seon Park;Hyi-Thaek Ceong;Soon-Hee Han
    • The Journal of the Korea Institute of Electronic Communication Sciences
    • /
    • v.19 no.4
    • /
    • pp.763-770
    • /
    • 2024
  • Land-based fish farms that use seawater have introduced and now operate various equipment such as real-time water quality monitoring systems, facility automation systems, and automated dissolved oxygen supply devices. The data collected from this equipment form structured and unstructured big data on the water quality environment, facility operations, and visual information from the workplace. This big data is intended to improve operational and production efficiency through the development and application of various methods. This study aims to develop a system for effectively analyzing and visualizing big data produced by land-based fish farms. It proposes a data visualization process suitable for use in a fish farm big data analysis system, develops big data visualization tools, and compares the results. Additionally, it presents intuitive visualization models for exploring and comparing big data with time-series characteristics.