• Title/Summary/Keyword: Unstructured data

Interpretation and Prediction of Situations on the Korean Peninsula by Peace Index Analysis from Unstructured Data (비정형자료로부터의 평화지수 분석을 통한 한반도 정세 파악 방법)

  • Kwon, Ohbyung;Park, Dasol;Choi, Jihye;Lee, Jaeyoon
    • Journal of Information Technology Services
    • /
    • v.12 no.4
    • /
    • pp.423-434
    • /
    • 2013
  • Since directly acquiring intelligence about political situations around the Korean Peninsula is nearly impossible, individuals and companies inevitably rely on open, indirect sources such as newspapers. However, newspaper content is largely unstructured and very large in volume, so conventional content analysis is time-consuming and costly. This paper therefore proposes a sentiment analysis method that computes a daily 'peace index' from unstructured newspaper data. Words and phrases that represent the sentiment of a nation are carefully identified through content analysis. To show the feasibility of the proposed idea, a prototype system with a vocabulary repository about political situations was developed to estimate the peace index automatically.
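
As a rough illustration of the lexicon-based idea in this abstract, the sketch below averages the polarity of matched words per day. The lexicon entries, their weights, and the function name `daily_peace_index` are hypothetical; the abstract does not give the paper's actual vocabulary repository or scoring formula.

```python
from collections import defaultdict

# Hypothetical polarity lexicon (an assumption for illustration only):
# positive values suggest peace, negative values suggest tension.
LEXICON = {"talks": 1.0, "agreement": 1.0, "aid": 0.5,
           "missile": -1.0, "threat": -1.0, "sanctions": -0.5}

def daily_peace_index(articles):
    """articles: iterable of (date, text) pairs.

    Returns {date: mean polarity of the lexicon words found that day}.
    Days with no lexicon words are omitted.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for date, text in articles:
        for word in text.lower().split():
            word = word.strip(".,;:!?\"'()")  # drop surrounding punctuation
            if word in LEXICON:
                totals[date] += LEXICON[word]
                counts[date] += 1
    return {d: totals[d] / counts[d] for d in counts}

# Made-up headlines, not data from the paper.
articles = [
    ("2013-04-01", "Missile threat raises tension."),
    ("2013-04-02", "Talks resume after aid agreement."),
]
index = daily_peace_index(articles)
```

A mean rather than a sum keeps the index comparable between days with different article volumes; a real system would also need morphological analysis for Korean text.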

Relations Between Paprika Consumption and Unstructured Big Data, and Paprika Consumption Prediction

  • Cho, Yongbeen;Oh, Eunhwa;Cho, Wan-Sup;Nasridinov, Aziz;Yoo, Kwan-Hee;Rah, HyungChul
    • International Journal of Contents
    • /
    • v.15 no.4
    • /
    • pp.113-119
    • /
    • 2019
  • It has been reported that large amounts of information on agri-foods are delivered to consumers through television and social networks, and that this information may influence consumer behavior. The purpose of this paper was, first, to analyze how social network services and broadcast programs relate to paprika consumption in terms of purchase amounts and to identify potential factors that can promote paprika consumption; and second, to develop prediction models of paprika consumption using structured and unstructured big data. Using data from 2010 to 2017 with cross-correlation and time-series prediction algorithms (an autoregressive exogenous model and a vector error correction model), statistically significant lagged correlations were identified between paprika consumption and television programs/shows and blogs mentioning paprika and diet. Adding paprika- and diet-related data improved the predictive performance of the models. This is the first report to predict paprika consumption using both structured and unstructured data.
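
The lagged cross-correlation mentioned in this abstract can be sketched as follows. This is a minimal pure-Python version, assuming equally spaced series of the same length; the series values and the names `pearson` and `lagged_correlations` are illustrative and not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lagged_correlations(driver, target, max_lag):
    """Correlate driver[t] with target[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = driver[: len(driver) - lag]  # drop the last `lag` driver points
        y = target[lag:]                 # drop the first `lag` target points
        result[lag] = pearson(x, y)
    return result

# Made-up series: purchases repeat the mention counts two periods later.
mentions = [1, 3, 2, 5, 4, 6, 8, 7, 9, 10]
purchases = [0, 0] + mentions[:-2]
corr = lagged_correlations(mentions, purchases, 3)
best_lag = max(corr, key=corr.get)  # lag with the strongest correlation
```

The lag with the highest correlation suggests how long media exposure takes to show up in purchases; significance testing and the ARX/VECM models named in the abstract are beyond this sketch.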

Proposal of Standardization Plan for Defense Unstructured Datasets based on Unstructured Dataset Standard Format (비정형 데이터셋 표준포맷 기반 국방 비정형 데이터셋 표준화 방안 제안)

  • Yun-Young Hwang;Jiseong Son
    • Journal of Internet Computing and Services
    • /
    • v.25 no.1
    • /
    • pp.189-198
    • /
    • 2024
  • AI is accepted not only in the private sector but also in the defense sector as a cutting-edge technology that must be introduced for the development of national defense. In particular, artificial intelligence has been selected as a key task in defense science and technology innovation, and the importance of data is increasing. As the national defense department shifts from a closed data policy to data sharing and active use, efforts are being made to secure the high-quality data necessary for the development of national defense. In particular, a review of the business budget system for securing data is being promoted so that related procedures can be improved to reflect the unique characteristics of AI and big data, and so that research and development can begin with sufficiently large quantities of high-quality data. However, although standardization and quality standards are needed for both structured and unstructured data at the national defense level, the defense department has so far proposed them only for structured data, so this gap needs to be filled. In this paper, we propose an unstructured dataset standard format for defense unstructured datasets, which are most needed in defense artificial intelligence, and based on this format we propose a standardization method for defense unstructured datasets.

Standardizing Unstructured Big Data and Visual Interpretation using MapReduce and Correspondence Analysis (맵리듀스와 대응분석을 활용한 비정형 빅 데이터의 정형화와 시각적 해석)

  • Choi, Joseph;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.2
    • /
    • pp.169-183
    • /
    • 2014
  • Massive and various types of data recorded everywhere are called big data. It is therefore important to analyze big data and to find valuable information. In addition, standardizing unstructured big data is important for applying statistical methods. In this paper, we show how to standardize unstructured big data using MapReduce, a distributed processing system. We also apply simple correspondence analysis and multiple correspondence analysis to find the relationships and characteristics of directly related words for Samsung Electronics and The Korea Economic Daily newspaper as well as Apple Inc.
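
The MapReduce step that turns free text into a structured word-count table can be sketched in a single process as below. This only imitates the map, shuffle, and reduce phases in plain Python; it does not use Hadoop, and the documents and function names are made up for illustration.

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc_id, text):
    """Map: emit a (word, 1) pair for every token in one document."""
    return [(word, 1) for word in text.lower().split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)

# Made-up documents standing in for news articles.
docs = {"d1": "apple launches phone", "d2": "samsung launches phone phone"}
mapped = chain.from_iterable(map_phase(k, v) for k, v in docs.items())
counts = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

The resulting word-by-count table is the kind of structured input that a contingency table for correspondence analysis could be built from.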

Design and Implementation of Input and Output System for Unstructured Big Data (비정형 대용량 데이터 입력 및 출력 시스템 설계 및 구현)

  • Kim, Chang-Su;Shim, Kyu-Chul;Kang, Byoung-Jun;Kim, Kyung-Hwan;Jung, Hoe-Kyung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.18 no.2
    • /
    • pp.387-393
    • /
    • 2014
  • In recent years, computers have become increasingly widespread, and efficient processing of unstructured big data is required. In this paper, we propose a system that converts data entered in office files (HWP, MS Office) to XML and then quickly extracts the data typed in a word processor by means of a user-created XML mapping file. In addition, we propose a system that can look up the necessary data from a database through a form entered in advance and convert word processor documents to office files through an application program. This makes unstructured big data available for use.

Classification of Unstructured Customer Complaint Text Data for Potential Vehicle Defect Detection (잠재적 차량 결함 탐지를 위한 비정형 고객불만 텍스트 데이터 분류)

  • Ju Hyun Jo;Chang Su Ok;Jae Il Park
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.46 no.2
    • /
    • pp.72-81
    • /
    • 2023
  • This research proposes a novel approach to tackle the challenge of categorizing unstructured customer complaints in the automotive industry. The goal is to identify potential vehicle defects based on the findings of our algorithm, which can assist automakers in mitigating significant losses and reputational damage caused by mass claims. To achieve this goal, our model uses the Word2Vec method to analyze large volumes of unstructured customer complaint data from the National Highway Traffic Safety Administration (NHTSA). By developing a score dictionary for eight pre-selected criteria, our algorithm can efficiently categorize complaints and detect potential vehicle defects. By calculating the score of each complaint, our algorithm can identify patterns and correlations that can indicate potential defects in the vehicle. One of the key benefits of this approach is its ability to handle a large volume of unstructured data, which can be challenging for traditional methods. By using machine learning techniques, we can extract meaningful insights from customer complaints, which can help automakers prioritize and address potential defects before they become widespread issues. In conclusion, this research provides a promising approach to categorize unstructured customer complaints in the automotive industry and identify potential vehicle defects. By leveraging the power of machine learning, we can help automakers improve the quality of their products and enhance customer satisfaction. Further studies can build upon this approach to explore other potential applications and expand its scope to other industries.

A Study on the Value Evaluation of the Unstructured Data within Enterprise (기업내 비정형 데이터의 가치 평가 모델에 관한 연구)

  • Jang, Man-Chul;Kim, Jeong-Su;Kim, Jong-Hee;Kim, Jong-Bae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2014.05a
    • /
    • pp.367-369
    • /
    • 2014
  • Digital data consist mostly of unstructured data such as text files, office files, image files, video files, and drawing files. The volume of digital data generated and used within enterprises is increasing sharply. These data are becoming significant as digital assets, but their value is not properly evaluated. Accordingly, this study presents a model for evaluating the value of unstructured data as digital assets within an enterprise, along with a differentiated management plan for unstructured data as assets.

An Efficient Information Retrieval System for Unstructured Data Using Inverted Index

  • Abdullah Iftikhar;Muhammad Irfan Khan;Kulsoom Iftikhar
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.7
    • /
    • pp.31-44
    • /
    • 2024
  • An inverted index combines keywords with their associated posting lists for document indexing. In the modern age, heavy use of technology has increased data volume at a very high rate, and big data is a major concern for researchers. Efficient document indexing for big data has become a major challenge. All organizations and web engines have limited resources, such as space and storage, which are crucial to data management in an information retrieval system. An information retrieval system needs to be very efficient. This research introduces an inverted indexing technique to minimize the delay in retrieving data in an information retrieval system. The inverted index is illustrated, and its issues are then discussed and resolved by implementing a scalable inverted index. The existing inverted index algorithm is then compared with the naïve inverted index. The interval lists of the inverted indexes are stored in primary storage instead of auxiliary memory. This research proposes an efficient information retrieval architecture designed particularly for unstructured data, which has no predefined structure, format, or data volume.
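
For readers unfamiliar with the data structure, a naïve in-memory inverted index (keyword to posting list) can be sketched as follows. This is the textbook baseline, not the scalable variant the paper proposes; the documents and function names are illustrative.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search(index, query):
    """AND query: return IDs of documents containing every query term."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))

# Made-up documents for illustration.
docs = {
    1: "big data indexing",
    2: "inverted index for big data",
    3: "retrieval of unstructured data",
}
index = build_inverted_index(docs)
hits = search(index, "big data")
```

Lookup cost depends on the posting lists of the query terms rather than on the total number of documents, which is why the structure suits retrieval over large unstructured collections.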

A Study on the Trends of Construction Safety Accident in Unstructured Text Using Topic Modeling (비정형 텍스트 기반의 토픽 모델링을 이용한 건설 안전사고 동향 분석)

  • Lee, Sang-Gyu
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.10
    • /
    • pp.176-182
    • /
    • 2018
  • To understand and track trends in construction safety accidents, this study identifies topic trends using LDA (Latent Dirichlet Allocation)-based topic modeling. In particular, it identifies the main issues in construction safety accidents through unstructured data analysis based on topic modeling, rather than through the various structured data analyses used to prevent safety accidents in the construction industry. To apply this methodology, I randomly collected 540 news articles about construction accidents published from January 2017 to February 2018. Applying LDA-based topic modeling to this unstructured data, I found 10 topics and identified key issues through 10 keywords for each of the 10 topics. I forecasted topic issues related to construction safety accidents based on a time-series trend analysis of the news data from January 2017 to February 2018. This research suggests ways of using unstructured news article data to anticipate safety policy and research fields and to respond to future construction accident safety issues.

Finite volume method for incompressible flows with unstructured triangular grids (비정렬 삼각격자 유한체적법에 의한 비압축성유동 해석)

  • ;;Kim, Jong-Tae;Maeng, Joo-Sung
    • Transactions of the Korean Society of Mechanical Engineers
    • /
    • v.19 no.11
    • /
    • pp.3031-3040
    • /
    • 1995
  • The two-dimensional incompressible Navier-Stokes equations are solved by a node-centered finite volume method on unstructured triangular meshes. The pressure-velocity coupling is handled by the artificial compressibility algorithm because of its computational efficiency, which is associated with the hyperbolic nature of the resulting equations. The convective fluxes are obtained by Roe's flux difference splitting scheme using edge-based connectivities, and higher-order differences are achieved by a reconstruction procedure. The time integration is based on an explicit four-stage Runge-Kutta scheme. Numerical procedures with local time stepping and implicit residual smoothing have been implemented to accelerate convergence to steady-state solutions. Comparisons with experimental data and other numerical results demonstrate the accuracy and efficiency of the present unstructured approach.
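
An explicit four-stage Runge-Kutta update of the kind named in this abstract can be sketched as below. The abstract does not state the stage coefficients, so the low-storage form with coefficients 1/4, 1/3, 1/2, 1 is an assumption, and a scalar decay equation stands in for the flow residual.

```python
import math

# Assumed stage coefficients (not given in the abstract).
ALPHAS = (0.25, 1.0 / 3.0, 0.5, 1.0)

def four_stage_step(u, dt, residual):
    """Advance du/dt = residual(u) by one step.

    Every stage restarts from the solution at the beginning of the step,
    so only one extra copy of the state is needed.
    """
    u_stage = u
    for alpha in ALPHAS:
        u_stage = u + alpha * dt * residual(u_stage)
    return u_stage

def integrate(u0, dt, steps, residual):
    """Apply the four-stage update repeatedly."""
    u = u0
    for _ in range(steps):
        u = four_stage_step(u, dt, residual)
    return u

# Model problem du/dt = -u with u(0) = 1; the exact value at t = 1 is exp(-1).
u_end = integrate(1.0, 0.1, 10, lambda u: -u)
error = abs(u_end - math.exp(-1.0))
```

In a steady-state flow solver, `dt` would vary per node (local time stepping) and the residual would be smoothed between stages; both are omitted here.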