• Title/Summary/Keyword: Unstructured data analysis

Search results: 426

An Analysis of IT Proposal Evaluation Results using Big Data-based Opinion Mining (빅데이터 분석 기반의 오피니언 마이닝을 이용한 정보화 사업 평가 분석)

  • Kim, Hong Sam;Kim, Chong Su
    • Journal of Korean Society of Industrial and Systems Engineering / v.41 no.1 / pp.1-10 / 2018
  • Current evaluation practices for IT projects suffer from several problems, including the difficulty of explaining evaluation results and an improperly scaled scoring system. This study aims to develop an opinion mining methodology that extracts key factors for causal relationship analysis. It also assesses whether evaluation scores can be quantified from text comments using opinion mining based on big data analysis. The research was performed on publicly procured IT proposal evaluations, which are managed by the National Procurement Service. Around 10,000 sets of comments and evaluation scores were gathered. Most were digital, but some were paper documents, so the text was refined using various tools. From this text, keywords for factors and polarity indicators were extracted, and domain experts selected some of them as key factors and indicators. The keywords were also grouped into dimensions. The causal relationship between keyword or dimension factors and evaluation scores was analyzed with correlation and regression analysis under two research models: a keyword-based model and a dimension-based model. The results show that keyword factors such as planning, strategy, technology, and PM have the greatest effect on evaluation results. They also show that keywords are more appropriate than dimensions as factors for causal relationship analysis. The analysis further indicates that evaluation scores can be composed or calculated from unstructured text comments using opinion mining, provided that a comprehensive polarity dictionary for Korean is available. This study may contribute to big data-based evaluation methodology and to opinion mining for IT proposal evaluation, leading to a more reliable and effective IT proposal evaluation method.
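The claim that scores can be calculated from comments given a polarity dictionary can be illustrated with a minimal lexicon-based scorer. This is a sketch under stated assumptions. The dictionary entries, their weights, and the 0-100 rescaling are hypothetical and are not the authors' dictionary or scoring formula. The sketch also tokenizes English text on whitespace, whereas a Korean pipeline would first need morphological analysis.

```python
# Hypothetical polarity dictionary: token -> weight in [-1, 1].
# Illustrative entries only; not the dictionary built in the study.
POLARITY = {
    "excellent": 1.0,
    "clear": 0.5,
    "feasible": 0.5,
    "vague": -0.5,
    "insufficient": -1.0,
}


def polarity_score(comment: str, lexicon: dict[str, float] = POLARITY) -> float:
    """Average polarity of dictionary words in a comment, rescaled to 0-100.

    Returns 50.0 (neutral) when no dictionary word appears.
    """
    # Naive tokenization: lowercase, drop commas/periods, split on whitespace.
    tokens = comment.lower().replace(",", " ").replace(".", " ").split()
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return 50.0
    mean = sum(hits) / len(hits)  # lies in [-1, 1]
    return 50.0 * (mean + 1.0)    # map [-1, 1] onto [0, 100]


print(polarity_score("Excellent strategy, but the PM plan is vague."))
```

Averaging matched weights keeps long and short comments on the same scale. A production system would also need to handle negation and intensifiers, which this sketch ignores.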

A Meta-Analysis of Influencing Soybean Food Interventions on the Metabolic Syndrome Risk Factors Utilizing Big Data (빅 데이터 분석을 활용한 콩 식품 중재가 대사증후군 위험요인에 미치는 영향 메타분석)

  • Yu, Ok-Kyeong;Cha, Youn-Soo;Jin, Chan-Yong;Nam, Soo-Tai
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.05a / pp.134-137 / 2016
  • Big data analysis refers to the ability to store, manage, and analyze data collected from existing database management tools. It also denotes the technology for extracting value from large structured or unstructured data sets and analyzing the results. Meta-analysis is a statistical integration method that provides an overview of the combined result of integrating and analyzing many quantitative research results. It is sometimes described as an analysis of analyses. The factors of metabolic syndrome are commonly defined as abdominal obesity, high triglycerides, low high-density lipoprotein cholesterol, elevated blood pressure, and elevated fasting glucose. Based on the results of a meta-analysis, this study aims to identify meaningful mediator variables for the criterion variables measured before and after intervention in metabolic syndrome studies. We reviewed a total of five studies on metabolic syndrome published in Korea between 2000 and 2016. In each of these studies, a cause-and-effect relationship is established between the variables specified in this study's conceptual model.
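The abstract does not say which pooling model was used. One standard way to integrate quantitative results is the fixed-effect (inverse-variance weighted) estimate, sketched below as an assumption rather than as the authors' method. A random-effects model such as DerSimonian-Laird is a common alternative when studies are heterogeneous. The effect sizes and variances in the example are hypothetical and are not taken from the five reviewed studies.

```python
import math


def fixed_effect_pool(effects, variances):
    """Fixed-effect (inverse-variance weighted) pooled effect with a 95% CI.

    effects:   per-study effect sizes (e.g. mean differences)
    variances: per-study sampling variances of those effect sizes
    """
    weights = [1.0 / v for v in variances]  # precise studies weigh more
    total_w = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total_w
    se = math.sqrt(1.0 / total_w)  # standard error of the pooled estimate
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)


# Hypothetical effect sizes for three studies (illustration only).
pooled, ci = fixed_effect_pool([-0.4, -0.2, -0.3], [0.04, 0.04, 0.02])
print(round(pooled, 3), tuple(round(x, 3) for x in ci))
```

A confidence interval that excludes zero would suggest a consistent intervention effect across studies. This inference holds only under the fixed-effect assumption of a single true effect.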

A Study on Effective Sentiment Analysis through News Classification in Bankruptcy Prediction Model (부도예측 모형에서 뉴스 분류를 통한 효과적인 감성분석에 관한 연구)

  • Kim, Chansong;Shin, Minsoo
    • Journal of Information Technology Services / v.18 no.1 / pp.187-200 / 2019
  • Bankruptcy prediction models have attracted consistent interest in various fields. Technology for handling unstructured data has recently developed, and research applying text mining to business prediction models has become more active. Studies using this approach are also increasing in bankruptcy prediction. In particular, researchers are actively trying to improve bankruptcy prediction by analyzing news data about the corporation's external environment. However, there has been little research on which of the many news items produced in real time are effective for bankruptcy prediction. The purpose of this study is to identify news with a high impact on bankruptcy prediction. We therefore classified news by type and collection period and analyzed its impact on bankruptcy prediction using sentiment analysis. Among the algorithms used, the artificial neural network was the most effective. Among the news types, commentary news was the most effective for bankruptcy prediction. Column and straight news were also significant, but photo news was not. Among the collection periods, news from the four months before bankruptcy was the most effective for bankruptcy prediction. In this study, we propose a news classification method for sentiment analysis that is effective for bankruptcy prediction models.

A Method of Mining Visualization Rules from Open Online Text for Situation Aware Business Chart Recommendation (상황인식형 비즈니스 차트 추천기 개발을 위한 개방형 온라인 텍스트로부터의 시각화 규칙 추출 방법 연구)

  • Zhang, Qingxuan;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies / v.25 no.1 / pp.83-107 / 2020
  • Selecting business charts based on the nature of the data and the purpose of the visualization is useful in business analysis. However, current visualization tools cannot help users choose the right business chart for the context. Soliciting expert help on visualization methods for every analysis is also inefficient. The purpose of this study is therefore to propose an accessible method that improves business chart productivity by creating chart-selection rules from documents published online. To this end, unstructured Korean, English, and Chinese data describing business charts were collected from the Internet. The relationships between contexts and business charts were then calculated using TF-IDF. We also used a Galois lattice to create rules for business chart selection. To evaluate the adequacy of the rules generated by the proposed method, experiments were conducted with experimental and control groups. The results confirmed that the proposed method extracted meaningful rules. To the best of our knowledge, this is the first study to recommend customized business charts through open unstructured data analysis. It is also the first to propose a method that lets office workers select business charts efficiently without expert assistance. The method should also be useful for staff training, because it recommends business charts based on the document a worker is currently writing.
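The TF-IDF step that relates contexts to charts can be sketched as follows. The sketch treats all collected text describing one chart type as a single document. It then ranks charts by the summed TF-IDF weight of the words in a user's context. This document construction, the toy corpus, and the `rank_charts` helper are hypothetical assumptions. The Galois lattice rule-generation step is not shown.

```python
import math
from collections import Counter


def tfidf(docs: dict[str, list[str]]) -> dict[str, dict[str, float]]:
    """TF-IDF weight of each term in each document: raw tf * log(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    weights = {}
    for name, tokens in docs.items():
        tf = Counter(tokens)
        weights[name] = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    return weights


def rank_charts(context_terms, weights):
    """Hypothetical helper: rank charts by summed TF-IDF of context terms."""
    scores = {chart: sum(w.get(t, 0.0) for t in context_terms)
              for chart, w in weights.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy corpus: one "document" of descriptive text per chart type.
corpus = {
    "line": "trend over time trend change".split(),
    "pie": "share proportion of total share".split(),
    "bar": "compare categories compare over groups".split(),
}
w = tfidf(corpus)
print(rank_charts(["trend", "time"], w))
```

Terms that appear in every chart's description get zero IDF and so do not discriminate between charts. In this toy corpus, "over" gets a lower weight than "trend" because it describes two chart types.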

Analysis of the Yearbook from the Korea Meteorological Administration using a text-mining algorithm (텍스트 마이닝 알고리즘을 이용한 기상청 기상연감 자료 분석)

  • Sun, Hyunseok;Lim, Changwon;Lee, YungSeop
    • The Korean Journal of Applied Statistics / v.30 no.4 / pp.603-613 / 2017
  • Many people have recently posted about their personal interests on social media. The development of the Internet and computer technology has made it possible to store documents in digital form. This has caused an explosion in the amount of text data generated, and consequently demand has grown for technology that creates valuable information from large numbers of documents. Text mining is often used because text data is mostly unstructured and therefore not directly suited to statistical analysis or data mining techniques. This study used text mining to analyze the Meteorological Yearbook data of the Korea Meteorological Administration (KMA). First, a term dictionary was constructed through preprocessing, and a term-document matrix was generated. The term dictionary was then used to calculate the annual frequency of each term and to observe changes in the relative frequency of frequently appearing words. We also used regression analysis to identify terms with increasing and decreasing trends. By analyzing trends in the KMA Meteorological Yearbook, we examined trends in weather-related news, weather conditions, and the work the KMA focused on. This study aims to provide useful information for analyzing and improving meteorological services and for informing meteorological policy.
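The trend step can be sketched as follows: compute each year's relative frequency of a term and fit a least-squares line over the years. A positive slope marks an increasing term and a negative slope a decreasing one. This is a simplified sketch under assumptions. It counts tokens directly instead of building a full term-document matrix, and the example yearbook snippets are invented.

```python
from collections import Counter


def relative_freq_by_year(docs_by_year, term):
    """Relative frequency of `term` among all tokens in each year's documents."""
    out = {}
    for year, docs in sorted(docs_by_year.items()):
        counts = Counter(tok for doc in docs for tok in doc.split())
        total = sum(counts.values())
        out[year] = counts[term] / total if total else 0.0
    return out


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs; its sign gives the trend direction."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


# Invented snippets standing in for yearbook text (illustration only).
docs_by_year = {
    2014: ["heavy rain typhoon", "drought heat"],
    2015: ["heat wave heat", "rain"],
    2016: ["heat heat heat", "heat"],
}
freq = relative_freq_by_year(docs_by_year, "heat")
slope = ols_slope(list(freq), list(freq.values()))
print(freq, slope)
```

Using relative rather than raw frequency keeps years with longer yearbooks from looking like upward trends. A significance test on the slope would be needed before calling a trend real.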

Evaluation Method of Big Data Efficiency (빅 데이터의 효율성 시험 평가 방법)

  • Yang, Hyeong-Sik;Kim, Sun-Bae
    • Journal of Digital Convergence / v.11 no.8 / pp.31-39 / 2013
  • Integration between social media and industry has recently expanded. Internet use has also spread beyond the PC to smart devices such as smartphones and tablet PCs. Together, these trends have generated large amounts of unstructured data and increased interest in big data systems. According to market research institutes, the amount of data is predicted to increase ninefold over the next five years compared with the present, and the big data market is also expected to grow. This paper evaluates big data efficiency testing by identifying and subdividing the efficiency quality evaluation items that a big data system should satisfy, and by analyzing their requirements.

A Keyword Network Analysis of Standard Medical Terminology for Musculoskeletal System Using Big Data (빅데이터를 활용한 근골격계 표준의료용어에 대한 키워드 네트워크 분석)

  • Choi, Byung-Kwan;Choi, Eun-A;Nam, Moon-Hee
    • Journal of Digital Convergence / v.20 no.5 / pp.681-693 / 2022
  • The purpose of this study is to suggest a plan for using unstructured data in the health care field. To do so, it infers standard medical terms related to the musculoskeletal system through keyword network analysis of the medical records of patients hospitalized for musculoskeletal disorders. The analysis covered 145 discharge summaries of patients with musculoskeletal disorders from 2015 to 2019. These were analyzed using TEXTOM, a big data analysis solution developed by The IMC. The final analysis used 177 musculoskeletal-related terms derived through primary and secondary refinement processes. The most frequent term overall was 'Metastasis'. By medical terminology system, the most frequent clinical finding was 'Metastasis', the most frequent symptom 'Weakness', diagnosis 'Hepatitis', treatment 'Remove', and body structure 'Spine'. 'Oxycodone' was the most frequently used drug. Based on these results, we suggest implications for the analysis, use, and management of unstructured medical data.
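A keyword network of this kind is commonly built from term co-occurrence. In this approach, two refined terms are linked whenever they appear in the same discharge summary, and a term's degree counts its distinct neighbours. The sketch below is a generic illustration under that assumption and is not how TEXTOM computes its networks. The term names come from the results above, but the example documents are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs, vocabulary):
    """Count how often each pair of vocabulary terms shares a document."""
    edges = Counter()
    for doc in docs:
        # Keep only refined terms; sort so each pair has one canonical order.
        terms = sorted(set(doc.split()) & vocabulary)
        for a, b in combinations(terms, 2):
            edges[(a, b)] += 1
    return edges


def degree(edges):
    """Number of distinct neighbours of each term (unweighted degree)."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


vocab = {"metastasis", "spine", "weakness", "oxycodone"}
summaries = [  # invented, lowercased toy summaries
    "metastasis spine weakness pain",
    "spine metastasis oxycodone",
    "weakness",
]
edges = cooccurrence_edges(summaries, vocab)
print(edges.most_common(1), degree(edges))
```

Edge weights (co-occurrence counts) and degree are the raw inputs for the centrality measures usually reported in keyword network studies.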

Transonic/Supersonic Nonlinear Aeroelastic Analysis of a Complete Aircraft Using High Speed Parallel Processing Technique (고속 병렬처리 기법을 이용한 전기체 항공기 형상의 천음속/초음속 비선형 공탄성 해석)

  • Kim, Dong-Hyun;Kwon, Hyuk-Jun;Lee, In;Kwon, Oh-Joon;Paek, Seung-Kil;Hyun, Yong-Hee
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.30 no.8 / pp.46-55 / 2002
  • A nonlinear aeroelastic analysis system for transonic and supersonic flows has been developed using high-speed parallel processing on networked PC clusters. The system couples advanced numerical techniques, including computational structural dynamics (CSD), the finite element method (FEM), and computational fluid dynamics (CFD). An unsteady Euler solver on dynamic unstructured meshes is coupled with computational aeroelastic solvers, so the system can provide accurate engineering data for the structural and aeroelastic design of flight vehicles. To demonstrate its practical applicability, transonic and supersonic flutter analyses were conducted for a complete aircraft model under development in Korea.

For airline preferences of consumers Big Data Convergence Based Marketing Strategy (소비자의 항공사 선호도에 대한 빅데이터 융합 기반 마케팅 전략)

  • Chun, Yong-Ho;Lee, Seung-Joon;Park, Su-Hyeon
    • Journal of Industrial Convergence / v.17 no.3 / pp.17-22 / 2019
  • As the value of big data has come to be recognized, governments, public institutions, and private businesses can improve their decision making. They can do so by introducing Java and R programs that analyze vast amounts of existing and unstructured data, and by improving how these programs are developed and used. In this study, news data were collected and analyzed with text mining techniques to establish marketing strategies based on consumers' airline preferences. The research is meaningful because it derives marketing strategies from an analysis of consumers' airline preferences. It does this by applying advanced big data programming techniques to data that were difficult to obtain in the past.

Sentiment Analysis and Opinion Mining: literature analysis during 2007-2016 (감정분석과 오피니언 마이닝: 2007-2016)

  • Li, Jiapei;Li, Xiaomeng;Xiam, Xiam;Kang, Sun-kyung;Lee, Hyun Chang;Shin, Seong-yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2017.05a / pp.160-161 / 2017
  • Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. Opinion mining and sentiment analysis (OMSA) has emerged as a research discipline over the last 15 years. It provides a methodology for computationally processing unstructured data, mainly to extract opinions and identify their sentiments. This relatively new but fast-growing discipline has changed considerably over these years. This paper presents a scientometric analysis of research on OMSA during 2007-2016. For the literature analysis, research publications indexed in the Web of Science (WoS) database are used as input data. The publication data are analyzed computationally to identify the year-wise publication pattern, the growth rate of publications, and the research areas. A more detailed manual analysis is also performed to identify three things: the popular approaches used in these publications (machine learning and lexicon-based), the levels at which sentiment analysis is performed (document, sentence, or aspect level), and the major application areas of OMSA.
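The year-wise publication pattern and growth rate can be computed from a list of publication years, as sketched below. The sketch assumes each record has already been reduced to its publication year. The yearly growth is year-over-year, and the compound annual growth rate (CAGR) is one common summary of overall growth. Both the input format and the choice of CAGR are assumptions, and the example years are invented rather than drawn from WoS.

```python
from collections import Counter


def yearly_counts(pub_years, start, end):
    """Number of publications per year, including years with zero output."""
    counts = Counter(pub_years)
    return {y: counts.get(y, 0) for y in range(start, end + 1)}


def growth_rates(counts):
    """Year-over-year relative growth; skips years whose predecessor is zero."""
    years = sorted(counts)
    return {y: (counts[y] - counts[p]) / counts[p]
            for p, y in zip(years, years[1:]) if counts[p]}


def cagr(first, last, periods):
    """Compound annual growth rate between two counts `periods` years apart."""
    return (last / first) ** (1.0 / periods) - 1.0


# Invented publication years (illustration only).
years = [2007] * 2 + [2008] * 4 + [2009] * 8
c = yearly_counts(years, 2007, 2009)
print(c, growth_rates(c), cagr(c[2007], c[2009], 2009 - 2007))
```

Filling in zero-count years keeps gaps visible in the pattern instead of silently dropping them.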