• Title/Abstract/Keywords: text analytics

109 search results; processing time 0.02 seconds

Identifying Hazard of Fire Accidents in Domestic Manufacturing Industry Using Data Analytics

  • 김경민;서용윤;이종빈;장성록
    • 한국안전학회지 / Vol. 38, No. 4 / pp. 23-31 / 2023
  • Revising the Occupational Safety and Health Act led to enacting and revising related laws and systems, such as placing fire observers in hot workplaces. However, the operating standards in such cases are still ambiguous. Although fire accidents occur through multiple and multi-step factors, the hazards of fire accidents have been identified in this study as individual rather than interrelated factors. The aim has been to identify multiple factors of accidents, outlining fire and explosion accidents that recently occurred in the domestic manufacturing industry. First, major keywords were extracted through text mining. Then representative accident types were derived by combining the main keywords through the co-word network analysis to identify the hazards and their relationships. The representative fire accidents were identified as six types, and their major hazards were then addressed for improving safety measures using the identification of hazards in the "Risk Assessment" tool. It is found that various safety measures, such as professional fire observers' training and clear placement standards, are needed. This study will provide useful basic data for revising practical laws and guidelines for fire accident prevention, system supplementation, safety policy establishment, and future related research.
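The co-word step described in this abstract can be illustrated with a minimal sketch. It assumes each accident report has already been tokenized and that an edge weight is simply the number of reports in which two keywords appear together; the function name `coword_counts`, the sample reports, and the keyword set are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def coword_counts(documents, keywords):
    """Count how often each pair of keywords appears in the same document."""
    keyword_set = set(keywords)
    pair_counts = Counter()
    for tokens in documents:
        # Keep only the keywords present in this document, in a stable order
        # so that each pair is always stored as (smaller, larger).
        present = sorted(set(tokens) & keyword_set)
        for a, b in combinations(present, 2):
            pair_counts[(a, b)] += 1
    return pair_counts


# Hypothetical, made-up token lists standing in for accident reports.
reports = [
    ["welding", "spark", "insulation", "fire"],
    ["welding", "spark", "solvent", "explosion"],
    ["dryer", "dust", "explosion"],
]
keywords = {"welding", "spark", "solvent", "dust", "explosion"}

# Each (keyword, keyword) key is an edge; the count is its weight.
edges = coword_counts(reports, keywords)
```

Pairs with high counts are candidates for combined hazards; how the paper weights or thresholds edges is not stated in the abstract.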

Applications of Machine Learning Models on Yelp Data

  • Ruchi Singh;Jongwook Woo
    • Asia pacific journal of information systems / Vol. 29, No. 1 / pp. 35-49 / 2019
  • The paper documents the application of relevant Machine Learning (ML) models to the Yelp (a crowd-sourced local business review and social networking site) dataset to analyze, predict, and recommend businesses. Two cloud platforms are used strategically to minimize the effort and time required for this project. Seven machine learning algorithms are used in Azure ML, of which four are also implemented in Databricks Spark ML. The analyzed Yelp business dataset contained 70 business attributes for more than 350,000 registered businesses. Additionally, review tips and likes from 500,000 users have been processed for the project. A Recommendation Model is built to provide Yelp users with recommendations for business categories based on their previous business ratings, as well as the business ratings of other users. A Classification Model is implemented to predict the popularity of a business, defining a popular business as one with more than 3 stars and an unpopular business as one with fewer than 3 stars. A Text Analysis model is developed by comparing two algorithms, uni-gram feature extraction and n-feature extraction, in Azure ML Studio and a logistic regression model in Spark. Comparative conclusions are drawn about the efficiency of Spark ML and Azure ML for these models.
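The popularity rule stated in this abstract can be written down as a small labeling function. This is only a sketch: the name `popularity_label` is hypothetical, and because the abstract does not say how a rating of exactly 3 stars is handled, the sketch returns `None` for that case rather than guessing.

```python
from typing import Optional


def popularity_label(stars: float) -> Optional[str]:
    """Label a business from its star rating, following the abstract's rule."""
    if stars > 3:
        return "popular"
    if stars < 3:
        return "unpopular"
    # Exactly 3 stars is not covered by the abstract (assumption: leave unlabeled).
    return None


# Hypothetical ratings, not taken from the Yelp dataset.
labels = [popularity_label(s) for s in (4.5, 2.0, 3.0)]
```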

Topic Modeling with Deep Learning-based Sentiment Filters

  • 최병설;김남규
    • 한국정보시스템학회지:정보시스템연구 / Vol. 28, No. 4 / pp. 271-291 / 2019
  • Purpose: The purpose of this study is to propose a methodology to derive positive keywords and negative keywords through deep learning to classify reviews into positive reviews and negative ones, and then refine the results of topic modeling using these keywords. Design/methodology/approach: In this study, we extracted topic keywords by performing LDA-based topic modeling. At the same time, we performed attention-based deep learning to identify positive and negative keywords. Finally, we refined the topic keywords using these keywords as filters. Findings: We collected and analyzed about 6,000 English reviews of Gyeongbokgung, a representative tourist attraction in Korea, from Tripadvisor, a representative travel site. Experimental results show that the proposed methodology properly identifies positive and negative keywords describing major topics.
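The final filtering step in this abstract can be sketched as follows, assuming the topic keywords (from LDA) and the positive and negative keyword sets (from the attention-based model) are already available. The abstract does not specify the exact filtering rule, so simple set membership is an assumption here, and the function name `filter_topic_keywords` and all sample words are hypothetical.

```python
def filter_topic_keywords(topic_keywords, positive, negative):
    """Split each topic's keywords into positive and negative subsets."""
    refined = {}
    for topic, words in topic_keywords.items():
        refined[topic] = {
            # Preserve the original keyword order within each topic.
            "positive": [w for w in words if w in positive],
            "negative": [w for w in words if w in negative],
        }
    return refined


# Hypothetical inputs standing in for LDA output and sentiment keywords.
topics = {
    "palace": ["beautiful", "crowded", "guard", "hot"],
    "tour": ["free", "guide", "boring"],
}
positive_words = {"beautiful", "free"}
negative_words = {"crowded", "hot", "boring"}

refined = filter_topic_keywords(topics, positive_words, negative_words)
```

Keywords that carry no sentiment, such as "guard" or "guide" in this made-up data, drop out of both lists.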

The Next Generation of Energy News Big Data Analytics

  • 이예찬;조해찬;반재훈
    • 한국정보통신학회:학술대회논문집 / 한국정보통신학회 2016 Fall Conference / pp. 451-453 / 2016
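  • In the information age, in which massive amounts of data are produced and stored, the importance of big data is being emphasized because it makes it possible to infer the future and find a direction from present and past data. Large-scale unstructured data is analyzed and structured on a statistical basis using R, a big data analysis tool. In this paper, we use R to analyze big data related to next-generation energy as it appears in the news. We collect next-generation-energy data from news articles and use the collected keywords to predict the emergence of efficient next-generation energy in the near future. By presenting the flow and direction of energy industry development and deriving technical tasks for decision-making, this work is expected to support flexible management and decision-making and to help anticipate and prevent the root causes of technical problems.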

Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

  • Hwang, Sangwon;Hong, Jang-Eui;Nam, Young-Kwang
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 3 / pp. 1639-1658 / 2019
  • Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems have been employed to identify named-entities using statistical methods based on prior information or linguistic features; however, such methods are limited in that they are unable to recognize unregistered or unlearned objects. In this paper, a method is proposed to extract objects, such as technologies, theories, or person names, by analyzing the collocation relationship between certain words that simultaneously appear around specific words in the abstracts of academic journals. The method is executed as follows. First, the data is preprocessed using data cleaning and sentence detection to separate the text into single sentences. Then, part-of-speech (POS) tagging is applied to the individual sentences. After this, the appearance and collocation information of the other POS tags is analyzed, excluding the entity candidates, such as nouns. Finally, an entity recognition model is created based on analyzing and classifying the information in the sentences.

Analysis of the influence of food-related social issues on corporate management performance using a portal search index

  • Yoon, Chaebeen;Hong, Seungjee;Kim, Sounghun
    • 농업과학연구 / Vol. 46, No. 4 / pp. 955-969 / 2019
  • Analyzing on-line consumer responses is directly related to the management performance of food companies. Therefore, this study collected and analyzed data created by consumers on an on-line portal site about food companies with issues and examined the relationships between the data and management performance. Through this process, we identified consumers' awareness of these companies from big data analysis and analyzed the relationship between the results and the sales and stock prices of the companies through a time-series graph and correlation analysis. The results of this study were as follows. First, the text mining analysis suggests that consumers respond more sensitively to negative issues than to positive issues. Second, the emotional analysis showed that companies' ethics issues (Enterprise 3 and 4) have a higher level of emotional continuity than food safety issues. This can be interpreted to mean that the problem of ethical management has great influence on consumers' purchasing behavior. Finally, in the case of all negative food issues, word frequency and emotional scores showed opposite trends. The correlation analysis found a correlation between word frequency and stock price in the case of all negative food issues, and also between emotional scores and stock price. Recently, studies using big data analytics have been conducted in various fields. Therefore, based on this research, it is expected that studies using big data analytics will be done in the agricultural field.
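The correlation analysis mentioned in this abstract can be illustrated with a short sketch. The abstract does not name the coefficient used, so the Pearson correlation is an assumption here; the function name `pearson` and the two series are hypothetical and contain no data from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and the two root sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (dev_x * dev_y)


# Hypothetical weekly values: mentions of an issue keyword and a stock price.
word_frequency = [10, 20, 30, 40]
stock_price = [100, 90, 80, 70]

r = pearson(word_frequency, stock_price)
```

In this made-up example the price falls linearly as mentions rise, so `r` is close to -1; real series would need to be aligned on the same dates before comparison.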

Big Data Application for Judgment on Consumer's Awareness of the Trademark

  • 유현우;이환수
    • 예술인문사회 융합 멀티미디어 논문지 / Vol. 6, No. 8 / pp. 399-408 / 2016
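  • With the arrival of the big data era, the use of big data is also increasing in the field of intellectual property rights. A trademark is essentially a sign that distinguishes one's own goods from those of others, and its purpose is to be recognized by consumers. Big data analysis technology, which has recently drawn attention, can be used as a tool for judging this consumer awareness of trademarks. With traditional methods, proving distinctiveness, which corresponds to consumer awareness, has not been easy. As interest in surveys as a means of judging trademark distinctiveness has grown, surveys were introduced into the Enforcement Rule of the Korean Trademark Act, but they have turned out to have errors and many problems in terms of cost, time, objectivity, and fairness. As a complement, this study proposes a way to judge consumer awareness of trademarks using big data analysis techniques. Big data analysis can not only raise the objectivity of judgments on trademark distinctiveness but also serve as supporting material for other legal judgments related to trademarks.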

Fire Accident Analysis of Hazardous Materials Using Data Analytics

  • 신은지;고문수;신동일
    • 한국가스학회지 / Vol. 24, No. 5 / pp. 47-55 / 2020
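  • Hazardous materials accidents do not stop at the release of the material; when the initial response is inadequate, they carry a high risk of escalating into fire and explosion and enlarging the scale of damage. However, at a time when the Fourth Industrial Revolution and the rise of the big data era are being discussed, it is regrettable that systematic analysis of hazardous materials accidents based on new techniques has not been attempted and that work remains limited to fragmentary collection of statistics. In this study, we carried out a machine-learning-based analysis of the National Fire Agency's hazardous materials fire accident data accumulated over the past 11 years (2008~2018). The data analyzed through text mining were visualized, and the feasibility of developing a damage-scale prediction model was explored by applying regression analysis to the major factors present in the hazardous materials fire accident data.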

Safeguarding Korean Export Trade through Social Media-Driven Risk Identification and Characterization

  • Sithipolvanichgul, Juthamon;Abrahams, Alan S.;Goldberg, David M.;Zaman, Nohel;Baghersad, Milad;Nasri, Leila;Ractham, Peter
    • Journal of Korea Trade / Vol. 24, No. 8 / pp. 39-62 / 2020
  • Purpose - Korean exports account for a vast proportion of Korean GDP, and large volumes of Korean products are sold in the United States. Identifying and characterizing actual and potential product hazards related to Korean products is critical to safeguard Korean export trade, as severe quality issues can impair Korea's reputation and reduce global consumer confidence in Korean products. In this study, we develop country-of-origin-based product risk analysis methods for social media with a specific focus on Korean-labeled products, for the purpose of safeguarding Korean export trade. Design/methodology - We employed two social media datasets containing consumer-generated product reviews. Sentiment analysis is a popular text mining technique used to quantify the type and amount of emotion that is expressed in the text. It is a useful tool for gathering customer opinions regarding products. Findings - We document and discuss the specific potential risks found in Korean-labeled products and explain their implications for safeguarding Korean export trade. Finally, we analyze the false positive matches that arise from the established dictionaries that were used for risk discovery and utilize these classification errors to suggest opportunities for the future refinement of the associated automated text analytic methods. Originality/value - Various studies have used online feedback from social media to analyze product defects. However, none of them links their findings to trade promotion and the protection of a specific country's exports. Therefore, it is important to fill this research gap, which could help to safeguard export trade in Korea.

Identifying Consumer Response Factors in Live Commerce : Based on Consumer-Generated Text Data

  • 박재형;이한솔;강주영
    • 정보화정책 / Vol. 30, No. 2 / pp. 68-85 / 2023
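  • We collected live commerce broadcast data, classified the broadcasts by their level of chat activity, and analyzed the distribution of text responses generated by consumers within the broadcasts. From a total of 2,282 broadcasts on 'Naver Shopping Live', which has the largest share of the domestic live commerce market, we selected the 200 broadcasts with the most active viewer responses, and from those we finally selected the broadcasts containing intervals in which viewer responses rose or fell sharply. We synthesized variables from the existing literature on live commerce viewing intention and participation motivation to build a variable table suited to the research purpose, and mapped it onto the devices and events within the broadcasts. In this way, the study identified which elements within a broadcast stimulate the consumer response variables found in previous studies, and empirically confirmed through data the psychology of consumers who participate in live commerce.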