• Title/Summary/Keyword: social media data


Evaluation of Classification Algorithm Performance of Sentiment Analysis Using Entropy Score (엔트로피 점수를 이용한 감성분석 분류알고리즘의 수행도 평가)

  • Park, Man-Hee
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.9 / pp.1153-1158 / 2018
  • Among the many available information sources, online customer evaluations and social media information are critical for businesses because they influence customers' decision making. Surveys that try to identify customers' varied needs and complaints are limited by time and cost. Customer review data from online shopping malls provide an ideal source for analyzing customer sentiment about products. In this study, we collected product review data for Samsung and Apple smartphones from Amazon. We applied five classification algorithms used as representative sentiment analysis techniques in previous studies: support vector machines (SVM), bagging, random forest, classification and regression trees (CART), and maximum entropy. We also proposed an entropy score that comprehensively evaluates the performance of a classification algorithm. When the five algorithms were evaluated with the entropy score, the SVM algorithm ranked highest.
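
The abstract does not define the proposed entropy score. As one illustration of how entropy can merge several evaluation metrics into a single ranking, the sketch below uses the standard entropy weight method; this is an assumption for illustration, not the author's formula. A metric whose values differ more across algorithms carries more information and receives a larger weight. The function names and the matrix layout (one row per algorithm, one column per metric) are hypothetical.

```python
import math
from typing import List

def entropy_weights(matrix: List[List[float]]) -> List[float]:
    """Entropy weight method (illustrative, not the paper's definition).

    matrix[i][j] is the non-negative value of metric j for algorithm i,
    e.g. accuracy, precision, recall, F1 for one classifier per row.
    """
    m, n = len(matrix), len(matrix[0])
    if m < 2:
        raise ValueError("need at least two algorithms to compare")
    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total == 0:
            divergences.append(0.0)  # metric carries no information
            continue
        # Shannon entropy of the metric's distribution across algorithms,
        # normalized to [0, 1] by dividing by log(m).
        h = -sum((x / total) * math.log(x / total) for x in column if x > 0)
        e = h / math.log(m)
        divergences.append(max(0.0, 1.0 - e))  # clamp tiny negative rounding errors
    s = sum(divergences)
    if s < 1e-12:
        # Every metric is equally uninformative: fall back to equal weights.
        return [1.0 / n] * n
    return [d / s for d in divergences]

def entropy_scores(matrix: List[List[float]]) -> List[float]:
    """Composite score per algorithm: metric values weighted by entropy weights."""
    weights = entropy_weights(matrix)
    return [sum(w * x for w, x in zip(weights, row)) for row in matrix]
```

A metric on which all algorithms score the same gets a weight near zero, so the ranking is driven by the metrics that actually separate the algorithms.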

Title Generation Model for which Sequence-to-Sequence RNNs with Attention and Copying Mechanisms are used (주의집중 및 복사 작용을 가진 Sequence-to-Sequence 순환신경망을 이용한 제목 생성 모델)

  • Lee, Hyeon-gu;Kim, Harksoo
    • Journal of KIISE / v.44 no.7 / pp.674-679 / 2017
  • In big-data environments where large amounts of text documents are produced daily, titles are important clues that let readers quickly grasp the key ideas of a document; however, many document types, such as blog articles and social media messages, have no title. In this paper, we propose a title-generation model that uses sequence-to-sequence RNNs with attention and copying mechanisms. In the proposed model, input sentences are encoded with bi-directional GRU (gated recurrent unit) networks, and title words are generated by decoding the encoded sentences together with keywords automatically selected from the input sentences. In experiments with 93,631 training documents and 500 test documents, the attention mechanism performed better (ROUGE-1: 0.1935, ROUGE-2: 0.0364, ROUGE-L: 0.1555) than the copying mechanism, and its relative performance in the qualitative evaluation was also higher.
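
The ROUGE figures above measure overlap between a generated title and a reference title. A minimal sketch of ROUGE-N (n-gram overlap) and ROUGE-L (longest common subsequence) follows, assuming whitespace tokenization; the abstract does not say which tokenizer or which ROUGE variant (recall or F-measure) was reported, so the functions return precision, recall, and F1. This is not the official ROUGE toolkit, which adds options such as stemming.

```python
from collections import Counter
from typing import List, Tuple

Scores = Tuple[float, float, float]  # (precision, recall, f1)

def ngrams(tokens: List[str], n: int) -> Counter:
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def prf(match: int, cand_len: int, ref_len: int) -> Scores:
    if match == 0:
        return 0.0, 0.0, 0.0
    p, r = match / cand_len, match / ref_len
    return p, r, 2 * p * r / (p + r)

def rouge_n(candidate: List[str], reference: List[str], n: int = 1) -> Scores:
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    return prf(overlap, sum(cand.values()), sum(ref.values()))

def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def rouge_l(candidate: List[str], reference: List[str]) -> Scores:
    return prf(lcs_length(candidate, reference), len(candidate), len(reference))
```

For Korean titles, tokens would more likely be morphemes than whitespace-separated words; the scoring logic is unchanged.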

Understanding the semantic change of Hangeul using word embedding (단어 임베딩 기법을 이용한 한글의 의미 변화 파악)

  • Sun, Hyunseok;Lee, Yung-Seop;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.34 no.3 / pp.295-308 / 2021
  • In recent years, as the internet and computer technologies have developed, many people post their interests on social media or store documents in digital form, and the amount of text data generated has exploded. Accordingly, demand for technology that extracts valuable information from large document collections is also increasing. In this study, we use statistical techniques to investigate how the meanings of Korean words change over time, using public data from presidential speech records and newspaper articles. Based on this, we present a strategy that can be used to study the diachronic change of Hangeul. The purpose of this study is to move beyond studies of the theoretical linguistic phenomena of Hangeul that have relied on the intuition of linguists or native speakers, by deriving numerical values from public documents that anyone can use and explaining how the meanings of words change.
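
The abstract does not describe the exact method. A common difficulty is that embeddings trained separately on two periods live in unrelated coordinate systems, so a word's vectors cannot be compared directly. One workaround, sketched below as an assumption, is a second-order comparison: represent the word in each period by its cosine similarities to a fixed list of anchor words, then measure how much that profile changed. The embeddings, anchor list, and function names are hypothetical; real vectors would typically come from a model such as word2vec trained on each period's corpus.

```python
import math
from typing import Dict, List, Sequence

Embedding = Dict[str, List[float]]

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_profile(word: str, embeddings: Embedding, anchors: List[str]) -> List[float]:
    """Cosine similarity of `word` to each anchor word, within one period's space."""
    return [cosine(embeddings[word], embeddings[a]) for a in anchors]

def semantic_change(word: str, emb_old: Embedding, emb_new: Embedding,
                    anchors: List[str]) -> float:
    """1 - cosine(old profile, new profile); 0 means the word's relations are unchanged."""
    old = similarity_profile(word, emb_old, anchors)
    new = similarity_profile(word, emb_new, anchors)
    return 1.0 - cosine(old, new)
```

Because only within-period similarities are compared, an arbitrary rotation or reflection of one period's vector space leaves the score unchanged.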

Analysis of Yoga Keywords with Media Big Data (미디어 빅데이터를 통한 요가 관련 키워드 분석)

  • Chi, Dong-Cheol;Lim, Hyu-Seong;Kim, Jong-Hyuck
    • Journal of the Korea Convergence Society / v.13 no.5 / pp.365-372 / 2022
  • South Korea is becoming an aging society, and because the musculoskeletal system directly affects older adults' daily lives, muscle exercise and flexibility are needed. Yoga in particular relaxes the mind and body and improves the ability to cope with stress. To investigate yoga-related keywords, we used BIGKinds, a news analysis system, to collect news articles published from January 1, 2019, to December 31, 2021, and analyzed monthly keywords and keyword relationships by weighted degree. The findings are as follows. First, interest in yoga was high in spring and autumn. Second, yoga is now offered through non-contact methods, and various social network services are used to run classes. Third, articles on yoga instructors and trainers drew high public attention, revealing the importance of and interest in online coaching. The results are expected to serve as base data for developing yoga workout programs and sports for all.
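
The weighted degree of a keyword is the sum of the weights of its edges in the keyword network. The sketch below assumes keywords have already been extracted per article and weights each edge by the number of articles in which the two keywords co-occur; this weighting, the input format, and the function names are assumptions, since the abstract does not describe them. Grouping by month mirrors the monthly analysis described above.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

def cooccurrence_network(documents: Iterable[List[str]]) -> Dict[Tuple[str, str], int]:
    """Edge weight = number of documents in which both keywords appear."""
    weights: Dict[Tuple[str, str], int] = defaultdict(int)
    for keywords in documents:
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return weights

def weighted_degree(weights: Dict[Tuple[str, str], int]) -> Dict[str, int]:
    """Sum of incident edge weights for every keyword."""
    degree: Dict[str, int] = defaultdict(int)
    for (a, b), w in weights.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)

def monthly_weighted_degree(articles: Iterable[Tuple[str, List[str]]]) -> Dict[str, Dict[str, int]]:
    """articles: (month, keywords) pairs, e.g. ("2021-03", ["yoga", "online"])."""
    by_month: Dict[str, List[List[str]]] = defaultdict(list)
    for month, keywords in articles:
        by_month[month].append(keywords)
    return {month: weighted_degree(cooccurrence_network(docs))
            for month, docs in by_month.items()}
```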

Implementation of a Model for Predicting the Distance between Hazardous Objects and Workers in the Workplace using YOLO-v4 (YOLO-v4를 활용한 작업장의 위험 객체와 작업자 간 거리 예측 모델의 구현)

  • Lee, Taejun;Cho, Minwoo;Kim, Hangil;Kim, Taekcheon;Jung, Heokyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.332-334 / 2021
  • As fatal industrial accidents and deaths from civil accidents came to be recognized as social problems, the Act on Punishment of Serious Accidents Occurred in the Workplace was enacted to ensure the safety of citizens, and efforts are required to prevent serious accidents in advance. In this paper, we propose a distance prediction model for cases in which a worker is struck by heavy equipment such as a forklift. The data were captured directly by CCTV in an environment where real forklift trucks and workers move around, and distances were computed as Euclidean distances. We expect that YOLO-v4 can be trained on a dataset built directly at an industrial site and used to implement a model that predicts the distance and determines whether a situation is dangerous; such a model can serve as base data for a comprehensive model for judging risk situations.
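
The abstract states only that the worker-to-forklift distance is based on the Euclidean distance. A minimal sketch under stated assumptions: detections arrive from a YOLO-style detector as labeled pixel boxes, distance is measured between box centers in image coordinates, and any worker/forklift pair closer than a threshold is flagged. The class labels, the center-point choice, the `Detection` type, and the threshold are hypothetical; converting pixels to meters would require camera calibration, which the abstract does not describe.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Detection:
    label: str  # detector class name (assumed labels: "person", "forklift")
    x1: float   # box corners in pixel coordinates
    y1: float
    x2: float
    y2: float

    def center(self) -> Tuple[float, float]:
        return ((self.x1 + self.x2) / 2, (self.y1 + self.y2) / 2)

def euclidean(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    return math.hypot(p[0] - q[0], p[1] - q[1])

def dangerous_pairs(detections: List[Detection], threshold: float,
                    worker: str = "person", hazard: str = "forklift"):
    """Return (worker, hazard, distance) for every pair closer than `threshold` pixels."""
    workers = [d for d in detections if d.label == worker]
    hazards = [d for d in detections if d.label == hazard]
    result = []
    for w in workers:
        for h in hazards:
            dist = euclidean(w.center(), h.center())
            if dist < threshold:
                result.append((w, h, dist))
    return result
```

Box centers are a simplification; a ground-contact point such as the bottom-center of each box may better approximate position on the floor.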


A Design and Development of Big Data Indexing and Search System using Lucene (루씬을 이용한 빅데이터 인덱싱 및 검색시스템의 설계 및 구현)

  • Kim, DongMin;Choi, JinWoo;Woo, ChongWoo
    • Journal of Internet Computing and Services / v.15 no.6 / pp.107-115 / 2014
  • Recently, growing internet use has generated large and diverse types of data, driven by increased use of social media, expanding convergence among industries, and the use of various smart devices. Such data are difficult to manage and analyze with previous data processing techniques because the volume is huge and the form of the data varies and evolves rapidly; in other words, a new approach is needed. Many approaches to this issue are being studied, and we describe an effective design and implementation of the indexing engine of a big data platform. Our goal is to build a system that can effectively manage huge datasets exceeding the range of previous data processing and that can reduce data analysis time. We used large SNMP log data in an experiment and aimed to reduce data analysis time through fast indexing and searching. We also expect our approach to help analyze user data through visualization of the analyzed data.
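
Lucene is a Java library built around an inverted index, which maps each term to the documents containing it, so a query only needs to read the posting lists of its own terms. The Python sketch below illustrates that idea only; it does not use or imitate Lucene's API, and the tokenizer, class name, and sample log lines are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def tokenize(text: str) -> List[str]:
    """Lowercase word tokens (a simplistic stand-in for a real analyzer)."""
    return re.findall(r"\w+", text.lower())

class InvertedIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        # Record doc_id in the posting list of every term it contains.
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> List[int]:
        """AND query: sorted ids of documents containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get() avoids inserting empty entries into the defaultdict.
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)
```

Usage: after `idx.add(1, "linkDown ifIndex 3")` and `idx.add(2, "linkUp ifIndex 3")`, the query `idx.search("ifIndex 3")` returns both ids without scanning the log text again.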

An Asian Airline Implementation of Smartphone Collaboration: From Training to Operations (스마트폰을 활용한 항공사의 협업 사례 연구: 훈련 기간과 운영 기간의 차이 분석)

  • Dionne, Dante;Schutz, Douglas M.;Kim, Yong-Young
    • Journal of the Korea Convergence Society / v.9 no.10 / pp.303-313 / 2018
  • To provide quality services across international airports, airline personnel must develop and share knowledge rapidly and effectively. Combining components of adaptive structuration theory (AST) and media synchronicity theory (MST), we developed a research framework that conveys three distinct stages of knowledge sharing. We applied the grounded theory research method to qualitative data collected from audio transcripts of employees learning to use and work with company-issued smartphones with push-to-talk functionality. Data were collected from 33 operations personnel. The results of the content analysis are reported for the elements of each of the three concepts in our research framework. For the social interaction stage, the content of the audio conversations shifts mainly from conflict management to task management; for media synchronicity, from quality to quantity; and for productive outcomes, from efficiency to commitment. Our analysis of field data uncovers new insights as users advance from learning to use the mobile devices to using them to manage knowledge for their work in the airline industry.

Assessment of Public Awareness on Invasive Alien Species of Freshwater Ecosystem Using Conservation Culturomics (보전문화체학 접근방식을 통한 생태계교란 생물인 담수 외래종의 대중인식 평가)

  • Park, Woong-Bae;Do, Yuno
    • Journal of Wetlands Research / v.23 no.4 / pp.364-371 / 2021
  • Public awareness of alien species can vary by generation, period, or specific events associated with these species. Understanding public awareness is important for managing alien species because differences in awareness can affect how management plans are established and implemented. We analyzed digital texts from social media platforms, news articles, and internet search volumes, as used in conservation culturomics, to understand public interest in and sentiment toward alien freshwater species. To measure public interest, we extracted the number of tweets, the number of news articles, and the relative search volume for 11 freshwater alien species. We also examined the trends over time, seasonal variability, and periodicity of these data. In addition, we calculated sentiment scores and analyzed public sentiment in the collected data using sentiment analysis based on text mining techniques. The American bullfrog, nutria, bluegill, and largemouth bass drew relatively more public interest than the other species. For some species, the number of Twitter posts, media coverage, and internet searches showed recurring patterns over specific periods. The text mining results showed that most people held negative sentiments toward alien freshwater species, and negative sentiment increased in the years after these species were designated as ecologically disturbing species.
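
The abstract does not specify the sentiment lexicon or scoring formula. A minimal lexicon-based sketch, assuming a hypothetical word-to-polarity dictionary, scores each text as (positive hits minus negative hits) divided by all hits, giving a value in [-1, 1], and then averages scores per year; this is one way to track the kind of year-over-year shift in sentiment reported above, not the authors' exact procedure.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def sentiment_score(tokens: List[str], lexicon: Dict[str, int]) -> float:
    """(pos - neg) / (pos + neg); 0.0 when no lexicon word appears."""
    pos = sum(1 for t in tokens if lexicon.get(t, 0) > 0)
    neg = sum(1 for t in tokens if lexicon.get(t, 0) < 0)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0

def yearly_mean(scored: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average score per year from (year, score) pairs, sorted by year."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for year, score in scored:
        totals[year] += score
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}
```

For Korean text, tokens would normally be produced by a morphological analyzer before lexicon lookup.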

An Analysis of the Public Data for Making the Ambient Intelligent Service (공간지능화서비스 구현을 위한 공공데이터 분석)

  • Kim, Mi-Yun;Seo, Dong-Jo
    • Journal of Digital Convergence / v.12 no.12 / pp.313-321 / 2014
  • In today's digital era, which produces enormous amounts of data, smart spaces that create, collect, and represent data have emerged in increasingly diversified cities. Since 2012, in a hyper-connected social media environment with widespread smartphone use, people have become interested in public data and big data through ubiquitous mobile devices and SNS. Early efforts focused on developing data platforms, but many ideas from diverse fields have since been proposed for analyzing and using data to visualize ambient intelligent (space intellectualization) services. Focusing on visualization as a way to increase the use of public data by ordinary people rather than specialists, this research surveys the current state of open data and public data services on the public data portal and considers their applicability. The results suggest that analyzing and applying data for ordinary people reduces the use of paper documents, and this research is expected to help develop applications that respond quickly and accurately to individual behavior and demand by using public data services in intelligent spaces.

Improving the Accuracy of Document Classification by Learning Heterogeneity (이질성 학습을 통한 문서 분류의 정확성 향상 기법)

  • Wong, William Xiu Shun;Hyun, Yoonjin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.21-44 / 2018
  • In recent years, the rapid development of internet technology and the popularization of smart devices have produced massive amounts of text data, distributed through media platforms such as the World Wide Web, internet news feeds, microblogs, and social media. However, this enormous amount of easily obtained information lacks organization, which has drawn many researchers to the problem of managing it. Managing it also requires the ability to classify relevant information, hence the need for text classification. Text classification, the task of assigning a text document to one or more predefined categories or classes, is a challenging problem in modern data analysis. Available techniques include K-Nearest Neighbor, Naïve Bayes, Support Vector Machine, Decision Tree, and Artificial Neural Network. However, with huge amounts of text data, model performance and accuracy become a challenge. The performance of a text classification model can vary with the types of words used in the corpus and the types of features created for classification. Most attempts so far have proposed a new algorithm or modified an existing one, and this line of research may already have reached its limits. In this study, rather than proposing or modifying an algorithm, we look for a way to change how the data are used. It is widely known that classifier performance depends on the quality of the training data on which the classifier is built. Real-world datasets usually contain noise, which can affect the decisions made by classifiers built from them.
In this study, we consider that data from different domains (heterogeneous data) may have noise-like characteristics that can be exploited in the classification process. Machine learning algorithms build classifiers on the assumption that the training data and target data have the same or very similar characteristics. For unstructured data such as text, however, the features are determined by the vocabulary in the documents, so if the training data and target data are written from different viewpoints, their features may differ. We attempt to improve classification accuracy by strengthening the robustness of the document classifier through artificially injecting noise into the process of constructing it. Data from various sources are likely to be formatted differently. This causes difficulties for traditional machine learning algorithms, which are not designed to recognize different data representations at once and generalize over them together. Therefore, to use heterogeneous data in training the document classifier, we apply semi-supervised learning. Because unlabeled data may degrade the performance of the document classifier, we further propose a method called Rule Selection-Based Ensemble Semi-Supervised Learning Algorithm (RSESLA) that selects only the documents that contribute to improving the classifier's accuracy. RSESLA creates multiple views by manipulating the features using different types of classification models and different types of heterogeneous data. The most confident classification rules are selected and applied for the final decision.
Three types of real-world data sources were used in this paper: news, Twitter, and blogs.
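
RSESLA is not described here in enough detail to reproduce. As a point of reference, the sketch below shows plain self-training, a generic semi-supervised loop related to the confidence-based selection described above: train on labeled documents, pseudo-label unlabeled documents, keep only predictions above a confidence threshold, and retrain. The multinomial naive Bayes classifier, the threshold, and all names are assumptions; RSESLA's multiple views and rule selection are not modeled.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing over token lists."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.counts = {c: Counter() for c in self.classes}
        for tokens, c in zip(docs, labels):
            self.counts[c].update(tokens)
        self.vocab = {t for c in self.classes for t in self.counts[c]}
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict_proba(self, tokens: List[str]) -> Dict[str, float]:
        v = len(self.vocab)
        logp = {}
        for c in self.classes:
            lp = math.log(self.prior[c] / self.n)
            for t in tokens:
                lp += math.log((self.counts[c][t] + 1) / (self.totals[c] + v))
            logp[c] = lp
        top = max(logp.values())  # subtract the max for numerical stability
        exp = {c: math.exp(lp - top) for c, lp in logp.items()}
        z = sum(exp.values())
        return {c: e / z for c, e in exp.items()}

def self_train(labeled: List[List[str]], labels: List[str], unlabeled: List[List[str]],
               threshold: float = 0.9, max_rounds: int = 5) -> Tuple[NaiveBayes, int]:
    """Return the final model and how many pseudo-labeled documents were added."""
    docs, ys, pool = list(labeled), list(labels), list(unlabeled)
    model = NaiveBayes().fit(docs, ys)
    for _ in range(max_rounds):
        confident, rest = [], []
        for tokens in pool:
            label, p = max(model.predict_proba(tokens).items(), key=lambda kv: kv[1])
            (confident if p >= threshold else rest).append((tokens, label))
        if not confident:
            break  # nothing passes the confidence filter; stop early
        for tokens, label in confident:
            docs.append(tokens)
            ys.append(label)
        pool = [tokens for tokens, _ in rest]
        model = NaiveBayes().fit(docs, ys)
    return model, len(docs) - len(labeled)
```

The threshold plays the role of the filter against harmful unlabeled data: raising it admits fewer pseudo-labels, lowering it admits more at greater risk of adding noise.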