• Title/Summary/Keyword: textual analysis

A Tensor Space Model based Deep Neural Network for Automated Text Classification (자동문서분류를 위한 텐서공간모델 기반 심층 신경망)

  • Lim, Pu-reum;Kim, Han-joon
    • Database Research / v.34 no.3 / pp.3-13 / 2018
  • Text classification is a text mining technology that assigns a given textual document to its appropriate categories, and it is used in fields such as spam email detection, news classification, question answering, sentiment analysis, and chatbots. Text classification systems generally rely on machine learning algorithms; among these, naïve Bayes and support vector machines, which are well suited to text data, are known to perform reasonably well. Recently, with the development of deep learning, several studies have applied deep neural networks such as recurrent neural networks (RNN) and convolutional neural networks (CNN) to improve the performance of text classification systems. However, current text classification techniques have not yet achieved perfect classification performance. This paper focuses on the fact that text data is expressed as a vector with only word dimensions, which impairs the semantic information inherent in the text, and proposes a neural network architecture based on the semantic tensor space model.

Analysis of Forwarding Schemes to Mitigate Data Broadcast Storm in Connected Vehicles over VNDN

  • Hur, Daewon;Lim, Huhnkuk
    • Journal of the Korea Society of Computer and Information / v.26 no.3 / pp.69-75 / 2021
  • The limitations of TCP/IP network technology in vehicle communication stem from the frequent mobility of vehicles, the growing need for intermittent connectivity, and the constant possibility of vehicle hacking. VNDN technology enables the requested name to be delivered using textual information, without the need for vehicle identifiers such as IP/ID. In addition, it supports intermittently connected communication rather than end-to-end connections. The data itself is the subject of communication, based on name-based forwarding with two types of packets: Interest packets and Data packets. One of the issues to be solved in realizing infotainment services in the VNDN environment is the traffic explosion caused by data broadcasting. In this paper, we analyze and compare existing techniques for reducing the data broadcast storm. From this, we derive and analyze the requirements for the best mitigation technique for the data explosion phenomenon in the VNDN environment. We expect this paper to serve as prior knowledge for research on improved forwarding techniques that resolve the data broadcast explosion in connected vehicles over NDN.

Digital Marketing Tools for Managing the Development of Park and Recreation Complexes

  • Chaikovska, Maryna;Mashika, Hanna;Mankovska, Ruslana;Liulchak, Zoreslava;Haida, Pavlo;Diakova, Yana
    • International Journal of Computer Science & Network Security / v.22 no.5 / pp.154-162 / 2022
  • Digital marketing tools are actively used in managing the development of park and recreation complexes to familiarize the population with objects of natural heritage. This article aims to empirically evaluate digital marketing tools for popularizing park and recreation complexes. The methodology was based on the concept of the ecosystem value of park and recreation complexes as natural heritage sites. The methods included identifying and selecting websites with information about park and recreation complexes in Slovakia and Ukraine, a structural analysis of the main channels of online information about natural parks, and an assessment of the current state of the online identity of the studied sites from the perspective of Internet users. The results indicate that, to manage their development, park and recreation complexes have developed their own official websites, whose sections structure the information and highlight data on tourism and recreational potential. The article identifies additional digital marketing tools for managing the development of park and recreation complexes, particularly social networks and tourist websites. These contain a sufficient amount of information about tourist recreation sites within the natural parks and about tourist routes. The main problems of the websites are as follows: the information is entirely textual; there is insufficient data on social networks despite the existence of official pages; there is no video content, which would attract more tourists and visitors by allowing a visual assessment of the tourist potential; and the natural heritage of the countries is presented across too many communication channels. The research shows that the website is the primary and most common digital marketing tool for natural heritage, structuring information about tourism potential and recreation.

Comparison and Analysis of Unsupervised Contrastive Learning Approaches for Korean Sentence Representations (한국어 문장 표현을 위한 비지도 대조 학습 방법론의 비교 및 분석)

  • Young Hyun Yoo;Kyumin Lee;Minjin Jeon;Jii Cha;Kangsan Kim;Taeuk Kim
    • Annual Conference on Human and Language Technology / 2022.10a / pp.360-365 / 2022
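  • Sentence representations are one of the key tools that can be usefully applied to solving various problems and developing applications in natural language processing. However, sentence representations derived from the pre-trained language models that have recently been widely adopted are known to perform below expectations on tasks such as Semantic Textual Similarity (STS) measurement because of their inherent characteristics, such as pronounced anisotropy. To address this problem, research applying contrastive learning to pre-trained language models has been actively pursued in the literature, and among these approaches, unsupervised contrastive learning methods that use unlabeled data have attracted attention. However, most existing studies have focused mainly on improving English sentence representations, and corresponding research on Korean sentence representations is relatively scarce. In this paper, we therefore apply representative unsupervised contrastive learning methods (ConSERT, SimCSE) to various Korean pre-trained language models (KoBERT, KR-BERT, KLUE-BERT) and evaluate them on sentence similarity tasks (KorSTS, KLUE-STS). As a result, we confirm that Korean generally shows tendencies similar to those of English, and we additionally observe the following new findings. First, across all the unsupervised contrastive learning methods used, KLUE-BERT showed more stable and better performance than KoBERT and KR-BERT. Second, among the data augmentation methods introduced in ConSERT, token shuffling showed high performance overall. Third, both unsupervised contrastive learning methods were found to overfit to the KLUE-STS training data used as validation data. In conclusion, this study verifies that, as with English, the performance of Korean sentence representations can be improved by applying unsupervised contrastive learning, and we hope these results will serve as a foundation for future research on Korean sentence representations.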
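
As a rough illustration of the unsupervised contrastive objective this abstract refers to, the following minimal sketch computes an in-batch contrastive (InfoNCE-style) loss of the kind used by SimCSE-like methods, plus a simplified token-shuffling augmentation. It is not the authors' implementation: the function names, the toy 2-dimensional embeddings, and the temperature of 0.05 are assumptions, and the shuffling here simply reorders a token list rather than reproducing ConSERT's exact procedure.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def info_nce(anchors, positives, temperature=0.05):
    """Mean in-batch contrastive loss: anchor i should match positive i,
    while the other positives in the batch act as negatives."""
    losses = []
    for i, h in enumerate(anchors):
        logits = [cosine(h, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

def shuffle_tokens(tokens, seed=0):
    """Simplified token-shuffling augmentation: same tokens, random order."""
    rng = random.Random(seed)
    shuffled = tokens[:]
    rng.shuffle(shuffled)
    return shuffled

# Toy embeddings (assumed values): aligned pairs should give a lower loss
# than deliberately mismatched pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, [[0.9, 0.1], [0.1, 0.9]])
swapped = info_nce(anchors, [[0.1, 0.9], [0.9, 0.1]])
augmented = shuffle_tokens(["a", "b", "c", "d"])
```

In practice the embeddings would come from a pre-trained encoder such as the Korean models named in the abstract; the toy vectors only show that the loss rewards matching each sentence with its own augmented view.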

English Predicate Inversion: Towards Data-driven Learning

  • Kim, Jong-Bok;Kim, Jin-Young
    • Journal of English Language & Literature / v.56 no.6 / pp.1047-1065 / 2010
  • English inversion constructions are not only hard for non-native speakers to learn but also difficult to teach, mainly because of their intriguing grammatical and discourse properties. This paper addresses grammatical issues in learning or teaching the so-called 'predicate inversion (PI)' construction (e.g., Equally important in terms of forest depletion is the continuous logging of the forests). In particular, we chart the grammatical (distributional, syntactic, semantic, pragmatic) properties of the PI construction and argue for a data-driven approach to teaching English grammar. To depart from the armchair style of grammar teaching (relying on author-made simple sentences), our teaching method introduces data-driven teaching. A total of 25 university students in a grammar-related class together analyzed the British Component of the International Corpus of English (ICE-GB), which contains about one million words distributed across a variety of textual categories. We identified a total of 290 PI sentences (206 from spoken and 87 from written texts). The preposed syntactic categories of the PI involve five main types: AdvP, PP, VP(ed/ing), NP, AP, and so, all of which function as the complement of the copula. In terms of discourse, our findings support Birner and Ward's (1998) observation that these preposed phrases represent more familiar information than the postposed subject. The corpus examples gave us three possible types. The preposed element is discourse-old whereas the postposed one is discourse-new, as in Putting wire mesh over a few bricks is a good idea. Both preposed and postposed elements can also be discourse-new, as in But a fly in the ointment is inflation. These two elements can also be discourse-old, as in Racing with him on the near-side is Rinus. The dominant occurrence of the PI in the spoken texts also supports the view that the balance (or scene-setting) in information structure is the main trigger for the use of the PI construction. After being exposed to the real data and to an in-depth syntactic and information-structure analysis of the PI construction, the students in the class showed a far clearer understanding of the construction in question and realized that grammar does not stand on its own but interacts tightly with other important grammatical components such as information structure. The study directs us toward grammar teaching that is both data-driven and interactive.

Possibilities and Limitations of Media Representation as the Historical Communication -Focusing on Korea Films of Gwangju Democratization Movement in 2000s- (역사적 소통 공간으로써 미디어 재현의 가능성과 한계 -2000년대 한국 영화 속 광주 민주화 운동을 중심으로-)

  • Kim, Mi-Sun;Kim, Yu-Rye
    • The Journal of the Korea Contents Association / v.15 no.7 / pp.157-169 / 2015
  • This study focuses on Korean films as a form of historical communication. Narrative analysis was conducted on films of the 2000s, including , and <26 Years>, that mainly deal with the 'Gwangju Democratization Movement'. The syntagmatic analysis shows that these films try to stabilize 'social imbalances' at the level of individuals and conceal issues of social structure. In addition, the paradigmatic analysis reveals that the textual factors of 'active involvement of female characters' and 'continuity of history through the survivors' demonstrate the films' strategies for publicizing the historical truth. Consequently, these films show their limitations in that they weaken the historical meaning by placing unsolved problems of social structure alongside a love story. However, rather than describing the event as a history of the past, these films act as a catalyst that brings this specific historical issue into our present lives and publicizes it as a current issue. Therefore, the historical film not only reminds the current generation of history but also provides an opportunity to publicize important issues of social structure in present society.

A Cultural Analysis of Self-introduction Letters by Young Job Seekers (청년주체들의 '자기소개서' 작성을 중심으로 한 구직 경험의 문화적인 분석)

  • Lee, Kee-hyeung;Song, Dong-Wook;Koo, Seung-Woo;Jeong, Jun;Kim, Ji-Su;Lee, Dan-Bi;Park, Ju-Hwa
    • Korean journal of communication and information / v.72 / pp.7-51 / 2015
  • Job seeking for young adults after college in South Korea is fierce and highly competitive. Many job seekers tend to experience despair, frustration, and insecurity in such a dire social situation. This study focuses on the job-seeking experiences of the younger generation by closely examining their self-introduction letters. It pays close attention to the narrative strategies and the portrayal of the applicants' self-described activities in these letters through a detailed textual and cultural analysis. In doing so, the analysis attempts to contextualize the complex structures of feeling on the part of young job seekers, as well as the various social factors and pressures that influence them.

Occupational Therapy in Long-Term Care Insurance For the Elderly Using Text Mining (텍스트 마이닝을 활용한 노인장기요양보험에서의 작업치료: 2007-2018년)

  • Cho, Min Seok;Baek, Soon Hyung;Park, Eom-Ji;Park, Soo Hee
    • Journal of Society of Occupational Therapy for the Aged and Dementia / v.12 no.2 / pp.67-74 / 2018
  • Objective: The purpose of this study is to quantitatively analyze the role of occupational therapy in long-term care insurance for the elderly using text mining, one of the big data analysis techniques. Method: For the analysis of newspaper articles, articles matching "Long-Term Care Insurance for the Elderly + Occupational Therapy for the Elderly" were collected for the period from 2007 to 2018. The database of Naver News, from Naver, which holds a high share of the domestic search engine market, was accessed using Textom, a web crawling tool. After collecting the article titles and original texts of 510 news items returned by the "elderly long-term care insurance + occupational therapy" search, we analyzed article frequency and keywords by year. Result: In terms of the frequency of articles published by year, 2015 and 2017 had the highest number, with 70 articles (13.7%), and among the top 10 terms of the keyword analysis, 'dementia' had the highest frequency (344). The related keywords were dementia, treatment, hospital, health, service, rehabilitation, facilities, institution, grade, elderly, professional, salary, industrial complex, and people. Conclusion: This study is meaningful in that the text mining technique was used to confirm more objectively the social needs and the role of the occupational therapist in dementia and rehabilitation, as reflected in the related keywords, based on the media reporting trend on long-term care insurance for the elderly over 11 years. Based on the results of this study, future research should expand the research field and period and supplement the research methodology with various analysis methods by year.
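
The per-year article counts and keyword frequencies described in this abstract can be sketched with a simple counter. This is a minimal illustration only, not the study's Textom workflow: the toy records, the function names, and the whitespace tokenization are all assumptions.

```python
from collections import Counter

# Hypothetical toy records standing in for crawled news articles: (year, text).
articles = [
    (2015, "dementia care hospital service"),
    (2015, "dementia rehabilitation facilities"),
    (2017, "elderly dementia treatment"),
]

def articles_per_year(records):
    """Count how many articles were published in each year."""
    return Counter(year for year, _ in records)

def top_keywords(records, n=10):
    """Return the n most frequent whitespace-separated tokens across all texts."""
    counts = Counter()
    for _, text in records:
        counts.update(text.lower().split())
    return counts.most_common(n)

year_counts = articles_per_year(articles)
keywords = top_keywords(articles, n=3)
```

A real analysis of Korean news text would need a morphological analyzer rather than whitespace splitting; the sketch only shows the counting step.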

Sentiment Analysis of Product Reviews to Identify Deceptive Rating Information in Social Media: A SentiDeceptive Approach

  • Marwat, M. Irfan;Khan, Javed Ali;Alshehri, Dr. Mohammad Dahman;Ali, Muhammad Asghar;Hizbullah;Ali, Haider;Assam, Muhammad
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.830-860 / 2022
  • [Introduction] Nowadays, many companies are shifting their businesses online due to the growing trend among customers to shop online, as people prefer purchasing products online. [Problem] Users share a vast amount of information about products, making it difficult and challenging for end-users to make certain decisions. [Motivation] Therefore, we need a mechanism to automatically analyze end-user opinions, thoughts, or feelings about products on social media platforms, which might help customers make or change their decisions about purchasing specific products. [Proposed Solution] For this purpose, we propose an automated SentiDeceptive approach, which classifies end-user reviews into negative, positive, and neutral sentiments and identifies deceptive crowd-user rating information on social media platforms to help users in decision-making. [Methodology] We first collected 11781 end-user comments from the Amazon store and the Flipkart web application covering distinct products, such as watches, mobiles, shoes, clothes, and perfumes. Next, we developed a coding guideline used as a base for the comment annotation process. We then applied the content analysis approach and the existing VADER library to annotate the end-user comments in the data set with the identified codes, which results in a labelled data set used as input to the machine learning classifiers. Finally, we applied the sentiment analysis approach to identify the end-users' opinions and overcome the deceptive rating information on social media platforms by first preprocessing the input data to remove irrelevant data (stop words, special characters, etc.) from the data set, employing two standard resampling approaches to balance the data set, i.e., oversampling and under-sampling, extracting different features (TF-IDF and BOW) from the textual data in the data set, and then training and testing the machine learning algorithms by applying standard cross-validation approaches (KFold and Shuffle Split). [Results/Outcomes] Furthermore, to support our research study, we developed an automated tool that analyzes each customer's feedback and displays the collective sentiments of customers about a specific product with the help of a graph, which helps customers make certain decisions. In a nutshell, our proposed sentiment approach produces good results when identifying customer sentiments from online user feedback, i.e., it obtained an average of 94.01% precision, 93.69% recall, and 93.81% F-measure for classifying positive sentiments.
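
The preprocessing, feature extraction, resampling, and cross-validation steps named in this abstract can be sketched in plain Python. This is a hedged illustration under stated assumptions, not the SentiDeceptive implementation: the stop-word list, the toy reviews and labels, the function names, and the particular TF-IDF weighting (term frequency times log(N / df)) are all assumptions, the k-fold split is contiguous without shuffling, and no classifier is trained.

```python
import math
import random
from collections import Counter

STOP_WORDS = {"the", "a", "is", "and", "this"}  # tiny illustrative list (assumed)

def tokenize(text):
    """Lowercase, replace special characters with spaces, drop stop words."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({t: (c / len(toks)) * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)
    by_label = {}
    for s, y in zip(samples, labels):
        by_label.setdefault(y, []).append(s)
    target = max(len(group) for group in by_label.values())
    out_s, out_y = [], []
    for y, group in by_label.items():
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for s in group + extra:
            out_s.append(s)
            out_y.append(y)
    return out_s, out_y

def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

# Hypothetical reviews and labels, for illustration only.
reviews = ["Great watch, love it!", "Terrible shoes.",
           "Great perfume and great price", "The watch is okay"]
labels = ["positive", "negative", "positive", "neutral"]

vectors = tfidf(reviews)
balanced_x, balanced_y = oversample(vectors, labels)
folds = list(kfold_indices(len(balanced_x), 3))
```

Each fold's train and test indices would then be passed to whichever classifier is being evaluated; in practice a library such as scikit-learn provides equivalent, better-tested building blocks.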

A Critical Analysis of and Its Implications ("나꼼수현상"이 그려내는 문화정치의 명암: 권력-대항적인 정치시사콘텐츠의 함의를 맥락화하기)

  • Lee, Kee-Hyeung;Lee, Young-Joo;Hwang, Kyong-Ah;Chae, Zi-Yeon;Cheon, Hye-Young;Kwon, Sook-Young
    • Korean journal of communication and information / v.58 / pp.74-105 / 2012
  • <I am a Weasel> is a radically different communicative form in several ways. It innovatively uses the podcast, a kind of internet radio format, while dealing actively with thorny political issues and scandals in a very direct and challenging fashion. The program also adopts politically charged parody, sharp critique of current socio-political issues, and lively dialogue, through which it provides both acute political awareness and entertainment. As a new kind of talk show and an alternative media form, this program has gained much popularity and attention since its appearance. Considering that the journalistic fields and public spheres are in disarray because of government intervention and fraught with fierce partisanship and political polarization, the role of this program needs to be examined both cautiously and contextually. This study aims to shed some light on the multifaceted and highly contentious role of <I am a Weasel> through a textual reading and discourse analysis, as well as email interviews.