• Title/Summary/Keyword: TextMining

Tourism Information Contents and Text Networking (Focused on Formal Website of Jeju and Chinese Personal Blogs) (온라인 관광정보의 내용 및 텍스트 네트워크 (제주 공식 웹사이트와 중국 개인블로그를 중심으로))

  • Zhang, Lin;Yun, Hee Jeong
    • The Journal of the Korea Contents Association / v.18 no.1 / pp.19-30 / 2018
  • The main purposes of this study are to analyze the contents and text network of online tourism information. For this purpose, Jeju Island, one of the representative tourist destinations in South Korea, is selected as the study site. This study collects content from both Jeju's official tourism website and personal blogs on Sina Weibo, one of the most popular social networking services in China. This online text is then analyzed using ROST Content Mining System, a Chinese big data mining tool. The results of the content analysis show that the official Jeju website mainly contains nouns related to natural, geographical, and physical resources, verbs related to the existence of resources, and adjectives related to the beauty, cleanliness, and convenience of resources. Personal blogs, by contrast, mainly contain nouns related to the Korean Wave, food, local products, other destinations, and shopping, verbs related to activities and feelings in Jeju, and adjectives related to the bloggers' experiences and feelings. Finally, the text network analysis shows that the tourism information on the official website forms a network with some strongly central terms, whereas the personal blogs show only weak relationships among terms. These results may contribute to the development of demand-based marketing strategies for tourist destinations.
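The abstract does not describe how ROST CM builds its text network, so the following is only a generic sketch of the idea: treat terms as nodes, link two terms when they appear in the same post, and measure each term's centrality by how many distinct terms it is linked to. The tokenized posts are invented, and real Chinese or Korean text would first need word segmentation.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(docs):
    """Count, for each unordered pair of distinct terms, the documents containing both."""
    edges = Counter()
    for tokens in docs:
        # A sorted set gives each pair once per document, in a canonical order.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges):
    """Normalized degree centrality: number of distinct neighbours / (n - 1)."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    n = len(neighbours)
    if n < 2:
        return {}
    return {term: len(nbrs) / (n - 1) for term, nbrs in neighbours.items()}


# Hypothetical, already-tokenized posts (not data from the study).
docs = [
    ["jeju", "beach", "clean"],
    ["jeju", "mountain", "beautiful"],
    ["jeju", "beach", "beautiful"],
]
edges = cooccurrence_network(docs)
centrality = degree_centrality(edges)
print(sorted(centrality.items(), key=lambda kv: -kv[1]))
```

A network in which a few terms have centrality near 1.0 while most are low resembles the "strong centrality" pattern reported for the official website; a flatter distribution would correspond to the weaker relationships found in the blogs.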

Prototype Design and Development of Online Recruitment System Based on Social Media and Video Interview Analysis (소셜미디어 및 면접 영상 분석 기반 온라인 채용지원시스템 프로토타입 설계 및 구현)

  • Cho, Jinhyung;Kang, Hwansoo;Yoo, Woochang;Park, Kyutae
    • Journal of Digital Convergence / v.19 no.3 / pp.203-209 / 2021
  • In this study, we propose a prototype design model for an online recruitment system that uses multi-dimensional data crawling and social media analysis to validate text information and video interviews in the job application process. The study includes a comparative analysis process based on text mining to verify the authenticity of job application documents and to hire and allocate workers effectively based on their potential job capability. Using the prototype system, we conducted performance tests and analyzed the results for key performance indicators such as text mining accuracy and the recognition rate of the interview STT (speech-to-text) function. If commercialized based on the design specifications and prototype development results derived from this study, the system may be utilized in the future as an intelligent online recruitment technology for the public and private recruitment markets.
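The abstract reports an STT "recognition rate" without defining it. One common way to score speech-to-text output is word error rate (WER): the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below assumes recognition rate = 1 - WER, which may not be the authors' metric; the transcripts are invented. For Korean, a character-level variant is often preferred because spacing is inconsistent.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words (substitutions, insertions, deletions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # match or substitution
    return dp[len(ref)][len(hyp)] / len(ref)


# Hypothetical interview transcript and STT output (one substituted word).
reference = "i have five years of sales experience"
hypothesis = "i have five year of sales experience"
wer = word_error_rate(reference, hypothesis)
recognition_rate = 1 - wer  # assumed definition, for illustration only
print(f"WER={wer:.3f}, recognition rate={recognition_rate:.3f}")
```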

Automated Data Extraction from Unstructured Geotechnical Report based on AI and Text-mining Techniques (AI 및 텍스트 마이닝 기법을 활용한 지반조사보고서 데이터 추출 자동화)

  • Park, Jimin;Seo, Wanhyuk;Seo, Dong-Hee;Yun, Tae-Sup
    • Journal of the Korean Geotechnical Society / v.40 no.4 / pp.69-79 / 2024
  • Field geotechnical data are obtained from various field and laboratory tests and are documented in geotechnical investigation reports. For efficient design and construction, digitizing these geotechnical parameters is essential. However, current practices involve manual data entry, which is time-consuming, labor-intensive, and prone to errors. Thus, this study proposes an automatic method for extracting data from geotechnical investigation reports using image-based deep learning models and text-mining techniques. A deep-learning-based page classification model and a text-searching algorithm were employed to classify geotechnical investigation report pages with 100% accuracy. Computer vision algorithms were utilized to identify valid data regions within report pages, and text analysis was used to match and extract the corresponding geotechnical data. The proposed model was validated using a dataset of 205 geotechnical investigation reports, achieving an average data extraction accuracy of 93.0%. Finally, a user-interface-based program was developed to enhance the practical application of the extraction model. It allows users to upload PDF files of geotechnical investigation reports, analyzes these reports automatically, and lets users extract and edit the data. This approach is expected to improve the efficiency and accuracy of digitizing geotechnical investigation reports and building geotechnical databases.
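The study combines a deep-learning page classifier with a text-searching algorithm; the abstract gives no details of either. As a rough illustration of the text-search side only, the sketch below assigns a page to whichever class has the most keyword hits in its extracted text and then scores predictions against labels. The class names and keyword lists are hypothetical, not taken from the paper.

```python
# Hypothetical page classes and keywords; a real system would derive these from report templates.
PAGE_KEYWORDS = {
    "borehole_log": ["boring log", "spt", "groundwater level"],
    "lab_test": ["water content", "liquid limit", "unit weight"],
}


def classify_page(text):
    """Return the class with the most keyword occurrences, or 'other' if none match."""
    text = text.lower()
    scores = {label: sum(text.count(k) for k in kws) for label, kws in PAGE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


def accuracy(predicted, actual):
    """Fraction of pages whose predicted class equals the labelled class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Invented page texts (as if extracted from a PDF) and their true classes.
pages = [
    "BORING LOG  BH-1  SPT N-value at depth 3.0 m",
    "Water content and liquid limit results",
    "Table of contents",
]
labels = ["borehole_log", "lab_test", "other"]
predictions = [classify_page(p) for p in pages]
acc = accuracy(predictions, labels)
print(predictions, acc)
```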

Entrepreneur Speech and User Comments: Focusing on YouTube Contents (기업가 연설문의 주제와 시청자 댓글 간의 관계 분석: 유튜브 콘텐츠를 중심으로)

  • Kim, Sungbum;Lee, Junghwan
    • The Journal of the Korea Contents Association / v.20 no.5 / pp.513-524 / 2020
  • YouTube's growth has recently been drawing attention. YouTube is not only a content-consumption channel but also a space where consumers express their intentions, sharing their opinions through comments. This study focuses on the text of global entrepreneurs' speeches and the comments posted in response to those speeches on YouTube. A content analysis of each speech and its comments was conducted using the text mining software Leximancer. We analyzed the theme of each entrepreneur's speech and derived topics related to the propensity and characteristics of individual entrepreneurs. In the comments, the themes of money, work, and need were common regardless of the content of each speech. Taking into account the different lengths of the texts, we additionally performed a Prominence Index analysis. We derived time, future, better, best, change, life, business, and need as common keywords for speech contents and viewer comments. Users who watched an entrepreneur's speech on YouTube responded in common to the topics of life, time, future, customer needs, and positive change.
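A prominence-style score helps when texts differ in length because it compares a joint proportion with the proportions expected by chance, rather than comparing raw counts. The sketch below assumes the lift-style definition commonly reported for Leximancer's prominence score, P(concept and category) / (P(concept) x P(category)), where values above 1 indicate an above-chance association; the Leximancer manual should be checked for the exact computation. The text segments and concept labels are invented.

```python
def prominence(segments, concept, category):
    """Assumed lift-style prominence: P(concept & category) / (P(concept) * P(category))."""
    n = len(segments)
    with_concept = sum(1 for concepts, cat in segments if concept in concepts)
    in_category = sum(1 for concepts, cat in segments if cat == category)
    both = sum(1 for concepts, cat in segments if concept in concepts and cat == category)
    if with_concept == 0 or in_category == 0:
        return 0.0
    return (both / n) / ((with_concept / n) * (in_category / n))


# Hypothetical text segments: (concepts found in the segment, source category).
segments = [
    ({"future", "change"}, "speech"),
    ({"future", "money"}, "comment"),
    ({"money", "work"}, "comment"),
    ({"money"}, "comment"),
]
p_future_speech = prominence(segments, "future", "speech")
p_money_comment = prominence(segments, "money", "comment")
print(p_future_speech, p_money_comment)
```

Because each term is divided by the category's share of segments, a category that simply contributes more text does not automatically receive higher scores.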

Social media big data analysis of Z-generation fashion (Z세대 패션에 대한 소셜미디어의 빅데이터 분석)

  • Sung, Kwang-Sook
    • Journal of the Korea Fashion and Costume Design Association / v.22 no.3 / pp.49-61 / 2020
  • This study analyzed social media posts and performed a big data analysis of Z-generation fashion using the text mining program Textom and the network analysis program UCINET. The results are as follows. First, keyword analysis of 67,646 social media posts about Z-generation fashion from the last five years extracted 220,211 keywords, from which 67 major keywords that co-occurred more than 250 times were selected. Among the top keywords appearing more than 1,000 times, 'Z generation' (29,595 times) was by far the most frequent, followed by 'millennials' (18,536), 'fashion' (17,836), 'generation' (13,055), 'brand' (8,325), and 'trend' (7,310). Second, in the degree centrality analysis of the network of major keywords for the Z generation, 'Z generation' (29,595) had by far the largest number of connected nodes, followed by 'millennials' (18,536), 'fashion' (17,836), 'generation' (13,055), 'brand' (8,325), and 'trend' (7,310). These terms are considered important factors in exploring social media's reaction to the Z generation. Third, CONCOR analysis rearranged and clustered the terms according to their structural equivalence among the major keywords for Gen Z fashion, and four clusters were derived and visualized as a semantic network. Group 1 (54 terms) was named 'Diverse Characteristics of Z-Generation Fashion Consumers', Group 2 (7 terms) 'Fashion Power of Z-Generation Teenagers', and Group 3 (8 terms) 'Z-Generation's Interest in Celebrity Fashion'; Group 4, consisting of a single term, was named 'Gucci', the most popular luxury fashion brand among the Z generation.
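CONCOR (CONvergence of iterated CORrelations) groups nodes by structural equivalence: it correlates the columns of the adjacency matrix, then repeatedly correlates the columns of the resulting correlation matrix until every entry approaches +1 or -1, which splits the nodes into two blocks. Repeating the split inside each block yields finer partitions. The following is a minimal pure-Python sketch of one split, not UCINET's implementation; the keyword matrix is hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def column_correlations(matrix):
    """Correlation matrix between the columns of a square matrix."""
    cols = [list(col) for col in zip(*matrix)]
    return [[pearson(ci, cj) for cj in cols] for ci in cols]


def concor_split(adjacency, max_iter=50, tol=1e-6):
    """One CONCOR split: iterate column correlations until entries approach +/-1."""
    corr = column_correlations(adjacency)
    for _ in range(max_iter):
        if all(abs(abs(v) - 1) < tol for row in corr for v in row):
            break
        corr = column_correlations(corr)
    # Nodes positively correlated with node 0 form one block, the rest the other.
    block_a = [i for i, v in enumerate(corr[0]) if v > 0]
    block_b = [i for i, v in enumerate(corr[0]) if v <= 0]
    return block_a, block_b


# Hypothetical keyword co-occurrence matrix (1 = the two keywords co-occur).
terms = ["z generation", "millennials", "trend", "gucci", "luxury", "brand"]
adjacency = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
block_a, block_b = concor_split(adjacency)
print([terms[i] for i in block_a], [terms[i] for i in block_b])
```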

KR-WordRank : An Unsupervised Korean Word Extraction Method Based on WordRank (KR-WordRank : WordRank를 개선한 비지도학습 기반 한국어 단어 추출 방법)

  • Kim, Hyun-Joong;Cho, Sungzoon;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.40 no.1 / pp.18-33 / 2014
  • A word is the smallest unit for text analysis, and most text-mining algorithms assume that the words in given documents can be perfectly recognized. However, newly coined words, spelling and spacing errors, and domain adaptation problems make it difficult to recognize words correctly. Moreover, obtaining enough training data to cover every situation is not only unrealistic but also inefficient. Therefore, an automatic word extraction method that does not require a training process is strongly needed. WordRank, the most widely used unsupervised word extraction algorithm for Chinese and Japanese, performs poorly on Korean because of differences in language structure. In this paper, we first discuss why WordRank performs poorly on Korean and then propose KR-WordRank, a WordRank algorithm customized for Korean that takes its linguistic characteristics into account and improves robustness to noise in text documents. Experimental results show that KR-WordRank performs significantly better than the original WordRank on Korean. In addition, the proposed algorithm can not only extract proper words but also identify candidate keywords for effective document summarization.

Analysis of the National Police Agency business trends using text mining (텍스트 마이닝 기법을 이용한 경찰청 업무 트렌드 분석)

  • Sun, Hyunseok;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.301-317 / 2019
  • Considerable research has examined how statistical techniques can be used to discover insights from text data. In this study, we analyzed text data produced by the Korean National Police Agency to identify yearly trends in its work and to compare work characteristics among local authorities by identifying distinctive keywords in the documents each authority produced. Each dataset was preprocessed according to its characteristics, and word frequencies were calculated for each document. Because raw term frequency alone does not capture how characteristic a keyword is, term weights were recalculated using term frequency-inverse document frequency (TF-IDF), and L2-norm normalization was applied so that word weights could be compared across documents. The analysis can serve as baseline data for future policies to improve police work and as a means of improving the efficiency of police services, and it can also help identify demand for improvements in indoor work.
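The TF-IDF weighting and L2 normalization described above can be sketched as follows. This assumes the plain variant idf = log(N / df); the paper may have used a smoothed variant, and the tokenized documents are invented.

```python
import math
from collections import Counter


def tfidf_l2(docs):
    """TF-IDF weights per document (idf = log(N / df)), scaled to unit L2 norm."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm > 0 else weights)
    return vectors


# Hypothetical tokenized documents from three local agencies.
docs = [
    ["traffic", "accident", "report", "report"],
    ["cyber", "crime", "report"],
    ["traffic", "patrol", "report"],
]
vectors = tfidf_l2(docs)
for vec in vectors:
    print(sorted(vec.items(), key=lambda kv: -kv[1]))
```

A term such as "report" that appears in every document receives zero weight however often it occurs, while terms unique to one document rise to the top; the L2 scaling makes long and short documents comparable.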

Rural Tourism Image and Major Activity Space in Gochang County Shown in Social Data - Focusing on the Keyword 'Gochang-gun Travel' - (소셜데이터에 나타난 고창군의 농촌관광 이미지와 주요 활동공간 - '고창군 여행' 키워드를 중심으로 -)

  • Kim, Young-Jin;Son, Gwangryul;Lee, Dongchae;Son, Yong-hoon
    • Journal of Korean Society of Rural Planning / v.27 no.3 / pp.103-116 / 2021
  • This study analyzed the characteristics of the rural tourism image perceived by urban residents through text analysis of blog data. To examine images related to rural tourism, blog posts written with the keyword "Gochang-gun travel" were used, and LDA topic modeling, a text mining technique, was applied. For the tourism image of Gochang-gun, 9 topics were derived and 112 major places appeared; by reviewing the keywords and the original blog texts, these were classified into 3 main activities and 5 object spaces. The traditional main resources of the region, Seonun Mountain, Seonun Temple, and Gochang-eup Fortress, formed topics, whereas World Heritage sites such as the dolmens and Ungok Wetland did not. In particular, privately operated farms formed individual topics, suggesting that theme farms are an important tourism resource for Gochang-gun. The distribution of place keywords also revealed the characteristics of travel by region and the usage behavior of visitors. In Gochang-gun, visitors were concentrated in particular regions, which appears to result from the county's efforts to vitalize local tourism around natural, ecological, and scenic resources. A plan for balanced regional development is therefore needed, along with other types of tourism resources. This study is distinctive in that it identified the types and characteristics of the region's rural tourism images as perceived by visitors, as well as the status of tourism at the regional level.
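The abstract does not say which LDA implementation or hyperparameters were used. As a minimal sketch of how LDA assigns words to topics, the code below uses collapsed Gibbs sampling: each token's topic is repeatedly resampled in proportion to (document-topic count + alpha) x (topic-word count + beta) / (topic total + V x beta). The documents, number of topics, and hyperparameter values are hypothetical; real analyses use far more text and usually a library implementation.

```python
import random


def lda_gibbs(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns top words per topic and doc-topic counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    n_dk = [[0] * n_topics for _ in docs]      # topic counts per document
    n_kw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                       # total tokens per topic
    z = []                                     # current topic of every token
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)        # random initial assignment
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, v = z[d][i], w_id[w]
                # Remove this token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [(n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1
    top_words = [[vocab[v] for v in sorted(range(V), key=lambda v: -n_kw[k][v])[:3]]
                 for k in range(n_topics)]
    return top_words, n_dk


# Hypothetical tokenized blog posts.
docs = [
    ["seonunsan", "temple", "hiking", "temple"],
    ["farm", "barley", "field", "farm"],
    ["temple", "seonunsan", "hiking"],
    ["barley", "farm", "field"],
]
top_words, doc_topic = lda_gibbs(docs, n_topics=2)
print(top_words)
```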

The User Perception in ASMR Marketing Content through Social Media Text-Mining: ASMR Product Review Content vs ASMR How-to Content (텍스트 마이닝을 활용한 ASMR 콘텐츠 분야에 따른 소비자 인식 및 구전효과 차이점 분석: ASMR 제품리뷰 및 ASMR How-to 콘텐츠 중심으로)

  • Tran, Hung Chuong;Choi, Jae Won
    • The Journal of Information Systems / v.30 no.4 / pp.1-20 / 2021
  • Purpose: Autonomous Sensory Meridian Response (ASMR) is rapidly growing in popularity and increasingly appears in marketing. Beyond TV commercials, ASMR is also growing quickly in one-person media, and many brands and social media influencers use ASMR in their marketing content. The purpose of this study is to measure consumers' perceptions of the products in ASMR marketing content and to compare the communication effect of ASMR content creators producing product-review content versus how-to content within the same macro influencer tier (YouTubers with 10,000 to 100,000 subscribers). Design/methodology/approach: The study selected ASMRtists who create product-review content and how-to content, and collected text comments from 200 videos of tech-device reviews and beauty-fashion content. A total of 52,833 text comments were analyzed by applying the LDA topic modeling algorithm and social network analysis. Findings: The results show that ASMR is effective at capturing viewers' attention through ASMR triggers. In the tech-device review field, viewers also focus on the product, for example its performance and purchase; however, many topics relate to reactions to the ASMR sound, triggers, and relaxation. In the beauty-fashion field, viewers' comments mainly concern reactions to ASMR triggers and responses to the ASMRtist, while other topics discuss makeup and fashion, products, and purchases. The LDA results show that many ASMR viewers comment that they feel more comfortable when watching marketing content that uses ASMR. This suggests that ASMR marketing content performs well in terms of the viewing experience, so applying ASMR may attract more consumer interest. The social network analysis further showed that product-review ASMRtists have higher communication effectiveness than how-to ASMRtists in the same tier.
In terms of influencer marketing strategy, this study provides information for establishing an efficient advertising strategy that uses influencers who create ASMR content.
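The abstract does not name the social network measures behind its comparison, so this is only an illustration of one simple whole-network statistic: the density of a keyword co-occurrence network built from comments, i.e., the share of possible keyword pairs that actually co-occur in at least one comment. Treating density as a proxy for "communication effectiveness" would be an assumption, not the authors' method; the keywords and comments are invented.

```python
from itertools import combinations


def keyword_network_density(comments, keywords):
    """Ties present / ties possible, where a tie means two keywords co-occur in a comment."""
    ties = set()
    for comment in comments:
        present = sorted({w for w in comment.lower().split() if w in keywords})
        ties.update(combinations(present, 2))
    n = len(keywords)
    possible = n * (n - 1) / 2
    return len(ties) / possible if possible else 0.0


# Hypothetical keyword set and comments from two kinds of channels.
keywords = {"relaxing", "sound", "product", "buy"}
review_comments = ["Relaxing sound and I want to buy this product"]
howto_comments = ["relaxing sound", "great tutorial"]
review_density = keyword_network_density(review_comments, keywords)
howto_density = keyword_network_density(howto_comments, keywords)
print(review_density, howto_density)
```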

Using Text Mining and Social Network Analysis to Identify Determinant Characteristics Affecting Consumers' Evaluation of Clothing Fit (텍스트 마이닝과 소셜 네트워크 분석 기법을 활용한 소비자의 의복 맞음새(Fit)평가에 영향을 미치는 특성)

  • Hwang, Soo Hyun;Park, Juyeon
    • Science of Emotion and Sensibility / v.26 no.1 / pp.101-114 / 2023
  • This research aimed to identify the determinant characteristics affecting consumers' evaluation of clothing fit by employing text mining and social network analysis. To this end, we first extracted text data related to clothing fit from 2,000 consumer reviews collected from social network services and conducted semantic network analysis and CONCOR analysis. The results showed that "pants" and "skirts" were the clothing items most commonly associated with consumers' fit evaluations, and that the length of clothing was the most frequently evaluated attribute. The "waist" and "hip" were the most critical body parts affecting consumers' perception of clothing fit. Further, four keywords, "wide," "large," "short," and "long," were the most frequently used in consumer reviews when evaluating clothing fit. This study is meaningful in that it specifically identified the structural relationships and semantic meanings of keywords relevant to consumers' evaluation of clothing fit, which could provide empirical reference information for improving clothing fit.
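The first step described above, extracting fit-related text from a larger pool of reviews, can be sketched as a lexicon filter followed by a frequency count. The filtering criteria are not given in the abstract: the fit lexicon below is hypothetical, while the four descriptors are the keywords the abstract reports. The reviews are invented English examples; Korean reviews would first need a morphological analyzer.

```python
from collections import Counter

FIT_TERMS = {"fit", "waist", "hip", "length", "tight", "loose"}  # hypothetical fit lexicon
DESCRIPTORS = {"wide", "large", "short", "long"}                 # keywords named in the abstract


def fit_reviews(reviews):
    """Keep only reviews that mention at least one fit-related term."""
    return [r for r in reviews if FIT_TERMS & set(r.lower().split())]


def descriptor_counts(reviews):
    """Count descriptor keywords within the fit-related reviews only."""
    counts = Counter()
    for r in fit_reviews(reviews):
        counts.update(w for w in r.lower().split() if w in DESCRIPTORS)
    return counts


reviews = [
    "the waist is wide and the pants are long",
    "fast delivery and nice color",
    "skirt length is short",
    "hip area is a bit wide",
]
counts = descriptor_counts(reviews)
print(counts.most_common())
```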