• Title/Summary/Keyword: Frequency based Text Analysis

A Study on Domestic Research Trends (2001-2020) of Forest Ecology Using Text Mining (텍스트마이닝을 활용한 국내 산림생태 분야 연구동향(2001-2020) 분석)

  • Lee, Jinkyu;Lee, Chang-Bae
    • Journal of Korean Society of Forest Science
    • /
    • v.110 no.3
    • /
    • pp.308-321
    • /
    • 2021
  • The purpose of this study was to analyze domestic research trends over the past 20 years and the future direction of forest ecology using text mining. A total of 1,015 academic papers and keywords data related to forest ecology were collected by the "Research and Information Service Section" and analyzed using big data analysis programs, such as Textom and UCINET. From the results of word frequency and N-gram analyses, we found that domestic studies on forest ecology have increased rapidly since 2011. The most common research topic over the past 20 years was "species diversity," and "climate change" has been a major topic since 2011. Based on CONCOR analysis, study subjects were grouped into eight categories: "species diversity," "environmental policy," "climate change," "management," "plant taxonomy," "habitat suitability index," "vascular plants," and "recreation and welfare." Consequently, species diversity and climate change will remain important topics in the future, and it is necessary to diversify and expand domestic research topics in line with global research trends.
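
The word frequency and N-gram analyses mentioned in this abstract can be illustrated with a minimal Python sketch. The study itself used Textom and UCINET, so this is not the authors' procedure; the keyword lists and the helper name `ngrams` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical keyword lists, one per paper (not data from the study).
papers = [
    ["species", "diversity", "climate", "change"],
    ["climate", "change", "forest", "management"],
    ["species", "diversity", "vascular", "plants"],
]

# Word frequency: how often each token occurs across all papers.
word_freq = Counter(tok for paper in papers for tok in paper)

# Bigram frequency: how often each adjacent token pair occurs.
bigram_freq = Counter(bg for paper in papers for bg in ngrams(paper, 2))

print(word_freq.most_common(3))
print(bigram_freq.most_common(2))
```

Counting bigrams separately from single words is what lets a phrase such as "climate change" surface as one unit rather than as two unrelated frequent words.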

Research on Tourist Perception of Grand Canal Cultural Heritage Based on Network Text Analysis : The Pingjiang Historical and Cultural District of Suzhou City as an example (네트워크 텍스트 분석을 통한 대운하 문화유산에 대한 관광객 인식 연구 : 쑤저우시 핑장역사문화지구의 예)

  • Chengkang Zheng;Qiwei Jing;Nam Kyung Hyeon
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.215-231
    • /
    • 2023
  • Taking the Pingjiang Historical and Cultural District in Suzhou as an example, this paper collects 1,436 tourist comments from Ctrip.com using Python and applies network text analysis to examine high-frequency words, the semantic network, and emotion, so as to evaluate the characteristics and levels of tourist perception of the Grand Canal cultural heritage. The study found that natural and humanistic landscapes, historical and cultural deposits, and the style of the Jiangnan Canal are fully reflected in the perception of visitors to the Pingjiang Historical and Cultural District. Tourists hold strong positive emotions towards the Pingjiang Road historical and cultural district; however, there is still room for the transformation and upgrading of the district. Finally, suggestions for measures to improve tourists' perception of the Grand Canal cultural heritage are given in terms of conservation first, cultural integration, and innovative utilization.

A Study of the Consumer Major Perception of Packaging Using Big Data Analysis -Focusing on Text Mining and Semantic Network Analysis- (빅데이터 분석을 통한 패키징에 대한 소비자의 주요 인식 조사 -텍스트 마이닝과 의미연결망 분석을 중심으로-)

  • Kang, Wook-Geon;Ko, Eui-Suk;Lee, Hak-Rae;Kim, Jai-neung
    • Journal of the Korea Convergence Society
    • /
    • v.9 no.4
    • /
    • pp.15-22
    • /
    • 2018
  • The purpose of this study is to investigate the consumer perception of packaging using big data analysis. This study uses text mining to extract meaningful words from text and semantic network analysis to analyze connectivity and propagation trends. Data were collected separately for the two keywords 'packaging(Korean)' and 'packaging(English)'. This study visualized the word network structure of the two keywords and classified the words into four groups with similar meaning through CONCOR analysis. Each group was named based on the words it contains. These groups are the major categories of consumers' perception of packaging. In particular, 'cosmetics' and 'design' have high word frequency and high centrality. Therefore, it can be expected that packaging design is perceived as important in the cosmetics industry. This study predicts consumers' perception of packaging, so it can be a basis for future research and industry development.

A Trend Analysis of Agricultural and Food Marketing Studies Using Text-mining Technique (텍스트마이닝 기법을 이용한 국내 농식품유통 연구동향 분석)

  • Yoo, Li-Na;Hwang, Su-Chul
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.18 no.10
    • /
    • pp.215-226
    • /
    • 2017
  • This study analyzed trends in agricultural and food marketing studies from 1984 to 2015 using text-mining techniques. Text mining is a part of big data analysis and is an effective tool for objectively processing large amounts of information based on categorization and trend analysis. In the present study, frequency analysis, topic analysis, and association rule analysis were conducted. Titles of agricultural and food marketing studies in four journals and reports were used for the analysis. The results showed that a total of 1,126 studies related to agricultural and food marketing could be categorized into six subjects. There were significant changes in research trends before and after the 2000s. While research before the 2000s focused on farm and wholesale level marketing, research after the 2000s mainly covered consumption, (processed) food, exports, and imports. Local food and school meals are new subjects that are increasingly being studied. Issues regarding agricultural supply and demand were the only subjects investigated in policy research studies. Interest in agricultural supply and demand was lost after the 2000s. A number of studies after the 2010s analyzed consumption, primarily consumption trends and consumer behavior.

Analysis of Urban-to-Rural Migrants' Perceptions of the 'Everyday Landscape' Using Diary-Based Text Mining (일기를 통해 본 귀농·귀촌인 '일상 경관' 인식 - 텍스트 마이닝 적용 -)

  • OH Jungshim
    • Korean Journal of Heritage: History & Science
    • /
    • v.57 no.3
    • /
    • pp.184-199
    • /
    • 2024
  • This study was conducted in response to the global trend of emphasizing the importance of "everyday landscapes", focusing on the perspective of those who have returned to rural life. With a focus on the case of Gokseong-gun in Jeollanam-do, 460 diaries written by these individuals were collected and analyzed using text mining techniques such as "frequency analysis", "topic modeling", and "sentiment analysis". The analysis of noun morphemes was interpreted from a cognitive aspect, while adjective morphemes were interpreted from an emotional aspect. In particular, this study applied semantic network analysis to overcome the limitations of existing sentiment analysis, and extracted a word network list and examined the content of nouns connected to adjectives that express emotions to identify the targets and contents of sentiments. This method represents a differentiated approach that is not commonly found in existing research. One of the intriguing findings is that the urban-to-rural migrants identified everyday landscapes such as "flowers on neighborhood walking paths", "harvest of a garden", "neighborhood events", and "cozy cafe spaces" as important. These elements all contain visual and enjoyable aspects of everyday landscapes. Currently, many rural villages are attempting to add visual elements to their everyday landscapes by unifying roof colors or painting murals on walls. However, such artificial measures do not necessarily leave a lasting impression on people. A critical review of current policies and systems is necessary. This research is significant because it is the first to study everyday landscapes from the perspective of urban-to-rural migration using diaries and text mining. With a lack of domestic research on everyday landscapes, this study hopes to contribute to the activation of related research in Korea.

Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality (지자체 사이버 공간 안전을 위한 금융사기 탐지 텍스트 마이닝 방법)

  • Choi, Sukjae;Lee, Jungwon;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.119-138
    • /
    • 2017
  • Recently, SNS has become an important channel for marketing as well as personal communication. However, cybercrime has also evolved with the development of information and communication technology, and illegal advertising is distributed on SNS in large quantities. As a result, personal information is lost and even monetary damages occur more frequently. In this study, we propose a method to analyze which sentences and documents sent to SNS are related to financial fraud. First of all, as a conceptual framework, we developed a matrix of conceptual characteristics of cybercriminality on SNS and emergency management. We also suggested an emergency management process which consists of Pre-Cybercriminality (e.g. risk identification) and Post-Cybercriminality steps. Among those, we focused on risk identification in this paper. The main process consists of data collection, preprocessing, and analysis. First, we selected two words, 'daechul(loan)' and 'sachae(private loan)', as seed words and collected data containing these words from SNS such as Twitter. The collected data were given to two researchers to decide whether they are related to cybercriminality, particularly financial fraud, or not. Then we selected some of the vocabulary as keywords if it was related to nominals and symbols. With the selected keywords, we searched web materials such as Twitter, news, and blogs, and collected more than 820,000 articles. The collected articles were refined through preprocessing and made into learning data. The preprocessing process is divided into a morphological analysis step, a stop word removal step, and a valid part-of-speech selection step. In the morphological analysis step, a complex sentence is transformed into morpheme units to enable mechanical analysis. In the stop word removal step, non-lexical elements such as numbers, punctuation marks, and double spaces are removed from the text. 
In the valid part-of-speech selection step, only two kinds of tokens, nouns and symbols, are considered. Since nouns refer to things, they express the intent of a message better than other parts of speech. Moreover, the more illegal the text is, the more frequently symbols are used. To turn the preprocessed data into learning data, each item must be classified as legitimate or not, so the selected data are labeled 'legal' or 'illegal'. The processed data are then converted into a corpus and a Document-Term Matrix. Finally, the 'legal' and 'illegal' files were mixed and randomly divided into a learning data set and a test data set. In this study, we used 70% of the data for learning and 30% for testing. SVM was used as the discrimination algorithm. Since SVM requires gamma and cost values as its main parameters, we set gamma to 0.5 and cost to 10, based on the optimal value function. The cost is set higher than in general cases. To show the feasibility of the idea proposed in this paper, we compared the proposed method with MLE (Maximum Likelihood Estimation), Term Frequency, and Collective Intelligence methods. Overall accuracy was used as the metric. As a result, the overall accuracy of the proposed method was 92.41% for illegal loan advertisements and 77.75% for illegal visit sales, which is clearly superior to that of Term Frequency, MLE, etc. Hence, the result suggests that the proposed method is valid and practically usable. In this paper, we propose a framework for managing crises caused by abnormalities in unstructured data sources such as SNS. We hope this study will contribute to academia by identifying what to consider when applying an SVM-like discrimination algorithm to text analysis. Moreover, the study will also contribute to practitioners in the fields of brand management and opinion mining.
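
The data preparation described in this abstract (tokenizing, building a Document-Term Matrix, and a random 70/30 split) can be sketched in plain Python. This is an assumption-laden illustration, not the authors' implementation: the messages and labels are invented, the regex tokenizer stands in for Korean morphological analysis and part-of-speech selection, and the function names are hypothetical. The SVM step itself is omitted; the resulting rows would be passed to an SVM library with the gamma and cost values reported in the abstract.

```python
import random
import re
from collections import Counter


def preprocess(text):
    """Keep lowercase alphabetic tokens; drops numbers, punctuation, extra spaces."""
    return re.findall(r"[a-z]+", text.lower())


def document_term_matrix(docs):
    """Return (vocabulary, rows), where rows[i][j] counts vocab[j] in docs[i]."""
    vocab = sorted({tok for doc in docs for tok in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def split_train_test(rows, labels, train_ratio=0.7, seed=0):
    """Shuffle indices reproducibly and split them into train and test parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(len(idx) * train_ratio)
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [labels[i] for i in train],
            [rows[i] for i in test], [labels[i] for i in test])


# Hypothetical messages and labels (not data from the study).
messages = [
    ("Instant loan, no credit check!!", "illegal"),
    ("Private loan approved in 5 minutes", "illegal"),
    ("Bank announces new loan interest rates", "legal"),
    ("Cheap private loan, call now", "illegal"),
    ("How to compare loan products safely", "legal"),
    ("Loan without documents, 100% approval", "illegal"),
    ("Government warns about private loan fraud", "legal"),
    ("Quick cash loan today only", "illegal"),
    ("Financial literacy class on loan basics", "legal"),
    ("Regulator publishes loan statistics", "legal"),
]

docs = [preprocess(text) for text, _ in messages]
labels = [label for _, label in messages]
vocab, dtm = document_term_matrix(docs)
X_train, y_train, X_test, y_test = split_train_test(dtm, labels)

print(len(vocab), len(X_train), len(X_test))
```

Fixing the shuffle seed keeps the split reproducible, which matters when comparing several classifiers on the same learning and test sets.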

An Analysis of Indications of Meridians in DongUiBoGam Using Data Mining (데이터마이닝을 이용한 동의보감에서 경락의 주치특성 분석)

  • Chae, Younbyoung;Ryu, Yeonhee;Jung, Won-Mo
    • Korean Journal of Acupuncture
    • /
    • v.36 no.4
    • /
    • pp.292-299
    • /
    • 2019
  • Objectives: DongUiBoGam is one of the representative medical texts of Korea. We used text mining methods to analyze the characteristics of the indications of each meridian in the second chapter of DongUiBoGam, WaeHyeong, which addresses external body elements. We also visualized the relationships between the meridians and the disease sites. Methods: Using the term frequency-inverse document frequency (TF-IDF) method, we quantified values regarding the indications of each meridian according to the frequency of the occurrences of 14 meridians and 14 disease sites. The spatial patterns of the indications of each meridian were visualized on a human body template according to the TF-IDF values. Using hierarchical clustering methods, twelve meridians were clustered into four groups based on the TF-IDF distributions of each meridian. Results: TF-IDF values of each meridian showed different constellation patterns at different disease sites. The spatial patterns of the indications of each meridian were similar to the route of the corresponding meridian. Conclusions: The present study identified spatial patterns between meridians and disease sites. These findings suggest that the constellations of the indications of meridians are primarily associated with the lines of the meridian system. We strongly believe that these findings will further the current understanding of indications of acupoints and meridians.
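
As a rough illustration of the TF-IDF weighting named in this abstract, the sketch below treats each meridian as a "document" and each disease-site mention as a "term". The abstract does not state which TF-IDF variant the authors used, so the formula here (relative term frequency times the natural log of N over document frequency) is an assumption, and the meridian names and site lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs maps a name to a list of terms; returns name -> {term: tf-idf score}."""
    n_docs = len(docs)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for terms in docs.values() for term in set(terms))
    scores = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        total = len(terms)
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


# Hypothetical disease-site mentions per meridian (not data from the study).
mentions = {
    "meridian_A": ["head", "head", "chest"],
    "meridian_B": ["chest", "abdomen"],
    "meridian_C": ["chest", "leg", "leg"],
}

scores = tf_idf(mentions)
for name, site_scores in scores.items():
    print(name, site_scores)
```

In this toy data "chest" is mentioned for every meridian, so its weight is zero everywhere, while "head" is specific to `meridian_A` and scores highest there. That is the property that makes TF-IDF useful for finding the sites characteristic of one meridian.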

A Research on Difference Between Consumer Perception of Slow Fashion and Consumption Behavior of Fast Fashion: Application of Topic Modelling with Big Data

  • YANG, Oh-Suk;WOO, Young-Mok;YANG, Yae-Rim
    • The Journal of Economics, Marketing and Management
    • /
    • v.9 no.1
    • /
    • pp.1-14
    • /
    • 2021
  • Purpose: The article deals with the proposition that consumers' fashion consumption behavior will still follow the consumption behavior of fast fashion, despite recognizing the importance of slow fashion. Research design, data and methodology: The research model used to verify this proposition is topic modelling with big data, including unstructured textual data. We combined 5,506 news articles about fast fashion and slow fashion posted on the Naver news search platform during the 2003-2019 period, derived high-frequency words, and found topics using an LDA model. Based on these, we examined consumers' perception and consumption behavior regarding slow fashion through the analysis of the Topic Network. Results: (1) Looking at the status of annual article collection, consumers' interest in slow fashion mainly began in 2005 and showed a steady increase up to 2019. (2) Term Frequency analysis showed that the keywords for slow fashion are the lowest, with consumers' consumption patterns continuing around 'brand.' (3) Each topic's weight in articles showed that 'social value', which includes slow fashion, ranked sixth among the 9 topics and had low linkage with other topics. (4) Lastly, 'brand' and 'fashion trend' were key topics, and the topic 'social value' accounted for a low proportion. Conclusion: Slow fashion was not a considerable factor in consumption behavior. Consumption patterns in the fashion sector are still dominated by general consumption patterns centered on brands and fast fashion.

Analyzing Disaster Response Terminologies by Text Mining and Social Network Analysis (텍스트 마이닝과 소셜 네트워크 분석을 이용한 재난대응 용어분석)

  • Kang, Seong Kyung;Yu, Hwan;Lee, Young Jai
    • Information Systems Review
    • /
    • v.18 no.1
    • /
    • pp.141-155
    • /
    • 2016
  • This study identified terminologies related to the proximity and frequency of disaster using social network analysis (SNA) and text mining, and then expressed the outcome as a mind map. The term-document matrix of text mining was utilized for the terminology proximity analysis, and the SNA closeness centrality was calculated to organically express the relationship of the terminologies through a mind map. By analyzing terminology proximity and selecting disaster response-related terminologies, this study identified the closest field among all the disaster response fields to disaster response and the core terms in each disaster response field. This disaster response terminology analysis could be utilized in future core term-based terminology standardization, disaster-related knowledge accumulation and research, and application of various response scenario compositions, among others.
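
The closeness centrality mentioned in this abstract can be illustrated with a small stdlib-only sketch. It assumes that two terms are linked whenever they appear in the same document, which is one common way to derive a term network from a term-document matrix. The abstract does not say how the authors built their network, and the documents and function names below are hypothetical.

```python
from collections import deque
from itertools import combinations


def cooccurrence_graph(docs):
    """Link two terms whenever they appear in the same document."""
    graph = {}
    for doc in docs:
        terms = sorted(set(doc))
        for term in terms:
            graph.setdefault(term, set())
        for a, b in combinations(terms, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def closeness(graph, source):
    """Closeness centrality: reachable nodes divided by summed shortest distances."""
    dist = {source: 0}
    queue = deque([source])
    while queue:  # breadth-first search gives shortest path lengths
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical documents, each a list of disaster-related terms.
docs = [
    ["flood", "evacuation"],
    ["evacuation", "shelter"],
    ["shelter", "supplies"],
]

graph = cooccurrence_graph(docs)
for term in sorted(graph):
    print(term, round(closeness(graph, term), 3))
```

Terms in the middle of the chain ("evacuation", "shelter") score 0.75 and the end terms score 0.5, so a higher value marks a term that is, on average, fewer steps away from all the others.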

Text-Independent Speaker Identification System Based On Vowel And Incremental Learning Neural Networks

  • Heo, Kwang-Seung;Lee, Dong-Wook;Sim, Kwee-Bo
    • Institute of Control, Robotics and Systems: Conference Proceedings (제어로봇시스템학회:학술대회논문집)
    • /
    • 2003.10a
    • /
    • pp.1042-1045
    • /
    • 2003
  • In this paper, we propose a speaker identification system that uses vowels, which carry a speaker's characteristics. The system is divided into a speech feature extraction part and a speaker identification part. The speech feature extraction part extracts the speaker's features. Voiced speech has characteristics that distinguish speakers. For vowel extraction, formants are obtained from voiced speech through frequency analysis. The vowel 'a', which has distinct formants, is extracted from the text. Pitch, formant, intensity, log area ratio, LP coefficients, and cepstral coefficients are candidate methods for characterizing the speaker. The cepstral coefficients, which show the best performance in speaker identification among these methods, are used. The speaker identification part distinguishes speakers using a neural network. 12th-order cepstral coefficients are used as the learning input data. The neural network's structure is an MLP and the learning algorithm is BP (backpropagation). Hidden nodes and output nodes are added incrementally. The nodes in the incremental learning neural network are interconnected via weighted links, and each node in a layer is generally connected to each node in the succeeding layer, leaving the output node to provide output for the network. Through vowel extraction and incremental learning, the proposed system uses little learning data, reduces learning time, and improves the identification rate.
