• Title/Summary/Keyword: Frequency based Text Analysis


Big Data Analysis of the Annals of the Joseon Dynasty Using Jsoup (Jsoup를 이용한 조선왕조실록의 빅 데이터 분석)

  • Bong, Young-Il;Lee, Choong-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.131-133 / 2021
  • The Annals of the Joseon Dynasty are important records registered with UNESCO. This paper proposes a method for big data analysis that examines word frequencies in the Korean translation of the Annals. When the Annals are retrieved from a website and word frequencies are computed directly from the page source, the keywords required by HTML syntax are counted as well, which makes it difficult to analyze the frequency of words in the text that actually matters. We therefore propose analyzing the text of the Annals using the crawling functionality of Jsoup, a Java library. In the experiment, only the Taejo portion of the Annals was extracted to verify the validity of the method.

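
The extract-then-count idea in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: Python's standard `html.parser` stands in for Jsoup, and the sample HTML string, the `TextExtractor` class, and the `word_frequencies` function are hypothetical names introduced here.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collects visible text, skipping markup and script/style bodies."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_frequencies(html):
    """Return a Counter of lowercase word tokens in the visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return Counter(re.findall(r"\w+", text.lower()))


# Hypothetical sample page; not taken from the Annals site.
sample = (
    "<html><body><div class='king'>"
    "<p>Taejo founded Joseon.</p><p>Taejo ruled.</p>"
    "</div></body></html>"
)
freq = word_frequencies(sample)
print(freq.most_common(3))
```

Because only text nodes are counted, tag names such as `div` and attribute values such as `king` never enter the frequency table. Splitting on `\w+` is a simplification; Korean text would normally need a morphological analyzer to obtain meaningful word units.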

Text Analysis of Software Test Report (소프트웨어 시험성적서에 대한 텍스트 분석)

  • Jung, Hye-Jung;Han, Gun-Hee
    • Journal of the Korea Convergence Society / v.11 no.11 / pp.25-31 / 2020
  • This study examines a method of applying weights to quality characteristics in software test evaluation. The method analyzes the text of test reports and uses the relative frequency of terms as the weight for each quality characteristic in the software test score. Feasibility was reviewed by comparing the results of the text frequency analysis with those of a questionnaire survey in which developers and users evaluated the importance of software. When quality is measured on the eight quality characteristics presented in ISO/IEC 25023, the proposed method yields a quality measurement that reflects the characteristics of the software, whereas the conventional measurement applies the same weight to every characteristic.
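
The frequency-ratio weighting described above can be sketched roughly as follows. The abstract gives neither the term-to-characteristic mapping nor any data, so the `KEYWORDS` table, the report text, and the scores below are all hypothetical and serve only to show the arithmetic.

```python
from collections import Counter
import re

# Hypothetical keyword lists for three quality characteristics;
# the paper's actual mapping is not given in the abstract.
KEYWORDS = {
    "functional suitability": {"function", "feature"},
    "reliability": {"failure", "recovery"},
    "usability": {"menu", "message"},
}


def frequency_weights(report_text):
    """Weight each characteristic by its share of matched keyword occurrences."""
    tokens = Counter(re.findall(r"[a-z]+", report_text.lower()))
    counts = {
        name: sum(tokens[word] for word in words)
        for name, words in KEYWORDS.items()
    }
    total = sum(counts.values())
    if total == 0:
        # No keyword found: fall back to equal weights.
        return {name: 1 / len(KEYWORDS) for name in KEYWORDS}
    return {name: count / total for name, count in counts.items()}


def weighted_score(scores, weights):
    """Combine per-characteristic scores using the given weights."""
    return sum(scores[name] * weights[name] for name in weights)


# Hypothetical report text and scores, for illustration only.
report = (
    "Function test passed. Function menu shown. "
    "Failure recovery verified. Feature added."
)
weights = frequency_weights(report)
scores = {"functional suitability": 90, "reliability": 70, "usability": 80}
print(weights)
print(weighted_score(scores, weights))
```

With equal weights the three hypothetical scores would average 80; the frequency-based weights shift the result toward the characteristic that the report text mentions most often.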

Text Summarization using PCA and SVD (주성분 분석과 비정칙치 분해를 이용한 문서 요약)

  • Lee, Chang-Beom;Kim, Min-Soo;Baek, Jang-Sun;Park, Hyuk-Ro
    • The KIPS Transactions:PartB / v.10B no.7 / pp.725-734 / 2003
  • In this paper, we propose a text summarization method using PCA (Principal Component Analysis) and SVD (Singular Value Decomposition). The proposed method produces a summary by extracting significant sentences based on the distances between thematic words and sentences. To extract thematic words, we use both word frequency and the co-occurrence information that results from performing PCA. To extract significant sentences, we exploit the Euclidean distances between thematic word vectors and sentence vectors that result from carrying out SVD. Experimental results using newspaper articles show that the proposed method is superior to methods using only word frequency or only PCA.

Cross-Domain Text Sentiment Classification Method Based on the CNN-BiLSTM-TE Model

  • Zeng, Yuyang;Zhang, Ruirui;Yang, Liang;Song, Sujuan
    • Journal of Information Processing Systems / v.17 no.4 / pp.818-833 / 2021
  • To address the problems of low precision, insufficient feature extraction, and poor contextual ability in existing text sentiment analysis methods, a hybrid CNN-BiLSTM-TE (convolutional neural network, bidirectional long short-term memory, and topic extraction) model was proposed. First, Chinese text data was converted into vectors through transfer learning with Word2Vec. Second, local features were extracted by the CNN model. Then, contextual information was extracted by the BiLSTM neural network, and the emotional tendency was obtained using softmax. Finally, topics were extracted using term frequency-inverse document frequency (TF-IDF) and K-means. Compared with the CNN, BiLSTM, and gated recurrent unit (GRU) models, the CNN-BiLSTM-TE model's F1-score was higher by 0.0147, 0.006, and 0.0052, respectively. Compared with the CNN-LSTM, LSTM-CNN, and BiLSTM-CNN models, the F1-score was higher by 0.0071, 0.0038, and 0.0049, respectively. Experimental results showed that the CNN-BiLSTM-TE model can effectively improve various indicators in application. Lastly, scalability was verified on a takeaway dataset, which is of great value in practical applications.
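
The TF-IDF part of the topic extraction step can be sketched as below. This covers only the term-scoring stage, not the K-means clustering or the neural model, and it is not the authors' code: the formula variant (tf as count over document length, idf as log(N / df)), the function names, and the toy English documents are assumptions made for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / df); other
    variants (smoothing, sublinear tf) are equally common.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(doc_scores, k=2):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(doc_scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical, already tokenized reviews (the paper works on Chinese text).
docs = [
    ["food", "delivery", "fast", "food"],
    ["food", "cold", "delivery", "late"],
    ["hotel", "room", "clean", "food"],
]
scores = tf_idf(docs)
print(top_terms(scores[0]))
```

A term that occurs in every document, such as "food" here, gets an idf of zero, so the terms that distinguish one document from the rest rise to the top even when they are less frequent.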

A Method for Short Text Classification using SNS Feature Information based on Markov Logic Networks (SNS 특징정보를 활용한 마르코프 논리 네트워크 기반의 단문 텍스트 분류 방법)

  • Lee, Eunji;Kim, Pankoo
    • Journal of Korea Multimedia Society / v.20 no.7 / pp.1065-1072 / 2017
  • As smart devices and social network services (SNSs) become increasingly pervasive, individuals produce large amounts of data in real time. Accordingly, studies on unstructured data analysis are actively being conducted to solve the resultant problem of information overload and to facilitate effective data processing. Many such studies address the filtering of inappropriate information. In this paper, a feature-weighting method that considers SNS-message features is proposed for the classification of short text messages generated on SNSs, using Markov logic networks for category inference. The performance of the proposed method is verified through a comparison with an existing frequency-based classification method.

The Effects of Consumers' Mask Selection Criteria on Mask Brand Awareness and Purchase Intention for Fashion Masks (마스크 선택기준이 브랜드 인지와 패션 마스크 구매의도에 미치는 영향)

  • Kim, Min Su;Lee, Ha Kyung;Kim, Hanna
    • Journal of the Korean Society of Clothing and Textiles / v.46 no.1 / pp.116-131 / 2022
  • This study used text mining to analyze big data to understand consumers' demand for and perceptions of fashion masks. Based on the text-mining analysis results, a survey was conducted with those living in Korea to investigate the influence of consumers' mask selection criteria on mask brand awareness and purchase intention for fashion masks. "Fashion mask" and "functional mask" were used as the keywords in a text-mining analysis, and an online survey of 242 respondents was conducted. The analysis results were as follows: First, the text-mining analysis extracted commonly appearing words that had a high frequency and TF-IDF, such as "COVID-19," "fashion," "celebrity," "antibacterial," and "filter." This confirmed that during the COVID-19 pandemic, consumers have demanded masks that are both functional and fashionable. Second, among consumers' mask selection criteria, trend and design had positive effects on face-mask brand awareness. Third, face-mask brand awareness had a positive effect on the purchase intention for both brand and fashion masks, and the purchase intention for brand masks had a positive effect on the purchase intention for fashion masks.

Text Mining Analysis on the Research Field of the Coastal and Ocean Engineering Based on the SCOPUS Bibliographic Information (해안해양공학 연구 분야의 SCOPUS 서지정보 Text Mining 분석)

  • Lee, Gi Seop;Cho, Hong Yeon;Han, Jae Rim
    • Journal of Korean Society of Coastal and Ocean Engineers / v.30 no.1 / pp.19-28 / 2018
  • Numerous research papers have accumulated with the development and computerization of bibliometrics, which has made it difficult to review all related papers published worldwide before conducting a study. However, with the development of natural language processing techniques, trend analysis of published research papers has become easier. In this study, text mining analysis using the statistical computing language R was carried out on the bibliographic information of the SCOPUS database in the field of coastal and ocean engineering. As expected, the term 'wave' predominates, and the terms 'numerical model', 'numerical simulation', and 'experimental study' confirm that numerical analysis and hydraulic experiments are still dominant. In addition, recent use of the term 'wave energy', related to marine energy, was observed. It was also quantitatively confirmed that connections between 'wave' and 'height' or 'energy' are the most frequent. The results suggest the possibility of higher-resolution analysis by detailed field and period in the future.

Development of big data based Skin Care Information System SCIS for skin condition diagnosis and management

  • Kim, Hyung-Hoon;Cho, Jeong-Ran
    • Journal of the Korea Society of Computer and Information / v.27 no.3 / pp.137-147 / 2022
  • Diagnosing and managing skin condition is a basic and important function for workers in the beauty and cosmetics industries. Accurate diagnosis and management require an understanding of customers' skin condition and needs. In this paper, we developed SCIS, a big data-based skin care information system that supports skin condition diagnosis and management using social media big data. With the developed system, core information for skin condition diagnosis and management can be analyzed and extracted from text information. SCIS consists of a big data collection stage, a text preprocessing stage, an image preprocessing stage, and a text word analysis stage. SCIS collected the big data necessary for skin diagnosis and management, and extracted keywords and topics from text information through simple frequency analysis, relative frequency analysis, co-occurrence analysis, and correlation analysis of keywords. In addition, by analyzing the extracted keywords and information and applying visualization techniques such as scatter plots, NetworkX, t-SNE, and clustering, the system can be used efficiently in diagnosing and managing skin conditions.
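
Two of the analyses named above, relative frequency and co-occurrence, can be sketched in a few lines. This is an illustrative sketch under simple assumptions, not the SCIS implementation: documents are taken to be pre-tokenized keyword lists, co-occurrence is counted per document, and the function names and sample posts are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents contain each unordered pair of keywords."""
    pair_counts = Counter()
    for keywords in documents:
        # sorted(set(...)) gives each pair one canonical order per document.
        for pair in combinations(sorted(set(keywords)), 2):
            pair_counts[pair] += 1
    return pair_counts


def relative_frequencies(documents):
    """Return each keyword's share of all keyword occurrences."""
    counts = Counter(word for keywords in documents for word in keywords)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Hypothetical keyword lists extracted from three social media posts.
posts = [
    ["dry", "skin", "moisturizer"],
    ["oily", "skin", "cleanser"],
    ["dry", "skin", "cream"],
]
pairs = cooccurrence_counts(posts)
shares = relative_frequencies(posts)
print(pairs.most_common(1))
print(shares["skin"])
```

The pair counts can serve as edge weights when the keywords are later drawn as a network, which is one way a co-occurrence table feeds a graph visualization.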

Analysis of Nursing Start-up Trends Using Text Network Analysis (텍스트 네트워크를 활용한 간호창업 연구동향 고찰)

  • Kim, Juhang
    • Journal of the Korea Convergence Society / v.11 no.1 / pp.359-367 / 2020
  • The purpose of this study is to explore text data on nursing start-ups. Fifty-five articles were retrieved from the MEDLINE, Embase, and Cochrane Library databases. Text network analysis was applied using a Python network program. Keywords with the highest frequency and degree centrality were 'business', 'care', 'nursing', 'healthcare', and 'service'. Keywords with the highest degree centrality were 'mission', 'vision', and 'team'. Based on the results, support for nursing entrepreneurship should be provided to develop competitive nursing services that reflect the specificity and science of nursing, to strengthen the business competencies essential for nursing entrepreneurship, to expand nursing expertise, and to present role models. The results will serve as a basis for developing systematic educational programs and theory for nursing start-ups.
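
Degree centrality on a keyword network, as mentioned above, can be computed as in the following sketch. The abstract does not say how the network was built, so this assumes that two keywords are linked when they occur in the same document; the function name and the sample keyword lists are hypothetical.

```python
from itertools import combinations


def degree_centrality(documents):
    """Degree centrality of each keyword in a co-occurrence network.

    Two keywords are linked if they appear in the same document; a node's
    centrality is its number of neighbours divided by (n - 1).
    """
    neighbours = {}
    for keywords in documents:
        for word in keywords:
            neighbours.setdefault(word, set())
        for a, b in combinations(set(keywords), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {word: 0.0 for word in neighbours}
    return {word: len(adj) / (n - 1) for word, adj in neighbours.items()}


# Hypothetical keyword lists from three abstracts.
abstracts = [
    ["nursing", "business", "care"],
    ["nursing", "service"],
    ["business", "healthcare"],
]
centrality = degree_centrality(abstracts)
for word, value in sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0])):
    print(word, value)
```

Dividing by n - 1 normalizes the score to the range 0 to 1, so keywords from networks of different sizes can be compared.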

A Trend Analysis and Policy proposal for the Work Permit System through Text Mining: Focusing on Text Mining and Social Network analysis (텍스트마이닝을 통한 고용허가제 트렌드 분석과 정책 제안 : 텍스트마이닝과 소셜네트워크 분석을 중심으로)

  • Ha, Jae-Been;Lee, Do-Eun
    • Journal of Convergence for Information Technology / v.11 no.9 / pp.17-27 / 2021
  • The aim of this research was to identify the issues surrounding the work permit system and public awareness of the system, and to suggest ideas for government policy on it. To this end, the research applied text mining to social data. Using Textom, it collected 1,453,272 texts from 6,217 online documents containing 'work permit system' published from January to December 2020, and performed text mining and social network analysis. From top-level keyword frequency analysis and degree centrality analysis, it extracted the 100 most frequently mentioned keywords and identified job problems, the importance of the policy process, industrial competitiveness, and the improvement of foreign workers' living conditions as the major keywords. In addition, semantic network analysis revealed a major awareness of 'employment policy' and various kinds of ambient awareness, such as 'international cooperation', 'workers' human rights', 'law', 'recruitment of foreigners', 'corporate competitiveness', 'immigrant culture', and 'foreign workforce management'. Finally, the research suggested ideas worth considering when establishing government policy on the work permit system and conducting related research.