• Title/Summary/Keyword: TextRank

Understanding the Food Hygiene of Cruise through the Big Data Analytics using the Web Crawling and Text Mining

  • Shuting, Tao;Kang, Byongnam;Kim, Hak-Seon
    • Culinary science and hospitality research / v.24 no.2 / pp.34-43 / 2018
  • The objective of this study was to obtain a general, text-based understanding of cruise food hygiene through big data analytics. To this end, the study collected Google web pages and news items matching the keywords "food hygiene, cruise" from October 1, 2015 to October 1, 2017 (two years). Data were collected and processed with SCTM, a data collection and processing program, yielding 899 KB of text (approximately 20,000 words). The data were analyzed with UCINET 6.0 and its bundled visualization tool, Netdraw. Words such as "jobs" and "news" showed high frequency, whereas the centrality measures (Freeman's degree centrality and eigenvector centrality) and proximity produced rankings distinct from the frequency ranking. A CONCOR analysis yielded four segments: a "food hygiene group", a "person group", a "location-related group", and a "brand group". This big-data diagnosis of food hygiene in the cruise industry is expected to offer useful implications for both academic research and practical application.
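Degree and eigenvector centrality, the two measures named above, can be sketched on a small word co-occurrence network. This is a minimal stdlib Python illustration, not the UCINET computation used in the study; the helper names and the toy edge list are hypothetical. Eigenvector scores are obtained by power iteration on A + I, a standard shift that has the same leading eigenvector as A but keeps the iteration from oscillating on bipartite graphs such as stars.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (word, word) co-occurrence pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return dict(adj)


def degree_centrality(adj):
    """Freeman's normalized degree centrality: degree / (n - 1)."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) if n > 1 else 0.0 for v, nbrs in adj.items()}


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (A + I), L2-normalized each step (assumes a connected graph)."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Each word's new score: its own score plus the sum of its neighbours' scores.
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = sum(s * s for s in new.values()) ** 0.5
        new = {v: s / norm for v, s in new.items()}
        converged = max(abs(new[v] - x[v]) for v in adj) < tol
        x = new
        if converged:
            break
    return x


adj = build_graph([("food", "hygiene"), ("food", "cruise"), ("hygiene", "cruise"),
                   ("cruise", "ship"), ("cruise", "news")])
print(degree_centrality(adj)["cruise"])  # 1.0: linked to every other word
```

Unlike degree, eigenvector centrality rewards a word for being linked to other well-connected words, which is one reason the two rankings can differ from each other and from raw frequency.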

Association Modeling on Keyword and Abstract Data in Korean Port Research

  • Yoon, Hee-Young;Kwak, Il-Youp
    • Journal of Korea Trade / v.24 no.5 / pp.71-86 / 2020
  • Purpose - This study investigates research trends by searching the English keywords and abstracts of 1,511 Korean journal articles published in 2002-2019 and indexed in the Korea Citation Index, using the term "Port." The study aims to lay the foundation for a more balanced development of port research. Design/methodology - Using abstract and keyword data, we perform frequency analysis and word embedding (Word2vec). A t-SNE plot shows the main keywords extracted with the TextRank algorithm. To analyze which words were used in which context in our two nine-year subperiods (2002-2010 and 2010-2019), we use Scattertext and scaled F-scores. Findings - First, over the 18-year study period, port research developed through the convergence of diverse academic fields, covering 102 subject areas and 219 journals. Second, our frequency analysis of 4,431 keywords in 1,511 papers shows that keywords such as "Port" (60 times), "Port Competitiveness" (33 times), and "Port Authority" (29 times) draw the most research attention. Third, a word embedding analysis identifies the words most highly correlated with the top eight keywords and visually shows four distinct subject clusters in a t-SNE plot. Fourth, we use Scattertext to compare the words used in the two subperiods. Originality/value - This study is the first to apply abstract and keyword analysis and various text mining techniques to Korean journal articles in port research, and thus has important implications. Further in-depth studies should collect a greater variety of textual data and analyze and compare port studies from different countries.
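TextRank keyword extraction, as introduced by Mihalcea and Tarau, ranks words with PageRank over a word co-occurrence graph. The sketch below assumes the tokens have already been filtered (for example to nouns and adjectives); the window size, the damping factor of 0.85 and the iteration count are common defaults rather than the settings used in this study, and `textrank_keywords` is a hypothetical helper.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank candidate words with PageRank over an undirected co-occurrence graph."""
    # Link words that occur within `window` tokens of each other (window=2: adjacent words).
    graph = defaultdict(set)
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            if tokens[j] != w:
                graph[w].add(tokens[j])
                graph[tokens[j]].add(w)
    # Iterate WS(v) = (1 - d) + d * sum(WS(u) / deg(u)) over neighbours u of v.
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[u] / len(graph[u]) for u in graph[w])
            for w in graph
        }
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_k]


tokens = "port competitiveness port authority port logistics port authority".split()
print(textrank_keywords(tokens))  # "port" is ranked first
```

In the original TextRank post-processing, top-ranked words that are adjacent in the text are then merged into multi-word keywords such as "Port Authority".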

Method of Extracting the Topic Sentence Considering Sentence Importance based on ELMo Embedding (ELMo 임베딩 기반 문장 중요도를 고려한 중심 문장 추출 방법)

  • Kim, Eun Hee;Lim, Myung Jin;Shin, Ju Hyun
    • Smart Media Journal / v.10 no.1 / pp.39-46 / 2021
  • This study presents a method for extracting a summary from a news article that takes into account the importance of each sentence in the article. We propose calculating sentence importance from the features that affect it: the topic-sentence probability, the similarity to the article title and to the other sentences, and the sentence position. We hypothesize that topic sentences have characteristics distinct from general sentences, and train a deep learning-based classification model to obtain a topic-sentence probability for each input sentence. In addition, using the pre-trained ELMo language model, the similarity between sentences is calculated from sentence vectors that reflect contextual information and is extracted as a sentence feature. For topic sentence classification, the LSTM and BERT models achieved high performance: 93% accuracy, 96.22% recall, and 89.5% precision. Calculating the importance of each sentence by combining the extracted features improved topic sentence extraction performance by about 10% compared with the existing TextRank algorithm.
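The TextRank baseline used for comparison ranks sentences instead of words: sentences become graph nodes, edges carry a word-overlap similarity, and weighted PageRank scores each sentence. The following is a rough sketch of that baseline only, not the proposed ELMo-based method; the regex tokenizer and the helper names are illustrative assumptions, while the similarity follows the overlap / (log|Si| + log|Sj|) formula from the original TextRank paper.

```python
import math
import re


def _words(sentence):
    return re.findall(r"\w+", sentence.lower())


def similarity(a, b):
    """Shared distinct words, normalized by the log lengths of both sentences."""
    wa, wb = _words(a), _words(b)
    denom = math.log(len(wa)) + math.log(len(wb)) if wa and wb else 0.0
    if denom <= 0:
        return 0.0
    return len(set(wa) & set(wb)) / denom


def textrank_sentences(sentences, damping=0.85, iterations=50, top_k=1):
    """Weighted PageRank over the sentence similarity graph; returns top sentences in text order."""
    n = len(sentences)
    w = [[similarity(sentences[i], sentences[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total outgoing edge weight of each sentence
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(w[j][i] / out[j] * scores[j]
                                          for j in range(n) if out[j] > 0)
            for i in range(n)
        ]
    order = sorted(range(n), key=lambda i: -scores[i])
    return [sentences[i] for i in sorted(order[:top_k])]
```

A sentence that shares no words with the rest of the article keeps only the base score (1 - d), so it is never chosen ahead of connected sentences.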

Analysis of User Reviews of Running Applications Using Text Mining: Focusing on Nike Run Club and Runkeeper (텍스트마이닝을 활용한 러닝 어플리케이션 사용자 리뷰 분석: Nike Run Club과 Runkeeper를 중심으로)

  • Gimun Ryu;Ilgwang Kim
    • Journal of Industrial Convergence / v.22 no.4 / pp.11-19 / 2024
  • The purpose of this study was to analyze user reviews of running applications using text mining. The analysis data were user reviews of Nike Run Club and Runkeeper on the Google Play Store, collected with the Selenium package for Python 3, and morphemes were separated with the OKT analyzer so that only Korean nouns were retained. After morpheme separation, we created a rankNL dictionary to remove stopwords. The data were analyzed with TF, TF-IDF, and LDA topic modeling. The results are as follows. First, 'record', 'app', and 'workout' were identified as the top keywords in the user reviews of both Nike Run Club and Runkeeper, and the rankings differed between TF and TF-IDF. Second, LDA topic modeling identified the topics 'basic items', 'additional features', 'errors', and 'location-based data' for Nike Run Club, and 'errors', 'voice function', 'running data', 'benefits', and 'motivation' for Runkeeper. Based on these results, we recommend fixing errors and making improvements to strengthen the competitiveness of the applications.
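TF and TF-IDF rankings can diverge because IDF discounts words that occur in most reviews. This is a minimal sketch, assuming each review has already been reduced to a list of nouns (the Selenium collection and OKT steps are not reproduced) and using the plain idf = log(N / df) variant; the paper does not state its exact weighting, and `tf_and_tfidf` is a hypothetical helper.

```python
import math
from collections import Counter


def tf_and_tfidf(docs, stopwords=frozenset()):
    """docs: list of token lists, one per review (e.g. nouns from a morphological analyzer)."""
    docs = [[t for t in doc if t not in stopwords] for doc in docs]
    tf = Counter(t for doc in docs for t in doc)        # corpus-wide term frequency
    df = Counter(t for doc in docs for t in set(doc))   # number of reviews containing each term
    n = len(docs)
    # Summing tf(t, d) * idf(t) over all reviews equals tf(t) * idf(t).
    tfidf = {t: tf[t] * math.log(n / df[t]) for t in tf}
    return tf, tfidf


reviews = [["record", "app", "error"], ["record", "workout"], ["record", "app", "voice"]]
tf, tfidf = tf_and_tfidf(reviews)
print(tf.most_common(2))
print(sorted(tfidf, key=tfidf.get, reverse=True)[:2])
```

With this variant, a word present in every review gets an IDF of zero and drops to the bottom of the TF-IDF ranking even if it tops the TF ranking.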

Hybrid Method using Frame Selection and Weighting Model Rank to improve Performance of Real-time Text-Independent Speaker Recognition System based on GMM (GMM 기반 실시간 문맥독립화자식별시스템의 성능향상을 위한 프레임선택 및 가중치를 이용한 Hybrid 방법)

  • 김민정;석수영;김광수;정호열;정현열
    • Journal of Korea Multimedia Society / v.5 no.5 / pp.512-522 / 2002
  • In this paper, we propose a hybrid method that combines frame selection with a weighting model rank method, based on a GMM (Gaussian mixture model), for a real-time text-independent speaker recognition system. In the system, maximum likelihood estimation is used to optimize the GMM parameters, and recognition is basically performed by maximum likelihood. The proposed hybrid method has two steps. First, a likelihood score is calculated at the frame level for each speaker model against the test data, and the difference between the largest and the second-largest likelihood is computed; a frame is selected if this difference exceeds a threshold. Second, for each selected frame, a weighting value is used instead of the calculated likelihood to compute the total score. Cepstral coefficients and regression coefficients were used as feature parameters. The training and test database consists of data collected at different times, and the experimental data were selected randomly. In the experiments, each method was applied to the baseline system and tested. In the speaker recognition experiments, the proposed hybrid method achieved on average 4% higher recognition accuracy than the frame selection method and 1% higher than the W method, indicating its effectiveness.
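The two steps of the hybrid method can be sketched as follows, assuming the per-frame log-likelihood of each speaker's GMM has already been computed (GMM training and cepstral feature extraction are not shown). The rank weights are hypothetical placeholders: the abstract does not say which weighting values the authors used, and `identify_speaker` is an illustrative helper.

```python
def identify_speaker(frame_loglik, threshold, rank_weights):
    """
    frame_loglik: list of {speaker: log-likelihood} dicts, one per frame
                  (assumed to come from each speaker's GMM).
    rank_weights: weight for the speaker ranked 1st, 2nd, ... in a frame (hypothetical).
    Returns the speaker with the highest total weight, or None if no frame is selected.
    """
    totals = {}
    for scores in frame_loglik:
        ranked = sorted(scores, key=scores.get, reverse=True)
        # Step 1, frame selection: keep the frame only if the best model
        # beats the runner-up by more than the threshold.
        if len(ranked) > 1 and scores[ranked[0]] - scores[ranked[1]] <= threshold:
            continue
        # Step 2, weighting model rank: add a rank-based weight instead of the raw likelihood.
        for rank, speaker in enumerate(ranked):
            weight = rank_weights[rank] if rank < len(rank_weights) else 0.0
            totals[speaker] = totals.get(speaker, 0.0) + weight
    if not totals:
        return None
    return max(totals, key=totals.get)


frames = [{"A": -10.0, "B": -10.1}, {"A": -20.0, "B": -12.0},
          {"A": -15.0, "B": -11.0}, {"A": -5.0, "B": -30.0}]
print(identify_speaker(frames, threshold=1.0, rank_weights=[1.0, 0.0]))  # B
```

On these made-up frames, summing raw log-likelihoods would favour speaker A because of one extreme frame, while the rank-weighted vote over selected frames favours B; this illustrates how the hybrid score limits the influence of individual frames, not which answer is correct.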

Food Preference and Nutrient Intake Status of High School Students in Rural Area of Korea (농촌 청소년의 식품 기호도와 영양 섭취 실태와의 관계)

  • Lee, Gun-Soon;Yoo, Young-Sang
    • Journal of the East Asian Society of Dietary Life / v.7 no.2 / pp.199-210 / 1997
  • The purpose of this study was to investigate the relationship between food preference and nutrient intake status of high school students according to their personal characteristics: sex, age, family type, family size, mother's age, occupation, and school career. A total of 439 students were selected by random stratified cluster sampling. A self-administered questionnaire and a 24-hour recall method over 5 days were used as instruments. The data were analyzed with frequencies, percentages, and the nonparametric Wilcoxon rank-sum test, Kruskal-Wallis test, $\chi^2$ test on contingency tables, and Spearman's correlation coefficient. The main results are as follows. 1. The association between sex and the set of characteristics comprising mother's age, school career, and income is highly significant; however, there is no significant difference for occupation or family type. 2. The relationship between main-dish preference and nutrient intake shows a significant difference except for noodles; main-dish preference is directly proportional to nutrient intake except for fat, vitamin A, and vitamin C. 3. Preference for animal foods is directly proportional to the intake of energy, protein, fat, fiber, phosphorus, iron, vitamin $B_{1}$, vitamin $B_{2}$, and niacin. 4. Preference for vegetable foods has some influence on nutrient intake: preference for soup is insignificant, preference for kimchi is inversely proportional, and preference for vegetables is directly proportional to nutrient intake. 5. Preference for snacks is directly proportional to the intake of all nutrients except vitamin A and vitamin C.
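Several of the nonparametric methods listed above work on ranks rather than raw values, which matters for preference scales with many tied answers. The sketch below (plain Python, hypothetical helper names) shows tie-aware average ranking, Spearman's rho computed as the Pearson correlation of the rank vectors, and the Wilcoxon rank-sum statistic W; p-values are deliberately left out, and in practice a statistics package would supply them.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks (handles ties)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("constant input has no rank correlation")
    return cov / (sx * sy)


def rank_sum(group1, group2):
    """Wilcoxon rank-sum statistic W: sum of group1's ranks in the pooled sample."""
    ranks = average_ranks(list(group1) + list(group2))
    return sum(ranks[: len(group1)])
```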

Research Trends of Articles Published in the Journal of Korean Clinical Nursing Research from 2000 to 2017: Text Network Analysis of Keywords (텍스트 네크워크 분석을 이용한 임상간호연구 게재논문의 연구동향 분석: 2000년부터 2017년까지)

  • Kim, Yeon Hee;Moon, Seong Mi;Kwon, In Gak;Kim, Kwang Sung;Jeong, Geum Hee;Shin, Eun Suk;Oh, Hyang Soon;Kim, Soo Hyun
    • Journal of Korean Clinical Nursing Research / v.25 no.1 / pp.80-90 / 2019
  • Purpose: The aim of this study was to identify the research trends of articles published in the Journal of Korean Clinical Nursing Research from 2000 to 2017 through a text network analysis of keywords. Methods: This study analyzed 600 articles. R was used for text mining to extract keyword frequency, centrality rank, and the keyword network. Results: From 2000 to 2009, high-frequency keywords included 'nurse', 'pain', 'anxiety', 'knowledge', and 'attitude'. 'Pain', 'nurse', and 'knowledge' showed high centrality. 'Fatigue' did not have a high frequency but showed high centrality. From 2010 to 2017, keywords such as 'nurse', 'knowledge', and 'pain' again showed high frequency and centrality, and 'hemodialysis' and 'intensive care unit' joined the keywords with high frequency and centrality. Conclusion: The frequency and centrality of keywords such as 'nurse', 'pain', 'knowledge', 'hemodialysis', and 'intensive care unit' reflect the research trends in clinical nursing between 2000 and 2017. Further studies should expand the keyword networks by connecting the main keywords.
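A keyword network of this kind can be built by linking keywords that appear in the same article. Comparing each keyword's article frequency with its normalized degree then shows how a keyword like 'fatigue' can be central without being frequent. This is an assumed construction for illustration (the study's R pipeline and exact centrality measure are not specified); `keyword_network` and the toy keyword lists are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def keyword_network(articles):
    """articles: list of keyword lists, one per article. Returns (frequency, degree centrality)."""
    # Frequency = number of articles that list the keyword.
    freq = Counter(k for kws in articles for k in set(kws))
    neighbours = defaultdict(set)
    for kws in articles:
        # Every pair of keywords listed in the same article becomes an undirected edge.
        for a, b in combinations(sorted(set(kws)), 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(freq)
    degree = {k: len(neighbours[k]) / (n - 1) if n > 1 else 0.0 for k in freq}
    return freq, degree


articles = [["nurse", "pain"], ["nurse", "pain"], ["nurse", "knowledge"],
            ["fatigue", "anxiety", "pain", "knowledge"]]
freq, degree = keyword_network(articles)
print(freq["fatigue"], freq["nurse"])      # 1 3
print(degree["fatigue"], degree["nurse"])  # 0.75 0.5
```

Here 'fatigue' appears in only one article, but that article connects it to three other keywords, so its degree exceeds that of the more frequent 'nurse'.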

Relevant Image Retrieval of Korean Documents based on Sentence and Word Importance (문장 및 단어 중요도를 통한 한국어 문서 연관 이미지 검색)

  • Kim, Nam-Gyu;Kang, Shin-Jae
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.3 / pp.43-48 / 2019
  • When reading text-only documents and encountering unknown words, readers lose focus and cannot understand the content. Because children have limited experience, they find it difficult to understand a passage correctly when its description is unfamiliar or ambiguous in context. In this paper, to help readers understand the text and to increase their interest, we analyze the text of a document, select the content considered important, and implement a system that automatically retrieves the most relevant images from the web and links them to the text. The system divides an article into paragraphs, analyzes the text, selects the important sentences in each paragraph and the words that best represent the meaning of those sentences, searches the web for images related to these words, and links the images to the corresponding paragraphs. The experiments examined methods for selecting important sentences and for selecting important words within those sentences. When the relevance between the three selected images and the corresponding important sentences was evaluated, the system achieved 60% accuracy.
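The per-paragraph pipeline described above can be sketched as follows. The scoring is a hypothetical stand-in, since the abstract does not define sentence or word importance: here a sentence's importance is the mean document frequency of its non-stopword words, and the query word is the most frequent word in the chosen sentence. The web image search and the linking step are omitted, and `image_queries` is an illustrative helper.

```python
import re
from collections import Counter


def image_queries(text, stopwords=frozenset()):
    """Return (paragraph index, important sentence, query word) for each paragraph."""

    def words(s):
        return [w for w in re.findall(r"\w+", s.lower()) if w not in stopwords]

    # Paragraphs are separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Document-level word frequency serves as the (hypothetical) importance signal.
    freq = Counter(w for p in paragraphs for w in words(p))
    results = []
    for idx, para in enumerate(paragraphs):
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", para.strip()) if words(s)]
        if not sentences:
            continue
        # Important sentence: highest mean document frequency of its words.
        best = max(sentences, key=lambda s: sum(freq[w] for w in words(s)) / len(words(s)))
        # Important word: most frequent word of that sentence, used as the image-search query.
        query = max(words(best), key=lambda w: freq[w])
        results.append((idx, best, query))
    return results


text = "Whales are large mammals. Whales breathe air.\n\nThe ocean is deep. Whales live in the ocean."
for row in image_queries(text, stopwords={"are", "the", "is", "in"}):
    print(row)
```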

Query Expansion based on Word Graph using Term Proximity (질의 어휘와의 근접도를 반영한 단어 그래프 기반 질의 확장)

  • Jang, Kye-Hun;Lee, Kyung-Soon
    • The KIPS Transactions:PartB / v.19B no.1 / pp.37-42 / 2012
  • Pseudo-relevance feedback assumes that frequent words in the top-ranked documents are related to the initial query. However, the main drawback of the term frequency method is that it assumes feature independence and disregards any dependencies that may exist between words in the text. In this paper, we propose query expansion based on a word graph that uses term proximity, supplementing the term frequency method. On the TREC WT10g test collection, the proposed method achieved a 6.4% improvement in MAP (mean average precision) over the language model.
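One way to let term proximity supplement raw frequency is to score each candidate expansion term by how close it occurs to the query terms in the pseudo-relevant documents. The 1/distance weight, the window size and `expansion_terms` are assumptions for illustration; this is not the word-graph formulation of the paper.

```python
from collections import defaultdict


def expansion_terms(query, feedback_docs, window=5, top_k=3):
    """Rank non-query terms in top-ranked documents by proximity-weighted co-occurrence."""
    query = set(query)
    scores = defaultdict(float)
    for doc in feedback_docs:
        q_positions = [i for i, tok in enumerate(doc) if tok in query]
        for i, tok in enumerate(doc):
            if tok in query:
                continue
            for j in q_positions:
                dist = abs(i - j)  # never 0, because tok is not a query term
                if dist <= window:
                    # Closer co-occurrences count more (hypothetical 1/distance weight).
                    scores[tok] += 1.0 / dist
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:top_k]


doc = ["cruise", "ship", "hygiene", "inspection"] + ["weather"] * 10
print(expansion_terms(["cruise", "hygiene"], [doc], window=2))
```

Occurrences outside the window contribute nothing, so a term that is frequent but never near a query term is not proposed, unlike under plain term frequency; in the example, "weather" occurs ten times yet ranks last.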

A Study on the Signification of 'The Medicalization of Aging' in TV Health Programs: A Text Analysis of Focus on the 'Vitamin' in KBS (TV 건강프로그램의 '노화의 의료화' 의미화 방식: KBS <비타민>의 텍스트 분석을 중심으로)

  • Kim, Ju-Mi;Han, Hye-Kyoung
    • Korean journal of communication and information / v.61 / pp.159-179 / 2013
  • This study examines the criteria for and signification of 'aging' as constructed by the media in Korean society, which has become an aging society. To this end, it analyzed KBS <Vitamin>, a representative TV health program. The program designs measurable indexes of aging to rank its cast members, and it stresses that cast members who cannot reach a certain level need support from medical experts or advanced medical technology. From these characteristics of the individual text, this paper identifies the ideological codes of health programs. They contrast elderly people who have achieved successful aging with those who have not. They define older people who have not properly practiced self-management or medical control to prevent aging as failures and also make fun of them. They draw aging, which was not regarded as a kind of disease in the past, into the domain of medicine. Moreover, the medicalization of aging, which makes aging an object of treatment, may strengthen the control of medical experts and individualize social issues.
