• Title/Abstract/Keywords: Text frequency analysis


A Performance Analysis Based on Hadoop Application's Characteristics in Cloud Computing (클라우드 컴퓨팅에서 Hadoop 애플리케이션 특성에 따른 성능 분석)

  • 금태훈;이원주;전창호
    • 한국컴퓨터정보학회논문지 / Vol. 15, No. 5 / pp. 49-56 / 2010
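  • This paper builds a Hadoop-based cluster for cloud computing and evaluates cluster performance according to application characteristics by running the RandomTextWriter, WordCount, and PI applications. RandomTextWriter generates random words up to a given size and stores them in HDFS; WordCount reads an input file and computes word frequencies block by block; PI derives the value of pi using the Monte Carlo method. While running these applications, we measure execution time as the data block size and the number of data replicas increase. The simulations show that the execution time of RandomTextWriter increases in proportion to the number of data replicas, whereas WordCount and PI are not significantly affected by the number of replicas. WordCount achieved its best execution time with block sizes of 64 to 256 MB. We therefore show that developing a scheduling policy that takes these application characteristics into account could shorten application execution time and improve the performance of a cloud computing system.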
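The block-wise word counting that the abstract attributes to WordCount can be illustrated with a small map/reduce-style sketch in plain Python. This is not Hadoop code and not the authors' setup: the function names, the word-based `block_size`, and the in-memory "blocks" are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List


def split_into_blocks(text: str, block_size: int) -> List[str]:
    """Split text into chunks of block_size words (a stand-in for HDFS blocks)."""
    words = text.split()
    return [" ".join(words[i:i + block_size])
            for i in range(0, len(words), block_size)]


def map_block(block: str) -> Counter:
    """Map step: count word frequencies inside a single block."""
    return Counter(block.lower().split())


def reduce_counts(partials: Iterable[Counter]) -> Counter:
    """Reduce step: merge the per-block counts into one total."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


def word_count(text: str, block_size: int = 4) -> Counter:
    """Count words block by block; block_size must be a positive integer."""
    blocks = split_into_blocks(text, block_size)
    return reduce_counts(map_block(block) for block in blocks)
```

The total is the same for any positive `block_size`; in a real cluster the block size changes how the work is split across nodes, which is what the paper measures.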

Analysis of Dental Hygienist Job Recognition Using Text Mining

  • Kim, Bo-Ra;Ahn, Eunsuk;Hwang, Soo-Jeong;Jeong, Soon-Jeong;Kim, Sun-Mi;Han, Ji-Hyoung
    • 치위생과학회지 / Vol. 21, No. 1 / pp. 70-78 / 2021
  • Background: The aim of this study was to analyze the public demand for information about the job of dental hygienists by mining text data collected from the online Q&A section on an Internet portal site. Methods: Text data were collected from inquiries that were posted on the Naver Q&A section from January 2003 to July 2020 using "dental hygienist job recognition," "role recognition," "medical assistance," and "scaling" as search keywords. Text mining techniques were used to identify significant Korean words and their frequency of occurrence. In addition, the association between words was analyzed. Results: A total of 10,753 Korean words related to the job of dental hygienists were extracted from the text data. "Chi-lyo (treatment)," "chigwa (dental clinic)," "ske-illing (scaling)," "itmom (gum)," and "chia (tooth)" were the five most frequently used words. The words were classified into the following areas of the dental hygienist's job: periodontal disease treatment and prevention, medical assistance, patient care and consultation, and others. Among these areas, the number of words related to medical assistance was the largest, with sixty-six association rules found between the words, and "chi-lyo," "chigwa," and "ske-illing" as core words. Conclusion: The public demand for information about the job of dental hygienists was mainly related to "chi-lyo," "chigwa," and "ske-illing" as core words, demonstrating that scaling is recognized by the public as the job of a dental hygienist. However, the high demand for information related to treatment and medical assistance in the context of dental hygienists indicates that the job of dental hygienists is recognized by the public as being more focused on medical assistance than on preventive dental care that is provided with job autonomy.
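The abstract does not say how its association rules were mined, so the following is only a minimal sketch of what a rule between two words means, using the standard support and confidence measures. The function name, the thresholds, and the restriction to one-word-to-one-word rules are assumptions for illustration.

```python
from itertools import permutations
from typing import Iterable, List, Tuple


def pair_rules(transactions: Iterable[Iterable[str]],
               min_support: float = 0.5,
               min_confidence: float = 0.6) -> List[Tuple[str, str, float, float]]:
    """Return rules (a, b, support, confidence) meaning 'a implies b'.

    support    = share of posts containing both a and b
    confidence = support(a and b) / support(a)
    """
    sets = [set(t) for t in transactions]
    n = len(sets)
    items = sorted(set().union(*sets))

    def support(itemset: set) -> float:
        return sum(1 for s in sets if itemset <= s) / n

    rules = []
    for a, b in permutations(items, 2):
        s_ab = support({a, b})
        s_a = support({a})
        if s_ab >= min_support and s_a > 0:
            confidence = s_ab / s_a
            if confidence >= min_confidence:
                rules.append((a, b, s_ab, confidence))
    return rules
```

Each transaction here stands for the set of extracted words in one posted question. Full Apriori-style mining would also consider rules with more than one word on either side.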

A Study of Perception of Golfwear Using Big Data Analysis (빅데이터를 활용한 골프웨어에 관한 인식 연구)

  • 이아름;이진화
    • 한국의류산업학회지 / Vol. 20, No. 5 / pp. 533-547 / 2018
  • The objective of this study is to examine the perception of golfwear and related trends based on major keywords and associated words related to golfwear utilizing big data. For this study, the data was collected from blogs, Jisikin and Tips, news articles, and web cafés from two of the most commonly used search engines (Naver & Daum) containing the keywords 'Golfwear' and 'Golf clothes'. For data collection, frequency and matrix data were extracted through Textom, from January 1, 2016 to December 31, 2017. From the matrix created by Textom, Degree centrality, Closeness centrality, Betweenness centrality, and Eigenvector centrality were calculated and analyzed by utilizing Netminer 4.0. As a result of analysis, it was found that the keyword 'brand' showed the highest rank in web visibility followed by 'woman', 'size', 'man', 'fashion', 'sports', 'price', 'store', 'discount', 'equipment' in the top 10 frequency rankings. For centrality calculations, only the top 30 keywords were included because the density was extremely high due to high frequency of the co-occurring keywords. The results of centrality calculations showed that the keywords on top of the rankings were similar to the frequency of the raw data. When the frequency was adjusted by subtracting 100 and 500 words, it showed different results as the low-ranking keywords such as J. Lindberg in the frequency analysis ranked high along with changes in the rankings of all centrality calculations. Such findings of this study will provide a basis for marketing strategies and ways to increase awareness and web visibility for Golfwear brands.
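To make the step from a keyword co-occurrence matrix to degree centrality concrete, here is a minimal sketch in plain Python. It does not reproduce Textom or Netminer 4.0: the function names are hypothetical, and the network is assumed to be unweighted and undirected, which may differ from the tools' actual settings.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def cooccurrence_edges(docs: Iterable[Iterable[str]]) -> Set[Tuple[str, str]]:
    """Link two keywords if they appear together in at least one document."""
    edges = set()
    for doc in docs:
        for a, b in combinations(sorted(set(doc)), 2):
            edges.add((a, b))
    return edges


def degree_centrality(edges: Set[Tuple[str, str]]) -> Dict[str, float]:
    """Degree centrality = number of neighbours / (number of nodes - 1).

    Only keywords that occur in at least one edge are treated as nodes.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}
```

In this toy form a keyword that co-occurs with every other keyword scores 1.0. That saturation is in line with the abstract's remark that the density was extremely high among the co-occurring keywords.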

Analysis of domestic and foreign research trends of Tricholoma matsutake using text mining techniques

  • Choi, Ah Hyeon;Kang, Jun Won
    • 농업과학연구 / Vol. 48, No. 3 / pp. 505-514 / 2021
  • Among non-timber forest products, Tricholoma matsutake is a high value-added item. Many countries, including Korea, China, and Japan, are conducting research and technology development to increase artificial cultivation and productivity. However, the production of T. matsutake is on the decline due to global warming, abnormal temperatures, and pine tree pest problems. Therefore, it is necessary to identify trends in domestic and foreign research on T. matsutake and to pursue preemptive research and development to preserve the genetic resources of T. matsutake and increase its productivity. Based on the correlation between keywords in the high-frequency keywords, it was observed that microbial clusters of T. matsutake are mainly found in Korea. The main focus in China has been pharmacological studies on the ingredients of T. matsutake. The main focus in Japan has been on preserving the genetic diversity and species of T. matsutake. Thus, future domestic studies of T. matsutake will require pharmacological studies on the ingredients of T. matsutake and on its genetic diversity and species conservation. In addition, unlike China and Japan, genetic keywords did not appear in Korea at high frequency. Therefore, Korea will have to proceed with research using modern molecular biology techniques.

The Strategy of Wireless Power Transfer for Light Rail Transit By Core Technologies Analysis Based on Text Mining

  • Meng, Xiang-Yu;Han, Young-Jae;Eum, Soo-Min;Cho, Sung-Won
    • 한국컴퓨터정보학회논문지 / Vol. 23, No. 11 / pp. 193-201 / 2018
  • In this paper, we extracted relevant patent data and conducted statistical analysis to understand the technical development trend related to Wireless Power Transfer (WPT) for Light Rail Transit (LRT). Recently, with the development of WPT technologies, the LRT industry is concentrating on applying WPT to the power supply system of trains because of its advantages over wired counterparts, such as low maintenance cost and high stability. This technology is divided into three areas: wireless feeding and collecting technology, high-frequency power converter technology, and orbital and infrastructure technology. For each specific area, keywords in the patent documents were extracted by the TF-IDF method and analyzed by social network analysis. In the keyword network, the core words of each specific technology were extracted according to their degree centrality. Then, multi-word phrases were built to represent the concepts of the core technologies. Finally, based on the analysis results, development strategies for each specific technical area of WPT in the LRT field are provided.
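The abstract names TF-IDF as the keyword extraction method but does not state which weighting variant was used. The sketch below therefore assumes the textbook form, tf = count / document length and idf = ln(N / df); the function name and the pre-tokenized input are illustrative choices.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Compute TF-IDF scores for each tokenized document.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = ln(N / number of documents containing t)
    """
    n_docs = len(docs)
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document

    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

Under this variant a term that appears in every document gets a score of zero, so only terms that distinguish one patent document from the others rank highly.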

Research on Satisfaction Evaluation Based on Tourist Big Data

  • Guo, Hanwen;Liu, Ziyang;Jiao, Zeyu
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 1 / pp. 231-244 / 2022
  • With the improvement of people's living standards and the development of tourism, tourists have greater freedom in choosing destinations. Therefore, as an indicator of satisfaction with scenic spots, tourist comments are becoming increasingly prominent. This paper aims to compare and analyze the landscape image of the Five Great Mountains in China and provide specific strategies for their development. The online reviews of tourists on the Online Travel Agency (OTA) website about the Five Great Mountains from 2015 to 2018 are collected as research samples. The text analysis method and R language are used to analyze the content of the tourist reviews, while the high-frequency words in the word cloud are used for visual display. In addition, the entropy weight method is used to determine the index weight, and tourist satisfaction is evaluated to understand the weaknesses of those scenic spots. The results of the study show that, firstly, tourist satisfaction with the Five Great Mountains is basically consistent with their popularity. Secondly, weight analysis shows that tourists pay special attention to the landscape features and environmental health of the scenic area, so relevant departments should focus on building the landscape characteristics and improving the environmental health of the scenic area. At the same time, the accommodation and service management of the scenic spots cannot be ignored. Finally, according to the analysis results, suggestions are made on how to improve tourist satisfaction with the Five Great Mountains.
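The entropy weight method mentioned in the abstract can be sketched as follows. The study used R; this Python version follows the commonly described form of the method, not the authors' code. It assumes a matrix with at least two rows (for example, one per scenic spot) and non-negative indicator values with a positive sum in every column.

```python
import math
from typing import List


def entropy_weights(matrix: List[List[float]]) -> List[float]:
    """Return one weight per indicator (column) via the entropy weight method.

    p_ij = x_ij / sum_i x_ij
    e_j  = -(1 / ln m) * sum_i p_ij * ln p_ij   (0 * ln 0 treated as 0)
    w_j  = (1 - e_j) / sum_j (1 - e_j)
    """
    m = len(matrix)          # number of evaluated objects (rows), m >= 2
    n = len(matrix[0])       # number of indicators (columns)
    k = 1.0 / math.log(m)

    divergences = []
    for j in range(n):
        column = [row[j] for row in matrix]
        total = sum(column)  # assumed to be positive
        entropy = 0.0
        for x in column:
            if x > 0:
                p = x / total
                entropy -= k * p * math.log(p)
        divergences.append(1.0 - entropy)

    spread = sum(divergences)
    if spread <= 1e-12:
        # Every indicator is uniform across rows: fall back to equal weights.
        return [1.0 / n] * n
    return [d / spread for d in divergences]
```

An indicator whose values barely differ across rows has entropy close to 1 and receives a weight close to zero, while indicators that discriminate between rows receive more weight.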

A Study on Perceptions of Virtual Influencers through YouTube Comments -Focusing on Positive and Negative Emotional Responses Toward Character Design- (유튜브 댓글을 통해 살펴본 버추얼 인플루언서에 대한 인식 연구 -캐릭터 디자인에 대한 긍부정 감성 반응을 중심으로-)

  • 안효선;김지영
    • 한국의류학회지 / Vol. 47, No. 5 / pp. 873-890 / 2023
  • This study analyzed users' emotional responses to VI character design through YouTube comments. The researchers applied text-mining to analyze 116,375 comments, focusing on terms related to character design and characteristics of VI. Using the BERT model in sentiment analysis, we classified comments into extremely negative, negative, neutral, positive, or extremely positive sentiments. Next, we conducted a co-occurrence frequency analysis on comments with extremely negative and extremely positive responses to examine the semantic relationships between character design and emotional characteristic terms. We also performed a content analysis of comments about Miquela and Shudu to analyze the perception differences regarding the two character designs. The results indicate that form elements (e.g., voice, face, and skin) and behavioral elements (e.g., speaking, interviewing, and reacting) are vital in eliciting users' emotional responses. Notably, in the negative responses, users focused on the humanization aspect of voice and the authenticity aspect of behavior in speaking, interviewing, and reacting. Furthermore, we found differences in the character design elements and characteristics that users expect based on the VI's field of activity. As a result, this study suggests applications to character design to accommodate these variations.

Analyzing Architectural History Terminologies by Text Mining and Association Analysis (텍스트 마이닝과 연관 관계 분석을 이용한 건축역사 용어 분석)

  • 김민정;김철주
    • 디지털융복합연구 / Vol. 15, No. 1 / pp. 443-452 / 2017
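  • Architectural history, which is a branch of both architecture and history, deals with changes in architectural style but also requires a comprehensive examination of the background of each period, including its social, economic, cultural, and technological circumstances. The terms mainly used in architectural history therefore inevitably span many fields. In this study, we performed text mining and association analysis on literature related to architectural history to identify which terms are core to the field. First, we selected "건축역사연구", the only domestic journal in the field of architectural history, and derived the core terms that occur with high frequency among the terms used in the titles, keywords, and abstracts of the papers published to date. Next, we divided the literature by research area and analyzed the characteristics of the core terms. Finally, through association analysis, we analyzed and visualized the relationships among the core terms. Identifying the core terms of architectural history in this way should be useful for understanding the discussions in the field so far and its future direction.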

A Study on the Consumer's Perception of HiSeoul Fashion Show Using Big Data Analysis (빅데이터 분석을 활용한 하이서울패션쇼에 대한 소비자 인식 조사)

  • 한기향
    • 패션비즈니스 / Vol. 23, No. 5 / pp. 81-95 / 2019
  • The purpose of this study is to research consumers' perception of the HiSeoul fashion show, which is being used by new designers as a means of promotion, and to propose a strategy for revitalizing new designer brands. This was done in order to secure basic data from fashion consumers, to help guide marketing strategies and promote rising designers. In this research, the consumers' perception of HiSeoul fashion show was verified using text-mining, data refinement and word clouding that was undertaken by TEXTOM3.0. Also, semantic network analysis, CONCOR analysis and visualization of the analysis results were performed using Ucinet 6.0 and NetDraw. "HiSeoul fashion show" was used as the keyword for text-mining and data was collected from March 1, 2018 to April 30, 2019. Using frequency analysis, TF-IDF, and N-gram, it was also shown that consumers are aware of places where shows are held, such as DDP and Igansumun. It was also revealed that consumers recognize rising designer brands, designer's names, the names of guests attending the show and the photo times. This study is meaningful in that it not only confirmed consumers' interest in new designer brands participating in the HiSeoul Fashion Show through big data but also confirmed that it is available as a marketing strategy to boost brand sales. This study suggests using HiSeoul show room to induce consumer sales, or inviting guests that match the brand image to promote them on SNS on the day the show is held for a marketing strategy.
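Of the three techniques listed in the abstract (frequency analysis, TF-IDF, and N-gram), the N-gram step is simple to show in isolation. The sketch below counts adjacent token sequences in plain Python. It is not what TEXTOM3.0 does internally, and the function name and whitespace-tokenized input are assumptions.

```python
from collections import Counter
from typing import List, Tuple


def ngram_counts(tokens: List[str], n: int = 2) -> Counter:
    """Count every run of n adjacent tokens (bigrams when n = 2)."""
    grams: List[Tuple[str, ...]] = [
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    ]
    return Counter(grams)
```

For example, `ngram_counts("hiseoul fashion show at ddp".split(), 2).most_common(3)` lists the most frequent adjacent word pairs, which is how place names or designer names that span several words can surface as single units.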

Study on Text Analysis of the Liquefied Natural Gas Carriers Dock Specification for Development of the Ship Predictive Maintenance Model (선박예지정비모델 개발을 위한 LNG 선박 도크 수리 항목의 텍스트 분석 연구)

  • 황태민;윤익현;오정모
    • 해양환경안전학회지 / Vol. 27, No. 1 / pp. 60-66 / 2021
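  • The importance of maintenance, emphasized across many industries, has led each field to adopt a variety of maintenance strategies. The maritime industry has also seen corresponding changes in maintenance strategy, but at a slower pace than other industries, so in many cases the new strategies have not been applied in practice and previously used methods are retained. Ships in particular tend to use conventional maintenance strategies, and conditions at sea call for the development of new ones. The ship predictive maintenance model is a maintenance strategy that predicts when equipment will need maintenance so that action can be taken, reducing the maintenance-related risks a ship may face during a voyage. As part of the research toward developing a ship predictive maintenance model, this study analyzed the text data of LNG carrier dock specifications and interpreted the results on the basis of the original text. We aim to derive common combinations of maintenance items, discover the interdependencies among different pieces of equipment on board, and apply them to the ship predictive maintenance model to be developed.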