• Title/Summary/Keyword: word frequency (단어빈도)

Analysis of Changes in Restaurant Attributes According to the Spread of Infectious Diseases: Application of Text Mining Techniques (감염병 확산에 따른 레스토랑 선택속성 변화 분석: 텍스트마이닝 기법 적용)

  • Joonil Yoo;Eunji Lee;Chulmo Koo
    • Information Systems Review / v.25 no.4 / pp.89-112 / 2023
  • In March 2020, when COVID-19 was declared a pandemic, various quarantine measures were introduced, and the tourism and hospitality industries changed considerably as a result. In the restaurant industry in particular, quarantine guidelines such as non-face-to-face services and social distancing were implemented. For decades, research on restaurant attributes has emphasized three attributes: atmosphere, service quality, and food quality. Nevertheless, to the best of our knowledge, research on restaurant attributes that considers the COVID-19 situation remains insufficient. To address this gap, this study took an exploratory approach to classifying new restaurant attributes based on an understanding of these environmental changes. The unit of analysis was 31,115 online reviews registered on Naverplace for 475 general restaurants located in Euljiro, Seoul. We classified restaurant attributes by clustering the words in these reviews using TF-IDF and LDA topic modeling. The analysis derived 'prevention of infectious diseases' as a new restaurant attribute in the COVID-19 context, alongside atmosphere, service quality, and food quality. Academically, this study extends the restaurant-attribute literature by confirming the three attributes identified in prior research and presenting a new one. Practically, the results lead to recommendations for both restaurant operations and policy.
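The abstract names TF-IDF as the weighting step applied to review words. A minimal sketch of that weighting, assuming the reviews are already tokenized (the toy reviews are hypothetical, not study data) and using the common idf = log(N/df) variant, which may differ from the exact formula the study used:

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} map per tokenized document.

    tf = term count / document length; idf = log(N / df), where df is the
    number of documents that contain the term.
    """
    n_docs = len(docs)
    # Document frequency: each term counts at most once per document.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        if not doc:  # avoid dividing by zero for empty reviews
            weights.append({})
            continue
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenized reviews.
reviews = [
    ["mask", "distancing", "clean", "tasty"],
    ["tasty", "kind", "service"],
    ["mood", "tasty", "clean"],
]
for weights in tf_idf(reviews):
    # Print the two highest-weighted terms of each review.
    print(sorted(weights.items(), key=lambda kv: -kv[1])[:2])
```

Words that occur in every review, such as 'tasty' here, receive zero weight, which is why TF-IDF tends to surface attribute-specific terms that raw frequency would bury.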

Multifaceted Evaluation Methodology for AI Interview Candidates - Integration of Facial Recognition, Voice Analysis, and Natural Language Processing (AI면접 대상자에 대한 다면적 평가방법론 -얼굴인식, 음성분석, 자연어처리 영역의 융합)

  • Hyunwook Ji;Sangjin Lee;Seongmin Mun;Jaeyeol Lee;Dongeun Lee;Kyusang Lim
    • Proceedings of the Korean Society of Computer Information Conference / 2024.01a / pp.55-58 / 2024
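  • The adoption of AI interview systems by companies has recently been increasing, and the effectiveness of AI interviews is also widely debated. In this paper, we implement the evaluation of applicants during AI interviews in three domains (vision, voice, and natural language processing) in order to assess the appropriateness of a methodology that analyzes interview applicants from multiple angles. First, on the visual side, to recognize applicants' emotions, we used a convolutional neural network (CNN) to recognize six emotions from applicants' faces, and we derived, as a time series, whether the applicant was looking at the camera. Through this, we focused on analyzing the applicant's attitude toward the interview and, in particular, the emotions revealed in the face. Second, because visual cues alone are limited for understanding an interviewee's attitude, we converted the applicant's voice into frequencies to extract features and trained a Bidirectional LSTM to extract six emotions from the voice. Third, to understand the applicant's state through the contextual meaning of what they said, we converted speech to text using STT (Speech-to-Text) and analyzed the frequency of the words used to identify the applicant's language habits. In addition, we applied the KoBERT model for sentiment analysis of the applicant's statements, and we created and applied objective evaluation indicators to assess the applicant's personality, attitude, and understanding of the job. Regarding the appropriateness of the multifaceted AI interview evaluation system, we judge that the accuracy of the visual component was objectively demonstrated to a considerable extent. In voice-based emotion analysis, because interviewees do not display every type of emotion within the limited time and speak in a similar tone throughout, the frequencies representing particular emotions tended to be somewhat concentrated. Finally, for the natural language processing component, we concluded that there is a growing need for NLP models that can understand the overall context and nuance of an interviewee's statements, beyond speaking style and the frequency of specific words.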
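The third component analyzes word frequencies in STT transcripts to characterize an applicant's language habits. A minimal sketch, assuming a plain-text transcript and a hypothetical filler-word list (the study's actual word lists are not given); the `\w+` pattern matches Hangul as well as Latin letters, but real Korean transcripts would normally need morphological analysis to separate particles first:

```python
import re
from collections import Counter

# Hypothetical filler words; the study's actual word list is not given.
FILLERS = {"um", "uh", "er", "actually"}


def tokenize(transcript: str) -> list[str]:
    # Lower-case, then keep runs of word characters (\w also matches Hangul).
    return re.findall(r"\w+", transcript.lower())


def word_frequencies(transcript: str, top_n: int = 5) -> list[tuple[str, int]]:
    """Most frequent words in the transcript with their counts."""
    return Counter(tokenize(transcript)).most_common(top_n)


def filler_rate(transcript: str) -> float:
    """Share of tokens that are filler words, a rough 'language habit' signal."""
    words = tokenize(transcript)
    if not words:
        return 0.0
    return sum(w in FILLERS for w in words) / len(words)


text = "Um, I think, um, I actually enjoy teamwork."
print(word_frequencies(text, 2))
print(filler_rate(text))
```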

A Study on the Perception of Pit and Fissure Sealant using Unstructured Big Data (비정형 빅데이터를 이용한 치면열구전색(치아홈메우기)에 대한 인식분석)

  • Han-A Cho
    • Journal of Korean Dental Hygiene Science / v.6 no.2 / pp.101-114 / 2023
  • Background: This study aimed to explore the overall perception of pit and fissure sealants and to suggest ways to revitalize their currently stagnant use. Methods: To determine social perceptions of changes in the coverage policy for pit and fissure sealants, we divided the study timeframe into five periods: the first period (December 1, 2009 to November 30, 2010), the second period (December 1, 2010 to September 30, 2012), the third period (October 1, 2012 to May 5, 2013), the fourth period (May 6, 2013 to September 30, 2017), and the fifth period (October 1, 2017 to December 31, 2022). We used text mining, an unstructured big data analysis method. Keywords were collected and analyzed with Textom, and we conducted frequency analysis of the top 30 keywords, analysis of the structural features of the semantic network, centrality analysis, QAP correlation analysis, and co-occurrence analysis. Results: The frequency analysis showed that the top keywords in every period were 'Cavities', 'Treatment', and 'Children'. In the structural features of the semantic networks by period, the density index was around 1.00 for all periods. The QAP correlation analysis showed the highest correlation between the first and second periods and between the fourth and fifth periods, with a correlation coefficient of 0.834. The co-occurrence analysis showed that 'cavities' and 'prevention' were the top two words across all periods. Conclusion: This study showed that pit and fissure sealants are well accepted by society as a preventive treatment for caries. However, awareness of health education related to these sealants was low. Efforts to revitalize the stagnant use of pit and fissure sealants need to be strengthened through effective education.
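QAP (Quadratic Assignment Procedure) correlation, used above to compare the semantic networks of different periods, correlates the off-diagonal cells of two keyword matrices and judges significance by permuting the rows and columns of one matrix together. A minimal sketch with hypothetical 4-keyword matrices; the tool used in the study may compute the statistic and p-value differently:

```python
import math
import random


def off_diagonal(m: list[list[float]]) -> list[float]:
    """Flatten every off-diagonal cell of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(n) if i != j]


def pearson(x: list[float], y: list[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def qap_correlation(a, b, n_perm: int = 2000, seed: int = 0) -> tuple[float, float]:
    """Observed correlation of two keyword matrices and a one-sided p-value.

    Rows and columns of `b` are permuted with the same permutation, which keeps
    b's network structure intact while breaking its alignment with `a`.
    """
    rng = random.Random(seed)
    x = off_diagonal(a)
    observed = pearson(x, off_diagonal(b))
    n = len(b)
    hits = 0
    for _ in range(n_perm):
        p = list(range(n))
        rng.shuffle(p)
        permuted = [[b[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if pearson(x, off_diagonal(permuted)) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical co-occurrence matrices for the same 4 keywords in two periods.
period1 = [[0, 5, 3, 1], [5, 0, 4, 2], [3, 4, 0, 1], [1, 2, 1, 0]]
period2 = [[0, 6, 3, 1], [6, 0, 5, 1], [3, 5, 0, 2], [1, 1, 2, 0]]
r, p_value = qap_correlation(period1, period2)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```

Permuting rows and columns jointly, rather than shuffling individual cells, is what distinguishes QAP from an ordinary correlation test: it respects the dependence between cells that share a keyword.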

COMPARATIVE STUDY UPON THE CHARACTERISTICS OF WRITING BETWEEN THE PATIENTS WITH WRITING DISABILITIES AND NORMAL ELEMENTARY SCHOOL STUDENTS (쓰기 장애 환자와 정상 초등학교 학생의 쓰기 특성 비교)

  • Cho, Soo-Churl;Shin, Sung-Woong
    • Journal of the Korean Academy of Child and Adolescent Psychiatry / v.12 no.1 / pp.51-70 / 2001
  • Characteristics of handwriting were investigated and compared between patients with writing disabilities and normal elementary school pupils. In general, the patients' letters were significantly taller than those of normal children, and more sparsely distributed than those of controls. The distances between words were significantly reduced in the patients' writing, indicating that the patients had many more space-leaving problems than normal pupils. Differences in letter height between the patients and normal controls were significant across all grades. Letter height decreased with age, and the decline was steeper in normal girls (r=-0.45) than in girls with writing disabilities (r=-0.16). Sex differences were found in letter spacing: in the lower grades (grades 1 and 2), the distances between letters were significantly narrower in male patients than in normal boys; the differences were almost negligible in grades 3 through 5; and in sixth grade, letter spacing was significantly broader in normal boys than in male dysgraphics. In girls, letter spacing was significantly broader in the patients across all grades. These findings support the hypothesis that male and female writing are qualitatively different and that distinct mechanisms operate in dysgraphic boys and girls. Across all grades and both sexes, the spaces between words were significantly broader in the patients than in normal pupils, which suggests that space-leaving between words is important in Korean writing. Letter spacing and word spacing tended to decrease across grades, but in girls no correlation between letter spacing and grade was found. Correlation analyses revealed that letter height and letter spacing were mildly correlated (r=0.11-0.15), whereas letter spacing and word spacing were robustly correlated (r=0.99).
Phonological errors were found mostly in the final consonant (Jong-seong), especially in double final consonants (ㄳ, ㄵ, ㄶ, ㄺ, ㄻ, ㄼ, ㄾ, ㄿ, ㅀ, ㅄ), and in cases where sound values change through phoneme assimilation. Semantic errors were rare in both groups. Space-leaving errors were correlated with phonological errors and were more frequent in boys than in girls. In conclusion, there were significant differences between patients with writing disabilities and normal pupils in letter height, letter spacing, word spacing, and the frequencies of phonological and space-leaving errors. Writing characteristics changed across grades, and the developmental profiles differed somewhat quantitatively between the groups. The differences became obvious from the second and third grades.
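The phonological errors above cluster in syllables with double final consonants. Because precomposed Hangul syllables are encoded arithmetically in Unicode (code point = 0xAC00 + (initial × 21 + medial) × 28 + final), the final consonant can be recovered with a modulo operation. A minimal sketch that flags syllables whose final consonant is one of the double consonants listed in the abstract; the sample phrase is hypothetical:

```python
# Final consonants (jongseong) in Unicode order; index 0 means "no final consonant".
JONGSEONG = [
    "", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
    "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ",
]
# Double finals exactly as listed in the abstract (ㄽ is not listed there).
DOUBLE_FINALS = {"ㄳ", "ㄵ", "ㄶ", "ㄺ", "ㄻ", "ㄼ", "ㄾ", "ㄿ", "ㅀ", "ㅄ"}

HANGUL_FIRST, HANGUL_LAST = 0xAC00, 0xD7A3  # range of precomposed syllables


def final_consonant(ch: str) -> str:
    """Final consonant of a precomposed Hangul syllable, or '' if none/not Hangul."""
    code = ord(ch)
    if not HANGUL_FIRST <= code <= HANGUL_LAST:
        return ""
    return JONGSEONG[(code - HANGUL_FIRST) % 28]


def double_final_syllables(text: str) -> list[str]:
    """Syllables in `text` whose final consonant is a listed double consonant."""
    return [ch for ch in text if final_consonant(ch) in DOUBLE_FINALS]


print(double_final_syllables("닭이 값을 앉아서 본다"))
```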

Analyzing the discriminative characteristic of cover letters using text mining focused on Air Force applicants (텍스트 마이닝을 이용한 공군 부사관 지원자 자기소개서의 차별적 특성 분석)

  • Kwon, Hyeok;Kim, Wooju
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.75-94 / 2021
  • The low birth rate and the shortened military service period are raising concerns about selecting excellent military officers. The Republic of Korea became a low-birth-rate society in 1984 and an aged society in 2018, and is expected to become a super-aged society in 2025. In addition, the troop-oriented military is changing into a state-of-the-art weapons-oriented military, and the military service period was shortened in 2018 to ease the burden of military service on young people and to let them take up roles in society earlier. Some observe that the application rate for military officers is falling because of shrinking manpower resources and a preference for shortened mandatory service over service as an officer. This calls for further consideration of policies for securing excellent military officers. Most related studies have used social-science methodologies, whereas this study applies text mining, which is suited to analyzing large document collections. This study extracts words with discriminative characteristics from the cover letters of Republic of Korea Air Force non-commissioned officer applicants and analyzes their polarity toward pass or fail. The method consists of three steps. First, applications are divided into general and technical fields, and the words in the cover letters are ordered by the difference in their frequency ratio between the fields; the greater the difference in a word's proportion across application fields, the more 'discriminative' the word is defined to be. On this basis, we extract the top 50 discriminative words for the general field and the top 50 for the technical field. Second, the appropriate number of topics for the cover letters as a whole is determined with LDA using perplexity and coherence scores.
Based on this number of topics, we use LDA to generate topics and their probabilities and estimate which topic each discriminative word belongs to. Keyword indicators from the cover-letter questions are then used to set candidate label indices, and the index most appropriate to each topic's word distribution is assigned as that topic's label. Third, using L-LDA with pass and fail as the cover-letter labels, we generate topics and probabilities for each field under the pass and fail labels. From these, we extract only the discriminative words that belong to labeled topics, and for each such word we compute the difference between its probability under the pass label and its probability under the fail label. A positive value indicates a pass polarity, and a negative value a fail polarity. This study is the first to reflect the characteristics of cover letters written by Republic of Korea Air Force non-commissioned officer applicants rather than by applicants in the private sector. Moreover, by applying text mining to many documents instead of relying on surveys or interviews, the methodology reduces analysis time and increases reliability for the entire population. For this reason, the proposed methodology is also applicable to other kinds of document collections in the field of military personnel. This study shows that L-LDA is more suitable than LDA for extracting the discriminative characteristics of Republic of Korea Air Force non-commissioned officer applicants' cover letters, and it proposes a methodology that combines LDA and L-LDA.
Through this analysis of Republic of Korea Air Force non-commissioned officer acquisition, we aim to provide information useful for acquisition and promotional policies and to propose a methodology for research in the field of military manpower acquisition.
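The first and third steps above reduce to simple arithmetic on word proportions and probabilities. A minimal sketch, assuming tokenized cover letters and hypothetical toy data; the pass/fail probabilities passed to `polarity` are made-up numbers standing in for L-LDA topic-word output, which this sketch does not implement:

```python
from collections import Counter


def proportions(docs: list[list[str]]) -> dict[str, float]:
    """Share of all tokens in a group of documents taken by each word."""
    counts = Counter(w for doc in docs for w in doc)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def discriminative_words(general, technical, top_n: int = 50):
    """Step 1: rank words by the difference in their proportion between fields.

    Positive differences favour the general field, negative ones the technical
    field; a larger magnitude means 'more discriminative'.
    """
    pg, pt = proportions(general), proportions(technical)
    diff = {w: pg.get(w, 0.0) - pt.get(w, 0.0) for w in pg.keys() | pt.keys()}
    ranked = sorted(diff, key=diff.get, reverse=True)
    general_top = [w for w in ranked if diff[w] > 0][:top_n]
    technical_top = [w for w in reversed(ranked) if diff[w] < 0][:top_n]
    return general_top, technical_top


def polarity(p_pass: dict[str, float], p_fail: dict[str, float], words) -> dict[str, float]:
    """Step 3: P(word | pass) - P(word | fail); > 0 leans pass, < 0 leans fail."""
    return {w: p_pass.get(w, 0.0) - p_fail.get(w, 0.0) for w in words}


# Hypothetical tokenized cover letters.
general = [["teamwork", "leadership", "service"], ["leadership", "service"]]
technical = [["maintenance", "aircraft", "service"], ["aircraft", "certificate"]]
print(discriminative_words(general, technical, top_n=2))

# Hypothetical topic-word probabilities standing in for L-LDA output.
print(polarity({"aircraft": 0.05, "teamwork": 0.02},
               {"aircraft": 0.01, "teamwork": 0.04},
               ["aircraft", "teamwork"]))
```

Using proportions rather than raw counts keeps the comparison fair when one field has more or longer cover letters than the other.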

A Study on the Intellectual Structure of Metadata Research by Using Co-word Analysis (동시출현단어 분석에 기반한 메타데이터 분야의 지적구조에 관한 연구)

  • Choi, Ye-Jin;Chung, Yeon-Kyoung
    • Journal of the Korean Society for Information Management / v.33 no.3 / pp.63-83 / 2016
  • As information resources are increasingly produced in diverse media and forms, metadata, as a tool of information organization for describing those resources, has become increasingly important. This study aims to analyze and demonstrate the intellectual structure of the metadata field through co-word analysis. The data set was collected from journal articles indexed in the Web of Science Core Collection between January 1, 1998 and July 8, 2016. Bibliographic data for 727 journal articles was retrieved using a Topic search with the query word 'metadata'. Of these 727 articles, 410 with author keywords were selected, and after data preprocessing 1,137 author keywords were extracted. Finally, 37 keywords with a frequency greater than 6 were selected for analysis. Network analysis was conducted to demonstrate the intellectual structure of the metadata field. As a result, 2 domains and 9 clusters were derived, the intellectual relations among keywords in the metadata field were visualized, and keywords with high global and local centrality were identified. Six clusters from the cluster analysis were shown on a multidimensional scaling map, and a knowledge structure was proposed based on the correlations among the keywords. The results of this study are expected to help readers understand the intellectual structure of the metadata field through visualization and to guide new approaches in metadata-related research.
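Co-word analysis starts from a keyword co-occurrence matrix. A minimal sketch of that step, assuming each article is represented by its list of already-normalized author keywords and applying a "frequency greater than N" cutoff like the one described above; the article data are hypothetical:

```python
from collections import Counter
from itertools import combinations


def coword_matrix(keyword_lists: list[list[str]], min_freq: int):
    """Keep keywords with frequency > min_freq and count pairwise co-occurrence.

    Frequency is the number of articles a keyword appears in. Returns the sorted
    kept keywords and a symmetric matrix of co-occurrence counts.
    """
    freq = Counter(kw for kws in keyword_lists for kw in set(kws))
    kept = sorted(kw for kw, f in freq.items() if f > min_freq)
    index = {kw: i for i, kw in enumerate(kept)}
    matrix = [[0] * len(kept) for _ in kept]
    for kws in keyword_lists:
        present = sorted({kw for kw in kws if kw in index})
        # Every unordered pair of kept keywords in the same article co-occurs once.
        for a, b in combinations(present, 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return kept, matrix


# Hypothetical author-keyword lists; the study used a cutoff of "more than 6".
articles = [
    ["metadata", "Dublin Core", "ontology"],
    ["metadata", "ontology"],
    ["metadata", "linked data"],
    ["Dublin Core", "metadata"],
]
keywords, m = coword_matrix(articles, min_freq=1)
print(keywords)
print(m)
```

The resulting matrix is the input that network, cluster, and multidimensional scaling analyses then operate on.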

Analyzing the Sentence Structure for Automatic Identification of Metadata Elements based on the Logical Semantic Structure of Research Articles (연구 논문의 의미 구조 기반 메타데이터 항목의 자동 식별 처리를 위한 문장 구조 분석)

  • Song, Min-Sun
    • Journal of the Korean Society for Information Management / v.35 no.3 / pp.101-121 / 2018
  • This study proposes a method for analyzing sentence structure so that sentences in data corresponding to the logical semantic structure metadata of research papers can be automatically identified and processed as the appropriate metadata elements according to their composition. To this end, the structure of sentences corresponding to 'Research Objectives' and 'Research Outcomes' in the semantic structure metadata was analyzed in terms of the number of words, the types of link words, the role of frequently occurring words in the sentences, and the types of sentence endings. The number of words in the sentences was 38 for 'Research Objectives' and 212 for 'Research Outcomes'. Link word types in 'Research Objectives' occurred in the order Causality, Sequence, Equivalence, and In-other-words/Summary relations, whereas in 'Research Outcomes' the order was Causality, Equivalence, Sequence, and In-other-words/Summary relations. Target words such as '역할(Role)', '요인(Factor)', and '관계(Relation)' played similar roles in both the objective and outcome sections, but the role of '연구(Study)' differed slightly. Finally, the most frequent verb endings were '~고자' and '~였다' in 'Research Objectives', and '~었다', '~있다', and '~였다' in 'Research Outcomes'. This study is significant as fundamental research that can be used to automatically identify and input metadata elements reflecting the common logical semantics of research papers, in support of researchers' scholarly sensemaking.
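The word-count and sentence-ending features above can be approximated with simple string operations. A minimal sketch, assuming sentences are already separated and using plain substring matching on the last two eojeols instead of proper morphological analysis (a simplification, not the study's procedure); the example sentences are hypothetical:

```python
from collections import Counter

# Endings named in the abstract, checked in this order. Substring matching on the
# last two eojeols is a simplification of real morphological analysis.
ENDINGS = ["고자", "였다", "었다", "있다"]


def eojeol_counts(sentences: list[str]) -> list[int]:
    """Number of space-separated words (eojeols) in each sentence."""
    return [len(s.split()) for s in sentences]


def tally_endings(sentences: list[str]) -> Counter:
    """Count the first listed ending found in the last two eojeols of each sentence."""
    tally = Counter()
    for s in sentences:
        tail = " ".join(s.rstrip(" .").split()[-2:])
        for ending in ENDINGS:
            if ending in tail:
                tally[ending] += 1
                break
    return tally


objectives = ["본 연구는 문장 구조를 분석하고자 한다."]
outcomes = [
    "연구 결과 유사한 역할을 하였다.",
    "분석을 통해 결과를 얻었다.",
    "요인 간에 관계가 있다.",
]
print(eojeol_counts(objectives), tally_endings(objectives))
print(eojeol_counts(outcomes), tally_endings(outcomes))
```

Looking at the last two eojeols rather than only the last one lets a purpose ending such as '~고자' be caught even when it is followed by an auxiliary verb like '한다'.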

'Elderly image' Analysis Using Big Data and Social Networking Techniques (빅데이터와 사회연결망 기법을 이용한 '노인 이미지' 분석)

  • Han, Sun-Bo;Lee, Hyun-Sim
    • The Journal of the Korea Contents Association / v.16 no.11 / pp.253-263 / 2016
  • We analyzed the social issue of the 'image of the elderly' using big data and social network analysis. First, we analyzed the words extracted by text mining with 'elderly' as the input keyword. The analysis showed that, in media such as online cafes and blogs, which represent public trends, the image of the elderly was expressed most often with the word 'Senior'. Summarized with the ten most frequent words, the image of the elderly reads: "The elderly are 'Senior' people who are respected by society, who organize to earn money, to earn qualifications, and for their health, and who are 'Seniors' wishing to work healthily up to 100 years old." This study differs from existing analysis methods in that it analyzes the macro-level image of the elderly, including social discourse, by collecting a vast amount of data and analyzing it with social network techniques. Because the public expresses its perception of the elderly positively as 'Senior', the current direction of policy for the elderly can be judged desirable. At the same time, the analysis revealed the public's 'desire' to be recognized. Therefore, future policy for the elderly should enable older people to be perceived as a 'necessary existence' in society by taking on social roles. In addition, we propose policies for the elderly that reflect priorities such as job creation, welfare, and reducing alienation, so that older people can stay active and maintain their health.

Identifying potential buyers in the technology market using a semantic network analysis (시맨틱 네트워크 분석을 이용한 원천기술 분야의 잠재적 기술수요 발굴기법에 관한 연구)

  • Seo, Il Won;Chon, ChaeNam;Lee, Duk Hee
    • Journal of Technology Innovation / v.21 no.1 / pp.279-301 / 2013
  • This study demonstrates how social network analysis can be used to identify potential buyers in technology marketing, presenting both the methodology and empirical results. First, we derived the three most important 'seed' keywords from the 'technology description' sections of technologies generated by various R&D activities of South Korea's public research institutes in fundamental science fields. Second, some 3,000 words were collected from websites related to the three seed keywords. Next, three network matrices (one per seed keyword) were constructed. To explore the technology network structure, each network was analyzed by degree centrality and Euclidean distance. The network analysis suggested 100 potential demand-side companies and, after comparing the results derived from each network, identified seven companies common across the networks. The usefulness of the result was verified by examining the business areas on the firms' homepages: five of the seven firms were confirmed to be strongly relevant to the target technology. In terms of social network analysis, this study expands the application scope of the methodology by combining semantic network analysis with technology marketing methods. From a practical perspective, the empirical study offers an illustrative framework for identifying prospective demand-side companies on the web, raising the possibility of technology commercialization in basic research fields. Future research will examine how to improve process efficiency and the accuracy of the results.
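The candidate-selection step can be sketched as ranking company nodes by degree centrality in each seed network and intersecting the top lists. In this minimal sketch the networks, company names, and the use of Euclidean distance to the seed keyword's co-occurrence profile as a tie-breaker are all hypothetical assumptions; the abstract does not specify how the two measures were combined:

```python
import math

# node -> {neighbour: co-occurrence weight}; a hypothetical representation.
Network = dict[str, dict[str, int]]


def build(edges: list[tuple[str, str, int]]) -> Network:
    """Build an undirected weighted network from (u, v, weight) edges."""
    net: Network = {}
    for u, v, w in edges:
        net.setdefault(u, {})[v] = w
        net.setdefault(v, {})[u] = w
    return net


def degree(net: Network, node: str) -> int:
    """Unweighted degree centrality: number of distinct neighbours."""
    return len(net.get(node, {}))


def euclidean(net: Network, a: str, b: str) -> float:
    """Distance between the co-occurrence profiles (matrix rows) of two nodes."""
    row_a, row_b = net.get(a, {}), net.get(b, {})
    return math.sqrt(sum((row_a.get(n, 0) - row_b.get(n, 0)) ** 2 for n in net))


def top_companies(net: Network, seed: str, companies: list[str], k: int) -> set[str]:
    """Top-k companies by degree; ties broken by closeness to the seed keyword."""
    ranked = sorted(companies, key=lambda c: (-degree(net, c), euclidean(net, c, seed)))
    return set(ranked[:k])


def common_candidates(nets_and_seeds, companies: list[str], k: int) -> set[str]:
    """Companies that appear in the top-k list of every seed network."""
    tops = [top_companies(net, seed, companies, k) for net, seed in nets_and_seeds]
    return set.intersection(*tops)


# Hypothetical networks for two seed keywords and three companies.
companies = ["A", "B", "C"]
net1 = build([("sensor", "A", 3), ("sensor", "B", 2), ("A", "B", 1), ("C", "B", 1)])
net2 = build([("laser", "A", 2), ("laser", "C", 1), ("A", "C", 1), ("A", "B", 1)])
print(common_candidates([(net1, "sensor"), (net2, "laser")], companies, k=2))
```

Intersecting the per-network lists mirrors the step where the study compared results across its three networks to narrow 100 candidates down to a common set.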

A Korean Homonym Disambiguation System Using Refined Semantic Information and Thesaurus (정제된 의미정보와 시소러스를 이용한 동형이의어 분별 시스템)

  • Kim Jun-Su;Ock Cheol-Young
    • The KIPS Transactions:PartB / v.12B no.7 s.103 / pp.829-840 / 2005
  • Word Sense Disambiguation (WSD) is one of the most difficult problems in Korean information processing. We propose a WSD model that filters semantic information using the specific characteristics of dictionary definitions and adds information useful for sense determination, such as statistical, distance, and case information. We also propose a model that resolves the problems caused by the scarcity of semantic information, based on the word hierarchy system (thesaurus) of the UOU Word Intelligent Network, a dictionary-based lexical database developed by Ulsan University. Among the WSD models developed in this study, the one using statistical, distance, and case information together with the thesaurus (hereinafter the 'SDJ-X model') performed best. In an experiment on the sense-tagged corpus of 1,500,000 eojeols provided by the Sejong project, the SDJ-X model improved on the most-frequent-sense baseline (MFC) by 18.87% (21.73% for nouns) and on the model using inter-eojeol distance weights by 10.49% (8.84% for nouns, 11.51% for verbs). Finally, the accuracy of the SDJ-X model exceeded that of the model using only statistical, distance, and case information without the thesaurus by 6.12% (5.29% for nouns, 6.64% for verbs).
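The baseline the SDJ-X model is compared against assigns every occurrence of a word its most frequent sense in the training data. A minimal sketch of that baseline, assuming (word, sense) pairs from a sense-tagged corpus; the homonyms '배' and '눈' and their English sense labels are hypothetical stand-ins for Sejong sense tags:

```python
from collections import Counter, defaultdict


def train_mfs(tagged: list[tuple[str, str]]) -> dict[str, str]:
    """Map each word to its most frequent sense in a sense-tagged corpus."""
    by_word: dict[str, Counter] = defaultdict(Counter)
    for word, sense in tagged:
        by_word[word][sense] += 1
    return {word: senses.most_common(1)[0][0] for word, senses in by_word.items()}


def accuracy(predict, tagged: list[tuple[str, str]]) -> float:
    """Share of (word, gold sense) pairs for which predict(word) is correct."""
    return sum(predict(word) == sense for word, sense in tagged) / len(tagged)


# Hypothetical training and held-out data (homonyms 배: ship/pear, 눈: eye/snow).
train = [("배", "ship"), ("배", "ship"), ("배", "pear"),
         ("눈", "eye"), ("눈", "snow"), ("눈", "eye")]
held_out = [("배", "ship"), ("배", "pear"), ("눈", "eye"), ("눈", "snow")]

mfs = train_mfs(train)
print(accuracy(mfs.get, held_out))
```

Because this baseline ignores context entirely, it fails on every occurrence of a less frequent sense; contextual features such as the statistical, distance, and case information described above are what a stronger model uses to recover those cases.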