• Title/Summary/Keyword: word frequency (단어빈도)

Search Results: 542, Processing Time: 0.024 seconds

Determining the Specificity of Terms using Compositional and Contextual Information (구성정보와 문맥정보를 이용한 전문용어의 전문성 측정 방법)

  • Ryu Pum-Mo; Bae Sun-Mee; Choi Key-Sun
    • Journal of KIISE: Software and Applications / v.33 no.7 / pp.636-645 / 2006
  • A term with more domain-specific information has a higher level of term specificity. We propose new specificity calculation methods for terms based on information-theoretic measures using compositional and contextual information. Term specificity is a kind of necessary condition in the term hierarchy construction task. The compositional information includes frequency, $tf{\cdot}idf$, bigrams, and the internal structure of the terms. The contextual information of a term includes the probabilistic distribution of its modifiers. The proposed methods can be applied to other domains without extra procedures. Experiments showed a very promising result, with a precision of 82.0% when applied to the terms in the MeSH thesaurus.
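Among the compositional features listed above is $tf{\cdot}idf$. A minimal sketch of that score (a generic textbook formulation with add-one smoothing in the idf denominator, not necessarily the paper's exact variant) could look like:

```python
import math

def tf_idf(term, doc, corpus):
    """tf-idf of a term within one document of a corpus.
    doc and corpus entries are lists of tokens."""
    tf = doc.count(term) / len(doc)              # term frequency in this document
    df = sum(1 for d in corpus if term in d)     # document frequency across the corpus
    idf = math.log(len(corpus) / (1 + df))       # smoothed inverse document frequency
    return tf * idf

corpus = [["gene", "expression", "analysis"],
          ["gene", "therapy"],
          ["protein", "folding"]]
# "therapy" is rare in the corpus, so it scores higher than the common "gene"
print(tf_idf("therapy", corpus[1], corpus))
print(tf_idf("gene", corpus[0], corpus))
```

A term appearing in most documents gets an idf near zero, which is why frequency alone is not a usable specificity signal.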

Collection and Extraction Algorithm of Field-Associated Terms (분야연상어의 수집과 추출 알고리즘)

  • Lee, Sang-Kon; Lee, Wan-Kwon
    • The KIPS Transactions: Part B / v.10B no.3 / pp.347-358 / 2003
  • A field-associated term is a single or compound word that occurs in documents and makes it possible to recognize the field of a text using common human knowledge. For example, a person recognizes the field of a document upon encountering a word such as 'pitcher' or 'election'. We propose an efficient construction method for field-associated terms (FTs) that specializes the field in order to decide the field of a text. A document classification scheme can be fixed from a well-classified document database or corpus. Considering the focus field, we discuss levels and stability ranks of field-associated terms. To construct a balanced FT collection, we first construct a collection of single FTs, from which FT levels and stability ranks can be derived automatically. We propose a new extraction algorithm for FTs for document classification that uses FT concentration rates and occurrence frequencies.

Automatic Meeting Summary System using Enhanced TextRank Algorithm (향상된 TextRank 알고리즘을 이용한 자동 회의록 생성 시스템)

  • Bae, Young-Jun; Jang, Ho-Taek; Hong, Tae-Won; Lee, Hae-Yeoun
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.11 no.5 / pp.467-474 / 2018
  • Organizing and documenting the contents of meetings and discussions is very important in various tasks. In the past, however, people had to organize the contents manually. In this paper, we describe the development of a system that generates meeting minutes automatically using the TextRank algorithm. The proposed system records all utterances of the speakers in real time and calculates similarity based on the appearance frequency of the sentences. Then, to create the meeting minutes, it extracts important words or phrases through an unsupervised learning algorithm that finds the relations between the sentences in the document data. In particular, we improved performance by introducing a keyword weighting technique for the TextRank algorithm, which reconfigures the PageRank algorithm to fit words and sentences.
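The sentence-graph idea underlying TextRank can be illustrated as follows. This is a generic sketch of the base algorithm (overlap similarity between sentences plus a PageRank-style iteration), not the authors' enhanced version with keyword weighting:

```python
import math

def similarity(s1, s2):
    """Shared-word overlap between two sentences, normalized by
    the log of their lengths (the standard TextRank measure)."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    overlap = len(w1 & w2)
    if overlap == 0:
        return 0.0
    return overlap / (math.log(len(w1) + 1) + math.log(len(w2) + 1))

def textrank(sentences, d=0.85, iters=50):
    """PageRank-style scoring over the sentence-similarity graph;
    higher-scoring sentences are candidates for the summary."""
    n = len(sentences)
    sim = [[similarity(a, b) if i != j else 0.0 for j, b in enumerate(sentences)]
           for i, a in enumerate(sentences)]
    scores = [1.0] * n
    for _ in range(iters):
        new = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                total = sum(sim[j])
                if sim[j][i] and total:
                    rank += sim[j][i] / total * scores[j]
            new.append((1 - d) + d * rank)
        scores = new
    return scores

sentences = ["the cat sat on the mat",
             "the cat ate fish",
             "the dog sat on the mat",
             "stocks fell sharply today"]
scores = textrank(sentences)
```

The first sentence shares vocabulary with two others, so it accumulates the highest score; the off-topic last sentence stays at the damping floor.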

A Study on Domestic Research Trends (2001-2020) of Forest Ecology Using Text Mining (텍스트마이닝을 활용한 국내 산림생태 분야 연구동향(2001-2020) 분석)

  • Lee, Jinkyu; Lee, Chang-Bae
    • Journal of Korean Society of Forest Science / v.110 no.3 / pp.308-321 / 2021
  • The purpose of this study was to analyze domestic research trends of forest ecology over the past 20 years, and its future direction, using text mining. A total of 1,015 academic papers and keyword data related to forest ecology were collected from the "Research and Information Service Section" and analyzed using big data analysis programs such as Textom and UCINET. From the results of word frequency and N-gram analyses, we found that domestic studies on forest ecology have increased rapidly since 2011. The most common research topic over the past 20 years was "species diversity," and "climate change" has become a major topic since 2011. Based on CONCOR analysis, study subjects were grouped into eight categories: "species diversity," "environmental policy," "climate change," "management," "plant taxonomy," "habitat suitability index," "vascular plants," and "recreation and welfare." Consequently, species diversity and climate change will remain important topics in the future, and diversifying and expanding domestic research topics in line with global research trends is necessary.
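The word frequency and N-gram analyses mentioned above reduce, at their core, to counting adjacent keyword pairs. A minimal sketch (generic counting, not the Textom pipeline itself):

```python
from collections import Counter

def ngrams(tokens, n=2):
    """Frequency of n-grams (bigrams by default) over a token list,
    the basic operation behind word-frequency / N-gram trend analysis."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = ["climate", "change", "climate", "change"]
print(ngrams(tokens).most_common(1))  # most frequent bigram with its count
```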

Analysis of Pressure Ulcer Nursing Records with Artificial Intelligence-based Natural Language Processing (인공지능 기반 자연어처리를 적용한 욕창간호기록 분석)

  • Kim, Myoung Soo; Ryu, Jung-Mi
    • Journal of the Korea Convergence Society / v.12 no.10 / pp.365-372 / 2021
  • The purpose of this study was to examine the characteristics of statements in pressure ulcer nursing records using natural language processing and to assess the prediction accuracy for each pressure ulcer stage. Nursing records related to pressure ulcers were analyzed using descriptive statistics, and a word cloud generator (http://wordcloud.kr) was used to examine the characteristics of words in the pressure ulcer prevention nursing records. The accuracy ratio for the pressure ulcer stage was calculated using deep learning. As a result, stage 2 and suspected deep tissue injury accounted for 23.1% and 23.0%, respectively, and the most frequent keywords were erythema, blisters, bark, area, and size. The stages with high prediction accuracy were, in order, stage 0, suspected deep tissue injury, and stage 2. These results suggest that the approach can be developed into a clinical decision support system available in practice for nurses providing pressure ulcer prevention care.

Knowledge Trend Analysis of Uncertainty in Biomedical Scientific Literature (생의학 학술 문헌의 불확실성 기반 지식 동향 분석에 관한 연구)

  • Heo, Go Eun; Song, Min
    • Journal of the Korean Society for Information Management / v.36 no.2 / pp.175-199 / 2019
  • Uncertainty refers to incomplete states of knowledge about propositions, due to a lack of consensus between new information and existing knowledge. As the amount of academic literature increases exponentially over time, new knowledge is discovered as research develops. Although the flow of time may be an important factor in identifying patterns of uncertainty in scientific knowledge, existing studies have only identified the nature of uncertainty based on frequency in a particular discipline and did not take the flow of time into consideration. Therefore, in this study, we identify and analyze the words that indicate uncertainty in the scientific literature and investigate the stream of knowledge. We examine patterns of biomedical knowledge, such as representative entity pairs, predicate types, and entities, over time. We also perform significance testing using linear regression analysis. Seven of the 17 entity pairs show a statistically significant decreasing pattern, and all 10 representative predicates decrease significantly over time. We analyze the relative importance of representative entities by year and identify entities that display significant rising and falling patterns.

A Visualization of Movie Reviews based on a Semantic Network Analysis (의미연결망 분석을 활용한 영화 리뷰 시각화)

  • Kim, Seulgi; Kim, Jang Hyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.1 / pp.1-6 / 2019
  • This study visualized users' reactions to movies based on keywords with high frequency. For this work, we collected movie review data. A total of six movies were selected, and we carried out data gathering and preprocessing. Semantic network analysis was used to understand the relationships among keywords, and NetDraw, packaged with UCINET, was used for data visualization. We identified differences in the characteristics of review content for each movie. The implication of this study is that we visualized sentence-level movie reviews as keywords and explored whether it is possible to construct an interface for checking users' reactions at a glance. We suggest that further studies use more diverse movie reviews and a similar number of reviews for each movie.

An Exploratory Study of VR Technology using Patents and News Articles (특허와 뉴스 기사를 이용한 가상현실 기술에 관한 탐색적 연구)

  • Kim, Sungbum
    • Journal of Digital Convergence / v.16 no.11 / pp.185-199 / 2018
  • The purpose of this study is to derive the core technologies of VR using patent analysis and to explore the direction of social and public interest in VR using news analysis. In Study 1, we derived keywords using word frequencies in patent texts and compared them by company, year, and technical classification. NetMiner, a network analysis program, was used to analyze the IPC codes of the patents. In Study 2, we analyzed news articles using the T-LAB program. TF-IDF was used as the keyword selection method, and chi-square and association index algorithms were used to extract the words most relevant to VR. Through this study, we confirmed that VR is a fusion technology encompassing optics, head-mounted displays (HMDs), data analysis, and electric and electronic technology, and found that optical technology is central among the technologies currently being developed. In addition, through news articles, we found that society and the public are interested in the formation and growth of VR suppliers and markets, and that VR should be developed on the basis of user experience.

Research trends in statistics for domestic and international journal using paper abstract data (초록데이터를 활용한 국내외 통계학 분야 연구동향)

  • Yang, Jong-Hoon; Kwak, Il-Youp
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.267-278 / 2021
  • As time goes by, the amount of data is increasing in government and business alike, both domestically and overseas. Accordingly, research on big data is increasing in academia. Statistics is one of the major disciplines of big data research, and it is interesting to understand research trends in statistics through big data, given the growing number of papers in the field. In this study, we analyzed what research is being conducted through the abstract data of statistics papers in Korea and abroad. Domestic and international research trends were analyzed through the frequency of the papers' keyword data, and the relationships between keywords were visualized through a word embedding method. In addition to the keywords selected by the authors, words used prominently in statistics papers, selected through TextRank, were also visualized. Lastly, 10 topics were investigated by applying the LDA technique to the abstract data. Through the analysis of each topic, we investigated which research topics are frequently studied and which words are used prominently.
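The LDA technique applied above assigns each abstract token to a latent topic. A compact collapsed Gibbs sampler conveys the mechanism; this is an illustrative toy (standard LDA with symmetric priors, nothing specific to the paper's setup, which likely used an off-the-shelf library):

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, iters=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents.
    Returns the top 3 words per topic after sampling."""
    rng = random.Random(seed)
    vocab = {w for d in docs for w in d}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]                 # doc-topic counts
    nkw = [defaultdict(int) for _ in range(n_topics)]    # topic-word counts
    nk = [0] * n_topics                                  # topic totals
    z = []                                               # topic per token
    for di, d in enumerate(docs):                        # random initialization
        zs = []
        for w in d:
            t = rng.randrange(n_topics)
            zs.append(t)
            ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
        z.append(zs)
    for _ in range(iters):                               # resample each token's topic
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = z[di][wi]
                ndk[di][t] -= 1; nkw[t][w] -= 1; nk[t] -= 1
                weights = [(ndk[di][k] + alpha) * (nkw[k][w] + beta) / (nk[k] + V * beta)
                           for k in range(n_topics)]
                r = rng.random() * sum(weights)
                acc = 0.0
                for k, wt in enumerate(weights):
                    acc += wt
                    if r <= acc:
                        t = k
                        break
                z[di][wi] = t
                ndk[di][t] += 1; nkw[t][w] += 1; nk[t] += 1
    return [sorted(nkw[k], key=nkw[k].get, reverse=True)[:3] for k in range(n_topics)]

docs = [["regression", "variance", "estimator", "variance"],
        ["network", "neural", "embedding", "neural"],
        ["regression", "estimator", "variance"],
        ["embedding", "neural", "network"]]
topics = lda_gibbs(docs, n_topics=2)
```

On this toy corpus the sampler tends to separate the "classical statistics" and "neural/embedding" vocabularies into the two topics, mirroring how the study groups abstract keywords into 10 topics.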

A Corpus-based English Syntax Academic Word List Building and its Lexical Profile Analysis (코퍼스 기반 영어 통사론 학술 어휘목록 구축 및 어휘 분포 분석)

  • Lee, Hye-Jin; Lee, Je-Young
    • The Journal of the Korea Contents Association / v.21 no.12 / pp.132-139 / 2021
  • This corpus-driven study describes the compilation of the most frequently occurring academic words in the domain of syntax and compares the extracted word list with the Academic Word List (AWL) of Coxhead (2000) and the General Service List (GSL) of West (1953) to examine their distribution and coverage within the syntax corpus. A specialized 546,074-token corpus, composed of widely used must-read syntax textbooks for English education majors, was loaded into and analyzed with AntWordProfiler 1.4.1. Under the parameter of lexical frequency, the analysis identified 288 (50.5%) AWL word forms that appeared 16 times or more, as well as 218 (38.2%) AWL items that occurred 15 times or fewer. The analysis also indicated that the coverage of the AWL and GSL accounted for 9.19% and 78.92%, respectively, and that the GSL and AWL combined amounted to 88.11% of all tokens. Given that the AWL can serve broad disciplinary needs, this study highlights the necessity of compiling domain-specific AWLs as lexical repertoires to promote academic literacy and competence.
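The coverage percentages reported above (9.19% for the AWL, 78.92% for the GSL) are simply the share of corpus tokens that appear in a given word list, the computation AntWordProfiler automates. A minimal sketch:

```python
def coverage(tokens, wordlist):
    """Percentage of corpus tokens covered by a word list
    (case-insensitive token membership)."""
    wl = {w.lower() for w in wordlist}
    hits = sum(1 for t in tokens if t.lower() in wl)
    return 100 * hits / len(tokens)

tokens = ["the", "syntax", "of", "the", "clause"]
print(coverage(tokens, ["the", "of"]))  # → 60.0
```

Coverage of the combined GSL+AWL is computed the same way over the union of the two lists, which is how the 88.11% figure arises.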