• Title/Summary/Keyword: text frequency analysis (텍스트 빈도 분석)


Building Specialized Language Model for National R&D through Knowledge Transfer Based on Further Pre-training (추가 사전학습 기반 지식 전이를 통한 국가 R&D 전문 언어모델 구축)

  • Yu, Eunji;Seo, Sumin;Kim, Namgyu
    • Knowledge Management Research / v.22 no.3 / pp.91-106 / 2021
  • With the recent rapid development of deep learning technology, demand for analyzing large volumes of text documents in the national R&D field from various perspectives is increasing rapidly. In particular, interest is growing in applying BERT (Bidirectional Encoder Representations from Transformers), a language model pre-trained on a large corpus. However, terminology used frequently in highly specialized fields such as national R&D is often not sufficiently learned by basic BERT, which has been pointed out as a limitation when using BERT to understand documents in specialized fields. Therefore, this study proposes a method for building an R&D KoBERT language model that transfers national R&D domain knowledge to basic BERT through further pre-training. To evaluate the proposed model, we performed classification analysis on about 116,000 R&D reports in the health care and information and communication fields. Experimental results showed that the proposed model achieved higher accuracy than the pure KoBERT model.

Evaluating real-time search query variation for intelligent information retrieval service (지능 정보검색 서비스를 위한 실시간검색어 변화량 평가)

  • Chong, Min-Young
    • Journal of Digital Convergence / v.16 no.12 / pp.335-342 / 2018
  • The search service, a core service of portal sites, presents the search queries that are increasing most rapidly among those entered, based on the highest instantaneous search frequency, so it is difficult to immediately surface a search query that has attracted a high degree of interest over a certain period. It is therefore necessary to overcome this problem and to provide a more intelligent information retrieval service by offering improved analysis of changes in search queries. In this paper, we present criteria for measuring the interest, continuity, and attention of real-time search queries. Using these criteria, we measure and summarize changes in real-time search queries by hour, day, week, and month over a period of time to identify issues of high interest, long-lasting issues of interest, and issues that need attention in the future.

The Stream of Uncertainty in Scientific Knowledge using Topic Modeling (토픽 모델링 기반 과학적 지식의 불확실성의 흐름에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for Information Management / v.36 no.1 / pp.191-213 / 2019
  • The process of obtaining scientific knowledge is conducted through research. Researchers deal with the uncertainty of science and establish the certainty of scientific knowledge. In other words, uncertainty is an essential step that must be passed through in order to obtain scientific knowledge. Existing studies have predominantly taken linguistic approaches to hedging or, in computational linguistics, manually constructed corpora of uncertainty words. They have only been able to identify characteristics of uncertainty in a particular research field based on simple frequency. Therefore, in this study, we examine patterns of scientific knowledge over time based on uncertainty words in biomedical literature, where biomedical claims in sentences play an important role. For this purpose, biomedical propositions are analyzed based on semantic predications provided by UMLS, and DMR topic modeling, a useful method for identifying patterns in disciplines, is applied to understand trends in entity-based topics with uncertainty. The results confirm that, as research develops over time, uncertainty in scientific knowledge tends to decrease.

Functional Lexical Bundles in Nuclear Science and Engineering Research Articles (원자력과학공학 학술 논문에 나타난 기능적 어휘다발 분석)

  • Nam, Daehyeon
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.426-435 / 2021
  • This study aims to functionally classify lexical bundles appearing in academic papers on nuclear science and engineering written in English and then analyze their characteristics compared to those appearing in general academic papers. To this end, the texts of nuclear science and engineering papers were collected and compiled into a corpus (c. 1 million tokens). They were then compared statistically, through chi-square tests and standardized residuals, with a corpus of general academic papers (c. 750,000 tokens). The results revealed that, compared to general academic papers, bundles in the stance category were the most used among the functional lexical bundles in nuclear science and engineering. The lexical bundles lacked variety: the same types of bundles were 're-used' and 'recycled'. Based on these results, educational implications for English for Academic Purposes and directions for follow-up research are discussed.
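
The corpus comparison described in this abstract can be illustrated with a minimal sketch. It assumes a 2x2 table for a single bundle (occurrences versus all other tokens, in each corpus) and uses Pearson residuals, (observed - expected) / sqrt(expected), as the standardized residuals; the paper may define or adjust them differently. The function name and the counts in the usage note are hypothetical, not taken from the study.

```python
import math


def chi_square_2x2(count_a, total_a, count_b, total_b):
    """Pearson chi-square and residuals for one item counted in two corpora.

    Rows are corpora A and B; columns are "item" and "everything else".
    This is an illustrative sketch, not the procedure used in the paper.
    """
    observed = [
        [count_a, total_a - count_a],
        [count_b, total_b - count_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    residuals = []
    for i in range(2):
        residual_row = []
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / grand_total
            diff = observed[i][j] - expected
            chi2 += diff * diff / expected
            # Pearson (standardized) residual for this cell.
            residual_row.append(diff / math.sqrt(expected))
        residuals.append(residual_row)
    return chi2, residuals
```

For example, `chi_square_2x2(120, 1_000_000, 45, 750_000)` compares a bundle seen 120 times in a corpus of one million tokens with 45 times in a corpus of 750,000 tokens (made-up counts). A positive residual in the first cell indicates that the bundle is over-represented in the first corpus relative to the pooled rate.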

Analysis of Descriptive Lectures Evaluation using Text Mining: Comparative analysis pre and post COVID-19

  • Lee, Sang-Chul
    • Journal of the Korea Society of Computer and Information / v.27 no.10 / pp.211-222 / 2022
  • The purpose of this study is to indicate the direction of future university classes in the post-COVID era by comparing and analyzing lecture evaluations from before and after COVID-19. To this end, four years of data were used: 2018 to 2019 for pre-COVID-19 and 2020 to 2021 for post-COVID-19. The results were as follows. In the case of liberal arts, "assignments" was the word with the highest frequency and degree centrality (DC) both before and after COVID-19. In major courses, "understanding" appeared as the most important word. The ego network analysis indicated that "video lecture" and "non-face-to-face classes" were difficult and that "interaction" between the professor and students was important. As a result, it is important to reduce the weight of assignments and increase interaction with students in liberal arts classes. For major courses, it is necessary to run face-to-face rather than non-face-to-face classes and to organize video content so that it is not difficult to follow.

Comparative Analysis of 4-gram Word Clusters in South vs. North Korean High School English Textbooks (남북한 고등학교 영어교과서 4-gram 연어 비교 분석)

  • Kim, Jeong-ryeol
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.274-281 / 2020
  • N-gram analysis offers a new look at n-word clusters in use, which differ from previously known idioms. Using concordance software, it mechanically analyzes a corpus of English textbooks for frequently occurring sequences of n consecutive words. The current paper aims to extract and compare 4-gram word clusters between South Korean high school English textbooks and their North Korean counterparts. The classification criteria include the number of tokens and types in the two sets of textbooks across oral and written language. The criteria also use grammatical and functional categories to classify and compare the 4-gram word clusters. The grammatical categories include noun phrases, verb phrases, prepositional phrases, partial clauses, and others. The functional categories include deictic function, text organizers, stance, and others. The findings are as follows: South Korean high school English textbooks contain more tokens and types in both oral and written language. Verb phrase and partial clause 4-grams are the most frequent grammatical categories in both South and North Korean high school English textbooks. Stance is the most dominant functional category in both South and North Korean English textbooks.
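
The 4-gram extraction described in this abstract can be sketched as follows. This is only an illustration: it assumes lowercasing and whitespace tokenization, whereas the study used concordance software whose tokenization rules are not given here. The function name `ngram_counts` is hypothetical.

```python
from collections import Counter


def ngram_counts(text, n=4):
    """Count every run of n consecutive whitespace-separated tokens."""
    tokens = text.lower().split()
    # Zip n shifted copies of the token list to form sliding windows.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows)


sample = "i would like to go and i would like to stay"
counts = ngram_counts(sample, n=4)
n_tokens = sum(counts.values())  # total 4-gram occurrences (tokens)
n_types = len(counts)            # distinct 4-grams (types)
```

Here `n_tokens` and `n_types` correspond to the token and type counts that the abstract uses as comparison criteria; `counts.most_common()` lists the most frequent clusters first.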

Trend Forecasting and Analysis of Quantum Computer Technology (양자 컴퓨터 기술 트렌드 예측과 분석)

  • Cha, Eunju;Chang, Byeong-Yun
    • Journal of the Korea Society for Simulation / v.31 no.3 / pp.35-44 / 2022
  • In this study, we analyze and forecast quantum computer technology trends. Previous analyses of quantum computer technology trends have focused mainly on technology-centered application fields. Therefore, this paper analyzes important quantum computer technologies and performs future signal detection and prediction for a more market-driven technical analysis and forecast. To this end, we analyze words used in news articles to identify rapid market changes and public interest. This paper extends the conference presentation of Cha & Chang (2022). The research is conducted by collecting domestic news articles from 2019 to 2021. First, we organize the main keywords through text mining. Next, we explore future quantum computer technologies through analysis of Term Frequency - Inverse Document Frequency (TF-IDF), the Key Issue Map (KIM), and the Key Emergence Map (KEM). Finally, the relationship between future technologies and supply and demand is identified through random forests, decision trees, and correlation analysis. The results show that interest in artificial intelligence was the highest in the frequency analysis and in the keyword diffusion and visibility analyses. For cyber-security, the rate of mention in news articles is rising far faster than that of other technologies. Quantum communication, resistant cryptography, and augmented reality also showed a high rate of increase in interest. These results show that expectations are high for applying trend technologies in the market. The results of this study can be applied to identifying areas of interest in the quantum computer market and establishing a response system related to technology investment.
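
The TF-IDF step named in this abstract can be sketched as follows. This is a minimal illustration assuming the textbook weighting tf(t, d) * log(N / df(t)) over already tokenized documents; the paper does not state which variant or smoothing it used, and the function name and sample tokens are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs is a list of token lists. Term frequency is normalized by
    document length; inverse document frequency is log(N / df) with
    no smoothing (an assumption made for this sketch).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document

    scores = []
    for tokens in docs:
        term_freq = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return scores


articles = [
    ["quantum", "computer", "security"],
    ["quantum", "communication"],
    ["quantum", "cryptography", "security"],
]
weights = tf_idf(articles)
```

A term that appears in every document, such as "quantum" in this toy input, receives a score of zero, which is why TF-IDF highlights keywords that are distinctive rather than merely frequent.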

A Comparative Analysis of Complex Disaster Research Trends Using Network Analysis (네트워크 분석을 활용한 국내·외 복합재난 연구 동향 분석)

  • Woosik Kim;Yeonwoo Choi;Youjeong Hong;Dong Keun Yoon
    • Journal of the Society of Disaster Information / v.18 no.4 / pp.908-921 / 2022
  • Purpose: As the connections between physical and non-physical structures in cities expand and become more complex, the risk of complex disasters, which cause damage in compound ways, is increasing. To prepare for these complex disasters, it is important to preemptively identify and manage disasters that can develop into complex disasters. Therefore, this study identifies the disaster types studied as complex disasters by analyzing trends in domestic and international research on complex disasters, and presents directions for future complex disaster management. Method: We first established co-occurrence networks between disaster types based on 993 articles related to complex disasters published in disaster-related journals over the last 20 years (2002-2021). Then, through network analysis, domestic and international complex disaster research trends were compared and analyzed. Result: In domestic studies, research on complex disasters related to storm and flood damage, infrastructure failure, and fire was frequent, and research on complex disasters related to earthquakes and landslides has recently increased. In international studies, however, the proportion of studies on infrastructure failure together with storm and flood damage and earthquakes was high, and various types of disasters such as tsunami and drought appeared. Conclusion: The results of this study are expected to improve understanding of trends in complex disaster research and to provide suggestions for future domestic complex disaster research.
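
The co-occurrence network described in the Method can be sketched as follows. This illustration assumes each article has already been tagged with the disaster types it discusses; the tagging scheme, the function names, and the sample data are hypothetical and not taken from the study.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(articles):
    """Count how many articles mention each unordered pair of disaster types."""
    edges = Counter()
    for types in articles:
        # Deduplicate and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(types)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights attached to each node."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


tagged_articles = [
    ["flood", "storm", "infrastructure failure"],
    ["earthquake", "landslide"],
    ["flood", "storm"],
]
edges = cooccurrence_edges(tagged_articles)
degree = weighted_degree(edges)
```

The edge weights give the network's link strengths, and the weighted degree is one simple way to rank disaster types by how often they are studied together with others.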

Comparative Analysis of Low Fertility Response Policies (Focusing on Unstructured Data on Parental Leave and Child Allowance) (저출산 대응 정책 비교분석 (육아휴직과 아동수당의 비정형 데이터 중심으로))

  • Eun-Young Keum;Do-Hee Kim
    • The Journal of the Convergence on Culture Technology / v.9 no.5 / pp.769-778 / 2023
  • This study used unstructured data to compare and analyze parental leave and child allowance, two major policies among the responses to the current serious low fertility problem, and on that basis sought future directions and implications for related policies. The collection keywords were "low fertility + parental leave" and "low fertility + child allowance", and data analysis was conducted in the following order: text frequency analysis, centrality analysis, network visualization, and CONCOR analysis. First, parental leave was found to be a realistic and practical policy response to low fertility, as the data showed more diverse and systematic discussion of it than of child allowance. Second, for child allowance, the data showed a high level of information about and interest in the cash benefit system, including child allowance, but no other distinctive features or active discussion. As a future improvement plan, both policies need to build on the existing system. First, expanding parental leave requires improving the working environment and addressing blind spots; second, for child allowance, a change in the form of payment away from the uniform and biased system should be sought, and expanding the target age was proposed.

Analysis of Waterpark Status and Recognition Using Big Data Analysis (빅데이터 분석을 활용한 워터파크 현황 및 인식 분석)

  • Kim, Jae-Hwan;Lee, Jae-Moon
    • Journal of Digital Convergence / v.15 no.10 / pp.525-535 / 2017
  • This study examines consumer perception and the current status of water parks. Naver and Daum were used as data collection channels, and the keyword 'water park' was used for data retrieval. The analysis covered two years, from January 1, 2015 to December 31, 2016. First, in the frequency analysis, the top five terms were hidden cameras, Lotte water park, arrests, suspects, and Gimhae in 2015, and Lotte water park, swimming, summer, opening, and admission ticket in 2016. Second, in the degree centrality analysis, the top five were hidden camera, arrest, suspect, female, and shower room in 2015, and swimming, Lotte water park, summer, One Mount, and admission ticket in 2016. Third, in the N-gram network graph, the top five pairs were water park/hidden camera, hidden camera/hidden camera, suspect/arrest, Gimhae/Lotte water park, and water park/suspect in 2015, and One Mount/water park, Gimhae/Lotte water park, water park/admission ticket, water park/water park, and water park/opening in 2016. Fourth, the CONCOR analysis formed three groups in 2015 and two groups in 2016.