• Title/Summary/Keyword: Text Classification Application

Korean speech recognition using deep learning (딥러닝 모형을 사용한 한국어 음성인식)

  • Lee, Suji;Han, Seokjin;Park, Sewon;Lee, Kyeongwon;Lee, Jaeyong
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.213-227 / 2019
  • In this paper, we propose an end-to-end deep learning model for Korean speech recognition that incorporates a Bayesian neural network. In the past, Korean speech recognition was a complicated task because of the large number of parameters in its many intermediate steps and the need for expert knowledge of the Korean language. Recent breakthroughs in end-to-end models have made the task more manageable. An end-to-end model decodes mel-frequency cepstral coefficients directly into text without any intermediate processing. In particular, models trained with Connectionist Temporal Classification (CTC) loss and attention-based models are both types of end-to-end model. In addition, we use a Bayesian neural network to implement the end-to-end model and obtain Monte Carlo estimates. Finally, we carry out experiments on the "WorimalSam" online dictionary dataset. We obtain a Word Error Rate of 4.58%, an improvement over the Google and Naver APIs.
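    The Word Error Rate reported in the abstract above is conventionally defined as the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' evaluation code; the function name `word_error_rate` and whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace. The paper's own
    tokenization and scoring details are not given in the abstract.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into nothing costs i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

    For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words.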

An Analysis of Trends in Natural Language Processing Research in the Field of Science Education (과학교육 분야 자연어 처리 기법의 연구동향 분석)

  • Cheolhong Jeon;Suna Ryu
    • Journal of The Korean Association For Science Education / v.44 no.1 / pp.39-55 / 2024
  • This study aimed to examine research trends related to Natural Language Processing (NLP) in science education by analyzing 37 domestic and international documents that utilized NLP techniques in the field of science education from 2011 to September 2023. In particular, the study systematically analyzed the content, focusing on the main application areas of NLP techniques in science education, the role of teachers when utilizing NLP techniques, and a comparison of domestic and international perspectives. The analysis results are as follows: Firstly, it was confirmed that NLP techniques are significantly utilized in formative assessment, automatic scoring, literature review and classification, and pattern extraction in science education. Utilizing NLP in formative assessment allows for real-time analysis of students' learning processes and comprehension, reducing the burden on teachers' lessons and providing accurate, effective feedback to students. In automatic scoring, it contributes to the rapid and precise evaluation of students' responses. In literature review and classification using NLP, it helps to effectively analyze the topics and trends of research related to science education and student reports. It also helps to set future research directions. Utilizing NLP techniques in pattern extraction allows for effective analysis of commonalities or patterns in students' thoughts and responses. Secondly, the introduction of NLP techniques in science education has expanded the role of teachers from mere transmitters of knowledge to leaders who support and facilitate students' learning, requiring teachers to continuously develop their expertise. Thirdly, as domestic research on NLP is focused on literature review and classification, it is necessary to create an environment conducive to the easy collection of text data to diversify NLP research in Korea. Based on these analysis results, the study discussed ways to utilize NLP techniques in science education.

A study on the indications of Five Viscera Source Point Acupuncture extended from Taegeuk Acupuncture : Focused on Yeoungchu(靈樞) (태극침법(太極鍼法)의 확장형인 오장원혈침법(五臟原穴鍼法)의 적응증 연구 - "황제내경(黃帝內經).영추(靈樞)"를 중심으로 -)

  • Moh, Han Young;Lim, Gyo-Min;Baek, Jin-Ung
    • Journal of Korean Medical classics / v.25 no.4 / pp.123-147 / 2012
  • Objective : As a first step toward standardizing Five Viscera Source Point Acupuncture as a targeted acupuncture treatment, this study sorted the indications of each acupuncture remedy, which are among the most important factors in acupuncture treatment, based on Yeoungchu. Method : This study selected only the contents related to indications of the five viscera, extracting the relevant sentences from Yeoungchu using the search words Liver(Liver Meridian, First Yin), Heart(Pericardium, Heart Meridian, Second Yin), Spleen(Spleen meridian, Third Yin), Lung(Lung Meridian, Third Yin), and Kidney(Kidney Meridian, Second Yin). Result & Conclusion : 1. We selected and extracted text related to liver disease from Chapter 16, heart (pericardium) disease from Chapter 16, spleen disease from Chapter 19, lung disease from Chapter 17, and kidney disease from Chapter 17 of Yeoungchu. 2. The basic theory of applying Five Viscera Source Point Acupuncture to diseases of the five viscera is to first sort each disease according to its state (i.e. deficiency or excess), then drain the source point of the appropriate viscus in case of excess, or supplement the source point of the appropriate viscus in case of deficiency. 3. For the correct application of Five Viscera Source Point Acupuncture, the classification of the disease, and not only the judgement of its state, must be presented systematically and synthetically in combination with the Four Examinations. Follow-up studies therefore need to be conducted.

Social Issue Analysis Based on Sentiment of Twitter Users (트위터 사용자들의 감성을 이용한 사회적 이슈 분석)

  • Kim, Hannah;Jeong, Young-Seob
    • Journal of Convergence for Information Technology / v.9 no.11 / pp.81-91 / 2019
  • Recently, social network services (SNS) have been actively used by the public. Among them, Twitter has many tweets that express sentiment, and data are convenient to collect through its open Application Programming Interface (API). In this paper, we analyze social issues using users' sentiment information and suggest how the results might be used in marketing. We collect Twitter text about social issues and classify it as positive or negative with a sentiment classifier to provide a qualitative analysis. We provide a quantitative analysis by analyzing the correlation between the number of likes and retweets of each tweet. Based on the qualitative analysis, we suggest ways to attract the interest of the public or consumers. Based on the quantitative analysis, we conclude that positive tweets should be brief to attract users' attention on Twitter. As future work, we will continue to analyze various social issues.

Building Specialized Language Model for National R&D through Knowledge Transfer Based on Further Pre-training (추가 사전학습 기반 지식 전이를 통한 국가 R&D 전문 언어모델 구축)

  • Yu, Eunji;Seo, Sumin;Kim, Namgyu
    • Knowledge Management Research / v.22 no.3 / pp.91-106 / 2021
  • With the recent rapid development of deep learning technology, the demand for analyzing huge volumes of text documents in the national R&D field from various perspectives is rapidly increasing. In particular, interest is growing in applying the BERT (Bidirectional Encoder Representations from Transformers) language model, which is pre-trained on a large corpus. However, the terminology used frequently in highly specialized fields such as national R&D is often not sufficiently learned by basic BERT. This has been pointed out as a limitation when using BERT to understand documents in specialized fields. Therefore, this study proposes a method to build an R&D KoBERT language model that transfers national R&D field knowledge to basic BERT using further pre-training. To evaluate the performance of the proposed model, we performed classification analysis on about 116,000 R&D reports in the health care and information and communication fields. Experimental results showed that the proposed model achieved higher accuracy than the pure KoBERT model.

A study on the systematic operation of the innovative patent strategy framework and the application plan of patent big data to secure competitive advantage (혁신특허전략 프레임워크의 체계적 운영 및 경쟁우위확보를 위한 특허빅테이터 활용방안에 관한 연구)

  • Kim, Hyun Ah;Cha, Wan Kyu
    • The Journal of the Convergence on Culture Technology / v.7 no.2 / pp.351-357 / 2021
  • At a time when interest in the use of big data is rising amid the technological paradigm shift of the 4th industrial revolution, interest in the use of patent big data is also increasing, especially as the proportion of companies' intangible assets grows. In addition to quantitative information, patent data contains various information such as unstructured text (title, abstract, claims), citations and citation relations, drawings, and technology classification, so its use is judged to be important. Therefore, in order to systematically operate the innovative patent strategy framework and to secure a competitive advantage by strengthening the company's fundamental technological competitiveness, this study proposes a method of using patent big data, centering on the case of Company A, verifies its validity, and suggests some implications. Through this, it is intended to raise awareness of the use of patent big data, and to suggest ways to use patent big data in connection with the company's company-wide strategy, business strategy, and functional strategy.

The Importance of Multimedia for Professional Training of Future Specialists

  • Plakhotnik, Oleh;Strazhnikova, Inna;Yehorova, Inha;Semchuk, Svitlana;Tymchenko, Alla;Logvinova, Yaroslava;Kuchai, Oleksandr
    • International Journal of Computer Science & Network Security / v.22 no.9 / pp.43-50 / 2022
  • High-quality education of the modern generation of students requires forms of organizing the educational process and up-to-date methods of obtaining knowledge that differ from traditional ones. The importance of multimedia teaching tools is shown: they are promising and highly effective tools that allow the teacher not only to present a larger volume of information than traditional sources, but also to include text, graphs, diagrams, sound, animation, video, etc. in a visually integrated form. Approaches to the classification of multimedia learning tools are revealed. The special features, advantages, and disadvantages of multimedia and the expediency of its use are highlighted. A comprehensive analysis of the capabilities of multimedia teaching tools gave grounds for identifying the didactic functions that they perform. Several areas of multimedia application are described. Multimedia technologies make it possible to implement several basic methods of pedagogical activity, which are traditionally divided into active and passive principles of student interaction with the computer, and which are revealed in the article. Important conditions for the implementation of multimedia technologies in the educational process are indicated. The feasibility of using multimedia in education is illustrated by examples. Of particular importance in education are game forms of learning, in the implementation of which educational elements based on media material play an important role. The influence of the game on the development of attention by means of works of media culture, which are very diverse in form and character, is shown. The importance of the role of multimedia in student education is indicated. A number of educational functions implemented in the multimedia-based educational process are presented in the article. Recommendations for using multimedia are given.

Text Mining-Based Emerging Trend Analysis for e-Learning Contents Targeting for CEO (텍스트마이닝을 통한 최고경영자 대상 이러닝 콘텐츠 트렌드 분석)

  • Kyung-Hoon Kim;Myungsin Chae;Byungtae Lee
    • Information Systems Review / v.19 no.2 / pp.1-19 / 2017
  • Original scripts of e-learning lectures for the CEOs of corporation S were analyzed using topic analysis, which is a text mining method. Twenty-two topics were extracted based on the keywords chosen from five-year records that ranged from 2011 to 2015. Research analysis was then conducted on various issues. Promising topics were selected through evaluation and element analysis of the members of each topic. In management and economics, members demonstrated high satisfaction and interest toward topics in marketing strategy, human resource management, and communication. In the field of humanities, philosophy, history of war, and history attracted high interest and satisfaction, whereas in the field of lifestyle, mind health attracted high interest and satisfaction. Studies were also conducted to identify topics on the proportion of content, but these studies failed to increase member satisfaction. In the field of IT, educational content responds sensitively to changes of the times, but it may not increase the interest and satisfaction of members. The present study found that content production for CEOs should draw out deep implications for value innovation through technology application instead of stopping at the technical aspect of information delivery. Previous studies classified contents superficially based on the name of the content program when analyzing the status of content operation. However, text mining can derive deep content and subject classification based on the contents of unstructured script data. This approach can reveal current shortages and necessary fields if the service contents of the themes are displayed by year. This study was based on data obtained from influential e-learning companies in Korea. Obtaining practical results was difficult because data were not acquired from portal sites or social networking services. The content of e-learning trends of CEOs was analyzed. Data analysis was also conducted on the intellectual interests of CEOs in each field.

Application of Advertisement Filtering Model and Method for its Performance Improvement (광고 글 필터링 모델 적용 및 성능 향상 방안)

  • Park, Raegeun;Yun, Hyeok-Jin;Shin, Ui-Cheol;Ahn, Young-Jin;Jeong, Seungdo
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.11 / pp.1-8 / 2020
  • In recent years, the exponential increase in internet data has driven the development of many fields such as deep learning, but it has also produced side effects in the form of commercial advertisements, such as viral marketing. This not only damages the essence of the internet as a place for sharing high-quality information, but also increases the time users spend searching for such information. In this study, we define an advertisement as "a text that obscures the essence of information transmission" and propose a model for filtering information according to that definition. The proposed model consists of an advertisement filtering component and a component for improving filtering performance, and is designed to improve its performance continuously. We collected data for filtering advertisements and trained a document classifier using KorBERT. Experiments were conducted to verify the performance of this model. For data combining five topics, accuracy and precision were 89.2% and 84.3%, respectively. High performance was confirmed even when the atypical characteristics of advertisements are considered. This approach is expected to reduce wasted time and fatigue in searching for information, because our model effectively delivers high-quality information to users by identifying and filtering advertisement paragraphs.
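    The abstract above reports accuracy and precision for a binary advertisement filter. As a hedged illustration of how those two metrics are conventionally computed from labels and predictions, here is a minimal sketch; the function name `accuracy_and_precision`, the label encoding (1 = advertisement, 0 = not), and the sample data are assumptions, not taken from the paper.

```python
def accuracy_and_precision(y_true, y_pred, positive=1):
    """Return (accuracy, precision) for a binary classifier.

    Assumption: `positive` marks the advertisement class; the paper's
    actual label scheme is not stated in the abstract.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    correct = sum(1 for t, p in pairs if t == p)
    true_pos = sum(1 for t, p in pairs if p == positive and t == positive)
    false_pos = sum(1 for t, p in pairs if p == positive and t != positive)

    accuracy = correct / len(pairs)
    # Precision: of everything flagged as an advertisement, how much was one.
    flagged = true_pos + false_pos
    precision = true_pos / flagged if flagged else 0.0
    return accuracy, precision


# Hypothetical labels for five paragraphs (1 = advertisement).
acc, prec = accuracy_and_precision([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

    Precision matters for a filter of this kind because a false positive removes a legitimate paragraph from what the user sees.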

Study of Rhetorical Puns in Korean Comic Strips in Daily Newspaper (한국 신문만화의 언어유희적 기법 연구)

  • Kim, Eul-Ho
    • Cartoon and Animation Studies / s.10 / pp.1-16 / 2006
  • This thesis aims to recall the importance of language in comics by studying comic strips in Korean daily newspapers: the comic strips are analyzed for rhetorical puns in their text, as they representatively show the value and role of language in comics. Moreover, as Korean comic strips developed into current affairs comics, they acquired a stronger media characteristic of communicating information than other genres of cartoons. As a result, comic strips have become a genre in which language plays an important role and words need to convey meaning quickly and implicitly. Due to tight control by national authorities, the language technique developed into indirect expression rather than a stronger direct imaging technique. The political oppression of the comic strip paradoxically brought about the rhetorical development of its creative techniques. Based on this analysis, the writer studied the rhetorical puns in the texts of Korean comic strips by applying classification techniques for rhetorical expressions. Through quotes and analysis of actual comic strips, the writer confirmed that Korean comic strips show a vast range of rhetorical puns in their use of language. The writer was also able to conclude that the rhetorical puns in comics were the force that entertained and impressed readers, and that they also acted as a creative principle. In conclusion, the writer emphasizes that comics are a combination of words and images and that language is an important factor not only in comic strips but in cartoons in general. Thus the thesis proposes that training in humanistic thought and linguistic sensitivity is as important as learning to draw in the creation of cartoons.
