Search | Korea Science

Prompt engineering to improve the performance of teaching and learning materials Recommendation of Generative Artificial Intelligence

Soo-Hwan Lee;Ki-Sang Song
- Journal of the Korea Society of Computer and Information, v.28 no.8, pp.195-204, 2023
In this study, we explored prompt engineering to improve the performance of teaching and learning material recommendations produced by generative artificial intelligence such as GPT and Stable Diffusion. Picture materials were used as the type of teaching and learning material. To examine the effect of prompt composition, four prompts were designed and used to collect responses: a zero-shot prompt, a prompt containing the learning target grade, a prompt containing the learning goals, and a prompt containing both the target grade and the learning goals. The collected responses were embedded with Sentence Transformers, reduced in dimensionality with t-SNE, and visualized to explore the relationship between prompts and responses. In addition, the responses were clustered with the k-means algorithm. The value nearest the center of the widest cluster was selected as the representative value, rendered as an image with Stable Diffusion, and evaluated by 30 elementary school teachers against criteria for evaluating teaching and learning materials. The teachers judged that three of the four recommended picture materials had educational value and that two of them could be used in actual classes. The prompt that produced the most valuable picture material was the one containing both the target grade and the learning goal.
https://doi.org/10.9708/jksci.2023.28.08.195
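
The selection step in this abstract can be sketched in plain Python: cluster the embedded responses with k-means, then pick a representative from the widest cluster. This is a minimal illustration under assumptions, not the authors' code. It assumes the responses are already embedded as numeric vectors (for example by Sentence Transformers). It reads "widest" as "most populated" and picks the member closest to that cluster's centroid. The names `kmeans` and `representative` are hypothetical. t-SNE is used in the study only for visualization, so it is omitted here.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: dist2(p, centroids[j])) for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's algorithm with initial centroids sampled from the data."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the previous centroid if a cluster becomes empty.
            updated.append([sum(c) / len(members) for c in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return assign(points, centroids), centroids

def representative(points, k, seed=0):
    """Index of the point nearest the centroid of the most populated cluster (assumed 'widest')."""
    labels, centroids = kmeans(points, k, seed=seed)
    largest = max(range(k), key=labels.count)
    members = [i for i, lab in enumerate(labels) if lab == largest]
    return min(members, key=lambda i: dist2(points[i], centroids[largest]))

embeddings = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (10.0, 10.0), (11.0, 10.0)]
print(representative(embeddings, k=2))  # 0
```

In practice the toy 2-D points would be replaced by sentence embeddings of several hundred dimensions. The returned index selects the response text to pass to the image generator.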

3DentAI: U-Nets for 3D Oral Structure Reconstruction from Panoramic X-rays (3DentAI: 파노라마 X-ray로부터 3차원 구강구조 복원을 위한 U-Nets)

Anusree P.Sunilkumar;Seong Yong Moon;Wonsang You
- The Transactions of the Korea Information Processing Society, v.13 no.7, pp.326-334, 2024
Extra-oral imaging techniques such as panoramic X-rays (PXs) and cone-beam computed tomography (CBCT) are the most preferred imaging modalities in dental clinics. They are convenient for patients during imaging and can visualize information on all the teeth. PXs are preferred for routine clinical treatment, and CBCT for complex surgeries and implant treatment. However, PXs lack three-dimensional spatial information, whereas CBCT exposes patients to a high radiation dose. When a PX is already available, reconstructing the 3D oral structure from it avoids further expense and radiation dose. In this paper, we propose 3DentAI, a U-Net-based deep learning framework for 3D reconstruction of the oral structure from a PX image. The framework consists of three modules:
- a reconstruction module based on an attention U-Net that estimates depth from a PX image;
- a realignment module that aligns the predicted flattened volume to the shape of the jaw using a predefined focal trough and ray data;
- a refinement module based on a 3D U-Net that interpolates the missing information to obtain a smooth representation of the oral cavity.

Synthetic PXs obtained from CBCT by ray tracing and rendering were used to train the networks, removing the need for paired PX and CBCT datasets. Trained and tested on a diverse dataset of 600 patients, our method outperformed GAN-based models while keeping computational complexity low.
https://doi.org/10.3745/TKIPS.2024.13.7.326
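
The realignment module maps the flattened volume predicted from the PX back onto the curved shape of the jaw. The abstract does not give the geometry, so the sketch below only illustrates the general idea under loudly stated assumptions:
- The focal trough is modeled as a hypothetical parabola y = c·x² in the axial plane.
- Each column of the flattened volume gets an evenly spaced x position.
- Each depth index is an offset along the trough's unit normal.

The parameters `x_range`, `c`, and `voxel_mm` are illustrative, and the ray data used in the paper are ignored.

```python
import math

def trough_point(x, c):
    """Point on a hypothetical parabolic focal trough y = c * x**2 (axial plane, mm)."""
    return x, c * x * x

def trough_normal(x, c):
    """Unit normal of y = c * x**2 at x; the tangent there is (1, 2cx)."""
    nx, ny = -2.0 * c * x, 1.0
    norm = math.hypot(nx, ny)
    return nx / norm, ny / norm

def realign(u, d, width, depth, x_range=(-40.0, 40.0), c=0.02, voxel_mm=0.5):
    """Map column u and depth index d of a flattened volume to axial-plane coordinates.

    Assumptions (hypothetical): the vertical (z) coordinate is left unchanged, and the
    depth axis is centered on the trough, so d = (depth - 1) / 2 lies on the curve itself.
    """
    x_min, x_max = x_range
    x = x_min + u * (x_max - x_min) / (width - 1)   # evenly spaced positions along x
    offset = (d - (depth - 1) / 2.0) * voxel_mm     # signed distance from the trough
    px, py = trough_point(x, c)
    nx, ny = trough_normal(x, c)
    return px + offset * nx, py + offset * ny
```

A real implementation would parameterize the trough by arc length and use the scanner's actual focal trough and ray geometry. Offsetting along the normal is simply the easiest way to show "bending" a flat depth map onto an arch.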

Dual Codec Based Joint Bit Rate Control Scheme for Terrestrial Stereoscopic 3DTV Broadcast (지상파 스테레오스코픽 3DTV 방송을 위한 이종 부호화기 기반 합동 비트율 제어 연구)

Chang, Yong-Jun;Kim, Mun-Churl
- Journal of Broadcast Engineering, v.16 no.2, pp.216-225, 2011
Following the proliferation of three-dimensional video content and displays, many terrestrial broadcasters have been preparing stereoscopic 3DTV services. In terrestrial stereoscopic broadcasting, it is difficult to code and transmit two video sequences at the same quality as a 2DTV broadcast. The reason is the limited bandwidth defined by existing digital TV standards such as ATSC. To achieve both high-quality broadcasting and compatibility for existing 2DTV viewers, a terrestrial 3DTV system with heterogeneous video codecs is therefore considered. In this system, the left and right images are coded with MPEG-2 and H.264/AVC, respectively. Without significant change to current terrestrial broadcasting systems, we propose a joint rate control scheme for stereoscopic 3DTV service based on this heterogeneous dual-codec system. The proposed scheme applies to the MPEG-2 encoder the quadratic rate-quantization model adopted in H.264/AVC. The controller ensures that the sum of the left and right bitstreams meets the bandwidth requirement of the broadcasting standard. At the same time, it minimizes the sum of image distortions by adjusting the quantization parameters obtained from the proposed optimization scheme. The optimization also includes a condition that keeps the quality difference between the left and right images around a desired level, to mitigate negative effects on the human visual system. Experimental results show that the proposed scheme outperforms rate control in which each coding standard independently uses its own algorithm. It increases PSNR by 2.02%, decreases the average absolute quality difference by 77.6%, and reduces the variance of the quality difference by 74.38%.
https://doi.org/10.5909/JEB.2011.16.2.216
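
The quadratic rate-quantization model in this abstract relates the number of bits to the quantizer step. A common form, assumed here, is R(Q) = a/Q + b/Q², with a and b fitted per frame. (The H.264/AVC reference encoder scales these terms by a complexity measure such as MAD; here that factor is folded into a and b.) The sketch first inverts the model for a single encoder. It then uses bisection to find quantizer steps for the left (MPEG-2) and right (H.264/AVC) views whose summed rate meets a joint budget. This is a simplification, not the paper's distortion-minimizing optimization: a fixed quantizer offset between the views stands in for the quality-difference condition. All function names and parameter values are hypothetical.

```python
import math

def rate(q, a, b):
    """Assumed quadratic rate-quantization model: bits = a/q + b/q**2."""
    return a / q + b / (q * q)

def q_for_rate(target, a, b):
    """Invert the model: solve b*x**2 + a*x - target = 0 for x = 1/q (positive root)."""
    if b == 0:
        x = target / a
    else:
        x = (-a + math.sqrt(a * a + 4.0 * b * target)) / (2.0 * b)
    return 1.0 / x

def joint_q(total, left, right, offset=0.0, lo=0.1, hi=500.0, iters=100):
    """Bisection for q_left such that R_left(q) + R_right(q + offset) == total.

    rate() decreases in q, so the summed rate does too and bisection applies.
    'offset' is a hypothetical stand-in for the left/right quality-difference condition.
    """
    def excess(q):
        return rate(q, *left) + rate(q + offset, *right) - total

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:   # too many bits: use a coarser quantizer
            lo = mid
        else:
            hi = mid
    q = 0.5 * (lo + hi)
    return q, q + offset

budget = rate(10, 2000, 5000) + rate(12, 1500, 3000)
print(joint_q(budget, (2000, 5000), (1500, 3000), offset=2.0))  # approximately (10.0, 12.0)
```

Note that MPEG-2 quantizer scales and H.264/AVC QP values are not on the same scale. Here `q` is a generic quantizer step, and mapping it to each codec's parameter is left out.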

A Study on the Necessity of Verification and Certification System of Inspection and Diagnostic Equipment for Infrastructure using Advanced Technologies (첨단 시설물 점검 및 진단장비 검·인증제도 도입 필요성에 대한 연구)

Hong, Sung-Ho;Kim, Jung-Gon;Cho, Jae-Young;Kim, Twae-Hwan
- Journal of the Society of Disaster Information, v.16 no.1, pp.163-177, 2020
Purpose: Facility maintenance has recently become very important, and the introduction and application of advanced technology in the facility maintenance industry have increased. For advanced technology to have an effective impact in the field, it must secure reliability through a verification and certification system for diagnostic equipment. Apart from the social demand for such a system, however, there is a gap between the industry's perspective and the realistic level of technology. This paper addresses the introduction of a rational verification and certification system for facility diagnostic equipment, based on an opinion survey of managers about the current situation. Method: A survey was conducted among managers in maintenance and construction. It covered the necessity and urgency of introducing a verification and certification system to promote the introduction and application of advanced technology in diagnostic equipment and facility inspection. In addition, the introduction of such a system for advanced facility diagnostic equipment was reviewed through research on similar foreign systems and a comparative analysis of 21 similar domestic equipment certification systems. Result: Most managers considered it necessary to introduce advanced technologies such as drones and robots into the maintenance industry and expected them to have a considerable effect in the field. On the other hand, the current technology level in Korea is relatively low, so applying the technology is expected to take some time. Managers also indicated that introducing a verification and certification system for facility diagnostic equipment would make it possible to manage reliable diagnostic equipment.
The review of similar foreign and domestic systems for diagnostic and measuring equipment found no system that directly verifies and certifies advanced-technology equipment for facility diagnosis and maintenance. However, Japan has a system for verifying the performance of diagnostic equipment, and South Korea has 21 similar inspection and diagnostic equipment certification systems among its 186 certification systems. It should therefore be possible to design a system that builds on them. Conclusion: The managers' opinions suggest that there is sufficient justification for a system that supports the application of Fourth Industrial Revolution technologies to equipment and the use of highly reliable equipment. However, Korea's level of advanced technology lags behind the urgency of the need. The system for verifying advanced facility inspection technology and certifying diagnostic equipment should therefore be implemented in stages that reflect the levels of technology use and verification. Apart from the verification and certification system, dedicated investment, support, and effort are needed to promote advanced facility diagnostic equipment.
https://doi.org/10.15683/kosdi.2020.3.31.163

Development of a Device for Estimating the Optimal Artificial Insemination Time of Individually Stalled Sows Using Image Processing (영상처리기법을 이용한 스톨 사육 모돈의 인공수정적기 예측 장치 개발)

Kim, D.J.;Yeon, S.C.;Chang, H.H.
- Journal of Animal Science and Technology, v.49 no.5, pp.677-688, 2007
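Most animals, including pigs, are spontaneous ovulators: they have a regular estrous cycle and ovulate at a set time. In contrast, females of rabbits, cats, minks, and similar species are induced ovulators, in which ovulation is triggered by mating stimulation. Animals are also either monoestrous, coming into heat only once a year, or polyestrous, coming into heat several times a year. The sow is polyestrous, and in estrus she shows behavior different from that in the non-estrous period (Diehl et al., 2001). To maximize pig farmers' profits, non-productive days must be minimized, and one way to reduce a sow's non-productive days is successful mating. Successful mating requires accurately predicting the optimal insemination time. If that time is misjudged and conception fails, non-productive days increase and losses result. Accurately determining the optimal insemination time is therefore an important factor in successful artificial insemination of sows. The optimal insemination time is 10 to 12 hours before ovulation. Measured from the onset of estrus, it is 26 to 34 hours for multiparous sows and 18 to 26 hours for gilts (Evans et al., 2001). Currently, sows are commonly checked for estrus twice a day, by boar contact or by visual observation. These methods require skilled technique and extensive experience and account for about 30% of total labor (Perez et al., 1986). Because estrus is checked only twice a day, its exact onset cannot be known. Since most estrus begins at dawn, it is very difficult to determine the optimal insemination time precisely. Even when estrus is detected, failing to inseminate at the right time lowers the conception rate and causes economic losses. Because of these problems, sows are currently inseminated two or three times, but the resulting costs and labor add to the burden on pig farmers. In estrus, pigs display behaviors not seen outside estrus, such as sniffing the vulva, pricking up the ears, and accepting mounting (Diehl et al., 2001). Pigs are also more active during estrus than outside it (Altman, 1941; Erez and Hartsock, 1990). Freson et al. (1998) reported detecting up to 86% of estrus events by measuring the activity of individually stalled sows with infrared sensors. However, that study only detected estrus. It did not provide a criterion for determining the optimal insemination time, which is the most important aspect of reproductive management. This study therefore developed a device that measures the activity of stall-housed sows to detect the onset of estrus and, using that onset as a reference, predicts the optimal insemination time. The device's performance was then tested in on-farm trials.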
https://doi.org/10.5187/JAST.2007.49.5.677
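
Once the onset of estrus is known, the timing rule quoted in this abstract (from Evans et al., 2001) reduces to simple arithmetic. The sketch below is a hypothetical illustration, not the device described in the paper. It makes three assumptions:
- Activity is measured by frame differencing, that is, by counting pixels whose grey level changes between consecutive video frames.
- Onset is declared at the first reading that exceeds a multiple of a baseline activity level.
- The window is then 26-34 h after onset for multiparous sows and 18-26 h for gilts.

The thresholds `pixel_thresh` and `factor` are placeholders.

```python
from datetime import datetime, timedelta

# Windows after estrus onset in hours, as cited from Evans et al. (2001) in the abstract.
WINDOWS_H = {"multiparous": (26, 34), "gilt": (18, 26)}

def frame_activity(prev, curr, pixel_thresh=25):
    """Hypothetical activity measure: pixels whose grey level changed by more than pixel_thresh."""
    return sum(1 for row_p, row_c in zip(prev, curr)
               for a, b in zip(row_p, row_c) if abs(a - b) > pixel_thresh)

def detect_onset(readings, baseline, factor=1.5):
    """First timestamp whose activity exceeds factor * baseline; readings are (datetime, activity)."""
    for t, activity in readings:
        if activity > factor * baseline:
            return t
    return None

def insemination_window(onset, parity):
    """Start and end of the insemination window for 'multiparous' or 'gilt'."""
    start_h, end_h = WINDOWS_H[parity]
    return onset + timedelta(hours=start_h), onset + timedelta(hours=end_h)

t0 = datetime(2007, 1, 1, 4, 0)
readings = [(t0 + timedelta(hours=i), v) for i, v in enumerate([10, 12, 11, 30, 35])]
onset = detect_onset(readings, baseline=11)
print(onset, insemination_window(onset, "gilt"))
```

Continuous monitoring removes the twice-a-day blind spot described above. The onset time, and therefore the window, is then known to within the sampling interval rather than to within half a day.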

Development of Yóukè Mining System with Yóukè's Travel Demand and Insight Based on Web Search Traffic Information (웹검색 트래픽 정보를 활용한 유커 인바운드 여행 수요 예측 모형 및 유커마이닝 시스템 개발)

Choi, Youji;Park, Do-Hyung
- Journal of Intelligence and Information Systems, v.23 no.3, pp.155-175, 2017
As social data come into the spotlight, mainstream web search engines provide web search traffic data, which indicate how many people searched for a specific keyword. Web search traffic information aggregates the searches of the crowd for a specific keyword. In various areas, it can serve as a useful variable that represents ordinary users' attention to specific interests. Many studies use web search traffic data to nowcast or forecast social phenomena, in areas such as epidemic prediction, consumer pattern analysis, product life cycles, and financial investment modeling. Web search traffic data have also begun to be applied to predicting inbound tourism. Accurate demand prediction is needed because tourism is a high value-added industry that increases employment and foreign exchange earnings. Among inbound tourists, Chinese tourists (Youke) continue to grow. Youke have been the largest group of inbound tourists to Korea for many years and also lead in tourism revenue per person. Research on appropriate demand prediction approaches for Youke is therefore important for both the public and private sectors, because accurate tourism demand prediction supports efficient decision making with limited resources. This study proposes an improved model that reflects the latest social issues by capturing the collective attention of individuals. Travel abroad is generally a high-involvement activity, so potential tourists are likely to search extensively for information about their trip. Web search traffic data capture, instantaneously and dynamically, tourists' attention while they prepare their journey. This study therefore attempted to select keywords that potential Chinese tourists are likely to search for on the internet. Baidu, the largest Chinese web search engine with a market share of over 80%, gives users access to web search traffic data.
Qualitative interviews with potential tourists helped us understand their information search behavior before a trip and identify the keywords for this study. The selected keywords were categorized into three levels according to how directly they relate to "Korean Tourism." This categorization helps identify which keywords explain Youke inbound demand, from the most closely related category to the most distant. Web search traffic data for each keyword were gathered from the Baidu Index by a purpose-built web crawler. Using these automatically gathered variables, a linear model was built by multiple regression analysis. This approach suits operational decision and policy making because the relationships between variables are easy to explain. After the regression models were built, a model using traditional variables was compared, in terms of significance and R-squared, with a model that adds web search traffic variables to the traditional model. The final model was chosen based on this comparison. Compared with the traditional model, the final regression model offers better explanatory power as well as real-time immediacy and convenience. Furthermore, this study presents an intuitively visualized system for general use, the Youke Mining solution. It offers several functions for tourism decision making, including the embedded final regression model, and combines data-science-based algorithms with a well-designed, simple interface. Finally, this research has three significant implications: theoretical, practical, and political. Theoretically, the Youke Mining system and model are a first step toward predicting Youke inbound demand with an interactive and instant variable. That variable is web search traffic information, which represents tourists' attention while they prepare their trip. Baidu web search traffic data cover more than 80% of the web search engine market.
Practically, Baidu data can represent, in real time, the attention of potential tourists who are preparing their own trips. Finally, in political terms, the proposed Chinese tourist demand prediction model based on web search traffic can support tourism decision making, helping manage resources efficiently and create opportunities for successful policy.
https://doi.org/10.13088/jiis.2017.23.3.155
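
This abstract compares a regression on traditional variables with the same regression plus web search traffic variables, using R-squared. That comparison can be sketched with ordinary least squares in plain Python. It is an illustrative sketch, not the authors' model. Significance tests are omitted, and any variable names used with it (for example a traditional predictor and a Baidu Index series) are hypothetical.

```python
def solve(a, v):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [vi] for row, vi in zip(a, v)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):   # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def ols(features, y):
    """Fit y = b0 + b1*f1 + ... by least squares; return (coefficients, R-squared)."""
    rows = [[1.0] + list(f) for f in features]   # prepend an intercept column
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    b = solve(xtx, xty)                           # normal equations (X^T X) b = X^T y
    pred = [sum(bi * ri for bi, ri in zip(b, r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - q) ** 2 for t, q in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return b, 1.0 - ss_res / ss_tot

# Hypothetical data: one "traditional" variable and one "search traffic" variable.
traditional = [1, 2, 3, 4, 5, 6]
search_index = [2, 1, 4, 3, 6, 5]
arrivals = [1 + 2 * a + 3 * s for a, s in zip(traditional, search_index)]
_, r2_base = ols([[a] for a in traditional], arrivals)
_, r2_full = ols(list(zip(traditional, search_index)), arrivals)
print(r2_base, r2_full)
```

Plain in-sample R-squared never decreases when a variable is added. The significance of the added coefficients (or adjusted R-squared) is what justifies keeping the search traffic variables.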

Different Look, Different Feel: Social Robot Design Evaluation Model Based on ABOT Attributes and Consumer Emotions (각인각색, 각봇각색: ABOT 속성과 소비자 감성 기반 소셜로봇 디자인평가 모형 개발)

Ha, Sangjip;Lee, Junsik;Yoo, In-Jin;Park, Do-Hyung
- Journal of Intelligence and Information Systems, v.27 no.2, pp.55-78, 2021
To solve complex and diverse social problems and ensure individuals' quality of life, social robots that can interact with humans are attracting attention. In the past, robots were regarded as a labor force deployed at industrial sites on behalf of humans. However, smart technology, considered an important driver in most industries, has extended the concept of the robot to social robots that coexist with humans and enable social interaction. Examples include service robots that respond to customers, robots for edutainment, and emotional robots that can interact intimately with humans. However, robots have not become popular despite the modern ICT service environment and the Fourth Industrial Revolution. Because social interaction with users is an important function of social robots, factors other than the robots' technology must also be considered. Design elements matter more than other factors in making consumers purchase a social robot. Yet existing studies on social robots remain at the level of proposing a "robot development methodology" or testing, piecemeal, the effects social robots have on users. On the other hand, the consumer emotions evoked by a robot's appearance strongly influence how users form perceptions, reasoning, evaluations, and expectations. These emotions can also affect attitudes toward robots, liking, and inferences about performance. This study therefore aims to verify the effect of social robot appearance and consumer emotions on consumers' attitudes toward social robots. To this end, a social robot design evaluation model is constructed by combining heterogeneous data from different sources.
Specifically, three quantitative indicators of social robot appearance from the ABOT Database are included in the model. Consumer emotions toward social robot design were collected from three sources: (1) existing design evaluation literature, (2) online buzz such as product reviews and blogs, and (3) qualitative interviews on social robot design. We then collected scores for consumer emotions and attitudes toward various social robots through a large-scale consumer survey. First, dimension reduction derived six major dimensions of consumer emotion from 23 detailed emotions. Statistical analysis was then performed to verify the effect of the derived consumer emotions on attitudes toward social robots. Finally, moderated regression analysis was performed to verify how the quantitative appearance indicators affect the relationship between consumer emotions and attitudes toward social robots. Several significant moderation effects were identified. They are visualized as two-way interaction effects so they can be interpreted from multidisciplinary perspectives. Theoretically, this study empirically verifies all stages, from technical properties to consumer emotions and attitudes toward social robots, by linking data from heterogeneous sources. Practically, its results help develop design guidelines based on consumer emotions in the design stage of social robot development.
https://doi.org/10.13088/jiis.2021.27.2.055
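
The moderated regression analysis in this abstract can be sketched as an ordinary least-squares fit with an interaction term, y = b0 + b1·x + b2·z + b3·x·z. Here x stands for a consumer-emotion dimension, z for an ABOT appearance indicator, and y for attitude toward the robot. This is an illustrative sketch on made-up data, not the authors' model. Predictors are mean-centered, a common convention assumed here. The simple slope of x at a given z, b1 + b3·(z - mean of z), is what a two-way interaction plot displays. The function names are hypothetical.

```python
def fit(rows, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]

def moderated_regression(x, z, y):
    """Fit y on centered x, centered z and their product; return (coef, mean_x, mean_z)."""
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    rows = [[1.0, xi - mx, zi - mz, (xi - mx) * (zi - mz)] for xi, zi in zip(x, z)]
    return fit(rows, y), mx, mz

def simple_slope(coef, z_value, z_mean):
    """Effect of x on y when the moderator equals z_value."""
    return coef[1] + coef[3] * (z_value - z_mean)

emotion = [1, 2, 3, 4, 5, 6, 7, 8]        # hypothetical emotion scores
appearance = [2, 1, 3, 5, 4, 6, 8, 7]     # hypothetical ABOT indicator values
attitude = [2 + 1.5 * e + 0.5 * a + 0.8 * e * a for e, a in zip(emotion, appearance)]
coef, mx, mz = moderated_regression(emotion, appearance, attitude)
print(coef[3], simple_slope(coef, 2.0, mz))
```

A significant interaction coefficient `coef[3]` means the appearance indicator moderates the emotion-attitude relationship. Plotting `simple_slope` at low and high moderator values gives the two-way interaction picture.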

A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis (텍스트 마이닝을 활용한 신문사에 따른 내용 및 논조 차이점 분석)

Kam, Miah;Song, Min
- Journal of Intelligence and Information Systems, v.18 no.3, pp.53-77, 2012
This study analyzes differences in content and tone of argument among three major Korean newspapers: the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo. It is commonly accepted that Korean newspapers explicitly convey their own tone of argument on sensitive issues and topics. Because content and tone can easily influence readers, it can be problematic if readers consume news without being aware of a newspaper's tone. A tool that informs readers of a newspaper's tone of argument is therefore highly desirable. This study presents the results of clustering and classification techniques as part of a text mining analysis. We focus on six main subjects in the newspapers (Culture, Politics, International, Editorial-opinion, Eco-business, and National issues) and attempt to identify differences and similarities among the newspapers. The basic unit of analysis is a paragraph of a news article. This study uses a keyword-network analysis tool and visualizes relationships among keywords to make the differences easier to see. Articles were gathered from KINDS, the Korean integrated news database system, which preserves the articles of the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo and makes them open to the public. About 3,030 articles from 2008 to 2012 were used. The International, National issues, and Politics sections were gathered for specific issues. The International section was collected with the keyword 'Nuclear weapon of North Korea,' the National issues section with '4-major-river,' and the Politics section with 'Tonghap-Jinbo Dang.' All articles from April 2012 to May 2012 in the Eco-business, Culture, and Editorial-opinion sections were also collected.
All collected data were processed and split into paragraphs, and stop words were removed with the Lucene Korean Module. Keyword co-occurrence counts were calculated from the list of keyword pairs co-occurring within each paragraph, and a co-occurrence matrix was built from that list. The cosine coefficient matrix derived from the co-occurrence matrix was then used as input for PFNet (Pathfinder Network). To find the significant keywords in each newspaper, we analyzed the 10 most frequent keywords and the keyword networks of the 20 most frequent keywords, examining their relationships in a detailed network map. NodeXL was used to visualize the PFNet. After drawing all the networks, we compared them with the classification results. Classification was used to identify how each newspaper's tone of argument differs from the others. To analyze tone, every paragraph was assigned to one of two tones, Positive or Negative, and a supervised learning technique was used to classify the tones of all collected paragraphs and articles. The Naïve Bayes classifier provided in the MALLET package was used to classify all paragraphs, and precision, recall, and F-value were used to evaluate the classification results. Three subjects, Culture, Eco-business, and Politics, showed differences in content and tone of argument among the three newspapers. In the National issues section, the newspapers' tones of argument on the 4-major-rivers project also differed from one another. Each newspaper thus appears to have its own specific tone of argument in those sections, and their keyword networks took different shapes for the same section in the same period.
This means that the frequently appearing keywords differ across newspapers and that their articles are composed of different keywords. The Positive-Negative classification also showed that it is possible to distinguish one newspaper's tone of argument from the others'. These results indicate that the approach is a promising basis for a new tool that identifies newspapers' different tones of argument.
https://doi.org/10.13088/jiis.2012.18.3.053
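
The keyword-network pipeline in this abstract can be sketched in plain Python: paragraph-level co-occurrence counts, then a cosine coefficient matrix, then PFNet. The paper does not state the following choices, so they are assumptions:
- Keywords are already extracted per paragraph.
- The cosine coefficient is Salton's measure, co-occurrence(i, j) / sqrt(freq(i) · freq(j)), where freq counts the paragraphs containing a keyword.
- Similarities become distances as 1 - cosine.
- The network is PFNet with r = ∞ and q = n - 1, which keeps a link only if no alternative path has a smaller maximum link distance.

```python
import math
from collections import Counter
from itertools import combinations

def cooccurrence(paragraphs):
    """Pair counts (keywords sharing a paragraph) and per-keyword paragraph frequencies."""
    pairs, freq = Counter(), Counter()
    for keywords in paragraphs:
        uniq = sorted(set(keywords))
        freq.update(uniq)
        for a, b in combinations(uniq, 2):   # pairs stored in sorted order
            pairs[(a, b)] += 1
    return pairs, freq

def cosine_matrix(pairs, freq, vocab):
    """Salton's cosine: co-occurrence / sqrt(freq_a * freq_b); 0 if never co-occurring."""
    n = len(vocab)
    sim = [[0.0] * n for _ in range(n)]
    for i, a in enumerate(vocab):
        for j, b in enumerate(vocab):
            if i != j:
                c = pairs.get((a, b) if a < b else (b, a), 0)
                if c:
                    sim[i][j] = c / math.sqrt(freq[a] * freq[b])
    return sim

def pfnet(sim):
    """PFNet(r=inf, q=n-1): keep i-j if its distance equals the minimax path distance."""
    n, inf = len(sim), float("inf")
    d = [[0.0 if i == j else (1.0 - sim[i][j] if sim[i][j] > 0 else inf)
          for j in range(n)] for i in range(n)]
    w = [row[:] for row in d]
    for k in range(n):                        # Floyd-Warshall with max instead of +
        for i in range(n):
            for j in range(n):
                w[i][j] = min(w[i][j], max(w[i][k], w[k][j]))
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if d[i][j] < inf and d[i][j] <= w[i][j]]

paras = [["nuclear", "north_korea", "sanction"], ["nuclear", "north_korea"], ["river", "project"]]
vocab = ["north_korea", "nuclear", "sanction", "river", "project"]
pairs, freq = cooccurrence(paras)
print(pfnet(cosine_matrix(pairs, freq, vocab)))
```

The resulting edge list can be exported to a graph tool such as NodeXL for drawing. With r = ∞ and q = n - 1, PFNet keeps exactly the links that lie on some minimum spanning tree, which is why it prunes dense co-word networks so aggressively.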

The Pattern Analysis of Financial Distress for Non-audited Firms using Data Mining (데이터마이닝 기법을 활용한 비외감기업의 부실화 유형 분석)

Lee, Su Hyun;Park, Jung Min;Lee, Hyoung Yong
- Journal of Intelligence and Information Systems, v.21 no.4, pp.111-131, 2015
Compared with the research on bankruptcy prediction, only a handful of studies have analyzed patterns of corporate distress. The few that exist mainly focus on audited firms, whose financial data are easier to collect. In reality, however, corporate financial distress is far more common and critical among non-audited firms, which consist mainly of small and medium-sized firms. The purpose of this paper is to classify distressed non-audited firms by their financial ratios using a data mining technique, the Self-Organizing Map (SOM). A SOM is a type of artificial neural network trained by unsupervised learning to produce a low-dimensional, discretized representation of the input space of the training samples, called a map. A SOM differs from other artificial neural networks in two ways. It applies competitive learning rather than error-correction learning such as backpropagation with gradient descent, and it uses a neighborhood function to preserve the topological properties of the input space. It is one of the most popular and successful clustering algorithms. In this study, we classify types of financially distressed firms, specifically non-audited firms. In the empirical test, we collected 10 financial ratios for the two preceding years (2002 and 2003) for 100 non-audited firms that were under distress in 2004. Using these financial ratios and the SOM algorithm, five distinct patterns were identified. In pattern 1, distress was very serious in almost all financial ratios; 12% of the firms fall into this pattern. In pattern 2, distress was weak in almost all financial ratios; 14% of the firms fall into this pattern. In pattern 3, the growth ratio was the worst among all patterns. The firms in this pattern may be under distress because of severe competition in their industries. Approximately 30% of the firms fell into this group.
In pattern 4, the growth ratio was higher than in any other pattern, but the cash ratio and profitability ratio did not keep pace with growth. These firms appear to have become distressed while pursuing business expansion. About 25% of the firms were in this pattern. Finally, pattern 5 encompassed very solvent firms. These firms may have become distressed because of a poor short-term strategic decision or problems with the firm's owner. Approximately 18% of the firms fell into this pattern. This study makes both academic and empirical contributions. Academically, it uses a data mining technique (SOM) to classify non-audited companies rather than large audited firms with well-prepared and reliable financial data. Non-audited companies tend to go bankrupt easily, and their financial data are unstructured or easily manipulated. Empirically, even though it analyzes the financial data of non-audited firms, the approach is useful for identifying the first-order symptoms of financial distress. This helps forecast bankruptcy and manage early warning signals. A limitation of this research is that only 100 firms were analyzed, because financial data on non-audited firms are difficult to collect; this makes analysis by category or firm size difficult. In addition, non-financial qualitative data are crucial for bankruptcy analysis, so non-financial qualitative factors will be taken into account in a future study. This study sheds some light on future distress prediction for non-audited small and medium-sized firms.
https://doi.org/10.13088/jiis.2015.21.4.111
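
The SOM training described in this abstract, competitive learning with a neighborhood function, can be sketched in plain Python. This minimal version makes these assumptions:
- The financial ratios have already been scaled to comparable ranges.
- The neighborhood is Gaussian over grid coordinates.
- The learning rate and neighborhood radius decay linearly.

The grid size, epochs, and rates are hypothetical, not the settings used in the study.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))

def train_som(data, rows, cols, epochs=200, lr0=0.5, sigma0=None, seed=0):
    """Train a rows x cols SOM with online competitive learning and a Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = sigma0 if sigma0 is not None else max(rows, cols) / 2.0
    weights = {(r, c): [rng.random() for _ in range(dim)] for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):          # shuffled pass over the samples
            frac = step / total
            lr = lr0 * (1.0 - frac)                     # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3        # neighborhood radius shrinks
            bmu = best_matching_unit(weights, x)        # competitive step: find the winner
            for node, w in weights.items():
                d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-d2 / (2.0 * sigma * sigma))   # winner's grid neighbors move too
                for k in range(dim):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights

# Hypothetical scaled ratios for six firms forming two groups.
firms = [[0.1, 0.1], [0.15, 0.1], [0.1, 0.2], [0.9, 0.9], [0.85, 0.95], [0.9, 0.8]]
som = train_som(firms, 1, 2)
print([best_matching_unit(som, f) for f in firms])
```

Each map node then acts as a pattern. With 100 firms and 10 ratios, a larger grid would be trained, and its nodes grouped and inspected ratio by ratio, much as the five patterns above are described.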

Deep Learning-based Professional Image Interpretation Using Expertise Transplant (전문성 이식을 통한 딥러닝 기반 전문 이미지 해석 방법론)

Kim, Taejin;Kim, Namgyu
- Journal of Intelligence and Information Systems, v.26 no.2, pp.79-104, 2020
As deep learning has recently attracted attention, it is being considered as a way to solve problems in various fields. In particular, deep learning is known to perform excellently on unstructured data such as text, sound, and images, and many studies have proven its effectiveness. Owing to the remarkable development of deep learning for text and images, interest in image captioning technology and its applications is rapidly increasing. Image captioning automatically generates relevant captions for a given image by handling image comprehension and text generation simultaneously. Image captioning has a high entry barrier, because analysts must be able to process both image and text data. Despite this, it has established itself as one of the key fields of AI research owing to its wide applicability. Many studies have also sought to improve image captioning performance in various respects. Recent studies attempt to create advanced captions that not only describe an image accurately but also convey the information it contains in a more sophisticated way. Despite these efforts, it is difficult to find research that interprets images from the perspective of domain experts rather than the general public. Even for the same image, the parts of interest may differ according to the viewer's professional field. Moreover, the way of interpreting and describing the image also differs according to the level of expertise. The public tends to perceive an image from a holistic and general perspective, that is, by identifying its constituent objects and their relationships.
In contrast, domain experts tend to perceive an image by focusing on the specific elements needed to interpret it based on their expertise. This implies that the meaningful parts of an image differ with the viewer's perspective, even for the same image, and image captioning needs to reflect this phenomenon. In this study, we therefore propose a method that generates domain-specialized captions for an image by using the expertise of experts in the corresponding domain. Specifically, after pre-training on a large amount of general data, domain expertise is transplanted through transfer learning with a small amount of expertise data. However, simply applying transfer learning to expertise data may cause another type of problem. Learning simultaneously from captions with various characteristics may cause the so-called 'inter-observation interference' problem, which makes it difficult to learn each characteristic point of view purely. When learning from a vast amount of data, most of this interference is self-purified and has little impact on the results. In fine-tuning, however, learning is performed on a small amount of data, so the impact of such interference can be relatively large. To solve this problem, we propose a novel 'Character-Independent Transfer-learning' that performs transfer learning independently for each character. To confirm the feasibility of the proposed methodology, we ran experiments using the results of pre-training on the MSCOCO dataset, which comprises 120,000 images and about 600,000 general captions. Additionally, following the advice of an art therapist, about 300 'image / expertise caption' pairs were created and used in the expertise transplantation experiments.
The experiments confirmed that the proposed methodology generates captions from the perspective of the implanted expertise. In contrast, captions generated through learning on general data contain much content irrelevant to expert interpretation. In this paper, we propose a novel approach to specialized image interpretation, together with a method that uses transfer learning to generate captions specialized for a specific domain. In the future, we expect the proposed methodology to be applied to expertise transplantation in various fields. Such research could help address the lack of expertise data and improve image captioning performance.
https://doi.org/10.13088/jiis.2020.26.2.079
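
The 'Character-Independent Transfer-learning' idea fine-tunes a separate copy of the pre-trained model for each caption character, instead of one model on the mixed data. A toy stand-in can illustrate it. The sketch below is not the authors' captioning network:
- A one-weight linear model plays the role of the pre-trained model.
- Two hypothetical characters, "A" and "B", have conflicting targets.
- Joint fine-tuning on the small mixed dataset is pulled toward a compromise, mimicking the 'inter-observation interference' the abstract describes.
- Independent fine-tuning recovers each character's mapping.

The names `sgd_finetune` and `character_independent` are hypothetical.

```python
import copy

def sgd_finetune(weights, samples, lr=0.05, epochs=200):
    """Fine-tune a copy of linear weights on (features, target) pairs with SGD on squared error."""
    w = copy.deepcopy(weights)            # never modify the pre-trained weights
    for _ in range(epochs):
        for x, y in samples:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i in range(len(w)):
                w[i] -= lr * err * x[i]
    return w

def character_independent(pretrained, data_by_character, **kwargs):
    """One fine-tuned copy per character, each starting from the same pre-trained weights."""
    return {ch: sgd_finetune(pretrained, samples, **kwargs)
            for ch, samples in data_by_character.items()}

pretrained = [0.5]                                    # toy "pre-trained" model
data = {"A": [([1.0], 2.0), ([2.0], 4.0)],            # hypothetical character A: y = 2x
        "B": [([1.0], -2.0), ([2.0], -4.0)]}          # hypothetical character B: y = -2x
print(character_independent(pretrained, data))        # weights near 2.0 and -2.0
print(sgd_finetune(pretrained, data["A"] + data["B"]))  # pulled toward a compromise near 0
```

The trade-off is one fine-tuned model per character. In exchange, each character's small dataset is never averaged against the others.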
