• Title/Summary/Keyword: language training


Feasibility of Deep Learning Algorithms for Binary Classification Problems (이진 분류문제에서의 딥러닝 알고리즘의 활용 가능성 평가)

  • Kim, Kitae;Lee, Bomi;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.23 no.1 / pp.95-108 / 2017
  • Recently, AlphaGo, the Baduk (Go) artificial intelligence program developed by Google DeepMind, won a decisive victory against Lee Sedol. Many people thought that machines would not be able to beat a human at Go because, unlike chess, the number of possible paths for a single move exceeds the number of atoms in the universe, but the result was the opposite of what people predicted. After the match, artificial intelligence came into focus as a core technology of the fourth industrial revolution and attracted attention from various application domains. In particular, deep learning has drawn attention as the core artificial intelligence technique used in the AlphaGo algorithm. Deep learning is already being applied to many problems and performs especially well in image recognition. It also performs well on high-dimensional data such as voice, images, and natural language, where it was difficult to obtain good performance with existing machine learning techniques. In contrast, it is difficult to find deep learning research on traditional business data and structured data analysis. In this study, we investigated whether the deep learning techniques studied so far can be used not only for the recognition of high-dimensional data but also for binary classification problems in traditional business data analysis, such as customer churn analysis, marketing response prediction, and default prediction. We also compare the performance of the deep learning techniques with that of traditional artificial neural network models. The experimental data in the paper is the telemarketing response data of a bank in Portugal. It has input variables such as age, occupation, loan status, and the number of previous telemarketing contacts, and a binary target variable that records whether the customer intends to open an account. 
In this study, to evaluate whether deep learning algorithms and techniques can be used for binary classification problems, we compared the performance of various models using CNN, LSTM, and dropout, which are widely used in deep learning, with that of MLP models, a traditional artificial neural network model. However, since not all network design alternatives can be tested due to the nature of artificial neural networks, the experiment was conducted with restricted settings for the number of hidden layers, the number of neurons in each hidden layer, the number of output data (filters), and the conditions under which dropout was applied. The F1 score, rather than overall accuracy, was used to evaluate the models in order to show how well they classify the class of interest. The detailed methods for applying each deep learning technique in the experiment are as follows. The CNN algorithm reads values adjacent to a specific value and recognizes features from them, but for business data the distance between fields does not matter because each field is usually independent. In this experiment, we set the filter size of the CNN algorithm to the number of fields so that the whole characteristics of the data are learned at once, and added a hidden layer to make decisions based on the additional features. For the model with two LSTM layers, the input direction of the second layer is reversed relative to the first layer in order to reduce the influence of the position of each field. For the dropout technique, we set the neurons of each hidden layer to be dropped with a probability of 0.5. The experimental results show that the prediction model with the highest F1 score was the CNN model using the dropout technique, and the next best model was the MLP model with two hidden layers using the dropout technique. 
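The two design choices described above, a filter as wide as the record and dropout with probability 0.5, can be illustrated with a minimal sketch. It is not the authors' implementation: the field values, the filter weights, the ReLU activation, and the "inverted" rescaling of surviving activations are all assumptions made for illustration, since the abstract only states the filter size and the drop probability.

```python
import random


def full_width_filters(record, filters):
    """Apply each filter to the whole record at once (filter size == number of fields)."""
    # With a filter as wide as the record, each filter yields a single feature.
    # ReLU is an assumed activation; the abstract does not name one.
    return [max(0.0, sum(w * x for w, x in zip(f, record))) for f in filters]


def dropout(activations, p=0.5, rng=random):
    """Zero each activation with probability p; rescale survivors by 1/(1-p).

    The rescaling ("inverted dropout") is an assumption, not stated in the abstract.
    """
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Hypothetical scaled fields: age, occupation, loan status, previous contacts.
record = [0.3, 1.0, 0.0, 0.5]
# Hypothetical weights, one filter per row.
filters = [[0.2, -0.1, 0.4, 0.3],
           [-0.5, 0.6, 0.1, 0.0]]

features = full_width_filters(record, filters)
print(features)
print(dropout(features, p=0.5, rng=random.Random(0)))
```

Because the filter covers every field, the order of the fields no longer affects which values are combined, which matches the stated motivation that business data fields are usually independent.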
The experiment yielded several findings. First, models using dropout make slightly more conservative predictions than those without dropout and generally show better classification performance. Second, CNN models show better classification performance than MLP models. This is interesting because CNN has performed well in binary classification problems, to which it has rarely been applied, as well as in the fields where its effectiveness has been proven. Third, the LSTM algorithm seems to be unsuitable for binary classification problems because the training time is too long compared to the performance improvement. From these results, we can confirm that some deep learning algorithms can be applied to solve business binary classification problems.
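The choice of the F1 score over overall accuracy can be made concrete with a small sketch. The labels below are hypothetical toy data, not the bank data set used in the paper; they only show that two predictors with the same accuracy can differ sharply in F1 on the class of interest.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical imbalanced labels: 2 responders among 10 customers.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_no = [0] * 10                        # predicts "no response" for everyone
model_out = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]  # one hit, one miss, one false alarm

print(accuracy(y_true, always_no), f1_score(y_true, always_no))
print(accuracy(y_true, model_out), f1_score(y_true, model_out))
```

Both predictors reach the same accuracy, but the one that never predicts a response has an F1 of zero, which is why accuracy alone says little about the minority class.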

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

  • Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.1-23 / 2018
  • Since the beginning of the 21st century, various high-quality services have emerged with the growth of the internet and information and communication technologies. In particular, the e-commerce industry, in which Amazon and eBay stand out, is growing explosively. As e-commerce grows, customers can easily find what they want to buy while comparing various products, because more products are registered at online shopping malls. However, this growth has also created a problem. Because so many products are registered, it has become difficult for customers to find what they really need in the flood of products. When customers search with a general keyword, too many products are returned. Conversely, few products are found when customers type in product details, because concrete product attributes are rarely registered. In this situation, automatically recognizing text in images can be a solution. Because the bulk of product details is written in catalogs in image format, most product information cannot be found with text inputs in the current text-based search system. If the information in images can be converted to text, customers can search for products by product details, which makes shopping more convenient. There are various existing OCR (Optical Character Recognition) programs that can recognize text in images. However, they are hard to apply to catalogs because they have problems recognizing text in certain circumstances, such as when the text is not big enough or the fonts are not consistent. Therefore, this research suggests a way to recognize keywords in catalogs with deep learning, which has been the state of the art in image recognition since the 2010s. 
Single Shot Multibox Detector (SSD), a model well regarded for its object-detection performance, can be used with a structure re-designed to take into account the differences between text and objects. However, because deep learning algorithms of this kind are trained by supervised learning, the SSD model needs a large amount of labeled training data. To collect data, one could manually label the location and class of texts in catalogs, but manual collection causes many problems. Some keywords would be missed because humans make mistakes while labeling training data. Collecting training data at the required scale is also too time-consuming, or too costly if many workers are hired to shorten the time. Furthermore, if specific keywords need to be trained, finding images that contain those words is difficult as well. To solve the data issue, this research developed a program which creates training data automatically. This program generates catalog-like images containing various keywords and pictures and saves the location information of the keywords at the same time. With this program, not only can data be collected efficiently, but the performance of the SSD model also improves. The SSD model recorded a recognition rate of 81.99% with 20,000 samples created by the program. Moreover, this research tested the SSD model under different data conditions to analyze which features of the data influence the performance of recognizing texts in images. As a result, the number of labeled keywords, the addition of overlapping keyword labels, the existence of unlabeled keywords, the spacing among keywords, and the differences in background images were found to be related to the performance of the SSD model. This test can lead to performance improvements, through high-quality data, for the SSD model or other text recognizers based on deep learning algorithms. 
The SSD model re-designed to recognize texts in images and the program developed for creating training data are expected to contribute to the improvement of search systems in e-commerce. Suppliers can spend less time registering keywords for products, and customers can search for products using the product details written in the catalog.
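The core idea of generating labeled data automatically, placing known keywords at known positions so that the labels come for free, can be sketched as follows. This is a hypothetical illustration, not the authors' program: the canvas size, the fixed character width, the rejection of overlapping boxes, and the names `make_sample` and `overlaps` are all assumptions, and the actual drawing of text onto a background image is left out.

```python
import random


def overlaps(a, b):
    """True if two (x1, y1, x2, y2) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def make_sample(keywords, width=300, height=300, char_w=12, line_h=20,
                rng=random, max_tries=100):
    """Pick a random non-overlapping box for each keyword and return the labels.

    Assumes a fixed-width font (char_w pixels per character); a real generator
    would measure the rendered text and then draw it on a background image.
    """
    labels = []
    for word in keywords:
        w, h = char_w * len(word), line_h
        for _ in range(max_tries):
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            box = (x, y, x + w, y + h)
            if not any(overlaps(box, item["box"]) for item in labels):
                labels.append({"label": word, "box": box})
                break  # keyword placed; a keyword is skipped if no free spot is found
    return labels


keywords = ["cotton", "waterproof", "wireless"]  # hypothetical catalog keywords
sample = make_sample(keywords, rng=random.Random(42))
for item in sample:
    print(item["label"], item["box"])
```

Because the generator chooses the positions itself, every keyword that appears in the image is labeled by construction, which avoids the missed-label problem of manual annotation described above.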

A Study on improvement of curriculum in Nursing (간호학 교과과정 개선을 위한 조사 연구)

  • 김애실
    • Journal of Korean Academy of Nursing / v.4 no.2 / pp.1-16 / 1974
  • This study involved the development of a survey form and the collection of data in an effort to provide information which can be used in the improvement of nursing curricula. The data examined were the kinds of courses currently being taught in the curricula of nursing education institutions throughout Korea, the credits required for course completion, and the year in which courses are taken. For the purposes of this study, curricula were classified into college, nursing school, and vocational school categories. Courses were divided into the 3 major categories of general education courses, supporting science courses, and professional education courses, and further subdivided as follows: 1) General education (following the classification of Philip H. Phenix): a) Symbolics, b) Empirics, c) Aesthetics, d) Synthetics, e) Ethics, f) Synoptics. 2) Supporting science: a) physical science, b) biological science, c) social science, d) behavioral science, e) health science, f) education. 3) Professional education: a) basic courses, b) courses in each of the respective fields of nursing. Ⅰ. General education, aimed at developing the individual as a person and as a member of society, is relatively strong in college curricula compared with the other two. a) Courses in the category of symbolics included Korean language, English, German, Chinese, mathematics, statistics, economics, and computer. Most college curricula included 20 credits of courses in this sub-category, while nursing schools required 12 credits and vocational schools 10 units. English ordinarily receives particularly heavy emphasis. b) Research methodology, domestic affairs, and women & courtney were included under the category of empirics in the college curricula; nursing and vocational schools do not offer these at all. c) Courses classified under aesthetics were physical education, drill, music, recreation, and fine arts. 
Most college curricula had 4 credits in these areas, nursing schools provided for 2 credits, and most vocational schools offered 10 units. d) Synthetics included leadership, interpersonal relationships, and communications. Most schools did not offer courses of this nature. e) The category of ethics included citizenship. 2 credits are provided in college curricula, while vocational schools require 4 units. Nursing schools do not offer these courses. f) Courses included under synoptics were Korean history, cultural history, philosophy, logic, and religion. Most college curricula had 5 credits in these areas, nursing schools 4 credits, and vocational schools 2 units. g) Only physical education was given every year in college curricula, and only English was given in every year of the curriculum in nursing schools and vocational schools. Most of the other courses were given during the first year of the curriculum. Ⅱ. Supporting science courses are fundamental to the practice and application of nursing theory. a) Physical science courses include physics, chemistry, and natural science. Most colleges and nursing schools provided for 2 credits of physical science courses in their curricula, while most vocational schools did not offer them. b) Courses included under biological science were anatomy, physiology, biology, and biochemistry. Most college curricula provided for 15 credits of biological science, nursing schools for the most part provided for 11 credits, and most vocational schools provided for 8 units. c) Courses included under social science were sociology and anthropology. Most colleges provided for 1 credit in courses of this category, while most nursing schools provided for 2 credits. Most vocational schools did not provide courses of this type. d) Courses included under behavioral science were general and clinical psychology, developmental psychology, mental hygiene, and guidance. Most schools did not provide for these courses. 
e) Courses included under health science were pharmacy and pharmacology, microbiology, pathology, nutrition and dietetics, parasitology, and Chinese medicine. Most college curricula provided for 11 credits, most nursing schools provided for 12 credits, and vocational schools for the most part provided 20 units of medical courses. f) Courses included under education were educational psychology, principles of education, philosophy of education, history of education, social education, educational evaluation, educational curricula, class management, guidance techniques, and school & community. Most colleges offer 3 credits in courses in this category, while nursing schools provide 8 credits and vocational schools provide for 6 units. 50% of the colleges prepare their students to qualify as regular teachers of the second level, while 91% of the nursing schools and 60% of the vocational schools prepare their students to qualify as school nurses. g) The majority of colleges start supporting science courses in the first year and complete them by the second year. Nursing schools and vocational schools usually complete them in the first year. Ⅲ. Professional education courses are designed to develop professional nursing knowledge, attitudes, and skills in the students. a) Basic courses include social nursing, nursing ethics, history of nursing professional control, nursing administration, social medicine, social welfare, introductory nursing, advanced nursing, medical regulations, efficient nursing, nursing English, and basic nursing. College curricula devoted 13 credits to these subjects, nursing schools 14 credits, and vocational schools 26 units, indicating a marked difference in the scope of education provided. b) There was a noticeable tendency for the colleges to take a unified approach to the branches of nursing. 
60% of the schools had courses in public health nursing, 80% in pediatric nursing, 60% in obstetric nursing, 90% in psychiatric nursing, and 80% in medical-surgical nursing. The greatest number of schools provided 48 credits in all of these fields combined. In most of the nursing schools, 52 credits were provided for courses divided according to disease. In the vocational schools, unified courses are provided in public health nursing, child nursing, maternal nursing, psychiatric nursing, and adult nursing. In addition, one unit is provided for one hour a week of practice. The total number of units provided in the greatest number of vocational schools is thus Ⅲ units, double the number provided in nursing schools and colleges. c) In the colleges, the second year is devoted mainly to basic nursing courses, while the third and fourth years are used for advanced nursing courses. In nursing schools and vocational schools, the first year deals primarily with basic nursing, and the second and third years are used to cover advanced nursing courses. The study yielded the following conclusions. 1. Instructional goals should be established for each course in line with the idea of nursing, and curriculum improvements should be made accordingly. 2. Courses that fall under the synthetics category should be strengthened, and ways should be sought to develop the ability to cooperate with those who work for human welfare and health. 3. The ability to solve problems on the basis of scientific principles and knowledge, and an understanding of man and society, should be fostered by strengthening courses dealing with physical sciences, social sciences, and behavioral sciences and by redistributing courses emphasizing biological and health sciences. 4. 
There should be more balanced curricula with less emphasis on courses in the major. There is a need to establish courses necessary for the individual nurse by doing away with courses centered around specific diseases and combining them in unified courses. In addition, it is possible to develop skill in dealing with people by using the social setting in comprehensive training. The most efficient ratio of study experiences should be studied to provide more effective, interesting education. Elective courses should be initiated to ensure a more flexible, responsive educational program. 5. The curriculum stipulated in the education law should be examined.


Analysis of Success Cases of InsurTech and Digital Insurance Platform Based on Artificial Intelligence Technologies: Focused on Ping An Insurance Group Ltd. in China (인공지능 기술 기반 인슈어테크와 디지털보험플랫폼 성공사례 분석: 중국 평안보험그룹을 중심으로)

  • Lee, JaeWon;Oh, SangJin
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.71-90 / 2020
  • Recently, the global insurance industry has been rapidly undergoing digital transformation through the use of artificial intelligence technologies such as machine learning, natural language processing, and deep learning. As a result, more and more foreign insurers have succeeded with artificial intelligence-based InsurTech and platform businesses. Ping An Insurance Group Ltd., China's largest private company, is leading China's fourth industrial revolution with remarkable achievements in InsurTech and digital platforms as a result of its constant innovation, using 'finance and technology' and 'finance and ecosystem' as its corporate keywords. In response, this study analyzed the InsurTech and platform business activities of Ping An Insurance Group Ltd. through the ser-M analysis model to provide strategic implications for revitalizing the AI technology-based businesses of domestic insurers. The ser-M analysis model is a framework that interprets, in an integrated manner, the vision and leadership of the CEO, the historical environment of the enterprise, the utilization of various resources, and the unique mechanism relationships, in terms of subject, environment, resource, and mechanism. As a result of the case analysis, Ping An Insurance Group Ltd. has achieved cost reduction and improved customer service by digitally innovating its entire business area, such as sales, underwriting, claims, and loan services, using core artificial intelligence technologies such as facial, voice, and facial expression recognition. In addition, "online data in China" and "the vast offline data and insights accumulated by the company" were combined with new technologies such as artificial intelligence and big data analysis to build a digital platform that integrates financial services and digital service businesses. Ping An Insurance Group Ltd. 
has pursued constant innovation, and as of 2019, sales reached $155 billion, ranking seventh among all companies in the Global 2000 rankings selected by Forbes Magazine. Analyzing the background of the success of Ping An Insurance Group Ltd. from the perspective of ser-M, founder Ma Mingzhe quickly grasped the development of digital technology, market competition, and changes in population structure in the era of the fourth industrial revolution, established a new vision, and displayed agile, digital technology-focused leadership. Based on the founder's strong leadership in response to environmental changes, the company has successfully led its InsurTech and platform businesses through innovation of internal resources, such as investment in artificial intelligence technology, securing excellent professionals, and strengthening big data capabilities, combined with external absorption capabilities and strategic alliances among various industries. This analysis of the success story of Ping An Insurance Group Ltd. offers the following implications for domestic insurance companies that are preparing for digital transformation. First, CEOs of domestic companies also need to recognize the paradigm shift in industry caused by changes in digital technology and quickly arm themselves with digital technology-oriented leadership to spearhead the digital transformation of their enterprises. Second, the Korean government should urgently overhaul related laws and systems to further promote the use of data between different industries and provide strong support, such as deregulation, tax benefits, and platform provision, to help the domestic insurance industry secure global competitiveness. 
Third, Korean companies also need to make bolder investments in the development of artificial intelligence technology so that the systematic securing of internal and external data, the training of technical personnel, and patent applications can be expanded, and digital platforms should be established quickly so that diverse customer experiences can be integrated through trained artificial intelligence technology. Finally, since generalization from a single case of an overseas insurance company may be limited, we hope that future work will conduct more extensive research on management strategies related to artificial intelligence technology by analyzing cases from multiple industries or multiple companies or by conducting empirical research.

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Mode (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.141-154 / 2019
  • Internet technology and social media are growing rapidly. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor-quality from high-quality content through text data about products, and it has proliferated within text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined categories such as positive and negative. It has been studied in various directions in terms of accuracy, from simple rule-based approaches to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active research areas in natural language processing and is widely studied in text mining. When real online reviews aren't available for others, it's not only easy to openly collect information, but it also affects business. In marketing, real-world information from customers is gathered on websites, not through surveys. Whether a website's posts are positive or negative is reflected in sales, so companies try to identify this information. However, many reviews on a website are not always good and are difficult to identify. Earlier studies in this research area used review data from the Amazon.com shopping mall, but recent studies use data on stock market trends, blogs, news articles, weather forecasts, IMDB, Facebook, and so on. However, a lack of accuracy is recognized because sentiment calculations change according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity of sentiment into positive and negative categories and to increase the prediction accuracy of polarity analysis using the pretrained IMDB review data set. 
First, as comparative models for text classification in sentiment analysis, we adopt popular machine learning algorithms such as NB (naive Bayes), SVM (support vector machines), XGBoost, RF (random forests), and Gradient Boost. Second, deep learning has demonstrated the ability to extract complex, discriminative features from data. Representative algorithms are CNN (convolutional neural networks), RNN (recurrent neural networks), and LSTM (long short-term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but it does not consider the sequential attributes of data. RNN handles ordered data well because it takes the time information of the data into account, but it suffers from the long-term dependency problem. LSTM is used to solve the problem of long-term dependence. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although the algorithms have many parameters, we examined the relationship between parameter values and precision to find the optimal combination. We also tried to determine how well, and in what way, the models work for sentiment analysis. This study proposes an integrated CNN and LSTM algorithm to extract the positive and negative features of text. The reasons for combining these two algorithms are as follows. CNN can automatically extract features for classification by applying convolution layers and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the LSTM has input, output, and forget gates that can be opened and closed at the desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem. 
Furthermore, when LSTM is placed after the CNN's pooling layer, the model has an end-to-end structure, so that spatial and temporal features can be learned simultaneously. The combined CNN-LSTM model achieved 90.33% accuracy. It is slower than CNN but faster than LSTM. The presented model was more accurate than the other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can compensate for the weaknesses of each model, and the end-to-end structure of LSTM has the advantage of improving learning layer by layer. For these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.
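The role of the input, forget, and output gates can be illustrated with a single step of an LSTM cell. The sketch below follows the commonly used textbook formulation, reduced to scalar state for readability; it is not the authors' model, and the parameter values and the input sequence (imagined here as pooled CNN features) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One step of a scalar LSTM cell; params maps a gate name to (w_x, w_h, bias)."""
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    i = sigmoid(pre("input"))        # how much new information to write
    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate value to write
    c = f * c_prev + i * g           # updated memory (cell state)
    h = o * math.tanh(c)             # updated hidden state
    return h, c


# Hypothetical parameters: forget gate held open, input gate held shut,
# so the cell should carry its memory across the whole sequence.
params = {
    "input": (0.0, 0.0, -20.0),
    "forget": (0.0, 0.0, 20.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}

h, c = 0.0, 1.0  # start with a stored value of 1.0 in the cell state
for x in [0.9, -0.4, 0.7, -1.0, 0.2]:  # hypothetical pooled CNN features
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

With the forget gate near one and the input gate near zero, the cell state stays close to its initial value regardless of the inputs, which is the mechanism that lets an LSTM retain information over long sequences.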