• Title/Summary/Keyword: amount of learning

Reliable Image-Text Fusion CAPTCHA to Improve User-Friendliness and Efficiency (사용자 편의성과 효율성을 증진하기 위한 신뢰도 높은 이미지-텍스트 융합 CAPTCHA)

  • Moon, Kwang-Ho;Kim, Yoo-Sung
    • The KIPS Transactions: Part C
    • /
    • v.17C no.1
    • /
    • pp.27-36
    • /
    • 2010
  • In Web registration pages and online polling applications, CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is used to distinguish human users from automated programs. Text-based CAPTCHAs, in which distorted text is presented, have been widely used on many popular Web sites. However, because advanced optical character recognition techniques can now recognize the distorted text, their reliability has become low. Image-based CAPTCHAs have been proposed to improve on the reliability of text-based CAPTCHAs, but these systems have drawbacks of their own. First, image-based CAPTCHA systems with a small number of image files in their image dictionary are not reliable, since an attacker can recognize the images through repeated executions of machine-learning programs. Second, users may feel uncomfortable because they have to retry the CAPTCHA test whenever they fail to enter the correct keyword. Third, some image-based CAPTCHAs incur high communication cost because several image files must be sent for a single CAPTCHA. To solve these problems, this paper proposes a new CAPTCHA based on both image and text. In this system, an image and keywords are integrated into one CAPTCHA image so as to give the user a hint for the answer keyword. The proposed CAPTCHA helps users enter the answer keyword easily using the hint in the fused image. The proposed system also reduces communication cost, since it sends only one fused image file per CAPTCHA. To improve the reliability of the image-text fusion CAPTCHA, we also propose a method for dynamically building a large image dictionary by gathering a huge number of images from the Internet, with a filtering phase that preserves the correctness of the CAPTCHA images. Experiments showed that the proposed image-text fusion CAPTCHA provides users with more convenience and higher reliability than image-based CAPTCHAs.

Korean and Multilingual Language Models Study for Cross-Lingual Post-Training (XPT) (Cross-Lingual Post-Training (XPT)을 위한 한국어 및 다국어 언어모델 연구)

  • Son, Suhyune;Park, Chanjun;Lee, Jungseob;Shim, Midan;Lee, Chanhee;Park, Kinam;Lim, Heuiseok
    • Journal of the Korea Convergence Society
    • /
    • v.13 no.3
    • /
    • pp.77-89
    • /
    • 2022
  • Many previous studies have shown that a language model pretrained on a large corpus improves performance on various natural language processing tasks. However, there is a limit to building a large training corpus in a language environment where resources are scarce. Using the Cross-lingual Post-Training (XPT) method, we analyze the method's efficiency in Korean, a low-resource language. XPT selectively reuses the parameters of an English pretrained language model, English being a high-resource language, and uses an adaptation layer to learn the relationship between the two languages. We confirm that, with only a small amount of target-language data, XPT achieves better performance on relation extraction than a language model pretrained on the target language. In addition, we analyze the characteristics of the Korean monolingual and multilingual language models released by domestic and foreign researchers and companies.

Spark based Scalable RDFS Ontology Reasoning over Big Triples with Confidence Values (신뢰값 기반 대용량 트리플 처리를 위한 스파크 환경에서의 RDFS 온톨로지 추론)

  • Park, Hyun-Kyu;Lee, Wan-Gon;Jagvaral, Batselem;Park, Young-Tack
    • Journal of KIISE
    • /
    • v.43 no.1
    • /
    • pp.87-95
    • /
    • 2016
  • Recently, due to the development of the Internet and electronic devices, there has been an enormous increase in the amount of available knowledge and information. As this growth has proceeded, studies on large-scale ontological reasoning have been actively carried out. In general, a machine-learning program or a knowledge engineer measures and provides a degree of confidence for each triple in a large ontology. Yet the collected ontology data contain inherent uncertainty, and reasoning over such data can cause vagueness in the reasoning results. To address this uncertainty issue, we propose an RDFS reasoning approach that utilizes confidence values indicating the degree of uncertainty in the collected data. Unlike conventional reasoning approaches, which do not take data uncertainty into account, our approach uses the in-memory cluster computing framework Spark to compute confidence values for the data inferred through RDFS-based reasoning, applying uncertainty estimation methods. The computed confidence values thus represent the uncertainty in the inferred data. To evaluate our approach, ontology reasoning was carried out over the LUBM standard benchmark data sets, with arbitrary confidence values added to the ontology triples. Experimental results indicate that the proposed system can process the largest data set, LUBM3000, in 1,179 seconds, inferring 350K triples.
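The confidence-propagating inference described in the abstract can be illustrated with the RDFS subclass rule (rdfs9). The sketch below is a minimal single-machine version in Python; the product combination rule and max-merging of duplicate inferences are illustrative assumptions, not necessarily the paper's estimators, and the actual system distributes this computation over Spark.

```python
def rdfs9_with_confidence(type_triples, subclass_triples):
    """Apply RDFS rule rdfs9 -- (x type C), (C subClassOf D) => (x type D) --
    while propagating a confidence value for each inferred triple.
    Combining confidences by product and merging duplicates by max are
    illustrative assumptions, not the paper's exact uncertainty estimators."""
    inferred = {}
    for (x, c), conf_type in type_triples.items():
        for (sub, sup), conf_sub in subclass_triples.items():
            if sub == c:
                conf = conf_type * conf_sub  # assumed combination rule
                inferred[(x, sup)] = max(inferred.get((x, sup), 0.0), conf)
    return inferred

# One application of the rule over toy triples with confidence values.
types = {("alice", "GradStudent"): 0.9}
subclasses = {("GradStudent", "Student"): 0.8}
print(rdfs9_with_confidence(types, subclasses))
```

In the full system this rule would be applied iteratively until no new triples are inferred, with each Spark worker handling a partition of the triples.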

What factors drive AI project success? (무엇이 AI 프로젝트를 성공적으로 이끄는가?)

  • KyeSook Kim;Hyunchul Ahn
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.327-351
    • /
    • 2023
  • This paper aims to derive the factors that lead an artificial intelligence (AI) project to success and to prioritize them by importance. To this end, we first reviewed related prior studies to select candidate success factors and finally derived 17 factors through expert interviews. We then developed a hierarchical model based on the TOE framework. Using this model, a survey was conducted on experts from AI-using companies and from supplier companies that provide AI advice, technologies, platforms, and applications, and the responses were analyzed using the AHP method. The analysis shows that organizational and technological factors are more important than environmental factors, with organizational factors slightly more critical. Among the organizational factors, strategic/clear business needs, AI implementation/utilization capabilities, and collaboration/communication between departments were the most important. Among the technological factors, a sufficient amount of high-quality data for AI learning was the most important, followed by IT infrastructure/compatibility. Among the environmental factors, customer readiness and support for the direct use of AI were essential. Looking at the importance of the 17 individual factors, data availability and quality (0.2245) was the most important, followed by strategic/clear business needs (0.1076) and customer readiness/support (0.0763). These results can guide companies considering or implementing AI adoption, service providers supporting AI adoption, and government policymakers seeking to foster the AI industry, and are expected to contribute to researchers studying AI success models.
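The AHP priority weights reported in the abstract are derived from pairwise comparison matrices. A minimal sketch of that derivation, using the standard geometric-mean approximation of the principal eigenvector, follows; the 3x3 matrix of judgments is illustrative, not the paper's actual survey data.

```python
import math

def ahp_weights(matrix):
    """Derive priority weights from a pairwise comparison matrix using the
    geometric-mean approximation of AHP's principal eigenvector method."""
    n = len(matrix)
    gmeans = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(gmeans)
    return [g / total for g in gmeans]

# Illustrative judgments over the TOE top level: a 2 in the Organization
# row, Technology column means "O is judged twice as important as T".
pairwise = [
    [1.0, 2.0, 3.0],    # Organization
    [1 / 2, 1.0, 2.0],  # Technology
    [1 / 3, 1 / 2, 1.0],  # Environment
]
weights = ahp_weights(pairwise)
print(weights)  # Organization receives the largest weight
```

In a full AHP study, such weights are computed per respondent, checked for consistency, and aggregated across the hierarchy to yield the global factor importances like the 0.2245 reported for data availability and quality.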

Flood Disaster Prediction and Prevention through Hybrid BigData Analysis (하이브리드 빅데이터 분석을 통한 홍수 재해 예측 및 예방)

  • Ki-Yeol Eom;Jai-Hyun Lee
    • The Journal of Bigdata
    • /
    • v.8 no.1
    • /
    • pp.99-109
    • /
    • 2023
  • Recently, not only Korea but the whole world has been experiencing constant disasters such as typhoons, wildfires, and heavy rains. The property damage caused by typhoons and heavy rain in South Korea alone has exceeded 1 trillion won. These disasters result in significant loss of life and property, and recovery also takes a considerable amount of time. In addition, the government's contingency funds are insufficient for the current situation. To prevent and respond effectively to these issues, it is necessary to collect and analyze accurate data in real time. However, delays and data loss can occur depending on the environment where the sensors are located, the status of the communication network, and the receiving servers. In this paper, we propose a two-stage hybrid situation analysis and prediction algorithm that remains accurate even under such communication network conditions. In the first stage, river and stream water-level data are collected, filtered, and refined from diverse sensors of different types, stored in a big data repository, and analyzed by an AI rule-based inference algorithm to determine the crisis alert level. If the rainfall exceeds a certain threshold but the water level remains below the level of interest, the second stage, deep-learning image analysis, is performed to determine the final crisis alert level.
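The two-stage decision described above can be sketched as a rule-based first stage that defers only ambiguous cases to image analysis. The thresholds, level names, and `image_model` hook below are illustrative placeholders, not the paper's actual rules or values.

```python
def crisis_alert_level(rainfall_mm, water_level_m, image_model=None,
                       rain_threshold=50.0, level_of_interest=3.0):
    """Stage 1: rule-based inference on refined sensor data.
    Stage 2: only when rainfall is high but the measured water level is
    still below the level of interest, defer to deep-learning image
    analysis (represented here by a hypothetical callable)."""
    if water_level_m >= level_of_interest:
        return "severe"            # sensor readings alone are conclusive
    if rainfall_mm >= rain_threshold:
        if image_model is not None:
            return image_model(rainfall_mm, water_level_m)  # stage 2
        return "caution"           # fallback when stage 2 is unavailable
    return "normal"

print(crisis_alert_level(10.0, 1.0))                         # normal
print(crisis_alert_level(80.0, 1.0))                         # caution
print(crisis_alert_level(80.0, 1.0, lambda r, w: "alert"))   # stage-2 result
```

The design point is that the expensive image-analysis model runs only in the ambiguous band between the rainfall threshold and the level of interest, which keeps the pipeline responsive when the communication network degrades.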

Safety Verification Techniques of Privacy Policy Using GPT (GPT를 활용한 개인정보 처리방침 안전성 검증 기법)

  • Hye-Yeon Shim;MinSeo Kweun;DaYoung Yoon;JiYoung Seo;Il-Gu Lee
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.34 no.2
    • /
    • pp.207-216
    • /
    • 2024
  • As big data has been accumulated in the course of the 4th Industrial Revolution, personalized services have increased rapidly. As a result, the amount of personal information collected by online services has grown, and concerns about leakage of users' personal information and privacy infringement have increased. Online service providers publish privacy policies to address these concerns, but privacy policies are often misused because they are long and complex, making it difficult for users to identify risk items directly. Therefore, a method that can automatically check whether a privacy policy is safe is needed. However, conventional blacklist- and machine-learning-based safety verification techniques for privacy policies are difficult to extend and have low accessibility. In this paper, to solve these problems, we propose a safety verification technique for privacy policies using the GPT-3.5 API, a generative artificial intelligence. Classification can be performed even in a new environment, which suggests that the general public, without expertise, can easily inspect privacy policies. In the experiment, we measured how accurately the blacklist-based and GPT-based techniques classify safe and unsafe sentences, as well as the time spent on classification. According to the experimental results, the proposed technique showed, on average, 10.34% higher accuracy than the conventional blacklist-based sentence safety verification technique.
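The blacklist baseline that the paper compares against can be sketched as simple phrase matching. The phrases below are illustrative examples, not the paper's actual blacklist; the proposed technique replaces this lookup with a classification prompt sent to the GPT-3.5 API (the API call itself is omitted here).

```python
# Sketch of the conventional blacklist baseline for sentence-level safety
# checks on a privacy policy. Hard-coded phrase matching is precisely why
# this baseline is difficult to extend: every new risky wording needs a
# new blacklist entry, whereas a GPT-based classifier generalizes.
BLACKLIST = [
    "share your data with third parties without consent",
    "retain personal information indefinitely",
]

def classify_sentence(sentence, blacklist=BLACKLIST):
    """Return 'unsafe' if any blacklisted phrase occurs in the sentence,
    otherwise 'safe'."""
    lowered = sentence.lower()
    return "unsafe" if any(phrase in lowered for phrase in blacklist) else "safe"

print(classify_sentence("We retain personal information indefinitely."))  # unsafe
print(classify_sentence("You may delete your account at any time."))      # safe
```

In the GPT-based variant, `classify_sentence` would instead send the sentence with an instruction such as "classify this privacy-policy sentence as safe or unsafe" to the chat completion endpoint and parse the returned label.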

Outlier Detection By Clustering-Based Ensemble Model Construction (클러스터링 기반 앙상블 모델 구성을 이용한 이상치 탐지)

  • Park, Cheong Hee;Kim, Taegong;Kim, Jiil;Choi, Semok;Lee, Gyeong-Hoon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.7 no.11
    • /
    • pp.435-442
    • /
    • 2018
  • Outlier detection means detecting data samples that deviate significantly from the distribution of normal data. Most outlier detection methods compute an outlier score indicating the extent to which a sample deviates from the normal state, and label a sample an outlier when its score exceeds a given threshold. However, since the range of outlier scores differs across datasets, and outliers occur at a much smaller ratio than normal data, it is very difficult to set the threshold for an outlier score. Furthermore, in practice it is not easy to acquire data containing a sufficient amount of outliers for training. In this paper, we propose a clustering-based outlier detection method that constructs a model representing the normal data region using only normal data and performs binary classification of new samples into outliers and normal data. By dividing the given normal data into chunks and constructing a clustering model for each chunk, we extend the method to an ensemble that combines the decisions of the individual models, and apply it to streaming data with dynamic changes. Experimental results on real and artificial data show the high performance of the proposed method.
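The chunk-wise clustering ensemble can be sketched as follows: each chunk of normal data is clustered with k-means, each chunk model flags a new sample as an outlier when it falls outside every cluster's covering radius, and the ensemble takes a majority vote. The choice of k-means, the radius rule, and the vote are plausible readings of the abstract, not the paper's exact formulation.

```python
import math
import random

def dist(a, b):
    return math.dist(a, b)

def kmeans(points, k, iters=20, seed=0):
    """Minimal Lloyd's k-means returning the final centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda i: dist(p, centroids[i]))].append(p)
        centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids

def build_model(chunk, k):
    """Per-chunk model of the normal region: centroids plus the radius
    that covers every training point in the chunk."""
    centroids = kmeans(chunk, k)
    radius = max(min(dist(p, c) for c in centroids) for p in chunk)
    return centroids, radius

def is_outlier(sample, models):
    """Ensemble decision: a chunk model votes 'outlier' when the sample is
    farther than its radius from every centroid; majority vote wins."""
    votes = sum(
        min(dist(sample, c) for c in centroids) > radius
        for centroids, radius in models
    )
    return votes > len(models) / 2

# Synthetic normal data around two modes, split into two chunks.
rnd = random.Random(1)
normal = [(rnd.gauss(0, 0.3), rnd.gauss(0, 0.3)) for _ in range(40)] + \
         [(rnd.gauss(5, 0.3), rnd.gauss(5, 0.3)) for _ in range(40)]
rnd.shuffle(normal)
models = [build_model(normal[:40], k=2), build_model(normal[40:], k=2)]
print(is_outlier((20.0, 20.0), models))  # far from both modes -> True
print(is_outlier((0.1, 0.0), models))    # inside a normal mode -> False
```

Because each model is built from one chunk, stale models can be dropped and new ones trained as the stream drifts, which is how the ensemble adapts to dynamic changes.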

Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information (언어 정보가 반영된 문장 점수를 활용하는 삭제 기반 문장 압축)

  • Lee, Jun-Beom;Kim, So-Eon;Park, Seong-Bae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.3
    • /
    • pp.125-132
    • /
    • 2022
  • Sentence compression is a natural language processing task that generates a concise sentence preserving the important meaning of the original sentence. For grammatically appropriate sentence compression, early studies utilized human-defined linguistic rules. In addition, since sequence-to-sequence models perform well on various natural language processing tasks such as machine translation, studies have also applied them to sentence compression. However, linguistic rule-based studies require all rules to be defined by humans, and sequence-to-sequence approaches require a large amount of parallel data for model training. To address these challenges, Deleter, a sentence compression model that leverages the pretrained language model BERT, was proposed. Because Deleter compresses sentences using a perplexity-based score computed with BERT, it requires neither linguistic rules nor a parallel dataset. However, because Deleter considers only perplexity, it does not reflect the linguistic information of the words in a sentence when compressing it. Furthermore, since the corpora used to pretrain BERT differ greatly from compressed sentences, this can lead to incorrect compression. To address these problems, this paper proposes a method to quantify the importance of linguistic information and reflect it in the perplexity-based sentence scoring. Furthermore, by fine-tuning BERT on a corpus of news articles, which often contain proper nouns and omit unnecessary modifiers, we allow BERT to measure a perplexity appropriate for sentence compression. Evaluations on English and Korean datasets confirm that the sentence compression performance of sentence-scoring-based models can be improved with the proposed method.
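The perplexity-guided deletion with an importance weight can be sketched with a toy unigram model standing in for BERT. The probabilities, the default probability for unseen words, and the way the keep-weight scales the deletion cost are all illustrative assumptions, not the paper's scoring function.

```python
import math

# Toy unigram "language model" standing in for BERT's perplexity; the
# probabilities (and the 1e-4 default for unseen words) are illustrative.
UNIGRAM = {"the": 0.2, "cat": 0.05, "sat": 0.04, "on": 0.1, "mat": 0.03}

def pseudo_perplexity(tokens):
    nll = [-math.log(UNIGRAM.get(t, 1e-4)) for t in tokens]
    return math.exp(sum(nll) / len(nll))

def compress(tokens, keep_weight, target_len):
    """Greedily delete the token whose removal minimizes the perplexity of
    the remaining sentence, scaled by a per-word importance weight (a
    stand-in for the paper's quantified linguistic information): words with
    a high keep-weight cost more to delete and tend to survive."""
    tokens = list(tokens)
    while len(tokens) > target_len:
        best_i, best_cost = None, None
        for i, tok in enumerate(tokens):
            remaining = tokens[:i] + tokens[i + 1:]
            cost = pseudo_perplexity(remaining) * keep_weight.get(tok, 1.0)
            if best_cost is None or cost < best_cost:
                best_i, best_cost = i, cost
        del tokens[best_i]
    return tokens

print(compress(["the", "quick", "cat", "sat"], {}, 3))
# without weights, the out-of-vocabulary word "quick" is deleted first
print(compress(["the", "quick", "cat", "sat"], {"quick": 1000.0}, 3))
# a high keep-weight protects "quick", so another token is deleted instead
```

In the actual model the unigram score would be replaced by BERT's masked-token perplexity and the keep-weight by the quantified linguistic information (for example, part-of-speech-based importance).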

Literary Text and the Cultural Interpretation - A Study of the Model of 「History of Spanish Literature」 (문학텍스트와 문학적 해석 -「스페인 문학사」를 통한 모델 연구)

  • Na, Songjoo
    • Cross-Cultural Studies
    • /
    • v.26
    • /
    • pp.465-485
    • /
    • 2012
  • Teaching a "History of Spanish Literature" class faces various limits and obstacles, just as other foreign-literature history classes do. Most students enter the university without any previous Spanish learning experience, which means that, for them, even interpreting the text itself can be difficult. Moreover, because the history of Spanish literature reaches all the way back to the Middle Ages, students encounter additional difficulties and factors that make the class feel uninteresting. These factors include the embarrassment felt by students, antiquated expressions, literary texts filled with deliberately broken grammar, explanations written in pretentious vocabulary, and a disorderly introduction of many different literary works that ignores the big picture, all of which reduce students' academic interest, alongside a general lack of interest in literature itself among a generation accustomed to visual media. Although this problem, which distorts the value of our lives and of literature, is pressing, there has not even been a preliminary discussion of it. Thus the question of what to teach in a "History of Spanish Literature" class remains unsolved: whether to teach the history of authors and literary works or the chronology of the texts and their correlations, which styles of writing to teach first, how to teach critical reading, and how to use the limited class time effectively. Unfortunately, there has been no real discussion among instructors, and I cannot be proud of how little I myself have contemplated these problems.
Living in the era of visual media, or of the so-called crisis of the humanities, we now strongly need change in the teaching of literary history. As a way to bring about such change, I recommend incorporating into the class the visual media, culture, and customs that students are accustomed to. This is not only an attempt to introduce students to various fields beyond mere literary research, but also a response to the voices of students who come from a different cultural background and generation. What should not be forgotten is that the point of adopting a new teaching method is to increase students' class participation and broaden the horizon of Spanish literature. The ultimate goal of a "History of Spanish Literature" class, however, is contemplation of humanity, not progress in linguistic ability, just as the ultimate goal of university education is to train students to become successful members of society. To that end, a cultural approach to literary texts supports not only Spanish learning but also pragmatic education, and helps students go beyond being merely functional. Despite these optimistic expectations, a foreign-literature class still faces the limits of eclecticism. As solutions, a teaching method centered on cultural texts is, first, an approach that appeals to the sensibility of students living in the visual era; second, a three-dimensional, sensory approach rather than an annotation exercise hunting for ambiguous vocabulary or metaphors; third, a method that reduces the burdensome amount of reading; and fourth, a trigger for students' philosophical, sociocultural, and political interests.
Such experience is expected to stimulate students' intellectual curiosity and, moreover, motivate them to continue their studies in graduate school, because it can itself be an interesting area of study.

Teachers' Recognition on the Optimization of the Educational Contents of Clothing and Textiles in Practical Arts or Technology·Home Economics (실과 및 기술.가정 교과에서 의생활 교육내용의 적정성에 대한 교사의 인식)

  • Baek Seung-Hee;Han Young-Sook;Lee Hye-Ja
    • Journal of Korean Home Economics Education Association
    • /
    • v.18 no.3 s.41
    • /
    • pp.97-117
    • /
    • 2006
  • The purpose of this study was to investigate teachers' recognition of the optimization of the educational contents of Clothing & Textiles in the Practical Arts or Technology & Home Economics subjects in elementary, middle, and high schools. The statistical data for this research were collected from 203 questionnaires completed by elementary, middle, and high school teachers. Mean, standard deviation, and percentage were calculated using the SPSS/WIN 12.0 program, and the data were verified by t-test, one-way ANOVA, and Duncan's post-hoc test. The results were as follows. First, the equipment ratio of practice laboratories was about 24% in elementary schools, which is very poor, but 97% in middle schools and 78% in high schools, both higher than in elementary schools. Second, more than 50% of teachers recognized the amount of learning as 'proper'. Elementary school teachers found the amount of learning in 'operating sewing machines' especially heavy, as did middle school teachers for 'making shorts' and high school teachers for 'making tablecloths and curtains' and 'making pillow covers or bags'. Third, elementary, middle, and high school teachers all rated the overall level of the clothing and textiles contents as 'common'. About 80% of elementary school teachers found 'operating sewing machines' and 'making cushions' especially difficult, as did middle school teachers for 'hand-knitting a handbag with a crochet hook', 'the various kinds of cloth', and 'making short pants', and high school teachers for 'making tablecloths or curtains'. Fourth, elementary school teachers rated 'practicing basic hand needlework' and 'making a pouch using hand needlework' as important educational contents, while middle school teachers rated 'making short pants' unimportant.
High school teachers considered practice-focused contents such as 'making tablecloths and curtains' and 'making pillow covers or bags' unimportant. My suggestions are as follows: laboratories and facilities for practice should be established to make clothing and textiles lessons effective in Practical Arts in elementary schools; 'operating sewing machines', which was considered difficult, should be moved to upper grades, made easier, or omitted; the practical contents should become student-activity-oriented and be recomposed to be familiar to students' lives; and varied, sufficient support is needed to increase teachers' practical abilities.
