• Title/Summary/Keyword: mistakes of classification

A Design of a Korean Programming Language Ensuring Run-Time Safety through Categorizing C Secure Coding Rules (C 시큐어 코딩 규칙 분류를 통한 실행 안전성을 보장하는 한글 언어 설계)

  • Kim, Yeoneo;Song, Jiwon;Woo, Gyun
    • Journal of KIISE, v.42 no.4, pp.487-495, 2015
  • Since most information is computerized nowadays, it is extremely important to promote the security of computerized information. However, software itself can threaten the safety of information through the many kinds of abuse that coding mistakes enable. Although the Secure Coding Guide has been proposed to promote the safety of information by fundamentally blocking hacking methods, it is still hard to apply its techniques to other programming languages because the guide is written mainly for C and Java programmers. In this paper, we reclassify the coding rules of the Secure Coding Guide to extend its applicability to programming languages in general. The specific guide adopted in this paper is the C Secure Coding Guide announced by the Ministry of Government Administration and Home Affairs of Korea. Following this classification, we applied the rules to programming in Sprout, a newly proposed Korean programming language. The number of vulnerability rules that must be checked in Sprout is 52% lower than in C.

Changing Methodologies and Reshaping Concepts in Biodiversity Science: A Historical Review of Research on Human Genetic Diversity (생물학 연구 방법론 변화에 따른 생물다양성 개념의 전환: 인간 유전다양성 연구 사례)

  • Hyun, Jaehwan
    • Korean Journal of Environmental Biology, v.32 no.4, pp.413-425, 2014
  • To shed light on the historical change of biodiversity concepts, this paper reviews the science and technology studies (STS) literature on the history of biological research on human genetic diversity. In doing so, I show how the notion of genetic diversity in the human population, from "race" to "population" to "biogeographical ancestry", has changed with methodologies and techniques over the last hundred years. At the same time, I point out contexts and situations showing that, despite conceptual and methodological developments, current human genetic diversity research is slipping into the past mistakes of scientific racism. This article offers biodiversity researchers an opportunity to consider more reflectively their own scientific practices of classifying species.

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems, v.24 no.2, pp.59-83, 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep-learning-based sentiment analysis of English texts, the natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the models. Here, word vectors generally refer to vector representations of the words obtained by splitting a sentence on space characters. There are several ways to derive word vectors, one of which is Word2Vec, used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These vectors have been widely used in studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, and cameras. Unlike in English, morphemes play an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with well-developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' consists of the morphemes '예쁘' (adjective) and '고' (connective ending). Given the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vectors' as the input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What is the desirable range of POS (part-of-speech) tags when deriving morpheme vectors to improve the classification accuracy of a deep learning model? 
Is it appropriate to apply a typical word vector model, which relies primarily on the form of words, to Korean, which has a high ratio of homonyms? Will text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when deriving morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying deep learning models to Korean texts. As a starting point, we summarize these issues as three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can we achieve a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? To address these research questions, we generate various types of morpheme vectors and compare the classification accuracy of a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. For the training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million cosmetics product reviews from Naver Shopping and 520,000 items of Naver News data, arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. 
First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of data preprocessing, namely sentence splitting only, or sentence splitting followed by spelling and spacing corrections. Third, they vary in the form of input fed into the word vector model: either the morphemes alone or the morphemes with their POS tags attached. The morpheme vectors further vary depending on the range of POS tags considered, the minimum frequency of the morphemes included, and the random initialization range. All morpheme vectors are derived with the CBOW (Continuous Bag-Of-Words) model with a context window of 5 and a vector dimension of 300. The results suggest that using text from the same domain even with a lower degree of grammatical correctness, performing spelling and spacing corrections in addition to sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, lead to better classification accuracy. POS tag attachment, which was devised to handle the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme do not appear to have any definite influence on the classification accuracy.
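
The input variants described in this abstract (morphemes alone or with POS tags attached, a restricted range of POS tags, a minimum frequency) can be illustrated with a minimal sketch. It assumes the sentences have already been split into (morpheme, tag) pairs by some morphological analyzer. The function name `build_tokens`, its parameters, and the tag labels `VA`, `EC`, and `EF` are illustrative assumptions, not the authors' code or tag set.

```python
from collections import Counter


def build_tokens(sentences, attach_pos=True, min_count=1, allowed_tags=None):
    """Turn analyzed sentences into token lists for a word vector model.

    sentences: list of sentences, each a list of (morpheme, pos_tag) pairs.
    attach_pos: if True, emit "morpheme/TAG" so homonyms with different tags differ.
    min_count: drop tokens that occur fewer than this many times in the corpus.
    allowed_tags: optional set of POS tags to keep; None keeps every tag.
    """
    def to_token(morpheme, tag):
        return f"{morpheme}/{tag}" if attach_pos else morpheme

    # Step 1: restrict the POS tag range and choose the token form.
    kept = [
        [to_token(m, t) for m, t in sentence
         if allowed_tags is None or t in allowed_tags]
        for sentence in sentences
    ]
    # Step 2: apply the minimum frequency threshold over the whole corpus.
    counts = Counter(token for sentence in kept for token in sentence)
    return [[tok for tok in sentence if counts[tok] >= min_count]
            for sentence in kept]


# Hypothetical pre-analyzed input; the tags are assumed labels for
# adjective (VA), connective ending (EC), and final ending (EF).
corpus = [
    [("예쁘", "VA"), ("고", "EC")],
    [("예쁘", "VA"), ("다", "EF")],
]
print(build_tokens(corpus, attach_pos=True, min_count=2))
print(build_tokens(corpus, attach_pos=False, allowed_tags={"VA", "EC"}))
```

The resulting token lists would then be passed to a word vector model such as CBOW with a context window of 5 and 300 dimensions, which is not shown here.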

Factor Analysis of Decreased Score on Coronary Artery Calcium Score (관상동맥 석회화점수 감소 요인 분석)

  • Shim, Jae-Goo;Kim, Yon-Min;Kim, Jin-Woo
    • Journal of the Korean Society of Radiology, v.10 no.4, pp.285-290, 2016
  • The purpose of our study was to retrospectively evaluate the causes of decreased calcium scores in follow-up coronary artery calcium score (CACs) computed tomography (CT) studies. The subjects were 100 healthy people (85 males aged $60.6 \pm 6.9$ years and 15 females aged $67.2 \pm 7.3$ years). Subjects with decreased CACs were divided into four subgroups according to the Agatston classification: minimal (1-10), mild (11-100), moderate (101-400), and severe (>400). The causes of decreased CACs were scan location disagreement (51%), motion artifact (26%), equipment changes (14%), operator mistakes (5%), input errors (2%), image loss (1%), and arrhythmia (1%). Decreased CACs were most common in the mild group (49 people). The most significant reduction occurred in the minimal group (6 people). Scan location disagreement was attributed to partial volume effects caused by the scan starting position. Scores below 100 showed high variation (19.7%), whereas scores above 100 showed lower variation (2.2%), indicating that the tolerable range of variation differs with the calcification score. Motion artifact accounted for 26% of cases and was closely related to preceding tests that raise the heart rate, such as pulmonary function tests and exercise stress tests.
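
The four Agatston subgroups listed in this abstract amount to a simple threshold lookup, sketched below. The function name `agatston_subgroup` is hypothetical, and the label for scores below 1 is an assumption, because the abstract does not name a group for that range.

```python
def agatston_subgroup(score):
    """Map a coronary artery calcium score to the subgroup used in the abstract."""
    if score < 0:
        raise ValueError("calcium score must be non-negative")
    if score < 1:
        # Assumption: the abstract defines no subgroup below a score of 1.
        return "none"
    if score <= 10:
        return "minimal"   # 1-10
    if score <= 100:
        return "mild"      # 11-100
    if score <= 400:
        return "moderate"  # 101-400
    return "severe"        # above 400


for value in (5, 100, 101, 400, 401):
    print(value, agatston_subgroup(value))
```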

Cultural Landscape Analysis of Market Space in Chinatown - A Case Study of the 'Chung-Ang Market of Dairimdong' - (중국 이주민 거주지역 내 시장공간의 문화경관해석 - 서울시 대림동 중앙시장을 대상으로 -)

  • Chun, Hyun-Jin;Lee, June;Jiang, Long;Kim, Sung-Kyun
    • Journal of the Korean Institute of Landscape Architecture, v.40 no.5, pp.73-87, 2012
  • Korean society today is increasingly multicultural, with many foreign ethnic enclaves. Chinese quarters have formed in various parts of Korea as the population of Chinese immigrants has grown. In particular, Chinese quarters bear the signs of the times and the cultural characteristics of their residents. This research studies the market space of the Chinese ethnic enclave in Dairimdong, using a field study based on participant observation. The results are as follows. Chinese merchants place private objects such as "tanzi" on the sidewalk and install large awnings that cover the entire sidewalk. As a result, the sidewalk is transformed from an outdoor space into an interior space. Passers-by move onto the vehicle road, which transforms not only the space for cars but also the space for pedestrians. Urban planners originally classified the space into three categories: building, sidewalk, and vehicle road. After Chinese immigrants came to the market, however, they reclassified the space into three new categories: building, a space shared by sidewalk and "tanzi", and a space shared by sidewalk and vehicle road. This new classification differs considerably from the original one. In addition, Chinese residents consider the Dairimdong Market a very comfortable place because it has many Chinese physical facilities. They also consider it a friendly place where Chinese products can be bought easily. The market has become a place of consumption for the Chinese community. In short, the Dairimdong Market has changed because of Chinese immigrants. Explaining space and its status in terms of culture makes it possible to produce satisfactory planning and design proposals for Chinese quarters in the future. Earlier planning and design proposals, based on designers' subjective views, contain many careless mistakes. Later planning and design should therefore consider the problems created by the way users actually use the space.

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

  • Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems, v.24 no.1, pp.1-23, 2018
  • Since the start of the 21st century, various high-quality services have emerged with the growth of the internet and information and communication technologies. In particular, the E-commerce industry, in which Amazon and eBay stand out, is growing explosively. As E-commerce grows, customers can easily find what they want to buy while comparing various products, because more products are registered at online shopping malls. However, this growth has created a problem. Because so many products are registered, it has become difficult for customers to find what they really need in the flood of products. When customers search with a general keyword, too many products are returned. Conversely, few products are found when customers type in product details, because concrete product attributes are rarely registered. In this situation, automatically recognizing text in images can be a solution. Because the bulk of product details are written in catalogs in image format, most product information cannot be found with text input in the current text-based search system. If the information in images can be converted to text, customers can search for products by product details, which makes shopping more convenient. Various existing OCR (Optical Character Recognition) programs can recognize text in images, but they are hard to apply to catalogs because they have trouble recognizing text in certain circumstances, such as when the text is not large enough or the fonts are not consistent. Therefore, this research proposes a way to recognize keywords in catalogs with deep learning, which has been the state of the art in image recognition since the 2010s. 
Single Shot Multibox Detector (SSD), a well-regarded object detection model, can be used with its structure redesigned to account for the differences between text and objects. However, the SSD model needs a large amount of labeled training data, because deep learning algorithms of this kind are trained by supervised learning. To collect data, one could manually label the location and classification information of text in catalogs, but manual collection causes many problems. Some keywords would be missed because humans make mistakes while labeling training data. Collection also becomes too time-consuming given the scale of data needed, or too costly if many workers are hired to shorten the time. Furthermore, if specific keywords need to be trained, finding images that contain those words is difficult as well. To solve the data issue, this research developed a program that creates training data automatically. The program generates catalog-like images containing various keywords and pictures and saves the location information of the keywords at the same time. With this program, not only can data be collected efficiently, but the performance of the SSD model also improves. The SSD model recorded a recognition rate of 81.99% with 20,000 data items created by the program. Moreover, this research tested the performance of the SSD model under different data conditions to analyze which features of the data influence the recognition of text in images. The results show that the number of labeled keywords, the addition of overlapped keyword labels, the presence of unlabeled keywords, the spacing among keywords, and the differences in background images are related to the performance of the SSD model. These findings can guide performance improvements, through high-quality data, for the SSD model or other deep-learning-based text recognizers. 
The SSD model redesigned to recognize text in images and the program developed for creating training data are expected to contribute to the improvement of search systems in E-commerce. Suppliers can spend less time registering keywords for products, and customers can search for products by the product details written in the catalog.
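
The idea of generating training data automatically, that is, placing keywords on a canvas and saving their locations at the same time, can be sketched roughly as follows. This is not the authors' program. It only produces layout records and renders no images. The function names, the 300x300 canvas, the fixed glyph size, and the choice to reject overlapping boxes are all assumptions made for illustration.

```python
import random


def overlaps(a, b):
    """Return True if two (x1, y1, x2, y2) boxes intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def generate_layout(keywords, width=300, height=300, seed=0, max_tries=100):
    """Place each keyword at a random position and record its bounding box.

    Returns a list of {"label": keyword, "box": (x1, y1, x2, y2)} records,
    the kind of location label an object detector would be trained on.
    """
    rng = random.Random(seed)
    records = []
    for word in keywords:
        # Hypothetical text size: 12 px per character, 20 px tall.
        w, h = 12 * len(word), 20
        if w > width or h > height:
            continue  # keyword does not fit on the canvas
        for _ in range(max_tries):
            x = rng.randint(0, width - w)
            y = rng.randint(0, height - h)
            box = (x, y, x + w, y + h)
            # Assumption: keep only boxes that do not overlap earlier ones.
            if not any(overlaps(box, r["box"]) for r in records):
                records.append({"label": word, "box": box})
                break
    return records


for record in generate_layout(["cotton", "waterproof", "USB"], seed=1):
    print(record)
```

A real generator would also draw the text and background pictures onto an image. Because the abstract reports that overlapped labels and keyword spacing affect SSD performance, the no-overlap rule used here is only one possible design choice.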