• Title/Summary/Keyword: open information extraction


Feature Extraction Methods using Iris Region Segmentation for Iris Recognition (홍채인식을 위한 홍채영역 분할 특징추출 방법)

  • Eun, In-Ki;Lee, Kwan-Yong
    • Proceedings of the Korean Information Science Society Conference / 2007.06c / pp.432-435 / 2007
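  • This paper studies iris recognition, which has attracted considerable interest as a means of identity verification. In an iris recognition system, the weight carried by the feature values of each image is distributed differently across iris regions, and noise caused by eyebrows or illumination affects recognition performance. In such cases, even the iris image of a previously enrolled and authenticated user may not be recognized correctly or may fail authentication, which hinders the use of the iris region in the real world. Therefore, for iris recognition in a single-biometric system, we propose a feature extraction method that preserves important features and improves recognition performance: the acquired iris image is normalized and preprocessed, the iris region is segmented, and a correction value is applied to each region. We also show, using the wavelet transform and principal component analysis, that the proposed feature extraction method improves recognition performance.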


Automatic Text Extraction in Video Images using Morphology (모폴로지을 이용한 비디오 영상에서의 자동 문자 추출)

  • 장인영;고병철;김길천;변혜란
    • Proceedings of the Korean Information Science Society Conference / 2001.10b / pp.418-420 / 2001
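  • In this paper, we propose a new method for extracting news captions and background text from still frames of news video. First, the input color image is converted to a grayscale image, and contrast stretching is applied to enhance the contrast of the input image. An adaptive threshold is then applied to segment the contrast-stretched image. The next step consists of two sub-steps: the first removes text-like regions using a structuring element of suitable size, and the second computes the difference image between the image to which morphological erosion has been applied and the image to which the morphological (OpenClose + CloseOpen)/2 operation has been applied. In the final step, to extract the actual caption regions from the candidate regions, filtering is applied based on the pixel-count ratio of each candidate text region, the ratio of contour pixels, and the ratio between the major and minor axes. In experiments on 300 arbitrary news images, the method achieved a high recognition rate of 93.6%. In addition, the proposed method can perform well on images of various sizes by adjusting the size of the structuring element.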

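
The morphological step named in the abstract above, the difference between an eroded image and the (OpenClose + CloseOpen)/2 image, can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the flat square structuring element, the border clipping, the reading of "OpenClose" as opening followed by closing, and all function names.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale image as rows of 0..255 intensities


def _window_op(img: Image, k: int, op: Callable[[List[int]], int]) -> Image:
    """Apply min or max over a k x k square window (clipped at the borders)."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = op(window)
    return out


def erode(img: Image, k: int = 3) -> Image:
    return _window_op(img, k, min)


def dilate(img: Image, k: int = 3) -> Image:
    return _window_op(img, k, max)


def opening(img: Image, k: int = 3) -> Image:
    return dilate(erode(img, k), k)  # erosion followed by dilation


def closing(img: Image, k: int = 3) -> Image:
    return erode(dilate(img, k), k)  # dilation followed by erosion


def smoothed(img: Image, k: int = 3) -> Image:
    """(OpenClose + CloseOpen) / 2, assuming OpenClose = opening then closing."""
    oc = closing(opening(img, k), k)
    co = opening(closing(img, k), k)
    return [[(a + b) // 2 for a, b in zip(ra, rb)] for ra, rb in zip(oc, co)]


def candidate_map(img: Image, k: int = 3) -> Image:
    """Absolute difference between the smoothed image and the eroded image."""
    sm, er = smoothed(img, k), erode(img, k)
    return [[abs(a - b) for a, b in zip(ra, rb)] for ra, rb in zip(sm, er)]
```

In this sketch, changing `k` plays the role of adjusting the structuring element size. A solid bright block keeps only its one-pixel border in `candidate_map`, while an isolated bright pixel is removed by the smoothing.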

An image enhancement Method for extracting multi-license plate region

  • Yun, Jong-Ho;Choi, Myung-Ryul;Lee, Sang-Sun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.6 / pp.3188-3207 / 2017
  • In this paper, we propose an image enhancement algorithm that improves the license plate extraction rate in various environments (daytime streets, nighttime streets, underground parking lots, etc.). The proposed algorithm consists of an image enhancement stage and a license plate extraction stage. The image enhancement method improves the quality of a degraded image by using its histogram information and overall gray-level distribution. It employs an interpolated probability distribution value (PDV) to control sudden changes in image brightness. The PDV is calculated from the cumulative distribution function (CDF) and probability density function (PDF) of the captured image, which are obtained from its brightness distribution. In addition, by adjusting the image enhancement factor of each sub-region based on pixel information, the method can adjust the gradation of the image in finer detail. The processed gray image is converted into a binary image, and morphological operations are used to fuse narrow breaks and long thin gulfs, eliminate small holes, and fill gaps in the contour. License plate regions are then detected based on the aspect ratio and size of the bounding boxes drawn around connected license plate areas. The images were captured with a video camera or a personal image recorder installed at the front of a car, and they include several license plates on multilane roads. Simulations were run using OpenCV and MATLAB. The results show that the extraction success rate is higher than that of conventional algorithms.
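
As a rough illustration of how a PDF and CDF derived from the brightness distribution can drive an intensity remapping, here is a sketch of plain histogram equalization. It is not the interpolated PDV method of the paper, whose details the abstract does not give; the function names and the flat list-of-pixels representation are assumptions.

```python
from typing import List


def histogram_pdf(pixels: List[int], levels: int = 256) -> List[float]:
    """Empirical probability of each gray level (the normalized histogram)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    return [count / n for count in hist]


def cdf_from_pdf(pdf: List[float]) -> List[float]:
    """Running sum of the PDF: cdf[g] = P(level <= g)."""
    cdf, total = [], 0.0
    for p in pdf:
        total += p
        cdf.append(total)
    return cdf


def equalize(pixels: List[int], levels: int = 256) -> List[int]:
    """Map each gray level g to round(cdf[g] * (levels - 1))."""
    cdf = cdf_from_pdf(histogram_pdf(pixels, levels))
    return [int(cdf[p] * (levels - 1) + 0.5) for p in pixels]
```

Because the CDF is non-decreasing, the mapping preserves the brightness ordering of pixels while spreading frequently occurring levels over a wider output range.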

A Push Agent System for Personalizing e-Mails using Extraction of User Preference Mail Format (사용자 선호 메일 형식을 통한 개인화 이메일 푸쉬 에이전트 시스템)

  • 이광형;박재표;이종희;전문석
    • The Journal of Society for e-Business Studies / v.9 no.2 / pp.109-121 / 2004
  • In this paper, we propose a system that generates new customized information for each customer through detailed classification and analysis, and automatically delivers it to individual customers. The proposed system derives preference information and a preferred e-mail format by analyzing the e-mail open rate and mouse event information. Using the generated interest information and preferred e-mail format, an agent automatically recomposes each customer's information of interest in the e-mail standard and format that the customer prefers, and pushes it to the customer. In experiments, the designed and implemented system showed a high e-mail open rate and high user satisfaction in the performance assessment.


Extraction method of spatial relation by analyzing location tag in folksonomy (폭소노미에서 위치태그 분석을 통한 공간관계 추출 기법)

  • Choi, Yun-Hee;Yong, Hwan-Seung
    • Journal of Korea Multimedia Society / v.12 no.8 / pp.1043-1054 / 2009
  • As the Semantic Web attracts growing attention and becomes increasingly necessary, research on ontology, its core technology, has been carried out in various fields. Ontology has been adopted as a way to resolve many of the problems caused by the lack of vocabulary selection rules in folksonomy, which is widely used under Web 2.0. Research that combines folksonomy and ontology so that they complement each other has therefore become more important. Based on this idea, this research proposes a system that uses open services to pull location tags from folksonomy-based metadata, analyzes the location information to extract spatial relationships among the tags, and then automatically constructs a self-correcting location-information domain ontology. The system devised in this study associates data derived from easily accessible folksonomy with meaningful and technological information from ontology.


Flower Recognition System Using OpenCV on Android Platform (OpenCV를 이용한 안드로이드 플랫폼 기반 꽃 인식 시스템)

  • Kim, Kangchul;Yu, Cao
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.1 / pp.123-129 / 2017
  • New mobile phones with high-tech cameras and large memory have recently been launched, and people upload pictures of beautiful scenes or unknown flowers to SNS. This paper develops a flower recognition system that can provide information on flowers even where mobile communication is unavailable. It consists of a registration part for reference flowers and a recognition part based on OpenCV for the Android platform. A new color classification method using RGB color channels and K-means clustering is proposed to reduce the recognition processing time. ORB is used for feature extraction and the Brute-Force Hamming algorithm for matching. We use 12 kinds of flowers in four color groups, with 60 images for the reference DB and 60 images for testing. Simulation results show a success rate of 83.3% and an average recognition time of 2.58 s on a Huawei ALEUL00, so the proposed system is suitable for a mobile phone without a network connection.
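
The brute-force Hamming matching mentioned in the abstract above can be sketched without OpenCV by treating each binary descriptor (ORB produces binary descriptors) as a Python integer. The function names, the tuple layout of a match, and the optional `max_distance` cutoff are assumptions made for illustration, not the paper's code.

```python
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors."""
    return bin(a ^ b).count("1")


def brute_force_match(
    query: List[int],
    train: List[int],
    max_distance: Optional[int] = None,
) -> List[Tuple[int, int, int]]:
    """For each query descriptor, find the nearest train descriptor.

    Returns (query_index, train_index, distance) tuples. Matches whose
    distance exceeds max_distance (when given) are dropped.
    """
    matches = []
    if not train:
        return matches
    for qi, q in enumerate(query):
        # Exhaustive comparison against every reference descriptor.
        best_j, best_d = min(
            ((j, hamming(q, t)) for j, t in enumerate(train)),
            key=lambda pair: pair[1],
        )
        if max_distance is None or best_d <= max_distance:
            matches.append((qi, best_j, best_d))
    return matches
```

The cost grows with the product of the two descriptor counts, which is one reason a cheap pre-classification step (such as grouping by color before matching) can reduce recognition time.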

An Automatic Schema Generation System based on the Contents for Integrating Web Information Sources (웹 정보원 통합을 위한 내용 기반의 스키마 자동생성시스템)

  • Kwak, Jun-Young;Bae, Jong-Min
    • Journal of the Korea Society of Computer and Information / v.13 no.6 / pp.77-86 / 2008
  • To users, Web information sources can be regarded as the largest distributed database. By virtually integrating the distributed information sources and treating them as a single huge database, we can query the database to extract information. This capability is important for developing Web application programs. To integrate the databases, we have to infer a database schema from browsing-oriented Web documents. This paper presents a heuristic algorithm that infers an XML Schema fully automatically from semi-structured Web documents. The algorithm first extracts candidate pattern regions based on predefined structure-making tags, determines a target pattern region using a few heuristic factors, and then derives XML Schema extraction rules from the target pattern region. The schema extraction rules are represented in XQuery, which makes it possible to develop various application systems using open standard XML tools. We also present experimental results for several public Web sources to show the effectiveness of the algorithm.


Development of Information Extraction System from Multi Source Unstructured Documents for Knowledge Base Expansion (지식베이스 확장을 위한 멀티소스 비정형 문서에서의 정보 추출 시스템의 개발)

  • Choi, Hyunseung;Kim, Mintae;Kim, Wooju;Shin, Dongwook;Lee, Yong Hun
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.111-136 / 2018
  • In this paper, we propose a methodology for extracting answer information for queries from various types of unstructured documents collected from multiple web sources, in order to expand a knowledge base. The proposed methodology consists of the following steps. 1) Collect relevant documents from Wikipedia, Naver encyclopedia, and Naver news for queries separated into "subject-predicate" form, and classify the proper documents. 2) Determine whether each sentence is suitable for information extraction and derive a confidence. 3) Based on the predicate feature, extract the information from the proper sentences and derive the overall confidence of the information extraction result. To evaluate the performance of the information extraction system, we selected 400 queries from SK Telecom's artificial intelligence speaker. The proposed system showed higher performance than the baseline model. The contribution of this study is a sequence tagging model based on bi-directional LSTM-CRF that uses the predicate feature of the query; with it, we developed a robust model that maintains high recall even on various types of unstructured documents collected from multiple sources. Information extraction for knowledge base expansion must take into account the heterogeneous characteristics of source-specific document types. The proposed methodology proved to extract information effectively from various types of unstructured documents compared with the baseline model. Previous research is limited in that performance is poor when extracting information from a document type that differs from the training data.
In addition, this study can prevent unnecessary information extraction attempts on documents that do not contain the answer, by predicting the suitability of documents and sentences for information extraction before the extraction step. It is meaningful that we provide a method that maintains precision even in a real web environment. Information extraction for knowledge base expansion targets unstructured documents on the real web, so it cannot be guaranteed that a document contains the correct answer. When question answering is performed on the real web, previous machine reading comprehension studies show low precision because they frequently attempt to extract an answer even from a document that contains no correct answer. The policy of predicting the suitability of documents and sentences for information extraction is meaningful in that it helps maintain extraction performance even in a real web environment. The limitations of this study and future research directions are as follows. The first is a problem of data preprocessing. In this study, the unit of knowledge extraction is determined through morphological analysis based on the open-source KoNLPy Python package, and the information extraction result can be incorrect when the morphological analysis is not performed properly. To improve the information extraction results, a more advanced morpheme analyzer needs to be developed. The second is a problem of entity ambiguity. The information extraction system of this study cannot distinguish between identical names that refer to different entities. If several people with the same name appear in the news, the system may not extract information about the person intended by the query.
Future research needs to take measures to distinguish people with the same name. The third is a problem of evaluation query data. In this study, we selected 400 user queries collected from SK Telecom's interactive artificial intelligence speaker to evaluate the performance of the information extraction system. We built the evaluation data set from 2,800 documents (400 questions × 7 articles per question: 1 Wikipedia, 3 Naver encyclopedia, and 3 Naver news) by judging whether each one contains a correct answer. To ensure the external validity of the study, it is desirable to use more queries to measure the performance of the system, but this is a costly activity that must be done manually. Future research needs to evaluate the system on more queries. It is also necessary to develop a Korean benchmark data set for information extraction systems that answer queries from multi-source web documents, so that results can be evaluated more objectively.

Preprocessor Implementation of Open IDS Snort for Smart Manufacturing Industry Network (스마트 제조 산업용 네트워크에 적합한 Snort IDS에서의 전처리기 구현)

  • Ha, Jaecheol
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.5 / pp.1313-1322 / 2016
  • Recently, virus and hacking attacks over the Internet on public organizations and financial institutions have become increasingly intelligent and sophisticated. The Advanced Persistent Threat (APT) is considered an important cyber risk. This attack is basically carried out by spreading malicious code through complex networks. To detect and extract PE files in smart manufacturing industry networks, we propose an efficient processing method that runs before the malicious-code analysis procedure. We implement a preprocessor for the open intrusion detection system Snort for fast extraction of PE files and install it on hardware sensor equipment. Through a practical experiment, we verify that the network sensor can extract PE files, which are often suspected of being malware.
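
To make "detecting PE files" concrete, the following sketch checks a byte buffer for the standard PE layout: an `MZ` DOS header whose 4-byte little-endian field at offset 0x3C points to the `PE\0\0` signature. This is only an illustration of the file-format check. It is not the Snort preprocessor described above, it ignores stream reassembly across packets, and the function names are hypothetical.

```python
import struct
from typing import List


def looks_like_pe(buf: bytes, start: int = 0) -> bool:
    """Return True if a PE image appears to begin at buf[start]."""
    if buf[start:start + 2] != b"MZ":
        return False
    if len(buf) < start + 0x40:  # DOS header is too short to hold e_lfanew
        return False
    # e_lfanew: offset of the PE signature, relative to the start of the file.
    (e_lfanew,) = struct.unpack_from("<I", buf, start + 0x3C)
    sig = start + e_lfanew
    return buf[sig:sig + 4] == b"PE\x00\x00"


def find_pe_offsets(payload: bytes) -> List[int]:
    """Scan a payload for every offset where a PE header seems to start."""
    offsets = []
    pos = payload.find(b"MZ")
    while pos != -1:
        if looks_like_pe(payload, pos):
            offsets.append(pos)
        pos = payload.find(b"MZ", pos + 1)
    return offsets
```

Checking the `PE\0\0` signature in addition to `MZ` filters out most accidental `MZ` byte pairs in unrelated traffic.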

Fast and Efficient Implementation of Neural Networks using CUDA and OpenMP (CUDA와 OpenMP를 이용한 빠르고 효율적인 신경망 구현)

  • Park, An-Jin;Jang, Hong-Hoon;Jung, Kee-Chul
    • Journal of KIISE:Software and Applications / v.36 no.4 / pp.253-260 / 2009
  • Many computer vision and pattern recognition algorithms have recently been implemented on the GPU (graphics processing unit) for faster computation. However, such implementations have two problems. First, the programmer must master the fundamentals of graphics shading languages, which require prior knowledge of computer graphics. Second, in jobs that need close cooperation between the CPU and GPU, which are common in image processing and pattern recognition unlike in graphics, the CPU should generate as much raw feature data as possible for GPU processing in order to use GPU performance effectively. This paper proposes a faster and more efficient implementation of neural networks on both the GPU and a multi-core CPU. To solve the first problem, we use CUDA (compute unified device architecture), which is easy to program thanks to its simple C-like style, instead of a graphics shading language. Moreover, OpenMP (Open Multi-Processing) is used to process multiple data concurrently with a single instruction on the multi-core CPU, which results in effective use of GPU memory. In the experiments, we implemented a neural network-based text extraction system using the proposed architecture, and it ran about 15 times faster than a GPU-only implementation without OpenMP.