• Title/Summary/Keyword: text extraction

A Study on the Extraction and Integration of Learning Object Meta-data using Web Service of Databases (DBMS의 웹서비스를 이용한 학습객체 메타데이터 추출 및 통합에 관한 연구)

  • Choe, Hyun-Jong
    • Journal of The Korean Association of Information Education / v.7 no.2 / pp.199-206 / 2003
  • XML is becoming a new development tool for web technology because of its data management capabilities and its flexibility in data presentation. Accordingly, the reusability and integration of learning objects in web content for computer education, such as text, images, sound, video, and plug-in programs, have been well researched. However, research on storing, extracting, and integrating learning object metadata is needed before an online learning system that integrates and manages such objects can be implemented. This study therefore proposes a new method that uses the web service of a DBMS to extract learning object metadata from a database server located in a 3-tier system. To evaluate the efficiency of the proposed method, a test server and two DBMSs (MS SQL Server 2000 and Oracle 9i) holding 30 metadata items were implemented and the response time was measured. The response time was short, but using this method required additional programming with SAX/DOM.

Moxibustion Treatment for Knee Pain: A Systematic Review (슬통의 뜸치료에 대한 체계적 고찰)

  • Kim, Seok Hee;Lee, Kyung Jin;Choi, Yoo Min;Kim, Ju Yong;Yook, Tae Han;Lee, Sang Lyoung;Kim, Jong Uk
    • Journal of Acupuncture Research / v.32 no.3 / pp.135-146 / 2015
  • Objectives: This study was designed to evaluate the clinical evidence for moxibustion treatment of knee pain. Methods: All processes were carried out independently by three investigators. A literature search was performed in 3 databases from their inception to May 2015. Ten reports were selected based on their title, abstract, and text. Data extraction and analysis were then performed using a risk of bias (ROB) assessment and the assessment of multiple systematic reviews (AMSTAR). Results: Ten studies (6 RCT, 2 SR, 2 CR) were included. Indirect moxibustion was generally used for knee pain, and only one study reported the use of direct moxibustion. Moxibustion was shown to be effective in treating knee pain, and the number of required treatments was fourteen on average. In the risk of bias assessment, indefinite and uncertain information left all included trials subject to a high risk of bias. The SRs, on the other hand, satisfied all AMSTAR evaluation items except the list of included and excluded studies. Conclusions: Because of deficient study design or limited research planning, there is not sufficient evidence to support any conclusion about the efficacy of moxibustion for knee pain. Well-designed, high-quality trials are therefore needed.

Proposal for License Plate Recognition Using Synthetic Data and Vehicle Type Recognition System (가상 데이터를 활용한 번호판 문자 인식 및 차종 인식 시스템 제안)

  • Lee, Seungju;Park, Gooman
    • Journal of Broadcast Engineering / v.25 no.5 / pp.776-788 / 2020
  • In this paper, a vehicle type recognition system using deep learning and a license plate recognition system are proposed. Existing systems extract the license plate area through image processing and recognize characters using a DNN. These systems suffer from declining recognition rates as the environment changes. The proposed system therefore uses the one-stage object detection method YOLO v3 to address real-time detection and the loss of accuracy caused by environmental changes, enabling real-time vehicle type and license plate character recognition with one RGB camera. The training data consists of actual data for vehicle type recognition and license plate area detection, and synthetic data for license plate character recognition. The accuracy of each module was 96.39% for car model detection, 99.94% for license plate detection, and 79.06% for license plate recognition. In addition, accuracy was measured using YOLO v3 tiny, a lightweight version of the YOLO v3 network.

An Effective Incremental Text Clustering Method for the Large Document Database (대용량 문서 데이터베이스를 위한 효율적인 점진적 문서 클러스터링 기법)

  • Kang, Dong-Hyuk;Joo, Kil-Hong;Lee, Won-Suk
    • The KIPS Transactions:PartD / v.10D no.1 / pp.57-66 / 2003
  • With the development of the internet and computers, the amount of information available through the internet is increasing rapidly, and it is managed in document form. For this reason, research into methods for managing large numbers of documents effectively is necessary. Document clustering groups documents by subject by classifying a set of documents according to the similarity among them. Accordingly, document clustering can be used in exploring and searching documents, and it can increase search accuracy. This paper proposes an efficient incremental clustering method for a set of documents that grows gradually. The incremental document clustering algorithm assigns a set of new documents to the legacy clusters that have been identified in advance. In addition, to improve the correctness of the clustering, stop word removal is proposed and the weight of each word is calculated by the proposed TF$\times$NIDF function.
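
The incremental assignment step described above might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not define the TF$\times$NIDF weighting, so plain term counts with cosine similarity are swapped in, and `STOP_WORDS`, `threshold`, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical stop list; the paper's actual list is not given in the abstract.
STOP_WORDS = {"the", "a", "of", "and", "in", "to"}

def vectorize(text):
    """Term-count vector with stop words removed (stand-in for TF x NIDF)."""
    return Counter(w for w in text.lower().split() if w not in STOP_WORDS)

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def assign_incrementally(clusters, new_docs, threshold=0.3):
    """Add each new document to the most similar legacy cluster,
    or open a new cluster when no similarity reaches the threshold."""
    for doc in new_docs:
        vec = vectorize(doc)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["docs"].append(doc)
            best["centroid"].update(vec)  # summed counts act as the centroid
        else:
            clusters.append({"centroid": vec, "docs": [doc]})
    return clusters
```

Because cosine similarity ignores vector length, the summed term counts can serve as the cluster centroid without renormalizing after each assignment.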

Recognition of Korean Text in Outdoor Signboard Images Using Directional Feature and Fisher Measure (방향성분 특징과 Fisher Measure를 이용한 간판영상 한글인식)

  • Lim, Jun-Sik;Kim, Soo-Hyung;Lee, Guee-Sang;Yang, Hyung-Jung;Lee, Myung-Eun
    • The KIPS Transactions:PartB / v.16B no.3 / pp.239-246 / 2009
  • In this paper, we propose a method for recognizing Korean characters in outdoor signboard images. We selected 808 classes of Korean characters by analyzing their frequency of appearance in a dictionary of signboard names. The proposed method consists of three main steps: feature extraction, rough classification, and fine classification. The first step extracts a nonlinear directional segment feature, which is robust to distortion of character shapes. The second step computes an ordered set of 10 recognition candidates using a minimum distance classifier. The last step reorders the recognition candidates using a Fisher discriminant measure. In the experiments, the recognition accuracy was 80.45% for the first choice and 93.51% for the top five choices.
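
The second step, an ordered candidate list from a minimum distance classifier, might look like the sketch below. It assumes Euclidean distance to per-class mean feature vectors; `top_candidates` and `class_means` are hypothetical names, and the Fisher reordering step is left out because the abstract does not specify it.

```python
import math

def top_candidates(feature, class_means, k=10):
    """Return the k class labels whose mean vectors lie closest to `feature`,
    ordered from nearest to farthest (Euclidean distance assumed)."""
    ranked = sorted(
        class_means,
        key=lambda label: math.dist(feature, class_means[label]),
    )
    return ranked[:k]

# Toy example with three classes and 2-D features.
means = {"가": [0.0, 0.0], "나": [1.0, 1.0], "다": [5.0, 5.0]}
candidates = top_candidates([0.9, 1.2], means, k=2)
```

A later stage would then reorder `candidates` with a more discriminative measure before taking the first choice.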

A WWW Images Automatic Annotation Based On Multi-cues Integration (멀티-큐 통합을 기반으로 WWW 영상의 자동 주석)

  • Shin, Seong-Yoon;Moon, Hyung-Yoon;Rhee, Yang-Won
    • Journal of the Korea Society of Computer and Information / v.13 no.4 / pp.79-86 / 2008
  • With the rapid development of the Internet, images embedded in HTML web pages have become predominant. Because they are so effective at describing content and attracting attention, images have become substantially important in web pages, and together they constitute a considerable database. Moreover, the semantic meaning of an image is well represented by its surrounding text and links. However, only a small minority of these images have precisely assigned keyphrases, and manually assigning keyphrases to existing images is very laborious. It is therefore highly desirable to automate the keyphrase extraction process. In this paper, we first introduce WWW image annotation methods based on low-level features, page tags, overall word frequency, and local word frequency. We then put forward our method of multi-cue integration for image annotation. Finally, we show through an experiment that the multi-cue image annotation method is superior to the other methods.

Control of Time-varying and Nonstationary Stochastic Systems using a Neural Network Controller and Dynamic Bayesian Network Modeling (신경회로망 제어기와 동적 베이시안 네트워크를 이용한 시변 및 비정치 확률시스템의 제어)

  • Cho, Hyun-Cheol;Lee, Jin-Woo;Lee, Young-Jin;Lee, Kwon-Soon
    • Journal of the Korean Institute of Intelligent Systems / v.17 no.7 / pp.930-938 / 2007
  • Captions that appear in images carry information related to those images. To obtain this information, methods for extracting text from images have been developed. However, most existing methods can be applied only to captions with a fixed height or stroke width. We propose a method that can be applied to captions of various sizes. Our method is based on connected components: edge pixels are detected and grouped into connected components. We analyze the properties of the connected components and build a neural network that discriminates connected components containing captions from those that do not. Experimental data was collected from broadcast programs such as news, documentaries, and shows, which include captions of various heights. The experimental results are evaluated by two criteria, recall and precision. Recall is the ratio of identified captions to all captions in the images, and precision is the ratio of true captions to all objects identified as captions. The experiment shows that the proposed method can efficiently extract captions of various sizes.
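
The two evaluation criteria defined above can be written out as a short sketch. It assumes captions and detections are represented by comparable identifiers; `recall_precision` and the sample IDs are hypothetical.

```python
def recall_precision(true_captions, detected):
    """Recall: identified captions / all captions in the images.
    Precision: true captions / all objects identified as captions."""
    true_set, detected_set = set(true_captions), set(detected)
    hits = len(true_set & detected_set)
    recall = hits / len(true_set) if true_set else 0.0
    precision = hits / len(detected_set) if detected_set else 0.0
    return recall, precision

# Four real captions; the detector finds two of them plus one false alarm.
recall, precision = recall_precision(
    {"c1", "c2", "c3", "c4"},
    {"c1", "c2", "x1"},
)
```

Here two of four captions are found, and two of three detections are correct, so the example illustrates how the two criteria can diverge.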

A Study on the Printed Korean and Chinese Character Recognition (인쇄체 한글 및 한자의 인식에 관한 연구)

  • 김정우;이세행
    • The Journal of Korean Institute of Communications and Information Sciences / v.17 no.11 / pp.1175-1184 / 1992
  • A new classification method and recognition algorithms for printed Korean and Chinese characters are studied for Korean text that contains both Korean and Chinese characters. The proposed method utilizes structural features of the vertical and horizontal vowels in Korean characters. Korean characters are classified into 6 groups. Vowels and consonants are separated by different vowel extraction methods applied to each group. The time-consuming thinning process is omitted. A modified crossing distance feature is measured to recognize the extracted consonants. For Chinese characters, the average stroke crossing number is calculated for every character, which allows the characters to be classified into several groups. Recognition then proceeds using the stroke crossing number and the black dot rate of the character. Korean and Chinese characters were distinguished at a rate of 90.5%, and the classification rate for 2512 Ming-style Korean characters was 90.0%. The recognition algorithm was applied to 1278 characters, with a recognition rate of 92.2%. After classification of 4585 Chinese characters, the densest class contained only 124 characters, only about 1/40 of the total. The recognition rate was 89.2%.

Concept Extraction Technique from Documents Using Domain Ontology (지식 문서에서 도메인 온톨로지를 이용한 개념 추출 기법)

  • Mun Hyeon-Jeong;Woo Yong-Tae
    • The KIPS Transactions:PartD / v.13D no.3 s.106 / pp.309-316 / 2006
  • We propose a novel technique to categorize XML documents and extract concepts efficiently using a domain ontology. First, we create a domain ontology using text mining and statistical techniques. We propose a DScore technique that classifies XML documents using the structural characteristics of the XML document. We also present a TScore technique that extracts a concept by comparing the association term set of the domain ontology with the terms in the XML document. To verify the efficiency of the proposed technique, we perform an experiment on 295 papers in the computer science area. The experimental results show that the proposed technique, which uses the structural information in the XML documents, is more efficient than the existing technique. In particular, the TScore technique effectively extracts the concept of a document even when term frequency is low. Hence, the proposed concept-based retrieval techniques can be expected to contribute to the development of an efficient ontology-based knowledge management system.

The Geometric Layout Analysis of the Document Image Using Connected Components Method and Median Filter (연결요소 방법과 메디안 필터를 이용한 문서영상 기하학적 구조분석)

  • Jang, Dae-Geun;Hwang, Chan-Sik
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.8A / pp.805-813 / 2002
  • If paper documents are to be converted automatically into electronic documents, a document image must be classified into detailed regions such as text, picture, and table through geometric layout analysis. However, the complexity of document layouts and the variety in size and density of pictures make it difficult to analyze the geometric layout of document images. In this paper, we propose a method that performs better at region segmentation and classification, and at line extraction in table regions, than commercial software and previous methods. The proposed method can segment the document into detailed regions using the connected components method even if the layout is complex. It also classifies text and pictures using a separable median filter even though their size and density vary. In addition, it extracts lines from the table by applying a one-dimensional median filter in the horizontal and vertical directions, even when the lines are deformed or text is attached to them.
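
The line extraction idea in the last sentence can be illustrated with a one-dimensional median filter on a binary image. This is a minimal sketch, not the paper's implementation: the window size, zero padding, and the names `median_filter_1d`, `horizontal_lines`, and `vertical_lines` are all assumptions.

```python
from statistics import median

def median_filter_1d(row, window=5):
    """Apply a 1-D median filter to a sequence of 0/1 pixels (zero-padded).
    Runs shorter than about half the window are suppressed; long runs survive."""
    half = window // 2
    padded = [0] * half + list(row) + [0] * half
    return [int(median(padded[i:i + window])) for i in range(len(row))]

def horizontal_lines(image, window=5):
    """Filter every row, keeping long horizontal runs such as table rules."""
    return [median_filter_1d(row, window) for row in image]

def vertical_lines(image, window=5):
    """Filter every column by transposing, filtering, and transposing back."""
    columns = [list(col) for col in zip(*image)]
    filtered = [median_filter_1d(col, window) for col in columns]
    return [list(row) for row in zip(*filtered)]

# One pixel of text (index 1) next to a six-pixel horizontal rule.
row = [0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0]
filtered_row = median_filter_1d(row)
```

In the toy row, the isolated pixel is removed while the long run is preserved, which is the property that lets a directional median filter separate table lines from attached text strokes.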