• Title/Summary/Keyword: Text line information

Search Result 147, Processing Time 0.024 seconds

Recognition of Various Printed Hangul Images by using the Boundary Tracing Technique (경계선 기울기 방법을 이용한 다양한 인쇄체 한글의 인식)

  • Baek, Seung-Bok;Kang, Soon-Dae;Sohn, Young-Sun
    • Journal of the Korean Institute of Intelligent Systems / v.13 no.1 / pp.1-5 / 2003
  • In this paper, we realized a system that converts character images of printed Korean (Hangul), captured with a black-and-white CCD camera, into editable text documents. Using the boundary tracing technique, which is robust to noise in character recognition, we extracted contour information that reflects the structural characteristics of each character. From this contour information, we recognized the horizontal and vertical vowels of the character image and classified each character into one of six patterns. Each character was then divided into consonant and vowel units. The vowels were recognized by maximum-length projection, and the separated consonants were recognized by comparing the input pattern with standard patterns carrying the phase information of boundary-line changes. The recognized characters are entered into a word editor as editable KS completion-type Hangul code.
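As a rough illustration of the contour-extraction step, the following is a minimal Moore-neighbour boundary tracing sketch on a binary image; the paper's exact tracing rules, noise handling, and slope computation are not reproduced here, and the image is a made-up example.

```python
def trace_boundary(img):
    """Trace the outer boundary of the first foreground blob in a binary
    image and return it as a list of (row, col) points."""
    rows, cols = len(img), len(img[0])
    # 8 neighbours in clockwise order, starting from the west
    nbrs = [(0, -1), (-1, -1), (-1, 0), (-1, 1),
            (0, 1), (1, 1), (1, 0), (1, -1)]

    def fg(r, c):
        return 0 <= r < rows and 0 <= c < cols and img[r][c]

    # start at the top-most, left-most foreground pixel
    start = next((r, c) for r in range(rows)
                 for c in range(cols) if img[r][c])
    boundary, cur, backtrack = [start], start, 0
    while True:
        for i in range(8):
            d = (backtrack + i) % 8
            nxt = (cur[0] + nbrs[d][0], cur[1] + nbrs[d][1])
            if fg(*nxt):
                cur = nxt
                backtrack = (d + 5) % 8   # re-enter from behind the move
                break
        else:
            break                          # isolated single pixel
        if cur == start:
            break                          # contour closed
        boundary.append(cur)
    return boundary

# a 3x3 square blob: its outer boundary is the 8-pixel ring
img = [[0, 0, 0, 0, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 1, 1, 1, 0],
       [0, 0, 0, 0, 0]]
print(len(trace_boundary(img)))  # 8 boundary pixels
```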

An Analysis of the Network of Interactions among Medicinal Herbs and Their Uses (본초 상호작용 관계망 분석 및 활용 방향)

  • Lee, Jeong-Hyeon;Kwon, Oh-Min
    • Journal of Society of Preventive Korean Medicine / v.17 no.1 / pp.1-11 / 2013
  • Objectives : The aim of this research is to produce information by gathering the data on interactions between medicinal herbs that lie scattered across oriental medical books, and to give people easy access to that information by visualizing it. Methods : To this end, the study established the fundamental data by selecting a part of Bonchogangmok(本草綱目), extracting its text, and organizing the patterns of interaction into several types. To visualize the data, the study converted it into the 'net' file format and rendered the interactions between medicinal herbs in Pajek. The visualization targeted three patterns: one medicinal herb, two medicinal herbs, and one prescription. Using the data on 'Chinese Lacquer(乾漆)' for one herb, on 'Licorice(甘草)' and 'Chinese Lacquer(乾漆)' for two herbs, and on 'Iijin-tang(二陳湯)' for a prescription, the research analyzed the network with the Kamada-Kawai algorithm in Pajek. Results : The analysis made it possible to grasp the meanings at a single glance, as scattered and fragmentary information was integrated around each medicinal herb; however, the more herbs were analyzed, the more complicated their relationships became, requiring additional work such as filtering. Conclusions : These results are readily applicable to an on-line database, and if further research expands its scope to a systematic classification of medicinal herbs or to medical books other than Bonchogangmok, it should produce more objective and abundant information.
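The conversion into a 'net' file follows Pajek's plain-text input format, with a `*Vertices` section and an `*Edges` section. A minimal sketch of such an export is shown below; the herb names and edge list are illustrative examples, not data from Bonchogangmok.

```python
def to_pajek_net(edges):
    """Build a Pajek .net description from (herb, herb) interaction pairs."""
    vertices = sorted({v for edge in edges for v in edge})
    index = {name: i + 1 for i, name in enumerate(vertices)}  # Pajek is 1-based
    lines = ["*Vertices %d" % len(vertices)]
    lines += ['%d "%s"' % (index[v], v) for v in vertices]
    lines.append("*Edges")
    lines += ["%d %d" % (index[a], index[b]) for a, b in edges]
    return "\n".join(lines)

net = to_pajek_net([("Licorice", "Chinese Lacquer"),
                    ("Licorice", "Ginseng")])
print(net)
```

The resulting text can be loaded directly into Pajek and laid out with the Kamada-Kawai algorithm.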

Learning Material Bookmarking Service based on Collective Intelligence (집단지성 기반 학습자료 북마킹 서비스 시스템)

  • Jang, Jincheul;Jung, Sukhwan;Lee, Seulki;Jung, Chihoon;Yoon, Wan Chul;Yi, Mun Yong
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.179-192 / 2014
  • In line with recent changes in the information technology environment, online learning environments that support participation by many users, such as MOOCs (Massive Open Online Courses), have become important. IEEE Computer Society, one of the largest professional associations in information technology, announced "Supporting New Learning Styles" as a crucial trend for 2014. Popular MOOC services such as Coursera and edX have continued to build active learning environments, with a large number of lectures accessible anywhere using smart devices, and have attracted an increasing number of users. Collaborative web services (e.g., blogs and Wikipedia) also support the creation of user-uploaded learning materials, so that a vast amount of new lectures and materials is created every day in the online space. However, it is difficult for an online educational system to maintain a learner's motivation, as learning occurs remotely and the capability to share knowledge among learners is limited. It is therefore essential to understand which materials each learner needs and how to motivate learners to participate actively in the online learning system. To address these issues, leveraging constructivism theory and collective intelligence, we developed a social bookmarking system called WeStudy, which supports the sharing of learning materials among users and provides personalized learning material recommendations. Constructivism theory argues that knowledge is constructed as learners interact with the world.
Collective intelligence can be divided into two types: (1) collaborative collective intelligence, built on direct collaboration among participants (e.g., Wikipedia), and (2) integrative collective intelligence, which produces new forms of knowledge by combining independent, distributed information through advanced technologies and algorithms (e.g., Google PageRank, recommender systems). A recommender system, one example of integrative collective intelligence, utilizes users' online activities to recommend items they may be interested in. Our system includes both collaborative and integrative collective intelligence functions. We analyzed well-known collective intelligence-based web services such as Wikipedia, SlideShare, and VideoLectures to identify the main design factors that support collective intelligence. Based on this analysis, in addition to sharing online resources through social bookmarking, we selected three essential functions for our system: 1) multimodal visualization of learning materials in two forms (list and graph), 2) personalized recommendation of learning materials, and 3) explicit designation of other learners of interest. After developing the web-based WeStudy system, we conducted usability testing through heuristic evaluation with seven heuristic indices: features and functionality, cognitive page, navigation, search and filtering, control and feedback, forms, and context and text. We recruited 10 experts who majored in human-computer interaction and work in the field, and requested both quantitative and qualitative evaluations of the system. The evaluation results show that, relative to the other functions evaluated, the list/graph page scored higher on all indices except context and text, for which the learning material page produced the best score.
In general, the explicit designation of learners of interest, one of the distinctive functions, received lower scores on all usability indices because its functionality was unfamiliar to users. In summary, the evaluation results show that our system achieved high usability and good performance, with some minor issues that need to be fully addressed before the system is released to large-scale users. The study findings provide practical guidelines for the design and development of systems that utilize collective intelligence.
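The integrative-collective-intelligence idea behind the recommendation function can be sketched as ranking materials by the cosine similarity of users' bookmark sets. The user and material names below are invented placeholders, not WeStudy's actual data or algorithm.

```python
import math

bookmarks = {                       # user -> set of bookmarked materials
    "alice": {"ml-intro", "stats", "python"},
    "bob":   {"ml-intro", "stats", "deep-learning"},
    "carol": {"history", "art"},
}

def cosine(a, b):
    """Cosine similarity between two sets seen as binary vectors."""
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0

def recommend(user):
    """Rank materials bookmarked by similar users but not yet by `user`."""
    scores = {}
    for other, items in bookmarks.items():
        if other == user:
            continue
        sim = cosine(bookmarks[user], items)
        for item in items - bookmarks[user]:
            scores[item] = scores.get(item, 0.0) + sim
    return sorted(scores, key=scores.get, reverse=True)

print(recommend("alice"))  # 'deep-learning' ranks first (shared with bob)
```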

A Block Classification and Rotation Angle Extraction for Document Image (문서 영상의 영역 분류와 회전각 검출)

  • Mo, Moon-Jung;Kim, Wook-Hyun
    • The KIPS Transactions: Part B / v.9B no.4 / pp.509-516 / 2002
  • This paper proposes an efficient algorithm that recognizes mixed document images consisting of images, texts, tables, and straight lines. The system is composed of three steps: first, detection of the rotation angle to correct skewed images; second, removal of unnecessary background regions; and third, classification of each component included in the document image. The algorithm performs preprocessing that detects the rotation angle and corrects the document accordingly, in order to minimize the error rate caused by document skew. We detected the rotation angle using only the horizontal and vertical components of the document image, and minimized computation time by erasing unnecessary background regions while detecting document components. In the next step, we classify the various components included in the document image, such as image, text, table, and line areas. We applied this method to various document images to evaluate the performance of the document recognition system, and the experimental results were successful.
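One common way to estimate a document's rotation angle from its horizontal components is a projection-profile search; the sketch below (an assumption-laden illustration, not the paper's exact method) picks the candidate angle whose row projection is sharpest.

```python
import math

def skew_angle(points, candidates):
    """Return the candidate angle whose rotated row projection is
    sharpest (maximal sum of squared row counts)."""
    def sharpness(theta):
        rad = math.radians(theta)
        rows = {}
        for x, y in points:
            r = round(-x * math.sin(rad) + y * math.cos(rad))
            rows[r] = rows.get(r, 0) + 1
        return sum(c * c for c in rows.values())
    return max(candidates, key=sharpness)

# a synthetic horizontal text line skewed by -5 degrees
rad = math.radians(-5)
skewed = [(x * math.cos(rad), x * math.sin(rad)) for x in range(100)]
print(skew_angle(skewed, range(-10, 11)))  # recovers -5
```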

The Geometric Layout Analysis of the Document Image Using Connected Components Method and Median Filter (연결요소 방법과 메디안 필터를 이용한 문서영상 기하학적 구조분석)

  • Jang, Dae-Geun;Hwang, Chan-Sik
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.8A / pp.805-813 / 2002
  • To convert paper documents automatically into electronic documents, a document image should be classified into detailed regions such as text, picture, and table through geometric layout analysis. However, the complexity of document layouts and the variety of picture sizes and densities make the geometric layout analysis of document images difficult. In this paper, we propose a method that outperforms commercial software and previous methods in region segmentation and classification and in line extraction within table regions. The proposed method can segment a document into detailed regions using the connected components method even when its layout is complex. It also classifies texts and pictures using a separable median filter, even though their sizes and densities are diverse. In addition, it extracts the lines from a table by applying a one-dimensional median filter in both the horizontal and vertical directions, even when the lines are deformed or texts are attached to them.
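The connected-components step named above can be sketched as 8-connected blob labelling on a binary image; this is a generic pure-Python illustration, not the paper's implementation.

```python
from collections import deque

def connected_components(img):
    """Label 8-connected foreground blobs in a binary image and return
    each blob as a list of (row, col) pixels."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    comps = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] and not seen[r][c]:
                comp, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    comp.append((y, x))
                    for dy in (-1, 0, 1):        # visit the 8 neighbours
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                comps.append(comp)
    return comps

# two separate blobs -> two candidate regions
page = [[1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 0, 1, 1]]
print(len(connected_components(page)))  # 2
```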

Weaknesses of the new design of wearable token system proposed by Sun et al. (Sun 등이 제안한 착용 가능한 토큰 시스템의 취약점 분석에 관한 연구)

  • Kim, Jung-Yoon;Choi, Hyoung-Kee
    • Journal of the Korea Institute of Information Security & Cryptology / v.20 no.5 / pp.81-88 / 2010
  • Sun et al. proposed a new design of wearable token system for the security of mobile devices such as notebooks and PDAs. In this paper, we show that Sun et al.'s system is vulnerable to an off-line password guessing attack and to a man-in-the-middle attack based on a known-plaintext attack. We propose an improved scheme that overcomes the weaknesses of Sun et al.'s system. The proposed protocol requires only one modular multiplication in the wearable token, which has low computation ability, and places the modular exponentiation on the mobile device, which has sufficient computing resources. Our protocol is free of the security problems that threaten Sun et al.'s system and of other known vulnerabilities. That is, the proposed protocol overcomes the security problems of Sun et al.'s system with minimal overhead.
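The cost split described in the abstract can be illustrated with a Schnorr-style toy computation: modular exponentiations land on the device, while the token's per-round work is a single modular multiplication (plus an addition). This is only a sketch of the cost asymmetry, not Sun et al.'s system or the authors' actual protocol; all parameters are toy values, and in a real protocol the nonce would stay on the prover's side.

```python
import secrets

p = 2 ** 64 - 59                # toy prime modulus; far too small for real use
g = 5                           # toy generator
q = p - 1                       # exponents are reduced modulo the group order

secret = secrets.randbelow(q)   # token's long-term secret
public = pow(g, secret, p)      # matching public value held by the device

# Device side: the heavy work -- modular exponentiation
nonce = secrets.randbelow(q)
commitment = pow(g, nonce, p)
challenge = secrets.randbelow(2 ** 32)

# Token side: the light work -- one modular multiplication (plus an addition)
response = (nonce + challenge * secret) % q

# Verification relation: g^response == commitment * public^challenge (mod p)
ok = pow(g, response, p) == (commitment * pow(public, challenge, p)) % p
print(ok)  # True
```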

Deep-Learning-based smartphone application for automatic recognition of ingredients on curved containers (곡면 용기에 표시된 성분표 자동 인식을 위한 인공지능 기반 스마트폰 애플리케이션)

  • Hieyong Jeong;Choonsung Shin
    • Journal of Korea Society of Industrial Information Systems / v.28 no.6 / pp.29-43 / 2023
  • For their health, consumers should read the ingredient lists of cosmetics or foods and purchase them only after checking whether they contain allergy-causing ingredients. This paper therefore aimed to develop an artificial intelligence-based smartphone application that automatically recognizes the ingredients displayed on curved containers and delivers them to consumers in an easy-to-understand manner. The app needs to let consumers immediately identify restricted ingredients by recognizing the ingredient words in a cropped image. Two major issues had to be solved during development. First, although some cosmetic and food containers are flat, most are curved, so the ingredient table displayed on curved containers had to be recognized. Second, since the ingredient words appear on a curved surface, distorted or line-wrapped words also had to be recognized. The proposed methods were sufficient to solve both problems. Various tests of the developed application verified that it had no problem recognizing the ingredient words on a cylindrical curved container.

Design of Heterogeneous Content Linkage Method by Analyzing Genbank (Genbank 분석을 통한 이종의 콘텐츠 연계 방안 설계)

  • Ahn, Bu-Young;Lee, Myung-Sun;Kim, Ji-Young;Oh, Chung-Shick
    • The Journal of the Korea Contents Association / v.10 no.6 / pp.49-54 / 2010
  • Because information on gene sequences is both diverse and extremely large in volume, high-performance computing and information technology techniques are required to build and analyze gene sequence databases. This has given rise to bioinformatics, a field of research in which computers are used to collect, manage, store, evaluate, and analyze biological data. In line with this continued development of bioinformatics, the Korea Institute of Science and Technology Information (KISTI) has built an information technology-based infrastructure for biological information and provided it to bioscience researchers. This paper analyzes the reference fields of Genbank, the gene database most frequently used by researchers worldwide among life information databases, and proposes a method of linking it to NDSL, the integrated science and technology information service provided by KISTI. To this end, after collecting Genbank data from the NCBI FTP site, we rebuilt the database by separating the Genbank text files into basic gene data and reference data. New tables were then generated by extracting the paper and patent information from the Genbank reference fields. Finally, we suggest how to connect these tables with the paper and patent databases operated by KISTI.
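The separation of a Genbank flat-file record into basic gene fields and REFERENCE blocks can be sketched with the format's fixed 12-column keyword layout. The truncated record below is a made-up example following that layout, not real Genbank data, and the parser ignores many real-world fields (FEATURES, ORIGIN, continuation lines).

```python
record = """LOCUS       AB000001      1200 bp    DNA     linear   PRI 01-JAN-2010
DEFINITION  Homo sapiens example gene, complete cds.
REFERENCE   1  (bases 1 to 1200)
  AUTHORS   Kim,J. and Lee,M.
  TITLE     An example paper title
  JOURNAL   J. Example Biol. 12 (3), 45-56 (2009)
REFERENCE   2  (bases 1 to 1200)
  AUTHORS   Park,S.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-JAN-2010)
//"""

def split_record(text):
    """Split one flat-file record into basic fields and REFERENCE blocks."""
    basic, refs, current = {}, [], None
    for line in text.splitlines():
        key, value = line[:12].strip(), line[12:].strip()
        if key == "REFERENCE":
            current = {"REFERENCE": value}
            refs.append(current)
        elif line.startswith("  ") and key and current is not None:
            current[key] = value            # AUTHORS / TITLE / JOURNAL
        elif key and key != "//":
            basic[key] = value              # LOCUS, DEFINITION, ...
            current = None
    return basic, refs

basic, refs = split_record(record)
print(len(refs), refs[0]["TITLE"])
```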

An Intelligence Support System Research on KTX Rolling Stock Failure Using Case-based Reasoning and Text Mining (사례기반추론과 텍스트마이닝 기법을 활용한 KTX 차량고장 지능형 조치지원시스템 연구)

  • Lee, Hyung Il;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.47-73 / 2020
  • A KTX rolling stock is a system consisting of many machines, electrical devices, and components, and its maintenance requires considerable expertise and experience. In the event of a rolling stock failure, the maintainer's knowledge and experience make a difference in how quickly and how well the problem is solved, and the resulting availability of the vehicle varies accordingly. Although problem solving is generally based on fault manuals, experienced and skilled professionals can diagnose faults and take action quickly by applying personal know-how. Since this knowledge exists in a tacit form, it is difficult to pass on completely to a successor, and previous studies have developed case-based rolling stock expert systems to turn it into data-driven knowledge. Nonetheless, research on the KTX rolling stock most commonly used on main lines, and on systems that extract the meaning of text and search for similar cases, is still lacking. This study therefore proposes an intelligent support system that provides an action guide for newly occurring failures by using the know-how of rolling stock maintenance experts as problem-solving examples. For this purpose, a case base was constructed by collecting the rolling stock failure data generated from 2015 to 2017, and an integrated dictionary covering the essential terminology and failure codes of the railway rolling stock sector was built separately from the case base. Given the deployed case base, a new failure is matched against past cases, and the three most similar failure cases are retrieved so that their actual actions can be proposed as a diagnostic guide.
To overcome the limitations of keyword-matching case retrieval in earlier case-based expert system studies on rolling stock failures, this study applied various dimensionality reduction measures that take into account the semantic relationships among failure details when calculating similarity, and verified their usefulness through experiments. Among the dimensionality reduction techniques, three algorithms, Non-negative Matrix Factorization (NMF), Latent Semantic Analysis (LSA), and Doc2Vec, were applied to extract the characteristics of each failure, and similar cases were retrieved by measuring the cosine distance between the resulting vectors. Precision, recall, and F-measure were used to assess the performance of the proposed actions. To compare the dimensionality reduction techniques, an analysis of variance confirmed that the performance differences among the five algorithms, including a baseline that randomly extracts failure cases with identical failure codes and a baseline that applies cosine similarity directly to the words, were statistically significant. In addition, optimal settings for practical application were derived by verifying how performance varies with the number of dimensions used for reduction. The analysis showed that direct cosine similarity outperformed NMF and LSA, and that the Doc2Vec-based algorithm performed best. In terms of dimensionality reduction, the larger the number of dimensions, up to an appropriate level, the better the performance.
Through this study, we confirmed the usefulness of effective methods for extracting the characteristics of data and converting unstructured data when applying case-based reasoning in the specialized field of KTX rolling stock, where most attributes are recorded as text. Text mining is being studied for use in many areas, but studies on text data are still lacking in environments like ours, with many specialized terms and limited access to data. In this regard, it is significant that this study first presented an intelligent diagnostic system that suggests actions by retrieving cases with text mining techniques that extract the characteristics of a failure, complementing keyword-based case search. We expect this to provide implications, as a basic study, for developing diagnostic systems that can be used immediately in the field.
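The retrieval step, ranking past failure cases by the cosine similarity of their text vectors and returning the top three, can be sketched with plain bag-of-words vectors (the cosine-on-words baseline; NMF/LSA/Doc2Vec would replace the vectors, not the ranking). The cases below are invented English stand-ins for the Korean failure logs.

```python
import math
from collections import Counter

cases = {
    "C1": "traction motor overheat alarm during acceleration",
    "C2": "brake pressure drop detected in trailer car",
    "C3": "motor overheat warning after long uphill acceleration",
    "C4": "door sensor fault at platform stop",
}

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query, k=3):
    """Return the k past cases most similar to a new failure description."""
    q = Counter(query.split())
    ranked = sorted(cases, reverse=True,
                    key=lambda c: cosine(q, Counter(cases[c].split())))
    return ranked[:k]

print(top_k("overheat alarm on traction motor"))  # C1 and C3 lead
```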

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia Pacific Journal of Information Systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have become available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents, and in this situation it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide users by facilitating the filtering process. A set of keywords is thus often considered a condensed version of the whole document and plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents have not benefited from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, implementation itself is the obstacle: manually assigning keywords to all documents is a daunting, even impractical task, being extremely tedious and time-consuming and requiring a certain level of domain knowledge. It is therefore highly desirable to automate the keyword generation process. There are two main approaches to this aim: the keyword assignment approach and the keyword extraction approach. Both use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former, there is a given vocabulary, and the aim is to match its terms to the texts; that is, the keyword assignment approach selects the words from a controlled vocabulary that best describe a document.
Although this approach is domain dependent and not easy to transfer or expand, it can generate implicit keywords that do not appear in a document. In the latter approach, on the other hand, the aim is to extract keywords with respect to their relevance in the text, without a prior vocabulary. Here automatic keyword generation is treated as a classification task, and keywords are commonly extracted with supervised learning techniques: keyword extraction algorithms classify candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as its keywords, so extraction is limited to terms that appear in the document; keyword extraction cannot generate implicit keywords that are not included in a document. According to Turney's experimental results, about 64% to 90% of author-assigned keywords can be found in the full text of an article. Conversely, this means that 10% to 36% of author-assigned keywords do not appear in the article and cannot be generated by keyword extraction algorithms. Our own preliminary experiment also shows that 37% of author-assigned keywords are not included in the full text. This is why we adopted the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment, IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets.
The keyword assignment process of IVSM is as follows: (1) calculate the vector length of each keyword set based on each keyword's weight; (2) preprocess and parse a target document that has no keywords; (3) calculate the vector length of the target document based on term frequency; (4) measure the cosine similarity between each keyword set and the target document; and (5) generate the keywords with high similarity scores. Two keyword generation systems were implemented applying IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is embedded in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone system is dedicated to generating keywords for academic papers and has been tested on a number of papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. In our experiments, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability, and IVSM shows performance comparable to Extractor, a representative keyword extraction system developed by Turney. As electronic documents increase, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
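Steps (1)-(5) above can be sketched as follows: each candidate keyword set is a weighted term vector, the target document becomes a term-frequency vector, and the keywords come from the most similar set. The keyword sets and document below are hypothetical illustrations, not the paper's corpora or weights.

```python
import math
from collections import Counter

# hypothetical pre-weighted keyword sets (step 1 data)
keyword_sets = {
    "logistics": {"port": 3, "shipping": 2, "cargo": 2, "freight": 1},
    "health":    {"diet": 3, "exercise": 2, "nutrition": 2},
}

def cosine(weights, tf):
    """Cosine similarity between a weighted keyword vector and a tf vector."""
    dot = sum(w * tf.get(t, 0) for t, w in weights.items())
    n1 = math.sqrt(sum(w * w for w in weights.values()))
    n2 = math.sqrt(sum(f * f for f in tf.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def assign_keywords(document, top=1):
    tf = Counter(document.lower().split())      # steps 2-3: parse + term freq.
    ranked = sorted(keyword_sets,               # step 4: cosine per keyword set
                    key=lambda s: cosine(keyword_sets[s], tf), reverse=True)
    best = ranked[:top]                         # step 5: highest-similarity sets
    return [kw for s in best for kw in keyword_sets[s]]

print(assign_keywords("shipping volume at the port rose as cargo demand grew"))
```

Note the inversion relative to ordinary retrieval: the keyword sets, not the documents, populate the vector space, and the document acts as the query.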