• Title/Summary/Keyword: Text line information

Cluster-Based Selection of Diverse Query Examples for Active Learning (능동적 학습을 위한 군집화 기반의 다양한 복수 문의 예제 선정 방법)

  • Kang, Jae-Ho;Ryu, Kwang-Ryel;Kwon, Hyuk-Chul
    • Journal of Intelligence and Information Systems / v.11 no.1 / pp.169-189 / 2005
  • To derive a better classifier from a limited number of training examples, active learning alternates between a querying stage for category labeling and a learning stage that rebuilds the classifier with the newly expanded training set. To relieve the user of the labeling burden, especially in an on-line environment, it is important to minimize the number of querying steps as well as the total number of query examples. A good classifier can be derived in a small number of querying steps, using only a small number of examples, if multiple diverse, representative, and ambiguous examples can be selected and presented to the user at each querying step. In this paper, we propose a cluster-based batch query selection method that selects diverse, representative, and highly ambiguous examples for efficient active learning. Experiments with various text data sets show that our method derives a better classifier than other methods that use ambiguity as the only criterion for selecting multiple query examples.
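
The idea of combining ambiguity with cluster-based diversity can be illustrated with a minimal sketch. It is not the authors' algorithm: the names `kmeans`, `select_batch`, and `pool_factor`, the ambiguity formula, and the use of plain k-means are all assumptions made for illustration, and the "representative" criterion from the abstract is not modeled.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on lists of floats; returns a cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        for i, p in enumerate(points):
            assign[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Move each center to the mean of its members; leave it if the cluster is empty.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def select_batch(features, probs, batch_size, pool_factor=3):
    """Pick up to `batch_size` pool indices that are ambiguous and spread over clusters."""
    # Assumed ambiguity score: 1.0 when the positive-class probability is 0.5,
    # 0.0 when it is 0 or 1.
    ambiguity = [1.0 - abs(2.0 * p - 1.0) for p in probs]
    # Keep only the most ambiguous candidates before clustering.
    ranked = sorted(range(len(probs)), key=lambda i: ambiguity[i], reverse=True)
    candidates = ranked[: batch_size * pool_factor]
    assign = kmeans([features[i] for i in candidates], batch_size)
    # From each non-empty cluster, query its most ambiguous member.
    batch = []
    for c in range(batch_size):
        members = [candidates[j] for j in range(len(candidates)) if assign[j] == c]
        if members:
            batch.append(max(members, key=lambda i: ambiguity[i]))
    return batch
```

Taking one example per cluster keeps a batch from consisting of near-duplicates that all sit at the same point on the decision boundary, which is the weakness of selecting on ambiguity alone.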

A New Vocoder based on AMR 7.4Kbit/s Mode for Speaker Dependent System (화자 의존 환경의 AMR 7.4Kbit/s모드에 기반한 보코더)

  • Min, Byung-Jae;Park, Dong-Chul
    • The Journal of Korean Institute of Communications and Information Sciences / v.33 no.9C / pp.691-696 / 2008
  • This paper proposes a new Code Excited Linear Predictive (CELP) vocoder based on the Adaptive Multi Rate (AMR) 7.4 kbit/s mode. The proposed vocoder achieves a better compression rate in a Speaker Dependent Coding System (SDSC) environment and can be used efficiently in systems that need only one person's speech, such as OGM (Outgoing Message) and TTS (Text To Speech). To enhance the compression rate of the coder, a new Line Spectral Pairs (LSP) codebook is built using the Centroid Neural Network (CNN) algorithm. Compared with the original AMR 7.4 kbit/s coder, the new coder shows a 27% higher compression rate while preserving synthesized speech quality in terms of Mean Opinion Score (MOS).

A Validation of Effectiveness for Intrusion Detection Events Using TF-IDF (TF-IDF를 이용한 침입탐지이벤트 유효성 검증 기법)

  • Kim, Hyoseok;Kim, Yong-Min
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.6 / pp.1489-1497 / 2018
  • Web application services have diversified, and research on intrusion detection continues in response to the surge in cyber threats. As single-defense systems evolve into multi-level security, specific intrusions are handled by correlating the now vast volume of security events. However, it is difficult to check the OS, services, and web application type and version of the target system in real time, and intrusion detection events raised by network-based security devices cannot confirm the vulnerability of the target system or the success of the attack. A blind spot can therefore arise for threats that are not analyzed for problems and associativity. In this paper, we propose a method for validating the effectiveness of intrusion detection events using TF-IDF. The proposed scheme extracts the response traffic by mapping the target system's response to the corresponding attack. The response traffic is then divided into lines, and each line is given a TF-IDF weight. Valid intrusion detection events are identified by sequentially examining the lines with high weights.
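
The line-weighting step can be sketched as follows. The abstract does not say how the weights are computed, so everything here is an assumption: the hypothetical `rank_lines` treats each line of one response as a document, tokenizes it with a simple regular expression, and scores it by the sum of length-normalized term frequency times a smoothed inverse document frequency.

```python
import math
import re
from collections import Counter

def rank_lines(response_text):
    """Return (weight, line) pairs sorted by descending TF-IDF weight."""
    lines = [ln.strip() for ln in response_text.splitlines() if ln.strip()]
    tokenized = [re.findall(r"[a-z0-9_]+", ln.lower()) for ln in lines]
    n = len(lines)
    # Document frequency: the number of lines in which each token appears.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    scored = []
    for ln, toks in zip(lines, tokenized):
        tf = Counter(toks)
        # Line weight: sum over tokens of (relative term frequency * smoothed IDF).
        weight = sum(
            ((cnt / len(toks)) * math.log((1 + n) / (1 + df[tok])) for tok, cnt in tf.items()),
            0.0,
        )
        scored.append((weight, ln))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Under this scoring, lines that repeat throughout a response (boilerplate markup, for instance) sink to the bottom, and lines made of tokens seen nowhere else rise, so an analyst or a rule set would examine the top of the list first.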

Automatic indexing as a subject analysis technique (주제분석기법으로서의 자동색인)

  • 이영자
    • Journal of Korean Library and Information Science Society / v.12 / pp.61-96 / 1985
  • Human subject analysis of a document has some critical problems. It results in inconsistency in the analysis process and in a contradiction between the two objectives of subject analysis: identifying content for the retrieval of specific items, and identifying content for the grouping of related materials. Since mechanized subject analysis has been recognized as a possible way to address the problems of manual analysis, various approaches to automatic indexing have been studied and tested. This study examines automatic indexing as one of the promising subject analysis techniques, covering statistical, syntactic, and semantic approaches. In conclusion, the appropriate time to apply automatic indexing should be decided on the basis of a thorough investigation of cost versus effectiveness, and automatic indexing systems should be developed in close relationship with on-line search, which is a good retrieval system for an information-explosion society. Since machine-readable document text is expected to become more and more available with the rapid development of computer technology, more substantial research on automatic indexing will also become possible, which can lead to an increase in practical automatic indexing systems.

Development of Classification Model for Healthcare Contents on the Online Community (온라인 커뮤니티에서의 건강 관련 콘텐츠 분류 모형 개발)

  • Kim, Tae-Yun;Kim, Yoo-Sin;Choi, Sang-Hyun;Kim, Do-Hun;Chang, You-Jin
    • The Journal of Information Systems / v.26 no.4 / pp.285-301 / 2017
  • Purpose: In this paper we verified the reliability of healthcare-related information provided by various users on Naver Jisikin, a typical Korean search platform. Based on Q&A contents, we validated the reliability of answers to questions about lung cancer with the help of professors at a medical school. Design/methodology/approach: In the content analysis, questions are classified into types such as symptom/diagnosis, therapy, prognosis, and after-management. The answers contain advice, advertisement, oriental medicine, and religion in addition to the above five question categories. The validation of the medical evidence for each answer shows that only 49% of all answers have medical grounds. Findings: We classified the medically grounded answers into three levels: high, medium, and low. Among all answers, those containing advertisements need to be identified because they can be harmful to patients. Using text mining, we developed a method for selecting the answers that contain advertising content. The selection model achieves a high classification accuracy of 84%.

E-mail System Providing Integrated User's View for the Message containing Image and Text (이미지와 텍스트 메시지의 통합 사용자 뷰를 제공하는 전자 우편 시스템)

  • Dok-Go, Se-Jun;Lee, Taek-Gyun;Lee, Hyeong-U;Yun, Seong-Hyeon;Lee, Seong-Hwan;Kim, Chang-Heon;Kim, Tae-Yun
    • The Transactions of the Korea Information Processing Society / v.4 no.2 / pp.563-572 / 1997
  • E-mail has been widely used for information delivery as an Internet service. As multimedia technologies develop rapidly, most recent Internet information services support multimedia data, and E-mail systems also need to support multimedia messages. However, Internet mail service using the Simple Mail Transfer Protocol (SMTP) specified in RFC 821/822 handles only ASCII text messages represented in 7-bit code, and each line of a message has a length limitation as well. This is why it cannot satisfy diverse user demands. Multipurpose Internet Mail Extensions (MIME), a modification and supplement of RFC 822, was proposed to support the transport of multimedia data. It removes the limitations on the size and type of message contents. In this study, an E-mail system was designed and implemented according to the MIME standard in order to remove the limitations on message transport regardless of the message content type. Hypertext Markup Language (HTML) syntax is applied to the mail system, so a message consisting of different media can be displayed in an integrated form for better understanding. No application program is needed to display a message that includes image data, and user convenience is considered in the system. Future work is to improve the E-mail system so that it supports motion pictures and sound, thereby developing a complete multimedia E-mail system that provides an integrated user's view.
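
A MIME message that carries text and an image as one HTML view can be sketched with Python's standard `email` package. This is a present-day illustration of the message structure, not the 1997 system described above; the function name `build_message`, the subject line, the PNG subtype, and the `example.invalid` domain are assumptions.

```python
import html
from email.message import EmailMessage
from email.utils import make_msgid

def build_message(sender, recipient, text, image_bytes):
    """Build a multipart message whose HTML part embeds the image inline."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Text and image in one view"
    # Plain-text fallback for clients that cannot render HTML.
    msg.set_content(text)
    # Content-ID used by the HTML body to reference the image part.
    cid = make_msgid(domain="example.invalid")  # looks like "<...@example.invalid>"
    body_html = (
        f"<html><body><p>{html.escape(text)}</p>"
        f'<img src="cid:{cid[1:-1]}"></body></html>'
    )
    msg.add_alternative(body_html, subtype="html")
    # Attach the image to the HTML part so the "cid:" reference resolves;
    # the library base64-encodes the bytes by default.
    msg.get_payload()[1].add_related(image_bytes, "image", "png", cid=cid)
    return msg
```

The result is a `multipart/alternative` message whose second part is `multipart/related`, holding the HTML body and the image it references, so a mail reader can show both without launching a separate viewer.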

Development of Intelligent OCR Technology to Utilize Document Image Data (문서 이미지 데이터 활용을 위한 지능형 OCR 기술 개발)

  • Kim, Sangjun;Yu, Donghui;Hwang, Soyoung;Kim, Minho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.05a / pp.212-215 / 2022
  • In today's era of so-called digital transformation, the need to build and use big data has increased in various fields. Today a great deal of data is produced and stored in forms suited to digital devices and media, but for a long time in the past the production and storage of data was dominated by printed books. Optical Character Recognition (OCR) technology is therefore needed to use the vast amount of printed books accumulated over that time as big data. In this study, we propose a system for digitizing the structure and content of document objects inside a scanned book image. The proposed system consists largely of the following three steps: 1) recognition of area information by document object (table, equation, picture, text body) in the scanned book image; 2) OCR processing of each area by the text body, table, and formula modules according to the recognized document object areas; 3) gathering the processed document information and returning it in JSON format. The model proposed in this study uses an open-source project with additional training and improvements. The intelligent OCR system proposed in this study showed performance at the level of commercial OCR software in processing four types of document objects (table, equation, image, text body).
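
Steps 2 and 3 of such a pipeline, dispatching each recognized area to a module and returning JSON, might be shaped roughly as below. The abstract does not describe the system's data model, so the `Region` class, the `process_page` function, the `recognizers` mapping, and the JSON layout are all hypothetical; the recognizers themselves are left to the caller.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Region:
    kind: str          # assumed labels: "table", "equation", "picture", "text"
    bbox: tuple        # (x0, y0, x1, y1) in pixels
    content: str = ""  # filled in by the matching recognizer, if any

def process_page(regions, recognizers):
    """Run the recognizer registered for each region kind, then serialize to JSON."""
    for region in regions:
        recognize = recognizers.get(region.kind)
        if recognize is not None:  # kinds without a recognizer keep empty content
            region.content = recognize(region.bbox)
    return json.dumps({"regions": [asdict(r) for r in regions]}, ensure_ascii=False)
```

A caller would pass the output of the area-recognition step as `regions` and a dictionary such as `{"text": ocr_text, "table": ocr_table}` (both hypothetical callables taking a bounding box) as `recognizers`.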

Design and Implementation of Geographic Education Website Based on the Google Earth (구글어스 기반의 지리교육 사이트 설계 및 구현)

  • Lee, Sun-Ju;Kang, Young-Ok
    • Spatial Information Research / v.18 no.2 / pp.13-24 / 2010
  • The purpose of this research is to explore the possibilities of geographic education by implementing a map-based geographic education site mashed up with Google Earth, drawing on the various geographic education materials that exist on-line and off-line. In recent years map-based geographic education has been called for by the radical change in the geoweb environment, but there has been little research in this field. The research proceeded as follows. First, we designed the contents through textbook analysis and then collected various data related to the contents, such as pictures, video clips, and conceptual maps, that are required to explain the concepts. Second, we mashed up the collected data on Google Earth using Google's open API. Third, we implemented the geographic education website based on the classification of contents in the textbook and the various collected data. This research is important both in that it explores the possibility of map-based rather than text-based education in geography, a field that mainly deals with space, and in that it seeks the best method of expressing the various concepts of the textbook in the geoweb environment.

Improvement of the Readability for Text using Graphic Software - Laying Stress on Anti-Aliasing in Digital Media - (그래픽 소프트웨어를 활용한 문자가독성 개선 - 디지털미디어 환경의 앤티에일리어싱(Anti-Aliasing)을 중심으로 -)

  • Kim, Yong-Chul;Kim, Un-Chong
    • The Journal of the Korea Contents Association / v.8 no.12 / pp.141-150 / 2008
  • The purpose of this study is to find a way of enhancing readability with anti-aliasing in the digital media environment. The subjects of this paper were established by analyzing anti-aliasing and multiple anti-aliasing. First, the study examines the actual state of anti-aliasing use in the digital media environment. Samples of multiple anti-aliasing were then gathered from the Internet. Readability-enhanced results were produced from that information and tested to find the best result for readability. This study presents a guideline for designers working in the field to find the best readability with multiple anti-aliasing. It also shows the potential of multiple anti-aliasing for readability in the next version of graphic software.

Analysis of Business Overview and use of 'C'group's Internet phone of National Information and Communication Services (국가정보통신서비스의 'C'그룹 인터넷전화 사업현황과 이용 분석)

  • Shin, Jin;Park, Dea-Woo
    • Journal of the Korea Institute of Information and Communication Engineering / v.15 no.11 / pp.2391-2398 / 2011
  • The National Information and Communication Services of Public Administration and Security are organized into the 'A' group (line service network), the 'B' group (IP service network), and the 'C' group (Voice over Internet Protocol (VoIP) service, IP application services), and are provided by constructing the infrastructure. The 'C' group providers offer VoIP services. In this paper, we study the basic telephone services of the 'C' group providers, including domestic calls, international calls, and calls to mobile phones. We also study seven value-added services, such as text messaging, video telephony, and IP-Centrex services. In addition, based on the business status of the service providers, we analyze the types of Internet telephony use on the national information and communication Internet telephone network. This study will serve as a basis for the development of the national information and communication services industry.