• Title/Summary/Keyword: Text data

Descriptor Profiling for Research Domain Analysis (연구영역분석을 위한 디스크립터 프로파일링에 관한 연구)

  • Kim, Pan-Jun;Lee, Jae-Yun
    • Journal of the Korean Society for Information Management, v.24 no.4, pp.285-303, 2007
  • This study explores a new technique that links controlled and uncontrolled vocabularies in a complementary way for analyzing a research domain. Co-word analysis can be broadly divided into two types according to the vocabulary used: controlled and uncontrolled. With controlled vocabulary, data sparseness and the indexer effect are inherent drawbacks. With uncontrolled vocabulary, the drawbacks are word selection shaped by the author's perspective and word ambiguity. To let the two complement each other, we suggest descriptor profiling, which represents each descriptor (controlled vocabulary) by its co-occurrence with words from the text (uncontrolled vocabulary). Applying the profiling to the domain of information science suggests that this method can reduce the inherent shortcomings of both controlled and uncontrolled vocabulary.
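
The profiling idea in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `build_descriptor_profiles`, the whitespace tokenizer, and the toy records are all hypothetical.

```python
from collections import Counter, defaultdict


def build_descriptor_profiles(records):
    """Map each descriptor to a Counter of the text words it co-occurs with.

    `records` is an iterable of (descriptors, text) pairs, where descriptors
    are controlled-vocabulary terms assigned to a document and text is the
    document's free text (for example, its title).
    """
    profiles = defaultdict(Counter)
    for descriptors, text in records:
        words = text.lower().split()  # naive tokenizer, an assumption
        for descriptor in descriptors:
            profiles[descriptor].update(words)
    return dict(profiles)


# Hypothetical toy records, not data from the paper.
records = [
    (["Information retrieval"], "query expansion for web search"),
    (["Information retrieval", "Bibliometrics"], "citation analysis of search research"),
]
profiles = build_descriptor_profiles(records)
```

Each descriptor is then described by a word-frequency profile, so two descriptors can be compared through the text words they share even when they rarely co-occur with each other directly.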

The Epigraph Reading Method using a Visualization Technique based on Morphological Characteristics of the Letters (각자된 글자의 형태적 시각화를 이용한 금석문 판독방법)

  • Choi, Won-Ho;Ko, Sun-Woo
    • The Journal of the Korea Contents Association, v.17 no.1, pp.740-749, 2017
  • An epigraph is a text or picture engraved on metal or stone. Rubbing has long been used in the epigraphic field to read ancient inscriptions, and one of its advantages is its simplicity. However, rubbing is not an optimal method for decoding inscribed characters in terms of resolution and noise. In this study, we propose a new method that increases the likelihood of a successful reading by reflecting the 3D characteristics of the engraved letters. The proposed technique applies 3D scanning to obtain high-quality three-dimensional data for each letter in the epigraph and uses Ambient Occlusion visualization to render shading according to the 3D form of the letters. The results improve the readability of the letters by removing damaged and worn information from the surface data. This research helps narrow the candidates for a particular letter and supports the reading of controversial letters on the Pohang Jungseongri Silla Stone Monument (Korea's National Treasure No. 318).

A Path Storing and Number Matching Method for Management of XML Documents using RDBMS (RDBMS를 이용하여 XML 문서 관리를 위한 경로 저장과 숫자 매칭 기법)

  • Vong, Ha-Ik;Hwang, Byung-Yeon
    • Journal of Korea Multimedia Society, v.10 no.7, pp.807-816, 2007
  • Since the W3C proposed XML in 1996, XML has spread widely across Internet documents. As a result, the need for XML-related research is increasing. In particular, XML management systems for storing, retrieving, and managing XML documents are being actively studied. Among these studies, XRel is a representative approach to XML management and has become a common baseline for comparison. In this study, we propose an XML document management system based on a relational database management system (RDBMS). Unlike XRel, which stores all possible path expressions, this system stores only the filtered path expressions that have a text value or an attribute value. In addition, each node is assigned a Node Expression Identifier, and queries are processed by matching against the given Node Expression Identifiers. Finally, to demonstrate the efficiency of the proposed technique, this paper presents experimental results comparing XPath query processing performance between the proposed system and the existing technique, XRel.
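
The path-filtering step in this abstract can be illustrated with a short Python sketch. It is a rough approximation under stated assumptions, not the paper's system: the function name `filtered_paths`, the `@name` notation for attributes, and the sample document are hypothetical. The Node Expression Identifier scheme is not specified in the abstract and is left out.

```python
import xml.etree.ElementTree as ET


def filtered_paths(xml_text):
    """Return (path, value) rows only for nodes that carry a text or attribute value."""
    root = ET.fromstring(xml_text)
    rows = []

    def walk(elem, parent_path):
        current = f"{parent_path}/{elem.tag}"
        text = (elem.text or "").strip()
        if text:
            # Element with a text value: keep its path.
            rows.append((current, text))
        for name, value in elem.attrib.items():
            # Attribute value: keep it under a hypothetical "@name" step.
            rows.append((f"{current}/@{name}", value))
        for child in elem:
            walk(child, current)

    walk(root, "")
    return rows


# Hypothetical sample document.
doc = '<book id="b1"><title>XML</title><author><name>Kim</name></author></book>'
rows = filtered_paths(doc)
```

Purely structural paths such as `/book/author` produce no row here, which is the kind of saving the abstract describes. In a relational setting, each returned row could be inserted into a path table.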

Reconstructing Web Broadcasting Information based on User Retrieval Pattern (무선 환경에서 사용자 검색 성향을 반영한 웹 방송 정보 재구성 기법)

  • Kim, Won-Cheol;Lee, Soo-Cheol;Hwang, Een-Jun;Byeon, Kwang-Jun
    • The KIPS Transactions: Part D, v.11D no.5, pp.1149-1158, 2004
  • Today, the fastest-growing community of web users is mobile visitors who browse web pages with wireless PDAs and cellular phones. However, most web pages are optimized exclusively for desktop clients on broadband networks and are inconvenient for users of small-screen mobile devices. Such devices display only a few lines of text and cannot run client-side programs or scripts due to a lack of system resources. Even worse, their connections are usually too slow to support most data-intensive applications. In this paper, we propose a pageslet scheme that makes it feasible to browse ordinary web pages on small-screen mobile devices. It extracts the broadcasting sections that match user preferences from broadcasting web pages and automatically reorganizes the extracted sections for convenient browsing on mobile devices.

A Multimedia Mail System using IMAP Protocol (IMAP 프로토콜을 이용한 멀티미디어 메일 시스템)

  • Lee, Bong-Hwan;Park, Mun-Ho;Lee, Ha-Uk;Ju, Gi-Ho;Lee, Chan-Do;Lee, Nam-Jun;Sim, Yeong-Jin
    • The Transactions of the Korea Information Processing Society, v.4 no.5, pp.1297-1307, 1997
  • This paper presents a multimedia mail system that transmits and receives multimedia mail messages on the Internet. The mail system extends the existing e-mail system to multimedia, including text, image, MPEG video, and binary data. The MIME (Multipurpose Internet Mail Extensions) format, which is an extension of the RFC 822 mail format, is used to represent multimedia, and SMTP (Simple Mail Transfer Protocol) is used as the mail transport protocol. IMAP (Internet Mail Access Protocol), which provides more functions than the widely used POP (Post Office Protocol), is used as the mailbox retrieval protocol. The mail client is implemented on a multimedia PC, while the server is implemented on a UNIX system. In the mail system, a sending program allows a user to attach binary files such as PostScript files and MPEG-compressed video, while a receiving program provides a direct interface to application programs that play back received multimedia mail messages.
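
To show how MIME represents a message with a video attachment, here is a minimal sketch using Python's standard `email` package. It is not the system described in the paper. The addresses, file name, and payload bytes are hypothetical placeholders, and the message is only constructed, not sent over SMTP.

```python
from email.message import EmailMessage

# Build a message with a plain-text body (all header values are placeholders).
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "receiver@example.com"
msg["Subject"] = "Multimedia mail demo"
msg.set_content("Please see the attached video clip.")

# Placeholder bytes standing in for real MPEG data.
video_bytes = b"\x00\x00\x01\xba placeholder"

# Adding an attachment converts the message to multipart/mixed and
# labels the new part with the MIME type video/mpeg.
msg.add_attachment(video_bytes, maintype="video", subtype="mpeg", filename="clip.mpg")

attachments = list(msg.iter_attachments())
```

A receiving program can inspect each part's content type (here `video/mpeg`) to decide which application should play it back, which matches the role the abstract gives to the receiving side.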

A Study on the Copyright Contents of Medical Library Web Site (국내 의과대학도서관 웹사이트에 나타난 저작권관련사항에 관한 연구)

  • Choi, Hung-Sik;Yoon, Mi-Hui
    • Journal of Korean Library and Information Science Society, v.37 no.4, pp.371-389, 2006
  • The purpose of this study is to encourage librarians to construct databases lawfully and to help users understand copyright and fair use. Based on information from their web sites, 32 domestic medical libraries were investigated to analyze how they deal with copyright issues. The results show that only 28% of the medical libraries provide users with copyright information on their web sites. The terminology used differed among libraries: 44.4% of the libraries use 'copyright law' and 33.3% use 'copyright'. The terminology appeared in the 'notice' menu and the 'library guidance' menu. Detailed copyright content was found under 'the data copy and transmission', 'full text DB use', 'thesis and dissertation service', 'FAX/file transfer', 'Image(Ariel)', and 'interlibrary loan'. Overall, there is a need for a correct understanding of copyright and fair use.

A Spelling Error Correction Model in Korean Using a Correction Dictionary and a Newspaper Corpus (교정사전과 신문기사 말뭉치를 이용한 한국어 철자 오류 교정 모델)

  • Lee, Se-Hee;Kim, Hark-Soo
    • The KIPS Transactions: Part B, v.16B no.5, pp.427-434, 2009
  • With the rapid evolution of the Internet and mobile environments, text containing spelling errors such as newly coined words and abbreviated words is widely used. These spelling errors make it difficult to develop NLP (natural language processing) applications because they decrease the readability of texts. To resolve this problem, we propose a spelling error correction model that uses a spelling error correction dictionary and a newspaper corpus. The proposed model has the advantage that the cost of data construction is low because it uses a newspaper corpus, which is easy to obtain, as a training corpus. In addition, the proposed model does not require additional external modules such as a morphological analyzer or a word-spacing error correction system because it uses a simple string matching method based on a correction dictionary. In experiments with a newspaper corpus and a short message corpus collected from real mobile phones, the proposed model showed good performance on various evaluation measures (a miss-correction rate of 7.3%, an F1-measure of 97.3%, and a false positive rate of 1.1%).
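
The dictionary-based string matching mentioned in this abstract can be sketched as follows. This is a simplified illustration, not the proposed model: it ignores the role of the newspaper training corpus entirely, and the function name `correct`, the dictionary entries, and the longest-key-first ordering are assumptions made for the example.

```python
def correct(text, correction_dict):
    """Replace known misspellings by plain substring matching.

    Longer dictionary keys are tried first so that a short key does not
    break up a longer one (an assumed tie-breaking rule, not from the paper).
    """
    for wrong in sorted(correction_dict, key=len, reverse=True):
        text = text.replace(wrong, correction_dict[wrong])
    return text


# Hypothetical correction dictionary: misspelling -> corrected form.
correction_dict = {"gr8": "great", "thx": "thanks"}

corrected = correct("thx, that was gr8", correction_dict)
```

Because the matching works on raw strings, no morphological analyzer or word-spacing module is involved, which is the property the abstract highlights. A real system would also need to guard against keys that match inside correctly spelled words.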

Implementing Biological Network Analysis System through Oriental Medical Literature Analysis (한의학 분야 문헌 분석을 통한 생물학적 네트워크 분석시스템 개발)

  • Yu, Seok Jong;Cho, Yongseong;Lee, Junehawk;Seo, Dongmin;Yea, Sang-Jun;Kim, Chul
    • The Journal of the Korea Contents Association, v.15 no.10, pp.616-625, 2015
  • Currently, oriental medicine research focuses on applying modern research technology and validating its various biochemical effects by combining it with molecular biology techniques. However, there are few search systems for finding the biochemical mechanisms related to the major compounds in oriental medicine. In this research, we aimed to develop a Korean herb database based on a text-mining system by analyzing PubMed data. We have developed a prototype system for searching chemical, gene, and biological relations in oriental medicine. It characterizes modern oriental medicine research trends using information on major chemicals, genes, and proteins. Analysis results can be searched in the prototype system with visualization of the biological interactions.

Research on Designing Korean Emotional Dictionary using Intelligent Natural Language Crawling System in SNS (SNS대상의 지능형 자연어 수집, 처리 시스템 구현을 통한 한국형 감성사전 구축에 관한 연구)

  • Lee, Jong-Hwa
    • The Journal of Information Systems, v.29 no.3, pp.237-251, 2020
  • Purpose: This research studied a hierarchical Hangul emotion index by organizing the emotions that SNS users have in mind. In a preliminary study by the researcher, the English-based emotional standard of Plutchik (1980) was reinterpreted in Korean, and hashtags with implicit meaning on SNS were studied. To build a multidimensional emotion dictionary and classify three-dimensional emotions, emotion seeds were selected to compose seven emotion sets, and an emotion word dictionary was constructed by collecting the SNS hashtags derived from each emotion seed. We also explore the priority of each Hangul emotion index. Design/methodology/approach: In transforming the words of each sentence into a vector matrix, weights were extracted using TF-IDF (Term Frequency-Inverse Document Frequency), and the NMF (Nonnegative Matrix Factorization) algorithm was used to reduce the dimensions of the matrix in each emotion set. The emotional dimension was resolved using the characteristic values of the emotion words. The cosine distance algorithm was used to measure the distance between vectors, and thereby the similarity of emotion words within an emotion set. Findings: Customer needs analysis is a means of reading changes in emotions, and research on Korean emotion words serves those customer needs. In addition, the ranking of emotion words within an emotion set can serve as a criterion for reading the depth of an emotion. By providing companies with effective information for emotional marketing, the sentiment index in this research is expected to expand new business opportunities and create value. In addition, if the emotion dictionary is eventually connected to products, it will be possible to define the "emotional DNA", the set of emotions that a product should have.
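
Two of the steps named in this abstract, TF-IDF weighting and cosine-based comparison, can be sketched in plain Python. This is an illustrative approximation only: the NMF dimension-reduction step is omitted, the particular TF-IDF variant (relative term frequency times log inverse document frequency) is an assumption, and the toy hashtag sets are hypothetical. Cosine distance, as named in the abstract, is one minus the similarity computed here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per non-empty tokenized document."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical hashtag sets, not data from the study.
docs = [
    ["happy", "joy", "smile"],
    ["happy", "joy", "laugh"],
    ["angry", "rage", "shout"],
]
vecs = tfidf_vectors(docs)
sim_close = cosine_similarity(vecs[0], vecs[1])
sim_far = cosine_similarity(vecs[0], vecs[2])
```

In this toy data the first two sets share weighted terms and the third shares none, so `sim_close` is larger than `sim_far`. Ranking words or sets by such scores is one way the within-set ordering described in the abstract could be approximated.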

Voice Command Web Browser Using Variable Vocabulary Word Recognizer (가변어휘 단어 인식기를 사용한 음성 명령 웹 브라우저)

  • 이항섭
    • The Journal of the Acoustical Society of Korea, v.18 no.2, pp.48-52, 1999
  • In this paper, we describe a Voice Command Web Browser that uses a variable vocabulary word recognizer to enable Internet surfing with Korean speech recognition. The main feature of this browser is that it can operate the links and menus of the web browser by speech, so a speech interface can be used together with the mouse for web browsing. Because the recognition candidates change dynamically with each web page, we use a variable vocabulary word recognizer. The recognizer was trained on the 3,848-word POW (Phonetically Optimized Words) set so that it can recognize new words that did not exist in the training data. Preliminary test results showed that speaker-independent and vocabulary-independent recognition performance is 93.8% for 32 Korean words. The Voice Command Web Browser was developed on Windows 95/NT using Netscape Navigator and incorporates usability test results in order to offer an easy interface to users unfamiliar with speech interfaces. In an on-line experiment under speaker-independent and environment-independent conditions, the Voice Command Web Browser showed a recognition accuracy of 90%.
