• Title/Summary/Keyword: Indexing Language

Improving Indexing Performance by using Occurrence Pattern Information of Proper Nouns (고유 명사 출현 패턴을 이용한 색인의 성능 향상에 관한 연구)

  • Jung, Rae-Jung;Kim, Jun-Tae
    • Annual Conference on Human and Language Technology / 1996.10a / pp.68-72 / 1996
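  • This paper proposes a method for indexing unregistered proper nouns using proper-noun occurrence pattern information and supplementary information. In information retrieval systems, the handling of proper nouns is very important for accurate and meaningful indexing. The paper presents a method that raises the accuracy of extracting proper nouns such as person names, organization names, and company names by applying proper-noun occurrence patterns and pattern supplementary information to the results of morphological analysis. In an experiment with the system on 100 articles (7,416 eojeols) from the economy section of the Chosun Ilbo, containing a total of 827 person, organization, and company names, precision for person names was 89%. Applying the proposed occurrence patterns and supplementary proper-noun information greatly reduced proper-noun extraction errors compared with plain morphological analysis results.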

Indexing and Ranking Mathematical Equations Using Postfix Notation (후위 표기법을 사용한 수학식 색인 및 랭킹)

  • Lee, Sehee;Shin, Junsoo;Kim, Harksoo
    • Annual Conference on Human and Language Technology / 2009.10a / pp.160-164 / 2009
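  • With the recent spread of Internet and computer use, documents are being digitized rapidly. As a result, there is a growing need to search documents in fields such as science, engineering, and mathematics, where mathematical expressions are used heavily. However, current general-purpose search engines offer only text search and provide no dedicated search for mathematical expressions. This paper therefore proposes an indexing method and a ranking method that make mathematical expressions searchable. The proposed indexing method indexes mathematical expressions given in MathML in two ways, by postfix notation and by a conventional indexing method, and uses a language model to rank the expressions that fit a query. For comparison with the performance of a general search engine, the proposed model was compared against the 2-Poisson model, and the results showed that the proposed model performs better.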
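
The postfix step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes the expression has already been tokenized into infix tokens; the paper's MathML parsing, its index layout, and its language-model ranking are not reproduced here. The operator table and the function name `to_postfix` are hypothetical choices for this illustration, and the conversion uses the standard shunting-yard algorithm, which may differ from the authors' procedure.

```python
# Operator precedence and associativity for a small, assumed operator set.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOCIATIVE = {"^"}


def to_postfix(tokens):
    """Convert a list of infix tokens to postfix order (shunting-yard)."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that must be applied before the incoming one.
            while stack and stack[-1] in PRECEDENCE and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[tok]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[tok]
                    and tok not in RIGHT_ASSOCIATIVE)
            ):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # operand: variable or number
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


print(" ".join(to_postfix("a + b * c".split())))  # a b c * +
```

A postfix string has no parentheses, so two expressions with the same structure produce the same token sequence regardless of how they were bracketed, which is one plausible reason to use it as an index key.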

Memory-based Pattern Completion in Database Semantics

  • Hausser Roland
    • Language and Information / v.9 no.1 / pp.69-92 / 2005
  • Pattern recognition in cognitive agents is based on (i) the uninterpreted input data (e.g. parameter values) provided by the agent's hardware devices and (ii) interpreted patterns (e.g. templates) provided by the agent's memory. Computationally, the task consists in finding the memory data corresponding best to the input data, for any given input. Once the best fitting memory data have been found, the input is recognized by applying to it the interpretation which happens to be stored with the memorized pattern. This paper presents a fast converging procedure which starts from a few initially recognized items and then analyzes the remainder of the input by systematically checking for items shown by memory to have been related to the initial items in previous encounters. In this way, known patterns are tried first, and only when they have been exhausted is an elementary exploration of the input commenced. Efficiency is improved further by choosing the candidate to be tested next according to frequency.

Against Pied-Piping

  • Choi, Young-Sik
    • Language and Information / v.6 no.2 / pp.171-185 / 2002
  • I claim that the asymmetry of locality effects in wh-questions involving Complex Noun Phrase Island in Korean follows from the proposal for the asymmetric mode of scope taking between way (why) and the other wh-words in Korean as laid out in Choi (2002). I will show that the present proposal is superior to the LF pied-piping approach in Nishigauchi (1990) and WH-structure pied-piping in von Stechow (1996) in that it does not have the fatal problem of wrong semantics in Nishigauchi and the Subjacency violation problem in von Stechow. The crossed reading in examples involving Wh-island has an interesting implication for the mechanism of unselective binding, suggesting that Heim's (1982) quantifier indexing mechanism, which requires the local unselective binding of the indefinite by the unselective binder, may be too strong.

Efficient Indexing Technique for Retrieval of an XML Document and Design of Query Language (TQL) (XML 문서의 검색을 위한 효율적인 색인 기법과 질의 언어(TQL)의 설계)

  • 이계준;신동욱;권택근
    • Proceedings of the Korean Information Science Society Conference / 1999.10a / pp.57-59 / 1999
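  • With the rapid development of the WWW (World Wide Web), office automation systems (Office Information System), digital libraries, and similar systems, the amount of information has grown exponentially. To handle this vast amount of information, many Internet-based document standards have emerged; most notably, XML (eXtensible Markup Language) is being applied in many places as the standard for next-generation Internet electronic documents. Consequently, functions for storing, using, and retrieving the information in XML documents efficiently and accurately have come to be required. Most current research supports only storing and retrieving the structural information of XML documents and offers little support for reusing or restructuring the retrieved results. This paper extends the structural search function for XML documents that current retrieval systems provide: it proposes a new indexing technique for retrieving XML documents more efficiently, and designs a new query language (TQL) that can perform structural search over XML documents stored in a database and, on that basis, restructure and reuse documents.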

NVST DATA ARCHIVING SYSTEM BASED ON FASTBIT NOSQL DATABASE

  • Liu, Ying-Bo;Wang, Feng;Ji, Kai-Fan;Deng, Hui;Dai, Wei;Liang, Bo
    • Journal of The Korean Astronomical Society / v.47 no.3 / pp.115-122 / 2014
  • The New Vacuum Solar Telescope (NVST) is a 1-meter vacuum solar telescope that aims to observe the fine structures of active regions on the Sun. The main tasks of the NVST are high resolution imaging and spectral observations, including measurements of the solar magnetic field. The NVST has collected more than 20 million FITS files since it began routine observations in 2012 and produces up to 120 thousand files in a day. Given the large number of files, their effective archiving and retrieval has become a critical and urgent problem. In this study, we implement a new data archiving system for the NVST based on the Fastbit Not Only Structured Query Language (NoSQL) database. Compared to the relational database (i.e., MySQL; My Structured Query Language), the Fastbit database shows distinct advantages in indexing and querying performance. In a large scale database of 40 million records, the multi-field combined query response time of the Fastbit database is about 15 times faster and fully meets the requirements of the NVST. Our study offers a new approach to massive astronomical data archiving and should contribute to the design of data management systems for other astronomical telescopes.
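
FastBit is built around bitmap indexes, which is one reason multi-field combined queries can be answered quickly: each condition becomes a bitmap, and combining conditions is a bitwise AND. The toy sketch below illustrates only that idea. It is not FastBit's API, it omits the compression FastBit applies to its bitmaps, and the class name `BitmapIndex` and the field names `band` and `date` are hypothetical, not taken from the NVST archive.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy equality bitmap index: one integer bitmap per (field, value)."""

    def __init__(self):
        self.bitmaps = defaultdict(int)  # (field, value) -> bitmap stored as int
        self.count = 0                   # number of indexed rows

    def add(self, record):
        """Index a record (a dict of field -> value) and return its row id."""
        row = self.count
        for field, value in record.items():
            self.bitmaps[(field, value)] |= 1 << row
        self.count += 1
        return row

    def query(self, **conditions):
        """Return the row ids that match every field=value condition."""
        result = (1 << self.count) - 1  # start with all rows selected
        for field, value in conditions.items():
            result &= self.bitmaps.get((field, value), 0)
        return [row for row in range(self.count) if (result >> row) & 1]


index = BitmapIndex()
index.add({"band": "TiO", "date": "2013-05-01"})
index.add({"band": "Ha", "date": "2013-05-01"})
index.add({"band": "TiO", "date": "2013-05-02"})
print(index.query(band="TiO", date="2013-05-01"))  # [0]
```

Adding another condition to a query costs one more AND over the bitmaps rather than another scan of the records.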

A Study of Retrieval Model Providing Relevant Sentences in Storytelling on Semantic Web (시맨틱 웹 환경에서 적합한 문장을 제공하는 이야기 쓰기 도우미에 관한 연구)

  • Lee, Tae-Young
    • Journal of the Korean Society for Information Management / v.26 no.4 / pp.7-34 / 2009
  • The structures of stories, paragraphs, and sentences, and the inferences applied to indexing and searching, were studied in order to construct a full-text and sentence retrieval system for storytelling. The system comprises a database of stories, paragraphs, and sentences and a knowledge base of inference rules to aid in writing stories. The knowledge base consists of files of story frames, paragraph scripts, and sentence logics written in mark-up languages such as SWRL that can operate on the semantic web. A more precise indexing language for representing sentences, and mark-up languages able to express more accurate inference rules, still need to be developed.

Design of Efficient Storage Structure and Indexing Mechanism for XML Documents (XML을 위한 효율적인 저장구조 및 인덱싱 기법설계)

  • 신판섭
    • Journal of the Korea Computer Industry Society / v.5 no.1 / pp.87-100 / 2004
  • XML has recently come to be considered a new standard for data presentation and exchange on the web, and much research is under way to develop applications and index mechanisms that store and retrieve XML documents efficiently. In this paper, we design a main-memory-based XML storage system for efficient management of XML documents. We also propose structured retrieval over the XML document tree that reduces traversal of the tree by using element type information included in user queries. The proposed indexing mechanism is flexible with respect to dynamic data updates. Finally, for query processing of XML documents that include link information, we design a table-type index structure for link information that observes the XLink standard.

Path Signatures : Path-oriented Query Processing System for XML document Retrieval (경로 서명 : XML문서 검색을 위한 경로-지향 질의처리 시스템)

  • Park, Hee-Sook;Park, Ju-Hyun;Cho, Woo-Hyun
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.7 / pp.1311-1317 / 2007
  • Recently, owing to the popularity and explosive growth of the Internet, information exchange over the Internet has been increasing rapidly. XML is also becoming a standard and a major tool for data exchange on the Internet. We therefore propose a new indexing technique for evaluating path-oriented queries, and we design and implement a Path-oriented Query Processing System for users. The proposed indexing technique combines a binary trio structure with a path signature file to improve the performance of XML document retrieval.
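
As a rough illustration of how a signature file can speed up path-oriented queries, the sketch below uses generic superimposed coding: each element name is hashed to a few bits, a path's signature is the OR of its labels' bit patterns, and a stored path can match a query only if it contains every bit of the query signature. This is an assumption about the general technique, not the paper's design; its combination with the binary structure named in the abstract is not reproduced, and the constants and function names are hypothetical.

```python
import hashlib

SIG_BITS = 64        # signature width (assumed)
BITS_PER_LABEL = 3   # bits set per element name (assumed)


def label_signature(label):
    """Hash one element name to a sparse bit pattern."""
    sig = 0
    for i in range(BITS_PER_LABEL):
        digest = hashlib.sha256(f"{i}:{label}".encode("utf-8")).digest()
        sig |= 1 << (int.from_bytes(digest[:4], "big") % SIG_BITS)
    return sig


def path_signature(path):
    """Superimpose (OR) the signatures of all labels on a path like 'a/b/c'."""
    sig = 0
    for label in path.strip("/").split("/"):
        sig |= label_signature(label)
    return sig


def candidate_paths(query, stored_paths):
    """Filter by signature, then verify, since signatures allow false positives."""
    q_sig = path_signature(query)
    q_labels = query.strip("/").split("/")
    n = len(q_labels)
    hits = []
    for path in stored_paths:
        if (path_signature(path) & q_sig) != q_sig:
            continue  # a required bit is missing, so the path cannot match
        labels = path.strip("/").split("/")
        # Verification: the query labels must appear as a contiguous run.
        if any(labels[i:i + n] == q_labels for i in range(len(labels) - n + 1)):
            hits.append(path)
    return hits


stored = ["book/chapter/title", "book/author/name", "article/title"]
print(candidate_paths("chapter/title", stored))  # ['book/chapter/title']
```

The signature test can never reject a true match, because every label of a matching query is also a label of the stored path; it can only let through false positives, which the verification step removes.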

Integrated Indexing Method using Compound Noun Segmentation and Noun Phrase Synthesis (복합명사 분할과 명사구 합성을 이용한 통합 색인 기법)

  • Won, Hyung-Suk;Park, Mi-Hwa;Lee, Geun-Bae
    • Journal of KIISE:Software and Applications / v.27 no.1 / pp.84-95 / 2000
  • In this paper, we propose an integrated indexing method with compound noun segmentation and noun phrase synthesis. Statistical information is used in the compound noun segmentation, and natural language processing techniques are carefully utilized in the noun phrase synthesis. First, we choose index terms from simple words using the results of morphological analysis and part-of-speech tagging. Second, noun phrases are automatically synthesized from the syntactic analysis results. If syntactic analysis fails, only the morphological analysis and tagging results are applied. Third, we select compound nouns from the tagging results and then segment and re-synthesize them using statistical information. In this way, segmented and synthesized terms are used together as index terms to supplement the single terms. We demonstrate the effectiveness of the proposed integrated indexing method for Korean compound noun processing using KTSET2.0 and KRIST SET, which are standard test collections for Korean information retrieval.
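
The statistical segmentation step can be pictured with a minimal sketch, assuming a table of noun frequencies is available. The abstract does not say which statistics the authors use, so the scoring here (sum of log relative frequencies, maximized by dynamic programming) is a stand-in, and the counts in the example are invented for illustration only.

```python
import math


def segment_compound(word, counts, max_len=6):
    """Split `word` into known nouns, maximizing the sum of log relative frequencies.

    `counts` maps nouns to positive frequencies (hypothetical data).
    """
    total = sum(counts.values())
    n = len(word)
    # best[i] holds (score, segmentation) for the prefix word[:i].
    best = [(0.0, [])] + [(-math.inf, None)] * n
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            piece = word[start:end]
            if piece not in counts or best[start][1] is None:
                continue
            score = best[start][0] + math.log(counts[piece] / total)
            if score > best[end][0]:
                best[end] = (score, best[start][1] + [piece])
    # Fall back to the unsegmented word when no full split exists.
    return best[n][1] if best[n][1] is not None else [word]


# Invented counts, for illustration only.
counts = {"정보": 50, "검색": 40, "시스템": 30, "정보검색": 5}
print(segment_compound("정보검색시스템", counts))  # ['정보', '검색', '시스템']
```

In an indexing pipeline along the lines the abstract describes, both the segmented pieces and the original or re-synthesized compound would be kept as index terms, so that a query for either form can match.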
