• Title/Summary/Keyword: phrases retrieval

Reputation Analysis of Document Using Probabilistic Latent Semantic Analysis Based on Weighting Distinctions (가중치 기반 PLSA를 이용한 문서 평가 분석)

  • Cho, Shi-Won;Lee, Dong-Wook
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.3 / pp.632-638 / 2009
  • Probabilistic Latent Semantic Analysis (PLSA) has many applications in information retrieval and filtering, natural language processing, machine learning from text, and related areas. In this paper, we propose an algorithm that uses a weighted PLSA model to find contextual phrases and opinions in documents. Traditional keyword search is unable to find the semantic relations of phrases; overcoming this limitation requires techniques for automatically classifying those relations. Through experiments, we show that the proposed algorithm works well at discovering semantic relations of phrases and presents these relations to the vector-space model. The proposed algorithm can support a variety of analyses, including document classification, online reputation analysis, and collaborative recommendation.
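As a rough illustration of the model this abstract builds on, the following is a minimal sketch of standard, unweighted PLSA fitted by expectation-maximization. It is not the authors' weighted variant, whose weighting scheme the abstract does not specify. The function name `plsa`, its parameters, and the toy count matrix are assumptions made for the example.

```python
import random

def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit standard PLSA by EM.

    counts[d][w] is the number of times word w occurs in document d.
    Returns (p_z_d, p_w_z): P(topic | document) and P(word | topic).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalise(row):
        total = sum(row) or 1.0  # guard against an all-zero row
        return [value / total for value in row]

    # Random starting point; each row is a probability distribution.
    p_z_d = [normalise([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalise([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0.0:
                    continue
                for z in range(n_topics):
                    # Expected count of (d, w) attributed to topic z.
                    resp = counts[d][w] * joint[z] / total
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalise the expected counts.
        p_z_d = [normalise(row) for row in new_z_d]
        p_w_z = [normalise(row) for row in new_w_z]
    return p_z_d, p_w_z

# Toy corpus (hypothetical): two documents use words 0-1, two use words 2-3.
counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
p_z_d, p_w_z = plsa(counts, n_topics=2)
```

A weighted variant would presumably rescale the counts or the responsibilities before the M-step, but where the weights enter is a guess, not something the abstract states.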

An Example-Based English Learning Environment for Writing

  • Miyoshi, Yasuo;Ochi, Youji;Okamoto, Ryo;Yano, Yoneo
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.292-297 / 2001
  • In learning to write in a second or foreign language, a learner has to acquire not only lexical and syntactic knowledge but also the skill of choosing suitable words for the content s/he is interested in. To support this, a learning system should infer the learner's intention and give example phrases relevant to the content. However, a learner cannot always express the content of the desired phrase as input to the system. Therefore, the system should be equipped with a function for diagnosing the learner's intention. Additionally, the system should be equipped with an analysis function that scores the similarity between the learner's intention and the phrases stored in the system, at both the syntactic and idiomatic levels, in order to present appropriate example phrases to the learner. In this paper, we propose the architecture of an interactive support method for learning English writing, based on an analogical search technique that retrieves sample phrases from corpora. Our system can show candidate variations or next phrases to write, as well as an analogous sentence from the corpora for what the learner wants to express.

A Development of Elementary Digital Textbook with retrieval function (검색기능을 지원하는 초등 디지털교과서의 개발)

  • Lee, Yong-Bae
    • Journal of The Korean Association of Information Education / v.15 no.3 / pp.425-437 / 2011
  • The major functions of recently developed digital textbooks have focused on electronic whiteboards, learning tools, and multimedia, but content retrieval has not yet been supported. This study aims to develop an elementary digital textbook that supports a retrieval function. We surveyed elementary teachers about their requirements for a search function, and designed and implemented a digital textbook based on the survey results. The distinctive feature of the developed digital textbook, which is written in XML, is that it can be searched in units of meaningful phrases. A survey carried out after teachers used the new digital textbook showed that more than 90% of them considered it quite useful in class and would like to use it in their own classes. If a retrieval function is added to the digital textbooks that will be supplied to elementary schools, it will be very helpful for teaching and learning.

Range Detection of Wa/Kwa Parallel Noun Phrase by Alignment method (정렬기법을 활용한 와/과 병렬명사구 범위 결정)

  • Choe, Yong-Seok;Sin, Ji-Ae;Choe, Gi-Seon;Kim, Gi-Tae;Lee, Sang-Tae
    • Proceedings of the Korean Society for Emotion and Sensibility Conference / 2008.10a / pp.90-93 / 2008
  • In natural language, repeated constituents in an expression are commonly left out, and the omitted constituents must be recovered when analyzing the meaning of the sentence. This paper addresses the recognition of the boundaries of parallel noun phrases by identifying omitted constituents. Recognizing parallel noun phrases can greatly reduce complexity in the sentence-parsing phase. Moreover, in natural language information retrieval, recognizing nouns together with their modifiers can play an important role in building indexes. We propose an unsupervised probabilistic model that identifies parallel cores as well as the boundaries of parallel noun phrases conjoined by a conjunctive particle. It is based on the idea of swapping constituents, exploiting symmetry (two or more identical constituents are repeated) and reversibility (the order of constituents can be changed) in parallel structures. Semantic features of the modifiers around a parallel noun phrase are also used in the probabilistic swapping model. The model is language-independent; in this paper it is presented for parallel noun phrases in Korean. Experiments show that our probabilistic model outperforms a symmetry-based model and supervised machine-learning approaches.

Phrase-based Indexing for Korean Information Retrieval System (한국어 정보검색 시스템을 위한 구 단위 색인)

  • 윤성희
    • Journal of the Korea Academia-Industrial cooperation Society / v.5 no.1 / pp.44-48 / 2004
  • This paper proposes an indexing system based on the phrase, a larger syntactic unit than a single keyword. Early information retrieval systems that index by matching single keywords are simple and popular. However, with single-keyword matching it is very hard to represent the exact meaning of documents, and the set of retrieved documents is very large, so such systems cannot satisfy their users. Because Web documents contain many syntactic errors, a high-quality natural language parser cannot be expected to work on the Web. Partial trees produced by fully bottom-up parsing, even when no full tree is obtained, are still useful for extracting phrases, which are much more discriminative index terms than single keywords. Phrase-based indexing also helps the information retrieval system improve its efficiency and reduce processing overhead.
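The contrast between single-keyword and phrase-based matching can be sketched as follows. This is only an illustration: adjacent word pairs stand in for the phrases that the paper extracts from partial parse trees, and `build_indexes` and the sample documents are hypothetical.

```python
from collections import defaultdict

def build_indexes(docs):
    """Return (keyword_index, phrase_index), each mapping a term to doc ids."""
    keyword_index = defaultdict(set)
    phrase_index = defaultdict(set)
    for doc_id, text in docs.items():
        tokens = text.lower().split()
        for token in tokens:
            keyword_index[token].add(doc_id)
        # Assumption: adjacent word pairs approximate parser-extracted phrases.
        for first, second in zip(tokens, tokens[1:]):
            phrase_index[(first, second)].add(doc_id)
    return keyword_index, phrase_index

docs = {
    1: "information retrieval system design",
    2: "retrieval of lost information",
    3: "operating system design",
}
keyword_index, phrase_index = build_indexes(docs)

# Single-keyword matching: every document containing both words qualifies.
single = keyword_index["information"] & keyword_index["retrieval"]
# Phrase matching: only documents containing the words as one unit qualify.
phrase = phrase_index[("information", "retrieval")]
```

Here the keyword intersection returns documents 1 and 2, while the phrase lookup returns only document 1, which is the smaller and more discriminative result set the abstract argues for.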

Opinion Retrieval in Twitter Considering Syntactic Relations of Sentiment Phrase (의견 어구의 구문 관계를 고려한 트위터 의견 검색)

  • Kim, Yoonsung;Yang, Min-Chul;Lee, Seung-Wook;Rim, Hae-Chang
    • KIISE Transactions on Computing Practices / v.20 no.9 / pp.492-497 / 2014
  • In this paper, we propose a method for retrieving opinionated tweets on Twitter, one of the popular social network services, where various users share diverse opinions. Typical opinion retrieval systems may treat the presence of sentiment phrases (subjectivity) as an important factor even when the subjective phrases are not related to the given query or speaker. To alleviate this problem, we use the syntactic structure of a sentence to identify 1) the relationship between subjectivity and the query, 2) the relationship between subjectivity and the speaker, and 3) the syntactic role of the subjectivity. In addition, our learning-to-rank approach is trained to retrieve opinionated tweets based on query relevance, textual features, user information, and Twitter-specific features. Experimental results on real-world data show that the proposed method achieves better performance than several baseline methods in terms of precision and nDCG.
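For readers unfamiliar with the two evaluation measures named above, here is a minimal sketch of precision at k and nDCG at k using the common log2 discount. The function names are assumptions, and the ideal ranking is computed from the judged list itself, which presumes that every relevant item appears somewhere in that list.

```python
import math

def precision_at_k(relevances, k):
    """Fraction of the top-k results with a relevance grade above zero."""
    return sum(1 for rel in relevances[:k] if rel > 0) / k

def dcg_at_k(relevances, k):
    """Discounted cumulative gain: the grade at rank i is divided by log2(i + 1)."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalised by the DCG of the best possible ordering of the same grades."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades of a ranked list of tweets, best rank first.
ranked_grades = [1, 0, 1, 0]
p_at_4 = precision_at_k(ranked_grades, 4)
ndcg_at_4 = ndcg_at_k(ranked_grades, 4)
```

Precision ignores where in the top k the relevant tweets appear, whereas nDCG rewards placing them earlier, which is why the two are often reported together.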

A Study on Natural Language Keyword Indexing for Web-based Information Retrieval (웹기반 정보검색을 위한 자연어 키워드 색인에 관한 연구)

  • 윤성희
    • Journal of the Korea Computer Industry Society / v.4 no.12 / pp.1103-1111 / 2003
  • Information retrieval systems that index by matching single keywords are simple and popular. However, with single-keyword matching it is very hard to represent the exact meaning of documents, and the set of retrieved documents is very large, so such systems cannot satisfy their users. This paper proposes an indexing system based on the phrase, a larger syntactic unit than a single keyword. Because Web documents contain many syntactic errors, a high-quality natural language parser cannot be expected to work on the Web. Partial trees produced by fully bottom-up parsing, even when no full tree is obtained, are still useful for extracting phrases, which are much more discriminative index terms than single keywords. Phrase-based indexing helps the information retrieval system improve its efficiency and reduce processing overhead.

The Development of an Automatic Indexing System based on a Thesaurus (시소러스를 기반으로 하는 자동색인 시스템에 관한 연구)

  • 임형묵;정상철
    • Korean Journal of Cognitive Science / v.4 no.1 / pp.213-242 / 1993
  • During the past decades, several automatic indexing systems have been developed, such as single-term indexing, phrase indexing, and thesaurus-based indexing systems. Among these, single-term indexing has been known to be superior to the others despite the simplicity with which it extracts meaningful terms. Thesaurus-based indexing, on the other hand, has been regarded as producing a low retrieval rate, mainly because thesauri usually do not have enough index terms, so much of the text data fails to be indexed when it does not match any index term in the thesaurus. This paper develops a thesaurus-based indexing system, THINS, that yields a higher retrieval rate than other systems by analyzing text data syntactically and matching it partially against index terms in the thesaurus. First, the system analyzes the input text syntactically using the machine translation system MATES/EK and extracts noun phrases. After deleting stop words from the noun phrases and stemming the remaining words, it tries to index them with similar index terms in the thesaurus as far as possible. We conduct an experiment on the CACM data set that compares the retrieval effectiveness of THINS with that of single-term-based indexing under HYKIS, a thesaurus-based information retrieval system. THINS yields about 10 percent higher precision than the single-term-based system, while showing 8 to 9 percent lower recall. This is a marked improvement over previous systems, which yield 25 or 30 percent lower precision than the single-term-based one. We also argue that the relatively lower recall is caused by the fact that CRCS, the thesaurus included in the CACM data set, is very incomplete, having only a little more than one thousand terms; THINS is therefore expected to produce a much higher rate if it is used with a currently available large thesaurus.

An n-gram-based Indexing Method for Effective Retrieval of Hangul Texts (한글 문서의 효과적인 검색을 위한 n-gram 기반의 색인 방법)

  • 이준호;안정수;박현주;김명호
    • Journal of the Korean Society for Information Management / v.13 no.1 / pp.47-63 / 1996
  • Conventional automatic indexing methods for Hangul texts fall into two groups: one extracts index terms by removing non-indexable segments from word-phrases, and the other generates index terms from the morphemes of word-phrases. The former suffers from the word boundary problem when documents contain many compound nouns. The latter can overcome the word boundary problem by extracting simple nouns, but incurs considerable overhead in developing the large body of linguistic knowledge needed for the indexing procedure. In this paper, we propose a new indexing method based on n-grams. This method alleviates the problems of previous indexing methods related to word boundaries and linguistic knowledge. We also compare the effectiveness of the n-gram-based indexing method with that of the previous ones.
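The general idea of n-gram indexing can be sketched as below, assuming character bigrams taken within each space-delimited word-phrase. The abstract does not give the authors' exact n-gram scheme, so the functions `char_ngrams`, `build_ngram_index`, and `search`, along with the sample documents, are illustrative assumptions.

```python
from collections import defaultdict

def char_ngrams(text, n=2):
    """Split text on whitespace and return the character n-grams of each piece."""
    grams = []
    for word in text.split():
        if len(word) < n:
            grams.append(word)  # keep short pieces whole
        else:
            grams.extend(word[i:i + n] for i in range(len(word) - n + 1))
    return grams

def build_ngram_index(docs, n=2):
    """Map each n-gram to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in char_ngrams(text, n):
            index[gram].add(doc_id)
    return index

def search(index, query, n=2):
    """Return ids of documents that contain every n-gram of the query."""
    grams = char_ngrams(query, n)
    if not grams:
        return set()
    result = set(index.get(grams[0], set()))
    for gram in grams[1:]:
        result &= index.get(gram, set())
    return result

docs = {
    1: "정보검색시스템",   # compound noun written without spaces
    2: "정보 처리",
    3: "검색엔진",
}
index = build_ngram_index(docs)
hits = search(index, "검색")
```

Because the bigram 검색 is indexed wherever it occurs, the query matches document 1 even though the term sits inside an unsegmented compound noun, and no morphological analyzer or dictionary is needed.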

Safe clinical photography: best practice guidelines for risk management and mitigation

  • Chandawarkar, Rajiv;Nadkarni, Prakash
    • Archives of Plastic Surgery / v.48 no.3 / pp.295-304 / 2021
  • Clinical photography is an essential component of patient care in plastic surgery. The use of unsecured smartphone cameras, digital cameras, social media, instant messaging, and commercially available cloud-based storage devices threatens patients' data safety. This paper identifies potential risks of clinical photography and heightens awareness of safe clinical photography. Specifically, we evaluated existing risk-mitigation strategies globally, comparing them to industry standards in similar settings, and formulated a framework for developing a risk-mitigation plan for avoiding data breaches by identifying the safest methods of picture taking, transfer to storage, retrieval, and use, both within and outside the organization. Since threats evolve constantly, the framework must evolve too. Based on a literature search of both PubMed and the web (via Google) with key phrases and child terms (for PubMed), the risks and consequences of data breaches in individual processes in clinical photography are identified. Current clinical-photography practices are described. Lastly, we evaluate current risk-mitigation strategies for clinical photography by examining guidelines from professional organizations, governmental agencies, and non-healthcare industries. Combining lessons learned from the steps above into a comprehensive framework that could contribute to national/international guidelines on safe clinical photography, we provide recommendations for best practice guidelines. It is imperative that best practice guidelines for the simple, safe, and secure capture, transfer, storage, and retrieval of clinical photographs be co-developed through cooperative efforts between providers, hospital administrators, clinical informaticians, IT governance structures, and national professional organizations. This would significantly safeguard patient data security and provide the privacy that patients deserve and expect.