• Title/Abstract/Keywords: vocabulary expansion

Search results: 15 items, processing time 0.021 s

Vocabulary Expansion Technique for Advertisement Classification

  • Jung, Jin-Yong;Lee, Jung-Hyun;Ha, Jong-Woo;Lee, Sang-Keun
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 6, No. 5 / pp.1373-1387 / 2012
  • Contextual advertising is an important revenue source for major service providers on the Web. Ads classification is one of the main tasks in contextual advertising, and it is used to retrieve ads that are semantically relevant to the content of web pages. However, traditional text classification methods struggle to achieve satisfactory performance in ads classification because ads contain few term features. In this paper, we propose a novel ads classification method that handles the lack of term features when classifying ads with short text. The proposed method uses a vocabulary expansion technique based on semantic associations among terms learned from large-scale search query logs. The evaluation results show that our methodology achieves improvements of 4.0% to 9.7% in hierarchical f-measure over the baseline classifiers without vocabulary expansion.

무한 사전 온라인 LDA 토픽 모델에서 의미적 연관성을 사용한 토픽 확장 (Topic Expansion based on Infinite Vocabulary Online LDA Topic Model using Semantic Correlation Information)

  • 곽창욱;김선중;박성배;김권양
    • 정보과학회 컴퓨팅의 실제 논문지 / Vol. 22, No. 9 / pp.461-466 / 2016
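  • Topic expansion is a method that incrementally expands topics by incorporating additional external data in order to improve the quality of the learned topics. In existing online-learning topic models, when external data are used for expansion, new words are not reflected in the previously trained model. In this paper, we study a topic model expansion method that incorporates external data using the infinite vocabulary online LDA topic model. During topic expansion learning, topics are expanded by taking into account the similarity between the previously formed topics and the words in the added external data. In the experiments, we compared the proposed method with existing topic expansion models. The comparison shows that, because the proposed method reflects words from externally related documents in the topic model, the topics could include information that the script topics failed to cover. The proposed method also outperformed the compared models in the coherence evaluation.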

Survey of Automatic Query Expansion for Arabic Text Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah
    • Journal of Information Science Theory and Practice / Vol. 8, No. 4 / pp.67-86 / 2020
  • Information need is one of the main motivations for a person using a search engine. Queries can represent very different information needs. Ironically, a query can be a poor representation of the information need because the user may find it difficult to express that need. Query Expansion (QE) is widely used to address this limitation. While QE can be considered a language-independent technique, recent findings have shown that in certain cases language plays an important role. Arabic is a language with a particularly large vocabulary, rich in words with synonymous shades of meaning, and it has high morphological complexity. This paper therefore provides a review of QE for Arabic information retrieval, with the aim of identifying the current state of the art in this burgeoning area. In this review, we primarily discuss statistical QE approaches, which include document analysis, search and browse log analyses, and web knowledge analyses, in addition to semantic QE approaches, which use semantic knowledge structures to extract meaningful word relationships. Finally, we conclude that QE for the Arabic language requires additional investigation and research due to the intricate nature of this language.

포스트모던 조경설계의 다원적 전개 양상 (The Pluralistic Development of Postmodern Landscape Design)

  • 김한배
    • 한국조경학회지 / Vol. 32, No. 6 / pp.68-81 / 2005
  • The styles of contemporary landscape design have diversified since the emergence of Postmodernism in landscape architecture. The diversification was mostly influenced by contemporary fine arts and architecture. This study examines the pluralistic development of Postmodern landscape design by investigating the influences of those sister arts. From this point of view, the main approaches of Postmodern landscape design can be classified into three categories: 'the formal abstract approach', 'the figurative approach' and 'the new picturesque approach'. The first category, the formal abstract approach, was formulated with the concepts and vocabulary of Minimal Art and Installation Art. Its representative icons such as 'point grids' and 'stripes', and its main concepts such as the sense of 'flatness', 'expansion' and 'materiality', are thought to originate mostly from these art forms. The second category, the figurative approach, is characterised by the concepts and vocabulary of Pop Art and New Image Paintings. Its representative icons such as 'map' or 'figurative forms', and its main concepts such as the sense of 'reality', 'context' and 'symbolism', are thought to originate mostly from these art forms. The third category, the new picturesque approach, was formulated with the concepts and vocabulary of Land Art and Late Deconstructive Architecture. Its representative icons such as 'hybrid', 'layer' and 'fold', and its main concepts such as the sense of 'complexity', 'continuity' and 'reversibility', are thought to originate from these art forms. The research shows that the mainstream of contemporary landscape design seems to be gradually moving toward the second and third approaches, in step with the cultural orientation and dynamism of contemporary urban life. Therefore, the study focuses especially on the new picturesque approach, which is in greater need for coping with today's hybrid culture.

Word Embeddings-Based Pseudo Relevance Feedback Using Deep Averaging Networks for Arabic Document Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah;Atwan, Jaffar
    • Journal of Information Science Theory and Practice / Vol. 9, No. 2 / pp.1-17 / 2021
  • Pseudo relevance feedback (PRF) is a powerful query expansion (QE) technique that prepares queries by taking the top k pseudo-relevant documents and choosing expansion elements from them. Traditional PRF frameworks have robustly handled vocabulary mismatch between user queries and pertinent documents; nevertheless, expansion elements are chosen without regard to their similarity to the original query's elements. Word embedding (WE) schemes are techniques of significant interest for QE, which falls within the information retrieval domain. Deep averaging networks (DANs) define a framework relying on average word presence passed through multiple linear layers. The complete query is thus represented by the average vector of the query terms. This vector may be employed to determine expansion elements pertinent to the entire query. In this study, we suggest a DAN-based technique that augments PRF frameworks by integrating WE similarities to facilitate Arabic information retrieval. The technique is based on the principle that the top pseudo-relevant document set is assessed to determine the candidate element distribution and to select expansion terms appropriately, considering their similarity to the average vector representing the initial query elements. The Word2Vec model is selected for executing the experiments on a standard Arabic TREC 2001/2002 set. The majority of the evaluations indicate that the PRF implementation in the present study offers a significant performance improvement over the baseline PRF frameworks.
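
The term-selection idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it leaves out the DAN's linear layers and uses only the plain average of the query term vectors with cosine similarity. The function names (`average_vector`, `cosine`, `expand_query`), the `n_terms` parameter, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
import math


def average_vector(terms, embeddings):
    """Mean of the embedding vectors of the terms that have an embedding."""
    vectors = [embeddings[t] for t in terms if t in embeddings]
    if not vectors:
        raise ValueError("no query term has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_query(query_terms, top_docs, embeddings, n_terms=2):
    """Append the n_terms candidates most similar to the query's average vector.

    top_docs is a list of token lists standing in for the top-k
    pseudo-relevant documents returned by a first retrieval pass.
    """
    query_vec = average_vector(query_terms, embeddings)
    candidates = {t for doc in top_docs for t in doc} - set(query_terms)
    scored = [(cosine(query_vec, embeddings[t]), t)
              for t in candidates if t in embeddings]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return list(query_terms) + [t for _, t in scored[:n_terms]]


# Hypothetical toy embeddings; a real system would load trained vectors.
toy_embeddings = {
    "car": [1.0, 0.0], "engine": [0.9, 0.1], "vehicle": [0.8, 0.2],
    "banana": [0.0, 1.0], "fruit": [0.1, 0.9],
}
docs = [["car", "vehicle", "banana"], ["engine", "fruit", "vehicle"]]
print(expand_query(["car", "engine"], docs, toy_embeddings, n_terms=1))
# ['car', 'engine', 'vehicle']
```

Scoring candidates against the averaged query vector, rather than against each query term separately, is what lets the selected terms relate to the query as a whole.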

영한 기계번역 시스템의 영한 변환사전 확장 도구 (English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System)

  • 김성동
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 2, No. 1 / pp.35-42 / 2013
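  • Developing an English-Korean machine translation system requires diverse linguistic information; in particular, the richness of the English-Korean transfer dictionary, which contains semantic information for English words, is an important factor in translation quality. Newly coined words that keep appearing are not registered in the dictionary, so the English words are output unchanged in the translation, which degrades translation quality. Compound nouns also complicate lexical and syntactic analysis, and their meanings are often not registered in the dictionary, which makes them difficult to translate correctly. Therefore, to improve the translation quality of English-Korean machine translation, it is necessary to extend the English-Korean transfer dictionary continually by collecting unregistered words and frequently used compound nouns and adding their semantic information. In this paper, we propose a method for extending the English-Korean transfer dictionary that consists of a series of steps: extracting a corpus from Internet newspaper articles, finding words not in the dictionary and frequently occurring compound nouns, attaching meanings to them, and adding them to the transfer dictionary. We also developed a tool that supports this process. Expanding dictionary information requires the effort of many people, but it is essential for improving English-Korean machine translation systems. The tool developed in this paper is expected to minimize human effort while supporting the continual expansion of the information in the English-Korean transfer dictionary, thereby contributing to better translation quality.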

2계층 유사관계행렬 구축을 통한 질의 처리 (Fuzzy Query Processing through Two-level Similarity Relation Matrices Construction)

  • 이기영
    • 한국컴퓨터산업학회논문지 / Vol. 4, No. 10 / pp.587-598 / 2003
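  • In this study, we constructed two-level index-term similarity relation matrices over the titles and abstracts of academic papers. The index-term similarity relation matrices, built from co-occurrence frequencies, form an index structure intended to maintain recall through query expansion based on compatibility relations while improving precision through two-level content-based retrieval. Accordingly, we extracted domain knowledge through subject analysis and performed fuzzy-logic-based inference over the user's information need and the domain knowledge. This study aims to mitigate the term mismatch inherent in queries and to improve information representation.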
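
As a rough illustration of a co-occurrence-based similarity relation used for query expansion, the following Python sketch builds a symmetric term-term similarity table and expands a query with terms above a threshold. The abstract does not state which similarity measure or threshold the paper uses; the Jaccard coefficient, the `threshold` value, and the function names `similarity_matrix` and `expand` are assumptions chosen for this sketch.

```python
from collections import defaultdict
from itertools import combinations


def similarity_matrix(documents):
    """Jaccard similarity between index terms, from document co-occurrence.

    documents is a list of index-term lists (for example, one list per title).
    """
    doc_freq = defaultdict(int)   # number of documents containing each term
    co_freq = defaultdict(int)    # number of documents containing both terms
    for terms in documents:
        unique = sorted(set(terms))
        for term in unique:
            doc_freq[term] += 1
        for a, b in combinations(unique, 2):
            co_freq[(a, b)] += 1
    sim = defaultdict(dict)
    for (a, b), both in co_freq.items():
        score = both / (doc_freq[a] + doc_freq[b] - both)
        sim[a][b] = score         # store both directions: the relation
        sim[b][a] = score         # is symmetric
    return sim


def expand(query_terms, sim, threshold=0.5):
    """Add every term whose similarity to a query term meets the threshold."""
    expanded = set(query_terms)
    for term in query_terms:
        expanded.update(n for n, s in sim.get(term, {}).items() if s >= threshold)
    return sorted(expanded)


# Hypothetical index terms extracted from three titles.
titles = [["fuzzy", "query"], ["fuzzy", "query", "retrieval"], ["retrieval", "index"]]
sim = similarity_matrix(titles)
print(expand(["fuzzy"], sim))  # ['fuzzy', 'query']
```

Under the same assumptions, a second table could be built from abstract terms and applied as a second pass over the documents retrieved with the first.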


A Comparative Study of English Loans in Russian and Swahili

  • Dzahene-Quarshie, Josephine;Csajbok-Twerefou, Ildiko
    • 비교문화연구 / Vol. 24 / pp.99-111 / 2011
  • This paper is a comparative study of English loans in Russian and Swahili. In the twenty-first century, owing to its position as a global language and a language of technology and business, English has come into contact with many of the world's languages and has become a major source of loans for them. Though very different from each other, both Russian and Swahili currently have English as their main source of loanwords. This study reports the extensive adaptation of English loans by Russian and Swahili and examines how these loan items are assimilated into the two languages. It concludes that, besides the adaptation of pure English loans, both languages have employed other strategies for vocabulary expansion, such as loan translations, semantic extensions and loanblends.

이미지 스케일에 따른 트랜드 중심의 실내디자인 표현어휘 연구 - 2008~09년도 국제박람회를 중심으로 - (A Study on the Interior Design representation-language from image scale of Trend - Focused on 2008~09 international Fair -)

  • 신동관;한영호
    • 한국실내디자인학회논문집 / Vol. 19, No. 1 / pp.112-120 / 2010
  • Interior design is generally understood in terms of shapes, lines, spaces, tones, quality, and principles, and these are usually studied through a formative approach. These six elements are the basis from which various designs expand. In this study, the represented design language is extracted through the six basic stages and re-divided into classes. Using the Kobayashi image scale, the study examines current trends in terms of three classes of vocabulary (cultural, functional, and sensory), classified by trend and design direction for the 2008~2009 international fairs. The categorized design formats reflect a mixed culture with an emotional approach and show a consistent design direction. In particular, compared with 2008, the designs of 2009 tend to act as an emotional translator of verbal expression; in other words, the expression of human emotions in design becomes more concrete. In addition, design shows the details of its flow through re-created forms of past leading trends. On the other hand, undeveloped cultures such as folk, historical, and unique cultures attract design leaders as they are. This research should support a good relationship between designers and customers regarding newly emerging international design trends. The method used here reclassifies the image vocabulary extracted from the image scale. Resolving ambiguous, complex, and neutral expressions, for better public understanding and a more definite analysis method, remains a task for future work.

MMORPG의 감성평가 체크리스트에 관한 연구 (A Study on the Checklist of Emotional Evaluation for MMORPG)

  • 박상진;곽훈성;서미라
    • 한국콘텐츠학회논문지 / Vol. 6, No. 11 / pp.217-224 / 2006
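  • As massively multiplayer online games have gained popularity, the number of titles developed by online game companies has grown exponentially. However, the quality of games has not kept pace with their quantitative growth: apart from a few leading companies with a competitive advantage, most developers are small businesses that lack proper production procedures and do not carry out accurate pre-release testing, and so cannot respond flexibly to foreseeable outcomes. At present, the gameplay check stage is used only to find important features that have not been implemented or critical errors in specific parts. This study therefore proposes an emotional evaluation system that, in contrast to existing evaluation schemes biased toward usability, assesses the emotions users feel while playing in addition to usability. To this end, we collected the emotional vocabulary that players experience in games, classified it through factor analysis into Interactivity, Interface, and Information factors, and designed evaluation items for each factor.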
