• Title/Summary/Keyword: vocabulary expansion

Vocabulary Expansion Technique for Advertisement Classification

  • Jung, Jin-Yong;Lee, Jung-Hyun;Ha, Jong-Woo;Lee, Sang-Keun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.5 / pp.1373-1387 / 2012
  • Contextual advertising is an important revenue source for major service providers on the Web. Ads classification is one of the main tasks in contextual advertising, and it is used to retrieve ads that are semantically relevant to the content of web pages. However, it is difficult for traditional text classification methods to achieve satisfactory performance in ads classification because ads contain few term features. In this paper, we propose a novel ads classification method that handles the lack of term features when classifying ads with short text. The proposed method utilizes a vocabulary expansion technique using semantic associations among terms learned from large-scale search query logs. The evaluation results show that our methodology improves the hierarchical f-measure by 4.0% to 9.7% over the baseline classifiers without vocabulary expansion.
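
The vocabulary expansion idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the association table, the `expand_vocabulary` helper, and the `top_k` and `min_score` parameters are all hypothetical, and the toy scores stand in for associations that would be learned from search query logs.

```python
# Hypothetical term associations (term -> [(related term, strength)]).
# In the paper these would be learned from query logs; here they are toy values.
ASSOCIATIONS = {
    "sneakers": [("shoes", 0.9), ("running", 0.6), ("nike", 0.5)],
    "cheap": [("discount", 0.8), ("sale", 0.7)],
}


def expand_vocabulary(terms, associations, top_k=2, min_score=0.55):
    """Return a weighted bag of terms for a short ad text.

    Original terms get weight 1.0; for each original term, up to top_k of its
    strongest associated terms are added with their association strength,
    provided the strength is at least min_score.
    """
    expanded = {term: 1.0 for term in terms}
    for term in terms:
        related = sorted(associations.get(term, []), key=lambda p: p[1], reverse=True)
        for other, score in related[:top_k]:
            if score >= min_score:
                # Keep the highest weight seen for a term.
                expanded[other] = max(expanded.get(other, 0.0), score)
    return expanded


ad_features = expand_vocabulary(["cheap", "sneakers"], ASSOCIATIONS)
```

The expanded bag could then be fed to any text classifier in place of the sparse original features; how the weights are used is left open here.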

Topic Expansion based on Infinite Vocabulary Online LDA Topic Model using Semantic Correlation Information (무한 사전 온라인 LDA 토픽 모델에서 의미적 연관성을 사용한 토픽 확장)

  • Kwak, Chang-Uk;Kim, Sun-Joong;Park, Seong-Bae;Kim, Kweon Yang
    • KIISE Transactions on Computing Practices / v.22 no.9 / pp.461-466 / 2016
  • Topic expansion is a method that incorporates external data to improve the quality of learned topics. The online learning topic model is not appropriate for topic expansion using external data, because it does not reflect unseen words in the learned topic model. In this study, we propose a topic expansion method using infinite vocabulary online LDA. When unseen words appear during learning, the proposed method allocates each unseen word to a topic after calculating the semantic correlation between the unseen word and each topic. To evaluate the proposed method, we compared it with an existing topic expansion method. The results indicated that, by reflecting external documents, the proposed method includes additional information that is not contained in the broadcasting script. The proposed method also outperformed the existing method on coherence evaluation.

Survey of Automatic Query Expansion for Arabic Text Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah
    • Journal of Information Science Theory and Practice / v.8 no.4 / pp.67-86 / 2020
  • Information need has been one of the main motivations for a person using a search engine. Queries can represent very different information needs. Ironically, a query can be a poor representation of the information need because the user can find it difficult to express that need. Query Expansion (QE) is widely used to address this limitation. While QE can be considered a language-independent technique, recent findings have shown that in certain cases, language plays an important role. Arabic is a language with a particularly large vocabulary rich in words with synonymous shades of meaning and has high morphological complexity. This paper, therefore, provides a review of QE for Arabic information retrieval, the intention being to identify the recent state of the art in this burgeoning area. In this review, we primarily discuss statistical QE approaches that include document analysis, search and browse log analyses, and web knowledge analyses, in addition to the semantic QE approaches, which use semantic knowledge structures to extract meaningful word relationships. Finally, our conclusion is that QE for the Arabic language requires additional investigation and research due to the intricate nature of this language.

Incorporating Deep Median Networks for Arabic Document Retrieval Using Word Embeddings-Based Query Expansion

  • Yasir Hadi Farhan;Mohanaad Shakir;Mustafa Abd Tareq;Boumedyen Shannaq
    • Journal of Information Science Theory and Practice / v.12 no.3 / pp.36-48 / 2024
  • The information retrieval (IR) process often encounters a challenge known as query-document vocabulary mismatch, where user queries do not align with document content, impacting search effectiveness. Automatic query expansion (AQE) techniques aim to mitigate this issue by augmenting user queries with related terms or synonyms. Word embedding, particularly Word2Vec, has gained prominence for AQE due to its ability to represent words as real-number vectors. However, AQE methods typically expand individual query terms, potentially leading to query drift if the expansion terms are not carefully selected. To address this, researchers propose utilizing median vectors derived from deep median networks to capture query similarity comprehensively. Integrating median vectors into candidate term generation and combining them with the BM25 probabilistic model and two IR strategies (EQE1 and V2Q) yields promising results, outperforming baseline methods in experimental settings.

The Pluralistic Development of Postmodern Landscape Design (포스트모던 조경설계의 다원적 전개 양상)

  • Kim, Han-Bai
    • Journal of the Korean Institute of Landscape Architecture / v.32 no.6 s.107 / pp.68-81 / 2005
  • The styles of contemporary landscape design have diversified since the emergence of Postmodernism in landscape architecture. The diversification was mostly influenced by contemporary fine arts and architecture. This study examines the pluralistic development of Postmodern landscape design by investigating the influences of those sister arts. From this point of view, the main approaches of Postmodern landscape design can be classified into three categories: 'the formal abstract approach', 'the figurative approach' and 'the new picturesque approach'. The first category, the formal abstract approach, was formulated with the concepts and vocabulary of Minimal Art and Installation Art. Its representative icons such as 'point grids' and 'stripes', and its main concepts such as the sense of 'flatness', 'expansion' and 'materiality', are mostly thought to have originated from these art forms. The second category, the figurative approach, is characterised by the concepts and vocabulary of Pop Art and New Image Paintings. Its representative icons such as 'map' or 'figurative forms', and its main concepts like the sense of 'reality', 'context' and 'symbolism', are mostly thought to have originated from these art forms. The third category, the new picturesque approach, was formulated with the concepts and vocabulary of Land Art and Late Deconstructive Architecture. Its representative icons such as 'hybrid', 'layer' and 'fold', and its main concepts such as the sense of 'complexity', 'continuity' and 'reversibility', are thought to have originated from these art forms. The research shows that the mainstream of contemporary landscape design seems to be gradually moving toward the second and third approaches above, in step with the cultural orientation and the dynamism of contemporary urban life. Therefore, the study focused especially on the new picturesque approach, which is in greater need for coping with today's hybrid culture.

Word Embeddings-Based Pseudo Relevance Feedback Using Deep Averaging Networks for Arabic Document Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah;Atwan, Jaffar
    • Journal of Information Science Theory and Practice / v.9 no.2 / pp.1-17 / 2021
  • Pseudo relevance feedback (PRF) is a powerful query expansion (QE) technique that reformulates queries by choosing expansion elements from the top k pseudo-relevant documents. Traditional PRF frameworks have robustly handled vocabulary mismatch between user queries and pertinent documents; nevertheless, expansion elements are chosen without regard to their similarity to the original query's elements. Word embedding (WE) schemes are of significant interest for QE within the information retrieval domain. Deep averaging networks (DANs) define a framework relying on average word presence passed through multiple linear layers. The complete query can thus be represented by the average vector of the query terms. This vector may be employed to determine expansion elements pertinent to the entire query. In this study, we suggest a DANs-based technique that augments PRF frameworks by integrating WE similarities to facilitate Arabic information retrieval. The technique is based on the principle that the top pseudo-relevant document set is assessed to determine the candidate element distribution and to select expansion terms appropriately, considering their similarity to the average vector representing the initial query elements. The Word2Vec model is selected for executing the experiments on a standard Arabic TREC 2001/2002 set. The majority of the evaluations indicate that the PRF implementation in the present study offers a significant performance improvement compared to that of the baseline PRF frameworks.
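
The selection step described in this abstract, ranking candidate terms by similarity to the average vector of the query terms, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it omits the linear layers of a deep averaging network, the two-dimensional embeddings in `EMB` are made-up toy values rather than Word2Vec output, and the function names are hypothetical.

```python
import math


def average_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_expansion_terms(query_terms, candidate_terms, embeddings, k=2):
    """Pick the k candidates most similar to the averaged query vector.

    candidate_terms stands in for terms drawn from the top pseudo-relevant
    documents; terms already in the query are skipped.
    """
    query_vec = average_vector([embeddings[t] for t in query_terms if t in embeddings])
    scored = [
        (cosine(query_vec, embeddings[t]), t)
        for t in candidate_terms
        if t in embeddings and t not in query_terms
    ]
    scored.sort(reverse=True)
    return [term for _, term in scored[:k]]


# Toy embeddings, assumed for illustration only.
EMB = {
    "oil": [1.0, 0.0],
    "price": [0.8, 0.2],
    "petroleum": [0.9, 0.1],
    "market": [0.7, 0.3],
    "football": [0.0, 1.0],
}

terms = select_expansion_terms(
    ["oil", "price"], ["petroleum", "market", "football", "oil"], EMB, k=2
)
```

Scoring against the whole-query vector, rather than against each query term separately, is what keeps an off-topic candidate such as "football" out of the expansion in this toy example.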

English-Korean Transfer Dictionary Extension Tool in English-Korean Machine Translation System (영한 기계번역 시스템의 영한 변환사전 확장 도구)

  • Kim, Sung-Dong
    • KIPS Transactions on Software and Data Engineering / v.2 no.1 / pp.35-42 / 2013
  • Developing an English-Korean machine translation system requires the construction of information about the languages, and the amount of information in the English-Korean transfer dictionary is especially critical to the translation quality. Newly created words are out-of-vocabulary words, and they appear untranslated in the translated sentence, which decreases the translation quality. Also, compound nouns make lexical and syntactic analysis complex, and it is difficult to translate compound nouns accurately due to the lack of information in the transfer dictionary. In order to improve the translation quality of English-Korean machine translation, we must continuously expand the information in the English-Korean transfer dictionary by collecting the out-of-vocabulary words and the frequently used compound nouns. This paper proposes a method for expanding the transfer dictionary, which consists of constructing a corpus from internet newspapers, extracting the words that are not in the existing dictionary and the frequently used compound nouns, attaching meanings to the extracted words, and integrating them into the transfer dictionary. We also develop a tool that supports the expansion of the transfer dictionary. Expanding the dictionary information is critical to improving the machine translation system but requires much human effort. The developed tool can be useful for continuously expanding the transfer dictionary, and so it is expected to contribute to enhancing the translation quality.
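
The extraction step of the pipeline described above (collecting corpus words that are missing from the existing dictionary) might look like the following sketch. The tokenizer, the `find_oov_words` helper, and the `min_count` threshold are assumptions for illustration; the paper's tool, its compound-noun extraction, and its meaning-attachment step are not reproduced here.

```python
import re
from collections import Counter


def find_oov_words(corpus, dictionary, min_count=2):
    """Return (word, count) pairs for corpus words absent from the dictionary.

    Words are lowercased alphabetic tokens (hyphenated forms kept whole);
    only words seen at least min_count times are reported, most frequent first.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", corpus.lower())
    counts = Counter(token for token in tokens if token not in dictionary)
    return [(word, count) for word, count in counts.most_common() if count >= min_count]


# Toy corpus and dictionary, assumed for illustration only.
corpus = "The metaverse is growing. A metaverse startup hired a blogger."
dictionary = {"the", "is", "growing", "a", "startup", "hired"}

candidates = find_oov_words(corpus, dictionary)
```

The frequency threshold reflects the abstract's emphasis on frequently used items; the resulting candidates would still need meanings attached by a person before being merged into the dictionary.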

Fuzzy Query Processing through Two-level Similarity Relation Matrices Construction (2계층 유사관계행렬 구축을 통한 질의 처리)

  • 이기영
    • Journal of the Korea Computer Industry Society / v.4 no.10 / pp.587-598 / 2003
  • This paper constructs two-level word similarity relation matrices over the titles and text of scientific papers. The guide keyword similarity relation matrix, built from co-occurrence frequencies, maintains the recall rate through query expansion by a tolerance relation, while the two-level content-based retrieval serves as an index structure that improves the precision rate. Domain knowledge is derived through subject analysis, and the user's information request and the domain knowledge are reasoned about on a fuzzy logic basis. This research aims to alleviate the vocabulary mismatch problem and the limits of information expression inherent in queries.

A comparative Study of English Loans in Russian and Swahili

  • Dzahene-Quarshie, Josephine;Csajbok-Twerefou, Ildiko
    • Cross-Cultural Studies / v.24 / pp.99-111 / 2011
  • This paper is a comparative study of English loans in Russian and Swahili. In the twenty-first century, owing to its advantage as a global language and a language of technology and business, English has had contact with many languages of the world and has become a major source of loans to many of them. Though very different from each other, both Russian and Swahili currently have English as their main source of loanwords. This study reports the extensive adaptation of English loans by Russian and Swahili and examines how these loan items are assimilated into the two languages. It concludes that besides the adaption of pure English loans, both languages have employed other strategies such as loan translations, semantic extensions and loanblends for vocabulary expansion.

A Study on the Interior Design representation-language from image scale of Trend - Focused on 2008~09 international Fair - (이미지 스케일에 따른 트랜드 중심의 실내디자인 표현어휘 연구 - 2008~09년도 국제박람회를 중심으로 -)

  • Sheen, Dong-Kwan;Han, Young-Ho
    • Korean Institute of Interior Design Journal / v.19 no.1 / pp.112-120 / 2010
  • Interior design is generally understood in terms of shape, line, space, tone, quality, and principle, and these are studied through a formative approach. These six elements are the basis from which various designs expand. In this study, design representation vocabulary is extracted according to these six basic elements and re-divided into classes. Using the Kobayashi image scale, the study examines current trends by classifying three elements (cultural, functional, and sensory vocabulary) according to the trends and design directions of the 2008~2009 international fairs. The categorized design formats reflect a mixed culture with an emotional approach and show a consistent design direction. In particular, compared with 2008, the designs of 2009 tend to act as an emotional translator of verbal expression; in other words, design expresses human emotion more concretely. In addition, design shows the details of its flow through shapes re-created from past leading trends. On the other hand, undeveloped cultures such as folk, historical, and unique cultures attract design leaders as they are. This research should help build a good relationship between designers and customers regarding newly emerging international design trends. The method used is to reclassify the image vocabulary extracted from the image scale. Resolving ambiguous, complex, and neutral expressions, for better public understanding and a more definite analysis method, remains a task for future work.