• Title/Summary/Keyword: vocabulary search

Gaussian Optimization of Vocabulary Recognition Clustering Model using Configuration Thread Control (형상 형성 제어를 이용한 어휘인식 공유 모델의 가우시안 최적화)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of the Korea Society of Computer and Information / v.15 no.2 / pp.127-134 / 2010
  • In continuous vocabulary recognition systems, clustering methods based on probability distributions use pre-estimated model parameters to generate each context required for the phoneme data. Their weakness is that the accuracy of the Gaussian model composed for the phoneme data is not guaranteed. To improve on this, we propose a mixed Gaussian model of probability distributions that is optimized through a configuration thread system supporting phoneme data search. The configuration thread system uses extended facet classification of user phoneme configuration thread information to secure the accuracy of the Gaussian model. In the performance evaluation, the system achieved a vocabulary-dependent recognition rate of 98.31% and a vocabulary-independent recognition rate of 97.63%.

The implementation of the search system by Human sensibility Ergonomics for customer shopping benefit based on Internet shopping mall (인터넷 쇼핑몰에서 고객 쇼핑편익을 위한 감성공학적 검색 System 구현)

  • 오진희;김돈한
    • Archives of design research / v.13 no.1 / pp.49-58 / 2000
  • This study implements a search system based on human sensibility ergonomics for Internet shopping malls, a form of electronic commerce that has become part of contemporary shopping culture. Whereas existing shopping malls use business category, item, cost, and size as search keywords, this research centers on a system that selects products according to the feelings they evoke. The search system selects suitable items, builds a database of sensibility vocabulary describing their images, and, once a web server has been set up on the Internet, retrieves the items customers want using keywords from that vocabulary. This study (1) systematizes customers' sensibility needs in a more practical way, (2) identifies how customers perceive items and provides the applied technical conditions for customers, (3) gives customers more choice in Internet shopping malls, and (4) supplies varied information and addresses customers' needs in practice.

A Study on the Retrieval Effectiveness of KoreaMed using MeSH Search Filter and Word-Proximity Search (검색용 MeSH 필터와 단어인접탐색 기법을 활용한 KoreaMed 검색 효율성 향상 연구)

  • Jeong, So-Na;Jeong, Ji-Na
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.5 / pp.596-607 / 2017
  • This study examined a method for adding terms related to "stomach neoplasms" as search filters to the Medical Subject Headings (MeSH), as well as a method for improving search efficiency through a word-proximity search that measures the distance between co-occurring terms. The titles of 8,625 articles published between 2007 and 2016 with the major topic term "stomach neoplasms" were downloaded from PubMed, and the vocabulary to be added to the MeSH for search was analyzed. Search efficiency was verified using 277 articles in KoreaMed that had "Stomach Neoplasms" indexed as a MEDLINE MeSH term. As a result, 973 terms were selected as candidate vocabulary. "Gastric Cancer" (2,780 appearances) was the most frequent term, and 7,376 compound words (88.51%) combined the histological terms of "stomach" and "neoplasm", such as "gastric adenocarcinoma" and "gastric MALT lymphoma". A total of 5,234 compound words (70.95%) had a co-occurring distance of two words. The matching rate between the MEDLINE MeSH and the KoreaMed MeSH Indexer was 209 articles (75.5%). Search efficiency improved to 263 articles (94.9%) when the search filters were added, and to 268 articles (96.7%) when the 13 word-proximity search technique of the co-occurring terms was applied. This study showed that using a thesaurus to improve search efficiency in natural language search can retain the advantages of a controlled vocabulary. Search accuracy can be improved by using a word-proximity search instead of a Boolean search.
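
The contrast between a Boolean search and a word-proximity search in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `proximity_match`, the sample titles, and the definition of distance as the difference in word positions are all assumptions made for illustration.

```python
import re

def proximity_match(text, term_a, term_b, max_distance=2):
    """Return True if term_a and term_b occur within max_distance word positions.

    Distance is taken as the absolute difference of word indices (an assumption;
    the abstract does not define how its co-occurring distance is counted).
    """
    words = re.findall(r"\w+", text.lower())
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)

# Hypothetical titles: both contain the two terms, so a Boolean AND matches both,
# but only the first has them close together.
titles = [
    "Gastric adenocarcinoma outcomes after surgery",
    "Adenocarcinoma of the lung with gastric metastasis",
]
hits = [t for t in titles if proximity_match(t, "gastric", "adenocarcinoma", max_distance=2)]
print(hits)
```

With these sample titles, a plain AND query would return both entries, while the proximity constraint keeps only the title in which the two terms are adjacent.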

Korean Broadcast News Transcription Using Morpheme-based Recognition Units

  • Kwon, Oh-Wook;Alex Waibel
    • The Journal of the Acoustical Society of Korea / v.21 no.1E / pp.3-11 / 2002
  • Broadcast news transcription is one of the hardest tasks in speech recognition because broadcast speech signals vary widely in speech quality, channel, and background conditions. We developed a Korean broadcast news speech recognizer. We used a morpheme-based dictionary and language model to reduce the out-of-vocabulary (OOV) rate. We concatenated original morpheme pairs of short length or high frequency in order to reduce insertion and deletion errors caused by short morphemes. We used a lexicon with multiple pronunciations to reflect inter-morpheme pronunciation variations without severe modification of the search tree. By using the merged morphemes as recognition units, we achieved an OOV rate of 1.7%, comparable to that of European languages with a 64k vocabulary. We implemented a hidden Markov model-based recognizer with vocal tract length normalization and online speaker adaptation by maximum likelihood linear regression. Experimental results showed that the recognizer yielded a 21.8% morpheme error rate for anchor speech and 31.6% for mostly noisy reporter speech.
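
The OOV rate reported in the abstract above is, in its usual definition, the fraction of running tokens in a test text that are missing from the recognizer vocabulary. A minimal Python sketch of that calculation follows; the function name `oov_rate` and the toy English tokens are assumptions for illustration and stand in for the paper's merged Korean morphemes.

```python
def oov_rate(tokens, vocabulary):
    """Fraction of running tokens that are missing from the recognizer vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)

# Toy data (hypothetical): a tiny vocabulary and a short test transcript.
vocabulary = {"news", "today", "weather", "seoul"}
test_tokens = ["news", "today", "seoul", "typhoon", "weather", "today", "news", "flood"]

rate = oov_rate(test_tokens, vocabulary)
print(f"OOV rate: {rate:.1%}")
```

Choosing larger recognition units changes both the token stream and the vocabulary, which is why the choice of unit affects this figure.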

Group-wise Keyword Extraction of the External Audit using Text Mining and Association Rules (텍스트마이닝과 연관규칙을 이용한 외부감사 실시내용의 그룹별 핵심어 추출)

  • Seong, Yoonseok;Lee, Donghee;Jung, Uk
    • Journal of Korean Society for Quality Management / v.50 no.1 / pp.77-89 / 2022
  • Purpose: To improve the audit quality of a company, an in-depth analysis is required to categorize audit reports, which are text documents containing the details of the external audit. This study introduces a systematic methodology for extracting the keywords of each group, such as 'audit plan' and 'interim audit', that determine the differences between groups, using audit reports collected as text documents. Methods: The first step of the proposed methodology preprocesses the documents through text mining. In the second step, the documents are classified into groups using machine learning techniques, and the important vocabulary that has a dominant influence on classification performance is extracted. In the third step, the association rules for each group's documents are found. In the last step, the final keywords representing the characteristics of each group are extracted by comparing the vocabulary important for classification with the vocabulary important in each group's association rules. Results: This study quantitatively calculates the importance of the vocabulary used in audit reports based on machine learning, rather than relying on qualitative research methods such as literature search, expert evaluation, and the Delphi technique. The case study showed that the extracted keywords describe the characteristics of each group well. Conclusion: This study is meaningful in that it lays the foundation for quantitative follow-up studies on the key vocabulary of each stage of auditing.

Implementation of the Automatic Speech Editing System Using Keyword Spotting Technique (핵심어 인식을 이용한 음성 자동 편집 시스템 구현)

  • Chung, Ik-Joo
    • Speech Sciences / v.3 / pp.119-131 / 1998
  • We have developed a keyword spotting system for automatic speech editing. The system recognizes only the keyword 'MBC news' and then sends the time information to the host system. We adopted a vocabulary-dependent model based on the continuous hidden Markov model (HMM), and the Viterbi search was used to recognize the keyword. In recognizing the keyword, the system uses a parallel network in which the HMMs are connected independently, together with back-tracking information, to reduce false alarms and misses. We focused especially on implementing a stable and practical real-time system.
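
The Viterbi search and back-tracking mentioned in the abstract above can be sketched in Python as follows. This is a textbook decoder over a toy discrete HMM, not the system described: the paper uses continuous HMMs in a parallel network, and the two states ("filler", "keyword"), the observation symbols, and all probabilities here are invented for illustration.

```python
import math

def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: (to_log(v) if isinstance(v, dict) else math.log(v)) for k, v in table.items()}

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability."""
    # best[t][s]: best log-probability of any path that ends in state s at time t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Back-track from the best final state to recover the full path.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model (all values hypothetical).
states = ["filler", "keyword"]
log_start = to_log({"filler": 0.9, "keyword": 0.1})
log_trans = to_log({
    "filler": {"filler": 0.8, "keyword": 0.2},
    "keyword": {"filler": 0.3, "keyword": 0.7},
})
log_emit = to_log({
    "filler": {"a": 0.9, "b": 0.1},
    "keyword": {"a": 0.1, "b": 0.9},
})

path, log_prob = viterbi(["a", "b", "b", "a"], states, log_start, log_trans, log_emit)
print(path)
```

In a spotting setting, the frames at which the recovered path enters and leaves the keyword state give the kind of time information the abstract says is sent to the host system.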

Advanced Procedure and Computing System for Standardization of IEC Terminologies (선진화된 IEC 기술용어 표준화 구축절차 및 전산시스템)

  • Hwang, Humor;Kim, Jung-Hoon;Moon, Bong-Hee
    • The Transactions of The Korean Institute of Electrical Engineers / v.65 no.3 / pp.388-396 / 2016
  • Through correspondence work on the International Electrotechnical Vocabulary (IEV) in the smart grid and power information technology fields, we analyzed cases in which IEV terms and definitions were discussed and then proposed an advanced procedure and computing system for standardizing International Electrotechnical Commission (IEC) terminology. The standardization procedure consists of separate processes for existing terminology, new terminology, and correspondent terminology, each with a different structure. An example of standardization work on correspondent terminology is given. The standardization computing system is based on processes for terminology extraction, verification, and management, and can provide a Wikipedia-style terminology search function. To prevent multiple terminologies from coexisting in the IEV, a database search system needs to be developed. We propose the 'IEV_Term_Search' program as this database search system. Terminology standardization across different technical committees (TCs) and completion of the IEV to promote cooperation between TC 1 and the other TCs must be followed by revision and standardization using the standardization computing system.

Component Retrieval using Extended Software Component Descriptor (확장된 소프트웨어 컴포넌트 서술자에 기초한 컴포넌트 저장소의 검색)

  • Geum, Yeong-Uk;Park, Byeong-Seop
    • The KIPS Transactions:PartD / v.9D no.3 / pp.417-426 / 2002
  • Components are stored in a component repository for later reuse. Effective search and retrieval of desired components in a component repository is a very important issue. Gathering information about a component usually takes a lot of time and effort, and the availability of that information is essential to implementing a repository. The Software Component Descriptor proposed in CORBA 3 contains information about a component using an XML vocabulary. In this paper, we extend the Software Component Descriptor so that it is useful for searching a component repository. We use a facet scheme as the search method. Our new retrieval method supports queries connected with logical operators such as AND, OR, and NOT, which were not supported by existing facet retrieval methods. We also reduce the search complexity considerably.
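
Facet retrieval with AND, OR, and NOT, as described in the abstract above, can be sketched in Python with a small recursive query evaluator. This is only an illustration under stated assumptions: the facet names (`function`, `language`), the component names, and the tuple-based query format are hypothetical and are not taken from the paper or from the CORBA 3 Software Component Descriptor.

```python
def matches(component, query):
    """Evaluate a nested facet query against one component description.

    Query forms (hypothetical): ("facet", name, value), ("and", q1, q2, ...),
    ("or", q1, q2, ...), ("not", q).
    """
    op = query[0]
    if op == "facet":
        _, name, value = query
        return component.get(name) == value
    if op == "and":
        return all(matches(component, q) for q in query[1:])
    if op == "or":
        return any(matches(component, q) for q in query[1:])
    if op == "not":
        return not matches(component, query[1])
    raise ValueError(f"unknown operator: {op!r}")

# Hypothetical repository: each component is described by facet/value pairs.
repository = [
    {"name": "QuickSorter", "function": "sort", "language": "c++"},
    {"name": "JSorter", "function": "sort", "language": "java"},
    {"name": "HashIndex", "function": "search", "language": "c++"},
]

# "function is sort AND NOT language is java"
query = ("and", ("facet", "function", "sort"), ("not", ("facet", "language", "java")))
found = [c["name"] for c in repository if matches(c, query)]
print(found)
```

This linear scan does not reflect the reduced search complexity the abstract claims; it only shows how logical operators compose over facet terms.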

Design and Implementation of Ontology-Based Natural Language Search System (온톨로지 기반의 자연어 검색 시스템 설계 및 구현)

  • Kang, Rae-Goo;Lim, Dong-Il;Jung, Chai-Yeoung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2007.10a / pp.875-878 / 2007
  • Until now, when users search for product information, keyword-based search, which mainly relies on word frequency or vocabulary information, has been widely used. With keyword-based search, users bear the additional burden of manually searching the displayed results again, because the results include files that have no connection with the user's query. Ontologies have emerged to resolve this problem. In this paper, we built a product search system using an ontology and tested how accurately it searches by classification. For the test, about 40,000 product records from discount store A, which operates both online and offline stores, were built into a database, and the search system was developed using JSP and PowerBuilder 9.0 to test the user interface development environment. The test results showed that the search method using the product domain ontology presented and designed in this paper was superior to the existing keyword-based search method.

A Study on OOV Rejection Using Viterbi Search Characteristics (Viterbi 탐색 특성을 이용한 미등록어휘 제거에 대한 연구)

  • Kim, Kyu-Hong;Kim, Hoi-Rin
    • Proceedings of the KSPS conference / 2005.04a / pp.95-98 / 2005
  • Many utterance verification (UV) algorithms have been studied to reject out-of-vocabulary (OOV) words in speech recognition systems. Most conventional confidence measures for UV algorithms are based on the log likelihood ratio test (LRT), but these measures take much time to evaluate the alternative hypothesis or anti-model likelihood. We propose a novel confidence measure that makes use of the momentary best-scored state sequence during Viterbi search. Our approach is more efficient than conventional LRT-based algorithms because it does not need to build an anti-model or calculate the alternative hypothesis. The proposed confidence measure shows better performance on speech corrupted by additive noise as well as on clean speech.
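
For context, the conventional log likelihood ratio baseline that the abstract above contrasts itself with can be sketched in Python. This shows the baseline only, not the proposed measure, whose details the abstract does not give. The function names, the per-frame normalization, the threshold, and the example log-likelihoods are all assumptions for illustration.

```python
def llr_confidence(log_lik_target, log_lik_anti, num_frames):
    """Frame-normalized log likelihood ratio between the target and anti-model."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    return (log_lik_target - log_lik_anti) / num_frames

def accept_utterance(log_lik_target, log_lik_anti, num_frames, threshold=0.1):
    """Accept the recognition result when the confidence reaches the threshold."""
    return llr_confidence(log_lik_target, log_lik_anti, num_frames) >= threshold

# Hypothetical scores for two 100-frame utterances.
in_vocab = accept_utterance(log_lik_target=-120.0, log_lik_anti=-150.0, num_frames=100)
out_of_vocab = accept_utterance(log_lik_target=-160.0, log_lik_anti=-150.0, num_frames=100)
print(in_vocab, out_of_vocab)
```

The second argument is what makes this baseline costly: each decision requires a separate anti-model (or alternative hypothesis) likelihood, which is the evaluation the abstract says its measure avoids.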
