• Title/Summary/Keyword: vocabulary data

Variable Vocabulary Word Recognizer using Phonetic Knowledge-based Allophone Model (음성학적 지식 기반 변이음 모델을 이용한 가변 어휘 단어 인식기)

  • Kim, Hoi-Rin;Lee, Hang-Seop
    • The Journal of the Acoustical Society of Korea / v.16 no.2 / pp.31-35 / 1997
  • In this paper, we propose a variable vocabulary word recognizer that can recognize new words that do not exist in the training data. Such a recognizer requires an on-line lexicon generator that transforms new candidate words into the corresponding phone pronunciation sequences without a large lexicon table. We must also make outputs. To model phones and allophones reliably, we define Korean allophones by clustering triphones based on phonetic knowledge of the phones preceding and following each phone. Using this clustering method, we generated 1,548 allophones from the 3,848-word POW (Phonetically Optimized Words) DB. We evaluated the proposed word recognizer on the 3,848-word POW DB, the 445-word PBW (Phonetically Balanced Words) DB, and a 244-word hotel reservation DB. Experimental results showed word recognition accuracies of 79.6% for the POW DB, corresponding to the vocabulary-dependent case; 79.4% with a 445-word lexicon and 88.9% with a 100-word lexicon for the PBW DB; and 71.4% for the hotel reservation DB, corresponding to the vocabulary-independent case.
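To illustrate the idea of knowledge-based allophones, the sketch below collapses triphones by replacing each context phone with a coarse phonetic class, so triphones whose neighbours share a class fall into the same allophone. The romanized phone symbols and the class table are hypothetical placeholders, not the paper's Korean phone set or its 1,548-allophone inventory.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping from context phones to coarse phonetic classes.
# Symbols and class names are illustrative assumptions, not the paper's.
PHONE_CLASS: Dict[str, str] = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "alveolar", "d": "alveolar", "n": "alveolar", "s": "alveolar",
    "k": "velar", "g": "velar", "ng": "velar",
    "a": "open_vowel", "e": "front_vowel", "i": "front_vowel",
    "o": "back_vowel", "u": "back_vowel",
    "sil": "silence",
}

Triphone = Tuple[str, str, str]  # (left context, center phone, right context)


def allophone_label(left: str, center: str, right: str) -> str:
    """Collapse a triphone into an allophone label by replacing each
    context phone with its phonetic class; the center phone is kept."""
    return f"{PHONE_CLASS[left]}-{center}+{PHONE_CLASS[right]}"


def cluster_triphones(triphones: Iterable[Triphone]) -> Dict[str, List[Triphone]]:
    """Group triphones that map to the same knowledge-based allophone."""
    clusters: Dict[str, List[Triphone]] = {}
    for tri in triphones:
        clusters.setdefault(allophone_label(*tri), []).append(tri)
    return clusters


clusters = cluster_triphones([("p", "a", "t"), ("b", "a", "d"), ("k", "a", "t")])
for label, members in clusters.items():
    print(label, members)
```

Here p-a+t and b-a+d share the label `labial-a+alveolar`, while k-a+t becomes a separate `velar-a+alveolar` allophone, so three triphones reduce to two models.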

Construction and Application of POI Database with Spatial Relations Using SNS (SNS를 이용한 POI 공간관계 데이터베이스 구축과 활용)

  • Kim, Min Gyu;Park, Soo Hong
    • Spatial Information Research / v.22 no.4 / pp.21-38 / 2014
  • Users who search maps often fail to find their destination because they search with a name they already know or one that is commonly used, rather than the formal name of the place. In addition, spatial searches on typical web map services can return information about unintended places: when a spatial search uses terms such as 'nearby' or 'in the vicinity', locations more than 2 km from the current location are retrieved as well. In this research, we calculate the spatial range that humans can perceive by extracting POI data from Twitter (SNS) data and constructing spatial relations with previously built POI data. As a result, the various place names acquired can be used as alternative names for existing POI data, and the new POI data are expected to help select places for POI data construction by identifying places with many POI name variations. We also expect more efficient spatial searching using the diverse spatial vocabulary and the human-perceivable spatial ranges identified in this study.
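A minimal sketch of a radius-limited 'nearby' query, assuming POIs are stored as (name, latitude, longitude) tuples and distance is measured with the haversine great-circle formula on a spherical Earth. The paper derives a human-perceivable range from Twitter data; the `radius_m` value used below is a hypothetical stand-in, not a result from the study.

```python
import math
from typing import List, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical model)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby(origin: Tuple[float, float],
           pois: List[Tuple[str, float, float]],
           radius_m: float) -> List[str]:
    """Return names of POIs within radius_m of origin.
    radius_m stands in for a perceived 'nearby' range (hypothetical value)."""
    lat0, lon0 = origin
    return [name for name, lat, lon in pois
            if haversine_m(lat0, lon0, lat, lon) <= radius_m]


# Toy POIs spaced north of the origin (coordinates are illustrative).
pois = [("A", 37.5665, 126.9780), ("B", 37.5765, 126.9780), ("C", 37.6165, 126.9780)]
print(nearby((37.5665, 126.9780), pois, radius_m=2000.0))
```

With a 2 km radius the query keeps `A` and `B` (about 1.1 km away) and drops `C`, which lies about 5.6 km away; replacing the fixed radius with an empirically derived one is where the study's perceived ranges would plug in.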

Phonetic Question Set Generation Algorithm (음소 질의어 집합 생성 알고리즘)

  • 김성아;육동석;권오일
    • The Journal of the Acoustical Society of Korea / v.23 no.2 / pp.173-179 / 2004
  • Because training data are insufficient in large vocabulary continuous speech recognition, similar context-dependent phones can be clustered by decision trees so that they share data. Building the decision trees and using them to predict unseen triphones requires a phonetic question set. This question set, which contains categories of phones with similar co-articulation effects, is usually generated by phonetic or linguistic experts. This knowledge-based approach, however, may reduce the homogeneity of the clusters. Moreover, the experts must adjust the question sets whenever the language or the PLU (phone-like unit) of a recognition system changes. We therefore propose a data-driven method that generates the phonetic question set automatically. Because the proposed method derives the phone categories from the distribution of the speech data, it does not depend on the language or the PLU and may enhance the homogeneity of the clusters. In large vocabulary speech recognition experiments, the proposed algorithm reduced the error rate by 14.3%.
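The abstract does not describe the clustering procedure behind its data-driven question set. As an illustration only, the sketch below assumes a simple bottom-up (agglomerative) clustering of phones by the Euclidean distance between per-phone mean feature vectors, recording each intermediate cluster as a candidate question "is the context phone in this set?". The input format, the distance measure, and the function names are assumptions, not the authors' algorithm.

```python
import math
from typing import Dict, FrozenSet, List, Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def generate_questions(phone_means: Dict[str, Sequence[float]]) -> List[FrozenSet[str]]:
    """Agglomerative clustering of phones by mean-vector distance (assumed method).
    Each intermediate cluster becomes a candidate question set; the trivial
    set containing every phone is skipped because it never splits the data."""
    clusters = {frozenset([p]): list(v) for p, v in phone_means.items()}
    counts = {frozenset([p]): 1 for p in phone_means}
    questions: List[FrozenSet[str]] = []
    while len(clusters) > 1:
        keys = list(clusters)
        # Closest pair of current clusters.
        _, a, b = min(
            ((euclidean(clusters[x], clusters[y]), x, y)
             for i, x in enumerate(keys) for y in keys[i + 1:]),
            key=lambda t: t[0],
        )
        na, nb = counts.pop(a), counts.pop(b)
        va, vb = clusters.pop(a), clusters.pop(b)
        merged = a | b
        # Count-weighted mean of the merged cluster.
        clusters[merged] = [(na * x + nb * y) / (na + nb) for x, y in zip(va, vb)]
        counts[merged] = na + nb
        if len(clusters) > 1:
            questions.append(merged)
    return questions


# Toy 2-D "mean feature vectors" for four phones (illustrative values).
qs = generate_questions({"p": [0.0, 0.0], "b": [0.1, 0.0], "a": [5.0, 5.0], "e": [5.2, 5.0]})
print(qs)
```

On this toy input the procedure yields the two sets {p, b} and {a, e}; in a real system the vectors would come from the training speech, which is what makes the categories language- and PLU-independent.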

Analyzing BIBFRAME Cases for the Development of BIBFRAME Application Plans in Korea (BIBFRAME 구축 사례 분석을 통한 국내 적용방안에 관한 연구)

  • Lee, Mihwa
    • Journal of Korean Library and Information Science Society / v.49 no.2 / pp.59-78 / 2018
  • This study suggests a concrete plan for applying BIBFRAME in Korea, given the development of BIBFRAME as a library-specific ontology for linked open data (LOD). Several research methods were used: a literature review, case studies of LC and LD4P, and a survey of cataloging librarians to gauge their understanding of linked-data terms and their requirements for constructing LOD. The application plan is as follows. First, publishing name authority data and subject headings as LOD stands out as the starting point, together with publishing as LOD the term lists and vocabularies that libraries have used for controlled vocabulary and data values. Second, it is necessary to develop a Korean BIBFRAME application and extension model, to map KORMARC to BIBFRAME properties and classes, and to develop an editor and MARC-to-BIBFRAME transformation tools. Third, systematic training for cataloging librarians should be designed so that BIBFRAME-related work is regarded as a main part of the librarian's work. This study is expected to contribute to a practical plan for applying BIBFRAME in Korea.

Evaluation of Knowledge Graph for Interoperating Digital Records (디지털 기록의 상호운용을 위한 지식그래프의 평가)

  • Haram Park;Haklae Kim
    • Journal of Korean Society of Archives and Records Management / v.23 no.4 / pp.159-178 / 2023
  • A digital archive is an online platform for preserving and utilizing digital records worthy of continued preservation. However, digital archives in Korea share no standards for functionality, metadata, or technical data principles, which makes it difficult to link distributed digital records. This study proposes a common vocabulary for digital archives to enhance the interoperability of digital records and evaluates the interoperability of a digital archive built with that vocabulary. We collect and analyze data from the digital archive on the 1997 Korean financial crisis to construct a knowledge graph, and compare its interoperability with that of a knowledge graph built with RiC-O. The archive and the knowledge graph were evaluated using the FAIR data principles evaluation framework. The constructed knowledge graph links various objects in the archive and provides contextual information that aids understanding of the archive. The results demonstrate that a knowledge graph built with a common vocabulary significantly improves the linkage, search, and interoperability of digital records compared with a traditional archive.

A Case Study on the Exterior Space Improving in University Campus through the Analysis of User's Cognition - Focused on Campuses in Busan City - (사용자인식 분석을 통한 캠퍼스 외부공간 개선방향 설정에 관한 사례연구 - 부산시 소재 대학을 중심으로 -)

  • Hong, Sung-Min
    • Journal of the Korean Institute of Educational Facilities / v.21 no.1 / pp.33-42 / 2014
  • The purpose of this study is to establish a basis for improving the exterior spaces of university campuses, and thereby the quality of the university educational environment, by analyzing users' cognition and the physical features of campus exterior space. To this end, we surveyed students at six major universities in Busan about their perception of campus exterior space and analyzed users' cognition qualitatively using natural-language vocabulary analysis. Next, we analyzed the physical features of campus exterior space by investigating the spaces users use intensively and the spaces they prefer and do not prefer at their universities, and then proposed directions for improving campus exterior space by comparing the analyzed data on users' cognition and physical features. SPSS 20 was used for data analysis, and the sample consisted of 171 college students.

A Study on the Color Preferences of Genders of Color Image Types - From the Perspectives of Color Application of the Fashion Shop Facade - (색채 이미지 유형에 따른 성별 색채 선호도에 관한 연구 - 패션샵 파사드의 색채 적용 관점에서 -)

  • Yeo, Mi;Lee, Chang-No
    • Korean Institute of Interior Design Journal / v.21 no.1 / pp.136-147 / 2012
  • This study investigated gender differences in color preference as basic data for applying color to fashion shop facades. Using a HUE TONE system ranging from V (vivid) to DK (dark) based on 10 colors of the IRI-120 color chart, we surveyed males and females from their teens upward on color preference, analyzed the results, and presented them as a color matching chart. These results were then proposed as a color guideline through comprehensive analysis. A few conclusions can be drawn from the results. First, the degree of preference was similar for both genders, but visual impressions differed even when the same adjective describing a color image was presented. This suggests that males and females do not share an unchanging, basic conservative disposition, and therefore infer different ideas depending on various environments and factors. Second, females responded more sensitively to colors than males, confirming that the color groups characteristically preferred differ by category. Third, highly preferred color matches for each gender appeared for each adjective across various schemes, such as similar, complementary, separation, and accent color matching. In line with the purpose of this study, a universal empirical theory based on general sensibility was obtained. By quantitatively presenting priorities through the survey and analysis, this study provides basic data for color design planning and indicates the extent of its usability. The results are expected to serve as basic data for invigorating and commercializing color planning for designers and users.

Implementation of HMM Based Speech Recognizer with Medium Vocabulary Size Using TMS320C6201 DSP (TMS320C6201 DSP를 이용한 HMM 기반의 음성인식기 구현)

  • Jung, Sung-Yun;Son, Jong-Mok;Bae, Keun-Sung
    • The Journal of the Acoustical Society of Korea / v.25 no.1E / pp.20-24 / 2006
  • In this paper, we focused on the real-time implementation of a medium-vocabulary speech recognition system intended for mobile phones. First, we developed a PC-based variable vocabulary word recognizer with program memory and acoustic models kept as small as possible. To reduce the memory size of the acoustic models, linear discriminant analysis was applied in feature selection and phonetic tied mixtures were used in HMM training. In addition, a state-based Gaussian selection method with real-time cepstral normalization was used to reduce the computational load and make recognition robust. We then verified the real-time operation of the implemented recognition system on the TMS320C6201 EVM board. The implemented system uses about 610 kbytes of memory, including both program and data memory. The recognition rate was 95.86% for the ETRI 445 DB, and 96.4%, 97.92%, and 87.04% for three name databases collected through mobile phones.
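The abstract mentions real-time cepstral normalization without giving its form. A common frame-synchronous variant is recursive cepstral mean normalization, in which the mean is updated as each frame arrives instead of over the whole utterance; the sketch below assumes that variant, and the smoothing factor `alpha` and the initial mean are hypothetical choices, not values from the paper.

```python
from typing import List, Optional, Sequence


def online_cmn(frames: Sequence[Sequence[float]],
               alpha: float = 0.995,
               init_mean: Optional[Sequence[float]] = None) -> List[List[float]]:
    """Recursive (frame-synchronous) cepstral mean normalization (assumed variant).

    mean_t = alpha * mean_{t-1} + (1 - alpha) * c_t, output c_t - mean_t.
    No look-ahead is needed, so each frame can be normalized as it arrives.
    alpha and init_mean are hypothetical tuning choices.
    """
    if not frames:
        return []
    # Start from a supplied mean (e.g. a long-term average) or the first frame.
    mean = list(init_mean) if init_mean is not None else list(frames[0])
    out: List[List[float]] = []
    for c in frames:
        mean = [alpha * m + (1.0 - alpha) * x for m, x in zip(mean, c)]
        out.append([x - m for x, m in zip(c, mean)])
    return out


# A constant channel offset is removed once the running mean has converged.
print(online_cmn([[1.0, 2.0]] * 3))
```

Because the update needs only the previous mean and the current frame, its cost per frame is one multiply-add per cepstral coefficient, which suits a DSP with tight memory and latency budgets.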

Phoneme Similarity Error Correction System using Bhattacharyya Distance Measurement Method (바타챠랴 거리 측정법을 이용한 음소 유사율 오류 보정 개선 시스템)

  • Ahn, Chan-Shik;Oh, Sang-Yeob
    • Journal of the Korea Society of Computer and Information / v.15 no.6 / pp.73-80 / 2010
  • Vocabulary recognition systems suffer reduced recognition rates because they misrecognize vocabulary and confuse similar phonemes. A method for recognizing similar phonemes that would otherwise go unrecognized, together with an efficient feature extraction process, is therefore required. In this paper, we propose an improved phoneme likelihood error correction system based on the Bhattacharyya distance between phoneme features. Phoneme likelihoods are obtained from monophone training data using an HMM-based feature extraction method, and similar phonemes are steered toward correct recognition using the Bhattacharyya distance measure, which effectively improves the recognition rate. Compared with systems using Euclidean distance measurement and dynamic time warping (DTW), the proposed system improved recognition by 1.2%, reaching 97.91%.
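The Bhattacharyya distance between two Gaussians has a standard closed form. The sketch below assumes each phoneme is represented by a single diagonal-covariance Gaussian, which is a simplification (the abstract does not say how phoneme features are modelled); `most_similar_phoneme` is a hypothetical helper that picks the closest phoneme model.

```python
import math
from typing import Dict, Sequence, Tuple

# (mean vector, diagonal variances); single-Gaussian phoneme model is an assumption.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def bhattacharyya_diag(g1: Gaussian, g2: Gaussian) -> float:
    """Bhattacharyya distance between two diagonal-covariance Gaussians:
    D_B = 1/8 (mu1-mu2)^T S^-1 (mu1-mu2) + 1/2 ln(det S / sqrt(det S1 det S2)),
    with S = (S1 + S2) / 2. With diagonal covariances both terms decompose
    into a sum over dimensions."""
    (mu1, var1), (mu2, var2) = g1, g2
    total = 0.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        v = 0.5 * (v1 + v2)
        total += 0.125 * (m1 - m2) ** 2 / v + 0.5 * math.log(v / math.sqrt(v1 * v2))
    return total


def most_similar_phoneme(target: Gaussian, models: Dict[str, Gaussian]) -> str:
    """Return the phoneme whose model has the smallest distance to target (hypothetical)."""
    return min(models, key=lambda p: bhattacharyya_diag(target, models[p]))


# Toy 1-D models with illustrative values.
models = {"a": ([0.0], [1.0]), "o": ([3.0], [1.0])}
print(most_similar_phoneme(([0.5], [1.0]), models))
```

Unlike a plain Euclidean distance between means, the second term also penalizes differences in variance, so two phonemes with equal means but different spreads are still distinguished.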

Improving methods for normalizing biomedical text entities with concepts from an ontology with (almost) no training data at BLAH5 the CONTES

  • Ferre, Arnaud;Ba, Mouhamadou;Bossy, Robert
    • Genomics & Informatics / v.17 no.2 / pp.20.1-20.5 / 2019
  • Entity normalization, or entity linking in the general domain, is an information extraction task that aims to annotate or bind words and expressions in raw text with semantic references, such as the concepts of an ontology. An ontology minimally consists of a formally organized vocabulary or hierarchy of terms that captures the knowledge of a domain. Machine-learning methods, often coupled with distributional representations, currently achieve good performance. However, they require large training datasets, which are not always available, especially for tasks in specialized domains. CONTES (CONcept-TErm System) is a supervised method that addresses entity normalization with ontology concepts using small training datasets. CONTES has some limitations: it does not scale well to very large ontologies, it tends to overgeneralize its predictions, and it lacks valid representations for out-of-vocabulary words. Here, we assess different methods for reducing the dimensionality of the ontology representation. We also propose calibrating parameters to make the predictions more accurate, and a specific method to address the problem of out-of-vocabulary words.
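The abstract does not spell out the CONTES pipeline. One way to realize the general idea of normalizing terms into an ontology-structured vector space is sketched below, under stated assumptions: each concept is encoded as a binary vector over all concepts marking itself and its ancestors, a linear projection from term embeddings to that space is fitted by least squares, and a term is assigned the concept with the highest cosine similarity. The encoding, the projection, and all function names are illustrative, not the authors' code. In this hypothetical encoding the concept vectors have one dimension per concept, which shows why dimensionality reduction matters for very large ontologies.

```python
from typing import Dict, List, Optional

import numpy as np


def ancestor_vectors(concepts: List[str], parents: Dict[str, Optional[str]]) -> np.ndarray:
    """Binary vector per concept, one dimension per concept: 1 for the concept
    itself and for each of its ancestors (hypothetical encoding)."""
    index = {c: i for i, c in enumerate(concepts)}
    vecs = np.zeros((len(concepts), len(concepts)))
    for c in concepts:
        node = c
        while node is not None:
            vecs[index[c], index[node]] = 1.0
            node = parents.get(node)
    return vecs


def fit_projection(term_emb: np.ndarray, target_vecs: np.ndarray) -> np.ndarray:
    """Least-squares linear map W minimizing ||term_emb @ W - target_vecs||."""
    W, *_ = np.linalg.lstsq(term_emb, target_vecs, rcond=None)
    return W


def normalize_term(x: np.ndarray, W: np.ndarray,
                   concept_vecs: np.ndarray, concepts: List[str]) -> str:
    """Project a term embedding and return the most cosine-similar concept."""
    y = x @ W
    norms = np.linalg.norm(concept_vecs, axis=1) * np.linalg.norm(y) + 1e-12
    sims = concept_vecs @ y / norms
    return concepts[int(np.argmax(sims))]


# Toy ontology and embeddings (illustrative values only).
concepts = ["organism", "bacterium", "plant"]
parents = {"organism": None, "bacterium": "organism", "plant": "organism"}
C = ancestor_vectors(concepts, parents)
X_train = np.eye(3)  # one toy training term embedding per concept
W = fit_projection(X_train, C)
print(normalize_term(np.array([0.0, 1.0, 0.0]), W, C, concepts))
```

Because the ancestor encoding gives related concepts overlapping vectors, an imperfect projection tends to land near a parent concept, which is one plausible source of the overgeneralization the abstract mentions.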