• Title/Summary/Keyword: word database

Speech Database for 3-5 years old Korean Children (만 3-5세 유아의 한국어 음성 데이터베이스 구축)

  • Yoo, Jae-Kwon;Lee, Kyung-Ok;Lee, Kyoung-Mi
    • The Journal of the Korea Contents Association / v.12 no.4 / pp.52-59 / 2012
  • Children develop their language skills rapidly between the ages of 3 and 5. To support children's language development through a variety of experiences, age-appropriate content is needed. Such content calls for speech interfaces designed for children, but no speech database of Korean children exists. In this paper, we develop a Korean speech database of children aged 3 to 5. To collect accurate children's speech, child education experts reviewed the database development process. The words for the database were selected from the MCDI-K in two stages, and each child spoke each word three times. The collected speech was tokenized by child and by word and stored in the database. The speech database will be distributed over the web and, we hope, will serve as a foundation for developing child-oriented content.
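
As a rough illustration of storing recordings tokenized by child and by word, the sketch below uses Python's built-in sqlite3 module. The table layout, column names, child ID, and file paths are all assumptions made for illustration; the abstract does not describe the actual schema.

```python
import sqlite3

def build_speech_db(conn):
    # Hypothetical schema: one row per (child, word, repetition).
    conn.execute(
        """CREATE TABLE utterance (
               child_id   TEXT    NOT NULL,
               word       TEXT    NOT NULL,
               repetition INTEGER NOT NULL CHECK (repetition BETWEEN 1 AND 3),
               wav_path   TEXT    NOT NULL,
               PRIMARY KEY (child_id, word, repetition)
           )"""
    )

def add_utterance(conn, child_id, word, repetition, wav_path):
    conn.execute(
        "INSERT INTO utterance VALUES (?, ?, ?, ?)",
        (child_id, word, repetition, wav_path),
    )

def recordings_for(conn, child_id, word):
    # Return the recordings of one word by one child, in repetition order.
    rows = conn.execute(
        "SELECT wav_path FROM utterance "
        "WHERE child_id = ? AND word = ? ORDER BY repetition",
        (child_id, word),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
build_speech_db(conn)
# Each word is spoken three times, as stated in the abstract.
for rep in (1, 2, 3):
    add_utterance(conn, "child001", "엄마", rep, f"child001/eomma_{rep}.wav")
```

Keying rows on child, word, and repetition makes it possible to retrieve all three recordings of a word for one child, or every child's recordings of one word, with a single query.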

A pilot implementation of Korean in Database Semantics: focusing on numeral-classifier construction (데이터베이스 의미론을 이용한 한국어 구현 시론: 수사-분류사 구조를 중심으로)

  • Choe, Jae-Woong
    • Korean Journal of Cognitive Science / v.18 no.4 / pp.457-483 / 2007
  • Database Semantics (DBS) attempts to provide a comprehensive and integrated approach to human communication that seeks theory-implementation transparency. Two key components of DBS are the Word Bank as a data structure and Left-Associative Grammar (LAG) as an algorithm. This study aims to provide a pilot implementation of Korean in DBS. First, using a simple Korean sentence as an example, we show how the three separate grammar modules in DBS, namely Hear, Think, and Speak, combine to form an integrated system that simulates a cognitive agent. Second, we provide a detailed analysis of a structure characteristic of Korean that involves numerals, classifiers, and nouns, thereby illustrating how DBS can be applied to Korean. We also discuss an issue raised in the literature concerning a problem that arises when the LAG algorithm is applied to the analysis of a head-final language like Korean, and then discuss some possible solutions to the problem.

An Implementation of Intelligent Word Relay Game Considering Characteristics of Real World Language (언어생활을 반영한 지능적 끝말잇기 프로그램 구현)

  • Lim, Heui-Seok
    • Journal of the Korea Academia-Industrial cooperation Society / v.9 no.1 / pp.122-128 / 2008
  • A word relay game contributes to the rehabilitation and treatment of language disorders such as aphasia. Because a computer is better than a human at memorizing very large vocabularies, the computer has a large advantage over people in a word relay game. Such a game reduces the motivation of players and of patients being treated for language disorders. To keep people playing and to be effective in treating language disorders, the game needs to be intelligent and familiar to the person. This paper proposes an implementation of an intelligent Korean word relay game that considers the characteristics of Korean word usage patterns. The game is intelligent in constructing its vocabulary database and in choosing an answer according to the level of the player.
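
The idea of choosing an answer according to the player's level can be sketched as follows. This is a minimal illustration, not the paper's method: the function name `choose_answer`, the integer difficulty levels, and the sample vocabulary are hypothetical, and the code assumes each Hangul syllable is stored as a single precomposed character.

```python
def choose_answer(previous_word, vocabulary, player_level, used_words):
    """Pick the computer's next word in a word relay game.

    vocabulary maps each word to an assumed difficulty level (1 = easiest).
    The answer must start with the last syllable of the previous word,
    must not have been used, and must not exceed the player's level.
    """
    last_syllable = previous_word[-1]
    candidates = [
        word
        for word, level in vocabulary.items()
        if word[0] == last_syllable
        and word not in used_words
        and level <= player_level
    ]
    if not candidates:
        return None  # the computer concedes instead of using a harder word
    # Prefer the hardest word the player can still handle; break ties by word.
    return sorted(candidates, key=lambda w: (-vocabulary[w], w))[0]

# Hypothetical vocabulary with difficulty levels.
vocabulary = {"과자": 1, "과학": 2, "과수원": 3, "자두": 1}
```

Capping the answer at the player's level keeps the computer from exploiting its larger vocabulary, which is the motivation problem the abstract describes.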

A Segmentation-Based HMM and MLP Hybrid Classifier for English Legal Word Recognition (분할기반 은닉 마르코프 모델과 다층 퍼셉트론 결합 영문수표필기단어 인식시스템)

  • 김계경;김진호;박희주
    • Journal of the Korean Institute of Intelligent Systems / v.11 no.3 / pp.200-207 / 2001
  • In this paper, we propose an HMM (hidden Markov model)-MLP (multilayer perceptron) hybrid model for recognizing legal words on English bank checks. We adopt an explicit segmentation-based word-level architecture to implement an HMM engine with nonscaled and non-normalized symbol vectors. We also introduce an MLP for implicit segmentation-based word recognition. The final recognition model consists of a hybrid combination of the HMM and MLP with a new hybrid probability measure. The main contributions of this model are a novel design of segmentation-based variable-length HMMs and an efficient method of combining two heterogeneous recognition engines. Experiments have been conducted using the legal word database of CENPARMI, with encouraging results.

Choosing preferable labels for the Japanese translation of the Human Phenotype Ontology

  • Ninomiya, Kota;Takatsuki, Terue;Kushida, Tatsuya;Yamamoto, Yasunori;Ogishima, Soichi
    • Genomics & Informatics / v.18 no.2 / pp.23.1-23.6 / 2020
  • The Human Phenotype Ontology (HPO) is the de facto standard ontology to describe human phenotypes in detail, and it is actively used, particularly in the field of rare disease diagnoses. For clinicians who are not fluent in English, the HPO has been translated into many languages, and there have been four initiatives to develop Japanese translations. At the Biomedical Linked Annotation Hackathon 6 (BLAH6), a rule-based approach was attempted to determine the preferable Japanese translation for each HPO term among the candidates developed by the four approaches. The relationship between the HPO and Mammalian Phenotype translations was also investigated, with the eventual goal of harmonizing the two translations to facilitate phenotype-based comparisons of species in Japanese through cross-species phenotype matching. In order to deal with the increase in the number of HPO terms and the need for manual curation, it would be useful to have a dictionary containing word-by-word correspondences and fixed translation phrases for English word order. These considerations seem applicable to HPO localization into other languages.

Implementation of A Fast Preprocessor for Isolated Word Recognition (고립단어 인식을 위한 빠른 전처리기의 구현)

  • Ahn, Young-Mok
    • The Journal of the Acoustical Society of Korea / v.16 no.1 / pp.96-99 / 1997
  • This paper proposes a very fast preprocessor for isolated word recognition. The proposed preprocessor extracts candidate words at a small computational cost. To reduce this cost, the preprocessor uses a feature sorting algorithm instead of vector quantization (VQ). To show the effectiveness of our preprocessor, we compared it to a speech recognition system based on semi-continuous hidden Markov models and to a VQ-based preprocessor, measuring their performance on speaker-independent isolated word recognition. For the experiments, we used a speech database consisting of 244 words uttered by 40 male speakers. The speech data from 20 of the male speakers was used for training, and the remaining data for testing. As a result, the accuracy of the proposed preprocessor was 99.9% with a 90% reduction rate for the speech database.

Developing an Alias Management Method based on Word Similarity Measurement for POI Application

  • Choi, Jihye;Lee, Jiyeong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.2 / pp.81-89 / 2019
  • As the need to integrate administrative datasets and address information increases, there is also growing interest in POI (Point of Interest) data as a source of location information across applications and platforms. The purpose of this study is to develop an alias database management method for efficient POI searching, based on POI data representing position. First, we determine the attributes of POI alias data, which individual users use in varied ways. When classifying POI aliases, we excluded typos and names written entirely in the English alphabet. The attributes of POI aliases are classified into four categories, and each category is further divided into three classes according to the strength of the attribute. We then define the quality of the POI aliases classified in this study through experiments. Based on the four POI attributes defined in this study, we developed a method of managing a POI alias through an integrated method composed of word embedding and a similarity measurement. Experimental results show that the proposed POI alias management algorithm can be used when each POI has a small number of aliases with appropriate POI attributes as defined in this study.
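
Combining word embeddings with a similarity measurement might look like the sketch below. The abstract does not name the measure, so cosine similarity is an assumption here, and the three-dimensional vectors, POI names, function `match_alias`, and threshold of 0.8 are hypothetical values chosen only to make the example run.

```python
import math

def cosine_similarity(u, v):
    # Cosine of the angle between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def match_alias(alias_vector, poi_vectors, threshold=0.8):
    """Return the POI whose embedding is most similar to the alias,
    or None if no POI reaches the (assumed) similarity threshold."""
    best_name, best_score = None, threshold
    for name, vector in poi_vectors.items():
        score = cosine_similarity(alias_vector, vector)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Toy embeddings; real ones would come from a trained word embedding model.
poi_vectors = {
    "Seoul Station": [1.0, 0.0, 0.2],
    "City Hall": [0.0, 1.0, 0.1],
}
```

An alias whose vector lies close to one POI is attached to that POI, while an alias that is not clearly similar to any POI is left unmatched rather than being forced onto the nearest one.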

On the Development of a Large-Vocabulary Continuous Speech Recognition System for the Korean Language (대용량 한국어 연속음성인식 시스템 개발)

  • Choi, In-Jeong;Kwon, Oh-Wook;Park, Jong-Ryeal;Park, Yong-Kyu;Kim, Do-Yeong;Jeong, Ho-Young;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea / v.14 no.5 / pp.44-50 / 1995
  • This paper describes a large-vocabulary continuous speech recognition system for the Korean language that uses continuous hidden Markov models. To improve the performance of the system, we study the selection of speech modeling units, inter-word modeling, the search algorithm, and grammars. We use triphones as the basic speech modeling units; generalized triphones and function word-dependent phones are used to improve the trainability of the speech units and to reduce errors in function words. Silence between words is optionally inserted by using a silence model and a null transition. A word pair grammar and a bigram model based on word classes are used. We also implement a search algorithm that finds the N-best candidate sentences. A postprocessor reorders the N-best sentences using a word triple grammar, selects the most likely sentence as the final recognition result, and finally corrects trivial errors related to postpositions. In recognition tests using a 3,000-word continuous speech database, the system attained 93.1% word recognition accuracy and 73.8% sentence recognition accuracy using the word triple grammar in postprocessing.
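
The postprocessing step of reordering N-best sentences can be sketched as below. This is an illustration under stated assumptions, not the system's algorithm: it treats the word triple grammar as a set of permitted three-word sequences and subtracts a fixed, hypothetical penalty from a hypothesis's score for each triple outside that set. The function names, scores, and English example words are invented for the example.

```python
def word_triples(words):
    # All consecutive three-word sequences in a sentence.
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

def rerank_nbest(nbest, allowed_triples, penalty=5.0):
    """Reorder (words, score) hypotheses, best first.

    Each triple not found in allowed_triples lowers the hypothesis
    score by `penalty` (an assumed value).
    """
    def rescored(hypothesis):
        words, score = hypothesis
        violations = sum(
            1 for triple in word_triples(words) if triple not in allowed_triples
        )
        return score - penalty * violations

    return sorted(nbest, key=rescored, reverse=True)

# Hypothetical 2-best list: the second hypothesis has the better first-pass score.
nbest = [
    (["i", "want", "to", "go"], -10.0),
    (["i", "want", "two", "go"], -9.0),
]
allowed_triples = {("i", "want", "to"), ("want", "to", "go")}
best_words, _ = rerank_nbest(nbest, allowed_triples)[0]
```

Here the hypothesis with the better first-pass score contains two triples the grammar does not permit, so the other hypothesis is selected as the final result.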

Network Analysis between Uncertainty Words based on Word2Vec and WordNet (Word2Vec과 WordNet 기반 불확실성 단어 간의 네트워크 분석에 관한 연구)

  • Heo, Go Eun
    • Journal of the Korean Society for Library and Information Science / v.53 no.3 / pp.247-271 / 2019
  • Uncertainty in scientific knowledge refers to a state in which propositions are, at present, neither true nor false. Existing studies have analyzed propositions written in the academic literature and have evaluated the performance of rule-based and machine learning-based approaches using corpora. Although these studies recognized the importance of word construction, there have been few attempts to expand the set of uncertainty words by analyzing their meaning. Meanwhile, network structure analysis using bibliometrics and text mining techniques is widely used to understand intellectual structures and relationships in various disciplines. Therefore, in this study, semantic relations were analyzed by applying Word2Vec to existing uncertainty words. In addition, WordNet, an English vocabulary database and thesaurus, was applied to perform a network analysis based on the hypernym, hyponym, and synonym relations linked to uncertainty words. The semantic and lexical relationships of uncertainty words were structurally identified. As a result, we identified the possibility of automatically expanding uncertainty words.

Cerebral activation in picture naming task including word reading, picture-word matching and semantic categorization

  • Sohn, Hyo-Jeong;Jung, Jae-Bum;Pyun, Sung-Bom;Nam, Ki-Chun
    • Proceedings of the Korean Society for Cognitive Science Conference / 2006.06a / pp.59-60 / 2006
  • To date, there has been minimal research regarding cerebral activation for the Korean language. A database is needed for Korean, whose writing system is quite different from alphabetic systems. This study examined brain activation during picture naming, word reading, picture-word matching, and semantic categorization in Korean. Moreover, we investigated the cortical activation pattern according to the semantic demand of these tasks.
