• Title/Summary/Keyword: Data Word length

Pronunciation Variation Modeling for Korean Point-of-Interest Data Using Prosodic Information (운율 정보를 이용한 한국어 위치 정보 데이타의 발음 모델링)

  • Kim, Sun-He;Park, Jeon-Gue;Na, Min-Soo;Jeon, Je-Hun;Chung, Min-Wha
    • Journal of KIISE:Software and Applications / v.34 no.2 / pp.104-111 / 2007
  • This paper examines how the performance of an automatic speech recognizer was improved for Korean Point-of-Interest (POI) data by modeling pronunciation variation using structural prosodic information such as prosodic words and syllable length. First, multiple pronunciation variants are generated using prosodic words, given that each POI word can be broken down into prosodic words. Then, cross-prosodic-word variations are modeled with the syllable length of the word taken into account. A total of 81 experiments were conducted using 9 test sets (3 baseline and 6 proposed) on 9 trained sets (3 baseline and 6 proposed). The results show that (i) performance improved when the pronunciation lexica were generated using prosodic words; (ii) the best performance was achieved when the maximum number of variants was constrained to 3 based on syllable length; and (iii) compared to the baseline word error rate (WER) of 4.63%, a WER reduction of up to 8.4% was achieved when both prosodic words and syllable length were considered.
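For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate and a relative WER reduction are commonly computed. It is not the authors' evaluation code; the function names are hypothetical. The abstract does not say whether the 8.4% figure is relative or absolute. If it is read as relative to the 4.63% baseline (an assumption), the resulting WER would be roughly 4.24%.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction, expressed as a fraction of the baseline."""
    return (baseline_wer - new_wer) / baseline_wer
```

For example, `word_error_rate("a b c d", "a b c")` counts one deletion against four reference words.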

A Study on Phone Call Big Data Analytics (전화통화 빅데이터 분석에 관한 연구)

  • Kim, Jeongrae;Jeong, Chanki
    • Journal of Information Technology and Architecture / v.10 no.3 / pp.387-397 / 2013
  • This paper proposes an approach to big data analytics for phone call data. The analytical model is composed of the PVPF (Parallel Variable-length Phrase Finding) algorithm, which identifies verbal phrases in natural language, and a word count algorithm, which measures the usage frequency of keywords. In the proposed model, we identify words using the PVPF algorithm and measure the usage frequency of the identified words using the word count algorithm in MapReduce. The results can be interpreted from various viewpoints. We design and implement the model on HDFS (Hadoop Distributed File System) and verify the proposed approach through a case study of phone call data, extracting useful results through analysis of keyword correlation and usage frequency.
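The word count step can be illustrated with a single-process sketch of the map, shuffle, and reduce phases. This is not the paper's implementation and does not run on Hadoop. The PVPF phrase finder is not reproduced: plain whitespace tokenization stands in for it, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every whitespace-separated token in one record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the emitted counts for each word."""
    return {word: sum(values) for word, values in grouped.items()}


def word_count(records):
    """Run the three phases in sequence over an iterable of text records."""
    pairs = chain.from_iterable(map_phase(r) for r in records)
    return reduce_phase(shuffle(pairs))
```

In a real MapReduce job the map and reduce functions would run on separate workers and the framework would perform the shuffle. Here the three phases are ordinary function calls so the data flow is easy to follow.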

Linguistic Characteristics of Domestic Men's Formal Wear Brand Names

  • Kwon, Hae-Sook
    • Journal of Fashion Business / v.14 no.6 / pp.11-22 / 2010
  • The main purpose of this research was to examine the linguistic characteristics of domestic men's formal wear brand names. Four linguistic characteristics were investigated (language type, combined structure type of language, word class, and length of brand name), and differences between brand types were also examined. For sample selection, 209 men's fashion brands were selected from the '2009 Korea Fashion Yearbook', and 25 brands for which adequate information about the brand name or its naming could not be collected were then excluded. Among the remaining 184 men's brand names, 66 men's formal wear brands were selected and studied. For data analysis, quantitative evaluation of frequency and qualitative evaluation were used. The results are as follows. (1) Seven language types were found in domestic men's formal wear brand names. English was used the most, followed by Italian and French. (2) For the combined structure type of brand name language, the single word was used the most, followed by the separately combined word type, the artificially combined word type, and the unified word type. (3) The most frequently used word class was the noun, followed by the phrase, the adjective, and the verb. Among nouns, 6 different types were found, expressing a person, a concrete entity, an abstract entity, a place, an acronym, and a neologism. Among phrases, only the noun type appeared; however, 6 out of 20 phrases were of the abbreviated type. All eight adjective brand names implied an attributive character of the brand, such as 'Dainty' or 'Solus(Solo)'. (4) Long names were used the most, followed by normal and short brand names. By number of syllables, 4-syllable names appeared the most, followed by 3, 5, and 6 syllables, then 2 and 7 syllables (at the same rate), and finally 8 syllables. (5) The comparison by brand type showed differences in language type, combined structure type of language, and word class, but not in length of brand name.

Fixed-point Processing Optimization of MPEG Psychoacoustic Model-II Algorithm for ASIC Implementation (MPEG 심리음향 모델-II 알고리듬의 ASIC 구현을 위한 고정 소수점 연산 최적화)

  • Lee Keun-Sup;Park Young-Cheol;Youn Dae Hee
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.11C / pp.1491-1497 / 2004
  • The psychoacoustic model in the MPEG audio layer-III (MP3) encoder is optimized for fixed-point processing. The optimization process consists of determining the data word length of the arithmetic unit and the algorithm for the transcendental functions that are often used in the psychoacoustic model. To determine the data word length, we defined a statistical model expressing the relation between the fixed-point operation errors of the psychoacoustic model and the probability that the allocated bits are altered due to these errors. Based on simulations using this model, we chose a 24-bit data path and constructed a 24-bit fixed-point MP3 encoder. Sound quality tests using the constructed fixed-point encoder showed a mean degradation of -0.2 on the ITU-R 5-point audio impairment scale.
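The effect of data word length on a transcendental function can be sketched as follows. This is an illustration only and does not reproduce the paper's statistical model. The choice of `log10` as the example function, the input range, and the split of the 24-bit word into 22 fractional bits are all assumptions made for the sketch.

```python
import math


def quantize(x, word_length, frac_bits):
    """Round x to a signed fixed-point value with the given total word length.

    The integer code saturates at the two's-complement limits; the result is
    returned as a float for easy comparison with the exact value.
    """
    scale = 1 << frac_bits
    lo = -(1 << (word_length - 1))
    hi = (1 << (word_length - 1)) - 1
    code = max(lo, min(hi, round(x * scale)))
    return code / scale


def max_log_error(word_length, frac_bits, samples=1000):
    """Largest absolute error when log10(x) for x in [1, 10] is stored in fixed point."""
    worst = 0.0
    for k in range(samples + 1):
        x = 1.0 + 9.0 * k / samples
        exact = math.log10(x)
        worst = max(worst, abs(exact - quantize(exact, word_length, frac_bits)))
    return worst
```

With rounding and no saturation, the error of a single stored value is bounded by half of one least significant bit, so `max_log_error(24, 22)` cannot exceed `2**-23`. A shorter word such as `max_log_error(16, 14)` gives a visibly larger error.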

On the Design of Demodulator and Equalizer of 9600 BPS Modem (9600 BPS Modem의 복조기와 Equalizer에 관한 연구)

  • 장춘서;은종관
    • Journal of the Korean Institute of Telematics and Electronics / v.20 no.4 / pp.10-15 / 1983
  • In this paper, effective methods of demodulation and equalization in a 9600 bps modem are studied. To reduce the number of multiplications required per symbol in demodulation, a method using a decimation filter is presented. For the equalizer, the optimum step size and the steady-state mean-squared error (MSE) are obtained from computer simulation results. The performance of the first-order carrier phase tracking loop is compared with that of the second-order loop when a carrier frequency offset exists. In addition, the finite word length effects in the equalizer are studied.
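The relationship between step size and steady-state MSE can be explored with a small simulation. The sketch below uses a real-valued LMS transversal equalizer with binary training symbols and a hypothetical two-tap channel. The abstract does not give the paper's equalizer structure or channel, and a 9600 bps modem would use a more complex constellation, so every parameter here is an assumption.

```python
import random


def lms_equalize(received, desired, num_taps, step_size):
    """Adapt a transversal equalizer with the LMS rule; return the squared errors."""
    weights = [0.0] * num_taps
    delay_line = [0.0] * num_taps
    squared_errors = []
    for x, d in zip(received, desired):
        delay_line = [x] + delay_line[:-1]            # shift in the newest sample
        y = sum(w * u for w, u in zip(weights, delay_line))
        e = d - y                                     # error against the training symbol
        weights = [w + step_size * e * u for w, u in zip(weights, delay_line)]
        squared_errors.append(e * e)
    return squared_errors


def simulate(step_size, num_symbols=5000, seed=0):
    """Estimate the steady-state MSE for one step size on a hypothetical channel."""
    rng = random.Random(seed)
    symbols = [rng.choice((-1.0, 1.0)) for _ in range(num_symbols)]
    # Assumed channel: each received sample is the symbol plus half the previous one.
    received = [s + 0.5 * p for s, p in zip(symbols, [0.0] + symbols[:-1])]
    errors = lms_equalize(received, symbols, num_taps=5, step_size=step_size)
    steady = errors[-500:]                            # average over the final symbols
    return sum(steady) / len(steady)
```

Calling `simulate` over a range of step sizes and comparing the returned values is one simple way to look for a step size that balances convergence speed against steady-state error.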

Compensation in VC and Word

  • Yun, Il-Sung
    • Phonetics and Speech Sciences / v.2 no.3 / pp.81-89 / 2010
  • Korean and three other languages (English, Arabic, and Japanese) were compared with regard to compensatory movements in a VC (Vowel and Consonant) sequence and in the word. For this, Korean data were collected from an experiment and the other languages' data from the literature. All the test words of the languages had the same syllabic structure, i.e., /CVCV(r)/, where C was an oral stop and intervocalic consonants were either bilabial or alveolar stops. The present study found that (1) Korean is the most striking in the durational variations of segments (the vowel and the following hetero-syllabic consonant); (2) unlike the three other languages, which show a constant sum of VC, Korean yields a three-way distinction in the length of VC according to the type (lax unaspirated vs. tense unaspirated vs. tense aspirated) of the following stop consonant; (3) durational constancy is maintained up to the word level in the three other languages, but Korean word duration varies as a function of the tenseness feature of the intervocalic consonants; (4) consonant duration proves to differentiate Korean the most from the other languages. It is suggested that the durational difference between a lax consonant and its tense cognate(s) and the degree of compensation between V and C are determined by the phonology of each language.

Ordering a Left-branching Language: Heaviness vs. Givenness

  • Choi, Hye-Won
    • Language and Information / v.13 no.1 / pp.39-56 / 2009
  • This paper investigates ordering alternation phenomena in Korean using dative construction data from the Sejong Corpus of Modern Korean (Kim, 2000). The paper first shows that syntactic weight and information structure are distinct and independent factors that influence word order in Korean. Moreover, it reveals that heaviness and givenness compete with each other and exert diverging effects on word order, which contrasts with the converging effects of these factors on word order in right-branching languages like English. The typological variation in the syntactic weight effect poses interesting theoretical and empirical questions, which are discussed in relation to processing efficiency in ordering.

HMM-based Korean Named Entity Recognition (HMM에 기반한 한국어 개체명 인식)

  • Hwang, Yi-Gyu;Yun, Bo-Hyun
    • The KIPS Transactions:PartB / v.10B no.2 / pp.229-236 / 2003
  • Named entity recognition is a process indispensable to question answering and information extraction systems. This paper presents an HMM-based named entity (NE) recognition method using the construction principles of compound words. In Korean, many named entities can be decomposed into more than one word. Moreover, there are contextual relationships among the nouns in an NE, and between an NE and its surrounding words. In this paper, we classify words into a word that is an NE in itself, a word in an NE, and/or a word adjacent to an NE, and train an HMM based on NE-related word types and parts of speech. The proposed named entity recognition (NER) system uses a trigram HMM to handle the variable length of NEs. However, the trigram HMM has a serious data sparseness problem. To solve this problem, we use multi-level back-offs. Experimental results show that our NER system can achieve an F-measure of 87.6% on economic articles.
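The idea of backing off from sparse trigram statistics can be sketched as below. The abstract does not specify the paper's back-off levels or any discounting, so this is a deliberately simplified, count-based illustration that falls back from trigram to bigram to unigram relative frequencies without renormalizing. The class name and the tag labels used in the example are hypothetical.

```python
from collections import Counter


class BackoffTagModel:
    """Tag-transition estimates that fall back from trigram to bigram to unigram counts."""

    def __init__(self, tag_sequences):
        self.tri, self.tri_ctx = Counter(), Counter()
        self.bi, self.bi_ctx = Counter(), Counter()
        self.uni = Counter()
        for tags in tag_sequences:
            padded = ["<s>", "<s>"] + list(tags)     # two start symbols per sequence
            for i in range(2, len(padded)):
                t1, t2, t3 = padded[i - 2], padded[i - 1], padded[i]
                self.tri[(t1, t2, t3)] += 1
                self.tri_ctx[(t1, t2)] += 1
                self.bi[(t2, t3)] += 1
                self.bi_ctx[t2] += 1
                self.uni[t3] += 1
        self.total = sum(self.uni.values())

    def prob(self, t1, t2, t3):
        """Return (estimate, level), where level names the n-gram order actually used."""
        if self.tri[(t1, t2, t3)] > 0:
            return self.tri[(t1, t2, t3)] / self.tri_ctx[(t1, t2)], "trigram"
        if self.bi[(t2, t3)] > 0:
            return self.bi[(t2, t3)] / self.bi_ctx[t2], "bigram"
        if self.uni[t3] > 0:
            return self.uni[t3] / self.total, "unigram"
        return 0.0, "unseen"
```

Because lower-order estimates are used unchanged, the values returned for one context do not sum to one. A production model would apply discounting and back-off weights to keep the distribution normalized.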

Automatic Word Spacing of the Korean Sentences by Using End-to-End Deep Neural Network (종단 간 심층 신경망을 이용한 한국어 문장 자동 띄어쓰기)

  • Lee, Hyun Young;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering / v.8 no.11 / pp.441-448 / 2019
  • Previous research on automatic spacing of Korean sentences has corrected spacing errors by using n-gram based statistical techniques or a morpheme analyzer to insert blanks at word boundaries. In this paper, we propose end-to-end automatic word spacing using a deep neural network. The automatic word spacing problem can be defined as a tag classification problem at the syllable level rather than the word level. For contextual representation between syllables, a Bi-LSTM encodes the dependency relationships between syllables into a fixed-length vector in a continuous vector space using forward and backward LSTM cells. To perform automatic word spacing of Korean sentences, the fixed-length contextual vector produced by the Bi-LSTM is classified into an auto-spacing tag (B or I), and a blank is then inserted in front of each B tag. For tag classification, we build three types of classification networks: a feedforward neural network, a neural network language model, and a linear-chain CRF. To compare the models, we measure automatic word spacing performance for each of the three classification networks. The linear-chain CRF shows better performance than the other models. We used the KCC150 corpus as training and test data.
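The B/I tagging scheme itself is independent of the classifier and can be sketched in a few lines. The code below shows how per-syllable tags can be derived from a correctly spaced sentence and how a tag sequence is turned back into a spaced sentence. The function names are hypothetical, the neural classifier is not included, and skipping the blank before the very first syllable is a choice made for this sketch.

```python
def apply_spacing(syllables, tags):
    """Rebuild a spaced sentence from per-syllable B/I tags.

    'B' marks the first syllable of a word, 'I' a syllable inside a word.
    """
    if len(syllables) != len(tags):
        raise ValueError("need exactly one tag per syllable")
    pieces = []
    for index, (syllable, tag) in enumerate(zip(syllables, tags)):
        if tag == "B" and index > 0:      # no leading blank before the first word
            pieces.append(" ")
        pieces.append(syllable)
    return "".join(pieces)


def tags_from_spaced(sentence):
    """Derive syllables and B/I training tags from a correctly spaced sentence."""
    syllables, tags = [], []
    for word in sentence.split():
        for position, syllable in enumerate(word):
            syllables.append(syllable)
            tags.append("B" if position == 0 else "I")
    return syllables, tags
```

For the sentence `"나는 학교에 간다"`, `tags_from_spaced` yields the tags `B I B I I B I`, and passing the syllables and tags to `apply_spacing` restores the original spacing.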

Design of Downlink Beamforming Transmitter in OFDMA/TDD system (OFDMA/TDD 시스템의 하향링크 빔형성 송신기 설계)

  • Park Hyeong-Sook;Park Youn-Ok;Kim Cheol-Sung
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.5A / pp.493-500 / 2006
  • This paper presents an efficient structure and parameter optimization for a downlink beamforming transmitter in an OFDMA/TDD system. In designing a downlink beamforming transmitter for multiple transmit antennas, an efficient beamforming structure for multiple users and the choice of word length for each block are critical to performance and hardware complexity. We propose an efficient beamforming scheme that stores the subcarrier weights in memory without user identification at the receiver of the base station and calculates the weights for the corresponding user, per subcarrier of the IFFT input, at high speed. We also obtain the word length of the main data path and other design parameters through fixed-point simulation analysis. The proposed architecture can reduce the memory size, which is proportional to the maximum number of users per frame, and the processing time of an OFDM symbol at the receiver of the base station, without requiring additional processing time for calculating the weights at the transmitter.
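Choosing a word length by fixed-point simulation usually means sweeping candidate precisions and comparing each against a floating-point reference. The sketch below does this for a single complex weight-times-symbol multiplication and reports the signal-to-quantization-noise ratio (SQNR). It is an illustration only: the uniform test data, the SQNR criterion, and the function names are assumptions, and the paper's actual simulation setup is not described in the abstract.

```python
import math
import random


def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (saturation is ignored here)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def quantize_complex(z, frac_bits):
    """Quantize the real and imaginary parts independently."""
    return complex(quantize(z.real, frac_bits), quantize(z.imag, frac_bits))


def sqnr_db(frac_bits, num_samples=2000, seed=1):
    """SQNR of a quantized weight-times-symbol product against floating point."""
    rng = random.Random(seed)
    signal = noise = 0.0
    for _ in range(num_samples):
        w = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))   # stand-in beamforming weight
        s = complex(rng.uniform(-1, 1), rng.uniform(-1, 1))   # stand-in subcarrier symbol
        exact = w * s
        approx = quantize_complex(
            quantize_complex(w, frac_bits) * quantize_complex(s, frac_bits), frac_bits)
        signal += abs(exact) ** 2
        noise += abs(exact - approx) ** 2
    return 10 * math.log10(signal / noise)


def smallest_frac_bits(target_db, candidates=range(4, 25)):
    """First candidate number of fractional bits whose SQNR meets the target."""
    for frac_bits in candidates:
        if sqnr_db(frac_bits) >= target_db:
            return frac_bits
    return None
```

The function counts fractional bits only; the total word length of a data path would add the sign and integer bits needed to avoid overflow. Each extra fractional bit is expected to improve the SQNR by roughly 6 dB, which is why a sweep like this tends to show a clear knee at the target.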