• Title/Summary/Keyword: word and segment classification

Search Result 8, Processing Time 0.022 seconds

An Algorithm for Text Image Watermarking based on Word Classification (단어 분류에 기반한 텍스트 영상 워터마킹 알고리즘)

  • Kim Young-Won;Oh Il-Seok
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.8
    • /
    • pp.742-751
    • /
    • 2005
  • This paper proposes a novel text image watermarking algorithm based on word classification. The words are classified into K classes using simple features. Several adjacent words are grouped into a segment. and the segments are also classified using the word class information. The same amount of information is inserted into each of the segment classes. The signal is encoded by modifying some inter-word spaces statistics of segment classes. Subjective comparisons with conventional word-shift algorithms are presented under several criteria.

Discriminative Training of Stochastic Segment Model Based on HMM Segmentation for Continuous Speech Recognition

  • Chung, Yong-Joo;Un, Chong-Kwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.4E
    • /
    • pp.21-27
    • /
    • 1996
  • In this paper, we propose a discriminative training algorithm for the stochastic segment model (SSM) in continuous speech recognition. As the SSM is usually trained by maximum likelihood estimation (MLE), a discriminative training algorithm is required to improve the recognition performance. Since the SSM does not assume the conditional independence of observation sequence as is done in hidden Markov models (HMMs), the search space for decoding an unknown input utterance is increased considerably. To reduce the computational complexity and starch space amount in an iterative training algorithm for discriminative SSMs, a hybrid architecture of SSMs and HMMs is programming using HMMs. Given the segment boundaries, the parameters of the SSM are discriminatively trained by the minimum error classification criterion based on a generalized probabilistic descent (GPD) method. With the discriminative training of the SSM, the word error rate is reduced by 17% compared with the MLE-trained SSM in speaker-independent continuous speech recognition.

  • PDF

A Study on the Phonemic Analysis for Korean Speech Segmentation (한국어 음소분리에 관한 연구)

  • Lee, Sou-Kil;Song, Jeong-Young
    • The Journal of the Acoustical Society of Korea
    • /
    • v.23 no.4E
    • /
    • pp.134-139
    • /
    • 2004
  • It is generally known that accurate segmentation is very necessary for both an individual word and continuous utterances in speech recognition. It is also commonly known that techniques are now being developed to classify the voiced and the unvoiced, also classifying the plosives and the fricatives. The method for accurate recognition of the phonemes isn't yet scientifically established. Therefore, in this study we analyze the Korean language, using the classification of 'Hunminjeongeum' and contemporary phonetics, with the frequency band, Mel band and Mel Cepstrum, we extract notable features of the phonemes from Korean speech and segment speech by the unit of the phonemes to normalize them. Finally, through the analysis and verification, we intend to set up Phonemic Segmentation System that will make us able to adapt it to both an individual word and continuous utterances.

Unsupervised Word Grouping Algorithm for real-time implementation of Medium vocabulary recognition (중규모급 단어 인식기의 실시간 구현을 위한 무감독 단어집단화 알고리듬)

  • Lim Dong Sik;Kim Jin Young;Baek Seong Joon
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • autumn
    • /
    • pp.81-84
    • /
    • 1999
  • 본 논문에서는 중규모급 단어인식기의 실시간 구현을 위한 무감독 단어집단화 알고리듬을 제안한다. 무감독 단어집단화는 인식대상 어휘 수가 많은 대용량 음성인식 시스템에서 대상 어휘 수를 줄여주는 역할을 하는 전처리기의 성격을 갖는다. 무감독 집단화를 위해 각 단어의 유$\cdot$무성음 고유의 특성을 잘 반영할 수 있는 특징 파라미터 5개를 사용하여 패턴 인식과 회귀분석에서 널리 사용되고 있는 분류$\cdot$회귀트리(Classification And Regression Tree)에 적용시키는 방법으로 접근하였고, 각 단어의 frame 수를 일정하게 n개로 분할(segment)하여 1개의 tree를 생성시키는 방법과 각 segment에 해당하는 tree를 생성시켜 segment들 사이의 교집합 성분으로 단어들을 집단화 하였다 실험결과 탐색 대상단어 22개에서 평균2.21개로 줄어 전체 대상 단어의 $10\%$만을 탐색하여 인식할 수 있는 방법을 제시할 수 있었다.

  • PDF

Word Segmentation in Handwritten Korean Text Lines based on GAP Clustering (GAP 군집화에 기반한 필기 한글 단어 분리)

  • Jeong, Seon-Hwa;Kim, Soo-Hyung
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.6
    • /
    • pp.660-667
    • /
    • 2000
  • In this paper, a word segmentation method for handwritten Korean text line images is proposed. The method uses gap information to segment words in line images, where the gap is defined as a white run obtained after vertical projection of line images. Each gap is assigned to one of inter-word gap and inter-character gap based on gap distance. We take up three distance measures which have been proposed for the word segmentation of handwritten English text line images. Then we test three clustering techniques to detect the best combination of gap metrics and classification techniques for Korean text line images. The experiment has been done with 305 text line images extracted manually from live mail pieces. The experimental result demonstrates the superiority of BB(Bounding Box) distance measure and sequential clustering approach, in which the cumulative word segmentation accuracy up to the third hypothesis is 88.52%. Given a line image, the processing time is about 0.05 second.

  • PDF

Towards Improving Causality Mining using BERT with Multi-level Feature Networks

  • Ali, Wajid;Zuo, Wanli;Ali, Rahman;Rahman, Gohar;Zuo, Xianglin;Ullah, Inam
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.10
    • /
    • pp.3230-3255
    • /
    • 2022
  • Causality mining in NLP is a significant area of interest, which benefits in many daily life applications, including decision making, business risk management, question answering, future event prediction, scenario generation, and information retrieval. Mining those causalities was a challenging and open problem for the prior non-statistical and statistical techniques using web sources that required hand-crafted linguistics patterns for feature engineering, which were subject to domain knowledge and required much human effort. Those studies overlooked implicit, ambiguous, and heterogeneous causality and focused on explicit causality mining. In contrast to statistical and non-statistical approaches, we present Bidirectional Encoder Representations from Transformers (BERT) integrated with Multi-level Feature Networks (MFN) for causality recognition, called BERT+MFN for causality recognition in noisy and informal web datasets without human-designed features. In our model, MFN consists of a three-column knowledge-oriented network (TC-KN), bi-LSTM, and Relation Network (RN) that mine causality information at the segment level. BERT captures semantic features at the word level. We perform experiments on Alternative Lexicalization (AltLexes) datasets. The experimental outcomes show that our model outperforms baseline causality and text mining techniques.

A Study on Classification into Hangeul and Hanja in Text Area of Printed Document (인쇄체 문서의 문자영역에서 한글과 한자의 구별에 관한 연구)

  • 심상원;이성범;남궁재찬
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.6
    • /
    • pp.802-814
    • /
    • 1993
  • This paper propose an algorithm for preprocessing of character recognition, which classify characters into Hangeul and Hanja. In this study, we use the 9 structural chacteristics of Hanja which isn't affected by deformation of size and style of characters and rates based on character size to classify characters. Firstly, we process the blocking to segment each characters. Secondly, on this segmented characters, we apply algorithm proposed in this paper to classify Hangeul and Hanja. Finally, we classify characters into Hangeul and Hanja, respectively. An experiment with 2350 Hangeul and 4888 Hanja printed Gothic and Mincho style of KS-C 5601 are carried out. We experiment on typeface sample book, newspapers, academic society's papers, magazines, textbooks and documents written out word processor to obtain the classifying rates of 98.8%, 92%, 96%, 98% and 98%, respectively.

  • PDF

Consonant-Vowel Classification Based Segmentation Technique for Handwritten Off-Line Hangul (자소 클래스 인식에 의한 off-line 필기체 한글 문자 분할)

  • Hwang, Sun-Ja;Kim, Mun-Hyeon
    • The Transactions of the Korea Information Processing Society
    • /
    • v.3 no.4
    • /
    • pp.1002-1013
    • /
    • 1996
  • The segmentation of characters is an important step in the automatic recognition of handwritten text. This paper proposes the segmenting method of off-line handwritten Hangul. The suggested approach is based on the structural characteristics of Hangul. The first step extracts the local features. connected component and strokes from the imput word. In the second step we identify the class of strokes. The third segmenting step specifies WRC(White Run Column) before consonant or horizontal vowel. If the segment is longer than threshold, the system estimates segmenting columns using the consonant-vowel information and column features, and then finds a cornered boundary along the strokes within the estimated segmenting columns.

  • PDF