• Title/Summary/Keyword: 단어벡터

Search Result 300, Processing Time 0.025 seconds

(The Classification Method of the Document Plagiarism Similarity based on Similar Syntagma Tree and Non-Index Term) (유사 어절 트리와 비 색인어 기반의 문서 표절 유사도 분류 방법)

  • 천승환;김미영;이귀상
    • Journal of the Korea Computer Industry Society
    • /
    • v.3 no.8
    • /
    • pp.1039-1048
    • /
    • 2002
  • It is difficult and laborious to distinguish between the original and the plagiarism about the electrical documents or on-line received documents, specially student homeworks because in many case, the homeworks are written on the same subject. Existing methods are not appropriate to solve this problem, which find the most appropriate category using the expression frequency of index term in documents to be classified. In this paper, a new classification method was proposed to distinguish between the original and the plagiarism about documents which were written similarly which is based on the syntagma vector - except the similar syntagma tree structure and non-index term.

  • PDF

Korean Onomatopoeia Clustering for Sound Database (음향 DB 구축을 위한 한국어 의성어 군집화)

  • Kim, Myung-Gwan;Shin, Young-Suk;Kim, Young-Rye
    • Journal of Korea Multimedia Society
    • /
    • v.11 no.9
    • /
    • pp.1195-1203
    • /
    • 2008
  • Onomatopoeia of korean documents is to represent from natural or artificial sound to human language and it can express onomatopoeia language which is the nearest an object and also able to utilize as standard for clustering of Multimedia data. In this study, We get frequency of onomatopoeia in the experiment subject and select 100 onomatopoeia of use to our study In order to cluster onomatopoeia's relation, we extract feature of similarity and distance metric and then represent onomatopoeia's relation on vector space by using PCA. At the end, we can clustering onomatopoeia by using k-means algorithm.

  • PDF

Speaker Adaptation Using Linear Transformation Network in Speech Recognition (선형 변환망을 이용한 화자적응 음성인식)

  • 이기희
    • Journal of the Korea Society of Computer and Information
    • /
    • v.5 no.2
    • /
    • pp.90-97
    • /
    • 2000
  • This paper describes an speaker-adaptive speech recognition system which make a reliable recognition of speech signal for new speakers. In the Proposed method, an speech spectrum of new speaker is adapted to the reference speech spectrum by using Parameters of a 1st linear transformation network at the front of phoneme classification neural network. And the recognition system is based on semicontinuous HMM(hidden markov model) which use the multilayer perceptron as a fuzzy vector quantizer. The experiments on the isolated word recognition are performed to show the recognition rate of the recognition system. In the case of speaker adaptation recognition, the recognition rate show significant improvement for the unadapted recognition system.

  • PDF

Lossless Coding Scheme for Lattice Vector Quantizer Using Signal Set Partitioning Method (Signal Set Partitioning을 이용한 격자 양자화의 비 손실 부호화 기법)

  • Kim, Won-Ha
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.38 no.6
    • /
    • pp.93-105
    • /
    • 2001
  • In the lossless step of Lattice Vector Quantization(LVQ), the lattice codewords produced at quantization step are enumerated into radius sequence and index sequence. The radius sequence is run-length coded and then entropy coded, and the index sequence is represented by fixed length binary bits. As bit rate increases, the index bit linearly increases and deteriorates the coding performances. To reduce the index bits across the wide range of bit rates, we developed a novel lattice enumeration algorithm adopting the set partitioning method. The proposed enumeration method shifts down large index values to smaller ones and so reduces the index bits. When the proposed lossless coding scheme is applied to a wavelet based image coding, the proposed scheme achieves more than 10% at bit rates higher than 0.3 bits/pixel over the conventional lossless coding method, and yields more improvement as bit rate becomes higher.

  • PDF

The Recognition of Korean Syllables using Parameter Based on Principal Component Analysis (PCA 기반 파라메타를 이용한 숫자음 인식)

  • 박경훈;표창수;김창근;허강인
    • Proceedings of the Korea Institute of Convergence Signal Processing
    • /
    • 2000.12a
    • /
    • pp.181-184
    • /
    • 2000
  • The new method of feature extraction is proposed, considering the statistic feature of human voice, unlike the conventional methods of voice extraction. PCA(principal Component Analysis) is applied to this new method. PCA removes the repeating of data after finding the axis direction which has the greatest variance in input dimension. Then the new method is applied to real voice recognition to assess performance. When results of the number recognition in this paper and the conventional Mel-Cepstrum of voice feature parameter are compared, there is 0.5% difference of recognition rate. Better recognition rate is expected than word or sentence recognition in that less convergence time than the conventional method in extracting voice feature. Also, better recognition tate is expected when the optimum vector is used by statistic feature of data.

  • PDF

A Study on the Product Planning Model based on Word2Vec using On-offline Comment Analysis (온·오프라인 댓글 분석이 활용된 Word2Vec 기반 상품기획 모델연구)

  • Ahn, Yeong-Hwi;Jung, Jin-Young;Park, Koo-Rack
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2021.07a
    • /
    • pp.79-80
    • /
    • 2021
  • 인터넷은 우리 경제를 디지털 경제로 변화시키며 전자상거래도 증가하고 있다. 따라서 구매자가 전자상거래에서 남기는 긍정적인, 부정적인 상품평은 상품기획의 주요 정보가 될 수 있다. 본 논문에서는 버티컬 무소음 마우스 10,000개에 대한 정형화된 데이터셋을 Word2Vec을 이용하여 유사도 분석, 온라인 상품평 빈도분석 상위 50개 단어를 제시하여 실제 상품을 사용한 후 설문조사 시행을 하였다. 온라인 상품평 유사도 분석결과 클릭 키워드에 대한 장점으로 통증(.986), 디자인(.982)가 분석되었으며 단점은 적응(.866), 불편(.854)이었다. 오프라인 상품평에서는 장점으로 디자인(17명), 단점으로 불편(11명)이었다. 또한 온라인과 오프라인의 상품평을 비교함으로써 구매자의 긍정, 부정의 의미를 교차 확인하여 유의미한 정보를 제시 하였다고 볼수 있다. 따라서 본 연구에서 제시하는 상품기획 프로세스를 신상품 개발 및 기존 상품의 개선 전략으로 적용할 수 있겠다.

  • PDF

Temporal Relationship Extraction for Natural Language Texts by Using Deep Bidirectional Language Model (양방향 언어 모델을 활용한 자연어 텍스트의 시간 관계정보 추출 기법)

  • Lim, Chae-Gyun;Choi, Ho-Jin
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.81-84
    • /
    • 2019
  • 자연어 문장으로 작성된 문서들에는 대체적으로 시간에 관련된 정보가 포함되어 있을 뿐만 아니라, 문서의 전체 내용과 문맥을 이해하기 위해서 이러한 정보를 정확하게 인식하는 것이 중요하다. 주어진 문서 내에서 시간 정보를 발견하기 위한 작업으로는 시간적인 표현(time expression) 자체를 인식하거나, 시간 표현과 연관성이 있는 사건(event)을 찾거나, 시간 표현 또는 사건 간에서 발생하는 시간적 연관 관계(temporal relationship)를 추출하는 것이 있다. 문서에 사용된 언어에 따라 고유한 언어적 특성이 다르기 때문에, 만약 시간 정보에 대한 관계성을 고려하지 않는다면 주어진 문장들로부터 모든 시간 정보를 추출해내는 것은 상당히 어려운 일이다. 본 논문에서는, 양방향 구조로 학습된 심층 신경망 기반 언어 모델을 활용하여 한국어 입력문장들로부터 시간 정보를 발견하는 작업 중 하나인 시간 관계정보를 추출하는 기법을 제안한다. 이 기법은 주어진 단일 문장을 개별 단어 토큰들로 분리하여 임베딩 벡터로 변환하며, 각 토큰들의 잠재적 정보를 고려하여 문장 내에 어떤 유형의 시간 관계정보가 존재하는지를 인식하도록 학습시킨다. 또한, 한국어 시간 정보 주석 말뭉치를 활용한 실험을 수행하여 제안 기법의 시간 관계정보 인식 정확도를 확인한다.

  • PDF

An Attention Method-based Deep Learning Encoder for the Sentiment Classification of Documents (문서의 감정 분류를 위한 주목 방법 기반의 딥러닝 인코더)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.4
    • /
    • pp.268-273
    • /
    • 2017
  • Recently, deep learning encoder-based approach has been actively applied in the field of sentiment classification. However, Long Short-Term Memory network deep learning encoder, the commonly used architecture, lacks the quality of vector representation when the length of the documents is prolonged. In this study, for effective classification of the sentiment documents, we suggest the use of attention method-based deep learning encoder that generates document vector representation by weighted sum of the outputs of Long Short-Term Memory network based on importance. In addition, we propose methods to modify the attention method-based deep learning encoder to suit the sentiment classification field, which consist of a part that is to applied to window attention method and an attention weight adjustment part. In the window attention method part, the weights are obtained in the window units to effectively recognize feeling features that consist of more than one word. In the attention weight adjustment part, the learned weights are smoothened. Experimental results revealed that the performance of the proposed method outperformed Long Short-Term Memory network encoder, showing 89.67% in accuracy criteria.

Development and Validation of the Letter-unit based Korean Sentimental Analysis Model Using Convolution Neural Network (회선 신경망을 활용한 자모 단위 한국형 감성 분석 모델 개발 및 검증)

  • Sung, Wonkyung;An, Jaeyoung;Lee, Choong C.
    • The Journal of Society for e-Business Studies
    • /
    • v.25 no.1
    • /
    • pp.13-33
    • /
    • 2020
  • This study proposes a Korean sentimental analysis algorithm that utilizes a letter-unit embedding and convolutional neural networks. Sentimental analysis is a natural language processing technique for subjective data analysis, such as a person's attitude, opinion, and propensity, as shown in the text. Recently, Korean sentimental analysis research has been steadily increased. However, it has failed to use a general-purpose sentimental dictionary and has built-up and used its own sentimental dictionary in each field. The problem with this phenomenon is that it does not conform to the characteristics of Korean. In this study, we have developed a model for analyzing emotions by producing syllable vectors based on the onset, peak, and coda, excluding morphology analysis during the emotional analysis procedure. As a result, we were able to minimize the problem of word learning and the problem of unregistered words, and the accuracy of the model was 88%. The model is less influenced by the unstructured nature of the input data and allows for polarized classification according to the context of the text. We hope that through this developed model will be easier for non-experts who wish to perform Korean sentimental analysis.

A Study on the Automatic Speech Control System Using DMS model on Real-Time Windows Environment (실시간 윈도우 환경에서 DMS모델을 이용한 자동 음성 제어 시스템에 관한 연구)

  • 이정기;남동선;양진우;김순협
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.3
    • /
    • pp.51-56
    • /
    • 2000
  • Is this paper, we studied on the automatic speech control system in real-time windows environment using voice recognition. The applied reference pattern is the variable DMS model which is proposed to fasten execution speed and the one-stage DP algorithm using this model is used for recognition algorithm. The recognition vocabulary set is composed of control command words which are frequently used in windows environment. In this paper, an automatic speech period detection algorithm which is for on-line voice processing in windows environment is implemented. The variable DMS model which applies variable number of section in consideration of duration of the input signal is proposed. Sometimes, unnecessary recognition target word are generated. therefore model is reconstructed in on-line to handle this efficiently. The Perceptual Linear Predictive analysis method which generate feature vector from extracted feature of voice is applied. According to the experiment result, but recognition speech is fastened in the proposed model because of small loud of calculation. The multi-speaker-independent recognition rate and the multi-speaker-dependent recognition rate is 99.08% and 99.39% respectively. In the noisy environment the recognition rate is 96.25%.

  • PDF