• Title/Abstract/Keywords: Square Language

88 results (processing time: 0.024 s)

A Unicode based Deep Handwritten Character Recognition model for Telugu to English Language Translation

  • BV Subba Rao;J. Nageswara Rao;Bandi Vamsi;Venkata Nagaraju Thatha;Katta Subba Rao
    • International Journal of Computer Science & Network Security / Vol. 24, No. 2 / pp.101-112 / 2024
  • Telugu is considered the fourth most used language in India, especially in Andhra Pradesh, Telangana, and Karnataka, and it is increasingly spoken in other countries as well. The script comprises dependent and independent vowels, consonants, and digits. Despite this, Telugu Handwritten Character Recognition (HCR) has received little attention. HCR is a neural network technique that converts a document image into editable text that other applications can use, which saves the time and effort of starting over from the beginning every time. In this work, a Unicode-based Handwritten Character Recognition (U-HCR) model is developed for translating handwritten Telugu characters into English. Using the Centre of Gravity (CG), the model divides a compound character into individual characters with the help of Unicode values. The model is trained on both online and offline Telugu character datasets. Features are extracted from the scanned image with a convolutional neural network (CNN) together with machine learning classifiers such as Random Forest and Support Vector Machine. The Stochastic Gradient Descent (SGD), Root Mean Square Propagation (RMS-P), and Adaptive Moment Estimation (ADAM) optimizers are used to improve the performance of U-HCR and to reduce the loss value of the CNN. On both online and offline datasets, the proposed model showed promising results, with accuracies of 90.28% for SGD, 96.97% for RMS-P, and 93.57% for ADAM.
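The three optimizers named in the abstract differ only in how they turn a gradient into a parameter update. The following is a minimal sketch of the textbook update rules on a one-dimensional toy loss, not the authors' training code. The loss `f(w) = (w - 3)^2`, the function names, and every hyperparameter value are assumptions chosen for illustration.

```python
import math


def grad(w):
    # Gradient of the assumed toy loss f(w) = (w - 3)^2 (minimum at w = 3).
    return 2.0 * (w - 3.0)


def sgd(w, steps=500, lr=0.1):
    # Plain gradient descent: step against the gradient with a fixed rate.
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def rmsprop(w, steps=1000, lr=0.01, decay=0.9, eps=1e-8):
    # RMSProp: divide the step by a running root-mean-square of gradients.
    avg_sq = 0.0
    for _ in range(steps):
        g = grad(w)
        avg_sq = decay * avg_sq + (1.0 - decay) * g * g
        w -= lr * g / (math.sqrt(avg_sq) + eps)
    return w


def adam(w, steps=500, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: bias-corrected running means of the gradient and its square.
    m = 0.0
    v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


if __name__ == "__main__":
    for name, optimizer in [("SGD", sgd), ("RMSProp", rmsprop), ("Adam", adam)]:
        print(name, optimizer(0.0))
```

All three move `w` from 0 toward 3. With a fixed learning rate, RMSProp keeps taking steps of roughly `lr` near the minimum, so it settles close to 3 rather than exactly on it.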

Multi-labeled Domain Detection Using CNN

  • 최경호;김경덕;김용희;강인호
    • 한국정보과학회 언어공학연구회:학술대회논문집(한글 및 한국어 정보처리) / 한국정보과학회언어공학연구회 2017년도 제29회 한글 및 한국어 정보처리 학술대회 / pp.56-59 / 2017
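  • Using a CNN (Convolutional Neural Network), we perform the multi-label utterance domain classification task with a multi-labeling method and a cluster method, and evaluate each method with MSE (Mean Square Error), softmax cross-entropy, and sigmoid cross-entropy. The network tokenizes the input at the syllable level and takes as input a sequence in which part-of-speech information is added to each token, together with named entity information obtained from the Naver DB. In the experiments, the best performance, an F1 score of 0.9873, was obtained when the problem was recast with the cluster method and the network was trained with sigmoid as the activation function of the output layer and cross-entropy as the cost function.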
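The three objectives compared in the abstract can be written out for a single utterance with several active domain labels. This is a minimal sketch of the standard definitions, not the authors' network. The function names are hypothetical, applying MSE to sigmoid outputs is an assumption, and so is normalizing a multi-hot target to a distribution for softmax cross-entropy.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mse(logits, targets):
    # Mean squared error between sigmoid outputs and 0/1 targets (assumed setup).
    probs = [sigmoid(z) for z in logits]
    return sum((p - t) ** 2 for p, t in zip(probs, targets)) / len(targets)


def sigmoid_cross_entropy(logits, targets):
    # One independent binary cross-entropy per label, summed over labels.
    # Uses the numerically stable form max(z, 0) - z*t + log(1 + exp(-|z|)).
    return sum(
        max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
        for z, t in zip(logits, targets)
    )


def softmax_cross_entropy(logits, targets):
    # Cross-entropy against a single distribution over labels. A multi-hot
    # target is normalized to sum to 1 here, which is an assumption.
    total = sum(targets)
    dist = [t / total for t in targets]
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return -sum(q * (z - log_norm) for q, z in zip(dist, logits))


if __name__ == "__main__":
    logits = [10.0, 10.0, -10.0]   # confident in the first two domains
    targets = [1.0, 1.0, 0.0]      # both domains are correct
    print(mse(logits, targets))
    print(sigmoid_cross_entropy(logits, targets))
    print(softmax_cross_entropy(logits, targets))
```

With two correct labels, the softmax loss cannot fall below log 2 because the labels compete for probability mass, whereas the sigmoid loss can approach zero. That property is one reason sigmoid outputs are a common choice for multi-label problems.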


Syllable-Level Smoothing of Model Parameters for HMM-Based Mixed-Lingual Text-to-Speech

  • 양종열;김홍국
    • 말소리와 음성과학 / Vol. 2, No. 1 / pp.87-95 / 2010
  • In this paper, we address issues associated with mixed-lingual text-to-speech based on context-dependent HMMs, where a separate set of HMMs corresponds to each language. In particular, we propose techniques for smoothing synthesis parameters at the boundaries between different languages to obtain more natural speech quality. Specifically, mel-frequency cepstral coefficients (MFCCs) at the language boundaries are smoothed by applying several linear and nonlinear approximation techniques. An informal listening test shows that synthesized speech smoothed by a modified version of linear least squares approximation (MLLSA) or by a quadratic interpolation (QI) method is preferred to speech synthesized without any smoothing technique.
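To make the idea of boundary smoothing concrete, the sketch below fits an ordinary least-squares line to one parameter track in a window around a language boundary and replaces the frames in that window with the fitted values. It uses plain linear least squares, not the paper's modified version (MLLSA), whose details the abstract does not give. The function names, the window size, and the toy track are assumptions.

```python
def least_squares_line(ys):
    # Fit y = intercept + slope * x to the points (0, ys[0]), (1, ys[1]), ...
    n = len(ys)
    x_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def smooth_boundary(track, boundary, half_width=3):
    # Replace frames in [boundary - half_width, boundary + half_width) with
    # the fitted line. The window must contain at least two frames.
    lo = max(0, boundary - half_width)
    hi = min(len(track), boundary + half_width)
    slope, intercept = least_squares_line(track[lo:hi])
    smoothed = list(track)
    for i in range(lo, hi):
        smoothed[i] = intercept + slope * (i - lo)
    return smoothed


def max_jump(track):
    # Largest frame-to-frame change, a rough measure of discontinuity.
    return max(abs(b - a) for a, b in zip(track, track[1:]))


if __name__ == "__main__":
    # Toy coefficient track with an abrupt change at frame 4 (assumed data).
    track = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
    smoothed = smooth_boundary(track, boundary=4, half_width=2)
    print(smoothed, max_jump(track), max_jump(smoothed))
```

In a real synthesizer the same operation would be applied to each MFCC dimension separately. The window width trades smoothness against fidelity to the original parameters.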


The Arbitrary Nature of Language and the Image of ′Sang′ (Box)

  • 오정란
    • 인문언어 / Vol. 1, No. 1 / pp.159-183 / 2001
  • This paper surveys the meaning of the pen name ′Yi Sang(李箱)′ in the light of the arbitrariness of language and the relationship among names, observing in three ways how the images of ′Sang′ (Box) appear differently in his works. In particular, we look into (1) the relationship between a box and its inner boxes, (2) the divisional relationship in the box, and (3) the relationship between the box and an outer box. In Yi Sang′s work, image (1) gives shape to the relationship between the writer himself and his ancestors who are responsible for his present existence. Image (2) gives shape to two internally divided Yi Sangs, and is symbolized as a mirror that reflects two selves. Image (3) gives shape to a relationship between himself in the closed world and other people and/or a bigger outside world. All these symbolic relationships are fully represented in his work "Nalgae" (The Wing). What is interesting is that Yi Sang had already employed these box images in his earlier poetry "Geonchuk-muhan-yukmyeon-gaucho" (Infinite Architectural Hexagon). Furthermore, by visualizing his name as a square ($\square$), he means to suggest the intentional image of his pen name as a ′box′.


A Comment Spam Filter System Based on Inverse Chi-Square Using Co-occurrence Features Between Comment and Blog Post

  • 전희원;임해창
    • 한국정보과학회 언어공학연구회:학술대회논문집(한글 및 한국어 정보처리) / 한국정보과학회언어공학연구회 2007년도 제19회 한글 및 한국어 정보처리 학술대회 / pp.122-127 / 2007
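  • Blogs, a representative form of personal media, have recently become an Internet medium widely used not only for personal records but also for corporate promotion. However, the freedom that lets anyone write has also made comment spam rampant. A typical spam filter uses only the comment itself for filtering. Because spam comments tend to be shorter than legitimate comments, filtering on the comment alone can hardly be expected to achieve high accuracy. This paper assumes that a legitimate comment is similar in content to the post it is attached to, and gives this information to an inverse chi-square classifier as co-occurrence information in order to improve filter accuracy. The experimental results show that adding this information improves the overall performance of the spam filter compared with a simple probability-based spam filtering method.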
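The inverse chi-square combining step can be sketched as follows, in the form popularized by Gary Robinson for e-mail spam filtering. This is an illustration under stated assumptions, not the authors' system. The `co:` prefix used to mark tokens that also occur in the post is a hypothetical encoding of the co-occurrence feature, and the per-feature spam probabilities are assumed to come from a separately trained table.

```python
import math


def chi2q(x2, dof):
    # Survival function of the chi-square distribution, exact for even dof.
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("dof must be a positive even integer")
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def spam_indicator(spam_probs):
    # Combine per-feature spam probabilities (each strictly between 0 and 1)
    # into one score in [0, 1]; values near 1 suggest spam.
    n = len(spam_probs)
    ln_spam = sum(math.log(p) for p in spam_probs)
    ln_ham = sum(math.log(1.0 - p) for p in spam_probs)
    spam_evidence = chi2q(-2.0 * ln_spam, 2 * n)
    ham_evidence = chi2q(-2.0 * ln_ham, 2 * n)
    return (1.0 + spam_evidence - ham_evidence) / 2.0


def features(comment, post):
    # Hypothetical feature encoding: every comment token, plus a "co:" copy
    # for tokens that also appear in the blog post.
    post_tokens = set(post.lower().split())
    feats = []
    for token in comment.lower().split():
        feats.append(token)
        if token in post_tokens:
            feats.append("co:" + token)
    return feats


if __name__ == "__main__":
    print(features("nice trip photos", "my trip to seoul"))
    print(spam_indicator([0.99, 0.98, 0.97]))
    print(spam_indicator([0.5, 0.5]))
```

Under this encoding, a comment that shares words with its post contributes extra `co:` features. If training assigns those features low spam probabilities, they pull the combined score toward the legitimate side.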


A Study on the Analysis Method of Interior Space by Semiotic Approach: Focusing on Greimas's Semiotic Square

  • 박진배;이수영;조종현
    • 한국실내디자인학회논문집 / No. 16 / pp.29-35 / 1998
  • The purpose of this study is to analyze the elements that form interior design and to examine the dimensional relationships among the elements that form space, by comparing spatial language with the semiotics of space for the components of interior design. In addition, it intends to derive the design principles that govern interior design, and its inherent and diverse meanings, by comparing those elements with the semiotic square used in semiotics. Through this comparison, the meaning of the constituents that form space is redefined as flexibility within the relational system of elements, and this flexible concept broadens and enriches the scope of the environment that includes human beings. First, this study analyzes various sociocultural phenomena, reviews semiotics as a form of meta-learning, and examines the theoretical background of semiotics needed for spatial analysis. Second, as a tool for the analysis of meaning, it introduces the semiotic square, which was devised by A. J. Greimas to analyze the meaning of literary works, and defines three categories of a progressive research method for the analysis of interior design and for the research itself. Finally, for the analysis of meaning in interior design, it selects a space and analyzes it in accordance with this method and research procedure.


Comparative Analysis of Unweighted Sample Design and Complex Sample Design Related to the Exploration of Potential Risk Factors of Dysphonia

  • 변해원
    • 한국산학기술학회논문지 / Vol. 13, No. 5 / pp.2251-2258 / 2012
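  • This study compared three methods of exploring potential risk factors, namely unweighted sample design (simple random sampling analysis), frequency weighted sample design, and complex sample design with stratified weights, and examined whether the results differed statistically. The data source was the otolaryngological examination data of the 2009 Korea National Health and Nutrition Examination Survey. The Pearson chi-square test and the Rao-Scott chi-square test were used for the analysis. In the frequency weighted sample design, which applied only frequency weights, every variable was overestimated as a significant risk factor, and the unweighted analysis and the complex sample analysis differed in significance level and results. When national statistical data are used, a complex sample analysis that applies weights using stratification and cluster variables is needed for the results to be representative of the whole population. Furthermore, particular caution is required when only frequency weights are applied, because the results are then likely to be over-interpreted.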
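The overestimation reported for frequency weights follows directly from the form of the Pearson statistic: multiplying every cell count by a weight multiplies the statistic by the same weight. The sketch below shows this on a made-up 2x2 table. The counts, the weight of 100, and the function names are assumptions, and the Rao-Scott correction used in the study is not implemented here.

```python
def pearson_chi_square(table):
    # Pearson chi-square statistic and degrees of freedom for an r x c table
    # of observed counts (all row and column totals must be positive).
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def apply_frequency_weight(table, weight):
    # Treat each respondent as `weight` identical observations.
    return [[cell * weight for cell in row] for row in table]


if __name__ == "__main__":
    # Rows: risk factor present / absent. Columns: dysphonia yes / no.
    sample = [[12, 18], [15, 15]]          # assumed counts for illustration
    critical_value = 3.841                 # chi-square, dof = 1, alpha = 0.05

    stat, dof = pearson_chi_square(sample)
    weighted_stat, _ = pearson_chi_square(apply_frequency_weight(sample, 100))
    print(stat, stat > critical_value)
    print(weighted_stat, weighted_stat > critical_value)
```

The unweighted table is far from significant, while the frequency-weighted version of the same proportions exceeds the critical value by a wide margin. This is consistent with the abstract's warning about over-interpretation.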

Difference Analysis on Application Level of Programming Language in Major: Focused on Non-Business Administration Group and Business Administration Group

  • 박재용
    • 경영과정보연구 / Vol. 2 / pp.237-266 / 1998
  • The purpose of this study was to analyze differences in the application level of computer programming languages by major. The study takes an empirical approach grounded in theory from previous bibliographical studies. The sample consists of 268 samples collected over the period from Dec. 1, 1997 to Dec. 15, 1997, drawn from 10 universities in Seoul, Pusan, and Masan City, Korea. The data were collected by questionnaire through an interview with each person. The 268 samples were analyzed using the SPSS/PC for Windows Version 7.5 statistical package. Statistical methods such as frequency analysis, the chi-square test, ANOVA, and correlation analysis were used to test the research questions. The hypothesis tests were designed to show whether the two types of university students differ significantly by major. Before the research questions were tested, a frequency analysis was performed using the factor scores of each item. The two groups studied were the BA group (business administration group) and the NBA group (non-business administration group). The hypotheses were as follows: (1) Hypothesis 1: there is a significant difference between majors in the present state of application of computer programming languages. (2) Hypothesis 2: there is a significant difference between majors in the application level of computer programming languages. (3) Hypothesis 3: there is a significant difference between majors in the application level of each application software. According to the results of this study, (1) Hypothesis 1, on the present state of application of computer programming languages, was fully accepted at the 0.05 significance level. (2) Hypothesis 2, on the application level of computer programming languages, was fully accepted at the 0.05 significance level. (3) Hypothesis 3, on the application level of each application software, was fully rejected at the 0.05 significance level.


Text Categorization Based on the Maximum Entropy Principle

  • 장정호;장병탁;김영택
    • 한국정보과학회:학술대회논문집 / 한국정보과학회 1999년도 가을 학술발표논문집 Vol.26 No.2 (2) / pp.57-59 / 1999
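  • This paper proposes training a text classifier based on the maximum entropy principle. The maximum entropy method is one of the methods widely used in natural language processing for tasks such as language modeling and part-of-speech tagging. Feature selection is important for the efficiency of a maximum entropy model. In this paper, we experiment with the chi-square test, log-likelihood ratio, information gain, and mutual information as criteria for selecting the feature set, and compare the results with those obtained using all candidate features. Reuters-21578 was used as the data set, and binary classification experiments were performed for each class.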
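Of the selection criteria listed in the abstract, information gain is easy to compute from a 2x2 table of document counts for one term and one binary class. The sketch below ranks candidate terms by that score. It illustrates the criterion only, not the authors' experimental setup, and the function names and document counts are assumptions.

```python
import math


def entropy(counts):
    # Shannon entropy in bits of the distribution given by raw counts.
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def information_gain(a, b, c, d):
    # a: in-class docs with the term      b: out-of-class docs with the term
    # c: in-class docs without the term   d: out-of-class docs without the term
    n = a + b + c + d
    h_class = entropy([a + c, b + d])
    h_given_term = ((a + b) / n) * entropy([a, b]) + ((c + d) / n) * entropy([c, d])
    return h_class - h_given_term


def select_features(term_counts, k):
    # Keep the k terms with the highest information gain.
    ranked = sorted(
        term_counts,
        key=lambda term: information_gain(*term_counts[term]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Assumed document counts (a, b, c, d) for a hypothetical binary class.
    counts = {
        "oil": (40, 5, 10, 45),
        "the": (25, 25, 25, 25),
        "barrel": (30, 2, 20, 48),
    }
    print(select_features(counts, 2))
```

A term that appears equally often inside and outside the class, such as "the" in the toy counts, scores zero and is dropped first.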


The use of spectral analysis in choosing time series and forecasting models

  • 전덕빈
    • 대한산업공학회지 / Vol. 14, No. 1 / pp.51-56 / 1988
  • A spectrum analysis method is presented, with an example, as an aid to Box and Jenkins' model identification procedure, in which the theoretical spectrum of an ARMA model and its confidence intervals, derived from the chi-square distribution, are compared. An APL (A Programming Language) program for the method was developed for a 16-bit personal computer.
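The theoretical spectrum of an ARMA model has a closed form, which the sketch below evaluates at a given angular frequency. It is written in Python rather than the APL of the original program, and it does not reproduce that program or the chi-square confidence intervals. The sign convention for the coefficients, the `1/(2*pi)` normalization, and the function name are assumptions.

```python
import cmath
import math


def arma_spectrum(omega, ar=(), ma=(), sigma2=1.0):
    # Spectral density at angular frequency omega (radians per sample) of
    #   X_t = ar[0]*X_{t-1} + ... + e_t + ma[0]*e_{t-1} + ...
    # where e_t is white noise with variance sigma2 (assumed convention):
    #   f(omega) = sigma2 / (2*pi) * |theta(z)|^2 / |phi(z)|^2,  z = exp(-i*omega)
    z = cmath.exp(-1j * omega)
    ar_poly = 1.0 - sum(coef * z ** (k + 1) for k, coef in enumerate(ar))
    ma_poly = 1.0 + sum(coef * z ** (k + 1) for k, coef in enumerate(ma))
    return sigma2 / (2.0 * math.pi) * abs(ma_poly) ** 2 / abs(ar_poly) ** 2


if __name__ == "__main__":
    # AR(1) with coefficient 0.5: power is concentrated at low frequencies.
    for omega in (0.0, math.pi / 2.0, math.pi):
        print(omega, arma_spectrum(omega, ar=(0.5,)))
```

Comparing such a curve with the estimated spectrum of the data is one way to judge whether a candidate ARMA order is plausible.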
