Integrated Search | Korea Science

The Use of MSVM and HMM for Sentence Alignment

Fattah, Mohamed Abdel
- Journal of Information Processing Systems
- Vol. 8, No. 2
- pp. 301-314
- 2012
In this paper, two new approaches to aligning English-Arabic sentences in bilingual parallel corpora, based on the Multi-Class Support Vector Machine (MSVM) and Hidden Markov Model (HMM) classifiers, are presented. A feature vector is extracted from each text pair under consideration; this vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the MSVM and the HMM, and a separate set of data was used for testing. Both the MSVM and the HMM approaches outperform the length-based approach. Moreover, these new approaches are applicable to any language pair and are quite flexible, since the feature vector may contain fewer, more, or different features than those used in the current research, such as a lexical matching feature or Hanzi characters in Japanese-Chinese texts.
https://doi.org/10.3745/JIPS.2012.8.2.301
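The abstract above names three features (length, punctuation score, cognate score) but not their formulas. The sketch below computes one plausible version of each for a candidate sentence pair; the formulas, the `PUNCT` set, and all function names are assumptions for illustration, not the paper's definitions. A vector like this is what would be fed to the MSVM or HMM classifier.

```python
import string

# Assumed punctuation set: ASCII punctuation plus the Arabic comma,
# semicolon and question mark (not taken from the paper).
PUNCT = set(string.punctuation) | {"\u060c", "\u061b", "\u061f"}


def length_ratio(src: str, tgt: str) -> float:
    """Shorter/longer character-length ratio, in [0, 1]."""
    a, b = len(src), len(tgt)
    return 1.0 if max(a, b) == 0 else min(a, b) / max(a, b)


def punctuation_score(src: str, tgt: str) -> float:
    """1.0 when both sides carry the same number of punctuation marks."""
    p = sum(ch in PUNCT for ch in src)
    q = sum(ch in PUNCT for ch in tgt)
    return 1.0 - abs(p - q) / max(p, q, 1)


def cognate_score(src: str, tgt: str) -> float:
    """Dice overlap of tokens that appear verbatim on both sides
    (numbers, Latin-script names); a hypothetical cognate measure."""
    strip = "".join(PUNCT)
    a = {t.strip(strip) for t in src.split()} - {""}
    b = {t.strip(strip) for t in tgt.split()} - {""}
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def feature_vector(src: str, tgt: str) -> list[float]:
    """Feature vector for one candidate sentence pair."""
    return [length_ratio(src, tgt), punctuation_score(src, tgt), cognate_score(src, tgt)]


v = feature_vector("He paid 500 dollars.", "دفع 500 دولار.")
print(v)  # length ratio 14/20, equal punctuation, "500" shared
```

Extending the vector with further features, as the abstract suggests, only means appending more values in `feature_vector`.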

Generative AI as a Virtual Conversation Partner in Language Learning

Ji-Young Seo; Seon-Ah Kim
- International Journal of Advanced Culture Technology
- Vol. 12, No. 2
- pp. 7-15
- 2024
Despite a recent surge in multifaceted research on AI-integrated language learning, empirical studies in this area remain limited. This study adopts a Human-Generative AI parallel processing model to examine students' perceptions: 182 college students were asked to construct knowledge independently and then compare their efforts with the results generated through in-class conversations with ChatGPT 3.5. In questionnaire responses, most students indicated that they found these activities useful and expressed keen interest in learning various ways to use generative AI for language learning under instructor guidance. The findings confirm ChatGPT's potential as a virtual conversation partner. Identifying specific reasons for the perceived usefulness of the conversation activities, as well as drawbacks of ChatGPT, this study emphasizes the importance of teachers staying informed about both the latest advances in technology and their limitations. We recommend that teachers design a variety of classroom activities that use AI technology creatively.
https://doi.org/10.17703/IJACT.2024.12.2.7

An Automatic Extraction of English-Korean Bilingual Terms by Using Word-level Presumptive Alignment

이공주
- KIPS Transactions on Software and Data Engineering
- Vol. 2, No. 6
- pp. 433-442
- 2013
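The most essential component in building a machine translation system is a bilingual dictionary containing word pairs between the languages to be translated. A bilingual dictionary is an essential knowledge source not only for machine translation but for every application that exchanges information between different languages. In this study, we introduce a method for automatically extracting English-Korean bilingual terms using a document-aligned parallel corpus and a basic bilingual dictionary. The method is not affected by the size of the collected parallel corpus. Starting from the document-aligned parallel corpus, sentence-level alignment is performed, then word-level alignment, and finally presumptive alignment is applied to the portions that remain unaligned. Presumptive alignment draws on various kinds of information, such as position in the sentence, relations with other words, and linguistic information about the two languages. English-Korean bilingual terms can then be extracted from the presumptively aligned word pairs. The English-Korean bilingual terms extracted from a parallel corpus of about 1,000 items achieved an accuracy of 71.7%.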
https://doi.org/10.3745/KTSDE.2013.2.6.433
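The pipeline in the abstract above (dictionary-based word alignment, then presumptive alignment of the leftovers) can be sketched for a single sentence pair. This is a minimal illustration that uses only the position cue: if exactly one source word and one target word remain unaligned between two consecutive dictionary anchors, they are presumed to be translations. The function names, the toy lexicon, and the single-gap rule are assumptions; the paper also uses word relations and linguistic information, which are not modeled here.

```python
def dictionary_links(src, tgt, lexicon):
    """Link each source word to the first unused target word that the
    (hypothetical) basic bilingual dictionary lists for it."""
    links, used = {}, set()
    for i, word in enumerate(src):
        for j, cand in enumerate(tgt):
            if j not in used and cand in lexicon.get(word, ()):
                links[i] = j
                used.add(j)
                break
    return links


def presume_links(src, tgt, links):
    """Position heuristic: a one-word source gap between two consecutive
    anchors is paired with a one-word target gap between their targets."""
    presumed = {}
    aligned_tgt = set(links.values())
    anchors = sorted(links)
    for left, right in zip(anchors, anchors[1:]):
        gap_src = list(range(left + 1, right))
        lo, hi = sorted((links[left], links[right]))  # word order may invert
        gap_tgt = [j for j in range(lo + 1, hi) if j not in aligned_tgt]
        if len(gap_src) == 1 and len(gap_tgt) == 1:
            presumed[gap_src[0]] = gap_tgt[0]
    return presumed


src = ["red", "apple", "pie"]
tgt = ["빨간", "사과", "파이"]
lexicon = {"red": {"빨간"}, "pie": {"파이"}}  # toy dictionary without "apple"
links = dictionary_links(src, tgt, lexicon)
new = presume_links(src, tgt, links)
terms = [(src[i], tgt[j]) for i, j in new.items()]
print(terms)  # [('apple', '사과')]
```

Collecting such presumed pairs over the whole corpus, and counting how often each recurs, is one simple way to turn them into bilingual term candidates.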

Full Search Equivalent Motion Estimation Algorithm for General-Purpose Multi-Core Architectures

Park, Chun-Su
- Journal of the Semiconductor & Display Technology
- Vol. 12, No. 3
- pp. 13-18
- 2013
Motion estimation is a key technique in modern video processing that significantly improves coding efficiency by exploiting the temporal redundancy between successive frames. Thread-level parallelism is a promising way to accelerate motion estimation on multithreaded general-purpose processors. In this paper, we propose a parallel motion estimation algorithm that parallelizes the motion search process of the current H.264/AVC encoder. The proposed algorithm is implemented with the OpenMP application programming interface (API) and can be easily integrated into the current encoder. Experimental results show that the proposed parallel algorithm reduces the processing time of motion estimation by up to 65.08% without any penalty in rate-distortion (RD) performance.
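The key property behind a full-search-equivalent parallel scheme is that each block's exhaustive search is independent of every other block's, so distributing blocks over threads cannot change the chosen motion vectors. The sketch below shows that decomposition with SAD-based full search; it is an illustration, not the paper's OpenMP code. Block size `n`, search range `r`, and all names are assumptions, and in CPython the GIL prevents these pure-Python loops from actually running in parallel, so the sketch demonstrates the partitioning rather than the speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(n) for x in range(n))


def full_search_block(cur, ref, bx, by, n, r):
    """Try every displacement in [-r, r]^2 that keeps the candidate inside the
    reference frame; on ties the first candidate in scan order wins."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if not (0 <= by + dy <= h - n and 0 <= bx + dx <= w - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


def motion_estimate(cur, ref, n=4, r=2, workers=4):
    """Map each block origin (bx, by) to its motion vector (dx, dy).
    Blocks are searched concurrently; the result equals a sequential search."""
    h, w = len(cur), len(cur[0])
    blocks = [(bx, by) for by in range(0, h - n + 1, n)
              for bx in range(0, w - n + 1, n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        vectors = list(pool.map(
            lambda b: full_search_block(cur, ref, b[0], b[1], n, r), blocks))
    return dict(zip(blocks, vectors))


ref = [[y * 8 + x for x in range(8)] for y in range(8)]
cur = [[ref[y][min(x + 1, 7)] for x in range(8)] for y in range(8)]  # shifted left by one
print(motion_estimate(cur, ref)[(0, 0)])  # (1, 0)
```

Because ties are broken by a fixed scan order inside each block, the parallel and single-worker runs return identical vector fields.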

Building an Annotated English-Vietnamese Parallel Corpus for Training Vietnamese-related NLPs

Dien Dinh; Kiem Hoang
- The Institute of Electronics Engineers of Korea: Conference Proceedings
- Institute of Electronics Engineers of Korea, 2004 ICEIC: The International Conference on Electronics Informations and Communications
- pp. 103-109
- 2004
In NLP (Natural Language Processing) tasks, the greatest difficulty computers face is the built-in ambiguity of natural languages. Formerly, disambiguation relied on human-devised rules. Building a complete rule set is time-consuming and labor-intensive, and even then it does not cover all cases. Moreover, as the scale of the system increases, the rule set becomes very difficult to control. Consequently, many NLP tasks have recently shifted from rule-based approaches to corpus-based approaches that use large annotated corpora. Corpus-based NLP for widely used languages such as English and French has been well studied, with satisfactory results. In contrast, corpus-based NLP for Vietnamese is at a deadlock owing to the absence of annotated training data. Furthermore, hand-annotating even reasonably well-defined features such as part-of-speech (POS) tags has proved labor-intensive and costly. In this paper, we describe how we built EVC, an annotated and aligned English-Vietnamese parallel corpus, for training Vietnamese-related NLP tasks such as word segmentation, POS tagging, word order transfer, word sense disambiguation, and English-to-Vietnamese machine translation.
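An annotated, aligned corpus such as the one described above needs, at minimum, a record that ties tokens, annotations, and cross-language links together. The sketch below is a hypothetical record type, not the actual EVC file format: English tokens carry POS tags, Vietnamese tokens are assumed to be pre-segmented (multi-syllable words joined with `_`), and word alignments are stored as `(English index, Vietnamese index)` pairs with basic range validation.

```python
from dataclasses import dataclass, field


@dataclass
class AlignedPair:
    """One hypothetical annotated sentence pair (not the real EVC format)."""
    en_tokens: list[str]
    vi_tokens: list[str]          # assumed already word-segmented
    en_pos: list[str]             # one POS tag per English token
    links: list[tuple[int, int]] = field(default_factory=list)  # (en, vi)

    def __post_init__(self):
        if len(self.en_pos) != len(self.en_tokens):
            raise ValueError("one POS tag per English token is required")
        for i, j in self.links:
            if not (0 <= i < len(self.en_tokens) and 0 <= j < len(self.vi_tokens)):
                raise ValueError(f"link {(i, j)} is out of range")

    def aligned_words(self):
        """Yield (English word, POS tag, Vietnamese word) for each link."""
        for i, j in self.links:
            yield self.en_tokens[i], self.en_pos[i], self.vi_tokens[j]


pair = AlignedPair(["I", "am", "a", "student"], ["Tôi", "là", "sinh_viên"],
                   ["PRP", "VBP", "DT", "NN"], [(0, 0), (1, 1), (3, 2)])
print(list(pair.aligned_words()))
```

Unaligned words (here the article "a") simply have no link, which keeps the record usable for training word-order transfer as well as tagging.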

A Parallel Thinning Algorithm by the 8-Neighbors Connectivity Value

원남식; 손윤구
- The Transactions of the Korea Information Processing Society
- Vol. 2, No. 5
- pp. 701-710
- 1995
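Thinning is a very important step for raising the recognition rate in character recognition. This study proposes a parallel thinning algorithm based on the 8-neighbor connectivity value that is applicable to a wide range of character recognition tasks. The proposed algorithm is easy to implement in parallel, and its thinned result is fully 8-connected and expressed as numerical information. In particular, because the skeletons of curved line segments are represented accurately, the algorithm is shown to be especially well suited to characters with many curves, such as English and Japanese. Performance was evaluated by measuring similarity against reference skeletons.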
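The abstract above does not define its 8-neighbor connectivity value, so the sketch below substitutes a different, well-known parallel thinning method, the Zhang-Suen algorithm, to show the general pattern: in each subiteration every pixel is tested against its 8 neighbors (neighbor count `b` and 0-to-1 transition count `a`, a simple connectivity measure), and all marked pixels are deleted at once, which is what makes the step parallelizable. This is a stand-in, not the authors' algorithm.

```python
def _neighbours(img, y, x):
    """P2..P9, clockwise starting from the pixel above (y - 1, x)."""
    return [img[y - 1][x], img[y - 1][x + 1], img[y][x + 1], img[y + 1][x + 1],
            img[y + 1][x], img[y + 1][x - 1], img[y][x - 1], img[y - 1][x - 1]]


def thin(image):
    """Zhang-Suen thinning of a binary image given as lists of 0/1.
    Border pixels are never examined or deleted."""
    img = [row[:] for row in image]
    h, w = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_delete = []
            for y in range(1, h - 1):
                for x in range(1, w - 1):
                    if img[y][x] != 1:
                        continue
                    p = _neighbours(img, y, x)
                    b = sum(p)  # number of foreground neighbours
                    a = sum(p[k] == 0 and p[(k + 1) % 8] == 1 for k in range(8))
                    p2, p4, p6, p8 = p[0], p[2], p[4], p[6]
                    if step == 0:
                        ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and ok:
                        to_delete.append((y, x))
            for y, x in to_delete:  # delete all marked pixels simultaneously
                img[y][x] = 0
            changed = changed or bool(to_delete)
    return img


bar = [[0] * 7] + [[0, 1, 1, 1, 1, 1, 0] for _ in range(3)] + [[0] * 7]
for row in thin(bar):
    print("".join("#" if v else "." for v in row))
```

A one-pixel-wide line is left unchanged, since its pixels either have a single neighbor or two transitions.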

Automatically Constructing English-Korean Parallel Corpus from Web Documents

서형원; 김형철; 조희영; 김재훈; 양성일
- Korea Information Processing Society: Conference Proceedings
- KIPS 2006 Fall Conference
- pp. 161-164
- 2006
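As the Internet has grown, the web has come to contain many documents that express the same content in several languages. Exploiting this property of web documents, this paper designs and implements a system that builds a Korean-English parallel corpus from parallel documents collected from the web. The construction process is summarized as follows. First, a web crawler collects Korean and English web documents (HTML documents) from the web. Second, unnecessary content (tags, advertising text, etc.) is removed from the collected documents in each language to extract sentences, and the extracted sentences are aligned at the paragraph level. Third, the sentences within the paragraph-aligned documents are aligned using a sentence alignment method. Finally, the aligned parallel sentences are split into words to build the parallel corpus. Using this method, we built a Korean-English parallel corpus of about 425,000 sentences.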
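The abstract above does not say which sentence alignment method is used in the third step. As an illustration only, here is a simplified length-based aligner in the spirit of Gale and Church: dynamic programming over 1-1, 1-0, 0-1, 2-1, and 1-2 "beads", where each bead costs a fixed penalty plus a normalized character-length mismatch. The cost function and the penalty values are assumptions, far simpler than Gale and Church's probabilistic model.

```python
def align_sentences(src, tgt, skip_penalty=1.0, merge_penalty=0.5):
    """Align two lists of sentences within one aligned paragraph.
    Returns a list of (source indices, target indices) beads."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    beads = [(1, 1, 0.0), (1, 0, skip_penalty), (0, 1, skip_penalty),
             (2, 1, merge_penalty), (1, 2, merge_penalty)]
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                ls = sum(len(s) for s in src[i:ni])
                lt = sum(len(t) for t in tgt[j:nj])
                c = cost[i][j] + penalty + abs(ls - lt) / max(ls + lt, 1)
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (i, j)
    # Trace the cheapest path back from (n, m).
    path, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        path.append((tuple(range(pi, i)), tuple(range(pj, j))))
        i, j = pi, pj
    return path[::-1]


print(align_sentences(["aaaa", "bbbb"], ["cccccccc"]))  # [((0, 1), (0,))]
```

Raw character counts are a rough signal for Korean and English, whose scripts differ in characters per word, so a real system would likely calibrate the length ratio per language pair.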

The Method of Color Image Processing Using Adaptive Saturation Enhancement Algorithm

양경옥; 윤종호; 조화현; 최명렬
- The KIPS Transactions: Part B
- Vol. 14-B, No. 3
- pp. 145-152
- 2007
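This paper proposes an adaptive color image enhancement algorithm for flat-panel display devices such as LCD monitors, LCD TVs, PDP TVs, and OLED TVs. The proposed algorithm enhances contrast and saturation together in color images. The adaptive linearly estimated CDF (cumulative distribution function) technique used for contrast enhancement can adjust the enhancement according to brightness, which prevents distortion of the original image. The adaptive saturation enhancement algorithm raises saturation only within the range in which contour artifacts and over-saturation, the typical problems of saturation enhancement, do not occur. In addition, a selective saturation enhancement method that depends on the hue distribution of the original image yields high-quality images. To evaluate the image quality of the processed results against the original images, visual inspection and histogram deviation were used.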
https://doi.org/10.3745/KIPSTB.2007.14-B.3.145
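Neither the adaptive CDF estimate nor the exact saturation curve is given in the abstract above, so the sketch below shows two simpler stand-ins for the same ideas. The first, `enhance_saturation`, uses an assumed curve `s' = s + gain * s * (1 - s)`, whose boost shrinks to zero as saturation approaches 1, so it cannot over-saturate and leaves grey pixels unchanged. The second, `equalize`, is plain histogram equalization, the standard CDF-based contrast mapping. Both functions and the `gain` parameter are hypothetical.

```python
import colorsys


def enhance_saturation(rgb, gain=0.5):
    """Boost the HSV saturation of one 8-bit RGB pixel.
    For 0 <= gain <= 1 the new saturation never exceeds 1."""
    r, g, b = (c / 255 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    s = s + gain * s * (1 - s)
    return tuple(round(c * 255) for c in colorsys.hsv_to_rgb(h, s, v))


def equalize(values, levels=256):
    """Histogram equalization: map each grey level through the normalized CDF."""
    if not values:
        return []
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(values)
    if n == cdf_min:  # a single grey level: nothing to stretch
        return list(values)
    return [round((cdf[v] - cdf_min) / (n - cdf_min) * (levels - 1)) for v in values]


print(enhance_saturation((200, 100, 100)))  # (200, 75, 75): s goes from 0.5 to 0.625
print(equalize([50, 50, 100, 200]))         # [0, 0, 128, 255]
```

Hue is preserved by the saturation step, which matches the abstract's goal of enhancing color without shifting it.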

The study of Method for Optimization of Phrase Ordering Process and Word Alignment between Parallel Languages in Korean-English Statistic Based Machine Translation

정상원
- Korea Information Processing Society: Conference Proceedings
- KIPS 2013 Spring Conference
- pp. 293-296
- 2013
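Statistics-based machine translation (SBMT) systems are an area of machine translation that has recently seen active research. Because statistical machine translation can exploit large corpora, it is less restricted to particular language pairs, can learn its models automatically, and generalizes to other languages. However, statistical machine translation between English and Korean still has to resolve the problems caused by the two languages' different word orders. In this study, we built an English-Korean bilingual corpus and, using a baseline system implemented on Moses, a statistical machine translation training system, investigated ways to optimize phrase ordering and word alignment between the two languages.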
https://doi.org/10.3745/PKIPS.y2013m05a.293
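Word alignment in Moses-based pipelines is usually bootstrapped from the IBM translation models. The sketch below implements IBM Model 1 with EM on a toy English-Korean corpus to show how word alignments emerge from co-occurrence alone; it is a textbook illustration, not the optimization studied in the paper. The toy sentence pairs and function names are assumptions. Model 1 ignores word order entirely, which is one reason English-Korean reordering remains a separate problem.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=10):
    """EM training of IBM Model 1 probabilities t(f | e).
    pairs: list of (English tokens, Korean tokens)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for es, fs in pairs:
            es = ["NULL"] + es  # NULL word absorbs unaligned target words
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # E-step normalizer
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts per English word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


def best_alignment(t, es, fs):
    """Most probable Model 1 alignment: each target word independently picks
    the English word (or NULL, reported as -1) with the highest t(f | e)."""
    es = ["NULL"] + es
    return [(j, max(range(len(es)), key=lambda i: t[(f, es[i])]) - 1)
            for j, f in enumerate(fs)]


pairs = [(["the", "house"], ["그", "집"]),
         (["the", "book"], ["그", "책"]),
         (["a", "book"], ["한", "책"])]
t = train_ibm1(pairs)
print(best_alignment(t, ["the", "book"], ["그", "책"]))  # [(0, 0), (1, 1)]
```

Each output pair is (Korean position, English position); Moses normally combines alignments from both translation directions before extracting phrases.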

Automatically Extracting Unknown Translations Using Phrase Alignment

김재훈; 양성일
- The KIPS Transactions: Part B
- Vol. 14-B, No. 3
- pp. 231-240
- 2007
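This paper proposes a model for extracting translations of unknown (unregistered) words using alignment techniques and implements an extraction system based on it. The proposed model is a kind of phrase alignment model consisting of a boundary model, a language model, and a translation model. The proposed extraction system consists of parallel corpus construction, word alignment, and unknown-word extraction. To evaluate the system, we built an evaluation corpus of 2,200 sentences containing about 1,500 unknown words and ran a variety of experiments. The experiments showed that the proposed model is very useful for extracting translations of unknown words. A more objective evaluation will first require a large evaluation corpus, and a higher-quality parallel corpus is also needed. In addition, further research is needed to improve the unknown-word extraction model.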
https://doi.org/10.3745/KIPSTB.2007.14-B.3.231
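The abstract above says the phrase alignment model combines a boundary model, a language model, and a translation model, but not how they are combined. A minimal sketch, assuming the three probabilities are simply multiplied (summed in log space): enumerate candidate target spans inside the gap between two aligned anchor words and keep the highest-scoring one. The scoring combination, `max_len`, the example sentence, and the toy model functions are all hypothetical.

```python
import math


def best_span(tgt, left, right, boundary_p, lm_p, tm_p, max_len=3):
    """Return (i, j) such that tgt[i:j], lying strictly between anchor positions
    left and right, maximizes log P_boundary + log P_lm + log P_tm.
    The three models are caller-supplied functions returning values in (0, 1]."""
    best, best_score = None, -math.inf
    for i in range(left + 1, right):
        for j in range(i + 1, min(i + max_len, right) + 1):
            span = tuple(tgt[i:j])
            score = (math.log(boundary_p(tgt, i, j))
                     + math.log(lm_p(span))
                     + math.log(tm_p(span)))
            if score > best_score:
                best, best_score = (i, j), score
    return best


# Toy models for "I bought an iPhone" -> "나는 아이폰 을 샀다" (hypothetical).
PARTICLES = {"을", "를", "이", "가"}


def boundary(toks, i, j):
    """Penalize spans that end in a case particle."""
    return 0.1 if toks[j - 1] in PARTICLES else 0.9


def lm(span):
    return 0.5  # flat language model, for illustration only


def tm(span):
    return 0.6 if "아이폰" in span else 0.1  # stand-in transliteration match


tgt = ["나는", "아이폰", "을", "샀다"]
print(best_span(tgt, 0, 3, boundary, lm, tm))  # (1, 2) -> "아이폰"
```

Keeping the models as separate functions makes it easy to swap in trained components without changing the span search.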
