• Title/Summary/Keyword: language processing


Contextual Modeling in Context-Aware Conversation Systems

  • Quoc-Dai Luong Tran;Dinh-Hong Vu;Anh-Cuong Le;Ashwin Ittoo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.5
    • /
    • pp.1396-1412
    • /
    • 2023
  • Conversation modeling is an important and challenging task in the field of natural language processing because it is a key component promoting the development of automated human-machine conversation. Most recent research concerning conversation modeling focuses only on the current utterance (considered as the current question) to generate a response, and thus fails to capture the conversation's logic from its beginning. Some studies concatenate the current question with previous conversation sentences and use it as input for response generation. Another approach is to use an encoder to store all previous utterances; each time a new question is encountered, the encoder is updated and used to generate the response. Our approach in this paper differs from previous studies in that we explicitly separate the encoding of the question from the encoding of its context. This results in different encoding models for the question and the context, capturing the specificity of each, while giving us access to the entire context when generating the response. To this end, we propose a deep neural network-based model, called the Context Model, to encode previous utterances' information and combine it with the current question. This approach satisfies the need for context information while keeping the roles of the current question and its context separate when generating a response. We investigate two approaches for representing the context: long short-term memory (LSTM) and convolutional neural networks (CNN). Experiments show that our Context Model outperforms a baseline model on both the ConvAI2 dataset and a collected dataset of conversational English.
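
  The abstract above does not include an implementation; the following is a minimal PyTorch-style sketch of the core idea, namely separate encoders for the conversation context and the current question whose outputs are combined before generating a response. The module choices (LSTM for the context, GRU for the question), the concatenation-based fusion, and all dimensions are illustrative assumptions, not the authors' Context Model.

    # Minimal sketch (PyTorch), assuming an LSTM context encoder and a GRU question
    # encoder; names, sizes, and the concatenation-based fusion are illustrative only.
    import torch
    import torch.nn as nn

    class ContextModelSketch(nn.Module):
        def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            # Separate encoders: one for the conversation history (context),
            # one for the current question, as the abstract describes.
            self.context_encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
            self.question_encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
            self.fuse = nn.Linear(2 * hid_dim, hid_dim)          # combine both encodings
            self.decoder_proj = nn.Linear(hid_dim, vocab_size)   # stand-in for a decoder

        def forward(self, context_ids, question_ids):
            _, (ctx_h, _) = self.context_encoder(self.embed(context_ids))
            _, q_h = self.question_encoder(self.embed(question_ids))
            fused = torch.tanh(self.fuse(torch.cat([ctx_h[-1], q_h[-1]], dim=-1)))
            return self.decoder_proj(fused)  # next-token scores for response generation

    # Toy usage with random token ids
    model = ContextModelSketch(vocab_size=1000)
    ctx = torch.randint(0, 1000, (2, 20))   # batch of 2 context sequences
    q = torch.randint(0, 1000, (2, 8))      # batch of 2 current questions
    print(model(ctx, q).shape)              # torch.Size([2, 1000])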

Collision Cause-Providing Ratio Prediction Model Using Natural Language Processing Analytics (자연어 처리 기법을 활용한 충돌사고 원인 제공 비율 예측 모델 개발)

  • Ik-Hyun Youn;Hyeinn Park;Chang-Hee Lee
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.30 no.1
    • /
    • pp.82-88
    • /
    • 2024
  • As the modern maritime industry rapidly progresses through technological advancements, data processing technology is emphasized as a key driver of this development. Natural language processing is a technology that enables machines to understand and process human language. Using this methodology, we aim to develop a model that predicts the cause-providing ratio when a new written judgment is entered, by analyzing the rulings of the Marine Safety Tribunal and learning the cause-providing ratios of previously adjudicated ship collisions. The model calculates the cause-providing ratio of an accident using the navigation rules applied at the time of the accident and the weights of key keywords that affect the cause-providing ratio. This allows the accuracy of the developed model to be analyzed and its practical applicability to be reviewed, so that it can be used to help prevent the recurrence of collisions and to resolve disputes between the parties involved in marine accidents.
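
  As a rough illustration of the keyword-weighting idea described above, the sketch below fits a TF-IDF representation of judgment texts to known cause-providing ratios with a linear regressor. The library choice (scikit-learn), the Ridge regressor, and the placeholder texts and ratios are assumptions for illustration and do not reproduce the paper's model or data.

    # Minimal sketch of keyword-weight-based ratio prediction (scikit-learn),
    # assuming TF-IDF features over judgment texts and a linear regressor.
    # The training texts and ratios below are fabricated placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import Ridge
    from sklearn.pipeline import make_pipeline

    # Placeholder adjudicated judgments and their cause-providing ratios (0-100%).
    train_texts = [
        "crossing situation give-way vessel failed to keep out of the way",
        "overtaking vessel did not keep clear of the vessel being overtaken",
    ]
    train_ratios = [70.0, 80.0]

    model = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
    model.fit(train_texts, train_ratios)

    new_judgment = ["give-way vessel in crossing situation maintained course and speed"]
    print(model.predict(new_judgment))   # predicted cause-providing ratio (illustrative)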

The Loom-LAG for syntax analysis: Adding a language-independent level to LAG

  • Schulze, Markus
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2002.02a
    • /
    • pp.411-420
    • /
    • 2002
  • The left-associative grammar model (LAG) has been applied successfully to the morphological and syntactic analysis of various European and Asian languages. The algebraic definition of the LAG is very well suited for application to natural language processing, as it inherently obeys de Saussure's second law (de Saussure, 1913, p. 103) on the linear nature of language, which phrase-structure grammar (PSG) and categorial grammar (CG) do not. This paper describes the so-called Loom-LAGs (LLAGs), a specialization of LAGs for the analysis of natural language. Whereas the only means of language-independent abstraction in ordinary LAG is the principle of possible continuations, LLAGs introduce a set of more detailed language-independent generalizations that form the so-called loom of a Loom-LAG. Every LLAG uses the very same loom and adds the language-specific information in the form of a declarative description of the language, much like an ancient mechanised Jacquard loom would take a program card providing the specific pattern for the cloth to be woven. The linguistic information is formulated declaratively in so-called syntax plans that describe the sequential structure of clauses and phrases. This approach introduces the explicit notion of phrases and sentence structure to LAG without violating de Saussure's second law and without leaving the ground of the original algebraic definition of LAG. LLAGs can in fact be shown to be just a notational variant of LAG, but one that is much better suited for the manual development of syntax grammars for the robust analysis of free texts.
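
  The principle of possible continuations mentioned above means that analysis proceeds strictly left to right, combining the sentence start with the next word at each step. The toy sketch below illustrates only that time-linear control structure; the category labels and rule table are invented for illustration and are not the LAG or Loom-LAG rule notation.

    # Toy illustration of left-associative (time-linear) parsing: the sentence start
    # is combined with the next word, one word at a time. The category rules below
    # are invented for illustration and are not the LAG/Loom-LAG formalism itself.
    RULES = {
        ("DET", "N"): "NP",      # a + dog -> NP
        ("NP", "V"): "S",        # NP + barks -> S
    }
    LEXICON = {"a": "DET", "dog": "N", "barks": "V"}

    def parse_left_associative(words):
        state = LEXICON[words[0]]            # category of the sentence start
        for word in words[1:]:
            nxt = LEXICON[word]
            state = RULES.get((state, nxt))  # possible continuation, or None
            if state is None:
                return None                  # no rule licenses this continuation
        return state

    print(parse_left_associative(["a", "dog", "barks"]))  # -> 'S'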


Differential Effect for Neural Activation Processes according to the Proficiency Level of Code Switching: An ERP Study (이중언어환경에서의 언어간 부호전환 수준에 따른 차별적 신경활성화 과정: ERP연구)

  • Kim, Choong-Myung
    • Phonetics and Speech Sciences
    • /
    • v.2 no.4
    • /
    • pp.3-10
    • /
    • 2010
  • The present study aims to investigate neural activation according to the level of code switching in English-proficient bilinguals and to find the relationship between language-switching performance and proficiency level using ERPs (event-related potentials). First, when comparing high-proficiency (HP) with low-proficiency (LP) bilingual performance in a native-language environment, the N2 activation level was observed to be higher in the HP group than in the LP group, but only under two conditions: 1) the language-switching (between-language) condition, known to index attention to code switching, and 2) the inhibition of the current language for L1. An N400 effect appeared in both groups only in the language non-switching (within-language) condition. This effect suggests that both groups completed the semantic acceptability task well in their native-language environment without the burden of language switching, irrespective of high or low performance. The N400 latencies were only about 100 ms earlier in the HP group than in the LP group, a difference that can be interpreted as facilitation of the given task. These results suggest that the HP group showed differential activation of the inhibitory system for L1 in the L1-to-L2 switching condition, in contrast to the inactivation of the inhibitory system in the LP group. Despite the absence of an N400 effect in the given task in both groups, the differential peak latencies were attributed to differences in the efficiency of semantic processing.


Comparison Thai Word Sense Disambiguation Method

  • Modhiran, Teerapong;Kruatrachue, Boontee;Supnithi, Thepchai
    • Proceedings of the Institute of Control, Robotics and Systems (ICROS) Conference
    • /
    • 2004.08a
    • /
    • pp.1307-1312
    • /
    • 2004
  • Word sense disambiguation is one of the most important problems in natural language processing, underlying research topics such as information retrieval and machine translation. Many approaches can be employed to resolve word ambiguity with a reasonable degree of accuracy; these strategies are knowledge-based, corpus-based, and hybrid. This paper pays attention to the corpus-based strategy. The purpose of this paper is to compare three well-known machine learning techniques, SNoW, SVM, and Naive Bayes, for word sense disambiguation in the Thai language. Ten ambiguous words are selected for testing with word and POS features. The results show that the SVM algorithm gives the best results for Thai WSD, with an accuracy rate of approximately 83-96%.
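
  As an illustration of the corpus-based strategy compared in the paper, the sketch below trains a Naive Bayes classifier over surrounding-word and POS context features with scikit-learn. The feature template and the tiny English placeholder corpus are assumptions for illustration, not the paper's Thai data or exact feature set.

    # Minimal sketch of corpus-based WSD (scikit-learn): a Naive Bayes classifier
    # over surrounding-word and POS context features. The tiny labelled corpus and
    # the feature template are placeholders, not the paper's actual Thai data.
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    def context_features(tokens, pos_tags, i, window=2):
        """Bag of neighbouring words and POS tags around the ambiguous token."""
        feats = {}
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                feats[f"w[{j - i}]={tokens[j]}"] = 1
                feats[f"p[{j - i}]={pos_tags[j]}"] = 1
        return feats

    # Placeholder training instances for one ambiguous word (English stand-ins).
    X = [
        context_features(["deposit", "money", "in", "the", "bank"],
                         ["VB", "NN", "IN", "DT", "NN"], 4),
        context_features(["sat", "on", "the", "river", "bank"],
                         ["VBD", "IN", "DT", "NN", "NN"], 4),
    ]
    y = ["bank/finance", "bank/river"]

    clf = make_pipeline(DictVectorizer(), MultinomialNB())
    clf.fit(X, y)
    test = context_features(["opened", "an", "account", "at", "bank"],
                            ["VBD", "DT", "NN", "IN", "NN"], 4)
    print(clf.predict([test]))  # -> predicted sense label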


Features of an Error Correction Memory to Enhance Technical Texts Authoring in LELIE

  • SAINT-DIZIER, Patrick
    • International Journal of Knowledge Content Development & Technology
    • /
    • v.5 no.2
    • /
    • pp.75-101
    • /
    • 2015
  • In this paper, we investigate the notion of an error correction memory applied to technical texts. The main purpose is to introduce flexibility and context sensitivity into the detection and correction of errors related to Constrained Natural Language (CNL) principles. This is realized by enhanced error detection paired with relatively generic correction patterns and contextual correction recommendations. Patterns are induced from previous corrections made by technical writers for a given type of text. The impact of such an error correction memory is also investigated from the point of view of the technical writer's cognitive activity. The notion of error correction memory is developed within the framework of the LELIE project; an experiment is carried out on the case of fuzzy lexical items and negation, which are both major problems in technical writing. Language processing and knowledge representation aspects are developed together with evaluation directions.
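
  A hypothetical sketch of how an error correction memory might pair detection with context-sensitive recommendations is given below; the fuzzy-term list, the memory keyed by document type, and the data model are illustrative assumptions, not the LELIE implementation.

    # Minimal sketch of an "error correction memory" for fuzzy lexical items:
    # detected fuzzy terms are looked up in a memory of past corrections, keyed by
    # the document type, to produce context-sensitive recommendations. The fuzzy
    # term list, memory entries, and data model are illustrative assumptions only.
    import re

    FUZZY_TERMS = ["approximately", "as soon as possible", "if necessary", "sufficient"]

    # Past corrections made by technical writers, keyed by (term, document type).
    CORRECTION_MEMORY = {
        ("approximately", "maintenance_manual"): "give an explicit tolerance, e.g. '10 mm +/- 1 mm'",
        ("as soon as possible", "safety_procedure"): "state a maximum delay, e.g. 'within 5 minutes'",
    }

    def review(text, doc_type):
        """Return (term, recommendation) pairs for fuzzy items found in the text."""
        findings = []
        for term in FUZZY_TERMS:
            if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
                advice = CORRECTION_MEMORY.get(
                    (term, doc_type), "replace the fuzzy expression with a precise value")
                findings.append((term, advice))
        return findings

    print(review("Tighten the bolt to approximately the right torque.", "maintenance_manual"))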

Implementation of Document Classification Engine by Using Associative Knowledge (연상 지식을 이용한 문서 분류 엔진의 구현)

  • Jang Jung-Hyo;Son Ju-Sung;Lee Sang-Kon;Ahn Dong-Un
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2006.05a
    • /
    • pp.625-628
    • /
    • 2006
  • To judge whether a document's content is relevant, a person must read the entire document. However, when the volume of documents is large or several topics are scattered throughout a document, identifying the document's subject field takes considerable time and effort. To reduce this cost, the method proposed in this paper builds a knowledge dictionary from category tree information and field-associated terms extracted from document contents, designs a classifier that uses this dictionary, and thereby implements an automatic classifier that reduces the cost of document collection and classification.
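
  A minimal sketch of dictionary-based classification with field-associated terms is shown below; the category labels and term lists are invented placeholders, not the knowledge dictionary built in the paper.

    # Minimal sketch of dictionary-based classification with field-associated terms:
    # each category in a small category tree has associated terms, and a document is
    # assigned to the category whose terms it matches most. The tree and term lists
    # are invented placeholders, not the paper's knowledge dictionary.
    FIELD_TERMS = {
        "IT/NLP": ["parser", "corpus", "morpheme", "tagging"],
        "IT/Networks": ["router", "packet", "latency", "protocol"],
        "Economy/Finance": ["interest", "bond", "inflation", "stock"],
    }

    def classify(document):
        tokens = document.lower().split()
        scores = {cat: sum(tokens.count(t) for t in terms)
                  for cat, terms in FIELD_TERMS.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else "unclassified"

    print(classify("The corpus was tagged and fed to the morpheme parser"))  # -> 'IT/NLP'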


Word Segmentation and POS tagging using Seq2seq Attention Model (seq2seq 주의집중 모델을 이용한 형태소 분석 및 품사 태깅)

  • Chung, Euisok;Park, Jeon-Gue
    • Proceedings of the Korean Language Information Society Conference
    • /
    • 2016.10a
    • /
    • pp.217-219
    • /
    • 2016
  • This paper describes an approach that uses a sequence-to-sequence (seq2seq) attention model for morphological analysis and part-of-speech (POS) tagging. A seq2seq model is divided into an encoder and a decoder and is generally based on a recurrent neural network (RNN). When training the seq2seq model for morphological analysis and POS tagging, the syllable sequence is used as the encoder input, and the POS-tag sequence corresponding to each syllable is used as the decoder output. The correspondence between the syllable sequence and the POS-tag sequence is handled through the attention model. This study presents experimental results for an end-to-end approach that excludes additional resources such as dictionary or feature information, and it also excludes additional processes such as beam search at the decoding step.
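
  The sketch below shows a compact seq2seq tagger with dot-product attention in PyTorch, mirroring the syllable-in, tag-out setup described above. The dimensions, the teacher-forced decoder input, and the toy tensors are illustrative assumptions, not the paper's configuration.

    # Minimal sketch of a seq2seq model with dot-product attention for sequence
    # tagging (PyTorch). Dimensions, teacher forcing, and the toy tensors are
    # illustrative assumptions, not the paper's actual configuration.
    import torch
    import torch.nn as nn

    class Seq2SeqTagger(nn.Module):
        def __init__(self, syll_vocab, tag_vocab, emb=64, hid=128):
            super().__init__()
            self.syll_emb = nn.Embedding(syll_vocab, emb)
            self.tag_emb = nn.Embedding(tag_vocab, emb)
            self.encoder = nn.GRU(emb, hid, batch_first=True)
            self.decoder = nn.GRU(emb, hid, batch_first=True)
            self.out = nn.Linear(2 * hid, tag_vocab)   # [decoder state; attention context]

        def forward(self, syllables, prev_tags):
            enc_out, enc_h = self.encoder(self.syll_emb(syllables))      # (B, Ts, H)
            dec_out, _ = self.decoder(self.tag_emb(prev_tags), enc_h)    # (B, Tt, H)
            # Dot-product attention: align each decoder step with encoder states.
            scores = torch.bmm(dec_out, enc_out.transpose(1, 2))         # (B, Tt, Ts)
            attn = torch.softmax(scores, dim=-1)
            context = torch.bmm(attn, enc_out)                           # (B, Tt, H)
            return self.out(torch.cat([dec_out, context], dim=-1))       # tag scores

    # Toy usage: batch of 2, syllable sequences of length 6, teacher-forced tags.
    model = Seq2SeqTagger(syll_vocab=500, tag_vocab=40)
    syll = torch.randint(0, 500, (2, 6))
    tags = torch.randint(0, 40, (2, 6))
    print(model(syll, tags).shape)   # torch.Size([2, 6, 40])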


Disambiguation of Counting Unit Noun using Word Embedding (단어 임베딩을 이용한 단위성 의존명사 분별)

  • Lee, Ju-Sang;Ock, Cheol-Young
    • Proceedings of the Korean Language Information Society Conference
    • /
    • 2016.10a
    • /
    • pp.246-248
    • /
    • 2016
  • A counting-unit bound noun is a bound noun that expresses a number or quantity; it cannot be used on its own and is used together with a numeral or a numeral determiner. For homographs that have two or more senses involving a counting-unit bound noun, existing homograph disambiguation models based on adjacent eojeols (word phrases) have difficulty with disambiguation. In this paper, word embeddings are used to disambiguate counting-unit bound nouns: a total of 115,767 words were represented as vectors, and the counting-unit bound noun is disambiguated by computing similarity with the nouns that appear around the bound noun to be disambiguated. The results show that disambiguating counting-unit bound nouns with word embeddings is effective.
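
  The sketch below illustrates the similarity-based decision described above: the sense whose exemplar nouns are most similar to the nouns around the target is selected. The toy four-dimensional vectors and the two hypothetical senses stand in for a trained embedding table and are not the paper's data.

    # Minimal sketch of embedding-similarity disambiguation for an ambiguous
    # counting-unit noun: the sense whose exemplar nouns are most similar to the
    # surrounding nouns is chosen. The toy 4-dimensional vectors and the two
    # hypothetical senses are invented stand-ins for a trained embedding table.
    import numpy as np

    EMB = {                                  # toy embedding table (word -> vector)
        "movie":  np.array([0.9, 0.1, 0.0, 0.1]),
        "poem":   np.array([0.8, 0.2, 0.1, 0.0]),
        "ship":   np.array([0.1, 0.9, 0.2, 0.0]),
        "harbor": np.array([0.0, 0.8, 0.3, 0.1]),
    }

    SENSE_EXEMPLARS = {                      # nouns typical of each candidate sense
        "unit-for-works": ["movie", "poem"],
        "unit-for-vessels": ["ship", "harbor"],
    }

    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def disambiguate(context_nouns):
        scores = {}
        for sense, exemplars in SENSE_EXEMPLARS.items():
            sims = [cos(EMB[c], EMB[e]) for c in context_nouns for e in exemplars]
            scores[sense] = sum(sims) / len(sims)
        return max(scores, key=scores.get)

    print(disambiguate(["movie"]))           # -> 'unit-for-works'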


An Investigation of Grammar Design to Consider Minor Sentence in Speech Recognition (조각문을 고려한 음성 인식 문법 설계)

  • Yun, Seung;Kim, Sang-Hun;Park, Jun
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2007.05a
    • /
    • pp.409-410
    • /
    • 2007
  • A fragment (minor sentence) is a sentence that lacks a full set of sentence constituents and, unlike an ordinary sentence, does not end with a sentence-final ending. Unlike in laboratory settings, such fragments appear fairly frequently in real speech recognition environments, so taking them into account is essential for improving the performance of continuous speech recognition systems. In this study, we compared the coverage of a speech recognition grammar specified with and without fragments taken into account, and found that considering fragments can contribute to improving speech recognition performance.
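
  The coverage comparison described above can be illustrated with two small pattern sets, one of which also accepts fragments; the patterns and sample utterances below are invented placeholders, not the study's actual recognition grammar.

    # Minimal sketch of the coverage comparison: two small pattern sets for a
    # recognition grammar, one of which also accepts fragments (e.g. a bare noun
    # phrase), are checked against sample utterances. All patterns and utterances
    # are invented placeholders.
    import re

    FULL_SENTENCE_PATTERNS = [
        r"^(please )?(play|stop) the (music|radio)$",
    ]
    FRAGMENT_PATTERNS = FULL_SENTENCE_PATTERNS + [
        r"^(the )?(music|radio)$",        # bare-NP fragment such as "the radio"
        r"^(play|stop)$",                 # bare-verb fragment such as "stop"
    ]

    utterances = ["play the music", "stop", "the radio", "please stop the radio"]

    def coverage(patterns, utts):
        covered = sum(any(re.fullmatch(p, u) for p in patterns) for u in utts)
        return covered / len(utts)

    print("without fragments:", coverage(FULL_SENTENCE_PATTERNS, utterances))  # 0.5
    print("with fragments:   ", coverage(FRAGMENT_PATTERNS, utterances))       # 1.0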
