Search | Korea Science

Sentence-Chain Based Seq2seq Model for Corpus Expansion

Chung, Euisok;Park, Jeon Gue
- ETRI Journal / v.39 no.4 / pp.455-466 / 2017
This study focuses on sequential data augmentation as a way to alleviate data sparseness. Specifically, we present corpus expansion techniques for enhancing the coverage of a language model. Recent recurrent neural network studies show that a seq2seq model can be applied to language generation: it can generate new sentences from given input sentences. We present a method of corpus expansion using a sentence-chain based seq2seq model. To train the seq2seq model, sentence chains are used as triples. The first two sentences in a triple are fed to the encoder of the seq2seq model, while the last sentence becomes the target sequence for the decoder. Using only internal resources, evaluation results show a relative perplexity improvement of approximately 7.6% over a baseline language model for Korean text. Additionally, compared with a previous study, the sentence-chain approach reduces the size of the training data by 38.4% while generating 1.4 times as many n-grams, with superior performance for English text.
https://doi.org/10.4218/etrij.17.0116.0074
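
The triple construction described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a sentence chain is a run of three consecutive sentences, and the function names and the `<sep>` separator token are hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def sentence_chain_triples(sentences: List[str]) -> List[Triple]:
    """Slide a window of three sentences over a document.

    Assumption: a "sentence chain" is three consecutive sentences.
    The abstract does not say how chains are selected.
    """
    return [
        (sentences[i], sentences[i + 1], sentences[i + 2])
        for i in range(len(sentences) - 2)
    ]


def to_seq2seq_pair(triple: Triple, sep: str = " <sep> ") -> Tuple[str, str]:
    """Turn a triple into (encoder input, decoder target).

    The first two sentences form the encoder input and the third is
    the target. The separator token is a hypothetical choice.
    """
    first, second, target = triple
    return first + sep + second, target


doc = ["a", "b", "c", "d"]
triples = sentence_chain_triples(doc)
pairs = [to_seq2seq_pair(t) for t in triples]
```

A document with fewer than three sentences yields no triples under this assumption.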

An Efficient Index Term Extraction Method in IR using Lexical Chains (정보검색에서 어휘체인을 이용한 효과적인 색인어 추출 방안)

Kang, Bo-Yeong;Lee, Sang-Jo
- Journal of KIISE:Software and Applications / v.29 no.8 / pp.584-594 / 2002
In information retrieval and digital libraries, one of the most important tasks is finding exactly the information that users need. In this paper, we present an efficient index term extraction method that makes it possible to infer the content of documents and retrieve information more precisely. To find index terms in a document, we use lexical chains. Before generating lexical chains, we roughly disambiguate the senses of nouns in the document using a specific concept called the semantic window: we look ahead at the semantic relations of neighboring nouns and use them to disambiguate each noun's sense. After generating lexical chains from the sense-disambiguated nouns, we identify strong chains using several metrics and extract index terms from a few strong chains. We evaluated our system using the results of a key phrase extraction system, KEA. The system works on documents in general domains, including information retrieval and digital libraries.

Automatic Summarization based on Lexical Chains considering Word Association (단어간의 연관성을 고려한 어휘 체인 기반 자동 요약)

Song, Young-In;Han, Kyoung-Soo;Rim, Hae-Chang
- Annual Conference on Human and Language Technology / 2002.10e / pp.300-305 / 2002
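In automatic document summarization, how to capture and structure a target document in a form that a computer can understand has long been an important issue. The words that appear in a document do not exist independently of one another, as the bag-of-words assumption holds, but are semantically or referentially related according to the intent with which the document was written. This relatedness among words is called cohesion, and the representative automatic summarization method that exploits it is Barzilay's lexical chain approach. In this study, we improve the performance of lexical chains by using the relatedness among words and the positional information of words in WordNet, an English thesaurus, and we propose a way to improve automatic summarization performance by representing the concepts of the target document on the basis of lexical chains.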

Representations and Responsibilities

Smith, Neil
- Korean Journal of English Language and Linguistics / v.3 no.4 / pp.527-545 / 2003
I look at the respective responsibilities of different components of the language faculty in the description of two radically different kinds of linguistic phenomenon. The first is the production/perception mismatch in the child's acquisition of the phonology of its first language. There is strong evidence that the child's lexical representations are the same as the adult's, but I argue that the child's own pronunciations have no linguistic status and are best treated as the product of a neural network. The second is the nature of compositionality, where I argue that compositionality in Natural Language is derivative from that in the Language of Thought. With this assumption and using evidence from quantification in ‘backward control’ structures, I argue that chain theory is intrinsically inimical to a simple view of the legibility relation between LF and LoT.

An Improved Automatic Text Summarization Based on Lexical Chaining Using Semantical Word Relatedness (단어 간 의미적 연관성을 고려한 어휘 체인 기반의 개선된 자동 문서요약 방법)

Cha, Jun Seok;Kim, Jeong In;Kim, Jung Min
- Smart Media Journal / v.6 no.1 / pp.22-29 / 2017
With the recent rapid advancement and spread of smart devices, the volume of document data on the Internet is increasing sharply. The growth of information on the Web, including a massive number of documents, makes it increasingly difficult for users to digest the relevant data. Various studies on automatic summarization are under way to summarize documents efficiently. This study uses the TextRank algorithm. TextRank represents sentences or keywords as a graph and estimates the importance of sentences from the graph's vertices and edges, which capture semantic relations between words and sentences. It extracts high-ranking keywords and, based on those keywords, extracts important sentences. To extract important sentences, the algorithm first groups vocabulary, using a specific weighting scale. The program selects the sentences with higher scores on the weighting scale and, from those selected sentences, extracts the important sentences that summarize the document. The study reports improved performance over the summarization methods of previous studies and concludes that the algorithm can summarize documents more efficiently.
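
The graph-based ranking step can be sketched in a few lines. This is the generic TextRank formulation, assuming word-overlap similarity between tokenized sentences and a damping factor of 0.85. It does not reproduce the vocabulary grouping or weighting scale mentioned in the abstract, and the function names are hypothetical.

```python
import math
from typing import List


def similarity(a: List[str], b: List[str]) -> float:
    """Word overlap normalized by the log of the sentence lengths."""
    if not a or not b:
        return 0.0
    overlap = len(set(a) & set(b))
    denom = math.log(len(a)) + math.log(len(b))
    # Two one-word sentences give a zero denominator; treat as unrelated.
    return overlap / denom if denom > 0 else 0.0


def textrank(sentences: List[List[str]], damping: float = 0.85,
             iterations: int = 50) -> List[float]:
    """Score sentences by iterating weighted PageRank on the similarity graph."""
    n = len(sentences)
    weights = [
        [similarity(sentences[i], sentences[j]) if i != j else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


tokenized = [
    ["lexical", "chains", "summarize", "documents"],
    ["lexical", "chains", "group", "related", "words"],
    ["graph", "ranking", "scores", "sentences"],
]
scores = textrank(tokenized)
```

In this toy input the third sentence shares no words with the others, so it keeps only the base score of `1 - damping`; a summarizer would select the highest-scoring sentences.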

Open-domain Question Answering Using Lexico-Semantic Patterns (Lexico-Semantic Pattern을 이용한 오픈 도메인 질의 응답 시스템)

Lee, Seung-Woo;Jung, Han-Min;Kwak, Byung-Kwan;Kim, Dong-Seok;Cha, Jeong-Won;An, Joo-Hui;Lee, Gary Geun-Bae;Kim, Hark-Soo;Kim, Kyung-Sun;Seo, Jung-Yun
- Annual Conference on Human and Language Technology / 2001.10d / pp.538-545 / 2001
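This study implements an open-domain question answering system and reports the results of its participation in the English-language TREC evaluation. Answer types are classified into a hierarchy with 18 top-level nodes. In question processing, a grammar expressed as LSPs (Lexico-Semantic Patterns) determines the answer type of the question, and a query is generated from three types of keywords: lemma forms, WordNet senses, and stem forms. Based on this query, passage selection splits the documents retrieved by the document retrieval engine into sentences, computes their scores, combines adjacent sentences into passages with lexical chains taken into account, and ranks the passages. For the top-ranked passages, answer processing extracts answers according to the answer type of the question, using LSP matching described in terms of part-of-speech, lexical, and semantic information together with AAO (Abbreviation-Appositive-Definition) processing, then computes scores and ranks the answers. To evaluate the implemented system, we ran our own TREC-style evaluation on 200 of the questions from the main task of the TREC10 QA Track. The resulting MRR (Mean Reciprocal Rank) was 0.341, comparable to the top systems at TREC9.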
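
For reference, the MRR metric reported above averages the reciprocal rank of the first correct answer over all questions, counting a question with no correct answer as 0. The sketch below uses the standard definition; the function name and input layout are hypothetical, and the sample ranks are made up for illustration.

```python
from typing import List, Optional


def mean_reciprocal_rank(first_correct_ranks: List[Optional[int]]) -> float:
    """Compute MRR from the 1-based rank of the first correct answer.

    Each entry is the rank for one question, or None when no returned
    answer was correct (that question contributes 0).
    """
    if not first_correct_ranks:
        raise ValueError("at least one question is required")
    total = sum(1.0 / rank for rank in first_correct_ranks if rank is not None)
    return total / len(first_correct_ranks)


# Illustrative ranks only: correct at rank 1, rank 2, never, and rank 4.
mrr = mean_reciprocal_rank([1, 2, None, 4])
```

Here the value is (1 + 1/2 + 0 + 1/4) / 4 = 0.4375.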

The Korean Fricatives in Acquisition: A Case Study

Kang, Kyung-Shim
- Speech Sciences / v.11 no.2 / pp.71-87 / 2004
Korean has a pair of voiceless fricatives, whose laryngeal manifestation parallels that of stops and affricates, which show a three-way lexical contrast. Prior phonetic studies by Kagaya (1974), Iverson (1983), and Kang (1999, 2000) point out that /s/ is associated with multiple laryngeal characteristics shared not only with the lax series but also with the aspirated series, whereas /s'/ carries a laryngeal distinction typical of the tense consonants. The complex dual nature of /s/ is further supported by a psycholinguistic study by Kang (2004), in which /s/ was found to interact with /cʰ/ (17% of the time) as well as /c/ (57%) in speech errors. In addition, a recent work by Cho and Lee (2003) notes an interesting chain shift case in the acquisition of the fricatives. Although they observed a significant phonological pattern between child English and Korean, Cho and Lee's description of the acquisition of fricatives is far from precise from a phonetic perspective. From a longitudinal study of recordings of two children at 1;7-3;8 and 1;7-2;1 respectively, I found that /s'/ was usually replaced by tense noncontinuants in the children's early production, as predicted, whereas /s/, which has both lax and aspirated-like glottal properties, showed a complicated pattern of substitution by lax, tense, and aspirated noncontinuants, with the degree of preference varying by subject. The current acquisition study supports previous claims concerning fricatives in other languages, showing that their acquisition comes after stops. It also notes that Korean fricatives are subject to a series of phonological processes (stopping, affricating, tensifying, and palatalizing) during the transitional period of young children's phonological development. Moreover, of the two voiceless types, /s/ was acquired earlier than /s'/, as the unmarked segment.
