Search | Korea Science

Latent Semantic Analysis Approach for Document Summarization Based on Word Embeddings

Al-Sabahi, Kamal;Zuping, Zhang;Kang, Yang
- KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.1 / pp.254-276 / 2019
Because the amount of information on the internet is growing rapidly, it is not easy for users to find information relevant to their queries. To tackle this issue, researchers are paying much attention to document summarization. The key to any successful document summarizer is a good document representation. Traditional approaches based on word overlap mostly fail to produce that kind of representation. Word embeddings have shown good performance by allowing words to match on a semantic level. Naively concatenating word embeddings makes common words dominant, which in turn diminishes the representation quality. In this paper, we employ word embeddings to improve the weighting schemes for calculating the Latent Semantic Analysis input matrix. Two embedding-based weighting schemes are proposed and then combined to calculate the values of this matrix. They are modified versions of the augment weight and the entropy frequency that combine the strengths of traditional weighting schemes and word embeddings. The proposed approach is evaluated on three English datasets: DUC 2002, DUC 2004, and Multilingual 2015 Single-document Summarization. Experimental results on the three datasets show that the proposed model achieves performance competitive with the state of the art, leading to the conclusion that it provides a better document representation and, as a result, a better document summary.
https://doi.org/10.3837/tiis.2019.01.015

Analysis of Effect of Learning to Solve Word Problems through a Structure-Representation Instruction. (문장제 해결에서 구조-표현을 강조한 학습의 교수학적 효과 분석)

이종희;김부미
- School Mathematics / v.5 no.3 / pp.361-384 / 2003
The purpose of this study was to investigate, based on the IDEAL model, students' problem-solving processes when they learn to solve word problems involving simultaneous linear equations through structure-representation instruction. The IDEAL problem-solving model proceeds through the following stages: identifying problems (I), defining problems (D), exploring alternative approaches (E), and acting on a plan (A). The 160 second-grade middle school students who participated in the study were classified into (a) a control group receiving no explicit structure-representation instruction in word problem solving, and (b) a group receiving structure-representation instruction followed by IDEAL. The results showed that structure-representation instruction improved word-problem-solving performance, and that students taught by the structure-representation approach discriminated more sharply among equivalent, isomorphic, and similar problems than students in the control group. Students instructed by the structure-representation approach also made fewer errors than the control group in understanding contexts and using data, in transferring mathematical symbols from the internal learning relations of word problems, and in setting up equations. In particular, this study shows the model of direct transformation and the model of structure-schema in students' problem-solving processes at the I and D stages.

A Design of Web-Based System for Mathematical Word Problem Representation Ability Improvement (수학 문장제 표상능력 향상을 위한 웹 기반 시스템의 설계)

Park, Jung-Sik;Kho, Dae-Ghon
- Journal of The Korean Association of Information Education / v.5 no.2 / pp.185-196 / 2001
Elementary school students find mathematical word problems more difficult than numerical formulas. I think that the reason is not mathematical calculation ability but problem representation. Improving the ability to represent mathematical word problems requires an exact understanding of what the problem asks. Multimedia data and communication are needed for this. Because the web advances multimedia realization and promotes mutual communication, it provides the most suitable environment for learning word problem representation. Accordingly, this thesis designs a web-based system to improve the ability to represent mathematical word problems and applies it experimentally to the sixth grade.

Effects of Orthographic Knowledge and Phonological Awareness on Visual Word Decoding and Encoding in Children Aged 5-8 Years (5~8세 아동의 철자지식과 음운인식이 시각적 단어 해독과 부호화에 미치는 영향)

Na, Ye-Ju;Ha, Ji-Wan
- Journal of Digital Convergence / v.14 no.6 / pp.535-546 / 2016
This study examined the relations among orthographic knowledge, phonological awareness, and visual word decoding and encoding abilities. Children aged 5 to 8 years took a letter knowledge test, a phoneme-grapheme correspondence test, an orthographic representation test (regular and irregular word representation), a phonological awareness test (word, syllable, and phoneme awareness), a word decoding test (regular and irregular word reading), and a word encoding test (regular and irregular word dictation). Performance on all tasks differed significantly among groups, and there were positive correlations among the tasks. In the word decoding and encoding tests, the variables with the most predictive power were letter knowledge and orthographic representation ability. Orthographic knowledge was found to influence visual word decoding and encoding skills more than phonological awareness at these ages.
https://doi.org/10.14400/JDC.2016.14.6.535

Effective Korean sentiment classification method using word2vec and ensemble classifier (Word2vec과 앙상블 분류기를 사용한 효율적 한국어 감성 분류 방안)

Park, Sung Soo;Lee, Kun Chang
- Journal of Digital Contents Society / v.19 no.1 / pp.133-140 / 2018
Accurate sentiment classification is an important research topic in sentiment analysis. This study suggests an efficient method for classifying Korean sentiment using word2vec and ensemble methods, both of which have recently been studied in various ways. For 200,000 Korean movie review texts, we generated a POS-based BOW feature, a word2vec feature, and an integrated feature combining the two representations. For sentiment classification, we used the single classifiers Logistic Regression, Decision Tree, Naive Bayes, and Support Vector Machine (SVM) and the ensemble classifiers Adaptive Boost, Bagging, Gradient Boosting, and Random Forest. The integrated feature representation, composed of the BOW feature including adjectives and adverbs and the word2vec feature, showed the highest sentiment classification accuracy. Empirical results show that SVM, a single classifier, has the highest performance, while the ensemble classifiers show similar or slightly lower performance.
https://doi.org/10.9728/dcs.2018.19.1.133

A Semantic Representation Based-on Term Co-occurrence Network and Graph Kernel

Noh, Tae-Gil;Park, Seong-Bae;Lee, Sang-Jo
- International Journal of Fuzzy Logic and Intelligent Systems / v.11 no.4 / pp.238-246 / 2011
This paper proposes a new semantic representation and its associated similarity measure. The representation expresses the textual context observed around a given term as a network in which nodes are terms and edges are weighted by the number of co-occurrences between the connected terms. To compare terms represented as networks, a graph kernel is adopted as the similarity measure. The proposed representation has two notable merits compared with previous semantic representations. First, it can process polysemous words better than a vector representation. The network of a polysemous term is regarded as a combination of sub-networks that represent senses, and the appropriate sub-network is identified by context before being compared by the kernel. Second, the representation permits not only words but also senses or contexts to be represented directly from the corresponding set of terms. The validity of the representation and its similarity measure is evaluated on two tasks: a synonym test and unsupervised word sense disambiguation. The method performed well and could compete with state-of-the-art unsupervised methods.
https://doi.org/10.5391/IJFIS.2011.11.4.238
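
The network construction described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `cooccurrence_network`, the sentence-level co-occurrence window, the whitespace tokenization, and the toy sentences are all assumptions, and the graph kernel comparison step is not shown.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(target, sentences):
    """Build a term network from the sentences that contain `target`.

    Nodes are the terms seen alongside the target. Each edge weight is the
    number of sentences in which its two terms co-occur (an assumed window).
    """
    edges = Counter()
    for sentence in sentences:
        terms = set(sentence.lower().split())
        if target not in terms:
            continue
        # Sort so that each undirected edge has one canonical key.
        context = sorted(terms - {target})
        for a, b in combinations(context, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical toy corpus for the polysemous term "bank".
sentences = [
    "bank loan interest",
    "bank interest loan rate",
    "bank river water",
]
network = cooccurrence_network("bank", sentences)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

In this toy corpus the finance terms and the river terms share no edges, which loosely mirrors the abstract's view of a polysemous term's network as a combination of sense sub-networks.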

Expansion of Word Representation for Named Entity Recognition Based on Bidirectional LSTM CRFs (Bidirectional LSTM CRF 기반의 개체명 인식을 위한 단어 표상의 확장)

Yu, Hongyeon;Ko, Youngjoong
- Journal of KIISE / v.44 no.3 / pp.306-313 / 2017
Named entity recognition (NER) seeks to locate and classify named entities in text into pre-defined categories such as names of persons, organizations, locations, expressions of times, etc. Recently, many state-of-the-art NER systems have been implemented with bidirectional LSTM CRFs. Deep learning models based on long short-term memory (LSTM) generally depend on word representations as input. In this paper, we propose an approach to expand word representation by using pre-trained word embedding, part of speech (POS) tag embedding, syllable embedding and named entity dictionary feature vectors. Our experiments show that the proposed approach creates useful word representations as an input of bidirectional LSTM CRFs. Our final presentation shows its efficacy to be 8.05%p higher than baseline NERs with only the pre-trained word embedding vector.
https://doi.org/10.5626/JOK.2017.44.3.306

Representation of ambiguous word in Latent Semantic Analysis (LSA모형에서 다의어 의미의 표상)

이태헌;김청택
- Korean Journal of Cognitive Science / v.15 no.2 / pp.23-31 / 2004
Latent Semantic Analysis (LSA; Landauer & Dumais, 1997) is a technique for representing the meanings of words using co-occurrence information for words appearing in the same context, which is usually a sentence or a document. In LSA, a word is represented as a point in a multidimensional space where each axis represents a context, and a word's meaning is determined by its frequency in each context. The space is reduced by singular value decomposition (SVD). The present study elaborates upon LSA for use in representing ambiguous words. The proposed LSA applies a rotation of axes in the document space, which makes it possible to interpret the meaning of un. A simulation study was conducted to illustrate the performance of LSA in representing ambiguous words. In the simulation, the texts containing an ambiguous word were first extracted and LSA with rotation was performed. By comparing the loading matrix, we categorized the texts according to meaning. The first meaning of an ambiguous word was represented by LSA with the matrix excluding the vectors for the other meaning. The other meanings were represented in the same way. The simulation showed that this way of representing an ambiguous word can identify the meanings of the word. This result suggests that LSA with axis rotation can be applied to the representation of ambiguous words. We discuss how the use of rotation makes it possible to represent multiple meanings of ambiguous words and how this technique can be applied in the area of web searching.
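
The basic LSA pipeline summarized in this abstract (a word-by-context frequency matrix reduced by SVD) can be sketched as below. This is a minimal illustration under stated assumptions: the words, the counts, the helper names `lsa_word_vectors` and `cosine`, and the choice of two retained dimensions are made up for the example, and the axis rotation proposed in the paper is not implemented.

```python
import numpy as np

# Toy word-by-context count matrix (rows: words, columns: contexts).
# The first two contexts are about finance, the last two about rivers.
words = ["bank", "money", "loan", "river", "water"]
counts = np.array([
    [1, 1, 1, 1],  # bank: occurs in every context
    [1, 2, 0, 0],  # money
    [2, 1, 0, 0],  # loan
    [0, 0, 1, 2],  # river
    [0, 0, 2, 1],  # water
], dtype=float)


def lsa_word_vectors(matrix, k):
    """Return k-dimensional word vectors from a truncated SVD."""
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    # Scale the left singular vectors by the singular values.
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


vectors = lsa_word_vectors(counts, k=2)
sim_money_loan = cosine(vectors[1], vectors[2])
sim_money_river = cosine(vectors[1], vectors[3])
print(f"money vs loan:  {sim_money_loan:.3f}")
print(f"money vs river: {sim_money_river:.3f}")
```

In the reduced space, "money" and "loan" end up close together while "money" and "river" do not, even though "bank" co-occurs with both groups. An ambiguous word such as "bank" sits between the two groups, which is the situation the abstract's rotation step is meant to disentangle.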

Effects of the Schema-Based Instructional Program on Word Problem Representation and Solving Ability (시각적 스키마 프로그램이 문장제 표상과 문제해결력에 미치는 효과)

Kim, Jong-Baeg;Lee, Sung-Won
- School Mathematics / v.13 no.1 / pp.155-173 / 2011
Problem representation is a key aspect of solving word problems. The purpose of this study was to investigate the effects of an instructional program based on visual schemas representing five types of word problems (Marshall, 1995). Two second-grade classes of an elementary school in Seoul participated in this study. In the experimental class, an instructional program including schema tools was administered, while the comparison class had regular classes using diagrams and tables. Pre- and post-tests of 15 word problems each were used to test students' problem-solving ability. In addition, students' language ability test scores were used to control for the effect of word comprehension level on problem solving. The results revealed that the experimental group showed higher problem representation and solving scores after controlling for the effects of the pre-test. In addition, there was a significant positive correlation (.58) between the ability to apply the exact problem schema and problem-solving results. This study showed that even at an early developmental stage, young students can benefit from instruction in word problem schemas.

Word Representation Analysis of Bio-marker and Disease Word (바이오 마커와 질병 용어의 단어 표현 분석)

Youn, Young-Shin;Nam, Kyung-Min;Kim, Yu-Seop
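- Annual Conference on Human and Language Technology / 2015.10a / pp.165-168 / 2015
One of the important steps in a machine-learning-based natural language processing module is representing words as the module's input. In contrast to the one-hot form, in which vectors are large and there is no notion of similarity between words, word representation (embedding), which represents words as vectors in order to express similarity, has attracted much attention because it improves the performance of machine learning models for natural language processing tasks and has shown performance gains in models in several natural language processing fields. In this paper, using Word2Vec, CCA, and GloVe, we examine the category classification ability of each word representation model on the categories of a corpus built from the abstracts of 106,552 PubMed biomedical papers. The fine-grained categories include disease names, disease symptoms, and ovarian cancer markers. To examine the classification ability, the word representation results are mapped to two dimensions with t-SNE and visualized.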
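
The contrast this abstract draws between one-hot vectors and dense word representations can be illustrated with a small sketch. Everything here is an assumption for illustration: the three-word vocabulary and the two-dimensional dense vectors are made up, and no Word2Vec, CCA, or GloVe model is actually trained.

```python
import math


def one_hot(index, size):
    """Return a one-hot vector with a 1.0 at `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical vocabulary; the dense values are invented, not learned.
vocab = ["fever", "cough", "marker"]
dense = {
    "fever": [0.9, 0.1],
    "cough": [0.8, 0.3],
    "marker": [0.1, 0.9],
}

# Any two distinct one-hot vectors are orthogonal, so similarity is zero.
one_hot_sim = cosine(one_hot(0, len(vocab)), one_hot(1, len(vocab)))

# Dense vectors can place related words close together.
dense_sim_related = cosine(dense["fever"], dense["cough"])
dense_sim_unrelated = cosine(dense["fever"], dense["marker"])
```

With one-hot vectors the vector length also grows with the vocabulary, whereas the dense dimensionality is chosen independently of vocabulary size.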

Search Result 166, Processing Time 0.02 seconds
