• Title/Summary/Keyword: Thematic word

Effective Thematic Words Extraction from a Book using Compound Noun Phrase Synthesis Method

  • Ahn, Hee-Jeong;Kim, Kee-Won;Kim, Seung-Hoon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.3
    • /
    • pp.107-113
    • /
    • 2017
  • Most online bookstores provide users with bibliographic information about books rather than concrete information such as thematic words and atmosphere. Thematic words in particular help users understand books and explore them more widely. In this paper, we propose an efficient method for extracting thematic words from book text by applying a compound noun and noun phrase synthesis method. Compound nouns represent the characteristics of a book in more detail than single nouns. The proposed method recognizes two types of noun phrases in book text: single nouns, and compound nouns formed by combining single nouns. The recognized single nouns, compound nouns, and noun phrases are weighted with TF-IDF and extracted as main words. In addition, this paper suggests calculating the frequencies of subject, object, and other roles separately in the TF-IDF calculation, rather than simply summing the frequencies of all nouns. Experiments are carried out on books in the field of economics and management, and the extracted thematic words are verified through a survey and book search. In 9 of the 10 experimental results, the thematic words extracted by the proposed method are more effective for understanding the content. The thematic words extracted by the proposed method are also confirmed to give better book search results.
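
The role-separated TF-IDF idea in the abstract above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the role weights in `ROLE_WEIGHTS`, the helper names, and the toy book data are all assumptions, since the abstract does not state the actual weights or formula.

```python
import math
from collections import Counter

# Hypothetical role weights: the abstract says subject, object, and other
# roles are counted separately, but does not give the actual values.
ROLE_WEIGHTS = {"subject": 2.0, "object": 1.5, "other": 1.0}

def role_weighted_tf(tagged_terms):
    """Sum a role weight per occurrence instead of counting raw occurrences."""
    tf = Counter()
    for term, role in tagged_terms:
        tf[term] += ROLE_WEIGHTS.get(role, 1.0)
    return tf

def tf_idf(docs):
    """docs: list of lists of (term, role). Returns one {term: score} per doc."""
    tfs = [role_weighted_tf(doc) for doc in docs]
    n_docs = len(docs)
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # document frequency: one count per document
    return [
        {term: weight * math.log(n_docs / df[term]) for term, weight in tf.items()}
        for tf in tfs
    ]

def top_terms(scores, k=3):
    """Highest-scoring terms, ties broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Toy data (made up): single nouns and compound nouns tagged with their role.
books = [
    [("exchange rate", "subject"), ("market", "other"), ("exchange rate", "object")],
    [("market", "subject"), ("brand strategy", "object"), ("market", "other")],
]
scores = tf_idf(books)
print(top_terms(scores[0], k=1))
```

In this toy run the compound noun "exchange rate" outranks "market" in the first book because "market" appears in every book and so receives an IDF of zero.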

A Automatic Document Summarization Method based on Principal Component Analysis

  • Kim, Min-Soo;Lee, Chang-Beom;Baek, Jang-Sun;Lee, Guee-Sang;Park, Hyuk-Ro
    • Communications for Statistical Applications and Methods
    • /
    • v.9 no.2
    • /
    • pp.491-503
    • /
    • 2002
  • In this paper, we propose an automatic document summarization method based on Principal Component Analysis (PCA), which is one of the multivariate statistical methods. After extracting thematic words using PCA, we select the sentences containing the extracted thematic words and build the document summary from them. Experimental results using newspaper articles show that the proposed method is superior to methods using either word frequency or an information retrieval thesaurus.
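
A rough sketch of the pipeline described in the abstract above (PCA, then thematic words, then sentence selection) might look like the following. It is an illustration under stated assumptions, not the paper's method: the sentence-by-term count matrix, the rule of taking the terms with the largest absolute loading on the first principal component, and the power-iteration routine are all choices made here for a dependency-free example.

```python
import math

def first_principal_component(rows, iters=200):
    """Leading eigenvector of the sample covariance of rows (samples x features),
    found by power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0 / math.sqrt(d)] * d  # uniform unit start vector
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return v

def thematic_words(sentences, k=1):
    """Assumed rule: terms with the largest absolute loading on the first PC."""
    vocab = sorted({w for s in sentences for w in s})
    rows = [[s.count(w) for w in vocab] for s in sentences]
    pc = first_principal_component(rows)
    ranked = sorted(zip(vocab, pc), key=lambda p: (-abs(p[1]), p[0]))
    return [w for w, _ in ranked[:k]]

def summarize(sentences, k=1):
    """Keep the sentences that contain at least one thematic word."""
    themes = set(thematic_words(sentences, k))
    return [s for s in sentences if themes & set(s)]

# Toy tokenized sentences (made up), stop words already removed.
sentences = [
    ["bank", "rates"],
    ["bank", "rates", "bank"],
    ["weather"],
    ["rates", "bank"],
]
print(thematic_words(sentences))
print(summarize(sentences))
```

On this toy input "bank" carries the largest loading, so the summary keeps the three sentences that mention it and drops the unrelated one.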

Text Summarization using PCA and SVD (주성분 분석과 비정칙치 분해를 이용한 문서 요약)

  • Lee, Chang-Beom;Kim, Min-Soo;Baek, Jang-Sun;Park, Hyuk-Ro
    • The KIPS Transactions:PartB
    • /
    • v.10B no.7
    • /
    • pp.725-734
    • /
    • 2003
  • In this paper, we propose a text summarization method using PCA (Principal Component Analysis) and SVD (Singular Value Decomposition). The proposed method produces a summary by extracting significant sentences based on the distances between thematic words and sentences. To extract thematic words, we use both word frequency and the co-occurrence information obtained by performing PCA. To extract significant sentences, we use the Euclidean distances between the thematic word vectors and the sentence vectors obtained by carrying out SVD. Experimental results using newspaper articles show that the proposed method is superior to methods using either word frequency or PCA alone.
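
The distance-based sentence selection step in the abstract above can be illustrated with a short sketch. It covers only the ranking step: the 2-D vectors below are made-up values standing in for SVD-reduced vectors, and scoring each sentence by its distance to the nearest thematic word vector is an assumption, since the abstract does not say how distances are aggregated.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def rank_sentences(sentence_vecs, thematic_vecs):
    """Order sentence ids by distance to the nearest thematic word vector
    (assumed aggregation rule), closest first."""
    nearest = {
        sid: min(euclidean(vec, t) for t in thematic_vecs)
        for sid, vec in sentence_vecs.items()
    }
    return sorted(nearest, key=lambda sid: (nearest[sid], sid))

# Toy 2-D coordinates standing in for SVD-reduced vectors (made up).
sentence_vecs = {"s1": (0.9, 0.1), "s2": (0.2, 0.8), "s3": (0.5, 0.5)}
thematic_vecs = [(1.0, 0.0)]
print(rank_sentences(sentence_vecs, thematic_vecs))
```

A summary would then take the first few sentence ids from the ranking, with the cutoff chosen by the desired summary length.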

Document Thematic words Extraction using Principal Component Analysis (주성분 분석을 이용한 문서 주제어 추출)

  • Lee, Chang-Beom;Kim, Min-Soo;Lee, Ki-Ho;Lee, Guee-Sang;Park, Hyuk-Ro
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.10
    • /
    • pp.747-754
    • /
    • 2002
  • In this paper, we propose a method for extracting the thematic words of a document using principal component analysis (PCA), which is one of the multivariate statistical methods. The proposed PCA model captures the flow of words in the document by using eigenvalues and eigenvectors, and extracts thematic words. The proposed model is evaluated by applying it to document summarization. Experimental results using newspaper articles show that the proposed model is superior to models using either word frequency or an information retrieval thesaurus. We expect that the proposed model can be applied to information retrieval, information extraction, and document summarization.

Thematic Word Extraction from Book Based on Keyword Weighting Method (키워드 가중치 방식에 근거한 도서 본문 주제어 추출)

  • Ahn, Hee-Jeong;Choi, Gun-Hee;Kim, Seung-Hoon
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2015.01a
    • /
    • pp.19-22
    • /
    • 2015
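  • In this paper, we propose a method for extracting thematic words from the body text of a book based on weights assigned according to the role of keywords in sentences and paragraphs. Existing thematic word extraction methods target newspapers or papers rather than book text, so they are difficult to apply directly to book text. This paper therefore proposes a method that assigns candidate keywords not only a frequency count but also weights for important elements within a sentence and weights for important sentences. Experiments applying the proposed calculation to non-fiction books confirm that its thematic word extraction is more accurate than the existing approach that extracts thematic words by frequency alone.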

A Study on the Characterization of Post-Modernism Interior Design in Language Attribute (언어성에서 본 포스트모더니즘 실내디자인의 특성연구)

  • 이춘섭
    • Archives of design research
    • /
    • no.18
    • /
    • pp.15-23
    • /
    • 1996
  • An interior is composed of void spaces defined by many substances, and of the functions of those spaces, which create its value. Accordingly, interior space takes on distinctive forms by synthesizing the individual interior elements and their abstractness. The characteristics of the expressed words and symbolic systems therefore make up the interior as a whole. The focus of this paper is to analyze interior words and symbolic systems scientifically. The content is composed of 5 major parts. The 1st and 2nd chapters deal with the Introduction and the Characterization of Interior Design. The linguistic symbols and metaphors that determine the Post-Modern interior style are studied in the 3rd chapter. In the 4th chapter, this paper analyzes the linguistic character of the interior words needed to express tradition, decoration, and publicity, especially through code and metaphor. Finally, the library interior space of the Thematic House designed by Charles Jencks is examined linguistically as a study model.

A Study on Plagiarism Detection and Document Classification Using Association Analysis (연관분석을 이용한 효과적인 표절검사 및 문서분류에 관한 연구)

  • Hwang, Insoo
    • The Journal of Information Systems
    • /
    • v.23 no.3
    • /
    • pp.127-142
    • /
    • 2014
  • Plagiarism occurs when content is copied without permission or citation, and the problem has grown rapidly in the digital era because of the resources available on the World Wide Web. An important task in plagiarism detection is measuring and determining similar text portions between a given pair of documents. One of the main difficulties of this task is that not all similar text fragments are examples of plagiarism, since thematic coincidences also tend to produce portions of similar text. To handle this problem, this paper proposes using association analysis, a data mining technique, to detect plagiarism. The method can detect common actions performed by plagiarists, such as word deletion, insertion, and transposition, making it possible to obtain plausible portions of plagiarized text. Experimental results employing an unsupervised document classification strategy show that the proposed method outperforms traditionally used approaches.

Thematic Analysis for Classifying the E-Learning Challenges and the Suggested Solutions: The Unusual Era of the COVID-19

  • Nazari, Behzad;Hussin, AB Razak Bin Che;Niknejad, Naghmeh
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.13 no.4
    • /
    • pp.79-89
    • /
    • 2021
  • Electronic learning (e-learning) enables higher education to provide sustained instruction during the unusual circumstances in which the widespread COVID-19 crisis has closed various sectors of society. During this time, e-learning serves levels of the education sector such as higher education well by delivering and receiving materials at a distance, in keeping with movement restrictions imposed by the government, for example the Movement Control Order (MCO) in Malaysia. In this qualitative survey, the existing e-learning challenges and the recommended solutions from the senior lecturers' perspectives were collected through an online open-ended questionnaire. Five of eight senior lecturers at the Universiti Teknologi Malaysia (UTM) answered the questionnaire. The UTM has been able to provide e-learning courses for all of its lecturers and students during the closure of higher education institutions caused by the harmful health conditions stemming from the COVID-19 crisis. The major existing challenges found in the e-learning program at the UTM and the suggested solutions to address them are listed, and the main themes are illustrated as a word cloud using the NVivo software. Finally, conclusions are presented and future work is proposed. Overall, the purpose of this study is to address the e-learning challenges and to prepare a list of recommendations that can serve as solutions from the standpoint of the UTM senior lecturers during the MCO in Malaysia.

Examining Suicide Tendency Social Media Texts by Deep Learning and Topic Modeling Techniques (딥러닝 및 토픽모델링 기법을 활용한 소셜 미디어의 자살 경향 문헌 판별 및 분석)

  • Ko, Young Soo;Lee, Ju Hee;Song, Min
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.32 no.3
    • /
    • pp.247-264
    • /
    • 2021
  • This study aims to build a deep learning-based model that classifies suicide tendency, using a suicide corpus constructed for the study. In addition, to analyze suicide factors, the study divides the suicide tendency corpus into detailed topics using topic modeling, an analysis technique that extracts topics automatically. For this purpose, 2,011 suicide-related documents collected from the social media service Naver Knowledge iN were manually annotated as suicide-tendency or non-suicide-tendency documents, based on the suicide prevention education manual issued by the Central Suicide Prevention Center, and the performance of deep learning models (LSTM, BERT, ELECTRA) was evaluated on the classification task using the annotated corpus. In addition, LDA, one of the topic modeling techniques, was used to identify suicide factors by classifying the literature by theme, and co-word analysis and visualization were conducted to analyze the factors in depth.

Digital humanities Research Trends on Marcel Proust (마르셀 프루스트에 관한 디지털인문학적 연구 동향분석)

  • Jinyoung MIN
    • The Journal of the Convergence on Culture Technology
    • /
    • v.10 no.3
    • /
    • pp.181-188
    • /
    • 2024
  • Fueled by the digital transformation era, the 150th anniversary of Marcel Proust's birth (2021) and 100th anniversary of his death (2022) witnessed a surge in digital humanities research. This goes beyond supplementing traditional methods; it fosters new approaches like Nicolas Lagonneau's 'Proustonomics' website (archiving online/offline Proust discourse) and 'Proustographe' (quantifying and visualizing data related to Proust). The Buffalo Proust Project (2021) provided online access to materials on his life and works, while the Corr-Proust project digitized his correspondence. While Korea lacks established digital Proust research, recent analysis of academic paper vocabulary (through word frequencies and word clouds) reveals significant thematic and quantitative development around 2000, paving the way for future Korean ventures in this exciting field. Digital humanities research offers the potential to unearth new research topics, enhance efficiency, and promote international collaboration, ultimately leading to a deeper understanding of Proust and groundbreaking advancements in the field.