Search | Korea Science

Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry

Prakash, Amit;Singh, Niraj Kumar;Saha, Sujan Kumar
- ETRI Journal / v.44 no.3 / pp.413-425 / 2022
The study of literary texts is one of the earliest disciplines practiced around the globe. Poetry is artistic writing in which words are carefully chosen and arranged for their meaning, sound, and rhythm. Poetry usually carries a broad and profound sense that makes it difficult to interpret, even for humans. The essence of poetry is Rasa, which signifies mood or emotion. In this paper, we propose a poetry classification-based approach to automatically extract similar poems from a repository. Specifically, we perform a novel Rasa-based classification of Hindi poetry. For the task, we primarily used lexical features in a bag-of-words model trained using the support vector machine classifier. In the model, we employed Hindi WordNet, Latent Semantic Indexing, and Word2Vec-based neural word embedding. To extract the rich feature vectors, we prepared a repository containing 37,717 poems collected from various sources. We evaluated the performance of the system on a manually constructed dataset containing 945 Hindi poems. Experimental results demonstrated that the proposed model attained satisfactory performance.
https://doi.org/10.4218/etrij.2019-0396

Document Clustering using Clustering and Wikipedia (군집과 위키피디아를 이용한 문서군집)

Park, Sun;Lee, Seong Ho;Park, Hee Man;Kim, Won Ju;Kim, Dong Jin;Chandra, Abel;Lee, Seong Ro
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2012.10a / pp.392-393 / 2012
This paper proposes a new document clustering method that uses clustering and Wikipedia. The proposed method represents the concepts of cluster topics well by means of non-negative matrix factorization (NMF). It also addresses a weakness of the bag-of-words model, which ignores meaningful relationships between documents and clusters, by expanding the important terms of each cluster with synonyms from Wikipedia. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.

A Study on Research Trends of Graph-Based Text Representations for Text Mining (텍스트 마이닝을 위한 그래프 기반 텍스트 표현 모델의 연구 동향)

Chang, Jae-Young
- The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.5 / pp.37-47 / 2013
Text mining is a research area concerned with retrieving high-quality hidden information, such as patterns, trends, or distributions, by analyzing unformatted text. Because text mining assumes unstructured text, the text must first be represented in a simple model before it can be analyzed. The most frequently used model so far is the vector space model (VSM), in which a text is represented as a bag of words. Recently, however, much research has tried to apply graph-based text models to represent semantic relationships between words. In this paper, we survey research trends in graph-based text representation models for text mining. We also discuss future models for graph-based text mining.
https://doi.org/10.7236/JIIBC.2013.13.5.37
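
To make the bag-of-words vector space model mentioned in this abstract concrete, the following is a minimal sketch rather than anything taken from the surveyed work. The function names `bag_of_words` and `cosine_similarity`, the whitespace tokenization, and the two sample sentences are all assumptions chosen for illustration. It shows why a bag of words loses word order, which is the kind of relationship graph-based models try to keep.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split on whitespace, and count each term."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Two sentences with opposite meanings but identical word counts.
doc1 = bag_of_words("the dog bit the man")
doc2 = bag_of_words("the man bit the dog")
similarity = cosine_similarity(doc1, doc2)
```

Because both sentences contain the same words the same number of times, their bags are equal and the similarity is 1 up to floating-point rounding, even though the sentences mean different things.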

The Effect of Several Paper Bags on Fruit Skin Coloration of Red Skin European Pear 'Kalle' (봉지종류가 적색과피 서양배 'Kalle'의 과피색 발현에 미치는 영향)

Kim, Yoon-Kyeong;Kang, Sam-Seok;Choi, Jang-Jeon;Park, Kyoung-Sub;Won, Kyeong-Ho;Lee, Han-Chan;Han, Tae-Ho
- Horticultural Science & Technology / v.32 no.1 / pp.10-17 / 2014
This study was conducted to elucidate the relationship between light and coloring and to obtain basic results for promoting red color expression in the skin of 'Kalle' (Pyrus communis L.) pear. We investigated the location of the anthocyanin layer by microscopic observation, and the differences in skin color expression of 'Kalle' fruit bagged with paper bags that differ in light transmittance and inside temperature. No anthocyanin layer was found in brown or golden yellow skin, whereas in red-skinned pear and apple the anthocyanin layer was distributed in the epidermis or hypodermis. Dark red 'Kalle' had a higher anthocyanin content, 29.8 mg·100 g⁻¹ FW, than the light red apple 'Hongro'. Among the physical characteristics of the paper bags used, light transmittance was highest in the white paper bag (42.2%), which also passed more light, 8.9 µmol, than any other tested paper bag in the specific wavelength range of 650-655 nm. The maximum temperature inside the bag was about 3°C higher in the yellow paper bag. Red coloration and anthocyanin content were higher in unbagged fruit than in any bagged fruit. Among the bagged fruit, however, red color expression was higher in the white paper bag than in the double-layered black paper bag and the yellow paper bag. The chromaticity value also seemed to be a good index for explaining variation in fruit skin color, because anthocyanin content and chromaticity value were both higher. Based on these results, it is desirable to cultivate 'Kalle' without bags for stable red color expression, but bagging is essential for reducing insect damage in Korea. Further examination is needed to find a suitable time to remove the paper bags that both allows red color expression and reduces insect damage. In addition, paper bags need to be developed that have high transmittance at specific light wavelengths or a low inside temperature. Additional key words: anthocyanin, bagging, chromaticity value, light transmittance, Pyrus communis L.
https://doi.org/10.7235/hort.2014.13030

Sentiment Classification of Movie Reviews using Levenshtein Distance (Levenshtein 거리를 이용한 영화평 감성 분류)

Ahn, Kwang-Mo;Kim, Yun-Suk;Kim, Young-Hoon;Seo, Young-Hoon
- Journal of Digital Contents Society / v.14 no.4 / pp.581-587 / 2013
In this paper, we propose a sentiment classification method that uses Levenshtein distance. We generate a bag of words (BOW) by applying Levenshtein distance to sentiment features and use it as the training set. The machine learning algorithms we used were support vector machines (SVMs) and naive Bayes (NB). As the data set, we gathered 2,385 movie reviews from an online movie community (Daum movie service). From the collected reviews, we manually picked out sentiment words and sorted 778 words. In the experiment, we trained the classifiers on the previously generated BOW, to which Levenshtein distance had been applied over the sentiment words, and evaluated classifier performance by 10-fold cross-validation. In the evaluation, multinomial naive Bayes reached an accuracy of 85.46% when the Levenshtein distance was 3. The experimental results show that classification performance is less affected by spelling errors in documents.
https://doi.org/10.9728/dcs.2013.14.4.581
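
The idea of building a bag of words that tolerates spelling errors can be sketched as below. This is an illustrative reading of the abstract, not the authors' implementation: the helper `fuzzy_bow`, the English sample lexicon, and the threshold of 1 are assumptions (the paper reports results for Korean reviews with a distance of 3). Only the `levenshtein` function follows the standard textbook dynamic program.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b using a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_bow(tokens, lexicon, max_dist):
    """Count lexicon words, crediting each token to its nearest lexicon
    entry when that entry is within max_dist edits (hypothetical helper)."""
    counts = {word: 0 for word in lexicon}
    for tok in tokens:
        best = min(lexicon, key=lambda w: levenshtein(tok, w))
        if levenshtein(tok, best) <= max_dist:
            counts[best] += 1
    return counts


# Hypothetical sentiment lexicon and a review containing two misspellings.
lexicon = ["excellent", "boring"]
review = "an excelent but borring film".split()
features = fuzzy_bow(review, lexicon, max_dist=1)
```

With this threshold the misspelled tokens "excelent" and "borring" are each one edit away from a lexicon word and are counted, while the other tokens are too far from any entry and are ignored. A larger threshold admits more misspellings but also more false matches on short words.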

Topic Classification for Suicidology

Read, Jonathon;Velldal, Erik;Ovrelid, Lilja
- Journal of Computing Science and Engineering / v.6 no.2 / pp.143-150 / 2012
Computational techniques for topic classification can support qualitative research by automatically applying labels in preparation for qualitative analyses. This paper presents an evaluation of supervised learning techniques applied to one such use case, namely, that of labeling emotions, instructions and information in suicide notes. We train a collection of one-versus-all binary support vector machine classifiers, using cost-sensitive learning to deal with class imbalance. The features investigated range from a simple bag-of-words and n-grams over stems, to information drawn from syntactic dependency analysis and WordNet synonym sets. The experimental results are complemented by an analysis of systematic errors in both the output of our system and the gold-standard annotations.
https://doi.org/10.5626/JCSE.2012.6.2.143

Emerging Topic Detection Using Text Embedding and Anomaly Pattern Detection in Text Streaming Data (텍스트 스트리밍 데이터에서 텍스트 임베딩과 이상 패턴 탐지를 이용한 신규 주제 발생 탐지)

Choi, Semok;Park, Cheong Hee
- Journal of Korea Multimedia Society / v.23 no.9 / pp.1181-1190 / 2020
Detection of an anomaly pattern that deviates from the normal data distribution in streaming data is an important technique in many application areas. In this paper, a method for detecting a newly emerging pattern in text streaming data, which is an ordered sequence of texts, is proposed based on text embedding and anomaly pattern detection. The detection performance of the proposed method is compared across text embedding methods such as bag of words (BOW), Word2Vec, and BERT. Experimental results show that anomaly pattern detection using BERT embedding gave an average F1 value of 0.85, with an F1 value of 1 in three of the five test cases.
https://doi.org/10.9717/kmms.2020.23.9.1181

Text Classification for Patents: Experiments with Unigrams, Bigrams and Different Weighting Methods

Im, ChanJong;Kim, DoWan;Mandl, Thomas
- International Journal of Contents / v.13 no.2 / pp.66-74 / 2017
Patent classification is becoming more critical as patent filings have been increasing over the years. Despite comprehensive studies in the area, several issues remain in classifying patents on the IPC hierarchical levels. Both the structural complexity and the shortage of patents at the lower levels of the hierarchy cause classification performance to decline. Therefore, we propose a new method of classification based on different criteria, namely the categories defined by domain experts in trend analysis reports, i.e., Patent Landscape Reports (PLRs). Several experiments were conducted to identify the types of features and the weighting methods that lead to the best classification performance using a support vector machine (SVM). Two types of features (nouns and noun phrases) and five different weighting schemes (TF-idf, TF-rf, TF-icf, TF-icf-based, and TF-idcef-based) were tested.
https://doi.org/10.5392/IJoC.2017.13.2.066
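
Of the weighting schemes listed in this abstract, TF-idf is the most widely known, and a small sketch may help readers unfamiliar with it. The version below assumes one common variant, raw term frequency multiplied by log(N / df); the abstract does not state the exact formula the authors used, and the function name `tf_idf` and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: weight = raw term frequency * log(N / document frequency).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # each document counts a term at most once
    weighted = []
    for doc in docs:
        tf = Counter(doc)
        weighted.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weighted


# Toy "patents" already reduced to noun tokens (illustrative data only).
docs = [
    ["battery", "electrode", "battery"],
    ["battery", "antenna"],
    ["antenna", "signal"],
]
weights = tf_idf(docs)
```

In the first document, "electrode" appears once but in no other document, so it outweighs "battery", which appears twice but is shared with a second document. Terms that occur in every document would receive a weight of zero under this variant.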

Adaptive Bayesian Object Tracking with Histograms of Dense Local Image Descriptors

Kim, Minyoung
- International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.2 / pp.104-110 / 2016
Dense local image descriptors such as SIFT are fruitful for capturing salient information about an image and have been shown to be successful in various image-related tasks when formed into a bag-of-words representation (i.e., histograms). In this paper, we consider using these dense local descriptors in the object tracking problem. A notable aspect of our tracker is that instead of adopting a point estimate for the target model, we account for uncertainty in data noise and model incompleteness by maintaining a distribution over plausible candidate models within the Bayesian framework. The target model is also updated adaptively by principled Bayesian posterior inference, which admits a closed form under our Dirichlet prior modeling. In empirical evaluations on several video datasets, the proposed method is shown to yield more accurate tracking than baseline histogram-based trackers with the same types of features, often being superior to the appearance-based (visual) trackers.
https://doi.org/10.5391/IJFIS.2016.16.2.104
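
The closed-form posterior mentioned in this abstract can be illustrated with the textbook Dirichlet-multinomial conjugacy: a Dirichlet prior over histogram bin probabilities, updated with observed bin counts, yields a Dirichlet posterior whose parameters are the prior parameters plus the counts. The sketch below assumes that standard setting; the tracker's actual model is not specified in the abstract, and the function names, the three-bin histogram, and the counts are hypothetical.

```python
def dirichlet_update(alpha, counts):
    """Posterior Dirichlet parameters after observing multinomial counts."""
    return [a + c for a, c in zip(alpha, counts)]


def posterior_mean(alpha):
    """Expected bin probabilities under a Dirichlet with parameters alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]


# Uniform prior over a 3-bin visual-word histogram (illustrative values).
prior = [1.0, 1.0, 1.0]
# Visual-word counts observed in the target region of one frame.
frame_counts = [6, 3, 1]

posterior = dirichlet_update(prior, frame_counts)
mean = posterior_mean(posterior)
```

Because the update is a simple addition, the same function can be applied frame after frame, with each posterior serving as the prior for the next frame. This is one way an adaptive target model could be maintained without any iterative inference.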

Domain Adaptation Image Classification Based on Multi-sparse Representation

Zhang, Xu;Wang, Xiaofeng;Du, Yue;Qin, Xiaoyan
- KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.5 / pp.2590-2606 / 2017
Generally, research on classical image classification algorithms assumes that training data and testing data are drawn from the same domain with the same distribution. Unfortunately, in practical applications, this assumption is rarely met. To address this problem, a domain adaptation image classification approach based on multi-sparse representation is proposed in this paper. Intermediate domains are hypothesized to exist between the source and target domains, and each intermediate subspace is modeled through online dictionary learning with target data updating. On the one hand, the reconstruction error of the target data is guaranteed; on the other, the transition from the source domain to the target domain is kept as smooth as possible. An augmented feature representation produced by invariant sparse codes across the source, intermediate, and target domain dictionaries is employed for cross-domain recognition. Experimental results verify the effectiveness of the proposed algorithm.
https://doi.org/10.3837/tiis.2017.05.016

