• Title/Summary/Keyword: Library Classification Research

Search Result 205, Processing Time 0.025 seconds

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.69-92
    • /
    • 2015
  • The explosion of social media data has led to apply text-mining techniques to analyze big social media data in a more rigorous manner. Even if social media text analysis algorithms were improved, previous approaches to social media text analysis have some limitations. In the field of sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common approach. Some studies have been conducted by adding grammatical factors to feature sets for training classification model. The other approach adopts the semantic analysis method to sentiment analysis, but this approach is mainly applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm which is an extension of the neural network algorithms to deal with more extensive semantic features that were underestimated in existing sentiment analysis. The result from adopting the Word2Vec algorithm is compared to the result from co-occurrence analysis to identify the difference between two approaches. The results show that the distribution related word extracted by Word2Vec algorithm in that the words represent some emotion about the keyword used are three times more than extracted by co-occurrence analysis. The reason of the difference between two results comes from Word2Vec's semantic features vectorization. Therefore, it is possible to say that Word2Vec algorithm is able to catch the hidden related words which have not been found in traditional analysis. In addition, Part Of Speech (POS) tagging for Korean is used to detect adjective as "emotional word" in Korean. In addition, the emotion words extracted from the text are converted into word vector by the Word2Vec algorithm to find related words. Among these related words, noun words are selected because each word of them would have causal relationship with "emotional word" in the sentence. The process of extracting these trigger factor of emotional word is named "Emotion Trigger" in this study. As a case study, the datasets used in the study are collected by searching using three keywords: professor, prosecutor, and doctor in that these keywords contain rich public emotion and opinion. Advanced data collecting was conducted to select secondary keywords for data gathering. The secondary keywords for each keyword used to gather the data to be used in actual analysis are followed: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin hae-chul sky hospital, drinking and plastic surgery, rebate) Prosecutor (lewd behavior, sponsor). The size of the text data is about to 100,000(Professor: 25720, Doctor: 35110, Prosecutor: 43225) and the data are gathered from news, blog, and twitter to reflect various level of public emotion into text data analysis. As a visualization method, Gephi (http://gephi.github.io) was used and every program used in text processing and analysis are java coding. The contributions of this study are as follows: First, different approaches for sentiment analysis are integrated to overcome the limitations of existing approaches. Secondly, finding Emotion Trigger can detect the hidden connections to public emotion which existing method cannot detect. Finally, the approach used in this study could be generalized regardless of types of text data. The limitation of this study is that it is hard to say the word extracted by Emotion Trigger processing has significantly causal relationship with emotional word in a sentence. The future study will be conducted to clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing with the relationships manually tagged. Furthermore, the text data used in Emotion Trigger are twitter, so the data have a number of distinct features which we did not deal with in this study. These features will be considered in further study.

EST Profiling for Seed-hair Characteristic and Development of EST-SSR and SNP Markers in Carrot (당근 종모 형질 관련 EST profiling과 이를 이용한 EST-SSR 및 SNP 마커 개발)

  • Oh, Gyu-Dong;Hwang, Eun-Mi;Shim, Eun-Jo;Jeon, Sang-Jin;Park, Young-Doo
    • Horticultural Science & Technology
    • /
    • v.28 no.6
    • /
    • pp.1025-1038
    • /
    • 2010
  • Carrot ($Daucus$ $carota$ L. var. $sativa$) is one of the most widely used crops in the world. Moreover it is an important crop because of its high content of ${\beta}$-carotene, well-known as the precursor of vitamin A carotenoid. However, seed-hair which is generated in epidermal cell of seeds inhibits absorption and germination. For that reason, carrot seeds are commercialized after mechanical hair removal process. To overcome such cumbersome weaknesses, new breeding program for developing hairless-seed carrot cultivar has been needed. Therefore, in this study, cDNA libraries from seeds of short-hair seed phenotype CT-ATR615 OP 666-13line and hairy seed CT-ATR615 OP-CK1-9 line were constructed and expression patterns related to generation of seed-hair were analyzed by comparison of EST sequences. Differential EST sequence results between two lines were classified into FunCat functional categories based on the results of BlastX search. Higher expression quantities belonging to metabolic category were shown on short-hair seed line than hairy-seed one. Differential expression quantities between those two lines in the protein folding and stabilization, subcellular localization categories were supposed to contribute variously on the generation of seed-hair. We confirmed 50 and 59 SSR sites, and 2 SNP sites by analyzing EST sequences in two lines; thereafter, we designed SNP and SSR primer sets from these EST sequence information as a molecular marker. These markers are thought to be used in research of molecular markers for classification of carrot family and related to various traits, as well as seed-hair characteristic.

A Systematic Review on the Present Condition of the Internal Robot Therapy (국내 로봇치료 연구 현황에 대한 체계적 고찰)

  • Song, Ji-Hyeon;Sim, Eun-Ji;Yom, Ji-Yun;Oh, Min-Kyeong;Yi, Hu-Shin;Yoo, Doo-Han
    • The Journal of Korean society of community based occupational therapy
    • /
    • v.6 no.1
    • /
    • pp.49-60
    • /
    • 2016
  • Objective : By organizing systematically the study case that use Robot Therapy as intervention tool according to PICO (Patient, Intervention, Comparison, Outcome), This study aims to investigate the domestic Robot Therapy's present condition. Methods : We searched 710 pieces of domestic scientific journal and master's thesis during the past nine years in 'Research Information Sharing Service' and 'National Digital Science Library' database using the keyword 'Robot therapy'. We finally chose 15 pieces of domestic scientific journal and master's thesis among the domestic studies that based on the full text which is affordable and used robot by therapeutic intervention tool. Chosen studies were layed out by PICO that could organize the resources systematically. Results : The quality of study tool was used to the method of evidence-based study level of 5 step classification. More than three stages of quality level study was 13. Result of dividing the studies using robot therapy by intervention field, language, lower extremity(gait), cognition, development and study for the region of the upper extremity of five is advancing. Conclusion : Nationally, the robot therapy has been used in various area that include the upper extremity and lower extremity's intervention of language, cognition, growth and others. We hope that this study for baseline data will be utilized in various area engaging to domestic robot therapy.

The Effect of the Quality of Pre-Assigned Subject Categories on the Text Categorization Performance (학습문헌집합에 기 부여된 범주의 정확성과 문헌 범주화 성능)

  • Shim, Kyung;Chung, Young-Mee
    • Journal of the Korean Society for information Management
    • /
    • v.23 no.2
    • /
    • pp.265-285
    • /
    • 2006
  • In text categorization a certain level of correctness of labels assigned to training documents is assumed without solid knowledge on that of real-world collections. Our research attempts to explore the quality of pre-assigned subject categories in a real-world collection, and to identify the relationship between the quality of category assignment in training set and text categorization performance. Particularly, we are interested in to what extent the performance can be improved by enhancing the quality (i.e., correctness) of category assignment in training documents. A collection of 1,150 abstracts in computer science is re-classified by an expert group, and divided into 907 training documents and 227 test documents (15 duplicates are removed). The performances of before and after re-classification groups, called Initial set and Recat-1/Recat-2 sets respectively, are compared using a kNN classifier. The average correctness of subject categories in the Initial set is 16%, and the categorization performance with the Initial set shows 17% in $F_1$ value. On the other hand, the Recat-1 set scores $F_1$ value of 61%, which is 3.6 times higher than that of the Initial set.

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems
    • /
    • v.21 no.1
    • /
    • pp.103-122
    • /
    • 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful for them. For this reason, some on-line documents are accompanied by a list of keywords specified by the authors in an effort to guide the users by facilitating the filtering process. In this way, a set of keywords is often considered a condensed version of the whole document and therefore plays an important role for document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask the authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could not benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, the implementation itself is the obstacle; manually assigning keywords to all documents is a daunting task, or even impractical in that it is extremely tedious and time-consuming requiring a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are mainly two approaches to achieving this aim: keyword assignment approach and keyword extraction approach. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given set of vocabulary, and the aim is to match them to the texts. In other words, the keywords assignment approach seeks to select the words from a controlled vocabulary that best describes a document. Although this approach is domain dependent and is not easy to transfer and expand, it can generate implicit keywords that do not appear in a document. On the other hand, in the latter approach, the aim is to extract keywords with respect to their relevance in the text without prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted based on supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document into positive or negative examples. Several systems such as Extractor and Kea were developed using keyword extraction approach. Most indicative words in a document are selected as keywords for that document and as a result, keywords extraction is limited to terms that appear in the document. Therefore, keywords extraction cannot generate implicit keywords that are not included in a document. According to the experiment results of Turney, about 64% to 90% of keywords assigned by the authors can be found in the full text of an article. Inversely, it also means that 10% to 36% of the keywords assigned by the authors do not appear in the article, which cannot be generated through keyword extraction algorithms. Our preliminary experiment result also shows that 37% of keywords assigned by the authors are not included in the full text. This is the reason why we have decided to adopt the keyword assignment approach. In this paper, we propose a new approach for automatic keyword assignment namely IVSM(Inverse Vector Space Model). The model is based on a vector space model. which is a conventional information retrieval model that represents documents and queries by vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on the term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating keywords that have high similarity scores. Two keyword generation systems were implemented applying IVSM: IVSM system for Web-based community service and stand-alone IVSM system. Firstly, the IVSM system is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers, and, indeed, it has been tested through a number of academic papers including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precisions of IVSM applied to Web-based community service and academic journals were 0.75 and 0.71, respectively. The performance of both systems is much better than that of baseline systems that generate keywords based on simple probability. Also, IVSM shows comparable performance to Extractor that is a representative system of keyword extraction approach developed by Turney. As electronic documents increase, we expect that IVSM proposed in this paper can be applied to many electronic documents in Web-based community and digital library.