• Title/Summary/Keyword: 텍스트기반 분류

Search Result 354, Processing Time 0.029 seconds

A study on Similarity analysis of National R&D Programs using R&D Project's technical classification (R&D과제의 기술분류를 이용한 사업간 유사도 분석 기법에 관한 연구)

  • Kim, Ju-Ho;Kim, Young-Ja;Kim, Jong-Bae
    • Journal of Digital Contents Society
    • /
    • v.13 no.3
    • /
    • pp.317-324
    • /
    • 2012
  • Recently, coordination task of similarity between national R&D programs is emphasized on view from the R&D investment efficiency. But the previous similarity search method like text-based similarity search which using keyword of R&D projects has reached the limit due to deviation of document's quality. For the solve the limitations of text-based similarity search using the keyword extraction, in this study, utilization of R&D project's technical classification will be discussed as a new similarity search method when analyzed of similarity between national R&D programs. To this end, extracts the Science and Technology Standard Classification of R & D projects which are collected when national R&D Survey & analysis, and creates peculiar vector model of each R&D programs. Verify a reliability of this study by calculate the cosine-based and Euclidean distance-based similarity and compare with calculated the text-based similarity.

A Study on Educational Data Mining for Public Data Portal through Topic Modeling Method with Latent Dirichlet Allocation (LDA기반 토픽모델링을 활용한 공공데이터 기반의 교육용 데이터마이닝 연구)

  • Seungki Shin
    • Journal of The Korean Association of Information Education
    • /
    • v.26 no.5
    • /
    • pp.439-448
    • /
    • 2022
  • This study aims to search for education-related datasets provided by public data portals and examine what data types are constructed through classification using topic modeling methods. Regarding the data of the public data portal, 3,072 cases of file data in the education field were collected based on the classification system. Text mining analysis was performed using the LDA-based topic modeling method with stopword processing and data pre-processing for each dataset. Program information and student-supporting notifications were usually provided in the pre-classified dataset for education from the data portal. On the other hand, the characteristics of educational programs and supporting information for the disabled, parents, the elderly, and children through the perspective of lifelong education were generally indicated in the dataset collected by searching for education. The results of data analysis through this study show that providing sufficient educational information through the public data portal would be better to help the students' data science-based decision-making and problem-solving skills.

Hypermedia, Multimedia and Hypertext: Definitions and Overview (하이퍼미디어.멀티미디어.하이퍼텍스트: 정의(定義)와 개관(槪觀))

  • Kim, Ji-Hee
    • Journal of Information Management
    • /
    • v.25 no.1
    • /
    • pp.24-46
    • /
    • 1994
  • In this paper I will discuss definitions of hypermedia, multimedia and hypertext. Hypertext is the grouping of relevant information in the form of nodes. These nodes are then connected together through links. In the case of hypertext the nodes contain text or graphics. Multimedia is the combining of different media types for example sound, animation, text, graphics and video for the presentation of information by making use of computers. Hypermedia can be viewed as an extension of hypertext and multimedia. It is based on the concept of hypertext that uses nodes and links in the structuring of information in the system. In this case the nodes consist of an the different data types that are mentioned in the multimedia definition above. The 'node-and-link' concept is used in organisation of the information in hypermedia systems. The 'book' metaphor is an example of the way these systems are implemented. This concept is explained and a few advantages and disadvantages of making use of hypermedia systems are discussed. A new approach for the development of hypermedia systems, namely the knowledge-based approach is now looked into. Joel Peing-Ling Loo proposed this approach because he thought that it is the most effective way for handling this kind of technology. A semantic-based hypermedia model is developed in this approach to formulate solutions for the restrictions in presenting information authoring, maintenance and retrieval. The knowledge-based presentation of information includes the use of conventional data structures. These data structures make use of frames(objects), slots and the inheritance theory that is also used in expert systems. Relations develop between the different objects as these objects are included in the database. Relations can also exist between frames by means of attributes that belong to the frames.

  • PDF

The Construction of a Domain-Specific Sentiment Dictionary Using Graph-based Semi-supervised Learning Method (그래프 기반 준지도 학습 방법을 이용한 특정분야 감성사전 구축)

  • Kim, Jung-Ho;Oh, Yean-Ju;Chae, Soo-Hoan
    • Science of Emotion and Sensibility
    • /
    • v.18 no.1
    • /
    • pp.103-110
    • /
    • 2015
  • Sentiment lexicon is an essential element for expressing sentiment on a text or recognizing sentiment from a text. We propose a graph-based semi-supervised learning method to construct a sentiment dictionary as sentiment lexicon set. In particular, we focus on the construction of domain-specific sentiment dictionary. The proposed method makes up a graph according to lexicons and proximity among lexicons, and sentiments of some lexicons which already know their sentiment values are propagated throughout all of the lexicons on the graph. There are two typical types of the sentiment lexicon, sentiment words and sentiment phrase, and we construct a sentiment dictionary by creating each graph of them and infer sentiment of all sentiment lexicons. In order to verify our proposed method, we constructed a sentiment dictionary specific to the movie domain, and conducted sentiment classification experiments with it. As a result, it have been shown that the classification performance using the sentiment dictionary is better than the other using typical general-purpose sentiment dictionary.

A Suggestion of the Direction of Construction Disaster Document Management through Text Data Classification Model based on Deep Learning (딥러닝 기반 분류 모델의 성능 분석을 통한 건설 재해사례 텍스트 데이터의 효율적 관리방향 제안)

  • Kim, Hayoung;Jang, YeEun;Kang, HyunBin;Son, JeongWook;Yi, June-Seong
    • Korean Journal of Construction Engineering and Management
    • /
    • v.22 no.5
    • /
    • pp.73-85
    • /
    • 2021
  • This study proposes an efficient management direction for Korean construction accident cases through a deep learning-based text data classification model. A deep learning model was developed, which categorizes five categories of construction accidents: fall, electric shock, flying object, collapse, and narrowness, which are representative accident types of KOSHA. After initial model tests, the classification accuracy of fall disasters was relatively high, while other types were classified as fall disasters. Through these results, it was analyzed that 1) specific accident-causing behavior, 2) similar sentence structure, and 3) complex accidents corresponding to multiple types affect the results. Two accuracy improvement experiments were then conducted: 1) reclassification, 2) elimination. As a result, the classification performance improved with 185.7% when eliminating complex accidents. Through this, the multicollinearity of complex accidents, including the contents of multiple accident types, was resolved. In conclusion, this study suggests the necessity to independently manage complex accidents while preparing a system to describe the situation of future accidents in detail.

A Classification and Selection Method of Emotion Based on Classifying Emotion Terms by Users (사용자의 정서 단어 분류에 기반한 정서 분류와 선택 방법)

  • Rhee, Shin-Young;Ham, Jun-Seok;Ko, Il-Ju
    • Science of Emotion and Sensibility
    • /
    • v.15 no.1
    • /
    • pp.97-104
    • /
    • 2012
  • Recently, a big text data has been produced by users, an opinion mining to analyze information and opinion about users is becoming a hot issue. Of the opinion mining, especially a sentiment analysis is a study for analysing emotions such as a positive, negative, happiness, sadness, and so on analysing personal opinions or emotions for commercial products, social issues and opinions of politician. To analyze the sentiment analysis, previous studies used a mapping method setting up a distribution of emotions using two dimensions composed of a valence and arousal. But previous studies set up a distribution of emotions arbitrarily. In order to solve the problem, we composed a distribution of 12 emotions through carrying out a survey using Korean emotion words list. Also, certain emotional states on two dimension overlapping multiple emotions, we proposed a selection method with Roulette wheel method using a selection probability. The proposed method shows to classify a text into emotion extracting emotion terms from a text.

  • PDF

BERT & Hierarchical Graph Convolution Neural Network based Emotion Analysis Model (BERT 및 계층 그래프 컨볼루션 신경망 기반 감성분석 모델)

  • Zhang, Junjun;Shin, Jongho;An, Suvin;Park, Taeyoung;Noh, Giseop
    • Proceedings of the Korean Institute of Information and Commucation Sciences Conference
    • /
    • 2022.10a
    • /
    • pp.34-36
    • /
    • 2022
  • In the existing text sentiment analysis models, the entire text is usually directly modeled as a whole, and the hierarchical relationship between text contents is less considered. However, in the practice of sentiment analysis, many texts are mixed with multiple emotions. If the semantic modeling of the whole is directly performed, it may increase the difficulty of the sentiment analysis model to judge the sentiment, making the model difficult to apply to the classification of mixed-sentiment sentences. Therefore, this paper proposes a sentiment analysis model BHGCN that considers the text hierarchy. In this model, the output of hidden states of each layer of BERT is used as a node, and a directed connection is made between the upper and lower layers to construct a graph network with a semantic hierarchy. The model not only pays attention to layer-by-layer semantics, but also pays attention to hierarchical relationships. Suitable for handling mixed sentiment classification tasks. The comparative experimental results show that the BHGCN model exhibits obvious competitive advantages.

  • PDF

Methods of Korean Text Data Quality Assessment (한국어 텍스트 데이터의 품질 평가 요소 및 방법)

  • Kim, Jung-Wook;Hong, Cho-hee;Lee, Saebyeok
    • Annual Conference on Human and Language Technology
    • /
    • 2018.10a
    • /
    • pp.619-622
    • /
    • 2018
  • 최근 데이터의 형태는 점점 다양화되고 증가하고 있기 때문에 데이터의 체계적 분류 및 관리의 필요성이 증대되고 있다. 이러한 목적을 위하여 데이터에 대한 품질 평가는 중요한 요소가 된다. 최근 데이터는 기존의 정형화된 데이터보다 비정형 데이터가 대부분을 차지하고 있다. 그러나 기존의 데이터 품질 평가는 정형 데이터에 편중되어 왔다. 따라서 다양한 형태와 의미를 가지고 있는 비정형 데이터는 기존의 평가 기술로는 품질을 측정하기 어렵다. 이와 같은 문제로 본 논문은 텍스트기반의 비정형 데이터에 적용 가능한 영역별 평가 지표를 구축하고, 신문기사와 커뮤니티(질의응답)데이터를 사용하여 각 요소별 품질을 측정하여 그 결과에 대해서 고찰하였다.

  • PDF

An Extended Hypertext Data Model based on Object-Oriented Praradigm (객체지향 개념을 기반으로한 하이퍼텍스트 데이터 모델)

  • 이재무;임해철
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.19 no.9
    • /
    • pp.1680-1691
    • /
    • 1994
  • We propose an extended hypertext data model based on object oriented paradigm that can easily the real world and semantics. We use the BNF notation to formalize the model. In our model, We introduce conceptional navigation by associating semantics on links and drive intelligent navigation using weights on links to alleviate user disorientation problem which is currently somewhat vague. We functionally classify the hypertext node into three types:Indexing node, Content node, Extract node and likewise classify the link into Alink type, Rlink type, Slink type. We believe that the typed node and typed link approach accommodate efficient query/search in hypertext.

  • PDF

피싱 웹사이트 URL의 수준별 특징 모델링을 위한 컨볼루션 신경망과 게이트 순환신경망의 퓨전 신경망

  • Bu, Seok-Jun;Kim, Hae-Jung
    • Review of KIISC
    • /
    • v.29 no.3
    • /
    • pp.29-36
    • /
    • 2019
  • 폭발적으로 성장하는 소셜 미디어 서비스로 인해 개인간의 연결이 강화된 환경에서는 URL로써 전파되는 피싱 공격의 위험성이 크게 강조된다. 최근 텍스트 분류 및 모델링 분야에서 그 성능을 입증받은 딥러닝 알고리즘은 피싱 URL의 구문적, 의미적 특징을 각각 모델링하기에 적절하지만, 기존에 사용하는 규칙 기반 앙상블 방법으로는 문자와 단어로부터 추출되는 특징간의 비선형적인 관계를 효과적으로 융합하는데 한계가 있다. 본 논문에서는 피싱 URL의 구문적, 의미적 특징을 체계적으로 융합하기 위한 컨볼루션 신경망 기반의 퓨전 신경망을 제안하고 기계학습 방법 중 최고의 분류정확도 (0.9804)를 달성하였다. 학습 및 테스트 데이터셋으로 45,000건의 정상 URL과 15,000건의 피싱 URL을 수집하였고, 정량적 검증으로 10겹 교차검증과 ROC커브, 정성적 검증으로 오분류 케이스와 딥러닝 내부 파라미터를 시각화하여 분석하였다.