• Title/Abstract/Keyword: Semantic Classification


Arabic Stock News Sentiments Using the Bidirectional Encoder Representations from Transformers Model

  • Eman Alasmari;Mohamed Hamdy;Khaled H. Alyoubi;Fahd Saleh Alotaibi
    • International Journal of Computer Science & Network Security / Vol. 24 No. 2 / pp.113-123 / 2024
  • Stock market news sentiment analysis (SA) aims to identify the attitude of news published on official platforms toward companies' stocks; it supports sound investment decisions and analysts' evaluations. However, research on Arabic SA is limited compared with English SA because of the complexity of the Arabic language and its limited corpora. This paper develops a sentiment classification model that predicts the polarity of Arabic stock news in microblogs. It also aims to extract the reasons behind the polarity categorization, i.e., the main economic causes or aspects, based on semantic unity. To this end, the paper presents an Arabic SA approach based on a logistic regression model and the Bidirectional Encoder Representations from Transformers (BERT) model. The proposed model classifies articles as positive, negative, or neutral. It was trained on data collected from an official Saudi stock market article platform, which was then preprocessed and labeled. Moreover, the economic reasons for the articles, grouped by semantic unit into seven economic aspects, were investigated to highlight the polarity of the articles. The supervised BERT model obtained 88% article classification accuracy for SA, and the unsupervised mean Word2Vec encoder obtained 80% economic-aspect clustering accuracy. Predicting the polarity of Arabic stock market news and its economic reasons would provide valuable benefits to the stock SA field.
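
The abstract pairs BERT with a logistic regression model for three-way polarity classification, but no code accompanies it. The sketch below is a minimal, hypothetical illustration of one way such a setup can look in Python with Hugging Face transformers and scikit-learn; the Arabic checkpoint name, the mean-pooling choice, and the `articles`/`labels` inputs are assumptions, not the authors' implementation.

```python
# Hypothetical sketch: BERT sentence embeddings + logistic regression for
# three-way (positive/negative/neutral) Arabic news polarity. The checkpoint,
# data columns, and split are illustrative assumptions, not the paper's code.
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

MODEL_NAME = "aubmindlab/bert-base-arabertv2"  # assumed Arabic BERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
bert = AutoModel.from_pretrained(MODEL_NAME).eval()

def embed(texts, batch_size=16):
    """Mean-pool the last hidden states into one vector per article."""
    vecs = []
    with torch.no_grad():
        for i in range(0, len(texts), batch_size):
            batch = tokenizer(texts[i:i + batch_size], padding=True,
                              truncation=True, max_length=256, return_tensors="pt")
            out = bert(**batch).last_hidden_state          # (B, T, H)
            mask = batch["attention_mask"].unsqueeze(-1)   # (B, T, 1)
            vecs.append((out * mask).sum(1) / mask.sum(1)) # masked mean over tokens
    return torch.cat(vecs).numpy()

# articles: list of Arabic news strings; labels: 0=negative, 1=neutral, 2=positive
X_train, X_test, y_train, y_test = train_test_split(
    embed(articles), labels, test_size=0.2, random_state=42)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("polarity accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```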

딥러닝 기반 실내 디자인 인식 (Deep Learning-based Interior Design Recognition)

  • 이원규;박지훈;이종혁;정희철
    • 대한임베디드공학회논문지 / Vol. 19 No. 1 / pp.47-55 / 2024
  • We spend much of our time in indoor spaces, and those spaces have a huge impact on our lives. Interior design plays a significant role in making an indoor space attractive and functional, but it must account for many complex elements such as color, pattern, and material. With the increasing demand for interior design, there is a growing need for technologies that analyze these design elements accurately and efficiently. To address this need, this study proposes a deep learning-based design analysis system. The proposed system consists of a semantic segmentation model that classifies spatial components and an image classification model that classifies attributes such as color, pattern, and material for the segmented components. The semantic segmentation model was trained on a dataset of 30,000 indoor interior images collected for this research; at inference time it assigns each pixel of the input image to one of 34 categories. Experiments with various backbones were conducted to obtain the best performance on the collected interior dataset, and the final model achieved 89.05% accuracy and a mean intersection over union (mIoU) of 0.5768. For the classification part, a convolutional neural network (CNN), which has shown high performance in other image recognition tasks, was used. To improve the classification model, we propose an approach for handling data that is imbalanced and sensitive to light intensity, and with these methods we achieve satisfactory results in classifying interior design component attributes. In summary, this paper proposes an indoor space design analysis system that automatically analyzes and classifies the attributes of indoor images using deep learning models. Used as a core module in an A.I. interior recommendation service, the system can help users pursuing self-interior design complete their designs more easily and efficiently.
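
The two-stage pipeline described above (pixel-wise component segmentation followed by attribute classification of each segmented region) can be sketched roughly as follows. This is an illustrative outline only: the 34-class segmentation head, the masking step, the ResNet-18 attribute head, and the number of material classes are assumptions, and the models here are untrained placeholders rather than the authors' networks.

```python
# Hypothetical two-stage sketch: a semantic segmentation model labels each
# pixel with a spatial component, then a CNN classifies an attribute (e.g.,
# material) for each segmented component. All class counts are illustrative.
import torch
import torch.nn.functional as F
import torchvision
from torchvision.models.segmentation import deeplabv3_resnet50

NUM_COMPONENTS = 34          # spatial component categories (per the abstract)
NUM_MATERIALS = 10           # assumed number of material classes

seg_model = deeplabv3_resnet50(weights=None, num_classes=NUM_COMPONENTS).eval()
attr_model = torchvision.models.resnet18(weights=None, num_classes=NUM_MATERIALS).eval()

def analyze(image):          # image: float tensor (3, H, W) in [0, 1]
    with torch.no_grad():
        logits = seg_model(image.unsqueeze(0))["out"][0]       # (34, H, W)
        component_map = logits.argmax(0)                       # per-pixel component id
        results = {}
        for cid in component_map.unique().tolist():
            mask = (component_map == cid).float()
            masked = image * mask                               # keep only this component
            resized = F.interpolate(masked.unsqueeze(0), size=(224, 224), mode="bilinear")
            results[cid] = attr_model(resized).argmax(1).item() # predicted material class
    return component_map, results
```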

ISO 15926 국제 표준을 이용한 원자력 플랜트 기자재 분류체계 (Development of the ISO 15926-based Classification Structure for Nuclear Plant Equipment)

  • 윤진현;문두환;한순흥;조광종
    • 한국CDE학회논문집 / Vol. 12 No. 3 / pp.191-199 / 2007
  • In order to construct a data warehouse of process plant equipment, a classification structure should first be defined, identifying not only the equipment categories but also the attributes of each piece of equipment that represent its specifications. ISO 15926, Process Plants, is an international standard dealing with the life-cycle data of process plant facilities. From the viewpoint of defining a classification structure, the Part 2 data model and the Reference Data Library (RDL) of ISO 15926 provide, respectively, a standard syntactic structure and a semantic vocabulary, facilitating the exchange and sharing of plant equipment life-cycle data. Therefore, an equipment data warehouse with an ISO 15926-based classification structure has the advantage of easy integration among different engineering systems. This paper introduces ISO 15926 and then discusses how to define a classification structure with the ISO 15926 Part 2 data model and RDL. Finally, we describe the development of an ISO 15926-based classification structure for the equipment comprising the reactor coolant system (RCS) of the APR 1400 nuclear plant.
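
To make the idea of an ISO 15926-anchored classification structure concrete, the fragment below sketches, purely as an illustration in Python, how each equipment category and attribute could carry a reference to an RDL class so that different engineering systems share one vocabulary. The class names, attribute names, and RDL identifiers are placeholders invented for the example, not entries from the actual RDL or the paper.

```python
# Illustrative sketch (not the paper's implementation): a classification
# structure whose categories and attributes point to ISO 15926 RDL classes.
from dataclasses import dataclass, field

@dataclass
class RdlReference:
    label: str          # human-readable RDL class name (made up below)
    rdl_id: str         # RDL identifier (placeholder values, not real IDs)

@dataclass
class EquipmentClass:
    name: str
    rdl: RdlReference
    attributes: dict[str, RdlReference] = field(default_factory=dict)
    children: list["EquipmentClass"] = field(default_factory=list)

# A fragment of an RCS-style hierarchy; names and IDs are illustrative only.
pump = EquipmentClass(
    name="Reactor Coolant Pump",
    rdl=RdlReference("CENTRIFUGAL PUMP", "RDS-xxxx"),
    attributes={
        "design pressure": RdlReference("DESIGN PRESSURE", "RDS-yyyy"),
        "rated flow": RdlReference("VOLUME FLOW RATE", "RDS-zzzz"),
    },
)
rcs = EquipmentClass("Reactor Coolant System", RdlReference("SYSTEM", "RDS-wwww"),
                     children=[pump])
```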

의무 기록 문서 분류를 위한 자연어 처리에서 최적의 벡터화 방법에 대한 비교 분석 (Comparative Analysis of Vectorization Techniques in Electronic Medical Records Classification)

  • 유성림
    • 대한의용생체공학회:의공학회지 / Vol. 43 No. 2 / pp.109-115 / 2022
  • Purpose: Medical record classification using vectorization techniques plays an important role in natural language processing. The purpose of this study was to investigate suitable vectorization techniques for electronic medical record classification. Materials and methods: 403 electronic medical documents were extracted retrospectively and classified using the cosine similarity calculated by Scikit-learn (a Python module for machine learning) in Jupyter Notebook. Vectors for the medical documents were produced by three different vectorization techniques (TF-IDF, latent semantic analysis, and Word2Vec), and the classification precision of each technique was evaluated. The Kruskal-Wallis test was used to determine whether there was a significant difference among the three techniques. Results: The 403 medical documents covered 41 different diseases, and the average number of documents per diagnosis was 9.83 (standard deviation = 3.46). The classification precisions of the three vectorization techniques were 0.78 (TF-IDF), 0.87 (LSA), and 0.79 (Word2Vec), and the difference among them was statistically significant. Conclusions: The results suggest that removing irrelevant information (LSA) is a more efficient vectorization technique than modifying the weights of vectorization models (TF-IDF, Word2Vec) for medical document classification.
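
Since the abstract names scikit-learn, cosine similarity, and the three vectorizers explicitly, a compact sketch of the comparison is easy to give. The snippet below is a hedged reconstruction in which `docs` and `labels` are assumed inputs and a nearest-neighbor-by-cosine rule stands in for the paper's classification procedure, which the abstract does not detail; the Word2Vec variant would follow the same pattern with document embeddings in place of the TF-IDF matrix.

```python
# Hedged sketch: documents vectorized with TF-IDF and with LSA (TruncatedSVD
# over TF-IDF); each test document takes the diagnosis of its most
# cosine-similar training document. Data loading and split are assumptions.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split

# docs: list of medical record texts; labels: their diagnoses (assumed inputs)
train_docs, test_docs, y_train, y_test = train_test_split(
    docs, labels, test_size=0.3, random_state=0)

def nearest_neighbor_accuracy(X_train, X_test):
    sims = cosine_similarity(X_test, X_train)            # (n_test, n_train)
    pred = np.array(y_train)[sims.argmax(axis=1)]        # label of most similar doc
    return (pred == np.array(y_test)).mean()

tfidf = TfidfVectorizer()
Xtr_tfidf = tfidf.fit_transform(train_docs)
Xte_tfidf = tfidf.transform(test_docs)
print("TF-IDF:", nearest_neighbor_accuracy(Xtr_tfidf, Xte_tfidf))

lsa = TruncatedSVD(n_components=100, random_state=0)     # LSA = SVD over TF-IDF
Xtr_lsa = lsa.fit_transform(Xtr_tfidf)
Xte_lsa = lsa.transform(Xte_tfidf)
print("LSA:", nearest_neighbor_accuracy(Xtr_lsa, Xte_lsa))
```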

의미 수준이 다른 비즈니스 프로세스의 검색 방법 (A methodology for discovering business processes in different semantic levels)

  • 최영환;채희권;김광수
    • 한국경영과학회:학술대회논문집 / 한국경영과학회/대한산업공학회 2003년도 춘계공동학술대회 / pp.1128-1135 / 2003
  • e-Transformation of an enterprise requires that business processes collaborate in a way suited to the business participants' purposes. To realize this collaboration, business processes should be implemented as components so that system developers can reuse them for their specific purposes. The first step toward this collaboration is the discovery of the exact components for the business processes. The dilemma, however, is that there are thousands or even millions of business processes, which vary from one enterprise to another. Moreover, business processes can be decomposed into multiple levels of semantics and classified into several process areas. In general, discovering the right business processes requires an understanding of widely adopted classification schemes such as CBPC, OAGIS, or SCOR. To cope with this obstacle, business process metadata should be defined and managed independently of any specific classification scheme to support effective discovery and reuse of business process components. In this paper, a methodology for discovering business process components published at different semantic levels is proposed. The proposed methodology represents the metadata of business process components as topic maps stored in a registry and utilizes the powerful features of topic maps for process discovery. TM4J, an open-source topic map engine, is modified to support concept matching and navigation. With the implemented tool, application system developers can discover and publish business process components effectively.
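
The methodology stores process-component metadata as topic maps and navigates them for discovery (the authors modify TM4J, a Java topic map engine). Purely as an illustration of the registry idea, the Python sketch below models components as topics with a semantic level, scheme codes, synonyms, and part-of links, and discovers matches by navigating the decomposition; all names, codes, and levels are made up for the example, and this is not the TM4J API.

```python
# Illustrative analogue of the topic-map registry: each business process
# component is a "topic" carrying names, a semantic level, scheme-specific
# codes, and part-of associations; discovery matches concepts across levels.
from dataclasses import dataclass, field

@dataclass
class ProcessTopic:
    name: str
    level: int                                                    # semantic level (1 = coarsest)
    scheme_codes: dict[str, str] = field(default_factory=dict)    # e.g. {"SCOR": "D"}
    synonyms: set[str] = field(default_factory=set)
    parts: list["ProcessTopic"] = field(default_factory=list)     # decomposition links

    def matches(self, concept: str) -> bool:
        concept = concept.lower()
        return concept in self.name.lower() or concept in {s.lower() for s in self.synonyms}

def discover(root: ProcessTopic, concept: str, max_level: int = 99):
    """Navigate the decomposition and return all topics matching a concept."""
    hits = [root] if root.level <= max_level and root.matches(concept) else []
    for part in root.parts:
        hits.extend(discover(part, concept, max_level))
    return hits

# Tiny illustrative registry; names and codes are invented for the example.
deliver = ProcessTopic("Deliver Products", 1, {"SCOR": "D"}, {"order fulfillment"})
deliver.parts.append(ProcessTopic("Schedule Shipment", 2, {"SCOR": "D1.x"}, {"shipping"}))
print([t.name for t in discover(deliver, "shipping")])   # -> ['Schedule Shipment']
```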


귀금속.보석 상품정보 온톨로지 구축에 관한 연구 (A Study on the Development of Ontology based on the Jewelry Brand Information)

  • 이기영
    • 한국컴퓨터정보학회논문지 / Vol. 13 No. 7 / pp.247-256 / 2008
  • To address the limitations of e-commerce systems that retrieve products by simple keyword matching over web documents, this study automatically generates a domain ontology and combines it with intelligent-agent technology to develop a product search system with a unified means of communication. For ontology development, representative terms are extracted from the UNSPSC international product classification code and the classification information of jewelry sites, and a similarity thesaurus is applied to build a standardized ontology; intelligent-agent technology is then incorporated into the search stage to design and implement a Semantic Web-enabled commerce system that helps users gather information efficiently. In addition, user profiles are designed to support a personalized search environment, and a search environment based on a personalized search agent and inference functions is provided to enable fast information gathering and accurate retrieval.
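
The core technical step in the abstract is normalizing category terms harvested from jewelry shopping sites onto representative ontology terms (anchored to UNSPSC-style classes) through a similarity thesaurus. The tiny Python sketch below illustrates only that mapping step; every term, code, and thesaurus entry in it is an invented example, not content from UNSPSC or the paper.

```python
# Hedged sketch of the term-normalization step: site category labels are
# mapped to representative terms via a thesaurus, then to ontology nodes.
THESAURUS = {                      # variant term -> representative term (invented)
    "wedding band": "ring",
    "engagement ring": "ring",
    "bangle": "bracelet",
    "pendant": "necklace",
}

ONTOLOGY = {                       # representative term -> UNSPSC-like class (invented)
    "ring": {"code": "54100000-x", "parent": "jewelry"},
    "bracelet": {"code": "54100000-y", "parent": "jewelry"},
    "necklace": {"code": "54100000-z", "parent": "jewelry"},
}

def normalize(term: str) -> str:
    """Collapse site-specific category names onto a representative term."""
    key = term.lower().strip()
    return THESAURUS.get(key, key)

def classify(site_category: str):
    rep = normalize(site_category)
    return rep, ONTOLOGY.get(rep)           # ontology node used by the search agent

print(classify("Engagement Ring"))          # ('ring', {'code': '54100000-x', ...})
```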


스포츠 장르 분석을 위한 스포츠 뉴스 비디오의 의미적 장면 분류 (Semantic Scenes Classification of Sports News Video for Sports Genre Analysis)

  • 송미영
    • 한국멀티미디어학회논문지 / Vol. 10 No. 5 / pp.559-568 / 2007
  • Anchor scene detection plays an important role in the semantic parsing and indexing of video scenes in content-based news video indexing and retrieval systems. This paper proposes an efficient algorithm that identifies anchor segments in news footage in order to structure sports news into units. To detect anchor scenes, candidate anchor scenes are first determined from MPEG-4 compressed video using DCT coefficient values and motion directionality information. Image processing techniques are then applied to the detected candidates to classify the news video into anchor scenes and non-anchor (sports) scenes. In anchor scene detection experiments, the proposed method achieved an average precision and recall of 98%.
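
The detection logic is two-step: candidate anchor shots are selected from compressed-domain cues (DCT coefficients and motion direction), then confirmed with an image-processing check. The sketch below illustrates only that decision structure on per-shot features assumed to have been extracted beforehand; the feature names, the face-based confirmation cue, and all thresholds are illustrative assumptions rather than the paper's actual features or values.

```python
# Hedged sketch of the two-step anchor/non-anchor decision on assumed
# per-shot features; thresholds and feature definitions are illustrative.
from dataclasses import dataclass

@dataclass
class ShotFeatures:
    dc_diff: float          # avg. DCT DC-coefficient difference between frames
    motion_entropy: float   # spread of motion-vector directions within the shot
    face_ratio: float       # fraction of frames with one centered face (image step)

def is_anchor_candidate(s: ShotFeatures,
                        dc_thresh: float = 5.0,
                        motion_thresh: float = 0.8) -> bool:
    # Anchor shots are nearly static: small DC changes, little directed motion.
    return s.dc_diff < dc_thresh and s.motion_entropy < motion_thresh

def classify_shot(s: ShotFeatures, face_thresh: float = 0.6) -> str:
    # Step 2: confirm candidates with an image-processing cue (here, a face check).
    if is_anchor_candidate(s) and s.face_ratio > face_thresh:
        return "anchor"
    return "sports"

print(classify_shot(ShotFeatures(dc_diff=2.1, motion_entropy=0.3, face_ratio=0.9)))
```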


한국어 학습자 대상 관형격 조사 '의'의 교육 내용 재고: 학습자 말뭉치에 나타난 오류를 바탕으로 (A Study to Rethink the Components of Teaching Korean Genitive Particle '의': Based on the Errors in Korean Learners' Corpus)

  • 이수현;심지영
    • 한국산업융합학회 논문집 / Vol. 26 No. 3 / pp.443-454 / 2023
  • The purpose of this study is to reveal Korean learners' usage patterns of the genitive particle '의' according to its semantic classification, so that the results can inform the contents and methods of related instruction. The study adopts a quantitative analysis of the learner corpus built by the National Institute of Korean Language. The analysis shows that as proficiency increases, the overall frequency of '의' increases and the number of senses used increases; however, the frequency of errors also increases. Regarding the usage pattern of each sense, the meaning of 'ownership, belonging' is the most frequent, followed by 'acting entity', 'kinship, social relations', and 'relationship (area)'. In conclusion, the meanings of 'acting entity' and 'relationship (area)' need to be supplemented with explicit instruction; the other meanings require further discussion, and decisions should be made in consideration of learning purpose and proficiency.
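
The analysis itself is a frequency tally of '의' uses by proficiency level, sense, and error status over an annotated learner corpus. As a rough illustration of that kind of tabulation (the NIKL corpus is not actually distributed in this record format, and the sample rows are invented), one could count as follows:

```python
# Hedged sketch of the quantitative tally: count '의' uses per proficiency
# level and per sense, plus error rates per sense. Records are invented.
from collections import Counter

# Each record: (proficiency_level, sense_label, is_error) — illustrative data.
records = [
    ("beginner", "ownership/belonging", False),
    ("advanced", "acting entity", True),
    ("advanced", "kinship/social relations", False),
]

freq_by_level = Counter(level for level, _, _ in records)
freq_by_sense = Counter(sense for _, sense, _ in records)
error_by_sense = Counter(sense for _, sense, err in records if err)

for sense, n in freq_by_sense.most_common():
    err = error_by_sense.get(sense, 0)
    print(f"{sense}: {n} uses, {err} errors ({err / n:.0%} error rate)")
```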

구문 의미 이해 기반의 VOC 요약 및 분류 (VOC Summarization and Classification based on Sentence Understanding)

  • 김문종;이재안;한규열;안영민
    • 정보과학회 컴퓨팅의 실제 논문지 / Vol. 22 No. 1 / pp.50-55 / 2016
  • Voice of Customer (VOC) data is an important source for understanding customers' opinions about and demands on a company's products or services. However, because VOC data is conversational, it contains many fragmented and redundant passages and covers a wide range of topics, which makes its type difficult to identify. In this paper, keywords, parts of speech, and morphemes that carry important meaning in a document are selected as linguistic resources, and lexico-semantic patterns (LSPs) are defined on top of them to capture sentence structure and meaning; key sentences identified through this sentence-level semantic understanding are extracted as a summary. In generating the summary, methods are proposed for connecting fragmented sentences and reducing sentences with redundant meanings. In addition, lexico-semantic patterns are defined for each category, and documents are classified according to the categories of the key sentences matched by those patterns. In experiments, documents were classified and summaries generated from VOC data and compared with existing methods.
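
The lexico-semantic patterns drive both the summary (sentences matching a pattern are kept) and the classification (the document takes the category of its matched key sentences). The Python sketch below imitates that flow with simple regular expressions standing in for LSPs over keywords; the categories, patterns, and sample dialogue are invented for illustration, and the real LSPs also involve parts of speech and morphemes.

```python
# Hedged sketch of LSP-style summarization and classification: sentences
# matching a category pattern become the summary (deduplicated), and the
# document is labeled with the category that matched most key sentences.
import re

LSP = {  # category -> regex patterns standing in for lexico-semantic patterns
    "billing complaint": [r"(charge|bill).*(wrong|twice|refund)"],
    "service request":   [r"(please|want to).*(cancel|change|upgrade)"],
}

def summarize_and_classify(sentences):
    summary, votes = [], {}
    for sent in sentences:
        for category, patterns in LSP.items():
            if any(re.search(p, sent.lower()) for p in patterns):
                summary.append(sent)                      # key sentence -> summary
                votes[category] = votes.get(category, 0) + 1
                break
    label = max(votes, key=votes.get) if votes else "other"
    return " ".join(dict.fromkeys(summary)), label        # dedupe repeated sentences

doc = ["I was charged twice this month and want a refund.",
       "Also I want to change my plan.",
       "I was charged twice this month and want a refund."]
print(summarize_and_classify(doc))
```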

STW를 이용한 웹 문서 장르 분류에 관한 연구 (A Research for Web Documents Genre Classification using STW)

  • 고병규;오군석;김판구
    • 정보화연구 / Vol. 9 No. 4 / pp.413-422 / 2012
  • As the number of web documents continues to grow, research based on text features, PageRank, and similar methods has increased, and studies that exploit URL information and HTML tag information within web documents are again drawing attention. Building on these features, this paper describes a study that applies a Semantic Term Weight (STW) to web document genre classification. The dataset consists of training documents and test documents, and classification experiments are performed with an SVM. For training, about 1,000 documents from the 20-Genre-collection corpus were selected and learned with the SVM algorithm, and the KI-04 corpus was used as the test set. After testing, accuracy was measured separately for experiments with and without STW, and 1,212 test documents were classified on this basis. The experiments using STW showed about 10.2% higher accuracy than those without it.
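
The abstract does not give the STW formula, so the sketch below only illustrates the general idea of a semantic term weight: terms that also appear in structurally meaningful places on the page (URL, title, tags) get their TF-IDF weight boosted before an SVM is trained. The boost factor, the `train_texts`/`train_meta` inputs, and the feature extraction are assumptions made for the example, not the paper's formulation.

```python
# Hedged sketch of an STW-style weighting: boost TF-IDF columns for terms
# that also occur in a page's URL/title/tag terms, then train a linear SVM.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

def stw_boost(vectorizer, X, docs_meta, boost=2.0):
    """Multiply columns for terms that also occur in a page's URL or title."""
    X = X.tolil()
    vocab = vectorizer.vocabulary_
    for i, meta in enumerate(docs_meta):        # meta: set of lowercased URL/title/tag terms
        for term in meta:
            j = vocab.get(term)
            if j is not None:
                X[i, j] *= boost
    return X.tocsr()

# train_texts/test_texts: page body text; train_meta/test_meta: per-page term sets
vec = TfidfVectorizer()
Xtr = stw_boost(vec, vec.fit_transform(train_texts), train_meta)
Xte = stw_boost(vec, vec.transform(test_texts), test_meta)

clf = LinearSVC().fit(Xtr, y_train)
print("genre accuracy with STW:", accuracy_score(y_test, clf.predict(Xte)))
```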