• Title/Summary/Keyword: Document-Classification

Search Result 448, Processing Time 0.035 seconds

Information Technology Application for Oral Document Analysis (구술문서 자료분석을 위한 정보검색기술의 응용)

  • Park, Soon-Cheol;Hahm, Han-Hee
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.13 no.2
    • /
    • pp.47-55
    • /
    • 2008
  • The purpose of this paper is to develop an analytical methodology of or릴 documents by the application of. Information Technologies. This system consists of the key word search, contents summary, clustering, classification & topic tracing of the contents. The integrated model of the five levels of retrieval technologies can be exhaustively used in the analysis of oral documents, which were collected as oral history of five men and women in the area of North Jeolla. Of the five methods topic tracing is the most pioneering accomplishment both home and abroad. In final this research will shed light on the methodological and theoretical studies of oral history and culture.

  • PDF

Patent Document Classification by Using Hierarchical Attention Network (계층적 주의 네트워크를 활용한 특허 문서 분류)

  • Jang, Hyuncheol;Han, Donghee;Ryu, Teaseon;Jang, Hyungkuk;Lim, HeuiSeok
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2018.05a
    • /
    • pp.369-372
    • /
    • 2018
  • 최근 지식경영에 있어 특허를 통한 지식재산권 확보는 기업 운영에 큰 영향을 주는 요소이다. 성공적인 특허 확보를 위해서, 먼저 변화하는 특허 분류 제계를 이해하고, 방대한 특허 정보 데이터를 빠르고 신속하게 특허 분류 체계에 따라 분류화 시킬 필요가 있다. 본 연구에서는 머신 러닝 기술 중에서도 계층적 주의 네트워크를 활용하여 특허 자료의 초록을 학습시켜 분류를 할 수 있는 방법을 제안한다. 그리고 본 연구에서는 제안된 계층적 주의 네트워크의 성능을 검증하기 위해 수정된 입력데이터와 다른 워드 임베딩을 활용하여 진행하였다. 이를 통하여 특허 문서 분류에 활용하려는 계층적 주의 네트워크의 성능과 특허 문서 분류 활용화 방안을 보여주고자 한다. 본 연구의 결과는 많은 기업 지식경영에서 실용적으로 활용할 수 있도록 지식경영 연구자, 기업의 관리자 및 실무자에게 유용한 특허분류기법에 관한 이론적 실무적 활용 방안을 제시한다.

Cross-Domain Text Sentiment Classification Method Based on the CNN-BiLSTM-TE Model

  • Zeng, Yuyang;Zhang, Ruirui;Yang, Liang;Song, Sujuan
    • Journal of Information Processing Systems
    • /
    • v.17 no.4
    • /
    • pp.818-833
    • /
    • 2021
  • To address the problems of low precision rate, insufficient feature extraction, and poor contextual ability in existing text sentiment analysis methods, a mixed model account of a CNN-BiLSTM-TE (convolutional neural network, bidirectional long short-term memory, and topic extraction) model was proposed. First, Chinese text data was converted into vectors through the method of transfer learning by Word2Vec. Second, local features were extracted by the CNN model. Then, contextual information was extracted by the BiLSTM neural network and the emotional tendency was obtained using softmax. Finally, topics were extracted by the term frequency-inverse document frequency and K-means. Compared with the CNN, BiLSTM, and gate recurrent unit (GRU) models, the CNN-BiLSTM-TE model's F1-score was higher than other models by 0.0147, 0.006, and 0.0052, respectively. Then compared with CNN-LSTM, LSTM-CNN, and BiLSTM-CNN models, the F1-score was higher by 0.0071, 0.0038, and 0.0049, respectively. Experimental results showed that the CNN-BiLSTM-TE model can effectively improve various indicators in application. Lastly, performed scalability verification through a takeaway dataset, which has great value in practical applications.

SMS Text Messages Filtering using Word Embedding and Deep Learning Techniques (워드 임베딩과 딥러닝 기법을 이용한 SMS 문자 메시지 필터링)

  • Lee, Hyun Young;Kang, Seung Shik
    • Smart Media Journal
    • /
    • v.7 no.4
    • /
    • pp.24-29
    • /
    • 2018
  • Text analysis technique for natural language processing in deep learning represents words in vector form through word embedding. In this paper, we propose a method of constructing a document vector and classifying it into spam and normal text message, using word embedding and deep learning method. Automatic spacing applied in the preprocessing process ensures that words with similar context are adjacently represented in vector space. Additionally, the intentional word formation errors with non-alphabetic or extraordinary characters are designed to avoid being blocked by spam message filter. Two embedding algorithms, CBOW and skip grams, are used to produce the sentence vector and the performance and the accuracy of deep learning based spam filter model are measured by comparing to those of SVM Light.

Research on Subjective-type Grading System Using Syntactic-Semantic Tree Comparator (구문의미트리 비교기를 이용한 주관식 문항 채점 시스템에 대한 연구)

  • Kang, WonSeog
    • The Journal of Korean Association of Computer Education
    • /
    • v.21 no.6
    • /
    • pp.83-92
    • /
    • 2018
  • The subjective question is appropriate for evaluation of deep thinking, but it is not easy to score. Since, regardless of same scoring criterion, the graders are able to produce different scores, we need the objective automatic evaluation system. However, the system has the problem of Korean analysis and comparison. This paper suggests the Korean syntactic analysis and subjective grading system using the syntactic-semantic tree comparator. This system is the hybrid grading system of word based and syntactic-semantic tree based grading. This system grades the answers on the subjective question using the syntactic-semantic comparator. This proposed system has the good result. This system will be utilized in Korean syntactic-semantic analysis, subjective question grading, and document classification.

Design and Implementation of Web Crawler utilizing Unstructured data

  • Tanvir, Ahmed Md.;Chung, Mokdong
    • Journal of Korea Multimedia Society
    • /
    • v.22 no.3
    • /
    • pp.374-385
    • /
    • 2019
  • A Web Crawler is a program, which is commonly used by search engines to find the new brainchild on the internet. The use of crawlers has made the web easier for users. In this paper, we have used unstructured data by structuralization to collect data from the web pages. Our system is able to choose the word near our keyword in more than one document using unstructured way. Neighbor data were collected on the keyword through word2vec. The system goal is filtered at the data acquisition level and for a large taxonomy. The main problem in text taxonomy is how to improve the classification accuracy. In order to improve the accuracy, we propose a new weighting method of TF-IDF. In this paper, we modified TF-algorithm to calculate the accuracy of unstructured data. Finally, our system proposes a competent web pages search crawling algorithm, which is derived from TF-IDF and RL Web search algorithm to enhance the searching efficiency of the relevant information. In this paper, an attempt has been made to research and examine the work nature of crawlers and crawling algorithms in search engines for efficient information retrieval.

An Efficient Machine Learning-based Text Summarization in the Malayalam Language

  • P Haroon, Rosna;Gafur M, Abdul;Nisha U, Barakkath
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.1778-1799
    • /
    • 2022
  • Automatic text summarization is a procedure that packs enormous content into a more limited book that incorporates significant data. Malayalam is one of the toughest languages utilized in certain areas of India, most normally in Kerala and in Lakshadweep. Natural language processing in the Malayalam language is relatively low due to the complexity of the language as well as the scarcity of available resources. In this paper, a way is proposed to deal with the text summarization process in Malayalam documents by training a model based on the Support Vector Machine classification algorithm. Different features of the text are taken into account for training the machine so that the system can output the most important data from the input text. The classifier can classify the most important, important, average, and least significant sentences into separate classes and based on this, the machine will be able to create a summary of the input document. The user can select a compression ratio so that the system will output that much fraction of the summary. The model performance is measured by using different genres of Malayalam documents as well as documents from the same domain. The model is evaluated by considering content evaluation measures precision, recall, F score, and relative utility. Obtained precision and recall value shows that the model is trustable and found to be more relevant compared to the other summarizers.

Comparative Study on the Weldability of Different Shipbuilding Steels

  • Laitinen, R.;Porter, D.;Dahmen, M.;Kaierle, S.;Poprawe, R.
    • International Journal of Korean Welding Society
    • /
    • v.2 no.2
    • /
    • pp.7-13
    • /
    • 2002
  • A comparison of the welding performance of ship hull structural steels has been made. The weldability of steels especially designed for laser processing was compared to that of conventional hull and structural steels with plate thicknesses up to 12 mm. Autogenous laser beam welding was used to weld butt joints as well as skid and stake welded T-joints. The welds were assessed in accordance with the document "The Classification Societies" Requirements for Approval of $CO_2$ Laser Welding Procedures" Small imperfections in the weld only grew slightly in root bend tests and they only had a minor influence on the fatigue properties of laser fillet welded joints. In Charpy impact tests, the 27 J transition temperature of the weld metal and HAZ ranged from below -60 to $-50^{\circ}C$. The amount of martensite in the weld metal depended on the carbon equivalent of the steel with the highest amounts and highest hardness levels in conventional EH 36 (389 HV 5). Thermomechanically rolled steels contained less martensite and showed a correspondingly lower maximum hardness.ximum hardness.

  • PDF

COMPARATIVE STUDY ON THE WELDABILITY OF DIFFERENT SHIPBUILDING STEELS

  • Laitinen, R.;Porter, D.;Dahmen, M.;Kaierle, S.;Poprawe, R.
    • Proceedings of the KWS Conference
    • /
    • 2002.10a
    • /
    • pp.222-228
    • /
    • 2002
  • A comparison of the welding performance of ship hull structural steels has been made. The weldability of steels especially designed for laser processing was compared to that of conventional hull and structural steels with plate thicknesses up to 12 mm. Autogenous laser beam welding was used to weld butt joints as well as skid and stake welded T-joints. The welds were assessed in accordance with the document "The Classification Societies′ Requirements for Approval of $CO_2$ Laser Welding Procedures". Small imperfections in the weld only grew slightly in root bend tests and they only had a minor influence on the fatigue properties of laser fillet welded joints. In Charpy impact tests, the 27 J transition temperature of the weld metal and HAZ ranged from below -60 to -5$0^{\circ}C$. The amount of martensite in the weld metal depended on the carbon equivalent of the steel with the highest amounts and highest hardness levels in conventional EH 36 (389 HV 5). Thermomechanically rolled steels contained less martensite and showed a correspondingly lower maximum hardness.

  • PDF

Convolutional Neural Network-based Malware Classification Method utilizing Local Feature-based Global Image (로컬 특징 기반 글로벌 이미지를 사용한 CNN 기반의 악성코드 분류 방법)

  • Jang, Sejun;Sung, Yunsick
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2020.05a
    • /
    • pp.222-223
    • /
    • 2020
  • 최근 악성코드로 인한 피해가 증가하고 있다. 악성코드는 악성코드가 속한 종류에 따라서 대응하는 방법도 다르기 때문에 악성코드를 종류별로 분류하는 연구도 중요하다. 기존에는 악성코드 시각화 과정을 통해서 생성된 악성코드의 글로벌 이미지를 사용해 악성코드를 각 종류별로 분류한다. 글로벌 이미지를 악성코드로부터 추출한 바이너리 정보를 사용해서 생성한다. 하지만, 글로벌 이미지만을 사용해서 악성코드를 각 종류별로 분류하는 경우 악성코드의 종류별로 중요한 특징을 고려하기 않기 때문에 분류 정확도가 떨어진다. 본 논문에서는 악성코드의 글로벌 이미지에 악성코드의 종류별 특징을 나타내기 위한 로컬 특징 기반 글로벌 이미지를 사용한 악성코드 분류 방법을 제안한다. 첫 번째, 악성 코드로부터 바이너리를 추출하고 추출된 바이너리를 사용해서 글로벌 이미지를 생성한다. 두 번째, 악성 코드로부터 로컬 특징을 추출하고 악성코드의 종류별 핵심 로컬 특징을 단어-역문서 빈도(Term Frequency Inverse Document Frequency, TFIDF) 알고리즘을 사용해 선택한다. 세 번째, 생성된 글로벌 이미지에 악성코드의 패밀리별 핵심 특징을 픽셀화해서 적용한다. 네 번째, 생성된 로컬 특징 기반 글로벌 이미지를 사용해서 컨볼루션 모델을 학습하고, 학습된 컨볼루션 모델을 사용해서 악성코드를 각 종류별로 분류한다.