• Title/Summary/Keyword: engineering documents


A Document Ranking Method by Document Clustering Using Bayesian SOM and Bootstrap (베이지안 SOM과 붓스트랩을 이용한 문서 군집화에 의한 문서 순위조정)

  • Choe, Jun-Hyeok;Jeon, Seong-Hae;Lee, Jeong-Hyeon
    • The Transactions of the Korea Information Processing Society / v.7 no.7 / pp.2108-2115 / 2000
  • Although conventional Boolean retrieval systems based on the vector space model return results quickly, they cannot accurately reflect the user's retrieval intent, including semantic information. Consequently, the retrieved results often differ greatly from what users expect, and users waste considerable time searching for the desired documents among them. In this paper, we design a Bayesian SOM (Self-Organizing feature Map) that combines Bayesian statistical methods with the Kohonen network, a kind of unsupervised learning, and use it to classify documents in real time according to their semantic similarity to the user query. If statistical characteristics are difficult to observe because fewer than 30 documents are available for clustering, the number of documents must be increased to at least 50. In addition, to give a high rank to the documents that are semantically most similar to the user query within the generalized clusters, we compute the similarity using the Kohonen centroid of each document class and adjust the secondary rank according to that similarity.
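
The small-sample rule in this abstract can be illustrated with bootstrap resampling, which the title names but the abstract does not describe in detail. The following is a minimal Python sketch, not the authors' implementation: the function name `bootstrap_documents`, its parameters, and the fixed seed are hypothetical, and only the thresholds 30 and 50 come from the abstract.

```python
import random


def bootstrap_documents(docs, min_observed=30, target=50, seed=0):
    """Return `docs` as-is when the sample is large enough; otherwise draw a
    bootstrap sample of `target` documents (sampling with replacement).

    Assumption: `min_observed` and `target` mirror the 30 and 50 quoted in
    the abstract; everything else here is illustrative.
    """
    if not docs:
        raise ValueError("cannot resample an empty document collection")
    if len(docs) >= min_observed:
        return list(docs)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    # Bootstrap: every draw is independent and may repeat a document.
    return rng.choices(list(docs), k=target)
```

A collection of 12 documents would come back as 50 resampled documents, while a collection of 40 would be returned unchanged.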


Topical Clustering Techniques of Twitter Documents Using Korean Wikipedia (한글 위키피디아를 이용한 트위터 문서의 주제별 클러스터링 기법)

  • Chang, Jae-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.5 / pp.189-196 / 2014
  • Recently, the need for document retrieval has been growing in SNS environments such as Twitter. To support Twitter search, a clustering technique is required that classifies the large volume of retrieved documents by topic. However, due to the nature of Twitter, previous simple techniques have limited applicability to clustering Twitter documents. To overcome this problem, we propose a new clustering technique suited to the Twitter environment. In the proposed method, we add new terms to the feature vectors representing the Twitter documents and recalculate the feature weights using Korean Wikipedia. We also performed experiments with Korean Twitter documents and demonstrated the usability of the proposed method through a performance comparison with previous techniques.

An XML Data Management System Using an Object-Relational Database

  • Nam, S.H.;Jung, T.S.;Kim, T.K.;Kim, K.R.;Zahng, H.K.;Yoo, J.S.;Cho, W.S.
    • Proceedings of the Korea Society for Industrial Systems Conference / 2007.02a / pp.163-167 / 2007
  • We propose an XML document storage system, called XDMS (XML Document Management System), that uses an object-relational DBMS. XDMS generates an object database schema from an XML Schema and stores the XML documents in an object-relational database. A SAX parser is used to analyze the structure of the XML documents, and XDMS transforms the documents into objects in the database. Experiments show that object-relational databases provide a more efficient storage and query model than relational databases.
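
The SAX-based step in this abstract, reading the structure of an XML document and turning elements into storable objects, might look roughly like the following Python sketch. It uses the standard `xml.sax` module; the `RecordHandler` class, the `xml_to_records` helper, and the `(path, attributes, text)` record layout are assumptions made for illustration and are not taken from XDMS.

```python
import xml.sax


class RecordHandler(xml.sax.ContentHandler):
    """Turn each XML element into a flat (path, attributes, text) record."""

    def __init__(self):
        super().__init__()
        self.records = []  # finished records, in element-close order
        self._stack = []   # open elements: [name, attributes, text parts]

    def startElement(self, name, attrs):
        self._stack.append([name, dict(attrs.items()), []])

    def characters(self, content):
        # Text may arrive in several chunks; collect it on the open element.
        if self._stack:
            self._stack[-1][2].append(content)

    def endElement(self, name):
        path = "/" + "/".join(entry[0] for entry in self._stack)
        _, attrs, parts = self._stack.pop()
        self.records.append((path, attrs, "".join(parts).strip()))


def xml_to_records(xml_text):
    """Parse an XML string and return one record per element."""
    handler = RecordHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.records
```

Each record could then be mapped to a row or object in the database; that mapping depends on the generated schema and is left out here.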


Enhancing Text Document Clustering Using Non-negative Matrix Factorization and WordNet

  • Kim, Chul-Won;Park, Sun
    • Journal of information and communication convergence engineering / v.11 no.4 / pp.241-246 / 2013
  • A classic document clustering technique may incorrectly classify documents into different clusters when documents that should belong to the same cluster do not have any shared terms. Recently, to overcome this problem, internal and external knowledge-based approaches have been used for text document clustering. However, the clustering results of these approaches are influenced by the inherent structure and the topical composition of the documents. Further, the organization of knowledge into an ontology is expensive. In this paper, we propose a new enhanced text document clustering method using non-negative matrix factorization (NMF) and WordNet. The semantic terms extracted as cluster labels by NMF can represent the inherent structure of a document cluster well. The proposed method can also improve the quality of document clustering that uses cluster labels and term weights based on term mutual information of WordNet. The experimental results demonstrate that the proposed method achieves better performance than the other text clustering methods.

XML-based EDI Document Processing System with Binary Format Mapping Rules

  • Kim, Chang-Su;Jung, Hoe-Kyung
    • Journal of information and communication convergence engineering / v.10 no.3 / pp.258-263 / 2012
  • Recently, the volume of electronic data interchange (EDI) document processing for port logistics has increased sharply. The existing system processes EDI documents in a script mode, but because of its complicated script preparation procedure and low document processing efficiency, it cannot keep up as document traffic increases. In this paper, an EDI document processing system was designed and implemented around a document scanner and a mapper, binary-format electronic document processing tools that do not require script files when converting extensible markup language (XML)-based electronic documents. Compared with conventional systems, the new system retains the merits of XML for reading and writing while offering improved speed, ease of use, and good portability across systems.

Access Control of XML Documents using Predictable Flags (예측성 플래그를 이용한 XML 문서의 접근통제 기법)

  • Son, Tae-Yong;Lee, Jong-Hak
    • Journal of Information Technology and Architecture / v.11 no.3 / pp.321-332 / 2014
  • In this paper, we propose a new type of authorization, the predictable flag, for controlling access to XML documents. By using predictable flags, we can efficiently detect conflicts between existing authorizations and new authorizations to be added. XML documents have a hierarchical element-composition structure in which a higher-level element consists of multiple lower-level sub-elements. Many XML document systems use the notion of implicit authorization, which grants an authorization to an element and all of its descendants, to avoid the overhead of explicitly storing every authorization for each element. When an authorization is granted on an element of an XML document, the implicit authorization method is inefficient at determining conflicts because it must examine all authorizations on the descendants of that element. In contrast, our mechanism using predictable flags can detect conflicts immediately at the element where an explicit authorization is to be granted.

BERT-based Classification Model for Korean Documents (한국어 기술문서 분석을 위한 BERT 기반의 분류모델)

  • Hwang, Sangheum;Kim, Dohyun
    • The Journal of Society for e-Business Studies / v.25 no.1 / pp.203-214 / 2020
  • Technical documents such as patents and R&D project reports need to be classified in order to understand trends in technology convergence, interdisciplinary joint research, technology development, and so on. Text mining techniques have mainly been used to classify these technical documents. However, classifying technical documents with text mining algorithms has the disadvantage that the features representing the documents must be extracted directly. In this study, we propose a BERT-based document classification model that automatically extracts document features from the text of national R&D projects and classifies the documents. We then verify the applicability and performance of the proposed model for document classification.

Electronic Approval System of XML-based Business Document using Crypto Algorithm (암호 알고리즘을 이용한 XML 기반 비즈니스문서의 전자 결재 시스템)

  • Kim, Chang-Su;Jung, Hoe-Kyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.10 no.11 / pp.1983-1988 / 2006
  • As industry shifts toward information, electronic commerce and business information systems are gradually being built to make effective, automated use of the Internet. Companies need to develop electronic approval systems because business documents are used in electronic commerce as well as in business information systems. Currently, electronic approval systems in groupware insert an image of the approval signature, which is vulnerable to security attacks such as fraudulent use of the electronic signature and eavesdropping on electronic documents. In this paper, we implement an XML form generator, based on a DTD that captures the structure of business documents, for creating valid XML business documents. We also design an electronic approval system based on secured XML that transfers encrypted documents. To secure the written XML business documents, the system uses a crypto algorithm with high transaction performance, based on the exchange of public keys between a server and a client.

Analysis of Massive Scholarly Keywords using Inverted-Index based Bottom-up Clustering (역인덱스 기반 상향식 군집화 기법을 이용한 대규모 학술 핵심어 분석)

  • Oh, Heung-Seon;Jung, Yuchul
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.11 / pp.758-764 / 2018
  • Digital documents such as patents, scholarly papers, and research reports have author keywords that summarize their topics. Different documents are likely to describe the same topic if they share the same keywords. Document clustering aims to group documents with similar topics using an unsupervised learning method. However, although document clustering is used in various data analyses, it is difficult to apply to a large number of documents because of its computational complexity. In this case, we can cluster and connect massive document collections efficiently using keywords. Existing bottom-up hierarchical clustering requires heavy computation and has high time complexity when clustering a large number of keywords. This paper proposes an inverted-index-based bottom-up clustering method for keywords and analyzes the clustering results on massive keywords extracted from scholarly papers and research reports.
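
One way to picture inverted-index-based bottom-up keyword clustering is the Python sketch below. It is an assumption-laden illustration rather than the paper's algorithm: the Jaccard overlap of posting lists, the single-link merging through union-find, the `threshold` parameter, and all function names are hypothetical choices. The point it shows is that the inverted index restricts comparisons to keyword pairs that actually co-occur in some document, so not all keyword pairs need to be compared.

```python
from collections import defaultdict
from itertools import combinations


def build_inverted_index(doc_keywords):
    """Map each keyword to the set of document ids that list it."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for keyword in keywords:
            index[keyword].add(doc_id)
    return index


def cluster_keywords(doc_keywords, threshold=0.5):
    """Merge keywords bottom-up when their posting lists overlap enough."""
    index = build_inverted_index(doc_keywords)
    parent = {keyword: keyword for keyword in index}

    def find(keyword):
        # Union-find lookup with path halving.
        while parent[keyword] != keyword:
            parent[keyword] = parent[parent[keyword]]
            keyword = parent[keyword]
        return keyword

    seen = set()
    for keywords in doc_keywords.values():
        # Only keywords sharing a document can have overlapping postings.
        for a, b in combinations(sorted(set(keywords)), 2):
            if (a, b) in seen:
                continue
            seen.add((a, b))
            shared = len(index[a] & index[b])
            total = len(index[a] | index[b])
            if shared / total >= threshold:  # Jaccard similarity (assumed)
                parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for keyword in index:
        clusters[find(keyword)].add(keyword)
    return sorted(sorted(members) for members in clusters.values())
```

Raising `threshold` yields more, smaller clusters; lowering it merges keywords that share only a few documents.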

Related Documents Classification System by Similarity between Documents (문서 유사도를 통한 관련 문서 분류 시스템 연구)

  • Jeong, Jisoo;Jee, Minkyu;Go, Myunghyun;Kim, Hakdong;Lim, Heonyeong;Lee, Yurim;Kim, Wonil
    • Journal of Broadcast Engineering / v.24 no.1 / pp.77-86 / 2019
  • This paper proposes using machine-learning technology to analyze previously collected documents and to classify related documents based on them. Data is collected using keywords associated with a specific domain, and non-conceptual elements such as special characters are removed. Each word of the collected documents is then tagged using a Korean-language morpheme analyzer (nouns, verbs, and sentences). The documents are embedded using a Doc2Vec model, which converts documents into vectors. The similarity between documents is measured with the embedding model, and a document classifier is trained using a machine learning algorithm. Among the classification models compared, the best-performing one, a support vector machine, achieved an F1-score of 0.83.
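
The similarity step in this abstract can be sketched in Python as follows, assuming cosine similarity over document vectors; the abstract does not name the similarity measure, so that choice is an assumption. The vectors would come from a Doc2Vec model in the paper's pipeline, but here they are plain lists of floats, and the names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention for zero vectors in this sketch
    return dot / (norm_u * norm_v)


def most_similar(query_vec, doc_vecs, top_k=3):
    """Return the `top_k` (doc_id, score) pairs closest to `query_vec`."""
    scored = [
        (cosine_similarity(query_vec, vec), doc_id)
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:top_k]]
```

The resulting scores could serve as input features or as a retrieval step before a classifier such as a support vector machine is trained.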