Search | Korea Science

Semantic Similarity Measures Between Words within a Document using WordNet (워드넷을 이용한 문서내에서 단어 사이의 의미적 유사도 측정)

Kang, SeokHoon;Park, JongMin
- Journal of the Korea Academia-Industrial cooperation Society, v.16 no.11, pp.7718-7728, 2015
Semantic similarity between words can be applied in many fields, including computational linguistics, artificial intelligence, and information retrieval. In this paper, we present a weighted method for measuring the semantic similarity between words in a document. The method uses edge distance and depth in WordNet, and it calculates the semantic similarity between words on the basis of document information. The document information consists of word term frequencies (TF) and word concept frequencies (CF), from which a weight value is calculated for each word in the document. The method therefore combines the edge distance between words, the depth of their subsumer, and the word weights in the document. We compared our scheme experimentally with other methods, and the proposed method outperforms the other similarity measures. Other methods, based only on the shortest distance or on depth, had difficulty representing this information or merging it. This paper considers the shortest distance, the depth, and the information of words in the document together, and thereby improves performance.
https://doi.org/10.5762/KAIS.2015.16.11.7718
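The abstract does not give the exact formula, so the following Python sketch only illustrates the ingredients it names: the edge distance between two words, the depth of their lowest common subsumer, and a per-word document weight built from TF and CF. The toy taxonomy stands in for WordNet, and the exponential-decay/tanh combination, the illustrative parameters `alpha` and `beta`, and the TF/CF weight (with CF read as the share of tokens under the word's hypernym) are all assumptions rather than the authors' method.

```python
import math
from collections import Counter

# Hypothetical toy IS-A hierarchy standing in for WordNet: concept -> hypernym.
PARENT = {"entity": None, "animal": "entity", "vehicle": "entity",
          "dog": "animal", "cat": "animal", "car": "vehicle"}

def ancestors(c):
    """Path from concept c up to the root, with c first."""
    path = []
    while c is not None:
        path.append(c)
        c = PARENT[c]
    return path

def depth(c):
    return len(ancestors(c))  # the root has depth 1

def subsumer_and_distance(a, b):
    """Lowest common subsumer of a and b, and the number of IS-A edges between them."""
    pa, pb = ancestors(a), ancestors(b)
    pos_b = {c: i for i, c in enumerate(pb)}
    for i, c in enumerate(pa):
        if c in pos_b:
            return c, i + pos_b[c]
    raise ValueError("no common subsumer")

def structural_similarity(a, b, alpha=0.2, beta=0.6):
    """Decays with edge distance and grows with the depth of the subsumer (assumed form)."""
    lcs, dist = subsumer_and_distance(a, b)
    return math.exp(-alpha * dist) * math.tanh(beta * depth(lcs))

def word_weights(tokens):
    """Hypothetical TF/CF weight: TF = share of tokens equal to w,
    CF = share of tokens that fall under w's hypernym."""
    n = len(tokens)
    tf = Counter(tokens)
    weights = {}
    for w in tf:
        family = PARENT[w] or w
        cf = sum(1 for t in tokens if family in ancestors(t))
        weights[w] = 0.5 * tf[w] / n + 0.5 * cf / n
    return weights

def weighted_similarity(a, b, tokens):
    """Structural similarity scaled by the mean document weight of the two words."""
    w = word_weights(tokens)
    return structural_similarity(a, b) * (w[a] + w[b]) / 2
```

For a document with the tokens `["dog", "dog", "cat", "car"]`, `weighted_similarity("dog", "cat", tokens)` is larger than `weighted_similarity("dog", "car", tokens)`, because "dog" and "cat" share a deeper subsumer and both words carry more weight in that document.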

Text Document Classification Scheme using TF-IDF and Naïve Bayes Classifier (TF-IDF와 Naïve Bayes 분류기를 활용한 문서 분류 기법)

Yoo, Jong-Yeol;Hyun, Sang-Hyun;Yang, Dong-Min
- Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2015.10a, pp.242-245, 2015
Recently, with the large-scale spread of data in the digital economy, the era of big data has arrived. With big data, the leakage of unstructured text data, such as technical documents, confidential documents, and documents containing false information, has become a serious problem. To prevent this, the need for techniques to sort and process documents consisting of unstructured text data has increased. In this paper, we propose a novel text classification scheme that learns from training data sets and classifies unstructured text data into two categories, True and False. For the performance evaluation, we implement the proposed scheme using a Naïve Bayes document classifier and TF-IDF modules from a Python library, and compare it with an existing document classifier.
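A minimal, dependency-free sketch of the pipeline described above: TF-IDF weighting followed by a multinomial Naïve Bayes classifier over two labels. The abstract does not name the exact library modules, so this version reimplements the idea by hand with whitespace tokenization and a smoothed IDF, ln((1+n)/(1+df)) + 1 (the form scikit-learn uses by default). The class name `TfidfNaiveBayes` and the toy training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the training set."""
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(tokenize(d)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}

def tfidf(doc, idf):
    """Raw term counts scaled by IDF; unseen terms are dropped."""
    tf = Counter(tokenize(doc))
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

class TfidfNaiveBayes:
    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing

    def fit(self, docs, labels):
        self.idf = fit_idf(docs)
        self.vocab = set(self.idf)
        self.classes = sorted(set(labels))
        mass = {c: defaultdict(float) for c in self.classes}  # TF-IDF mass per class
        for d, y in zip(docs, labels):
            for t, w in tfidf(d, self.idf).items():
                mass[y][t] += w
        counts = Counter(labels)
        self.log_prior = {c: math.log(counts[c] / len(labels)) for c in self.classes}
        v = len(self.vocab)
        self.log_lik = {}
        for c in self.classes:
            total = sum(mass[c].values())
            self.log_lik[c] = {t: math.log((mass[c][t] + self.alpha) / (total + self.alpha * v))
                               for t in self.vocab}
        return self

    def predict(self, doc):
        x = tfidf(doc, self.idf)
        scores = {c: self.log_prior[c] + sum(w * self.log_lik[c][t] for t, w in x.items())
                  for c in self.classes}
        return max(scores, key=scores.get)
```

Feeding TF-IDF weights (rather than raw counts) into a multinomial Naïve Bayes model is a common practical choice; it lets rare, distinctive terms count more toward the class score.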

Document Classification using Recurrent Neural Network with Word Sense and Contexts (단어의 의미와 문맥을 고려한 순환신경망 기반의 문서 분류)

Joo, Jong-Min;Kim, Nam-Hun;Yang, Hyung-Jeong;Park, Hyuck-Ro
- KIPS Transactions on Software and Data Engineering, v.7 no.7, pp.259-266, 2018
In this paper, we propose a method to classify documents using a Recurrent Neural Network (RNN) with features extracted by considering word sense and context. Word2vec is adopted to represent the words in a document as vectors that reflect their order and meaning, and Doc2vec is applied to take context into account when extracting document features. An RNN classifier, which feeds the output of the previous node into the next node, is used for document classification; among neural network classifiers, it performs well on documents because it is suited to sequence data. We applied the GRU (Gated Recurrent Unit) model, which addresses the vanishing gradient problem of RNNs and also reduces the amount of computation. We used one Hangul (Korean) document set and two English document sets in the experiments, and the GRU-based document classifier improves performance by about 3.5% compared with a CNN-based document classifier.
https://doi.org/10.3745/KTSDE.2018.7.7.259
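The GRU recurrence mentioned above can be sketched as follows. This is one common GRU formulation (update gate `z`, reset gate `r`, candidate state `n`) run over a sequence of word vectors and followed by a softmax layer. In the paper the inputs come from Word2vec/Doc2vec; here they are arbitrary placeholder vectors, the weights are random and untrained, and `GRUClassifier` is a hypothetical name. The sketch only shows how the gates carry the hidden state from one step to the next.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def add(*vectors):
    return [sum(xs) for xs in zip(*vectors)]

def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class GRUClassifier:
    def __init__(self, input_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.h_dim = hidden_dim
        # Input weights W, recurrent weights U, biases b for each gate: z, r, n.
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        self.b = {g: [0.0] * hidden_dim for g in "zrn"}
        self.V = rand_matrix(num_classes, hidden_dim, rng)  # output layer

    def step(self, x, h):
        """One GRU step: h' = (1 - z) * h + z * n."""
        z = [sigmoid(v) for v in add(matvec(self.W["z"], x), matvec(self.U["z"], h), self.b["z"])]
        r = [sigmoid(v) for v in add(matvec(self.W["r"], x), matvec(self.U["r"], h), self.b["r"])]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate masks the previous state
        n = [math.tanh(v) for v in add(matvec(self.W["n"], x), matvec(self.U["n"], rh), self.b["n"])]
        return [(1 - zi) * hi + zi * ni for zi, hi, ni in zip(z, h, n)]

    def predict_proba(self, sequence):
        """Run the GRU over the sequence and apply softmax to the final state."""
        h = [0.0] * self.h_dim
        for x in sequence:
            h = self.step(x, h)
        logits = matvec(self.V, h)
        top = max(logits)
        exps = [math.exp(l - top) for l in logits]
        total = sum(exps)
        return [e / total for e in exps]
```

Because the update gate interpolates between the old state and the candidate, the hidden state can pass through many steps nearly unchanged, which is the property usually credited with easing the vanishing gradient problem.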

A Text Detection Method Using Wavelet Packet Analysis and Unsupervised Classifier

Lee, Geum-Boon;Odoyo Wilfred O.;Kim, Kuk-Se;Cho, Beom-Joon
- Journal of information and communication convergence engineering, v.4 no.4, pp.174-179, 2006
In this paper, we present a text detection method based on wavelet packet analysis and an improved fuzzy clustering algorithm (IAFC). The approach treats text and non-text regions as two different texture regions. Text detection is achieved by using wavelet packet analysis for feature analysis; wavelet packet analysis is a form of wavelet decomposition that offers a richer range of possibilities for document images. On these multiscale features, we apply the improved fuzzy clustering algorithm, which is based on an unsupervised learning rule. The results show that our text detection method is effective for document images scanned from newspapers and journals.
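A rough sketch of the two stages named in the abstract, under stated assumptions: Haar filters for the wavelet packet decomposition (the wavelet basis is not given in the abstract), mean subband energy as the per-block texture feature, and standard fuzzy c-means substituted for IAFC, which is not reproduced here. Blocks are assumed to be square with a side divisible by 2^levels.

```python
import random

def haar2d(b):
    """One level of an averaging 2-D Haar transform on an even-sized square block.
    Returns four half-size subbands: the approximation and three detail bands."""
    h = len(b) // 2
    bands = {k: [[0.0] * h for _ in range(h)] for k in ("LL", "LH", "HL", "HH")}
    for i in range(h):
        for j in range(h):
            a, c = b[2 * i][2 * j], b[2 * i][2 * j + 1]
            d, e = b[2 * i + 1][2 * j], b[2 * i + 1][2 * j + 1]
            bands["LL"][i][j] = (a + c + d + e) / 4  # local average
            bands["LH"][i][j] = (a + c - d - e) / 4  # change between rows
            bands["HL"][i][j] = (a - c + d - e) / 4  # change between columns
            bands["HH"][i][j] = (a - c - d + e) / 4  # diagonal change
    return bands

def packet_features(block, levels=2):
    """Wavelet packet: split every subband again (not only LL), then take the mean
    energy of each leaf. The all-lowpass leaf is dropped so plain brightness does
    not dominate the texture features."""
    nodes = [block]
    for _ in range(levels):
        nodes = [band for node in nodes for band in haar2d(node).values()]
    energies = [sum(v * v for row in n for v in row) / (len(n) ** 2) for n in nodes]
    return energies[1:]

def fuzzy_c_means(points, c=2, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means (not IAFC). Returns memberships u[i][k] and centers."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])
    centers = []
    for _ in range(iters):
        # Centers are membership-weighted means of the points.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            tw = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / tw for d in range(dim)])
        # Memberships depend on relative distances to all centers.
        for i in range(n):
            dist = [max(sum((points[i][d] - ck[d]) ** 2 for d in range(dim)) ** 0.5, 1e-12)
                    for ck in centers]
            for k in range(c):
                u[i][k] = 1.0 / sum((dist[k] / dist[j]) ** (2 / (m - 1)) for j in range(c))
    return u, centers
```

For example, flat 8x8 blocks and 8x8 blocks of alternating dark and light columns produce very different packet energies, so clustering their features with `c=2` separates the two kinds of blocks; the fuzzy memberships also indicate how confidently each block belongs to the text-like texture.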

A Study on the Systems Engineering Management Plan for the Railway Safety System (철도안전시스템에 적용한 시스템 엔지니어링 관리 계획 작성사례 연구)

Choi, Yo-Chul;Park, Young-Won;Cho, Yun-Ok
- Proceedings of the KSR Conference, 2005.05a, pp.64-69, 2005
The Systems Engineering Management Plan (SEMP) is the primary, top-level technical management document for integrating all engineering activities in the project planning phase. It defines the activities needed to plan, control, and perform overall engineering integration. To develop the SEMP for the Railway Safety System, several standards were reviewed and analyzed, and common requirements for SEMP preparation were derived from the results of the analysis. A practically usable SEMP example was then applied to the Railway Safety System. In particular, whereas SEMPs have so far been organized around controlling technical program management, this study derives detailed SEMP contents that emphasize project management and relates project management to technical engineering management. Finally, to manage the items and contents of the SEMP continuously, a database management and automatic document generation system built with a Computer-Aided Systems Engineering (CASE) tool is presented.

Texture-based PCA for Analyzing Document Image (텍스처 정보 기반의 PCA를 이용한 문서 영상의 분석)

Kim, Bo-Ram;Kim, Wook-Hyun
- Proceedings of the IEEK Conference, 2006.06a, pp.283-284, 2006
In this paper, we propose a novel segmentation and classification method for document images using texture features. First, we extract the local entropy and segment the document image into background and foreground using Otsu's method. Then, we classify the segmented regions into their components using a PCA (principal component analysis) algorithm based on texture features extracted from the co-occurrence matrix of the entropy image. The entropy-based segmentation is robust not only to noise and changes in lighting but also to skew and rotation. The texture features are not restricted to any particular form of document image and discriminate well between components. In addition, the PCA-based classifier classifies the components more robustly than a neural network.
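The first stage of the pipeline above (local entropy, then Otsu's threshold) can be sketched in plain Python as below. The window radius, the rescaling of entropy to 0-255 before thresholding, and the function names are assumptions; the co-occurrence texture features and the PCA classifier are not shown.

```python
import math
from collections import Counter

def local_entropy(img, radius=1):
    """Shannon entropy (bits) of the gray levels in a (2r+1)x(2r+1) window
    around each pixel; windows are clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            n = len(window)
            out[y][x] = -sum((c / n) * math.log2(c / n) for c in Counter(window).values())
    return out

def otsu_threshold(values, levels=256):
    """Otsu's method: the threshold t that maximizes between-class variance,
    where class 0 is values <= t and class 1 is values > t."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * c for i, c in enumerate(hist))
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2  # proportional to between-class variance
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def segment_by_entropy(img, radius=1):
    """Foreground mask: True where local entropy is above the Otsu threshold."""
    ent = local_entropy(img, radius)
    peak = max(max(row) for row in ent) or 1.0
    quantized = [[int(round(255 * v / peak)) for v in row] for row in ent]
    t = otsu_threshold([v for row in quantized for v in row])
    return [[v > t for v in row] for row in quantized]
```

Thresholding the entropy map rather than raw intensities is what makes this kind of segmentation less sensitive to uniform lighting changes: a constant shift in brightness leaves the local gray-level distribution, and therefore its entropy, unchanged.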

A Study on Web Document's Efficient Browsing

Kim, Dong-Hyun;Song, Seung-Heon;Kim, Eung-Kon
- Journal of information and communication convergence engineering, v.1 no.2, pp.88-92, 2003
Most documents consist of primary content and supporting material, such as footnotes, detailed explanations, and illustrations; in a web document, the supporting material is linked as hypertext. However, current web browsers display the content of a hypertext link in a new window. The user then leaves the primary material, may lose the overall context, and may have difficulty returning to the primary context once the interest fades. The fluid links technique solves these problems easily: when the mouse pointer is placed on a link, the related material is presented between the lines or in the margin while the context of the primary material is maintained. In this paper, we introduce various browsing techniques using fluid links, analyze their forms and features, and propose a way to implement them in Java.

Transformation of Text Contents of Engineering Documents into an XML Document by using a Technique of Document Structure Extraction (문서구조 추출기법을 이용한 엔지니어링 문서 텍스트 정보의 XML 변환)

Lee, Sang-Ho;Park, Junwon;Park, Sang Il;Kim, Bong-Geun
- KSCE Journal of Civil and Environmental Engineering Research, v.31 no.6D, pp.849-856, 2011
This paper proposes a method for transforming the unstructured text contents of engineering documents, which have a complex hierarchical structure of subtitles with various heading symbols, into a semi-structured XML document that follows the hierarchical subtitle structure. To extract the hierarchical structure from plain text, this study employed document structure extraction, a technique for analyzing the structure of a document. In addition, a method for processing enumerative text contents was developed to increase overall accuracy when extracting the subtitles and constructing the hierarchical subtitle structure. An application module was developed based on the proposed method, and its performance was evaluated with 40 test documents containing structural calculation records of bridges. The first group of 20 documents, which concern the superstructure of steel girder bridges and were used in a previous study, served to verify the improved performance of the proposed method. The test results show that the new module achieves higher accuracy and reliability than the results of the previous study. The remaining 20 test documents were used to evaluate the applicability of the method. The final mean accuracy exceeded 99%, with a standard deviation of 1.52. These results demonstrate that the proposed method can handle diverse heading symbols in various types of engineering documents and represent the hierarchical subtitle structure in a semi-structured XML document.
https://doi.org/10.12652/Ksce.2011.31.6D.849
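The core of document structure extraction, recognizing heading symbols and nesting the text under them, can be sketched with Python's `re` and `xml.etree.ElementTree` modules. The three heading patterns and the `section`/`p` element names are hypothetical examples, not the symbol set or XML schema used in the paper, and the authors' special handling of enumerative text is not reproduced: here a line such as "(1) Dead load" is always treated as a level-3 heading.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical heading symbols, from the outermost level inward.
HEADING_PATTERNS = [
    (1, re.compile(r"^(\d+)\.\s+(.*)$")),     # "1. General"
    (2, re.compile(r"^(\d+\.\d+)\s+(.*)$")),  # "1.1 Design loads"
    (3, re.compile(r"^\((\d+)\)\s+(.*)$")),   # "(1) Dead load"
]

def heading_level(line):
    """Return (level, number, title) if the line starts with a known heading symbol."""
    for level, pattern in HEADING_PATTERNS:
        m = pattern.match(line)
        if m:
            return level, m.group(1), m.group(2)
    return None

def to_xml(lines):
    """Nest body text under the most recent heading of a shallower level."""
    root = ET.Element("document")
    stack = [(0, root)]  # (level, element) for the currently open sections
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        heading = heading_level(line)
        if heading:
            level, number, title = heading
            while stack[-1][0] >= level:  # close sections at the same or deeper level
                stack.pop()
            section = ET.SubElement(stack[-1][1], "section", number=number, title=title)
            stack.append((level, section))
        else:
            ET.SubElement(stack[-1][1], "p").text = line
    return root
```

Calling `ET.tostring(to_xml(lines), encoding="unicode")` serializes the result; the stack of open sections is what turns a flat sequence of headings into the nested subtitle hierarchy.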

Study of XML document editing system that is creation for structural digital document (구조화된 전자문서 생성을 위한 사용자 중심의 XML 문서편집 시스템에 관한 연구)

차원준;황재각;이용준;정회경
- Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2003.05a, pp.277-280, 2003
XML was established by the W3C in February 1998 as a solution to the shortcomings of the early web, which used unstructured documents, in document processing, exchange, and reusability. Existing electronic transactions are shifting to electronic business between corporations through XML-based message exchange. As a result, the need has grown for a solution that can handle the structured, XML-based electronic transactions used between corporations. In this paper, we study an XML document editing system for handling such structured XML-based electronic transactions efficiently. The system integrates an XML Schema editor, for editing the XML Schema documents that define document structure, with a user-centered XML instance editor for writing XML documents efficiently.

Automatic Single Document Text Summarization Using Key Concepts in Documents

Sarkar, Kamal
- Journal of Information Processing Systems, v.9 no.4, pp.602-620, 2013
Many previous studies on extractive text summarization consider a subset of words in a document as keywords and use a sentence ranking function that ranks sentences by their similarity to the list of extracted keywords. However, the use of key concepts in automatic text summarization has received less attention in the summarization literature. The proposed work uses key concepts identified in a document to create a summary of that document. We view the single-word or multi-word keyphrases of a document as the important concepts that the document elaborates on. Our work is based on the hypothesis that an extract elaborates the important concepts to a permissible extent, controlled by the given summary length restriction. In other words, our method chooses the subset of sentences from a document that maximizes the coverage of important concepts in the final summary. To allow diverse information in the summary, for each important concept we select the one sentence that best elaborates it. Accordingly, the most important concept contributes to the summary first, then the second most important concept, and so on. To demonstrate the effectiveness of the proposed method, we compared it with several state-of-the-art summarization systems, and the results show that it outperforms the systems to which it was compared.
https://doi.org/10.3745/JIPS.2013.9.4.602
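The selection rule described above (one best sentence per key concept, taken in order of concept importance until the length limit is reached) can be sketched as below. Key concept extraction is assumed to have already produced the ranked list, and the "best elaboration" score used here, the number of key concepts a sentence mentions with ties going to the earlier sentence, is a hypothetical stand-in for the paper's criterion.

```python
def summarize(sentences, concepts, max_words):
    """Greedy concept-by-concept extraction.

    sentences: the document's sentences in original order.
    concepts:  keyphrases ranked from most to least important (assumed given).
    max_words: summary length limit in words.
    """
    chosen, used_words = [], 0
    for concept in concepts:
        # Sentences that mention this concept and have not been selected yet.
        candidates = [i for i, s in enumerate(sentences)
                      if concept.lower() in s.lower() and i not in chosen]
        if not candidates:
            continue
        # Hypothetical elaboration score: how many key concepts the sentence covers.
        best = max(candidates,
                   key=lambda i: (sum(c.lower() in sentences[i].lower() for c in concepts), -i))
        length = len(sentences[best].split())
        if used_words + length > max_words:
            break  # stop at the first sentence that would exceed the limit
        chosen.append(best)
        used_words += length
    return [sentences[i] for i in sorted(chosen)]
```

Selecting at most one sentence per concept is what spreads the summary across different concepts instead of letting the top concept claim every slot; the selected sentences are returned in document order so the extract reads naturally.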
