• Title/Summary/Keyword: Web Document Analysis

Search Result 139

An efficient Decision-Making using the extended Fuzzy AHP Method(EFAM) (확장된 Fuzzy AHP를 이용한 효율적인 의사결정)

  • Ryu, Kyung-Hyun;Pi, Su-Young
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.19 no.6
    • /
    • pp.828-833
    • /
    • 2009
  • The WWW, a massive collection of documents, is a thesaurus of varied information for users; however, search engines spend a great deal of time retrieving the information a user needs and filtering out what is unnecessary. In this paper, we propose the EFAM (Extended Fuzzy AHP Method) model to manage Web resources efficiently and to support definite decision-making on problems in a specific domain. The EFAM model incorporates emotion analysis based on domain corpus information and is composed of systematic common concept grids built from the knowledge of multiple experts. The proposed model can therefore extract documents by considering emotion criteria in the semantic context of concepts extracted from the corpus of the specific domain. Experiments confirm that EFAM provides more efficient decision-making than conventional methods such as AHP and Fuzzy AHP, which describe decision problems as hierarchical structures of alternatives, evaluation criteria, subjective attribute weights, and fuzzy relations between concepts and objects.
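
The abstract contrasts EFAM with the conventional AHP baseline, in which criteria weights are derived from a pairwise comparison matrix. As a minimal sketch of that baseline (not the paper's EFAM method), the geometric-mean method below derives priority weights from a hypothetical 3-criteria comparison matrix:

```python
import math

def ahp_weights(M):
    """Priority weights from a reciprocal pairwise comparison matrix
    via the geometric-mean (logarithmic least squares) method."""
    gm = [math.prod(row) ** (1.0 / len(row)) for row in M]
    total = sum(gm)
    return [g / total for g in gm]

# Hypothetical reciprocal comparison matrix for 3 criteria (Saaty 1-9 scale)
M = [[1.0, 3.0, 5.0],
     [1 / 3, 1.0, 2.0],
     [1 / 5, 1 / 2, 1.0]]
w = ahp_weights(M)  # criterion 1 receives the largest weight
```

Fuzzy AHP variants replace the crisp entries with triangular fuzzy numbers, but the normalization step is analogous.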

Terminology Recognition System based on Machine Learning for Scientific Document Analysis (과학 기술 문헌 분석을 위한 기계학습 기반 범용 전문용어 인식 시스템)

  • Choi, Yun-Soo;Song, Sa-Kwang;Chun, Hong-Woo;Jeong, Chang-Hoo;Choi, Sung-Pil
    • The KIPS Transactions:PartD
    • /
    • v.18D no.5
    • /
    • pp.329-338
    • /
    • 2011
  • Terminology recognition, a prerequisite for text mining, information extraction, information retrieval, the semantic web, and question answering, has been studied intensively but only within limited domains, especially the bio-medical domain. Because previous approaches relied on domain-specific resources and thus could not be applied to general domains, we propose a domain-independent terminology recognition system based on machine learning that uses a dictionary, syntactic features, and Web search results. The proposed approach achieved an F-score of 80.8, a 6.5% improvement over C-value, a widely used related approach based on local domain frequencies. In a second experiment with various combinations of unithood features, the combination with NGD (Normalized Google Distance) performed best, with an F-score of 81.8. We applied three machine learning methods (logistic regression, C4.5, and SVMs) and obtained the best score from the decision-tree method, C4.5.
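
The NGD unithood feature mentioned above is computed from search-engine hit counts. A small sketch, with hypothetical page counts standing in for real Web search results:

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance: fx and fy are hit counts for each
    term alone, fxy the count for both together, n the index size."""
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Terms that always co-occur get distance 0; rare co-occurrence grows it.
related = ngd(1000, 1000, 1000, 10**9)
unrelated = ngd(1000, 1000, 10, 10**9)
```

A low NGD between the words of a candidate phrase suggests they form a unit, which is how such a score can serve as a unithood feature.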

Analysis on Update Performance of XML Data by the Labeling Method (Labeling 방식에 따른 XML 데이터의 갱신 성능 분석)

  • Jung Min-Ok;Nam Dong-Sun;Han Jung-Yeob;Park Jong-Hyen;Kang Ji-Hoon
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.07b
    • /
    • pp.106-108
    • /
    • 2005
  • XML has become the standard for data exchange on the Web, and most applications use a database to manage large volumes of XML documents efficiently. To do so, they create labels that encode the structural information of the XML data and store them with the document content. A number of labeling schemes have been designed so that the relationships between element nodes can be determined simply by comparing their labels. With the growing popularity of XML data on the Web, finding a labeling scheme that supports order-sensitive queries in the presence of dynamic updates has become urgent. Because the XML documents used in practice have properties that vary by application, this paper identifies the most efficient update method for each set of document properties by choosing representative labeling methods and applying those properties. The results, obtained on an XML data management system, can be used directly in practical applications and can also serve as a guide for selecting the most suitable method for a given application environment when developing a new dedicated XML database or adopting XML.
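
As an illustration of the labeling schemes the abstract refers to (a generic Dewey-style scheme, not necessarily the one the paper evaluates), each node's label extends its parent's label with a sibling position, so ancestry and document order become simple label comparisons:

```python
def dewey_labels(tree):
    """Assign Dewey-style labels: a node's label is its parent's label
    extended with the node's 1-based sibling position."""
    labels = {}
    def walk(node, label):
        name, children = node
        labels[name] = label
        for i, child in enumerate(children, 1):
            walk(child, label + (i,))
    walk(tree, (1,))
    return labels

def is_ancestor(a, b):
    """With Dewey labels, ancestry is a proper-prefix test."""
    return len(a) < len(b) and b[:len(a)] == a

# Hypothetical document <doc><title/><sec><p/></sec></doc> as (name, children)
doc = ("doc", [("title", []), ("sec", [("p", [])])])
labels = dewey_labels(doc)
```

Plain tuple comparison of labels yields document order, which is why such schemes support order-sensitive queries; the update cost under insertions is exactly what schemes differ on.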


Deep Learning Research Trends Analysis with Ego Centered Topic Citation Analysis (자아 중심 주제 인용분석을 활용한 딥러닝 연구동향 분석)

  • Lee, Jae Yun
    • Journal of the Korean Society for Information Management
    • /
    • v.34 no.4
    • /
    • pp.7-32
    • /
    • 2017
  • Recently, deep learning has spread rapidly as an innovative machine learning technique across various domains. This study explored research trends in deep learning via a modified ego-centered topic citation analysis. A few seed documents were selected from among the documents retrieved from Web of Science with the keyword 'deep learning', and related documents were obtained through citation relations. Papers citing the seed documents were set as ego documents, reflecting current research in the field of deep learning. Earlier studies cited frequently by the ego documents were set as citation identity documents, which represent specific themes within the field. For the ego documents, which are the product of current research activity, quantitative analyses including co-authorship network analysis were performed to identify major countries and research institutes. For the citation identity documents, co-citation analysis was conducted, and key literature and key research themes were identified by examining citation image keywords, the major keywords of the documents citing each citation identity document cluster. Finally, we proposed and measured a citation growth index, which reflects the growth trend of citation influence on a specific topic, and showed how the leading research themes in the field of deep learning have changed.
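
The co-citation analysis step can be sketched as pair counting over reference lists; the ego documents and identity labels below are hypothetical:

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count how often each pair of cited documents appears together
    in the reference list of the same citing (ego) document."""
    pairs = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs

# Hypothetical ego documents, each a list of cited identity documents
egos = [["A", "B", "C"], ["A", "B"], ["B", "D"]]
cc = cocitation_counts(egos)
```

Clustering the resulting pair-count matrix is what groups the citation identity documents into the thematic clusters the study analyzes.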

A Re-Ranking Retrieval Model based on Two-Level Similarity Relation Matrices (2단계 유사관계 행렬을 기반으로 한 순위 재조정 검색 모델)

  • 이기영;은희주;김용성
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.11
    • /
    • pp.1519-1533
    • /
    • 2004
  • When Web-based specialized retrieval systems for scientific fields severely restrict the expression of a user's information request, the analysis of information content and the acquisition of information become inconsistent. In this paper, we apply a fuzzy retrieval model that addresses the retrieval system's high time complexity by constructing a reduced term set weighted by each term's relative importance. Furthermore, we perform cluster retrieval that reflects the user's query exactly, through a similarity relation matrix satisfying the properties of a fuzzy compatibility relation. We demonstrate the performance of the proposed re-ranking model, which is based on the similarity union of the fuzzy retrieval model and the document cluster retrieval model.
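
A fuzzy compatibility relation is reflexive and symmetric but not necessarily transitive; the standard way to obtain a similarity relation from it, which may be what underlies the matrix above, is to iterate max-min composition to a fixpoint. A minimal sketch with a hypothetical term-compatibility matrix:

```python
def maxmin_compose(R, S):
    """Max-min composition of two fuzzy relations on the same set."""
    n = len(R)
    return [[max(min(R[i][k], S[k][j]) for k in range(n))
             for j in range(n)] for i in range(n)]

def similarity_closure(R):
    """Iterate max-min composition to a fixpoint, turning a reflexive,
    symmetric compatibility relation into a fuzzy similarity relation."""
    while True:
        R2 = maxmin_compose(R, R)
        if R2 == R:
            return R
        R = R2

# Hypothetical reflexive, symmetric term-compatibility matrix
C = [[1.0, 0.8, 0.0],
     [0.8, 1.0, 0.4],
     [0.0, 0.4, 1.0]]
S = similarity_closure(C)
```

Thresholding the closed relation at different alpha levels then partitions the terms (or documents) into the clusters used for cluster retrieval.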

Detection of Porno Sites on the Web using Fuzzy Inference (퍼지추론을 적용한 웹 음란문서 검출)

  • 김병만;최상필;노순억;김종완
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.11 no.5
    • /
    • pp.419-425
    • /
    • 2001
  • A method for detecting the many pornographic documents on the Internet is presented in this paper. The proposed method applies a fuzzy inference mechanism to conventional information retrieval techniques. First, users provide several example porno sites, and candidate words representative of pornographic documents are extracted from those documents; lexical analysis and stemming are performed in this step. Then, values such as the term frequency (TF), the document frequency (DF), and a Heuristic Information score (HI) are computed for each candidate word. Finally, fuzzy inference over these three values assigns each candidate word a weight, and the weights are used to determine whether a given site is sexual or not. Experiments on a small test collection showed the proposed method to be useful for detecting sexual sites automatically.
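
The shape of the fuzzy inference step can be sketched with one rule and min as the fuzzy AND; the membership function, the single rule, and the averaging decision rule below are illustrative assumptions, not the paper's actual rule base:

```python
def high(x, lo=0.2, hi=0.8):
    """Piecewise-linear membership in the fuzzy set 'high'."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def word_weight(tf, df, heuristic):
    """One fuzzy rule with min as AND: the weight is the degree to
    which TF, DF, and the heuristic score are all 'high'."""
    return min(high(tf), high(df), high(heuristic))

def looks_sexual(word_stats, threshold=0.5):
    """Hypothetical decision rule: flag a site when the mean
    candidate-word weight exceeds a threshold."""
    weights = [word_weight(*s) for s in word_stats]
    return sum(weights) / len(weights) >= threshold
```

Each `word_stats` entry holds the normalized (TF, DF, HI) triple for one candidate word extracted from the site.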


Analysis of Term Ambiguity based on Genetic Algorithm (유전자 알고리즘 기반 용어 중의성 분석)

  • Kim, Jeong-Joon;Chung, Sung-Taek;Park, Jeong-Min
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.17 no.5
    • /
    • pp.131-136
    • /
    • 2017
  • Recently, with the development of Internet media, the volume of documents on the Web has grown exponentially. These documents are classified according to the text that best describes them. Text, however, often admits ambiguous interpretations and must be examined from various angles to be interpreted correctly, whereas conventional classification methods rely only on the surface form of the text. In this paper, we analyze term ambiguity using a genetic algorithm and locality-preserving techniques, and implement a clustering system that partitions terms accordingly. Finally, the performance of the approach is evaluated by comparing the implementation results with traditional methods.

Accounting Education in the Era of Information and Technology : Suggestions for Adopting IT Related Curriculum (기술정보화(IT) 시대의 회계 교육 : IT교과와의 융합교육의 제안)

  • Yoon, Sora
    • Journal of Information Technology Services
    • /
    • v.20 no.2
    • /
    • pp.91-109
    • /
    • 2021
  • Recently, the social and economic environment has changed rapidly. In particular, the development of IT accelerated the introduction of databases, communication networks, and information processing and analysis systems, making the use of such information and communication technology an essential factor in corporate management innovation. This change has also affected accounting. The purpose of this study is to document the changes in accounting brought about by the adoption of IT, to define the accounting professionals required in this era, and to present efficient educational methodologies for training such accounting experts. An accounting expert suited to the era of technology and information has not only basic accounting knowledge, competence, independence, reliability, communication skills, and flexible interpersonal skills, but also IT skills, data utilization and analysis skills, and an understanding of big data, artificial intelligence, and blockchain-based accounting information systems. To educate future accounting experts, the accounting curriculum should be reorganized to strengthen IT capabilities and should provide a wide variety of learning opportunities. It is also important to provide education at a practical level through industry-academic cooperation. Distance learning, web-based learning, discussion-type classes, TBL, PBL, and flipped learning are suitable methodologies for fostering future accounting experts. This study is meaningful in that it can motivate reconsideration of the accounting educational system and curriculum to enhance IT capabilities.

Survey of Automatic Query Expansion for Arabic Text Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah
    • Journal of Information Science Theory and Practice
    • /
    • v.8 no.4
    • /
    • pp.67-86
    • /
    • 2020
  • Information need has been one of the main motivations for using a search engine, and queries can represent very different information needs. Ironically, a query can be a poor representation of the information need, because the user may find that need difficult to express. Query expansion (QE) is popularly used to address this limitation. While QE can be considered a language-independent technique, recent findings have shown that in certain cases language plays an important role. Arabic has a particularly large vocabulary, rich in words with synonymous shades of meaning, and high morphological complexity. This paper therefore provides a review of QE for Arabic information retrieval, with the intention of identifying the recent state of the art in this burgeoning area. We primarily discuss statistical QE approaches, including document analysis, search and browse log analyses, and web knowledge analyses, in addition to semantic QE approaches, which use semantic knowledge structures to extract meaningful word relationships. We conclude that QE for the Arabic language warrants additional investigation and research due to the intricate nature of the language.
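
Among the statistical QE approaches surveyed, document-analysis methods typically use pseudo-relevance feedback: terms frequent in the top-ranked documents are added to the query. A minimal language-independent sketch with made-up documents:

```python
from collections import Counter

def expand_query(query, top_docs, k=2):
    """Pseudo-relevance feedback: add the k most frequent terms from
    the top-ranked documents that are not already in the query."""
    counts = Counter(t for doc in top_docs for t in doc.split()
                     if t not in query)
    return query + [term for term, _ in counts.most_common(k)]

query = ["information", "retrieval"]
top_docs = ["information retrieval search engine",
            "query expansion search ranking",
            "search engine index"]
expanded = expand_query(query, top_docs)
```

For Arabic, the survey's point is that this step interacts with stemming and morphological analysis, since surface forms of the same root would otherwise be counted separately.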

Page Group Search Model : A New Internet Search Model for Illegal and Harmful Content (페이지 그룹 검색 그룹 모델 : 음란성 유해 정보 색출 시스템을 위한 인터넷 정보 검색 모델)

  • Yuk, Hyeon-Gyu;Yu, Byeong-Jeon;Park, Myeong-Sun
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.26 no.12
    • /
    • pp.1516-1528
    • /
    • 1999
  • Illegal and harmful content on the Internet, especially adult content on the World Wide Web, causes social problems in many countries, yet the only effective way to protect minors from it at present is filtering software, which blocks a user's access to harmful content based on a blocking list of harmful site addresses. Building a large blocking list requires a harmful content search system: a special-purpose Internet search system that automatically locates pornographic content on the Web. Because its consumer is the filtering software rather than a human user, such a system must, unlike general Internet search systems, maintain very high retrieval accuracy and generate search lists that the filtering software can manage easily. In this paper, we point out that existing Internet search models fail to satisfy these requirements because of a mistaken assumption about what a "document" is, propose a new definition of a document on the World Wide Web (a set of pages created for one subject), and, based on it, propose the Page Group Search Model, which satisfies the requirements. We present a Group Construction algorithm and a Group Evaluation algorithm, and show through various experiments and analyses that the proposed model achieves higher accuracy, faster search, and blocking lists that are easier to manage in the filtering software than existing Internet search models.
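
To make the "document = set of pages for one subject" idea concrete, a much-simplified stand-in for the Group Construction algorithm (the real algorithm is not specified in the abstract; grouping by host and leading path segment is purely an illustrative assumption):

```python
from collections import defaultdict
from urllib.parse import urlparse

def group_pages(urls, depth=1):
    """Group crawled pages into page groups by host plus the first
    `depth` path segments; each group is treated as one document."""
    groups = defaultdict(list)
    for u in urls:
        p = urlparse(u)
        key = (p.netloc,) + tuple(p.path.strip("/").split("/")[:depth])
        groups[key].append(u)
    return dict(groups)

urls = ["http://example.com/adult/a.html",
        "http://example.com/adult/b.html",
        "http://example.com/news/c.html"]
groups = group_pages(urls)
```

Scoring whole groups instead of single pages is what shrinks the blocking list and makes it easier for the filtering software to manage.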