• Title/Summary/Keyword: Document-Classification


Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.43-61 / 2019
  • Artificial intelligence technologies have been developing rapidly alongside the Fourth Industrial Revolution, and AI research is being actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s, this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving. Owing to recent interest in the technology and research on various algorithms, the field of artificial intelligence has achieved greater technological advances than ever before. The knowledge-based system is a sub-domain of artificial intelligence that aims to enable artificial intelligence agents to make decisions using machine-readable and machine-processable knowledge constructed from complex, informal human knowledge and rules in various fields. A knowledge base is used to optimize the collection, organization, and retrieval of information, and it has recently been used together with statistical artificial intelligence such as machine learning. More recently, knowledge bases have also been used to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. These knowledge bases support intelligent processing in various fields of artificial intelligence, such as the question answering systems of smart speakers. However, building a useful knowledge base is time-consuming and still requires considerable effort from experts. In recent years, much research and technology in knowledge-based artificial intelligence has used DBpedia, one of the largest knowledge bases, which aims to extract structured content from the information in Wikipedia. DBpedia contains various information extracted from Wikipedia, such as titles, categories, and links, but its most useful knowledge comes from Wikipedia infoboxes, which present user-created summaries of some unifying aspect of the articles.
This knowledge is created through mapping rules, defined in the DBpedia Extraction Framework, between infobox structures and the DBpedia ontology schema. Because it generates knowledge from semi-structured infobox data created by users, DBpedia can be expected to offer high reliability in terms of knowledge accuracy. However, since only about 50% of all pages in Korean Wikipedia contain an infobox, DBpedia is limited in terms of knowledge scalability. This paper proposes a method that uses machine learning to extract knowledge from text documents according to the ontology schema. To demonstrate the appropriateness of this method, we describe a knowledge extraction model that follows the DBpedia ontology schema and is trained on Wikipedia infoboxes. Our knowledge extraction model consists of three steps: classification of documents into ontology classes, classification of sentences appropriate for extracting triples, and selection of values and their transformation into the RDF triple structure. The structure of a Wikipedia infobox is defined by an infobox template that provides standardized information across related articles, and the DBpedia ontology schema can be mapped to these infobox templates. Based on these mapping relations, we classify the input document into infobox categories, which correspond to ontology classes. After determining the class of the input document, we classify the appropriate sentences according to the attributes belonging to that class. Finally, we extract knowledge from the sentences classified as appropriate and convert it into triples. To train the models, we generated a training dataset from a Wikipedia dump by adding BIO tags to sentences, and trained models covering about 200 classes and about 2,500 relations for knowledge extraction. Furthermore, we conducted comparative experiments with CRF and Bi-LSTM-CRF for the knowledge extraction process.
Through the proposed process, structured knowledge can be obtained by extracting knowledge from text documents according to the ontology schema. In addition, this methodology can significantly reduce the effort experts need to construct instances according to the ontology schema.
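The training-data step above (marking infobox values inside sentences with BIO tags) and the final step (turning tagged spans into triples) can be illustrated with a minimal sketch. It assumes whitespace-tokenized sentences and a plain relation label such as "country"; the function names `bio_tag`, `decode_spans`, and `to_triples` are hypothetical, and the sketch does not reproduce the paper's CRF or Bi-LSTM-CRF taggers, only the labeling format they would learn and the decoding that follows.

```python
from typing import List, Tuple


def bio_tag(tokens: List[str], value_tokens: List[str], label: str) -> List[str]:
    """Mark the first occurrence of value_tokens in tokens as B-label, I-label, ...; all else is O."""
    tags = ["O"] * len(tokens)
    n = len(value_tokens)
    if n == 0:
        return tags
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == value_tokens:
            tags[start] = f"B-{label}"
            for i in range(start + 1, start + n):
                tags[i] = f"I-{label}"
            break
    return tags


def decode_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Turn a BIO tag sequence back into (label, text) spans; stray I- tags are ignored."""
    spans: List[Tuple[str, str]] = []
    label, words = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and label == tag[2:]:
            words.append(token)  # continue the open span
            continue
        if label is not None:
            spans.append((label, " ".join(words)))  # close the open span
        label, words = (tag[2:], [token]) if tag.startswith("B-") else (None, [])
    if label is not None:
        spans.append((label, " ".join(words)))
    return spans


def to_triples(subject: str, tokens: List[str], tags: List[str]) -> List[Tuple[str, str, str]]:
    """Pair each decoded (relation, value) span with the document's subject."""
    return [(subject, relation, value) for relation, value in decode_spans(tokens, tags)]


if __name__ == "__main__":
    sentence = "Seoul is the capital of South Korea .".split()
    tags = bio_tag(sentence, ["South", "Korea"], "country")
    print(to_triples("Seoul", sentence, tags))  # [('Seoul', 'country', 'South Korea')]
```

In a real pipeline the tags would come from a trained sequence tagger rather than from `bio_tag`, and the subject and relation strings would be DBpedia resource and property IRIs.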

Performance Evaluation on the Learning Algorithm for Automatic Classification of Q&A Documents (고객 질의 문서 자동 분류를 위한 학습 알고리즘 성능 평가)

  • Choi Jung-Min;Lee Byoung-Soo
    • The KIPS Transactions:PartD / v.13D no.1 s.104 / pp.133-138 / 2006
  • Electronic commerce has emerged and surpassed traditional commerce, and it is now driving change in the management of enterprises. To establish and maintain good customer relationships, electronic commerce offers various channels through which enterprises can understand what customers want and make suggestions to them. Among these, the bulletin board and e-mail are inbound channels through which enterprises can hear customers' opinions directly, and they differ in character from the other channels. Enterprises can manage the bulletin board and e-mail effectively by understanding as many of their customers' ideas as possible and providing optimal answers. This is one of the important factors in improving the reliability of the bulletin board and e-mail, as well as of electronic commerce as a whole. This thesis therefore investigates methods for automatically classifying the various kinds of documents in electronic commerce, which can solve existing problems of the bulletin board and e-mail and allow them to be operated effectively and managed systematically. It also identifies the most suitable algorithm for the automatic classification of Q&A documents by experimentally comparing the classification performance of Naive Bayesian, TFIDF, Neural Network, and k-NN classifiers.
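One of the compared methods, the TFIDF classifier, is commonly implemented as a Rocchio-style prototype classifier: each category gets a TF-IDF centroid, and a new document is assigned to the category whose centroid is most cosine-similar. The sketch below assumes that reading (the paper's exact variant is not specified here), uses a smoothed IDF of log(N/df) + 1, and uses hypothetical toy categories such as "refund" and "account".

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TfidfCentroidClassifier:
    """Rocchio-style classifier: one TF-IDF prototype vector per category, cosine for prediction."""

    def fit(self, docs: List[List[str]], labels: List[str]) -> "TfidfCentroidClassifier":
        n = len(docs)
        df = Counter(term for doc in docs for term in set(doc))
        # Smoothed IDF; the "+ 1" keeps terms that occur in every document from vanishing.
        self.idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
        sums: Dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            sums[label].update(self._vectorize(doc))
        # Summed vectors point in the same direction as mean vectors, and cosine ignores length.
        self.centroids = {label: dict(total) for label, total in sums.items()}
        return self

    def _vectorize(self, doc: List[str]) -> Vector:
        # Terms unseen during training have no IDF and are dropped.
        return {t: tf * self.idf[t] for t, tf in Counter(doc).items() if t in self.idf}

    def predict(self, doc: List[str]) -> str:
        vec = self._vectorize(doc)
        return max(self.centroids, key=lambda label: cosine(vec, self.centroids[label]))


if __name__ == "__main__":
    docs = [["refund", "order", "late"], ["refund", "money", "return"],
            ["login", "password", "error"], ["password", "reset", "account"]]
    labels = ["refund", "refund", "account", "account"]
    clf = TfidfCentroidClassifier().fit(docs, labels)
    print(clf.predict(["refund", "late"]))  # refund
```

Real Korean customer questions would first need morphological analysis to produce tokens; that preprocessing is outside this sketch.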

An automated Classification System of Standard Industry and Occupation Codes by Using Information Retrieval Techniques (정보검색 기법을 이용한 산업/직업 코드 자동 분류 시스템)

  • Lim, Heui Seok
    • The Journal of Korean Association of Computer Education / v.7 no.4 / pp.51-60 / 2004
  • This paper proposes an automated system for coding census responses into Korean standard industry and occupation codes, which greatly reduces the cost and labor of manual coding. The proposed system uses information retrieval techniques and a document classification algorithm to convert natural-language responses on survey questionnaires into the corresponding numeric codes. The system was evaluated on 46,762 industry records and 36,286 occupation records using 10-fold cross-validation. In the experiments, the system showed production rates of 87.08% and 66.08% when classifying industry records into level-2 and level-5 codes, respectively, and slightly lower performance on occupation code classification. Although there is still much room for improvement before it can serve as a fully automated coding system, we expect it to be usable as a semi-automated coding system that minimizes manual coding, or as a tool for verifying manual coding results.
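The 10-fold cross-validation protocol mentioned above can be sketched as follows. This is a generic illustration, not the paper's evaluation code: the score is plain accuracy (the paper's "production rate" may be defined differently), and `train_majority` is a hypothetical placeholder trainer standing in for the retrieval-based coder.

```python
import random
from collections import Counter
from typing import Any, Callable, List, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 once and deal the indices round-robin into k disjoint folds."""
    if n < k:
        raise ValueError("need at least k records")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(records: Sequence[Any], codes: Sequence[str],
                   train: Callable[[List[Any], List[str]], Callable[[Any], str]],
                   k: int = 10, seed: int = 0) -> float:
    """Mean per-fold accuracy; train(records, codes) must return a predict(record) -> code callable."""
    scores = []
    for fold in k_fold_indices(len(records), k, seed):
        held_out = set(fold)
        train_records = [r for i, r in enumerate(records) if i not in held_out]
        train_codes = [c for i, c in enumerate(codes) if i not in held_out]
        predict = train(train_records, train_codes)
        correct = sum(predict(records[i]) == codes[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def train_majority(records: List[Any], codes: List[str]) -> Callable[[Any], str]:
    """Hypothetical baseline trainer: always predict the most frequent training code."""
    most_common = Counter(codes).most_common(1)[0][0]
    return lambda record: most_common
```

Every record is held out exactly once, so each of the 46,762 industry records would be coded by a model that never saw it during training.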


A New Similarity Measure for Improving Ranking in QA Systems (질의응답시스템 응답순위 개선을 위한 새로운 유사도 계산방법)

  • Kim Myung-Gwan;Park Young-Tack
    • Journal of KIISE:Computing Practices and Letters / v.10 no.6 / pp.529-536 / 2004
  • The main idea of this paper is to combine positional information within sentences with query type classification to improve the ranking of documents returned for a query. First, the use of conceptual graphs to represent document contents in information retrieval is discussed. The method is based on well-known text comparison strategies, such as the Dice coefficient, combined with position-based term weighting. Second, we introduce a method for learning query type classification that improves the ability of a Question Answering system to retrieve answers to questions. The proposed methods employ naive Bayes classification from machine learning. For training, we used a collection of approximately 30,000 question-answer pairs obtained from Frequently Asked Question (FAQ) files on various subjects. Evaluation on a set of queries from the TREC-9 question answering track shows that the machine-learning-based method outperforms the other underlying systems in TREC-9 (mean reciprocal rank of 0.29 and precision of 55.1%).
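The combination of the Dice coefficient with position-based term weighting, and the mean reciprocal rank used in the evaluation, can be sketched as below. The weighting scheme 1 / (1 + decay * position) is a hypothetical choice for illustration, not the paper's formula; it is built so that with `decay=0` the score reduces to the standard Dice coefficient 2|Q ∩ S| / (|Q| + |S|). Conceptual graphs and the naive Bayes query-type classifier are not covered here.

```python
from typing import Dict, List, Sequence, Set


def position_weights(tokens: Sequence[str], decay: float = 0.1) -> Dict[str, float]:
    """Hypothetical scheme: earlier tokens weigh more; repeated terms keep their first (highest) weight."""
    weights: Dict[str, float] = {}
    for pos, tok in enumerate(tokens):
        weights.setdefault(tok, 1.0 / (1.0 + decay * pos))
    return weights


def weighted_dice(query_tokens: Sequence[str], sentence_tokens: Sequence[str], decay: float = 0.1) -> float:
    """Dice coefficient where query terms weigh 1 and sentence terms carry position weights."""
    q = set(query_tokens)
    s = position_weights(sentence_tokens, decay)
    denom = len(q) + sum(s.values())
    if denom == 0:
        return 0.0
    shared = sum(1.0 + s[t] for t in q if t in s)  # each shared term counts on both sides
    return shared / denom


def mean_reciprocal_rank(ranked_lists: List[List[str]], relevant: List[Set[str]]) -> float:
    """Average over queries of 1/rank of the first relevant answer (0 if none is found)."""
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranking, gold in zip(ranked_lists, relevant):
        for rank, item in enumerate(ranking, start=1):
            if item in gold:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)
```

With a positive `decay`, a query term matched near the start of a sentence raises the score more than the same term matched later, which is the intuition behind position-based weighting.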

The Establishment Process and Institutional Characteristics of Records and Archival Management System of Korean Government in the Early 1960s (1960년대 초반 한국 국가기록관리체제의 수립과정과 제도적 특징)

  • Lee, Seong-Il
    • Journal of Korean Society of Archives and Records Management / v.7 no.2 / pp.43-71 / 2007
  • The Records and Archival Management System of the Korean Government was founded in the early 1960s, following an overall reform of the national structure and the introduction into public administration of new administrative management techniques that made the conduct of business more efficient. The records appraisal and destruction work promoted in 1962 covered not only the retention and destruction of official documents but also the development of efficient management and disposal systems for official documents to be produced in the future. The Korean government also elaborated an appraisal system that stipulated retention periods on the basis of functional classification and documentary function.

A Study on a documentation Area for photography documentation of railway station. (철도역 사진기록화를 위한 영역설정에 관한 연구)

  • Kim, Jeong Hyeon
    • The Korean Journal of Archival Studies / no.30 / pp.125-174 / 2011
  • Railway stations in Korea, built in the late 19th and early 20th centuries for the purpose of plundering resources and invading China, functioned as major transportation hubs and played an important role in local history. Since the late 1990s, however, Korean railway stations have been undergoing major changes due to railroad improvement and restructuring plans. Photography is a useful way to represent the memory and local history of railway stations, but the photographic documentation of railway stations has not been discussed until now, owing to long-standing indifference on the part of KORAIL and the particular status of stations as national security facilities. First, this study identifies the activities and facilities of railway stations to be documented, based on an analysis of the stations' value. Next, it sets out each documentation area on the basis of this value analysis and presents concrete examples. Finally, it maps the documentation areas and their details onto a classification scheme, supplementing or revising the activities and facilities in the scheme as needed.

IACS UR E26 - Analysis of the Cyber Resilience of Ships (국제선급협회 공통 규칙 - 선박의 사이버 복원력에 대한 기술적 분석)

  • Nam-seon Kang;Gum-jun Son;Rae-Chon Park;Chang-sik Lee;Seong-sang Yu
    • Journal of Advanced Navigation Technology / v.28 no.1 / pp.27-36 / 2024
  • In this paper, we analyze the International Association of Classification Societies (IACS) Unified Requirement E26, Cyber Resilience of Ships, ahead of its implementation on July 1, 2024, and address ship cyber security and resilience programs based on its 5 requirements, 17 detailed items, and the documents that must be submitted or maintained for a ship's cyber resilience. The measures include document management (such as classification certification documents and design documents), configuration of a network with enhanced security, establishment of incident response processes, configuration management using software tools, integrated network management, malware protection, and detection of ship network security threats with security management solutions. We also propose a technology capable of real-time response.

Improving Naïve Bayes Text Classifiers with Incremental Feature Weighting (점진적 특징 가중치 기법을 이용한 나이브 베이즈 문서분류기의 성능 개선)

  • Kim, Han-Joon;Chang, Jae-Young
    • The KIPS Transactions:PartB / v.15B no.5 / pp.457-464 / 2008
  • In real-world operational environments, most text classification systems suffer from insufficient training documents and a lack of prior knowledge about the feature space. In this regard, Naïve Bayes is known to be an appropriate algorithm for operational text classification, since its classification model can easily evolve by incrementally updating the pre-learned model and feature space. This paper proposes a technique for improving the Naïve Bayes classifier through a feature weighting strategy. The basic idea is that parameter estimation in Naïve Bayes should consider the degree of feature importance as well as the feature distribution. By incorporating feature weights into the Naïve Bayes learning algorithm, rather than learning from a reduced feature set, we can develop a more accurate classification model. In addition, we extend a conventional feature update algorithm to support incremental feature weighting in a dynamic operational environment. To evaluate the proposed method, we conduct experiments on various document collections and show that the traditional Naïve Bayes classifier can be significantly improved by the proposed technique.
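The idea of folding feature weights into Naïve Bayes parameter estimation, while keeping the model incrementally updatable, can be sketched with a multinomial Naïve Bayes whose term counts are scaled by a per-feature weight w(t). The estimator P(t|c) = (w(t)·n(t,c) + 1) / (Σ w(t')·n(t',c) + |V|) is an illustrative assumption, not the paper's exact formula; with all weights equal to 1 it is ordinary Laplace-smoothed Naïve Bayes. The class name `WeightedIncrementalNB` and the `feature_weight` callable are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, List, Optional


class WeightedIncrementalNB:
    """Multinomial Naive Bayes whose term counts are scaled by per-feature weights (illustrative)."""

    def __init__(self, feature_weight: Optional[Callable[[str], float]] = None):
        # Weights are looked up lazily, so changing them takes effect without retraining the counts.
        self.feature_weight = feature_weight or (lambda term: 1.0)
        self.class_docs: Counter = Counter()
        self.term_counts = defaultdict(Counter)
        self.vocab: set = set()

    def update(self, tokens: List[str], label: str) -> None:
        """Incrementally add one labeled document; new terms simply extend the vocabulary."""
        self.class_docs[label] += 1
        self.term_counts[label].update(tokens)
        self.vocab.update(tokens)

    def _log_likelihood(self, term: str, label: str) -> float:
        w = self.feature_weight
        counts = self.term_counts[label]
        # Recomputed per call for clarity; a real system would cache the weighted totals.
        total = sum(w(t) * c for t, c in counts.items())
        return math.log((w(term) * counts[term] + 1.0) / (total + len(self.vocab)))

    def predict(self, tokens: List[str]) -> str:
        n_docs = sum(self.class_docs.values())

        def score(label: str) -> float:
            prior = math.log(self.class_docs[label] / n_docs)
            return prior + sum(self._log_likelihood(t, label) for t in tokens if t in self.vocab)

        return max(self.class_docs, key=score)


if __name__ == "__main__":
    nb = WeightedIncrementalNB()
    nb.update(["cheap", "pills", "buy"], "spam")
    nb.update(["meeting", "agenda", "notes"], "ham")
    print(nb.predict(["cheap", "buy"]))  # spam
```

Because only raw counts are stored, new documents (via `update`) and revised feature weights (via `feature_weight`) can both be absorbed without rebuilding the model, which is the property that makes Naïve Bayes attractive for operational settings.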

Legal Aspects on the Procedures and Settlement of the Disputes arising from the WTO Preshipment Inspection (WTO 선적전검사제도에 따른 실태와 분쟁조정의 해결에 관한 고찰)

  • Seo, Jeong-Il
    • Journal of Arbitration Studies / v.8 no.1 / pp.293-322 / 1998
  • General Administrative Procedures of the Preshipment Inspection
    1. Initial notification: Preshipment inspection is initiated by the Agency when it receives notice, either from the importing country or from the seller, that an export needs to be inspected.
       1.1 Notice from the importing country
       1.2 Notice from the seller
    2. Preliminary price verification: After receipt of the initial notification, the Agency undertakes, where possible, a preliminary price verification based on the Inspection Order and other contractual documents received.
    3. Customs classification: When required by the Government of the importing country, the Agency forms an opinion of the Customs Classification Code based on the Customs Tariff Book and Rules of Classification of the country of importation. The Customs Classification Code determines the tariff rate on the basis of which the importer will be required to pay import duties.
    4. Import eligibility
    5. Arrangements for physical inspection
       5.1 Inspection request from seller
       5.2 Place of inspection
       5.3 Date of inspection
       5.4 Physical inspection procedures
    6. Physical inspection results: When the physical inspection is completed, the inspector submits a report to the Agency office, and the result of the inspection is communicated to the seller and, where applicable, to the place of inspection. The result will state satisfactory, conditional, or unsatisfactory. The seller is welcome to present his views in writing to the Agency if there is any query regarding the issuance of a conditional or unsatisfactory inspection result.
       6.1 Satisfactory
       6.2 Conditional
       6.3 Unsatisfactory
    7. Shipment of the goods: The seller is advised to check with the Agency prior to shipment if the physical inspection result has not been received or if there are any doubts about whether a Clean Report of Findings will be issued.
    8. Final price verification and classification: Based on the results of the physical inspection and the appropriate final documents, the Agency finalises the price verification and its opinion of the Customs classification code. When the preliminary price verification has not raised any unresolved questions, and the inspection result and other documents received are consistent with the preliminary documentation, the Agency will not normally require any additional information. The main exception is when the terms of sale require reference to prices at the date of shipment.
    9. The Report of Findings
       9.1 Types of Reports of Findings
         - Clean Report of Findings (CRF): The Agency will issue a Clean Report of Findings (CRF), or an equivalent document, normally within two working days after receipt of the necessary correct final documents and a satisfactory result in all aspects of the inspection.
         - Discrepancy Report.


Improving Hypertext Classification Systems through WordNet-based Feature Abstraction (워드넷 기반 특징 추상화를 통한 웹문서 자동분류시스템의 성능향상)

  • Roh, Jun-Ho;Kim, Han-Joon;Chang, Jae-Young
    • The Journal of Society for e-Business Studies / v.18 no.2 / pp.95-110 / 2013
  • This paper presents a novel feature engineering technique that can improve conventional machine learning-based text classification systems. The proposed method extends the initial feature set by using hyperlink relationships in order to categorize hypertext web documents effectively. Web documents are connected to each other through hyperlinks, and in many cases hyperlinks exist among highly related documents. Such hyperlink relationships can be used to enhance the quality of the features that make up classification models. The basic idea of the proposed method is to generate a kind of abstracted concept feature consisting of a few raw feature words; to do this, the method computes the semantic similarity between a target document and its neighbor documents by utilizing hierarchical relationships in the WordNet ontology. In developing classification models, the abstracted concept features are treated on a par with the other raw features, and they can play a significant role in building more accurate classification models. Through extensive experiments on the Web-KB test collection, we demonstrate that the proposed method outperforms conventional ones.
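Feature abstraction through a concept hierarchy can be illustrated with a small sketch. It assumes a toy hypernym map standing in for WordNet (the `HYPERNYM` data is hypothetical), uses Wu-Palmer similarity, 2·depth(LCS) / (depth(a) + depth(b)), as one standard hierarchy-based measure (the paper's exact measure may differ), and emits a "concept:<lowest common subsumer>" feature whenever a word in the target document and a word in a linked neighbor document are similar enough. The threshold of 0.6 is an arbitrary illustrative value.

```python
from typing import Dict, Iterable, List, Optional, Set

# Toy hypernym map standing in for WordNet (hypothetical data): child -> parent, root -> None.
HYPERNYM: Dict[str, Optional[str]] = {
    "entity": None,
    "person": "entity",
    "student": "person",
    "professor": "person",
    "artifact": "entity",
    "course": "artifact",
}


def path_to_root(concept: str) -> List[str]:
    """Concept first, root last."""
    path = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = HYPERNYM[node]
    return path


def depth(concept: str) -> int:
    return len(path_to_root(concept))  # the root has depth 1


def lowest_common_subsumer(a: str, b: str) -> Optional[str]:
    ancestors_a = set(path_to_root(a))
    for node in path_to_root(b):  # walks upward, so the first hit is the lowest shared ancestor
        if node in ancestors_a:
            return node
    return None


def wu_palmer(a: str, b: str) -> float:
    lcs = lowest_common_subsumer(a, b)
    if lcs is None:
        return 0.0
    return 2.0 * depth(lcs) / (depth(a) + depth(b))


def abstract_features(target_words: Iterable[str], neighbor_words: Iterable[str],
                      threshold: float = 0.6) -> Set[str]:
    """Emit concept:<LCS> features for similar (target word, neighbor word) pairs."""
    neighbors = [n for n in neighbor_words if n in HYPERNYM]
    features: Set[str] = set()
    for t in target_words:
        if t not in HYPERNYM:
            continue
        for n in neighbors:
            if t != n and wu_palmer(t, n) >= threshold:
                features.add("concept:" + lowest_common_subsumer(t, n))
    return features


if __name__ == "__main__":
    print(abstract_features(["student"], ["professor", "course"]))  # {'concept:person'}
```

The resulting concept features would simply be appended to the document's raw word features before training, matching the description of treating them on a par with other features.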