• Title/Summary/Keyword: document classification

Search Result 451, Processing Time 0.029 seconds

An Optimized e-Lecture Video Search and Indexing framework

  • Medida, Lakshmi Haritha;Ramani, Kasarapu
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.8
    • /
    • pp.87-96
    • /
    • 2021
  • The demand for e-learning through video lectures is rapidly increasing due to its diverse advantages over the traditional learning methods. This led to massive volumes of web-based lecture videos. Indexing and retrieval of a lecture video or a lecture video topic has thus proved to be an exceptionally challenging problem. Many techniques listed by literature were either visual or audio based, but not both. Since the effects of both the visual and audio components are equally important for the content-based indexing and retrieval, the current work is focused on both these components. A framework for automatic topic-based indexing and search depending on the innate content of the lecture videos is presented. The text from the slides is extracted using the proposed Merged Bounding Box (MBB) text detector. The audio component text extraction is done using Google Speech Recognition (GSR) technology. This hybrid approach generates the indexing keywords from the merged transcripts of both the video and audio component extractors. The search within the indexed documents is optimized based on the Naïve Bayes (NB) Classification and K-Means Clustering models. This optimized search retrieves results by searching only the relevant document cluster in the predefined categories and not the whole lecture video corpus. The work is carried out on the dataset generated by assigning categories to the lecture video transcripts gathered from e-learning portals. The performance of search is assessed based on the accuracy and time taken. Further the improved accuracy of the proposed indexing technique is compared with the accepted chain indexing technique.

The Development of the Drawing Information Management System Based on Group Technology (Group Technology를 이용한 설계정보관리 시스템의 개발)

  • H.S. Moon;Kim, S.H.
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.14 no.1
    • /
    • pp.58-68
    • /
    • 1997
  • In order to provide economic high-quality products to customers in a timely manner, companies have tried much effort to decrease the time period of engineering design and information management. As a part of this effort, we have developed the Drawing Information Management System(DIMS) based ofn GT(Group Technology) that could decrease design processing time by speedy and rational management of design processes. The characteristics of DIMS are as follows: First, the concept of Concurrent Engineering was applied to DIMS. Through LAN, reviewers are able to attach comments to dlectronic documents by anno- tation functions called Mark-up. The reviewer annotations are collected and combind with the original document to revise the documents. Second, we have developed a Classification and Coding(C&C) system suitable for electronic component parts bassed on GT(Group Technology). The C&C system makes both parts and drawing with similar characteriscs into families and helps users search existing documents or create new drawings promptly. Finally, DIMS provides the Engineering BOM(Bill of Material) using the concept of Family BOM based on model options.

  • PDF

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.43-61
    • /
    • 2019
  • Development of technologies in artificial intelligence has been rapidly increasing with the Fourth Industrial Revolution, and researches related to AI have been actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. These researches have been focused on solving cognitive problems such as learning and problem solving related to human intelligence from the 1950s. The field of artificial intelligence has achieved more technological advance than ever, due to recent interest in technology and research on various algorithms. The knowledge-based system is a sub-domain of artificial intelligence, and it aims to enable artificial intelligence agents to make decisions by using machine-readable and processible knowledge constructed from complex and informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and recently it is used with statistical artificial intelligence such as machine learning. Recently, the purpose of the knowledge base is to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. These knowledge bases are used for intelligent processing in various fields of artificial intelligence such as question answering system of the smart speaker. However, building a useful knowledge base is a time-consuming task and still requires a lot of effort of the experts. In recent years, many kinds of research and technologies of knowledge based artificial intelligence use DBpedia that is one of the biggest knowledge base aiming to extract structured content from the various information of Wikipedia. DBpedia contains various information extracted from Wikipedia such as a title, categories, and links, but the most useful knowledge is from infobox of Wikipedia that presents a summary of some unifying aspect created by users. These knowledge are created by the mapping rule between infobox structures and DBpedia ontology schema defined in DBpedia Extraction Framework. In this way, DBpedia can expect high reliability in terms of accuracy of knowledge by using the method of generating knowledge from semi-structured infobox data created by users. However, since only about 50% of all wiki pages contain infobox in Korean Wikipedia, DBpedia has limitations in term of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. In order to demonstrate the appropriateness of this method, we explain a knowledge extraction model according to the DBpedia ontology schema by learning Wikipedia infoboxes. Our knowledge extraction model consists of three steps, document classification as ontology classes, proper sentence classification to extract triples, and value selection and transformation into RDF triple structure. The structure of Wikipedia infobox are defined as infobox templates that provide standardized information across related articles, and DBpedia ontology schema can be mapped these infobox templates. Based on these mapping relations, we classify the input document according to infobox categories which means ontology classes. After determining the classification of the input document, we classify the appropriate sentence according to attributes belonging to the classification. Finally, we extract knowledge from sentences that are classified as appropriate, and we convert knowledge into a form of triples. In order to train models, we generated training data set from Wikipedia dump using a method to add BIO tags to sentences, so we trained about 200 classes and about 2,500 relations for extracting knowledge. Furthermore, we evaluated comparative experiments of CRF and Bi-LSTM-CRF for the knowledge extraction process. Through this proposed process, it is possible to utilize structured knowledge by extracting knowledge according to the ontology schema from text documents. In addition, this methodology can significantly reduce the effort of the experts to construct instances according to the ontology schema.

Performance Evaluation on the Learning Algorithm for Automatic Classification of Q&A Documents (고객 질의 문서 자동 분류를 위한 학습 알고리즘 성능 평가)

  • Choi Jung-Min;Lee Byoung-Soo
    • The KIPS Transactions:PartD
    • /
    • v.13D no.1 s.104
    • /
    • pp.133-138
    • /
    • 2006
  • Electric commerce of surpassing the traditional one appeared before the public and has currently led the change in the management of enterprises. To establish and maintain good relations with customers, electric commerce has various channels for customers that understand what they want to and suggest it to them. The bulletin board and e-mail among em are inbound information that enterprises can directly listen to customers' opinions and are different from other channels in characters. Enterprises can effectively manage the bulletin board and e-mail by understanding customers' ideas as many as possible and provide them with optimum answers. It is one of the important factors to improve the reliability of the notice board and e-mail as well as the whole electric commerce. Therefore this thesis researches into methods to classify various kinds of documents automatically in electric commerce; they are possible to solve existing problems of the bulletin board and e-mail, to operate effectively and to manage systematically. Moreover, it researches what the most suitable algorithm is in the automatic classification of Q&A documents by experiment the classifying performance of Naive Bayesian, TFIDF, Neural Network, k-NN

An automated Classification System of Standard Industry and Occupation Codes by Using Information Retrieval Techniques (정보검색 기법을 이용한 산업/직업 코드 자동 분류 시스템)

  • Lim, Heui Seok
    • The Journal of Korean Association of Computer Education
    • /
    • v.7 no.4
    • /
    • pp.51-60
    • /
    • 2004
  • This paper proposes an automated coding system of Korean standard industry/occupation for census which reduces a lot of cost and labor for manual coding. The proposed system converts natural language responses on survey questionnaires into corresponding numeric codes using information retrieval techniques and document classification algorithm. The system was experimented with 46,762 industry records and occupation 36,286 records using 10-fold cross -validation evaluation method. As experimental results, the system show 87.08% and 66.08% production rates when classifying industry records into level 2 and level 5 codes respectively. The system shows slightly lower performances on occupation code classification. We expect that the system is enough to be used as a semi-automate coding system which can minimize manual coding task or as a verification tool for manual coding results though it has much room to be improved as an automated coding system.

  • PDF

A New Similarity Measure for Improving Ranking in QA Systems (질의응답시스템 응답순위 개선을 위한 새로운 유사도 계산방법)

  • Kim Myung-Gwan;Park Young-Tack
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.10 no.6
    • /
    • pp.529-536
    • /
    • 2004
  • The main idea of this paper is to combine position information in sentence and query type classification to make the documents ranking to query more accessible. First, the use of conceptual graphs for the representation of document contents In information retrieval is discussed. The method is based on well-known strategies of text comparison, such as Dice Coefficient, with position-based weighted term. Second, we introduce a method for learning query type classification that improves the ability to retrieve answers to questions from Question Answering system. Proposed methods employ naive bayes classification in machine learning fields. And, we used a collection of approximately 30,000 question-answer pairs for training, obtained from Frequently Asked Question(FAQ) files on various subjects. The evaluation on a set of queries from international TREC-9 question answering track shows that the method with machine learning outperforms the underline other systems in TREC-9 (0.29 for mean reciprocal rank and 55.1% for precision).

The Establishment Process and Institutional Characteristics of Records and Archival Management System of Korean Government in the Early 1960s (1960년대 초반 한국 국가기록관리체제의 수립과정과 제도적 특징)

  • Lee, Seong-Il
    • Journal of Korean Society of Archives and Records Management
    • /
    • v.7 no.2
    • /
    • pp.43-71
    • /
    • 2007
  • The Records and Archival Management System of Korean Government was founded in the early 1960s after the overall national structure reform and the implementation of the new administrative management technique, which boosted the efficiency of the way of conducting business, into the public administration, and Promoted in 1962, the records appraisal and destruction works included not only retention and destruction of official documents but also the development of efficient management and elimination systems for official documents to be produced in the future. and Korean government elaborated the appraisal system to stipulate the retention period on the basis of functional classification and documentary function.

A Study on a documentation Area for photography documentation of railway station. (철도역 사진기록화를 위한 영역설정에 관한 연구)

  • Kim, Jeong Hyeon
    • The Korean Journal of Archival Studies
    • /
    • no.30
    • /
    • pp.125-174
    • /
    • 2011
  • Railway Station in Korea built for purpose to plunder resource and invade China 19 and 20 century early, It functioned as main transportation and important role in local history. However, Since the late 1990s, Railway Station in Korea is facing major changes due to Railroad Improvement and Restructuring plan. Photography is useful way to represent memory and local history about railway station, but Photography Documentation on railroad station were not discussed until now due to indifference of the KORAIL and peculiarities of national security facilities for a long time. This study suggest what is document activities and facilities of railway station, use value analysis about railway station's value firstly. next, this study set each documentation area with value analysis for a basis and present concrete example. finally, this study adapt documentation area and detail to classification scheme and apply activities and facilities of supplement or revision to classification scheme.

IACS UR E26 - Analysis of the Cyber Resilience of Ships (국제선급협회 공통 규칙 - 선박의 사이버 복원력에 대한 기술적 분석)

  • Nam-seon Kang;Gum-jun Son;Rae-Chon Park;Chang-sik Lee;Seong-sang Yu
    • Journal of Advanced Navigation Technology
    • /
    • v.28 no.1
    • /
    • pp.27-36
    • /
    • 2024
  • In this paper, we analyze the unified requirements of international association of classification societies - cyber resilience of ships, ahead of implementation of the agreement on July 1, 2024, and respond to ship cyber security and resilience programs based on 5 requirements, 17 details, and documents that must be submitted or maintained according to the ship's cyber resilience,. Measures include document management such as classification certification documents and design documents, configuration of a network with enhanced security, establishment of processes for accident response, configuration management using software tools, integrated network management, malware protection, and detection of ship network security threats with security management solutions. proposed a technology capable of real-time response.

Improving Naïve Bayes Text Classifiers with Incremental Feature Weighting (점진적 특징 가중치 기법을 이용한 나이브 베이즈 문서분류기의 성능 개선)

  • Kim, Han-Joon;Chang, Jae-Young
    • The KIPS Transactions:PartB
    • /
    • v.15B no.5
    • /
    • pp.457-464
    • /
    • 2008
  • In the real-world operational environment, most of text classification systems have the problems of insufficient training documents and no prior knowledge of feature space. In this regard, $Na{\ddot{i}ve$ Bayes is known to be an appropriate algorithm of operational text classification since the classification model can be evolved easily by incrementally updating its pre-learned classification model and feature space. This paper proposes the improving technique of $Na{\ddot{i}ve$ Bayes classifier through feature weighting strategy. The basic idea is that parameter estimation of $Na{\ddot{i}ve$ Bayes considers the degree of feature importance as well as feature distribution. We can develop a more accurate classification model by incorporating feature weights into Naive Bayes learning algorithm, not performing a learning process with a reduced feature set. In addition, we have extended a conventional feature update algorithm for incremental feature weighting in a dynamic operational environment. To evaluate the proposed method, we perform the experiments using the various document collections, and show that the traditional $Na{\ddot{i}ve$ Bayes classifier can be significantly improved by the proposed technique.