• Title/Summary/Keyword: Automatic Document Generation


Automatic Generation of the Local Level Knowledge Structure of a Single Document Using Clustering Methods (클러스터링 기법을 이용한 개별문서의 지식구조 자동 생성에 관한 연구)

  • Han, Seung-Hee; Chung, Young-Mee
    • Journal of the Korean Society for Information Management / v.21 no.3 / pp.251-267 / 2004
  • The purpose of this study is to generate the local-level knowledge structure of a single document, similar to the back-of-the-book index and table of contents of printed material, through term clustering and cluster representative term selection. Furthermore, it aims to analyze the functionality of the knowledge structure and to confirm the applicability of these methods in user-friendly information services. The term clustering experiment showed that Ward's method outperformed fuzzy K-means clustering. In the cluster representative term selection experiment, using the term with the highest passage frequency as the representative yielded the best performance. Finally, user task-based functionality tests illustrate that the automatically generated knowledge structure functions similarly to the local-level knowledge structure presented in printed material.
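A rough sketch of the pipeline this abstract describes: cluster index terms with Ward's method, then pick each cluster's representative as the term with the highest passage frequency. The term list and the term-by-passage matrix below are invented toy data, not the paper's.

```python
# Sketch: Ward's-method term clustering with passage-frequency representatives.
# The terms and the term-by-passage occurrence matrix are toy data.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

terms = ["index", "cluster", "term", "chapter", "heading", "passage"]
# Rows: terms; columns: passages of one document (1 = term occurs in passage).
occurrence = np.array([
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 0],
])

# Agglomerative clustering of terms with Ward linkage.
tree = linkage(occurrence, method="ward")
labels = fcluster(tree, t=2, criterion="maxclust")

# Representative term per cluster: highest passage frequency (row sum).
passage_freq = occurrence.sum(axis=1)
for cluster_id in sorted(set(labels)):
    members = [i for i, c in enumerate(labels) if c == cluster_id]
    rep = max(members, key=lambda i: passage_freq[i])
    print(cluster_id, [terms[i] for i in members], "->", terms[rep])
```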

Automated Verification of Livestock Manure Transfer Management System Handover Document using Gradient Boosting (Gradient Boosting을 이용한 가축분뇨 인계관리시스템 인계서 자동 검증)

  • Jonghwi Hwang; Hwakyung Kim; Jaehak Ryu; Taeho Kim; Yongtae Shin
    • Journal of Information Technology Services / v.22 no.4 / pp.97-110 / 2023
  • In this study, we propose a technique to automatically generate transfer documents using sensor data from livestock manure transfer systems. We analyze the sensor data and apply machine learning, specifically the Gradient Boosting algorithm, to derive optimized outcomes for livestock manure transfer documents, and by comparing the results against existing documents we present a method for automatic document generation. Currently, stakeholders including producers, transporters, and processors manually input data into the livestock manure transfer management system when disposing of manure and liquid byproducts. This manual process consumes additional labor, leads to data inconsistency, and complicates the management of distribution and treatment. The aim of this study is therefore to leverage the data to generate transfer documents automatically and thereby increase the efficiency of livestock manure and liquid byproduct management. By utilizing sensor data from livestock manure and liquid byproduct transport vehicles and employing machine learning algorithms, we establish a system that automates the validation of transfer documents, reducing the burden on producers, transporters, and processors. This efficient management system is expected to create a transparent environment for the distribution and treatment of livestock manure and liquid byproducts.
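A minimal sketch of the validation step, assuming the sensor stream has already been reduced to per-trip features and that historical transfer documents carry consistent/inconsistent labels; the feature names and data below are hypothetical stand-ins, not the paper's dataset.

```python
# Sketch: flagging inconsistent transfer documents with gradient boosting.
# Features and labels are hypothetical stand-ins for the paper's sensor data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Per-trip features: [declared_tons, sensed_tons, trip_minutes, stops]
X = rng.normal(size=(500, 4))
# Label 1 = document consistent with sensor data, 0 = inconsistent.
y = (np.abs(X[:, 0] - X[:, 1]) < 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(n_estimators=200, learning_rate=0.05)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))
```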

An Efficient Machine Learning-based Text Summarization in the Malayalam Language

  • P Haroon, Rosna; Gafur M, Abdul; Nisha U, Barakkath
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1778-1799 / 2022
  • Automatic text summarization is the process of condensing a large document into a shorter text that retains the significant information. Malayalam, spoken mainly in Kerala and Lakshadweep, is one of the more difficult Indian languages to process; natural language processing work on Malayalam is relatively scarce due to the complexity of the language as well as the scarcity of available resources. In this paper, an approach is proposed for summarizing Malayalam documents by training a model based on the Support Vector Machine classification algorithm. Different features of the text are taken into account in training, so that the system can extract the most important content from the input text. The classifier assigns sentences to separate classes (most important, important, average, and least significant), and on this basis the machine creates a summary of the input document. The user can select a compression ratio, and the system outputs that fraction of the document as the summary. Model performance is measured on Malayalam documents of different genres as well as documents from the same domain, and evaluated with the content evaluation measures precision, recall, F-score, and relative utility. The obtained precision and recall values show that the model is reliable and produces summaries more relevant than those of the other summarizers compared.
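A hedged sketch of the classification step: an SVM ranks sentences into four importance classes, and a compression ratio selects the top fraction in document order. The sentence features here are random placeholders; the paper's actual feature set is not reproduced.

```python
# Sketch: extractive summarization via sentence-importance classification.
# Sentence features (position, length, keyword overlap, ...) are assumed to
# be precomputed; classes 3..0 = most important .. least significant.
import numpy as np
from sklearn.svm import SVC

X_train = np.random.rand(200, 5)           # toy sentence feature vectors
y_train = np.random.randint(0, 4, 200)     # toy importance classes
clf = SVC(kernel="rbf").fit(X_train, y_train)

def summarize(sentences, features, compression=0.3):
    """Keep the top fraction of sentences, ranked by predicted importance."""
    ranks = clf.predict(features)
    keep = max(1, int(len(sentences) * compression))
    order = np.argsort(-ranks, kind="stable")[:keep]
    return [sentences[i] for i in sorted(order)]   # restore document order
```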

The Design and Implementation of The Amendment Statement Automatic Generated System for Attached Tables in Legislation (법령 내 별표 서식에 대한 개정지시문 자동 생성 시스템의 설계 및 구현)

  • Cho, Sung Soo; Jo, Dae Woong; Kim, Myung Ho
    • Journal of the Korea Society of Computer and Information / v.19 no.4 / pp.111-122 / 2014
  • Unlike ordinary documents, legislation consists of social norms that directly or indirectly have a huge impact on society, corporations, and individuals, and it changes constantly over time through enactment, amendment, and repeal. An amendment statement auto-generation system is used to promulgate these changes. However, the existing system can generate amendment statements only for the body text of a law, by comparing and analyzing the current legislation against the amended legislation. In practice, legislation also contains attached tables with a complex tabular structure in addition to plain body text. In this paper, we extend the existing amendment statement auto-generation system, which does not handle attached tables, with attached-table processing. To do so, we analyzed the grammar of amendment statements and the table structure of the attached tables in legislation, and we propose a method for comparing the attached tables themselves. As a result, amendment statements can be generated automatically for the various forms of legislative documents.
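A toy illustration of the cell-by-cell table comparison the paper builds on; the table contents and the wording of the generated instructions below are invented, not the system's actual amendment grammar.

```python
# Sketch: cell-by-cell comparison of a law's attached table, emitting
# amendment instructions for changed cells. Tables and wording are invented.
current = [["Fee", "10,000 won"], ["Deadline", "30 days"]]
amended = [["Fee", "12,000 won"], ["Deadline", "30 days"]]

def amendment_statements(cur, new):
    for r, (cur_row, new_row) in enumerate(zip(cur, new)):
        for c, (old, val) in enumerate(zip(cur_row, new_row)):
            if old != val:
                yield f'In row {r + 1}, column {c + 1}, "{old}" is amended to "{val}".'

for stmt in amendment_statements(current, amended):
    print(stmt)
```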

Linking LOD and MEP Items towards an Automated LOD Elaboration of MEP Design

  • Shin, Minso; Park, SeongHun; Kim, Tae wan
    • International Conference on Construction Engineering and Project Management / 2022.06a / pp.768-775 / 2022
  • Current MEP designs mostly rely on 2D-based design methods and tend to focus on simple modeling or the expression of geometric information, such as converting 2D drawings into 3D models, without taking advantage of the strengths of BIM. To increase the demand for BIM-based MEP design, the geometric and property information of each member of the 3D model must be conveniently linked from the Design Development (DD) phase to the Construction Document (CD) phase. To implement a detailed model conveniently at each phase, the level of detail of each member of the 3D model must be specified, and automatic generation of objects at each phase and an automatic detailing module for each LOD are required. However, South Korea's guidelines give only broad standards for the degree of MEP modeling detail at each design phase, so their application in each phase is ambiguous, and in practice the level of detail for each phase is input manually. Therefore, this paper summarizes detailed standards of MEP modeling for each design phase, based on interviews with MEP design companies and related literature. In addition, items that enable auto-detailing with DYNAMO were selected using a checklist for each design phase, and the types of detailing methods are presented; the auto-detailing items were classified by member, considering the level of detail of each phase. If a DYNAMO algorithm is produced that automates the auto-detailing items selected in this paper, the time and cost required for modeling will be reduced, and the demand for MEP design will increase.
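One way to picture the per-phase checklist is as a simple mapping from design phase to target LOD per member; the phases, LOD values, and member names below are illustrative assumptions, not the paper's actual standards.

```python
# Sketch: a per-phase LOD checklist for MEP members, in the spirit of the
# paper's classification. Phases, LOD values, and items are illustrative.
LOD_BY_PHASE = {
    "Design Development (DD)": {"duct": 300, "pipe": 300, "hanger": 200},
    "Construction Document (CD)": {"duct": 350, "pipe": 350, "hanger": 300},
}

def detailing_items(phase, model_members):
    """Return members whose modeled LOD falls short of the phase target."""
    targets = LOD_BY_PHASE[phase]
    return {m: lod for m, lod in targets.items()
            if model_members.get(m, 0) < lod}

print(detailing_items("Construction Document (CD)", {"duct": 300, "pipe": 350}))
```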


A Generation from Entity-Relationship Model to XML Schema Model (개체-관계 모델에선 XML Schema의 생성)

  • Kim, Chang-Suk; Kim, Dae-Su; Son, Dong-Cheul
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.6 / pp.667-673 / 2004
  • XML is emerging as the standard language for data exchange on the Web, so the demand for XML Schema (the W3C XML Schema specification), which validates XML documents, is increasing. However, despite its varied data types and abundant expressiveness, XML Schema is difficult to design because of its complexity. This paper shows a simple way to design XML Schema using a fundamental tool of database design, the Entity-Relationship model. The conversion from the Entity-Relationship model to XML Schema cannot be done directly because of the discordance between the two models, so we present algorithms that generate XML Schema from the Entity-Relationship model. The algorithms produce XML Schema code using a hierarchical view representation. An important objective of this automatic generation is to preserve XML Schema's characteristics, such as reusability, global and local declarations, extensibility, and support for various type changes.
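A minimal sketch of the generation idea for a single entity, assuming a toy entity description; the paper's full algorithms (relationships, hierarchical views) are not reproduced here.

```python
# Sketch: emitting an XML Schema complexType for one entity of an ER model.
# The entity description and target schema shape are illustrative only.
entity = {"name": "Book", "attributes": [("title", "xs:string"),
                                         ("isbn", "xs:string"),
                                         ("year", "xs:integer")]}

def entity_to_xsd(e):
    lines = [f'<xs:element name="{e["name"]}">',
             "  <xs:complexType>",
             "    <xs:sequence>"]
    for attr, xsd_type in e["attributes"]:
        lines.append(f'      <xs:element name="{attr}" type="{xsd_type}"/>')
    lines += ["    </xs:sequence>",
              "  </xs:complexType>",
              "</xs:element>"]
    return "\n".join(lines)

print(entity_to_xsd(entity))
```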

A Study on Mapping Relations between eBook Contents for Conversion (전자책 문서 변환을 위한 컨텐츠 대응 관계에 관한 연구)

  • 고승규; 임순범; 김성혁; 최윤철
    • The Journal of Society for e-Business Studies / v.8 no.2 / pp.99-111 / 2003
  • Thanks to the diverse advantages of digital media, eBooks are coming into use, and many market research agencies have predicted that the eBook market will soon expand greatly. Contrary to those expectations, however, copyright problems and the difficulty of accessing content in the various eBook formats have become obstacles to diffusion. The first problem can be solved by DRM technology. To solve the second, each nation has published its own standard content format, but these domestic standards are useful only at the domestic level and leave the problem unsolved internationally. The variety of content formats has created a demand for mechanisms that allow the exchange of eBook contents. We therefore study the mapping relations between eBook contents for conversion. To define the mapping relations, we first extract the mappings both between eBook contents and between general XML documents. From these mappings, we define seven mapping relations and classify them by cardinality, and we analyze which of the classified relations can be converted automatically and which cannot. Using these results, we also classify eBook content conversion as automatic, semi-automatic, or manual. In addition, we provide conversion templates for the mapping relations so that conversion scripts can be generated automatically. To show the feasibility of the conversion templates, we applied them to eBook content conversion; experiments show that our templates generate conversion scripts properly. We expect that the defined mapping relations and conversion templates can be used not only for eBook content conversion but also for general XML document conversion.
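A toy sketch of classifying element mappings by cardinality and deciding the conversion mode; the mappings and the cardinality-to-mode rule below are invented illustrations of the paper's automatic/semi-automatic/manual classification.

```python
# Sketch: classifying eBook-format element mappings by cardinality and
# deciding whether conversion can be automated. Mappings are invented.
mappings = {
    ("title", ("dc:title",)): "1:1",
    ("author", ("creator", "contributor")): "1:N",
    ("sidebar", ()): "1:0",
}

def conversion_mode(cardinality):
    # 1:1 converts automatically; 1:N needs a template rule (semi-automatic);
    # 1:0 (no counterpart in the target format) must be handled manually.
    return {"1:1": "automatic", "1:N": "semi-automatic"}.get(cardinality, "manual")

for (src, dst), card in mappings.items():
    print(src, "->", dst, ":", conversion_mode(card))
```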


Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil; Ko, Eunjung; Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.141-166 / 2019
  • Channels like social media and SNS create enormous amounts of data, and the portion of unstructured data represented as text has grown geometrically. Because it is difficult to read through all of this text, it is important to access the data rapidly and grasp its key points. To meet this need, many studies on text summarization for handling and using tremendous amounts of text data have been proposed; in particular, many summarization methods using machine learning and artificial intelligence algorithms, collectively called "automatic summarization", have recently been proposed to generate summaries objectively and effectively. However, almost all text summarization methods proposed to date build the summary around the most frequent contents of the original documents. Such summaries tend to omit low-weight subjects that are mentioned less often; the summary becomes biased toward the major subjects, information is lost, and it becomes hard to ascertain every subject the documents cover. This bias can be avoided by summarizing with a balance across the topics a document contains, so that every subject is represented, but an imbalance in the distribution across subjects may still remain. To retain subject balance in a summary, it is necessary to consider the proportion of each subject in the original documents and to allocate summary space accordingly, so that even the sentences of minor subjects are sufficiently included. In this study, we propose a "subject-balanced" text summarization method that preserves the balance among all subjects and minimizes the omission of low-frequency subjects. We use two summary evaluation criteria, "completeness" and "succinctness": completeness means that the summary should fully include the contents of the original documents, and succinctness means that the summary should contain minimal duplication. The proposed method has three phases. The first phase constructs subject term dictionaries. Topic modeling is used to calculate topic-term weights that indicate how strongly each term is related to each topic; from these weights, the terms highly related to each topic can be identified, and the subjects of the documents can be found from topics composed of terms with similar meanings. A few terms that represent each subject well, called "seed terms", are then selected. Because the seed terms alone are too few to characterize a subject, Word2Vec is used for word expansion: after training, the cosine similarity between word vectors identifies terms similar to the seed terms (the higher the cosine similarity between two terms, the stronger their relationship), and the expanded terms, after filtering, form the subject dictionary. The second phase allocates a subject to every sentence of the original documents. Frequency analysis is first conducted on the terms in the subject dictionaries, and a TF-IDF weight per subject is calculated for each sentence to measure how much the sentence is about that subject. Because TF-IDF weights can grow without bound, they are normalized to values between 0 and 1; each sentence is then assigned the subject with the maximum normalized TF-IDF weight, yielding a sentence group for each subject. The last phase is summary generation. Sen2Vec is used to compute the similarity between subject sentences and form a similarity matrix; by repeatedly selecting sentences, a summary can be generated that fully covers the contents of the original documents while minimizing duplication within the summary itself. For evaluation, 50,000 TripAdvisor reviews were used to construct the subject dictionaries and 23,087 reviews were used to generate summaries. A comparison between summaries from the proposed method and frequency-based summaries verified that the proposed method better retains the subject balance that the documents originally have.
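A compressed sketch of the subject-allocation and quota steps (the second and third phases), assuming the topic-modeling, Word2Vec, and Sen2Vec stages have already produced per-subject sentence scores; all subjects and numbers are toy values.

```python
# Sketch: the subject-allocation and quota steps of the proposed pipeline.
# Subject dictionaries and sentence scores are toy stand-ins; the paper's
# topic-modeling, Word2Vec, and Sen2Vec stages are assumed to have run.
import numpy as np

subjects = ["room", "food", "service"]
# Rows: sentences; columns: per-subject TF-IDF scores (toy values).
scores = np.array([[0.9, 0.1, 0.0],
                   [0.2, 0.8, 0.1],
                   [0.0, 0.7, 0.2],
                   [0.1, 0.0, 0.6],
                   [0.8, 0.2, 0.1]])

# Normalize each subject's scores to [0, 1], since raw TF-IDF is unbounded.
norm = scores / scores.max(axis=0)
assigned = norm.argmax(axis=1)              # subject with maximum weight

# Allocate summary slots in proportion to each subject's sentence share.
summary_len = 3
share = np.bincount(assigned, minlength=len(subjects)) / len(assigned)
quota = np.maximum(1, np.round(share * summary_len).astype(int))
for s, q in zip(subjects, quota):
    print(s, "gets", q, "sentence(s)")
```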

Development of Template for Automatic Generation of Presentation Layer in J2EE-Based Web Applications (J2EE기반의 웹 애플리케이션을 위한 프리젠테이션 계층 자동생성 템플릿 개발)

  • 유철중; 채정화; 김송주; 장옥배
    • Journal of KIISE: Computing Practices and Letters / v.9 no.2 / pp.133-145 / 2003
  • Web applications based on J2EE (Java™ 2 Platform, Enterprise Edition) emerged as a solution to overcome the limitations in time and space of earlier applications. Recently, many framework-based solutions have been suggested for developing applications more quickly and efficiently. In this paper, we propose a template for the processes and types that must be handled in the presentation layer of web applications. The idea is based on the fact that in a layered architecture, developers can concentrate on their specific tasks independently. The template is an XML document that describes the presentation layer of the Web application the user wants to compose. The template is fed to a code generator, which parses the XML and automatically generates skeleton code for the presentation layer. Because the skeleton code inherits from the hot-spot classes of the framework, Web applications can be developed more efficiently. Using this template and code generator, developers can build Web applications with little practice and can more easily cooperate with other developers in delivering them on time by following the standard development process.
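A minimal sketch of the template-to-skeleton idea, with an invented template schema; for brevity the generator emits Python-style skeletons rather than the paper's J2EE presentation-layer classes.

```python
# Sketch: a code generator that parses an XML presentation-layer template
# and emits skeleton code. The template schema and output shape are invented.
import xml.etree.ElementTree as ET

TEMPLATE = """
<presentation>
  <page name="LoginPage" extends="BasePage">
    <action name="submit"/>
    <action name="reset"/>
  </page>
</presentation>
"""

def generate_skeleton(template_xml):
    root = ET.fromstring(template_xml)
    out = []
    for page in root.iter("page"):
        # Each page inherits from a framework "hot spot" class.
        out.append(f'class {page.get("name")}({page.get("extends")}):')
        for action in page.iter("action"):
            out.append(f'    def {action.get("name")}(self):')
            out.append("        pass  # TODO: fill in handler logic")
    return "\n".join(out)

print(generate_skeleton(TEMPLATE))
```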

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan; Han, Nam-Gi; Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive amounts of data generation, thereby greatly influencing society. This is an unmatched phenomenon in history, and now we live in the Age of Big Data. SNS data satisfies the definition of Big Data in the amount of data (volume), the speed of data input and output (velocity), and the variety of data types (variety). If someone intends to discover the trend of an issue in SNS Big Data, this information can be used as an important new source for the creation of new value, because it covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) provide the topic keyword set corresponding to the daily ranking; (2) visualize the daily time-series graph of a topic over a month; (3) convey the importance of a topic through a treemap based on a score system and frequency; (4) visualize the daily time-series graph of a keyword via keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to process a large amount of real-time data rapidly, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale from single-node computing to thousands of machines. Furthermore, we use MongoDB, an open-source, document-oriented NoSQL database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables; its most important goals are data accessibility and data-processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine data easily and clearly, so TITS uses the d3.js library, which is designed for creating Data-Driven Documents that bind the document object model (DOM) to data, making interaction easy and managing real-time data streams with smooth animation. TITS also uses Bootstrap, with its pre-configured style sheets and JavaScript plug-ins, to build the web system; the TITS Graphical User Interface (GUI) built with these libraries can surface issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the quality of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS), and confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
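A small sketch of the daily topic-keyword extraction step using LDA topic modeling on toy tweets; the paper's Hadoop/MongoDB storage and d3.js visualization layers are outside the scope of this sketch.

```python
# Sketch: extracting daily topic keyword sets from tweets with LDA.
# Toy tweets; the paper's Hadoop/MongoDB/d3.js stack is not reproduced.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

tweets = [
    "election debate candidate vote tonight",
    "vote early election ballot candidate",
    "big game tonight team score win",
    "team win season score playoffs",
]
vec = CountVectorizer()
X = vec.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, weights in enumerate(lda.components_):
    top = [terms[i] for i in weights.argsort()[::-1][:3]]
    print(f"topic {k}:", top)
```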