• Title/Summary/Keyword: Weight by Paragraph

A Method for Measuring Similarity Measure of Thesaurus Transformation Documents using DBSCAN (DBSCAN을 활용한 유의어 변환 문서 유사도 측정 방법)

  • Kim, Byeongsik; Shin, Juhyun
    • Journal of Korea Multimedia Society, v.21 no.9, pp.1035-1043, 2018
  • In some cases, the core content of another person's work is passed off as the writer's own ideas by rewording it without citing the source. The free CopyKiller service, used for plagiarism checks, flags plagiarism when six or more words match. However, a six-word match is not enough to judge plagiarism when words have been replaced with similar words. Therefore, in this paper, we construct word clusters with the DBSCAN algorithm to find synonyms, convert the words in each cluster into a representative synonym, and construct L-R tables through L-R parsing. We then propose a method for determining document similarity by applying weights to the thesaurus and to each paragraph of the thesis.
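
The synonym-normalization idea above can be illustrated with a minimal Python sketch. It assumes hypothetical 2-D word vectors (`EMBEDDINGS`), clusters them with a plain DBSCAN implementation, maps each clustered word to a representative (here, arbitrarily, the alphabetically first word in its cluster), and compares a six-word-shingle overlap before and after normalization. The `eps`/`min_pts` values, the overlap measure, and every name are illustrative assumptions; the paper's L-R table construction and thesaurus/paragraph weighting are not reproduced.

```python
import math

# Hypothetical 2-D word vectors (assumption: a real system would use learned
# embeddings); synonyms are placed close together on purpose.
EMBEDDINGS = {
    "big": (0.90, 0.10), "large": (0.92, 0.12), "huge": (0.88, 0.14),
    "fast": (0.10, 0.90), "quick": (0.12, 0.88),
    "house": (0.50, 0.50),
}


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN over a dict of name -> vector; -1 marks noise."""
    def neighbours(p):
        return [q for q in points if math.dist(points[p], points[q]) <= eps]

    labels, cluster = {}, 0
    for p in points:
        if p in labels:
            continue
        seeds = neighbours(p)
        if len(seeds) < min_pts:
            labels[p] = -1  # noise for now; may become a border point later
            continue
        labels[p] = cluster
        queue = [q for q in seeds if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == -1:
                labels[q] = cluster  # border point: joins, but is not expanded
            if q in labels:
                continue
            labels[q] = cluster
            q_seeds = neighbours(q)
            if len(q_seeds) >= min_pts:  # q is a core point: keep expanding
                queue.extend(q_seeds)
        cluster += 1
    return labels


LABELS = dbscan(EMBEDDINGS, eps=0.05, min_pts=2)

# Representative synonym per cluster: the alphabetically first word
# (an arbitrary choice for illustration).
REPRESENTATIVE = {}
for word, label in LABELS.items():
    if label != -1:
        REPRESENTATIVE[label] = min(REPRESENTATIVE.get(label, word), word)


def normalize(text):
    """Replace every clustered word with its cluster's representative."""
    out = []
    for w in text.split():
        label = LABELS.get(w, -1)
        out.append(REPRESENTATIVE[label] if label != -1 else w)
    return " ".join(out)


def six_word_overlap(a, b, n=6):
    """Share of a's n-word shingles that also occur in b."""
    def shingles(t):
        words = t.split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa) if sa else 0.0


original = "they built a big house near the fast river last year"
suspect = "they built a large house near the quick river last year"
print(six_word_overlap(original, suspect))                        # 0.0
print(six_word_overlap(normalize(original), normalize(suspect)))  # 1.0
```

Every six-word window in the rewritten sentence contains a substituted word, so the raw overlap is 0.0; after both sentences are normalized to representative synonyms, it becomes 1.0.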

XML Document Keyword Weight Analysis based Paragraph Extraction Model (XML 문서 키워드 가중치 분석 기반 문단 추출 모델)

  • Lee, Jongwon; Kang, Inshik; Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering, v.21 no.11, pp.2133-2138, 2017
  • Existing analysis of XML and other documents has centered on words. Such analysis can be implemented with a morpheme analyzer, which classifies the many words in a document but cannot capture the document's core content. For a user to understand a document efficiently, the paragraphs containing its main words must be extracted and presented. The proposed system retrieves keywords from the normalized XML document. It then extracts the paragraphs containing the keyword the user enters as a search term and displays them. In addition, the system reports the frequency and weight of the search keyword, preserves the order of the extracted paragraphs, and removes redundant paragraphs so that the user can follow the document. Because the user does not have to read the whole document, the proposed system minimizes the time and effort required to understand it.
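
A minimal Python sketch of the extraction step described above, assuming a hypothetical normalized XML layout with one `<p>` element per paragraph. It returns the paragraphs that contain the search keyword in document order with verbatim duplicates removed, plus the keyword's frequency and a weight. The weight here is simply the keyword's share of all words in the document, an assumption since the abstract does not define it; whitespace tokenization also stands in for a morpheme analyzer.

```python
import xml.etree.ElementTree as ET

# Hypothetical normalized XML layout: one <p> element per paragraph.
DOC = """<doc>
  <p>XML documents store structured text.</p>
  <p>Keyword search finds paragraphs quickly.</p>
  <p>Keyword search finds paragraphs quickly.</p>
  <p>A keyword weight reflects how often the keyword appears.</p>
  <p>Unrelated closing remarks.</p>
</doc>"""


def tokens(text):
    # Lower-case and strip simple punctuation (stand-in for a morpheme analyzer).
    return [w.strip(".,;:!?").lower() for w in text.split()]


def extract(xml_text, keyword):
    root = ET.fromstring(xml_text)
    paragraphs = [p.text or "" for p in root.iter("p")]
    keyword = keyword.lower()

    seen, hits = set(), []
    for para in paragraphs:              # document order is preserved
        if keyword in tokens(para) and para not in seen:
            seen.add(para)               # drop verbatim duplicates
            hits.append(para)

    # Frequency is counted over the whole document, duplicates included.
    all_words = [w for para in paragraphs for w in tokens(para)]
    frequency = all_words.count(keyword)
    # Assumed weight: the keyword's share of all words in the document.
    weight = frequency / len(all_words) if all_words else 0.0
    return hits, frequency, weight


hits, freq, weight = extract(DOC, "keyword")
print(len(hits), freq)  # 2 4
```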

User Profile Generation using Visual Differences of HTML Document (HTML 문서의 시각적 분석을 이용한 사용자 프로파일 생성)

  • Gwak, Ju-Hyeon; Lee, Chang-Hun
    • The Transactions of the Korea Information Processing Society, v.7 no.6, pp.1827-1833, 2000
  • In this study, I propose a way to improve how web agents find the web documents that users prefer. To learn users' preferences, web agents employ TFIDF, which treats every word used in a document as equally important. Web documents such as HTML pages, however, create visual differences by using different font sizes and highlighting according to the importance of words. In this study, I attempt to improve web agents by weighting each word according to the visual importance of the paragraph in which it appears. To this end, I propose building a profile from each paragraph and consolidating the profiles afterward. I evaluated the proposed algorithms by comparing them with the established TFIDF algorithm in terms of how well they help users find the documents they prefer.
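
The visual-weighting idea can be sketched in Python with the standard `html.parser` module. This is an illustrative approximation, not the author's algorithm: each word's count is scaled by the heaviest weight among the tags enclosing it (the `TAG_WEIGHT` values are invented), and the scaled counts are combined with an ordinary IDF over a small hypothetical document set. Per-paragraph profile construction and consolidation are not shown.

```python
import math
from collections import Counter
from html.parser import HTMLParser

# Hypothetical visual weights per enclosing tag (assumption, not from the paper).
TAG_WEIGHT = {"h1": 3.0, "h2": 2.0, "b": 1.5, "strong": 1.5}


class VisualCounter(HTMLParser):
    """Counts words, scaling each by the largest weight among its open tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            # Pop back to (and including) the matching open tag.
            while self.open_tags and self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        w = max([TAG_WEIGHT.get(t, 1.0) for t in self.open_tags], default=1.0)
        for word in data.lower().split():
            self.counts[word] += w


def weighted_tf(html):
    parser = VisualCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


def tfidf(docs):
    """Visually weighted TF multiplied by plain IDF, per document."""
    tfs = [weighted_tf(d) for d in docs]
    n = len(docs)
    df = Counter(word for tf in tfs for word in tf)  # documents containing word
    return [{w: c * math.log(n / df[w]) for w, c in tf.items()} for tf in tfs]


docs = [
    "<h1>jazz</h1><p>concert tickets</p>",
    "<p>jazz concert</p><p><b>tickets</b></p>",
    "<p>football scores</p>",
]
scores = tfidf(docs)
```

In this example "jazz" scores three times higher in the first document, where it sits in an `<h1>` heading, than in the second, although each document contains it once; plain TFIDF would score the two equally.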

Keyword Weight based Paragraph Extraction Algorithm (문단 가중치 분석 기반 본문 영역 선정 알고리즘)

  • Lee, Jongwon; Yu, Seongjong; Kim, Doan; Jung, Hoekyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2018.05a, pp.462-463, 2018
  • Traditional document analysis systems performed word-based analysis with a morphological analyzer or the TF-IDF technique. These systems can derive the main keywords by calculating keyword weights, but their structure makes them unsuitable for analyzing the content of documents. To solve this problem, the proposed algorithm calculates the weights of the paragraphs in the document and divides the paragraphs into areas. It then calculates the importance of each area and tells the user which area contains the most important paragraphs. The user is thus expected to receive a service better suited to document analysis than existing document analysis systems provide.
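
A minimal Python sketch of paragraph weighting and area selection, under stated assumptions: a keyword's weight is its share of all keyword occurrences, a paragraph's weight is the sum of the weights of the keyword occurrences it contains, areas are contiguous runs of nearly equal size, and an area's importance is the sum of its paragraph weights. None of these formulas is given in the abstract; they are placeholders for illustration.

```python
from collections import Counter


def keyword_weights(paragraphs, keywords):
    """Assumed weight: each keyword's share of all keyword occurrences."""
    counts = Counter(w for p in paragraphs for w in p.lower().split() if w in keywords)
    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in keywords}


def split_into_areas(items, n_areas):
    """Split a list into n_areas contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), n_areas)
    areas, start = [], 0
    for i in range(n_areas):
        end = start + size + (1 if i < extra else 0)
        areas.append(items[start:end])
        start = end
    return areas


def most_important_area(paragraphs, keywords, n_areas):
    kw = keyword_weights(paragraphs, keywords)
    # Paragraph weight: sum of the weights of the keyword occurrences it holds.
    p_weights = [sum(kw.get(w, 0.0) for w in p.lower().split()) for p in paragraphs]
    importance = [sum(area) for area in split_into_areas(p_weights, n_areas)]
    best = max(range(n_areas), key=importance.__getitem__)
    return best, importance


PARAGRAPHS = [
    "intro text",
    "related work",
    "paragraph weight matters",
    "weight of each paragraph",
    "weight only",
    "closing notes",
]
best, importance = most_important_area(PARAGRAPHS, {"paragraph", "weight"}, n_areas=3)
print(best)  # 1
```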

Keyword Weight based Paragraph Extraction Algorithm (키워드 가중치 기반 문단 추출 알고리즘)

  • Lee, Jongwon; Joo, Sangwoong; Lee, Hyunju; Jung, Hoekyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2017.10a, pp.504-505, 2017
  • Existing morpheme analyzers classify the words used in a document, and systems that extract sentences and paragraphs based on morpheme analyzers are being developed. However, very few systems compress documents by extracting their important paragraphs. The algorithm proposed in this paper calculates the weights of the keywords that appear in the document and extracts the paragraphs containing those keywords. By reading only these paragraphs instead of the entire document, users can reduce the time needed to understand it. In addition, because the number of extracted paragraphs varies with the number of keywords used in the search, users can search a wider variety of patterns than with existing systems.
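
The effect described in the last sentence, that the number of extracted paragraphs changes with the number of search keywords, can be shown with a short Python sketch. The keyword weight used here (frequency divided by the most frequent keyword's frequency) and the rule of extracting any paragraph that contains at least one keyword are hypothetical choices, not details from the paper.

```python
from collections import Counter

PARAGRAPHS = [
    "morpheme analyzers split text into words",
    "keyword extraction finds important terms",
    "paragraph extraction keeps whole paragraphs",
    "compression shortens the document",
]


def keyword_weights(paragraphs, keywords):
    """Hypothetical weight: frequency relative to the most frequent keyword."""
    wanted = set(keywords)
    counts = Counter(w for p in paragraphs for w in p.split() if w in wanted)
    top = max(counts.values(), default=0)
    return {k: (counts[k] / top if top else 0.0) for k in keywords}


def extract_paragraphs(paragraphs, keywords):
    """Paragraphs containing at least one keyword, in document order."""
    wanted = set(keywords)
    return [p for p in paragraphs if wanted & set(p.split())]


one = extract_paragraphs(PARAGRAPHS, ["extraction"])
two = extract_paragraphs(PARAGRAPHS, ["extraction", "words"])
weights = keyword_weights(PARAGRAPHS, ["extraction", "words"])
print(len(one), len(two), weights)  # 2 3 {'extraction': 1.0, 'words': 0.5}
```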

A Study on the Acoustic Performance Indication Standards of Apartment Housing Performance Grade Indication System (공동주택 성능등급 표시제도 상의 음성능 표시기준 고찰)

  • Yang, Kwan-Seop; Kim, Kyoung-Woo
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference, 2006.05a, pp.1252-1255, 2006
  • Since January 2006, the government has enforced the Housing Performance Grade Indication System (Article 21, Paragraph 2 of the Housing Act). The system aims to give prospective apartment buyers the opportunity to choose housing according to their personal preferences, by providing information on housing performance in the tenant recruitment announcement, and to secure a desirable (comfortable) living environment by encouraging construction companies to build housing at the indicated performance level. The acoustic performance indication covers three items: floor impact sound insulation (lightweight and heavyweight impact sound), bathroom noise, and sound insulation of the boundary walls between households. Focusing on acoustic environment performance, this paper explains the background of the standards, the basis on which they were created, and the evaluation methods, to help technology developers and construction staff who must comply with this system.

Documents of Air Carriage (항공운송증권(航空運送證卷))

  • Choi, June-sun
    • The Korean Journal of Air & Space Law and Policy, v.7, pp.101-134, 1995
  • Article 3 Paragraph 1 of the Warsaw Convention sets out the requirements for passenger tickets, Article 4 Paragraph 3 those for baggage checks, and Article 8 those for air waybills. In this article the writer discusses the legal nature of the documents of air carriage, such as air waybills, passenger tickets, and baggage checks, as well as several issues relating to the use of these documents under the Warsaw Convention. Article 3 Paragraph 2, Article 4 Paragraph 4, and Article 9 provide that the carrier shall not be entitled to avail himself of the provisions of the Convention which exclude or limit his liability. In particular, the Montreal Agreement of 1966 provides that the notice of the carrier's liability on the passenger ticket must be printed in type of at least 10 points, in ink of a contrasting color. The question, however, is whether the carrier loses the right to avail himself of the liability limit under the Convention when the type size is below 10 points. The Convention does not specify the type size of any part of the passenger ticket; it provides only that the carrier shall not be entitled to avail himself of the liability limit when he fails to deliver the ticket to the passenger. However, since the purpose of delivering the passenger ticket is to give passengers the opportunity to recognize the liability limit under the Convention and to plan subsequent measures, a carrier who fails to give this opportunity should not be entitled to avail himself of the liability limit under the Convention. Some decisions hold that when the notice of the carrier's liability limit is printed in fine print in a barely noticeable place, the carrier is not entitled to avail himself of the liability limit under the Convention. Most decisions, however, hold that the carrier is entitled to avail himself of the Convention's liability limit regardless of type size.
The reason is that neither the Warsaw Convention nor the Montreal Agreement stipulates that a carrier who violates the notice requirement is deprived of the right to avail himself of the Convention's liability limit. In particular, the main objective of the Montreal Agreement is not the notice of the liability limit but its increase. The latest decisions maintain the same view. The issue appears to have been settled by Elisa Chan, et al. v. Korean Airlines Ltd., in which the U.S. Supreme Court, after carefully interpreting the Warsaw Convention and the Montreal Agreement and stressing that neither provides any grounds for such a sanction, held that the type size of the passenger ticket cannot be a matter of dispute since it is not required by law. Type size can therefore hardly serve as grounds for denying the carrier the liability limit, and any challenge based on type size now seems bound to fail. The same issue can be raised for air waybills and baggage checks, but only for carriage to which the original Convention applies. It creates no problem under the Convention as amended by the Hague Protocol, because the Hague Protocol does not require any information on the weight, volume, size, or number of pieces of cargo or baggage. The question here is whether the carrier is entitled to avail himself of the Convention's liability limit when the number or weight of the consigned packages is not stated as required by Article 4 of the Convention. Currently, the majority of decisions take the affirmative view: the carrier remains entitled to the liability limit even when the required information on the number and weight of the consigned packages is omitted, because these requirements are too technical and insubstantial.
Some decisions, however, hold the opposite. They reason that the provisions of Article 4 of the Convention are clear, that their meaning and effect should be applied literally, and that requiring a carrier to meet the minimum requirements prescribed in the Convention is neither unjust nor overly technical. To date, the U.S. Supreme Court has not ruled on this issue.

Deep Learning Document Analysis System Based on Keyword Frequency and Section Centrality Analysis

  • Lee, Jongwon; Wu, Guanchen; Jung, Hoekyung
    • Journal of Information and Communication Convergence Engineering, v.19 no.1, pp.48-53, 2021
  • Herein, we propose a document analysis system that analyzes papers or reports converted into XML (Extensible Markup Language) format. The system reads the document specified by the user, extracts keywords from it, and compares keyword frequencies to select the top three keywords. It keeps the paragraphs containing these keywords in their original order and removes duplicated paragraphs. The frequency of the top three keywords in the extracted paragraphs is then re-verified, and the paragraphs are partitioned into 10 sections. Next, the importance of each section is calculated and compared. By notifying the user of the section with the highest frequency and of the sections whose importance exceeds the average, the system lets the user read only the main content instead of the entire document. In addition, a deep learning model predicts the number of extracted paragraphs and the number of paragraphs in the high-importance sections.
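
The pipeline in this abstract, apart from the deep learning prediction step, can be sketched in Python. Assumptions: whitespace tokenization with a small hypothetical stop-word list, section importance measured as the number of top-three keyword occurrences in the section, and contiguous sections of nearly equal size. The example uses 5 sections instead of 10 only to keep the input short.

```python
from collections import Counter

STOPWORDS = {"a", "and", "in", "is", "of", "the", "to"}  # hypothetical list


def top_keywords(paragraphs, k=3):
    counts = Counter(w for p in paragraphs for w in p.lower().split()
                     if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(k)]  # ties keep first-seen order


def analyze(paragraphs, n_sections=10):
    keys = top_keywords(paragraphs)
    key_set = set(keys)
    # Paragraphs with a top keyword, in order; dict.fromkeys drops duplicates.
    kept = list(dict.fromkeys(p for p in paragraphs
                              if key_set & set(p.lower().split())))
    # Re-verify top-keyword frequency inside each kept paragraph.
    freq = [sum(p.lower().split().count(k) for k in keys) for p in kept]
    # Contiguous sections of nearly equal size; importance = keyword hits.
    size, extra = divmod(len(freq), n_sections)
    sections, start = [], 0
    for i in range(n_sections):
        end = start + size + (1 if i < extra else 0)
        sections.append(sum(freq[start:end]))
        start = end
    average = sum(sections) / n_sections
    best = max(range(n_sections), key=sections.__getitem__)
    above_average = [i for i, s in enumerate(sections) if s > average]
    return keys, best, above_average


PARAGRAPHS = [
    "keyword section xml",
    "filler text here",
    "keyword keyword keyword",
    "keyword section xml",      # verbatim duplicate
    "section notes",
    "xml tree",
    "closing words",
    "section section",
]
keys, best, above = analyze(PARAGRAPHS, n_sections=5)
print(keys, best, above)  # ['keyword', 'section', 'xml'] 0 [0, 1]
```

Using `dict.fromkeys` keeps the first occurrence of each paragraph, so duplicates are removed without disturbing document order.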

Development of a Mobile Application for Disease Prediction Using Speech Data of Korean Patients with Dysarthria (한국인 구음장애 환자의 발화 데이터 기반 질병 예측을 위한 모바일 애플리케이션 개발)

  • Changjin Ha; Taesik Go
    • Journal of Biomedical Engineering Research, v.45 no.1, pp.1-9, 2024
  • Communication with others plays an important role in social interaction and information exchange in modern society. However, some individuals have difficulty communicating because of dysarthria, so effective diagnostic techniques are needed for its early treatment. In the present study, we propose a mobile-device-based method that automatically classifies the type of dysarthria. A lightweight CNN model was trained on an open audio dataset of Korean patients with dysarthria, and it classified dysarthria into the related disease subtypes with 78.8% to 96.6% accuracy. In addition, a user-friendly mobile application was developed based on the trained CNN model. Users can easily record their voices according to the selected inspection type (e.g., word, sentence, paragraph, or semi-free speech) and evaluate the recorded voice data on their mobile device with the application. The proposed technique could help with personal management of dysarthria and with clinical decision-making.

Document Analysis based Main Requisite Extraction System (문서 분석 기반 주요 요소 추출 시스템)

  • Lee, Jongwon; Yeo, Ilyeon; Jung, Hoekyung
    • Journal of the Korea Institute of Information and Communication Engineering, v.23 no.4, pp.401-406, 2019
  • In this paper, we propose a system for analyzing papers and reports in XML format. The system extracts keywords from the paper or report and shows them to the user; when the user enters the keywords to search for within the document, it extracts the paragraphs containing those keywords. The system checks the frequency of the keywords entered by the user, calculates their weights, and removes paragraphs that contain only the lowest-weight keywords. We then divide the refined paragraphs into 10 regions, calculate the importance of the paragraphs in each region, compare the regions, and inform the user of the main region with the highest importance. With these features, the proposed system can present the main paragraphs at a higher compression ratio than existing document analysis systems achieve on the same papers or reports, reducing the time required to understand a document.
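
The refinement step, removing paragraphs that contain only the lowest-weight keyword, can be sketched in Python. The sketch reads "only keywords with the lowest weight" as "every matched keyword is the lowest-weight one" and assumes a keyword's weight is its share of all keyword occurrences; both are interpretations rather than details from the paper. Region partitioning is omitted.

```python
from collections import Counter


def refine(paragraphs, keywords):
    """Weight keywords by share of occurrences, then drop paragraphs whose
    only matched keyword is the lowest-weight one (hypothetical rule)."""
    keyword_set = set(keywords)
    counts = Counter(w for p in paragraphs for w in p.lower().split()
                     if w in keyword_set)
    total = sum(counts.values()) or 1          # avoid division by zero
    weights = {k: counts[k] / total for k in keywords}
    lowest = min(keywords, key=weights.__getitem__)

    refined = []
    for p in paragraphs:
        matched = set(p.lower().split()) & keyword_set
        # Keep paragraphs that match some keyword other than the weakest one.
        if matched and matched != {lowest}:
            refined.append(p)
    return weights, lowest, refined


PARAGRAPHS = [
    "region weight xml",
    "region again",
    "weight again",
    "weight region",
    "xml only",
    "no keywords",
]
weights, lowest, refined = refine(PARAGRAPHS, ["region", "weight", "xml"])
print(lowest, len(refined))  # xml 4
```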