• Title/Abstract/Keywords: Weight by Paragraph

Search results: 13 items (processing time 0.031 seconds)

DBSCAN을 활용한 유의어 변환 문서 유사도 측정 방법 (A Method for Measuring Similarity Measure of Thesaurus Transformation Documents using DBSCAN)

  • 김병식;신주현
    • 한국멀티미디어학회논문지
    • /
    • Vol. 21, No. 9
    • /
    • pp.1035-1043
    • /
    • 2018
  • There are cases in which the core content of another person's work is presented as one's own ideas, with the wording changed and the source not shown. The plagiarism check of the free CopyKiller service flags plagiarism by comparing matches of six or more words. However, a six-word match is not sufficient for judging plagiarism when words have been replaced with similar words. Therefore, in this paper, we construct word clusters using the DBSCAN algorithm, find synonyms, convert the words in each cluster into a representative synonym, and construct L-R tables through L-R parsing. We then propose a method for determining the similarity of documents by applying weights to the thesaurus and weights for each paragraph of the thesis.
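
The clustering and synonym-replacement steps named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D word vectors, the `eps` and `min_pts` values, and the rule of taking the first word of each cluster as its representative are all assumptions made for illustration, and the L-R parsing and weighting steps are not covered.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point joins as border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels

def representative_map(words, labels):
    """Map each word to the first word seen in its cluster (assumed rule)."""
    first_in_cluster = {}
    mapping = {}
    for word, label in zip(words, labels):
        if label == -1:
            mapping[word] = word  # noise words are left unchanged
        else:
            mapping[word] = first_in_cluster.setdefault(label, word)
    return mapping

# Hypothetical toy embeddings: nearby vectors stand in for similar meanings.
words = ["big", "large", "huge", "small", "tiny", "little", "banana"]
vectors = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1),
           (5.0, 5.0), (5.1, 4.9), (4.9, 5.1), (9.0, 0.0)]

labels = dbscan(vectors, eps=0.5, min_pts=2)
mapping = representative_map(words, labels)
normalized = [mapping.get(token, token) for token in "a huge banana".split()]
print(labels)
print(normalized)
```

After this normalization, two documents that differ only in synonym choice produce the same token sequence, which is what lets an exact-match comparison catch reworded passages.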

XML 문서 키워드 가중치 분석 기반 문단 추출 모델 (XML Document Keyword Weight Analysis based Paragraph Extraction Model)

  • 이종원;강인식;정회경
    • 한국정보통신학회논문지
    • /
    • Vol. 21, No. 11
    • /
    • pp.2133-2138
    • /
    • 2017
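  • Existing analyses of XML and other documents have focused on individual words. Such analysis can be implemented with a morphological analyzer, but it only classifies the many words in a document and makes it difficult to grasp the document's core content. For users to understand a document efficiently, the paragraphs containing the key words should be extracted and shown to them. The system proposed in this paper searches for keywords within a normalized XML document, then extracts and displays the paragraphs that contain the keywords entered by the user. It also reports the frequency and weight of the keywords used in the search, and it minimizes errors that may arise in understanding the document by preserving the order of the extracted paragraphs and removing duplicates. The proposed system is expected to minimize the time and effort needed to understand a document, since users can understand it without reading it in full.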
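
The extraction flow described in this abstract can be sketched as follows. The abstract does not define the document schema or the weight formula, so this sketch assumes that paragraphs live in `<p>` elements and that a keyword's weight is its share of all keyword occurrences; the function name `extract_paragraphs` is hypothetical.

```python
import xml.etree.ElementTree as ET

def extract_paragraphs(xml_text, keywords):
    """Return (paragraphs containing any keyword, keyword frequency, keyword weight)."""
    root = ET.fromstring(xml_text)

    # Collect paragraph texts in document order, dropping exact duplicates.
    unique, seen = [], set()
    for p in root.iter("p"):  # assumption: paragraphs are <p> elements
        text = "".join(p.itertext()).strip()
        if text and text not in seen:
            seen.add(text)
            unique.append(text)

    lowered = [t.lower() for t in unique]
    # Frequency is a case-insensitive substring count over the unique paragraphs.
    freq = {k: sum(t.count(k.lower()) for t in lowered) for k in keywords}
    total = sum(freq.values())
    # Assumed weight: the keyword's share of all keyword occurrences.
    weights = {k: (freq[k] / total if total else 0.0) for k in keywords}

    selected = [t for t, low in zip(unique, lowered)
                if any(k.lower() in low for k in keywords)]
    return selected, freq, weights

doc = """<doc>
<p>XML documents store structured text.</p>
<p>Keyword weights guide paragraph extraction.</p>
<p>This sentence is unrelated.</p>
<p>Keyword weights guide paragraph extraction.</p>
<p>An XML parser reads each paragraph.</p>
</doc>"""

selected, freq, weights = extract_paragraphs(doc, ["xml", "paragraph"])
for text in selected:
    print(text)
print(freq, weights)
```

Removing duplicates before counting keeps a repeated paragraph from inflating a keyword's frequency; whether the original system counts in this order is not stated in the abstract.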

HTML 문서의 시각적 분석을 이용한 사용자 프로파일 생성 (User Profile Generation using Visual Differences of HTML Document)

  • 곽주현;이창훈
    • 한국정보처리학회논문지
    • /
    • Vol. 7, No. 6
    • /
    • pp.1827-1833
    • /
    • 2000
  • In this study, I suggest how to improve the ability of web agents to find the web documents users prefer. To find out users' preferences, web agents employ TFIDF, which treats all the words used in a document as equal in importance. Web documents such as HTML, however, create visual differences by using different letter sizes and highlighting according to the importance of words. In this study, I attempt to improve the performance of web agents by differentiating the weight of each word in accordance with the visual importance of each paragraph. To this end, I suggest building a profile from each paragraph and consolidating the profiles later. I tested the suggested algorithms by comparing them with the established TFIDF algorithm in terms of how well they help users find the documents they prefer.
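
The idea of building one weighted profile per paragraph and consolidating the profiles afterwards can be sketched as below. The tag-to-weight table and the whitespace tokenizer are assumptions made for illustration, and only the term-frequency part is shown (no IDF factor), so this is not the algorithm evaluated in the paper.

```python
from collections import Counter

# Hypothetical visual weights: more prominent markup counts for more.
VISUAL_WEIGHT = {"h1": 3.0, "b": 2.0, "p": 1.0}

def paragraph_profile(tag, text):
    """Term counts of one paragraph, scaled by the paragraph's visual weight."""
    weight = VISUAL_WEIGHT.get(tag, 1.0)
    counts = Counter(text.lower().split())
    return {term: n * weight for term, n in counts.items()}

def merge_profiles(profiles):
    """Consolidate per-paragraph profiles by summing the weighted counts."""
    merged = Counter()
    for profile in profiles:
        merged.update(profile)  # Counter.update adds values term by term
    return dict(merged)

# Each block is (tag, text); a real system would obtain these from an HTML parser.
blocks = [
    ("h1", "Jazz Festival"),
    ("p", "the festival features jazz and blues"),
]

profile = merge_profiles(paragraph_profile(tag, text) for tag, text in blocks)
print(profile)
```

With these assumed weights, a word in the heading contributes three times as much as the same word in body text, which is the kind of distinction plain term counting ignores.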


문단 가중치 분석 기반 본문 영역 선정 알고리즘 (Keyword Weight based Paragraph Extraction Algorithm)

  • 이종원;유성종;김도안;정회경
    • 한국정보통신학회:학술대회논문집
    • /
    • 한국정보통신학회 2018년도 춘계학술대회
    • /
    • pp.462-463
    • /
    • 2018
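  • Existing document analysis systems have performed word-centered analysis using morphological analyzers or the TF-IDF technique. These systems have the advantage of deriving the main keywords by calculating keyword weights. However, structural limitations make them unsuitable for analyzing the content of a document. To address this, the algorithm proposed in this paper calculates the weights of the paragraphs in a document and then divides the paragraphs into areas. It then calculates the importance of each area and informs the user of the area that contains the most important paragraphs in the document. As a result, users are expected to receive a service better suited to document analysis than existing document analysis systems provide.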
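
A minimal sketch of the area-selection idea follows. The abstract does not give the formulas, so the sketch assumes that a paragraph's weight is the sum of the weights of the keyword tokens it contains, that areas are contiguous chunks of near-equal size, and that an area's importance is the sum of its paragraph weights. All function names are hypothetical.

```python
def paragraph_weight(paragraph, keyword_weights):
    """Assumed scoring: keyword weight times its token count, summed over keywords."""
    tokens = paragraph.lower().split()
    return sum(w * tokens.count(k) for k, w in keyword_weights.items())

def split_into_areas(items, n_areas):
    """Split items into at most n_areas contiguous chunks of near-equal size."""
    if not items:
        return []
    n_areas = max(1, min(n_areas, len(items)))
    size, extra = divmod(len(items), n_areas)
    areas, start = [], 0
    for a in range(n_areas):
        end = start + size + (1 if a < extra else 0)
        areas.append(items[start:end])
        start = end
    return areas

def most_important_area(paragraphs, keyword_weights, n_areas):
    """Return (index of the highest-scoring area, list of area scores)."""
    weights = [paragraph_weight(p, keyword_weights) for p in paragraphs]
    scores = [sum(area) for area in split_into_areas(weights, n_areas)]
    if not scores:
        return None, []
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

paragraphs = [
    "this report has an introduction",
    "each paragraph gets a weight",
    "paragraph weight drives paragraph ranking",
    "unrelated remarks",
    "a weight is reported",
    "closing remarks",
]
keyword_weights = {"paragraph": 3, "weight": 2}  # hypothetical integer weights

best, scores = most_important_area(paragraphs, keyword_weights, n_areas=3)
print(best, scores)
```

Here the six paragraphs fall into three areas of two paragraphs each, and the middle area scores highest because it holds the paragraph with the densest keyword use.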


키워드 가중치 기반 문단 추출 알고리즘 (Keyword Weight based Paragraph Extraction Algorithm)

  • 이종원;주상웅;이현주;정회경
    • 한국정보통신학회:학술대회논문집
    • /
    • 한국정보통신학회 2017년도 추계학술대회
    • /
    • pp.504-505
    • /
    • 2017
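  • Existing morphological analyzers classify the words used in a document. Systems that extract sentences and paragraphs on this basis are being developed, but systems that condense a document and extract its main paragraphs remain very limited. The algorithm proposed in this paper calculates the weights of the keywords used in a document and extracts the paragraphs that contain those keywords. Reading only the paragraphs that contain the keywords, rather than the whole document, reduces the time needed to understand it. In addition, because the number of extracted paragraphs varies with the number of keywords used in the search, users can search in more varied patterns than with existing systems.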


공동주택 성능등급 표시제도 상의 음성능 표시기준 고찰 (A Study on the Acoustic Performance Indication Standards of Apartment Housing Performance Grade Indication System)

  • 양관섭;김경우
    • 한국소음진동공학회:학술대회논문집
    • /
    • 한국소음진동공학회 2006년도 춘계학술대회논문집
    • /
    • pp.1252-1255
    • /
    • 2006
  • The government has enforced the Housing Performance Grade Indication System (Article 21, Paragraph 2 of the Housing Act) since January 2006. Its purpose is to give prospective apartment buyers the opportunity to select housing based on personal preferences by providing information on housing performance at the time of the tenant recruitment announcement, and to secure a desirable (comfortable) environment by encouraging construction companies to build housing at the indicated performance level. The acoustic performance indication covers three items: floor impact isolation performance (lightweight impact sound and heavyweight impact sound), bathroom noise, and insulation performance of boundary walls between households. This paper explains the background, the basis of creation, and the evaluation method, focusing on acoustic environment performance, to help technology developers and construction staff who must deal with this system.


항공운송증권(航空運送證卷) (Documents of Air Carriage)

  • 최준선
    • 항공우주정책ㆍ법학회지
    • /
    • Vol. 7
    • /
    • pp.101-134
    • /
    • 1995
  • Article 3 Paragraph 1 of the Warsaw Convention regulates the requirements of passenger tickets, Article 4 Paragraph 3 the requirements of baggage tickets, and Article 8 the requirements of air waybills. In this article the writer has discussed the legal nature of the documents of air carriage, such as air waybills, passenger tickets and baggage checks. Further, the writer has also discussed several issues relating to the use of the documents of air carriage under the Warsaw Convention. Article 3 Paragraph 2, as well as Article 4 Paragraph 4 and 9, provides that the carrier shall not be entitled to avail himself of the provisions of the Convention which exclude or limit his liability. In particular, the Montreal Agreement of 1966 provides that the notification of the carrier's liability in the passenger ticket should be printed in at least 10-point type with contrasting ink colors. However, another question is whether the carrier shall not be entitled to avail himself of the liability limit under the Convention when the type size is below 10 points. The Convention does not specify the type size of certain parts of passenger tickets and only provides that the carrier shall not be entitled to avail himself of the liability limit when the carrier fails to deliver the ticket to the passenger. However, since the purpose of delivering passenger tickets is to give passengers an opportunity to recognize the liability limit under the Convention and to plan subsequent measures, the carrier who fails to give this opportunity shall not be entitled to avail himself of the liability limit under the Convention. Some decisions argue that when the notice of the carrier's liability limit is presented in fine print in a hardly noticeable place, the carrier shall not be entitled to avail himself of the liability limit under the Convention. Meanwhile, most decisions declare that, regardless of the type size, the carrier is entitled to avail himself of the liability limit of the provisions of the Convention.
The reason is that neither the Warsaw Convention nor the Montreal Agreement stipulates that the carrier is deprived of the right to avail himself of the liability limit of the provisions of the Convention when violating the notice requirement. In particular, the main objective of the Montreal Agreement is not the notice of the liability limit but its increase. The latest decisions also maintain the same view. This issue seems to have been settled in Elisa Chan, et al. vs. Korean Airlines Ltd. After a cautious interpretation of the Warsaw Convention and the Montreal Agreement, the U.S. Supreme Court held that the type size of the passenger ticket cannot be a target of controversy since it is not required by law, highlighting the fact that no grounds for it are found in either the Warsaw Convention or the Montreal Agreement. The issue of type size can now hardly become grounds for denying the carrier the liability limit. In this regard, any challenge that raises the issue of type size seems bound to be defeated. The same issue can be raised for both air waybills and baggage tickets, but only for transportation to which the original Convention applies. It creates no problem under the Convention as revised by the Hague Protocol, because the Hague Protocol does not require any information on the weight, bulk, size, and number of cargo or baggage. The problem here is whether the carrier is entitled to avail himself of the liability limit of the provisions of the Convention when the information on the number or weight of the consigned packages required by Article 4 of the Convention is not available. Currently the majority of decisions take a positive stance on this. They hold that the carrier is entitled to avail himself of the liability limit of the provisions of the Convention when the required information on the number and weight of consigned packages is omitted, because these requirements are too technical and insubstantial.
However, some decisions declare just the opposite. They hold that the provisions of Article 4 of the Convention are clear, that their meaning and effect should be applied literally, and that it is neither unjust nor too technical for a carrier to meet the minimum requirement prescribed in the Convention. Up to now, no decision by the U.S. Supreme Court on this issue is available.


Deep Learning Document Analysis System Based on Keyword Frequency and Section Centrality Analysis

  • Lee, Jongwon;Wu, Guanchen;Jung, Hoekyung
    • Journal of information and communication convergence engineering
    • /
    • Vol. 19, No. 1
    • /
    • pp.48-53
    • /
    • 2021
  • Herein, we propose a document analysis system that analyzes papers or reports transformed into XML (Extensible Markup Language) format. It reads the document specified by the user, extracts keywords from the document, and compares the frequency of keywords to extract the top three keywords. It maintains the order of the paragraphs containing the keywords and removes duplicated paragraphs. The frequency of the top three keywords in the extracted paragraphs is re-verified, and the paragraphs are partitioned into 10 sections. Subsequently, the importance of the relevant areas is calculated and compared. Because the system notifies the user of the areas with the highest frequency and the areas whose importance is higher than the average frequency, the user can read only the main content without reading the whole document. In addition, the number of paragraphs extracted and the number of paragraphs in a section of high importance are predicted through a deep learning model.

한국인 구음장애 환자의 발화 데이터 기반 질병 예측을 위한 모바일 애플리케이션 개발 (Development of a Mobile Application for Disease Prediction Using Speech Data of Korean Patients with Dysarthria)

  • 하창진;고태식
    • 대한의용생체공학회:의공학회지
    • /
    • Vol. 45, No. 1
    • /
    • pp.1-9
    • /
    • 2024
  • Communication with others plays an important role in human social interaction and information exchange in modern society. However, some individuals have difficulty communicating due to dysarthria. Therefore, it is necessary to develop effective diagnostic techniques for the early treatment of dysarthria. In the present study, we propose a mobile device-based methodology that automatically classifies the type of dysarthria. A lightweight CNN model was trained using an open audio dataset of Korean patients with dysarthria. The trained CNN model can classify dysarthria into the related disease subtypes with 78.8% to 96.6% accuracy. In addition, a user-friendly mobile application was developed based on the trained CNN model. Users can easily record their voices according to the selected inspection type (e.g., word, sentence, paragraph, and semi-free speech) and evaluate the recorded voice data through their mobile device and the developed mobile application. The proposed technique would be helpful for personal management of dysarthria and for clinical decision making.

문서 분석 기반 주요 요소 추출 시스템 (Document Analysis based Main Requisite Extraction System)

  • 이종원;여일연;정회경
    • 한국정보통신학회논문지
    • /
    • Vol. 23, No. 4
    • /
    • pp.401-406
    • /
    • 2019
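  • This paper proposes a system that analyzes documents written as papers or reports in XML format. The system extracts the keywords specified in the paper or report and shows them to the user; when the user enters the keywords to search for within the document, it extracts the paragraphs that contain each keyword. The system checks the frequency of the keywords entered by the user, calculates their weights, and then removes the paragraphs that contain only the keyword with the lowest weight. It also divides the refined paragraphs into 10 areas, calculates the importance of the paragraphs in each area, compares the importance of the areas, and informs the user of the main area with the highest importance. Owing to these features, the proposed system can provide the main paragraphs in a more highly condensed form than analyzing papers or reports with existing document analysis systems. This is expected to reduce the time needed to understand a document.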
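
The refinement step in this abstract, which removes paragraphs that contain only the lowest-weight keyword, can be sketched as below. The weight formula (a keyword's share of all keyword occurrences), the substring matching, and the tie-breaking rule (the first keyword listed wins a tie for lowest) are assumptions, and `drop_lowest_only` is a hypothetical name. The input is assumed to be the paragraphs already extracted for a non-empty keyword list.

```python
def drop_lowest_only(paragraphs, keywords):
    """Remove paragraphs whose only matching keyword is the lowest-weight one."""
    lowered = [p.lower() for p in paragraphs]
    freq = {k: sum(p.count(k.lower()) for p in lowered) for k in keywords}
    total = sum(freq.values())
    # Assumed weight: share of all keyword occurrences.
    weights = {k: (freq[k] / total if total else 0.0) for k in keywords}
    # On a tie, min() returns the first keyword in the list (assumed rule).
    lowest = min(keywords, key=lambda k: weights[k])

    kept = []
    for original, low in zip(paragraphs, lowered):
        present = {k for k in keywords if k.lower() in low}
        if present != {lowest}:
            kept.append(original)
    return kept, weights, lowest

paragraphs = [
    "xml keyword analysis of xml files",
    "keyword frequency and xml parsing",
    "a section alone",
    "section and keyword together",
]

kept, weights, lowest = drop_lowest_only(paragraphs, ["xml", "keyword", "section"])
print(lowest)
print(kept)
```

In this toy input, "section" occurs least often, so the paragraph that mentions nothing else is dropped while the paragraph that pairs it with "keyword" is kept.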