• Title/Summary/Keyword: 주제별 문장 분류 (sentence classification by topic)


Article Analytic and Summarizing Algorithm by facilitating TF-IDF based on k-means (TF-IDF를 활용한 k-means 기반의 효율적인 대용량 기사 처리 및 요약 알고리즘)

  • Jang, Minseo;OH, Sujin;Kim, Ung-Mo
    • Proceedings of the Korea Information Processing Society Conference / 2018.05a / pp.271-274 / 2018
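  • This paper proposes a cluster-analysis method that classifies large-scale news articles into subtopics using news article data. It also proposes a method for extracting and presenting key sentences so that users can quickly grasp the classified articles. The analysis data are collected by crawling economy-section news articles from Naver, the portal site with the largest market share. To analyze the articles, preprocessing removes stop words such as special characters, particles, word endings, and punctuation. The k-means algorithm is then used to classify the large volume of news articles by topic, and key sentences are extracted on that basis. The extracted key sentences represent the topic of the classified articles and are used to deliver information to users quickly. If the findings of this paper are adopted by news media sites, they are expected to contribute to improved site quality and user satisfaction.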
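
The pipeline this abstract outlines (TF-IDF weighting, k-means clustering, then picking a representative sentence per cluster) might be sketched as below. This is a minimal pure-Python illustration and not the authors' implementation: the toy English sentences, whitespace tokenization, the smoothed IDF variant, seeding k-means with the first k vectors, and choosing the item nearest each centroid as the "key" sentence are all assumptions. The Korean stop-word and morpheme preprocessing described in the abstract is omitted.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return L2-normalized TF-IDF vectors for whitespace-tokenized texts."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF (log(N/df) + 1); the exact weighting is an assumption.
    idf = {t: math.log(n_docs / doc_freq[t]) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] / max(len(toks), 1) * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Lloyd's algorithm, seeded with the first k vectors (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def nearest_to_centroid(vectors, labels, centroids):
    """For each non-empty cluster, return the index of its most central item."""
    picks = {}
    for c, centroid in enumerate(centroids):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:
            picks[c] = min(members, key=lambda i: sq_dist(vectors[i], centroid))
    return picks

# Hypothetical toy sentences standing in for crawled economy news.
sentences = [
    "stock market rally lifts stock prices",
    "oil prices fall as oil supply grows",
    "stock market index hits record",
    "oil supply cut pushes oil prices up",
]
vectors = tfidf_vectors(sentences)
labels, centroids = kmeans(vectors, k=2)
key_sentences = nearest_to_centroid(vectors, labels, centroids)
```

Here `labels` groups the two stock sentences and the two oil sentences, and `key_sentences` maps each cluster to the index of one representative sentence. A real run would need a better seeding strategy and a way to choose k.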

Multi-Document Summarization using Time Feature (시간자질을 이용한 다중 문서요약)

  • 임정민;강인수;배재학;이종혁
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.898-900 / 2004
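  • In a time-dependent document set, human-written summaries show how important content is distributed over time. This paper proposes a method that classifies multiple documents using time features, selects key sentences and supplementary sentences from each time-specific document set, and selects important sentences through hierarchical clustering of sentences. On document sets sharing the same topic, the proposed method achieved 50% precision against the important sentences selected by humans.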


A Sentence Theme Allocation Scheme based on Head Driven Patterns in Encyclopedia Domain (백과사전 영역에서 중심어주도패턴에 기반한 문장주제 할당 기법)

  • Kang Bo-Young;Myaeng Sung-Hyon
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.396-405 / 2005
  • Since sentences are the basic propositional units of text, their themes would be helpful for various tasks that require knowledge about the semantic content of text. Despite the importance of determining the theme of a sentence, however, few studies have investigated the problem of automatically assigning a theme to a sentence. Therefore, we propose a sentence theme allocation scheme based on the head-driven patterns of sentences in an encyclopedia. In a series of experiments using the Dusan Dong-A encyclopedia, the proposed method outperformed the baseline in theme allocation performance. Head-driven pattern 4, which is reconfigured based on the predicate, showed superior performance in theme allocation, with an average F-score of $98.96\%$ on the training data and $88.57\%$ on the test data.

Web-based Requirements Elicitation Supporting System using Requirements Sentences Categorization (요구 사항 문장 범주화를 이용한 웹 기반의 요구 사항 추출 지원 시스템)

  • Ko, Young-Joong;Kang, Ki-Sun;Kim, Jae-Seon;Park, Soo-Yong;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.27 no.4 / pp.384-392 / 2000
  • As software becomes more complex and larger in scale, it is very important for a software engineer to analyze users' requirements precisely and apply them effectively in the development stage. With the growth of the internet, the need for requirements elicitation and analysis in distributed environments has also increased. This paper proposes a requirements elicitation supporting system that offers a basis for effectively analyzing requirements collected in distributed environments. The proposed system automatically categorizes collected requirements sentences into selected subject fields by measuring their similarity with a similarity measurement technique. It thereby reduces the difficulties in the initial stage of requirements analysis and supports rapid and correct requirements analysis. This paper verifies the efficiency of the proposed system's similarity measurement techniques through experiments, and presents a process for eliciting requirements specifications using the implemented system.
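
Similarity-based categorization of requirement sentences could look roughly like the sketch below. The abstract does not say which similarity measure the system uses, so cosine similarity over raw term counts is an assumption, and the subject fields, their keyword descriptions, and the example sentence are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def categorize(sentence, fields):
    """Return (best_field, score); `fields` maps a field name to descriptive text."""
    vec = Counter(sentence.lower().split())
    scores = {name: cosine(vec, Counter(text.lower().split()))
              for name, text in fields.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical subject fields, for illustration only.
FIELDS = {
    "security": "login password user authentication access permission",
    "performance": "response time speed load latency throughput",
}

field, score = categorize(
    "the system must check the user password at login", FIELDS
)
```

With these toy inputs the sentence shares three terms with the "security" description and none with "performance", so it is assigned to "security". A practical system would likely add stemming or morphological analysis and a minimum-score cutoff for sentences that fit no field.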


Multi-Label Classification Approach to Effective Aspect-Mining (효과적인 애스팩트 마이닝을 위한 다중 레이블 분류접근법)

  • Jong Yoon Won;Kun Chang Lee
    • Information Systems Review / v.22 no.3 / pp.81-97 / 2020
  • Recent trends in sentiment analysis have focused on single-label classification approaches. However, since a review comment by one person usually covers several topics or aspects, it is better to classify sentiment for each aspect separately. This paper has two purposes. First, because one sentence can contain various aspects, aspect mining is performed to classify emotions by aspect. Second, we apply multi-label classification to analyze two or more dependent variables (output values) at once. To validate the proposed approach, online review comments about musical performances were collected from a domestic online platform, and the multi-label classification approach was applied to the dataset. The results were promising, and the potential of the proposed approach is discussed.

The Blog Polarity Classification Technique using Opinion Mining (오피니언 마이닝을 활용한 블로그의 극성 분류 기법)

  • Lee, Jong-Hyuk;Lee, Won-Sang;Park, Jea-Won;Choi, Jae-Hyun
    • Journal of Digital Contents Society / v.15 no.4 / pp.559-568 / 2014
  • Previous polarity classification based on sentiment analysis relies on sentence rules derived from the rating scores of product reviews. This approach is difficult to apply to blogs, which carry no review ratings. Product reviews can also be fabricated by paid commenters and site managers, so it is hard to obtain reliable reviews of a product or store. Given these problems, analyzing blogs, which contain personal and candid opinions, and classifying their polarity makes it possible to understand opinions about a product or store correctly. This paper proposes extracting high-frequency vocabulary from blogs in several domains and selecting topic words, then applying a sentiment analysis technique to classify the polarity of blog content. To evaluate the sentiment analysis, we use Precision, Recall, and F-Score, measures from the information retrieval field. The evaluation shows that the proposed sentiment analysis classifies polarity better than previous techniques that use sentence rules based on product reviews.
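
For reference, the Precision, Recall, and F-Score measures named in this abstract can be computed as in the sketch below. The label strings and the toy gold/predicted lists are hypothetical, and the balanced F1 form of the F-Score is assumed since the abstract does not state a weighting.

```python
def precision_recall_f1(gold, predicted, positive="positive"):
    """Precision, recall, and F1 for one target class over paired label lists."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical polarity labels for five blog posts.
gold = ["positive", "positive", "negative", "negative", "positive"]
pred = ["positive", "negative", "negative", "positive", "positive"]
precision, recall, f1 = precision_recall_f1(gold, pred)
```

In this toy case there are two true positives, one false positive, and one false negative, so all three measures come out to 2/3.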

Detects depression-related emotions in user input sentences (사용자 입력 문장에서 우울 관련 감정 탐지)

  • Oh, Jaedong;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.12 / pp.1759-1768 / 2022
  • This paper proposes a model that detects depression-related emotions in a user's utterances using wellness dialogue scripts provided by AI Hub, topic-specific daily conversation datasets, and chatbot datasets published on GitHub. The depression-related emotions comprise 18 emotions, including depression and lethargy, and emotion classification is performed with the KoBERT and KoELECTRA models, which show high performance among language models. To compare performance across models, we build diverse datasets and compare classification results while adjusting batch sizes and learning rates for the models that perform well. Furthermore, to reflect that a person can feel multiple emotions at the same time, a multi-label classification task is performed by selecting as the answer every label whose output value is higher than a specific threshold. The best-performing model obtained through this process is called the Depression model, and it is then used to classify depression-related emotions in user utterances.
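
The threshold-based multi-label step described here can be illustrated with the short sketch below. It assumes the classifier emits one raw score (logit) per emotion and that a sigmoid maps each score to an independent probability; the abstract does not state the activation or the threshold value, so both are assumptions. Of the label names, only "depression" and "lethargy" come from the abstract; "anxiety" and the logits are hypothetical.

```python
import math

def sigmoid(x):
    """Map a raw score to the (0, 1) range."""
    return 1.0 / (1.0 + math.exp(-x))

def select_labels(logits, label_names, threshold=0.5):
    """Return every label whose sigmoid score exceeds `threshold`."""
    return [name for name, z in zip(label_names, logits)
            if sigmoid(z) > threshold]

# "anxiety" is a hypothetical third label; the logits are made up.
labels = ["depression", "lethargy", "anxiety"]
logits = [2.0, 0.3, -1.5]
selected = select_labels(logits, labels)
```

Because each label is tested independently against the threshold, an utterance can receive several emotions at once, or none if every score falls below it.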

UX Methodology Study by Data Analysis Focusing on deriving persona through customer segment classification (데이터 분석을 통한 UX 방법론 연구 고객 세그먼트 분류를 통한 페르소나 도출을 중심으로)

  • Lee, Seul-Yi;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.151-176 / 2021
  • As the information technology industry develops, various kinds of data are being created, and processing and using them has become essential in industry. Analyzing and utilizing the digital data collected online and offline is a necessary step in providing an appropriate experience for customers. To create new businesses, products, and services, it is essential to use customer data collected in various ways to understand potential customers' needs in depth and to analyze behavior patterns that reveal hidden signals of desire. However, research on data analysis and on UX methodology, which should be conducted in parallel for effective service development, is being carried out separately, and examples of their combined use in industry are lacking. In this work, we construct a single process by applying data analysis methods and UX methodologies together. The study is significant in that it applies methodologies actively used in practice and is therefore likely to be adopted. We conducted a survey on the topic to identify and cluster the associations between factors in order to establish customer classification and target customers. The research methods are as follows. First, we conduct factor and regression analyses to determine the associations between factors in the happiness survey data. Respondents are grouped according to the survey results, and we identify the relationships among 34 questions on psychological stability, family life, relational satisfaction, health, economic satisfaction, work satisfaction, daily life satisfaction, and residential environment satisfaction. Second, we classify clusters based on the factors affecting happiness and extract the optimal number of clusters. Based on the results, we cross-analyze the characteristics of each cluster. Third, for service definition, we analyze correlations with keywords related to happiness.
We use keyword analysis of the thumb trend to derive ideas based on keyword interest and associations. We also collected approximately 11,000 news articles based on the top three keywords most closely related to happiness, derived the issues linking the keywords through text mining analysis in SAS, and used them to define services after ideas were conceived. Fourth, based on the characteristics identified through data analysis, we selected the segmentation and targeting appropriate for service discovery. To this end, the characteristics of the factors were grouped into four groups, profiles were drawn up, and the main target customers were selected. Fifth, based on the characteristics of the main target customers, interviewees were selected and in-depth interviews were conducted to discover the causes of happiness, the causes of unhappiness, and the needs for services. Sixth, we derive customer behavior patterns based on the segment results and in-depth interviews, and specify the objectives associated with their characteristics. Seventh, a typical persona built from qualitative surveys and a persona built from data were produced, and the two were compared to analyze their characteristics and their pros and cons. Existing market segmentation classifies customers based on purchasing factors, whereas UX methodology measures users' behavior variables to establish criteria and redefine user classification. Applying these segment classification methods to the process of producing user classifications and personas in UX methodology should yield more accurate customer classification schemes. The significance of this study is twofold. First, the idea of using data to create a variety of services was linked to the UX methodology used to plan IT services by applying it in the hot topic era. Second, we further enhance user classification by applying segment analysis methods that are not currently well used in UX methodologies.
To provide a consistent experience in creating a single service, whether large or small, it is necessary to define customers with common goals. To this end, it is necessary to derive personas and persuade various stakeholders. Under these circumstances, designing a consistent experience from beginning to end through fast and concrete user descriptions would be a very effective way to produce a successful service.

A study on the case of education to train an archivist - Focus on archival training courses and the tradition of archival science in Italy - (기록관리전문가의 양성교육에 관한 사례연구 -이탈리아의 기록관리학 전통과 교육과정을 중심으로-)

  • Kim, Jung-Ha
    • Journal of Korean Society of Archives and Records Management / v.1 no.1 / pp.201-230 / 2001
  • Conserving the recorded cultural heritage is the duty of all of us. Above all, the management and conservation of archives and documents falls to archivists, who have technical knowledge of archival science. Archivists must not only conserve archives and documents but also classify and appraise them in order to designate them as current or historical records. The fundamental education in archival science consists of history and law, because an Archive is the organisation that manages the archives and documents produced by legal and administrative actions. Although there is still debate about the technical knowledge and degree archivists should acquire, most prefer studies related to history and emphasize legal studies as the general framework for archivists' ideology and trust. Training courses on the conservation of archives are conducted in about 9 National Archives: Torino, Milano, Venezia, Genova, Bologna, Parma, Roma, Napoli, and Palermo. In the 19th century the training course was based mostly on lectures in Paleography and Diplomatics; there was not yet any education in archival science. From the end of the 19th century into the 20th century, it came to be stressed that the most basic subject in the National Archive training course was not Paleography and Diplomatics but archival science. The goal of archival science is to study the institutions and organisations that transfer archives and documents to the Archive. It also helps archivists know their work exactly rather than wander in ignorance of organisational and original procedures and divisions. In this way, the study of institutions and organisations has become established as a branch of archival science over the past few decades. While archival science failed to win public sympathy and followed a tedious and difficult path in Italy and other countries, Archives were managed by experts from other fields. As a result, there were many shortcomings in archival science.
A specialized training course for Italian archivists came into being in 1925 at the Social Science Institute of Roma National University. The university archival courses were built on studies of history, law, and economics. Thanks to the dedication of figures such as Eugenio Casanova and Giorgio Cencetti, archival science was able to take root in the national archives. The training course for experts in 'archival science, Paleography and Diplomatics' at the National Archive of Bologna (Archivio di Stato di Bologna) is one of the courses conducted in 17 National Archives in Italy. The course is free of charge and consists of 8 subjects (Archivistica, Paleografia, Diplomatica, Storia dell'Archivio, Notariato e documenti privati, istituzione medievale, istituzione moderna, istituzione contemporanea) that students must complete over two years. Students receive the degree by passing two written exams and one oral test. After the Department of Culture and Education finalizes the students' marks, the head of the National Archive of Bologna confers the degree of 'archival science, Paleography and Diplomatics' on students who pass the exams. This degree certifies that a trainee is qualified to work at provincial, district, and administrative-capital archives, community archives, and so on. The Italian training course naturally keeps archivists in contact with valuable cultural heritage through training in the Archive. It also reflects the intention to strengthen trainees' affinity with individual documents at the site of archival management. It is further regarded as a positive policy for conserving local cultural heritage, in keeping with the original character of the national archives, which testify to the history of each region. The training course for archivists in Italy shows us how we should prepare and run our own.
First, from the production of documents to their permanent conservation, the principle of 'original order' has been adopted, that is, the general rule of respecting the order first given when the documents were produced. The management of administrative documents is thus consistently linked with that of historical documents. Second, the training course for archivists is run around 17 national archives, because the Italian national archives stress not theoretical education but practical training for archivists working on the front line of archival practice. Third, Diplomatics and Paleography, as studies of historical documents, support archival work. Fourth, historical research proceeds through cooperation between archivists and historians centered on the archive. Our task is not to keep disputing who should conserve and manage documents and archives, but to train experts with ability, vision, flexible thinking, and a sense of responsibility for archives.

An Analysis of the High School 'Common Science' Contents and Textbooks (고등학교 ‘공통과학’의 교과내용 및 교과서 분석)

  • Lee, Gwang-Ho;Choi, Jong-Bum;Park, Moon-Kook;Cho, Kyu-Seong
    • Journal of the Korean earth science society / v.18 no.6 / pp.453-463 / 1997
  • The contents of high school 'Common Science' textbooks were analyzed qualitatively and quantitatively. Seven Common Science textbooks were selected; their contents, structure, inquiry activities, appendices, and characteristics were investigated and analyzed using the Goal Clusters of Project Synthesis, and Romey's indices of text evaluation were calculated. The contents of each unit differ little among textbooks because they are written according to the curriculum ordinance and textbook guidelines of the Ministry of Education. The textbooks consist of $471{\sim}519$ pages, distributed similarly among the chapters on 'materials', 'forces', 'lives', and 'earth'. The chapters on 'energy' and 'environment' are given significant treatment. The contents and structure of Common Science amount to a mere physical consolidation of subjects; we propose a topic-based organization as an alternative. The textbooks use 11 types of inquiry activities, but most are data interpretation, experiments, surveys, and discussion. Ninety-six percent of the experiments belong to the 1st level and four percent to the 2nd level of Schwab's inquiry levels, and there are no 3rd-level activities. Little attention is given to Goal Clusters I, II, and IV in the Common Science textbooks currently in use; their content should be broadened to include all Goal Clusters of Project Synthesis. Romey's indices, which represent the degree of student involvement, are $0.57{\sim}1.14$ for sentence analysis, $0.60{\sim}1.67$ for figure and diagram analysis, and $0.67{\sim}1.50$ for analysis of questions at chapter ends, with student activity per page being $0.6{\sim}0.9$. Chapter summaries, however, do little more than repeat the conclusions of the chapter and are written rather formally and inattentively.
