• Title/Summary/Keyword: summarization

Parallel Multithreaded Processing for Data Set Summarization on Multicore CPUs

  • Ordonez, Carlos;Navas, Mario;Garcia-Alvarado, Carlos
    • Journal of Computing Science and Engineering / v.5 no.2 / pp.111-120 / 2011
  • Data mining algorithms should exploit new hardware technologies to accelerate computations. Such a goal is difficult to achieve in a database management system (DBMS) because of its complex internal subsystems and because numeric data mining computations on large data sets are difficult to optimize. This paper explores taking advantage of the existing multithreaded capabilities of multicore CPUs, as well as caching in RAM, to efficiently compute summaries of a large data set, a fundamental data mining problem. We introduce parallel algorithms working on multiple threads, which overcome the row aggregation processing bottleneck of accessing secondary storage while maintaining linear time complexity with respect to data set size. Our proposal is based on a combination of table scans and parallel multithreaded processing among multiple cores in the CPU. We introduce several database-style and hardware-level optimizations: caching row blocks of the input table, managing available RAM, interleaving I/O and CPU processing, and tuning the number of working threads. We experimentally benchmark our algorithms with large data sets on a DBMS running on a computer with a multicore CPU. We show that our algorithms outperform existing DBMS mechanisms in computing aggregations of multidimensional data summaries, especially as dimensionality grows. Furthermore, we show that local memory allocation (RAM block size) does not have a significant impact when the thread management algorithm distributes the workload among a fixed number of threads. Our proposal is unique in that we do not modify or require access to the DBMS source code; instead, we extend the DBMS with analytic functionality by developing User-Defined Functions.
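
The block-wise, multithreaded aggregation described in this abstract can be sketched roughly as follows. This is only an illustration, not the authors' UDF implementation: the abstract does not say which summaries are computed, so the sketch assumes the summary is the row count n, the per-dimension sum L, and the per-dimension sum of squares Q. The function names (`summarize_block`, `merge`, `parallel_summary`) and the `block_size` and `threads` parameters are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize_block(rows):
    """Summarize one row block as (n, L, Q): count, sums, sums of squares."""
    d = len(rows[0])
    L = [0.0] * d
    Q = [0.0] * d
    for row in rows:
        for j, x in enumerate(row):
            L[j] += x
            Q[j] += x * x
    return len(rows), L, Q


def merge(a, b):
    """Combine two partial summaries; the summaries are additive."""
    n1, L1, Q1 = a
    n2, L2, Q2 = b
    return (n1 + n2,
            [x + y for x, y in zip(L1, L2)],
            [x + y for x, y in zip(Q1, Q2)])


def parallel_summary(rows, block_size=2, threads=4):
    """Split rows into blocks, summarize blocks on a thread pool, merge results."""
    if not rows:
        raise ValueError("rows must not be empty")
    # Assumed blocking scheme: consecutive slices of block_size rows.
    blocks = [rows[i:i + block_size] for i in range(0, len(rows), block_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = list(pool.map(summarize_block, blocks))
    total = partials[0]
    for part in partials[1:]:
        total = merge(total, part)
    return total


n, L, Q = parallel_summary([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mean = [s / n for s in L]
```

Because each partial summary is additive, every row is visited once and the merge cost depends only on the number of blocks, so the total work stays linear in the data set size. In CPython, the global interpreter lock keeps pure-Python threads from speeding up CPU-bound loops, so the sketch shows the structure of the computation, not its performance.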

Automatic Extraction and Usage of Terminology Dictionary Based on Definitional Sentences Patterns in Technical Documents (기술문서 정의문 패턴을 이용한 전문용어사전 자동추출 및 활용방안)

  • Han, Hui-Jeong;Kim, Tae-Young;Doo, Hyo-Chul;Oh, Hyo-Jung
    • Journal of the Korean Society for Information Management / v.34 no.4 / pp.81-99 / 2017
  • Technical documents are important research outputs generated by the knowledge and information society. To use technical documents properly, it is necessary to apply advanced information processing techniques such as summarization and information extraction. In this paper, to extract core information, we automatically extracted terminologies and their definitions based on definitional sentence patterns and the structure of technical documents. Based on this, we proposed a system for building a specialized terminology dictionary. We further suggested personalized services so that users can utilize the terminology dictionary in various ways as a knowledge memory. The results of this study will allow users to find up-to-date information faster and more easily. In addition, providing a personalized terminology dictionary to users can maximize the value, usability, and retrieval efficiency of the dictionary.

Development of Corrosion Evaluation Index Calculation Program of Raw Water and Evaluation on Corrosivity of Tap Water using the Calcium Carbonate Saturation Index (상수원수의 부식평가 지수 산정 프로그램 개발 및 탄산칼슘 포화지수에 의한 수돗물의 부식성 평가)

  • Hwang, Byung-Gi;Woo, Dal-Sik
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.1 / pp.177-185 / 2009
  • In this study, we developed a program to calculate the corrosion evaluation index for examining the corrosivity of raw water. When it was applied to the Han river and Nakdong river systems, the sulfate ion concentration, which accelerates corrosion, was higher in the Nakdong river system than in the Han river system, and calcium and hardness, which restrain corrosion, were likewise higher. A summary of the LI and CCPP results calculated by the developed corrosion evaluation model showed that the water of the Han river system is strongly corrosive. Moreover, this study evaluated corrosivity using the calcium carbonate saturation index after adding chemicals to tap water. Saturation status was maintained in the order of $Ca(OH)_2$ > NaOH > $Na_2CO_3$ > $CaCO_3$ in the case of LI and RI.
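
A saturation-index calculation of the kind this abstract mentions might look like the sketch below. It assumes that LI and RI denote the Langelier and Ryznar indices and uses a widely cited empirical approximation for the saturation pH; the abstract does not state which formulas the authors' program uses. All function and parameter names are hypothetical, and the sample water quality values are illustrative only.

```python
import math


def saturation_ph(tds_mg_l, temp_c, ca_hardness_mg_l, alkalinity_mg_l):
    """Approximate saturation pH (pHs) of calcium carbonate.

    Assumed inputs: total dissolved solids in mg/L, temperature in deg C,
    calcium hardness and alkalinity both in mg/L as CaCO3.
    """
    a = (math.log10(tds_mg_l) - 1.0) / 10.0
    b = -13.12 * math.log10(temp_c + 273.0) + 34.55
    c = math.log10(ca_hardness_mg_l) - 0.4
    d = math.log10(alkalinity_mg_l)
    return (9.3 + a + b) - (c + d)


def langelier_index(ph, phs):
    """LI = pH - pHs; negative values indicate undersaturated water."""
    return ph - phs


def ryznar_index(ph, phs):
    """RI = 2 * pHs - pH."""
    return 2.0 * phs - ph


# Illustrative sample, not data from the study.
phs = saturation_ph(tds_mg_l=320.0, temp_c=25.0,
                    ca_hardness_mg_l=150.0, alkalinity_mg_l=34.0)
li = langelier_index(7.5, phs)
ri = ryznar_index(7.5, phs)
```

Under this approximation, raising calcium hardness or alkalinity lowers pHs and therefore raises LI, which is consistent with the abstract's remark that calcium and hardness restrain corrosion.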

The Optimal Process of Weapon Acquisition Management (I) -With Special Reference to the Cost/Effectiveness Model for the Selection of Weapon Acquisition System- (무기체계 획득관리의 최적화 (I) -무기체계 획득시스템의 선정을 위한 비용대효과분석모형을 중심으로-)

  • Lee Jin-Joo;Kwon Tae-Young;Joo Nam-Youn
    • Journal of the Military Operations Research Society of Korea / v.3 no.2 / pp.49-77 / 1977
  • Weapon systems are crucial instruments for the security of a nation and critical elements for victory in war. Since modern weapon systems tend to be capital-intensive with high precision and quality, they become more and more complex and diversified, their acquisition costs become huge, and their technological obsolescence accelerates. Therefore, the systematic management of the weapon acquisition process is one of the most important defense tasks at the national level. To analyze such problems and find solutions, this paper studies various aspects of the efficient management of weapon system acquisition. After a brief summary of the general characteristics of weapon systems, their effectiveness, and their developmental trends, the paper discusses defense management policies and techniques for weapon systems. Specifically, four alternative acquisition methods (indigenous R&D, foreign purchase, co-production, and joint production) are discussed and analyzed using a systems approach. The systems analysis procedure to evaluate and select a weapon acquisition method is as follows: 1) analyze the merits and demerits of the alternative methods, 2) screen out unrealistic alternatives by considering significant factors such as political, economic, military, technological, and social constraints, and 3) evaluate and select an optimal method among the remaining acquisition methods through cost-effectiveness analysis. As a basis for the cost-effectiveness analysis, a cost analysis model and an effectiveness analysis model are developed for each acquisition method.

Keyframe Extraction from Home Videos Using 5W and 1H Information (육하원칙 정보에 기반한 홈비디오 키프레임 추출)

  • Jang, Cheolhun;Cho, Sunghyun;Lee, Seungyong
    • Journal of the Korea Computer Graphics Society / v.19 no.2 / pp.9-18 / 2013
  • We propose a novel method to extract keyframes from home videos based on 5W and 1H information. Keyframe extraction is a kind of video summarization that selects only the specific frames containing important information in a video. As a home video may have content on a variety of topics, we cannot make specific assumptions for information extraction. In addition, to summarize a home video we must analyze human behaviors, because people are important subjects in home videos. In this paper, we extract 5W and 1H information by analyzing human faces, human behaviors, and global background information. Experimental results demonstrate that our technique extracts keyframes more similar to human selections than previous methods do.

Cloud storage-based intelligent archiving system applying automatic document summarization (문서 자동요약 기술을 적용한 클라우드 스토리지 기반 지능적 아카이빙 시스템)

  • Yoo, Kee-Dong
    • Journal of Korea Society of Industrial Information Systems / v.17 no.3 / pp.59-68 / 2012
  • Zero client-based cloud storage technology is gaining much interest as a tool for the centralized management of organizational documents. Besides the well-known defects of cloud storage, such as security and privacy protection, users of zero client-based cloud storage point out the difficulty of browsing and selecting a storage category because of the diversity and complexity of the categories. To resolve this problem, this study proposes a method of intelligent document archiving that applies an algorithm-based automatic topic identification technology. Without requiring the user to directly specify the category in which to store a working document, the proposed methodology and prototype enable working documents to be automatically archived into predefined categories according to the extracted topic. Based on the proposed ideas, more effective and efficient centralized management of electronic documents can be achieved.

Keyword Extraction from News Corpus using Modified TF-IDF (TF-IDF의 변형을 이용한 전자뉴스에서의 키워드 추출 기법)

  • Lee, Sung-Jick;Kim, Han-Joon
    • The Journal of Society for e-Business Studies / v.14 no.4 / pp.59-73 / 2009
  • Keyword extraction is an essential technique for text mining applications such as information retrieval, text categorization, summarization, and topic detection. A set of keywords extracted from large-scale electronic document data is used as significant features for text mining algorithms and contributes to improving the performance of document browsing, topic detection, and automated text classification. This paper presents a keyword extraction technique that can be used to detect topics for each news domain from a large document collection of internet news portal sites. We use six variants of the traditional TF-IDF weighting model. On top of the TF-IDF model, we propose a word filtering technique called 'cross-domain comparison filtering'. To demonstrate the effectiveness of our method, we analyze the usefulness of keywords extracted from Korean news articles and present how the keywords of each news domain change over time.
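
A minimal sketch of TF-IDF keyword ranking followed by a cross-domain filter is given below. It uses the textbook TF-IDF weighting, not any of the six variants from the paper, which the abstract does not specify. The filter is only a guess at what 'cross-domain comparison filtering' could mean: it drops candidate words that appear in more than one domain's keyword list. The names `tfidf_keywords` and `cross_domain_filter` are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=3):
    """Rank words in each tokenized document by tf * log(N / df)."""
    n_docs = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency counts each doc once
    ranked_per_doc = []
    for tokens in docs:
        tf = Counter(tokens)
        scores = {w: (c / len(tokens)) * math.log(n_docs / df[w])
                  for w, c in tf.items()}
        # Sort by descending score, then alphabetically for stable ties.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked_per_doc.append(ranked[:top_k])
    return ranked_per_doc


def cross_domain_filter(domain_keywords):
    """Assumed filter: keep only words that are candidates in one domain."""
    counts = Counter(w for words in domain_keywords.values()
                     for w in set(words))
    return {domain: [w for w in words if counts[w] == 1]
            for domain, words in domain_keywords.items()}


docs = [["stock", "market", "today"], ["match", "goal", "today"]]
keywords = tfidf_keywords(docs, top_k=2)
filtered = cross_domain_filter({"economy": ["stock", "today"],
                                "sports": ["goal", "today"]})
```

In this toy input the word "today" occurs in every document, so its inverse document frequency is zero and it ranks last; the assumed filter also removes it because it is a candidate in both domains.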

Implementation of Smart E-learning based on Blended Learning (혼합형 학습 기반 스마트 이러닝 구현)

  • Hong, YouSik
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.20 no.2 / pp.171-178 / 2020
  • Many countries are establishing and operating blended learning programs that combine the advantages of online and offline education. However, lecture-based online MOOC courses perform poorly, with a graduation rate of less than 5-10%. Therefore, to increase the graduation rate of students taking web-based MOOC distance lectures, which anyone can take anytime and anywhere, it is necessary to introduce automatic analysis of students' level of understanding of lectures and an automatic academic warning system. Moreover, to become a country with advanced education, it is necessary to develop software that automatically judges wrong-answer rates, software that automatically summarizes lectures, and software that automatically analyzes weak subjects per lecture based on blended learning levels. To address this problem, in this paper we proposed and simulated an automatic summarization system for lecture contents, an automatic warning system for incorrect answers, and an automatic judgment algorithm for weak subjects.

Purchasing Behavior of the Latest Trendy Color Bags - Focusing on Purchase Motives, Purchase Types, Satisfaction and Repurchase Intention - (최신 유행색 가방 구매행동 - 구매동기, 구매유형, 만족도 및 재구매 의도를 중심으로 -)

  • Kim, Eun Joo;Lee, Min Ji
    • Fashion & Textile Research Journal / v.16 no.5 / pp.719-729 / 2014
  • This study identified the factors behind purchase motives for the latest trendy color bags and ascertained the structural relations among purchase motives, purchase type, satisfaction, and repurchase intention. It also examined differences in purchase motives, purchase types, satisfaction, and repurchase intention according to consumer characteristics, and provided strategic information for women's bag manufacturers and retailers. Data were collected by random sampling through a survey of Korean women between the ages of 20 and 59 who had purchased the latest trendy color bag. A questionnaire developed by the researcher was distributed to 450 women in 2013. We analyzed 433 questionnaires using the SPSS 18.0 and AMOS 18.0 programs. The findings are summarized as follows. First, purchase motives for the latest trendy color bags were classified into 5 factors: awareness-symbolicity, practicality, aesthetic, harmony, and fashionability. Second, in an analysis of the structural relations between purchase motives for the latest trendy color bags and type of purchase, aesthetic and harmony showed significant influences on planned purchases; in addition, awareness-symbolicity, aesthetic, and fashionability significantly influenced unplanned purchases. Third, there was no significant influence for planned purchases on satisfaction; however, unplanned purchases showed a significant.

A Performance Comparison of Multi-Label Classification Methods for Protein Subcellular Localization Prediction (단백질의 세포내 위치 예측을 위한 다중레이블 분류 방법의 성능 비교)

  • Chi, Sang-Mun
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.4 / pp.992-999 / 2014
  • This paper presents an extensive experimental comparison of a variety of multi-label learning methods for the accurate prediction of the subcellular localization of proteins that exist simultaneously at multiple subcellular locations. We compared several methods from three categories of multi-label classification algorithms: algorithm adaptation, problem transformation, and meta learning. Experimental results are analyzed using 12 multi-label evaluation measures to assess the behavior of the methods from a variety of viewpoints. We also use a new summarization measure to find the best performing method. Experimental results show that the best performing methods are the power-set method, which prunes infrequently occurring subsets of labels, and classifier chains, which model relevant labels with an additional feature. Furthermore, ensembles of many classifiers of these methods enhance the performance further. The recommendation from this study is that the correlation of subcellular locations is an effective clue for classification, because the subcellular locations of proteins performing a certain biological function are not independent but correlated.
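
The power-set transformation with pruning mentioned in this abstract can be illustrated with the sketch below. It treats each distinct label set as one class and simply discards examples whose label set is rare. The abstract does not describe how the compared method handles pruned sets, so discarding them is an assumption, and the function name `pruned_label_powerset` and the `min_count` threshold are hypothetical.

```python
from collections import Counter


def pruned_label_powerset(label_sets, min_count=2):
    """Map frequent label sets to class ids and drop infrequent ones.

    Returns (class_of, encoded), where class_of maps a frozenset of labels
    to a class id and encoded lists (example_index, class_id) pairs for the
    examples that survive pruning.
    """
    keys = [frozenset(labels) for labels in label_sets]
    freq = Counter(keys)
    # Keep label sets seen at least min_count times, in a stable order.
    kept = sorted((k for k, c in freq.items() if c >= min_count),
                  key=lambda k: sorted(k))
    class_of = {k: i for i, k in enumerate(kept)}
    encoded = [(idx, class_of[k]) for idx, k in enumerate(keys)
               if k in class_of]
    return class_of, encoded


# Illustrative location labels, not data from the study.
examples = [{"nucleus"}, {"nucleus", "cytoplasm"}, {"nucleus"},
            {"cytoplasm", "nucleus"}, {"membrane"}]
class_of, encoded = pruned_label_powerset(examples, min_count=2)
```

After this transformation, any single-label classifier can be trained on the class ids, and a predicted class id maps back to a whole set of locations, which is how the approach captures correlations between labels.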