• Title/Summary/Keyword: Software Clustering

Multi-Document Summarization Method of Reviews Using Word Embedding Clustering (워드 임베딩 클러스터링을 활용한 리뷰 다중문서 요약기법)

  • Lee, Pil Won;Hwang, Yun Young;Choi, Jong Seok;Shin, Young Tae
    • KIPS Transactions on Software and Data Engineering / v.10 no.11 / pp.535-540 / 2021
  • A multi-document is a document that covers various topics rather than a single topic; online reviews are a typical example. There have been several attempts to summarize online reviews because of the vast amount of information they contain. However, summarizing reviews collectively with existing summary models loses the various topics that make up the reviews. Therefore, in this paper, we present a method to summarize reviews with minimal loss of topics. The proposed method classifies reviews through preprocessing, importance evaluation, embedding substitution using BERT, and embedding clustering. The classified sentences are then passed to a trained Transformer summary model to generate the final summary. The proposed model was evaluated against the existing summary model, a seq2seq model, using the ROUGE score and cosine similarity, and it produced higher-quality summaries than the existing summary model.
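The cosine similarity mentioned in the evaluation above compares two embedding vectors by the angle between them. A minimal sketch in plain Python follows; the function name and the list-of-floats representation are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors.

    Hypothetical helper for illustration; the paper's own implementation
    is not described in the abstract.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

Values close to 1 indicate that two sentence or summary embeddings point in nearly the same direction, while values close to 0 indicate unrelated directions.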

Simulation Analysis for Job Sequences in a Packaging Film Manufacturing Plant (포장용 필름 제조공장의 작업 우선순위 결정을 위한 시뮬레이션 분석)

  • LIU, JIONGKAI;Seo, Dong-Won
    • Journal of the Korea Society for Simulation / v.31 no.2 / pp.1-10 / 2022
  • The packaging plastic manufacturing (blown film) industry has a long history in China, but most firms are small or medium-sized enterprises, and very few have operation plans suited to their own business. The industry follows a typical make-to-order method, so the sequence in which orders are processed is very important. Material waste caused by frequent production changeovers cannot be avoided, and the related costs generally differ from one changeover to another. Therefore, this study developed a job sequence determination model for improving operating profits using @RISK simulation software, and compared and analyzed three actionable clustering treatment methods proposed by technical managers and field experts under the actual conditions of the factory.

Student Group Division Algorithm based on Multi-view Attribute Heterogeneous Information Network

  • Jia, Xibin;Lu, Zijia;Mi, Qing;An, Zhefeng;Li, Xiaoyong;Hong, Min
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.12 / pp.3836-3854 / 2022
  • Student group division helps universities manage students based on group profiles. With the widespread use of student smart cards on campus, especially where students live in campus residence halls, students' daily activities on campus are recorded with information such as smart card swiping time and location. Therefore, it is feasible to depict students using daily activity data and to group them based on objective measures of their campus behavior, together with regular student attributes collected in the management system. However, feature representation is challenging because student data takes diverse forms. To represent students' behaviors effectively and comprehensively for student group division, we propose to use activity data from student smart cards and student attributes as input, taking into account activity and attribute relationship types from different perspectives. Specifically, we propose a novel student group division method based on a multi-view student attribute heterogeneous information network (MSA-HIN). The network nodes in the proposed MSA-HIN represent students with their multi-dimensional attribute information, while the edges characterize different relationships between students, such as co-major, co-occurrence, and co-borrowing of books. Based on the MSA-HIN, embedded representations of students are learned and a deep graph clustering algorithm is applied to divide students into groups. Comparative experiments were conducted on a real-life campus dataset collected from a university. The experimental results demonstrate that our method can effectively reveal the variability of student attributes and relationships and accordingly achieves the best clustering results for group division.

Video Analysis System for Action and Emotion Detection by Object with Hierarchical Clustering based Re-ID (계층적 군집화 기반 Re-ID를 활용한 객체별 행동 및 표정 검출용 영상 분석 시스템)

  • Lee, Sang-Hyun;Yang, Seong-Hun;Oh, Seung-Jin;Kang, Jinbeom
    • Journal of Intelligence and Information Systems / v.28 no.1 / pp.89-106 / 2022
  • Recently, the amount of video data collected from smartphones, CCTVs, black boxes, and high-definition cameras has increased rapidly. As video data grows, so do the requirements for its analysis and utilization. Because many industries lack skilled manpower to analyze videos, machine learning and artificial intelligence are actively used to assist. In this situation, the demand for computer vision technologies such as object detection and tracking, action detection, emotion detection, and Re-ID has also increased rapidly. However, object detection and tracking technology faces many difficulties that degrade performance, such as occlusion and the re-appearance of an object after it leaves the video recording location. Accordingly, action and emotion detection models built on object detection and tracking models also have difficulty extracting data for each object. In addition, deep learning architectures composed of multiple models suffer from performance degradation due to bottlenecks and lack of optimization. In this study, we propose a video analysis system consisting of a YOLOv5-based DeepSORT object tracking model, a SlowFast-based action recognition model, a Torchreid-based Re-ID model, and AWS Rekognition, an emotion recognition service. The proposed model uses single-linkage hierarchical clustering based Re-ID and processing methods that maximize hardware throughput. It achieves higher accuracy than a re-identification model using simple metrics, offers near real-time processing performance, and prevents tracking failures caused by object departure and re-emergence, occlusion, and similar events. By continuously linking the action and facial emotion detection results of each object to the same object, videos can be analyzed efficiently.
The re-identification model extracts a feature vector from the bounding box of each object image detected by the object tracking model in each frame, and applies single-linkage hierarchical clustering to the feature vectors extracted from past frames to identify an object whose tracking has failed. Through this process, an object that was lost because of occlusion or re-appearance after leaving the video location can be re-tracked. As a result, the action and facial emotion detection results of an object that was newly recognized because of a tracking failure can be linked to those of the object that appeared in the past. To improve processing performance, we introduce the Bounding Box Queue by Object and Feature Queue methods, which reduce RAM requirements while maximizing GPU memory throughput. We also introduce the IoF (Intersection over Face) algorithm, which links facial emotions recognized through AWS Rekognition with object tracking information. The academic significance of this study is that, thanks to these processing techniques, the two-stage re-identification model can run in real time even in a high-cost environment that performs action and facial emotion detection, without sacrificing accuracy by resorting to simple metrics. The practical implication is that industrial fields that require action and facial emotion detection but struggle with object tracking failures can analyze videos effectively with the proposed model. With its high re-tracking accuracy and processing performance, the proposed model can be used in fields such as intelligent monitoring, observation services, and behavioral or psychological analysis services, where the integration of tracking information and extracted metadata creates great industrial and business value.
In the future, to measure object tracking performance more precisely, an experiment using the MOT Challenge dataset, which is used by many international conferences, is needed. We will investigate the problems that the IoF algorithm cannot solve in order to develop an additional complementary algorithm. In addition, we plan to conduct further research to apply this model to datasets from various fields related to intelligent video analysis.
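The abstract names the IoF (Intersection over Face) algorithm but does not define it. One plausible reading, by analogy with intersection over union, is the overlap area between a face box and a tracked object box divided by the area of the face box. The sketch below implements that assumed definition; the function names, the `(x1, y1, x2, y2)` box format, and the `threshold` default are all hypothetical.

```python
def intersection_over_face(face_box, object_box):
    """Assumed IoF: overlap area divided by the face box area.

    Boxes are (x1, y1, x2, y2) tuples with x1 < x2 and y1 < y2.
    """
    fx1, fy1, fx2, fy2 = face_box
    ox1, oy1, ox2, oy2 = object_box
    face_area = (fx2 - fx1) * (fy2 - fy1)
    if face_area <= 0:
        raise ValueError("face box must have positive area")
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(fx2, ox2) - max(fx1, ox1))
    inter_h = max(0.0, min(fy2, oy2) - max(fy1, oy1))
    return (inter_w * inter_h) / face_area


def assign_face_to_track(face_box, tracked_boxes, threshold=0.5):
    """Return the id of the tracked box with the highest IoF, or None.

    tracked_boxes maps a track id to its bounding box. The threshold is
    an illustrative value, not one reported in the paper.
    """
    best_id, best_score = None, 0.0
    for track_id, box in tracked_boxes.items():
        score = intersection_over_face(face_box, box)
        if score > best_score:
            best_id, best_score = track_id, score
    return best_id if best_score >= threshold else None
```

Normalizing by the face area rather than the union keeps the score high when a small face lies entirely inside a much larger person box, which is the usual situation when matching faces to tracked people.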

Automated Detecting and Tracing for Plagiarized Programs using Gumbel Distribution Model (굼벨 분포 모델을 이용한 표절 프로그램 자동 탐색 및 추적)

  • Ji, Jeong-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The KIPS Transactions:PartA / v.16A no.6 / pp.453-462 / 2009
  • Studies on software plagiarism detection, prevention, and judgement have become widespread due to growing interest in, and the importance of, protecting and authenticating software intellectual property. Many previous studies focused on comparing all pairs of submitted codes using attribute counting, token patterns, program parse trees, and similarity measuring algorithms. It is important to provide a clear-cut model for distinguishing plagiarism from collaboration. This paper proposes a source code clustering algorithm using a probability model based on the extreme value distribution. First, we propose an asymmetric distance measure pdist($P_a$, $P_b$) to measure the similarity of $P_a$ and $P_b$. Then, we construct the Plagiarism Direction Graph (PDG) for a given program set using pdist($P_a$, $P_b$) as edge weights, and transform the PDG into a Gumbel Distance Graph (GDG) model, since we found that the pdist($P_a$, $P_b$) score distribution is similar to the well-known Gumbel distribution. Second, we define pseudo-plagiarism, a kind of virtual plagiarism forced by a very strong functional requirement in the specification. We conducted experiments with 18 groups of programs (more than 700 source codes) collected from the ICPC (International Collegiate Programming Contest) and KOI (Korean Olympiad for Informatics) programming contests. The experiments showed that most plagiarized codes could be detected with high sensitivity and that our algorithm successfully separated real plagiarism from pseudo-plagiarism.
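To illustrate how a Gumbel model can flag unusually extreme scores, the sketch below evaluates the Gumbel (maximum) distribution and fits it by the method of moments. It assumes that scores can be treated as samples from a Gumbel distribution with location `mu` and scale `beta`; how the paper actually builds the GDG from pdist scores is not stated in the abstract, and all names here are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_cdf(x, mu, beta):
    """CDF of the Gumbel (maximum) distribution: exp(-exp(-(x - mu) / beta))."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    return math.exp(-math.exp(-(x - mu) / beta))


def gumbel_upper_tail(x, mu, beta):
    """Probability of observing a score greater than x."""
    return 1.0 - gumbel_cdf(x, mu, beta)


def fit_gumbel_moments(scores):
    """Method-of-moments estimate of (mu, beta) from at least two scores.

    Uses mean = mu + gamma * beta and variance = (pi * beta)^2 / 6.
    """
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    beta = std * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta
```

Under these assumptions, a pair of programs whose score has a very small upper-tail probability would be a candidate for closer inspection.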

Automatic Left Ventricle Segmentation Algorithm using K-mean Clustering and Graph Searching on Cardiac MRI (K-평균 클러스터링과 그래프 탐색을 통한 심장 자기공명영상의 좌심실 자동분할 알고리즘)

  • Jo, Hyun-Wu;Lee, Hae-Yeoun
    • The KIPS Transactions:PartB / v.18B no.2 / pp.57-66 / 2011
  • To prevent cardiac diseases, it is important in routine clinical practice to quantify cardiac function by analyzing blood volume and ejection fraction. This work has been performed manually, so it is costly and varies depending on the operator. In this paper, an automatic left ventricle segmentation algorithm is presented to segment the left ventricle on cardiac magnetic resonance images. After the coil sensitivity of the MRI images is compensated, a K-means clustering scheme is applied to segment the blood area. A graph searching scheme is then employed to correct segmentation errors caused by coil distortion and noise. Using cardiac MRI images from 38 subjects, the presented algorithm was used to calculate blood volume and ejection fraction, and the results were compared with those of manual contouring by experts and of GE MASS software. Based on the results, the presented algorithm achieves an average accuracy of $6.2 \pm 5.6$ mL, $2.9 \pm 3.0$ mL, and $2.1 \pm 1.5$% in the diastolic phase, the systolic phase, and ejection fraction, respectively. Moreover, the presented algorithm minimizes the user intervention rate, which was a critical issue for automating the algorithms in previous research.
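Ejection fraction is conventionally computed from the left ventricular blood volumes at end diastole and end systole. The sketch below applies that standard definition; the function and parameter names are illustrative, and the volumes are assumed to come from a segmentation step such as the one described above.

```python
def ejection_fraction(end_diastolic_ml, end_systolic_ml):
    """Return the ejection fraction in percent.

    EF = (EDV - ESV) / EDV * 100, where EDV and ESV are the blood
    volumes (in mL) at end diastole and end systole.
    """
    if end_diastolic_ml <= 0:
        raise ValueError("end-diastolic volume must be positive")
    if not 0 <= end_systolic_ml <= end_diastolic_ml:
        raise ValueError("end-systolic volume must lie between 0 and EDV")
    stroke_volume = end_diastolic_ml - end_systolic_ml
    return 100.0 * stroke_volume / end_diastolic_ml
```

For example, `ejection_fraction(120.0, 60.0)` returns `50.0`.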

A Space-Time Cluster of Foot-and-Mouth Disease Outbreaks in South Korea, 2010~2011 (구제역의 시.공간 군집 분석 - 2010~2011 한국에서 발생한 구제역을 사례로 -)

  • Pak, Son Il;Bae, Sun Hak
    • Journal of the Korean association of regional geographers / v.18 no.4 / pp.464-472 / 2012
  • To assess the space-time clustering of the FMD (Foot-and-Mouth Disease) epidemic that occurred in Korea between November 2010 and April 2011, a geographical information system (GIS)-based spatial analysis technique was used. Farm addresses and geographic data obtained from a commercial portal site were integrated into GIS software, which we used to map the color-shaded geographic features of the outbreaks through a process called thematic mapping, and to produce a visual representation of the relationship between the epidemic course and time throughout the country. FMD cases reported in the northern area of Gyounggi province were clustered in space and time within small geographic areas, owing to environmental characteristics in which livestock population density is high enough for the FMD virus to be easily transmitted to neighboring farms, whereas FMD cases in the southern and eastern areas of Gyounggi province were clustered in space but not in time. When the data were analyzed at 7-day intervals, the mean radius of the space-time clusters was 25 km, with a minimum of 5.4 km and a maximum of 74 km. In addition, the cluster radius was relatively small in the early stage of the FMD epidemic but expanded geographically over the course of the epidemic. Prior to implementing control measures during an outbreak, assessing the geographic units potentially affected and identifying the risky areas to be subsequently targeted for specific intervention measures is recommended.

Improved CS-RANSAC Algorithm Using K-Means Clustering (K-Means 클러스터링을 적용한 향상된 CS-RANSAC 알고리즘)

  • Ko, Seunghyun;Yoon, Ui-Nyoung;Alikhanov, Jumabek;Jo, Geun-Sik
    • KIPS Transactions on Software and Data Engineering / v.6 no.6 / pp.315-320 / 2017
  • Efficiently estimating the correct pose of augmented objects in the real camera view is one of the most important problems in image tracking. In computer vision, homography is used for camera pose estimation in markerless augmented reality systems. To estimate a homography, features such as SURF are extracted from images, and the homography is estimated from these extracted features. For this purpose, the RANSAC algorithm is widely used, and the DCS-RANSAC algorithm, which applies constraints dynamically based on the Constraint Satisfaction Problem to improve performance, has been studied. In DCS-RANSAC, however, the dataset is grouped manually by the pattern of feature distribution in the images, so the algorithm cannot classify an input image: the pattern of feature distribution is not recognized, which reduces its performance. To address this problem, we propose the KCS-RANSAC algorithm, which uses K-means clustering in CS-RANSAC to cluster images automatically based on the pattern of feature distribution and applies constraints to each clustered image group. The experimental results show that our KCS-RANSAC algorithm outperformed the DCS-RANSAC algorithm in terms of speed, accuracy, and inlier rate.

An Efficient BotNet Detection Scheme Exploiting Word2Vec and Accelerated Hierarchical Density-based Clustering (Word2Vec과 가속화 계층적 밀집도 기반 클러스터링을 활용한 효율적 봇넷 탐지 기법)

  • Lee, Taeil;Kim, Kwanhyun;Lee, Jihyun;Lee, Suchul
    • Journal of Internet Computing and Services / v.20 no.6 / pp.11-20 / 2019
  • Numerous enterprises, organizations, and individual users are exposed to large DDoS (Distributed Denial of Service) attacks. DDoS attacks are performed through a BotNet, which is composed of a number of computers infected with malware (e.g., zombie PCs) and a special computer that controls the zombie PCs within a hierarchical command chain. To detect malware, malware detection software or a vaccine program must identify the malware signature through in-depth analysis, and these signatures need to be updated in advance. This is time-consuming and costly. In this paper, we propose a botnet detection scheme, based on an artificial neural network model, that does not require periodic signature updates. The proposed scheme exploits Word2Vec and accelerated hierarchical density-based clustering. The botnet detection performance of the proposed method was evaluated using the CTU-13 dataset. The experimental results show that the detection rate is 99.9%, which outperforms the conventional method.

Identification of Fuzzy Inference System Based on Information Granulation

  • Huang, Wei;Ding, Lixin;Oh, Sung-Kwun;Jeong, Chang-Won;Joo, Su-Chong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.4 no.4 / pp.575-594 / 2010
  • In this study, we propose a space search algorithm (SSA) and then introduce a hybrid optimization of fuzzy inference systems based on SSA and information granulation (IG). In comparison with "conventional" evolutionary algorithms (such as PSO), SSA not only leads to better search performance in finding the global optimum but is also more computationally effective when dealing with the optimization of fuzzy models. In the hybrid optimization of the fuzzy inference system, SSA is exploited to carry out the parametric optimization of the fuzzy model as well as to realize its structural optimization. IG, realized with the aid of C-Means clustering, helps determine the initial values of the apex parameters of the membership functions of the fuzzy model. The overall hybrid identification of fuzzy inference systems comes in the form of two optimization mechanisms: structure identification (such as the number of input variables to be used, a specific subset of input variables, the number of membership functions, and the polynomial type) and parameter identification (viz. the apexes of the membership functions). The structure identification is developed by SSA and C-Means, while the parameter estimation is realized via SSA and a standard least squares method. The performance of the proposed model was evaluated using four representative numerical examples: a non-linear function, gas furnace data, NOx emission process data, and the Mackey-Glass time series. A comparative study of SSA and PSO demonstrates that SSA leads to improved performance both in terms of the quality of the model and the computing time required. The proposed model is also contrasted with the quality of some "conventional" fuzzy models already encountered in the literature.