• Title/Summary/Keyword: community clustering (커뮤니티 클러스터링)

Recovering Module View of Software Architecture using Community Detection Algorithm (커뮤니티 검출기법을 이용한 소프트웨어 아키텍쳐 모듈 뷰 복원)

  • Kim, Jungmin;Lee, Changun
    • Journal of Software Engineering Society / v.25 no.4 / pp.69-74 / 2012
  • This article examines the applicability of community detection algorithms to the module recovery process of software architecture by comparing software clustering metrics with community detection metrics. In addition, it analyzes the relationships and differences between the recovered modules and the measured values of typical clustering algorithms and community detection algorithms. It then suggests several grounds on which community detection algorithms can be used to recover the module view of software architecture and, by comparing the measured values of existing clustering metrics and community detection algorithms, reports the correlation between the two sets of results.

Web Crawling and PageRank Calculation for Community-Limited Search (커뮤니티 제한 검색을 위한 웹 크롤링 및 PageRank 계산)

  • Kim Gye-Jeong;Kim Min-Soo;Kim Yi-Reun;Whang Kyu-Young
    • Proceedings of the Korean Information Science Society Conference / 2005.07b / pp.1-3 / 2005
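  • In recent years, many techniques for improving search quality have been studied in the web search field; representative examples include limited search, focused crawling, and web clustering. However, limited search cannot restrict the search scope to semantically related sites, focused crawling clusters at query time and therefore has long query processing times, and web clustering incurs a large overhead because it must cluster a large number of web pages. To solve these problems, this paper proposes community-limited search, which restricts the search scope to a specific community, and the cluster crawler as a method for obtaining communities. We also propose a method that uses communities to compute PageRank in two steps. The proposed method first computes PageRank locally for each community and then, in the second step, computes PageRank globally based on those results. Compared with the method proposed by Wang, the proposed method can reduce the error of the PageRank approximation to about $59\%$.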
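
The two-step idea in the abstract above (local PageRank per community, then a global pass built on it) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the function names, the toy link data, and the choice to combine the two steps by multiplying a community-level rank with the local rank are all hypothetical.

```python
def pagerank(nodes, links, damping=0.85, iters=100):
    """Plain power-iteration PageRank over weighted directed links {(src, dst): weight}."""
    nodes = list(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: 0.0 for v in nodes}
    for (src, _dst), w in links.items():
        out_weight[src] += w
    for _ in range(iters):
        # Rank held by nodes without out-links is spread uniformly over all nodes.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for (src, dst), w in links.items():
            new[dst] += damping * rank[src] * w / out_weight[src]
        rank = new
    return rank


def two_stage_pagerank(community_of, links):
    """Approximate global ranks as (community rank) x (local rank). Assumed scheme."""
    communities = {}
    for page, c in community_of.items():
        communities.setdefault(c, []).append(page)

    # Step 1: local PageRank inside each community, using intra-community links only.
    local = {}
    for c, pages in communities.items():
        inner = {(s, d): w for (s, d), w in links.items()
                 if community_of[s] == c and community_of[d] == c}
        local.update(pagerank(pages, inner))

    # Step 2: PageRank over the community graph; each inter-community link is
    # weighted by the local rank of its source page (an assumption of this sketch).
    block_links = {}
    for (s, d), w in links.items():
        cs, cd = community_of[s], community_of[d]
        if cs != cd:
            block_links[(cs, cd)] = block_links.get((cs, cd), 0.0) + local[s] * w
    block = pagerank(communities, block_links)

    return {p: block[community_of[p]] * local[p] for p in community_of}


# Hypothetical toy web: pages a, b, c form community X; pages d, e form community Y.
community_of = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}
links = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "a"): 1.0,
         ("c", "d"): 1.0, ("d", "e"): 1.0, ("e", "d"): 1.0, ("e", "a"): 1.0}
ranks = two_stage_pagerank(community_of, links)
```

Because each local rank vector and the community-level vector sum to 1, the combined scores also sum to 1, so they can be compared directly with an exact PageRank vector when measuring approximation error.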

Role Grades Classification and Community Clustering at Character-net (Character-net에서 배역비중의 분류와 커뮤니티 클러스터링)

  • Park, Seung-Bo;Jo, Geun-Sik
    • Journal of the Korea Society of Computer and Information / v.14 no.11 / pp.169-178 / 2009
  • There are various approaches to retrieving information from video. However, previous approaches have considered only object information and the relationships between objects, without story information, when retrieving content. To retrieve accurate information from video, we need an analysis approach based on characters and communities, since these are the main body of story progression. Therefore, this paper describes a video information retrieval methodology based on character information. Characters advance the story by forming relationships through conversations. We can analyze the relationships between characters in a story with methods that classify role grades and cluster communities of characters. For this purpose, we propose the Character-net and describe how to classify role grades and cluster communities in Character-net. We also show that this method is efficient.

A Case Study on Job Analysis Utilizing Cluster Analysis and Community Analysis (군집분석 및 커뮤니티 분석 기법을 활용한 직무분석 사례 연구)

  • Jo, Il-Hyun
    • The Journal of Korean Association of Computer Education / v.7 no.1 / pp.151-165 / 2004
  • The purpose of the study was to explore the potential of cluster analysis and community analysis, from the social network analysis family, in job-task analysis for curriculum design. These two multivariate analysis techniques were expected to provide relevant and scientific information, as well as inspiration, for investigating the structure and nature of job systems, which are critical in developing a relevant curriculum. To this end, qualitative and quantitative data were collected from "S" Corporation, a major high-tech manufacturing company, and analyzed with the relevant analytic procedures. The results indicate that there are discrepancies between formal job structures and actual ones. The subsequent community analysis showed the presence of a center-marginal structure along with a clustering structure in the current job formation. Interpretations of the results are provided in light of past research and additional data collected in the study. Implications of the study are also discussed, along with suggestions for future research.

Trends in Social Media Participation and Change in Issues with Meta Analysis Using Network Analysis and Clustering Technique (소셜 미디어 참여에 관한 연구 동향과 쟁점의 변화: 네트워크 분석과 클러스터링 기법을 활용한 메타 분석을 중심으로)

  • Shin, Hyun-Bo;Seon, Hyung-Ju;Lee, Zoon-Ky
    • The Journal of Bigdata / v.4 no.1 / pp.99-118 / 2019
  • This study used network analysis and clustering techniques to analyze studies on social media participation. Main path analysis extracted 37 major studies, which were divided into two networks: community-related networks and new media-related networks. Network analysis and clustering resulted in four clusters. This study is academically significant in that it uses academic data to grasp research trends at a macro level and applies network analysis and machine learning as its methodology.

User Perspective Website Clustering for Site Portfolio Construction (사이트 포트폴리오 구성을 위한 사용자 관점의 웹사이트 클러스터링)

  • Kim, Mingyu;Kim, Namgyu
    • Journal of Internet Computing and Services / v.16 no.3 / pp.59-69 / 2015
  • Many users visit websites every day to perform information retrieval, shopping, and community activities. On the other hand, there is intense competition among sites which attempt to profit from the Internet users. Thus, the owners or marketing officers of each site try to design a variety of marketing strategies including cooperation with other sites. Through such cooperation, a site can share customers' information, mileage points, and hyperlinks with other sites. To create effective cooperation, it is crucial to choose an appropriate partner site that may have many potential customers. Unfortunately, it is exceedingly difficult to identify such an appropriate partner among the vast number of sites. In this paper, therefore, we devise a new methodology for recommending appropriate partner sites to each site. For this purpose, we perform site clustering from the perspective of visitors' similarities, and then identify a group of sites that has a number of common customers. We then analyze the potential for the practical use of the proposed methodology through its application to approximately 140 million actual site browsing histories.

Keyword Network Analysis for Technology Forecasting (기술예측을 위한 특허 키워드 네트워크 분석)

  • Choi, Jin-Ho;Kim, Hee-Su;Im, Nam-Gyu
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.227-240 / 2011
  • New concepts and ideas often result from extensive recombination of existing concepts or ideas. Both researchers and developers build on existing concepts and ideas in published papers or registered patents to develop new theories and technologies that in turn serve as a basis for further development. As the importance of patents increases, so does that of patent analysis. Patent analysis is largely divided into network-based and keyword-based analyses. The former lacks the ability to analyze information technology in detail, while the latter is unable to identify the relationships between such technologies. To overcome the limitations of network-based and keyword-based analyses, this study blends the two methods and proposes a keyword network based analysis methodology. In this study, we collected significant technology information from each patent related to Light Emitting Diode (LED) through text mining, built a keyword network, and then executed a community network analysis on the collected data. The results of the analysis are as follows. First, the patent keyword network showed very low density and an exceptionally high clustering coefficient. Technically, density is obtained by dividing the number of ties in a network by the number of all possible ties. The value ranges between 0 and 1, with higher values indicating denser networks and lower values indicating sparser networks. In real-world networks, the density varies depending on the size of a network; increasing the size of a network generally leads to a decrease in the density. The clustering coefficient is a network-level measure that illustrates the tendency of nodes to cluster in densely interconnected modules. This measure reflects the small-world property, in which a network can be highly clustered even though it has a small average distance between nodes despite the large number of nodes.
Therefore, the low density of the patent keyword network means that its nodes are connected sporadically, while the high clustering coefficient shows that nodes in the network are closely connected to one another. Second, the cumulative degree distribution of the patent keyword network, like that of other knowledge networks such as citation or collaboration networks, followed a clear power-law distribution. A well-known mechanism behind this pattern is preferential attachment, whereby a node with more links is likely to attain further new links as the network evolves. Unlike normal distributions, the power-law distribution does not have a representative scale. This means that one cannot pick a representative or average value, because there is always a considerable probability of finding much larger values. Networks with power-law distributions are therefore often referred to as scale-free networks. The presence of a heavy-tailed scale-free distribution represents the fundamental signature of an emergent collective behavior of the actors who contribute to forming the network. In our context, the more frequently a patent keyword is used, the more often it is selected by researchers and associated with other keywords or concepts to constitute and convey new patents or technologies. The evidence of a power-law distribution suggests that preferential attachment is the origin of the heavy-tailed distributions in a wide range of growing patent keyword networks. Third, we found that among keywords that flowed into a particular field, the vast majority of keywords with new links join existing keywords in the associated community in forming the concept of a new patent. This finding held for both the short-term (4-year) and long-term (10-year) analyses.
Furthermore, using the keyword combination information that was derived from the methodology suggested by our study enables one to forecast which concepts combine to form a new patent dimension and refer to those concepts when developing a new patent.
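
The two measures defined in the abstract above can be illustrated with a short sketch. It assumes an undirected keyword network given as a list of unique node pairs; the keyword names and links are hypothetical toy data, not taken from the study, and the clustering coefficient is computed as the average of local clustering coefficients, which is one common definition.

```python
from itertools import combinations


def density(nodes, edges):
    """Number of ties divided by the number of all possible ties (undirected)."""
    n = len(nodes)
    possible = n * (n - 1) // 2
    return len(edges) / possible if possible else 0.0


def average_clustering(nodes, edges):
    """Mean over nodes of the fraction of neighbor pairs that are themselves linked."""
    neighbors = {v: set() for v in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    total = 0.0
    for v in nodes:
        k = len(neighbors[v])
        if k < 2:
            continue  # a node with fewer than two neighbors contributes 0
        linked = sum(1 for a, b in combinations(neighbors[v], 2)
                     if b in neighbors[a])
        total += linked / (k * (k - 1) / 2)
    return total / len(nodes)


# Hypothetical keyword network: two tightly knit triangles joined by one bridge.
nodes = ["led", "chip", "phosphor", "lens", "heat", "sink"]
edges = [("led", "chip"), ("chip", "phosphor"), ("led", "phosphor"),
         ("lens", "heat"), ("heat", "sink"), ("lens", "sink"),
         ("phosphor", "lens")]

print(round(density(nodes, edges), 3))             # 7 of 15 possible ties
print(round(average_clustering(nodes, edges), 3))  # average local clustering
```

In this toy network fewer than half of the possible ties exist, yet most nodes sit inside a fully linked triangle, which is the same qualitative pattern (sparse overall, tightly clustered locally) that the abstract reports for the patent keyword network.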

Analyzing Influence of Outlier Elimination on Accuracy of Software Effort Estimation (소프트웨어 공수 예측의 정확성에 대한 이상치 제거의 영향 분석)

  • Seo, Yeong-Seok;Yoon, Kyung-A;Bae, Doo-Hwan
    • Journal of KIISE:Software and Applications / v.35 no.10 / pp.589-599 / 2008
  • Accurate software effort estimation has always been a challenge for the software industry and the academic software engineering community. Many studies have focused on effort estimation methods to improve the accuracy of software effort estimates. Although data quality is one of the important factors for accurate effort estimation, most of that work has not considered it. In this paper, we investigate the influence of outlier elimination on the accuracy of software effort estimation through empirical studies that combine two outlier elimination methods (least trimmed squares regression and K-means clustering) with three effort estimation methods (least squares regression, neural network, and Bayesian network). The empirical studies are performed on two industry data sets (the ISBSG Release 9 data set and the Bank data set, which consists of project data collected from a bank in Korea), with and without outlier elimination.

An Alternative Method for Assessing Local Spatial Association Among Inter-paired Location Events: Vector Spatial Autocorrelation in Housing Transactions (쌍대위치 이벤트들의 국지적 공간적 연관성을 평가하기 위한 방법론적 연구: 주택거래의 벡터 공간적 자기상관)

  • Lee, Gun-Hak
    • Journal of the Economic Geographical Society of Korea / v.11 no.4 / pp.564-579 / 2008
  • It is often challenging to evaluate local spatial association among one-dimensional vectors, which generally represent paired-location events where two points are physically or functionally connected. This is largely because of the complexity of the processes underlying such geographic phenomena and partly because of representational complexity. This paper presents an alternative way to identify spatially autocorrelated paired-location events (or vectors) at a local scale. In doing so, we propose a statistical algorithm that combines univariate point pattern analysis, for evaluating local clustering of origin points, with a similarity measure of the corresponding vectors. To demonstrate practical use of the suggested method, we present an empirical application using transaction data from a local housing market, recorded from 2004 to 2006 in Franklin County, Ohio, in the United States. As a result, several locally characterized similar transactions are identified among a set of vectors showing various local moves associated with the defined communities.

Analysis of Big Data Visualization Technology Based on Patent Analysis (특허분석을 통한 빅 데이터의 시각화 기술 분석)

  • Rho, Seungmin;Choi, YongSoo
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.7 / pp.149-154 / 2014
  • Modern data computing developments have led to big improvements in graphic capabilities, and there are many new possibilities for data displays. Visualization has proven effective not only for presenting essential information in vast amounts of data but also for driving complex analyses. Big-data analytics and discovery present new research opportunities to the computer graphics and visualization community. In this paper, we discuss a patent analysis of big data visualization technology development in major countries. Specifically, we analyzed 160 patent applications and registered patents in four countries in November 2012. According to the results of the analysis, text clustering analysis and 2D visualization are important, and their development is urgently needed. In particular, given the increasing domestic use of smart devices and social networks, the development of three-dimensional visualization for big data appears to be very urgent.