• Title/Summary/Keyword: Web data mining

Idea proposal of InfograaS for Visualization of Public Big-data (공공 빅데이터의 시각화를 위한 InfograaS의 아이디어 제안)

  • Cha, Byung-Rae;Lee, Hyung-Ho;Sim, Su-Jeong;Kim, Jong-Won
    • Journal of Advanced Navigation Technology / v.18 no.5 / pp.524-531 / 2014
  • In this paper, we propose processing and analyzing linked open data (LOD), a kind of big data, using cloud computing resources. LOD is web-based open data intended for sharing and reusing public data. Specifically, we define InfograaS (Info-graphic as a Service), a new business area of SaaS (software as a service), to support visualization techniques for BA (business analytics) and infographics. The goal of this study is to make these techniques easy to use for non-specialists and beginners without the help of visualization and business analysis experts. Data visualization is the process of representing data visually so that analysis results are easy to understand. Its purpose is to deliver information clearly and effectively through charts and figures. Public big data is processed with open-source tools (Hadoop, R, machine learning, and data mining) on cloud computing resources, and the results are shared and presented as easily understood charts and graphics.

A Multimedia Recommender System Using User Playback Time (사용자의 재생 시간을 이용한 멀티미디어 추천 시스템)

  • Kwon, Hyeong-Joon;Chung, Dong-Keun;Hong, Kwang-Seok
    • Journal of Internet Computing and Services / v.10 no.1 / pp.111-121 / 2009
  • In this paper, we propose a multimedia recommender system that uses the user's playback time. The proposed system collects, as web log data, the multimedia content requested by each user and the user's playback time. From the collected transaction database, the system predicts playback-time-based preference levels and related content by fuzzy association rule mining. The proposed method has the merit of sorting the recommendation list by preference without explicit user preference data, and it prevents false preferences. Experimental results confirm that the proposed system discovers useful rules from transactions that do not include explicit preferences and applies them to the recommender system.

Truncated Kernel Projection Machine for Link Prediction

  • Huang, Liang;Li, Ruixuan;Chen, Hong
    • Journal of Computing Science and Engineering / v.10 no.2 / pp.58-67 / 2016
  • With the large amount of complex network data increasingly available on the Web, link prediction has become a popular data-mining research field. This paper focuses on a link-prediction task that can be formulated as a binary classification problem in complex networks. To solve this problem, a sparse classification algorithm called the "Truncated Kernel Projection Machine", based on empirical feature selection, is proposed. The proposed algorithm is a novel realization of sparse empirical-feature-based learning that differs from regularized kernel projection machines. It is more appealing than previous well-performing learning machines because it can be computed efficiently and implemented easily and stably for the link-prediction task. The algorithm is applied to link-prediction tasks in different complex networks and compared with several classification algorithms. The experimental results show that the proposed algorithm outperformed the compared algorithms in several key indices, with fewer test errors and greater stability.

Similarity Measure based on XML Document's Structure and Contents (XML 문서의 구조와 내용을 고려한 유사도 측정)

  • Kim, Woo-Saeng
    • Journal of Korea Multimedia Society / v.11 no.8 / pp.1043-1050 / 2008
  • XML has become a standard for data representation and exchange on the Internet. With a large number of XML documents on the Web, there is an increasing need to automatically process these structurally rich documents for information retrieval, document management, and data mining applications. In this paper, we propose a new method to measure the similarity between XML documents by considering their structures and contents. The similarity of the documents' structures is found by a simple string-matching technique, and that of their contents by weights that take into account the names and positions of elements. The overall algorithm runs in time linear in the combined size of the two documents being compared.

A Study on the Possible New Fusion between Mobile and Healthcare Service (모바일과 의료서비스 간의 새로운 융합 가능성에 관한 연구)

  • Shin, Yong Jae;Kim, Jin Hwa;Lee, Jea Beom
    • Journal of Information Technology Services / v.11 no.sup / pp.27-39 / 2012
  • With the trend of mobile convergence, many applications are now possible in the mobile environment, including diverse healthcare applications on mobile devices. Although much research on mobile and health services has been published, it is limited to specific areas or techniques. This study shows possible future directions of fusion between mobile technologies and health services using a data mining technique called association rule analysis. The data used in this study was collected from web pages containing keywords related to mobile technologies and health services. The analysis reveals current cases of fusion between monitoring-based telemedicine and patients. It also reveals another case of fusion between mobile hospitals and medical screen charts. These cases show that the fusion of mobile technologies and health services has already begun in industry. Association rules are found among well-being, city, diet, and sleep. The association rules containing security and privacy, though not strong, also indicate that the security and privacy of patient information should be protected in the future. The results suggest that the fusion of mobile technologies and health services will provide health services to more users and larger areas. It is also expected to create diverse new business models in the future.
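
The association rule analysis named in the abstract above can be illustrated with a minimal sketch. It assumes each web page has already been reduced to a set of keywords. The sample pages, the `pair_rules` helper, and the thresholds are hypothetical and are not taken from the paper.

```python
from itertools import permutations

def pair_rules(transactions, min_support=0.4, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for keyword pairs."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        # Number of pages containing a, and containing both a and b.
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        support = count_ab / n            # share of pages with both keywords
        confidence = count_ab / count_a   # share of pages with a that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules

# Hypothetical keyword sets, one per collected web page.
pages = [
    {"telemedicine", "monitoring", "patient"},
    {"telemedicine", "monitoring"},
    {"diet", "sleep", "well-being"},
    {"diet", "sleep"},
    {"monitoring", "patient", "privacy"},
]

rules = pair_rules(pages)
```

Only single-keyword antecedents are considered here to keep the sketch short. A full analysis would typically use an algorithm such as Apriori to handle larger itemsets.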

2DSpotDB: A Database for the Annotated Two-dimensional Polyacrylamide Gel Electrophoresis of Pathogen Proteins

  • Kim, Dae-Won;Yoo, Won-Gi;Lee, Myoung-Ro;Kim, Yu-Jung;Cho, Shin-Hyeong;Lee, Won-Ja;Ju, Jung-Won
    • Genomics & Informatics / v.9 no.4 / pp.197-199 / 2011
  • The biological interpretation of two-dimensional (2D) gel electrophoresis experiments is a key step toward understanding the functions of biological systems. We here present a web-based integrated database, called 2DSpotDB, for the management of proteome data derived from several pathogens. The 2DSpotDB was established as a part of the management of a pathogen proteome project at the Korea National Institute of Health. The goals of the 2DSpotDB implementation are to store and define important pathogen genes, retrieve information obtained by 2D polyacrylamide gel electrophoresis and mass spectrometry, and create an integrated system to provide pathogen proteome information for biological scientists. This database currently contains 14 gels and information on 387 protein spots, among which 329 proteins were identified and annotated.

A Study on Interest Issues Using Social Media News (소셜미디어 뉴스를 이용한 관심 이슈 연구)

  • Kwak, Noh Young;Lee, Moon Bong
    • The Journal of Information Systems / v.32 no.2 / pp.177-190 / 2023
  • Purpose: Recently, short-form content focused on fun and interest has been shared through hashtags as a new business marketing tool. This study extracts positive and negative keywords from media audiences through comment analysis of social media news, so that various stakeholders can quickly and easily grasp users' opinions on major news. Design/methodology/approach: YouTube videos were searched using the YouTube Data API and the results were collected. Video comments were crawled and rendered as HTML elements, and the collected results were checked on a web page. The collected data consisted of video thumbnails, titles, contents, and comments. Comments were tokenized into words with R and compared against positive and negative dictionaries to quantify polarity. In addition, social network analysis was conducted on the separated positive and negative comments, and the results of centrality analysis and visualization were examined. Findings: Social media users' opinions on news issues were identified by dividing comments into positive and negative and then analyzing and visualizing keyword centrality through social network analysis. The analysis found that negative objective reviews had the highest effect on information usefulness. This reaffirms previous studies showing that negative online information has a strong effect on personal decision-making. Corporate marketers can analyze user comments on social network services (SNS) to detect negative opinions about products or corporate images, which can serve as an opportunity to satisfy customers' needs.
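
The dictionary-based polarity step described in the abstract above might look roughly like the following. The study used R. This sketch uses Python instead, and the tiny word lists, sample comments, and function names are hypothetical stand-ins for the dictionaries and data the authors actually used.

```python
import re

# Hypothetical miniature sentiment dictionaries (the paper's are not given).
POSITIVE = {"good", "great", "helpful", "love"}
NEGATIVE = {"bad", "terrible", "misleading", "hate"}

def polarity(comment):
    """Return (#positive matches - #negative matches) for one comment."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    pos = sum(1 for tok in tokens if tok in POSITIVE)
    neg = sum(1 for tok in tokens if tok in NEGATIVE)
    return pos - neg

def split_by_polarity(comments):
    """Divide comments into positive and negative groups; neutral ones are dropped."""
    positive = [c for c in comments if polarity(c) > 0]
    negative = [c for c in comments if polarity(c) < 0]
    return positive, negative

# Hypothetical video comments.
comments = [
    "Great report, really helpful!",
    "This headline is misleading and bad.",
    "I watched it yesterday.",
]
positive, negative = split_by_polarity(comments)
```

The two resulting groups correspond to the divided positive and negative comments that the abstract then feeds into social network analysis.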

Analysis of Shipping and Logistics News Articles using Topic Modeling (토픽모델링을 활용한 해운물류 뉴스 분석)

  • Yoon, Hee-Young;Kwak, Il-Youp
    • Korea Trade Review / v.46 no.4 / pp.61-76 / 2021
  • This study focuses on three logistics-related news outlets (Logistics Newspaper, Korea Shipping Gazette, and Korea Shipping Newspaper) in order to present changes in logistics issues, centering on COVID-19, which has recently had the greatest impact worldwide. For data collection, two years of news articles from 2019 and 2020 (title, article, content, date, article classification, article URL) were collected through web crawling (using Python's BeautifulSoup and requests modules) on the homepages of three representative logistics-related media companies. The data analysis methods were fundamental statistical analysis, Latent Dirichlet Allocation (LDA) for topic modeling, and Scattertext. The analysis results were as follows. First, among the three logistics-related news outlets, the Korea Shipping Newspaper was the most active. Second, through topic modeling with LDA, eight logistics-related topics were identified, and the keywords and significant issues of each topic were presented. Third, the keywords were visualized with Scattertext. This is the first study to present changes in the logistics field by focusing on articles from representative logistics-related media in 2019 and 2020. In particular, 2019 and 2020 can be divided into the periods before and after the outbreak of COVID-19, which has had a great impact not only on the logistics field but also on daily life as a whole. Future work requires a multi-faceted approach, such as comparative studies of logistics issues between countries or implications drawn from long-term time-series articles.
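
The field extraction behind the crawling step above can be sketched as follows. The study used BeautifulSoup and requests. To stay self-contained and offline, this sketch swaps in the standard-library `html.parser` module and parses an inline page. The HTML layout, class names, and `ArticleParser` class are assumptions made for illustration, not the structure of the actual news sites.

```python
from html.parser import HTMLParser

class ArticleParser(HTMLParser):
    """Collect the text of elements whose class attribute is in FIELDS."""
    FIELDS = {"title", "date", "content"}  # hypothetical class names

    def __init__(self):
        super().__init__()
        self.record = {}
        self._field = None

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in self.FIELDS:
            self._field = cls

    def handle_endtag(self, tag):
        # Simplification: any closing tag ends the current field,
        # so nested markup inside a field is not supported.
        self._field = None

    def handle_data(self, data):
        if self._field and data.strip():
            self.record[self._field] = self.record.get(self._field, "") + data.strip()

# Hypothetical article page; a real crawler would download this HTML first.
SAMPLE_HTML = """
<html><body>
  <h1 class="title">Container rates rise</h1>
  <span class="date">2020-03-02</span>
  <div class="content">Freight rates climbed as port congestion worsened.</div>
</body></html>
"""

parser = ArticleParser()
parser.feed(SAMPLE_HTML)
article = parser.record
```

Each parsed record holds one article's fields. A collection of such records would form the corpus passed on to topic modeling.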

Trend of Research and Industry-Related Analysis in Data Quality Using Time Series Network Analysis (시계열 네트워크분석을 통한 데이터품질 연구경향 및 산업연관 분석)

  • Jang, Kyoung-Ae;Lee, Kwang-Suk;Kim, Woo-Je
    • KIPS Transactions on Software and Data Engineering / v.5 no.6 / pp.295-306 / 2016
  • The purpose of this paper is both to analyze research trends and to predict industrial flows using metadata from previous studies on data quality. There have been many recent attempts to analyze research trends in various fields. However, analysis of previous studies on data quality has produced poor results because of the field's vast scope and volume of data. Therefore, in this paper, we apply text mining and social network analysis in a time series network analysis of papers published over 10 years in international data quality journals, collected from the Web of Science index database. The analysis results are as follows. Research decreased in Mathematical & Computational Biology, Chemistry, Health Care Sciences & Services, Biochemistry & Molecular Biology, and Medical Information Science. In contrast, it increased in Environmental Sciences, Water Resources, Geology, and Instruments & Instrumentation. In addition, the social network analysis shows that the subjects with high centrality are analysis, algorithm, and network, and that image, model, sensor, and optimization are growing subjects in the data quality field. Furthermore, the industrial connection analysis on data quality shows high correlation among technique, industry, health, infrastructure, and customer service. It also predicts that Environmental Sciences, Biotechnology, and the Health Industry will continue to develop. This paper will be useful not only for people in the data quality industry but also for researchers who analyze research patterns and investigate industry connections in data quality.
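
The centrality analysis mentioned in the abstract above can be illustrated with a small sketch. It assumes a keyword co-occurrence network and degree centrality. The abstract does not say which centrality measure the authors used, and the keyword lists and `degree_centrality` function are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def degree_centrality(keyword_lists):
    """Build an undirected co-occurrence graph and return degree centrality per keyword."""
    neighbors = defaultdict(set)
    for keywords in keyword_lists:
        # Link every pair of keywords that appear in the same paper.
        for a, b in combinations(sorted(set(keywords)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {k: 0.0 for k in neighbors}
    # Degree centrality: number of neighbors divided by the maximum possible (n - 1).
    return {k: len(v) / (n - 1) for k, v in neighbors.items()}

# Hypothetical keyword lists, one per paper.
papers = [
    ["analysis", "algorithm", "network"],
    ["analysis", "sensor"],
    ["analysis", "model", "optimization"],
    ["network", "image"],
]
centrality = degree_centrality(papers)
```

Running the same computation separately on each publication year would give one centrality value per keyword per year, which is one simple way to obtain a time series of the network.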

A Study on Recognition of Artificial Intelligence Utilizing Big Data Analysis (빅데이터 분석을 활용한 인공지능 인식에 관한 연구)

  • Nam, Soo-Tai;Kim, Do-Goan;Jin, Chan-Yong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.129-130 / 2018
  • Big data analysis is a technique for effectively analyzing unstructured data such as the Internet, social network services, web documents generated in the mobile environment, e-mail, and social data, as well as well-formed structured data in a database. Most big data analysis techniques are data mining, machine learning, natural language processing, and pattern recognition, which were already used in statistics and computer science. Global research institutes have identified big data analysis as the most noteworthy new technology since 2011. Therefore, companies in most industries are making efforts to create new value through the application of big data. In this study, we conducted the analysis using Social Metrics, a big data analysis tool of Daum Communications. We analyzed public perceptions of the keyword "Artificial Intelligence" over one month as of May 19, 2018. The results of the big data analysis are as follows. First, the top related search keyword for "Artificial Intelligence" was found to be technology (4,122). This study suggests theoretical implications based on the results.
