• Title/Summary/Keyword: 뉴스웹

Search Result 171, Processing Time 0.021 seconds

Automatic Document Classification Using Multiple Classifier Systems (다중 분류기 시스템을 이용한 자동 문서 분류)

  • Kim, In-Cheol
    • The KIPS Transactions:PartB
    • /
    • v.11B no.5
    • /
    • pp.545-554
    • /
    • 2004
  • Combining multiple classifiers to obtain improved performance over the individual classifier has been a widely used technique. The task of constructing a multiple classifier system(MCS) contains two different Issues how to generate a diverse set of base-level classifiers and how to combine their predictions. In this paper, we review the characteristics of existing multiple classifier systems : Bagging, Boosting, and Slaking. For document classification, we propose new MCSs such as Stacked Bagging, Stacked Boosting, Bagged Stacking, Boosted Stacking. These MCSs are a sort of hybrid MCSs that combine advantages of existing MCSs such as Bugging, Boosting, and Stacking. We conducted some experiments of document classification to evaluate the performances of the proposed schemes on MEDLINE, Usenet news, and Web document collections. The result of experiments demonstrate the superiority of our hybrid MCSs over the existing ones.

TK-Indexing : An Indexing Method for SNS Data Based on NoSQL (TK-Indexing : NoSQL 기반 SNS 데이터 색인 기법)

  • Shim, Hyung-Nam;Kim, Jeong-Dong;Seol, Kwang-Soo;Baik, Doo-Kwon
    • The KIPS Transactions:PartD
    • /
    • v.19D no.4
    • /
    • pp.271-280
    • /
    • 2012
  • Currently, contents generated by SNS services are increasing exponentially, as the number of SNS users increase. The SNS is commonly used to post personal status and individual interests. Also, the SNS is applied in socialization, entertainment, product marketing, news sharing, and single person journalism. As SNS services became available on smart phones, the users of SNS services can generate and spread the social issues and controversies faster than the traditional media. The existing indexing methods for web contents have limitation in terms of real-time indexing for SNS contents, as they usually focus on diversity and accuracy of indexing. To overcome this problem, there are real-time indexing techniques based on RDBMSs. However, these techniques suffer from complex indexing procedures and reduced indexing targets. In this regard, we introduce the TK-Indexing method to improve the previous indexing techniques. Our method indexes the generation time of SNS contents and keywords by way of NoSQL to indexing SNS contents in real-time.

k-Bitmap Clustering Method for XML Data based on Relational DBMS (관계형 DBMS 기반의 XML 데이터를 위한 k-비트맵 클러스터링 기법)

  • Lee, Bum-Suk;Hwang, Byung-Yeon
    • The KIPS Transactions:PartD
    • /
    • v.16D no.6
    • /
    • pp.845-850
    • /
    • 2009
  • Use of XML data has been increased with growth of Web 2.0 environment. XML is recognized its advantages by using based technology of RSS or ATOM for transferring information from blogs and news feed. Bitmap clustering is a method to keep index in main memory based on Relational DBMS, and which performed better than the other XML indexing methods during the evaluation. Existing method generates too many clusters, and it causes deterioration of result of searching quality. This paper proposes k-Bitmap clustering method that can generate user defined k clusters to solve above-mentioned problem. The proposed method also keeps additional inverted index for searching excluded terms from representative bits of k-Bitmap. We performed evaluation and the result shows that the users can control the number of clusters. Also our method has high recall value in single term search, and it guarantees the searching result includes all related documents for its query with keeping two indices.

TRIB : A Clustering and Visualization System for Responding Comments on Blogs (TRIB: 블로그 댓글 분류 및 시각화 시스템)

  • Lee, Yun-Jung;Ji, Jung-Hoon;Woo, Gyun;Cho, Hwan-Gue
    • The KIPS Transactions:PartD
    • /
    • v.16D no.5
    • /
    • pp.817-824
    • /
    • 2009
  • In recent years, Weblog has become the most typical social media for citizens to share their opinions. And, many Weblogs reflect several social issues. There are many internet users who actively express their opinions for internet news or Weblog articles through the replying comments on online community. Hence, we can easily find internet blogs including more than 10 thousand replying comments. It is hard to search and explore useful messages on weblogs since most of weblog systems show articles and their comments to the form of sequential list. In this paper, we propose a visualizing and clustering system called TRIB (Telescope for Responding comments for Internet Blog) for a large set of responding comments for a Weblog article. TRIB clusters and visualizes the replying comments considering their contents using pre-defined user dictionary. Also, TRIB provides various personalized views considering the interests of users. To show the usefulness of TRIB, we conducted some experiments, concerning the clustering and visualizing capabilities of TRIB, with articles that have more than 1,000 comments.

A study on the User Experience at Unmanned Checkout Counter Using Big Data Analysis (빅데이터를 활용한 편의점 간편식에 대한 의미 분석)

  • Kim, Ae-sook;Ryu, Gi-hwan;Jung, Ju-hee;Kim, Hee-young
    • The Journal of the Convergence on Culture Technology
    • /
    • v.8 no.4
    • /
    • pp.375-380
    • /
    • 2022
  • The purpose of this study is to find out consumers' perception and meaning of convenience store convenience food by using big data. For this study, NNAVER and Daum analyzed news, intellectuals, blogs, cafes, intellectuals(tips), and web documents, and used 'convenience store convenience food' as keywords for data search. The data analysis period was selected as 3 years from January 1, 2019 to December 31, 2021. For data collection and analysis, frequency and matrix data were extracted using TEXTOM, and network analysis and visualization analysis were conducted using the NetDraw function of the UCINET 6 program. As a result, convenience store convenience foods were clustered into health, diversity, convenience, and economy according to consumers' selection attributes. It is expected to be the basis for the development of a new convenience menu that pursues convenience and convenience based on consumers' meaning of convenience store convenience foods such as appropriate prices, discount coupons, and events.

A Study on the Purchasing Factors of Color Cosmetics Using Big Data: Focusing on Topic Modeling and Concor Analysis (빅데이터를 활용한 색조화장품의 구매 요인에 관한 연구: 토픽모델링과 Concor 분석을 중심으로)

  • Eun-Hee Lee;Seung- Hee Bae
    • Journal of the Korean Applied Science and Technology
    • /
    • v.40 no.4
    • /
    • pp.724-732
    • /
    • 2023
  • In this study, we tried to analyze the characteristics of color cosmetics information search and the major information of interest in the color cosmetics market after COVID-19 shown in the text mining analysis results by collecting data on online interest information of consumers in the color cosmetics market after COVID-19. In the empirical analysis, text mining was performed on all documents such as news, blogs, cafes, and web pages, including the word "color cosmetics". As a result of the analysis, online information searches for color cosmetics after COVID-19 were mainly focused on purchase information, information on skin and mask-related makeup methods, and major topics such as interest brands and event information. As a result, post-COVID-19 color cosmetics buyers will become more sensitive to purchase information such as product value, safety, price benefits, and store information through active online information search, so a response strategy is required.

Analysis and Utilization of Housing Information based on Open API and Web Scraping (오픈API와 웹스크래핑에 기반한 주택정보 분석 및 활용방안)

  • Shin-Hyeong Choi
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology
    • /
    • v.17 no.5
    • /
    • pp.323-329
    • /
    • 2024
  • In an era of low interest rates around the world, interest in real estate has increased. We can collect real estate information using the Internet, but it takes a lot of time to find. In this paper, real estate information from January 2015 to April 2024 is collected from three places to help users more easily collect real estate information of interest and use it for sales. First, by analyzing HTML documents using web scraping techniques, information on real estate of interest is automatically extracted from the website of the platform company. Second, the actual transaction price of the real estate is additionally collected through the open API provided by the Ministry of Land, Infrastructure and Transport. Third, real estate-related news is provided so that users can learn about the future value and prospects of real estate. The simulation results for the data collected in this study show that the lowest price predicted by the ARIMA model is expected to be in May 2024 among the next eight months. Therefore, by following this procedure, real estate buyers can make more efficient home sales by referring to related information including the predicted transaction price.

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.3
    • /
    • pp.1-23
    • /
    • 2013
  • To discover significant social issues such as unemployment, economy crisis, social welfare etc. that are urgent issues to be solved in a modern society, in the existing approach, researchers usually collect opinions from professional experts and scholars through either online or offline surveys. However, such a method does not seem to be effective from time to time. As usual, due to the problem of expense, a large number of survey replies are seldom gathered. In some cases, it is also hard to find out professional persons dealing with specific social issues. Thus, the sample set is often small and may have some bias. Furthermore, regarding a social issue, several experts may make totally different conclusions because each expert has his subjective point of view and different background. In this case, it is considerably hard to figure out what current social issues are and which social issues are really important. To surmount the shortcomings of the current approach, in this paper, we develop a prototype system that semi-automatically detects social issue keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting and extracting texts from the collected news articles, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic. Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA shows a set of topic clusters, and then each topic cluster is labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and then a human annotator labels "Unemployment Problem" on Topic1. In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, taking a look at only social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop the matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. For instance, given a set of text documents, we segment each text document to paragraphs. In the meantime, using LDA, we can extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to a topic, indicating that the paragraph best matches the topic. Finally, each topic has several best matched paragraphs. Furthermore, assuming there are a topic (e.g., Unemployment Problem) and the best matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information of the social keyword such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly. Through this prototype system, we have detected various social issues appearing in our society and also showed effectiveness of our proposed methods according to our experimental results. Note that you can also use our proof-of-concept system in http://dslab.snu.ac.kr/demo.html.

An Analysis for Deriving New Convergent Service of Mobile Learning: The Case of Social Network Analysis and Association Rule (모바일 러닝에서의 신규 융합서비스 도출을 위한 분석: 사회연결망 분석과 연관성 분석 사례)

  • Baek, Heon;Kim, Jin Hwa;Kim, Yong Jin
    • Information Systems Review
    • /
    • v.15 no.3
    • /
    • pp.1-37
    • /
    • 2013
  • This study is conducted to explore the possibility of service convergence to promote mobile learning. This study has attempted to identify how mobile learning service is provided, which services among them are considered most popular, and which services are highly demanded by users. This study has also investigated the potential opportunities for service convergence of mobile service and e-learning. This research is then extended to examine the possibility of active convergence of common services in mobile services and e-learning. Important variables have been identified from related web pages of portal sites using social network analysis (SNA) and association rules. Due to the differences in number and type of variables on different web pages, SNA was used to deal with the difficulties of identifying the degree of complex connection. Association analysis has been used to identify association rules among variables. The study has revealed that most frequent services among common services of mobile services and e-learning were Games and SNS followed by Payment, Advertising, Mail, Event, Animation, Cloud, e-Book, Augmented Reality and Jobs. This study has also found that Search, News, GPS in mobile services were turned out to be very highly demanded while Simulation, Culture, Public Education were highly demanded in e-learning. In addition, It has been found that variables involving with high service convergence based on common variables of mobile and e-learning services were Games and SNS, Games and Sports, SNS and Advertising, Games and Event, SNS and e-Book, Games and Community in mobile services while Games, Animation, Counseling, e-Book, being preceding services Simulation, Speaking, Public Education, Attendance Management were turned out be highly convergent in e-learning services. Finally, this study has attempted to predict possibility of active service convergence focusing on Games, SNS, e-Book which were highly demanded common services in mobile and e-learning services. It is expected that this study can be used to suggest a strategic direction to promote mobile learning by converging mobile services and e-learning.

  • PDF

A Quantitative Analysis of Classification Classes and Classified Information Resources of Directory (디렉터리 서비스 분류항목 및 정보자원의 계량적 분석)

  • Kim, Sung-Won
    • Journal of Information Management
    • /
    • v.37 no.1
    • /
    • pp.83-103
    • /
    • 2006
  • This study analyzes the classification schemes and classified information resources of the directory services provided by major web portals to complement keyword-based retrieval. Specifically, this study intends to quantitatively analyze the topic categories, the information resources by subject, and the information resources classified by the topic categories of three directories, Yahoo, Naver, and Empas. The result of this analysis reveals some differences among directory services. Overall, these directories show different ratios of referred categories to original categories depending on the subject area, and the categories regarded as format-based show the highest proportion of referred categories. In terms of the total amount of classified information resources, Yahoo has the largest number of resources. The directories compared have different amounts of resources depending on the subject area. The quantitative analysis of resources classified by the specific category is performed on the class of 'News & Media'. The result reveals that Naver and Empas contain overly specified categories compared to Yahoo, as far as the number of information resources categorized is concerned. Comparing the depth of the categories assigned by the three directories to the same information resources, it is found that, on average, Yahoo assigns one-step further segmented divisions than the other two directories to the identical resources.