• Title/Abstract/Keywords: Datasets as Topic

33 search results (processing time: 0.022 s)

A Public Open Civil Complaint Data Analysis Model to Improve Spatial Welfare for Residents - A Case Study of Community Welfare Analysis in Gangdong District -

  • 신동윤
    • 한국BIM학회 논문집 / Vol. 13, No. 3 / pp. 39-47 / 2023
  • This study aims to introduce a model for enhancing community well-being through the utilization of public open data. To objectively assess abstract notions of residential satisfaction, text data from complaints is analyzed. By leveraging accessible public data, costs related to data collection are minimized. Initially, relevant text data containing civic complaints is collected and refined by removing extraneous information. This processed data is then combined with meaningful datasets and subjected to topic modeling, a text mining technique. The insights derived are visualized using Geographic Information System (GIS) and Application Programming Interface (API) data. The efficacy of this analytical model was demonstrated in the Godeok/Gangil area. The proposed methodology allows for comprehensive analysis across time, space, and categories. This flexible approach involves incorporating specific public open data as needed, all within the overarching framework.

Comparing Corporate and Public ESG Perceptions Using Text Mining and ChatGPT Analysis: Based on Sustainability Reports and Social Media

  • 최재훈;양성병;윤상혁
    • 지능정보연구 / Vol. 29, No. 4 / pp. 347-373 / 2023
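  • Recently, the importance of ESG (Environmental, Social, and Governance) management, which drives sustainable corporate growth, has been emphasized. Accordingly, this study aims to empirically reveal the differences in ESG perception between companies and the general public, and to identify the negative public opinion that hinders the implementation of ESG policies, along with its background. To this end, LDA (Latent Dirichlet Allocation) topic modeling, JST (Joint Sentiment Topic Modeling), and semantic network analysis were used to analyze the main keywords and topics in sustainability reports and social media, as well as the connections among them. In addition, ChatGPT was used to complement the results of the text mining analysis. The analysis confirmed considerable differences between companies and the general public in how ESG is perceived and how much importance is placed on it. Specifically, companies sought to build trust by focusing on crisis management, transparent governance, and ethical management, whereas negative keywords such as 'greenwashing', 'serious industrial accidents', and 'boycott' appeared frequently on social networks, indicating that much of the public is skeptical of how companies handle ESG issues. This study is significant in that it provides guidelines that can help companies, government agencies, customers, and investors formulate ESG strategies.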
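The semantic network analysis mentioned in this abstract can be illustrated with a keyword co-occurrence network. The following is a minimal sketch, not the authors' pipeline: it assumes documents are already tokenized into keywords, and the function names `cooccurrence_edges` and `weighted_degree` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(docs):
    """Count how many documents each unordered keyword pair appears in together."""
    edges = Counter()
    for tokens in docs:
        # Use each keyword once per document; sort so (a, b) is a canonical key.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum the edge weights attached to each keyword (a simple centrality score)."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


if __name__ == "__main__":
    # Toy, made-up documents; real input would be tokenized posts or reports.
    docs = [
        ["esg", "greenwashing", "boycott"],
        ["esg", "greenwashing"],
        ["esg", "governance"],
    ]
    edges = cooccurrence_edges(docs)
    print(edges.most_common(3))
    print(weighted_degree(edges).most_common())
```

Keywords with a high weighted degree sit at the center of the network. Such a score is one simple way to surface terms that co-occur with many others, under the assumption that co-occurrence within a document is a reasonable proxy for a semantic link.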

Mining High Utility Sequential Patterns Using Sequence Utility Lists

  • 박종수
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 7, No. 2 / pp. 51-62 / 2018
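  • High utility sequential pattern mining is regarded as an important research topic in data mining. Several algorithms have been proposed for this problem, but they run into the growing search space of high utility sequential pattern mining. A tighter utility upper bound on a sequence can prune more unpromising patterns early in the search. This paper proposes a new utility upper bound, the sequence expected utility (SEU), which is the maximum expected utility of a sequence and its descendant sequences. To maintain the information essential for mining high utility sequential patterns, a sequence utility list for each pattern is used as a new data structure. We propose High Sequence Utility List-Span (HSUL-Span), an algorithm that uses SEU to find high utility sequential patterns. Experimental results on synthetic and real datasets from different domains show that HSUL-Span generates considerably fewer candidate patterns and outperforms other algorithms in execution time.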
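The abstract does not give the formula for SEU, so the sketch below does not implement it or HSUL-Span. It illustrates only the two standard notions that such bounds build on: the utility of a pattern in a quantitative sequence (the maximum over its occurrences) and the looser sequence-weighted utility (SWU) upper bound. The data layout and the names `pattern_utility`, `sequence_utility`, and `swu` are assumptions made for illustration.

```python
from typing import Dict, List, Optional

QSequence = List[Dict[str, int]]  # each itemset maps item -> quantity
Pattern = List[List[str]]         # ordered itemsets, item names only


def pattern_utility(pattern: Pattern, qseq: QSequence,
                    profit: Dict[str, int]) -> Optional[int]:
    """Maximum utility over all occurrences of `pattern` in `qseq`, or None."""
    # best[k] = best utility of matching pattern[:k] with the itemsets seen so far.
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for itemset in qseq:
        # Go from long prefixes to short ones so that one q-itemset is
        # matched to at most one pattern itemset.
        for k in range(len(pattern), 0, -1):
            if best[k - 1] is None:
                continue
            if all(item in itemset for item in pattern[k - 1]):
                gain = sum(itemset[i] * profit[i] for i in pattern[k - 1])
                candidate = best[k - 1] + gain
                if best[k] is None or candidate > best[k]:
                    best[k] = candidate
    return best[-1]


def sequence_utility(qseq: QSequence, profit: Dict[str, int]) -> int:
    """Total utility of every item in the sequence."""
    return sum(q * profit[i] for itemset in qseq for i, q in itemset.items())


def swu(pattern: Pattern, db: List[QSequence], profit: Dict[str, int]) -> int:
    """Sequence-weighted utility: sum of utilities of sequences containing the pattern."""
    return sum(sequence_utility(s, profit) for s in db
               if pattern_utility(pattern, s, profit) is not None)


if __name__ == "__main__":
    profit = {"a": 5, "b": 2, "c": 1}  # made-up unit profits
    db = [
        [{"a": 2, "b": 1}, {"c": 3}, {"a": 1, "c": 1}],
        [{"b": 2}, {"a": 1}, {"c": 2}],
    ]
    pattern = [["a"], ["c"]]
    print([pattern_utility(pattern, s, profit) for s in db])
    print(swu(pattern, db, profit))
```

Because SWU counts every item of each supporting sequence, it never falls below the pattern's true utility, but it can be far above it. A tighter bound of the kind the abstract describes would allow a pattern and its extensions to be pruned earlier against the same minimum-utility threshold.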

Identification of User Preference Factor Using Review Information

  • 송성전;심지영
    • 정보관리학회지 / Vol. 39, No. 3 / pp. 311-336 / 2022
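  • To identify the preference factors that influence book recommendations by book users in a library information service environment, this study conducted a content analysis of review data from Goodreads, a social cataloging service built on the participation of book users worldwide. To capture user preferences in finer detail, sub-datasets were constructed by rating group, by book, and by user during sample selection, and stratified sampling was performed based on the topic modeling results of the review text so that diverse topics were evenly reflected. As a result, a total of 90 preference-factor concepts belonging to seven categories ('Content', 'Character', 'Writing', 'Reading', 'Author', 'Story', and 'Form') were identified. The study also characterized the general preference factors that emerge by rating, as well as those that emerge for books and users with sharply divided opinions. By clarifying the specific aspects of user preference factors, the results are expected to contribute to more refined recommendations in future recommender systems.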
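The topic-based stratified sampling step could look roughly like the sketch below. It assumes each review already carries a dominant-topic label from a topic model and that every topic contributes the same number of reviews. The abstract does not state the allocation rule, so equal allocation and the name `stratified_sample` are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_sample(reviews, topic_of, per_topic, seed=0):
    """Draw up to `per_topic` reviews from each topic stratum, reproducibly."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for review in reviews:
        strata[topic_of(review)].append(review)
    sample = []
    for topic in sorted(strata):  # fixed order keeps the result deterministic
        members = strata[topic]
        # A stratum smaller than the quota contributes all of its members.
        sample.extend(rng.sample(members, min(per_topic, len(members))))
    return sample


if __name__ == "__main__":
    # Made-up reviews with a precomputed dominant topic id.
    reviews = [{"id": i, "topic": i % 3} for i in range(30)]
    picked = stratified_sample(reviews, lambda r: r["topic"], per_topic=4)
    print([(r["id"], r["topic"]) for r in picked])
```

Sampling per stratum rather than from the pooled reviews keeps rare topics from being crowded out by frequent ones, which matches the stated goal of reflecting diverse topics evenly.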

Improvement of Accuracy for Human Action Recognition by Histogram of Changing Points and Average Speed Descriptors

  • Vu, Thi Ly;Do, Trung Dung;Jin, Cheng-Bin;Li, Shengzhe;Nguyen, Van Huan;Kim, Hakil;Lee, Chongho
    • Journal of Computing Science and Engineering / Vol. 9, No. 1 / pp. 29-38 / 2015
  • Human action recognition has recently become an important research topic in computer vision because of its many real-world applications, such as video surveillance, video retrieval, video analysis, and human-computer interaction. The goal of this paper is to evaluate descriptors that have recently been used in action recognition, namely Histogram of Oriented Gradient (HOG) and Histogram of Optical Flow (HOF). This paper also proposes two new descriptors: Histogram of Changing Points (HCP), which represents the change of points within each part of a human body caused by actions, and Average Speed (AS), which measures the average speed of actions. The descriptors are combined to build a strong descriptor that represents human actions by modeling information about appearance, local motion, and changes in each part of the body, as well as motion speed. The effectiveness of these new descriptors is evaluated in experiments on the KTH and Hollywood datasets.

Age Estimation via Selecting Discriminated Features and Preserving Geometry

  • Tian, Qing;Sun, Heyang;Ma, Chuang;Cao, Meng;Chu, Yi
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 4 / pp. 1721-1737 / 2020
  • Human apparent age estimation has become a popular research topic and has attracted great attention in recent years due to its wide applications, such as personal security and law enforcement. A large number of age estimation methods have been proposed, among which the models derived through cumulative attribute coding achieve promising performance by preserving the neighbor-similarity of ages. However, these methods ignore the geometric structure of the extracted facial features, even though the geometric structure of data greatly affects prediction accuracy. To this end, we propose an age estimation algorithm that combines feature selection and manifold learning, called Feature-selected and Geometry-preserved Least Square Regression (FGLSR). Compared with other methods, the proposed method not only preserves the geometric structure within facial representations but also selects the discriminative features. Moreover, a deep learning extension of FGLSR is proposed, namely the Feature selected and Geometry preserved Neural Network (FGNN). Experiments are conducted on the Morph2 and FG-Net datasets for FGLSR and on the Morph2 dataset for FGNN. The experimental results show that our method achieves the best performance.

LSTM based Network Traffic Volume Prediction

  • 뉘엔양쯔엉;뉘엔반퀴엣;뉘엔휴쥐;김경백
    • 한국정보처리학회:학술대회논문집 / Korea Information Processing Society 2018 Fall Conference / pp. 362-364 / 2018
  • Predicting network traffic volume has recently become a popular topic because it supports tasks such as detecting abnormal network activity and provisioning network services. In particular, predicting the volume of upcoming traffic from a series of recently observed traffic volumes is an interesting and challenging problem. In the past, various techniques have been studied using time series forecasting methods such as moving average and exponential smoothing. In this paper, we propose a network traffic volume prediction method based on a long short-term memory (LSTM) neural network. The proposed method uses the changing rate of the observed traffic volume, the corresponding time window index, and a seasonality factor indicating the changing trend as input features, and predicts the upcoming network traffic. Experimental results with real datasets show that the proposed method works better than other time series forecasting methods in predicting upcoming network traffic.
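For context, the two classical baselines named in this abstract, together with the changing-rate input feature, can be sketched in a few lines. This is an illustrative sketch only: the abstract does not specify window sizes, smoothing factors, or exactly how the rate is computed, so the parameters and the relative-change definition below are assumptions.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or len(series) < window:
        raise ValueError("window must be positive and no longer than the series")
    return sum(series[-window:]) / window


def exponential_smoothing_forecast(series, alpha):
    """Simple exponential smoothing; the final level is the one-step forecast."""
    if not series:
        raise ValueError("series must not be empty")
    level = series[0]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


def change_rates(series):
    """Relative change between consecutive observations (assumed definition)."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]


if __name__ == "__main__":
    traffic = [100.0, 110.0, 99.0, 120.0]  # made-up traffic volumes per window
    print(moving_average_forecast(traffic, window=2))
    print(exponential_smoothing_forecast(traffic, alpha=0.5))
    print(change_rates(traffic))
```

Both baselines react to level changes only with a lag and carry no notion of time of day. This is presumably why the proposed method also feeds the time window index and a seasonality factor to the LSTM.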

DA-Res2Net: a novel Densely connected residual Attention network for image semantic segmentation

  • Zhao, Xiaopin;Liu, Weibin;Xing, Weiwei;Wei, Xiang
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 14, No. 11 / pp. 4426-4442 / 2020
  • As scene segmentation becomes a hot topic in autonomous driving and medical image analysis, researchers are actively trying new methods to improve segmentation accuracy. At present, the main issues in image semantic segmentation are intra-class inconsistency and inter-class indistinction. According to our analysis, the two main causes are the lack of global information and the lack of macroscopic discrimination of objects. In this paper, we propose a Densely connected residual Attention network (DA-Res2Net), which consists of a dense residual network and a channel attention guidance module, to deal with these problems and improve the accuracy of image segmentation. Specifically, to equip the extracted features with stronger multi-scale characteristics, a densely connected residual network is proposed as the feature extractor. Furthermore, to improve the representativeness of each channel feature, we design a Channel-Attention-Guide module that makes the model focus on high-level semantic features and low-level location features simultaneously. Experimental results show that the method performs well on various datasets. Compared with other state-of-the-art methods, the proposed method reaches a mean IoU of 83.2% on PASCAL VOC 2012 and 79.7% on the Cityscapes dataset.
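The mean IoU figure reported here is the per-class intersection over union averaged across classes. A minimal sketch of that metric on flattened label maps follows. Skipping classes that appear in neither map is a common convention but an assumption here, and benchmark protocols usually accumulate counts over a whole dataset rather than a single image.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union across classes for two flat label lists."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both maps (assumed convention)
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in either label map")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Made-up six-pixel example with three classes.
    pred = [0, 0, 1, 1, 2, 2]
    target = [0, 1, 1, 1, 2, 0]
    print(mean_iou(pred, target, num_classes=3))
```

In this toy example the per-class IoUs are 1/3, 2/3, and 1/2, so a class that is segmented poorly pulls the mean down regardless of how many pixels it covers.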

A Study on the Evaluation Differences of Korean and Chinese Users in Smart Home App Services through Text Mining based on the Two-Factor Theory: Focus on Trustness

  • 조욱녕;임규건
    • 한국IT서비스학회지 / Vol. 22, No. 3 / pp. 141-165 / 2023
  • With the advent of the fourth industrial revolution, technologies such as the Internet of Things, artificial intelligence, and cloud computing are developing rapidly, and smart homes enabled by these technologies are quickly gaining popularity. To gain a competitive advantage in the global market, companies must understand the differences in consumer needs across countries and cultures and develop corresponding business strategies. Therefore, this study conducts a comparative analysis of consumer reviews of smart homes in South Korea and China. This study collected online reviews of SmartThings, ThinQ, Msmarthom, and MiHome, the four most commonly used smart home apps in Korea and China. The collected review data is divided into satisfied reviews and dissatisfied reviews according to the ratings, and topics are extracted from each review dataset using LDA topic modeling. Next, the extracted topics are classified according to five evaluation factors proposed by previous studies: Perceived Usefulness, Reachability, Interoperability, Trustness, and Product Brand. Then, by comparing the importance of each evaluation factor in the satisfaction and dissatisfaction datasets, we identify the factors that affect consumer satisfaction and dissatisfaction and compare the differences between users in Korea and China. We found that Trustness and Reachability are very important factors. Finally, through language network analysis, the relationships among dissatisfaction factors are analyzed at a more microscopic level, and improvement plans are proposed to the companies based on the analysis results.

Learning Similarity with Probabilistic Latent Semantic Analysis for Image Retrieval

  • Li, Xiong;Lv, Qi;Huang, Wenting
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 9, No. 4 / pp. 1424-1440 / 2015
  • It is challenging to retrieve the intended images from a large number of candidates. Content-based image retrieval (CBIR) is the most promising way to tackle this problem, where the most important topic is measuring the similarity of images so as to cover variance in shape, color, pose, illumination, etc. While previous works have made significant progress, their ability to adapt to a dataset has not been fully explored. In this paper, we propose a similarity learning method based on a probabilistic generative model, namely probabilistic latent semantic analysis (PLSA). It first derives a Fisher kernel, a function over the parameters and variables, based on PLSA. Then, the parameters are determined by simultaneously maximizing the log-likelihood function of PLSA and the retrieval performance over the training dataset. The main advantages of this work are twofold: (1) deriving a similarity measure based on PLSA that fully exploits the data distribution and Bayes inference; (2) learning model parameters by simultaneously maximizing the fit of the model to the data and the retrieval performance. The proposed method (PLSA-FK) is empirically evaluated on three datasets, and the results exhibit promising performance.