• Title/Summary/Keyword: ranking algorithm

Development of an Automated ESG Document Review System using Ensemble-Based OCR and RAG Technologies

  • Eun-Sil Choi
    • Journal of the Korea Society of Computer and Information / v.29 no.9 / pp.25-37 / 2024
  • This study proposes a novel automation system that integrates Optical Character Recognition (OCR) and Retrieval-Augmented Generation (RAG) technologies to enhance the efficiency of the ESG (Environmental, Social, and Governance) document review process. The proposed system improves text recognition accuracy by applying an ensemble model-based image preprocessing algorithm and hybrid information extraction models in the OCR process. Additionally, the RAG pipeline optimizes information retrieval and answer generation reliability through the implementation of layout analysis algorithms, re-ranking algorithms, and ensemble retrievers. The system's performance was evaluated using certificate images from online portals and corporate internal regulations obtained from various sources, such as the company's websites. The results demonstrated an accuracy of 93.8% for certification reviews and 92.2% for company regulations reviews, indicating that the proposed system effectively supports human evaluators in the ESG assessment process.
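
The abstract does not say how the ensemble retriever merges its result lists. One common way to combine several retrievers is reciprocal rank fusion (RRF), sketched below as an illustration only, not as the paper's method. The document ids, the two retriever names, and the constant `k=60` are assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document receives 1 / (k + rank) from every list it appears in;
    documents ranked highly by several retrievers accumulate the most score.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical outputs of a keyword retriever and a vector retriever.
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Documents found by both retrievers (`doc_a`, `doc_b`) rise above those found by only one, which is the usual motivation for fusing retrievers before a re-ranking step.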

Real-time Fall Accident Prediction using Random Forest in IoT Environment (사물인터넷 환경에서 랜덤포레스트를 이용한 실시간 낙상 사고 예측)

  • Chan-Woo Bang;Bong-Hyun Kim
    • Journal of Internet of Things and Convergence / v.10 no.4 / pp.27-33 / 2024
  • As of 2023, the domestic construction industry recorded 26,829 accident victims, second only to the "other businesses" (service industries) category. By accident type across all industries, the two largest categories were both fall-related (29,229 people and 14,357 people). Based on these data, this study attaches sensors to hard hats and insoles to predict the fall accidents that frequently occur at construction sites, and proposes smart safety equipment that applies a random forest algorithm to the data collected from these sensors. The random forest model can detect fall accidents in real time with high accuracy by generating multiple decision trees and combining the predictions of each tree. The model classifies whether a worker has had a fall accident, and the type of behavior, from data collected by the MPU-6050 sensor attached to the hard hat. Falls first detected from the hard hat are then checked a second time with the sensors attached to the insole, which increases prediction accuracy. This is expected to enable rapid response in the event of an accident, thereby reducing worker deaths and accidents.
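
The idea of "generating multiple decision trees and combining the predictions of each tree" can be sketched in a few lines. This is a simplified illustration, not the paper's model: each "tree" is a one-split decision stump, the two features (peak acceleration in g, tilt angle in degrees) and all readings are hypothetical, and bootstrap samples that contain only one class are redrawn to keep the example small.

```python
import random
from collections import Counter


def fit_stump(samples, feature):
    """Pick the threshold on one feature that minimises training errors."""
    values = sorted({x[feature] for x, _ in samples})
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        for above in (0, 1):  # label predicted when the value exceeds the threshold
            errors = sum(
                (above if x[feature] > threshold else 1 - above) != y
                for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return feature, threshold, above


def fit_forest(samples, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = len(samples[0][0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw with replacement (redraw if only one class was drawn).
        boot = [rng.choice(samples) for _ in samples]
        while len({y for _, y in boot}) < 2:
            boot = [rng.choice(samples) for _ in samples]
        # Each stump is trained on one randomly chosen feature.
        forest.append(fit_stump(boot, rng.randrange(n_features)))
    return forest


def predict(forest, x):
    """Majority vote over all stumps: 1 = fall, 0 = normal activity."""
    votes = Counter(above if x[f] > t else 1 - above for f, t, above in forest)
    return votes.most_common(1)[0][0]


# Hypothetical (peak acceleration [g], tilt angle [deg]) readings with labels.
samples = [
    ((1.0, 5.0), 0), ((1.1, 8.0), 0), ((1.2, 12.0), 0), ((1.3, 10.0), 0),
    ((2.6, 60.0), 1), ((3.1, 75.0), 1), ((2.9, 80.0), 1), ((3.4, 70.0), 1),
]
forest = fit_forest(samples)
print(predict(forest, (1.05, 6.0)), predict(forest, (3.0, 72.0)))
```

A production system would more likely use full decision trees from an established library; the stump version only shows the bootstrap-and-vote structure.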

Recommender system using BERT sentiment analysis (BERT 기반 감성분석을 이용한 추천시스템)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems / v.27 no.2 / pp.1-15 / 2021
  • When a decision is difficult, we ask friends or people around us for advice. When we buy products online, we read anonymous reviews before purchasing. With the advent of the data-driven era, advances in IT are generating large amounts of data from individuals and objects alike. Companies and individuals have accumulated, processed, and analyzed so much data that they can now make and act on decisions that used to depend on experts. Recommender systems now play a vital role in determining users' purchase preferences, and web services (Facebook, Amazon, Netflix, YouTube) use them to induce clicks. For example, YouTube's recommender system, which is used by 1 billion people worldwide every month, draws on the videos that users "like" and the videos they have watched. Recommender system research is closely linked to business practice, so many researchers are interested in building better solutions. Recommender systems generate recommendations from information obtained from their users, because they need information on the items each user is likely to prefer. Through recommender systems, we have come to trust patterns and rules derived from data rather than empirical intuition. The growth in data volume has driven machine learning toward deep learning. However, recommender systems are not a complete solution. They require data that is sufficient in quantity and not sparse, as well as detailed information about the individual, and they work correctly only when these conditions are met. When the interaction log is insufficient, recommendation becomes a difficult problem for both consumers and sellers: sellers need to make recommendations at a personal level, while consumers need to receive appropriate recommendations based on reliable data.
In this paper, to improve the accuracy of "appropriate recommendations" to consumers, we propose a recommender system combined with context-based deep learning. The research combines user-based data to create a hybrid recommender system. The hybrid approach is not a purely collaborative recommender system but a collaborative extension that integrates user data with deep learning. Customer review data were used as the data set. Consumers buy products in online shopping malls and then write product reviews. Because ratings and reviews come from buyers who have already purchased the product, they give other users confidence before purchasing. However, recommender systems mainly use scores or ratings rather than review text to suggest items purchased by many users. In fact, consumer reviews contain product opinions and the user sentiment expressed in the evaluation. By incorporating these elements, this paper aims to improve the recommender system. The proposed algorithm is intended for situations in which individuals have difficulty selecting an item. Consumer reviews and purchase-record patterns make it possible to rely on recommendations appropriately. The algorithm implements the recommender system through collaborative filtering. Predictive accuracy is measured by Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE). Netflix strategically uses its recommender system through competitions that reduce RMSE every year, making good use of predictive accuracy. Research on hybrid recommender systems that combine NLP approaches, deep learning, and other techniques for personalized recommendation has been increasing. Among NLP studies, sentiment analysis began to take shape in the mid-2000s as user review data increased.
Sentiment analysis is a text classification task based on machine learning. Conventional machine learning-based sentiment analysis has the disadvantage that it struggles to capture the information expressed in a review, because it is difficult to take the characteristics of the text into account. In this study, we propose a deep learning recommender system that uses BERT-based sentiment analysis to reduce these disadvantages. The comparison models were recommender systems based on Naive-CF (collaborative filtering), SVD (singular value decomposition)-CF, MF (matrix factorization)-CF, BPR-MF (Bayesian personalized ranking matrix factorization)-CF, LSTM, CNN-LSTM, and GRU (Gated Recurrent Units). In the experiments, the BERT-based recommender system performed best.
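
The two accuracy measures named in the abstract, RMSE and MAE, are straightforward to compute. The sketch below uses made-up ratings purely to show the arithmetic; it does not reproduce any result from the paper.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error: penalises large errors more heavily."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error: average size of the error, ignoring its sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical true ratings and model predictions on a 1-5 scale.
actual = [5, 3, 4, 1]
predicted = [4, 3, 5, 3]

print(rmse(actual, predicted))  # sqrt(6 / 4), about 1.22
print(mae(actual, predicted))   # 4 / 4 = 1.0
```

Because RMSE squares each error before averaging, the single two-point miss dominates it, while MAE weights all errors linearly.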

The Analysis of Competitiveness in Container Ports of Shanghai and North China & Korea Using Inverse Relation of Fuzzy Evaluation and Scenario Analysis (퍼지 역평가법과 시나리오 분석을 통한 상하이 및 북중국과 우리나라 컨테이너항만의 경쟁력분석에 관한 연구)

  • Ryu, Hyung-Geun;Lee, Hong-Girl;Yeo, Ki-Tae
    • Journal of Korean Society of Transportation / v.22 no.7 s.78 / pp.49-59 / 2004
  • To become a hub port in Northeast Asia, the Chinese government has invested intensively in port development. The scale of this development is significantly larger than that of comparable projects in Korea and Japan. As a result, China is beginning to threaten Korean ports, especially Busan port, which is also trying to become a hub port in Northeast Asia. For this reason, many recent studies have evaluated the competitiveness of Korean ports, especially Busan and Gwangyang, against Chinese ports. The implications of this previous research have mainly been based on evaluating port competitiveness with methodologies such as AHP (Analytical Hierarchy Process) and HFP (Hierarchical Fuzzy Process). However, because these evaluation algorithms only rank ports by competitiveness level, the critical weak points affecting current port competitiveness could not be clearly identified from their results; no algorithm existed that could extract critical points from the evaluation results. The aim of this paper is to present the critical points that affect port competitiveness, using an algorithm based on IRFE (Inverse Relation of Fuzzy Evaluation) together with scenario analysis applied to previous port competitiveness evaluation results. The research scope covers seven major ports of Korea and China (Busan, Gwangyang, Shanghai, Qingdao, Tianjin, Dalian and Kaohsiung). The analysis found that the critical weak point of Busan port is the level of its hinterland, including the availability of a free trade zone.

A Study on the Effect of Network Centralities on Recommendation Performance (네트워크 중심성 척도가 추천 성능에 미치는 영향에 대한 연구)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.23-46 / 2021
  • Collaborative filtering, which is often used for personalized recommendation, is recognized as a very useful technique for finding similar customers and recommending products to them based on their purchase history. However, because traditional collaborative filtering calculates similarity from direct connections and common features among customers, it has difficulty calculating similarity for new customers or products. For this reason, hybrid techniques were designed that also use content-based filtering. Meanwhile, efforts have been made to solve these problems by applying the structural characteristics of social networks. This approach calculates the similarity between two customers indirectly, through the similar customers placed between them: a customer network is created from purchasing data, and the similarity between two customers is calculated from the features of the network that indirectly connects them. Such similarity can be used as a measure to predict whether the target customer will accept a recommendation. The centrality metrics of networks can be used to calculate these similarities. Different centrality metrics are important in that they may affect recommendation performance differently. Furthermore, the effect of these centrality metrics on recommendation performance may vary depending on the recommender algorithm. In addition, recommendation techniques using network analysis can be expected to increase recommendation performance not only for new customers or products but also for all customers and products. By treating a customer's purchase of an item as a link generated between the customer and the item on the network, the prediction of user acceptance of a recommendation is solved as a prediction of whether a new link will be created between them.
Because classification models suit the binary problem of whether a link is formed or not, decision tree, k-nearest neighbors (KNN), logistic regression, artificial neural network, and support vector machine (SVM) were selected for this research. The performance evaluation used order data collected from an online shopping mall over four years and two months. The first three years and eight months of records were organized into the social network, and the following four months' records were used to train and evaluate the recommender models. Experiments with the centrality metrics applied to each model show that the recommendation acceptance rates of the centrality metrics differ for each algorithm at a meaningful level. In this work, we analyzed only four commonly used centrality metrics: degree centrality, betweenness centrality, closeness centrality, and eigenvector centrality. Eigenvector centrality records the lowest performance in all models except support vector machines. Closeness centrality and betweenness centrality show similar performance across all models. Degree centrality ranks in the middle across all models, while betweenness centrality always ranks higher than degree centrality. Finally, closeness centrality is characterized by distinct differences in performance according to the model. It ranks first in logistic regression, artificial neural network, and decision tree, with numerically high performance. However, it records very low rankings in support vector machine and k-nearest neighbors, with low performance levels. As the experiment results reveal, in a classification model, network centrality metrics over a subnetwork that connects the two nodes can effectively predict the connectivity between two nodes in a social network. Furthermore, each metric performs differently depending on the classification model type.
This result implies that choosing appropriate metrics for each algorithm can lead to higher recommendation performance. In general, betweenness centrality can guarantee a high level of performance in any model. Closeness centrality could additionally be introduced to obtain higher performance for certain models.
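
Two of the four centrality metrics studied, degree and closeness centrality, can be computed with a short breadth-first search. The sketch below uses a small hypothetical customer network and assumes the graph is undirected and connected; how the paper turns these values into classifier features is not described in the abstract and is not reproduced here.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly linked to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """(n - 1) divided by the sum of shortest-path distances to all other nodes.

    Assumes a connected, undirected, unweighted graph.
    """
    n = len(graph)
    result = {}
    for source in graph:
        distance = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        result[source] = (n - 1) / sum(distance.values())
    return result


# Hypothetical customer network: an edge means two customers bought a common item.
graph = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
print(degree["B"], closeness["B"])  # 0.75 and 0.8
```

Customers B and D are the best connected on both measures, while the peripheral customers A and E score lowest, which is the kind of structural signal the study feeds into its classifiers.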

Automatic Recommendation of (IP)TV programs based on A Rank Model using Collaborative Filtering (협업 필터링을 이용한 순위 정렬 모델 기반 (IP)TV 프로그램 자동 추천)

  • Kim, Eun-Hui;Pyo, Shin-Jee;Kim, Mun-Churl
    • Journal of Broadcast Engineering / v.14 no.2 / pp.238-252 / 2009
  • Due to the rapid increase in available content brought about by the convergence of broadcasting and the internet, efficient access to personally preferred content has become an important issue. In this paper, a recommendation scheme for TV programs using a collaborative filtering technique is studied. To recommend the TV programs a user prefers, the proposed scheme consists of offline and online computation. In the offline computation, we propose implicitly inferring each user's preferences for TV programs in terms of program contents, genres and channels, and clustering users by their genre and channel preferences with a dynamic fuzzy clustering method. After an active user logs in, the online computation aims to recommend TV programs with high accuracy in three steps: finding users similar to the active user with a similarity measure based on the active user's standard preference list, filtering out those TV programs watched by the similar users that do not exist in the EPG, and ranking the remaining TV programs with the proposed rank model. In particular, the BM (Best Match) algorithm is extended so that the recommended TV programs are ranked according to the user's preferences. The experimental results show that the proposed scheme with the extended BM model yields a prediction accuracy of 62.1% in top-five recommendations for the TV watching history of 2,441 people.
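
The abstract does not give the details of its extended BM model. As background, the sketch below shows the widely used Okapi BM25 scoring function (with the "+1" inverse-document-frequency variant), applied to hypothetical TV programs described by genre and channel terms. The program names, terms, and the parameters `k1` and `b` are assumptions for illustration, not the paper's rank model.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Score each document (a list of terms) against the query with BM25."""
    n_docs = len(documents)
    avg_len = sum(len(terms) for terms in documents.values()) / n_docs
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(t for terms in documents.values() for t in set(terms))
    scores = {}
    for name, terms in documents.items():
        term_freq = Counter(terms)
        score = 0.0
        for term in query_terms:
            if term not in term_freq:
                continue
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            tf = term_freq[term]
            length_norm = 1 - b + b * len(terms) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores[name] = score
    return scores


# Hypothetical programs and a hypothetical preference profile used as the query.
programs = {
    "news_9": ["news", "kbs", "politics"],
    "drama_x": ["drama", "sbs", "romance"],
    "sports_live": ["sports", "kbs", "football"],
}
preferences = ["sports", "kbs"]

scores = bm25_scores(preferences, programs)
ranked = sorted(scores, key=lambda name: (-scores[name], name))
print(ranked)
```

The program matching both preferred terms ranks first, the one matching only the more common term second, and the one matching neither last.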

Collision Risk Assessment by using Hierarchical Clustering Method and Real-time Data (계층 클러스터링과 실시간 데이터를 이용한 충돌위험평가)

  • Vu, Dang-Thai;Jeong, Jae-Yong
    • Journal of the Korean Society of Marine Environment & Safety / v.27 no.4 / pp.483-491 / 2021
  • The identification of regional collision risks in water areas is significant for the safety of navigation. This paper introduces a new method of collision risk assessment that incorporates a distance-based clustering method, hierarchical clustering, and uses real-time data. For situations with several surrounding vessels, it applies a group methodology and a preliminary assessment to classify vessels and establish the basis for collision risk evaluation (called HCAAP processing). The vessels are clustered using the hierarchical program to obtain clusters of encounter vessels, and the clusters are combined with the preliminary assessment to filter out relatively safe vessels. Subsequently, the distance at the closest point of approach (DCPA) and time to the closest point of approach (TCPA) between encounter vessels within each cluster are calculated and related to the collision risk index (CRI). The mathematical relationship of CRI for each cluster of encounter vessels with DCPA and TCPA is constructed using a negative exponential function. Operators can easily evaluate the safety of all vessels navigating in the defined area using the calculated CRI. Therefore, this framework can improve the safety and security of vessel traffic and reduce the loss of life and property. To illustrate the effectiveness of the proposed framework, an experimental case study was conducted within the coastal waters of Mokpo, Korea. The results demonstrated that the framework was effective and efficient in detecting and ranking collision risk indexes between encounter vessels within each cluster, which allowed an automatic risk prioritization of encounter vessels for further investigation by operators.
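
DCPA and TCPA follow from the relative position and velocity of two vessels moving at constant velocity. The sketch below computes both with the standard closest-point-of-approach formulas and then applies one possible negative exponential CRI. The exact CRI formula in the paper is not given in the abstract, so the form `exp(-(DCPA/d_scale + TCPA/t_scale))`, the scale parameters, and the choice to return zero risk for vessels already past their closest point are all assumptions.

```python
import math


def cpa(own_pos, own_vel, target_pos, target_vel):
    """Return (DCPA, TCPA) for two vessels on straight, constant-speed tracks."""
    rx, ry = target_pos[0] - own_pos[0], target_pos[1] - own_pos[1]  # relative position
    vx, vy = target_vel[0] - own_vel[0], target_vel[1] - own_vel[1]  # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        # Same velocity: the separation never changes.
        return math.hypot(rx, ry), 0.0
    tcpa = -(rx * vx + ry * vy) / speed_sq
    dcpa = math.hypot(rx + vx * tcpa, ry + vy * tcpa)
    return dcpa, tcpa


def collision_risk_index(dcpa, tcpa, d_scale=1.0, t_scale=1.0):
    """Hypothetical negative exponential CRI in (0, 1]; larger means riskier."""
    if tcpa < 0:
        return 0.0  # assumption: closest point already passed, vessels diverging
    return math.exp(-(dcpa / d_scale + tcpa / t_scale))


# Own ship heading north at 10 kn from the origin; target 5 nm east and 5 nm
# north, heading west at 10 kn. Positions in nautical miles, time in hours.
dcpa, tcpa = cpa((0.0, 0.0), (0.0, 10.0), (5.0, 5.0), (-10.0, 0.0))
cri = collision_risk_index(dcpa, tcpa)
print(dcpa, tcpa, cri)
```

In this example both vessels reach the same point after half an hour, so DCPA is zero and the CRI depends only on the remaining time.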

An Influence Value Algorithm based on Social Network in Knowledge Retrieval Service (지식검색 서비스에서의 소셜 네트워크 기반 영향력 지수 알고리즘)

  • Choi, Chang-Hyun;Park, Gun-Woo;Lee, Sang-Hoon
    • Journal of the Korea Society of Computer and Information / v.14 no.10 / pp.43-53 / 2009
  • Knowledge retrieval services that use collective intelligence, characterized by an open structure and the sharing of accumulated data, are gaining popularity. However, finding what users actually need within massive public knowledge is getting harder. Recently, search results from Google, which is known for its sophisticated algorithm, rank collective intelligence sources such as Wikipedia and Yahoo Q/A at the top. The objective of this paper is to show that most answers come from people and to find the most influential people in online knowledge retrieval services. To this end, this paper proposes an influence value calculation algorithm that analyzes user relations as centrality in a social network built on user activeness and reliance in Naver 지식iN (Knowledge iN). The influence value calculated by the proposed algorithm will serve as an important index for identifying reliable users who are right for a given question, by ranking the users who provide solutions in the knowledge retrieval service. This will contribute to search satisfaction by helping users acquire the right information and knowledge, which is the most important objective of a knowledge retrieval service.

Analysis of Football Fans' Uniform Consumption: Before and After Son Heung-Min's Transfer to Tottenham Hotspur FC (국내 프로축구 팬들의 유니폼 소비 분석: 손흥민의 토트넘 홋스퍼 FC 이적 전후 비교)

  • Choi, Yeong-Hyeon;Lee, Kyu-Hye
    • Journal of Intelligence and Information Systems / v.26 no.3 / pp.91-108 / 2020
  • Korea's famous soccer players are steadily performing well in international leagues, which has raised Korean fans' interest in those leagues. Reflecting this growing interest, the study examined overall consumer perception of uniform consumption among domestic soccer fans and compared the changes in perception following player transfers. In particular, the paper examined the consumer perceptions and purchase factors of soccer fans as shown in social media, focusing on the periods before and after Heung-Min Son's transfer to Tottenham Hotspur FC of the English Premier League. To this end, the paper used "EPL uniform" as the collection keyword, collected consumer postings from domestic websites and social media via Python 3.7, and analyzed them using the Ucinet 6, NodeXL 1.0.1, and SPSS 25.0 programs. The results of this study can be summarized as follows. First, the uniform of the club that consistently topped the league has been gaining attention as a popular uniform, and players' performance and position have been identified as key factors in the purchase of and search for professional football uniforms. For clubs, the actual ranking and whether the club won the league are shown to be important factors in the purchase of and search for professional soccer uniforms. The club's emblem and the sponsor logo attached to the uniform are also factors of interest to consumers. In addition, in the purchase decision process of professional soccer fans, the uniform's form, marking, authenticity, and sponsors are found to be more important than price, design, size, and logo. The official online store has emerged as the major purchasing channel, followed by gifts for friends or requests to acquaintances traveling to the United Kingdom.
Second, a classification of key categories through convergence of iterated correlations (CONCOR) analysis and the Clauset-Newman-Moore clustering algorithm shows differences in how individual groups are classified, but the groups that include EPL club and player keywords are identified as the key topics in relation to professional football uniforms. Third, between 2002 and 2006, the central themes for professional football uniforms were the World Cup and the English Premier League, but from 2012 to 2015 the focus shifted to greater interest in domestic and international players in the English Premier League, and from this time on the subject changed to the uniform itself. In this context, the paper confirms that the major issues regarding professional soccer uniforms have changed since Ji-Sung Park's transfer to Manchester United and the good performances of Sung-Yong Ki, Chung-Yong Lee, and Heung-Min Son in these leagues. The paper also found that the uniforms of the clubs to which the players transferred are of interest. Fourth, both male and female consumers are showing increasing interest in the English Premier League, to which Son's club Tottenham belongs. In particular, the increasing interest in Son has tended to increase female consumers' interest in football uniforms. This study adds to the variety of research on sports consumption and has value as a consumer study in that it identifies unique consumption patterns. It is also meaningful in that the accuracy of the interpretation was enhanced by using cluster analysis via CONCOR and the Clauset-Newman-Moore clustering algorithm to identify the main topics. Based on the results of this study, clubs will be able to maximize their profits and maintain good relationships with fans by identifying the key drivers of consumer awareness and purchasing among professional soccer fans and establishing effective marketing strategies.
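
The Clauset-Newman-Moore algorithm mentioned above greedily merges groups of nodes so as to increase a quantity called modularity. The sketch below computes only that quantity for a given partition of a small keyword network; it does not implement the greedy merging, and the keywords and edges are hypothetical rather than taken from the study's data.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities of (internal edges / m) - (total degree / 2m)^2,
    where m is the number of edges.
    """
    m = len(edges)
    community_of = {
        node: index for index, group in enumerate(communities) for node in group
    }
    internal = [0] * len(communities)    # edges with both ends in the community
    degree_sum = [0] * len(communities)  # sum of node degrees in the community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(e / m - (d / (2 * m)) ** 2 for e, d in zip(internal, degree_sum))


# Hypothetical keyword co-occurrence network: two triangles joined by one edge.
edges = [
    ("tottenham", "son"), ("son", "epl"), ("tottenham", "epl"),
    ("uniform", "sponsor"), ("sponsor", "emblem"), ("uniform", "emblem"),
    ("epl", "uniform"),
]
split = [{"tottenham", "son", "epl"}, {"uniform", "sponsor", "emblem"}]
single = [{"tottenham", "son", "epl", "uniform", "sponsor", "emblem"}]

q_split = modularity(edges, split)    # about 0.357
q_single = modularity(edges, single)  # 0.0
print(q_split, q_single)
```

Splitting the network into its two dense groups scores higher than keeping everything in one group, which is the signal a modularity-based clustering uses to decide where topic boundaries lie.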

Estimation of Family Variation and Genetic Parameter for Growth Traits of Pacific Abalone, Haliotis discus hannai on the 3th Generation of Selection (선발 3세대 북방전복의 성장형질에 대한 가계변이 및 유전모수 추정)

  • Park, Jong-Won;Park, Choul-Ji;Lee, Jeong-Ho;Noh, Jae-Koo;Kim, Hyun-Chul;Hwang, In-Joon;Kim, Sung-Yeon
    • The Korean Journal of Malacology / v.29 no.4 / pp.325-334 / 2013
  • The purpose of this paper is to compare and analyze family variations in growth-related traits of Pacific abalone, Haliotis discus hannai. Genetic parameters and breeding values were estimated using all measurement data, namely shell length, shell width, and total weight as 18-month-old growth traits, from 5,334 individuals of the selected third generation of Pacific abalone produced in 2011. Family variations were examined for 865 individuals from the 10 families with the largest numbers of individuals. The overall phenotypic means of the 18-month-old Pacific abalone investigated in this study were 54.5 mm for shell length, 36.8 mm for shell width, and 21.3 g for total weight. The coefficient of variation of total weight was 51.0%, higher than the 21.1% for shell length and 20.7% for shell width. Family effects differed significantly between families (p < 0.05), and the heritabilities of shell length, shell width, and total weight were medium, at 0.370, 0.382, and 0.367, respectively. Family selection is therefore considered more advantageous than individual selection. To investigate the distribution and ranking of individuals in the 10 largest families, the estimated breeding values for shell length and total weight were converted into standardized breeding values. Based on shell length, 152 individuals fell in the upper 5.4% and 8 in the lower 5.4%. For total weight, 164 individuals fell in the upper 5.4% and 1 in the lower 5.4%. These results confirm diverse phenotypic and genetic variation between families. If the estimated genetic parameters and breeding values of the population are used properly in selection and mating to produce the next generation, greater breeding effects can be expected.
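
Two of the quantities in this abstract are simple to work through. The coefficient of variation is the standard deviation divided by the mean, so the reported mean and CV imply a standard deviation; and a standardized breeding value is a breeding value expressed in standard deviations from the mean. The sketch below shows both. The breeding values are made up, and the use of the population standard deviation is an assumption, since the abstract does not say how the standardization was done.

```python
import statistics


def implied_sd(mean, cv_percent):
    """Standard deviation implied by a mean and a coefficient of variation (%)."""
    return mean * cv_percent / 100.0


def standardize(values):
    """Convert values to z-scores: (value - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population standard deviation
    return [(v - mean) / sd for v in values]


# Reported in the abstract: total weight mean 21.3 g with CV 51.0%,
# shell length mean 54.5 mm with CV 21.1%.
print(implied_sd(21.3, 51.0))  # about 10.9 g
print(implied_sd(54.5, 21.1))  # about 11.5 mm

# Hypothetical estimated breeding values for five individuals.
breeding_values = [1.0, 2.0, 3.0, 4.0, 5.0]
z_scores = standardize(breeding_values)
print(z_scores)
```

Standardizing puts shell length and total weight on a common scale, which is what allows individuals to be ranked and counted in the upper and lower tails for each trait.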