• Title/Summary/Keyword: 텍스트 연구


Development of an Automated ESG Document Review System using Ensemble-Based OCR and RAG Technologies

  • Eun-Sil Choi
    • Journal of the Korea Society of Computer and Information / v.29 no.9 / pp.25-37 / 2024
  • This study proposes a novel automation system that integrates Optical Character Recognition (OCR) and Retrieval-Augmented Generation (RAG) technologies to enhance the efficiency of the ESG (Environmental, Social, and Governance) document review process. The proposed system improves text recognition accuracy by applying an ensemble model-based image preprocessing algorithm and hybrid information extraction models in the OCR process. Additionally, the RAG pipeline optimizes information retrieval and answer generation reliability through the implementation of layout analysis algorithms, re-ranking algorithms, and ensemble retrievers. The system's performance was evaluated using certificate images from online portals and corporate internal regulations obtained from various sources, such as the company's websites. The results demonstrated an accuracy of 93.8% for certification reviews and 92.2% for company regulations reviews, indicating that the proposed system effectively supports human evaluators in the ESG assessment process.

Collaborative Filtered Enhanced Recommendation System Using BERT (BERT를 이용한 협업 필터링 강화 추천 시스템)

  • Jin-Bae Kim;Young-Gon Kim;Jung-Min Park
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.24 no.5 / pp.61-67 / 2024
  • In recent years, artificial intelligence and deep learning technologies have advanced significantly, and the transformer-based BERT model has been recognized for its excellent contextual understanding in natural language processing. This capability has the potential to take traditional recommendation systems to the next level. In this study, we combine collaborative filtering with a deep learning model to improve the performance of recommendation systems. Specifically, we implemented a system that uses BERT to analyze the sentiment of user reviews, embeds users based on these review sentiments, and finds and recommends users with similar tastes. We also used Elasticsearch, an open-source search engine, for fast search and retrieval of recommended results. Analyzing users' textual data to increase the accuracy and personalization of recommendations is expected to play an important role in improving the user experience on various online services.
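
The idea of embedding users by review sentiment and then looking up users with similar tastes can be illustrated with a minimal sketch. It is not the authors' implementation: the sentiment scores are hard-coded stand-ins for BERT output, the item and user names are hypothetical, cosine similarity is an assumed similarity measure, and the Elasticsearch retrieval step is left out.

```python
import math

ITEMS = ["item_a", "item_b", "item_c", "item_d"]

# Hypothetical sentiment scores in [-1, 1], one per reviewed item. In the
# described system these would come from a BERT sentiment model; here they
# are hard-coded so the example runs without any model.
REVIEW_SENTIMENT = {
    "alice": {"item_a": 0.9, "item_b": -0.7, "item_c": 0.8},
    "bob": {"item_a": 0.8, "item_b": -0.6, "item_d": 0.9},
    "carol": {"item_a": -0.9, "item_b": 0.7, "item_c": -0.5},
}


def embed(user):
    """Represent a user as a vector of review sentiments (0.0 if unreviewed)."""
    return [REVIEW_SENTIMENT[user].get(item, 0.0) for item in ITEMS]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_users(target, k=1):
    """Return the k users whose sentiment vectors are closest to the target's."""
    target_vec = embed(target)
    others = [u for u in REVIEW_SENTIMENT if u != target]
    others.sort(key=lambda u: cosine(target_vec, embed(u)), reverse=True)
    return others[:k]


def recommend(target, k=1):
    """Suggest items that similar users reviewed positively and the target has not reviewed."""
    seen = set(REVIEW_SENTIMENT[target])
    recommendations = []
    for neighbor in similar_users(target, k):
        for item, score in REVIEW_SENTIMENT[neighbor].items():
            if item not in seen and score > 0 and item not in recommendations:
                recommendations.append(item)
    return recommendations
```

With this toy data, `similar_users("alice")` selects `bob`, whose likes and dislikes match Alice's, and `recommend("alice")` suggests the item Bob reviewed positively that Alice has not reviewed.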

Influence analysis of Internet buzz to corporate performance : Individual stock price prediction using sentiment analysis of online news (온라인 언급이 기업 성과에 미치는 영향 분석 : 뉴스 감성분석을 통한 기업별 주가 예측)

  • Jeong, Ji Seon;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.37-51 / 2015
  • With the development of internet technology and the rapid growth of internet data, many studies are examining how to use and analyze such data for various purposes. In recent years in particular, a number of studies have applied text mining techniques to overcome the limitations of relying on structured data alone. Among them, many studies address sentiment analysis, which scores opinions based on the polarity distribution (positive or negative) of the words or sentences in a document. As part of this line of research, this study attempts to predict the rise and fall of companies' stock prices by performing sentiment analysis on online news about those companies. A wide variety of company news is produced online by different economic agents, and it spreads quickly and is easily accessed on the Internet. Therefore, based on the inefficient market hypothesis, we can expect that news about an individual company can be used to predict fluctuations in its stock price if proper data analysis techniques are applied. However, because companies differ in their areas of business activity, machine-learning-based analysis of text data must consider the characteristics of each company. In addition, since news containing positive or negative information about one company can affect other companies or industries in different ways, stock price prediction needs to be analyzed for each company separately. This study therefore attempted to predict changes in the stock prices of individual companies by applying sentiment analysis to online news data. To this end, we selected top companies in the KOSPI 200 as the subjects of the analysis, and collected and analyzed two years of online news for each company from Naver, a representative domestic search portal.
In addition, considering that the meaning of a word can differ from one economic subject to another, we aimed to improve performance by building a lexicon for each individual company and applying it in the analysis. The results show that prediction accuracy differs by company and averaged 56%. Comparing prediction accuracy across industry sectors, 'energy/chemical', 'consumer goods for living', and 'consumer discretionary' showed relatively higher accuracy than other industries, while sectors such as 'information technology' and 'shipbuilding/transportation' showed lower accuracy. Because only five representative companies were collected for each industry, it is somewhat difficult to generalize, but the results confirm that stock price prediction accuracy differs by industry sector. At the individual company level, companies such as 'Kangwon Land', 'KT & G', and 'SK Innovation' showed relatively higher prediction accuracy than other companies, while companies such as 'Young Poong', 'LG', 'Samsung Life Insurance', and 'Doosan' showed low prediction accuracy of less than 50%. In this paper, we analyzed stock price prediction for individual companies using pre-built company-specific vocabularies in order to take advantage of online news information, with the aim of improving stock price prediction performance. Building on this, future work may increase prediction accuracy by addressing the problem of unnecessary words being added to the sentiment dictionary.

Definition and Division in Intelligent Service Facility for Integrating Management (지능화시설의 통합운영관리를 위한 정의 및 구분에 관한 연구)

  • PARK, Jeong-Woo;YIM, Du-Hyun;NAM, Kwang-Woo;KIM, Jin-Young
    • Journal of the Korean Association of Geographic Information Studies / v.19 no.4 / pp.52-62 / 2016
  • Smart City is urban development for complex problem solving that provides convenience and safety for citizens, and it is a blueprint for future cities. In 2008, the Korean government defined the construction, management, and government support of U-Cities in the legislation, Act on the Construction, Etc. of Ubiquitous Cities (Ubiquitous City Act), which included definitions of terms used in the act. In addition, the Minister of Land, Infrastructure and Transport has established a "ubiquitous city master plan" considering this legislation. The concept of U-Cities is complex, due to the mix of informatization and urban planning. Because of this complexity, the foundation of relevant regulations is inadequate, which is impeding the establishment and implementation of practical plans. Smart City intelligent service facilities are not easy to define and classify, because technology is rapidly changing and includes various devices for gathering and expressing information. The purpose of this study is to complement the legal definition of the intelligent service facility, which is necessary for integrated management and operation. The related laws and regulations on U-City were analyzed using text-mining techniques to identify insufficient legal definitions of intelligent service facilities. Using data gathered from interviews with officials responsible for constructing U-Cities, this study identified problems generated by implementing intelligent service facilities at the field level. This strategy should contribute to improved efficiency management, the foundation for building integrated utilization between departments. Efficiencies include providing a clear concept for establishing five-year renewable plans for U-Cities.

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research preferences for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately determine customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who judge the positive, neutral, or negative meanings of frequently used words in the collected text documents. To research preferences for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after a pre-processing step such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the total numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that carry positive or negative meaning about the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary of the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information about other aspects such as 'car quality', 'car performance', and 'car service'. Such information enables customers to make good choices when they purchase brand-new vehicles.
In addition, automobile makers can determine the preferences and the positive and negative points for new models on the market, and can use the sentiment analysis to improve the models' weak points in the near future. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because the sentiment analysis ignores aspects, customers and car makers receive only a summary containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) Since the same word generally has different meanings across domains, a sentiment lexicon suited to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address these problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly.
Furthermore, by reinforcing topic semantics, we can improve the accuracy of product reputation mining beyond that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
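
The lexicon-based baseline that this abstract calls the existing approach (polarity classification, positive and negative ratios, top-k sentences) can be sketched as follows. This is only an illustration under stated assumptions: the tiny lexicon, the sample sentences, and the word-count scoring rule are hypothetical, and the stemming and stop-word removal steps are skipped.

```python
# Hypothetical domain lexicon: +1 for positive words, -1 for negative words.
LEXICON = {
    "great": 1, "comfortable": 1, "smooth": 1,
    "noisy": -1, "poor": -1, "expensive": -1,
}

# Hypothetical review sentences used only to exercise the functions.
REVIEWS = [
    "The ride is smooth and the seats are comfortable.",
    "Great design.",
    "The engine is noisy and the service is poor.",
    "It arrived on Tuesday.",
]


def sentence_score(sentence):
    """Sum the lexicon polarities of the words in one sentence."""
    tokens = sentence.lower().replace(".", " ").replace(",", " ").split()
    return sum(LEXICON.get(token, 0) for token in tokens)


def polarity_ratios(sentences):
    """Return (positive ratio, negative ratio) over all sentences."""
    if not sentences:
        return 0.0, 0.0
    scores = [sentence_score(s) for s in sentences]
    positive = sum(1 for score in scores if score > 0)
    negative = sum(1 for score in scores if score < 0)
    return positive / len(sentences), negative / len(sentences)


def top_k(sentences, k, positive=True):
    """Return the k sentences with the strongest positive (or negative) score."""
    scored = [(sentence_score(s), s) for s in sentences]
    if positive:
        picked = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
    else:
        picked = sorted((p for p in scored if p[0] < 0), key=lambda p: p[0])
    return [sentence for _, sentence in picked[:k]]
```

On the sample data, two of the four sentences score positive and one scores negative. The sketch is not split by aspect, which is the limitation the abstract raises; an aspect-aware variant would run the same functions on the sentences assigned to each aspect.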

Analyzing the Issue Life Cycle by Mapping Inter-Period Issues (기간별 이슈 매핑을 통한 이슈 생명주기 분석 방법론)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.25-41 / 2014
  • Recently, the number of social media users has increased rapidly because of the prevalence of smart devices. As a result, the amount of real-time data has been increasing exponentially, which in turn is generating more interest in using such data to create added value. For instance, several attempts are being made to identify social issues by analyzing the relevant search keywords that are frequently used on new portal sites and the words that are regularly mentioned on various social media. The technique of "topic analysis" is employed to identify topics and themes from a large number of text documents. As one of the most prevalent applications of topic analysis, issue tracking investigates changes in the social issues identified through topic analysis. Traditional issue tracking identifies the main topics of the documents covering the entire period at once and then analyzes the occurrence of each topic by period. However, this traditional approach has two limitations. First, when a new period is added, topic analysis must be repeated for all documents of the entire period, rather than being conducted only on the new documents of the added period. This creates practical limitations in the form of significant time and cost burdens, so the traditional approach is difficult to apply in most applications that need to analyze additional periods. Second, issues are not only created and terminated constantly; one issue can also split into several issues, and multiple issues can merge into a single issue. In other words, each issue has a life cycle that consists of the stages of creation, transition (merging and segmentation), and termination. Existing issue tracking methods do not address the connections and cause-and-effect relationships between these issues.
The purpose of this study is to overcome these two limitations of existing issue tracking methods: the limitation of the analysis method and the lack of consideration of the changeability of issues. Suppose that we perform topic analysis separately for each of multiple periods. It is then essential to map issues of different periods in order to trace issue trends. However, it is not easy to discover connections between issues of different periods because the issues derived for each period are heterogeneous. In this study, to overcome these limitations without analyzing the entire period's documents simultaneously, the analysis is performed independently for each period. In addition, we performed issue mapping to link the identified issues of each period. An integrated approach over the individual periods was presented, and the issue flow of the entire integrated period was depicted. Thus, the entire issue life cycle, including the stages of creation, transition (merging and segmentation), and extinction, is identified and examined systematically, and the changeability of the issues was analyzed. The proposed methodology is highly efficient in terms of time and cost, and it takes the changeability of issues into account. Further, the results of this study can be used to adapt the methodology to practical situations. The potential practical applications of the proposed methodology were analyzed by applying it to actual Internet news. As a result, the proposed methodology was able to extend the period of analysis and follow the progress of each issue's life cycle. This methodology can also facilitate a clearer understanding of complex social phenomena using topic analysis.
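
One way to picture inter-period issue mapping is sketched below. The abstract does not state how issues are linked, so everything here is an assumption for illustration: topics are represented as keyword-weight dictionaries, links are made by cosine similarity, the 0.5 threshold is arbitrary, and the topic names and weights are invented.

```python
import math

# Hypothetical topics from two independently analyzed periods,
# each represented as keyword -> weight.
PREV = {
    "p1:election": {"election": 0.6, "candidate": 0.4},
    "p1:flu": {"flu": 0.7, "vaccine": 0.3},
}
CURR = {
    "p2:election": {"election": 0.5, "vote": 0.5},
    "p2:budget": {"budget": 0.8, "tax": 0.2},
}


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_issues(prev, curr, threshold=0.5):
    """Link each earlier issue to every later issue that is similar enough."""
    return [
        (p, c)
        for p, p_vec in prev.items()
        for c, c_vec in curr.items()
        if cosine(p_vec, c_vec) >= threshold
    ]


def life_cycle_events(prev, curr, threshold=0.5):
    """Label issues as created, terminated, split, or merged from the links."""
    links = map_issues(prev, curr, threshold)
    events = {}
    for p in prev:
        successors = [c for q, c in links if q == p]
        if not successors:
            events[p] = "terminated"
        elif len(successors) > 1:
            events[p] = "split"
    for c in curr:
        predecessors = [p for p, d in links if d == c]
        if not predecessors:
            events[c] = "created"
        elif len(predecessors) > 1:
            events[c] = "merged"
    return events
```

Because each period is analyzed on its own, adding a new period only requires mapping it against the previous one. Issues with exactly one predecessor and one successor simply continue and receive no event label in this sketch.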

A Symbolic Characteristic of Mimetic Words in Published Cartoon: Focusing on Works of Heo, Young Man (허영만의 작품에서 나타난 효과태의 상징어적 특징과 활용)

  • O, Yul Seok;Yoon, Ki Heon
    • Cartoon and Animation Studies / s.30 / pp.169-199 / 2013
  • Among the various directing styles of cartoons, the vertical scroll direction that spread with the popularity of webtoons runs counter to the page direction of conventional published cartoons and has established a new genre. Many studies on published cartoons focus on the cut direction by page, but webtoons have no concept of a page. As the primary medium for cartoon readers has shifted over time from paper to the computer monitor, the characteristics of media have changed and media have gradually diversified. With the growth of mobile use driven by the popularity of smartphones and the spread of tablet PCs in public education, cartoons have become part of this rapidly changing media environment. In this situation, one important and unchanging identity of cartoons is the direction created by the harmony between picture and text. This thesis analyzes the symbolic characteristics and effective value of hyogwatae, the mimetic words of cartoons, focusing on the works of Heo, Young Man. Hyogwatae delivers not only sound but also shape, feeling, and state, and gains significance by invoking the imaginary structure of literature. The strengths of modern Korean, its varied linguistic expressions and syllabic system, let readers sense fine shades of language and differences of emotion and recall memories through direct and indirect experience, which gives hyogwatae its nuance. Because of these characteristics, the representative works of Heo, Young Man combine commercial appeal with an authorial character, have communicated with the public for a long time, and carry extensive knowledge of Korean cartoons. The characteristics of hyogwatae in Heo, Young Man's cartoons contribute more to cartoon expression and delivery than is generally expected. By focusing the study on the symbolic process of language, it was possible to withstand the uncertainty and vague standards of judgment that arise from the breadth of factors in studies of general cartoon direction.
Moreover, through Heo, Young Man's deeply considered direction of hyogwatae, readers enjoy the process of actively and intellectually inferring the connections between pictures and sentences. In this process, hyogwatae is the device that stimulates the imagination more than pictures, effects, or dialogue. It is the reader's device for active participation, and its strength lies in its symbolic structure.

An Interface Technique for Avatar-Object Behavior Control using Layered Behavior Script Representation (계층적 행위 스크립트 표현을 통한 아바타-객체 행위 제어를 위한 인터페이스 기법)

  • Choi Seung-Hyuk;Kim Jae-Kyung;Lim Soon-Bum;Choy Yoon-Chul
    • Journal of KIISE:Software and Applications / v.33 no.9 / pp.751-775 / 2006
  • In this paper, we suggest an avatar control technique using high-level behaviors. We separated behaviors into three levels according to their level of abstraction and defined layered scripts. Layered scripts give the user control over avatar behaviors at the abstract level and make scripts reusable. As the 3D environment becomes more complicated, the number of required avatar behaviors increases accordingly, and controlling avatar-object behaviors becomes even more challenging. To solve this problem, we embed avatar behaviors into each environment object, which specifies how the avatar can interact with the object. Even with a large number of environment objects, our system can manage avatar-object interactions in an object-oriented manner. Finally, we suggest an easy-to-use user interface technique that allows the user to control avatars through context menus. Using the avatar behavior information embedded in the object, the system analyzes the object state and filters the behaviors. As a result, the context menu shows the behaviors that the avatar can perform, and the user does not need to keep track of the object or avatar state. We built a virtual presentation environment and applied our model to it.

Discussions about Expanded Fests of Cartoons and Multimedia Comics as Visual Culture: With a Focus on New Technologies (비주얼 컬처로서 만화영상의 확장된 장(場, fest)에 대한 논의: 뉴 테크놀로지를 중심으로)

  • Lee, Hwa-Ja;Kim, Se-Jong
    • Cartoon and Animation Studies / s.28 / pp.1-25 / 2012
  • The rapid digitalization across all aspects of society since 1990 led to the digitalization of cartoons. As the medium of cartoons moved from paper to the web, a powerful visual culture emerged. An encounter between cartoons and multimedia technologies has helped cartoons evolve into a video culture. Today cartoons are no longer literate culture. It is critical to pay attention to cartoons as an "expanded fest" and as visual and video culture with much broader significance. In this paper, the investigator set out to diagnose the current position of cartoons changing in the rapidly changing digital age and talk about future directions that they should pursue. Thus she discussed cases of changes from 1990 when colleges began to provide specialized education for cartoons and animation to the present day when cartoon and Multimedia Comics fests exist in addition to the digitalization of cartoons. The encounter between new technologies and cartoons broke down the conventional forms of cartoons. The massive appearance of artists that made active use of new technologies in their works, in particular, has facilitated changes to the content and forms of cartoons and the expansion of character uses. The development of high technologies extends influence to the roles of appreciators beyond the artists' works. Today readers voice their opinions about works actively, build a fan base, promote the works and artists they favor, and help them rise to stardom. As artist groups of various genres were formed, the possibilities of new stories and texts and the appearance of diverse styles and world views have expanded the essence of cartoon texts and the overall cartoon system of cartoon culture, industry, education, institution, and technology. It is expected that cartoons and Multimedia Comics will continue to make a contribution as a messenger to reflect the next generation of culture, mediate it, and communicate with it. 
Today there is no longer a distinction between print and video cartoons. Given current developments such as installation concept cartoons, blockbuster digital videos, fancy items, and narrative-based characters at theme parks, cartoons will expand into every field through a wide range of forms and styles. It is therefore necessary to diversify cartoon and Multimedia Comics education. Educators today face the task of nurturing future talent who can lead a culture of the overall senses grounded in both literate and video culture, by incorporating humanities, social studies, and new technology education into their creative artistic abilities.

A Characteristics of Visual Narrative Expression in Garden Design - Focused on the Taehwagang Garden Show 2018 - (정원디자인에 나타난 시각적 서술의 표현특성 - 2018 태화강 정원박람회 작품을 대상으로 -)

  • Kwon, Jin-Wook
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.37 no.3 / pp.108-118 / 2019
  • Appreciating gardens at a garden show means appreciating not only the beauty of nature but also the concepts and ideas of the artists that are embedded in the garden as a formative art. This study aims to understand the artists' intentions in visual expression through formative media in gardens by assessing the structure of visual narrative in space for 20 of the artworks presented at the 2018 Taehwagang Garden Show. The formative structure is delivered as content and form through formative media and formative language. Thus, to analyze the artworks, the researcher assessed the expressive characteristics of the media, through visual and spatial language, that form the formative structure in the context of the narrative structure expressed in the gardens. The findings are as follows. First, for intertextuality obtained through media images, most of the artworks delivered their message through 'figure image.' This means that the concept is delivered as 'affinity of actual objects' through the media, and associated 'meaning and meaning action' are expected. Second, the characteristics of the signs that show symbolism in the gardens were categorized into 'icon', 'index', and 'symbol'. The results showed that most of the artworks expressed common characteristics between image and meaning using 'icon' and 'symbol'. Third, as space formation components based on formative principles, the components with 'dominant' and 'subordinate' roles were identified as the key components for delivering meaning. It was also found that 'space configuration with overlapped image' and 'space configuration with transparency' were adopted to strengthen conceptual layers. Fourth, the space occupation types were mostly the central hall type, corridor type, and passage space type; for the open space type, the entire space was conceptualized rather than a particular object. The circulation lines were, in order of frequency, the circular type, pass type, and return type.
The study of the expressive characteristics of visual narrative in garden design is meaningful in that it can provide base data for spatial design methods that visually develop concepts in the future.