• Title/Summary/Keyword: label text


CORRECT? CORECT!: Classification of ESG Ratings with Earnings Call Transcript

  • Haein Lee;Hae Sun Jung;Heungju Park;Jang Hyun Kim
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.4
    • /
    • pp.1090-1100
    • /
    • 2024
  • While incorporating ESG indicators is recognized as crucial for sustainability and increased firm value, inconsistent disclosure of ESG data and vague assessment standards have been key challenges. To address these issues, this study proposes an automated ESG rating strategy based on ambiguous text. Earnings call transcript data were classified as E, S, or G using the more than 450 metrics of the Refinitiv-Sustainable Leadership Monitor. The study employed advanced natural language processing models such as BERT, RoBERTa, ALBERT, FinBERT, and ELECTRA to classify ESG documents precisely. In addition, the authors computed the average predicted probability for each label, providing a means to identify the relative significance of different ESG factors. The experimental results demonstrated the capability of the proposed methodology to enhance the ESG assessment criteria established by various rating agencies and highlighted that companies focus primarily on governance factors; in other words, companies were making efforts to strengthen their governance frameworks. In conclusion, this framework enables sustainable and responsible business by providing insight into the ESG information contained in earnings call transcript data.
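
As a rough illustration of the classification step described above, the sketch below scores transcript passages with a Hugging Face sequence classifier and averages the predicted probabilities per ESG pillar. It is not the authors' code; the checkpoint name, label order, and example passages are illustrative assumptions standing in for a fine-tuned ESG classifier.

```python
# A minimal sketch (not the authors' code) of the E/S/G classification step:
# a fine-tuned sequence classifier scores each transcript passage, and the
# predicted probabilities are averaged per label to gauge the relative
# weight of E, S, and G topics.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"   # stand-in for a fine-tuned ESG classifier
LABELS = ["E", "S", "G"]           # assumed label order

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=3)
model.eval()

passages = [
    "We reduced scope-1 emissions by 12% this quarter.",
    "The board approved a new independent audit committee.",
]

with torch.no_grad():
    batch = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")
    probs = torch.softmax(model(**batch).logits, dim=-1)   # (n_passages, 3)

# Average predicted probability per ESG pillar across all passages.
for label, score in zip(LABELS, probs.mean(dim=0).tolist()):
    print(f"{label}: {score:.3f}")
```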

Automatic Training Corpus Generation Method of Named Entity Recognition Using Knowledge-Bases (개체명 인식 코퍼스 생성을 위한 지식베이스 활용 기법)

  • Park, Youngmin;Kim, Yejin;Kang, Sangwoo;Seo, Jungyun
    • Korean Journal of Cognitive Science
    • /
    • v.27 no.1
    • /
    • pp.27-41
    • /
    • 2016
  • Named entity recognition classifies elements in text into predefined categories and is used in various applications that receive natural language input. In this paper, we propose a method that can generate a named entity training corpus automatically using knowledge bases. We apply two different methods to generate the corpus, depending on the knowledge base. One method attaches named entity labels to text data using Wikipedia. The other crawls data from the web and labels named entities in the web text using Freebase. We conduct two experiments to evaluate corpus quality and our proposed method for automatically generating a named entity recognition corpus. We randomly extract sentences from the two corpora, called the Wikipedia corpus and the Web corpus, and label them to validate both automatically labeled corpora. We also report the performance of a named entity recognizer trained on the corpus generated by our proposed method. The results show that our proposed method adapts well to new corpora that reflect diverse sentence structures and the newest entities.
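
The following toy sketch illustrates the general idea of projecting knowledge-base entries onto raw text to produce a labeled NER corpus without manual annotation. It is not the paper's pipeline; the miniature entity dictionary and the BIO tagging scheme are assumptions made for illustration.

```python
# A minimal sketch, assuming a toy knowledge base: phrases that match
# knowledge-base entries are projected onto the raw text as named-entity
# labels, yielding a training corpus with no manual annotation.
import re

KB = {"Seoul": "LOCATION", "Samsung": "ORGANIZATION", "Sejong": "PERSON"}

def auto_label(sentence):
    """Return (token, BIO-tag) pairs by matching KB entries in the sentence."""
    tokens = sentence.split()
    tags = ["O"] * len(tokens)
    for i, tok in enumerate(tokens):
        word = re.sub(r"\W+$", "", tok)      # strip trailing punctuation
        if word in KB:
            tags[i] = "B-" + KB[word]
    return list(zip(tokens, tags))

print(auto_label("Samsung opened a new campus in Seoul."))
# [('Samsung', 'B-ORGANIZATION'), ..., ('Seoul.', 'B-LOCATION')]
```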


Safety and Effectiveness of Long Acting Injectable Antipsychotic Paliperidone Palmitate Treatment in Schizophrenics : A 24-Week Open-Label Study (조현병 환자에서 장기지속형 항정신병 주사제 팔리페리돈 팔미테이트의 효능과 안전 : 24주 개방형 연구)

  • Kang, Hyun-Ku;Hahm, Woong;Shon, In-Ki;Paik, In-Ho
    • Korean Journal of Biological Psychiatry
    • /
    • v.20 no.3
    • /
    • pp.111-117
    • /
    • 2013
  • Objectives We investigated the effectiveness and safety of paliperidone palmitate, a long-acting injectable antipsychotic, in patients with schizophrenia. Methods This was a 24-week open-label study performed at one center in Korea. Eligible patients with schizophrenia diagnosed by Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, Text Revision (DSM-IV-TR) criteria were enrolled. Patients received long-acting paliperidone palmitate injections (234 mg at baseline; 156 mg at week 1; then flexible dosing once every 4 weeks). Effectiveness was assessed with the Positive and Negative Syndrome Scale (PANSS), the Clinical Global Impression Severity Scale (CGI-S), and the Personal and Social Performance scale (PSP) at baseline, week 1, and every 4 weeks until week 24 or endpoint. Safety was assessed with the Extrapyramidal Symptom Rating Scale (ESRS), body weight (BW), and the incidence of adverse events. Oral antipsychotics were stopped or tapered off within the next 14 days. Results Of the 20 patients recruited, 9 (45%) completed the study. Paliperidone palmitate produced a significant improvement in PANSS total score from baseline to endpoint. The response rate was 75% (mean change ± SD: −25.9 ± 14.4, all p < 0.001). The CGI-S and PSP total scores improved significantly over the 24 weeks (all p < 0.001). Eighty percent of patients reported adverse events; the most common adverse events (≥ 10%) with paliperidone palmitate were anticholinergic adverse events, extrapyramidal symptoms, weight gain, akathisia, insomnia, headache, agitation, anxiety, and gastrointestinal trouble. The change in ESRS score was not statistically significant but tended to improve at the end of the study compared with baseline. Conclusions Our results demonstrated sustained effectiveness and safety of paliperidone palmitate treatment in patients with schizophrenia, and provide both clinicians and patients with a new treatment option that can improve the outcome of long-term therapy. Its effectiveness and safety should be further addressed in future randomized controlled trials.

Restoring Omitted Sentence Constituents in Encyclopedia Documents Using Structural SVM (Structural SVM을 이용한 백과사전 문서 내 생략 문장성분 복원)

  • Hwang, Min-Kook;Kim, Youngtae;Ra, Dongyul;Lim, Soojong;Kim, Hyunki
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.131-150
    • /
    • 2015
  • Omission of noun phrases in obligatory cases is a common phenomenon in Korean and Japanese sentences that is not observed in English. When an argument of a predicate can be filled with a noun phrase co-referential with the title, the argument is even more easily omitted in encyclopedia texts. The omitted noun phrase is called a zero anaphor or zero pronoun. Encyclopedias like Wikipedia are a major source for information extraction by intelligent applications such as information retrieval and question answering systems; however, the omission of noun phrases degrades the quality of information extraction. This paper deals with the problem of developing a system that can restore omitted noun phrases in encyclopedia documents. The problem our system addresses is very similar to zero anaphora resolution, one of the important problems in natural language processing. A noun phrase in the text that can be used for restoration is called an antecedent; an antecedent must be co-referential with the zero anaphor. While the candidates for the antecedent are only noun phrases in the same text in zero anaphora resolution, the title is also a candidate in our problem. In our system, the first stage detects the zero anaphor. In the second stage, antecedent search is carried out over the candidates. If antecedent search fails, an attempt is made in the third stage to use the title as the antecedent. The main characteristic of our system is the use of a structural SVM for finding the antecedent. The noun phrases in the text that appear before the position of the zero anaphor comprise the search space. The main technique in previous work is to perform binary classification over all the noun phrases in the search space, and the noun phrase classified as an antecedent with the highest confidence is selected. In this paper, however, we propose to view antecedent search as the problem of assigning antecedent-indicator labels to a sequence of noun phrases; in other words, sequence labeling is employed in antecedent search. We are the first to suggest this idea. To perform sequence labeling, we use a structural SVM that receives a sequence of noun phrases as input and returns a sequence of labels as output. An output label takes one of two values: one indicating that the corresponding noun phrase is the antecedent and the other indicating that it is not. The structural SVM we used is based on the modified Pegasos algorithm, which exploits a subgradient descent methodology for optimization. To train and test our system, we selected a set of Wikipedia texts and constructed an annotated corpus in which gold-standard answers, such as zero anaphors and their possible antecedents, are provided. Training examples were prepared from the annotated corpus and used to train the SVMs and test the system. For zero anaphor detection, sentences are parsed by a syntactic analyzer and omitted subject or object cases are identified; thus the performance of our system depends on that of the syntactic analyzer, which is a limitation of our system. When an antecedent is not found in the text, our system tries to use the title to restore the zero anaphor, based on binary classification using a regular SVM. The experiment showed that our system's performance is F1 = 68.58%, which means that a state-of-the-art system can be developed with our technique. It is expected that future work enabling the system to utilize semantic information can lead to significant performance improvement.
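
The sketch below illustrates a Pegasos-style subgradient update for a linear structural SVM in the antecedent-search setting, simplified so that exactly one candidate noun phrase per zero anaphor is the antecedent. It is not the authors' implementation; the feature vectors, the 0/1 loss, and the hyperparameters are assumptions.

```python
# A minimal, illustrative Pegasos-style structural SVM: each training example
# is a sequence of candidate noun-phrase feature vectors plus the index of the
# gold antecedent; loss-augmented inference picks the most violated candidate,
# and the weight vector is updated by a subgradient step.
import numpy as np

def psi(X, idx):
    """Joint feature map: the feature vector of the candidate chosen as antecedent."""
    return X[idx]

def loss(idx, gold_idx):
    """0/1 loss on the choice of antecedent."""
    return 0.0 if idx == gold_idx else 1.0

def pegasos_train(examples, n_features, lam=0.01, epochs=10):
    """examples: list of (X, gold_idx) with X of shape (n_candidates, n_features)."""
    w = np.zeros(n_features)
    t = 0
    for _ in range(epochs):
        for X, gold in examples:
            t += 1
            eta = 1.0 / (lam * t)
            # Loss-augmented inference: most violated candidate.
            scores = X @ w + np.array([loss(i, gold) for i in range(len(X))])
            y_hat = int(np.argmax(scores))
            # Subgradient step (moves w only when the margin is violated).
            g = lam * w - (psi(X, gold) - psi(X, y_hat))
            w = w - eta * g
    return w

# Toy usage: two candidates per zero anaphor, three binary features each.
examples = [(np.array([[1., 0., 1.], [0., 1., 0.]]), 0),
            (np.array([[0., 1., 1.], [1., 0., 0.]]), 1)]
w = pegasos_train(examples, n_features=3)
print("learned weights:", w)
```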

A Study on Usability Improvement of Camera Application of Galaxy S7 (갤럭시S7의 카메라 어플리케이션 사용성 개선에 관한연구)

  • Yu, Sung-ho;Lim, Seong-Taek
    • Journal of the Korea Convergence Society
    • /
    • v.8 no.12
    • /
    • pp.249-255
    • /
    • 2017
  • Recently, the camera has become one of the most popular smartphone functions and one of the most influential factors in smartphone purchases. However, the basic camera application of the smartphone has a complicated user environment, which causes many difficulties for first-time users. In this study, the Galaxy S7, the newest model of the Galaxy S series, which is the most widely used series in Korea, was selected, and the usability test of the camera application was limited to the shooting, editing, and sharing functions. As a result, first, improved icon graphics and text labels should be provided together to increase the recognition rate of and attention to the icons. Second, it is necessary to simplify the structure and provide an intuitive interface in order to facilitate access to the various modes and functions. Third, it is necessary to simplify the application by providing personalized, customized menus or functions, because the special functions that are not widely used cause a high failure rate and inconvenience.

Long Song Type Classification based on Lyrics

  • Namjil, Bayarsaikhan;Ganbaatar, Nandinbilig;Batsuuri, Suvdaa
    • Journal of Multimedia Information System
    • /
    • v.9 no.2
    • /
    • pp.113-120
    • /
    • 2022
  • Mongolian folk songs grew out of Mongolian labor songs and are classified into long and short songs. Mongolian long songs have ancient origins, are rich in legends, and are a great source of folklore; accordingly, they were inscribed by UNESCO in 2008. Mongolian written literature formed under the direct influence of oral literature. Mongolian long songs have three classes by their lyrics and structure: ayzam, suman, and besreg. Ayzam long songs embody the philosophical nature of world phenomena and of human life. Suman long songs cover a wide range of topics, such as the common way of life, respect for ancestors, respect for fathers, respect for mountains and water, livestock and animal husbandry, as well as the history of Mongolia. Besreg long songs are dominated by commanded and trained characters. In this paper, we propose a method to classify these three types of long songs using machine learning, based on their lyric structure and without semantic information. We collected the lyrics of over 80 long songs and extracted 11 features from every song. The features are the name of the song, the number of verses, the number of lines, the number of words, the general value, the double value, the elapsed time of a verse, the elapsed time of 5 words, the longest elapsed time of 1 word, the full text, and the type label. In the experimental results, our proposed features achieve an average recognition rate of 78% with function-type machine learning methods in classifying the ayzam, suman, and besreg classes.
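
A minimal sketch of the kind of structural lyric features the paper describes (verse, line, and word counts) fed to a conventional classifier is given below. The feature set, the lyric formatting assumption (verses separated by blank lines), and the choice of a random forest are illustrative, not the authors' exact setup.

```python
# A minimal sketch, under assumed lyric formatting: count-based structural
# features are extracted from each song's lyrics and passed to a standard
# classifier to predict the long-song class.
from sklearn.ensemble import RandomForestClassifier

def lyric_features(lyrics: str):
    verses = [v for v in lyrics.split("\n\n") if v.strip()]
    lines = [l for l in lyrics.splitlines() if l.strip()]
    words = lyrics.split()
    return [len(verses), len(lines), len(words)]

songs = [
    ("verse one line one\nverse one line two\n\nverse two line one", "ayzam"),
    ("short line\n\nanother verse\nwith two lines\n\nthird verse", "besreg"),
]
X = [lyric_features(text) for text, _ in songs]
y = [label for _, label in songs]

clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([lyric_features("new song\nwith a few lines\n\nsecond verse")]))
```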

Vietnamese Syncretism and the Characteristics of Caodaism's Chief Deity: Problematising Đức Cao Đài as a 'Monotheistic' God Within an East Asian Heavenly Milieu

  • HARTNEY, Christopher
    • Journal of Daesoon Thought and the Religions of East Asia
    • /
    • v.1 no.2
    • /
    • pp.41-59
    • /
    • 2022
  • Caodaism is a new religion from Vietnam which began in late 1925 and spread rapidly across the French colony of Indochina. With a broad syncretic aim, the new faith sought to revivify Vietnamese religious traditions whilst also incorporating religious, literary, and spiritist influences from France. Like Catholicism, Caodaism kept a strong focus on its monotheistic nature, and today Caodaists are eager to label their religion a monotheism. It will be argued here, however, that the syncretic nature of this new faith complicates this claim to a significant degree. To make this argument, we will consider the nature of God in Caodaism through two central texts from two important stages in the life of the religion. The first is the canonized Compilation of Divine Messages, which collects a range of spirit messages from God and some other divine voices; these were received in the early years of the faith. The second is a collection of sermons from 1948/9, entitled The Divine Path to Eternal Life, that takes Caodaist believers on a tour of heaven. It will be shown that in the first text, God speaks in the mode of a fully omnipotent and omniscient supreme being. In the second text, however, we are given a view of paradise that is much more akin to the court of a Jade Emperor within an East Asian milieu. In these realms, the personalities of other beings and redemptive mechanisms claim much of our attention and seem to form a competing center of power to that of God. Furthermore, God's consort, the Divine Mother, takes on a range of sacred creative prerogatives that do something similar. Additionally, cadres of celestial administrators (buddhas, immortals, and saints) help with the operation of a cosmos which spins on with guidance from its own laws. These laws form sacred mechanisms, such as cycles of reincarnation and judgement, which operate not in the purview of God but as part of the very nature of the cosmos itself. In this context, the dualistic, polytheistic, and even automatic nature of Caodaism's cosmos will be considered in terms of the way in which it complicates this religion's monotheistic claims. To conclude, this article seeks to demonstrate the precise relevance of the term 'monotheism' for this religion.

The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

  • Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.1-23
    • /
    • 2018
  • Since the start of the 21st century, various high-quality services have emerged with the growth of the internet and information and communication technologies. In particular, the e-commerce industry, in which Amazon and eBay stand out, is growing explosively. As e-commerce grows, customers can easily find what they want to buy while comparing various products, because more products are registered at online shopping malls. However, a problem has arisen with this growth: with so many products registered, it has become difficult for customers to find what they really need in the flood of products. When customers search for desired products with a generalized keyword, too many products come up as results. On the contrary, few products are found if customers type in product details, because concrete product attributes are rarely registered as text. In this situation, automatically recognizing text in images can be a solution. Because the bulk of product details are written in catalogs in image format, most product information cannot be searched with text input in the current text-based search systems. If the information in these images can be converted to text, customers can search for products by product details, which makes shopping more convenient. There are various existing OCR (optical character recognition) programs that can recognize text in images, but they are hard to apply to catalogs because they have trouble recognizing text in certain circumstances, for example when the text is not big enough or the fonts are not consistent. Therefore, this research suggests a way to recognize keywords in catalogs with deep learning algorithms, which have been state of the art in image recognition since the 2010s. The Single Shot MultiBox Detector (SSD), a model credited for its object-detection performance, can be used with its structure redesigned to account for the differences between text and objects. However, the SSD model needs a large amount of labeled training data, because deep learning algorithms of this kind are trained by supervised learning. To collect data, one can try to manually label the location and class of the text in catalogs, but manual collection raises many problems. Some keywords would be missed because humans make mistakes while labeling training data, and collection becomes too time-consuming given the scale of data needed, or too costly if many workers are hired to shorten the time. Furthermore, if specific keywords need to be trained, finding images that contain those words is also difficult. To solve the data issue, this research developed a program that creates training data automatically. The program generates catalog-like images containing various keywords and pictures and saves the location information of the keywords at the same time. With this program, not only can data be collected efficiently, but the performance of the SSD model also improves. The SSD model recorded a recognition rate of 81.99% with 20,000 data samples created by the program. Moreover, this research tested the efficiency of the SSD model with different data configurations to analyze which features of the data influence the performance of recognizing text in images. As a result, the number of labeled keywords, the addition of overlapping keyword labels, the existence of unlabeled keywords, the spacing among keywords, and the differences of background images were found to be related to the performance of the SSD model. This test can lead to performance improvements of the SSD model or of other deep-learning-based text-recognition models through high-quality data. The SSD model redesigned to recognize text in images and the program developed for creating training data are expected to contribute to the improvement of search systems in e-commerce. Suppliers can spend less time registering keywords for products, and customers can search for products by the product details written in the catalog.
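
The sketch below illustrates the automatic training-data idea: keywords are drawn onto a background image and their bounding boxes are recorded at the same time, yielding detector-style labels with no manual annotation. It is a hypothetical stand-in for the authors' program; the file names, font, keyword list, and layout logic are assumptions (Pillow 8+ is assumed for textbbox).

```python
# A minimal sketch (hypothetical, not the authors' program): render keywords
# onto a background image and save their bounding boxes as labels for a
# text-detection model such as SSD.
import json
import random
from PIL import Image, ImageDraw, ImageFont

KEYWORDS = ["cotton", "waterproof", "free shipping"]   # illustrative keyword list

def make_sample(path_img, path_label, size=(600, 800)):
    img = Image.new("RGB", size, "white")              # stand-in for a catalog background
    draw = ImageDraw.Draw(img)
    font = ImageFont.load_default()
    annotations = []
    for word in KEYWORDS:
        x = random.randint(20, size[0] - 200)
        y = random.randint(20, size[1] - 40)
        draw.text((x, y), word, fill="black", font=font)
        x0, y0, x1, y1 = draw.textbbox((x, y), word, font=font)
        annotations.append({"label": word, "bbox": [x0, y0, x1, y1]})
    img.save(path_img)
    with open(path_label, "w") as f:
        json.dump(annotations, f)

make_sample("sample_0.png", "sample_0.json")
```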

Consumers Perceptions on Sodium Saccharin in Social Media (소셜미디어 분석을 통한 삭카린나트륨 소비자 인식 조사)

  • Lee, Sooyeon;Lee, Wonsung;Moon, Il-Chul;Kwon, Hoonjeong
    • Journal of Food Hygiene and Safety
    • /
    • v.30 no.4
    • /
    • pp.329-342
    • /
    • 2015
  • The purpose of this study was to investigate consumers' perceptions of sodium saccharin in social media. Data were collected from Naver blogs and Naver web communities (a representative Korean portal website) and from media reports, including the comment sections on the Yonhap News website (Korea's largest news agency). The results from Naver blogs and Naver web communities showed that posts primarily mentioned 'no added sodium saccharin' products, the properties of sodium saccharin, and methods of reducing sodium saccharin in food. When the media reported the expansion of the food categories permitted to use sodium saccharin, the search volume for sodium saccharin increased in both PC and mobile search engines. The comments below the news articles mainly expressed distrust of the government, criticism of food product prices, and distrust of food companies. Labels of 'no added sodium saccharin' products on the market emphasized "no added sodium saccharin". These results suggest that consumers are interested in sodium saccharin, especially when the media report the expansion of the food categories permitted to use it. Through social media, consumers were able to find various kinds of information on sodium saccharin, except for its safety or acceptable daily intake. Therefore, the media or the competent authority should report on sodium saccharin with information covering safety and acceptable daily intake, based on scientific background and references or interviews with experts, so that consumers can obtain reliable information.

Korean speech recognition using deep learning (딥러닝 모형을 사용한 한국어 음성인식)

  • Lee, Suji;Han, Seokjin;Park, Sewon;Lee, Kyeongwon;Lee, Jaeyong
    • The Korean Journal of Applied Statistics
    • /
    • v.32 no.2
    • /
    • pp.213-227
    • /
    • 2019
  • In this paper, we propose an end-to-end deep learning model that combines a Bayesian neural network with Korean speech recognition. In the past, Korean speech recognition was a complicated task because of the excessive number of parameters across many intermediate steps and the need for Korean linguistic expertise. Fortunately, Korean speech recognition becomes manageable with the aid of recent breakthroughs in end-to-end models. An end-to-end model decodes mel-frequency cepstral coefficients directly into text without any intermediate processes. In particular, the Connectionist Temporal Classification (CTC) loss and attention-based models are kinds of end-to-end models. In addition, we incorporate a Bayesian neural network into the end-to-end model and obtain Monte Carlo estimates. Finally, we carry out experiments on the "WorimalSam" online dictionary dataset and obtain a word error rate of 4.58%, an improved result compared to the Google and Naver APIs.
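
As a rough sketch of the CTC-based end-to-end idea (not the paper's exact model), the code below maps MFCC frames to character probabilities with a recurrent network and trains it with the CTC loss; the layer sizes and alphabet size are illustrative, and the Bayesian (Monte Carlo) component of the paper is omitted.

```python
# A minimal sketch: an RNN maps MFCC frames directly to character
# probabilities and is trained with the CTC loss, with no separate
# pronunciation or alignment step.
import torch
import torch.nn as nn

N_MFCC, N_CHARS = 40, 70    # assumed feature and output-alphabet sizes (incl. blank)

class CTCSpeechModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(N_MFCC, 256, num_layers=2, bidirectional=True, batch_first=True)
        self.out = nn.Linear(512, N_CHARS)

    def forward(self, x):                      # x: (batch, time, N_MFCC)
        h, _ = self.rnn(x)
        return self.out(h).log_softmax(-1)     # (batch, time, N_CHARS)

model = CTCSpeechModel()
ctc = nn.CTCLoss(blank=0)

feats = torch.randn(2, 120, N_MFCC)            # dummy MFCC sequences
targets = torch.randint(1, N_CHARS, (2, 15))   # dummy character indices
log_probs = model(feats).transpose(0, 1)       # CTCLoss expects (time, batch, classes)
loss = ctc(log_probs, targets,
           input_lengths=torch.full((2,), 120, dtype=torch.long),
           target_lengths=torch.full((2,), 15, dtype=torch.long))
loss.backward()
print("CTC loss:", loss.item())
```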