• Title/Summary/Keyword: Multilingual Data Analysis

Multilingual Knowledge Graphs: Challenges and Opportunities

  • Partha Sarathi Mandal;Sukumar Mandal
    • International Journal of Knowledge Content Development & Technology / v.14 no.4 / pp.101-111 / 2024
  • Multilingual Knowledge Graphs (MKGs) have emerged as a crucial component in various natural language processing tasks, enabling efficient representation and use of structured knowledge across multiple languages. Data, information, and knowledge can be drawn from many sectors, such as libraries, archives, and institutional repositories. Variable metadata quality, multilingualism, and semantic diversity make it challenging to create a digital library with a multilingual search facility. To meet these challenges, a framework is needed that integrates structured and unstructured data sources so that databases can be unified and shared. Such integration is managed using linked data and semantic web approaches. In the future, multilingual knowledge graphs are expected to overcome linguistic nuances and technical barriers such as semantic interoperability and data harmonization, and to enhance cooperation and collaboration throughout the world. Through a comprehensive analysis of current state-of-the-art techniques and ongoing research efforts, this paper aims to offer insights into future directions and potential advancements in the field of Multilingual Knowledge Graphs. It describes what a multilingual knowledge graph is and how to build one, and it discusses the challenges and opportunities in designing multilingual knowledge graphs.

A study on the aspect-based sentiment analysis of multilingual customer reviews (다국어 사용자 후기에 대한 속성기반 감성분석 연구)

  • Sungyoung Ji;Siyoon Lee;Daewoo Choi;Kee-Hoon Kang
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.515-528 / 2023
  • With the growth of the e-commerce market, consumers increasingly rely on user reviews to make purchasing decisions. Consequently, researchers are actively studying how to analyze these reviews effectively. Among the various methods of sentiment analysis, aspect-based sentiment analysis, which examines user reviews from multiple angles rather than relying solely on simple positive or negative sentiment, is gaining widespread attention. One methodology for aspect-based sentiment analysis uses transformer-based models, the latest natural language processing technology. In this paper, we conduct an aspect-based sentiment analysis of multilingual user reviews by applying such models to two real datasets. Specifically, we use restaurant data from the SemEval 2016 public dataset and multilingual user review data from the cosmetics domain. We compare the performance of transformer-based models for aspect-based sentiment analysis and apply various methodologies to improve their performance. Models trained on multilingual data are expected to be highly useful because they can analyze multiple languages with one model, without building a separate model for each language.

Improving Elasticsearch for Chinese, Japanese, and Korean Text Search through Language Detector

  • Kim, Ki-Ju;Cho, Young-Bok
    • Journal of information and communication convergence engineering / v.18 no.1 / pp.33-38 / 2020
  • Elasticsearch is an open source search and analytics engine that can search petabytes of data in near real time. It is designed as a distributed system that is horizontally scalable and highly available. It provides RESTful APIs, which makes it programming-language agnostic. Full-text search of multilingual text requires language-specific analyzers and field mappings appropriate for indexing and searching that text. Additionally, a language detector can be used in conjunction with the analyzers to improve multilingual text search. Elasticsearch provides more than 40 language analysis plugins that can process text and extract language-specific tokens, as well as language detector plugins that can determine the language of a given text. This study investigates three approaches to indexing and searching Chinese, Japanese, and Korean (CJK) text (single analyzer, multi-fields, and language detector-based) and identifies the advantages of the language detector-based approach over the other two.
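
    The three approaches named in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the field names (`body`, `body_zh`, `body_ja`, `body_ko`, `lang`) and the script-based `detect_cjk_language` heuristic are hypothetical, and the `smartcn`, `kuromoji`, and `nori` analyzers are assumed to be available through the corresponding Elasticsearch analysis plugins. A real deployment would more likely use a dedicated language detector.

```python
# Hypothetical sketch of CJK indexing strategies; names are illustrative only.

# Assumed analyzer per language (requires the matching analysis plugins).
ANALYZERS = {"zh": "smartcn", "ja": "kuromoji", "ko": "nori"}


def detect_cjk_language(text: str) -> str:
    """Crude script-based guess: Hangul -> ko, kana -> ja, Han only -> zh."""
    has_han = False
    for ch in text:
        code = ord(ch)
        if 0xAC00 <= code <= 0xD7A3 or 0x1100 <= code <= 0x11FF:
            return "ko"          # Hangul syllables or jamo
        if 0x3040 <= code <= 0x30FF:
            return "ja"          # Hiragana or Katakana
        if 0x4E00 <= code <= 0x9FFF:
            has_han = True       # Han ideographs are shared by zh, ja, and ko
    return "zh" if has_han else "unknown"


def multi_field_mapping() -> dict:
    """Multi-fields approach: one source field analyzed several ways."""
    return {
        "properties": {
            "body": {
                "type": "text",
                "analyzer": "cjk",  # built-in bigram analyzer as the default
                "fields": {
                    lang: {"type": "text", "analyzer": analyzer}
                    for lang, analyzer in ANALYZERS.items()
                },
            }
        }
    }


def route_document(text: str) -> dict:
    """Detector-based approach: store text in a language-specific field."""
    lang = detect_cjk_language(text)
    field = f"body_{lang}" if lang in ANALYZERS else "body"
    return {field: text, "lang": lang}
```

    With routing, each document is analyzed once by the analyzer that matches its detected language, whereas the multi-fields mapping analyzes every document with every analyzer, which costs more index space.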

A Method of Analyzing Sentiment Polarity of Multilingual Social Media: A Case of Korean-Chinese Languages (다국어 소셜미디어에 대한 감성분석 방법 개발: 한국어-중국어를 중심으로)

  • Cui, Meina;Jin, Yoonsun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.91-111 / 2016
  • For social media based marketing, it is crucial to perform sentiment analysis on the unstructured data written by potential consumers of a company's products and services. In particular, companies interested in global business must collect and analyze data from social media in multinational settings (e.g., YouTube, Instagram). Because these texts are multilingual, companies usually translate the sentences into a single target language before conducting sentiment analysis. However, because cultural differences are not taken into account and high-quality data dictionaries are lacking, translated sentences often fail to convey the true meaning, which lowers the quality of sentiment analysis. Hence, this study proposes a method for multilingual sentiment analysis that avoids language translation, focusing on Korean-Chinese cases. To show the feasibility of the proposed idea, we compare the performance of the proposed method with that of legacy methods that rely on language translators. The results suggest that our method outperforms the legacy methods in terms of RMSE and can be applied by global business institutions.
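
    For reference, RMSE (root mean squared error), the metric this abstract reports, can be computed as in the short sketch below. The function name `rmse` and the idea of comparing predicted sentiment scores against reference scores are illustrative assumptions; the abstract does not describe the evaluation setup in detail.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length score sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

    A lower RMSE means the predicted scores sit closer to the reference scores on average, with large individual errors penalized more heavily than small ones.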

Identifying Similar Overseas Patent Using Word2Vec-Based Semantic Text Analytics (Word2Vec 학습을 통한 의미 기반 해외 유사 특허 검색 방안)

  • Paek, Minji;Kim, Namgyu
    • Journal of Information Technology Services / v.17 no.2 / pp.129-142 / 2018
  • Recently, the number of patent applications has been increasing rapidly every year as protecting intellectual property rights becomes more important. Patents must be inventive and novel. Novelty, in particular, means that the invention is not the same as a previous invention. To confirm novelty, a prior art search must be conducted before and after the application. The prior art search should cover not only Korean patents but also foreign patents, and searching foreign patents requires multilingual search techniques. However, a naive dictionary-based approach is limited because some technical concepts are expressed with different terms in each nation. For example, a Korean term and a Japanese term may not be dictionary synonyms even though they represent the same technical concept. In this paper, we propose a new method to map semantic similarity between technical terms in Korean patents and Japanese patents. To investigate how each nation represents the same technical concept, we identified and analyzed pairs of patents that are connected by a priority claim relationship. In an experiment with real-world data, we showed that our approach can successfully reveal semantically similar technical terms in another language.

Sentiment analysis of Korean movie reviews using XLM-R

  • Shin, Noo Ri;Kim, TaeHyeon;Yun, Dai Yeol;Moon, Seok-Jae;Hwang, Chi-gon
    • International Journal of Advanced Culture Technology / v.9 no.2 / pp.86-90 / 2021
  • Sentiment refers to a person's thoughts, opinions, and feelings toward an object. Sentiment analysis is the process of collecting opinions on a specific target and classifying them by emotion; it is applied in opinion mining, which analyzes product reviews and other reviews on the web. Companies and users can use it to grasp public opinion and decide how to respond. Recently, natural language processing models using the Transformer structure have appeared, and Google's BERT is a representative example. Various models have since been derived from BERT. Among them, the Facebook AI team unveiled XLM-R (XLM-RoBERTa), an upgraded XLM model. XLM-R addressed the data limitation and the curse of multilinguality by training XLM on 2TB or more of refined CC (CommonCrawl) data rather than Wikipedia data. This model showed that a multilingual model can perform similarly to a single-language model when the model size and the amount of training data are adjusted. Therefore, in this paper, we study the improvement in Korean sentiment analysis obtained with a pre-trained XLM-R model, which addresses the curse of multilinguality and improves performance.

Lesson from the Cataloging Experience on Multicultural Collection (다문화 장서에 대한 목록 구축의 경험과 교훈)

  • Rho, Jee-Hyun
    • Journal of Korean Library and Information Science Society / v.39 no.4 / pp.397-420 / 2008
  • This study discusses the cataloging of multicultural collections in Korean libraries. In particular, it seeks to draw valuable lessons from the cataloging experience at the International Children's Library at Asia School. To this end, (1) a comprehensive literature survey and analysis were conducted to introduce the discussion on cataloging multicultural or multilingual collections, (2) cataloging examples were examined comprehensively (the data were collected from public, academic, and nonofficial libraries in Korea, as well as several libraries in North America), and finally (3) cataloging policies and practices for multicultural collections were suggested on the basis of the experience at the International Children's Library at Asia School.

Web Contents Mining System for Real-Time Monitoring of Opinion Information based on Web 2.0 (웹2.0에서 의견정보의 실시간 모니터링을 위한 웹 콘텐츠 마이닝 시스템)

  • Kim, Young-Choon;Joo, Hae-Jong;Choi, Hae-Gill;Cho, Moon-Taek;Kim, Young-Baek;Rhee, Sang-Yong
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.1 / pp.68-79 / 2011
  • This paper presents an opinion information extraction and analysis system based on Web mining, which uses statistics collected from Web content. Users' opinion information, which is scattered across several websites, can be automatically extracted and analyzed. The system provides an opinion search service that lets users search for positive and negative opinions in real time and check their statistics. Users can also search for and monitor other opinion information in real time by entering keywords into the system. Comparison experiments with other techniques showed that the proposed technique performs well. The evaluation covers three parts: the function that extracts positive and negative opinion information, the dynamic window and tokenizer techniques for multilingual information retrieval, and the technique for extracting exact multilingual phonetic translations. As an application example, experiments were carried out on typical movie review sentences and Wikipedia experiment data, and the results were analyzed.

Analysis of LinkedIn Jobs for Finding High Demand Job Trends Using Text Processing Techniques

  • Kazi, Abdul Karim;Farooq, Muhammad Umer;Fatima, Zainab;Hina, Saman;Abid, Hasan
    • International Journal of Computer Science & Network Security / v.22 no.10 / pp.223-229 / 2022
  • LinkedIn is one of the most widely used job-hunting and career-development applications in the world, offering a large number of opportunities and jobs. According to statistics, LinkedIn has 738M+ members, 14M+ open jobs, and 55M+ companies listed on this mega-connected application, and many vacancies are posted daily. LinkedIn data has been used for the research work carried out in this paper, which can help address the challenges faced by LinkedIn and other job posting applications in improving the levels of jobs available in the industry. This research applies text processing techniques from natural language processing to LinkedIn datasets, aiming to find the jobs that appear most often in a month and/or a year. The large dataset was thereby transformed into a usable source. The study uses Multinomial Naïve Bayes and Linear Support Vector Machine learning algorithms for text classification and develops a trained multilingual dataset. The results indicate the most needed job vacancies in any field. This will help students, job seekers, and entrepreneurs with their career decisions.
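
    As a rough illustration of the Multinomial Naïve Bayes text classification mentioned in this abstract, the sketch below classifies short job-posting strings using only the Python standard library. It is not the authors' pipeline: the whitespace tokenizer, the Laplace smoothing value, the function names, and the toy `postings` and `categories` data are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Fit per-class token counts and log priors (alpha = Laplace smoothing)."""
    token_counts = defaultdict(Counter)   # class -> token -> count
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()      # whitespace tokenizer: an assumption
        token_counts[label].update(tokens)
        vocab.update(tokens)
    class_counts = Counter(labels)
    return {
        "alpha": alpha,
        "vocab": vocab,
        "token_counts": dict(token_counts),
        "class_totals": {c: sum(token_counts[c].values()) for c in class_counts},
        "log_prior": {c: math.log(n / len(labels)) for c, n in class_counts.items()},
    }


def predict(model, doc):
    """Return the class with the highest posterior log-probability."""
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    scores = {}
    for label, log_prior in model["log_prior"].items():
        score = log_prior
        denominator = model["class_totals"][label] + alpha * vocab_size
        for token in doc.lower().split():
            if token not in model["vocab"]:
                continue                  # ignore tokens never seen in training
            count = model["token_counts"][label][token]
            score += math.log((count + alpha) / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data, invented for illustration only.
postings = [
    "python data engineer",
    "data scientist python",
    "registered nurse hospital",
    "nurse clinic care",
]
categories = ["tech", "tech", "health", "health"]

model = train_multinomial_nb(postings, categories)
print(predict(model, "python engineer"))
```

    Counting how often each predicted category appears per month or year would then give the kind of demand trend the abstract describes.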

Study on Zero-shot based Quality Estimation (Zero-Shot 기반 기계번역 품질 예측 연구)

  • Eo, Sugyeong;Park, Chanjun;Seo, Jaehyung;Moon, Hyeonseok;Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.12 no.11 / pp.35-43 / 2021
  • Recently, there has been growing interest in zero-shot cross-lingual transfer, which leverages cross-lingual language models (CLLMs) to perform downstream tasks in languages for which no task-specific training was done. In this paper, we point out the data-centric limitations of quality estimation (QE) and perform zero-shot cross-lingual transfer even in environments where it is difficult to construct QE data. Few studies have dealt with zero-shot settings in QE. After fine-tuning on the English-German QE dataset, we perform zero-shot transfer leveraging CLLMs and conduct a comparative analysis of various CLLMs. We also perform zero-shot transfer on language pairs with different resource sizes and analyze the results based on the linguistic characteristics of each language. Experimental results showed the highest performance with multilingual BART and multilingual BERT, and we showed that QE can be performed even when no QE training has been done for a specific language pair.