• Title/Summary/Keyword: language processing

Search Result 2,686, Processing Time 0.034 seconds

A Survey on the Trend of Wireless Internet Protocol (무선인터넷 프로토콜 표준화)

  • Oh, Heang-Suk
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2000.10b
    • /
    • pp.1087-1090
    • /
    • 2000
  • 현재의 이동통신망에서 인터넷 서비스를 제공할 수 있도록 하기 위하여 Phone.com 사에서는 HDTP(Handhold Device Transport Protocol)와 HDML(Handheld Device Markup Language)을 개발하고, Nokia 사에서는 TTML(Tagged Text Mark-up Language)를 개발하였다. 이에 따라 업체마다 각기 나름의 기술을 개발하게 되어 서로 호환되지 않는 문제가 발생했다. 이에 1997년 6월에 WAP 포럼이 결성되어, 현재 전세계 700여개 업체가 참여하고 있다. 본고에서는 WAP 포럼을 중심으로 한 기술개발 및 표준화 현황을 조사 분석하였다.

  • PDF

A Method of Descriptor Extraction for Automatic Document Clustering (자동 문서 클러스터링을 위한 디스크립터 추출 방안)

  • Yun, Bo-Hyun;Kang, Hyun-Kyu;Ko, Hyung-Dae
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2000.04a
    • /
    • pp.230-233
    • /
    • 2000
  • 기존의 검색엔진은 검색결과를 적합도 순서로 나열하여 사용자가 원하는 문서를 찾는데 어려움이 있다. 이러한 문제의 해결책으로 검색결과 문서에 대해 자동 클러스터링을 수행하여 문서 내용이 유사한 문서가 하나의 클러스터내에 존재하도록 한다. 본 논문에서는 검색 결과 문서의 클러스터링에서 필요한 디스크립터 추출 방안을 제안한다. 각 클러스터 내에서 디스크립터를 추출하기 위해 정보검색의 색인과정에서 사용하는 용어 가중치 계산 방법을 이용한다.

  • PDF

Linguistic Processing in Automatic Interpretation System between English-Korean Language Pair

  • Choi, K.S.;Lee, S.M.;Lee, Y.J.
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06a
    • /
    • pp.1076-1081
    • /
    • 1994
  • This paper presents the linguistic processing for the Automatic Interpretation system between English/Korean language pair. We introduce two machine translation systems, each for English-to-Korean and Korean-to-English, describe the system configuration and several characteristics, and discuss the translation evaluation results.

  • PDF

A Study on Natural Language Document and Query Processor for Information Retrieval in Digital Library (디지털 도서관 환경에서의 정보 검색을 위한 자연어 문서 및 질의 처리기에 관한 연구)

  • 윤성희
    • Journal of the Korea Computer Industry Society
    • /
    • v.2 no.12
    • /
    • pp.1601-1608
    • /
    • 2001
  • Digital library is the most important database system that needs information retrieval engine for natural language documents and multimedia data. This paper describes the experimental results of information retrieval engine and browser based on natural language processing. It includes lexical analysis, syntax processing, stemming, and keyword indexing for the natural language text. With the experimental database ‘Earth and Space Science’ that has lots of images and titles and their descriptive text in natural language, text-based search engine was tested. Combined with content-based image search engine, it is expected to be a multimedia information retrieval system in digital library

  • PDF

A Relational Spatiotemporal Database Query Language and Their Operation (관계형 시공간 데이터베이스 질의언어와 연산)

  • Kim, Dong-Ho;Ryu, Keun-Ho
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.10
    • /
    • pp.2467-2478
    • /
    • 1998
  • The spatiotemporal databases support historical informations as well as spatial managements for the objects in the real world, and can be efficiently used to various applications such as geographic information system, urban plan system, car navigation system. Although there are so far several pioneering works in spatioemporal data modeling, it is little to study on spatiotemporal query language. So in this paper, we at first survey a modeling technique that makes spatiotemporal objects into databases, and then show some sighificant functions supported by query language. Also we suggest a new spatiotemporal database query language, entitled as STQL, which make users to enable to manage efficient historical information as well as spatial management function on the basis of relational query language, SQL. Also the proposed STQL, shows their operationsl in spatiotemporal daabases.

  • PDF

Emotion Recognition of Low Resource (Sindhi) Language Using Machine Learning

  • Ahmed, Tanveer;Memon, Sajjad Ali;Hussain, Saqib;Tanwani, Amer;Sadat, Ahmed
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.8
    • /
    • pp.369-376
    • /
    • 2021
  • One of the most active areas of research in the field of affective computing and signal processing is emotion recognition. This paper proposes emotion recognition of low-resource (Sindhi) language. This work's uniqueness is that it examines the emotions of languages for which there is currently no publicly accessible dataset. The proposed effort has provided a dataset named MAVDESS (Mehran Audio-Visual Dataset Mehran Audio-Visual Database of Emotional Speech in Sindhi) for the academic community of a significant Sindhi language that is mainly spoken in Pakistan; however, no generic data for such languages is accessible in machine learning except few. Furthermore, the analysis of various emotions of Sindhi language in MAVDESS has been carried out to annotate the emotions using line features such as pitch, volume, and base, as well as toolkits such as OpenSmile, Scikit-Learn, and some important classification schemes such as LR, SVC, DT, and KNN, which will be further classified and computed to the machine via Python language for training a machine. Meanwhile, the dataset can be accessed in future via https://doi.org/10.5281/zenodo.5213073.

Syntactic Structured Framework for Resolving Reflexive Anaphora in Urdu Discourse Using Multilingual NLP

  • Nasir, Jamal A.;Din, Zia Ud.
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.4
    • /
    • pp.1409-1425
    • /
    • 2021
  • In wide-ranging information society, fast and easy access to information in language of one's choice is indispensable, which may be provided by using various multilingual Natural Language Processing (NLP) applications. Natural language text contains references among different language elements, called anaphoric links. Resolving anaphoric links is a key problem in NLP. Anaphora resolution is an essential part of NLP applications. Anaphoric links need to be properly interpreted for clear understanding of natural languages. For this purpose, a mechanism is desirable for the identification and resolution of these naturally occurring anaphoric links. In this paper, a framework based on Hobbs syntactic approach and a system developed by Lappin & Leass is proposed for resolution of reflexive anaphoric links, present in Urdu text documents. Generally, anaphora resolution process takes three main steps: identification of the anaphor, location of the candidate antecedent(s) and selection of the appropriate antecedent. The proposed framework is based on exploring the syntactic structure of reflexive anaphors to find out various features for constructing heuristic rules to develop an algorithm for resolving these anaphoric references. System takes Urdu text containing reflexive anaphors as input, and outputs Urdu text with resolved reflexive anaphoric links. Despite having scarcity of Urdu resources, our results are encouraging. The proposed framework can be utilized in multilingual NLP (m-NLP) applications.

A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training

  • Tang, Zhan;Guo, Xuchao;Bai, Zhao;Diao, Lei;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.3
    • /
    • pp.771-791
    • /
    • 2022
  • Protein-protein interaction (PPI) extraction from original text is important for revealing the molecular mechanism of biological processes. With the rapid growth of biomedical literature, manually extracting PPI has become more time-consuming and laborious. Therefore, the automatic PPI extraction from the raw literature through natural language processing technology has attracted the attention of the majority of researchers. We propose a PPI extraction model based on the large pre-trained language model and adversarial training. It enhances the learning of semantic and syntactic features using BioBERT pre-trained weights, which are built on large-scale domain corpora, and adversarial perturbations are applied to the embedding layer to improve the robustness of the model. Experimental results showed that the proposed model achieved the highest F1 scores (83.93% and 90.31%) on two corpora with large sample sizes, namely, AIMed and BioInfer, respectively, compared with the previous method. It also achieved comparable performance on three corpora with small sample sizes, namely, HPRD50, IEPA, and LLL.

Comparative study of text representation and learning for Persian named entity recognition

  • Pour, Mohammad Mahdi Abdollah;Momtazi, Saeedeh
    • ETRI Journal
    • /
    • v.44 no.5
    • /
    • pp.794-804
    • /
    • 2022
  • Transformer models have had a great impact on natural language processing (NLP) in recent years by realizing outstanding and efficient contextualized language models. Recent studies have used transformer-based language models for various NLP tasks, including Persian named entity recognition (NER). However, in complex tasks, for example, NER, it is difficult to determine which contextualized embedding will produce the best representation for the tasks. Considering the lack of comparative studies to investigate the use of different contextualized pretrained models with sequence modeling classifiers, we conducted a comparative study about using different classifiers and embedding models. In this paper, we use different transformer-based language models tuned with different classifiers, and we evaluate these models on the Persian NER task. We perform a comparative analysis to assess the impact of text representation and text classification methods on Persian NER performance. We train and evaluate the models on three different Persian NER datasets, that is, MoNa, Peyma, and Arman. Experimental results demonstrate that XLM-R with a linear layer and conditional random field (CRF) layer exhibited the best performance. This model achieved phrase-based F-measures of 70.04, 86.37, and 79.25 and word-based F scores of 78, 84.02, and 89.73 on the MoNa, Peyma, and Arman datasets, respectively. These results represent state-of-the-art performance on the Persian NER task.

A Study on the Web Building Assistant System Using GUI Object Detection and Large Language Model (웹 구축 보조 시스템에 대한 GUI 객체 감지 및 대규모 언어 모델 활용 연구)

  • Hyun-Cheol Jang;Hyungkuk Jang
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2024.05a
    • /
    • pp.830-833
    • /
    • 2024
  • As Large Language Models (LLM) like OpenAI's ChatGPT[1] continue to grow in popularity, new applications and services are expected to emerge. This paper introduces an experimental study on a smart web-builder application assistance system that combines Computer Vision with GUI object recognition and the ChatGPT (LLM). First of all, the research strategy employed computer vision technology in conjunction with Microsoft's "ChatGPT for Robotics: Design Principles and Model Abilities"[2] design strategy. Additionally, this research explores the capabilities of Large Language Model like ChatGPT in various application design tasks, specifically in assisting with web-builder tasks. The study examines the ability of ChatGPT to synthesize code through both directed prompts and free-form conversation strategies. The researchers also explored ChatGPT's ability to perform various tasks within the builder domain, including functions and closure loop inferences, basic logical and mathematical reasoning. Overall, this research proposes an efficient way to perform various application system tasks by combining natural language commands with computer vision technology and LLM (ChatGPT). This approach allows for user interaction through natural language commands while building applications.

  • PDF