• Title/Summary/Keyword: language data

Implementation of New Markup Language for Integrating of Motion Capture Data formats (모션 캡쳐 데이타 통합을 위한 새로운 마크업 언어의 구현)

  • 정현숙;이일병
    • Journal of KIISE:Software and Applications / v.30 no.3_4 / pp.219-230 / 2003
  • Motion capture technology is now widely used to create realistic motion. Motion capture data therefore needs to be exchanged among the animators in a work group. However, different motion capture devices produce different motion capture data formats, which makes it difficult for an animator to reuse and exchange stored motion capture data. In this paper, we propose a standard format that integrates the different motion capture data formats. In addition, we propose a framework for a system that manages motion capture data using our standard format. Our standard format is called MCML (Motion Capture Markup Language). It is a markup language for motion capture data and is based on XML (eXtensible Markup Language). Our system for managing motion capture data consists of several components: MCML Converter, MCML , MCML Editor, and Motion Viewer.

Language-based Classification of Words using Deep Learning (딥러닝을 이용한 언어별 단어 분류 기법)

  • Zacharia, Nyambegera Duke;Dahouda, Mwamba Kasongo;Joe, Inwhee
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.411-414 / 2021
  • Deep learning has become an extremely critical technology in the field of education today. It has been used especially in natural language processing, where word-representation vectors play a critical role. However, some low-resource languages, such as Swahili, which is spoken in East and Central Africa, do not fall into this category. Natural language processing (NLP) is a field of artificial intelligence in which systems and computational algorithms are built to automatically understand, analyze, manipulate, and potentially generate human language. Having found that some African languages lack proper representation in language processing, to the point of being described as low-resource languages because of inadequate data for NLP, we decided to study the Swahili language. Currently, language modeling with neural networks requires adequate data to guarantee quality word representations, which are important for NLP tasks. Most African languages have no data for such processing. The main aim of this project is the classification of words in English, Swahili, and Korean, with particular emphasis on the low-resource Swahili language. Finally, we will create our own dataset, reprocess the data using a Python script, formulate the syllabic alphabet, and develop an English, Swahili, and Korean word analogy dataset.

An Enhancement of Japanese Acoustic Model using Korean Speech Database (한국어 음성데이터를 이용한 일본어 음향모델 성능 개선)

  • Lee, Minkyu;Kim, Sanghun
    • The Journal of the Acoustical Society of Korea / v.32 no.5 / pp.438-445 / 2013
  • In this paper, we propose an enhancement of a Japanese acoustic model that is trained with a Korean speech database using several combination strategies. We describe the strategies for training on a combination of languages: Cross-Language Transfer, Cross-Language Adaptation, and the Data Pooling Approach. We simulated these strategies and found a suitable method for our current Japanese database. Existing combination strategies have generally been verified in under-resourced language environments, but they proved inappropriate when the speech database is not fully under-resourced. In the Data Pooling Approach training process, we built the tied list using only the object language. As a result, we found the ERR of the acoustic model to be 12.8%.

CNN-based Sign Language Translation Program for the Deaf (CNN기반의 청각장애인을 위한 수화번역 프로그램)

  • Hong, Kyeong-Chan;Kim, Hyung-Su;Han, Young-Hwan
    • Journal of the Institute of Convergence Signal Processing / v.22 no.4 / pp.206-212 / 2021
  • As society develops, methods of communication are developing in many ways. However, these developments serve the non-disabled and do little for the deaf. Therefore, in this paper, a CNN-based sign language translation program is designed and implemented to help deaf people communicate. The program translates sign language images captured through a webcam into their meaning based on data. It uses 24,000 pieces of directly produced Korean vowel data and performs U-Net segmentation to train an effective classification model. In the implemented program, 'ㅋ' showed the best performance among all sign language data, with 97% accuracy and a 99% F1-score, while 'ㅣ' showed the highest performance among vowel data, with 94% accuracy and a 95.5% F1-score.

A Study of Fine Tuning Pre-Trained Korean BERT for Question Answering Performance Development (사전 학습된 한국어 BERT의 전이학습을 통한 한국어 기계독해 성능개선에 관한 연구)

  • Lee, Chi Hoon;Lee, Yeon Ji;Lee, Dong Hee
    • Journal of Information Technology Services / v.19 no.5 / pp.83-91 / 2020
  • Language models such as BERT have been an important factor in deep learning-based natural language processing. Pre-training transformer-based language models is computationally expensive, since they consist of deep and broad architectures and layers that use an attention mechanism, and they require a huge amount of training data. Hence, it has become necessary to fine-tune large pre-trained language models trained by Google or other companies that can afford the resources and cost. There are various techniques for fine-tuning language models, and this paper examines three: data augmentation, hyperparameter tuning, and partial reconstruction of the neural network. For data augmentation, we use no-answer augmentation and back-translation. We also identify some useful combinations of hyperparameters through a number of experiments. Finally, we add GRU and LSTM networks to the pre-trained BERT model to boost performance. We fine-tune the pre-trained Korean language model using the methods above and raise the F1 score from the baseline to 89.66. Moreover, some failed attempts give us important lessons and point to directions for further work.

The Conversion of a Set, a Sequence, and a Map in VDM to a Linked List in a Programming Language (VDM의 자료구조인 set, sequency, map의 프로그래밍 언어 자료구조인 linked list로의 변환)

  • Yu, Mun-Seong
    • The KIPS Transactions:PartD / v.8D no.4 / pp.421-426 / 2001
  • A formal development method is used to develop software rigorously and systematically. In a formal development method, we specify the system in a formal specification language and gradually refine it until it can be implemented. VDM is one such formal specification language. VDM uses mathematical data structures such as sets, sequences, and maps to specify the system, but most programming languages do not have such data structures. Therefore, these data structures must be converted. We can convert the mathematical data structures in VDM to a linked list, a data structure in a programming language. In this article, we propose a method to convert a set, a sequence, and a map in VDM to a linked list in a programming language and prove the correctness of this conversion mathematically.
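
The kind of conversion this abstract describes can be illustrated with a minimal Python sketch. This is not the paper's method or its correctness proof; the names `Node`, `seq_to_list`, `set_to_list`, `map_to_list`, and `map_lookup` are hypothetical, and the sketch only assumes that a sequence keeps order and duplicates, a set keeps each element once, and a map is stored as unique (key, value) pairs.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass
class Node:
    """One cell of a singly linked list (hypothetical helper type)."""
    value: Any
    next: Optional["Node"] = None


def iterate(head: Optional[Node]) -> Iterator[Any]:
    """Yield the stored values from head to tail."""
    while head is not None:
        yield head.value
        head = head.next


def seq_to_list(items) -> Optional[Node]:
    """Sequence: preserve both order and duplicates."""
    head = None
    for item in reversed(list(items)):
        head = Node(item, head)
    return head


def set_to_list(items) -> Optional[Node]:
    """Set: store each distinct element once (first-seen order is an
    arbitrary choice, since a set has no order)."""
    distinct = []
    for item in items:
        if item not in distinct:
            distinct.append(item)
    return seq_to_list(distinct)


def map_to_list(mapping: dict) -> Optional[Node]:
    """Map: store (key, value) pairs; dict keys are already unique."""
    return seq_to_list(mapping.items())


def map_lookup(head: Optional[Node], key: Any) -> Any:
    """Apply the map to a key by scanning the list of pairs."""
    for k, v in iterate(head):
        if k == key:
            return v
    raise KeyError(key)
```

For example, `list(iterate(seq_to_list([3, 1, 3])))` keeps all three elements in order, while `set_to_list([3, 1, 3])` holds only two nodes.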

DISTRIBUTED WEB GIS SERVICE BASED ON XML AND INTEROPERABILITY

  • Kim, Do-Hyun
    • Proceedings of the KSRS Conference / 2002.10a / pp.145-150 / 2002
  • Web GIS (Geographic Information Systems) service systems provide various GIS services for analyzing and displaying spatial data through a user-friendly interface. These services are expanding the business domain, and many users want to access various distributed spatial data. However, it is difficult to access diverse data sources because of differing spatial data formats and data access methods. In this paper, we design and implement web GIS services based on interoperability and the GML (Geography Markup Language) of the OGC (Open GIS Consortium) in a distributed web environment. Interoperability provides a single access method to distributed data sources based on Microsoft's OLE DB technology. In addition, GML supports web GIS services based on XML. We design these GIS services as components using UML (Unified Modeling Language), an object-oriented modeling language for specifying, visualizing, constructing, and documenting the artifacts of a software system. The services were also developed in an object-oriented computing environment, which provides interoperability, language independence, and ease of development as well as reusability.

On the Problems of North and South Korean Scholars' Studies on the Genealogy of Korean Language (남북한 학자의 국어 계통 연구의 제문제)

  • 정광
    • Lingua Humanitatis / v.6 / pp.169-183 / 2004
  • So far I have reviewed the two conflicting opinions of North Korean and South Korean linguists concerning the position of the Koguryeo language in the formation of Korean. Many South Korean scholars who favor the Altaic Language Family Hypothesis argue that the ancient Korean language consisted of two different languages: the northern dialect, comprising four languages, namely the Koguryeo language (the largest within the area), the Puyo language, the Okche language, and the Yemaek language; and the southern dialect, the largest language of which was the Shinla language. On the other hand, the linguists of North Korea claim that the same language was spoken in Koguryeo and Shinla and that modern Korean was formed on the basis of the Koguryeo language. Before evaluating which of these claims is correct, I would like to turn to the scarcity of linguistic data on the Koguryeo language. Compared with the pragmatic methodology of the South Korean linguists in their studies on the Altaic affinity of Korean, the North Korean scholars need to present more evidence to support their argument. In Chung (1993) I argued that studies on the genealogy or history of the Korean language had to be carried out regardless of political purposes. We should admit the historical fact that there were many tribal states on the Korean peninsula before the ancient Korean stage, from which the three kingdoms emerged. Those kingdoms were unified by Shinla, which led on to the Koryeo Dynasty. We cannot disregard the fact that the Korean language developed hand in hand with this historical process, step by step in each age. The first thing we should do now is to collect as much as possible of the remaining Koguryeo language data recorded in the old written materials found in North Korea. I also hope that the linguists of South Korea achieve more academic success in comparative studies of the Paekjae language, the Shinla language, and other adjacent Altaic languages.

Spatial Big Data Query Processing System Supporting SQL-based Query Language in Hadoop (Hadoop에서 SQL 기반 질의언어를 지원하는 공간 빅데이터 질의처리 시스템)

  • Joo, In-Hak
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.10 no.1 / pp.1-8 / 2017
  • In this paper we present a spatial big data query processing system that stores spatial data in Hadoop and queries the data with an SQL-based query language. The system stores large-scale spatial data in an HDFS-based storage system and supports spatial queries expressed in an SQL-based query language extended for spatial data processing. The query language supports the standard spatial data types and functions defined in the OGC simple feature model. This paper presents the development of the core functions of the system, including query language parsing, query validation, query planning, and connection with the storage system. We compare the performance of the proposed system with an existing system, and our experiments show that it improves query execution time by about 58% over the existing system when executing region queries on spatial data stored in Hadoop.

Analysis of Korean Language by First Order Markov Source (한글의 First Order Markov Source에 의한 해석)

  • 한영렬;박종원
    • Proceedings of the Korean Institute of Communication Sciences Conference / 1982.10a / pp.51-55 / 1982
  • An analysis of the Korean language as a first-order Markov source is carried out. The calculated entropy of the first-order Markov source is also included. The results presented here are new data. The data can be useful in designing keyboard layouts for terminals and in the automatic discrimination of monosyllables in Korean.
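
As a rough illustration of the quantity this abstract refers to, the entropy of a first-order Markov source can be estimated from a symbol sequence as H = -Σ p(i, j) log2 p(j | i), using bigram counts. The sketch below is an assumption-laden example, not the paper's procedure: the function name `first_order_entropy` is hypothetical, and the abstract does not say which symbol unit (letters or syllables) the authors used.

```python
import math
from collections import Counter
from typing import Iterable


def first_order_entropy(symbols: Iterable[str]) -> float:
    """Estimate the entropy (bits per symbol) of a first-order Markov
    source from empirical bigram counts."""
    seq = list(symbols)
    pairs = list(zip(seq, seq[1:]))  # consecutive (previous, next) pairs
    if not pairs:
        return 0.0
    pair_counts = Counter(pairs)
    prev_counts = Counter(prev for prev, _ in pairs)
    total = len(pairs)
    entropy = 0.0
    for (prev, _nxt), n in pair_counts.items():
        p_joint = n / total            # p(i, j)
        p_cond = n / prev_counts[prev]  # p(j | i)
        entropy -= p_joint * math.log2(p_cond)
    return entropy
```

A fully predictable sequence such as `"abababab"` gives an estimate of 0 bits, while a sequence in which each symbol is followed by either of two symbols equally often gives 1 bit.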
