• Title/Summary/Keyword: R language

Phoneme distribution and phonological processes of orthographic and pronounced phrasal words in light of syllable structure in the Seoul Corpus (음절구조로 본 서울코퍼스의 글 어절과 말 어절의 음소분포와 음운변동)

  • Yang, Byunggon
    • Phonetics and Speech Sciences / v.8 no.3 / pp.1-9 / 2016
  • This paper investigated the phoneme distribution and phonological processes of orthographic and pronounced phrasal words in light of syllable structure in the Seoul Corpus in order to provide linguists and phoneticians with a clearer understanding of the Korean language system. To this end, the phrasal words were extracted from the transcribed label scripts of the Seoul Corpus using Praat. The onsets, peaks, codas and syllable types of the phrasal words were then analyzed using an R script. Results revealed that k0 was the most frequent onset in both orthographic and pronounced phrasal words. Also, aa was the most favored vowel in the Korean syllable peak, with fewer phonological processes in its pronounced form. Diphthongs accounted for 8.8% of all peaks in the orthographic phrasal words, almost double the proportion found in the pronounced phrasal words. For the codas, nn accounted for 34.4% of the total pronounced phrasal words and was the varied form. In the syllable type classification of the Corpus, CV was the most frequent type in the orthographic forms, followed by CVC, V, and VC. Overall, onsets were more prevalent than codas in the pronounced forms. From these results, the paper concluded that an analysis of phoneme distribution and phonological processes in light of syllable structure can contribute greatly to the understanding of the phonology of spoken Korean.
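
The syllable type classification described above can be illustrated with a minimal Python sketch. This is not the author's R script: the function names and the toy syllable list are hypothetical, and only the labels k0, aa, and nn are taken from the abstract. Each syllable is assumed to be already split into onset, peak, and coda, with an empty string marking an absent slot.

```python
from collections import Counter

def syllable_type(onset, peak, coda):
    """Return 'CV', 'CVC', 'V', or 'VC'; an empty string means the slot is absent."""
    if not peak:
        raise ValueError("a syllable needs a peak")
    return ("C" if onset else "") + "V" + ("C" if coda else "")

def type_distribution(syllables):
    """Return the relative frequency of each syllable type."""
    counts = Counter(syllable_type(*s) for s in syllables)
    total = sum(counts.values())
    return {t: n / total for t, n in counts.items()}

# Toy data (made up for illustration, not drawn from the Seoul Corpus)
syllables = [
    ("k0", "aa", ""),    # CV
    ("", "aa", "nn"),    # VC
    ("k0", "aa", "nn"),  # CVC
    ("", "aa", ""),      # V
    ("k0", "aa", ""),    # CV
]
distribution = type_distribution(syllables)
print(distribution)
```

Running the same counting over onsets, peaks, or codas separately (rather than over the combined type) would give the per-slot phoneme frequencies the abstract reports.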

Multi-Topic Sentiment Analysis using LDA for Online Review (LDA를 이용한 온라인 리뷰의 다중 토픽별 감성분석 - TripAdvisor 사례를 중심으로 -)

  • Hong, Tae-Ho;Niu, Hanying;Ren, Gang;Park, Ji-Young
    • The Journal of Information Systems / v.27 no.1 / pp.89-110 / 2018
  • Purpose: Customer reviews contain a great deal of information, but finding the key information in a large body of text is not easy. Business decision makers need a model to solve this problem. In this study we propose a multi-topic sentiment analysis approach using Latent Dirichlet Allocation (LDA) for user-generated content (UGC). Design/methodology/approach: We collected a total of 104,039 hotel reviews for seven of the world's top tourist destinations from TripAdvisor (www.tripadvisor.com) and extracted 30 hotel-related topics from all customer reviews using the LDA model. Six major dimensions (value, cleanliness, rooms, service, location, and sleep quality) were selected from the 30 extracted topics. The data were analyzed using the R language. Findings: This study contributes a lexicon-based sentiment analysis approach for the keyword-embedded sentences related to the six dimensions within a review. The performance of the proposed model was evaluated by comparing the sentiment analysis results for each topic with the real attribute ratings provided by the platform. The results show that the model performs well, with high accuracy and recall. The proposed model is expected to make it possible to analyze customers' sentiments on different topics for reviews that lack detailed attribute ratings.
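
The idea of scoring keyword-embedded sentences per dimension can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' R implementation: the keyword sets, the tiny polarity lexicon, and the function name `dimension_sentiment` are all hypothetical, and the LDA step that would supply the keywords is omitted.

```python
import re

# Hypothetical keywords per dimension (in the study these would come from LDA topics)
DIMENSION_KEYWORDS = {
    "cleanliness": {"clean", "dirty"},
    "service": {"staff", "service"},
    "location": {"location", "station"},
}

# Hypothetical polarity lexicon: +1 positive, -1 negative
LEXICON = {"great": 1, "friendly": 1, "clean": 1, "dirty": -1, "rude": -1, "noisy": -1}

def dimension_sentiment(review):
    """Sum lexicon polarity over the sentences that mention each dimension."""
    scores = {}
    for sentence in re.split(r"[.!?]+", review.lower()):
        tokens = re.findall(r"[a-z']+", sentence)
        polarity = sum(LEXICON.get(t, 0) for t in tokens)
        for dimension, keywords in DIMENSION_KEYWORDS.items():
            if keywords & set(tokens):
                scores[dimension] = scores.get(dimension, 0) + polarity
    return scores

review = "The staff were friendly. The room was dirty and noisy. Great location!"
scores = dimension_sentiment(review)
print(scores)
```

Dimensions that no sentence mentions are simply absent from the result, which mirrors the case of reviews without detailed attribute ratings for that aspect.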

Performance Comparison of Python and Scala APIs in Spark Distributed Cluster Computing System (Spark 기반에서 Python과 Scala API의 성능 비교 분석)

  • Ji, Keung-yeup;Kwon, Youngmi
    • Journal of Korea Multimedia Society / v.23 no.2 / pp.241-246 / 2020
  • Hadoop is a framework for processing large data sets in a distributed way across clusters of nodes. It has been a popular platform for processing big data, but in recent years other platforms have become competitive, depending on the characteristics of the application. Spark is a distributed platform that enables real-time data processing and improves overall processing performance over Hadoop by introducing in-memory processing instead of disk I/O. Whereas Hadoop is designed to work with Java and data analysis is performed using the Java API, Spark provides a variety of APIs for Scala, Python, Java and R. In this paper, the goal is to find out whether the APIs of different programming languages affect performance in Spark. We chose two popular APIs: Python and Scala. Python is easy to learn and is widely used in the AI domain. Scala is a programming language with advantages in parallelism. Our experiment shows much faster processing with the Scala API than with the Python API. For the performance issues in AI-based analysis, further study is needed.

The remote control method of LED angle based on HTML5 for the increasing coverage of LED light in uninhabited facilities (무인시설물에서 LED조명의 커버리지 향상을 위한 HTML5기반의 LED 지향각 원격제어 방식)

  • Kim, Daeho;Yang, Seungyoun;Lee, Seungyoun;Cha, Jaesang
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2012.11a / pp.256-258 / 2012
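  • Recently, IT technologies using LEDs have attracted much attention, and because LEDs must be managed efficiently, particularly in uninhabited facilities, research on communication and lighting control technologies is actively under way. However, owing to the characteristics of LEDs, the FOI (Field of Illumination) is constrained, so the service range and coverage are limited. For LED control, technologies using smart devices have recently become widespread, but programs must be written specifically for each of the many smart-device OS platforms, which is very inefficient in terms of development cost. Therefore, this paper proposes a remote control technique for the LED beam angle that can flexibly widen the service range of LED-based IT technologies, and proposes a system based on HTML5 (Hypertext Mark-up Language 5), the next-generation web standard, that enables effective LED control independent of the device environment. The usefulness of the approach is demonstrated by presenting the core program of the proposed technique and performing a simulation.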

A Study on the On-Line Fault Diagnosis and Restorative Control Expert System for a Substation Simulator (변전소 시뮬레이터의 온라인 고장진단 및 복구제어 전문가 시스템에 대한 연구)

  • Lee, H.J.;Chung, S.J.;Lim, C.H.;Cho, K.R.;Shin, H.S.
    • Proceedings of the KIEE Conference / 1999.11b / pp.110-112 / 1999
  • Substation automation has recently been actively researched in many countries. Since substation automation is an integrated system that consists of electrical, electronic and computer technologies, performance evaluation is very important for assessing whether a developed system can be applied to a real power system. Most studies are verified using computer simulation, because it is hard to apply them to real power systems. Therefore, the development of a substation simulator is necessary for the performance evaluation of many application systems, such as operator aid systems. This paper introduces a substation simulator. An intelligent fault diagnosis and restorative control expert system is also introduced. UDP/IP is applied as the protocol for data transport between the expert system and the SC (station computer). For the graphical user interface, C++ and Visual Basic are used on the Windows NT operating system together with four Pentium II systems.

Evaluation of Recent Data Processing Strategies on Q-TOF LC/MS Based Untargeted Metabolomics

  • Kaplan, Ozan;Celebier, Mustafa
    • Mass Spectrometry Letters / v.11 no.1 / pp.1-5 / 2020
  • In this study, some of the recently reported data processing strategies were evaluated and modified based on their capabilities, and a brief workflow for data mining was redefined for Q-TOF LC/MS based untargeted metabolomics. Commercial pooled human plasma samples were used for this purpose. An ultrafiltration procedure was applied for sample preparation. The sample set was analyzed by Q-TOF LC/MS. A C18 column (Agilent Zorbax 1.8 µm, 50 × 2.1 mm) was used for chromatographic separation. Raw chromatograms were processed using the R programming language edition of XCMS, and Isotopologue Parameter Optimization (IPO) was used to optimize the XCMS parameters. The raw XCMS table was processed using MS Excel to find reliable and reproducible peaks. In total, 1650 reliable and reproducible potential metabolite peaks were found based on the data processing procedures given in this paper. The redefined dataset was uploaded to the MetaboAnalyst platform, and the identified metabolites were matched with 86 metabolic pathways. Thus, two lists were obtained and are presented in this study as supplementary files. The first list presents the retention times and m/z values of the detected metabolite peaks. The second list contains the metabolic pathways related to the identified metabolites. The briefly described data processing strategies and the dataset presented in this study could be beneficial for researchers working on untargeted metabolomics when processing their data and validating their results.

Examining the relationship between educational effectiveness and computational thinking in smart learning environment

  • Han, Oakyoung;Kim, Jaehyoun
    • Journal of Internet Computing and Services / v.19 no.2 / pp.57-67 / 2018
  • The 4th industrial revolution has brought innovation to the educational environment. The purpose of this study is to verify the educational effectiveness of the smart learning environment, especially with regard to computational thinking. A big data analysis was performed to confirm that computational thinking is essential in preparing for the 4th industrial revolution. Teaching computational thinking at university requires careful educational design. This study verified the relationship between improvement in computational thinking ability and the major of students receiving coding education. The effectiveness of the coding education differed depending on the students' major, which means that students should be offered coding education differentiated by major. This study extracted factors of computational thinking through a literature review. Thirteen research hypotheses were tested by statistical analysis in the R language. It was shown that expectation of the class and improvement in abstraction ability and algorithmic thinking ability had a mediation effect on the relationship between knowledge acquisition and problem-solving abilities. Based on this study, the effectiveness of education can be improved, which will help produce many distinguished students who are ready for the 4th industrial revolution.

Robust Image Similarity Measurement based on MR Physical Information

  • Eun, Sung-Jong;Jung, Eun-Young;Park, Dong Kyun;Whangbo, Taeg-Keun
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.9 / pp.4461-4475 / 2017
  • Recently, the introduction of hospital information systems has remarkably improved the efficiency of health care services within hospitals. With the improvement of hospital information systems, the issue of integrating medical information has emerged, and attempts to achieve it have been made. However, as a preceding step for the integration of medical information, the problem of searching for the same patient should be solved first, and studies on patient identification algorithms are required. Typically, similarity can be calculated through an MPI (Master Patient Index) module by comparing various fields such as the patient's basic information and treatment information, but this has many problems, including a language system not suited to Korean and the estimation of an optimal weight for each field. This paper proposes a method of searching for the same patient using MRI information in addition to the patient's field information, as a supplementary method to increase the accuracy of matching algorithms such as MPI. Unlike existing methods that use only image information, when identifying a patient, the highest weight was given to the physical information of the medical image, which was set as an unchangeable unique value, and as a result high accuracy was achieved. We aim to use the similarity measurement result as a secondary measure in identifying patients in the future.
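
Field-weighted matching of the kind described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names (including `mr_physical`), the weights, and the exact-match comparison are all hypothetical, and the abstract does not say how the paper actually computes or weights similarity.

```python
def weighted_similarity(record_a, record_b, weights):
    """Share of total field weight on which two patient records agree exactly."""
    total = sum(weights.values())
    matched = sum(
        weight
        for field, weight in weights.items()
        if record_a.get(field) is not None and record_a.get(field) == record_b.get(field)
    )
    return matched / total

# Hypothetical weights: the image-derived physical value gets the highest weight
WEIGHTS = {"name": 2, "birth_date": 2, "mr_physical": 6}

a = {"name": "Kim Minsu", "birth_date": "1980-01-01", "mr_physical": "A1B2"}
b = {"name": "Kim Min-su", "birth_date": "1980-01-01", "mr_physical": "A1B2"}
score = weighted_similarity(a, b, WEIGHTS)
print(score)
```

With these weights, a spelling variant in the name lowers the score only slightly, because the heavily weighted image-derived value still agrees.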

Sentiment Analysis of the Quotations of Intensive Care Unit Survivors in Qualitative Studies (질적연구 진술문을 이용한 중환자실 생존자의 감성분석)

  • Kang, Jiyeon
    • Journal of Korean Critical Care Nursing / v.11 no.1 / pp.1-14 / 2018
  • Purpose: As the intensive care unit (ICU) survival rate increases, interest in the lives of ICU survivors has also been increasing. The purpose of this study was to identify the sentiment of ICU survivors. Method: The author analyzed the quotations from previous qualitative studies related to ICU survivors; a total of 1,074 sentences comprising 429 quotations from 25 relevant studies were analyzed. A word cloud created in R was used to identify the most frequently used adjectives, and sentiment and emotion scores were calculated using an artificial intelligence (AI) program. Results: The 10 adjectives that appeared most often in the quotations were 'difficult', 'different', 'normal', 'able', 'hard', 'bad', 'ill', 'better', 'weak', and 'afraid', in order of decreasing occurrence. The mean sentiment score was negative ($-.31 \pm .23$), and the three emotions with the highest scores were 'sadness' ($.52 \pm .13$), 'joy' ($.35 \pm .22$), and 'fear' ($.30 \pm .25$). Conclusion: The AI-based natural language processing used in this study is a relatively new method. As such, it is necessary to refine the methodology through repeated research in various nursing fields. In addition, further studies are needed on nursing interventions that improve the coherency of survivors' ICU memories and on familial support for ICU survivors.

ShEx Schema Generator for RDF Graphs Created by Direct Mapping

  • Choi, Ji-Woong
    • Journal of the Korea Society of Computer and Information / v.23 no.10 / pp.33-43 / 2018
  • In this paper, we propose a method to automatically generate a description of an RDF graph structure. The description is expressed in Shape Expression Language (ShEx), which was developed by the W3C and provides a syntax for describing the structure of RDF data. The RDF graphs to which this method can be applied are limited to those generated by the direct mapping, a W3C algorithm for transforming relational data into RDF. A relational database consists of its schema, including integrity constraints, and its instance data. While the instance data can be published in RDF by standard methods such as the direct mapping, a translation of the schema has been missing so far. Unlike users of relational databases, users of RDF datasets have been forced to write repeated, vague SPARQL queries over the datasets to obtain exact results, because no schema for the RDF data has been provided to them. The ShEx documents generated by our method can be referred to as the schema when writing SPARQL queries. They can also validate data on RDF graph update operations with ShEx validators. In other words, they can play the role that integrity constraints play in relational databases.
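
To make the idea concrete, the following Python sketch emits a ShEx shape for one relational table. It is a simplified illustration, not the paper's generator: the function `table_to_shex`, the type table, the shape naming, and the rule that a nullable column becomes an optional (`?`) triple constraint are assumptions, and keys and foreign keys are ignored. The predicate form `<Table#column>` follows the W3C direct mapping convention for literal-valued columns.

```python
# Hypothetical mapping from a few SQL types to XSD datatypes
XSD_TYPES = {"VARCHAR": "xsd:string", "INTEGER": "xsd:integer", "DATE": "xsd:date"}

def table_to_shex(table, columns):
    """Build a ShExC shape from (column_name, sql_type, nullable) tuples."""
    constraints = []
    for name, sql_type, nullable in columns:
        cardinality = " ?" if nullable else ""  # assumed: nullable column -> optional
        constraints.append(f"  <{table}#{name}> {XSD_TYPES[sql_type]}{cardinality}")
    body = " ;\n".join(constraints)
    return f"<{table}Shape> {{\n{body}\n}}"

shex = table_to_shex(
    "Person",
    [("id", "INTEGER", False), ("name", "VARCHAR", False), ("birth", "DATE", True)],
)
print("PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>")
print(shex)
```

A shape produced this way could then be handed to a ShEx validator to check RDF data after an update, in the spirit of the integrity-constraint role described above.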