• Title/Summary/Keyword: Language Models

Improving the Classification of Population and Housing Census with AI: An Industry and Job Code Study

  • Byung-Il Yun;Dahye Kim;Young-Jin Kim;Medard Edmund Mswahili;Young-Seob Jeong
    • Journal of the Korea Society of Computer and Information / v.28 no.4 / pp.21-29 / 2023
  • In this paper, we propose an AI-based system for automatically classifying industry and occupation codes in the population census. Accurate classification of these codes is crucial for informing policy decisions, allocating resources, and conducting research. However, the task has traditionally been performed by human coders, which is time-consuming, resource-intensive, and prone to errors. Our system represents a significant improvement over the existing rule-based system used by the statistics agency, which relies on user-entered data for code classification. We trained and evaluated several models and developed an ensemble model that achieved a match accuracy of 86.76% for industry codes and 81.84% for occupation codes, outperforming the best individual model. The ensemble combines transfer learning techniques with pre-trained models. Additionally, we propose process improvements based on the classification probabilities produced by the model. These results demonstrate the potential of AI-based systems to improve the accuracy and efficiency of population census data classification. By automating this process with AI, we can achieve more accurate and consistent results while reducing the workload on agency staff.
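
The idea of combining several models and then acting on the resulting classification probability can be illustrated with a minimal soft-voting sketch. It is not the authors' implementation: the abstract does not say how the ensemble combines its members, so plain probability averaging is an assumption here, and the function names, the example codes, and the 0.8 review threshold are all hypothetical.

```python
from typing import Dict, List, Tuple


def soft_vote(model_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the per-code probabilities of several models (assumed scheme)."""
    if not model_probs:
        raise ValueError("at least one model output is required")
    codes = set().union(*model_probs)  # every code any model proposed
    n = len(model_probs)
    # A model that did not score a code contributes probability 0.0 for it.
    return {c: sum(p.get(c, 0.0) for p in model_probs) / n for c in codes}


def classify(
    model_probs: List[Dict[str, float]], threshold: float = 0.8
) -> Tuple[str, float, bool]:
    """Return (code, confidence, needs_review).

    needs_review is True when the averaged confidence is below the
    hypothetical threshold, so that a human coder can check the record.
    """
    avg = soft_vote(model_probs)
    code = max(avg, key=avg.get)
    return code, avg[code], avg[code] < threshold


if __name__ == "__main__":
    # Two hypothetical models scoring the same census answer.
    outputs = [{"A01": 0.9, "B02": 0.1}, {"A01": 0.5, "B02": 0.5}]
    print(classify(outputs))
```

Routing low-confidence records to staff is one possible reading of the "process improvements based on classification probabilities" mentioned in the abstract.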

A Study on the Fast Enrollment of Text-Independent Speaker Verification for Vehicle Security (차량 보안을 위한 어구독립 화자증명의 등록시간 단축에 관한 연구)

  • Lee, Tae-Seung;Choi, Ho-Jin
    • Journal of Advanced Navigation Technology / v.5 no.1 / pp.1-10 / 2001
  • Speech is a convenient way for drivers, who are busy with various driving tasks, to handle and operate devices. Building on this, this work proposes a speaker verification method for protecting cars from theft and for identifying a person trying to access critical online services. The method adopts continuant phoneme recognition, which uses the language information in speech, and an MLP (multi-layer perceptron), which has some advantages over earlier stochastic methods. However, the recognition method requires a large amount of computation for learning, which makes it difficult to use in speaker verification applications where speakers must enroll in real time. To relieve this problem, this work introduces speaker cohort models, taken from an established score normalization technique for speaker verification, and divides the background speakers into small cohorts in advance. The computational burden is then reduced by classifying the enrolling speaker into one of those cohorts and performing enrollment against only that cohort.
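
The cohort step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how an enrolling speaker is assigned to a cohort, so the nearest-centroid rule with Euclidean distance, the feature vectors, and all names below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def nearest_cohort(
    features: Sequence[float], centroids: Dict[str, Sequence[float]]
) -> str:
    """Pick the cohort whose centroid is closest to the enrolling speaker.

    Assumption: each cohort is summarized by one centroid vector computed
    in advance from its background speakers.
    """
    return min(centroids, key=lambda name: math.dist(features, centroids[name]))


def enrollment_background(
    features: Sequence[float],
    centroids: Dict[str, Sequence[float]],
    members: Dict[str, List[str]],
) -> List[str]:
    """Return only the background speakers of the selected cohort.

    Training against this subset, rather than against every background
    speaker, is what shortens enrollment in this sketch.
    """
    return members[nearest_cohort(features, centroids)]


if __name__ == "__main__":
    centroids = {"cohort_a": [0.0, 0.0], "cohort_b": [10.0, 10.0]}
    members = {"cohort_a": ["spk1", "spk2"], "cohort_b": ["spk3", "spk4"]}
    print(enrollment_background([1.0, 1.5], centroids, members))
```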

Designing a Repository Independent Model for Mining and Analyzing Heterogeneous Bug Tracking Systems (다형의 버그 추적 시스템 마이닝 및 분석을 위한 저장소 독립 모델 설계)

  • Lee, Jae-Kwon;Jung, Woo-Sung
    • Journal of the Korea Society of Computer and Information / v.19 no.9 / pp.103-115 / 2014
  • In this paper, we propose UniBAS (Unified Bug Analysis System), which provides a unified repository model by integrating data extracted from heterogeneous bug tracking systems. UniBAS reduces the cost and complexity of the MSR (Mining Software Repositories) research process and lets researchers focus on their own logic rather than on tedious, repetitive work such as extracting repositories, processing data, and building analysis models. Additionally, the system not only extracts the data but also automatically generates the database tables, views, and stored procedures that researchers need to perform query-based analysis easily. It can also generate various types of export files for use with external analysis tools or for managing research data. To evaluate the usefulness of the system, a case study on detecting duplicate bug reports from the Firefox project on the Mozilla site was performed with UniBAS. The results of experiments that applied various natural language processing algorithms and flexible queries to the automatically extracted data also showed the effectiveness of the proposed system.

ACT-R Predictive Model of Korean Text Entry on Touchscreen

  • Lim, Soo-Yong;Jo, Seong-Sik;Myung, Ro-Hae;Kim, Sang-Hyeob;Jang, Eun-Hye;Park, Byoung-Jun
    • Journal of the Ergonomics Society of Korea / v.31 no.2 / pp.291-298 / 2012
  • Objective: The aim of this study is to predict Korean text entry on touchscreens using the ACT-R cognitive architecture. Background: The use of touchscreens in devices such as satellite navigation devices, PDAs, and mobile phones has been increasing, and the market is expanding. Accordingly, there is growing interest in developing and evaluating interfaces that enhance the user experience and increase satisfaction in the touchscreen environment. Method: In this study, Korean text entry performance in the touchscreen environment was analyzed using ACT-R. An ACT-R model was established that considers the characteristics of the Korean language, which is composed of vowels and consonants. Further, this study analyzed whether Korean text entry can be predicted through the ACT-R cognitive model. Results: No significant difference in performance time was found between the model prediction and the empirical data. Conclusion: The proposed model can accurately predict physical movement time as well as cognitive processing time. Application: This study is useful for conducting model-based evaluation of touchscreen text entry interfaces, and it enables quantitative and effective evaluation of diverse types of Korean text input interfaces through cognitive models.

Automated Scoring of Scientific Argumentation Using Expert Morpheme Classification Approaches (전문가의 형태소 분류를 활용한 과학 논증 자동 채점)

  • Lee, Manhyoung;Ryu, Suna
    • Journal of The Korean Association For Science Education / v.40 no.3 / pp.321-336 / 2020
  • We explore automated scoring models of scientific argumentation. We consider how a new analytical approach using machine learning techniques may enhance the understanding of spoken argumentation in the classroom. We sampled 2,605 utterances that occurred during a high school science class on molecular structure and classified the utterances into five argumentative elements. Next, we performed text preprocessing on the classified utterances. As machine learning techniques, we applied support vector machines, decision trees, random forests, and artificial neural networks. To enhance the identification of rebuttal elements, we used a heuristic feature-engineering method that applies experts' classification of the morphemes of scientific argumentation.

A Self-regulated Learning Model Development in Computer Programming Education (컴퓨터 프로그램 교육에서 자기조절 학습 모델 개발)

  • Kim, Kapsu
    • Journal of The Korean Association of Information Education / v.19 no.1 / pp.21-30 / 2015
  • In the information and knowledge society of the 21st century, computer education is very important, and computer programming education is a very important part of it. However, there are very few teaching and learning models for computer programming education. In this paper, we develop a self-regulated learning model that helps students become self-regulated learners. We propose self-regulated learning elements, self-regulated learning phases, and self-regulated learning models. The self-regulated learning elements are task level, generalized level, and efficiency level. The self-regulated learning phases are problem understanding, design, coding, testing, and maintenance. The self-regulated learning models are to copy, to modify, to create, and to challenge. The results of this study are as follows. In the correlations between learning elements and achievement, the generalized level and efficiency level are higher than the task level. In the correlations between learning phases and achievement, the understanding and design phases are higher than the other phases. In the correlations between learning models and achievement, to modify, to create, and to challenge are higher than to copy.

An Instantaneous Integer Ambiguity Resolution for GPS Real-Time Structure Monitoring (GPS 실시간 구조물 모니터링을 위한 반송파 관측데이터 순간미지정수 결정)

  • Lee, Hungkyu
    • KSCE Journal of Civil and Environmental Engineering Research / v.34 no.1 / pp.341-353 / 2014
  • Correctly resolved integer ambiguities are a prerequisite for delivering a centimeter-level kinematic positioning solution with GPS carrier-phase measurements. Based on mathematical modeling of a GPS network and the application of its geometrical constraints, this research investigates an instantaneous ambiguity resolution procedure for the so-called 'integer constrained least-squares' technique, which can be effectively implemented in real-time structure monitoring. To enhance the reliability of the solutions, the procedure includes quality control algorithms for the float solutions and hypothesis tests that use the constrained baseline for ambiguity validation. The proposed procedure was implemented in MATLAB, a language for technical computing, and used to process field trial data obtained at a cable-stayed bridge in order to assess its real-world applicability. The results are summarized in terms of ambiguity success rates, the impact of the stochastic models, and computation time to demonstrate the performance of the proposed instantaneous ambiguity resolution.

Design and Implementation of Synchronization Unit for AeroMACS System (AeroMACS 시스템을 위한 동기화기 설계)

  • Jang, Soohyun;Lee, Eunsang;Jung, Yunho
    • Journal of Advanced Navigation Technology / v.18 no.2 / pp.142-150 / 2014
  • In this paper, performance analysis results of time/frequency synchronization and cell search algorithms are presented for the aeronautical mobile airport communication system (AeroMACS). AeroMACS is based on the IEEE 802.16e mobile WiMAX standard and uses the 5 GHz aeronautical frequency band with a bandwidth of 5 MHz. A simulation model of AeroMACS was designed, and the performance evaluation was conducted with various airport channel models such as apron (APR), runway (RWY), taxiway (TWY), and park (PRK). The proposed synchronization unit was designed in a hardware description language (HDL) and implemented on an FPGA. Its real-time operation was also verified and evaluated using an FPGA test system.

Development of a String Injection Vulnerability Analyzer for Web Application Programs (웹 응용 프로그램의 문자열 삽입 보안 취약성 분석기 개발)

  • Ahn, Joon-Seon;Kim, Yeong-Min;Jo, Jang-Wu
    • The KIPS Transactions:PartA / v.15A no.3 / pp.181-188 / 2008
  • Nowadays, most web sites are built from dynamic web pages that are generated and transmitted by web application programs. As a result, the proportion of attacks that inject malicious strings into vulnerable web applications is increasing. In this paper, we present a static program analyzer that determines whether a web application program is vulnerable to SQL injection attacks and cross-site scripting (XSS) attacks. To analyze programs within the abstract interpretation framework, we designed an abstract domain that models the set of potential strings along with excluded strings, and we developed an abstract interpreter for the PHP language. Based on these, we implemented a static analyzer. According to our experiments, our analyzer has competitive analysis speed and accuracy compared with related research results.
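
As a rough illustration of an abstract value that tracks "potential strings along with excluded strings", the toy sketch below may help. It is not the paper's abstract domain, which the abstract does not define in detail: the `AbsStr` class, the restriction of exclusions to single characters, and the `strip_out` and `join` helpers are all assumptions made for this example, written in Python rather than PHP.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AbsStr:
    # known: finite set of possible concrete strings, or None when the value
    # is unknown (for example, it comes from user input).
    known: Optional[FrozenSet[str]]
    # excluded: single characters guaranteed NOT to occur in an unknown value.
    excluded: FrozenSet[str] = frozenset()

    def may_contain(self, ch: str) -> bool:
        """Conservatively answer: could character `ch` occur in this value?"""
        if self.known is not None:
            return any(ch in s for s in self.known)
        return ch not in self.excluded


def const(s: str) -> AbsStr:
    return AbsStr(frozenset({s}))


def user_input() -> AbsStr:
    return AbsStr(None)


def strip_out(v: AbsStr, ch: str) -> AbsStr:
    """Model a sanitizer that deletes every occurrence of one character."""
    if v.known is not None:
        return AbsStr(frozenset(s.replace(ch, "") for s in v.known))
    return AbsStr(None, v.excluded | {ch})


def join(a: AbsStr, b: AbsStr) -> AbsStr:
    """Merge two values at a control-flow join, keeping only safe facts."""
    if a.known is not None and b.known is not None:
        return AbsStr(a.known | b.known)
    candidates = a.excluded | b.excluded
    kept = frozenset(
        c for c in candidates if not a.may_contain(c) and not b.may_contain(c)
    )
    return AbsStr(None, kept)


if __name__ == "__main__":
    name = strip_out(user_input(), "'")
    # A value placed inside a quoted SQL literal is flagged if it may hold a quote.
    print("raw input may inject:", user_input().may_contain("'"))
    print("sanitized input may inject:", name.may_contain("'"))
```

The exclusion set only ever shrinks at a join, so the analysis errs on the side of reporting a possible injection.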

Analysis on Gifted Class in Mathematics using Flanders Category System (Flanders 언어상호작용 분석법을 활용한 수학영재 수업 분석)

  • Lee, Yoon-Gyeong;Lee, Joong-Kweon
    • The Journal of the Korea Contents Association / v.14 no.5 / pp.512-523 / 2014
  • The purpose of this study is to provide useful information for improving interaction between teacher and students by analyzing gifted classes in mathematics with the Flanders Category System. The research question is as follows: in gifted classes in mathematics, what does the analysis of interactions between the teacher and students show according to 1) Flanders' coding system, 2) Flanders' language patterns, and 3) Flanders' index system? To answer it, three gifted classes in mathematics were video-recorded and analyzed with the Advanced Flanders (AF) analysis program, version 3.54. The results are as follows. 1) In the code category analysis, the classes mostly consist of lecture, voluntary speaking, chaos, and silent work. 2) Most class patterns do not match effective class pattern models, so the teacher needs to actively accept students' opinions and give appropriate feedback. 3) In the index results, the revised I/d ratio, teacher's question ratio, students' speaking ratio, and student question and wide answer ratio are higher than the analysis standard, while the indirect ratio is lower than the analysis standard.