• Title/Summary/Keyword: benchmark technique (벤치마크 기법)

Evaluation of Large Language Models' Korean-Text to SQL Capability (대형 언어 모델의 한국어 Text-to-SQL 변환 능력 평가)

  • Jooyoung Choi;Kyungkoo Min;Myoseop Sim;Haemin Jung;Minjun Park;Stanley Jungkyu Choi
    • Annual Conference on Human and Language Technology / 2023.10a / pp.171-176 / 2023
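  • Recently introduced natural language generation models pre-trained on large-scale data have shown impressive performance on dialogue and code generation tasks. In this paper, we evaluate how well large language models (LLMs) convert Korean questions into SQL queries (Text-to-SQL). First, we built a Korean Text-to-SQL dataset by translating the English questions of an English Text-to-SQL benchmark dataset into Korean. We evaluate large generative models (GPT-3 davinci, GPT-3 turbo) in a few-shot setting and confirm that large language models achieve competitive Korean Text-to-SQL performance even without fine-tuning. We also perform an error analysis and present the various problems that arise when converting Korean sentences into database queries, along with possible solutions based on prompting techniques.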
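
The few-shot setting mentioned in the abstract can be illustrated with a minimal sketch of how such a prompt might be assembled. This is not the authors' prompt format: the function name `build_few_shot_prompt`, the comment-style layout, and the toy schema and questions are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    schema: str,
    examples: List[Tuple[str, str]],
    question: str,
) -> str:
    """Assemble a few-shot Text-to-SQL prompt (hypothetical layout).

    schema   -- CREATE TABLE statements describing the database
    examples -- (Korean question, gold SQL) demonstration pairs
    question -- the Korean question the model should translate
    """
    parts = ["-- Database schema", schema.strip(), ""]
    # Each demonstration pairs a question with its SQL answer.
    for demo_question, demo_sql in examples:
        parts += [f"-- Question: {demo_question}", demo_sql.strip(), ""]
    # End with the open question and a SELECT stub for the model to complete.
    parts += [f"-- Question: {question}", "SELECT"]
    return "\n".join(parts)


# Toy data, invented for this sketch.
demo_schema = "CREATE TABLE singer (singer_id INT, name TEXT, age INT);"
demo_examples = [("가수는 모두 몇 명입니까?", "SELECT count(*) FROM singer;")]
prompt = build_few_shot_prompt(demo_schema, demo_examples, "가장 나이가 많은 가수의 이름은 무엇입니까?")
```

The resulting string would be sent to a completion model unchanged; no model weights are updated, which is what distinguishes the few-shot setting from fine-tuning.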

Potential Races Detection in Shared-Memory Programs with Internal Nondeterminism (내부적 비결정성을 가진 공유 메모리 프로그램의 잠재적 경합 탐지)

  • Jung, Min-Sub;Kim, Young-Joo;Ha, Ok-Kyoon;Jun, Yong-Kee
    • Proceedings of the Korea Information Processing Society Conference / 2008.05a / pp.553-556 / 2008
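  • Races that occur in shared-memory parallel programs with critical sections lead to nondeterministic execution results that the programmer did not intend, so they must be debugged. Existing techniques that detect such races during execution can verify the existence of races only for programs without internal nondeterminism, which is caused by the execution order of critical sections. This paper proposes a tool that statically analyzes the nondeterministic access events in programs with internal nondeterminism and uses this information to detect races during execution, so that even potential races can be detected. Using synthetic programs that contain nondeterminism and Microbenchmark, a recognized OpenMP benchmark program, we show that the proposed tool can verify races.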

Improving Performance of Human Action Recognition on Accelerometer Data (가속도 센서 데이터 기반의 행동 인식 모델 성능 향상 기법)

  • Nam, Jung-Woo;Kim, Jin-Heon
    • Journal of IKEEE / v.24 no.2 / pp.523-528 / 2020
  • With the widespread use of sensor-rich mobile devices, the analysis of human activities has become more common and simpler than ever before. In this paper, we propose two deep neural networks that efficiently and accurately perform human activity recognition (HAR) using tri-axial accelerometers. In combination with powerful modern deep learning techniques such as batch normalization and LSTM networks, our model outperforms baseline approaches and establishes state-of-the-art results on the WISDM dataset.

User Behavior Based Web Attack Detection in the Face of Camouflage (정상 사용자로 위장한 웹 공격 탐지 목적의 사용자 행위 분석 기법)

  • Shin, MinSik;Kwon, Taekyoung
    • Journal of the Korea Institute of Information Security & Cryptology / v.31 no.3 / pp.365-371 / 2021
  • With the rapid growth in Internet users, web applications are becoming the main target of hackers. Most previous WAFs (Web Application Firewalls) inspect each HTTP request packet individually rather than the overall behavior of the attacker, and are known to have difficulty detecting new types of attacks. In this paper, we propose a machine-learning-based web attack detection system that uses user behavior to detect attacks of unknown patterns. To define user behavior, we focus on features that exclude areas where an attacker can camouflage as a normal user. The experimental results show that using the path and query information to define users' behavior gives the best results, with an accuracy of 99% using Decision forest.

Robust Head Pose Estimation for Masked Face Image via Data Augmentation (데이터 증강을 통한 마스크 착용 얼굴 이미지에 강인한 얼굴 자세추정)

  • Kyeongtak, Han;Sungeun, Hong
    • Journal of Broadcast Engineering / v.27 no.6 / pp.944-947 / 2022
  • Due to the coronavirus pandemic, mask wearing has increased worldwide; thus, image analysis of masked face images has become essential. Although head pose estimation can be applied to various face-related applications including driver attention, face frontalization, and gaze detection, few studies have been conducted to address the performance degradation caused by masked faces. This study proposes a new data augmentation method that synthesizes masked faces according to the face image size and pose, and shows robust performance on the BIWI benchmark dataset regardless of mask wearing. Since the proposed scheme is not limited to a specific model, it can be used in various head pose estimation models.

A Wavelet-based Blind Watermarking Scheme Using Pixel Correlation of Low Sub-band (저주파 대역의 픽셀 상관도를 이용한 웨이블릿 기반 블라인드 워터마킹 기법)

  • Yoo, Kil-Sang;Jahng, Sung-Gahb;Lee, Won-Hyung
    • The Journal of Korean Institute of Communications and Information Sciences / v.29 no.9C / pp.1298-1305 / 2004
  • Most watermarking techniques embed watermarks in the middle frequency range for robustness and invisibility. Our proposed watermarking algorithm embeds a Gaussian sequence watermark into the low-frequency area of the wavelet transform domain, because the histogram of the low sub-band is composed of similar coefficients. Also, our proposed scheme does not need the original image in the extraction procedure. The experimental results show good robustness against the Check Mark benchmarking tools.

Improved Simulated-Annealing Technique for Sequence-Pair based Floorplan (Sequence-Pair 기반의 플로어플랜을 위한 개선된 Simulated-Annealing 기법)

  • Sung, Young-Tae;Hur, Sung-Woo
    • Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.4 / pp.28-36 / 2009
  • The Sequence-Pair (SP) model represents the topological relation between modules. In general, SP-based floorplanners search for solutions using the Simulated-Annealing (SA) algorithm. Several SA-based floorplanning techniques using the SP model have been published. To improve performance, these techniques have tried to speed up the evaluation function for the SP model and to find better scheduling methods and perturbation functions for SA. In this paper, we propose a two-phase SA-based algorithm. In the first phase, white space between modules is reduced by applying a compaction technique to the floorplan obtained from an SP. From the compacted floorplan, the corresponding SP is determined. The solution space is searched by changing the SP within the SA framework. When solutions converge to some threshold value, the first phase of the SA-based search stops. Then the second phase continues with the typical SA-based algorithm, i.e., without the compaction technique, to find optimal solutions. Experimental results with MCNC benchmark circuits show how the proposed technique affects the SA-based floorplanning procedure and that the results obtained by our technique are better than those obtained by existing SA-based algorithms.
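
As background for the evaluation function the abstract refers to, the following is a minimal sketch of how a sequence pair is commonly turned into a floorplan. It uses the standard sequence-pair rules (a module is left of another if it precedes it in both sequences, and above it if it precedes it in the first sequence but follows it in the second). It does not implement the paper's compaction step or its two-phase schedule, and the names `pack_sequence_pair`, `gamma_pos`, `gamma_neg`, and `sizes` are assumptions made for this illustration.

```python
from typing import Dict, List, Tuple


def pack_sequence_pair(
    gamma_pos: List[str],
    gamma_neg: List[str],
    sizes: Dict[str, Tuple[float, float]],
) -> Tuple[Dict[str, Tuple[float, float]], float, float]:
    """Place modules from a sequence pair; sizes maps name -> (width, height).

    Returns lower-left coordinates per module plus bounding-box width and height.
    """
    pos_index = {module: i for i, module in enumerate(gamma_pos)}
    x: Dict[str, float] = {}
    y: Dict[str, float] = {}
    # Visiting modules in gamma_neg order guarantees that every module
    # constrained to lie left of or below the current one is already placed.
    for i, b in enumerate(gamma_neg):
        earlier = gamma_neg[:i]
        # a is left of b when a precedes b in both sequences.
        x[b] = max(
            (x[a] + sizes[a][0] for a in earlier if pos_index[a] < pos_index[b]),
            default=0.0,
        )
        # a is below b when a follows b in gamma_pos but precedes b in gamma_neg.
        y[b] = max(
            (y[a] + sizes[a][1] for a in earlier if pos_index[a] > pos_index[b]),
            default=0.0,
        )
    width = max(x[m] + sizes[m][0] for m in gamma_neg)
    height = max(y[m] + sizes[m][1] for m in gamma_neg)
    return {m: (x[m], y[m]) for m in gamma_neg}, width, height


# Toy modules, invented for this sketch.
toy_sizes = {"a": (2.0, 1.0), "b": (3.0, 4.0)}
side_by_side = pack_sequence_pair(["a", "b"], ["a", "b"], toy_sizes)
stacked = pack_sequence_pair(["a", "b"], ["b", "a"], toy_sizes)
```

An SA-based floorplanner would call a function like this after each perturbation of the two sequences and use the bounding-box area (width times height) as part of its cost. This quadratic-time version favors readability over the faster evaluation methods the abstract alludes to.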

A Multistriped Checkpointing Scheme for the Fault-tolerant Cluster Computers (다중 분할된 구조를 가지는 클러스터 검사점 저장 기법)

  • Chang, Yun-Seok
    • The KIPS Transactions:PartA / v.13A no.7 s.104 / pp.607-614 / 2006
  • Checkpointing schemes should reduce process delay by managing the checkpoints of each node to fit the network load, in order to enhance the performance of processes running on a cluster system that writes checkpoints to its global stable storage. For this reason, a cluster system with a single IO space on a distributed RAID chooses a suitable checkpointing scheme to get the maximum IO performance and the best rollback recovery efficiency. In this paper, we improve the striped checkpointing scheme with a dynamic stripe group size that adapts to the network bandwidth variation at the point of checkpointing. To analyze the performance of the multistriped checkpointing scheme, we ran the Linpack HPC benchmark with MPI on our own cluster system with a maximum of 512 virtual nodes. The benchmark results show that the multistriped checkpointing scheme performs better than the striped checkpointing scheme in checkpoint writing efficiency and rollback recovery under heavy system load.

A Representative Pattern Generation Algorithm Based on Evaluation And Selection (평가와 선택기법에 기반한 대표패턴 생성 알고리즘)

  • Yih, Hyeong-Il
    • Journal of the Korea Society of Computer and Information / v.14 no.3 / pp.139-147 / 2009
  • Memory-based reasoning simply stores training patterns or representative patterns in memory and classifies a test pattern by computing its distance to them. Because it either stores all training patterns in memory or replaces them with representative patterns, it requires more memory than other machine learning techniques. Moreover, as the number of stored training patterns increases, the time required for classification also increases. In this paper, we propose the EAS (Evaluation And Selection) algorithm to minimize memory usage and improve classification performance. After partitioning the training space, the algorithm evaluates each partition using the MDL and PM methods. The partition with the best evaluation result becomes a representative pattern. The remaining partitions are partitioned again and the evaluation is repeated. We verify the performance of the proposed algorithm using benchmark data sets from the UCI Machine Learning Repository.

Hybrid Neural Network Clustering Using SOM and BP for Data Mining (데이터 마이닝을 위한 신경망 클러스터링 기법에 관한 연구)

  • 김만선;이상용
    • Proceedings of the Korean Information Science Society Conference / 2001.10b / pp.160-162 / 2001
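  • Recently, much research has been conducted on data mining, which discovers useful information in large databases and explores and analyzes the associations among data. In real applications, collected data grows in volume over time and acquires redundant attributes and noise, so applying mining techniques takes considerable time and cost. In addition, because it is not known which attributes are important, important attributes may be distorted by unimportant ones or not analyzed properly. To solve these problems, this paper proposes a hybrid neural network clustering technique that can be applied to large volumes of data, discovers unknown patterns in the data, and can also generate the output the user wants. The validity of the algorithm is verified using several benchmark datasets.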
