• Title/Abstract/Keyword: sequence-to-sequence model

Deep Learning-based Delinquent Taxpayer Prediction: A Scientific Administrative Approach

  • YongHyun Lee;Eunchan Kim
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 18, No. 1
    • /
    • pp.30-45
    • /
    • 2024
  • This study introduces an effective method for predicting individual local tax delinquencies using prevalent machine learning and deep learning algorithms. The evaluation of credit risk holds great significance in the financial realm, impacting both companies and individuals. While credit risk prediction has been explored using statistical and machine learning techniques, their application to tax arrears prediction remains underexplored. We forecast individual local tax defaults in the Republic of Korea using machine learning and deep learning algorithms, including convolutional neural networks (CNN), long short-term memory (LSTM), and sequence-to-sequence (seq2seq). Our model incorporates diverse credit and public information such as loan history, delinquency records, credit card usage, and public taxation data, offering richer insights than prior studies. The results highlight the superior predictive accuracy of the CNN model. Anticipating local tax arrears more effectively could lead to efficient allocation of administrative resources. By leveraging advanced machine learning, this research offers a promising avenue for refining tax collection strategies and resource management.
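
As a rough illustration of the kind of 1D-CNN classifier compared in the study, the sketch below defines a binary delinquency classifier in PyTorch; the feature layout (12 monthly steps by 8 credit/public features) and layer sizes are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch of a 1D-CNN binary classifier (delinquent vs. non-delinquent).
# Feature layout and layer sizes are illustrative, not the paper's.
import torch
import torch.nn as nn

class DelinquencyCNN(nn.Module):
    def __init__(self, n_features: int = 8, n_steps: int = 12):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveMaxPool1d(1),          # pool over the time axis
        )
        self.head = nn.Linear(32, 1)          # logit for "will default"

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features, n_steps)
        return self.head(self.conv(x).squeeze(-1))

model = DelinquencyCNN()
x = torch.randn(4, 8, 12)                     # 4 hypothetical taxpayers
print(torch.sigmoid(model(x)))                # predicted default probabilities
```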

원심모형실험에 의한 방파제의 수평변위 거동에 관한 연구 (A Study on Behavior of the Lateral Movement of Breakwater by Centrifuge model Experiments)

  • 이동원;김동건;전상현;유남재
    • 한국지반공학회:학술대회논문집
    • /
    • 한국지반공학회 2010년도 춘계 학술발표회
    • /
    • pp.1473-1478
    • /
    • 2010
  • For a caisson-type breakwater under large wave loads, the stability against lateral movement was investigated by performing centrifuge model experiments. The prototype breakwater was scaled down to a centrifuge model, and the soft ground reinforced with grouting was also reconstructed in the centrifuge model experiments. The sandy ground beneath the breakwater was prepared with soil sampled in the field so that an identical internal friction angle could be obtained. The centrifuge model experiments were carried out to reproduce the construction sequence in the field. A lateral static wave load was applied to the model caisson after the final construction stage was reproduced, and the measured lateral movement of the caisson was compared with the allowable value given by the design code to assess the stability of the breakwater against lateral movement.

선형 계획법을 이용한 Timing Diagram의 테스트 입력 시퀀스 자동 생성 전략 (Test Input Sequence Generation Strategy for Timing Diagram using Linear Programming)

  • 이홍석;정기현;최경희
    • 정보처리학회논문지D
    • /
    • Vol. 17D, No. 5
    • /
    • pp.337-346
    • /
    • 2010
  • Timing diagrams are widely used because they readily express the behavior of a system over time and the expressed behavior is easy to understand. Testing a system described with a timing diagram requires several techniques, one of which is, given test case goals, generating a sequence of input values that drives the system model into the desired state. This paper proposes a method for automatically generating test input sequences from test case goals for a timing diagram model. To generate test input sequences from a timing diagram automatically, an appropriate set of inputs is required that satisfies the transition condition at each time point, composed of the input waveforms and timing constraints. To solve this problem, this paper adopts an approach based on linear programming, which proceeds as follows: 1) a timing diagram model is taken as input and transformed into a linear programming problem; 2) the transformed linear programming problem is solved with a linear programming solver; 3) a test input sequence for the timing diagram model is generated from the solution of the linear programming problem. The paper formally describes how an arbitrary timing diagram model is modeled as a linear program, shows the validity of the approach through a proof, and demonstrates its usefulness by implementing a tool and generating test input sequences from an example timing diagram model.
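
For context, a minimal sketch of the solver step (step 2 above) is shown below; it assumes the transition conditions have already been encoded as linear inequalities over event times, uses SciPy rather than the paper's tool, and all constraint values are hypothetical.

```python
# Minimal sketch of the LP-solving step (step 2 above), not the paper's tool.
# Assumes the timing-diagram transition conditions were already encoded as
# linear inequalities over event times t1, t2 (hypothetical values).
from scipy.optimize import linprog

# Hypothetical timing constraints: 2 <= t2 - t1 <= 5, 0 <= t1 <= 10
A_ub = [[-1.0, 1.0],   # t2 - t1 <= 5
        [1.0, -1.0]]   # t1 - t2 <= -2  (i.e. t2 - t1 >= 2)
b_ub = [5.0, -2.0]
bounds = [(0.0, 10.0), (0.0, None)]

# Any feasible point yields a test input sequence, so minimize a dummy objective.
res = linprog(c=[0.0, 1.0], A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
if res.success:
    t1, t2 = res.x
    print(f"apply input change at t1={t1:.2f}, expect transition by t2={t2:.2f}")
else:
    print("no input sequence satisfies the timing constraints")
```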

언어 정보가 반영된 문장 점수를 활용하는 삭제 기반 문장 압축 (Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information)

  • 이준범;김소언;박성배
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • Vol. 11, No. 3
    • /
    • pp.125-132
    • /
    • 2022
  • Sentence compression is a natural language processing task that generates a shortened compressed sentence while preserving the essential meaning of the original sentence. For grammatically appropriate sentence compression, early studies used human-defined linguistic rules. As sequence-to-sequence models showed good performance on various natural language processing tasks such as machine translation, later studies applied them to sentence compression. However, rule-based approaches incur a large cost to define all the linguistic rules, and sequence-to-sequence approaches require a large dataset for training. To address these problems, Deleter, a sentence compression model that uses the pretrained language model BERT, was proposed. Because Deleter compresses sentences using perplexity computed with BERT, it requires neither compression rules nor a training dataset. However, because Deleter considers only perplexity, it cannot compress sentences in a way that reflects the linguistic information of the words in the sentence. Moreover, the data used to pretrain BERT differ considerably from compressed sentences, so the perplexity measured with it can lead to incorrect compressions. To address these issues, this paper proposes a method that quantifies the importance of linguistic information and reflects it in the perplexity-based sentence score. In addition, BERT is fine-tuned on a news article corpus, which frequently contains proper nouns and often omits unnecessary modifiers, so that a perplexity appropriate for sentence compression can be measured. To evaluate performance on English and Korean data, experiments compared the sentence compression performance of the proposed LI-Deleter with baseline models and confirmed that it achieves high sentence compression performance.
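
As an illustration of the perplexity-based scoring idea (not the paper's implementation), the sketch below computes a BERT pseudo-perplexity for a candidate compression with the Hugging Face transformers library; the multilingual checkpoint name is an assumption.

```python
# Minimal sketch of a BERT-based (pseudo-)perplexity score for a candidate
# compression, in the spirit of Deleter; not the paper's implementation, and
# the checkpoint is just the public multilingual BERT.
import math
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForMaskedLM.from_pretrained("bert-base-multilingual-cased")
model.eval()

def pseudo_perplexity(sentence: str) -> float:
    """Mask each token in turn and average the negative log-likelihoods."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    nlls = []
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        log_probs = torch.log_softmax(logits, dim=-1)
        nlls.append(-log_probs[ids[i]].item())
    return math.exp(sum(nlls) / len(nlls))

# Lower pseudo-perplexity suggests a more fluent candidate compression.
print(pseudo_perplexity("The cat sat on the mat."))
```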

Computational Approaches for Structural and Functional Genomics

  • Brenner, Steven-E.
    • 한국생물정보학회:학술대회논문집
    • /
    • 한국생물정보시스템생물학회 2000년도 International Symposium on Bioinformatics
    • /
    • pp.17-20
    • /
    • 2000
  • Structural genomics aims to provide a good experimental structure or computational model of every tractable protein in a complete genome. Underlying this goal is the immense value of protein structure, especially in permitting recognition of distant evolutionary relationships for proteins whose sequence analysis has failed to find any significant homolog. A considerable fraction of the genes in all sequenced genomes have no known function, and structure determination provides a direct means of revealing homology that may be used to infer their putative molecular function. The solved structures will be similarly useful for elucidating the biochemical or biophysical role of proteins that have been previously ascribed only phenotypic functions. More generally, knowledge of an increasingly complete repertoire of protein structures will aid structure prediction methods, improve understanding of protein structure, and ultimately lend insight into molecular interactions and pathways. We use computational methods to select families whose structures cannot be predicted and which are likely to be amenable to experimental characterization. The methods employed include modern sequence analysis and clustering algorithms. A critical component is consultation of the PRESAGE database for structural genomics, which records the community's experimental work underway and computational predictions. The protein families are ranked according to several criteria including taxonomic diversity and known functional information. Individual proteins, often homologs from hyperthermophiles, are selected from these families as targets for structure determination. The solved structures are examined for structural similarity to other proteins of known structure. Homologous proteins in sequence databases are computationally modeled to provide a resource of protein structure models complementing the experimentally solved protein structures.

Text Summarization on Large-scale Vietnamese Datasets

  • Ti-Hon, Nguyen;Thanh-Nghi, Do
    • Journal of information and communication convergence engineering
    • /
    • Vol. 20, No. 4
    • /
    • pp.309-316
    • /
    • 2022
  • This investigation is aimed at automatic text summarization on large-scale Vietnamese datasets. Vietnamese articles were collected from newspaper websites and plain text was extracted to build the dataset, which included 1,101,101 documents. Next, a new single-document extractive text summarization model was proposed to evaluate this dataset. In this summary model, the k-means algorithm is used to cluster the sentences of the input document using different text representations, such as BoW (bag-of-words), TF-IDF (term frequency - inverse document frequency), Word2Vec (word-to-vector), GloVe, and FastText. The summary algorithm then uses the trained k-means model to rank the candidate sentences and create a summary with the highest-ranked sentences. The empirical F1-score results were 51.91% ROUGE-1, 18.77% ROUGE-2, and 29.72% ROUGE-L, compared with 52.33% ROUGE-1, 16.17% ROUGE-2, and 33.09% ROUGE-L for a competitive abstractive model. The advantage of the proposed model is that it can perform well with a time complexity of O(n, k, p) = O(n(k + 2/p)) + O(n log₂ n) + O(np) + O(nk²) + O(k).
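
As a rough illustration of the clustering-and-ranking idea (TF-IDF sentence vectors, k-means, one sentence per cluster closest to its centroid), the sketch below uses scikit-learn; the sentence splitting and parameters are simplifications, not the authors' code.

```python
# Minimal sketch of k-means extractive summarization with TF-IDF sentence
# vectors; splitting on "." and the parameters are simplifications.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

def summarize(document: str, k: int = 3) -> str:
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    k = min(k, len(sentences))
    X = TfidfVectorizer().fit_transform(sentences)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    picked = []
    for c in range(k):
        members = np.where(km.labels_ == c)[0]
        # Rank cluster members by distance to the centroid, keep the closest.
        dists = np.linalg.norm(X[members].toarray() - km.cluster_centers_[c], axis=1)
        picked.append(members[np.argmin(dists)])
    return ". ".join(sentences[i] for i in sorted(picked)) + "."
```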

Real-Time 2D-to-3D Conversion for 3DTV using Time-Coherent Depth-Map Generation Method

  • Nam, Seung-Woo;Kim, Hye-Sun;Ban, Yun-Ji;Chien, Sung-Il
    • International Journal of Contents
    • /
    • Vol. 10, No. 3
    • /
    • pp.9-16
    • /
    • 2014
  • Depth-image-based rendering is generally used in real-time 2D-to-3D conversion for 3DTV. However, inaccurate depth maps cause flickering issues between image frames in a video sequence, resulting in eye fatigue while viewing 3DTV. To resolve this flickering issue, we propose a new 2D-to-3D conversion scheme based on fast and robust depth-map generation from a 2D video sequence. The proposed depth-map generation algorithm divides an input video sequence into several cuts using a color histogram. The initial depth of each cut is assigned based on a hypothesized depth-gradient model. The initial depth map of the current frame is refined using color and motion information. Thereafter, the depth map of the next frame is updated using the difference image to reduce depth flickering. The experimental results confirm that the proposed scheme performs real-time 2D-to-3D conversions effectively and reduces human eye fatigue.
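
For illustration, the sketch below covers only the first step, splitting an input video into cuts with a color histogram, using OpenCV; the threshold value is a hypothetical choice and this is not the authors' pipeline.

```python
# Minimal sketch of histogram-based cut detection (the first step of the
# depth-map pipeline); the correlation threshold 0.6 is a hypothetical value.
import cv2

def detect_cuts(video_path: str, threshold: float = 0.6) -> list[int]:
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hist, frame_idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None:
            # Low correlation between consecutive histograms marks a new cut.
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
                cuts.append(frame_idx)
        prev_hist, frame_idx = hist, frame_idx + 1
    cap.release()
    return cuts
```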

준비시간이 있는 혼합모델 조립라인의 제품투입순서 결정 : Tabu Search 기법 적용 (Sequencing in Mixed Model Assembly Lines with Setup Time : A Tabu Search Approach)

  • 김여근;현철주
    • 한국경영과학회지
    • /
    • Vol. 13, No. 1
    • /
    • pp.13-13
    • /
    • 1988
  • This paper considers the sequencing problem in mixed model assembly lines with hybrid workstation types and sequence-dependent setup times. Computation time is often a critical factor in choosing a method of determining the sequence. We develop a mathematical formulation of the problem to minimize the overall length of a line, and present a tabu search technique which can provide a near optimal solution in real time. The proposed technique is compared with a genetic algorithm and a branch-and-bound method. Experimental results are reported to demonstrate the efficiency of the technique.
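
As a rough illustration of the tabu search component, the sketch below searches for a model launching sequence with low total sequence-dependent setup time; this toy objective is a simplification of the paper's line-length criterion, and all data are invented.

```python
# Minimal tabu-search sketch over adjacent-swap moves; the objective (total
# sequence-dependent setup time) and the data are toy simplifications.
import random

def total_setup(seq, setup):
    return sum(setup[a][b] for a, b in zip(seq, seq[1:]))

def tabu_search(models, setup, iters=200, tenure=3):
    current = list(models)
    best, best_cost = current[:], total_setup(current, setup)
    tabu = []
    for _ in range(iters):
        neighbours = []
        for i in range(len(current) - 1):
            cand = current[:]
            cand[i], cand[i + 1] = cand[i + 1], cand[i]   # adjacent swap move
            # Aspiration: accept a tabu move only if it beats the best so far.
            if (i, i + 1) not in tabu or total_setup(cand, setup) < best_cost:
                neighbours.append(((i, i + 1), cand))
        if not neighbours:                    # all moves tabu: take a random one
            i = random.randrange(len(current) - 1)
            cand = current[:]
            cand[i], cand[i + 1] = cand[i + 1], cand[i]
            neighbours.append(((i, i + 1), cand))
        move, current = min(neighbours, key=lambda mc: total_setup(mc[1], setup))
        tabu.append(move)
        if len(tabu) > tenure:
            tabu.pop(0)
        if total_setup(current, setup) < best_cost:
            best, best_cost = current[:], total_setup(current, setup)
    return best, best_cost

# Toy instance: 4 models, random setup-time matrix.
random.seed(0)
setup = [[0 if i == j else random.randint(1, 9) for j in range(4)] for i in range(4)]
print(tabu_search([0, 1, 2, 3], setup))
```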

SOM과 PRL을 이용한 고유얼굴 기반의 머리동작 인식방법 (A Head Gesture Recognition Method based on Eigenfaces using SOM and PRL)

  • 이우진;구자영
    • 한국정보처리학회논문지
    • /
    • Vol. 7, No. 3
    • /
    • pp.971-976
    • /
    • 2000
  • In this paper, a new method for head gesture recognition is proposed. At the first stage, face image data are transformed into low-dimensional vectors by principal component analysis (PCA), which utilizes the high correlation between face pose images. Then a self-organizing map (SOM) is trained with the transformed face vectors, in such a way that nodes at similar locations respond to similar poses. A sequence of poses that comprises each model gesture goes through PCA and the SOM, and the result is stored in the database. At the recognition stage, any sequence of frames goes through PCA and the SOM, and the result is compared with the model gestures stored in the database. To improve the robustness of classification, probabilistic relaxation labeling (PRL) is used, which utilizes the contextual information embedded in the adjacent poses.
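
For illustration, a sketch of the PCA-plus-SOM stage is given below using scikit-learn and the third-party MiniSom package; the data shapes and parameters are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the eigenface projection (PCA) followed by SOM node
# assignment for each frame; shapes and parameters are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from minisom import MiniSom

# Hypothetical training data: 500 face images flattened to 32x32 = 1024 pixels.
faces = np.random.rand(500, 1024)

pca = PCA(n_components=20)                 # eigenface projection
low_dim = pca.fit_transform(faces)

som = MiniSom(8, 8, 20, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(low_dim, 1000)

def pose_sequence(frames: np.ndarray) -> list:
    """Map a sequence of face frames to SOM nodes (one node label per frame)."""
    return [som.winner(v) for v in pca.transform(frames)]

# A gesture is then represented by its node sequence and compared with stored models.
print(pose_sequence(np.random.rand(10, 1024)))
```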

Zero-anaphora resolution in Korean based on deep language representation model: BERT

  • Kim, Youngtae;Ra, Dongyul;Lim, Soojong
    • ETRI Journal
    • /
    • Vol. 43, No. 2
    • /
    • pp.299-312
    • /
    • 2021
  • It is necessary to achieve high performance in the task of zero anaphora resolution (ZAR) for completely understanding texts in Korean, Japanese, Chinese, and various other languages. Deep-learning-based models are being employed for building ZAR systems, owing to the success of deep learning in recent years. However, the objective of building a high-quality ZAR system is far from being achieved even using these models. To enhance the current ZAR techniques, we fine-tuned a pretrained bidirectional encoder representations from transformers (BERT) model. Notably, BERT is a general language representation model that enables systems to utilize deep bidirectional contextual information in a natural language text. It extensively exploits the attention mechanism based upon the sequence-transduction model Transformer. In our model, classification is simultaneously performed for all the words in the input word sequence to decide whether each word can be an antecedent. We seek end-to-end learning by disallowing any use of hand-crafted or dependency-parsing features. Experimental results show that compared with other models, our approach can significantly improve the performance of ZAR.
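
As a generic illustration of per-token antecedent classification with BERT (not the authors' ZAR architecture), the sketch below runs a BertForTokenClassification head over a sentence; in practice the head would first be fine-tuned on ZAR data, and the checkpoint name is an assumption.

```python
# Minimal sketch of per-token binary classification with BERT (antecedent vs.
# not); the head is randomly initialized here and would be fine-tuned on ZAR
# data before real use.
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForTokenClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)  # 0: not antecedent, 1: antecedent
model.eval()

sentence = "철수는 밥을 먹었다 . 그리고 학교에 갔다 ."
enc = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**enc).logits              # shape: (1, seq_len, 2)
pred = logits.argmax(dim=-1)[0]

tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
for tok, label in zip(tokens, pred.tolist()):
    print(tok, "antecedent" if label == 1 else "-")
```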