• Title/Summary/Keyword: sequence-to-sequence model


Deep Learning-based Delinquent Taxpayer Prediction: A Scientific Administrative Approach

  • YongHyun Lee;Eunchan Kim
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.1
    • /
    • pp.30-45
    • /
    • 2024
  • This study introduces an effective method for predicting individual local tax delinquencies using prevalent machine learning and deep learning algorithms. The evaluation of credit risk holds great significance in the financial realm, impacting both companies and individuals. While credit risk prediction has been explored using statistical and machine learning techniques, its application to tax arrears prediction remains underexplored. We forecast individual local tax defaults in the Republic of Korea using machine learning and deep learning algorithms, including convolutional neural networks (CNN), long short-term memory (LSTM), and sequence-to-sequence (seq2seq) models. Our model incorporates diverse credit and public information, such as loan history, delinquency records, credit card usage, and public taxation data, offering richer insights than prior studies. The results highlight the superior predictive accuracy of the CNN model. Anticipating local tax arrears more effectively could lead to efficient allocation of administrative resources. By leveraging advanced machine learning, this research offers a promising avenue for refining tax collection strategies and resource management.
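As a minimal sketch of the kind of local pattern a CNN extracts from a credit/tax time series, the snippet below runs a 1D convolution over a monthly payment history. The input sequence and the kernel are invented for illustration, not data from the paper.

```python
import numpy as np

# Toy 1D convolution over a monthly payment-history sequence, the kind of
# local feature a CNN extracts from credit/tax time series.
def conv1d(seq, kernel):
    k = len(kernel)
    return np.array([np.dot(seq[i:i + k], kernel) for i in range(len(seq) - k + 1)])

# 1 = missed payment, 0 = payment on time, for 12 months (invented data)
misses = np.array([0, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0], dtype=float)

# A kernel that responds strongly to runs of consecutive missed payments
kernel = np.ones(3)
features = conv1d(misses, kernel)

# The strongest response marks the densest run of missed payments
worst = int(np.argmax(features))  # window starting at month 6 (three misses in a row)
```

A real model would learn many such kernels and feed the feature maps into further layers; this only illustrates why convolution suits delinquency-run detection.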

A Study on Behavior of the Lateral Movement of Breakwater by Centrifuge model Experiments (원심모형실험에 의한 방파제의 수평변위 거동에 관한 연구)

  • Lee, Dong-Won;Kim, Dong-Gun;Jun, Sang-Hyun;Yoo, Nam-Jae
    • Proceedings of the Korean Geotechnical Society Conference
    • /
    • 2010.03a
    • /
    • pp.1473-1478
    • /
    • 2010
  • For a caisson-type breakwater under large wave loads, the stability against lateral movement was investigated by performing centrifuge model experiments. The prototype breakwater was scaled down to a centrifuge model, and the soft ground reinforced with grouting was also reconstructed in the centrifuge model experiments. The sandy ground beneath the breakwater was prepared with soil sampled in the field so that an identical internal friction angle could be obtained. The centrifuge model experiments reproduced the construction sequence in the field. After the final stage of the construction sequence was reproduced, a lateral static wave load was applied to the model caisson, and the measured lateral movement of the caisson was compared with the allowable value in the design code to assess the stability of the breakwater against lateral movement.
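The scaling idea behind centrifuge modeling can be shown with one line of arithmetic: spinning a 1/N-scale model at N times gravity reproduces prototype stress levels. The 50 g level and the caisson height below are illustrative values, not numbers from the paper.

```python
# Basic centrifuge similitude: a 1/N scale model spun at N*g reproduces
# prototype self-weight stresses (sigma = rho * (N*g) * (h/N) = rho * g * h).
N = 50                      # centrifuge acceleration, in multiples of g (assumed)
prototype_height_m = 20.0   # hypothetical caisson height in the field

model_height_m = prototype_height_m / N   # lengths scale by 1/N
model_height_mm = model_height_m * 1000   # 20 m prototype -> 400 mm model
```

The same 1/N rule applies to displacements, which is why a measured model movement can be scaled back up and compared with the prototype allowable value.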


Test Input Sequence Generation Strategy for Timing Diagram using Linear Programming (선형 계획법을 이용한 Timing Diagram의 테스트 입력 시퀀스 자동 생성 전략)

  • Lee, Hong-Seok;Chung, Ki-Hyun;Choi, Kyung-Hee
    • The KIPS Transactions:PartD
    • /
    • v.17D no.5
    • /
    • pp.337-346
    • /
    • 2010
  • Timing diagrams are widely used because of their advantages: they conveniently describe system behavior, and the described behavior is easy to recognize. Various techniques are needed to test systems described with timing diagrams. One of them is a technique to drive the system into a condition under which a test case is effective. This paper proposes a technique to automatically generate the test input sequence that reaches such a condition for systems described with timing diagrams. Automatic generation requires finding an input set that satisfies the transition conditions restricted by the input waveforms and timing constraints. To solve this problem, this paper adopts a linear programming approach with the following procedure: 1) take a timing diagram model as input and transform it into a linear programming problem; 2) solve the linear programming problem using a linear programming tool; 3) generate test input sequences for the timing diagram model from the solution of the linear programming problem. This paper presents the formal method for deriving the linear programming model from a given timing diagram, shows the feasibility of the approach by proving it, and demonstrates its usability with an implemented tool that solves an example timing diagram model.
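The constraint structure the paper feeds to an LP solver can be illustrated without one. The sketch below brute-forces a tiny discretized example with the same shape of linear timing constraints (the specific constraint values are invented); the paper itself solves the continuous version with a linear programming tool.

```python
# Brute-force stand-in for the paper's LP step: find times for two input
# events A and B subject to linear timing constraints, minimizing test length.
def find_input_times(horizon=20):
    # Invented constraints:
    #   tA >= 2             (A cannot fire before a reset interval ends)
    #   3 <= tB - tA <= 7   (window between the two input edges)
    # LP-style objective: minimize tB, the length of the test sequence.
    best = None
    for tA in range(horizon):
        for tB in range(horizon):
            if tA >= 2 and 3 <= tB - tA <= 7:
                if best is None or tB < best[1]:
                    best = (tA, tB)
    return best

tA, tB = find_input_times()  # earliest feasible schedule: A at 2, B at 5
```

An LP solver handles the same constraints symbolically and scales to the many variables a full timing diagram produces, where enumeration would not.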

Deletion-Based Sentence Compression Using Sentence Scoring Reflecting Linguistic Information (언어 정보가 반영된 문장 점수를 활용하는 삭제 기반 문장 압축)

  • Lee, Jun-Beom;Kim, So-Eon;Park, Seong-Bae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.3
    • /
    • pp.125-132
    • /
    • 2022
  • Sentence compression is a natural language processing task that generates a concise sentence preserving the important meaning of the original sentence. For grammatically appropriate sentence compression, early studies utilized human-defined linguistic rules. In addition, because sequence-to-sequence models perform well on various natural language processing tasks, such as machine translation, some studies have applied them to sentence compression. However, the linguistic rule-based studies require all rules to be defined by humans, and the sequence-to-sequence model based studies require a large amount of parallel data for model training. To address these challenges, Deleter, a sentence compression model that leverages the pre-trained language model BERT, was proposed. Because Deleter compresses sentences using a perplexity-based score computed with BERT, it requires neither linguistic rules nor a parallel dataset. However, because Deleter considers only perplexity, it does not reflect the linguistic information of the words in a sentence. Furthermore, since the data used to pre-train BERT are far from compressed sentences, this can lead to incorrect sentence compression. To address these problems, this paper proposes a method to quantify the importance of linguistic information and reflect it in perplexity-based sentence scoring. In addition, by fine-tuning BERT with a corpus of news articles, which often contain proper nouns and omit unnecessary modifiers, we allow BERT to measure perplexity appropriately for sentence compression. Evaluations on English and Korean datasets confirm that the sentence compression performance of sentence-scoring-based models can be improved by utilizing the proposed method.
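The idea of weighting a perplexity score with linguistic importance can be sketched with a toy unigram language model standing in for BERT. The probabilities and the "capitalized words are important" heuristic below are invented for illustration; only the decision structure (delete the word whose removal yields the lowest importance-weighted perplexity) mirrors the approach.

```python
import math

# Invented unigram probabilities standing in for BERT's perplexity estimates.
UNIGRAM = {"the": 0.08, "very": 0.02, "old": 0.01, "President": 0.004,
           "spoke": 0.005, "quite": 0.015, "briefly": 0.008}

def perplexity(words):
    if not words:
        return float("inf")
    logp = sum(math.log(UNIGRAM.get(w, 1e-4)) for w in words)
    return math.exp(-logp / len(words))

def importance(word):
    # Hypothetical linguistic weight: treat capitalized words (proper nouns)
    # as far more important to keep.
    return 10.0 if word[0].isupper() else 1.0

def compress(words, keep):
    # Greedily delete words until `keep` remain, choosing at each step the
    # deletion with the lowest importance-weighted perplexity.
    words = list(words)
    while len(words) > keep:
        best = min(range(len(words)),
                   key=lambda i: perplexity(words[:i] + words[i + 1:]) * importance(words[i]))
        del words[best]
    return words

sentence = ["the", "very", "old", "President", "spoke", "quite", "briefly"]
short = compress(sentence, 3)
```

Without the importance weight, the rare word "President" would be an attractive deletion (it hurts perplexity most); the weight protects it, which is exactly the gap the paper identifies in perplexity-only scoring.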

Computational Approaches for Structural and Functional Genomics

  • Brenner, Steven-E.
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2000.11a
    • /
    • pp.17-20
    • /
    • 2000
  • Structural genomics aims to provide a good experimental structure or computational model of every tractable protein in a complete genome. Underlying this goal is the immense value of protein structure, especially in permitting recognition of distant evolutionary relationships for proteins whose sequence analysis has failed to find any significant homolog. A considerable fraction of the genes in all sequenced genomes have no known function, and structure determination provides a direct means of revealing homology that may be used to infer their putative molecular function. The solved structures will be similarly useful for elucidating the biochemical or biophysical role of proteins that have previously been ascribed only phenotypic functions. More generally, knowledge of an increasingly complete repertoire of protein structures will aid structure prediction methods, improve understanding of protein structure, and ultimately lend insight into molecular interactions and pathways. We use computational methods to select families whose structures cannot be predicted and which are likely to be amenable to experimental characterization. The methods employed include modern sequence analysis and clustering algorithms. A critical component is consultation of the PRESAGE database for structural genomics, which records the community's experimental work underway and computational predictions. The protein families are ranked according to several criteria, including taxonomic diversity and known functional information. Individual proteins, often homologs from hyperthermophiles, are selected from these families as targets for structure determination. The solved structures are examined for structural similarity to other proteins of known structure. Homologous proteins in sequence databases are computationally modeled, to provide a resource of protein structure models complementing the experimentally solved protein structures.


Text Summarization on Large-scale Vietnamese Datasets

  • Ti-Hon, Nguyen;Thanh-Nghi, Do
    • Journal of information and communication convergence engineering
    • /
    • v.20 no.4
    • /
    • pp.309-316
    • /
    • 2022
  • This investigation is aimed at automatic text summarization on large-scale Vietnamese datasets. Vietnamese articles were collected from newspaper websites and plain text was extracted to build the dataset, which included 1,101,101 documents. Next, a new single-document extractive text summarization model was proposed to evaluate this dataset. In this summary model, the k-means algorithm is used to cluster the sentences of the input document using different text representations, such as BoW (bag-of-words), TF-IDF (term frequency - inverse document frequency), Word2Vec (word-to-vector), GloVe, and FastText. The summary algorithm then uses the trained k-means model to rank the candidate sentences and create a summary with the highest-ranked sentences. The empirical results of the F1-score achieved 51.91% ROUGE-1, 18.77% ROUGE-2 and 29.72% ROUGE-L, compared to 52.33% ROUGE-1, 16.17% ROUGE-2, and 33.09% ROUGE-L for a competitive abstractive model. The advantage of the proposed model is that it can perform well with O(n, k, p) = O(n(k + 2/p)) + O(n log₂ n) + O(np) + O(nk²) + O(k) time complexity.
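A minimal version of this pipeline can be sketched end to end: represent sentences as bag-of-words vectors, cluster them with k-means, and keep the sentence closest to each centroid. The four sentences, the vocabulary, and k=2 below are toy values, not the paper's data.

```python
import numpy as np

# Toy extractive summarization: BoW vectors -> k-means -> one representative
# sentence per cluster.
sentences = [
    "the storm closed the airport",
    "heavy rain flooded the airport roads",
    "the team won the final match",
    "fans celebrated the match victory",
]

vocab = sorted({w for s in sentences for w in s.split()})
X = np.array([[s.split().count(w) for w in vocab] for s in sentences], dtype=float)

def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) if (labels == j).any() else centers[j]
                            for j in range(k)])
    return labels, centers

labels, centers = kmeans(X, k=2)
# The sentence closest to each centroid becomes part of the summary.
summary = sorted(int(np.argmin(((X - centers[j]) ** 2).sum(-1))) for j in range(2))
```

The paper swaps the BoW vectors for TF-IDF, Word2Vec, GloVe, or FastText representations; the clustering-and-ranking step stays the same.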

Real-Time 2D-to-3D Conversion for 3DTV using Time-Coherent Depth-Map Generation Method

  • Nam, Seung-Woo;Kim, Hye-Sun;Ban, Yun-Ji;Chien, Sung-Il
    • International Journal of Contents
    • /
    • v.10 no.3
    • /
    • pp.9-16
    • /
    • 2014
  • Depth-image-based rendering is generally used in real-time 2D-to-3D conversion for 3DTV. However, inaccurate depth maps cause flickering issues between image frames in a video sequence, resulting in eye fatigue while viewing 3DTV. To resolve this flickering issue, we propose a new 2D-to-3D conversion scheme based on fast and robust depth-map generation from a 2D video sequence. The proposed depth-map generation algorithm divides an input video sequence into several cuts using a color histogram. The initial depth of each cut is assigned based on a hypothesized depth-gradient model. The initial depth map of the current frame is refined using color and motion information. Thereafter, the depth map of the next frame is updated using the difference image to reduce depth flickering. The experimental results confirm that the proposed scheme performs real-time 2D-to-3D conversions effectively and reduces human eye fatigue.
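The cut-detection step (dividing the video into shots with a color histogram) can be sketched on tiny synthetic frames: compare successive frames' normalized histograms and declare a cut when the difference spikes. The 4x4 grayscale "frames" and the threshold are assumptions for illustration.

```python
import numpy as np

def histogram(frame, bins=4):
    # Normalized intensity histogram of one frame
    h, _ = np.histogram(frame, bins=bins, range=(0, 256))
    return h / h.sum()

def find_cuts(frames, threshold=0.5):
    # A cut is declared where consecutive histograms differ strongly.
    cuts = []
    for i in range(1, len(frames)):
        d = np.abs(histogram(frames[i]) - histogram(frames[i - 1])).sum()
        if d > threshold:
            cuts.append(i)  # frame i starts a new shot
    return cuts

dark = np.full((4, 4), 30)     # "shot 1": dark frames
bright = np.full((4, 4), 220)  # "shot 2": bright frames
frames = [dark, dark, dark, bright, bright]
cuts = find_cuts(frames)       # the cut falls at the first bright frame
```

Segmenting into shots first is what lets the method assign one depth-gradient hypothesis per cut and refine it frame by frame, instead of re-estimating depth from scratch.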

Sequencing in Mixed Model Assembly Lines with Setup Time : A Tabu Search Approach (준비시간이 있는 혼합모델 조립라인의 제품투입순서 결정 : Tabu Search 기법 적용)

  • 김여근;현철주
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.13 no.1
    • /
    • pp.13-13
    • /
    • 1988
  • This paper considers the sequencing problem in mixed model assembly lines with hybrid workstation types and sequence-dependent setup times. Computation time is often a critical factor in choosing a method of determining the sequence. We develop a mathematical formulation of the problem to minimize the overall length of a line, and present a tabu search technique which can provide a near optimal solution in real time. The proposed technique is compared with a genetic algorithm and a branch-and-bound method. Experimental results are reported to demonstrate the efficiency of the technique.
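A compact tabu search over product sequences with sequence-dependent setup times shows the structure of the approach. The three-product setup matrix, the neighborhood (pairwise swaps), and the tenure are invented; they illustrate the method, not the paper's instances.

```python
import itertools

SETUP = {  # SETUP[a][b] = setup time when product b follows product a (invented)
    "A": {"A": 0, "B": 4, "C": 1},
    "B": {"A": 2, "B": 0, "C": 5},
    "C": {"A": 3, "B": 1, "C": 0},
}

def cost(seq):
    return sum(SETUP[a][b] for a, b in zip(seq, seq[1:]))

def tabu_search(start, iters=30, tenure=3):
    current = best = list(start)
    tabu = []
    for _ in range(iters):
        # Neighborhood: pairwise swaps; tabu moves allowed only if they
        # improve on the best solution found (aspiration criterion).
        neighbors = []
        for i, j in itertools.combinations(range(len(current)), 2):
            cand = list(current)
            cand[i], cand[j] = cand[j], cand[i]
            if tuple(cand) not in tabu or cost(cand) < cost(best):
                neighbors.append(cand)
        if not neighbors:
            break
        current = min(neighbors, key=cost)  # best admissible move, even if worse
        tabu.append(tuple(current))
        tabu = tabu[-tenure:]               # fixed tabu tenure
        if cost(current) < cost(best):
            best = current
    return best, cost(best)

seq, total = tabu_search(["A", "B", "C"])  # finds A-C-B, total setup time 2
```

Accepting the best admissible move even when it worsens the cost, while the tabu list blocks immediate reversals, is what lets the search escape local optima in near real time.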

A Head Gesture Recognition Method based on Eigenfaces using SOM and PRL (SOM과 PRL을 이용한 고유얼굴 기반의 머리동작 인식방법)

  • Lee, U-Jin;Gu, Ja-Yeong
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.3
    • /
    • pp.971-976
    • /
    • 2000
  • In this paper, a new method for head gesture recognition is proposed. At the first stage, face image data are transformed into low-dimensional vectors by principal component analysis (PCA), which utilizes the high correlation between face pose images. Then a self-organizing map (SOM) is trained with the transformed face vectors, in such a way that nodes at similar locations respond to similar poses. A sequence of poses comprising each model gesture goes through PCA and SOM, and the result is stored in the database. At the recognition stage, any sequence of frames goes through PCA and SOM, and the result is compared with the model gestures stored in the database. To improve the robustness of classification, probabilistic relaxation labeling (PRL) is used, which utilizes the contextual information embedded in adjacent poses.
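The eigenface step can be sketched directly: PCA via eigendecomposition of the covariance of flattened face images, projecting each image onto the leading components. The 6x6 random "faces" below are synthetic stand-ins for real pose images.

```python
import numpy as np

rng = np.random.default_rng(1)
faces = rng.normal(size=(10, 36))      # 10 flattened 6x6 "face images" (synthetic)

# PCA: center the data, eigendecompose the covariance, keep top components.
mean_face = faces.mean(axis=0)
centered = faces - mean_face
cov = centered.T @ centered / len(faces)

eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
top = eigvecs[:, -3:][:, ::-1]          # 3 leading "eigenfaces"

projected = centered @ top              # low-dimensional pose vectors, one per image
```

These low-dimensional vectors are what the SOM is trained on, so that neighboring map nodes come to respond to similar head poses.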


Zero-anaphora resolution in Korean based on deep language representation model: BERT

  • Kim, Youngtae;Ra, Dongyul;Lim, Soojong
    • ETRI Journal
    • /
    • v.43 no.2
    • /
    • pp.299-312
    • /
    • 2021
  • It is necessary to achieve high performance in the task of zero anaphora resolution (ZAR) for completely understanding texts in Korean, Japanese, Chinese, and various other languages. Deep-learning-based models are being employed for building ZAR systems, owing to the success of deep learning in recent years. However, the objective of building a high-quality ZAR system is far from being achieved even with these models. To enhance current ZAR techniques, we fine-tuned a pretrained bidirectional encoder representations from transformers (BERT) model. Notably, BERT is a general language representation model that enables systems to utilize deep bidirectional contextual information in a natural language text. It extensively exploits the attention mechanism based upon the sequence-transduction model Transformer. In our model, classification is performed simultaneously for all the words in the input word sequence to decide whether each word can be an antecedent. We seek end-to-end learning by disallowing any use of hand-crafted or dependency-parsing features. Experimental results show that, compared with other models, our approach can significantly improve the performance of ZAR.
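The "classify every word simultaneously" decision structure can be sketched without the heavy model: a per-token binary head scores each word and marks those clearing a threshold as antecedent candidates. Random vectors stand in for BERT's contextual embeddings, and the head weights are invented; only the decision shape mirrors the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = ["Kim", "said", "that", "pro", "left"]   # toy word sequence
H = rng.normal(size=(len(tokens), 8))  # stand-in for BERT contextual embeddings

# Hypothetical binary classification head applied to every token at once.
w = rng.normal(size=8)
b = 0.0
scores = 1 / (1 + np.exp(-(H @ w + b)))  # sigmoid score per token
is_antecedent = scores > 0.5             # simultaneous decision for all words
```

In the actual system the embeddings come from fine-tuned BERT and the head is trained end to end, with no hand-crafted or dependency-parsing features, as the abstract states.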