• Title/Abstract/Keywords: Mining Difficulty

A Comparative Test of ETT Tools for Data Warehousing (데이터 웨어하우스 ETT 도구들의 평가 및 검증)

  • 김기운;서용무
    • Asia Pacific Journal of Information Systems / Vol. 10, No. 2 / pp. 213-236 / 2000
  • Many enterprises remain interested in using new information technologies to gain a competitive advantage, and their interest in data warehousing and data mining reflects this trend. Although many vendors offer a variety of data warehousing tools, many enterprises have difficulty building a robust data warehouse because they lack the ability to select appropriate data warehouse technology options. This study therefore presents evaluation factors, evaluation methods, and evaluation results for ETT tools, based mainly on a comparative test of currently available data warehousing ETT tools. It also offers guidelines for choosing the right ETT tools.

Exploratory Study of Developing a Synchronization-Based Approach for Multi-step Discovery of Knowledge Structures

  • Yu, So Young
    • Journal of Information Science Theory and Practice / Vol. 2, No. 2 / pp. 16-32 / 2014
  • As topic modeling has been applied in an increasing variety of domains, the difficulty of naming and characterizing topics has become more widely recognized. This study therefore explores a multi-step approach that combines text mining with network analysis. The concept of synchronization was applied to re-assign the top author keywords to more than one topic category, in order to improve the visibility of the topic-author keyword network and to increase the topical cohesion of each topic. The suggested approach was applied to 16,548 articles with 2,881 unique author keywords in construction and building engineering indexed by KSCI. The results show that the combined approach improved both the visibility of the topic-author keyword map and the topical cohesion in most of the detected topic categories. The approach should be applied in more domains to generalize and advance it, and more sophisticated evaluation methods are needed to develop it further.

Mining Biometric Data to Predict Task Difficulty (생체 데이터를 이용한 프로그래머의 프로그램 난이도 예측)

  • 이설화;임희석
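    • Korean Institute of Information Scientists and Engineers, Language Engineering Research Society: Conference Proceedings (Hangul and Korean Information Processing) / 28th Conference on Hangul and Korean Information Processing, 2016 / pp. 231-234 / 2016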
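  • Frequent mistakes made while coding can cost programmers a great deal of time, and even a small mistake can cause a fatal error across the entire code. This problem is related to how well programmers understand the overall algorithm and the existing code when they write new code. If the code is hard to understand, it is difficult to write precise and concise code. The difficulty of existing code has so far been assessed through methods such as self-assessment. Directly measuring a person's internal changes would allow a more objective assessment. To address these problems, this paper develops a model that predicts how difficult a program is for programmers, using biometric data acquired with an eye tracker capable of pupil tracking and EEG equipment capable of measuring brain waves.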

A Study on the Business Value of Products Considering Cross Selling Effect (교차판매효과를 고려한 상품의 가치평가에 관한 연구)

  • 황인수
    • Asia Pacific Journal of Information Systems / Vol. 15, No. 3 / pp. 209-221 / 2005
  • One of the most fundamental problems in business is evaluating the value of each product. The difficulty is that the profit of a product comes not only from its own sales but also from its influence on the sales of other products, i.e., the "cross-selling effect". This study integrates a measure of cross-selling with an algorithm for profit estimation. Sales transaction data and post-sales survey data from online and offline shopping malls are used to show the effectiveness of the method against other profit-estimation heuristics based on product-specific profitability. We show that the new method can identify the cross-selling potential of each product and that this information can be used for better product selection.
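
The idea that a product's value includes its influence on other products' sales can be illustrated with a naive co-occurrence sketch. This is not the paper's measure or algorithm, which the abstract does not specify. It assumes a hypothetical rule: the value of product A per sale is its own unit profit plus, for every other product B, the confidence of A → B (the share of baskets containing A that also contain B) times B's unit profit. The function name, the sample baskets, and the profit figures are all made up for illustration.

```python
from collections import Counter
from itertools import permutations


def cross_selling_values(transactions, unit_profit):
    """Return own profit plus confidence-weighted profit of co-purchased items.

    Illustrative assumption only: credit(A) = sum over B of conf(A -> B) * profit(B).
    """
    item_count = Counter()   # number of baskets containing each item
    pair_count = Counter()   # number of baskets containing the ordered pair (a, b)
    for basket in transactions:
        items = set(basket)
        item_count.update(items)
        pair_count.update(permutations(items, 2))

    values = {}
    for a in item_count:
        credit = sum(
            pair_count[(a, b)] / item_count[a] * unit_profit[b]
            for b in item_count
            if b != a
        )
        values[a] = unit_profit[a] + credit
    return values


# Hypothetical data: printers earn little themselves but often sell with ink.
transactions = [["printer", "ink"], ["printer", "ink"], ["printer"], ["ink"], ["paper"]]
unit_profit = {"printer": 5.0, "ink": 20.0, "paper": 2.0}
values = cross_selling_values(transactions, unit_profit)
```

Under these assumptions the printer's value rises well above its own unit profit because two of its three baskets also contain ink, while paper, which is never co-purchased, keeps its own profit unchanged.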

A Computer-Assisted Pronunciation Training System for Correcting Pronunciation of Adjacent Phonemes

  • Lee, Jaesung
    • Journal of the Korea Society of Computer and Information (한국컴퓨터정보학회논문지) / Vol. 24, No. 2 / pp. 9-16 / 2019
  • A Computer-Assisted Pronunciation Training (CAPT) system is considered a useful tool for students who have received elementary-level English pronunciation education, especially those who have difficulty correcting their pronunciation in front of others or who cannot receive face-to-face training. A conventional CAPT system shows a word to the user, the user pronounces it, and the system then provides phoneme or audio feedback on the user's pronunciation. In this paper, we propose a CAPT system that lets users practice how pronunciation varies with the positions of adjacent phonemes. For simplicity and clarity, the proposed system recommends a series of words chosen with a focus on adjacent phonemes. Experimental results show that word recommendation that considers adjacent phonemes improves pronunciation accuracy.

An Analysis on Core Competency of Underachievers using Data Mining Techniques (데이터마이닝 기술을 이용한 학업 부진학생의 역량 분석)

  • 전봉기
    • Korea Institute of Information and Communication Engineering: Conference Proceedings / 2016 Fall Conference / pp. 857-858 / 2016
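  • This paper analyzes differences in academic achievement by admission type, focusing on students admitted in the 2006 through 2013 academic years. To examine the growing number of underachieving students, we analyzed the university student core competency diagnostic test and the grade distribution of courses offered over three years. The analysis shows that the main causes were an increase in the number of credits required to strengthen liberal arts education and freshmen's poor adjustment to foreign-language courses.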

Labeling Big Spatial Data: A Case Study of New York Taxi Limousine Dataset

  • AlBatati, Fawaz;Alarabi, Louai
    • International Journal of Computer Science & Network Security / Vol. 21, No. 6 / pp. 207-212 / 2021
  • Clustering unlabeled spatial datasets to convert them into labeled spatial datasets is a challenging task, especially for geographical information systems. In this study we investigated the NYC Taxi and Limousine Commission dataset and found that all of its spatio-temporal trajectories are unlabeled, which makes the data unsuitable for supervised data mining tasks such as classification and regression. It is therefore necessary to convert the unlabeled spatial datasets into labeled ones, and we use a clustering technique to do so for all of the trajectory datasets. A key difficulty in applying machine learning classification algorithms in many applications is that they require large amounts of labeled data, and labeling big data is in many cases a costly process. In this paper, we show that using a clustering technique to label spatial data leads to a high-accuracy classifier.
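
As a rough illustration of labeling by clustering, the sketch below assigns a cluster index to each unlabeled point and treats that index as its label. The abstract does not name the clustering algorithm, so k-means is an assumption made here for illustration, and the coordinates, initial centroids, and function name are hypothetical rather than taken from the dataset or the paper.

```python
import math


def kmeans_labels(points, initial_centroids, iterations=10):
    """Plain k-means: return one cluster label per point plus the final centroids."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point takes the index of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical (longitude, latitude) pickup points forming two groups.
points = [
    (-73.98, 40.75), (-73.99, 40.76), (-73.97, 40.74),
    (-73.78, 40.64), (-73.79, 40.65),
]
labels, centroids = kmeans_labels(points, [points[0], points[3]])
labeled = list(zip(points, labels))  # (point, label) pairs usable as training data
```

The resulting `(point, label)` pairs could then be fed to any supervised classifier. Euclidean distance on raw degrees is a simplification that is only reasonable over a small area.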

Case study of microseismic techniques for stability analysis of pillars in a limestone mine (석회석 광산 내 광주의 안정성 분석을 위한 미소진동 계측기술의 현장적용)

  • 김창오;엄우용;정소걸;천대성
    • Tunnel and Underground Space (터널과지하공간) / Vol. 26, No. 1 / pp. 1-11 / 2016
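  • This case study applies microseismic monitoring to the stability analysis of a mine in Korea and examines the applicability and limitations of the microseismic technique in mines through analysis of the monitoring data. The site is a limestone mine where a hybrid room-and-pillar mining method is used to improve the recovery rate, and microseismic sensors were installed on each vertical pillar in a test area with a horizontal cross-section of $50\,\mathrm{m} \times 50\,\mathrm{m}$. The measured microseismic signals were classified into signals caused by blasting and drilling, signals caused by damage, and signals caused by electrical noise, and the stability analysis focused on the damage signals. Pillar damage increased after blasting in the mined area close to the test area, and rock falls that occurred nearby could be inferred from the microseismic signals. In addition, the stability of the pillars and of the rock mass around the excavation could be evaluated from changes in the daily number of microseismic events, and a draft safety management standard for the test area was proposed based on the accumulated monitoring data. However, the localized sensor array made it difficult to determine three-dimensional source locations, and the need for a practical approach to real-time monitoring was raised. If the problems raised in this application are addressed and the measurements are compared and analyzed closely against on-site mining operations, microseismic monitoring is expected to serve as a better indicator for safety monitoring.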

Legal search method using S-BERT

  • Park, Gil-sik;Kim, Jun-tae
    • Journal of the Korea Society of Computer and Information (한국컴퓨터정보학회논문지) / Vol. 27, No. 11 / pp. 57-66 / 2022
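  • This paper proposes a legal document retrieval method that uses the Sentence-BERT model. Members of the general public who want to use legal search services have difficulty finding relevant precedents because they lack an understanding of legal terminology and structure. Existing legal search methods based on keywords and text mining carry no information about the context of a judgment and have difficulty distinguishing homonyms and polysemous words, which limits their performance. As a result, the accuracy of legal document search results has been low and hard to trust. To address this, we aim to improve retrieval performance for lay users' legal search sentences over Supreme Court precedents and counseling cases from the Korea Legal Aid Corporation. Because the Sentence-BERT model embeds contextual information from the precedent and counseling data, little sentence meaning is lost, and search accuracy improved compared with the TF-IDF and Doc2Vec search methods.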
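
For context, the kind of TF-IDF baseline the abstract compares against can be sketched as cosine-similarity ranking over term-weight vectors. This is a generic, minimal version under stated assumptions (whitespace tokenization, smoothed IDF), not the paper's implementation, and the sample documents and query are invented. A Sentence-BERT approach would replace these sparse term vectors with dense sentence embeddings while keeping the same ranking step.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) and the IDF table for a list of texts."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF so that no weight is zero or undefined.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    q_tf = Counter(query.lower().split())
    q_vec = {t: c * idf[t] for t, c in q_tf.items() if t in idf}
    scores = [cosine(q_vec, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


# Hypothetical case summaries and a lay user's query.
docs = [
    "landlord refused to return the rental deposit",
    "driver injured in a traffic accident claims damages",
    "employer failed to pay overtime wages",
]
ranking = search("deposit not returned by landlord", docs)
```

The query term "returned" does not match "return" in the first document, which hints at the limitation of purely lexical matching that the abstract describes.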

Network Anomaly Traffic Detection Using WGAN-CNN-BiLSTM in Big Data Cloud-Edge Collaborative Computing Environment

  • Yue Wang
    • Journal of Information Processing Systems / Vol. 20, No. 3 / pp. 375-390 / 2024
  • Edge computing architecture has effectively alleviated the computing pressure on cloud platforms, reduced network bandwidth consumption, and improved the quality of service experienced by users; however, it has also introduced new security issues. Existing anomaly detection methods for big data scenarios with cloud-edge collaborative computing face several challenges, such as sample imbalance, difficulty in dealing with complex network traffic attacks, and difficulty in effectively training on large-scale data or with overly complex deep learning network models. A lightweight deep learning model is proposed to address these challenges. First, normalization on the user side is used to preprocess the traffic data. On the edge side, a trained Wasserstein generative adversarial network (WGAN) is used to supplement the data samples, which alleviates the imbalance of minority sample types while occupying only a small amount of edge-computing resources. Finally, a trained lightweight deep learning network model is deployed on the edge side, and the preprocessed and expanded local data are used to fine-tune it. This makes the model at each edge node more consistent with local data characteristics and improves the system's detection ability. In the designed model, two sets of convolution and pooling layers from a convolutional neural network (CNN) extract spatial features, a bidirectional long short-term memory network (BiLSTM) captures time-sequence features, and an attention mechanism adjusts the weights of traffic features, improving the model's ability to identify abnormal traffic. The proposed model was evaluated experimentally on the NSL-KDD, UNSW-NB15, and CIC-IDS2018 datasets. Its accuracies on the three datasets were 0.974, 0.925, and 0.953, respectively, higher than those of the comparative models. The proposed lightweight deep learning network model has good application prospects for anomaly traffic detection in cloud-edge collaborative computing architectures.
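
The user-side normalization step can be illustrated with per-feature min-max scaling. The abstract does not say which normalization the paper uses, so min-max scaling is an assumption here, and the feature columns and values are hypothetical.

```python
def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Hypothetical traffic records: (duration, bytes sent, constant flag).
traffic = [
    [0.0, 100.0, 1.0],
    [5.0, 300.0, 1.0],
    [10.0, 200.0, 1.0],
]
normalized = min_max_normalize(traffic)
```

Scaling features to a common range keeps large-valued fields such as byte counts from dominating the others before the data reach the WGAN and the CNN-BiLSTM model.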