• Title/Summary/Keyword: limited data

Search Results: 6,503 (processing time: 0.036 seconds)

Numerical Study on Surface Data Assimilation for Estimation of Air Quality in Complex Terrain (복잡 지형의 대기질 예측을 위한 지상자료동화의 효용성에 관한 수치연구)

  • 이순환;김헌숙;이화운
    • Journal of Korean Society for Atmospheric Environment
    • /
    • v.20 no.4
    • /
    • pp.523-537
    • /
    • 2004
  • In order to improve the accuracy of meteorological data, several numerical experiments on the usefulness of data assimilation for the prediction of air pollution were carried out. The data used for assimilation are surface meteorological components observed by the Automatic Weather System (AWS) network with high spatial density. Surface data assimilation changes the temperature and wind fields, and the change caused by the influence of land use on the meteorological simulation is more sensitive at night than at noon. The quality of the assimilated data is also one of the important factors in predicting the meteorological field precisely: according to the IOA (Index of Agreement) statistic, the meteorological components simulated with selected, limited surface data assimilation agree well with observations.
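
As a reference for the agreement statistic cited above, here is a minimal sketch of Willmott's Index of Agreement, the form conventionally used to score simulated meteorological fields against observations (the abstract does not restate the formula, so this follows the standard definition):

```python
import numpy as np

def index_of_agreement(predicted, observed):
    """Willmott's Index of Agreement (IOA), commonly used to score
    simulated meteorological fields against observations.
    Returns a value in [0, 1]; 1 means perfect agreement."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    obs_mean = observed.mean()
    num = np.sum((predicted - observed) ** 2)
    den = np.sum((np.abs(predicted - obs_mean) + np.abs(observed - obs_mean)) ** 2)
    return 1.0 - num / den

# Example: hourly 2 m temperature from a simulation vs. AWS observations
sim = [12.1, 13.4, 15.0, 16.2, 15.8]
obs = [12.0, 13.0, 15.5, 16.0, 15.5]
print(f"IOA = {index_of_agreement(sim, obs):.3f}")
```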

A Reliable Secure Storage Cloud and Data Migration Based on Erasure Code

  • Mugisha, Emmy;Zhang, Gongxuan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.1
    • /
    • pp.436-453
    • /
    • 2018
  • In a storage cloud scheme, pushing data to the storage cloud raises much concern regarding data confidentiality. Under encryption, data accessibility is limited because the data are encrypted, and securing a storage system while keeping high access performance is complicated in a dispersed storage environment. In this paper, we propose a hardware-based security scheme in which a secure dispersed storage system using erasure code is articulated. We designed a hardware-based security scheme with data encoding operations and migration capabilities. Using a TPM (Trusted Platform Module), data integrity and security are evaluated and achieved.
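
To illustrate the dispersal idea such a scheme builds on, here is a minimal single-parity sketch of erasure coding; the paper's actual code and parameters are not given in the abstract, and production systems typically use Reed-Solomon codes that tolerate more than one lost shard:

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int = 4) -> list:
    """Split data into k equal shards plus one XOR parity shard, so the
    shards can be dispersed across k+1 storage nodes."""
    data += b"\x00" * ((-len(data)) % k)          # pad to a multiple of k
    size = len(data) // k
    shards = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards + [parity]

def recover(shards: list) -> list:
    """Rebuild the single missing shard (marked None) from the others."""
    missing = shards.index(None)
    rebuilt = None
    for s in shards:
        if s is not None:
            rebuilt = s if rebuilt is None else xor_bytes(rebuilt, s)
    shards[missing] = rebuilt
    return shards

shards = encode(b"confidential payload", k=4)
shards[2] = None                    # simulate losing one storage node
assert b"".join(recover(shards)[:4]).rstrip(b"\x00") == b"confidential payload"
```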

Cross platform classification of microarrays by rank comparison

  • Lee, Sunho
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.2
    • /
    • pp.475-486
    • /
    • 2015
  • Mining the microarray data accumulated in public data repositories can save experimental cost and time and provide valuable biomedical information. Big data analysis that pools multiple data sets increases statistical power, improves the reliability of the results, and reduces the specific bias of any individual study. However, integrating several data sets from different studies requires dealing with many problems. In this study, I limit the focus to cross-platform classification, in which the platform of a test sample differs from the platform of the training set, and suggest a simple classification method based on ranks. This method is compared with diagonal linear discriminant analysis, the k-nearest-neighbor method, and the support vector machine on real cross-platform data sets for two cancers.
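
The abstract does not spell out the rank rule, so the following is only one plausible sketch: rank-transform each sample (which removes platform-specific scaling and offsets) and assign a test sample to the class whose rank centroid it correlates with best:

```python
import numpy as np

def to_ranks(X):
    """Replace each sample's expression values with within-sample ranks,
    removing platform-specific scaling and offsets."""
    return np.argsort(np.argsort(X, axis=1), axis=1).astype(float)

def fit_centroids(X_train, y_train):
    R = to_ranks(X_train)
    return {c: R[y_train == c].mean(axis=0) for c in np.unique(y_train)}

def predict(centroids, X_test):
    R = to_ranks(X_test)
    labels = list(centroids)
    preds = []
    for r in R:
        # correlation is computed on ranks, so it is insensitive to
        # monotone platform effects in the raw intensities
        scores = [np.corrcoef(r, centroids[c])[0, 1] for c in labels]
        preds.append(labels[int(np.argmax(scores))])
    return np.array(preds)

rng = np.random.default_rng(1)
X_train = rng.normal(size=(20, 50))
y_train = np.repeat([0, 1], 10)
X_test = 3.0 * rng.normal(size=(5, 50)) + 2.0   # different platform scale
print(predict(fit_centroids(X_train, y_train), X_test))
```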

Subgroup Discovery Method with Internal Disjunctive Expression

  • Kim, Seyoung;Ryu, Kwang Ryel
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.1
    • /
    • pp.23-32
    • /
    • 2017
  • We can obtain useful knowledge from data by using a subgroup discovery algorithm. Subgroup discovery is a rule-learning method that finds data subgroups containing specific information and expresses them in rule form. Subgroups are meaningful when they account for a high percentage of the total data and differ significantly from the data as a whole. Previously, a subgroup was expressed only as a conjunction of literals, which limits the scope of the rules that can be derived from learning. In this paper, we propose a method to increase the expressiveness of rules through an internal disjunctive representation of attribute values. We also analyze the characteristics of existing subgroup discovery algorithms and propose an improved algorithm that complements their defects while keeping their advantages. Experiments are conducted on traffic accident data provided by the Busan metropolitan government. The results show that the proposed method outperforms existing methods and that the rule set it learns contains more interesting and more general rules.
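
As a concrete illustration of internal disjunction, here is a minimal sketch; the attribute names, the values, and the WRAcc quality measure are stand-ins, since the abstract does not specify the paper's exact rule language or scoring function:

```python
# A rule is a conjunction of selectors; a selector with internal
# disjunction matches when the attribute takes ANY value in its set.
def matches(rule, record):
    return all(record[attr] in allowed for attr, allowed in rule.items())

def wracc(data, rule, target_attr, target_val):
    """Weighted relative accuracy, a common subgroup quality measure:
    coverage times the lift of the target rate inside the subgroup."""
    covered = [r for r in data if matches(rule, r)]
    if not covered:
        return 0.0
    p_cond = len(covered) / len(data)
    p_target = sum(r[target_attr] == target_val for r in data) / len(data)
    p_joint = sum(r[target_attr] == target_val for r in covered) / len(covered)
    return p_cond * (p_joint - p_target)

accidents = [
    {"weather": "rain",  "road": "intersection", "severity": "high"},
    {"weather": "snow",  "road": "highway",      "severity": "high"},
    {"weather": "clear", "road": "highway",      "severity": "low"},
    {"weather": "clear", "road": "intersection", "severity": "low"},
]
rule = {"weather": {"rain", "snow"}}   # internal disjunction: rain OR snow
print(wracc(accidents, rule, "severity", "high"))   # 0.25
```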

Development of the design methodology for large-scale database based on MongoDB

  • Lee, Jun-Ho;Joo, Kyung-Soo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.11
    • /
    • pp.57-63
    • /
    • 2017
  • Big data, whose volume has recently exploded, is characterized by continuous generation, large amounts, and unstructured formats. Existing relational database technologies are inadequate for handling such big data because of their limited processing speed and the significant cost of storage expansion. Thus, big data processing technologies, normally based on distributed file systems, distributed database management, and parallel processing, have arisen as core technologies for implementing big data repositories. In this paper, we propose a design methodology for large-scale databases based on MongoDB, extending the information engineering methodology based on the E-R data model.
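
A minimal sketch of the kind of E-R-to-document mapping such a methodology produces; the collection and field names are hypothetical, and the embed-versus-reference choices shown are common MongoDB design heuristics rather than the paper's specific rules:

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]   # hypothetical database

# E-R entities Order, Customer, OrderItem, Coupon mapped to one document:
order = {
    "_id": 1001,
    "customer": {"name": "Kim", "city": "Seoul"},    # 1:1 -> embed
    "items": [                                       # 1:N -> embedded array
        {"sku": "A-17", "qty": 2, "price": 9.5},
        {"sku": "B-03", "qty": 1, "price": 24.0},
    ],
    "coupon_ids": [501, 502],   # N:M -> references into a coupons collection
}
db.orders.insert_one(order)

# The embedded design reads a whole order in one query, with no joins:
print(db.orders.find_one({"items.sku": "A-17"}, {"customer.name": 1}))
```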

Fully connecting the Observational Health Data Science and Informatics (OHDSI) initiative with the world of linked open data

  • Banda, Juan M.
    • Genomics & Informatics
    • /
    • v.17 no.2
    • /
    • pp.13.1-13.3
    • /
    • 2019
  • The usage of controlled biomedical vocabularies is the cornerstone that enables seamless interoperability when using a common data model across multiple data sites. The Observational Health Data Science and Informatics (OHDSI) initiative combines over 100 controlled vocabularies into its own. However, the OHDSI vocabulary is limited in the sense that it combines multiple terminologies without providing a direct way to link them outside their own self-contained scope. This makes the task of enriching feature sets with external resources extremely difficult. To address these shortcomings, we have created a linked data version of the OHDSI vocabulary, connecting it with already established linked resources such as BioPortal and Bio2RDF, with the ultimate purpose of enabling interoperability with resources previously foreign to the OHDSI universe.
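
To make the linking concrete, here is a minimal rdflib sketch that attaches an owl:sameAs edge from an OMOP-style concept to a BioPortal PURL; the base URI and the specific identifiers are illustrative assumptions, not the project's published URIs:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDFS

OHDSI = Namespace("http://example.org/ohdsi/concept/")   # illustrative base URI
g = Graph()

aspirin = OHDSI["1112807"]   # an OMOP concept id rendered as a URI
g.add((aspirin, RDFS.label, Literal("aspirin")))
# owl:sameAs ties the OHDSI term to an established linked-data resource
g.add((aspirin, OWL.sameAs,
       URIRef("http://purl.bioontology.org/ontology/RXNORM/1191")))

print(g.serialize(format="turtle"))
```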

A Comparative Analysis on the Perceptions of Users' and Financial Company Employees' on MyData Services: Using Q Methodology (마이데이터 서비스 수용 의도와 요인에 대한 사용자와 금융사 직원의 인식 비교 연구: Q 방법론을 활용하여)

  • Lee, Jungwoo;Kim, Chulmin;Song, Young-gue;Park, Hyunji
    • Journal of Information Technology Services
    • /
    • v.21 no.3
    • /
    • pp.1-25
    • /
    • 2022
  • The financial MyData service was implemented in January 2022, and 45 services were launched by banks, securities firms, credit card companies, and fintech companies. This study applied the Q methodology to identify the user types of MyData services and compared them with the perceptions of financial institution employees who plan and develop the services. Three types of MyData service users emerged: active users; limited users, who focus on consumption and asset status inquiry; and users sensitive about personal information. Among financial company employees, two types were recognized: one oriented toward supporting active users and the other toward supporting the users sensitive about personal information. The analysis of these subjective perceptions can serve as a reference for establishing a company's MyData service marketing strategy and for setting related policies to improve the MyData ecosystem.
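
Q methodology factor-analyzes the correlations between respondents (rather than variables) so that people with similar sorts load on the same factor. The following is a minimal sketch on synthetic Q-sort data, using principal components as a simple extraction method; the study's actual extraction and rotation choices are not stated in the abstract:

```python
import numpy as np

# Synthetic Q-sort matrix: rows = respondents, columns = statements,
# entries = forced-distribution agreement scores (here -3 .. +3).
rng = np.random.default_rng(0)
q_sorts = rng.integers(-3, 4, size=(12, 30)).astype(float)

person_corr = np.corrcoef(q_sorts)          # 12 x 12 person correlations
eigvals, eigvecs = np.linalg.eigh(person_corr)
order = np.argsort(eigvals)[::-1]
# loadings of each respondent on the two strongest person factors
loadings = eigvecs[:, order[:2]] * np.sqrt(eigvals[order[:2]])

# respondents grouped by the factor they load on most strongly;
# each group corresponds to one shared viewpoint (a "user type")
groups = np.argmax(np.abs(loadings), axis=1)
print(groups)
```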

Incorporating BERT-based NLP and Transformer for An Ensemble Model and its Application to Personal Credit Prediction

  • Sophot Ky;Ju-Hong Lee;Kwangtek Na
    • Smart Media Journal
    • /
    • v.13 no.4
    • /
    • pp.9-15
    • /
    • 2024
  • Tree-based algorithms have been the dominant methods for building prediction models on tabular data, including personal credit data. However, they are compatible only with categorical and numerical data and do not capture information about the relationships between features. In this work, we propose an ensemble model built on the Transformer architecture that incorporates text features and harnesses the self-attention mechanism to tackle the feature-relationship limitation. We describe a text formatter module that converts the original tabular data into sentences, which are fed into FinBERT along with other text features. Furthermore, we employ an FT-Transformer trained on the original tabular data. We evaluate this multi-modal approach against two popular tree-based algorithms, Random Forest and Extreme Gradient Boosting (XGBoost), as well as TabTransformer. Our proposed method shows superior default recall, F1 score, and AUC across two public data sets. These results are significant for financial institutions seeking to reduce the risk of financial loss from defaulters.
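
A minimal sketch of what such a text-formatter-plus-FinBERT encoding step could look like; the column names, the sentence template, and the ProsusAI/finbert checkpoint are assumptions, since the paper's exact formatter and fine-tuning setup are not given in the abstract:

```python
import torch
from transformers import AutoModel, AutoTokenizer

def row_to_sentence(row: dict) -> str:
    """Serialize one tabular credit record into a sentence."""
    return ", ".join(f"{col} is {val}" for col, val in row.items()) + "."

row = {"age": 35, "income": 52000, "loan amount": 12000, "purpose": "car"}
sentence = row_to_sentence(row)

tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
encoder = AutoModel.from_pretrained("ProsusAI/finbert")

with torch.no_grad():
    inputs = tokenizer(sentence, return_tensors="pt")
    # the [CLS] embedding can serve as the text branch of an ensemble,
    # to be combined with an FT-Transformer branch on the raw table
    text_feature = encoder(**inputs).last_hidden_state[:, 0, :]
print(text_feature.shape)   # torch.Size([1, 768])
```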

Crack Opening Displacement Estimation for Engineering Leak-Before-Break Analyses of Pressurized Nuclear Piping (원자력 배관의 공학적 파단전누설 해석을 위한 균열열림변위 계산)

  • Huh Nam-Su;Kim Yun-Jae;Chang Yoon-Suk;Yang Jun-Seok;Choi Jae-Boons
    • Transactions of the Korean Society of Mechanical Engineers A
    • /
    • v.28 no.10
    • /
    • pp.1612-1620
    • /
    • 2004
  • This study presents methods to estimate the elastic-plastic crack opening displacement (COD) of circumferential through-wall cracked pipes for the Leak-Before-Break (LBB) analysis of pressurized piping. The proposed methods are based not only on the GE/EPRI approach but also on the reference stress approach. For each approach, two estimation schemes are given: one for the case when full stress-strain data are available and the other for the case when only the yield and ultimate tensile strengths are available. For the GE/EPRI approach, a robust way of determining the Ramberg-Osgood (R-O) parameters is proposed, covering both of these cases. The COD estimates from the GE/EPRI approach, using R-O parameters determined by the proposed fitting procedures, generally compare well with published pipe test data. For the reference stress approach, the COD estimates based on either full stress-strain data or limited tensile properties are in good agreement with the pipe test data. In conclusion, the experimental validation given in the present study provides sufficient confidence in the use of the proposed methods in practical LBB analyses, even when information on the material's tensile properties is limited.
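
For reference, the Ramberg-Osgood fit named above follows the standard form used throughout the GE/EPRI scheme (the abstract does not restate it); with E the elastic modulus and sigma_0 a reference (yield) stress:

```latex
% Standard Ramberg-Osgood stress-strain relation (GE/EPRI form):
%   eps_0 = sigma_0 / E; alpha and n are the fitted R-O parameters.
\[
  \frac{\varepsilon}{\varepsilon_0}
    = \frac{\sigma}{\sigma_0}
    + \alpha \left( \frac{\sigma}{\sigma_0} \right)^{n},
  \qquad
  \varepsilon_0 = \frac{\sigma_0}{E}.
\]
```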

An Adaptive Method based on Data Size for Broadcast in Mobile Computing Environments (이동 컴퓨팅 환경에서 데이타 크기를 고려하는 적응적 브로드캐스팅 기법)

  • 유영호;이종환;김경석
    • Journal of KIISE:Information Networking
    • /
    • v.30 no.2
    • /
    • pp.155-166
    • /
    • 2003
  • Mobile computing has become a new research issue owing to advances in mobile equipment and connectivity with the Internet. The mobile environment imposes many constraints, such as limited bandwidth, intermittent disconnection, and limited battery life. For these reasons, broadcasting has generally been used by mobile applications to disseminate data efficiently. This paper proposes an adaptive broadcasting method that logically divides the broadcast channel into a periodic broadcast channel and an on-demand broadcast channel and dynamically assigns bandwidth to both. The former disseminates data selected on the basis of both the popularity and the size of each datum; the latter disseminates data selected on the basis of the requests of mobile clients. When selecting data to disseminate, the proposed method considers the mobility of the mobile client and also the size of each datum through the SF (size factor) proposed in this paper. The paper also evaluates the proposed method experimentally by measuring the energy expenditure of mobile clients.
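
A minimal sketch of the channel split described above; the popularity-per-kilobyte score is only a stand-in for the paper's SF, whose exact definition the abstract does not give:

```python
from dataclasses import dataclass

@dataclass
class Item:
    name: str
    popularity: float   # access probability estimated from client requests
    size_kb: int

def schedule(items, periodic_budget_kb):
    """Fill the periodic channel with the best popularity-per-KB items;
    everything else is served on the on-demand channel by request."""
    ranked = sorted(items, key=lambda it: it.popularity / it.size_kb,
                    reverse=True)
    periodic, on_demand, used = [], [], 0
    for it in ranked:
        if used + it.size_kb <= periodic_budget_kb:
            periodic.append(it)
            used += it.size_kb
        else:
            on_demand.append(it)
    return periodic, on_demand

items = [Item("news", 0.5, 40), Item("map", 0.3, 400), Item("stock", 0.2, 10)]
periodic, on_demand = schedule(items, periodic_budget_kb=60)
print([it.name for it in periodic], [it.name for it in on_demand])
```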