• Title/Summary/Keyword: Large Data Set

Search Results: 1,054

Segmentation-free Recognition of Touching Numeral Pairs (두자 접촉 숫자열의 분할 자유 인식)

  • Choi, Soon-Man;Oh, Il-Seok
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.5
    • /
    • pp.563-574
    • /
    • 2000
  • Recognition of numeral fields is a very important task for many document-automation applications. Conventional methods are based on a two-step process: segmentation of the touching numerals and recognition of the individual numerals. However, due to the large variation of touching types, this approach has not produced robust results. In this paper, we present a new segmentation-free method for recognizing two touching numerals. In this approach, two touching numerals are regarded as a single pattern coming from 100 classes ('00', '01', '02', ..., '98', '99'). For the test set, we manually extract two touching numerals from the data set of NIST numeral fields. Due to the limitations of conventional neural networks in large-set classification, we use a modular neural network and prove its superiority through recognition experiments.

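The 100-class formulation above can be made concrete with a small sketch (the encoding below is an assumption inferred from the class list '00'-'99', not code from the paper): each touching pair gets a single label, so the classifier never segments the digits.

```python
def pair_to_class(left: int, right: int) -> int:
    """Map a touching numeral pair to a single class label in 0..99."""
    assert 0 <= left <= 9 and 0 <= right <= 9
    return 10 * left + right

def class_to_pair(label: int) -> tuple:
    """Recover the two digits from a 100-class label."""
    return divmod(label, 10)
```

For example, the touching pair '4'-'2' becomes class 42, and decoding class 7 yields the pair ('0', '7').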

A Study on the Spatial Characteristics of Franchise Beauty Salon in Korea (국내 프랜차이즈 미용실의 공간 특성에 관한 연구-세트부스를 중심으로-)

  • 홍승대;이상호;신은주
    • Korean Institute of Interior Design Journal
    • /
    • no.22
    • /
    • pp.16-22
    • /
    • 2000
  • The purpose of this study is to analyze the characteristics of set booths in beauty salons and to suggest basic design data for franchise beauty salons. The research method was based on field observation of franchise beauty salons in Seoul. The results are as follows. 1) In the set-booth type analysis, the set-mirror wall type and set-mirror partition type are mainly used, while the set-mirror table type did not appear in this research. 2) In terms of scale, both the wall type and the partition type are used at large and medium scales. The shop-front analysis shows two patterns: open-front shops used the partition type, and closed-front shops used the wall type. 3) The set-mirror consists of a mirror and a drawer and is classified into four types by combination method. Most shops used the separated-mirror type because they want to emphasize the separation between set booths in the layout. 4) Lighting methods fall into four types: cornice, bracket, pendant, and downlight; among them, the downlight type was the most used.


Change of Sunspot Groups Observed from 2002 to 2011 at ButterStar Observatory

  • Oh, Sung-Jin;Chang, Heon-Young
    • Journal of Astronomy and Space Sciences
    • /
    • v.29 no.3
    • /
    • pp.245-251
    • /
    • 2012
  • Since the development of surface magnetic features should reflect the evolution of the solar magnetic field in the deep interior of the Sun, it is crucial to study the properties of sunspots and sunspot groups to understand the physical processes working below the solar surface. Here, using the data set of sunspot groups observed at the ButterStar observatory for 3,364 days from 2002 October 16 to 2011 December 31, we investigate the temporal change of sunspot groups depending on their Zürich classification type. Our main findings are as follows: (1) There are more sunspot groups in the southern hemisphere in solar cycle 23, while more sunspot groups appear in the northern hemisphere in solar cycle 24. We also note that in the declining phase of solar cycle 23 the decreasing tendency is apparently steeper in the solar northern hemisphere than in the solar southern hemisphere. (2) Some sunspot group types make a secondary peak in the distribution between the solar maximum and the solar minimum. More importantly, in this particular data set, sunspot groups that appeared in the solar southern hemisphere make a secondary peak one year after a secondary peak occurs in the solar northern hemisphere. (3) The temporal variations of small and large sunspot group numbers are disparate: the number of large sunspot groups declines earlier and faster, while the number of small sunspot groups begins to rise earlier and faster. (4) The total number of observed sunspots behaves more like the number of small sunspot groups. Hence, according to our findings, the behaviors and evolution of small and large magnetic flux tubes seem to differ over solar cycles. Finally, we conclude by briefly pointing out the implications for space weather forecasting.

Optimal Provider Mobility in Large-Scale Named-Data Networking

  • Do, Truong-Xuan;Kim, Younghan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.10
    • /
    • pp.4054-4071
    • /
    • 2015
  • Named-Data Networking (NDN) is one of the promising approaches for the Future Internet to cope with the explosion and current usage patterns of Internet traffic. Content provider mobility in NDN allows users to receive real-time traffic while content providers are on the move. However, current solutions for managing these mobile content providers suffer from several issues, such as long handover latency, high cost, and non-optimal routing paths. In this paper, we survey the main approaches to provider mobility in NDN and propose an optimal scheme to support mobile content providers in a large-scale NDN domain. Our scheme predicts the movement of the provider and uses state information in the NDN forwarding plane to set up an optimal new routing path for mobile providers. By numerical analysis, we show that our approach provides NDN users with better service-access delay and lower total handover cost than current solutions.

Big Data Astronomy: Large-scale Graph Analyses of Five Different Multiverses

  • Hong, Sungryong
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.43 no.2
    • /
    • pp.36.3-37
    • /
    • 2018
  • By utilizing large-scale graph-analytic tools in the modern Big Data platform Apache Spark, we investigate the topological structures of five different multiverses produced by cosmological n-body simulations with various cosmological initial conditions: (1) one standard universe, (2) two different dark energy states, and (3) two different dark matter densities. For the Big Data calculations, we use a custom-built stand-alone Spark cluster at KIAS and the Dataproc Compute Engine in Google Cloud Platform, with sample sizes ranging from 7 million to 200 million. Among many graph statistics, we find that three simple graph measurements, denoted by (1) $n_k$, (2) $\tau_\Delta$, and (3) $n_{S\ge5}$, can efficiently discern different topologies in discrete point distributions. We denote this set of three graph diagnostics by kT5+. These kT5+ statistics provide a quick look at various orders of n-point correlation functions in a computationally cheap way: (1) $n = 2$ by $n_k$, (2) $n = 3$ by $\tau_\Delta$, and (3) $n \ge 5$ by $n_{S\ge5}$.

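As an illustration of how such graph diagnostics might be computed on a small point sample (the linking-length graph construction and the exact definitions below are assumptions for illustration, not the paper's Spark implementation): take the mean degree as a stand-in for $n_k$, the transitivity (triangle fraction) for $\tau_\Delta$, and the count of connected components with at least five members for $n_{S\ge5}$.

```python
from itertools import combinations

def kt5_stats(points, link_length):
    """Compute three illustrative graph diagnostics on 2D points:
    mean degree, transitivity, and the number of components of size >= 5.
    Two points are linked when their separation is <= link_length."""
    n = len(points)
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        dx = points[i][0] - points[j][0]
        dy = points[i][1] - points[j][1]
        if dx * dx + dy * dy <= link_length ** 2:
            adj[i].add(j)
            adj[j].add(i)
    mean_degree = sum(len(v) for v in adj.values()) / n
    # Transitivity: 3 * (number of triangles) / (number of connected triples).
    triangles = sum(1 for i, j, k in combinations(range(n), 3)
                    if j in adj[i] and k in adj[i] and k in adj[j])
    triples = sum(d * (d - 1) // 2 for d in (len(v) for v in adj.values()))
    transitivity = 3 * triangles / triples if triples else 0.0
    # Connected components via iterative depth-first search.
    seen, big_components = set(), 0
    for start in range(n):
        if start in seen:
            continue
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            size += 1
            stack.extend(adj[node] - seen)
        if size >= 5:
            big_components += 1
    return mean_degree, transitivity, big_components
```

A chain of five collinear points spaced one unit apart, with a linking length of 1.1, forms one component of size five with no triangles; an equilateral triangle of side one yields a transitivity of 1.0 and no large component.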

Blended-Transfer Learning for Compressed-Sensing Cardiac CINE MRI

  • Park, Seong Jae;Ahn, Chang-Beom
    • Investigative Magnetic Resonance Imaging
    • /
    • v.25 no.1
    • /
    • pp.10-22
    • /
    • 2021
  • Purpose: To overcome the difficulty of building a large, high-quality data set in medical imaging, a concept of 'blended-transfer learning' (BTL), which uses a combination of both source data and target data, is proposed for the target task. Materials and Methods: The source and target tasks were defined as training the source and target networks, respectively, to reconstruct cardiac CINE images from undersampled data. In transfer learning (TL), the entire neural network (NN), or some parts of the NN after conducting a source task using an open data set, was adopted in the target network as the initial network to improve the learning speed and the performance of the target task. Using BTL, an NN effectively learned the target data while preserving knowledge from the source data to the maximum extent possible. The ratio of source data to target data was reduced stepwise from 1 in the initial stage to 0 in the final stage. Results: The NN that performed BTL showed improved performance compared with NNs that performed TL or standalone learning (SL). Better generalization of the NN was also achieved. The learning curve was evaluated using the normalized mean square error (NMSE) of reconstructed images for both the target data and the source data. BTL reduced the learning time by a factor of 1.25 to 100 and provided better image quality; its NMSE was 3% to 8% lower than with SL. Conclusion: The NN that performed the proposed BTL showed the best performance in terms of learning speed and learning curve. It also showed the highest reconstructed-image quality, with the lowest NMSE, for the test data set. Thus, BTL is an effective way of learning for NNs in the medical-imaging domain, where both the quality and quantity of data are always limited.
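The stepwise source-to-target schedule described above can be sketched as follows (the batch size, sampling scheme, and function names are illustrative assumptions, not the paper's implementation): each stage mixes source and target samples in a fixed ratio, which decreases from 1.0 (source only) to 0.0 (target only).

```python
import random

def btl_schedule(n_stages):
    """Stepwise source-data ratio: 1.0 in the first stage, 0.0 in the last."""
    return [1.0 - s / (n_stages - 1) for s in range(n_stages)]

def blended_batch(source_data, target_data, source_ratio, batch_size=8):
    """Draw one training batch mixing source and target samples according
    to the current stage's source_ratio."""
    n_source = round(batch_size * source_ratio)
    batch = random.sample(source_data, n_source)
    batch += random.sample(target_data, batch_size - n_source)
    return batch
```

With five stages the schedule is [1.0, 0.75, 0.5, 0.25, 0.0]; at a ratio of 0.5, half of each batch comes from the source data set.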

A Meta Analysis of the Edible Insects (식용곤충 연구 메타 분석)

  • Yu, Ok-Kyeong;Jin, Chan-Yong;Nam, Soo-Tai;Lee, Hyun-Chang
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2018.10a
    • /
    • pp.182-183
    • /
    • 2018
  • Big data analysis is the process of discovering meaningful correlations, patterns, and trends in large data sets stored in existing data-warehouse management tools and of creating new value. It also refers to technology for extracting new value from large volumes of structured and unstructured data and analyzing the results. Most Big data analysis methods draw on data mining, machine learning, natural language processing, and pattern recognition from existing statistics and computer science. Global research institutes have identified Big data as the most notable new technology since 2011.


Exploring the feasibility of fine-tuning large-scale speech recognition models for domain-specific applications: A case study on Whisper model and KsponSpeech dataset

  • Jungwon Chang;Hosung Nam
    • Phonetics and Speech Sciences
    • /
    • v.15 no.3
    • /
    • pp.83-88
    • /
    • 2023
  • This study investigates the fine-tuning of large-scale Automatic Speech Recognition (ASR) models, specifically OpenAI's Whisper model, for domain-specific applications using the KsponSpeech dataset. The primary research questions address the effectiveness of targeted lexical item emphasis during fine-tuning, its impact on domain-specific performance, and whether the fine-tuned model can maintain generalization capabilities across different languages and environments. Experiments were conducted using two fine-tuning datasets: Set A, a small subset emphasizing specific lexical items, and Set B, consisting of the entire KsponSpeech dataset. Results showed that fine-tuning with targeted lexical items increased recognition accuracy and improved domain-specific performance, with generalization capabilities maintained when fine-tuned with a smaller dataset. For noisier environments, a trade-off between specificity and generalization capabilities was observed. This study highlights the potential of fine-tuning using minimal domain-specific data to achieve satisfactory results, emphasizing the importance of balancing specialization and generalization for ASR models. Future research could explore different fine-tuning strategies and novel technologies such as prompting to further enhance large-scale ASR models' domain-specific performance.

Deep Meta Learning Based Classification Problem Learning Method for Skeletal Maturity Indication (골 성숙도 판별을 위한 심층 메타 학습 기반의 분류 문제 학습 방법)

  • Min, Jeong Won;Kang, Dong Joong
    • Journal of Korea Multimedia Society
    • /
    • v.21 no.2
    • /
    • pp.98-107
    • /
    • 2018
  • In this paper, we propose a method to classify skeletal maturity from a small number of hand-wrist X-ray images using deep-learning-based meta-learning. General deep-learning techniques require large amounts of data, but in many cases such data sets are not available for practical applications. A lack of training data is usually addressed through transfer learning using models pre-trained on large data sets. However, transfer-learning performance may be degraded by overfitting on an unknown new task with little data, which results in poor generalization capability. In addition, obtaining labeled medical images requires costly resources, such as professional manpower and much time. Therefore, in this paper, we use meta-learning, which can classify using only a small amount of new data by means of models pre-trained on various learning tasks. First, we train the meta-model using a separate data set composed of various learning tasks. The network then learns to classify bone maturity using data composed of radiographs of the wrist. Finally, we compare the results of classification using a conventional learning algorithm with the results of meta-learning on the same number of training samples.
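A minimal sketch of metric-based few-shot classification in the spirit of meta-learning (a nearest-class-prototype rule; the paper's actual meta-model is not specified in the abstract, so this is only an illustrative stand-in): each class prototype is the mean of that class's few support features, and a query is assigned to the nearest prototype.

```python
def nearest_centroid_classify(support, query):
    """Classify a query feature vector against class prototypes built from
    a few labeled support examples per class.

    support: dict mapping class label -> list of feature vectors (tuples).
    query:   feature vector (tuple) to classify.
    """
    prototypes = {}
    for label, feats in support.items():
        dim = len(feats[0])
        # Prototype = element-wise mean of the class's support features.
        prototypes[label] = [sum(f[i] for f in feats) / len(feats)
                             for i in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(prototypes, key=lambda lbl: sq_dist(prototypes[lbl], query))
```

With two support examples per class, a query near class A's examples is assigned to A even though no per-task training occurs, which is the appeal of this family of methods when labeled data are scarce.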

CBIR-based Data Augmentation and Its Application to Deep Learning (CBIR 기반 데이터 확장을 이용한 딥 러닝 기술)

  • Kim, Sesong;Jung, Seung-Won
    • Journal of Broadcast Engineering
    • /
    • v.23 no.3
    • /
    • pp.403-408
    • /
    • 2018
  • Generally, a large data set is required for deep learning. However, since it is not easy to create large data sets, many techniques enlarge small data sets through data expansion such as rotation, flipping, and filtering. These simple techniques have limited extensibility, however, because they cannot escape the features the data already possess. To solve this problem, we propose a method to acquire new image data using existing data: similar images are retrieved and acquired by using existing image data as queries for content-based image retrieval (CBIR). Finally, we compare the performance of the baseline model with that of the model using CBIR.
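The proposed expansion can be sketched as follows (the feature extractor, distance measure, and retrieval count k are illustrative assumptions; the paper's CBIR engine is not specified in the abstract): each training image serves as a query, and its k nearest gallery images by feature distance are added to the enlarged training set.

```python
def cbir_augment(train_set, gallery, extract, k=3):
    """Expand a small training set by CBIR: for each training image,
    retrieve the k gallery images with the closest feature vectors.

    extract: function mapping an image to a feature vector (tuple).
    """
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    augmented = list(train_set)
    for image in train_set:
        query = extract(image)
        # Rank gallery images by feature distance to the query.
        ranked = sorted(gallery, key=lambda g: sq_dist(extract(g), query))
        augmented.extend(ranked[:k])
    return augmented
```

Unlike rotation or flipping, the retrieved images are genuinely new samples, so the augmented set is not confined to transformations of features the original data already contain.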