• Title/Summary/Keyword: Large Scale Data


Integration of a Large-Scale Genetic Analysis Workbench Increases the Accessibility of a High-Performance Pathway-Based Analysis Method

  • Lee, Sungyoung;Park, Taesung
    • Genomics & Informatics
    • /
    • v.16 no.4
    • /
    • pp.39.1-39.3
    • /
    • 2018
  • The rapid increase in the volume of genetic datasets has demanded the extensive adoption of biological knowledge to reduce computational complexity, and the biological pathway is one well-known source of such knowledge. In this regard, we previously introduced a novel statistical method, PHARAOH, that enables pathway-based association studies of large-scale genetic datasets. However, researcher-level application of the PHARAOH method has been limited by a lack of support for commonly used file formats and by the absence of the various quality control options that are essential to practical analysis. To overcome these limitations, we introduce our integration of the PHARAOH method into our recently developed all-in-one workbench. The new PHARAOH program not only supports various de facto standard genetic data formats but also provides many quality control measures and filters based on those measures. We expect that the updated PHARAOH will make pathway-level analysis of large-scale genetic datasets more accessible to researchers.

A Study on the Diachronic Evolution of Ancient Chinese Vocabulary Based on a Large-Scale Rough Annotated Corpus

  • Yuan, Yiguo;Li, Bin
    • Asia Pacific Journal of Corpus Research
    • /
    • v.2 no.2
    • /
    • pp.31-41
    • /
    • 2021
  • This paper presents a quantitative analysis of the diachronic evolution of ancient Chinese vocabulary by constructing and analyzing a large-scale, roughly annotated corpus. Texts from the Si Ku Quan Shu (a collection of ancient Chinese books) are automatically segmented to obtain ancient Chinese vocabulary with time information, which is then used to compute word frequencies, the standardized type/token ratio, and the proportions of monosyllabic and dissyllabic words. The data analysis yields four findings. First, the high-frequency words in ancient Chinese are stable to a certain extent. Second, there is no obvious dissyllabic trend in ancient Chinese vocabulary. Third, the Northern and Southern Dynasties (420-589 AD) and the Yuan Dynasty (1271-1368 AD) are probably the two periods with the most abundant vocabulary in ancient Chinese. Finally, the unique high-frequency words of each dynasty are mainly official titles carrying real power. These findings break away from the qualitative methods used in traditional research on the history of the Chinese language and instead draw macroscopic conclusions from a large-scale corpus using quantitative methods.
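
The corpus statistics named in the abstract can be sketched in a few lines. The standardized type/token ratio below averages plain TTRs over fixed-size windows (one common definition); the token list is an illustrative placeholder, not data from Si Ku Quan Shu:

```python
from collections import Counter

def sttr(tokens, window=1000):
    """Standardized type/token ratio: mean TTR over fixed-size windows,
    which corrects for the effect of text length on the plain TTR."""
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(0, len(tokens) - window + 1, window)]
    # fall back to the plain TTR when the text is shorter than one window
    return sum(ratios) / len(ratios) if ratios else len(set(tokens)) / len(tokens)

def syllable_proportions(tokens):
    """Proportions of monosyllabic and dissyllabic words (by character count)."""
    mono = sum(1 for t in tokens if len(t) == 1)
    di = sum(1 for t in tokens if len(t) == 2)
    return mono / len(tokens), di / len(tokens)

# toy segmented text with placeholder tokens
tokens = ["天", "下", "大同", "天", "子", "諸侯", "天"]
print(Counter(tokens).most_common(2))   # word frequency
print(syllable_proportions(tokens))     # (monosyllabic share, dissyllabic share)
```

Running the same three statistics per dynasty-labeled slice of the corpus would reproduce the kind of diachronic comparison the paper describes.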

Analysis of Traffic Accident using Association Rule Model

  • Ihm, Sun-Young;Park, Young-Ho
    • Journal of Multimedia Information System
    • /
    • v.5 no.2
    • /
    • pp.111-114
    • /
    • 2018
  • Traffic accident analysis is important for reducing the occurrence of accidents. In this paper, we analyze traffic accidents with the Apriori algorithm to find association rules in Korean traffic accident data. We first design the traffic accident analysis model and then collect the traffic accident data. We preprocess the collected data and derive new variables and attributes for the analysis. Next, we analyze the data using statistical methods and the Apriori algorithm. The results show that many large-scale accidents were caused by vans in the daytime. Medium-scale accidents occurred more in the daytime than at night, and more by cars than by vans. Small-scale accidents occurred slightly more at night than in the daytime, although the numbers were similar. Also, among small-scale accidents, car-pedestrian accidents occurred more often than car-car accidents.
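
The Apriori step described above can be illustrated with a minimal, self-contained sketch; the accident attributes ("van", "daytime", "large", etc.) are illustrative stand-ins for the paper's variables, not its actual data:

```python
def apriori(transactions, min_support):
    """Minimal Apriori: return every frequent itemset with its support."""
    n = len(transactions)
    support = lambda s: sum(s <= t for t in transactions) / n
    # start from frequent 1-itemsets
    level = {frozenset([i]) for t in transactions for i in t}
    level = {s for s in level if support(s) >= min_support}
    freq, k = {}, 1
    while level:
        freq.update({s: support(s) for s in level})
        k += 1
        # candidate generation: join frequent (k-1)-itemsets, then prune
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = {c for c in candidates if support(c) >= min_support}
    return freq

def confidence(freq, lhs, rhs):
    """Confidence of the rule lhs -> rhs, from precomputed supports."""
    return freq[lhs | rhs] / freq[lhs]

# toy accident records (attribute values are illustrative)
accidents = [{"van", "daytime", "large"}, {"van", "daytime", "large"},
             {"car", "night", "small"}, {"van", "daytime", "medium"}]
freq = apriori(accidents, min_support=0.5)
```

Here the rule {van} -> {daytime} would come out with support 0.75 and confidence 1.0, which is the kind of pattern the abstract reports.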

Small Scale Digital Mapping using Airborne Digital Camera Image Map (디지털 항공영상의 도화성과를 이용한 소축척 수치지도 제작)

  • Choi, Seok-Keun;Oh, Eu-Gene
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.29 no.2
    • /
    • pp.141-147
    • /
    • 2011
  • This study analyzed the issues in, and the usefulness of, drawing small-scale digital maps from large-scale digital maps produced with the high-resolution digital aerial photographs commonly acquired in recent years. To this end, a correlation analysis of the feature categories on the digital map was conducted, and the map was processed by inputting, organizing, deleting, editing, and supervising feature categories according to the generalization process. As a result, 18 unnecessary feature codes were deleted, and the accuracy requirement for a 1/5,000 digital map was met. Although the size of the data and the number of feature categories increased, this was shown to be due to the excellent descriptive detail of the digital aerial photographs. Accordingly, drawing small-scale digital maps from large-scale digital maps based on digital aerial photographs was shown to provide excellent description and high-quality information for digital mapping.

Large Eddy Simulation of Turbulent Premixed Combustion Flow around Bluff Body based on the G-equation with Dynamic sub-grid model (Dynamic Sub-grid 모델을 이용한 G 방정식에 의한 보염기 주위의 난류 예혼합 연소에 관한 대 와동 모사)

  • Park, Nam-Seob;Ko, Sang-Cheol
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.34 no.8
    • /
    • pp.1084-1093
    • /
    • 2010
  • A large eddy simulation of a turbulent premixed flame stabilized by a bluff body is performed using a sub-grid scale combustion model based on the G-equation, which describes the propagation of the flame front. The basic idea of the LES modeling is to evaluate the filtered flame-front propagation speed, which should be enhanced at the grid scale by sub-grid scale fluctuations. The dynamic sub-grid scale models newly introduced into the G-equation are validated against the premixed combustion flow behind a triangular flame holder. The calculated results predict the velocity and temperature of the combustion flow in good agreement with the experimental data.
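
The G-equation formulation referred to above is commonly written in the following level-set form, where the filtered front propagates at a turbulent flame speed enhanced over the laminar speed by sub-grid fluctuations. The power-law closure shown is one standard choice, with the coefficient determined dynamically; the paper's exact closure may differ:

```latex
\frac{\partial \bar{G}}{\partial t}
  + \tilde{\mathbf{u}} \cdot \nabla \bar{G}
  = S_T \,\lvert \nabla \bar{G} \rvert,
\qquad
\frac{S_T}{S_L} = 1 + C \left( \frac{u'_{\mathrm{sgs}}}{S_L} \right)^{n}
```

Here $\bar{G}$ is the filtered level-set field marking the flame front, $\tilde{\mathbf{u}}$ the resolved velocity, $S_L$ and $S_T$ the laminar and turbulent flame speeds, $u'_{\mathrm{sgs}}$ the sub-grid velocity fluctuation, and $C$ the dynamically evaluated model coefficient.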

REDUCING LATENCY IN SMART MANUFACTURING SERVICE SYSTEM USING EDGE COMPUTING

  • Vimal, S.;Jesuva, Arockiadoss S;Bharathiraja, S;Guru, S;Jackins, V.
    • Journal of Platform Technology
    • /
    • v.9 no.1
    • /
    • pp.15-22
    • /
    • 2021
  • In a smart manufacturing environment, more and more devices are connected to the Internet so that a large volume of data can be obtained during all phases of the product life cycle. Large-scale industries, companies, and organizations with operational units scattered across various geographical locations face huge resource consumption because of their unorganized structure for sharing resources among themselves, which directly affects their supply chains. The cloud-based smart manufacturing paradigm facilitates a new variety of applications and services to analyze large volumes of data and enable large-scale manufacturing collaboration. The manufacturing units include machinery that may be situated in different geographical areas, and process instances executed on different machines must be constantly managed by a super admin to coordinate the manufacturing process; in large-scale industries, these environments make maintaining the efficiency of the production unit a tedious task. The data from all these instances should be monitored to maintain the integrity of the manufacturing service system, and computing all of this data in the cloud leads to latency in the performance of the smart manufacturing service system. Instead of validating the data externally, we propose to validate the data at the front end of each device. The validation process can be automated by script validation, and the processed data is then sent to the cloud for processing and storage. Along with end-device data validation, we implement APM (Asset Performance Management) to enhance the productive functionality of the manufacturers. The manufacturing service system is divided into modules based on the functionalities of the machines and on process instances corresponding to the time schedules of the respective machines. By breaking the whole system into modules, with further divisions as required, we can reduce the data loss or data mismatch caused by processing data from instances that may be down for maintenance or machinery malfunctions. This helps the admin trace the individual domains of the smart manufacturing service system that need attention for error recovery among the various process instances running on different machines under various conditions. This reduces latency, which in turn increases the efficiency of the whole system.
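
The proposed front-end validation can be sketched as a simple bounds check performed on the device before upload; the field names and ranges below are hypothetical, not taken from the paper:

```python
# Hypothetical per-device validation rules: field name -> (min, max).
RULES = {
    "temperature_c": (-20.0, 120.0),
    "vibration_mm_s": (0.0, 50.0),
    "rpm": (0.0, 10000.0),
}

def validate(reading):
    """Validate one sensor reading at the device front end.
    Returns (clean, errors): in-range fields to forward to the cloud,
    and the names of fields that failed validation."""
    clean, errors = {}, []
    for field, (lo, hi) in RULES.items():
        value = reading.get(field)
        if isinstance(value, (int, float)) and lo <= value <= hi:
            clean[field] = value
        else:
            errors.append(field)
    return clean, errors
```

Only the `clean` portion would be forwarded to the cloud tier, so out-of-range or missing values from a machine that is down for maintenance never reach the central computation, which is the latency and integrity argument the abstract makes.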

Analysis of Electrical Equipment and Work Environment for Domestic Small-Scale Construction Site (국내 소규모 건설현장의 전기설비 및 작업환경 분석)

  • Kim, Doo-Hyun;Hwang, Dong-Kyu;Kim, Sung-Chul;Kang, Shin-Uk;Choi, Sang-Won
    • Journal of the Korean Society of Safety
    • /
    • v.29 no.4
    • /
    • pp.42-47
    • /
    • 2014
  • This paper aims to investigate and analyze the characteristics of electrical equipment and the work environment at small-scale construction sites. To investigate and analyze electrical equipment and work environments for the prevention of electric shock disasters at construction sites, 50 small-scale construction sites and 12 large-scale construction sites were selected. Site investigations covered the low-voltage equipment and the portable electric machines and equipment at these 12 large-scale and 50 small-scale construction sites. The findings concern the electric shock environment related to grounding equipment, panel boards, protection tools, sockets, temporary wiring systems, and portable and movable electric machines and equipment at small-scale construction sites. Finally, this study analyzed the relevant domestic and foreign standards and regulations; the findings can be utilized as educational material warning of the electric shock risks posed by electrical equipment at small-scale construction sites.

A Study on the Sale Estimate Model of a Large-Scale Store in Korea (국내 대형점의 매출추정모델 설정 방안 연구)

  • Youn, Myoung-Kil;Kim, Jong-Jin;Park, Chul-Ju;Shim, Kyu-Yeol
    • Journal of Distribution Science
    • /
    • v.11 no.12
    • /
    • pp.5-11
    • /
    • 2013
  • Purpose - The purpose of this study was to construct a turnover estimation model by building on the research of Park et al. (2006) on market areas in domestic distribution. The study examined distribution using a new tool for the turnover estimation technique. It developed and discussed the turnover estimation technique of Park et al. (2006), applying it to a large-scale retailer in "D" city that was suitable for on-the-spot distribution. It constructed the new model in accordance with test procedures suited to this retail business location, in order to apply its procedures to a specific situation and improve the turnover estimation process. Further, it examined the analysis and procedures of existing turnover estimation cases to identify problems and alternatives for turnover estimation for a large-scale retailer in "D" city. Finally, it also discussed remaining problems and the scope for further research. Research design, data, and methodology - This study was conducted on the basis of "virtue" studies; that is, it took into account the special characteristics of the structure of Korea's trade zones. The researcher sought to verify a sales estimation model for use in distribution industry locations. The main purpose was to enable the sales estimation model (that is, the individual model's presentation) to be practically used in real situations in Korea by supplementing its processes and variables. Results - The sales estimation model is constructed, first, by conducting a data survey of the general trading area. Second, within the city's census of company operating areas, the city's total consumption expenditure is derived by applying the large-scale store index. Third, the probability of shopping is investigated. Fourth, the scale of sales is estimated using the process of singularity. The correct details need to be verified for the model construction, and the new model will need to be a distinct sales estimation model reflecting the special qualities of the business conditions; this remains a task for subsequent research. Conclusions - The study investigated, tested, and supplemented the turnover estimation model of Park et al. (2006) in a market area in South Korea. Supplementing some procedures and variables could yield a turnover estimation model for South Korea that stands as an independent model. The turnover estimation model is applied, first, by undertaking an investigation of the market area. Second, a census of the intercity market area is carried out to estimate the total consumption of the specific city; consumption is estimated by applying the indexes of large-scale retailers. Third, the probability of shopping is investigated. Fourth, the scale of turnover is estimated. Further studies should investigate each department as well as direct and indirect variables. The turnover estimation model should be tested to construct new models depending on the type of region and business, and in-depth, careful discussion by researchers is also needed. An upgraded turnover estimation model could then be developed for Korea's on-the-spot distribution.
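
At its core, the four-step estimate reduces to a chain of multiplications; the parameter names below are illustrative, and the paper's index and probability definitions may differ:

```python
def estimate_sales(population, spend_per_capita, store_index, shop_prob):
    """Hedged sketch of the four-step turnover estimate:
    the market-area survey supplies the inputs (step 1), the city total is
    derived (step 2), scaled by the large-scale store index, and weighted
    by the probability of shopping to give the turnover scale (steps 3-4)."""
    total_consumption = population * spend_per_capita  # step 2: city total (KRW)
    store_share = total_consumption * store_index      # large-scale store index
    return store_share * shop_prob                     # steps 3-4: expected turnover
```

For example, a city of 100,000 people spending 2,000,000 KRW per capita, a store index of 0.3, and a shopping probability of 0.25 would give an estimated turnover of about 15 billion KRW under these assumed inputs.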

Streaming Decision Tree for Continuity Data with Changed Pattern (패턴의 변화를 가지는 연속성 데이터를 위한 스트리밍 의사결정나무)

  • Yoon, Tae-Bok;Sim, Hak-Joon;Lee, Jee-Hyong;Choi, Young-Mee
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.20 no.1
    • /
    • pp.94-100
    • /
    • 2010
  • Data mining is mainly used for pattern extraction and information discovery from collected data. However, previous methods have difficulty reflecting patterns that change over time. In this paper, we introduce the Streaming Decision Tree (SDT) for analyzing data that is continuous, large-scale, and subject to changing patterns. SDT treats the continuous data as blocks and extracts rules using a decision tree's learning method. The extracted rules are combined in consideration of their time of occurrence, frequency, and contradictions. In experiments, we applied SDT to time-series data and confirmed reasonable results.
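
The block-wise learning idea can be sketched with one-level decision stumps standing in for full decision trees; the recency weight is a simplified placeholder for the paper's combination of rules by time of occurrence, frequency, and contradiction:

```python
from collections import Counter

def learn_stump(block):
    """Fit a one-level decision stump (a threshold on one feature) to a block
    of (features, label) pairs, maximizing correctly separated labels."""
    best = None
    for f in range(len(block[0][0])):
        for x, _ in block:
            t = x[f]
            left = [y for xi, y in block if xi[f] <= t]
            right = [y for xi, y in block if xi[f] > t]
            correct = (max(Counter(left).values(), default=0)
                       + max(Counter(right).values(), default=0))
            if best is None or correct > best[0]:
                best = (correct, f, t)
    return best[1], best[2]  # (feature index, threshold)

def sdt(stream, block_size):
    """Streaming sketch: extract one rule per block of the stream;
    later blocks receive a higher weight as a stand-in for recency."""
    rules = []
    for i in range(0, len(stream) - block_size + 1, block_size):
        f, t = learn_stump(stream[i:i + block_size])
        rules.append({"feature": f, "threshold": t,
                      "weight": i // block_size + 1})
    return rules
```

Replacing the stump with a full tree learner and merging rules that agree (while discarding contradicted, low-weight ones) would move this sketch closer to the SDT described in the abstract.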