
Trends of Upper Jet Streams Characteristics (Intensity, Altitude, Latitude and Longitude) Over the Asia-North Pacific Region Based on Four Reanalysis Datasets (재분석자료들을 활용한 아시아-북태평양 상층제트의 강도(풍속) 및 3차원적 위치 변화 경향)

  • So, Eun-Mi;Suh, Myoung-Seok
    • Atmosphere
    • /
    • v.27 no.1
    • /
    • pp.1-16
    • /
    • 2017
  • In this study, trends of upper jet stream characteristics (intensity, altitude, latitude, and longitude) over the Asia-North Pacific region during the recent 30 years (1979~2008) were analyzed using four reanalysis datasets (CFSR, ERA-Int., JRA-55, MERRA). We defined the characteristics of the upper jet stream as the averages of mass-weighted wind speed and mass-flux weighted altitude, latitude, and longitude between 400 and 100 hPa. Due to this vertical averaging, our results reveal weaker spatial variability and trends than previous studies. In general, the four reanalysis datasets show similar jet stream properties (intensity, altitude, latitude, and longitude), although the magnitudes and trends differ slightly among them. The altitude of MERRA is slightly higher than that of the others for all seasons. The domain-averaged intensity shows a weakening trend except in winter, and the altitude of the jet stream shows an increasing trend for all seasons. The meridional position of the jet core shows a poleward trend for all seasons, but during summer the trend is contrasting: poleward over the continental area and equatorward over the Western Pacific region. The zonal trend of the jet core is very weak, although a relatively strong westward trend appears except in spring and winter. The trends of jet stream characteristics found in this study are thermodynamically consistent with the global warming trends observed in the Asia-Pacific region.
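
As a rough illustration of the vertical averaging defined above, the sketch below computes a mass-weighted intensity and a mass-flux-weighted altitude over the 400~100 hPa layer; all pressure levels, wind speeds, and altitudes are hypothetical illustrative values, not data from the paper:

```python
import numpy as np

# Hypothetical profile between 400 and 100 hPa (values are illustrative only)
p = np.array([400.0, 300.0, 250.0, 200.0, 150.0, 100.0])  # pressure levels, hPa
u = np.array([30.0, 38.0, 45.0, 50.0, 42.0, 35.0])        # wind speed, m/s
z = np.array([7.2, 9.2, 10.4, 11.8, 13.6, 16.2])          # altitude, km

# A layer's mass is proportional to its pressure thickness dp
dp = np.abs(np.gradient(p))
mean_speed = np.sum(u * dp) / np.sum(dp)        # mass-weighted intensity
mean_alt = np.sum(z * u * dp) / np.sum(u * dp)  # mass-flux-weighted altitude
```

Weighting the altitude by the mass flux (u · dp) rather than by mass alone emphasizes the levels where the jet is strongest, which is what locates the jet core.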

A Simulation Model Development to Analyze Effects on LiDAR Acquisition Parameters in Forest Inventory (산림조사에서의 항공라이다 취득인자에 따른 영향분석을 위한 시뮬레이션 모델 개발)

  • Song, Chul-Chul;Lee, Woo-Kyun;Kwak, Doo-An;Kwak, Han-Bin
    • Proceedings of the Korean Association of Geographic Information Studies Conference
    • /
    • 2008.06a
    • /
    • pp.310-317
    • /
    • 2008
  • Although aerial LiDAR was launched commercially several years ago, it is still difficult to study data acquisition conditions and their effects with various datasets because of the acquisition cost. Thus, this research studied data acquisition conditions and their effects using various virtual datasets. For this research, 3D tree models and forest stand models were built to represent graded tree sizes and plantation densities, and a variable aerial LiDAR acquisition model was developed. Then, by controlling the flight height parameter, one of the data acquisition parameters, virtual datasets were collected for various data acquisition densities. From those datasets, forest canopy volumes and maximum tree heights were estimated and compared. As a result, the estimates approached the expected values as the data acquisition density increased. This research should be helpful for further studies on the relation between forest inventory accuracy and LiDAR cost.
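
The effect of the flight height parameter on acquisition density can be illustrated with a simplified flat-terrain geometry; the function name and all parameter values below are hypothetical, not taken from the paper's simulation model:

```python
import math

def point_density(pulse_rate_hz, speed_m_s, flight_height_m, half_fov_deg):
    """Points per square metre under a simplified flat-terrain geometry."""
    # Swath width grows linearly with flight height for a fixed scan angle
    swath_m = 2.0 * flight_height_m * math.tan(math.radians(half_fov_deg))
    # Density = pulses emitted per second / ground area covered per second
    return pulse_rate_hz / (swath_m * speed_m_s)
```

Under this geometry, doubling the flight height halves the point density, which is why controlling flight height controls data acquisition density in the virtual datasets.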

Detecting Uncertain Boundary Algorithm using Constrained Delaunay Triangulation (제한된 델로네 삼각분할을 이용한 공간 불확실한 영역 탐색 기법)

  • Cho, Sunghwan
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.32 no.2
    • /
    • pp.87-93
    • /
    • 2014
  • Cadastral parcel objects, represented as polygons, are a fundamental dataset for land administration and management of the real world. Thus it is necessary to ensure topological seamlessness of cadastral datasets, meaning no overlaps or gaps between adjacent parcels. However, overlaps or gaps are frequently found due to non-coinciding edges between adjacent parcels. These erroneous edges are called uncertain edges, and polygons containing at least one uncertain edge are called uncertain polygons. In this paper, we propose a new algorithm to efficiently search for uncertain polygons between two adjacent cadastral datasets. The algorithm first selects points and polylines around the adjacent datasets. Then Constrained Delaunay Triangulation (CDT) is applied to extract triangles. Each triangle is tagged with the number of original cadastral datasets that intersect it. If the tagging value is zero, the triangle lies in a gap; if the value is two, it lies in an overlap. By merging adjacent triangles with the same tagging value, uncertain edges and uncertain polygons can be found. We performed an experimental application of this automated derivation of partitioned boundaries on a real land-cadastral dataset.
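
The tagging step described above can be sketched as follows, assuming the CDT triangles are already available; this simplified version tags each triangle by testing its centroid against each dataset's parcels (all names and coordinates are illustrative, not the paper's implementation):

```python
def point_in_polygon(pt, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        if (y1 > y) != (y2 > y):
            x_int = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_int:
                inside = not inside
    return inside

def tag_triangle(tri, datasets):
    """Tag = number of cadastral datasets whose parcels contain the centroid."""
    cx = sum(p[0] for p in tri) / 3.0
    cy = sum(p[1] for p in tri) / 3.0
    return sum(
        any(point_in_polygon((cx, cy), parcel) for parcel in ds)
        for ds in datasets
    )

# Two toy datasets: parcels overlap on 1.5 <= x <= 2
dataset_a = [[(0, 0), (2, 0), (2, 2), (0, 2)]]
dataset_b = [[(1.5, 0), (3, 0), (3, 2), (1.5, 2)]]
overlap_tri = [(1.6, 0.5), (1.9, 0.5), (1.75, 1.5)]  # tag 2 -> overlap
gap_b = [[(2.5, 0), (4, 0), (4, 2), (2.5, 2)]]       # leaves a gap 2 < x < 2.5
gap_tri = [(2.1, 0.5), (2.4, 0.5), (2.25, 1.5)]      # tag 0 -> gap
```

Triangles tagged 1 belong to exactly one dataset and are unproblematic; merging adjacent triangles tagged 0 or 2 recovers the gap and overlap regions bounded by uncertain edges.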

CDRgator: An Integrative Navigator of Cancer Drug Resistance Gene Signatures

  • Jang, Su-Kyeong;Yoon, Byung-Ha;Kang, Seung Min;Yoon, Yeo-Gha;Kim, Seon-Young;Kim, Wankyu
    • Molecules and Cells
    • /
    • v.42 no.3
    • /
    • pp.237-244
    • /
    • 2019
  • Understanding the mechanisms of cancer drug resistance is a critical challenge in cancer therapy. For many cancer drugs, various resistance mechanisms have been identified, such as target alteration, alternative signaling pathways, epithelial-mesenchymal transition, and epigenetic modulation. Resistance may arise via multiple mechanisms even for a single drug, making it necessary to investigate multiple independent models for comprehensive understanding and therapeutic application. In particular, we hypothesize that different resistance processes result in distinct gene expression changes. Here, we present a web-based database, CDRgator (Cancer Drug Resistance navigator), for comparative analysis of gene expression signatures of cancer drug resistance. Resistance signatures were extracted from two different types of datasets. First, resistance signatures were extracted from transcriptomic profiles of cancer cells or patient samples and their resistance-induced counterparts for >30 cancer drugs. Second, drug resistance group signatures were extracted from two large-scale drug sensitivity datasets representing ~1,000 cancer cell lines. All the datasets are available for download and are conveniently accessible by drug class and cancer type, along with analytic features such as clustering analysis, multidimensional scaling, and pathway analysis. CDRgator allows meta-analysis of independent resistance models for a more comprehensive understanding of drug-resistance mechanisms, which is difficult to accomplish with individual datasets alone (database URL: http://cdrgator.ewha.ac.kr).

Land Cover Classification Using Semantic Image Segmentation with Deep Learning (딥러닝 기반의 영상분할을 이용한 토지피복분류)

  • Lee, Seonghyeok;Kim, Jinsoo
    • Korean Journal of Remote Sensing
    • /
    • v.35 no.2
    • /
    • pp.279-288
    • /
    • 2019
  • We evaluated the land cover classification performance of SegNet, which features semantic segmentation of aerial imagery. We selected four semantic classes, i.e., urban, farmland, forest, and water areas, and created 2,000 datasets using aerial images and land cover maps. The datasets were divided at an 8:2 ratio into training (1,600) and validation (400) datasets; we evaluated validation accuracy after tuning the hyperparameters. SegNet performance was optimal at a batch size of five with 100,000 iterations. When 200 test datasets were subjected to semantic segmentation using the trained SegNet model, the accuracies were 87.89% for farmland, 87.18% for forest, 83.66% for water, and 82.67% for urban regions; the overall accuracy was 85.48%. Thus, deep learning-based semantic segmentation can be used to classify land cover.
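
The per-class and overall accuracies reported above are typically computed as pixel-level ratios over the predicted and reference label maps; a minimal sketch with hypothetical toy labels (not the paper's data):

```python
import numpy as np

def pixel_accuracies(y_true, y_pred, classes):
    """Per-class producer's accuracy and overall pixel accuracy."""
    per_class = {}
    for c in classes:
        mask = y_true == c
        # fraction of class-c reference pixels predicted as c
        per_class[c] = float((y_pred[mask] == c).mean()) if mask.any() else float("nan")
    overall = float((y_true == y_pred).mean())
    return per_class, overall

# Toy flattened label maps: 0=urban, 1=farmland, 2=forest
truth = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
per_class, overall = pixel_accuracies(truth, pred, classes=[0, 1, 2])
```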

Development of a Method for Analyzing and Visualizing Concept Hierarchies based on Relational Attributes and its Application on Public Open Datasets

  • Hwang, Suk-Hyung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.9
    • /
    • pp.13-25
    • /
    • 2021
  • In the age of digital innovation based on Internet, information and communication, and artificial intelligence technologies, huge amounts of data are generated, collected, accumulated, and opened on the web by public institutions providing useful public information. To analyze data and gain useful insights from them, Formal Concept Analysis (FCA) has been successfully used for classifying, clustering, and visualizing data based on the binary relation between objects and attributes in a dataset. In this paper, we present an approach for enhancing the analysis of relational attributes of data within the extended framework of FCA, which is designed to classify, conceptualize, and visualize sets of objects described not only by attributes but also by relations between these objects. Several experiments carried out with the proposed tool, RCA wizard, on public open datasets demonstrate the validity and usability of our approach for generating and visualizing concept hierarchies and extracting more useful knowledge from datasets. The proposed approach can serve as a useful tool for effective data analysis, classification, clustering, visualization, and exploration.
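
The core of FCA, deriving formal concepts from a binary object-attribute relation, can be sketched with a brute-force enumeration; this toy (with made-up objects and attributes) illustrates the mechanism only and is not the RCA wizard tool:

```python
from itertools import combinations

def common_attrs(objs, ctx, all_attrs):
    """Attributes shared by every object in objs (the derivation operator)."""
    if not objs:
        return set(all_attrs)
    return set.intersection(*(ctx[o] for o in objs))

def objs_with(attrs, ctx):
    """Objects possessing every attribute in attrs."""
    return {o for o, a in ctx.items() if attrs <= a}

def formal_concepts(ctx):
    """Brute-force enumeration of all (extent, intent) formal concepts."""
    all_attrs = set().union(*ctx.values())
    objects = list(ctx)
    seen, concepts = set(), []
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            intent = common_attrs(set(subset), ctx, all_attrs)
            extent = objs_with(intent, ctx)  # closure of the subset
            if frozenset(extent) not in seen:
                seen.add(frozenset(extent))
                concepts.append((extent, intent))
    return concepts

# Toy context: three objects described by binary attributes
ctx = {"g1": {"a", "b"}, "g2": {"b", "c"}, "g3": {"b"}}
concepts = formal_concepts(ctx)
```

Ordering these concepts by extent inclusion yields the concept lattice that FCA tools visualize as a hierarchy.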

Issues and Challenges in the Extraction and Mapping of Linked Open Data Resources with Recommender Systems Datasets

  • Nawi, Rosmamalmi Mat;Noah, Shahrul Azman Mohd;Zakaria, Lailatul Qadri
    • Journal of Information Science Theory and Practice
    • /
    • v.9 no.2
    • /
    • pp.66-82
    • /
    • 2021
  • Recommender Systems have gained immense popularity due to their capability of dealing with a massive amount of information in various domains. They are considered information filtering systems that make predictions or recommendations to users based on their interests and preferences. The more recent technology, Linked Open Data (LOD), has been introduced, and a vast amount of Resource Description Framework data have been published in freely accessible datasets. These datasets are connected to form the so-called LOD cloud. The need for semantic data representation has been identified as one of the next challenges in Recommender Systems. In a LOD-enabled recommendation framework where domain awareness plays a key role, the semantic information provided in the LOD can be exploited. However, dealing with a big chunk of the data from the LOD cloud and its integration with any domain datasets remains a challenge due to various issues, such as resource constraints and broken links. This paper presents the challenges of interconnecting and extracting the DBpedia data with the MovieLens 1 Million dataset. This study demonstrates how LOD can be a vital yet rich source of content knowledge that helps recommender systems address the issues of data sparsity and insufficient content analysis. Based on the challenges, we proposed a few alternatives and solutions to some of the challenges.
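
One source of the mapping issues discussed above is the naive conversion of MovieLens titles to DBpedia resource names; the heuristic below is a hypothetical illustration (not the authors' method) of how such links are formed and why irregular titles fail to resolve:

```python
import re

def movielens_to_dbpedia_uri(title):
    """Naively map a MovieLens title like 'Matrix, The (1999)' to a DBpedia URI."""
    m = re.match(r"^(?P<name>.+?)(?:, (?P<art>The|A|An))? \((?P<year>\d{4})\)$", title)
    if m is None:
        return None          # irregular title: no link can be formed
    name = m.group("name")
    if m.group("art"):       # restore a trailing article to its natural position
        name = f"{m.group('art')} {name}"
    return "http://dbpedia.org/resource/" + name.replace(" ", "_")
```

Even when a URI is formed this way, it may point to a disambiguation page or a missing resource, which is exactly the kind of broken-link problem the paper examines.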

Performance Analysis of Cloud-Net with Cross-sensor Training Dataset for Satellite Image-based Cloud Detection

  • Kim, Mi-Jeong;Ko, Yun-Ho
    • Korean Journal of Remote Sensing
    • /
    • v.38 no.1
    • /
    • pp.103-110
    • /
    • 2022
  • Since satellite images generally include clouds, it is essential to detect or mask clouds before satellite image processing. In previous research, clouds were detected using their physical characteristics. Recently, cloud detection methods using deep learning techniques for image segmentation, such as CNNs or the modified U-Net, have been studied. Since image segmentation assigns a label to every pixel in an image, a precise pixel-based dataset is required for cloud detection. For this task, obtaining an accurate training dataset is more important than the network configuration. Existing deep learning techniques used different training datasets, and test datasets were extracted from the same intra-dataset, acquired with the same sensor and procedure as the training dataset. These differing datasets make it difficult to determine which network shows better overall performance. To verify the effectiveness of a cloud detection network such as Cloud-Net, two networks were trained, one using the cloud dataset from KOMPSAT-3 images provided by the AIHUB site and the other using the L8-Cloud dataset from Landsat8 images publicly released by a Cloud-Net author. Test data from the KOMPSAT-3 cloud dataset were used for validating the networks. The simulation results show that the network trained with the KOMPSAT-3 cloud dataset performs better than the network trained with the L8-Cloud dataset. Because Landsat8 and KOMPSAT-3 satellite images have different GSDs, it is difficult to achieve good results from cross-sensor validation: a network can be superior for the intra-dataset case but inferior for cross-sensor data. Techniques that perform well on cross-sensor validation datasets need to be studied in the future.
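
One common way to mitigate the GSD mismatch discussed above is to resample imagery to a common ground sample distance before training; the nearest-neighbour sketch below is an illustrative assumption, not the method used in the paper:

```python
import numpy as np

def resample_to_gsd(img, src_gsd, dst_gsd):
    """Nearest-neighbour resample of a 2-D array between GSDs (metres/pixel)."""
    scale = src_gsd / dst_gsd
    h, w = img.shape[:2]
    # Map each output pixel back to its nearest source pixel
    rows = (np.arange(int(h * scale)) / scale).astype(int)
    cols = (np.arange(int(w * scale)) / scale).astype(int)
    return img[np.ix_(rows, cols)]

# Toy 4x4 "image" at 30 m GSD resampled to 15 m GSD (upsampled 2x)
img = np.arange(16).reshape(4, 4)
up = resample_to_gsd(img, 30.0, 15.0)
```

Resampling aligns pixel footprints but cannot recover detail absent from the coarser sensor, which is one reason cross-sensor validation remains hard.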

Accuracy of Phishing Websites Detection Algorithms by Using Three Ranking Techniques

  • Mohammed, Badiea Abdulkarem;Al-Mekhlafi, Zeyad Ghaleb
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.2
    • /
    • pp.272-282
    • /
    • 2022
  • Between 2014 and 2019, the US lost more than 2.1 billion USD to phishing attacks, according to the FBI's Internet Crime Complaint Center, and COVID-19 scam complaints totaled more than 1,200. Detection of phishing websites (PWs) has been widely studied in the literature. Earlier methods maintained a manually updated centralized blacklist, but newly created phishing sites cannot be detected this way. Several recent studies applied supervised machine learning (SML) algorithms and schemes, typically based on features extracted from URLs, to the PW detection problem. These studies demonstrate that some classification algorithms are more effective on some datasets than on others; however, no single classifier is widely accepted as best for phishing site detection. This study aims to identify the SML features and schemes that work best for PW detection across publicly available phishing datasets. Eight widely used classification algorithms from the Scikit-learn library were configured and evaluated on three public phishing datasets, and their classification accuracies were compared for statistically significant differences using the Welch t-test. Ensemble methods and neural networks outperformed classical algorithms in this study; the results are reported in terms of classification accuracy and classifier ranking, as shown in Tables 4 and 8. On severely imbalanced datasets, some classifiers obtained higher than 99.0 percent classification accuracy. Finally, the results show that this approach can be adapted to outperform conventional techniques with good precision.
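
The Welch t-test used to compare classifier accuracies can be sketched in a few lines of numpy; the per-fold accuracy values below are hypothetical, not results from the study:

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for unequal variances."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / len(a)
    vb = b.var(ddof=1) / len(b)
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical per-fold accuracies of two classifiers on one dataset
acc_rf = [0.96, 0.97, 0.95, 0.96]
acc_nb = [0.91, 0.92, 0.90, 0.93]
t, df = welch_t(acc_rf, acc_nb)
```

Unlike Student's t-test, Welch's version does not assume the two accuracy samples share a variance, which suits comparisons between very different classifier families.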

Media-based Analysis of Gasoline Inventory with Korean Text Summarization (한국어 문서 요약 기법을 활용한 휘발유 재고량에 대한 미디어 분석)

  • Sungyeon Yoon;Minseo Park
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.5
    • /
    • pp.509-515
    • /
    • 2023
  • Despite the continued development of alternative energies, fuel consumption is increasing. In particular, the price of gasoline fluctuates greatly with international oil prices, and gas stations adjust their gasoline inventory to respond to these price fluctuations. In this study, news datasets are used to analyze gasoline consumption patterns through fluctuations of the gasoline inventory. First, news articles are collected by web crawling. Second, the articles are summarized using KoBART, a model that summarizes Korean text. Finally, the summaries are preprocessed, and fluctuation factors are derived with an N-gram language model and TF-IDF. Through this study, it is possible to analyze and predict gasoline consumption patterns.
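
The TF-IDF weighting used to derive fluctuation factors can be sketched as follows; this uses the plain (unsmoothed) idf variant and toy token lists, both assumptions for illustration rather than the paper's pipeline:

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists -> list of {term: tf-idf} dicts (plain idf)."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency of each term
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return scores

# Toy summarized-news tokens (illustrative, not real KoBART output)
docs = [["oil", "price"], ["oil", "inventory"]]
scores = tf_idf(docs)
```

Terms appearing in every document (here "oil") score zero, so the surviving high-weight terms are the distinctive ones from which fluctuation factors can be read off.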