• Title/Summary/Keyword: data pre-processing


Effects of Vacuum Precooling on Shelf Life of Pleurotus eryngii during PE Packaging Storage (큰느타리 버섯의 PE 포장 저장 중 선도에 미치는 예냉처리 효과)

  • Beik, Kyung-Yean;Lee, Ye-Kyung;Kim, Jae-Won;Park, In-Sik;Kim, Soon-Dong
    • Food Science and Preservation
    • /
    • v.16 no.2
    • /
    • pp.166-171
    • /
    • 2009
  • The effects of vacuum precooling (VP) on the shelf life of polyethylene film (PE)-packaged King oyster mushrooms (Pleurotus eryngii) during storage at $-1^{\circ}C$ were investigated. VP was conducted below $0^{\circ}C$ in a $-1^{\circ}C$ cold chamber for 40 minutes, and mushrooms were stored for 30 days in batches of 1 kg. The weight loss of the VP-treated mushrooms was slightly lower than that of controls. The $O_2$ concentrations of VP-treated mushrooms within 4 days of storage were 2.44-14.50%/kg-package/hr, higher than control values (2.01-8.19%/kg-package/hr). $CO_2$ generation by VP-treated mushrooms, again within 4 days of storage, was 0.47%/kg-package/hr, lower than that of controls (0.58%/kg-package/hr). The $CO_2/O_2$ ratio peaked on day 4 of storage in the control group, but no such peak was observed in VP-treated mushrooms. In the VP-treated fungi, lightness was higher, and redness and yellowness lower, than in controls at all storage times. In VP-treated mushrooms, strength, hardness, and chewiness were significantly higher than in controls, but there were no significant differences in springiness or cohesiveness. Softening and breakdown of under-cap wrinkles were observed in control mushrooms stored for 30 days, but occurred to a lesser extent in VP-treated fungi. Stipe reticulum tissue vessels in the 30-day-stored VP-treated mushrooms were relatively well-defined and clear, but were softer and more diffuse in the control fungi. The results thus confirmed that VP after harvest enhanced mushroom shelf life and that PE packaging prolonged storage time. The data will have industrial applications.

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.105-122
    • /
    • 2019
  • Dimensionality reduction is one of the methods used to handle big data in text mining. When reducing dimensionality, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require many computations, which can lead to high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve model performance. Diverse methods have been proposed, from merely lessening noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. In addition, the representation and selection of text features affect classifier performance for sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words that capture semantic and syntactic information, are also utilized. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm identifies unimportant words, words similar to those words should likewise have little impact on sentence classification. This study proposes two ways to achieve more accurate classification: conducting selective word elimination under specific rules and constructing word embeddings based on Word2Vec. To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words with comparatively low information gain values from the raw text and form word embeddings. Second, we additionally remove words that are similar to the words with low information gain values and build word embeddings. Finally, the filtered text and word embeddings are fed to the deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets and classifies each with the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes exceeded 70% were classified as helpful reviews. Because Yelp shows only the number of helpful votes, we extracted 100,000 reviews that received more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters from the text data, was applied to each dataset. To evaluate the proposed methods, we compared their performance with Word2Vec and GloVe word embeddings that used all the words. One of the proposed methods performed better than the embeddings using all the words: removing unimportant words improves performance, although removing too many words lowered it. Future research should consider diverse preprocessing approaches and in-depth analysis of word co-occurrence to measure similarity values among words. Also, we applied the proposed method only with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo can be combined with the proposed elimination methods, and the possible combinations between embedding and elimination methods remain to be identified.
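The two elimination steps described above can be sketched as follows. This is a minimal illustration assuming binary word-presence features and a pre-trained word-vector dictionary; the function names and the thresholds `ig_cut` and `sim_cut` are illustrative, not taken from the paper:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a label list."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(word, docs, labels):
    """IG of splitting the corpus on presence/absence of `word` (docs are word sets)."""
    base = entropy(labels)
    with_w = [y for d, y in zip(docs, labels) if word in d]
    without = [y for d, y in zip(docs, labels) if word not in d]
    cond = sum(len(p) / len(labels) * entropy(p) for p in (with_w, without) if p)
    return base - cond

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def words_to_remove(docs, labels, vectors, ig_cut=0.01, sim_cut=0.8):
    """Step 1: drop low-IG words. Step 2: also drop words similar to them."""
    vocab = set().union(*docs)
    low_ig = {w for w in vocab if information_gain(w, docs, labels) < ig_cut}
    similar = {w for w in vocab - low_ig if w in vectors and any(
        u in vectors and cosine(vectors[w], vectors[u]) > sim_cut for u in low_ig)}
    return low_ig | similar
```

The surviving vocabulary would then be used both to filter the raw text and to restrict the Word2Vec embedding matrix.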

Preliminary Inspection Prediction Model to select the on-Site Inspected Foreign Food Facility using Multiple Correspondence Analysis (차원축소를 활용한 해외제조업체 대상 사전점검 예측 모형에 관한 연구)

  • Hae Jin Park;Jae Suk Choi;Sang Goo Cho
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.1
    • /
    • pp.121-142
    • /
    • 2023
  • As the number and weight of imported foods steadily increase, safety management of imported food to prevent food safety accidents is becoming more important. The Ministry of Food and Drug Safety conducts on-site inspections of foreign food facilities before customs clearance as well as import inspections at the customs clearance stage. However, a data-based safety management plan for imported food is needed because of limits on time, cost, and resources. In this study, we tried to increase the efficiency of on-site inspections by building a machine learning prediction model that pre-selects the facilities expected to fail before the on-site inspection. We collected basic information on 303,272 foreign food facilities and processing businesses from the Integrated Food Safety Information Network, together with 1,689 on-site inspection records collected from 2019 to April 2022. After preprocessing the foreign food facility data, only the records subject to on-site inspection were extracted using the foreign food facility_code, yielding a total of 1,689 records with 103 variables. Of the 103 variables, those whose Theil-U index was '0' were removed, and after reduction by Multiple Correspondence Analysis, 49 characteristic variables were finally derived. We built eight different models, performed hyperparameter tuning through 5-fold cross-validation, and evaluated the performance of each. Because the purpose of selecting facilities for on-site inspection is to maximize recall, the probability of judging nonconforming facilities as nonconforming, the Random Forest model, which had the highest Recall_macro, AUROC, Average PR, F1-score, and Balanced Accuracy among the applied machine learning algorithms, was evaluated as the best model. Finally, we apply Kernel SHAP (SHapley Additive exPlanations) to present the reasons individual facilities are selected as nonconforming, and discuss applicability to an on-site inspection facility selection system. Based on these results, the study is expected to contribute to the efficient operation of limited resources such as manpower and budget by establishing an imported food management system built on a data-based scientific risk management model.
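The primary selection criterion above, macro-averaged recall, weights every class equally regardless of class imbalance, which is why it suits the goal of catching rare nonconforming facilities. A minimal sketch of the metric itself (not the Random Forest pipeline), with illustrative names:

```python
from collections import defaultdict

def recall_macro(y_true, y_pred):
    """Unweighted mean of per-class recall: each class counts equally,
    so a model cannot score well by ignoring the rare nonconforming class."""
    tp = defaultdict(int)  # true positives per class
    fn = defaultdict(int)  # false negatives per class
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fn[t] += 1
    classes = set(y_true)
    return sum(tp[c] / (tp[c] + fn[c]) for c in classes) / len(classes)
```

For a binary conforming/nonconforming label this is identical to balanced accuracy, which matches the abstract's list of reported metrics.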

Airborne Hyperspectral Imagery availability to estimate inland water quality parameter (수질 매개변수 추정에 있어서 항공 초분광영상의 가용성 고찰)

  • Kim, Tae-Woo;Shin, Han-Sup;Suh, Yong-Cheol
    • Korean Journal of Remote Sensing
    • /
    • v.30 no.1
    • /
    • pp.61-73
    • /
    • 2014
  • This study reviewed the application of Airborne Hyperspectral Imagery (A-HSI) to water quality estimation and tested estimation of part of the Han River's water quality (especially suspended solids) with available in-situ data. Water quality was estimated by two methods. One uses observation data such as downwelling radiance to the water surface and scattering and reflectance within the water body. The other is linear regression analysis of in-situ water quality measurements against upwelling data such as at-sensor radiance (or reflectance). Both methods yield meaningful remote sensing estimates, but the results depend strongly on auxiliary datasets such as in-situ water quality and water-body scattering measurements. The test covered a section of the Han River downstream of Paldang Dam. We applied linear regression analysis with AISA Eagle hyperspectral sensor data and in-situ water quality measurements. The regression for the most meaningful band combination was $-24.847+0.013L_{560}$, where $L_{560}$ is the radiance at 560 nm, with an R-square of 0.985. For comparison with the Multispectral Imagery (MSI) case, we simulated Landsat TM by spectral resampling. The MSI regression was -55.932 + 33.881 (TM1/TM3) in radiance, with an R-square of 0.968. The suspended solid (SS) concentration was about 3.75 mg/l in the in-situ data; the SS concentration estimated by A-HSI was about 3.65 mg/l, and about 5.85 mg/l with MSI at the same location, showing a tendency to overestimate when using MSI. To raise the practical value and estimate more precisely, it is necessary to minimize sun-glint effects across the whole image, construct an elaborate flight plan considering solar altitude angle, and build a sound pre-processing and calibration system. Through the literature review and a test adopting general methods, we also identified limitations such as precise atmospheric correction, the number of water quality samples, retrieval of spectral bands from the A-HSI, selection of an adequate linear regression model, and quantitative calibration/validation methods.
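The two regression models reported above can be written directly as functions. The coefficients are the ones quoted in the abstract; the function names are illustrative:

```python
def ss_from_radiance_560(l560):
    """Suspended-solid estimate (mg/l) from at-sensor radiance at 560 nm,
    using the A-HSI regression SS = -24.847 + 0.013 * L560 (R^2 = 0.985)."""
    return -24.847 + 0.013 * l560

def ss_from_tm_ratio(tm1, tm3):
    """Suspended-solid estimate (mg/l) from the simulated Landsat TM band
    ratio, SS = -55.932 + 33.881 * (TM1/TM3) in radiance (R^2 = 0.968)."""
    return -55.932 + 33.881 * (tm1 / tm3)
```

Note these empirical coefficients are specific to this scene, sensor, and date; applying them elsewhere would require recalibration against new in-situ samples.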

An Implementation of OTB Extension to Produce TOA and TOC Reflectance of LANDSAT-8 OLI Images and Its Product Verification Using RadCalNet RVUS Data (Landsat-8 OLI 영상정보의 대기 및 지표반사도 산출을 위한 OTB Extension 구현과 RadCalNet RVUS 자료를 이용한 성과검증)

  • Kim, Kwangseob;Lee, Kiwon
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.3
    • /
    • pp.449-461
    • /
    • 2021
  • Analysis Ready Data (ARD) for optical satellite images is a pre-processed product generated by applying the spectral characteristics and viewing parameters of each sensor. Atmospheric correction is one of the fundamental and complicated topics, and it is used to produce Top-of-Atmosphere (TOA) and Top-of-Canopy (TOC) reflectance from multi-spectral image sets. Most remote sensing software provides algorithms or processing schemes dedicated to these corrections for the Landsat-8 OLI sensor. Furthermore, Google Earth Engine (GEE) provides direct access to Landsat reflectance products, USGS-based ARD (USGS-ARD), in the cloud environment. We implemented an atmospheric correction extension for the Orfeo ToolBox (OTB), an open-source remote sensing software package for manipulating and analyzing high-resolution satellite images. This is the first such tool, as OTB had not previously provided calibration modules for any Landsat sensor. Using this extension, we conducted absolute atmospheric correction on Landsat-8 OLI images of Railroad Valley, United States (RVUS) and validated the reflectance products against the RVUS reflectance data sets in the RadCalNet portal. The results showed that the reflectance products generated by the OTB extension for Landsat differed by less than 5% from the RadCalNet RVUS data. In addition, we performed a comparative analysis with reflectance products obtained from other open-source tools, namely the QGIS semi-automatic classification plugin and SAGA, besides the USGS-ARD products. Compared to the other two open-source tools, the reflectance products from the OTB extension showed high consistency with those of USGS-ARD, within an acceptable level relative to the measurement data range of the RadCalNet RVUS. In this study, the atmospheric calibration processor in the OTB extension was verified, proving its applicability to other satellite sensors such as the Compact Advanced Satellite (CAS)-500 or new optical satellites.
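The TOA-reflectance step that such a processor implements follows the standard USGS conversion for Landsat-8 OLI: scale the quantized DN with the per-band MULT/ADD coefficients, then correct for sun elevation. A minimal sketch; the constants below are the typical MTL-file values and in practice must be read from each scene's metadata:

```python
import math

# Typical Landsat-8 OLI metadata constants (REFLECTANCE_MULT_BAND_x /
# REFLECTANCE_ADD_BAND_x in the MTL file); per-scene values should be used.
REFLECTANCE_MULT = 2.0e-5
REFLECTANCE_ADD = -0.1

def toa_reflectance(dn, sun_elevation_deg,
                    mult=REFLECTANCE_MULT, add=REFLECTANCE_ADD):
    """Convert a quantized DN to sun-angle-corrected TOA reflectance."""
    rho = mult * dn + add                      # planetary reflectance, no sun correction
    return rho / math.sin(math.radians(sun_elevation_deg))
```

TOC (surface) reflectance additionally requires modeling atmospheric scattering and absorption, which is the complicated part the abstract refers to.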

Template-Based Object-Order Volume Rendering with Perspective Projection (원형기반 객체순서의 원근 투영 볼륨 렌더링)

  • Koo, Yun-Mo;Lee, Cheol-Hi;Shin, Yeong-Gil
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.27 no.7
    • /
    • pp.619-628
    • /
    • 2000
  • Perspective views provide a powerful depth cue and thus aid the interpretation of complicated images. The main drawback of current perspective volume rendering is its long execution time. In this paper, we present an efficient perspective volume rendering algorithm based on coherency between rays. Two sets of templates are built for the rays cast from horizontal and vertical scanlines in the intermediate image, which is parallel to one of the volume faces. Each sample along a ray is calculated by interpolating neighboring voxels with the pre-computed weights in the templates. We also solve the problem of uneven sampling rate due to perspective ray divergence by building more templates for regions far away from the viewpoint. Since our algorithm operates in object order, it can avoid redundant access to each voxel and exploit spatial data coherency by using a run-length encoded volume. Experimental results show that the use of templates and object-order processing with a run-length encoded volume provide speedups compared to other approaches. Additionally, the image quality of our algorithm improves by addressing the uneven sampling rate caused by perspective ray divergence.
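The template idea can be illustrated in miniature: rays cast from one scanline share the same per-sample fractional offsets, so the interpolation weights can be computed once and reused for every ray of that scanline. A 2D bilinear sketch with illustrative names (the paper's method is 3D and trilinear):

```python
def build_template(dx, dy, n_samples):
    """Precompute, for each sample along a ray of direction (dx, dy),
    the integer cell offset and the four bilinear corner weights."""
    template = []
    for i in range(n_samples):
        x, y = i * dx, i * dy
        fx, fy = x - int(x), y - int(y)           # fractional position in the cell
        w = ((1 - fx) * (1 - fy), fx * (1 - fy),  # weights: (0,0), (1,0),
             (1 - fx) * fy,       fx * fy)        #          (0,1), (1,1)
        template.append((int(x), int(y), w))
    return template

def sample_ray(volume, origin, template):
    """Resample one ray by reusing the precomputed template weights;
    every ray of the scanline shares the same template."""
    ox, oy = origin
    out = []
    for cx, cy, (w00, w10, w01, w11) in template:
        x, y = ox + cx, oy + cy
        out.append(w00 * volume[y][x]     + w10 * volume[y][x + 1] +
                   w01 * volume[y + 1][x] + w11 * volume[y + 1][x + 1])
    return out
```

In the paper's setting, extra templates are built for distant regions so that the sample spacing stays roughly constant as the perspective rays diverge.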


Neural correlations of familiar and Unfamiliar face recognition by using Event Related fMRI

  • Kim, Jeong-Seok;Jeun, Sin-Soo;Kim, Bum-Soo;Choe, Bo-Young;Lee, Hyoung-Koo;Suh, Tae-Suk
    • Proceedings of the Korean Society of Medical Physics Conference
    • /
    • 2003.09a
    • /
    • pp.78-78
    • /
    • 2003
  • Purpose: This event-related fMRI study aimed to further our understanding of how different brain regions contribute to effective access of specific information stored in long-term memory. The experiment allowed us to determine the brain regions involved in recognizing familiar faces among unfamiliar faces. Materials and Methods: Twelve right-handed, normal, healthy adult volunteers participated in a face recognition experiment. The paradigm consisted of 40 familiar faces, 40 unfamiliar faces, and a control baseline of scrambled faces, presented in randomized order with null events. Volunteers were instructed to press one of two buttons on a response box to indicate whether a face was familiar or not. Incorrect answers were ignored. A 1.5T MRI system (GMENS) was employed to evaluate brain activity using blood oxygen level dependent (BOLD) contrast. A gradient-echo EPI sequence with TR/TE = 2250/40 msec was used for 17 contiguous axial slices of 7 mm thickness, covering the whole brain volume (240 mm field of view, 64 ${\times}$ 64 in-plane resolution). The acquired data were processed in SPM99, including realignment, normalization, smoothing, statistical ANOVA, and statistical inference. Results/Discussion: The comparison of familiar versus unfamiliar faces yielded significant activations in the medial temporal, occipito-temporal, and frontal regions. These results suggest that when volunteers are asked to recognize familiar faces among unfamiliar faces, they tend to activate several regions frequently involved in face perception. The medial temporal regions were also activated for both familiar and unfamiliar faces. This interesting result suggests a contribution of this structure to the attempt to match perceived faces with pre-existing semantic representations stored in long-term memory.


Improvements of an English Pronunciation Dictionary Generator Using DP-based Lexicon Pre-processing and Context-dependent Grapheme-to-phoneme MLP (DP 알고리즘에 의한 발음사전 전처리와 문맥종속 자소별 MLP를 이용한 영어 발음사전 생성기의 개선)

  • 김회린;문광식;이영직;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.5
    • /
    • pp.21-27
    • /
    • 1999
  • In this paper, we propose an improved MLP-based English pronunciation dictionary generator for use in a variable-vocabulary word recognizer. The variable-vocabulary word recognizer can process any word specified in a Korean word lexicon dynamically determined according to the current recognition task. To extend the system to tasks involving English words, it is necessary to build a pronunciation dictionary generator able to process words not included in a predefined lexicon, such as proper nouns. To build the English pronunciation dictionary generator, we use a context-dependent grapheme-to-phoneme multi-layer perceptron (MLP) architecture for each grapheme. To train each MLP, grapheme-to-phoneme training data must be obtained from a general pronunciation dictionary. To automate this process, we use a dynamic programming (DP) algorithm with appropriate distance metrics. For training and testing the grapheme-to-phoneme MLPs, we use a general English pronunciation dictionary with about 110,000 words. With 26 MLPs, each having 30 to 50 hidden nodes, and an exception grapheme lexicon, we obtained a word accuracy of 72.8% on the 110,000 words, superior to a rule-based method, which showed a word accuracy of 24.0%.
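The DP alignment step that pairs each grapheme with a phoneme (or a gap) to produce MLP training targets can be sketched as a standard edit-distance alignment. The unit costs below are an assumption for illustration; the paper uses tuned distance metrics:

```python
def align(graphemes, phonemes, gap="_"):
    """Needleman-Wunsch-style DP alignment; returns (grapheme, phoneme)
    pairs, using `gap` where a symbol aligns to nothing."""
    m, n = len(graphemes), len(phonemes)
    # dp[i][j] = min cost of aligning first i graphemes with first j phonemes
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i
    for j in range(1, n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if graphemes[i - 1] == phonemes[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j - 1] + sub,   # match/substitute
                           dp[i - 1][j] + 1,         # grapheme -> gap
                           dp[i][j - 1] + 1)         # gap -> phoneme
    # Trace back from (m, n) to recover the aligned pairs.
    pairs, i, j = [], m, n
    while i > 0 or j > 0:
        sub = 0 if i > 0 and j > 0 and graphemes[i - 1] == phonemes[j - 1] else 1
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub:
            pairs.append((graphemes[i - 1], phonemes[j - 1])); i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((graphemes[i - 1], gap)); i -= 1
        else:
            pairs.append((gap, phonemes[j - 1])); j -= 1
    return pairs[::-1]
```

Each resulting (grapheme, phoneme) pair, together with its letter context, would then serve as one training example for the corresponding per-grapheme MLP.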


A proper folder recommendation technique using frequent itemsets for efficient e-mail classification (효과적인 이메일 분류를 위한 빈발 항목집합 기반 최적 이메일 폴더 추천 기법)

  • Moon, Jong-Pil;Lee, Won-Suk;Chang, Joong-Hyuk
    • Journal of the Korea Society of Computer and Information
    • /
    • v.16 no.2
    • /
    • pp.33-46
    • /
    • 2011
  • Since e-mail has become an important means of communication and information sharing, there has been much effort to classify e-mails efficiently by their contents. E-mails vary in length and style, the words used in them are often irregular, and the criteria for classification are subjective. As a result, it is quite difficult to adapt conventional text classification techniques efficiently to e-mail classification. The classification facilities in commercial e-mail programs use simple text filtering in the e-mail client. Previous studies on automatic e-mail classification have used the probability-based Naive Bayesian technique to improve classification accuracy, and most of them target e-mail in English. This paper proposes a personalized folder recommendation technique for e-mail in Korean using frequent pattern mining. The proposed technique consists of two phases: pre-processing the e-mails in an e-mail folder and generating a profile for that folder. The generated profile is used to classify an incoming e-mail into the most appropriate folder according to the user's subjective criteria. An e-mail classification system implementing the proposed technique is also presented.
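The two phases, mining frequent word sets per folder and scoring a new mail against each folder's profile, can be sketched as follows. The support threshold and the coverage-count scoring are illustrative assumptions, not the paper's exact formulation:

```python
from itertools import combinations
from collections import Counter

def frequent_itemsets(mails, min_support=2, max_size=2):
    """Profile phase: word sets (up to max_size words) that appear in at
    least min_support mails of a folder; each mail is a set of words."""
    counts = Counter()
    for words in mails:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(set(words)), k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c >= min_support}

def recommend_folder(mail, profiles):
    """Recommendation phase: pick the folder whose frequent itemsets the
    incoming mail (a set of words) covers the most."""
    words = set(mail)
    return max(profiles, key=lambda f: sum(1 for s in profiles[f] if s <= words))
```

A production version would weight itemsets by support and size rather than counting coverage uniformly, but the folder-profile structure is the same.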

A Multi-Wavelength Study of Galaxy Transition in Different Environments (다파장 관측 자료를 이용한 다양한 환경에서의 은하 진화 연구)

  • Lee, Gwang-Ho
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.43 no.1
    • /
    • pp.34.2-35
    • /
    • 2018
  • Galaxy transition from star-forming to quiescent, accompanied by morphological transformation, is one of the key unresolved issues in extragalactic astronomy. Although several environmental mechanisms have been proposed, a deeper understanding of the impact of environment on galaxy transition still requires much exploration. My Ph.D. thesis focuses on which environmental mechanisms are primarily responsible for galaxy transition in different environments and examines what happens during the transition phase, using multi-wavelength photometric/spectroscopic data from the UV to the mid-infrared (MIR) derived from several large surveys (GALEX, SDSS, and WISE) and our GMOS-North IFU observations. Our multi-wavelength approach provides new insights into the late stages of galaxy transition with a definition of the MIR green valley that differs from the optical green valley. I will present highlights from three areas of my thesis. First, through an in-depth study of the environmental dependence of various galaxy properties in the nearby supercluster A2199 (Lee et al. 2015), we found that the star formation of galaxies is quenched before they enter the MIR green valley, driven mainly by strangulation. Then, the morphological transformation from late- to early-type galaxies occurs in the MIR green valley. The main environmental mechanisms for the morphological transformation are galaxy-galaxy mergers and interactions, which are likely to happen in high-density regions such as galaxy groups and clusters. After the transformation, early-type MIR green valley galaxies keep the memory of their last star formation for several Gyr until they move on to the next stage as completely quiescent galaxies. Second, compact groups (CGs) of galaxies are the most favorable environments for galaxy interactions. We studied the MIR properties of galaxies in CGs and their environmental dependence (Lee et al. 2017), using a sample of 670 CGs identified with a friends-of-friends algorithm.
We found that the MIR [3.4]-[12] colors of CG galaxies are, on average, bluer than those of cluster galaxies. As CGs are located in denser regions, they tend to have larger early-type galaxy fractions and bluer MIR galaxy colors. These trends can also be seen for neighboring galaxies around CGs. However, CG members always have larger early-type fractions and bluer MIR colors than their neighboring galaxies. These results suggest that galaxy evolution proceeds faster in CGs than in other environments and that CGs are likely the best place for pre-processing. Third, post-starburst galaxies (PSBs) are an ideal laboratory for investigating the details of the transition phase. Their spectra reveal a phase of vigorous star formation activity that ended abruptly within the last 1 Gyr. Numerical simulations predict that the starburst, and thus the current A-type stellar population, should be localized within the galaxy's center (< 1 kpc). Yet our GMOS IFU observations show otherwise: all five PSBs in our sample have Hdelta absorption-line profiles that extend well beyond the central kpc. Most interestingly, we found a negative correlation between the Hdelta gradient slopes and the fractions of stellar mass produced during the starburst, suggesting that stronger starbursts are more centrally concentrated. I will discuss these results in relation to the origin of PSBs.
