• Title/Summary/Keyword: Unstructured Play

Search Result 18

Isolation of a novel dehydrin gene from Codonopsis lanceolata and analysis of its response to abiotic stresses

  • Pulla, Rama Krishna;Kim, Yu-Jin;Kim, Myung-Kyum;Senthil, Kalai Selvi;In, Jun-Gyo;Yang, Deok-Chun
    • BMB Reports / v.41 no.4 / pp.338-343 / 2008
  • Dehydrins (DHNs) compose a family of intrinsically unstructured proteins that have high water solubility and accumulate during late seed development at low temperature or in water-deficit conditions. They are believed to play a protective role in freezing and drought-tolerance in plants. A full-length cDNA encoding DHN (designated as ClDhn) was isolated from an oriental medicinal plant Codonopsis lanceolata, which has been used widely in Asia for its anticancer and anti-inflammatory properties. The full-length cDNA of ClDhn was 813 bp and contained a 477 bp open reading frame (ORF) encoding a polypeptide of 159 amino acids. Deduced ClDhn protein had high similarities with other plant DHNs. RT-PCR analysis showed that different abiotic stresses such as salt, wounding, chilling and light, triggered a significant induction of ClDhn at different time points within 4-48 hrs post-treatment. This study revealed that ClDhn assisted C. lanceolata in becoming resistant to dehydration.

Text-mining based Cause Analysis of Accidents at Workplaces in Korea (텍스트 마이닝 기법을 활용한 우리나라 산업재해의 원인분석)

  • Choi, Gi Heung
    • Journal of the Korean Society of Safety / v.37 no.3 / pp.9-15 / 2022
  • Analyzing the causes of accidents in workplaces where machines and tools are used is essential to improving the effectiveness and efficiency of accident prevention policies in places of employment in Korea. The causes of workplace accidents are not fully understood, mainly because the available descriptive information is difficult to analyze. This study focuses on automated accident cause analysis based on the accident abstracts found in industrial accident reports, which are written in an unstructured descriptive format. The proposed method is based on text data mining and uses the keyword search function of Excel software to automate the analysis. The results indicate that accident frequency is driven primarily by technical causes at the stage in which dangerous situations arise in the workplace. Accidents due to managerial causes are typically observed when danger already exists in the workplace; however, managerial actions play a more important role in reducing accident severity. Small companies tend to use unsafe machines and devices, leading to further accidents due to technical causes, whereas managerial causes become more conspicuous as a company grows. To prevent accidents caused by inadequate knowledge, the implementation of safety management and the provision of safety education to elderly workers at the early stage of their employment are particularly important for small companies with fewer than 100 workers.
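
The keyword-search approach described in this abstract can be illustrated with a minimal Python sketch. The abstract says the study used Excel's keyword search; the code below is only an analogous illustration, and the cause categories, keyword lists, and sample report texts are all hypothetical, not taken from the paper.

```python
from collections import Counter

# Hypothetical cause categories and keywords (assumptions for illustration;
# the paper's actual keyword lists are not given in the abstract).
CAUSE_KEYWORDS = {
    "technical": ["guard", "interlock", "machine", "defect"],
    "managerial": ["training", "supervision", "procedure", "inspection"],
}


def tag_causes(abstract):
    """Return the set of cause categories whose keywords occur in the text."""
    text = abstract.lower()
    return {
        cause
        for cause, words in CAUSE_KEYWORDS.items()
        if any(word in text for word in words)
    }


def count_causes(abstracts):
    """Count how many accident abstracts mention each cause category."""
    counts = Counter()
    for abstract in abstracts:
        counts.update(tag_causes(abstract))
    return counts


# Made-up accident abstracts, used only to exercise the functions.
reports = [
    "Worker's hand caught in press; the machine guard had been removed.",
    "Fall from ladder; no safety training was provided to the new worker.",
    "Conveyor interlock defect; inspection procedure was not followed.",
]
counts = count_causes(reports)
print(dict(counts))
```

A report can be tagged with more than one category, which matches the idea that technical and managerial causes may co-occur in a single accident.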

A Study on the Convergence Relativity of the Combining Curved Forms of Tall Buildings (초고층빌딩의 비정형 곡면형태 조합 및 복합관계에 관한 연구)

  • Park, Sang-Jun
    • The Journal of the Korea Contents Association / v.20 no.3 / pp.190-199 / 2020
  • Globally, more and more super-tall buildings are being built competitively on social and economic grounds. Against this background, this study aims to establish a paradigm for super-tall buildings in terms of their varied forms. As symbols of a city or state, super-tall buildings serve not only as tourism resources but also as distinctive landmarks. It is therefore necessary to explore curved forms from a futuristic perspective. The purpose of this study is to infer the convergence relativity of curved forms among complex and diverse unstructured building forms. The subjects were 50 super-tall buildings drawn from the ranking data of the Council on Tall Buildings and Urban Habitat (CTBUH). Curved forms were classified into regularized surfaces, developable surfaces, and double-curved surfaces on the basis of constructability in actual design rather than perceived formative type. The resulting classification can serve as basic material for design processes that pursue diverse curved surfaces in tall buildings.

The Effect of Expert Reviews on Consumer Product Evaluations: A Text Mining Approach (전문가 제품 후기가 소비자 제품 평가에 미치는 영향: 텍스트마이닝 분석을 중심으로)

  • Kang, Taeyoung;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.63-82 / 2016
  • Individuals gather information online to resolve problems in their daily lives and to make decisions about purchasing products or services. With the revolutionary development of information technology, Web 2.0 has allowed more people to easily generate and use online reviews, so the volume of information is rapidly increasing, and the usefulness and significance of analyzing unstructured data have also increased. This paper presents an analysis of the lexical features of expert product reviews to determine their influence on consumers' purchasing decisions. The focus was on how unstructured data can be organized and used in diverse contexts through text mining. In addition, diverse lexical features of expert reviews provided by a third-party review site were extracted and defined. Expert reviews are defined as evaluations, published in newspapers or magazines, by people who have expert knowledge about specific products or services; this type of review is also called a critic review. Before the widespread use of the Internet, consumers could access expert reviews only through newspapers or magazines, so their access was limited. Recently, however, major media outlets have begun to provide online services, so people can access expert reviews more easily and affordably than in the past. Diverse reviews from experts in several fields are important because of information asymmetry, in which some information is not shared between consumers and sellers. This asymmetry can be resolved when third parties with expertise provide information to consumers. Consumers can then read expert reviews and make purchasing decisions by considering the abundant information on products or services. Therefore, expert reviews play an important role in consumers' purchasing decisions and in the performance of companies across diverse industries. 
If the influence of qualitative data, such as reviews or assessments after the purchase of products, can be identified separately from that of quantitative data, such as the actual quality or price of products, it is possible to identify which aspects of product reviews hamper or promote product sales. Previous studies have focused on the characteristics of the experts themselves, such as the expertise and credibility of the sources of expert reviews; however, these studies did not examine the influence of the linguistic features of experts' product reviews on consumers' overall evaluations. In contrast, this study focused on experts' recommendations and evaluations to reveal the lexical features of expert reviews and whether such features influence consumers' overall evaluations and purchasing decisions. Real expert product reviews were analyzed based on the suggested methodology, and five lexical features of expert reviews were ultimately determined. Specifically, the "review depth" (i.e., degree of detail of the expert's product analysis) and "lack of assurance" (i.e., degree of confidence that the expert has in the evaluation) have statistically significant effects on consumers' product evaluations. In contrast, the "positive polarity" (i.e., the degree of positivity of an expert's evaluations) has an insignificant effect, while the "negative polarity" (i.e., the degree of negativity of an expert's evaluations) has a significant negative effect on consumers' product evaluations. Finally, the "social orientation" (i.e., the degree to which experts include social expressions in their reviews) does not have a significant effect on consumers' product evaluations. In summary, the lexical properties of the product reviews were defined according to each relevant factor. Then, the influence of each linguistic factor of expert reviews on consumers' final evaluations was tested. 
In addition, a test was performed on whether each linguistic factor influencing consumers' product evaluations differs depending on the lexical features. The results of these analyses should provide guidelines on how individuals process massive volumes of unstructured data depending on lexical features in various contexts and how companies can use this mechanism from their perspective. This paper provides several theoretical and practical contributions, such as the proposal of a new methodology and its application to real data.

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.1-13 / 2015
  • As opinion mining in big data applications has gained attention, a great deal of research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, often written in the natural languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which can produce results that are far from users' intentions. Even though much progress has been made in recent years in enhancing the performance of search engines to provide users with appropriate results, there is still considerable room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, thereby avoiding expensive sense-tagging processes. It evaluates the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. In the experiments, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both in combination and as separate entities using cross-validation. Only nouns, the target subjects in word sense disambiguation, were selected. 
93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because the Sejong Corpus was tagged based on sense indices defined by the dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, terms were extracted from the input sentences and term vectors for the sentences were created. Given the extracted term vector and the sense vector model built during the pre-processing stage, the sense-tagged terms were determined by vector space model based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from the examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiment shows that better precision and recall are obtained with the merged corpus. This study suggests that the method can practically enhance the performance of Internet search engines and help capture the meaning of a sentence more accurately in natural language processing tasks pertinent to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. It assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. 
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
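
As a rough illustration of the idea in this abstract (training a Naïve Bayes sense classifier on dictionary example sentences), the following Python sketch may help. It is not the authors' implementation: the English example sentences, the sense labels, and the function names `train` and `disambiguate` are assumptions made for illustration, and add-one (Laplace) smoothing is a common choice that the abstract does not mention.

```python
import math
from collections import Counter, defaultdict

# Hypothetical sense-tagged example sentences for the ambiguous noun "bank",
# standing in for dictionary usage examples (not data from the paper).
EXAMPLES = [
    ("bank/finance", "she deposited money at the bank"),
    ("bank/finance", "the bank approved the loan"),
    ("bank/river", "they fished from the river bank"),
    ("bank/river", "the boat drifted toward the muddy bank"),
]


def train(examples):
    """Collect per-sense sentence counts, per-sense word counts, and the vocabulary."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, sentence in examples:
        sense_counts[sense] += 1
        for word in sentence.split():
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def disambiguate(sentence, model):
    """Pick the sense with the highest Naive Bayes log-probability."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in sentence.split():
            # Add-one smoothing so unseen words do not zero out the product.
            score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


model = train(EXAMPLES)
print(disambiguate("the bank approved money", model))
print(disambiguate("a muddy river bank", model))
```

Each context word contributes independently to the score, which is the independence assumption the abstract describes as unrealistic but effective in practice.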

The Study of Docent System Improvement for Revitalization of Science Museum (과학관 활성화를 위한 도슨트 제도 개선 연구)

  • Park, Young-Shin;Lee, Jung-Hwa
    • Journal of the Korean Earth Science Society / v.33 no.2 / pp.200-215 / 2012
  • The revitalization of science museums depends on the number of qualified docents who can meet museum visitors' educational needs. However, the current unstructured docent system is not sufficient to meet this goal. Forty-six docents currently working in science museums were surveyed about docent training programs, current working conditions, and docent professional programs in order to propose a viable system that supports docent work as a profession. Data were collected through surveys of the 46 docents, interviews with two experienced docents, and several artifacts from the science museum and selected docents. The survey consisted of 47 items asking about personal background, docents' perceptions, the docent training programs they had taken, current working conditions, and supplementary professional programs. The conclusions of this study are as follows. First, science museums' recruiting and training systems must recognize that docents can play an educator's role, which differs from that of general volunteers. Second, docents need to take training and supplementary professional courses that focus on observing and educating visitors in the field. Third, a docent management system that employs well-structured evaluation tools is needed. A well-established docent system will enhance science museum education and promote the popularization of science by providing visitors with quality educational services.

Status and Quality Analysis on the Biodiversity Data of East Asian Vascular Plants Mobilized through the Global Biodiversity Information Facility (GBIF) (세계생물다양성정보기구(GBIF)에 출판된 동아시아 관속식물 생물다양성 정보 현황과 자료품질 분석)

  • Chang, Chin-Sung;Kwon, Shin-Young;Kim, Hui
    • Journal of Korean Society of Forest Science / v.110 no.2 / pp.179-188 / 2021
  • Biodiversity informatics applies information technology methods to organizing, accessing, visualizing, and analyzing primary biodiversity data, and to quantitative data management based on scientific names, including accepted names and synonyms. We reviewed the GBIF data published by China, Japan, Taiwan, and institutes of the Republic of Korea, such as NIBR, NIE, and KNA, and assessed diverse aspects of data quality using BRAHMS software. Most data from the four Asian countries have quality problems: a lack of data consistency, missing information on georeferences, collectors, collection dates, and place names (gazetteers), or other invalid data forms. The major problem is that biodiversity management institutions in East Asia are using unstructured databases and simple spreadsheet-type data. Owing to the nature of biodiversity information, if data relationships are not structured, it is impossible to secure the data integrity of scientific names, human names, geographical names, literature, and ecological information. For data quality, it is essential to build data integrity into database management and to establish training systems for taxonomists, who act as ongoing data managers, so that they can correct errors. Thus, publishers in East Asia play an essential role not only in using specialized software to manage biodiversity data but also in developing structured databases and ensuring their integration and value within biodiversity publishing platforms.

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. 
The k parameter of the KNN base classifiers and the feature subsets selected for them play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1,800 externally non-audited firms, of which 900 had filed for bankruptcy and 900 had not. Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test, with each financial ratio as an input variable and bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other for avoiding overfitting. The prediction accuracy against this dataset was used to determine the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performance of the proposed model with that of other models. To evaluate the effectiveness of the proposed model, its classification accuracy was compared with that of other models. The Q-statistic values and average classification accuracies of the base classifiers were also investigated. 
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
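
The random subspace KNN ensemble described in this abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' model: each member's feature subset and k value are drawn at random here, whereas the paper reportedly searches over them with a genetic algorithm, which is not implemented. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k, features):
    """Classify x by majority vote of its k nearest neighbors,
    measuring squared Euclidean distance on the given feature subset only."""
    dists = sorted(
        (sum((row[f] - x[f]) ** 2 for f in features), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def build_ensemble(n_features, n_members, subset_size, k_choices, seed=0):
    """Give each member a random feature subset and a k value.
    (A genetic algorithm could search over these pairs instead.)"""
    rng = random.Random(seed)
    return [
        (rng.sample(range(n_features), subset_size), rng.choice(k_choices))
        for _ in range(n_members)
    ]


def ensemble_predict(ensemble, train_X, train_y, x):
    """Aggregate the members' predictions by majority vote."""
    votes = Counter(
        knn_predict(train_X, train_y, x, k, feats) for feats, k in ensemble
    )
    return votes.most_common(1)[0][0]


# Toy data with four features (made up; label 1 = bankrupt, 0 = non-bankrupt).
train_X = [
    [0.1, 0.2, 0.1, 0.0], [0.0, 0.1, 0.2, 0.1], [0.2, 0.0, 0.1, 0.2],
    [0.9, 1.0, 0.8, 1.0], [1.0, 0.9, 1.0, 0.9], [0.8, 1.0, 0.9, 0.8],
]
train_y = [0, 0, 0, 1, 1, 1]

ensemble = build_ensemble(n_features=4, n_members=5, subset_size=2, k_choices=[1, 3])
print(ensemble_predict(ensemble, train_X, train_y, [0.1, 0.1, 0.1, 0.1]))
print(ensemble_predict(ensemble, train_X, train_y, [0.9, 0.9, 0.9, 0.9]))
```

Because each member sees a different feature subset, the members disagree more than identical KNN models would, which is the diversity the random subspace method relies on.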