• Title/Abstract/Keywords: Sparse data

Search results: 413 items (processing time: 0.028 s)

Abnormal Crowd Behavior Detection via H.264 Compression and SVDD in Video Surveillance System (H.264 압축과 SVDD를 이용한 영상 감시 시스템에서의 비정상 집단행동 탐지)

  • Oh, Seung-Geun;Lee, Jong-Uk;Chung, Yongw-Ha;Park, Dai-Hee
    • Journal of the Korea Institute of Information Security & Cryptology / Vol. 21, No. 6 / pp. 183-190 / 2011
  • In this paper, we propose a prototype system for abnormal sound detection and identification that detects and recognizes abnormal situations by analyzing audio information arriving in real time from CCTV cameras in a surveillance environment. The proposed system is composed of two layers. The first layer is a one-class support vector machine, i.e., support vector data description (SVDD), which rapidly detects abnormal situations and alerts the manager. The second layer classifies the detected abnormal sound into a predefined class such as 'gun', 'scream', 'siren', 'crash', or 'bomb' via a sparse representation classifier (SRC) to cope with emergency situations. The system is designed hierarchically as a combination of SVDD and SRC, which gives it the following desirable characteristics: 1) Because SVDD, trained only on normal sound, detects abnormal sound quickly, the system does not perform unnecessary classification of normal sound. 2) It ensures reliable performance via an SRC, a method that has been applied successfully in face recognition. 3) With the intrinsic incremental learning capability of SRC, it can actively adapt to changes in the sound database. The experimental results, together with a qualitative analysis, illustrate the efficiency of the proposed method.

Doubly-robust Q-estimation in observational studies with high-dimensional covariates (고차원 관측자료에서의 Q-학습 모형에 대한 이중강건성 연구)

  • Lee, Hyobeen;Kim, Yeji;Cho, Hyungjun;Choi, Sangbum
    • The Korean Journal of Applied Statistics / Vol. 34, No. 3 / pp. 309-327 / 2021
  • Dynamic treatment regimes (DTRs) are decision-making rules designed to provide personalized treatment to individuals in multi-stage randomized trials. Unlike classical methods, in which all individuals are prescribed the same type of treatment, DTRs prescribe patient-tailored treatments that take into account individual characteristics that may change over time. The Q-learning method, one of the regression-based algorithms for finding optimal treatment rules, has become popular because it is easy to implement. However, the performance of the Q-learning algorithm relies heavily on correct specification of the Q-function for the response, especially in observational studies. In this article, we examine a number of doubly robust weighted least-squares estimation methods for Q-learning in high-dimensional settings, and we also investigate treatment models for the propensity score and penalization for sparse estimation. We further consider flexible ensemble machine learning methods for the treatment model to achieve double robustness, so that the optimal decision rule can be estimated correctly as long as at least one of the outcome model and the treatment model is correct. Extensive simulation studies show that the proposed methods work well with practical sample sizes. The practical utility of the proposed methods is demonstrated with a real data example.

Comparison of hematologic and biochemical values in htPA transgenic pigs (사람 조직 플라스미노겐 활성인자 생산용 형질전환 돼지에서의 혈액학적 성상 비교)

  • Park, Mi-Ryung;Hwang, In-Sul;Lee, Seunghoon;Lee, Hwi-Cheul
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 21, No. 12 / pp. 395-400 / 2020
  • Pigs have been used widely in biomedical research owing to their physiologic and anatomic similarities to humans. Analysis of hematologic and biochemical values in pigs is an important basis for biomedical research and veterinary clinical diagnosis, but research on transgenic pigs has been sparse. This study was conducted to obtain basic data on transgenic pigs and to describe and compare the reference values for hematologic and biochemical parameters in human tissue plasminogen activator (htPA) transgenic pigs versus normal pigs. Blood samples were obtained from 7 normal LY (Landrace-Yorkshire crossbred) pigs and 8 transgenic pigs, and 16 hematologic and 15 serum biochemical parameters were tested. Among the hematologic parameters tested, significant differences between the non-transgenic and transgenic pigs were observed in red blood cells (RBC), mean red blood cell hemoglobin (MCH), and lymphocytes (LYM). Among the biochemical parameters tested, blood urea nitrogen (BUN), total protein (TP), cholesterol (CHOL), alanine aminotransferase (ALT), creatinine (CREA), gamma glutamyl transpeptidase (GGT), globin (GOB), and amylase (AMYL) showed significant differences between the two groups. The values determined in this study can therefore be used as basic reference values for transgenic pigs and will contribute to their use in biomedical research.

Severe choline deficiency induces alternative splicing aberrance in optimized duck primary hepatocyte cultures

  • Zhao, Lulu;Cai, Hongying;Wu, Yongbao;Tian, Changfu;Wen, Zhiguo;Yang, Peilong
    • Animal Bioscience / Vol. 35, No. 11 / pp. 1787-1799 / 2022
  • Objective: Choline deficiency, one of the main triggers of nonalcoholic fatty liver disease (NAFLD), is closely related to lipid metabolism disorders. Previous studies in choline-deficient models have largely focused on gene expression rather than gene structure, and studies of alternative splicing (AS) are especially sparse. In modern life science research, primary hepatocyte culture technology facilitates such studies because it can accurately imitate liver activity in vitro. However, traditional hepatocyte culture technology is limited in efficiency and operability. This study pursued an optimized culture method for duck primary hepatocytes in order to explore AS in a choline-deficient model. Methods: We developed an optimized culture method for duck primary hepatocytes using a multi-step digestion procedure on Pekin duck embryos. A NAFLD model was then constructed with choline-free medium. RNA-seq and further analysis with rMATS were performed to identify alterations in AS events in choline-deficient duck primary hepatocytes. Results: The results showed that E13 (embryonic day 13) to E15 is suitable for obtaining hepatocytes, and viability exceeded 95% by trypan blue exclusion assay. Primary hepatocytes retained their biological function, as identified by Periodic Acid-Schiff staining and a glucose-6-phosphate dehydrogenase activity assay, respectively. In addition, the alb and afp genes and the albumin protein were detected to verify the cultured hepatocytes. Immunofluorescence was used to evaluate the purity of the hepatocytes, which reached up to 90%. On this basis, a choline-deficient model was constructed, and it displayed significant increases in intracellular triglyceride and cholesterol, as reported previously. 
Intriguingly, our data suggested that AS events in the choline-deficient model were implicated in pivotal biological processes as an aberrant transcriptional regulator; 16 of the affected genes were involved in lipid metabolism and were highly enriched in glycerophospholipid metabolism. Conclusion: An effective and rapid protocol for obtaining duck primary hepatocytes was established. Using it, our findings showed that choline deficiency can induce lipid accumulation and result in aberrant AS events in hepatocytes, providing novel insight into the role of AS in choline metabolism.

Malicious Traffic Classification Using Mitre ATT&CK and Machine Learning Based on UNSW-NB15 Dataset (마이터 어택과 머신러닝을 이용한 UNSW-NB15 데이터셋 기반 유해 트래픽 분류)

  • Yoon, Dong Hyun;Koo, Ja Hwan;Won, Dong Ho
    • KIPS Transactions on Software and Data Engineering / Vol. 12, No. 2 / pp. 99-110 / 2023
  • This study proposes a classification of malicious network traffic using a cyber threat framework (Mitre ATT&CK) and machine learning to address the real-time traffic detection problems faced by current security monitoring systems. We mapped the labels of a network traffic dataset, UNSW-NB15, to the Mitre ATT&CK framework and generated the final dataset through rare class processing. After training several boosting-based ensemble models on this final dataset, we evaluated how well they classify network traffic using various performance metrics. Based on the F1 score, XGBoost with no rare class processing performed best in the multi-class traffic environment. Machine learning ensemble models built with Mitre ATT&CK label conversion and oversampling differ from existing studies, but they have limitations due to (1) the inability to match existing dataset labels perfectly to Mitre ATT&CK labels and (2) the presence of excessively sparse classes. Nevertheless, CatBoost with B-SMOTE achieved a classification accuracy of 0.9526, and is expected to be able to detect normal and abnormal network traffic automatically.
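
The abstract above reports both F1 score and accuracy and notes the difficulty of sparse classes. The following is a minimal sketch, not the authors' evaluation code, of why the two metrics can disagree when one class is rare. The labels `normal` and `rare` and the toy predictions are hypothetical, and macro-averaging is an assumption, since the abstract does not say which averaging was used.

```python
def f1_per_class(y_true, y_pred):
    """Return {label: F1} computed from true and predicted label lists."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical toy data: a classifier that never predicts the rare class.
y_true = ["normal"] * 8 + ["rare"] * 2
y_pred = ["normal"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
macro = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f}, macro F1={macro:.3f}")
```

In this toy case accuracy stays high while macro F1 drops, because the rare class contributes an F1 of zero.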

User Perception of Personal Information Security: An Analytic Hierarch Process (AHP) Approach and Cross-Industry Analysis (기업의 개인정보 보호에 대한 사용자 인식 연구: 다차원 접근법(Analytic Hierarch Process)을 활용한 정보보안 속성 평가 및 업종별 비교)

  • Jonghwa Park;Seoungmin Han;Yoonhyuk Jung
    • Information Systems Review / Vol. 25, No. 4 / pp. 233-248 / 2023
  • The increasing integration of intelligent information technologies into organizational systems has amplified the risk to personal information security. This escalation has in turn fueled growing concern about organizations' ability to safeguard user data. Although Internet users take a multifaceted approach to assessing a company's information security, existing research on the multiple dimensions of information security is sparse. Moreover, few investigations have explored whether users' evaluations of organizational information security differ across industry types. To bridge these gaps, our study aims to identify which information security attributes users perceive as most critical and to examine how the importance of these attributes varies across industry sectors. To this end, we conducted a structured survey of 498 users and used the analytic hierarchy process (AHP) to determine the relative importance of various information security attributes. Our results indicate that users place the greatest importance on the technological dimension of information security, followed closely by transparency. In the technological dimension, banks and domestic portal providers earned high ratings, while for transparency, banks and governmental agencies stood out. In contrast, social media providers received the lowest evaluations in both dimensions. By introducing a multidimensional model of information security attributes and highlighting the relative importance of each, this study makes a theoretical contribution to information security research. The findings also have practical implications: they serve as a foundational resource for Internet service companies to identify the security attributes that demand their attention, thereby helping them improve their information security measures.
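
The abstract above uses AHP to rank security attributes. The following is a minimal sketch of how AHP turns pairwise comparisons into priority weights; it is not the study's analysis. The attribute names, the third attribute `other`, and every judgment value in `pairwise` are hypothetical. The sketch uses the row geometric-mean approximation rather than an exact eigenvector computation, and Saaty's commonly tabulated random index values for the consistency check.

```python
import math


def ahp_weights(matrix):
    """Priority weights from a reciprocal pairwise-comparison matrix,
    using the row geometric-mean approximation."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI; values below about 0.1 are usually accepted."""
    n = len(matrix)
    # Estimate lambda_max as the mean of (A w)_i / w_i over all rows.
    ratios = [
        sum(a * w for a, w in zip(row, weights)) / weights[i]
        for i, row in enumerate(matrix)
    ]
    lambda_max = sum(ratios) / n
    consistency_index = (lambda_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}  # Saaty's tabulated values
    return consistency_index / random_index[n]


# Hypothetical judgments: entry [i][j] says how much more important
# attribute i is than attribute j (reciprocals below the diagonal).
attributes = ["technology", "transparency", "other"]
pairwise = [
    [1.0, 2.0, 4.0],
    [1 / 2, 1.0, 2.0],
    [1 / 4, 1 / 2, 1.0],
]

weights = ahp_weights(pairwise)
cr = consistency_ratio(pairwise, weights)
for name, weight in zip(attributes, weights):
    print(f"{name}: {weight:.3f}")
print(f"consistency ratio: {cr:.3f}")
```

In a survey setting, each respondent's judgments would typically be aggregated before or after this step; how the study did that is not stated in the abstract.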

Future Prospects of Forest Type Change Determined from National Forest Inventory Time-series Data (시계열 국가산림자원조사 자료를 이용한 전국 산림의 임상 변화 특성 분석과 미래 전망)

  • Eun-Sook, Kim;Byung-Heon, Jung;Jae-Soo, Bae;Jong-Hwan, Lim
    • Journal of Korean Society of Forest Science / Vol. 111, No. 4 / pp. 461-472 / 2022
  • Natural and anthropogenic factors cause forest types to change continuously. Because the ratio of forest area by forest type is important information for identifying the characteristics of national forest resources, an accurate understanding of the prospects for forest type change is required. The aim of this study was to use National Forest Inventory (NFI) time-series data to understand the characteristics of forest type change and to estimate future prospects for nationwide forest type change. To analyze trends and characteristics, we used forest type change information from the fifth and seventh NFI datasets together with climate, topography, forest stand, and disturbance variables related to forest type change. The results showed that coniferous forests in Korea are decreasing while mixed and broadleaf forests are increasing. The forest sites changing from coniferous to mixed forests or from mixed to broadleaf forests were mainly located in wet topographic environments and climatic conditions. Forest type changes occurred more frequently in sites with high disturbance potential (high temperature, young or sparse forest stands, and non-forest areas). We used a climate change scenario (RCP 8.5) to establish a forest type change model (SVM) to predict future changes. For the 40-year period from 2015 to 2055, the SVM predicted that coniferous forests will decrease from 38.1% to 28.5%, broadleaf forests will increase from 34.2% to 38.8%, and mixed forests will increase from 27.7% to 32.7%. These results can be used as basic data for establishing future forest management strategies.

Resolving the 'Gray sheep' Problem Using Social Network Analysis (SNA) in Collaborative Filtering (CF) Recommender Systems (소셜 네트워크 분석 기법을 활용한 협업필터링의 특이취향 사용자(Gray Sheep) 문제 해결)

  • Kim, Minsung;Im, Il
    • Journal of Intelligence and Information Systems / Vol. 20, No. 2 / pp. 137-148 / 2014
  • Recommender systems have become one of the most important technologies in e-commerce. For many consumers, the ultimate reason to shop online is to reduce the effort of searching for information and making purchases, and recommender systems are a key technology for serving this need. Many past studies of recommender systems have been devoted to developing and improving recommendation algorithms, and collaborative filtering (CF) is known to be the most successful of them. Despite its success, however, CF has several shortcomings, such as the cold-start, sparsity, and gray sheep problems. To generate recommendations, ordinary CF algorithms require evaluations or preference information directly from users. CF therefore cannot produce recommendations for new users who have no evaluations or preference information (cold-start problem). As the numbers of products and customers increase, the scale of the data increases exponentially and most of the data cells are empty. Such a sparse dataset makes computing recommendations extremely hard (sparsity problem). Because CF is based on the assumption that there are groups of users sharing common preferences or tastes, it becomes inaccurate if there are many users with rare and unique tastes (gray sheep problem). This study proposes a new algorithm that uses Social Network Analysis (SNA) techniques to resolve the gray sheep problem. We use 'degree centrality' in SNA to identify users with unique preferences (gray sheep). Degree centrality refers to the number of direct links to and from a node. In a network of users who are connected through common preferences or tastes, those with unique tastes have fewer links to other users (nodes) and are isolated from them. Gray sheep can therefore be identified by calculating the degree centrality of each node. We divide the dataset into two parts, gray sheep and others, based on the degree centrality of the users. 
Then, different similarity measures and recommendation methods are applied to these two datasets. The algorithm in more detail is as follows. Step 1: Convert the initial data, which is a two-mode network (user to item), into a one-mode network (user to user). Step 2: Calculate the degree centrality of each node and separate the nodes whose degree centrality is lower than a pre-set threshold. The threshold value is determined by simulation so that the accuracy of CF for the remaining dataset is maximized. Step 3: Apply an ordinary CF algorithm to the remaining dataset. Step 4: Because the separated dataset consists of users with unique tastes, an ordinary CF algorithm cannot generate recommendations for them, so a 'popular item' method is used to generate recommendations for these users. The F measures of the two datasets are weighted by the numbers of nodes and summed to give the final performance metric. To test the performance improvement from this new algorithm, an empirical study was conducted using a publicly available dataset, the MovieLens data from the GroupLens research team. We used 100,000 evaluations by 943 users of 1,682 movies. The proposed algorithm was compared with an ordinary CF algorithm using the 'Best-N-neighbors' and 'Cosine' similarity methods. The empirical results show that the F measure improved by about 11% on average when the proposed algorithm was used. Past studies that sought to improve CF performance typically used information beyond users' evaluations, such as demographic data. Some studies applied SNA techniques as a new similarity metric. This study is novel in that it uses SNA to separate the dataset. It shows that the performance of CF can be improved, without any additional information, when SNA techniques are used as proposed. This study has several theoretical and practical implications. It shows empirically that the characteristics of a dataset can affect the performance of CF recommender systems, which helps researchers understand the factors affecting CF performance. It also opens the door to future studies that apply SNA to CF to analyze dataset characteristics. In practice, this study provides guidelines for improving the performance of CF recommender systems with a simple modification.
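
Steps 1, 2, and 4 of the abstract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the rule that two users are linked when they share at least `min_common` rated items, the fixed `threshold`, the toy `ratings` data, and all function names are hypothetical. In the study the threshold is tuned by simulation, and Step 3 (ordinary CF on the remaining users) is omitted here.

```python
from collections import Counter
from itertools import combinations


def user_network(ratings, min_common=1):
    """Step 1: project the two-mode user-item data onto a one-mode user-user
    network. Two users are linked if they share at least `min_common` items
    (an assumed linking rule)."""
    links = {user: set() for user in ratings}
    for u, v in combinations(ratings, 2):
        if len(ratings[u] & ratings[v]) >= min_common:
            links[u].add(v)
            links[v].add(u)
    return links


def split_gray_sheep(links, threshold):
    """Step 2: degree centrality is the number of direct links; users whose
    degree is below `threshold` are separated as gray sheep."""
    gray = {user for user, neighbours in links.items() if len(neighbours) < threshold}
    return gray, set(links) - gray


def popular_items(ratings, exclude, top_n=2):
    """Step 4: recommend the most frequently rated items the user has not rated.
    Ties are broken alphabetically so the result is deterministic."""
    counts = Counter(item for items in ratings.values() for item in items)
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return [item for item in ranked if item not in exclude][:top_n]


# Hypothetical toy data: user -> set of rated items.
ratings = {
    "ann": {"m1", "m2", "m3"},
    "bob": {"m1", "m2"},
    "cat": {"m2", "m3"},
    "dan": {"m9"},
}

links = user_network(ratings)
gray, mainstream = split_gray_sheep(links, threshold=1)
recommendations = {user: popular_items(ratings, ratings[user]) for user in gray}
print(sorted(gray), sorted(mainstream), recommendations)
```

For a sense of the sparsity involved, the MovieLens figures quoted in the abstract give 100,000 ratings over 943 × 1,682 = 1,586,126 user-item cells, so only about 6.3% of the cells are filled.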

Prognostic Factors of Thymic Carcinoma (흉선암의 예후인자)

  • Park, In-Kyu;Kim, Dae-Joon;Kim, Kil-Dong;Bae, Mi-Kyung;Chung, Kyung-Young
    • Journal of Chest Surgery / Vol. 38, No. 8 / pp. 564-569 / 2005
  • Background: Thymic carcinoma is a rare malignant disease for which data on treatment and prognosis are sparse. We investigated the prognostic factors of thymic carcinoma. Material and Method: Data from 42 patients who were diagnosed with and treated for thymic carcinoma from January 1986 to August 2003 were reviewed retrospectively. The influences of patient characteristics, Masaoka stage, histologic grade, completeness of resection, and adjuvant treatment on survival were evaluated. Result: There were 30 male and 12 female patients, and their mean age was $52.0\pm15.7$ years. There were 28 patients with low-grade histology and 13 patients with high-grade histology. Clinical stage according to the Masaoka staging system was I in 2, II in 2, III in 15 $(35.7\%)$, IVa in 10 $(23.8\%)$, and IVb in 13 $(31\%)$ patients. Surgical resection was performed in 22 patients: complete resection was possible in 13 patients and incomplete resection was done in 9 patients. Among the 20 patients without resection, 8 received chemotherapy, 7 received radiotherapy, and 5 received combined therapy. Median survival time was $31.7\pm6.1$ months and the 5-year survival rate was $28.5\%$. High-grade histology (hazard ratio=3.009, $95\%$ confidence interval=$1.178\sim7.685$, p=0.021) and incomplete resection (hazard ratio=3.605, $95\%$ confidence interval=$1.1541\sim1.580$, p=0.023) were the prognostic factors of thymic carcinoma. Conclusion: In thymic carcinoma, low-grade histology is a good prognostic factor and complete resection can prolong patient survival.

A Study on the Effects of User Participation on Stickiness and Continued Use on Internet Community (인터넷 커뮤니티에서 사용자 참여가 밀착도와 지속적 이용의도에 미치는 영향)

  • Ko, Mi-Hyun;Kwon, Sun-Dong
    • Asia pacific journal of information systems / Vol. 18, No. 2 / pp. 41-72 / 2008
  • The purpose of this study is to investigate the effects of user participation, network effect, social influence, and usefulness on stickiness and continued use in Internet communities. In this research, stickiness refers to repeat visits to an Internet community and the duration of those visits. Continued use means the willingness to continue using an Internet community in the future. Internet community-based companies can earn money by selling digital content such as games, music, and avatars, by advertising on their sites, or by offering affiliate marketing. For such revenue, the stickiness and continued use of Internet users are much more important than the number of users. We tried to answer the following three questions. First, what are the effects of user participation on stickiness and continued use in Internet communities? Second, what forms user participation? Third, are network effect, social influence, and usefulness, which were significant in prior research on the technology acceptance model (TAM), still significant in Internet communities? In this study, user participation, network effect, social influence, and usefulness are the independent variables, stickiness is the mediating variable, and continued use is the dependent variable. Among the independent variables, we focus on user participation. User participation means that an Internet user participates in the development of an Internet community site (called a mini-hompy or blog in Korea). User participation was studied from 1970 to 1997 in the field of information systems, but since 1997, when the Internet started to spread to the public, it has hardly been studied. Given the importance of user participation to the success of Internet-based companies, it is a meaningful research topic. To test the proposed model, we used a dataset generated from a survey. 
The survey instrument was designed on the basis of a comprehensive literature review and interviews with experts, and was refined through several rounds of pretests, revisions, and pilot tests. The survey respondents were undergraduate and graduate students who mainly used Internet communities. Data analysis was conducted on 217 respondents (response rate, 97.7 percent). We used structural equation modeling (SEM) implemented in partial least squares (PLS). We chose PLS for two reasons. First, our model has formative constructs, and PLS uses a component-based algorithm that can estimate formative constructs. Second, PLS is more appropriate when the research model is at an early stage of development, and a review of the literature suggests that empirical tests of user participation are still sparse. The model was tested in the order of the three research questions. First, user participation had direct effects on stickiness ($\beta$=0.150, p<0.01) and continued use ($\beta$=0.119, p<0.05). User participation, as a partial mediation model, also had an indirect effect on continued use mediated through stickiness ($\beta$=0.007, p<0.05). Second, optional participation and prosuming participation significantly formed user participation. Optional participation, with a path magnitude as high as 0.986 (p<0.001), is a key determinant of the strength of user participation. Third, network effect ($\beta$=0.236, p<0.001), social influence ($\beta$=0.135, p<0.05), and usefulness ($\beta$=0.343, p<0.001) had directly significant impacts on stickiness. Network effect and social influence, as a full mediation model, both had indirectly significant impacts on continued use mediated through stickiness ($\beta$=0.11, p<0.001, and $\beta$=0.063, p<0.05, respectively). By comparison, usefulness, as a partial mediation model, had a direct impact on continued use and an indirect impact on continued use mediated through stickiness. 
This study makes three contributions. First, it is the first empirical study to show that user participation is a significant driver of continued use. Information systems researchers have hardly studied user participation since the late 1990s, and marketing researchers have studied it only a little recently. Second, this study enhances the understanding of user participation. Until recently, user participation has been studied from the bipolar viewpoint of participation vs. non-participation, and even then only from the standpoint of limited optional participation. This study, however, demonstrated the existence of prosuming participation, in which users design and produce products or services, alongside optional participation. It also showed empirically that optional participation and prosuming participation were the key determinants of user participation. Third, our study complements traditional studies of TAM. According to the prior literature on TAM, the constructs of network effect, social influence, and usefulness affect technology adoption. This study showed that these constructs are still significant in Internet communities.

