• Title/Summary/Keyword: extraction process


Region of Interest Extraction and Bilinear Interpolation Application for Preprocessing of Lipreading Systems (입 모양 인식 시스템 전처리를 위한 관심 영역 추출과 이중 선형 보간법 적용)

  • Jae Hyeok Han;Yong Ki Kim;Mi Hye Kim
    • The Transactions of the Korea Information Processing Society / v.13 no.4 / pp.189-198 / 2024
  • Lipreading is an important part of speech recognition, and several studies have sought to improve the performance of lipreading systems for speech recognition. Recent studies have improved recognition performance by modifying the model architecture of the lipreading system. Unlike that prior research, we aim to improve recognition performance without any change to the model architecture. To this end, we refer to the cues used in human lipreading and set other regions, such as the chin and cheeks, as regions of interest along with the lip region, which is the existing region of interest in lipreading systems, and we compare the recognition rate of each region of interest to propose the highest-performing one. In addition, assuming that differences in normalization results caused by the interpolation method used when normalizing the size of the region of interest affect recognition performance, we interpolate the same region of interest using nearest neighbor, bilinear, and bicubic interpolation and compare the recognition rate of each method to propose the best-performing one. Each region of interest was detected by training an object detection neural network, and dynamic time warping templates were generated by normalizing each region of interest, extracting and combining features, and mapping the combined features into a low-dimensional space through dimensionality reduction. The recognition rate was evaluated by comparing the distance between the generated dynamic time warping templates and the data mapped to the low-dimensional space. 
In the comparison of regions of interest, the region of interest containing only the lip region showed an average recognition rate of 97.36%, which is 3.44 percentage points higher than the average recognition rate of 93.92% in the previous study. In the comparison of interpolation methods, bilinear interpolation achieved 97.36%, which is 14.65% higher than nearest neighbor interpolation and 5.55% higher than bicubic interpolation. The code used in this study can be found at https://github.com/haraisi2/Lipreading-Systems.
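
To illustrate how the choice of interpolation method changes a size-normalized region of interest, the following is a minimal sketch in plain Python. It is not the authors' code (their implementation is in the repository linked above); the function names `resize_nearest` and `resize_bilinear`, the pixel-centre sampling convention, and the 2x2 toy image are all assumptions made for illustration.

```python
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Resize by copying the closest source pixel (nearest neighbor)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        sy = min(in_h - 1, int((y + 0.5) * in_h / out_h))
        row = []
        for x in range(out_w):
            sx = min(in_w - 1, int((x + 0.5) * in_w / out_w))
            row.append(img[sy][sx])
        out.append(row)
    return out


def resize_bilinear(img: Image, out_h: int, out_w: int) -> Image:
    """Resize by blending the four surrounding source pixels (bilinear)."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        # Map the output pixel centre back into source coordinates (assumed convention).
        fy = min(max((y + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for x in range(out_w):
            fx = min(max((x + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            # Interpolate along x on the two rows, then along y between them.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


# Toy 2x2 "region of interest" upscaled to 4x4 with both methods.
roi = [[0.0, 100.0],
       [100.0, 200.0]]
nearest = resize_nearest(roi, 4, 4)
bilinear = resize_bilinear(roi, 4, 4)
```

Nearest neighbor repeats source pixels and so produces blocky steps, whereas bilinear interpolation yields intermediate values between neighbouring pixels; this is the kind of difference in normalization results that the abstract assumes can affect recognition performance.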

The Jurisdictional Precedent Analysis of Medical Dispute in Dental Field (치과임상영역에서 발생된 의료분쟁의 판례분석)

  • Kwon, Byung-Ki;Ahn, Hyoung-Joon;Kang, Jin-Kyu;Kim, Chong-Youl;Choi, Jong-Hoon
    • Journal of Oral Medicine and Pain / v.31 no.4 / pp.283-296 / 2006
  • Along with the development of science and technology, health care has advanced remarkably, and as quality of life improves and interest in health grows, the demand for medical services is rapidly increasing. However, medical accidents and medical disputes are also rapidly increasing due to various factors, such as people's growing awareness of their rights, a lack of understanding of the nature of medical practice, excessive expectations of medical techniques, a commercialized medical supply system, moral degeneracy and unawareness of medical jurisprudence among doctors, a widespread trend of mutual distrust, and the lack of a systematic mechanism for resolving medical disputes. This study analysed 30 civil suits from 1994 to 2004, selected from medical dispute cases in the dental field with judgements collected from dentistry-related organizations and the Department of Oral Medicine, Yonsei University Dental Hospital. The following results were drawn from the analyses: 1. The distribution by year showed a rapid increase in medical disputes after 2000. 2. By type of medical dispute, suits associated with tooth extraction accounted for 36.7% of all cases. 3. As for the cause of the dispute, discomfort and dissatisfaction with the treatment accounted for 36.7%, and death and permanent damage accounted for 16.7% each. 4. Judgement results favourable to the plaintiff (winning the suit, compulsory mediation, and recommendation for settlement) accounted for 60.0%. 5. By type of medical organization involved, 60.0% were private dental clinics and 30.0% were university dental hospitals. 6. By level of trial, disputes that progressed to a second or third trial accounted for 30.0%. 7. 
For the amount claimed in damages, claims of between 50 million and 100 million won accounted for 36.7% and claims of more than 100 million won for 13.3%; for the amount awarded in the judgement, amounts from 10 million to 30 million won accounted for 40.0% and amounts of more than 100 million won for 6.7%. 8. Suits involving two or more dentists accounted for 26.7%. 9. As for the time taken to reach judgement, 46.7% took 11 to 20 months and 36.7% took 21 to 30 months. 10. Medical malpractice was found in 46.7% of cases, and 70% of the cases underwent medical appraisal or verification by specialists during the suit. 11. Among the cases lost by doctors (18 cases), 72.2% were due to violation of the duty of care in practice and 16.7% were due to failure to explain to the patient. Medical disputes in dentistry usually involve relatively low-risk cases. Hence, the importance of explanation to the patient is emphasized, and since patient satisfaction is subjective, improving the relationship between patient and dentist and restoring autonomy within the dental profession are essential, in addition to reducing technical malpractice. Moreover, measures for managing medical disputes should be established by supplementing the current medical malpractice insurance for doctors and hospitals, which is being operated irrationally, and by establishing a system in which education and consultation on medical disputes, led by dental clinicians and academic scholars, are accessible.

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, greatly influencing society. This is unprecedented in history, and we now live in the Age of Big Data. SNS data qualifies as Big Data because it satisfies the conditions of data amount (volume), data input and output speed (velocity), and variety of data types (variety). Because SNS Big Data covers the whole of society, discovering the trend of an issue in it can serve as an important new source of value. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides the following four functions: (1) provide the topic keyword set that corresponds to the daily ranking; (2) visualize the daily time-series graph of a topic over a month; (3) show the importance of a topic through a treemap based on the score system and frequency; (4) visualize the daily time-series graph of a keyword found by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process various unrefined forms of unstructured data. In addition, such analysis requires the latest big data technology to rapidly process a large amount of real-time data, such as the Hadoop distributed system or NoSQL, which is an alternative to relational databases. We built TITS on Hadoop to optimize the processing of big data because Hadoop is designed to scale from single-node computing to thousands of machines. 
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schema or tables, and its most important goals are data accessibility and data processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine such data easily and clearly. Therefore, TITS uses the d3.js library as a visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to arbitrary data; interaction with the data is easy, and the library is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, which consists of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS Graphical User Interface (GUI) is designed using these libraries, and it is capable of detecting issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique that is used in various research areas, including Library and Information Science (LIS). Based on this, we can confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
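
As a rough illustration of the preprocessing and daily ranking described in this abstract, the sketch below removes stop words and ranks the remaining tokens per day by raw frequency. This deliberately swaps in plain frequency counting in place of the topic model and score system used by TITS, which the abstract does not specify in detail; the stop-word list, the whitespace tokenizer (a stand-in for noun extraction), the function name `daily_top_keywords`, and the sample tweets are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

# Toy stop-word list (assumption); a real system would use a language-specific list.
STOP_WORDS = {"the", "a", "is", "of", "and", "to", "in"}


def daily_top_keywords(
    tweets: Iterable[Tuple[str, str]], top_k: int = 2
) -> Dict[str, List[Tuple[str, int]]]:
    """Return the top_k most frequent non-stop-word tokens for each day."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for day, text in tweets:
        # Whitespace split stands in for noun extraction (assumption).
        tokens = [tok for tok in text.lower().split() if tok not in STOP_WORDS]
        counts[day].update(tokens)
    return {day: counter.most_common(top_k) for day, counter in counts.items()}


# Hypothetical (day, text) pairs.
sample_tweets = [
    ("2013-03-01", "the election debate is tonight"),
    ("2013-03-01", "election results and debate"),
    ("2013-03-01", "election"),
    ("2013-03-02", "baseball season opens"),
    ("2013-03-02", "baseball"),
]
ranking = daily_top_keywords(sample_tweets, top_k=2)
```

The per-day counts produced this way are also the raw material for a daily time-series graph of a keyword.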

Analysis of the Time-dependent Relation between TV Ratings and the Content of Microblogs (TV 시청률과 마이크로블로그 내용어와의 시간대별 관계 분석)

  • Choeh, Joon Yeon;Baek, Haedeuk;Choi, Jinho
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.163-176 / 2014
  • Social media is becoming the platform for users to communicate their activities, status, emotions, and experiences to other people. In recent years, microblogs such as Twitter have gained in popularity because of their ease of use, speed, and reach. Compared to a conventional web blog, a microblog lowers users' effort and investment in content generation by encouraging shorter posts. There has been a lot of research into capturing social phenomena and analyzing the chatter of microblogs. However, measuring television ratings has been given little attention so far. Currently, the most common method of measuring TV ratings uses an electronic metering device installed in a small number of sampled households. Microblogs allow users to post short messages, share daily updates, and conveniently keep in touch. In a similar way, microblog users interact with each other while watching television or movies, or visiting a new place. For measuring TV ratings, some features are significant during certain hours of the day or days of the week, whereas these same features are meaningless during other time periods. Thus, the importance of features can change during the day, and a model capturing this time-sensitive relevance is required to estimate TV ratings. Modeling the time-related characteristics of features should therefore be key when measuring TV ratings through microblogs. We show that capturing the time dependency of features is vital for improving the accuracy of TV ratings measurement. To explore the relationship between the content of microblogs and TV ratings, we collected Twitter data using the Get Search component of the Twitter REST API from January 2013 to October 2013. There are about 300 thousand posts in our data set for the experiment. After excluding data such as advertising or promoted tweets, we selected 149 thousand tweets for analysis. 
The number of tweets reaches its maximum on the broadcasting day and increases rapidly around the broadcasting time. This result stems from the characteristics of the public channel, which broadcasts the program at a predetermined time. From our analysis, we find that count-based features such as the number of tweets or retweets have a low correlation with TV ratings. This result implies that a simple tweet rate does not reflect satisfaction with or response to the TV programs. Content-based features extracted from the content of tweets have a relatively high correlation with TV ratings. Further, some emoticons or newly coined words that are not tagged in the morpheme extraction process have a strong relationship with TV ratings. We find that the correlation of features differs between the periods before and after the broadcasting time. Since the TV program is broadcast regularly at a predetermined time, users post tweets expressing their expectation for the program or disappointment over not being able to watch it. The highly correlated features before the broadcast differ from those after the broadcast. This result shows that the relevance of words to TV programs can change according to the time of the tweets. Among the 336 words that fulfill the minimum requirements for candidate features, 145 words have their highest correlation before the broadcasting time, whereas 68 words reach their highest correlation after broadcasting. Interestingly, some words that express the impossibility of watching the program show high relevance despite their negative meaning. Understanding the time dependency of features can help improve the accuracy of TV ratings measurement. This research provides a basis for estimating the response to or satisfaction with broadcast programs using the time dependency of words in Twitter chatter. 
More research is needed to refine the methodology for predicting or measuring TV ratings.
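
The time-dependency finding above can be illustrated with a small sketch that correlates a word's per-episode frequency with ratings separately for tweets posted before and after the broadcast. The abstract does not name the correlation measure, so the use of the Pearson coefficient here is an assumption, as are the function name `pearson` and all of the numbers, which are invented purely for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


# Hypothetical data: ratings of four episodes and the frequency of one word
# in tweets posted before and after each broadcast.
ratings = [10.0, 12.0, 14.0, 16.0]
before_counts = [5, 6, 7, 8]
after_counts = [8, 5, 7, 6]

r_before = pearson(before_counts, ratings)
r_after = pearson(after_counts, ratings)

# The time window in which this word is most informative about ratings.
best_window = "before" if abs(r_before) > abs(r_after) else "after"
```

Repeating this comparison for every candidate word would yield counts of the kind reported in the abstract, such as how many words peak before versus after the broadcast.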

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
  • To discover significant social issues such as unemployment, economic crisis, and social welfare, which are urgent problems in modern society, researchers in the existing approach usually collect opinions from professional experts and scholars through online or offline surveys. However, such a method is not always effective. Because of the expense, a large number of survey replies is seldom gathered. In some cases, it is also hard to find professionals who deal with specific social issues. Thus, the sample set is often small and may be biased. Furthermore, several experts may reach totally different conclusions about a social issue because each has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which ones are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting news articles and extracting their texts, (2) identifying only the news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic. 
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. Given a set of text documents, we segment each document into paragraphs. In the meantime, using LDA, we extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to the topic it best matches. Finally, each topic has several best-matched paragraphs. For instance, suppose there are a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly. 
Through this prototype system, we have detected various social issues appearing in our society, and our experimental results show the effectiveness of the proposed methods. Note that our proof-of-concept system is also available at http://dslab.snu.ac.kr/demo.html.
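
The abstract states only that the matching algorithm computes the probability of a paragraph given a topic from the topic terms and their probability values. One simple way this could look, offered as a hedged sketch and not as the authors' algorithm, is a unigram log-likelihood in which words outside the topic's term list receive a small floor probability. Topic1 is taken from the abstract; the second topic, the `FLOOR` constant, the whitespace tokenizer, and the function names are hypothetical.

```python
from math import log
from typing import Dict

Topic = Dict[str, float]  # term -> probability under the topic

# Assumed floor probability for words that are not among the topic's terms.
FLOOR = 1e-6


def log_likelihood(paragraph: str, topic: Topic) -> float:
    """Sum of log P(word | topic) over the whitespace-separated words."""
    tokens = paragraph.lower().split()
    return sum(log(topic.get(tok, FLOOR)) for tok in tokens)


def best_topic(paragraph: str, topics: Dict[str, Topic]) -> str:
    """Return the label of the topic under which the paragraph is most likely."""
    return max(topics, key=lambda label: log_likelihood(paragraph, topics[label]))


topics = {
    # Topic1 from the abstract, with its human-assigned label.
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    # Hypothetical second topic, added only so there is something to compare against.
    "Welfare Problem": {"welfare": 0.5, "pension": 0.3, "budget": 0.2},
}

label_a = best_topic("layoff wave raises unemployment", topics)
label_b = best_topic("pension budget debate", topics)
```

Because every paragraph is scored against every topic, collecting the highest-scoring paragraphs per topic gives the "best-matched paragraphs" that the abstract uses to recover the events behind a social keyword. A real implementation would need proper tokenization and Korean morphological analysis, which this sketch omits.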

Studies on the Physical and Chemical Denatures of Cocoon Bave Sericin throughout Silk Filature Processes (제사과정 전후에서의 견사세리신의 물리화학적 성질변화에 관한 연구)

  • 남중희
    • Journal of Sericultural and Entomological Science / v.16 no.1 / pp.21-48 / 1974
  • The studies were carried out to disclose the physical and chemical properties of the sericin fraction obtained from silk cocoon shells and its swelling and solubility characteristics. The following results were obtained. I. The physical and chemical properties of the sericin fraction. 1) In contrast to the easily water-soluble sericin, the poorly soluble sericin contained fewer amino acids with polar side groups, while poorly soluble amino acids such as alanine and leucine were detected. 2) The easily soluble amino acids were found mainly in the outer part, away from the fibroin, while the poorly soluble amino acids were located in the parts near the fibroin. 3) The swelling and solubility of the sericin could hardly be determined from the amino acid composition alone and were considered to be closely related to the sericin crystal and secondary structure. 4) The X-ray patterns of the cocoon filament were ring-shaped, but they disappeared after the degumming treatment. 5) The sericin of the tussah silkworm (A. pernyi) showed stronger circular patterns on the meridian than that of the regular silkworm (Bombyx mori). 6) There was no difference in pattern between Fractions A and B. 7) The X-ray diffraction patterns of Sericin I, II, and III were similar except for the interference at 8.85 Å (side chain spacing). 8) Amino acids with a molecular weight above 150, such as Cys, Tyr, Phe, His, and Arg, were not found quantitatively after 60 minutes of hydrolysis (6N HCl). 9) The X-ray pattern at 4.6 Å tended to disappear with hot-water, ether, and alcohol treatment. 10) The partial hydrolysis of sericin showed a circular interference (2 Å) on the meridian. 11) The sericin pellet after hydrolysis was considered to consist of peptides composed of specific amino acids. 12) The decomposition temperature of Sericin III was higher than that of Sericin I and II. 
13) The thermogram of the sericin from the inner portion of the cocoon shell had double endothermic peaks at 165°C and 245°C, and its decomposition temperature was higher than that of sericin from the other portions. 14) The infrared spectroscopic properties of Sericin I, II, and III and of sericin extracted from each layer of the cocoon shell were similar. II. The characteristics of sericin swelling and solubility related to silk processing. 1) Fifteen minutes were required to remove the free moisture of cocoon shells with a centrifugal force of 13 × 10^4 dyne/g at 3,000 R.P.M. 2) It took 30 minutes for the sericin to show a positive reaction with the Folin-Ciocalteu reagent at room temperature. 3) The measurable wavelength range of visible radiation was 500-750 mμ, and the highest absorbance was observed at 650 mμ. 4) For an exact analysis, the colorimetric analysis should be conducted at 650 mμ for low concentrations (10 μg/mL) and at 500 mμ for higher concentrations. 5) The absorption curves of sericin and egg albumin at different wavelengths were similar, but the absorbance of the former was slightly higher than that of the latter. 6) The quantity of sericin measured by colorimetric analysis turned out to be less than that measured by the Kjeldahl method. 7) Both temperature and duration in the cocoon cooking process had a large effect on the swelling and solubility of the cocoon shells, but temperature was more influential than the duration of the treatment. 8) The factorial relation between the temperature and duration of cocoon cooking with respect to sericin swelling and solubility showed that, at low temperature (70°C), the treatment duration should be increased gradually to reach optimum swelling and solubility. High temperature, however, produced a sharper increase. 
9) The higher the temperature used in drying fresh cocoons, the lower the swelling and solubility of the sericin. 10) For a given cooking duration, the heavier the cocoon shell, the lower the swelling and solubility. 11) It was considered that swelling and solubility differ between the filaments of each cocoon layer. 12) Sericin swelling and solubility in the cocoon filament were decreased by wax extraction. 13) The ionic surface-active agent accelerated the swelling and solubility of the sericin in the range of pH 6-7. 14) Under the same conditions as above, the cationic agent was absorbed into the sericin. 15) When Ca and Mg in the reeling water increased, its pH value drifted toward acidity. 16) A buffering action was observed between the sericin and the water hardness constituents in the reeling water. 17) The effect of calcium on the swelling and solubility of the sericin was more moderate than that of magnesium. 18) The dissolved water hardness constituents increased the electric conductivity of the reeling water.
