Search | Korea Science

Hate Speech Detection Using Modified Principal Component Analysis and Enhanced Convolution Neural Network on Twitter Dataset

Majed, Alowaidi
- International Journal of Computer Science & Network Security / v.23 no.1 / pp.112-119 / 2023
Originally used for networking computers and communications, the Internet has been evolving since its beginning. It is the backbone of much of what happens on the web, including social media. Social networking, which started in the early 1990s, has grown along with the Internet. Social Networking Sites (SNSs) emerged and have remained an important element of Internet usage, mainly because of the services they provide on the web. Twitter and Facebook have become the primary means by which most individuals keep in touch with others and carry on substantive conversations. These sites allow users to post photos and videos and to store audio and video that can be shared with other users. Although attractive, these provisions have also created problems for the sites, such as the posting of offensive material. Though not always, some SNS users promote hate through their words, which is difficult to curtail once it has been uploaded. Hence, this article outlines a process for extracting user reviews from a Twitter corpus in order to identify instances of hate speech. Using MPCA (Modified Principal Component Analysis) and ECNN (Enhanced Convolutional Neural Network), we identify instances of hate speech in the text. With natural language processing (NLP), a fully automatic system for assessing syntax and meaning can be established. The process places strong emphasis on pre-processing, feature extraction, and classification. Normalization cleanses the text by removing extra spaces, punctuation, and stop words. The pre-processed text is then passed to feature extraction, which uses the MPCA algorithm. 
It takes a set of related features and pulls out the ones that tell us the most about the given dataset. The proposed classification method is then put forward as a means of detecting instances of hate speech or abusive language. It is argued that ECNN is superior to other methods for identifying hateful content online. It can take in massive amounts of data and quickly return accurate results, especially for larger datasets. As a result, the proposed MPCA+ECNN algorithm improves not only the F-measure but also accuracy, precision, and recall.
https://doi.org/10.22937/IJCSNS.2023.23.1.15
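
The normalization step described in the abstract above (removing extra spaces, punctuation, and stop words) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `normalize` and the tiny `STOP_WORDS` set are assumptions, and the MPCA and ECNN stages are not reproduced because the abstract does not specify them.

```python
import string

# Tiny illustrative stop-word list (an assumption; the abstract does not give one).
STOP_WORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "in", "on"}

# Translation table that maps every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(text: str) -> list[str]:
    """Lowercase the text, strip punctuation and extra spaces, drop stop words."""
    cleaned = text.lower().translate(_PUNCT_TO_SPACE)
    # str.split() with no argument collapses runs of whitespace.
    return [token for token in cleaned.split() if token not in STOP_WORDS]


if __name__ == "__main__":
    print(normalize("The   cat, and the dog!"))
```

A real pipeline would likely use a fuller stop-word list and a tokenizer that handles mentions, hashtags, and URLs, which matter for tweets.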

The need for mechanization in todays canal building program in korea and overseas (수로의 기계화 시공의 필요성)

Ha, Gordon P.wkins
- Magazine of the Korean Society of Agricultural Engineers / v.21 no.2 / pp.21-27 / 1979
Canal construction is not the only area in which mechanization has advanced with great strides. All phases of the construction industry, including earthmoving, land clearing and levelling, road construction, and drainage and water control projects, have benefited from today's technological advancements. Lasers, an excellent example of advanced technology, have been refined for use as guidance systems for construction machinery, increasing accuracy and speed of operation. The use of explosives by contractors is becoming more commonplace. One of the most valuable modern tools available today is the two-way radio. On today's sophisticated projects, a single machine being down can frequently stop the progress of the entire project, keeping hundreds of men and machines from completing their assigned work for the day. The use of two-way radios in all the pickups and cars on a project facilitates communication so that emergency repairs can be made immediately and costly downtime can be reduced to a minimum. Not every construction project is suited to mechanization. However, on the majority of projects mechanization has a great deal to offer the Korean contractor, and all contractors, in savings of time and money. Every project being considered by a contractor should be closely examined for the most effective and efficient machinery application available. The International Commission on Irrigation and Drainage (ICID) has formed a committee on the construction techniques used in canal construction today. Two publications are now available describing the advances made in recent years. Standards for construction have been established for mechanized systems, and this information is being distributed worldwide.

Clinical Study of Primary Carcinoma of The Lung (III) (원발성 폐암의 조직학적 분류 및 임상적 관찰 (III))

Seo, Jee-Young;Park, Mee-Ran;Kim, Chang-Sun;Son, Hyung-Dae;Cho, Dong-Il;Rhu, Nam-Soo
- Tuberculosis and Respiratory Diseases / v.45 no.1 / pp.45-56 / 1998
Background: Lung cancer continues to increase worldwide. The proportion of female patients is also increasing, and adenocarcinoma is the predominant histological type of lung cancer in many Western countries. We therefore studied these current trends in lung cancer through a clinical review of recent patients in our department. Method: We conducted a retrospective analysis of 212 subjects who were diagnosed with lung cancer at the department of chest medicine in National Medical Center between January 1990 and July 1996. The analysis covered patient profile, clinical manifestations, smoking habits, accuracy of diagnostic methods, histological cell type, staging, and treatment. Results: The results were as follows. 1) The ratio of male to female was 5.2 : 1. The peak incidence by age was in the 7th decade (35.4%). 2) The chief complaints were cough, dyspnea, and chest pain. The most common duration of symptoms before the first admission was less than 3 months (57.7%), whereas a duration of more than 1 year accounted for 6.5%. The proportion of early-diagnosed patients has increased since the 1980s. 3) Smokers made up 77.2% of all patients. The proportions of smokers in squamous cell carcinoma, small cell carcinoma, and adenocarcinoma were 88.4%, 85.7%, and 55.7%, respectively. Smoking history and histological cell type were correlated in squamous and small cell carcinoma. 4) Squamous cell carcinoma is still the predominant histological type (44.8%), but adenocarcinoma (30.7%) has increased compared with the previous study. The other histological types, in order of proportion, were small cell carcinoma (17.0%) and large cell carcinoma (3.8%). 5) The accuracy of the diagnostic methods was as follows: sputum cytology 75.3%, bronchoscopic biopsy 65.7%, lymph node aspiration cytology 95.8%, percutaneous lung aspiration cytology 94.6%, and open lung biopsy 100%. The overall accuracy of the diagnostic methods has improved compared with previous studies. 
6) Performance status on admission was relatively good. After diagnosis, chemotherapy and/or radiotherapy were undertaken in 69.3% of the patients, and only 7.5% of the patients underwent surgery. Conclusion: In our study, squamous cell carcinoma is still the predominant histological cell type, but adenocarcinoma continues to increase. Because adenocarcinoma is less correlated with smoking habits, further evaluation of carcinogens other than smoking is needed. Screening and early diagnosis of lung cancer are important for maintaining good performance status despite advanced stages. However, because there is no perfect treatment for lung cancer, we think that prevention, for example smoking cessation, is more important.

An Analysis of IT Trends Using Tweet Data (트윗 데이터를 활용한 IT 트렌드 분석)

Yi, Jin Baek;Lee, Choong Kwon;Cha, Kyung Jin
- Journal of Intelligence and Information Systems / v.21 no.1 / pp.143-159 / 2015
Predicting IT trends has long been an important subject of information systems research. IT trend prediction makes it possible to recognize emerging eras of innovation and to allocate budgets in preparation for rapidly changing technological trends. Toward the end of each year, various domestic and global organizations predict and announce IT trends for the following year. For example, Gartner predicts the top 10 IT trends for the coming year, and these predictions affect the basic assumptions of IT and industry leaders and organizations about technology and the future of IT, but the accuracy of such reports is difficult to verify. Social media data can be a useful tool for verifying that accuracy. As social media services have gained in popularity, they are used in a variety of ways, from posting about personal daily life to keeping up to date with news and trends. In recent years, rates of social media activity in Korea have reached unprecedented levels. Hundreds of millions of users now participate in online social networks and share their opinions and thoughts with colleagues and friends. In particular, Twitter is currently the major microblog service; through its 'tweets', users report their current thoughts and actions, comment on news, and engage in discussions. For an analysis of IT trends, we chose tweet data because Twitter not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leadership on technology. Previous studies found that tweet data provides useful information and detects social trends effectively; these studies also found that Twitter can track issues faster than other media such as newspapers. Therefore, this study investigates how frequently the IT trends predicted for the following year by public organizations are mentioned on social network services such as Twitter. 
IT trend predictions for 2013, announced near the end of 2012 by two domestic organizations, the National IT Industry Promotion Agency (NIPA) and the National Information Society Agency (NIA), were used as the basis for this research. The present study compares Twitter data generated in Seoul (Korea) with the predictions of the two organizations to analyze the differences. Twitter data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. To overcome these challenges, we used SAS IRS (Information Retrieval Studio), developed by SAS, to capture trends by processing large streaming Twitter datasets in real time. The system offers a framework for crawling, normalizing, analyzing, indexing, and searching tweet data. As a result, we crawled the entire Twitter sphere in the Seoul area and obtained 21,589 tweets from 2013 to review how frequently the IT trend topics announced by the two organizations were mentioned by people in Seoul. The results show that most IT trends predicted by NIPA and NIA were frequently mentioned on Twitter, except for some topics such as 'new types of security threat', 'green IT', and 'next generation semiconductor', since these topics are non-generalized compound words that may be mentioned on Twitter in other words. To determine whether the IT trend tweets from Korea are related to the following year's real-world IT trends, we compared Twitter's trending topics with those in Nara Market, Korea's nationwide web-based e-Procurement system, which handles the whole procurement process of all public organizations in Korea. The correlation analysis shows that tweet frequencies on the IT trend topics predicted by NIPA and NIA are significantly correlated with the frequencies of IT topics mentioned in project announcements on Nara Market in 2012 and 2013. 
The main contributions of our research are as follows: i) the IT topic predictions announced by NIPA and NIA can provide an effective guideline for IT professionals and researchers in Korea who are looking for verified IT topic trends for the following year; ii) researchers can use Twitter to obtain useful ideas for detecting and predicting dynamic trends in technological and social issues.
https://doi.org/10.13088/jiis.2015.21.1.143
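
The comparison described in the abstract above, counting how often predicted topics are mentioned and then correlating two frequency series, might look something like the sketch below. It is only an illustration under stated assumptions: the study used SAS IRS, not this code; `count_mentions` and `pearson` are hypothetical helper names; and plain substring matching stands in for the noun extraction the authors describe.

```python
from math import sqrt


def count_mentions(texts, topics):
    """For each topic, count the texts that contain it (case-insensitive substring)."""
    lowered = [t.lower() for t in texts]
    return [sum(topic.lower() in t for t in lowered) for topic in topics]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up example data, not taken from the study.
    tweets = ["Big data is here", "cloud and big data", "nothing relevant"]
    topics = ["big data", "cloud"]
    print(count_mentions(tweets, topics))
```

Substring matching would miss topics that people phrase differently, which is the limitation the abstract notes for compound terms such as 'green IT'.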

An accuracy analysis of Cyberknife tumor tracking radiotherapy according to unpredictable change of respiration (예측 불가능한 호흡 변화에 따른 사이버나이프 종양 추적 방사선 치료의 정확도 분석)

Seo, jung min;Lee, chang yeol;Huh, hyun do;Kim, wan sun
- The Journal of Korean Society for Radiation Therapy / v.27 no.2 / pp.157-166 / 2015
Purpose: The CyberKnife tumor tracking system treats a tumor by tracking it in real time: it builds a correlation model between the position of the tumor, which moves with respiration, and the real-time respiratory signal obtained from LED markers attached to the outside of the patient, predicts the tumor location in advance, and synchronizes the treatment device with the tumor motion. The purpose of this study is to evaluate the accuracy of CyberKnife tumor tracking radiotherapy under sudden, unpredictable changes in the breathing pattern caused by coughing or sleep. Materials and Methods: The breathing log files used in the study were based on the breathing log files of patients who had received respiratory gating radiotherapy and CyberKnife tracking radiosurgery at our institution, and were reconstructed into a sinusoidal pattern and a sudden-change pattern. So that the reconstructed respiratory log files could be input to the CyberKnife dynamic chest phantom and respiratory motion reproduced, we added a driving apparatus to the existing dynamic chest phantom and developed a program that applies the breathing patterns to the phantom. The target inside the phantom (Ball cube target) was driven in the vertical (superior-inferior) direction with three displacement sizes, 5 mm, 10 mm, and 20 mm, according to the magnitude of respiration. Two EBT3 films were inserted crosswise into the target inside the phantom, and the End-to-End (E2E) test provided by the CyberKnife manufacturer was carried out five times for each breathing pattern and each target motion size. The accuracy of the tumor tracking system was expressed as the target error obtained by analyzing the inserted films, and the correlation error was additionally measured and analyzed while the E2E test was in progress. 
Results: For the sinusoidal breathing pattern, the average target error for target motion sizes of 5 mm, 10 mm, and 20 mm was $1.14 \pm 0.13$ mm, $1.05 \pm 0.20$ mm, and $2.37 \pm 0.17$ mm, respectively; for the sudden-change breathing pattern it was $1.87 \pm 0.19$ mm, $2.15 \pm 0.21$ mm, and $2.44 \pm 0.26$ mm. The correlation error, defined as the length of the displacement vector in target tracking, averaged $0.84 \pm 0.01$ mm, $0.70 \pm 0.13$ mm, and $1.63 \pm 0.10$ mm for the sinusoidal pattern at 5 mm, 10 mm, and 20 mm, and $0.97 \pm 0.06$ mm, $1.44 \pm 0.11$ mm, and $1.98 \pm 0.10$ mm for the sudden-change pattern. In both breathing patterns, the larger the correlation error, the larger the target error. When the target motion size was 20 mm or more in the sinusoidal breathing pattern, both error values were 1.5 mm or more, the value recommended by the CyberKnife manufacturer. Conclusion: Both the correlation error and the target error tend to increase as the target motion becomes larger, and the errors are larger for sudden breathing changes than for sinusoidal breathing. Even when breathing follows a regular sinusoidal pattern, the targeting accuracy of the tumor tracking system can be judged to decrease as the motion becomes larger. When a sudden, unpredictable change in respiration occurs during treatment, for example because the patient coughs, the treatment should be stopped, the internal target verification process carried out again, and the breathing pattern readjusted before the tumor tracking algorithm of the CyberKnife system continues to be used. 
Treatment accuracy is expected to improve if patients under treatment are guided toward regular breathing, for example by wearing goggle monitors on which they can observe their own breathing pattern.

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

Lee, Minsik;Lee, Hong Joo
- Journal of Intelligence and Information Systems / v.23 no.2 / pp.123-138 / 2017
Since the stock market is driven by the expectations of traders, studies have sought to predict stock price movements by analyzing various sources of text data. Research has addressed not only the relationship between text data and stock price fluctuations but also stock trading based on news articles and social media responses. Studies that predict stock price movements have also applied classification algorithms to a term-document matrix, in the same way as other text mining approaches. Because documents contain many words, it is better to select the words that contribute most when building the term-document matrix. Words with too low a frequency or importance are removed based on word frequency. Words are also selected according to their contribution, measured as the degree to which a word contributes to correctly classifying a document. The basic idea of constructing a term-document matrix has been to collect all the documents to be analyzed and to select and use the words that influence the classification. In this study, we analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words. We extract the words around each selected neutral word and use them to generate the term-document matrix. The approach starts from the idea that stock movement is less related to the presence of the neutral words themselves, and that the words surrounding a neutral word are more likely to affect stock price movements. The generated term-document matrix is then applied to an algorithm that classifies stock price fluctuations. We first removed stop words and selected neutral words for each stock, and then excluded, from the selected words, those that also appear in news articles about other stocks. 
Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used three months of news articles as training data and applied the remaining month of articles to the model to predict the next day's stock price movements. We used SVM, Boosting, and Random Forest to build models and predict stock price movements. The stock market was open for a total of 80 days over the four months (2016/02/01 ~ 2016/05/31); the first 60 days were used as the training set and the remaining 20 days as the test set. The word-based algorithm proposed in this study showed better classification performance than the word selection method based on sparsity. This study predicted stock price volatility by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price fluctuations and compared the performance of the existing sparsity-based word extraction method with the suggested method of removing words from the term-document matrix. The suggested method differs from the existing word extraction method in that it uses not only the news articles for the corresponding stock but also news about other stocks to determine the words to extract. In other words, it removes not only the words that appear in both the rise and fall categories but also the words that commonly appear in news about other stocks. When prediction accuracy was compared, the suggested method showed higher accuracy. The limitations of this study are that stock price prediction was set up only to classify rise and fall, and that the experiment was conducted only for the top ten stocks. The 10 stocks used in the experiment do not represent the entire stock market. In addition, it is difficult to show investment performance because stock price fluctuation and rate of return may differ. 
Therefore, further research is needed that uses more stocks and predicts returns through trading simulation.
https://doi.org/10.13088/jiis.2017.23.2.123
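
Two ideas from the abstract above can be sketched in a few lines: collecting the words that surround neutral words, and splitting the trading days chronologically (60 for training, 20 for testing). This is a rough illustration only; the function names, the `window` parameter, and its default value are assumptions, and the abstract does not say how wide a context the authors used or how they selected the neutral words.

```python
from collections import Counter


def context_terms(tokens, neutral_words, window=2):
    """Count words within `window` positions of any neutral word.

    The neutral words themselves are skipped. A word that lies near two
    neutral words is counted once for each of them.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token not in neutral_words:
            continue
        start, stop = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if tokens[j] not in neutral_words:
                counts[tokens[j]] += 1
    return counts


def split_chronologically(items, n_train):
    """Keep time order: the first n_train items train, the rest test."""
    return items[:n_train], items[n_train:]


if __name__ == "__main__":
    # Made-up sentence and neutral word, for illustration only.
    tokens = "shares of acme rose sharply today".split()
    print(context_terms(tokens, {"acme"}, window=1))
    train, test = split_chronologically(list(range(80)), 60)
    print(len(train), len(test))
```

The per-document counts returned by `context_terms` would form one row of a term-document matrix. A chronological split avoids training on articles published after the days being predicted.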

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

Park, Sang-Min;On, Byung-Won
- Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately figure out customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who judge the positive, neutral, or negative meanings of the words frequently used in the collected text documents. To research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that carry positive or negative meaning about the product. As a motivating example, given a product such as the Sonata made by Hyundai Motors, customers often want to see a summary of the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information on other aspects such as 'car quality', 'car performance', and 'car service.' Such information will enable customers to make a good choice when they purchase a brand-new vehicle. 
In addition, automobile makers will be able to figure out the preference for, and the positive and negative points of, new models on the market, so that the weak points of the models can be improved in the near future on the basis of the sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because the sentiment analysis ignores aspects, customers and car makers receive only a summary containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across domains, a sentiment lexicon suited to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address these problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly. 
Furthermore, by reinforcing topic semantics, we can improve the accuracy of product reputation mining well beyond that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed the top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
https://doi.org/10.13088/jiis.2017.23.2.039
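
Steps (3) and (4) of the existing approach described in the abstract above, lexicon-based polarity classification followed by positive and negative ratios, can be sketched as follows. This is a simplified illustration, not the paper's algorithm: the function name `polarity_ratios`, the +1/-1 word scores, and the rule of summing word scores per sentence are all assumptions.

```python
def polarity_ratios(sentences, lexicon):
    """Return (positive_ratio, negative_ratio) over all sentences.

    Each sentence is scored by summing the lexicon values of its words;
    a positive sum counts as positive, a negative sum as negative, and a
    zero sum as neither.
    """
    total = len(sentences)
    if total == 0:
        return 0.0, 0.0
    positive = negative = 0
    for sentence in sentences:
        score = sum(lexicon.get(word, 0) for word in sentence.lower().split())
        if score > 0:
            positive += 1
        elif score < 0:
            negative += 1
    return positive / total, negative / total


if __name__ == "__main__":
    # Made-up lexicon and sentences, for illustration only.
    lexicon = {"great": 1, "poor": -1, "noisy": -1}
    reviews = ["great design", "poor service", "it is a car", "great engine but noisy"]
    print(polarity_ratios(reviews, lexicon))
```

Computing the same ratios separately for the sentences assigned to each aspect would give the per-aspect view that the proposed method aims for.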

The Accuracy Evaluation according to Dose Delivery Interruption and Restart for Volumetric Modulated Arc Therapy (용적변조회전 방사선치료에서 선량전달의 중단 및 재시작에 따른 정확성 평가)

Lee, Dong Hyung;Bae, Sun Myung;Kwak, Jung Won;Kang, Tae Young;Back, Geum Mun
- The Journal of Korean Society for Radiation Therapy / v.25 no.1 / pp.77-85 / 2013
Purpose: Accurate movement of the gantry rotation and collimator and correct application of the dose rate are very important for the successful performance of Volumetric Modulated Arc Therapy (VMAT), because they are tightly interlocked with a complex treatment plan. Interruption and restart of dose delivery, however, can occur during treatment because of various factors related to the treatment machine and the treatment plan. If unexpected problems with the treatment machine or the patient interrupt VMAT, the movement of the treatment machine for delivering the remaining dose is restarted at the start point. In this investigation, we examine the effect of interruption and restart on dose delivery in VMAT. Materials and Methods: Treatment plans of 10 patients who had been treated at our center were used to measure and compare the dose distribution of each VMAT delivery after conversion to Digital Imaging and Communications in Medicine (DICOM) format with the treatment planning system (Eclipse V 10.0, Varian, USA). We selected the 6 MV photon energy of the Trilogy (Varian, USA) and used the OmniPro I'mRT system (V 1.7b, IBA dosimetry, Germany) to analyze the data acquired from measurements with two types of interruption, four times for each case. The door interlock and beam-off were used to stop and then restart the dose delivery of VMAT. The gamma index in the OmniPro I'mRT system and the T-test in Microsoft Excel 2007 were used to evaluate the results. Results: The deviations of the average gamma index in cases with door interlock, with beam-off, and without interruption are 0.141, 0.128, and 0.1, respectively. The standard deviations of the acquired gamma values are 0.099, 0.091, and 0.071, and the maximum gamma values are 0.413, 0.379, and 0.286, respectively. This analysis has a 95-percent confidence level, and the P-value of the T-test is under 0.05. The gamma pass rate (3%, 3 mm) is acceptable in all measurements. 
Conclusion: By analyzing the measured data, we confirmed that the interruptions in this investigation are not enough to seriously affect the dose delivery of VMAT. However, this investigation did not reflect all cases of interruptions and errors in the movement of the gantry rotation, the collimator, and the patient. We should therefore continuously maintain the treatment machine and program to deliver an accurate dose when performing VMAT for many kinds of cancer patients.
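
For readers unfamiliar with the gamma index and the (3%, 3 mm) pass criterion mentioned above, the following is a simplified one-dimensional sketch of the general technique. It does not reproduce the OmniPro I'mRT software, which works on two-dimensional dose planes; the function names, the global normalization to the maximum reference dose, and the sample profile are assumptions made for illustration.

```python
from math import sqrt


def gamma_1d(positions, ref_dose, eval_dose, dose_tol=0.03, dta_mm=3.0):
    """Gamma value for every evaluated point of a 1D dose profile.

    positions: sample positions in mm, shared by both profiles.
    dose_tol:  dose-difference criterion as a fraction of the maximum
               reference dose (global normalization, an assumption here).
    dta_mm:    distance-to-agreement criterion in mm.
    """
    dose_criterion = dose_tol * max(ref_dose)
    gammas = []
    for x_e, d_e in zip(positions, eval_dose):
        # Search all reference points for the smallest combined deviation.
        gammas.append(min(
            sqrt(((x_e - x_r) / dta_mm) ** 2 + ((d_e - d_r) / dose_criterion) ** 2)
            for x_r, d_r in zip(positions, ref_dose)
        ))
    return gammas


def pass_rate(gammas):
    """Percentage of points with gamma <= 1."""
    return 100.0 * sum(g <= 1.0 for g in gammas) / len(gammas)


if __name__ == "__main__":
    # Made-up profile, not measurement data from the study.
    x = [0.0, 1.0, 2.0, 3.0]
    reference = [1.0, 2.0, 2.0, 1.0]
    measured = [d + 0.03 for d in reference]
    print(pass_rate(gamma_1d(x, reference, measured)))
```

A point passes when its gamma value is at most 1, that is, when some reference point lies within the combined dose and distance tolerance.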

Development of a water quality prediction model for mineral springs in the metropolitan area using machine learning (머신러닝을 활용한 수도권 약수터 수질 예측 모델 개발)

Yeong-Woo Lim;Ji-Yeon Eom;Kee-Young Kwahk
- Journal of Intelligence and Information Systems / v.29 no.1 / pp.307-325 / 2023
Due to the prolonged COVID-19 pandemic, the number of people who, tired of living indoors, visit nearby mountains and national parks to relieve depression and lethargy has exploded. The mineral spring is a place where thousands of people who have come out into nature stop walking to catch their breath and rest. About 600 mineral springs can be found in the metropolitan area, not only in mountains and national parks but occasionally also in neighborhood parks and along trails. However, because water quality tests are irregular and manual, people drink mineral water without knowing the test results in real time. Therefore, in this study, we aim to develop a model that can predict the quality of spring water in real time by exploring the factors that affect it and collecting data scattered across various sources. After limiting the regions to Seoul and Gyeonggi-do because of constraints on data collection, we obtained water quality test data from 2015 to 2020 for about 300 mineral springs in 18 cities where data management is well performed. A total of 10 factors were finally selected, after two rounds of review, from among the various factors considered to affect the suitability of mineral spring water quality. Using AutoML, an automated machine learning technology that has recently been attracting attention, we derived the top 5 models by prediction performance from among about 20 machine learning methods. Among them, the CatBoost model has the highest performance, with a classification accuracy of 75.26%. In addition, when the absolute influence of the variables on the prediction was examined with the SHAP method, the most important factor was whether the spring had been judged nonconforming in the previous water quality test. 
It was confirmed that the temperature on the day of the inspection and the altitude of the mineral spring had an influence on whether the water quality was unsuitable.
https://doi.org/10.13088/jiis.2023.29.1.307

