I. INTRODUCTION
The purpose of this study is to analyze the characteristics of the oral records of contributors to sports development and, on that basis, to identify directions for future Big Data research on such records.
Specifically, we examine the 2017 sports development contributor oral recording project conducted by the National Sports Promotion Agency and its results.
In this project, oral interviews were conducted with selected individuals, and the results were recorded as audio files, transcripts, research reports, photos, oral materials, and videos. These are unstructured data. This study suggests a way to analyze and utilize the data related to these oral records.
The related prior studies are as follows. Jisun Byun designed and implemented an electronic cultural map of the Seoul Village Gut, building a database of images, photos, audio, and text files of the Seoul Village Gut, including oral records, and providing them through the map [1]. Byun also classified the field survey data and oral records of Seoul Jinokwuigigut, a ritual for the dead, by media type and summarized the characteristics and contents of the data.
In addition, the geographic information that appears in the transmitted shamanic songs was visualized and marked on a map, and the regions of Seoul Gut (shamanic ritual) were delineated [2]. In a study on converting materials such as “The History of the Three Kingdoms” into data, Jeong-Hoon Lee argued that data analysis technicians did not know the History of the Three Kingdoms, and that experts who knew it could not handle data analysis technology [3]. Constructing Big Data, for example reconstructing the historical context of the writing, was considered infeasible because of the capital it would require.
Jisun Byun has conducted Big Data-related research on the records of an oral recording project hosted by a local government [4]. Keywords were extracted from the oral records, and documents collected through Google searches were analyzed to derive new cultural content topics. In this study, we apply Jisun Byun's research method to the 2017 oral records of contributors to sports development.
Among Korean studies on unstructured data, the following studies that attempted text mining are referenced in this paper. Myeong-sook Koh studied unstructured data processing using keyword-based topic-oriented analysis [5].
Using the LDA technique, Koh extracted the top 50 keywords related to 'Government 3.0' for each year based on word frequency, and applied correlation network and tag cloud techniques so that the importance of these words could be grasped intuitively. Kang Seon-kyung, Lee Hyun-chang, and Shin Sung-yun analyzed words related to drama viewer ratings through an unstructured data collection framework [6].
They collected data using crawling techniques from the bulletin boards that each broadcasting company operates for each drama, and from blogs written before and after airing. The related words 'broadcast, viewer, content, child, request, woman, love' were then derived through correlation analysis of frequency of appearance. Jeon Soo and Nam-Yong Lee studied major analysis techniques for the analysis of unstructured Big Data in public institutions [8]. They surveyed the state of text research and its keywords using the academic article search site Dbpia, performed morphological analysis on 1,570 collected thesis titles to extract nouns and verbs, and analyzed the frequency of the extracted keywords. Seong-hyeon Jeon, while studying the 'heritage' of Busan, the capital of refugees, and the current status and use of domestic and foreign materials, argued that a comprehensive database should be established for each heritage, and that a Big Data system encompassing them all should be pursued [9].
Moon Hyung-jin asserted the necessity and significance of establishing a correspondence database for Koreans living in the Middle East [10]. He saw such research as a stepping stone toward establishing the digital humanities. Garim Park and Minjeong Ha conceived a legislative information system using an energy law ontology [11]. They attempted to turn energy law into structured knowledge and presented the process of building an energy law ontology on that basis. They suggested that the ontology could be used in constructing a knowledge-based support system. These studies all deal with text, one kind of the unstructured data now being produced explosively. Rather than designing and implementing a specific system and proving its utility, they consider research methodology by treating a vast amount of text as unstructured data.
In this study, building on the keyword-based text analyses of Myeong-sook Koh; Kang Seon-kyung, Lee Hyun-chang, and Shin Sung-yun; and Jeon Soo and Nam-Yong Lee, we extract keywords from the oral records of a government agency and explore ways to use them through related searches.
R. W. White, H. Song, and J. Liu proposed a new technique to support information retrieval in oral histories using concept maps [12]. A pilot study was conducted with a prototype concept mapping tool, with teachers and students participating in a work task. The results showed that concept maps are helpful to searchers, especially when the task is complex.
Michael Geselowitz and Mary Lanzerotti studied the factors that lowered women's participation in engineering, with the aim of preserving and communicating women's roles in the history of technology, and presented role models who persisted despite these factors [12]. They argued that oral history is one way to designate and recognize important events in the history of engineering and technology, and to help preserve and promote that history. They also argued that oral history interviews with people related to the narrator, including those interviewed, could be collected to preserve that history. At the same time, they argued, there is a growing global awareness that women, who make up about half of the planet's population, are undervalued in engineering. Their study offered a milestone for women who want to study science, and the recorded oral life histories of female scientists can encourage more women to enter the engineering field. Since then, the Center for the History of Physics at the American Institute of Physics has joined as a partner in the Oral Life and History Project of Women Scientists. In addition, it was argued that the oral histories of women scientists can provide a career model for women.
The IEEE has recorded interviews with scientists from 2018 to the present. The most recent is the interview with Alan Cooper. Hansen Hsu and Dag Spicer interviewed Alan Cooper using an oral life history method and published the full text of the interview through the IEEE [13]. In that study, the exchange between the narrator and the interviewers was recorded as it was spoken. In oral life history research, it is also very important to investigate the narrator in advance, prepare questionnaires, conduct the interviews, and record the contents. Such a study secures primary interview data. The later oral life history study by Allison Marsh and Katherine Kuisel on Linda Katehi, a Greek-American female scientist, is considerably more advanced than this study.
Allison Marsh and Katherine Kuisel produced an oral history of Linda Katehi, a female scientist who immigrated to the United States from Greece [14]. They interviewed Katehi using an oral history research methodology, recorded the interview, and converted it into text, which is divided into chapters in chronological order. The oral history covers her upbringing, marriage, education, and experiences of sexual harassment and discrimination in the workplace.
Marsh and Kuisel first prepared a questionnaire based on a background survey of the narrator, Linda Katehi, and then conducted the interview by asking her the questions in this questionnaire [14-16].
Big Data is much larger in scale than the data generated in the analog environments of the past, has shorter generation cycles, and includes not only numerical data but also text and image data [17-18].
Accordingly, the Big Data market is growing over time, the data are being used in many areas of daily life, and much information is shared by the general population. However, because Big Data analysis is so complicated that it is sometimes hard to recognize the meaning and direction of the data, Big Data visualization has come into the picture. Recently, Big Data analysis has been shifting from R to Python [17, 19-21].
Allison Marsh and Katherine Kuisel recorded the contents of the interview and presented the results of secondary processing based on an analysis of the records. In this study, we examine a method of extending such secondary data, produced from already recorded interview data, to related documents through keyword searches.
II. TARGET DATA ANALYSIS
The target Big Data are the results of the 2017 Sports Development Contributor Oral Recording Project. A contributor to sports development is a person who has achieved world-class results in his or her sport or has contributed to the development of sports in Korea.
In June 2017, the National Sports Promotion Agency, a government agency in Korea, established the Sports Development Contributor Selection Committee. The committee selected 10 contributors to the development of sports from 20 candidates using its own screening criteria. The detailed screening criteria cannot be disclosed because they are confidential. The process of this project is shown in Fig. 1.
Fig. 1. Process of oral recording project.
Based on the above process, detailed research was conducted as described below. This description is summarized from Jisun Byun's paper [12].
The project started with the preparation of a ‘public investigation report’. A public investigation report was prepared for each of the 20 candidate narrators suggested by the client. Newspaper articles, magazine articles, and TV programs related to each candidate were collected and organized by year. The report for each narrator was kept to fewer than two pages. The reports were prepared by the researchers of this research group, one per narrator.
After the public investigation reports were submitted, the client held the ‘2017 Sports Development Contributor Selection Meeting’. The selection committee was made up of staff of domestic sports-related public institutions, female athletes, and the author, who was in charge of the research on oral records.
Based on the public investigation reports on the 20 candidates, the members of the selection committee selected 10 narrators, including female athletes and disabled athletes. The criteria for selecting the narrators are not disclosed externally.
In the next step, related data were collected and analyzed for each selected narrator. Data were collected in addition to those already gathered for the public investigation report, and the data on each narrator were carefully reviewed. The characteristics of each narrator's personal life and of the narrators' lives as a whole were synthesized to extract common characteristics.
The selected narrators are as follows. The elite athletes were Kim Seong-hee (golf), Kim In-sik (baseball), Kim Jeong-nam (soccer), Bang Yeol (basketball), Song Nam (athletics for the disabled), Lee Young-sook (aerobics), Lee In-jeong (climbing), and Jo Yun-sik (ice skating). They were outstanding athletes in their respective sports, and even after retiring they continued to contribute to sports as leaders in their disciplines and as university professors.
In the case of In-jeong Lee, it was found that he continued to contribute to mountaineering well after his youth, when he had shown excellent climbing performance. While running a company, In-jeong Lee participated in and contributed to mountaineering-related organizations and led the construction of the mountain museum.
The sports administrators were Soon-hak Bae (sports administration) and Yong-seong Park (bidding for international competitions and sports administration). Yong-seong Park managed the Doosan Group as an entrepreneur and raised the status of Korea through the hosting of international games such as the 1988 Seoul Olympics.
Relevant Big Data were collected and analyzed for each selected narrator, supplementing the data already gathered for the public investigation report, and the data on each narrator were carefully reviewed. The results of analyzing a narrator's life are as follows. Fig. 2 shows the life chapters of a sports contributor.
Fig. 2. A lifetime chapter of a sports contributor.
The subject of this study, In-jeong Lee, is an elite athlete. In-jeong Lee, who lived through the Korean War as a child, found refuge in the mountains amid the chaos that followed. He then entered Middle East High School and joined its mountain club to learn mountaineering properly. While active in the club, he earned pocket money by appearing as an extra in a movie, performing in a mountain climbing scene. He went on to the Dongguk University mountaineering club as a climbing scholarship student and took part in the Vietnam War, through which he acquired the latest mountaineering equipment.
The results of the 2017 Sports Development Contributor Oral Recording Project are the performance plan, oral recording preparation document, oral recording schedule, audio files, video files, photo files, oral recording text, metadata files for the audio, video, and photo files, interim report, and final report. The audio files total 40 hours, as do the video files. The oral transcript, produced by converting the audio into text, amounts to 1,000 A4 sheets. In addition to the oral transcripts themselves, the transcript materials included interview diaries, interview reviews, special items, detailed lists, keywords, and transcript summaries. Among them, the data used in this study is the oral recording text.
The client originally presented 20 candidate narrators. A survey report on their achievements was prepared, and 10 narrators were selected after deliberation by the narrator selection committee.
In the process of data collection, it was confirmed that a large amount of data already existed on the selected narrators.
III. USE OF ANALYSIS RESULTS
Among these records, we analyzed the oral records of In-jeong Lee. The process of data analysis and utilization is shown in Fig. 3.
Fig. 3. Process of Big Data analysis and utilization.
In-jeong Lee's oral transcript is a text document produced from the audio and video recordings of a four-hour interview. The following keywords were extracted from this document.
The process of keyword extraction is as follows. First, we divided In-jeong Lee's oral transcript in chronological order and wrote the following sub-titles.
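The paper does not state how individual keywords were chosen from each chronological section, so the following is only a minimal sketch under the assumption that the most frequent non-stop-word terms in a section are taken as its keywords. The function name `extract_keywords`, the stop-word list, and the sample text are all hypothetical; the real transcript is in Korean and would need morphological analysis rather than the simple tokenization used here.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; a real study would
# use a language-specific list and a morphological analyzer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "i", "was", "on", "at", "after"}


def extract_keywords(sections, top_n=3):
    """Return the top_n most frequent non-stop-word tokens for each section.

    `sections` maps a sub-title (one chronological section of the
    transcript) to the text of that section.
    """
    keywords = {}
    for title, text in sections.items():
        tokens = re.findall(r"[a-z]+", text.lower())
        counts = Counter(t for t in tokens if t not in STOPWORDS)
        keywords[title] = [word for word, _ in counts.most_common(top_n)]
    return keywords


# Made-up sample text, not taken from the actual oral record.
sections = {
    "childhood": "The mountain was near. I climbed the mountain after the war.",
    "school": "The club trained on rock. The club met at school.",
}
result = extract_keywords(sections, top_n=1)
print(result)
```

Frequency counting is only one possible criterion; keywords chosen by a researcher reading each section would fit the same dictionary structure.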
Among the extracted keywords, ‘Mt. Inwang’ and ‘mountaineer’ were entered together into a search engine (Google). Fig. 4 shows the results of this search.
Fig. 4. Search results for 'Mt. Inwang' and 'mountaineer'.
The search returned 443,000 web pages that match both ‘Mt. Inwang’ and ‘mountaineer’.
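The two-keyword search step can be sketched as follows. This is an illustrative sketch, not the procedure actually used in the study: the helper name `build_pair_queries` is hypothetical, wrapping each keyword in double quotes is an assumption about how the terms were combined, and the code only builds the query URLs without retrieving any results or hit counts.

```python
from itertools import combinations
from urllib.parse import urlencode


def build_pair_queries(keywords):
    """Return a list of ((kw1, kw2), url) for every unordered keyword pair."""
    queries = []
    for a, b in combinations(keywords, 2):
        # Assumption: each keyword is quoted so that it is matched as a phrase.
        query = f'"{a}" "{b}"'
        url = "https://www.google.com/search?" + urlencode({"q": query})
        queries.append(((a, b), url))
    return queries


queries = build_pair_queries(["Mt. Inwang", "mountaineer", "climbing"])
for pair, url in queries:
    print(pair, url)
```

Generating every pair makes the search space explicit: n keywords yield n(n-1)/2 queries, from which the researcher can select the pairs worth reading in detail.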
Among the articles retrieved, there is one about Moon Jae-in, the president of the Republic of Korea, whose hobby is mountain climbing. He has experience climbing the Himalayas and continued to climb often even after rising to prominence.
Articles about President Moon Jae-in and Mt. Inwang are shown in Fig. 5.
Fig. 5. Search result for 'Mt. Inwang' and 'Jae-in Moon'.
The above figure is a captured article from the Internet Maeil Business Newspaper [22]. To summarize the contents of the above document, President Moon Jae-in's hobby is climbing. He personally opened the entrance to Mt. Bukak and explained Mt. Bukan to professional mountaineers. “By opening the entrance to Mt. Bukak, it is now possible to connect Mt. Inwang, Mt. Bukak, and Mt. Bukhan.”
In addition, as a presidential candidate he promised to return Bukaksan and Inwangsan to the public. During his presidential campaign, President Moon met with Captain Um and members of the Korea Mountain Federation and promised that "Bukaksan and Inwangsan will be fully opened and returned to the citizens." In 2017, the road in front of the Blue House was opened 24 hours a day; in 2018, the Inwangsan road was fully opened; and on the 1st, the northern side of the Bukaksan Dullegil was opened for the third time.
Next is an article about Mt. Inwang and the narrator In-jeong Lee. Fig. 6 shows the search result for 'Mt. Inwang' and 'In-jeong Lee'. The article is an interview with In-jeong Lee published in the magazine 'San' (Mountain) [23]. The article reads:
Fig. 6. Search result for 'Mt. Inwang' and 'In-jeong Lee'.
The above figure is a captured article from the magazine 'San' (Mountain) [23].
To summarize the contents of the above document, Mt. Inwang was like a mother mountain to the climber In-jeong Lee. In-jeong Lee, who often went hungry after the Korean War, climbed Mt. Inwang and was given food by a monk at a hermitage on the mountain, who also offered him kind words. After returning from the Vietnam War, he took his mother's hand and drank water from the mineral spring on Mt. Inwang. Years passed, and in 1968 the North Korean spy ‘Sinjo Kim’ tried to break into the Blue House through Inwangsan. After this incident, the road to Mt. Inwang was closed; it was reopened in 1993.
Ⅳ. UTILIZATION PLAN
Mt. Inwang, the mountain that figures in the climber's oral record, spans Jongno-gu and Seodaemun-gu in Seoul. Tigers appeared on Mt. Inwang during the Joseon Dynasty. Mt. Inwang also housed a military facility protecting the Blue House and was closed after an espionage incident. Its steep rock faces make it a good place for rock climbing, and it is a mountain of interest to mountaineers. Through oral records such as this one, experiences, legends, and folktales of climbing Mt. Inwang can be collected. These stories can be published as a book or produced as cultural content related to Mt. Inwang.
4.1. Deriving the Theme of the Museum Exhibition
Museum exhibitions with themes such as ‘Mt. Inwang and the Mountaineer’ or ‘Mt. Inwang as Seen by a Mountaineer’ become possible. Since Mt. Inwang is located in Seoul, mountaineers related to Mt. Inwang could be exhibited at the Seoul Museum of History and the Mountain Museum. In addition, a search using the keywords 'climbing' and 'death' returns related documents, so these two keywords can also serve as the theme of a museum exhibition.
4.2. Collection of Museum Exhibits
Materials on Mt. Inwang and Korean climbers can be collected for exhibition themes such as ‘Special Exhibition of Mt. Inwang’ or ‘Korean Mountaineers’. One example is ‘Inwangjesaekdo’ by the painter Gyeomjae Seon Jeong, who was active in the Joseon Dynasty. Paintings, photos, documents, and other materials can be gathered according to the museum exhibition theme.
4.3. Use as Climbing Education Material
A Google search with the keywords ‘climbing’ and ‘death’ returned 829,000 results, of which 19,900 are news articles. The titles of some of these press articles are as follows: ‘Five Korean climbers died while climbing the Himalayas’ [24], ‘Tracking of husband who died on Everest... wife in doubt’ [25], ‘Climber who did not return ... A sad group that has been going on for 47 years’ [26], and ‘Most deaths from climbing Everest occur when descending’ [27].
In this way, articles about climbers who died while climbing can be used as climbing education materials. Analysis of the articles shows that they report deaths caused by carelessness during climbing. In particular, the article ‘Tracking of husband who died on Everest... wife in doubt’ is about a wife who seeks the cause of her husband's death on Everest by visiting those involved in the events leading to it. Such search results can be used as educational materials on the rules to follow when climbing.
Ⅴ. CONCLUSION
In this paper, we proposed a method to utilize the oral records of sports development contributors in Korea. The narrator's interview audio was converted into a document, and keywords were extracted from the document. We grouped the keywords in pairs, searched for each pair on Google, and analyzed the contents of the retrieved documents to come up with ways to utilize them. Rather than analyzing only the oral archives themselves, this research process includes re-collecting a large number of documents, that is, Big Data, using the Google search engine. Oral records and the related Big Data that include them can be utilized in various ways through data analysis. This paper is a preliminary, exploratory study of Big Data research on materials related to oral records. Oral records are unstructured data. It is therefore necessary to find a data mining method appropriate for oral archives and to analyze them with it. If the trial and error encountered in Big Data research on oral records is corrected, research can proceed in a better direction.
In the future, I would like to consider the museum exhibitions made using oral records.
In addition, future work will focus on the oral records of Kim Seong-hee, a contributor to women's golf in Korea. Kim Seong-hee is a contributor to the development of sports in Korea selected by the National Sports Promotion Agency in 2017. If research on Kim Seong-hee is conducted using the methodology that Allison Marsh and Katherine Kuisel applied to Linda Katehi, a Greek-American female scientist, a higher level of research should be possible.
In particular, in the study of women's oral life histories, topics such as 'marriage', 'birth', 'parenting', and the sensitive issue of 'sexual harassment' are very important points for the oral life histories of famous female figures of public interest. We therefore intend to study their research perspective and methodology further and to actively utilize their research methods.
References
- Jisun Byun, "Construction and utilization plan of the Seoul area village-gut electronic culture map," Korean Folklore, vol. 45, no. 1, pp. 147-174, 2007.
- Jisun Byun, "A study on Soukginokwuigut," Ph.D. Thesis, Korea University, Republic of Korea, pp. 1-21, 2008.
- Jung-Hoon Lee, "Application and Limitations of Data Analysis Technology of Classical Translation Text - Focusing on the Analysis of Oral Transcripts Applied to Okapi in the History of the Three Kingdoms," Culture and Convergence, vol. 40, no. 6, pp.22, 2018.
- Jisun Byun, "Big Data Research Using Marine-related Oral Records Documents Shiron-Banwolsihwa National Industrial Complex Construction Witnesses Oral Recording Project," Culture and Convergence, vol. 43, no. 4, pp. 31, 2021.
- Myung-Sook Ko, "Processing of Unstructured Data Using Keyword-Based Theme-Based Analysis," Journal of Information Processing Society. Software and Data Engineering, vol. 6, no. 11, pp. 521-526, 2017.
- Kang-Sun kyung, "Analysis of related words in drama viewership through the collection of unstructured data," Journal of the Korean Society for Information and Communication Sciences, vol. 21, no. 8, pp. 1567-1574, 2018.
- Jeon Soo, Nam-Yong Lee, "A study of major analysis techniques for the analysis of unstructured big data in public institutions," Journal of the Korea IT Policy Management Association, vol. 10, no. 5, pp. 1001-1006, 2018.
- M. F. Abdullah and K. Ahmad, "Business intelligence model for unstructured data management," in Proceeding of IEEE International Conference on Electrical Engineering and Informatics, pp. 473-477, 2015.
- J. He and C. Naughton, "Relational databases for querying XML documents: Limitations and opportunities," in Proceeding of VLDB, Edinburgh, Scotland, 1999.
- Jeon Soo and Nam-Yong Lee, "A study of major analysis techniques for the analysis of unstructured big data in public institutions," Journal of the Korea IT Policy Management Association, vol. 10, no. 5, pp. 1001-1006, 2018.
- Jeon Seong-hyeon, "Heritage' in Busan, the capital of refugees, and the current status and utilization plan of domestic and foreign data," Hando- Busan, vol. 41, no. 2, pp. 1-39, 2021. https://doi.org/10.19169/hd.2021.2.41.1
- Jisun Byun, "A Case Study on the Social Noun's Life Story-Focused on the '2017 Oral history recording Project for Sports Development Contributors," Journal of The Society of Korean Language and Literature, no. 86, pp. 5-40, 2019. https://doi.org/10.33335/KLL.86.1
- R. W. White, H. Song, and J. Liu, "Concept Maps to Support oral History Search and Use," in Proceedings of the 6th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL '06) Digital Libraries, pp. 192-193, 2006.
- Hansen Hsu and Dag Spicer, "Oral History of Alan Cooper," IEEE Annals of the History of Computing, vol. 42, no. 4, pp. 100-118, 2020. https://doi.org/10.1109/mahc.2020.3033744
- A. Marsh, K. Kuisel, "Women in Microwaves: Linda Katehi," IEEE Journal of Microwaves, vol. 1, no. 3, pp. 689-697, Jul. 2021. https://doi.org/10.1109/JMW.2021.3087860
- Jisun Byun, "A Study on Contributor to Sports Development Big data Research using Oral Records: Focused on the Records of the 2017 Sports Development Contributor," in Proceeding of MITA2021, pp. 1-3, 2021.
- J. Huh and K. Seo, "A Preliminary Analysis Model of Big Data for Prevention of Bioaccumulation of Heavy Metal-Based Pollutants: Focusing on the Atmospheric Data Analyses for Smart Farm," Contemporary Engineering Sciences, vol. 9, no. 30, pp. 1447-1462, 2016. https://doi.org/10.12988/ces.2016.69161
- M. Cha, H. Haddadi, F. Benevenuto, and P. K. Gummadi, "Measuring user influence in twitter: the million follower fallacy," in Proceeding of the Fourth International AAAI Conference on Weblogs and Social Media, vol. 4, no. 1, pp. 10-17, 2010.
- H. Chafi, Z. DeVito, A. Moors, T. Rompf, A.K. Sujeeth, P. Hanrahan, M. Odersky, and K. Olukotun, "Language virtualization for heterogeneous parallel computing," in Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA, vol. 45, no. 10, pp. 835-847, Oct. 2010.
- S. Lee and J. Huh, "An effective security measures for nuclear power plant using Big Data analysis approach," The Journal of Supercomputing, vol. 75, no. 8, pp. 4267-4294, 2019. https://doi.org/10.1007/s11227-018-2440-4
- B. Chattopadhyay, L. Lin, W. Liu, S. Mittal, P. Aragonda, V. Lychagina, Y. Kwon, and M. Wong, "Tenzing a SQL Implementation on the Mapreduce Framework," in Proceedings of VLDB, vol. 4, no. 12, pp. 1318-1327, 2011.
- https://m.khan.co.kr/politics/election/article/201704152007001#c2b (accessed on 1 December 2021).
- http://san.chosun.com/m/svc/article.html?contid=2012018062702611 (accessed on 1 December 2021).
- https://www.yna.co.kr.kr/view/AKR20181013049100007 (accessed on 1 December 2021).
- https://www.joongang.co.kr/article/23088127#home (accessed on 1 December 2021).
- https://www.yna.co.kr/view/AKR20181013049100007 (accessed on 1 December 2021).
- https://www.joongang.co.kr/article/3422448#home (accessed on 1 December 2021).