Current Projects

  1. Graduate School of Metaverse Convergence (Sungkyunkwan University) (메타버스 융합대학원(성균관대학교)), Jul. 2023 – Dec. 2028, 
    [PI],  Funded by Institute for Information & communications Technology Promotion (IITP)
  2. Development of Core Technologies for Contents Streaming Copyright on Metaverse Platform (메타버스 플랫폼에서의 콘텐츠 스트리밍 저작권 핵심 기술 개발), Apr. 2023 – Dec. 2026, 
    [Sub-PI], Funded by Korea Creative Content Agency (KOCCA)
  3. Development of Moving Robot-based Immersive Video Acquisition and Processing System in Metaverse (이동형 로봇 기반 실사 메타버스 실감형 비디오의 획득 및 처리 기술 개발) – XR 국제공동협력과제, Jul. 2022 – Dec. 2024,
    [PI], Funded by Institute for Information & communications Technology Promotion (IITP)
  4. Foreground and background matching 3D object streaming technology development (전배경 정합 3D 객체 스트리밍 기술개발), Apr. 2022 – Dec. 2025,
    [Sub-PI], Funded by Institute for Information & communications Technology Promotion (IITP)

Past Projects

  1. Development of immersive video spatial computing technology for ultra-realistic metaverse services (초실감 메타버스 서비스를 위한 실사기반 입체영상 공간컴퓨팅 기술 개발), Jan. 2022 – Dec. 2023,
    [Sub-PI], Funded by Electronics and Telecommunications Research Institute (ETRI)
  2. Development of Ultra High Resolution Unstructured Plenoptic Video Storage/Compression/Streaming Technology for Medium to Large Space (중대형 공간용 초고해상도 비정형 플렌옵틱 영상 저장/압축/전송 기술 개발), Apr. 2020 – Dec. 2023,
    [Sub-PI], Funded by Institute for Information & communications Technology Promotion (IITP)
  3. Development of Low Latency VR·AR Streaming Technology based on 5G Edge Cloud (5G 엣지클라우드 기반 VR·AR 저지연 스트리밍 기술 개발), Apr. 2020 – Dec. 2023,
    [Sub-PI], Funded by Korea Electronics Technology Institute (KETI) & Institute for Information & communications Technology Promotion (IITP)
  4. Development of Surgical Planning Education Platform Using Hololens2 Augmented Reality Device (홀로렌즈2 기반의 증강현실을 이용한 해부학 교육 및 다자간 원격 수술계획 수립 도구 개발), Oct. 2021 – Sep. 2023, 
    (Collaborative research with Prof. Yong Gi Jung, Department of ENT Clinic, Samsung Medical Center (SMC)),
    [Sub-PI], Funded by Samsung Medical Center (SMC) & Sungkyunkwan University (SKKU)
  5. Development of Video Acquisition and Rendering System for 6DoF VR Studio/Theater (6DoF VR 스튜디오와 대형공연장을 위한 영상 취득 및 렌더링 기술 개발), Jun. 2022 – May 2023,
    [PI], Funded by National Research Foundation of Korea (NRF)
  6. Learned Image Compression with Frequency Domain Loss (주파수 영역 손실함수를 활용한 신경망 이미지 압축 방법), Apr. 2021 – Mar. 2022,
    [PI], Funded by Sungkyunkwan University (SKKU)
  7. Development of Real-time 360-degree Video Streaming System for Virtual Reality Theaters (가상현실 공연장을 위한 360도 비디오 실시간 스트리밍 시스템 개발), Mar. 2019 – Feb. 2022
    Funded by National Research Foundation of Korea (NRF)
  8. Development of Brain Disease Prediction/Prevention Technology using Medical Big Data and Human Resource Development Program (의료 빅데이터를 활용한 뇌질환 예측 · 예방 기술개발 및 전문인력 양성), Jun. 2017 – Dec. 2021
    (Collaborative research, PI: Prof. Taegkeun Whangbo, Department of Computer Engineering, Gachon University),
    Funded by the Ministry of Science, ICT & Future Planning (ITRC)
  9. Development of Metaverse-Education Platform Using AI/VR/AR Technologies (인공지능/VR/AR 기술을 활용한 메타버스 교육 플랫폼 개발), Sep. 2021 – Jan. 2022,
    Funded by National Research Foundation of Korea (NRF)
  10. Development of Multi-view Video Processing System with Plenoptic Camera for 6DoF Virtual Reality (6DoF 가상현실 기반 홀로그래피 시스템을 위한 다시점 플렌옵틱 비디오 처리 연구), Nov. 2020 – Dec. 2021,
    [PI], Funded by Sungkyunkwan University (SKKU)
  11. Development of Compression and Transmission Technologies for Ultra High Quality Immersive Videos Supporting 6DoF (6DoF지원 초고화질 몰입형 비디오의 압축 및 전송 핵심 기술 개발), Jul. 2018 – Dec. 2020,
    Funded by Institute for Information & communications Technology Promotion (IITP)
  12. Development of Low-delay VLC Player for AVC (저지연 AVC 영상데이터의 고속 VLC 플레이어 개발), Jun. 2019 – Feb. 2020
    [PI], Funded by Korea Technology and Information Promotion Agency for SMEs (TIPA)
  13. Development of the Augmented Reality / Mixed Reality System by Extending VR Technologies (VR 기술의 확장을 통한 AR/MR 시스템 개발), May 2019 – Aug. 2019
    [PI], Funded by Gachon University, Korea
  14. Development of the Video Scene Analysis Technology for Video Recommendation System (비디오 추천 시스템을 위한 비디오 장면 분석 기술 개발), Dec. 2018 – Aug. 2019
    [PI], Funded by Gachon University, Korea
  15. Development of 3DoF+ 360-degree Video System for Immersive VR services (몰입형 VR 서비스를 위한 3DoF+ 360 비디오 표준화 연구), Jun. 2018 – May. 2019
    [PI], Funded by LG Electronics Research
  16. Personalized Media Communication, Jan. 2018 – May. 2019
    [PI], Funded by InterDigital, USA
  17. Development of Healthcare for Senior with Artificial Intelligence Technology (인공지능기술 기반 시니어 헬스케어 기술 개발), Jul. 2017 – Aug. 2019
    (Collaborative research, PI: Prof. Taegkeun Whangbo, Department of Computer Engineering, Gachon University), Funded by Gyunggi Regional Research Center (GRRC)
  18. Development of Tiled Streaming Technology For High Quality VR Contents Real-Time Service (고품질 VR 콘텐츠 실시간 서비스를 위한 분할영상 스트리밍 기술 개발), Jun. 2017 – Dec. 2019, Funded by Institute for Information & communications Technology Promotion (IITP)
  19. Reference SW Development for Viewport Dependent 360 Video Processing (360 비디오의 사용자 뷰포트 기반 프로세싱을 위한 레퍼런스 SW 개발), Jun. 2017 – Mar. 2018
    [PI], Funded by LG Electronics Research
  20. Development of Multi-sensor Intelligent Edge Camera System (전력설비 고장 감시/사전진단을 위한 다중센서 융합 지능형 AV디바이스 및 플랫폼 개발), May 2017 – Apr. 2020
    [PI], Funded by the Korea Electric Power Corporation Research Institute
  21. Commercialization of smartphone/PC compatible mobile Braille pad and content production/service system, Aug. 2016 – Aug. 2019
    (Collaborative research, PI: Prof. Jinsoo Cho, Department of Computer Engineering, Gachon University), Funded by Commercializations Promotion Agency for R&D Outcomes, Korea
  22. Sensor Networking Protocols for Emergency Data Collection, Jun. 2016 – Nov. 2016
    (Collaborative research, PI: Prof. Sungrae Cho, Ubiquitous Computing Lab., Chung-Ang University, Seoul, Korea), Funded by Electronics and Telecommunications Research Institute (ETRI), Korea
  23. Haptic Video Conferencing System for Individuals with Visual Impairments, Jul. 2015 – Jun. 2018
    [PI], Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (NRF-2015R1C1A1A02037743)
  24. Haptic Telepresence System for Individuals with Visual Impairments, Apr. 2015 – Mar. 2017
    [PI], Funded by Gachon University, Korea

Research

Overall Research Goal: Merciless Video Processing (MVP)
: Video decoding speed-up for mobile VR using Tiled-SHVC and asymmetric mobile CPU multicores.

1. [Video Standard] MPEG Immersive Video (MIV)

MPEG Immersive Video (MIV) is a new standard for coding and streaming immersive video content for applications such as virtual reality and augmented reality. Compared with conventional 2D video delivery, MIV offers better compression, improved image quality, and support for 360-degree and 3D video. MCSL is developing new MIV technologies for a variety of immersive media applications, including virtual reality, immersive education and training, and telepresence.

2. [Video Standard] Neural Network-based Video Representation

Implicit Neural Visual Representation (INVR) is a novel approach to representing images that uses implicit neural networks to map an input image into a high-dimensional latent space. It can store the complex colors and depths of the real world at low storage cost and efficiently render high-quality images or videos for a given viewpoint and viewing direction. We are exploring the potential of INVR for immersive media applications beyond traditional representations such as pixels, voxels, and meshes.
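
As a toy illustration only (not MCSL's actual model), the following sketch defines a small coordinate-based MLP that maps a 5D input (3D position plus viewing direction) to an RGB color, in the spirit of NeRF-style implicit representations; the layer sizes, random weights, and function names are illustrative assumptions.

    import numpy as np

    # Toy illustration of an implicit neural visual representation: a small MLP
    # maps (x, y, z, theta, phi) -> (r, g, b). Weights are random here; in
    # practice they are optimized so the network reproduces the captured scene.

    rng = np.random.default_rng(0)

    def init_mlp(sizes=(5, 64, 64, 3)):
        return [(rng.normal(0, 0.1, (n_in, n_out)), np.zeros(n_out))
                for n_in, n_out in zip(sizes[:-1], sizes[1:])]

    def query(mlp, points_and_dirs):
        """points_and_dirs: (N, 5) array of 3D positions + viewing directions."""
        h = points_and_dirs
        for i, (w, b) in enumerate(mlp):
            h = h @ w + b
            if i < len(mlp) - 1:
                h = np.maximum(h, 0.0)          # ReLU on hidden layers
        return 1.0 / (1.0 + np.exp(-h))         # sigmoid -> RGB in [0, 1]

    if __name__ == "__main__":
        mlp = init_mlp()
        samples = rng.uniform(-1, 1, (4, 5))    # 4 query points along some rays
        print(query(mlp, samples))              # 4 predicted RGB values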

3. [Video Enabling] 6DoF Immersive Video Capturing System

A 3-degrees-of-freedom (3DoF) video is an immersive video that supports only rotational motion around the x-, y-, and z-axes from a fixed position. Going beyond 3DoF, a 6-degrees-of-freedom (6DoF) video lets users translate their viewing position along the three axes while watching, in addition to rotating in place. MCSL works on acquiring synchronized color and depth videos from multiple cameras and synthesizing complete 6DoF video content using camera calibration, color correction, and virtual viewpoint synthesis technology.
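
A bare-bones sketch of the virtual-viewpoint warping such a pipeline relies on, assuming pinhole intrinsics K and world-to-camera poses [R | t] obtained from calibration; the numbers are made up, and real systems add occlusion handling, multi-view blending, and hole filling, which are omitted here.

    import numpy as np

    # Sketch: warp one pixel from a calibrated source camera into a virtual
    # target camera using its depth value (3D warping / DIBR). Intrinsics K and
    # poses [R | t] are assumed to come from camera calibration.

    def warp_pixel(u, v, depth, K_src, R_src, t_src, K_dst, R_dst, t_dst):
        """Back-project (u, v, depth) from the source view, then re-project it
        into the target (virtual) view. Returns target pixel coordinates."""
        # Pixel -> 3D point in the source camera frame.
        p_cam = depth * np.linalg.inv(K_src) @ np.array([u, v, 1.0])
        # Source camera frame -> world frame (pose maps world -> camera).
        p_world = R_src.T @ (p_cam - t_src)
        # World frame -> target camera frame -> pixel.
        p_dst = K_dst @ (R_dst @ p_world + t_dst)
        return p_dst[:2] / p_dst[2]

    if __name__ == "__main__":
        K = np.array([[1000.0, 0, 960], [0, 1000.0, 540], [0, 0, 1]])
        R_id, t0 = np.eye(3), np.zeros(3)
        t_right = np.array([-0.1, 0.0, 0.0])   # virtual camera 10 cm to the right
        print(warp_pixel(960, 540, depth=2.0, K_src=K, R_src=R_id, t_src=t0,
                         K_dst=K, R_dst=R_id, t_dst=t_right))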

Past Research

1. [Video System] HEVC Parallel Processing for Asymmetric Multicore Processors

1.1. Tile Partitioning-based HEVC Parallel Processing Optimization

Recently, there has been a growing need for parallel UHD video processing, and computing systems with asymmetric processors such as ARM big.LITTLE are increasingly common. Thus, a new parallel UHD video processing method optimized for asymmetric multicore systems is needed.
This study proposes a novel HEVC tile partitioning method for parallel processing that analyzes the computational power of asymmetric multicores. The proposed method analyzes (1) the computing power of the asymmetric cores and (2) a regression model of computational complexity per video resolution, and then (3) determines the optimal HEVC tile resolution for each core and partitions/allocates the tiles to suitable cores.
The proposed method minimizes the gap in decoding time between the fastest and slowest CPU cores. Experimental results with the official 4K UHD test sequences show an average 20% improvement in decoding speed on an ARM asymmetric multicore system.
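
A minimal sketch of the tile-sizing idea, assuming relative core throughputs are known (e.g., measured offline) and that tile decoding time grows roughly linearly with the number of CTU rows; this linear stand-in replaces the per-resolution regression model, and all numbers and helper names are illustrative.

    # Sketch: partition a 4K frame into one tile row per core, sized by core speed.
    # Assumption: decoding time of a tile is roughly proportional to its CTU rows
    # divided by the core's relative throughput (a simplification of the
    # regression model described above).

    CTU_SIZE = 64                       # HEVC coding tree unit size in luma samples
    FRAME_HEIGHT = 2160                 # 4K UHD

    def partition_tiles(core_speeds, frame_height=FRAME_HEIGHT):
        """Return the number of CTU rows assigned to each core,
        proportional to its relative throughput."""
        total_rows = (frame_height + CTU_SIZE - 1) // CTU_SIZE   # 34 rows for 2160p
        total_speed = sum(core_speeds)
        rows = [max(1, round(total_rows * s / total_speed)) for s in core_speeds]
        # Fix rounding so the row counts sum exactly to total_rows.
        diff = total_rows - sum(rows)
        rows[rows.index(max(rows))] += diff
        return rows

    if __name__ == "__main__":
        # Example: 4 big cores (speed 1.0) + 4 LITTLE cores (speed 0.4), illustrative values.
        speeds = [1.0] * 4 + [0.4] * 4
        print("CTU rows per tile/core:", partition_tiles(speeds))
        # Big cores receive proportionally taller tiles than LITTLE cores,
        # so all cores finish decoding their tile at about the same time.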

1.2. Prediction Complexity-based HEVC Parallel Processing Optimization

We also study a new HEVC tile allocation method that considers both the computational ability of asymmetric multicores and the computational complexity of each tile.
The computational complexity of each tile is estimated from the amount of HEVC prediction unit (PU) partitioning.
Our implemented system (1) counts and sorts the PU partitioning of each tile and (2) allocates tiles to the asymmetric big/LITTLE cores according to their expected computational complexity. Experiments used the 4K PeopleOnStreet test sequence, three coding structures defined in the HEVC common test conditions (CTC), namely random access (RA), all intra (AI), and low-delay B (LDB), and a 6-core platform consisting of 2 big cores and 4 LITTLE cores.
In the experiments, the amount of PU partitioning and the computational complexity (decoding time) showed a close correlation, and the average decoding-time gains were 5.24% for 6 tiles and 8.44% for 12 tiles, respectively. With adaptive allocation, the proposed method achieves an average gain of 18.03%.
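
A rough sketch of the allocation step, assuming a per-tile PU-partitioning count is already available from bitstream parsing; the greedy longest-processing-time style assignment below is one plausible balancing strategy, not necessarily the exact scheme used in the study.

    # Sketch: assign HEVC tiles to asymmetric cores using per-tile PU-partition
    # counts as a complexity proxy. Greedy: heaviest tile goes to the core with
    # the lowest predicted finish time (load / relative speed).

    def allocate_tiles(pu_counts, core_speeds):
        """pu_counts: PU-partitioning count per tile (complexity proxy).
        core_speeds: relative throughput per core (big > LITTLE).
        Returns a list of tile indices per core."""
        assignment = [[] for _ in core_speeds]
        load = [0.0] * len(core_speeds)
        # Sort tiles from most to least complex, then place each on the core
        # that would finish it earliest.
        for tile in sorted(range(len(pu_counts)), key=lambda t: -pu_counts[t]):
            core = min(range(len(core_speeds)),
                       key=lambda c: (load[c] + pu_counts[tile]) / core_speeds[c])
            assignment[core].append(tile)
            load[core] += pu_counts[tile]
        return assignment

    if __name__ == "__main__":
        # Illustrative numbers: 6 tiles, 2 big cores + 4 LITTLE cores.
        pu_counts = [5200, 4100, 3900, 2600, 1800, 900]
        core_speeds = [1.0, 1.0, 0.45, 0.45, 0.45, 0.45]
        for core, tiles in enumerate(allocate_tiles(pu_counts, core_speeds)):
            print(f"core {core} (speed {core_speeds[core]}): tiles {tiles}")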


2. [Cyber Physical Systems (CPS)] Haptic Telepresence System for Individuals with Visual Impairments

This study proposes a novel video conferencing system for individuals with visual impairments using an RGB-D sensor and a haptic device. Recent improvements in RGB-D sensors have enabled real-time access to 3D spatial information in the form of point clouds. However, the real-time representation of this data as a tangible haptic experience has not been sufficiently explored, especially for telepresence. Thus, the proposed system addresses telepresence of remote 3D information captured with an RGB-D sensor, through video encoding and 3D depth-map enhancement that exploit both the 2D image and the depth map. In our implemented system, Microsoft's Kinect serves as the RGB-D sensor and provides depth and color images at approximately 30 fps. Each Kinect depth frame is buffered, projected into a 3D coordinate system at a resolution of 640 by 480, and then transformed into a 3D map structure.
To verify the benefits of the proposed video content adaptation method for individuals with visual impairments, this study conducts 3D video encoding and user testing. In conclusion, the proposed system offers individuals with visual impairments a new haptic telepresence experience with enhanced interactivity.

With advances in information and communication technology and the spread of smartphones, ordinary users can make video calls with family and friends anytime and anywhere and can enjoy the videos and photos they want. Individuals with visual impairments, however, have consistently been excluded from such services because of a lack of dedicated research and social infrastructure. To improve this situation, this study proposes a new kind of haptic TV system for individuals with visual impairments. The proposed system consists of 3D capture technology, real-time transmission/streaming technology, and haptic device and actuator control technology; even with limited functionality, it aims to let visually impaired users with restricted mobility feel and recognize the facial contours of remote family members and experience TV and photo viewing. We are currently extending this work to the development of a 2D Braille pad.
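
The depth-to-3D projection step described above can be sketched as follows, assuming a pinhole model with illustrative (not calibrated Kinect) intrinsics; the real system additionally buffers frames and builds a 3D map structure on top of this.

    import numpy as np

    # Sketch: back-project a 640x480 depth frame into a 3D point cloud using a
    # pinhole camera model. Intrinsics below are illustrative, not calibrated
    # Kinect values.

    FX, FY = 525.0, 525.0      # focal lengths in pixels (assumed)
    CX, CY = 319.5, 239.5      # principal point (assumed)

    def depth_to_point_cloud(depth_mm):
        """depth_mm: (480, 640) array of depth values in millimetres (0 = no data).
        Returns an (N, 3) array of XYZ points in metres."""
        h, w = depth_mm.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth_mm.astype(np.float32) / 1000.0            # mm -> m
        x = (u - CX) * z / FX
        y = (v - CY) * z / FY
        points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        return points[points[:, 2] > 0]                      # drop invalid pixels

    if __name__ == "__main__":
        fake_depth = np.full((480, 640), 1500, dtype=np.uint16)  # flat wall at 1.5 m
        cloud = depth_to_point_cloud(fake_depth)
        print(cloud.shape)        # (307200, 3)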

3. [Video Standard] Viewport Dependent 360-Degree Video Streaming

360-degree video streaming for virtual reality (VR) is emerging. However, the computational ability and network bandwidth available to current VR devices are limited relative to what high-quality VR requires. To overcome these limits, we propose a new viewport-dependent streaming method that transmits 360-degree videos using High Efficiency Video Coding (HEVC) and the scalability extension of HEVC (SHVC). The proposed SHVC and HEVC encoders generate bitstreams whose tiles can be transmitted independently. The proposed extractor then extracts, from the encoder's output, the bitstream of the tiles corresponding to the viewport. The SHVC bitstream extracted by the proposed method consists of (i) an SHVC base layer (BL) that represents the entire 360-degree area and (ii) an SHVC enhancement layer (EL) with region-of-interest (ROI) tiles. When the proposed HEVC encoder is used, low-resolution and high-resolution sequences are encoded separately and serve as the BL and EL. By transmitting the BL (low resolution) in full and the EL (high resolution) only for ROI tiles, the proposed method reduces both the computational complexity on the decoder side and the network bandwidth.
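
A simplified sketch of the extractor's ROI tile selection, assuming the equirectangular frame is split into a uniform tile grid; the grid size, FoV, and selection margin are illustrative assumptions.

    # Sketch: pick the enhancement-layer tiles that cover the current viewport on
    # an equirectangular (ERP) 360-degree frame. The base layer is always sent in
    # full; only the tiles returned here are extracted from the enhancement layer.

    def roi_tiles(yaw_deg, pitch_deg, fov_deg=90.0, cols=8, rows=4):
        """Return (row, col) indices of ERP tiles overlapping the viewport.
        yaw in [-180, 180), pitch in [-90, 90)."""
        tile_w, tile_h = 360.0 / cols, 180.0 / rows
        half = fov_deg / 2.0
        selected = []
        for r in range(rows):
            for c in range(cols):
                lon = -180.0 + (c + 0.5) * tile_w     # tile centre longitude
                lat = 90.0 - (r + 0.5) * tile_h       # tile centre latitude
                dlon = (lon - yaw_deg + 180.0) % 360.0 - 180.0   # wrap-around
                if abs(dlon) <= half + tile_w / 2 and abs(lat - pitch_deg) <= half + tile_h / 2:
                    selected.append((r, c))
        return selected

    if __name__ == "__main__":
        print(roi_tiles(yaw_deg=0.0, pitch_deg=0.0))   # tiles around the front view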

4. [Video Enabling] Tile-based Streaming with Saliency map

We also experiment with a tile-based approach that uses a saliency map, which captures human visual attention on the content, to deliver high-quality tiles in the region of interest (ROI). In our method, 360-degree videos are encoded at various quality levels with MCTS techniques, and tile quality levels are assigned using a saliency map predicted by an existing convolutional neural network (CNN) model. The resulting mixed-quality videos enable efficient 360-degree video streaming. We obtain the viewport position from the eye-fixation history (e.g., longitude, latitude, frame), render the viewport as a 2D image with a 90°×90° FoV from the ERP frame, and then compute the peak signal-to-noise ratio (PSNR) of the rendered viewport. Through an evaluation using the Salient360! dataset, we show that the proposed method yields significant bandwidth savings with little loss of viewport image quality.
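
The per-tile quality assignment can be sketched as below, assuming the CNN-predicted saliency map is available as a 2D array aligned with the ERP frame; the thresholds and number of quality levels are illustrative.

    import numpy as np

    # Sketch: map a predicted saliency map onto an ERP tile grid and pick a
    # quality level per tile. Higher mean saliency -> higher-quality tile stream.

    def tile_quality_levels(saliency, cols=8, rows=4, levels=(0, 1, 2)):
        """saliency: 2D array in [0, 1] aligned with the ERP frame.
        Returns a (rows, cols) array of quality-level indices (0 = best)."""
        h, w = saliency.shape
        qualities = np.empty((rows, cols), dtype=int)
        for r in range(rows):
            for c in range(cols):
                tile = saliency[r * h // rows:(r + 1) * h // rows,
                                c * w // cols:(c + 1) * w // cols]
                m = tile.mean()
                if m > 0.5:
                    qualities[r, c] = levels[0]     # most salient: best quality
                elif m > 0.2:
                    qualities[r, c] = levels[1]
                else:
                    qualities[r, c] = levels[2]     # background: lowest quality
        return qualities

    if __name__ == "__main__":
        sal = np.random.rand(180, 360)              # stand-in for the CNN prediction
        print(tile_quality_levels(sal))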

5. [Video Standard] 3DoF+ 360-degree Video System for Immersive VR services

Immersive video streaming has become a very popular service. To improve the user experience, user-movement-adaptive video streaming, 3DoF+, has emerged and is expected to meet this growing demand. Providing a high-quality immersive experience while staying within the bandwidth limit is challenging, because 3DoF+ requires multi-view video transmission.
The proposed system addresses a bitrate-efficient 360 system architecture for 3DoF+ 360 video streaming based on two main ideas: (i) multi-view video redundancy removal and (ii) multi-view video packing. In this research, the High Efficiency Video Coding (HEVC) reference model and the reference view synthesizer (RVS) are used. The proposed system requires fewer decoders on the client side, which reduces the burden of immersive video streaming.
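
A toy sketch of the packing idea, assuming all source views share one resolution and are simply arranged into a grid atlas so that a single decoder instance suffices; the inter-view redundancy removal step is omitted here.

    import numpy as np

    # Sketch: pack N same-sized views (texture or depth) into one atlas frame
    # laid out as a grid, so the client needs a single decoder instead of N.

    def pack_views(views, cols):
        """views: list of (H, W, 3) arrays of identical shape.
        Returns the packed atlas and a list of (x, y) offsets for unpacking."""
        h, w, _ = views[0].shape
        rows = (len(views) + cols - 1) // cols
        atlas = np.zeros((rows * h, cols * w, 3), dtype=views[0].dtype)
        offsets = []
        for i, view in enumerate(views):
            y, x = (i // cols) * h, (i % cols) * w
            atlas[y:y + h, x:x + w] = view
            offsets.append((x, y))
        return atlas, offsets

    if __name__ == "__main__":
        views = [np.full((270, 480, 3), i * 16, dtype=np.uint8) for i in range(6)]
        atlas, offsets = pack_views(views, cols=3)
        print(atlas.shape, offsets)      # (540, 1440, 3) and per-view offsets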

6. [Video Streaming] Real-time streaming using private protocol QRTP

Currently, High Efficiency Video Coding (HEVC) has become the most promising video technology. However, the deployment of HEVC in video streaming systems has been limited by factors such as cost, design complexity, and compatibility with existing systems. While HEVC deployment still faces many restrictions, H.264/AVC remains a practical option for current video streaming systems. This paper presents an adaptive method to handle a video stream using video coding on a designed integrated circuit (IC) with a private network processor. The proposed system makes it possible to stream multimedia data from a camera or other video sources. A sequence of video/audio packets from the video source is forwarded to the designed IC, called the transmitter (Tx). The transmitter processes the input data into a real-time stream using the private protocol QRTP, which follows the Real-time Transport Protocol (RTP) for video/audio, and transmits it to the video client over the Internet. The video client can use a hardware or software decoder to process the received video stream. Tx encodes the multimedia data with H.264/AVC or HEVC. By controlling the message exchange between Tx and Rx (the video client), streaming can be established rapidly, and a throughput of around 50 Mbps can be achieved with a latency of approximately 80 ms.
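
Because QRTP is a private protocol, the sketch below only illustrates the generic RTP-style packetization the description implies: an encoded access unit is split into MTU-sized payloads, each prefixed with a sequence number and timestamp. The header layout and field sizes are assumptions, not the actual QRTP format.

    import struct

    # Sketch: RTP-style packetization of one encoded video access unit. The
    # 8-byte header layout (sequence number, timestamp) is illustrative only;
    # the real QRTP header is proprietary and not documented here.

    MTU_PAYLOAD = 1400   # bytes of payload per packet (assumed)

    def packetize(access_unit: bytes, seq_start: int, timestamp_90khz: int):
        """Split an encoded frame into packets: [seq:u32][ts:u32][payload]."""
        packets = []
        for i in range(0, len(access_unit), MTU_PAYLOAD):
            header = struct.pack("!II", (seq_start + len(packets)) & 0xFFFFFFFF,
                                 timestamp_90khz & 0xFFFFFFFF)
            packets.append(header + access_unit[i:i + MTU_PAYLOAD])
        return packets

    if __name__ == "__main__":
        frame = bytes(10000)                     # stand-in for an H.264/AVC frame
        pkts = packetize(frame, seq_start=0, timestamp_90khz=3000)
        print(len(pkts), "packets,", len(pkts[0]), "bytes in the first one")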

7. [Video Analytics] Personalized Video Summarization (Movie Trailer) and Recommendation

Since AlexNet, the deep learning algorithm that won the ImageNet challenge, was announced in 2012, many research groups have been studying video analysis with deep learning. Recently, global companies such as Netflix, Facebook, Amazon, and Flickr have been researching user-customized video streaming services that exploit the rich metadata contained in videos and the many analyzable features. This research feeds into home VoD services, web video streaming services on mobile devices such as smartphones and tablets, and personal media broadcasting services such as YouTube and Twitch. This study analyzes the viewer and recognizes the video's genre and emotion while the viewer watches, in order to provide a personalized streaming service. The proposed method (1) analyzes the genre and emotion of the video being watched with deep learning and (2) analyzes the viewer's reaction to those results to classify the emotion as positive or negative. In this way, a personalized video service is provided to the viewer, and the video provider can supply videos for viewers more accurately and automatically. The system flow is as follows: step (1) uses metadata such as the title and subtitles together with key frames extracted from the video and audio, from which features such as color and motion are selectively exploited; step (2) uses a webcam next to the video display and a deep learning-based face recognition algorithm to extract the user's emotion. The server receives these results and measures the degree of match between the video and the user, and the measured match is used to recommend the most suitable video offered by the video provider.
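
A minimal sketch of the matching step in (2), assuming the video analysis and the webcam-based emotion recognition each output a probability distribution over the same emotion labels; the cosine-similarity score and the label set are illustrative choices, not the exact method used.

    import math

    # Sketch: score how well a video's predicted emotion profile matches the
    # viewer's reaction measured from webcam-based facial emotion recognition.
    # Both inputs are probability distributions over the same emotion labels.

    EMOTIONS = ["joy", "sadness", "fear", "anger", "surprise"]

    def match_score(video_profile, viewer_profile):
        """Cosine similarity between two emotion distributions (higher = better)."""
        dot = sum(v * u for v, u in zip(video_profile, viewer_profile))
        norm = math.sqrt(sum(v * v for v in video_profile)) * \
               math.sqrt(sum(u * u for u in viewer_profile))
        return dot / norm if norm else 0.0

    def recommend(candidates, viewer_profile, top_k=3):
        """candidates: dict of title -> emotion distribution."""
        ranked = sorted(candidates.items(),
                        key=lambda kv: match_score(kv[1], viewer_profile),
                        reverse=True)
        return [title for title, _ in ranked[:top_k]]

    if __name__ == "__main__":
        viewer = [0.6, 0.05, 0.05, 0.05, 0.25]                 # mostly joy/surprise
        catalog = {"comedy_A": [0.7, 0.0, 0.0, 0.05, 0.25],
                   "thriller_B": [0.1, 0.2, 0.5, 0.1, 0.1],
                   "drama_C": [0.2, 0.5, 0.1, 0.1, 0.1]}
        print(recommend(catalog, viewer, top_k=2))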

8. [Video Analytics] Client-Driven Personalized Trailers Framework Using Thumbnail Containers

Movie trailers are typically produced with a one-size-fits-all framework. Streaming platforms now seek to overcome this limitation and provide personalized trailers by investigating centralized server-side solutions. These solutions analyze personal user data, which raises two major issues: privacy violations and an enormous demand for computational resources. This paper proposes an innovative, low-power, client-driven method to facilitate personalized trailer generation. It tackles the complex process of detecting personalized actions in real time from lightweight thumbnail containers. An HTTP Live Streaming (HLS) server and client are configured locally to validate the proposed method. The system is designed to support a wide range of client hardware with different computational capabilities and can adapt to network conditions. To test the effectiveness of the method, twenty-five broadcast movies, specifically in the western and sports genres, are evaluated. To the best of our knowledge, this is the first client-driven framework that uses thumbnail containers as input to facilitate the trailer generation process.
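
A schematic sketch of the client-side selection step, assuming each HLS segment has an associated thumbnail and a lightweight on-device scorer (stubbed out here) returns a per-thumbnail action score; the segment length, scoring stub, and function names are placeholders.

    # Sketch: client-driven trailer assembly from HLS thumbnail containers.
    # Each segment contributes one thumbnail; a lightweight scorer (stubbed here)
    # estimates how likely the segment shows an action of interest, and the top
    # segments form the personalized trailer timeline.

    SEGMENT_SECONDS = 6          # assumed HLS segment duration

    def score_thumbnail(thumbnail) -> float:
        """Placeholder for a lightweight on-device model (e.g., a small CNN)."""
        raise NotImplementedError

    def build_trailer(thumbnails, max_len_s=60, scorer=score_thumbnail):
        """thumbnails: list of per-segment thumbnail images (decoded or raw bytes).
        Returns the segment indices, in playback order, that make up the trailer."""
        scored = [(scorer(t), i) for i, t in enumerate(thumbnails)]
        budget = max_len_s // SEGMENT_SECONDS
        chosen = [i for _, i in sorted(scored, reverse=True)[:budget]]
        return sorted(chosen)                      # keep original playback order

    if __name__ == "__main__":
        fake_scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.05, 0.6, 0.4, 0.95]
        segments = build_trailer(fake_scores, max_len_s=30, scorer=lambda s: s)
        print("trailer uses segments:", segments)   # 5 segments x 6 s = 30 s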