Date of Award
Master of Science
Electrical and Computer Engineering
The advent of the Chip Multiprocessor (CMP), with its high performance, compact size, and power efficiency, has made many engineering marvels possible. CMPs have played a great role in industrial automation, autonomous vehicles, embedded AI, and medical prognosis. In industrial automation and autonomous vehicles, many critical tasks must run in isolation, without any interference or delay. Virtualization software (hypervisors) is used to isolate applications on CMPs. Hypervisors such as Xen and KVM are fully fledged hypervisors with many features and their own scheduling schemes, and therefore scheduling overhead. In this thesis we use a lightweight partitioning hypervisor, Jailhouse, to provide isolation for critical tasks. Our experiments show that Jailhouse provides better isolation without any scheduling overhead, which makes it suitable for real-time applications. Because Jailhouse partitions the available resources among cells without any emulation, the number of cells that can be created is limited. In addition, the resources of the root cell (which runs Linux) shrink as cells are created, so applications running on it may suffer from resource constraints. We propose adaptive offloading to address this issue, which shows performance and quality improvement.

We also explore deep learning and its implementation on edge computing devices. The availability of GPUs and large data sets has made it possible to apply state-of-the-art deep learning in many fields: computer vision, medical diagnosis, image processing, surveillance, etc. Deep learning consists of two parts, training and inference, both of which are power- and compute-intensive. We implemented the state-of-the-art YOLOv3 object detection algorithm on the NVIDIA AGX Xavier, utilizing its Tensor cores and NVDLA cores through the NVIDIA TensorRT and CUDA libraries. At FP16 precision, this resulted in more than a 100% improvement in performance and a significant decrease in power consumption compared with the original YOLOv3.
We explore FP16 and INT8 precision with TensorRT and the DLA. INT8 precision further optimizes performance and power consumption, with some compromise in accuracy. Our results show that the inference engine can be optimized by using TensorRT and the DLA on an edge computing device such as the Jetson Xavier.
This thesis is only available for download to the SIUC community. Current SIUC affiliates may also access this paper off campus by searching Dissertations & Theses @ Southern Illinois University Carbondale from ProQuest. Others should contact the interlibrary loan department of their local library or ProQuest's Dissertation Express service.