This paper presents a new RGB-D-camera-based visual-inertial odometry (VIO) method, termed DUI-VIO, for estimating the motion state of the camera. First, a Gaussian mixture model (GMM) is employed to model the uncertainty of the depth data for each pixel of the camera's color image. Second, these uncertainties are incorporated into the VIO's initialization and optimization processes to make the state estimate more accurate. For the initialization process, we propose a hybrid Perspective-n-Point (Hybrid-PnP) method that computes the pose change between two camera frames and uses the result to triangulate the depth of an initial set of visual features whose depth values are unavailable from the camera. Hybrid-PnP first computes the rotation from 2D-2D feature correspondences, so that features without depth can also be used, yielding a more accurate rotation estimate. It then computes the translation with a 3D-2D scheme that takes the uncertainties of the depth data into account, yielding a more accurate translation estimate. The more accurate pose change estimated by Hybrid-PnP improves the initialization result and, in turn, the VIO's state estimation performance. In addition, Hybrid-PnP can compute the pose change from only a small number of features with known depth, which improves the reliability of the initialization process. Finally, DUI-VIO incorporates the uncertainties of the inverse depth measurements into the nonlinear optimization process, reducing the state estimation error. Experimental results validate that DUI-VIO outperforms state-of-the-art VIO methods in terms of accuracy and reliability.
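The translation step of Hybrid-PnP can be illustrated with a minimal sketch. It assumes that the rotation R from frame 1 to frame 2 is already known (from the 2D-2D step, which is not shown), that features with depth are given as 3D points P_i in frame 1, and that their observations in frame 2 are normalized image coordinates (u_i, v_i), i.e. pixels with the intrinsics removed. Writing Q_i = R P_i + t, the constraints u_i = Q_x / Q_z and v_i = Q_y / Q_z rearrange into two equations that are linear in t. With R fixed, two features with distinct image coordinates therefore already determine t, which illustrates why only a small number of features with known depth is needed. The inverse-variance weight 1/σ_i², computed from each feature's depth standard deviation, is a hypothetical stand-in for the paper's GMM-based uncertainty handling. The function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]


def mat_vec(R: Mat3, p: Sequence[float]) -> Vec3:
    """Multiply a 3x3 matrix by a 3-vector."""
    return tuple(sum(R[r][c] * p[c] for c in range(3)) for r in range(3))


def solve3(M: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    A = [row[:] + [bi] for row, bi in zip(M, b)]  # augmented matrix [M | b]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration: translation not observable")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            f = A[r][col] / A[col][col]
            for c in range(col, 4):
                A[r][c] -= f * A[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):  # back substitution on the upper-triangular system
        x[r] = (A[r][3] - sum(A[r][c] * x[c] for c in range(r + 1, 3))) / A[r][r]
    return x


def translation_given_rotation(
    R: Mat3,
    points_3d: Sequence[Vec3],
    pixels_2d: Sequence[Tuple[float, float]],
    depth_sigmas: Sequence[float],
) -> List[float]:
    """Weighted linear least-squares estimate of t from 3D-2D matches with known R.

    Each match gives:  t_x - u * t_z = u * (RP)_z - (RP)_x
                       t_y - v * t_z = v * (RP)_z - (RP)_y
    """
    AtA = [[0.0] * 3 for _ in range(3)]
    Atb = [0.0] * 3
    for P, (u, v), sigma in zip(points_3d, pixels_2d, depth_sigmas):
        w = 1.0 / (sigma * sigma)  # hypothetical inverse-variance weight from depth noise
        RP = mat_vec(R, P)
        rows = (([1.0, 0.0, -u], u * RP[2] - RP[0]),
                ([0.0, 1.0, -v], v * RP[2] - RP[1]))
        for a, rhs in rows:  # accumulate the weighted normal equations A^T W A t = A^T W b
            for i in range(3):
                Atb[i] += w * a[i] * rhs
                for j in range(3):
                    AtA[i][j] += w * a[i] * a[j]
    return solve3(AtA, Atb)


if __name__ == "__main__":
    # Synthetic check: rotate 10 degrees about z, translate, and project three points.
    ang = math.radians(10.0)
    R = [[math.cos(ang), -math.sin(ang), 0.0],
         [math.sin(ang), math.cos(ang), 0.0],
         [0.0, 0.0, 1.0]]
    t_true = (0.1, -0.05, 0.2)
    pts = [(0.5, 0.2, 2.0), (-0.3, 0.4, 3.0), (0.1, -0.6, 2.5)]
    pix = []
    for P in pts:
        Q = [a + b for a, b in zip(mat_vec(R, P), t_true)]
        pix.append((Q[0] / Q[2], Q[1] / Q[2]))
    print(translation_given_rotation(R, pts, pix, [0.02, 0.05, 0.03]))
```

With noise-free synthetic data, the weights do not change the solution. They matter only when depths are noisy, where they reduce the influence of features whose depth is uncertain.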