As Deep Neural Networks (DNNs) are widely used in applications such as computer vision tasks like image segmentation and recognition, it is important to reduce the makespan of DNN computation, especially on mobile devices. Offloading is a viable solution: it moves computation from a slow mobile device to a fast but remote server in the cloud. Because DNN computation consists of a multi-stage processing pipeline, it is critical to decide at which stage offloading should occur in order to minimize the makespan. Our observations show that, as more DNN layers are computed on the mobile device, the local computation time increases linearly, while the offloading time decreases monotonically and follows a convex curve. Based on these observations, we first study the optimal partition and scheduling for a single line-structure DNN. We then extend the result to multiple line-structure DNNs. Heuristic results for general-structure DNNs, represented by Directed Acyclic Graphs (DAGs), are also discussed based on a path-based scheduling policy. Our proposed solutions are validated via a real system implementation.