Lesson 26:Diego 1# —物体识别和定位

google最近公布了基于tensorflow物体识别的Api,本文将利用Diego1#的深度摄像头调用物体识别API,在识别物体的同时计算物体与出机器人摄像头的距离。原理如下:

  • Object Detection 订阅Openni发布的Image消息,识别视频帧中的物体
  • Object Depth 订阅Openni发布的Depth Image消息,根据Object Detection识别出的物体列表,对应到Depth Image的位置,计算Object的深度信息
  • Publish Image将视频帧经过处理,增加识别信息Lable后,以Compressed Image消息发布出去,可以方便其他应用订阅

1.创建diego_tensorflow包

由于我们使用的tensorflow,所以我们首先需要安装tensorflow,可以参考tensorflow官方安装说明https://www.tensorflow.org/install/install_linux

Object_detection相关依赖安装见官方安装说明https://github.com/tensorflow/models/blob/master/object_detection/g3doc/installation.md

执行如下命令创建diego_tensorflow包

catkin_create_pkg diego_tensorflow std_msgs rospy roscpp cv_bridge

在diego_tensorflow目录下创建两个子目录

  • scripts:存放相关代码
  • launch:存放launch启动文件

创建完成后diego_tensorflow目录如下图所示:

下载object_detection包:https://github.com/tensorflow/models

下载后将object_detection包上传到diego_tensorflow/scripts目录下,如果自己做模型训练还需有上传slim包到diego_tensorflow/scripts目录下

物体识别的代码都写在ObjectDetectionDemo.py文件中,其中有关识别的代码大部分参考tensorflow官方示例,这里将其包装成为一个ROS节点,并增加物体深度数据的计算

2.ROS节点源代码

#!/usr/bin/env python

import rospy
from sensor_msgs.msg import Image as ROSImage
from sensor_msgs.msg import CompressedImage as ROSImage_C
from cv_bridge import CvBridge
import cv2
import matplotlib
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import uuid
from collections import defaultdict
from io import StringIO
from PIL import Image
from math import isnan

# This is needed since the notebook is stored in the object_detection folder.
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

class ObjectDetectionDemo():
    def __init__(self):
	rospy.init_node('object_detection_demo')
	
	# Set the shutdown function (stop the robot)
        rospy.on_shutdown(self.shutdown)
        
        self.depth_image =None
        
        self.depth_array = None
        
        model_path = rospy.get_param("~model_path", "")
        image_topic = rospy.get_param("~image_topic", "")
        depth_image_topic = rospy.get_param("~depth_image_topic", "")
        if_down=False
        self.vfc=0
        
        self._cv_bridge = CvBridge()
        
        # What model to download.
	#MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
	#MODEL_NAME='faster_rcnn_resnet101_coco_11_06_2017'
	MODEL_NAME ='ssd_inception_v2_coco_11_06_2017'
	#MODEL_NAME ='diego_object_detection_v1_07_2017'
	#MODEL_NAME ='faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017'
	MODEL_FILE = MODEL_NAME + '.tar.gz'
	DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

	# Path to frozen detection graph. This is the actual model that is used for the object detection.
	PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

	# List of the strings that is used to add correct label for each box.
	PATH_TO_LABELS = os.path.join(model_path+'/data', 'mscoco_label_map.pbtxt')
	#PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

	NUM_CLASSES = 90
	
	if if_down:
		opener = urllib.request.URLopener()
		opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
		tar_file = tarfile.open(MODEL_FILE)
		for file in tar_file.getmembers():
			file_name = os.path.basename(file.name)
			if 'frozen_inference_graph.pb' in file_name:
        			tar_file.extract(file, os.getcwd())


	rospy.loginfo("begin initilize the tf...")
	self.detection_graph = tf.Graph()
	with self.detection_graph.as_default():
		od_graph_def = tf.GraphDef()
		with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
			serialized_graph = fid.read()
			od_graph_def.ParseFromString(serialized_graph)
			tf.import_graph_def(od_graph_def, name='')

	label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
	categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
	self.category_index = label_map_util.create_category_index(categories)
	
	# Subscribe to the registered depth image
        rospy.Subscriber(depth_image_topic, ROSImage, self.convert_depth_image)
        
        # Wait for the depth image to become available
        #rospy.wait_for_message('depth_image', ROSImage)
	
	self._sub = rospy.Subscriber(image_topic, ROSImage, self.callback, queue_size=1)	
	self._pub = rospy.Publisher('object_detection', ROSImage_C, queue_size=1)
	
	rospy.loginfo("initialization has finished...")
	
	
    def convert_depth_image(self, ros_image):
        # Use cv_bridge() to convert the ROS image to OpenCV format
        # The depth image is a single-channel float32 image
        self.depth_image = self._cv_bridge.imgmsg_to_cv2(ros_image, "32FC1")

        # Convert the depth image to a Numpy array
        self.depth_array = np.array(self.depth_image, dtype=np.float32)
        #print(self.depth_array)
        
    def callback(self,image_msg):
	if self.vfc<12:
		self.vfc=self.vfc+1
	else:
		self.callbackfun(image_msg)
		self.vfc=0	
		    	
    def box_depth(self,boxes,im_width, im_height):
	# Now compute the depth component
        depth=[]
	for row in boxes[0]:
		n_z = sum_z = mean_z = 0
		# Get the min/max x and y values from the ROI
		if row[0]<row[1]:
			min_x = row[0]*im_width
			max_x = row[1]*im_width
		else:
			min_x = row[1]*im_width
			max_x = row[0]*im_width
			
		if row[2]<row[3]:
			min_y = row[2]*im_height
			max_y = row[3]*im_height
		else:
			min_y = row[3]*im_height
			max_y = row[2]*im_height
		# Get the average depth value over the ROI
		for x in range(int(min_x), int(max_x)):
            		for y in range(int(min_y), int(max_y)):
                		try:
					z = self.depth_array[y, x]
				except:
					continue
                
				# Depth values can be NaN which should be ignored
				if isnan(z):
					z=6
					continue
				else:
					sum_z = sum_z + z
					n_z += 1 
			mean_z = sum_z / (n_z+0.01)
		depth.append(mean_z)
	return depth
    def callbackfun(self, image_msg):
	with self.detection_graph.as_default():
		with tf.Session(graph=self.detection_graph) as sess:
			 cv_image = self._cv_bridge.imgmsg_to_cv2(image_msg, "bgr8")
			 #cv_image = (self._cv_bridge.imgmsg_to_cv2(image_msg, "bgr8"))[300:450, 150:380]
			 pil_img = Image.fromarray(cv_image)			 
			 (im_width, im_height) = pil_img.size			 
			 # the array based representation of the image will be used later in order to prepare the
			 # result image with boxes and labels on it.
			 image_np =np.array(pil_img.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)
			 # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
			 image_np_expanded = np.expand_dims(image_np, axis=0)
			 image_tensor = self.detection_graph.get_tensor_by_name('image_tensor:0')
			 # Each box represents a part of the image where a particular object was detected.
			 boxes = self.detection_graph.get_tensor_by_name('detection_boxes:0')
			 # Each score represent how level of confidence for each of the objects.
			 # Score is shown on the result image, together with the class label.
			 scores = self.detection_graph.get_tensor_by_name('detection_scores:0')
			 classes = self.detection_graph.get_tensor_by_name('detection_classes:0')
			 num_detections = self.detection_graph.get_tensor_by_name('num_detections:0')
			
			 # Actual detection.
			 (boxes, scores, classes, num_detections) = sess.run(
			 	[boxes, scores, classes, num_detections],
			 	feed_dict={image_tensor: image_np_expanded})
			 box_depths=self.box_depth(boxes,im_width,im_height)
			 print(box_depths)
			 # Visualization of the results of a detection.
			 vis_util.visualize_boxes_and_labels_on_image_array(
			 	image_np,
			 	np.squeeze(boxes),
			 	np.squeeze(classes).astype(np.int32),
			 	np.squeeze(scores),
			 	self.category_index,
			 	use_normalized_coordinates=True,
			 	line_thickness=8)
			 
			 ros_compressed_image=self._cv_bridge.cv2_to_compressed_imgmsg(image_np)
			 self._pub.publish(ros_compressed_image)
			
    
    def shutdown(self):
        rospy.loginfo("Stopping the tensorflow object detection...")
        rospy.sleep(1) 
        
if __name__ == '__main__':
    try:
        ObjectDetectionDemo()
        rospy.spin()
    except rospy.ROSInterruptException:
        rospy.loginfo("RosTensorFlow_ObjectDetectionDemo has started.")

下面我们来解释主要的代码逻辑

    def __init__(self):
	rospy.init_node('object_detection_demo')
	
	# Set the shutdown function (stop the robot)
        rospy.on_shutdown(self.shutdown)
        
        self.depth_image =None
        
        self.depth_array = None
        
        model_path = rospy.get_param("~model_path", "")
        image_topic = rospy.get_param("~image_topic", "")
        depth_image_topic = rospy.get_param("~depth_image_topic", "")
        if_down=False
        self.vfc=0
        
        self._cv_bridge = CvBridge()

以上代码是ROS的标准初始化代码,变量的初始化,及launch文件中参数的读取

  • model_path定义object_detection所使用的模型路径
  • image_topic订阅的image主题
  • depth_image_topic订阅的深度image主题
        # What model to download.
	#MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
	#MODEL_NAME='faster_rcnn_resnet101_coco_11_06_2017'
	MODEL_NAME ='ssd_inception_v2_coco_11_06_2017'
	#MODEL_NAME ='diego_object_detection_v1_07_2017'
	#MODEL_NAME ='faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017'
	MODEL_FILE = MODEL_NAME + '.tar.gz'
	DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

	# Path to frozen detection graph. This is the actual model that is used for the object detection.
	PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

	# List of the strings that is used to add correct label for each box.
	PATH_TO_LABELS = os.path.join(model_path+'/data', 'mscoco_label_map.pbtxt')
	#PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

	NUM_CLASSES = 90
	
	if if_down:
		opener = urllib.request.URLopener()
		opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
		tar_file = tarfile.open(MODEL_FILE)
		for file in tar_file.getmembers():
			file_name = os.path.basename(file.name)
			if 'frozen_inference_graph.pb' in file_name:
        			tar_file.extract(file, os.getcwd())

以上代码设置object_detection所使用的模型,及下载解压相应的文件,这里设置了一个if_down的开关,第一次运行的时候可以打开此开关下载,以后可以关掉,因为下载的时间比较长,下载一次后面就无需再下载了。

self.detection_graph = tf.Graph()
	with self.detection_graph.as_default():
		od_graph_def = tf.GraphDef()
		with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
			serialized_graph = fid.read()
			od_graph_def.ParseFromString(serialized_graph)
			tf.import_graph_def(od_graph_def, name='')

	label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
	categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
	self.category_index = label_map_util.create_category_index(categories)

以上代码是tensorflow的初始化代码。

        # Subscribe to the registered depth image
        rospy.Subscriber(depth_image_topic, ROSImage, self.convert_depth_image)
        
        # Wait for the depth image to become available
        #rospy.wait_for_message('depth_image', ROSImage)
	
	self._sub = rospy.Subscriber(image_topic, ROSImage, self.callback, queue_size=1)	
	self._pub = rospy.Publisher('object_detection', ROSImage_C, queue_size=1)

以上代码,我们定义此节点订阅depth_image和image两个topic,同时发布一个名为object_detection的Compressed Imagetopic
depth_image的回调函数是convert_depth_image
image的回调函数是callback

    def convert_depth_image(self, ros_image):
        # Use cv_bridge() to convert the ROS image to OpenCV format
        # The depth image is a single-channel float32 image
        self.depth_image = self._cv_bridge.imgmsg_to_cv2(ros_image, "32FC1")

        # Convert the depth image to a Numpy array
        self.depth_array = np.array(self.depth_image, dtype=np.float32)
        #print(self.depth_array)

以上是depth_image处理的回调函数,首先将depth_image主题转换成opencv类型的,然后在将图片转换为numpy数组,赋值给depth_array成员变量

    def callback(self,image_msg):
	if self.vfc<12:
		self.vfc=self.vfc+1
	else:
		self.callbackfun(image_msg)
		self.vfc=0

以上是image处理的回调函数,这里控制视频帧的处理频率,主要是为了减少运算量,可以灵活调整,最终视频帧的处理是在callbackfun中处理的

    def box_depth(self,boxes,im_width, im_height):
	# Now compute the depth component
        depth=[]
	for row in boxes[0]:
		n_z = sum_z = mean_z = 0
		# Get the min/max x and y values from the ROI
		if row[0]<row[1]:
			min_x = row[0]*im_width
			max_x = row[1]*im_width
		else:
			min_x = row[1]*im_width
			max_x = row[0]*im_width
			
		if row[2]<row[3]:
			min_y = row[2]*im_height
			max_y = row[3]*im_height
		else:
			min_y = row[3]*im_height
			max_y = row[2]*im_height
		# Get the average depth value over the ROI
		for x in range(int(min_x), int(max_x)):
            		for y in range(int(min_y), int(max_y)):
                		try:
					z = self.depth_array[y, x]
				except:
					continue
                
				# Depth values can be NaN which should be ignored
				if isnan(z):
					z=6
					continue
				else:
					sum_z = sum_z + z
					n_z += 1 
			mean_z = sum_z / (n_z+0.01)
		depth.append(mean_z)
	return depth

以上代码是深度数据计算,输入参数boxes就是object_detection识别出来的物体的矩形标识rect,我们根据物体的矩形范围,匹配深度图片相应的区域,计算区域内的平均深度值作为此物体的深度数据。返回一个与boxes想对应的1维数组
由于深度图,和一般图片是异步处理,可能出现帧不对应的问题,在这里处理的比较简单,没有考虑此问题,可以通过缓存深度图片的方式来解决,通过时间戳来匹配最近的深度图片。

def callbackfun(self, image_msg):
	with self.detection_graph.as_default():
		with tf.Session(graph=self.detection_graph) as sess:
			 cv_image = self._cv_bridge.imgmsg_to_cv2(image_msg, "bgr8")
			 #cv_image = (self._cv_bridge.imgmsg_to_cv2(image_msg, "bgr8"))[300:450, 150:380]
			 pil_img = Image.fromarray(cv_image)			 
			 (im_width, im_height) = pil_img.size			 
			 # the array based representation of the image will be used later in order to prepare the
			 # result image with boxes and labels on it.
			 image_np =np.array(pil_img.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)
			 # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
			 image_np_expanded = np.expand_dims(image_np, axis=0)
			 image_tensor = self.detection_graph.get_tensor_by_name('image_tensor:0')
			 # Each box represents a part of the image where a particular object was detected.
			 boxes = self.detection_graph.get_tensor_by_name('detection_boxes:0')
			 # Each score represent how level of confidence for each of the objects.
			 # Score is shown on the result image, together with the class label.
			 scores = self.detection_graph.get_tensor_by_name('detection_scores:0')
			 classes = self.detection_graph.get_tensor_by_name('detection_classes:0')
			 num_detections = self.detection_graph.get_tensor_by_name('num_detections:0')
			
			 # Actual detection.
			 (boxes, scores, classes, num_detections) = sess.run(
			 	[boxes, scores, classes, num_detections],
			 	feed_dict={image_tensor: image_np_expanded})
			 box_depths=self.box_depth(boxes,im_width,im_height)
			 # Visualization of the results of a detection.
			 vis_util.visualize_boxes_and_labels_on_image_array(
			 	image_np,
			 	np.squeeze(boxes),
			 	np.squeeze(classes).astype(np.int32),
			 	np.squeeze(scores),
			 	self.category_index,
                                box_depths,
			 	use_normalized_coordinates=True,
			 	line_thickness=8)
			 
			 ros_compressed_image=self._cv_bridge.cv2_to_compressed_imgmsg(image_np)
			 self._pub.publish(ros_compressed_image)

以上代码是图片的回调函数,主要是将image消息转换为opencv格式,然后再转换成numpy数组,调用object_detection来识别图片中的物体,再调用 vis_util.visualize_boxes_and_labels_on_image_array将识别出来的物体在图片上标识出来,最后将处理后的图片发布为compressed_image类型的消息

现在我们只需要简单修改一下Object_detection/utils目录下的visualization_utils.py文件,就可以显示深度信息

def visualize_boxes_and_labels_on_image_array(image,
                                              boxes,
                                              classes,
                                              scores,
                                              category_index,
	                                      box_depths=None,
                                              instance_masks=None,
                                              keypoints=None,
                                              use_normalized_coordinates=False,
                                              max_boxes_to_draw=20,
                                              min_score_thresh=.5,
                                              agnostic_mode=False,
                                              line_thickness=4):

在visualize_boxes_and_labels_on_image_array定义中增加box_depths=None,缺省值为None

  for i in range(min(max_boxes_to_draw, boxes.shape[0])):
    if scores is None or scores[i] > min_score_thresh:
      box = tuple(boxes[i].tolist())
      if instance_masks is not None:
        box_to_instance_masks_map[box] = instance_masks[i]
      if keypoints is not None:
        box_to_keypoints_map[box].extend(keypoints[i])
      if scores is None:
        box_to_color_map[box] = 'black'
      else:
        if not agnostic_mode:
          if classes[i] in category_index.keys():
            class_name = category_index[classes[i]]['name']
          else:
            class_name = 'N/A'
          display_str = '{}: {}%'.format(
              class_name,
              int(100*scores[i]))
        else:
          display_str = 'score: {}%'.format(int(100 * scores[i]))
          
        #modify by diego robot
        if box_depths!=None:
        	display_str=display_str+"\ndepth: "+str(box_depths[i])
        	global depth_info
        	depth_info=True
        else:
        	global depth_info
        	depth_info=False
        #######################
        box_to_display_str_map[box].append(display_str)
        if agnostic_mode:
          box_to_color_map[box] = 'DarkOrange'
        else:
          box_to_color_map[box] = STANDARD_COLORS[
              classes[i] % len(STANDARD_COLORS)]

在第一个for循环里面,的box_to_display_str_map[box].append(display_str)一句前面增加如上diego robot修改部分代码

depth_info=False

在文件的开头部分定义全局变量,表示是否有深度信息传递进来

 # Reverse list and print from bottom to top.
  for display_str in display_str_list[::-1]: 
    text_width, text_height = font.getsize(display_str)    
    #modify by william
    global depth_info
    if depth_info:
  	text_height=text_height*2
    ###################
    	
    margin = np.ceil(0.05 * text_height)
    draw.rectangle(
        [(left, text_bottom - text_height - 2 * margin), (left + text_width,
                                                          text_bottom)],
        fill=color)
    draw.text(
        (left + margin, text_bottom - text_height - margin),
        display_str,
        fill='black',
        font=font)
    text_bottom -= text_height - 2 * margin

在draw_bounding_box_on_image函数的margin = np.ceil(0.05 * text_height)一句前面增加如上diego robot修改部分代码

3.launch文件

<launch>
   <node pkg="diego_tensorflow" name="ObjectDetectionDemo" type="ObjectDetectionDemo.py" output="screen">

       <param name="image_topic" value="/camera/rgb/image_raw" />

       <param name="depth_image_topic" value="/camera/depth_registered/image" />

       <param name="model_path" value="$(find diego_tensorflow)/scripts/object_detection" />

   </node>

</launch>

launch文件中定义了相应的参数,image_topic,depth_image_topic,model_path,读者可以根据自己的实际情况设定

4.启动节点

启动openni

roslaunch diego_vision openni_node.launch 

启动object_detection

roslaunch diego_tensorflow object_detection_demo.launch

5.通过手机APP订阅object_detection

我们只需要设置一个image_topic为object_detection,既可以在手机上看到物体识别的效果

object_detection物体识别是建立在训练模型上的,本文所采用的是官方所提供的训练好的模型,识别率还是比较高的。但是如果机器人所工作的环境与训练的图片差异比较大,识别率会大大降低。所以我们在实际使用中可以训练自己物体模型,在结合机器人的其他传感器和装置来达到实际应用的要求。比如可以应用在人员流量的监控,车辆的监控,或者特定物体的跟踪,再配合机械臂,深度相机,可以实现物体位置的判断,抓取等功能。

发表评论