A Practical Guide for Debugging TensorFlow Codes

.author[Jongwook Choi]

.small[.white[Feb 17th, 2017]
.green[Initial Version: June 18th, 2016]]

.x-small[https://github.com/wookayin/tensorflow-talk-debugging]


layout: false

Bio: Jongwook Choi

  • An undergraduate student from Seoul National University,
  • Looking for a graduate (Ph.D) program in ML/DL
  • A huge fan of TensorFlow and Deep Learning

    .dogdrip[TensorFlow rocks!!!]

.right.img-33[![profile](images/profile-wook.png)]


About

This talk shares some practical guides and tips on writing and debugging TensorFlow codes.

… because you might find that debugging TensorFlow codes is something like …


class: center, middle, no-number, bg-full
background-image: url(images/meme-doesnt-work.jpg)
background-repeat: no-repeat
background-size: contain


Welcome!

.green[Contents]

  • Introduction: Why debugging in TensorFlow is difficult
  • Basic and advanced methods for debugging TensorFlow codes
  • General tips and guidelines for easy-debuggable code
  • .dogdrip[Benchmarking and profiling TensorFlow codes]

.red[A Disclaimer]

  • This talk is NOT about how to debug your ML/DL model
    .gray[(e.g. my model is not fitting well)],
    but about how to debug your TF codes from a programming perspective
  • I assume that the audience is somewhat familiar with the basics of TensorFlow and Deep Learning;
    it would be very good if you have written some TensorFlow code yourself
  • Questions are highly welcomed! Please feel free to interrupt!

template: inverse

Debugging?


Debugging TensorFlow Application is …

  • Difficult!

  • Do you agree? .dogdrip[Life is not that easy]

Review: TensorFlow Computation Mechanics

  • The core concept of TensorFlow: The Computation Graph

.center.img-33[![Tensors flowing](images/tensors_flowing.gif)]


Review: TensorFlow Computation Mechanics

.gray.right[(from the TensorFlow documentation)]

TensorFlow programs are usually structured into

  • a .green[construction phase], that assembles a graph, and
  • an .blue[execution phase] that uses a session to execute ops in the graph.

.center.img-33[![Tensors flowing](images/tensors_flowing.gif)]


Review: in pure numpy …

```python
W1, b1, W2, b2, W3, b3 = init_parameters()

def multilayer_perceptron(x, y_truth):
    # (i) feed-forward pass
    assert x.dtype == np.float32 and x.shape == [batch_size, 784]  # numpy!
*   fc1 = fully_connected(x, W1, b1, activation_fn='relu')    # [B, 256]
*   fc2 = fully_connected(fc1, W2, b2, activation_fn='relu')  # [B, 256]
    out = fully_connected(fc2, W3, b3, activation_fn=None)    # [B, 10]

    # (ii) loss and gradient, backpropagation
    loss = softmax_cross_entropy_loss(out, y_truth)  # loss as a scalar
    param_gradients = _compute_gradient(...)  # just an artificial example :)
    return out, loss, param_gradients

def train():
    for epoch in range(10):
        epoch_loss = 0.0
        batch_steps = mnist.train.num_examples / batch_size
        for step in range(batch_steps):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
*           y_pred, loss, gradients = multilayer_perceptron(batch_x, batch_y)
            for v, grad_v in zip(all_params, gradients):
                v = v - learning_rate * grad_v
            epoch_loss += loss / batch_steps
        print "Epoch %02d, Loss = %.6f" % (epoch, epoch_loss)
```

Review: with TensorFlow

```python
def multilayer_perceptron(x):
*   fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
*   fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
*pred = multilayer_perceptron(x)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

def train(session):
    batch_size = 200
    session.run(tf.initialize_all_variables())
    for epoch in range(10):
        epoch_loss = 0.0
        batch_steps = mnist.train.num_examples / batch_size
        for step in range(batch_steps):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
*           _, c = session.run([train_op, loss],
                               {x: batch_x, y: batch_y})
            epoch_loss += c / batch_steps
        print "Epoch %02d, Loss = %.6f" % (epoch, epoch_loss)
```

Review: The Issues

```python
def multilayer_perceptron(x):
*   fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
*   fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
*pred = multilayer_perceptron(x)
```
  • The actual computation is done inside session.run();
    what we have done so far is just to build a computation graph
  • The model building part (e.g. multilayer_perceptron()) is called only once
    before training, so we can't simply access the intermediate values
  • e.g. Inspecting the activations of fc1/fc2 is not trivial!
```python
*_, c = session.run([train_op, loss], {x: batch_x, y: batch_y})
```

Session.run(): the most important method in TensorFlow, where every computation is performed!

  • tf.Session.run(fetches, feed_dict) runs the operations and evaluates the tensors in fetches,
    substituting the values in feed_dict for the corresponding placeholder inputs.
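A minimal sketch of run() in both forms, reusing the names (x, y, loss, train_op) defined above:

```python
# fetch a single tensor ...
loss_value = session.run(loss, feed_dict={x: batch_x, y: batch_y})

# ... or several at once (train_op is run only for its side effect)
_, loss_value = session.run([train_op, loss],
                            feed_dict={x: batch_x, y: batch_y})
```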

Why is debugging TensorFlow difficult?

  • The concept of Computation Graph might be unfamiliar to us.
  • The “Inversion of Control”
  • The actual computation (feed-forward, training) of the model runs inside Session.run(),
    upon the computation graph, but not upon the Python code we wrote
  • What exactly is being done during an execution of a session lies under an abstraction barrier
  • Therefore, we cannot retrieve intermediate values during the computation,
    unless we explicitly fetch them via Session.run()

template: inverse

Debugging Facilities in TensorFlow


Debugging Scenarios

We may wish to …

  • inspect intra-layer .blue[activations] (during training)
  • e.g. See the output of conv5, fc7 in CNNs
  • inspect .blue[parameter weights] (during training)
  • under some conditions, pause the execution (i.e. .blue[breakpoint]) and
    evaluate some expressions for debugging
  • during training, .red[NaN] occurs in loss and variables .dogdrip[but I don’t know why]

Of course, in TensorFlow, we can do all of these very elegantly!


Debugging in TensorFlow: Overview

.blue[Basic ways:]

  • Explicitly fetch, and print (or do whatever you want)!
  • Session.run()
  • Tensorboard: Histogram and Image Summary
  • the tf.Print() operation

.blue[Advanced ways:]

  • Interpose any python codelet in the computation graph
  • A step-by-step debugger
  • tfdbg: The TensorFlow debugger

(1) Fetch tensors via Session.run()

TensorFlow allows us to run parts of the graph in isolation, i.e.
.green[only the relevant part] of the graph is executed (rather than executing everything)

```python
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
bias = tf.Variable(1.0)
y_pred = x ** 2 + bias    # x -> x^2 + bias
loss = (y - y_pred) ** 2  # l2 loss

# Error: to compute loss, y is required as a dependency
print('Loss(x,y) = %.3f' % session.run(loss, {x: 3.0}))
# OK, prints 1.000 = (3**2 + 1 - 9)**2
print('Loss(x,y) = %.3f' % session.run(loss, {x: 3.0, y: 9.0}))
# OK, prints 10.000; for evaluating y_pred only, an input to y is not required
*print('pred_y(x) = %.3f' % session.run(y_pred, {x: 3.0}))
# OK, prints 1.000; bias evaluates to 1.0
*print('bias = %.3f' % session.run(bias))
```

Tensor Fetching: Example

We need access to the tensors as python expressions

```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
*   return out, fc1, fc2

net = {}
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
*pred, net['fc1'], net['fc2'] = multilayer_perceptron(x)
```

to fetch and evaluate them:

```python
_, c, fc1, fc2, out = session.run(
*   [train_op, loss, net['fc1'], net['fc2'], pred],
    feed_dict={x: batch_x, y: batch_y})

# and do something ...
if step == 0:  # XXX Debug
    print fc1[0].mean(), fc2[0].mean(), out[0]
```

(1) Fetch tensors via Session.run()

.green[The Good:]

  • Simple and Easy.
  • The most basic method to get debugging information.
  • We can fetch any evaluation result in numpy arrays, .green[anywhere] .gray[(except inside Session.run() or the computation graph)].

.red[The Bad:]

  • We need to hold references to the tensors to inspect,
    which might be burdensome as the model becomes complex and big

    .gray[(Or, we can simply pass a tensor name such as fc1/Relu:0; see the sketch below)]
  • The feed-forward needs to be done in an atomic way (i.e. a single call of Session.run())
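A minimal sketch of the name-based alternative, assuming the fc1 layer was built with scope='fc1' (so its ReLU output is the tensor fc1/Relu:0):

```python
# session.run() accepts tensor names as well as tf.Tensor objects
fc1_val = session.run('fc1/Relu:0', feed_dict={x: batch_x})

# or retrieve the tf.Tensor from the default graph first
fc1_tensor = tf.get_default_graph().get_tensor_by_name('fc1/Relu:0')
fc1_val = session.run(fc1_tensor, feed_dict={x: batch_x})
```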

Tensor Fetching: The Bad (i)

  • We need to hold a reference to the tensors to inspect,
    which might be burdensome if model becomes complex and big
```python
def alexnet(x):
    assert x.get_shape().as_list() == [224, 224, 3]
    conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
    pool1 = max_pool_2d(conv1, 3, strides=2)
    conv2 = conv_2d(pool1, 256, 5, activation='relu')
    pool2 = max_pool_2d(conv2, 3, strides=2)
    conv3 = conv_2d(pool2, 384, 3, activation='relu')
    conv4 = conv_2d(conv3, 384, 3, activation='relu')
    conv5 = conv_2d(conv4, 256, 3, activation='relu')
    pool5 = max_pool_2d(conv5, 3, strides=2)
    fc6 = fully_connected(pool5, 4096, activation='relu')
    fc7 = fully_connected(fc6, 4096, activation='relu')
    output = fully_connected(fc7, 1000, activation='softmax')
    return output, conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool5, fc6, fc7

output, conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool5, fc6, fc7 = alexnet(images)  # ?!
```

```python
_, loss_, conv1_, conv2_, conv3_, conv4_, conv5_, fc6_, fc7_ = session.run(
    [train_op, loss, conv1, conv2, conv3, conv4, conv5, fc6, fc7],
    feed_dict={...})
```

Tensor Fetching: The Bad (i)

  • Suggestion: Using a dict or class instance (e.g. self.conv5) is a very good idea
```python
def alexnet(x, net):
    assert x.get_shape().as_list() == [224, 224, 3]
    net['conv1'] = conv_2d(x, 96, 11, strides=4, activation='relu')
    net['pool1'] = max_pool_2d(net['conv1'], 3, strides=2)
    net['conv2'] = conv_2d(net['pool1'], 256, 5, activation='relu')
    net['pool2'] = max_pool_2d(net['conv2'], 3, strides=2)
    net['conv3'] = conv_2d(net['pool2'], 384, 3, activation='relu')
    net['conv4'] = conv_2d(net['conv3'], 384, 3, activation='relu')
    net['conv5'] = conv_2d(net['conv4'], 256, 3, activation='relu')
    net['pool5'] = max_pool_2d(net['conv5'], 3, strides=2)
    net['fc6'] = fully_connected(net['pool5'], 4096, activation='relu')
    net['fc7'] = fully_connected(net['fc6'], 4096, activation='relu')
    net['output'] = fully_connected(net['fc7'], 1000, activation='softmax')
    return net['output']

net = {}
output = alexnet(images, net)
# access intermediate layers like net['conv5'], net['fc7'], etc.
```

Tensor Fetching: The Bad (i)

  • Suggestion: Using a dict or class instance (e.g. self.conv5) is a very good idea
```python
class AlexNetModel(object):
    # ...
    def build_model(self, x):
        assert x.get_shape().as_list() == [224, 224, 3]
        self.conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
        self.pool1 = max_pool_2d(self.conv1, 3, strides=2)
        self.conv2 = conv_2d(self.pool1, 256, 5, activation='relu')
        self.pool2 = max_pool_2d(self.conv2, 3, strides=2)
        self.conv3 = conv_2d(self.pool2, 384, 3, activation='relu')
        self.conv4 = conv_2d(self.conv3, 384, 3, activation='relu')
        self.conv5 = conv_2d(self.conv4, 256, 3, activation='relu')
        self.pool5 = max_pool_2d(self.conv5, 3, strides=2)
        self.fc6 = fully_connected(self.pool5, 4096, activation='relu')
        self.fc7 = fully_connected(self.fc6, 4096, activation='relu')
        self.output = fully_connected(self.fc7, 1000, activation='softmax')
        return self.output

model = AlexNetModel()
output = model.build_model(images)
# access intermediate layers like model.conv5, model.fc7, etc.
```

Tensor Fetching: The Bad (ii)

  • The feed-forward (sometimes) needs to be done in an atomic way (i.e. a single call of Session.run())
```python
# a single step of training ...
*[loss_value, _] = session.run([loss_op, train_op],
*                              feed_dict={images: batch_image})
# After this, the model parameters have been changed due to `train_op`!

if np.isnan(loss_value):
    # DEBUG: can we see the intermediate values for the current input?
    [fc7, prob] = session.run([net['fc7'], net['prob']],
                              feed_dict={images: batch_image})
```
  • In other words, if any input is fed via feed_dict,
    we may have to fetch the non-debugging-related tensors
    and the debugging-related tensors at the same time.

Tensor Fetching: The Bad (ii)

  • In fact, we can just perform an additional session.run() for debugging purposes,
    if it does not involve any side effect
```python
# for debugging only, get the intermediate layer outputs
[fc7, prob] = session.run([net['fc7'], net['prob']],
                          feed_dict={images: batch_image})

# Yet another feed-forward: 'fc7' is computed once more ...
[loss_value, _] = session.run([loss_op, train_op],
                              feed_dict={images: batch_image})
```
  • A workaround: Use Session.partial_run() .gray[(undocumented, and still experimental)]
```python
h = sess.partial_run_setup([net['fc7'], loss_op, train_op], [images])
[loss_value, _] = sess.partial_run(h, [loss_op, train_op],
                                   feed_dict={images: batch_image})
fc7 = sess.partial_run(h, net['fc7'])
```

(2) Tensorboard

  • An off-the-shelf monitoring and debugging tool!
  • Check out the must-read TensorBoard tutorial in the TensorFlow documentation



  • You will need to learn
  • how to create and collect summaries (e.g. tf.summary.scalar, tf.summary.histogram)
  • how to use tf.summary.FileWriter .dogdrip[(previously it was SummaryWriter)]

Tensorboard: A Quick Tutorial

```python
def multilayer_perceptron(x):
    # inside this, variables 'fc1/weights' and 'fc1/bias' are defined
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
*   tf.summary.histogram('fc1', fc1)
*   tf.summary.histogram('fc1/sparsity', tf.nn.zero_fraction(fc1))
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
*   tf.summary.histogram('fc2', fc2)
*   tf.summary.histogram('fc2/sparsity', tf.nn.zero_fraction(fc2))
    out = layers.fully_connected(fc2, 10, scope='out')
    return out

# ... (omitted) ...
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=pred, labels=y))
*tf.summary.scalar('loss', loss)

*# histogram summary for all trainable variables (slow?)
*for v in tf.trainable_variables():
*    tf.summary.histogram(v.name, v)
```

Tensorboard: A Quick Tutorial

```python
*global_step = tf.Variable(0, dtype=tf.int32, trainable=False)

train_op = tf.train.AdamOptimizer(learning_rate=0.001)\
    .minimize(loss, global_step=global_step)
```

```python
def train(session):
    batch_size = 200
    session.run(tf.global_variables_initializer())
*   merged_summary_op = tf.summary.merge_all()
*   summary_writer = tf.summary.FileWriter(FLAGS.train_dir, session.graph)
    # ... (omitted) ...
    for step in range(batch_steps):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
*       _, c, summary = session.run(
            [train_op, loss, merged_summary_op],
            feed_dict={x: batch_x, y: batch_y})
*       summary_writer.add_summary(summary,
*                                  global_step.eval(session=session))
```

Tensorboard: A Quick Tutorial (Demo)

Scalar Summary

.center.img-100[![Scalar summary: loss](images/tensorboard-01-loss.png)]


Tensorboard: A Quick Tutorial (Demo)

Histogram Summary (activations and variables)

.center.img-66[![Histogram summary](images/tensorboard-02-histogram.png)]


Tensorboard and Summary: Noteworthies

  • Fetching histogram summary is extremely slow!
  • GPU utilization can become very low (if the serialized values are huge)
  • In non-debugging mode, disable it completely; or fetch summaries only periodically, e.g.
```python
eval_tensors = [self.loss, self.train_op]
if step % 200 == 0:
    eval_tensors += [self.merged_summary_op]

eval_ret = session.run(eval_tensors, feed_dict)
eval_ret = dict(zip(eval_tensors, eval_ret))  # as a dict

current_loss = eval_ret[self.loss]
if self.merged_summary_op in eval_tensors:
    self.summary_writer.add_summary(
        eval_ret[self.merged_summary_op], current_step)
```
  • I recommend taking only simple and essential scalar summaries (e.g. train/validation loss, overall accuracy, etc.), and including debugging stuff only on demand

Tensorboard and Summary: Noteworthies

  • Some other recommendations:
  • Use proper names (prefixed or scoped) for tensors and variables (specifying name=... to tensor/variable declaration)
  • Include both train loss and validation loss,
    plus train/validation accuracy (if possible), plotted over steps

.center.img-50[![Histogram summary](images/tensorboard-02-histogram.png)]


(3) tf.Print()

  • During run-time evaluation, we can print the value of a tensor
    .green[without] explicitly fetching and returning it to the code (i.e. via session.run())
```python
tf.Print(input, data, message=None, first_n=None, summarize=None, name=None)
```
  • It creates .blue[an identity op] with the side effect of printing data,
    when this op is evaluated.

(3) tf.Print(): Examples

```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
*   out = tf.Print(out, [tf.argmax(out, 1)],
*                  'argmax(out) = ', summarize=20, first_n=7)
    return out
```

For the first seven times (i.e. 7 feed-forward or SGD steps),
it will print the predicted labels for 20 out of the batch_size examples:

```
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 6 4 4 6 4 4 6 6 4 0 6 4 6 4 4 6 0 4...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 0 0 3 6 4 3 6 6 3 4 4 4 4 4 3 4 6 7...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 4 0 6 6 6 0 7 3 0 6 7 3 6 0 3 4 3 3 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 1 0 0 0 3 3 7 0 8 1 2 0 9 9 0 3 4 6 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 0 9 0 4 9 9 0 8 2 7 3 9 1 8 3 9 7 8...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 1 1 9 0 8 3 0 9 9 0 2 6 7 7 3 3 3 9...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 6 9 8 3 9 1 0 1 1 9 3 2 3 9 9 3 0 6 6...]
[2016-06-03 00:11:08.661563] Epoch 00, Loss = 0.332199
```

(3) tf.Print(): Some drawbacks …

.red[Cons:]

  • It is hard to take full control of print formats (e.g. how do we print a 2D tensor in a matrix format?)
  • Usually, we may want to print debugging values conditionally
    (i.e. print them only if some condition is met)
    or periodically (e.g. only once per epoch);
    tf.Print() has limitations in achieving this
  • TensorFlow has control flow operations, but using them just for printing can be an overkill (see the sketch below)
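For instance, a conditional print can be assembled from control-flow ops; a minimal sketch, assuming we only want out printed when it contains a NaN:

```python
# tf.cond evaluates one of the two branches at run-time;
# the tf.Print branch is taken only when a NaN appears in `out`
out = tf.cond(tf.reduce_any(tf.is_nan(out)),
              lambda: tf.Print(out, [out], message='NaN in out: '),
              lambda: out)
```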

(3) tf.Assert()

  • Asserts that the given condition is true, when evaluated (during the computation)
  • If condition evaluates to False, the list of tensors in data is printed,
    and an error is thrown.
    summarize determines how many entries of the tensors to print.
```python
tf.Assert(condition, data, summarize=None, name=None)
```

tf.Assert: Examples

Abort the program if …

```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
*   tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_positive')
    return out
```

The assertion will not work!

  • tf.Assert is also an op, so it should be executed as well

tf.Assert: Examples

We need to ensure that assert_op is being executed when evaluating out:

```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
    assert_op = tf.Assert(tf.reduce_all(out > 0), [out],
                          name='assert_out_positive')
*   with tf.control_dependencies([assert_op]):
*       out = tf.identity(out, name='out')
    return out
```

… somewhat ugly? … or

```python
    # ... same as above ...
*   out = tf.with_dependencies([assert_op], out)
    return out
```

tf.Assert: Examples

Another good way: store all the created assertion operations into a collection,
merge them into a single op, and explicitly evaluate them using Session.run()

```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
*   tf.add_to_collection('Asserts',
*       tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_gt_0')
*   )
    return out

# merge all assertion ops from the collection
*assert_op = tf.group(*tf.get_collection('Asserts'))

... = session.run([train_op, assert_op], feed_dict={...})
```

Some built-in useful Assert ops

See Asserts and boolean checks in the docs!

```python
tf.assert_negative(x, data=None, summarize=None, name=None)
tf.assert_positive(x, data=None, summarize=None, name=None)
tf.assert_proper_iterable(values)
tf.assert_non_negative(x, data=None, summarize=None, name=None)
tf.assert_non_positive(x, data=None, summarize=None, name=None)
tf.assert_equal(x, y, data=None, summarize=None, name=None)
tf.assert_integer(x, data=None, summarize=None, name=None)
tf.assert_less(x, y, data=None, summarize=None, name=None)
tf.assert_less_equal(x, y, data=None, summarize=None, name=None)
tf.assert_rank(x, rank, data=None, summarize=None, name=None)
tf.assert_rank_at_least(x, rank, data=None, summarize=None, name=None)
tf.assert_type(tensor, tf_type)
tf.is_non_decreasing(x, name=None)
tf.is_numeric_tensor(tensor)
tf.is_strictly_increasing(x, name=None)
```

If we need runtime assertions during computation, they are useful.
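A minimal sketch of wiring one of them in, with the same control-dependency trick used for tf.Assert above:

```python
# ensure the input placeholder `x` never receives negative values
assert_op = tf.assert_non_negative(x, name='assert_x_non_negative')
with tf.control_dependencies([assert_op]):
    x = tf.identity(x)
```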


(4) Step-by-step Debugger

Python already has powerful debugging utilities: pdb, ipdb, and pudb,

which are all interactive debuggers (like gdb for C/C++)

  • set breakpoint
  • pause, continue
  • inspect stack-trace upon exception
  • watch variables and evaluate expressions interactively

Debugger: Usage

Insert set_trace() for a breakpoint

```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
*   import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
    return out
```

.gray[(yeah, it is a breakpoint on model building)]

.img-75.center[
![pdb example](images/pdb-example-01.png)
]


Debugger: Usage

Debug breakpoints can be conditional:

```python
for i in range(batch_steps):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
*   if (np.argmax(batch_y, axis=1)[:7] == [4, 9, 6, 2, 9, 6, 5]).all():
*       import pudb; pudb.set_trace()  # XXX BREAKPOINT
    _, c = session.run([train_op, loss],
                       feed_dict={x: batch_x, y: batch_y})
```

.green[Live Demo and Case Example!!] .small[(20-mnist-pdb.py)]

  • Let's break in the training loop when some condition is met
  • Get the fc2 tensor, and fetch its evaluation result (given x)
  • Session.run() can be invoked and executed anywhere, even in the debugger
  • .dogdrip[Wow… gonna love it…]

Hint: Some Useful TensorFlow APIs

To get any .green[operations] or .green[tensors] that might not be stored explicitly:

  • tf.get_default_graph(): Get the current (default) graph
  • G.get_operations(): List all the TF ops in the graph
  • G.get_operation_by_name(name): Retrieve a specific TF op

    .gray[(Q. How to convert an operation to a tensor?)]
  • G.get_tensor_by_name(name): Retrieve a specific tensor
  • tf.get_collection(key): Get a collection of tensors (see tf.GraphKeys for the standard keys)

To get .green[variables]:

  • tf.get_variable_scope(): Get the current variable scope
  • tf.get_variable(): Get (or create) a variable in the current variable scope
  • tf.trainable_variables(): List all the (trainable) variables
```python
[v for v in tf.all_variables() if v.name == 'fc2/weights:0'][0]
```
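A minimal sketch of these APIs (assuming the fc2 layer was built with scope='fc2'), which also answers the question above: an operation's output tensors are available via op.outputs, or by name with an output-index suffix like :0.

```python
G = tf.get_default_graph()
print([op.name for op in G.get_operations()])  # e.g. ['fc1/Relu', 'fc2/Relu', ...]

op = G.get_operation_by_name('fc2/Relu')  # a tf.Operation
t = op.outputs[0]                         # op -> its first output tensor
t = G.get_tensor_by_name('fc2/Relu:0')    # the same tensor, by name
```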

IPython.embed()

.green[ipdb/pudb:]

```python
import pudb; pudb.set_trace()
```
  • They are debuggers; we can set breakpoint, see stacktraces, watch expressions, …
    (much more general)

.green[embed:]

```python
from IPython import embed; embed()
```
  • Open an ipython shell on the current context;
    mostly used for watching expressions only

(5) Debugging ‘inside’ the computation graph

Our debugging tools so far can be used for debugging outside Session.run().

Question: How can we run a .red[‘custom’] operation? (e.g. custom layer)

TensorFlow allows us to implement custom operations in C++!

The ‘custom’ operation can be designed for logging or debugging purposes (like tf.Print() itself)

… but very burdensome (need to compile, define op interface, and use it …)


(5) Interpose any python code in the computation graph

We can also embed and interpose a python function in the graph:
tf.py_func() comes to the rescue!

```python
tf.py_func(func, inp, Tout, stateful=True, name=None)
```
  • Wraps a python function and uses it .blue[as a tensorflow op].
  • Given a python function func, which .green[takes numpy arrays] as its inputs and returns numpy arrays as its outputs, the function is wrapped as an operation.
```python
def my_func(x):
    # x will be a numpy array with the contents of the placeholder below
    return np.sinh(x)

inp = tf.placeholder(tf.float32, [...])
*y = tf.py_func(my_func, [inp], [tf.float32])
```

(5) Interpose any python code in the computation graph

In other words, we are now able to use the following (hacky) .green[tricks]
by intercepting the computation being executed on the graph:

  • Print any intermediate values (e.g. layer activations) in a custom form
    without fetching them
  • Use python debugger (e.g. trace and breakpoint)
  • Draw a graph or plot using matplotlib, or save images into files (see the sketch after the case examples below)

.gray.small[Warning: Some limitations may exist, e.g. thread-safety issue, not allowed to manipulate the state of session, etc…]


Case Example (i): Print

An ugly example …

```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

*   def _debug_print_func(fc1_val, fc2_val):
        print 'FC1 : {}, FC2 : {}'.format(fc1_val.shape, fc2_val.shape)
        print 'min, max of FC2 = {}, {}'.format(fc2_val.min(), fc2_val.max())
        return False

*   debug_print_op = tf.py_func(_debug_print_func, [fc1, fc2], [tf.bool])
    with tf.control_dependencies(debug_print_op):
        out = tf.identity(out, name='out')
    return out
```

Case Example (ii): Breakpoints!

An ugly example to attach breakpoints …

```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

*   def _debug_func(x_val, fc1_val, fc2_val, out_val):
        if (out_val == 0.0).any():
*           import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
*           from IPython import embed; embed()  # XXX DEBUG
        return False

*   debug_op = tf.py_func(_debug_func, [x, fc1, fc2, out], [tf.bool])
    with tf.control_dependencies(debug_op):
        out = tf.identity(out, name='out')
    return out
```
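And for the plotting trick, a minimal sketch that saves a plot from inside the graph; the helper name and output path are just for illustration (it assumes matplotlib is available):

```python
import matplotlib
matplotlib.use('Agg')  # headless backend: we only write image files
import matplotlib.pyplot as plt

def _save_hist_func(fc2_val):
    # save a histogram of the fc2 activations as an image file
    plt.figure()
    plt.hist(fc2_val.flatten(), bins=50)
    plt.savefig('/tmp/debug-fc2-hist.png')  # hypothetical output path
    plt.close()
    return False

debug_plot_op = tf.py_func(_save_hist_func, [fc2], [tf.bool])
with tf.control_dependencies(debug_plot_op):
    out = tf.identity(out, name='out')
```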

Another one: The tdb library

A third-party TensorFlow debugging tool: .small[https://github.com/ericjang/tdb]

.small[(not actively maintained and looks clunky, but still good for prototyping)]

.img-90.center[
![tdb demo](https://camo.githubusercontent.com/4c671d2b359c9984472f37a73136971fd60e76e4/687474703a2f2f692e696d6775722e636f6d2f6e30506d58516e2e676966)
]


(6) tfdbg: The official TensorFlow debugger

Recent versions of TensorFlow ship with the official debugger (tfdbg).
Still experimental, but it works quite well!

Check out the documentation and tutorials on tfdbg!!!

```python
import tensorflow.python.debug as tf_debug

sess = tf.Session()

# create a debug wrapper session
*sess = tf_debug.LocalCLIDebugWrapperSession(sess)

# Add a tensor filter (similar to a breakpoint)
*sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)

# Each session.run() will be intercepted by the debugger,
# and we can inspect the value of tensors via the debugger interface
sess.run(loss, feed_dict={x: ...})
```

(6) tfdbg: The TensorFlow debugger

.img-100.center[
![tfdbg example](images/tfdbg_example1.png)
]


(6) tfdbg: The TensorFlow debugger

.img-100.center[
![tfdbg example](images/tfdbg_example2.png)
]


tfdbg: Features and Quick References

Conceptually, a wrapper session is employed (currently, a CLI debugger session); it can intercept each call of session.run()

  • run / r : Execute the run() call .blue[with debug tensor-watching]
  • run -n / r -n : Execute the run() call .blue[without] debug tensor-watching
  • run -f <filter_name> : Keep executing run() calls until a dumped tensor passes a registered filter (conditional breakpoint)
  • e.g. has_inf_or_nan

.img-80.center[
![tfdbg run](images/tfdbg_example_run.png)
]


tfdbg: Tensor Filters

Registering tensor filters:

```python
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
```

Tensor filters are just python functions (datum, tensor) -> bool:

```python
def has_inf_or_nan(datum, tensor):
    _ = datum  # Datum metadata is unused in this predicate.
    if tensor is None:
        # Uninitialized tensor doesn't have bad numerical values.
        return False
    elif (np.issubdtype(tensor.dtype, np.float) or
          np.issubdtype(tensor.dtype, np.complex) or
          np.issubdtype(tensor.dtype, np.integer)):
        return np.any(np.isnan(tensor)) or np.any(np.isinf(tensor))
    else:
        return False
```
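So writing our own conditional breakpoint is straightforward; a minimal sketch (has_large_value is a hypothetical filter, not part of tfdbg):

```python
def has_large_value(datum, tensor):
    # hypothetical filter: flag any float tensor with |value| > 1e3
    _ = datum  # metadata unused
    return (tensor is not None
            and np.issubdtype(tensor.dtype, np.floating)
            and np.any(np.abs(tensor) > 1e3))

sess.add_tensor_filter("has_large_value", has_large_value)
# then, inside the CLI: run -f has_large_value
```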

Running tensor filters is, therefore, quite slow.


tfdbg: Tensor Fetching

In a tensor dump mode (the run-end UI), the debugger shows the list of tensors dumped in the session.run() call:

.img-100.center[
![tfdbg tensor dumps](images/tfdbg_example_fetch.png)
]


tfdbg: Tensor Fetching

Commands:

  • .blue[list_tensors (lt)] : Show the list of dumped tensor(s).
  • .blue[print_tensor (pt)] : Print the value of a dumped tensor.
  • node_info (ni) : Show information about a node
  • ni -t : Shows the traceback of tensor creation
  • list_inputs (li) : Show inputs to a node
  • list_outputs (lo) : Show outputs of a node
  • run_info (ri) : Show the information of current run
    (e.g. what to fetch, what feed_dict is)
  • .green[invoke_stepper (s)] : Invoke the stepper!
  • run (r) : Move to the next run

tfdbg: Tensor Fetching

Example: print_tensor fc2/Relu:0

.img-80.center[
![print_tensor example](images/tfdbg_example_pt.png)
]

  • Slicing: pt fc2/Relu:0[0:10]
  • Dumping: pt fc2/Relu:0 > /tmp/debug/fc2.txt



tfdbg: Stepper

Shows the tensor value(s) in a topologically-sorted order for the run.

.img-100.center[
![tfdbg stepper](images/tfdbg_example_stepper.png)
]


tfdbg: Screencast and Demo!

.small.right[From Google Brain Team]



tfdbg: Other Remarks

  • Currently it is actively being developed (still experimental)
  • In the near future, a web-based interactive debugger (integrated with TensorBoard) will be out!

Debugging: Summary

  • Session.run(): Explicitly fetch, and print
  • Tensorboard: Histogram and Image Summary
  • tf.Print(), tf.Assert() operation
  • Use python debugger (ipdb, pudb)
  • Interpose your debugging python code in the graph
  • The TensorFlow debugger: tfdbg

.green[There is no silver bullet; one might need to choose the most convenient and suitable debugging tool, depending on the case]


template: inverse

Other General Tips

.gray[(in a Programmer’s Perspective)]


General Tips of Debugging

  • Learn to use debugging tools, but do not solely rely on them when a problem occurs.
  • Sometimes, just sitting down and reading through your code with ☕ (a careful code review!) would be greatly helpful.

General Tips from Software Engineering

Almost .red[all] rule-of-thumb tips and guidelines for writing good, neat, and defensive code can be applied to TensorFlow codes 😃

  • Check and sanitize inputs
  • Logging
  • Assertions
  • Proper usage of exceptions
  • Fail fast: immediately abort if something is wrong
  • DRY: Don't Repeat Yourself
  • Build up well-organized code, and test smaller modules
  • etc …

There are some good guides on the web, e.g. on defensive programming.


Use asserts as much as you can

  • Use assertions anywhere and everywhere (failing early is always good)
  • e.g. data processing routines (input sanity checks)
  • Especially, perform .red[shape checks] for tensors (like ‘static’ type checking when compiling code)
```python
net['fc7'] = tf.nn.xw_plus_b(net['fc6'], vars['fc7/W'], vars['fc7/b'])

assert net['fc7'].get_shape().as_list() == [None, 4096]
*net['fc7'].get_shape().assert_is_compatible_with([B, 4096])  # B: the batch size
```
  • Sometimes, the tf.Assert() operation might be helpful (a run-time check)
  • such checks should be turned off when we are not debugging (see the sketch below)
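A minimal sketch of such a switch, using the collection-based pattern from earlier (FLAGS.debug is a hypothetical option):

```python
# evaluate the collected asserts only in debug mode
if FLAGS.debug:  # hypothetical flag
    assert_op = tf.group(*tf.get_collection('Asserts'))
else:
    assert_op = tf.no_op()

session.run([train_op, assert_op], feed_dict={...})
```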

Use proper logging

  • Being verbose for logging helps a lot (configurations for training hyperparameters, monitor train/validation loss, learning rate, elapsed time, etc.)


    .img-100.center[
    ![logging example](images/logging-example.png)
    ]
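A minimal sketch with python's standard logging module (the exact fields are just a suggestion; train_loss, val_loss, and elapsed are assumed to be computed in the training loop):

```python
import logging
logging.basicConfig(level=logging.INFO,
                    format='[%(asctime)s] %(levelname)s %(message)s')

logging.info('batch_size=%d, learning_rate=%g', batch_size, learning_rate)
logging.info('Epoch %02d: train loss=%.6f, val loss=%.6f (%.1fs elapsed)',
             epoch, train_loss, val_loss, elapsed)
```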

Guard against Numerical Errors

Quite often, NaN occurs during the training

  • We usually deal with float32, which is not a precise datatype;
    deep learning models are susceptible to numerical instability
  • Some possible reasons:
  • the gradient is too big (clipping may be required)
  • zero or negative values are passed into sqrt or log
  • Check: whether it is NaN or is finite
  • Some useful TF APIs:
  • tf.check_numerics(tensor, message) .small[(returns the tensor, asserting it contains no NaN or Inf)]
  • tf.add_check_numerics_ops() .small[(adds a check op for every floating-point tensor in the graph)]
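A minimal sketch combining these guards (the epsilon and the clipping threshold are arbitrary choices):

```python
# guard log against zero/negative inputs
safe_log = tf.log(tf.maximum(x, 1e-8))

# abort with a clear message as soon as the loss becomes NaN/Inf
loss = tf.check_numerics(loss, 'loss is NaN or Inf')

# clip gradients that grow too big
grads = tf.gradients(loss, tf.trainable_variables())
clipped, _ = tf.clip_by_global_norm(grads, clip_norm=5.0)
```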

Name your tensors properly

  • It is recommended to specify names for intermediate tensors and variables when building model
  • Using variable scopes properly is also a very good idea
  • When something wrong happens, we can easily figure out where the error is from
```
ValueError: Cannot feed value of shape (200,) for Tensor u'Placeholder_1:0',
    which has shape '(?, 10)'
ValueError: Tensor conversion requested dtype float32 for Tensor with
*   dtype int32: 'Tensor("Variable_1/read:0", shape=(256,), dtype=int32)'
```

A better stacktrace:

```
ValueError: Cannot feed value of shape (200,) for Tensor u'placeholder_y:0',
    which has shape '(?, 10)'
ValueError: Tensor conversion requested dtype float32 for Tensor with
*   dtype int32: 'Tensor("fc1/weights/read:0", shape=(256,), dtype=int32)'
```

Name your tensors properly

```python
def multilayer_perceptron(x):
    W_fc1 = tf.Variable(tf.random_normal([784, 256], 0, 1))
    b_fc1 = tf.Variable([0] * 256)  # wrong here!!
    fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1)
    # ...
```

```python
>>> fc1
<tf.Tensor 'xw_plus_b:0' shape=(?, 256) dtype=float32>
```

Better:

```python
def multilayer_perceptron(x):
    W_fc1 = tf.Variable(tf.random_normal([784, 256], 0, 1), name='fc1/weights')
    b_fc1 = tf.Variable(tf.zeros([256]), name='fc1/bias')
    fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1, name='fc1/linear')
    fc1 = tf.nn.relu(fc1, name='fc1/relu')
    # ...
```

```python
>>> fc1
<tf.Tensor 'fc1/relu:0' shape=(?, 256) dtype=float32>
```

Name your tensors properly

The style that I much prefer:

```python
def multilayer_perceptron(x):
*   with tf.variable_scope('fc1'):
        W_fc1 = tf.get_variable('weights', [784, 256])  # fc1/weights
        b_fc1 = tf.get_variable('bias', [256])          # fc1/bias
        fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1)          # fc1/xw_plus_b
        fc1 = tf.nn.relu(fc1)                           # fc1/relu
    # ...
```

or use high-level APIs or your custom functions:

```python
import tensorflow.contrib.layers as layers

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
*                                scope='fc1')
    # ...
```

And for more style guides:   .large[Toward Best Practices of TensorFlow Code Patterns]

.small[https://github.com/wookayin/TensorFlowKR-2017-talk-bestpractice]

all of which will help you to write easily-debuggable codes!


Other Topics: Performance and Profiling

Run-time performance is a very important topic!

.dogdrip[there will be another lecture soon…]

  • Beyond the scope of this talk…

Make sure that your GPU utilization is always non-zero (and near 100%)

  • Watch and monitor using nvidia-smi or gpustat

.img-75.center[
![gpustat](https://github.com/wookayin/gpustat/raw/master/screenshot.png)
]
.img-50.center[
![nvidia-smi](images/nvidia-smi.png)
]


Other Topics: Performance and Profiling

  • Some possible factors that might slow down your code:
  • Input batch preparation (i.e. next_batch())
  • Too frequent or heavy summaries
  • Inefficient model (e.g. CPU-bottlenecked operations)
  • What we can do
  • Use tfdbg !!!
  • Use python profilers (e.g. cProfile),

    or %timeit / %prun in IPython
  • Use nvprof for profiling CUDA operations
  • Use CUPTI (CUDA Profiling Tools Interface) tracing for TF (see the sketch below)
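One concrete sketch: TensorFlow's built-in chrome-trace timeline, which collects per-op (and, with CUPTI available, per-kernel) timings through tf.RunMetadata:

```python
from tensorflow.python.client import timeline

run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

session.run(train_op, feed_dict={x: batch_x, y: batch_y},
            options=run_options, run_metadata=run_metadata)

# write a trace viewable in chrome://tracing
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())
```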

.img-66.center[![CUPTI tracing](images/tracing-cupti.png)]


Concluding Remarks

We have talked about

  • How to debug TensorFlow and python applications
  • Some tips and guidelines for easy debugging — write a nice code that would require less debugging 😃

name: last-page
class: center, middle, no-number

Special Thanks to:
Juyong Kim, Yunseok Jang, Junhyug Noh, Cesc Park, Jongho Park

TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.

.img-66[![TensorFlow logo](images/tensorflow-logo.png)]


name: last-page
class: center, middle, no-number

Thank You!

.footnote[Slideshow created using remark.]
