A Practical Guide for Debugging TensorFlow Codes
.author[Jongwook Choi]
.small[.white[Feb 17th, 2017]
.green[Initial Version: June 18th, 2016]]
.x-small[https://github.com/wookayin/tensorflow-talk-debugging]
layout: false
Bio: Jongwook Choi ()
- An undergraduate student from Seoul National University,
- Looking for a graduate (Ph.D) program in ML/DL
- A huge fan of TensorFlow and Deep Learning
.dogdrip[TensorFlow rocks!!!]
.right.img-33[![](images/profile-wook.png)]
About
This talk aims to share some practical guides and tips on writing and debugging TensorFlow code.
–
… because you might find that debugging TensorFlow codes is something like …
class: center, middle, no-number, bg-full
background-image: url(images/meme-doesnt-work.jpg)
background-repeat: no-repeat
background-size: contain
Welcome!
.green[Contents]
- Introduction: Why debugging in TensorFlow is difficult
- Basic and advanced methods for debugging TensorFlow codes
- General tips and guidelines for easy-debuggable code
- .dogdrip[Benchmarking and profiling TensorFlow codes]
.red[A Disclaimer]
- This talk is NOT about how to debug your ML/DL model
  .gray[(e.g. my model is not fitting well)],
  but about how to debug your TF code from a programming perspective
- I assume that the audience is somewhat familiar with the basics of TensorFlow and Deep Learning;
  it would be very good if you have experience writing TensorFlow code yourself
- Questions are highly welcome! Please feel free to interrupt!
template: inverse
Debugging?
Debugging TensorFlow Application is …
–
- Difficult!
–
- Do you agree? .dogdrip[Life is not that easy]
Review: TensorFlow Computation Mechanics
- The core concept of TensorFlow: The Computation Graph
- See Also:
.center.img-33[![](images/tensors_flowing.gif)]
Review: TensorFlow Computation Mechanics
.gray.right[(from )]
TensorFlow programs are usually structured into
- a .green[construction phase], that assembles a graph, and
- an .blue[execution phase] that uses a session to execute ops in the graph.
.center.img-33[![](images/tensors_flowing.gif)]
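The construction/execution split above can be sketched without TensorFlow at all. Below is a minimal toy "graph" in plain Python (all class and function names are made up for illustration) that builds nodes first and computes them only when explicitly run, mirroring the two phases:

```python
# A toy deferred-evaluation "graph": nodes are built first (construction
# phase) and only computed when explicitly run (execution phase).

class Node(object):
    def __init__(self, fn, inputs):
        self.fn = fn          # how to compute this node's value
        self.inputs = inputs  # upstream nodes this one depends on

    def run(self, feed):
        # Execution phase: recursively evaluate only the needed subgraph,
        # substituting placeholder values from `feed` (like feed_dict).
        if self in feed:
            return feed[self]
        return self.fn(*[inp.run(feed) for inp in self.inputs])

def placeholder():
    # A placeholder has no way to compute itself; it must be fed.
    def _missing():
        raise ValueError('placeholder was not fed')
    return Node(_missing, [])

def add(a, b):
    return Node(lambda u, v: u + v, [a, b])

def square(a):
    return Node(lambda u: u * u, [a])

# Construction phase: nothing is computed yet.
x = placeholder()
y_pred = add(square(x), Node(lambda: 1.0, []))  # x**2 + bias(=1.0)

# Execution phase: computation happens only now.
print(y_pred.run({x: 3.0}))  # -> 10.0
```

Defining the graph says nothing runs until `run()` is called, which is exactly why intermediate values in a TensorFlow model are invisible unless fetched.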
Review: in pure numpy …
```python
W1, b1, W2, b2, W3, b3 = init_parameters()

def multilayer_perceptron(x, y_truth):
    # (i) feed-forward pass
    assert x.dtype == np.float32 and x.shape == [batch_size, 784]  # numpy!
*   fc1 = fully_connected(x, W1, b1, activation_fn='relu')    # [B, 256]
*   fc2 = fully_connected(fc1, W2, b2, activation_fn='relu')  # [B, 256]
    out = fully_connected(fc2, W3, b3, activation_fn=None)    # [B, 10]

    # (ii) loss and gradient, backpropagation
    loss = softmax_cross_entropy_loss(out, y_truth)  # loss as a scalar
    param_gradients = _compute_gradient(...)  # just an artificial example :)
    return out, loss, param_gradients

def train():
    for epoch in range(10):
        epoch_loss = 0.0
        batch_steps = mnist.train.num_examples / batch_size
        for step in range(batch_steps):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
*           y_pred, loss, gradients = multilayer_perceptron(batch_x, batch_y)
            for v, grad_v in zip(all_params, gradients):
                v = v - learning_rate * grad_v
            epoch_loss += loss / batch_steps
        print "Epoch %02d, Loss = %.6f" % (epoch, epoch_loss)
```

Review: with TensorFlow
```python
def multilayer_perceptron(x):
*   fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
*   fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
*pred = multilayer_perceptron(x)
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y))
train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

def train(session):
    batch_size = 200
    session.run(tf.initialize_all_variables())
    for epoch in range(10):
        epoch_loss = 0.0
        batch_steps = mnist.train.num_examples / batch_size
        for step in range(batch_steps):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
*           _, c = session.run([train_op, loss], {x: batch_x, y: batch_y})
            epoch_loss += c / batch_steps
        print "Epoch %02d, Loss = %.6f" % (epoch, epoch_loss)
```

Review: The Issues
```python
def multilayer_perceptron(x):
*   fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
*   fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
    return out

x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
*pred = multilayer_perceptron(x)
```

- The actual computation is done inside `session.run()`;
  what we have just done is to build a computation graph
- The model-building part (e.g. `multilayer_perceptron()`) is called only once
  before training, so we can't simply access the intermediates
  - e.g. Inspecting the activations of `fc1`/`fc2` is not trivial!
```python
* _, c = session.run([train_op, loss], {x: batch_x, y: batch_y})
```

The most important method in TensorFlow — where every computation is performed!

`tf.Session.run(fetches, feed_dict)` runs the operations and evaluates the tensors in `fetches`,
substituting the values in `feed_dict` for the corresponding input values (placeholders).
Why is debugging TensorFlow difficult?
- The concept of the Computation Graph might be unfamiliar to us
- The "Inversion of Control"
  - The actual computation (feed-forward, training) of the model runs inside `Session.run()`,
    upon the computation graph, but not upon the Python code we wrote
  - What exactly is being done during an execution of the session is behind an abstraction barrier
- Therefore, we cannot retrieve the intermediate values during the computation,
  unless we explicitly fetch them via `Session.run()`
template: inverse
Debugging Facilities in TensorFlow
Debugging Scenarios
We may wish to …
- inspect intra-layer .blue[activations] (during training)
- e.g. See the output of conv5, fc7 in CNNs
- inspect .blue[parameter weights] (during training)
- under some conditions, pause the execution (i.e. .blue[breakpoint]) and
  evaluate some expressions for debugging
- during training, .red[NaN] occurs in loss and variables .dogdrip[but I don't know why]
–
Of course, in TensorFlow, we can do these very elegantly!
Debugging in TensorFlow: Overview
.blue[Basic ways:]
- Explicitly fetch via `Session.run()`, and print (or do whatever you want)!
- Tensorboard: Histogram and Image Summary
- the `tf.Print()` operation
.blue[Advanced ways:]
- Interpose any python codelet in the computation graph
- A step-by-step debugger
- `tfdbg`: The TensorFlow debugger
(1) Fetch tensors via Session.run()
TensorFlow allows us to run parts of the graph in isolation, i.e.
.green[only the relevant part] of the graph is executed (rather than executing everything)
```python
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
bias = tf.Variable(1.0)

y_pred = x ** 2 + bias    # x -> x^2 + bias
loss = (y - y_pred) ** 2  # l2 loss?

# Error: to compute loss, y is required as a dependency
print('Loss(x,y) = %.3f' % session.run(loss, {x: 3.0}))

# OK, print 1.000 = (3**2 + 1 - 9)**2
print('Loss(x,y) = %.3f' % session.run(loss, {x: 3.0, y: 9.0}))

# OK, print 10.000; for evaluating y_pred only, input to y is not required
*print('pred_y(x) = %.3f' % session.run(y_pred, {x: 3.0}))

# OK, print 1.000; bias evaluates to 1.0
*print('bias = %.3f' % session.run(bias))
```

Tensor Fetching: Example
We need access to the tensors as python expressions
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
*   return out, fc1, fc2

net = {}
x = tf.placeholder(tf.float32, [None, 784])
y = tf.placeholder(tf.float32, [None, 10])
*pred, net['fc1'], net['fc2'] = multilayer_perceptron(x)
```

… to fetch and evaluate them:
```python
_, c, fc1, fc2, out = session.run(
*   [train_op, loss, net['fc1'], net['fc2'], pred],
    feed_dict={x: batch_x, y: batch_y})

# and do something ...
if step == 0:  # XXX Debug
    print fc1[0].mean(), fc2[0].mean(), out[0]
```

(1) Fetch tensors via Session.run()
.green[The Good:]
- Simple and Easy.
- The most basic method to get debugging information.
- We can fetch any evaluation result as numpy arrays, .green[anywhere]
  .gray[(except inside `Session.run()` or the computation graph)].
–
.red[The Bad:]
- We need to hold a reference to the tensors we want to inspect,
  which might be burdensome if the model becomes complex and big
  .gray[(Or, we can simply pass a tensor name such as `fc0/Relu:0`)]
- The feed-forward needs to be done in an atomic way
  (i.e. a single call of `Session.run()`)
Tensor Fetching: The Bad (i)
- We need to hold a reference to the tensors we want to inspect,
  which might be burdensome if the model becomes complex and big
```python
def alexnet(x):
    assert x.get_shape().as_list() == [224, 224, 3]
    conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
    pool1 = max_pool_2d(conv1, 3, strides=2)
    conv2 = conv_2d(pool1, 256, 5, activation='relu')
    pool2 = max_pool_2d(conv2, 3, strides=2)
    conv3 = conv_2d(pool2, 384, 3, activation='relu')
    conv4 = conv_2d(conv3, 384, 3, activation='relu')
    conv5 = conv_2d(conv4, 256, 3, activation='relu')
    pool5 = max_pool_2d(conv5, 3, strides=2)
    fc6 = fully_connected(pool5, 4096, activation='relu')
    fc7 = fully_connected(fc6, 4096, activation='relu')
    output = fully_connected(fc7, 1000, activation='softmax')
    return conv1, pool1, conv2, pool2, conv3, conv4, conv5, pool5, fc6, fc7

conv1, conv2, conv3, conv4, conv5, fc6, fc7, output = alexnet(images)  # ?!

_, loss_, conv1_, conv2_, conv3_, conv4_, conv5_, fc6_, fc7_ = session.run(
    [train_op, loss, conv1, conv2, conv3, conv4, conv5, fc6, fc7],
    feed_dict = {...})
```

Tensor Fetching: The Bad (i)
- Suggestion: Using a `dict` or a class instance (e.g. `self.conv5`) is a very good idea
```python
def alexnet(x, net={}):
    assert x.get_shape().as_list() == [224, 224, 3]
    net['conv1'] = conv_2d(x, 96, 11, strides=4, activation='relu')
    net['pool1'] = max_pool_2d(net['conv1'], 3, strides=2)
    net['conv2'] = conv_2d(net['pool1'], 256, 5, activation='relu')
    net['pool2'] = max_pool_2d(net['conv2'], 3, strides=2)
    net['conv3'] = conv_2d(net['pool2'], 384, 3, activation='relu')
    net['conv4'] = conv_2d(net['conv3'], 384, 3, activation='relu')
    net['conv5'] = conv_2d(net['conv4'], 256, 3, activation='relu')
    net['pool5'] = max_pool_2d(net['conv5'], 3, strides=2)
    net['fc6'] = fully_connected(net['pool5'], 4096, activation='relu')
    net['fc7'] = fully_connected(net['fc6'], 4096, activation='relu')
    net['output'] = fully_connected(net['fc7'], 1000, activation='softmax')
    return net['output']

net = {}
output = alexnet(images, net)
# access intermediate layers like net['conv5'], net['fc7'], etc.
```

Tensor Fetching: The Bad (i)
- Suggestion: Using a `dict` or a class instance (e.g. `self.conv5`) is a very good idea
```python
class AlexNetModel():
    # ...
    def build_model(self, x):
        assert x.get_shape().as_list() == [224, 224, 3]
        self.conv1 = conv_2d(x, 96, 11, strides=4, activation='relu')
        self.pool1 = max_pool_2d(self.conv1, 3, strides=2)
        self.conv2 = conv_2d(self.pool1, 256, 5, activation='relu')
        self.pool2 = max_pool_2d(self.conv2, 3, strides=2)
        self.conv3 = conv_2d(self.pool2, 384, 3, activation='relu')
        self.conv4 = conv_2d(self.conv3, 384, 3, activation='relu')
        self.conv5 = conv_2d(self.conv4, 256, 3, activation='relu')
        self.pool5 = max_pool_2d(self.conv5, 3, strides=2)
        self.fc6 = fully_connected(self.pool5, 4096, activation='relu')
        self.fc7 = fully_connected(self.fc6, 4096, activation='relu')
        self.output = fully_connected(self.fc7, 1000, activation='softmax')
        return self.output

model = AlexNetModel()
output = model.build_model(images)
# access intermediate layers like model.conv5, model.fc7, etc.
```

Tensor Fetching: The Bad (ii)
- The feed-forward (sometimes) needs to be done in an atomic way
  (i.e. a single call of `Session.run()`)
```python
# a single step of training ...
*[loss_value, _] = session.run([loss_op, train_op],
                               feed_dict={images: batch_image})
# After this, the model parameter has been changed due to `train_op`

# if np.isnan(loss_value):
# DEBUG : can we see the intermediate values for the current input?
[fc7, prob] = session.run([net['fc7'], net['prob']],
                          feed_dict={images: batch_image})
```

- In other words, if any input is fed via `feed_dict`,
  we may have to fetch the non-debugging-related tensors
  and the debugging-related tensors at the same time.
Tensor Fetching: The Bad (ii)
- In fact, we can just perform an additional `session.run()` for debugging purposes,
  if it does not involve any side effect

```python
# for debugging only, get the intermediate layer outputs.
[fc7, prob] = session.run([net['fc7'], net['prob']],
                          feed_dict={images: batch_image})

# Yet another feed-forward: 'fc7' is computed once more ...
[loss_value, _] = session.run([loss_op, train_op],
                              feed_dict={images: batch_image})
```

- A workaround: Use `Session.partial_run()` .gray[(undocumented, and still experimental)]

```python
h = sess.partial_run_setup([net['fc7'], loss_op, train_op], [images])
[loss_value, _] = sess.partial_run(h, [loss_op, train_op],
                                   feed_dict={images: batch_image})
fc7 = sess.partial_run(h, net['fc7'])
```

(2) Tensorboard
- An off-the-shelf monitoring and debugging tool!
- Check out a must-read from TensorFlow documentation
- You will need to learn
  - how to use and collect summaries
  - how to use `tf.summary.FileWriter` .dogdrip[(previously it was `SummaryWriter`)]
Tensorboard: A Quick Tutorial
```python
def multilayer_perceptron(x):
    # inside this, variables 'fc1/weights' and 'fc1/bias' are defined
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
*   tf.summary.histogram('fc1', fc1)
*   tf.summary.histogram('fc1/sparsity', tf.nn.zero_fraction(fc1))
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
*   tf.summary.histogram('fc2', fc2)
*   tf.summary.histogram('fc2/sparsity', tf.nn.zero_fraction(fc2))
    out = layers.fully_connected(fc2, 10, scope='out')
    return out

# ... (omitted) ...
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=pred, labels=y))
*tf.summary.scalar('loss', loss)

*# histogram summary for all trainable variables (slow?)
*for v in tf.trainable_variables():
*    tf.summary.histogram(v.name, v)
```

Tensorboard: A Quick Tutorial
```python
*global_step = tf.Variable(0, dtype=tf.int32, trainable=False)
train_op = tf.train.AdamOptimizer(learning_rate=0.001)\
           .minimize(loss, global_step=global_step)

def train(session):
    batch_size = 200
    session.run(tf.global_variables_initializer())
*   merged_summary_op = tf.summary.merge_all()
*   summary_writer = tf.summary.FileWriter(FLAGS.train_dir, session.graph)

    # ... (omitted) ...
    for step in range(batch_steps):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
*       _, c, summary = session.run(
            [train_op, loss, merged_summary_op],
            feed_dict={x: batch_x, y: batch_y})
*       summary_writer.add_summary(summary,
*                                  global_step.eval(session=session))
```

Tensorboard: A Quick Tutorial (Demo)
Scalar Summary
.center.img-100[![](images/tensorboard-01-loss.png)]
Tensorboard: A Quick Tutorial (Demo)
Histogram Summary (activations and variables)
.center.img-66[![](images/tensorboard-02-histogram.png)]
Tensorboard and Summary: Noteworthies
- Fetching histogram summaries is extremely slow!
  - GPU utilization can become very low (if the serialized values are huge)
  - In non-debugging mode, disable it completely; or fetch summaries only periodically, e.g.

```python
eval_tensors = [self.loss, self.train_op]
if step % 200 == 0:
    eval_tensors += [self.merged_summary_op]

eval_ret = session.run(eval_tensors, feed_dict)
eval_ret = dict(zip(eval_tensors, eval_ret))  # as a dict

current_loss = eval_ret[self.loss]
if self.merged_summary_op in eval_tensors:
    self.summary_writer.add_summary(
        eval_ret[self.merged_summary_op], current_step)
```

- I recommend taking only simple and essential scalar summaries (e.g. train/validation loss, overall accuracy, etc.), and including debugging summaries only on demand
Tensorboard and Summary: Noteworthies
- Some other recommendations:
- Use proper names (prefixed or scoped) for tensors and variables
  (specify `name=...` in tensor/variable declarations)
- Include both train loss and validation loss,
  plus train/validation accuracy (if possible) over steps
.center.img-50[![](images/tensorboard-02-histogram.png)]
(3) tf.Print()

- During run-time evaluation, we can print the value of a tensor
  .green[without] explicitly fetching and returning it to the code (i.e. via `session.run()`)

```python
tf.Print(input, data, message=None, first_n=None, summarize=None, name=None)
```

- It creates .blue[an identity op] with the side effect of printing `data`
  when this op is evaluated.
(3) tf.Print(): Examples
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu)
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu)
    out = layers.fully_connected(fc2, 10, activation_fn=None)
*   out = tf.Print(out, [tf.argmax(out, 1)],
*                  'argmax(out) = ', summarize=20, first_n=7)
    return out
```

For the first seven times (i.e. 7 feed-forwards or SGD steps),
it will print the predicted labels for 20 of the `batch_size` examples
```
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 6 4 4 6 4 4 6 6 4 0 6 4 6 4 4 6 0 4...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 6 0 0 3 6 4 3 6 6 3 4 4 4 4 4 3 4 6 7...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 4 0 6 6 6 0 7 3 0 6 7 3 6 0 3 4 3 3 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 1 0 0 0 3 3 7 0 8 1 2 0 9 9 0 3 4 6 6...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 0 9 0 4 9 9 0 8 2 7 3 9 1 8 3 9 7 8...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [6 0 1 1 9 0 8 3 0 9 9 0 2 6 7 7 3 3 3 9...]
I tensorflow/core/kernels/logging_ops.cc:79] argmax(out) = [3 6 9 8 3 9 1 0 1 1 9 3 2 3 9 9 3 0 6 6...]
[2016-06-03 00:11:08.661563] Epoch 00, Loss = 0.332199
```

(3) tf.Print(): Some drawbacks …
.red[Cons:]
- It is hard to take full control of print formats (e.g. how do we print a 2D tensor in a matrix format?)
- Usually, we may want to print debugging values conditionally
  (i.e. print them only if some condition is met)
  or periodically (i.e. only once per epoch);
  `tf.Print()` has limitations in achieving this
  - TensorFlow has control flow operations, but they would be an overkill here
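One simple workaround, outside the graph entirely, is to gate debug printing in the training loop itself. A minimal sketch in plain Python (the helper name and its arguments are hypothetical):

```python
def maybe_debug_print(step, steps_per_epoch, values):
    """Print debug values only at the first step of each epoch."""
    if step % steps_per_epoch == 0:
        # full control over formatting, unlike tf.Print()
        print('[debug] step %d: %s' % (step, values))
        return True
    return False

# e.g. inside the training loop:
#   maybe_debug_print(step, 300, {'loss': loss_value})
```

Since this runs in ordinary Python, the values must already have been fetched via `Session.run()`; the trade-off is one extra fetch versus arbitrary formatting and conditions.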
(3) tf.Assert()

- Asserts that the given condition is true when evaluated (during the computation)
- If the condition evaluates to `False`, the list of tensors in `data` is printed,
  and an error is thrown. `summarize` determines how many entries of the tensors to print.

```python
tf.Assert(condition, data, summarize=None, name=None)
```

tf.Assert: Examples
Abort the program if …
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
*   tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_positive')
    return out
```

–

The assertion will not work!
`tf.Assert` is also an op, so it needs to be executed as well
tf.Assert: Examples
We need to ensure that `assert_op` is executed when evaluating `out`:

```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
    # let's ensure that all the outputs in `out` are positive
    assert_op = tf.Assert(tf.reduce_all(out > 0), [out],
                          name='assert_out_positive')
*   with tf.control_dependencies([assert_op]):
*       out = tf.identity(out, name='out')
    return out
```

… somewhat ugly? … or:

```python
    # ... same as above ...
*   out = tf.with_dependencies([assert_op], out)
    return out
```

tf.Assert: Examples
Another good way: store all the created assertion operations into a collection,
(merge them into a single op), and explicitly evaluate them using Session.run()
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
*   tf.add_to_collection('Asserts',
*       tf.Assert(tf.reduce_all(out > 0), [out], name='assert_out_gt_0')
*   )
    return out

# merge all assertion ops from the collection
*assert_op = tf.group(*tf.get_collection('Asserts'))

... = session.run([train_op, assert_op], feed_dict={...})
```

Some built-in useful Assert ops
See in the docs!
```python
tf.assert_negative(x, data=None, summarize=None, name=None)
tf.assert_positive(x, data=None, summarize=None, name=None)
tf.assert_proper_iterable(values)
tf.assert_non_negative(x, data=None, summarize=None, name=None)
tf.assert_non_positive(x, data=None, summarize=None, name=None)
tf.assert_equal(x, y, data=None, summarize=None, name=None)
tf.assert_integer(x, data=None, summarize=None, name=None)
tf.assert_less(x, y, data=None, summarize=None, name=None)
tf.assert_less_equal(x, y, data=None, summarize=None, name=None)
tf.assert_rank(x, rank, data=None, summarize=None, name=None)
tf.assert_rank_at_least(x, rank, data=None, summarize=None, name=None)
tf.assert_type(tensor, tf_type)
tf.is_non_decreasing(x, name=None)
tf.is_numeric_tensor(tensor)
tf.is_strictly_increasing(x, name=None)
```

They are useful if we need runtime assertions during the computation.
(4) Step-by-step Debugger
Python already has powerful debugging utilities (`pdb`, `ipdb`, `pudb`),
which are all interactive debuggers (like gdb for C/C++)
- set breakpoint
- pause, continue
- inspect stack-trace upon exception
- watch variables and evaluate expressions interactively
Debugger: Usage
Insert set_trace() for a breakpoint
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
*   import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
    return out
```

.gray[(yeah, it is a breakpoint on model building)]
.img-75.center[
![](images/pdb-example-01.png)
]
Debugger: Usage
Debug breakpoints can be conditional:
```python
for i in range(batch_steps):
    batch_x, batch_y = mnist.train.next_batch(batch_size)
*   if (np.argmax(batch_y, axis=1)[:7] == [4, 9, 6, 2, 9, 6, 5]).all():
*       import pudb; pudb.set_trace()  # XXX BREAKPOINT
    _, c = session.run([train_op, loss],
                       feed_dict={x: batch_x, y: batch_y})
```

.green[Live Demo and Case Example!!] .small[(20-mnist-pdb.py)]

- Let's break on the training loop if some condition is met
- Get the `fc2` tensor, and fetch its evaluation result (given `x`)
- `Session.run()` can be invoked and executed anywhere, even in the debugger
- .dogdrip[Wow… gonna love it…]
Hint: Some Useful TensorFlow APIs
To get any .green[operations] or .green[tensors] that might not be stored explicitly:
- `tf.get_default_graph()`: Get the current (default) graph
- `G.get_operations()`: List all the TF ops in the graph
- `G.get_operation_by_name(name)`: Retrieve a specific TF op
  .gray[(Q. How to convert an operation to a tensor?)]
- `G.get_tensor_by_name(name)`: Retrieve a specific tensor
- `tf.get_collection(tf.GraphKeys.~~)`: Get the collection of some tensors

To get .green[variables]:

- `tf.get_variable_scope()`: Get the current variable scope
- `tf.get_variable()`: Get a variable (see )
- `tf.trainable_variables()`: List all the (trainable) variables

```python
[v for v in tf.all_variables() if v.name == 'fc2/weights:0'][0]
```

IPython.embed()
.green[ipdb/pudb:]

```python
import pudb; pudb.set_trace()
```

- They are debuggers; we can set breakpoints, see stacktraces, watch expressions, …
  (much more general)

.green[embed:]

```python
from IPython import embed; embed()
```

- Opens an ipython shell in the current context;
  mostly used for watching expressions only
(5) Debugging ‘inside’ the computation graph
Our debugging tools so far can be used for debugging outside Session.run().
Question: How can we run a .red[‘custom’] operation? (e.g. custom layer)
–
TensorFlow allows us to write custom operations in C++!
The ‘custom’ operation can be designed for logging or debugging purposes (like )
… but very burdensome (need to compile, define op interface, and use it …)
(5) Interpose any python code in the computation graph
We can also embed and interpose a python function in the graph:
`tf.py_func()` comes to the rescue!

```python
tf.py_func(func, inp, Tout, stateful=True, name=None)
```

- Wraps a python function and uses it .blue[as a tensorflow op].
- Given a python function `func`, which .green[takes numpy arrays] as its inputs and returns numpy arrays as its outputs, the function is wrapped as an operation.
```python
def my_func(x):
    # x will be a numpy array with the contents of the placeholder below
    return np.sinh(x)

inp = tf.placeholder(tf.float32, [...])
*y = tf.py_func(my_func, [inp], [tf.float32])
```

(5) Interpose any python code in the computation graph
In other words, we are now able to use the following (hacky) .green[tricks]
by intercepting the computation being executed on the graph:
- Print any intermediate values (e.g. layer activations) in a custom form,
  without fetching them
- Use a python debugger (e.g. trace and breakpoint)
- Draw a graph or plot using `matplotlib`, or save images into a file
.gray.small[Warning: Some limitations may exist, e.g. thread-safety issue, not allowed to manipulate the state of session, etc…]
Case Example (i): Print
An ugly example …
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

*   def _debug_print_func(fc1_val, fc2_val):
        print 'FC1 : {}, FC2 : {}'.format(fc1_val.shape, fc2_val.shape)
        print 'min, max of FC2 = {}, {}'.format(fc2_val.min(), fc2_val.max())
        return False

*   debug_print_op = tf.py_func(_debug_print_func, [fc1, fc2], [tf.bool])
    with tf.control_dependencies(debug_print_op):
        out = tf.identity(out, name='out')
    return out
```

Case Example (ii): Breakpoints!
An ugly example to attach breakpoints …
```python
def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
    fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
    out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

*   def _debug_func(x_val, fc1_val, fc2_val, out_val):
        if (out_val == 0.0).any():
*           import ipdb; ipdb.set_trace()  # XXX BREAKPOINT
*           from IPython import embed; embed()  # XXX DEBUG
        return False

*   debug_op = tf.py_func(_debug_func, [x, fc1, fc2, out], [tf.bool])
    with tf.control_dependencies(debug_op):
        out = tf.identity(out, name='out')
    return out
```

Another one: The tdb library
A third-party TensorFlow debugging tool: .small[https://github.com/ericjang/tdb]
.small[(not actively maintained and looks clunky, but still good for prototyping)]
.img-90.center[
![](https://camo.githubusercontent.com/4c671d2b359c9984472f37a73136971fd60e76e4/687474703a2f2f692e696d6775722e636f6d2f6e30506d58516e2e676966)
]
(6) tfdbg: The official TensorFlow debugger
Recent versions of TensorFlow include the official debugger (`tfdbg`).
Still experimental, but works quite well!
Check out the documentation and tutorials on tfdbg!!!
```python
import tensorflow.python.debug as tf_debug

sess = tf.Session()

# create a debug wrapper session
*sess = tf_debug.LocalCLIDebugWrapperSession(sess)

# Add a tensor filter (similar to breakpoint)
*sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)

# Each session.run() will be intercepted by the debugger,
# and we can inspect the value of tensors via the debugger interface
sess.run(loss, feed_dict={x: ...})
```

(6) tfdbg: The TensorFlow debugger
.img-100.center[
![](images/tfdbg_example1.png)
]
(6) tfdbg: The TensorFlow debugger
.img-100.center[
![](images/tfdbg_example2.png)
]
tfdbg: Features and Quick References
Conceptually, a wrapper session is employed (currently, a CLI debugger session); it can intercept a single `session.run()` call

- `run` / `r`: Execute the run() call .blue[with debug tensor-watching]
- `run -n` / `r -n`: Execute the run() call .blue[without] debug tensor-watching
- `run -f <filter_name>`: Keep executing run() calls until a dumped tensor passes a registered filter (a conditional breakpoint)
  - e.g. `has_inf_or_nan`
.img-80.center[
![](images/tfdbg_example_run.png)
]
tfdbg: Tensor Filters
Registering tensor filters:
```python
sess = tf_debug.LocalCLIDebugWrapperSession(sess)
sess.add_tensor_filter("has_inf_or_nan", tf_debug.has_inf_or_nan)
```

Tensor filters are just python functions `(datum, tensor) -> bool`:

```python
def has_inf_or_nan(datum, tensor):
    _ = datum  # Datum metadata is unused in this predicate.
    if tensor is None:
        # Uninitialized tensor doesn't have bad numerical values.
        return False
    elif (np.issubdtype(tensor.dtype, np.float) or
          np.issubdtype(tensor.dtype, np.complex) or
          np.issubdtype(tensor.dtype, np.integer)):
        return np.any(np.isnan(tensor)) or np.any(np.isinf(tensor))
    else:
        return False
```

Running tensor filters is, therefore, quite slow.
tfdbg: Tensor Fetching
In a tensor dump mode (the run-end UI), the debugger shows the list of tensors dumped in the session.run() call:
.img-100.center[
![](images/tfdbg_example_fetch.png)
]
tfdbg: Tensor Fetching
Commands:
- .blue[`list_tensors` (`lt`)]: Show the list of dumped tensor(s).
- .blue[`print_tensor` (`pt`)]: Print the value of a dumped tensor.
- `node_info` (`ni`): Show information about a node
  - `ni -t`: Show the traceback of the tensor's creation
- `list_inputs` (`li`): Show inputs to a node
- `list_outputs` (`lo`): Show outputs of a node
- `run_info` (`ri`): Show the information of the current run
  (e.g. what to fetch, what the feed_dict is)
- .green[`invoke_stepper` (`s`)]: Invoke the stepper!
- `run` (`r`): Move to the next run
tfdbg: Tensor Fetching
Example: `print_tensor fc2/Relu:0`
.img-80.center[
![](images/tfdbg_example_pt.png)
]
- Slicing: `pt fc2/Relu:0[0:10]`
- Dumping: `pt fc2/Relu:0 > /tmp/debug/fc2.txt`
See also:
tfdbg: Stepper
Shows the tensor value(s) in a topologically-sorted order for the run.
.img-100.center[
![](images/tfdbg_example_stepper.png)
]
tfdbg: Screencast and Demo!
.small.right[From Google Brain Team]
.small[
See also:
]
tfdbg: Other Remarks
- Currently, it is actively being developed (still experimental)
- In the near future, a web-based interactive debugger (integrated with TensorBoard) will be out!
Debugging: Summary
- `Session.run()`: Explicitly fetch, and print
- Tensorboard: Histogram and Image Summary
- The `tf.Print()` and `tf.Assert()` operations
- Use python debuggers (`ipdb`, `pudb`)
- Interpose your debugging python code in the graph
- The TensorFlow debugger: `tfdbg`
.green[There is no silver bullet; one might need to choose the most convenient and suitable debugging tool, depending on the case]
template: inverse
Other General Tips
.gray[(in a Programmer’s Perspective)]
General Tips of Debugging
- Learn to use debugging tools, but do not solely rely on them when a problem occurs.
- Sometimes, just sitting down and carefully reading through your code with ☕ (a careful code review!) can be greatly helpful.
General Tips from Software Engineering
Almost .red[all] of rule-of-thumb tips and guidelines for writing good, neat, and defensive codes can be applied to TensorFlow codes 😃
- Check and sanitize inputs
- Logging
- Assertions
- Proper usage of Exceptions
- Fail fast: immediately abort if something is wrong
- DRY: Don't Repeat Yourself
- Build up well-organized codes, test smaller modules
- etc …
There are some good guides on the web like
Use asserts as much as you can
- Use assertions anywhere (failing early is always good)
- e.g. data processing routines (input sanity checks)
- Especially, perform .red[shape checks] for tensors (like 'static' type checking when compiling code)
```python
net['fc7'] = tf.nn.xw_plus_b(net['fc6'], vars['fc7/W'], vars['fc7/b'])
assert net['fc7'].get_shape().as_list() == [None, 4096]
*net['fc7'].get_shape().assert_is_compatible_with([B, 4096])
```

- Sometimes, the `tf.Assert()` operation might be helpful (a run-time check); it should be turned off when we are not debugging
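The shape check above can also be sketched without TF at all; here is a minimal NumPy version in which `None` acts as a wildcard dimension, analogous to an unknown batch size in a `TensorShape` (the helper name is my own, not a TF API):

```python
import numpy as np

def assert_compatible_shape(array, expected):
    # 'None' in expected matches any size on that axis, like an
    # unknown (batch) dimension in a TF TensorShape.
    actual = array.shape
    assert len(actual) == len(expected), \
        "rank mismatch: %r vs %r" % (actual, expected)
    for a, e in zip(actual, expected):
        assert e is None or a == e, \
            "shape %r is not compatible with %r" % (actual, expected)

fc7 = np.zeros((32, 4096))                  # pretend output of an fc layer
assert_compatible_shape(fc7, (None, 4096))  # unknown batch dim: OK
```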
Use proper logging
- Verbose logging helps a lot (training hyperparameter configuration, train/validation loss monitoring, learning rate, elapsed time, etc.)
.img-100.center[
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-0ecQPkr1-1603170186540)(images/logging-example.png)]
]
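A minimal sketch of such verbose logging with Python's standard `logging` module (the hyperparameters and loss values are placeholders, not from the talk):

```python
import logging
import time

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')
log = logging.getLogger('train')

# Log the full configuration once, up front.
config = {'learning_rate': 0.001, 'batch_size': 128, 'max_epochs': 2}
log.info('config: %s', config)

start = time.time()
for epoch in range(config['max_epochs']):
    train_loss = 1.0 / (epoch + 1)   # placeholder for the real loss
    log.info('epoch %d: train loss %.4f (elapsed %.1fs)',
             epoch, train_loss, time.time() - start)
```

When a run goes wrong days later, this record of hyperparameters and losses is often the only evidence you have.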
Guard against Numerical Errors
Quite often, `NaN` occurs during training
- We usually deal with `float32`, which is not a precise datatype; deep learning models are susceptible to numerical instability
- Some possible reasons:
  - the gradient is too big (clipping may be required)
  - zero or negative values are passed into `sqrt` or `log`
- Check whether a value is `NaN`, and whether it is finite
- Some useful TF APIs exist for this, e.g. `tf.check_numerics()` and `tf.add_check_numerics_ops()`
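The checks above can be sketched in plain NumPy: detect non-finite values early, and guard `log` against zeros with a small epsilon (the helper names and the 1e-8 epsilon are illustrative choices of mine):

```python
import numpy as np

def check_finite(name, x):
    # Fail fast if a value contains NaN or Inf, in the spirit of
    # TF's numeric-checking ops.
    if not np.all(np.isfinite(x)):
        raise ValueError('%s contains NaN or Inf' % name)

def safe_log(x, eps=1e-8):
    # Clip away zeros/negatives before log to avoid -inf / NaN.
    return np.log(np.maximum(x, eps))

probs = np.array([0.5, 0.0, 1.0])
loss = -safe_log(probs)        # finite, thanks to the epsilon guard
check_finite('loss', loss)
```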
Name your tensors properly
- It is recommended to specify names for intermediate tensors and variables when building a model
- Using variable scopes properly is also a very good idea
- When something goes wrong, we can easily figure out where the error came from
```
ValueError: Cannot feed value of shape (200,) for Tensor u'Placeholder_1:0', which has shape '(?, 10)'
ValueError: Tensor conversion requested dtype float32 for Tensor with
* dtype int32: 'Tensor("Variable_1/read:0", shape=(256,), dtype=int32)'
```

A better stacktrace:

```
ValueError: Cannot feed value of shape (200,) for Tensor u'placeholder_y:0', which has shape '(?, 10)'
ValueError: Tensor conversion requested dtype float32 for Tensor with
* dtype int32: 'Tensor("fc1/weights/read:0", shape=(256,), dtype=int32)'
```

Name your tensors properly
```python
def multilayer_perceptron(x):
    W_fc1 = tf.Variable(tf.random_normal([784, 256], 0, 1))
    b_fc1 = tf.Variable([0] * 256)   # wrong here!!
    fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1)
    # ...

>>> fc1
<tf.Tensor 'xw_plus_b:0' shape=(?, 256) dtype=float32>
```

Better:

```python
def multilayer_perceptron(x):
    W_fc1 = tf.Variable(tf.random_normal([784, 256], 0, 1), name='fc1/weights')
    b_fc1 = tf.Variable(tf.zeros([256]), name='fc1/bias')
    fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1, name='fc1/linear')
    fc1 = tf.nn.relu(fc1, name='fc1/relu')
    # ...

>>> fc1
<tf.Tensor 'fc1/relu:0' shape=(?, 256) dtype=float32>
```

Name your tensors properly
The style that I much prefer:

```python
def multilayer_perceptron(x):
*   with tf.variable_scope('fc1'):
        W_fc1 = tf.get_variable('weights', [784, 256])   # fc1/weights
        b_fc1 = tf.get_variable('bias', [256])           # fc1/bias
        fc1 = tf.nn.xw_plus_b(x, W_fc1, b_fc1)           # fc1/xw_plus_b
        fc1 = tf.nn.relu(fc1)                            # fc1/relu
    # ...
```

or use high-level APIs or your custom functions:

```python
import tensorflow.contrib.layers as layers

def multilayer_perceptron(x):
    fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
*                                scope='fc1')
    # ...
```

And more style guides…?

.large[Toward Best Practices of TensorFlow Code Patterns]
.small[https://github.com/wookayin/TensorFlowKR-2017-talk-bestpractice]
all of which will help you write easily debuggable code!
Other Topics: Performance and Profiling
Run-time performance is a very important topic!
.dogdrip[there will be another lecture soon…]
- Beyond the scope of this talk…
Make sure that your GPU utilization is always non-zero (and near 100%)
- Watch and monitor it using `nvidia-smi` or `gpustat`
.img-75.center[
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-PTfXYqM5-1603170186541)(https://github.com/wookayin/gpustat/raw/master/screenshot.png)]
]
.img-50.center[
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-4bbmwNPw-1603170186543)(images/nvidia-smi.png)]
]
Other Topics: Performance and Profiling
- Some possible factors that might slow down your code:
- Input batch preparation (i.e. `next_batch()`)
- Too frequent or heavy summaries
- Inefficient model (e.g. CPU-bottlenecked operations)
- What we can do:
  - Use `tfdbg`!!!
  - Use a Python profiler (e.g. `cProfile`), or profiling magics in IPython
  - Use `nvprof` for profiling CUDA operations
- Use CUPTI (CUDA Profiling Tools Interface) for TF
.img-66.center[[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-vpJHn9zo-1603170186544)(images/tracing-cupti.png)]]
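On the Python side, the standard-library `cProfile` module is one concrete way to find where the time goes (the slow `next_batch`-style function below is made up for illustration):

```python
import cProfile
import io
import pstats

def next_batch(n=1000):
    # Deliberately does some Python-level work, standing in for a
    # slow input-preparation step.
    return [i * i for i in range(n)]

pr = cProfile.Profile()
pr.enable()
for _ in range(100):
    next_batch()
pr.disable()

s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats('cumulative').print_stats(5)
print(s.getvalue())   # the report shows time spent in next_batch
```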
Concluding Remarks
We have talked about
- How to debug TensorFlow and python applications
- Some tips and guidelines for easy debugging: write nice code that requires less debugging 😃
name: last-page
class: center, middle, no-number
Special Thanks to:
Juyong Kim, Yunseok Jang, Junhyug Noh, Cesc Park, Jongho Park
TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc.
.img-66[[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-1tgM5Pla-1603170186545)(images/tensorflow-logo.png)]]
name: last-page
class: center, middle, no-number
Thank You!
.footnote[Slideshow created using .]