TensorFlow Strikes Back
EuroPython 2019 - Michele "Ubik" De Simoni
Phenomenal computational powers ...
A Python crust with a creamy C++ filling:
- Amazing performance
- Easy to deploy in production, especially at massive Google-like scale
- The Static Graph meant you could code your model in Python and then export it as a protobuf object to work with whatever language you wanted
- Highly performant input pipelines built in
Ugly, Clunky API
However, for all its capabilities, the fact remains that TF 1.X fell very short in terms of API design and usability.
- An unclear API littered with redundancy, grown into an exploding mess
- The use of a Statically Defined Computational Graph meant the way to think about the graph was Define and Run, in contrast to later frameworks implementing a Dynamically Defined Graph, such as PyTorch.
- The explicit need to use Graph-dependent scopes and variables, which clashed with the concept of Pythonic code. These objects did not behave as normal Python objects and thus required familiarity to work with them.
> Be a Machine Learning practitioner
> Working with TensorFlow required rethinking your coding habits
> Obstacles during debugging + required adaptation = slow development for newcomers
> Plenty of newcomers due to the ML craze
> Most people (i.e., researchers) were not leveraging the production capabilities of the framework
> PyTorch comes along, lighting a fire under TF's comfy position as the ONE FRAMEWORK by covering all the usability issues while offering performant eager computation (production deployment of the models was, and in my opinion still is, inferior to TF in terms of easiness and raw performance, but PyTorch is quickly gaining ground)
> People, especially in academia, start migrating towards it
> You start thinking about refactoring everything to PyTorch
> You panic
> You vomit
> You cry yourself to sleep
> Decide to wait
> You wait some more
> Summer 2018 comes, you try TF Eager Mode but no banana
> PyTorch 1.0 releases
> You go over the panic cycle again
> You wait another while
> Competition does its magic and TF starts shifting to meet its user base
> Now you maybe don't need to learn two frameworks
You either die a static graph or live long enough to become Eager by default
TensorFlow’s eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later.
TensorFlow 1.X requires users to manually stitch together an abstract syntax tree (the graph) by making tf.* API calls. It then requires users to manually compile the abstract syntax tree by passing a set of output tensors and input tensors to a session.run() call.
TensorFlow 2.0 executes eagerly (like Python normally does) and in 2.0, graphs and sessions should feel like implementation details.
One notable byproduct of eager execution is that tf.control_dependencies() is no longer required, as all lines of code execute in order (within a tf.function, code with side effects executes in the order written).
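As a minimal sketch of what "operations return concrete values" means in practice (variable names here are illustrative):

```python
import tensorflow as tf

# No graph, no session: ops run as soon as they are called.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)

# `y` is a concrete tf.Tensor holding real values right now.
print(y.numpy())  # [[ 7. 10.] [15. 22.]]
```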
🎉 Yes, we can now debug and write Python code (especially control flow) as in vanilla Python. 🎉
♻️ Hasta la vista globals ♻️
In a decision of environmental responsibility, the TensorFlow team, according to the Variables 2.0 RFC, decided it was time to let orphaned global variables be handled by the Garbage Collector.
So you either keep track of your Python pointers or you lose your variables.
While this may seem to create additional overhead for the end user, together with Keras objects that behave like sane Python objects (and handle this tracking for you), it actually simplifies code structure and usability by quite a lot.
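A small sketch of what this tracking means in practice, using only standard tf.keras APIs: a Keras layer keeps references to the variables it creates, while a free-standing tf.Variable lives and dies with its Python reference.

```python
import tensorflow as tf

# Keras objects track their own variables for you:
layer = tf.keras.layers.Dense(4)
layer.build(input_shape=(None, 8))
print(len(layer.variables))  # 2: kernel and bias

# A loose variable is just a Python object now; drop the last
# reference and the GC reclaims it -- no hidden global collection.
v = tf.Variable(3.0)
v = None
```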
☮️ Make ~~love~~ Functions Not ~~war~~ Sessions ☮️
TensorFlow has gone from a pure Static Graph to Eager by default, while keeping the power and versatility of Graph compilation through the tf.function decorator. This feature is built on top of TF AutoGraph, which lets users define static graphs using a natural, familiar Python syntax. Applying tf.function to a Python function converts the code from pure vanilla Python syntax to a static Graph. Gone is the need to use TensorFlow control-flow primitives: you can write Python control flow and have it converted to highly performant static graph code.
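As a minimal sketch (the function name is made up), a data-dependent `if` written in plain Python gets rewritten by AutoGraph into graph control flow:

```python
import tensorflow as tf

@tf.function
def keep_if_positive_sum(x):
    # Plain Python control flow on tensor values:
    # AutoGraph rewrites this `if` into graph ops (tf.cond).
    if tf.reduce_sum(x) > 0:
        return x
    return tf.zeros_like(x)

print(keep_if_positive_sum(tf.constant([1.0, 2.0])).numpy())    # [1. 2.]
print(keep_if_positive_sum(tf.constant([-1.0, -2.0])).numpy())  # [0. 0.]
```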
The magic goes deeper but to cover it you will need a dedicated talk.
Luckily you can delve into the mysteries of Autograph later with
Paolo Galeone's talk: Dissecting tf.function to discover AutoGraph strengths and subtleties
The Death of an API
tf.layers is dead
tf.Graph is gone (into hiding)
tf.contrib is dead
🎉 It was about time 🎉
Long Live Keras
- One and only Layer API
- Available under tf.keras.layers
- Pythonic Object
- Simple to use:
- Initialize the Layer
- Use layers as callables
```python
awesome_layer = keras.layers.Dense(
    units=50,
    activation="tanh",
)
awesome_layer(awesome_inputs)
```
Losses now are contained in the tf.keras.losses module.
While TF 2.0 offers an ample out-of-the-box selection of them, you can easily define custom ones by subclassing tf.keras.losses.Loss; the only strict requirement is the implementation of a call() method accepting y_true and y_pred as arguments.
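For example, a custom loss might be sketched like this (a hypothetical Huber-style loss, not one from the talk):

```python
import tensorflow as tf

class MyHuberLoss(tf.keras.losses.Loss):
    """Hypothetical example of subclassing tf.keras.losses.Loss."""

    def __init__(self, delta=1.0, **kwargs):
        super().__init__(**kwargs)
        self.delta = delta

    def call(self, y_true, y_pred):
        # The only strict requirement: implement call(y_true, y_pred).
        error = y_true - y_pred
        small = tf.abs(error) <= self.delta
        squared = 0.5 * tf.square(error)
        linear = self.delta * (tf.abs(error) - 0.5 * self.delta)
        return tf.where(small, squared, linear)

loss = MyHuberLoss()
print(float(loss(tf.constant([0.0]), tf.constant([0.5]))))  # 0.125
```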
Gone are the old TF optimizers; they now live under the tf.keras.optimizers module.
We will see later how they are used during the training step.
Keras Model API
Keras Sequential Model
The sequential model is the most straightforward of the three APIs:
it consists of a stacked sequence of layers, each feeding into the next in a linear graph.
It can be constructed either by passing a list of keras.layers at instantiation, or built by iteratively calling its add() method and passing the desired layer.
```python
model = tf.keras.Sequential([
    tf.keras.layers.Dense(...),
    tf.keras.layers.BatchNormalization(...),
    tf.keras.layers.Dense(...),
    tf.keras.layers.Softmax(...),
])
```
```python
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(...))
model.add(tf.keras.layers.BatchNormalization(...))
model.add(tf.keras.layers.Dense(...))
model.add(tf.keras.layers.Softmax(...))
```
Keras Model Functional API
While keras.Sequential is enough to cover a vast portion of use cases, there are times when a linear stack of layers simply cannot represent a more complex architecture.
The Keras Functional API comes to the rescue by offering us control over how and when layers are instantiated and called.
[The Functional API] can handle models with non-linear topology, models with shared layers, and models with multiple inputs or outputs.
It's based on the idea that a deep learning model is usually a directed acyclic graph (DAG) of layers. The Functional API provides a set of tools for building graphs of layers.
For more information see: The Keras Functional API in TensorFlow
1) Start by creating an Input Node. NOTE: we never specify the batch size. What gets returned, inputs, contains information about the shape and dtype of the input data that you expect to feed to your model:

```python
inputs = tf.keras.Input(shape=(784,))
```
2) To add nodes to the graph, simply call a tf.keras.layers.Layer on the inputs object. Layers need to be initialized before the __call__() method is exposed.
```python
inputs = tf.keras.Input(shape=(784,), name="img")
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.Dense(64, activation="relu")(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
```
3) We can create a Model by specifying its inputs and outputs in the graph of layers. NOTE: in case of a Model with multiple inputs/outputs, we just need to pass them as lists.

```python
model = keras.Model(inputs=inputs, outputs=outputs)
```
4) After that, our model can be inspected with model.summary():

```
Model: "mnist_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
img (InputLayer)             [(None, 784)]             0
_________________________________________________________________
dense_3 (Dense)              (None, 64)                50240
_________________________________________________________________
dense_4 (Dense)              (None, 64)                4160
_________________________________________________________________
dense_5 (Dense)              (None, 10)                650
=================================================================
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________
```
Keras Model Subclassing (Chainer) API
Made popular by the Chainer Deep Learning Framework, this is regarded as the API for the 1% researcher.
The idea is simple: by subclassing tf.keras.Model we can define our own forward pass. Create layers in the __init__ method and set them as attributes of the class instance; define the forward pass in the call() method.
NOTE: Model subclassing is particularly useful when eager execution is enabled,
because it allows the forward pass to be written imperatively.
While this way of building models is extremely powerful, it comes at a higher cost in terms of bug-likelihood and technical debt.
The following example shows a subclassed tf.keras.Model using a custom forward pass that does not have to be run imperatively.
```python
class MyModel(tf.keras.Model):
    def __init__(self, num_classes=10):
        super(MyModel, self).__init__(name="my_model")
        self.num_classes = num_classes
        # Define your layers here.
        self.dense_1 = tf.keras.layers.Dense(32, activation="relu")
        self.dense_2 = tf.keras.layers.Dense(num_classes, activation="sigmoid")

    def call(self, inputs):
        # Define your forward pass here,
        # using layers you previously defined (in `__init__`).
        x = self.dense_1(inputs)
        return self.dense_2(x)
```
Note: to make sure the forward pass is always run imperatively, you must set
dynamic=True when calling the super constructor.
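A minimal sketch of that flag (the class name is illustrative):

```python
import tensorflow as tf

class AlwaysEagerModel(tf.keras.Model):
    def __init__(self):
        # dynamic=True forces the forward pass to run imperatively,
        # even when the model is later used with compile()/fit().
        super().__init__(dynamic=True)
        self.dense = tf.keras.layers.Dense(1)

    def call(self, inputs):
        # Arbitrary Python (prints, debuggers) is safe here.
        return self.dense(inputs)

model = AlwaysEagerModel()
out = model(tf.ones((2, 3)))
print(out.shape)  # (2, 1)
```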
High Performance Input Pipeline with `tf.data`
TensorFlow 1.X had support for highly performant input pipelines in the form of the tf.data module. With 2.0 the module is still there, and possibly even easier to use than before.
The tf.data API introduces a tf.data.Dataset abstraction that represents a sequence of elements, in which each element consists of one or more Tensor objects. For example, in an image pipeline, an element might be a single training example, with a pair of tensors representing the image and its label.
There are two distinct ways to create a dataset:
- A data source constructs a Dataset from data stored in memory or in one or more files.
- A data transformation constructs a dataset from one or more tf.data.Dataset objects.
- In-memory data:
- Input data stored in the recommended TFRecord format:
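The two sources above can be sketched as follows (the TFRecord file names are placeholders):

```python
import numpy as np
import tensorflow as tf

# In-memory data:
features = np.random.random((4, 2)).astype("float32")
labels = np.array([0, 1, 0, 1])
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Input data stored in TFRecord files (paths are placeholders):
# record_dataset = tf.data.TFRecordDataset(
#     ["train-000.tfrecord", "train-001.tfrecord"]
# )
```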
Once you have a Dataset object, you can transform it into a new Dataset by chaining method calls on it. For example, you can apply per-element transformations such as Dataset.map(), and multi-element transformations such as Dataset.batch(). See the documentation for tf.data.Dataset for a complete list of transformations.
NOTE: In TF 2.0 the Dataset object is a Python iterable. This makes it possible to consume its elements using a for loop; another option is creating an iterator using Python iter() and consuming it with next().
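A minimal sketch of both consumption styles:

```python
import tensorflow as tf

dataset = tf.data.Dataset.range(8)
dataset = dataset.map(lambda x: x * 2).batch(4)

# A Dataset is a Python iterable:
for batch in dataset:
    print(batch.numpy())  # [0 2 4 6], then [ 8 10 12 14]

# Or create an iterator explicitly and consume it with next():
it = iter(dataset)
first_batch = next(it)
```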
After having created your model with one of the above APIs, the last step is to let Keras bundle Model, Loss and Optimizer together.
This bit of magic is done via the tf.keras.Model.compile() method which, once called, configures the model for training.
tf.keras.Model.compile takes three important arguments:
- optimizer: This object specifies the training procedure. Pass it optimizer instances from the tf.keras.optimizers module, such as tf.keras.optimizers.SGD. If you just want to use the default parameters, you can also specify optimizers via strings, such as 'adam' or 'sgd'.
- loss: The function to minimize during optimization. Common choices include mean square error (mse), categorical_crossentropy, and binary_crossentropy. Loss functions are specified by name or by passing a callable object from the tf.keras.losses module.
- metrics: Used to monitor training. These are string names or callables from the tf.keras.metrics module.
- Additionally, to make sure the model trains and evaluates eagerly, you can pass run_eagerly=True as a parameter to compile().
```python
model = tf.keras.Sequential([
    # Adds a densely-connected layer with 64 units to the model:
    tf.keras.layers.Dense(64, activation="relu", input_shape=(32,)),
    # Add another:
    tf.keras.layers.Dense(64, activation="relu"),
    # Add a softmax layer with 10 output units:
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)
```
Unless you are in a particular use case, Model.fit is the vastly preferred way to train Keras Models.
It is fast, optimized and reliable.
tf.keras.Model.fit() has many optional arguments, but these 5 are the most important ones:
x: Input data
y: Target data
epochs: Training is structured into epochs.
An epoch is one iteration over the entire input data (this is done in smaller batches).
batch_size: When passed NumPy data, the model slices the data into smaller batches and iterates over these batches during training.
This integer specifies the size of each batch.
Be aware that the last batch may be smaller if the total number of samples is not divisible by the batch size.
validation_data: When prototyping a model, you want to easily monitor its performance on some validation data.
Passing this argument—a tuple of inputs and labels—allows the model to display
the loss and metrics in inference mode for the passed data, at the end of each epoch.
```python
import numpy as np

data = np.random.random((1000, 32))
labels = random_one_hot_labels((1000, 10))

val_data = np.random.random((100, 32))
val_labels = random_one_hot_labels((100, 10))

model.fit(
    data,
    labels,
    epochs=10,
    batch_size=32,
    validation_data=(val_data, val_labels),
)
```
Here, the fit method uses the steps_per_epoch argument: this is the number of training steps the model runs before it moves to the next epoch. Since the Dataset yields batches of data, this snippet does not require a batch_size. A tf.data.Dataset can also be used for validation:
```python
dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32).repeat()

val_dataset = tf.data.Dataset.from_tensor_slices((val_data, val_labels))
val_dataset = val_dataset.batch(32).repeat()

model.fit(
    dataset,
    epochs=10,
    steps_per_epoch=30,
    validation_data=val_dataset,
    validation_steps=3,
)
```
The Dark Powers: Custom Training
Beyond the safety of model.fit() and model.evaluate() lies the dark power of the custom training loop; those power-mad practitioners who embrace it trade their sanity in exchange for perfect control over the training.
Ominous description aside, this is a powerful technique that can make the training of complex architectures more feasible, at the cost of more possible errors.
Calling a model inside a GradientTape scope enables you to retrieve the gradients of the trainable weights of the layer with respect to a loss value. Using a tf.keras.optimizers optimizer instance, you can use these gradients to update these variables (which you can retrieve using model.trainable_weights).
```python
for epoch in range(3):
    print("Start of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables autodifferentiation.
        with tf.GradientTape() as tape:
            # Run the forward pass of the layer.
            # The operations that the layer applies to its inputs
            # are going to be recorded on the GradientTape.
            logits = model(x_batch_train)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve the gradients
        # of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %s: %s"
                % (step, float(loss_value))
            )
            print("Seen so far: %s samples" % ((step + 1) * 64))
```
Exporting is dead easy: model.save for the Sequential and Functional APIs, with four possible choices:
- Whole-model saving
- Export the whole model as a SavedModel
- Weights-only saving in SavedModel format
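A sketch of the saving calls (paths are placeholders; assuming the standard tf.keras saving APIs):

```python
import tensorflow as tf

model = tf.keras.Sequential(
    [tf.keras.layers.Dense(1, input_shape=(4,))]
)
model.compile(optimizer="sgd", loss="mse")

model.save("exported_model")        # whole model, SavedModel format
model.save_weights("ckpt/weights")  # weights only, TF checkpoint format

restored = tf.keras.models.load_model("exported_model")
```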
If you are using PyTorch --> Try TF 2.0
If you love Keras --> Try TF 2.0
If TF 1.X scared you away (for good reason) --> Try TF 2.0
... Most importantly ...