TensorFlow 2.0
TensorFlow Strikes Back


EuroPython 2019 - Michele "Ubik" De Simoni


https://talks.ubik.dev/tf2-tensorflow-strikes-back

It's a me, Ubik!

avatar
  • Mad Scientist

  • Machine Learning Engineer + Research @zuru.tech

  • Freelancer @ubik.tech

  • PyData Emilia Romagna Founder & Organizer

  • GDG Bologna Manager

Websites

ubik.tech

Consultancy

ubik.dev

Personal Site

essays.ubik.tech

Technical Blog

journal.ubik.dev

Personal Blog

Phenomenal Computational powers ...

  • Python crust with a creamy C++ core
  • Amazing performance
  • Easy to deploy in production, especially at massive Google-like scale
  • Static Graph meant you could code your model in Python and then export it as a proto
    object to work with whatever language you wanted
  • Highly performant input pipelines via the tf.data module

Ugly, Clunky API

However, for all its capabilities, it is a matter of fact that TF 1.X fell very short in terms of API design and usability.

  • Unclear API littered with redundancy, plus an exploding mess in the form of the tf.contrib module.
  • The use of a Statically Defined Computational Graph meant the mental model was Define and Run,
    in contrast to later frameworks implementing a Dynamically Defined Graph, such as PyTorch
    (a minimal sketch of the contrast follows this list).
  • The explicit need to use Graph-dependent scopes and variables, which clashed with the concept of Pythonic code.
    These objects did not behave like normal Python objects and thus required familiarity to work with them.
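To make the contrast concrete, here is a minimal define-and-run sketch (assuming a TF 1.X installation; under 2.0 the same calls only survive under tf.compat.v1):

import tensorflow as tf  # TF 1.X

# Define: build the graph first...
a = tf.placeholder(tf.float32, shape=())
b = tf.placeholder(tf.float32, shape=())
c = a + b

# ...Run: then execute it inside a Session.
with tf.Session() as sess:
    print(sess.run(c, feed_dict={a: 1.0, b: 2.0}))  # 3.0
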
> Be a Machine Learning Engineer
> Working with TensorFlow required rethinking your coding habits
> Obstacles during debugging + required adaptation = Slow development for newcomers
> Plenty of newcomers due to the ML Craze
> Most people (i.e., researchers) were not leveraging the production capabilities of the framework
> PyTorch comes along, lighting a fire under TF's comfy position as the ONE FRAMEWORK by fixing all the usability issues
while offering performant eager computation (production deployment of models was,
and in my opinion still is, inferior to TF in terms of ease and raw performance, but PyTorch is quickly gaining ground)
> People, especially in academia, start migrating towards it
> You start thinking about refactoring everything to PyTorch
> You panic
> You vomit
> You cry yourself to sleep
> Decide to wait
> You wait some more
> Summer 2018 comes, you try TF Eager Mode but no banana
> PyTorch 1.0 releases
> You go over the panic cycle again
> You wait another while
> Competition does its magic and TF starts shifting to meet its user base
> Now you maybe don't need to learn two frameworks

Eager to try Eager Mode? 🤔

You either die a static graph or live long enough to become Eager by default


TensorFlow’s eager execution is an imperative programming environment that evaluates operations immediately, without building graphs: operations return concrete values instead of constructing a computational graph to run later.

TensorFlow 1.X requires users to manually stitch together an abstract syntax tree (the graph) by making tf.* API calls. It then requires users to manually compile the abstract syntax tree by passing a set of output tensors and input tensors to a session.run() call.
TensorFlow 2.0 executes eagerly (like Python normally does) and in 2.0, graphs and sessions should feel like implementation details.
One notable byproduct of eager execution is that tf.control_dependencies() is no longer required, as all lines of code execute in order (within a tf.function, code with side effects executes in the order written).
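A minimal sketch of what eager execution means in practice (standard TF 2.0 API):

import tensorflow as tf

# Ops run immediately and return concrete values; no graph, no session.
x = tf.constant([[1.0, 2.0], [3.0, 4.0]])
y = tf.matmul(x, x)
print(y.numpy())
# [[ 7. 10.]
#  [15. 22.]]
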
Eager in the sign language

🎉 Yes, we can now debug and write Python code (especially control flow) as in vanilla Python. 🎉

♻️ Hasta la vista globals ♻️

In a decision of environmental responsibility, the TensorFlow team, following the Variables 2.0 RFC, decided it was time to let orphaned global variables be handled by the Garbage Collector.
So you either keep track of your Python references or you lose your variables.
While this may seem to create additional overhead for the end-user, together with Keras objects that behave like sane Python objects (and handle this tracking for you), it actually simplifies code structure and usability by quite a lot.
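A minimal sketch of the new behaviour (standard TF 2.0 API):

import tensorflow as tf

# A tf.Variable is now an ordinary Python object: keep a reference and it lives,
# drop the reference and the Garbage Collector reclaims it.
v = tf.Variable(1.0)
v.assign_add(2.0)
print(v.numpy())  # 3.0
# No tf.global_variables_initializer(), no hidden global collections.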

Terminator: Hasta la vista baby

☮️ Make ~~love~~ Functions Not ~~war~~ Sessions ☮️

Fun, Fun, Functions

TensorFlow has gone from a Pure Static Graph to Eager by default, while keeping the power and versatility of graph compilation through the tf.function decorator. This feature is built on top of TF AutoGraph, which lets users define static graphs using natural, familiar Python syntax. Applying tf.function to a Python function converts the code from pure vanilla Python syntax to a static graph. Gone is the need to use TensorFlow control-flow primitives: you can write Python control flow and have it converted to highly performant static-graph code.
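A minimal sketch (standard TF 2.0 API; the function itself is just an illustrative example):

import tensorflow as tf

@tf.function
def clip_to_zero(x):
    # Plain Python control flow: AutoGraph converts this data-dependent
    # `if` into graph ops (tf.cond) behind the scenes.
    if tf.reduce_sum(x) > 0:
        return x
    return tf.zeros_like(x)

print(clip_to_zero(tf.constant([1.0, 2.0])))    # [1. 2.]
print(clip_to_zero(tf.constant([-1.0, -2.0])))  # [0. 0.]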


The magic goes deeper, but covering it would require a dedicated talk.
Luckily you can delve into the mysteries of AutoGraph later with
Paolo Galeone's talk: Dissecting tf.function to discover AutoGraph strengths and subtleties

The Death of an API

tf.layers is dead

tf.Graph is gone (into hiding)

tf.contrib is dead

🎉 It was about time 🎉

Long Live Keras

  • High Level API Spec with usability in mind

  • Modular backend (TensorFlow 💖, CNTK 💀, Theano 💀)

  • Keras Layers and Models behave in a sane and Pythonic way

  • Chainer-style Subclassing API available for the expert practitioner

  • Keras Website

  • tf.keras API on TensorFlow Docs

Keras Layers

Keras Layers

  • One and only Layer API
  • Available under tensorflow.keras.layers
  • Pythonic Object
  • Simple to use:
    1. Initialize the Layer
    2. Use layers as callable
# 1. Initialize the layer
awesome_layer = tf.keras.layers.Dense(
    units=50,
    activation="tanh",
)
# 2. Use it as a callable on a tensor
#    (awesome_inputs: any tensor of shape (batch, features))
awesome_layer(awesome_inputs)

Keras Losses & Optimizers

Keras Losses

Losses are now contained in the tf.keras.losses module.

While TF 2.0 offers an ample out-of-the-box selection of them, you can easily define
custom ones by subclassing tf.keras.losses.Loss; the only strict requirement is the
implementation of a call() method accepting y_true and y_pred as arguments.
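For example, a minimal custom loss could look like this (a sketch; the class name is illustrative):

import tensorflow as tf

class MyMeanAbsoluteError(tf.keras.losses.Loss):
    # The built-in MAE re-implemented as a custom Loss, for illustration.
    def call(self, y_true, y_pred):
        y_true = tf.cast(y_true, y_pred.dtype)
        return tf.reduce_mean(tf.abs(y_pred - y_true), axis=-1)

loss_fn = MyMeanAbsoluteError()
print(loss_fn([[0.0, 1.0]], [[1.0, 1.0]]).numpy())  # 0.5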


Keras Optimizers

Gone are the old TF optimizers; they now live under the tf.keras.optimizers module.
We will see later how they are used during the training step.

Keras Model API

> Sequential

Functional

Subclassing (Chainer) API

Keras Sequential Model

Keras Sequential Model

The Sequential model is the most straightforward of the three APIs:
it consists of a stack of layers, each feeding into the next in a linear graph.


It can be constructed either by passing a list of keras.layers at its instantiation
or built by iteratively calling its add() method and passing the desired layer.

Keras Sequential Model

model = tf.keras.Sequential([
    tf.keras.layers.Dense(...),
    tf.keras.layers.BatchNormalization(...),
    tf.keras.layers.Dense(...),
    tf.keras.layers.Softmax(...)
])
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(...))
model.add(tf.keras.layers.BatchNormalization(...))
model.add(tf.keras.layers.Dense(...))
model.add(tf.keras.layers.Softmax(...))

Keras Model API

Sequential

> Functional

Subclassing (Chainer) API

Keras Model Functional API

Keras Model Functional API

While keras.Sequential is enough to cover a vast portion of use cases, there are times
when a linear stack of layers simply cannot represent a more complex architecture.

The Keras Functional API comes to the rescue by giving us control over how and when layers are instantiated and called.


[The Functional API] can handle models with non-linear topology, models with shared layers, and models with multiple inputs or outputs.
It's based on the idea that a deep learning model is usually a directed acyclic graph (DAG) of layers. The Functional API provides a set of tools for building graphs of layers.

For more information see: The Keras Functional API in TensorFlow

Keras Model Functional API

1) Start by creating an Input Node. NOTE: we never specify the batch size.
What gets returned, inputs, contains information about the shape and dtype of
the input data that you expect to feed to your model:

inputs = tf.keras.Input(shape=(784,))

2) To add nodes to the graph, simply call a Keras Layer or tf.keras.Model on the inputs.
NOTE: Layers and Models need to be instantiated before the __call__() method is exposed.

inputs = tf.keras.Input(shape=(784,), name="img")
x = tf.keras.layers.Dense(64, activation="relu")(inputs)
x = tf.keras.layers.Dense(64, activation="relu")(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)

Keras Model Functional API

3) We can create a Model by specifying its inputs and outputs in the graph of layers.
NOTE: In the case of a Model with multiple inputs/outputs, we just need to pass them as lists.

model = tf.keras.Model(inputs=inputs, outputs=outputs, name="mnist_model")

4) After that our model can be inspected with model.summary()

Model: "mnist_model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
img (InputLayer)             [(None, 784)]             0
_________________________________________________________________
dense_3 (Dense)              (None, 64)                50240
_________________________________________________________________
dense_4 (Dense)              (None, 64)                4160
_________________________________________________________________
dense_5 (Dense)              (None, 10)                650
=================================================================
Total params: 55,050
Trainable params: 55,050
Non-trainable params: 0
_________________________________________________________________

Keras Model API

Sequential

Functional

> Subclassing (Chainer) API

Keras Model Subclassing API

Keras Model Subclassing (Chainer) API

Made popular by the Chainer Deep Learning Framework, this is regarded as the API for the 1% researcher.
The idea is simple: by subclassing the tf.keras.Model object, we can define our own forward pass.
Create layers in the __init__ method and set them as attributes of the class instance. Define the forward pass in the call method.
NOTE: Model subclassing is particularly useful when eager execution is enabled,
because it allows the forward pass to be written imperatively.

CAVEAT USER
While this way of building models is extremely powerful, it comes at a higher cost in terms of bug likelihood and technical debt.

Keras Model Subclassing API

The following example shows a subclassed tf.keras.Model using a custom forward pass that does not have to be run imperatively.

class MyModel(tf.keras.Model):

    def __init__(self, num_classes=10):
        super(MyModel, self).__init__(name="my_model")
        self.num_classes = num_classes
        # Define your layers here.
        self.dense_1 = tf.keras.layers.Dense(32, activation="relu")
        self.dense_2 = tf.keras.layers.Dense(num_classes, activation="sigmoid")

    def call(self, inputs):
        # Define your forward pass here,
        # using layers you previously defined (in `__init__`).
        x = self.dense_1(inputs)
        return self.dense_2(x)

Note: to make sure the forward pass is always run imperatively, you must set dynamic=True when calling the super constructor.
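A minimal sketch of that flag in use (assuming TF 2.0; the class name is illustrative):

class MyEagerModel(tf.keras.Model):

    def __init__(self):
        # dynamic=True forces the forward pass to always run eagerly,
        # so arbitrary Python (loops, prints, .numpy()) works inside call().
        super(MyEagerModel, self).__init__(name="my_eager_model", dynamic=True)
        self.dense = tf.keras.layers.Dense(10)

    def call(self, inputs):
        return self.dense(inputs)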

High Performance Input Pipeline with `tf.data`

Input Pipeline with tf.data

tf.data

TensorFlow 1.X had support for highly performant input pipelines in the form of the tf.data module. With 2.0 the module is still there and possibly even easier to use than before.


The tf.data API introduces a tf.data.Dataset abstraction that represents a sequence of elements, in which each element consists of one or more Tensor objects. For example, in an image pipeline, an element might be a single training example, with a pair of tensors representing the image and its label.

There are two distinct ways to create a dataset:

  • A data source constructs a Dataset from data stored in memory or in one or more files.
  • A data transformation constructs a dataset from one or more tf.data.Dataset objects.

Plumbing

  • In memory data: tf.data.Dataset.from_tensors() or tf.data.Dataset.from_tensor_slices().
  • Input data stored in the recommended TFRecord format: tf.data.TFRecordDataset()

Once you have a Dataset object, you can transform it into a new Dataset by chaining method calls on the tf.data.Dataset object.
For example, you can apply per-element transformations such as Dataset.map(), and multi-element transformations such as Dataset.batch().

See the documentation for tf.data.Dataset for a complete list of transformations.

NOTE: In TF 2.0 the Dataset object is a Python iterable. This makes it possible to consume its elements using a for loop;
another option is creating an iterator with Python's iter() and consuming it with the next() function.
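A minimal end-to-end sketch (standard tf.data API; shapes and values are illustrative):

import tensorflow as tf

# Build a tiny in-memory pipeline.
dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((8, 32)), tf.range(8))
)
dataset = dataset.map(lambda x, y: (x * 2.0, y)).shuffle(8).batch(4)

# The Dataset is a Python iterable in TF 2.0.
for features, labels in dataset:
    print(features.shape, labels.shape)  # (4, 32) (4,)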

Model Training

model.compile()

model.compile()

After having created your model with one of the above APIs,
the last step is to let Keras bundle Model, Loss, and Optimizer together.
This bit of magic is done via the tf.keras.Model.compile() method which, once called, configures the model for training.


tf.keras.Model.compile takes three important arguments:

  • optimizer: This object specifies the training procedure.
    Pass it optimizer instances from the tf.keras.optimizers module, such as tf.keras.optimizers.Adam
    or tf.keras.optimizers.SGD. If you just want to use the default parameters,
    you can also specify optimizers via strings, such as 'adam' or 'sgd'.
  • loss: The function to minimize during optimization. Common choices include mean square error (mse),
    categorical_crossentropy, and binary_crossentropy. Loss functions are specified by name
    or by passing a callable object from the tf.keras.losses module.
  • metrics: Used to monitor training. These are string names
    or callables from the tf.keras.metrics module.
  • Additionally, to make sure the model trains and evaluates eagerly,
    you can make sure to pass run_eagerly=True as a parameter to compile.

model.compile()

model = tf.keras.Sequential(
    [
        # Adds a densely-connected layer with 64 units to the model:
        tf.keras.layers.Dense(
            64, activation="relu", input_shape=(32,)
        ),
        # Add another:
        tf.keras.layers.Dense(64, activation="relu"),
        # Add a softmax layer with 10 output units:
        tf.keras.layers.Dense(10, activation="softmax"),
    ]
)

model.compile(
    optimizer=tf.keras.optimizers.Adam(0.001),
    loss="categorical_crossentropy",
    metrics=["accuracy"],
)

Do you even model.fit?💪

model.fit()

Terminator: Hasta la vista baby

model.fit()

Unless you have a particular use case, Model.fit is the vastly preferred way to train Keras models.
It is fast, optimized, and reliable.

tf.keras.Model.fit() has many optional arguments, but these five are the most important ones:

  • x: Input data
  • y: Target data
  • epochs: Training is structured into epochs.
    An epoch is one iteration over the entire input data (this is done in smaller batches).
  • batch_size: When passed NumPy data, the model slices the data into smaller batches and iterates over these batches during training.
    This integer specifies the size of each batch.
    Be aware that the last batch may be smaller if the total number of samples is not divisible by the batch size.
  • validation_data: When prototyping a model, you want to easily monitor its performance on some validation data.
    Passing this argument—a tuple of inputs and labels—allows the model to display
    the loss and metrics in inference mode for the passed data, at the end of each epoch.

model.fit()

import numpy as np

data = np.random.random((1000, 32))
# random_one_hot_labels is a small helper (not shown) returning
# an array of shape (1000, 10) of one-hot encoded labels.
labels = random_one_hot_labels((1000, 10))

val_data = np.random.random((100, 32))
val_labels = random_one_hot_labels((100, 10))

model.fit(
    data,
    labels,
    epochs=10,
    batch_size=32,
    validation_data=(val_data, val_labels),
)

I train my model a quarter of a tf.data.Dataset at a time 🏎

Model.fit() & tf.data.Dataset

Here, the fit method uses the steps_per_epoch argument—this is the number of training steps
the model runs before it moves to the next epoch.
Since the Dataset yields batches of data, this snippet does not require a batch_size.

tf.data.Dataset can also be used for validation:

dataset = tf.data.Dataset.from_tensor_slices((data, labels))
dataset = dataset.batch(32).repeat()

val_dataset = tf.data.Dataset.from_tensor_slices(
    (val_data, val_labels)
)
val_dataset = val_dataset.batch(32).repeat()

model.fit(
    dataset,
    epochs=10,
    steps_per_epoch=30,
    validation_data=val_dataset,
    validation_steps=3,
)

DIY Calisthenics? 🤔

The Dark Powers: Custom Training

Terminator: Hasta la vista baby

⚡️ UNLIMITED POWER ⚡️

Beyond the safety of model.fit() and model.evaluate() lies the dark power of the GradientTape;
those power-mad practitioners who embrace it trade their sanity in exchange for perfect control over training.


Ominous description aside, this is a powerful technique that can make the training of
complex architectures more feasible, at the cost of more room for error.


Calling a model inside a GradientTape scope enables you to retrieve the gradients of
its trainable weights with respect to a loss value.
Using a tf.keras.optimizers instance, you can use these gradients to update these
variables (which you can retrieve using model.trainable_weights).
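The loop below assumes that model, loss_fn, optimizer, and train_dataset already exist; a minimal setup could look like this (a sketch, all shapes and hyperparameters are illustrative):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10),  # raw logits
])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(1e-3)

train_dataset = tf.data.Dataset.from_tensor_slices(
    (tf.random.normal((1024, 32)),
     tf.random.uniform((1024,), maxval=10, dtype=tf.int64))
).shuffle(1024).batch(64)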

for epoch in range(3):
    print("Start of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(
        train_dataset
    ):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables autodifferentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(
                x_batch_train
            )  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(
            loss_value, model.trainable_weights
        )

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(
            zip(grads, model.trainable_weights)
        )

        # Log every 200 batches.
        if step % 200 == 0:
            print(
                "Training loss (for one batch) at step %s: %s"
                % (step, float(loss_value))
            )
            print(
                "Seen so far: %s samples"
                % ((step + 1) * 64)
            )

Exporting model

Exporting Models

ALL YOU NEED TO KNOW ABOUT EXPORTING MODELS
(WE ARE RUNNING OUT OF TIME)

ALL YOU NEED TO KNOW ABOUT SAVING SUBCLASSED MODELS


TL;DR

Exporting is dead easy: model.save for the Sequential and Functional APIs,
with several possible choices:

  • Whole-model saving
  • Export the whole model as a SavedModel
  • Architecture-only saving
  • Weights-only saving
  • Weights-only saving in SavedModel format
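
A quick sketch of what those choices look like (standard tf.keras API; paths are illustrative):

# Whole model (architecture + weights + optimizer state) in a single HDF5 file
model.save("model.h5")

# Whole model in the TensorFlow SavedModel format
model.save("saved_model/", save_format="tf")

# Architecture only (JSON config, no weights)
json_config = model.to_json()

# Weights only: TF checkpoint format by default, HDF5 if the path ends in .h5
model.save_weights("weights.ckpt")
model.save_weights("weights.h5")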
Terminator: Hasta la vista baby

To Conclude

If you are using PyTorch --> Try TF 2.0

If you love Keras --> Try TF 2.0

If TF 1.X scared you away (for good reason) --> Try TF 2.0


... Most importantly ...

Follow me on TwittaH --> @mr_ubik

Thank You All, may the Tensor be with you 💖

HAPPY EUROPYTHON!