Key Concepts

If you are new to Larq and/or Binarized Neural Networks (BNNs), this is the right place to start. Below we summarize the key concepts you need to understand to work with BNNs.

Quantizer

A quantizer defines an operation that quantizes a vector, as well as a pseudo-gradient that is used for automatic differentiation. This pseudo-gradient is in general not the true gradient.

Generally, you will find quantizers throughout the network to quantize activations. This is needed because even when all inputs to a layer are binary, the layer output is generally an integer rather than a binary value, since the layer sums over many binary values.
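For instance (a toy illustration, not Larq code), a dot product of binary inputs and binary weights already yields a non-binary integer:

import tensorflow as tf

x = tf.constant([1., -1., 1., 1.])   # binary input
w = tf.constant([1., -1., -1., 1.])  # binary weights
tf.reduce_sum(x * w)                 # == 2.0: an integer, no longer a binary value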

It is also common to apply quantizers to the weights during training. This is necessary when relying on real-valued latent weights to accumulate non-binary update steps, a common optimization strategy for BNNs. After training is finished, the real-valued latent weights and the associated quantization operations can be discarded.
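Conceptually, latent-weight training looks like the following sketch (an illustration of the idea, not Larq's internals; all names and values are made up):

import tensorflow as tf

latent_w = tf.Variable([0.3, -0.7, 0.1])  # real-valued latent weights
grad = tf.constant([0.5, -0.2, 0.4])      # hypothetical pseudo-gradient from back-propagation
latent_w.assign_sub(0.01 * grad)          # small, non-binary update steps accumulate here
binary_w = tf.sign(latent_w)              # the forward pass only uses the binarized weights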

Pseudo-Gradient

The true gradient of a quantizer is in general zero almost everywhere and therefore cannot be used for gradient descent. Instead, the optimization of BNNs relies on what we call pseudo-gradients, which are used during back-propagation. In the documentation for each quantizer you will find the definition and a graph of the pseudo-gradient.
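For example, the widely used straight-through estimator (see the ste_sign documentation for the exact definition used in Larq) pairs the sign quantizer with a clipped-identity pseudo-gradient:

\[ q(x) = \operatorname{sign}(x), \qquad \frac{\partial q(x)}{\partial x} := \begin{cases} 1 & \text{if } |x| \le 1 \\ 0 & \text{otherwise} \end{cases} \]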

Using Quantizers as Activations

Although quantizers are usually passed to specialized arguments (see Quantized Layers), they can be used just like tf.keras activations:

# Use a quantizer as activation
y = larq.layers.QuantDense(512, activation="ste_sign")(x)

Just like activations, quantizers can also be used on their own:

# The two lines below are equivalent
x_binarized = larq.quantizers.ste_sign(x)
x_binarized = tf.keras.layers.Activation("ste_sign")(x)

Quantized Layers

Each quantized layer requires an input_quantizer and a kernel_quantizer that describe how the incoming activations and the weights of the layer are quantized, respectively. If both input_quantizer and kernel_quantizer are None, the layer is equivalent to a full precision layer.

A quantized layer computes

\[ \sigma(f(q_{\, \mathrm{kernel}}(\boldsymbol{w}), q_{\, \mathrm{input}}(\boldsymbol{x})) + b) \]

with full precision weights \(\boldsymbol{w}\), arbitrary precision input \(\boldsymbol{x}\), layer operation \(f\) (e.g. \(f(\boldsymbol{w}, \boldsymbol{x}) = \boldsymbol{x}^T \boldsymbol{w}\) for a densely-connected layer), activation \(\sigma\) and bias \(b\). This will result in the following computational graph:

[Computational graph: the input passes through the input_quantizer and the kernel through the kernel_quantizer; both feed the layer_operation, whose result is added to the bias and passed through the activation to produce the output.]
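Spelled out for a densely-connected layer, this corresponds to the following hand-written sketch (illustrative only; QuantDense performs this computation internally, and quant_dense_sketch is a hypothetical helper):

import tensorflow as tf
import larq

def quant_dense_sketch(x, w, b, activation=tf.identity):
    x_q = larq.quantizers.ste_sign(x)  # q_input(x)
    w_q = larq.quantizers.ste_sign(w)  # q_kernel(w)
    return activation(tf.matmul(x_q, w_q) + b)  # sigma(f(q_kernel(w), q_input(x)) + b)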

Larq layers are fully compatible with the Keras API, so you can use them interchangeably with built-in Keras layers:

# Larq model (quantizers omitted here, so it acts as a full precision model)
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    larq.layers.QuantDense(512, activation="relu"),
    larq.layers.QuantDense(10, activation="softmax")
])

# Equivalent tf.keras model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax")
])

A simple fully-connected Binarized Neural Network (BNN) using the Straight-Through Estimator can be defined in just a few lines of code using either the Keras sequential, functional or model subclassing APIs:

# Keras sequential API
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(),
    larq.layers.QuantDense(512,
                           kernel_quantizer="ste_sign",
                           kernel_constraint="weight_clip"),
    larq.layers.QuantDense(10,
                           input_quantizer="ste_sign",
                           kernel_quantizer="ste_sign",
                           kernel_constraint="weight_clip",
                           activation="softmax")])

# Keras functional API
x = tf.keras.Input(shape=(28, 28, 1))
y = tf.keras.layers.Flatten()(x)
y = larq.layers.QuantDense(512,
                           kernel_quantizer="ste_sign",
                           kernel_constraint="weight_clip")(y)
y = larq.layers.QuantDense(10,
                           input_quantizer="ste_sign",
                           kernel_quantizer="ste_sign",
                           kernel_constraint="weight_clip",
                           activation="softmax")(y)
model = tf.keras.Model(inputs=x, outputs=y)

# Keras model subclassing API
class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.flatten = tf.keras.layers.Flatten()
        self.dense1 = larq.layers.QuantDense(512,
                                             kernel_quantizer="ste_sign",
                                             kernel_constraint="weight_clip")
        self.dense2 = larq.layers.QuantDense(10,
                                             input_quantizer="ste_sign",
                                             kernel_quantizer="ste_sign",
                                             kernel_constraint="weight_clip",
                                             activation="softmax")

    def call(self, inputs):
        x = self.flatten(inputs)
        x = self.dense1(x)
        return self.dense2(x)

model = MyModel()

Using Custom Quantizers

Quantizers are functions that transform a full precision input into a quantized output. Since this transformation is usually non-differentiable, it is necessary to modify the gradient in order to train the resulting QNN. This can be done with the tf.custom_gradient decorator.

In this example we will define a binarization function with an identity gradient:

@tf.custom_gradient
def identity_sign(x):
    def grad(dy):
        # Pseudo-gradient: pass the incoming gradient through unchanged
        return dy
    # Forward pass: binarize the input with the sign function
    return tf.sign(x), grad

This function can now be used as an input_quantizer or a kernel_quantizer:

larq.layers.QuantDense(10,
                       input_quantizer=identity_sign,
                       kernel_quantizer=identity_sign)
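
The resulting layer behaves like any other Larq layer. For completeness, here is a minimal sketch of how it might sit inside a model (the surrounding architecture and compile settings are illustrative assumptions, not part of the original example):

model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    larq.layers.QuantDense(10,
                           input_quantizer=identity_sign,
                           kernel_quantizer=identity_sign,
                           kernel_constraint="weight_clip",
                           activation="softmax")
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])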