# Key Concepts

If you are new to Larq and/or Binarized Neural Networks (BNNs), this is the right place to start. Below we summarize the key concepts you need to understand to work with BNNs.

## Quantizer

A quantizer defines an operation that quantizes a vector, as well as a pseudo-gradient that is used for automatic differentiation. This pseudo-gradient is in general not the true gradient.

Quantizers typically appear throughout the network to quantize activations. This is needed because most layers sum over multiple binary values and therefore output integers, even if all of their inputs are binary.
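To make this concrete, here is a minimal framework-free sketch in plain Python. The helper names `sign` and `dot` are illustrative assumptions and not part of Larq. It shows that the dot product of two binary vectors is an integer, which has to be binarized again before it can feed another binary layer.

```python
def sign(v):
    """Binarize a scalar to -1 or +1 (zero is mapped to +1 by assumption)."""
    return 1 if v >= 0 else -1


def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


# Binary activations and binary weights, all entries in {-1, +1}.
x = [1, -1, 1, 1, -1]
w = [1, -1, -1, 1, -1]

# Summing five binary products gives an integer in [-5, 5], not a binary value.
pre_activation = dot(x, w)

# A quantizer maps the result back to {-1, +1} for the next layer.
binarized = sign(pre_activation)

print(pre_activation, binarized)
```

For a layer with n binary inputs the pre-activation can take any value between -n and n, which is why a quantizer usually sits in front of each subsequent binary layer.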

It is also common to apply quantizers to the weights during training. This is necessary when relying on real-valued latent weights to accumulate non-binary update steps, a common optimization strategy for BNNs. After training is finished, the real-valued weights and the associated quantization operations can be discarded.
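The role of latent weights can be illustrated with a small sketch in plain Python. The learning rate, the gradient values, and the variable names are hypothetical and chosen only for illustration. Each update is too small to flip the binary weight on its own, but the real-valued latent weight accumulates the updates until it crosses zero.

```python
def sign(v):
    """Binarize a scalar to -1.0 or +1.0 (zero is mapped to +1.0 by assumption)."""
    return 1.0 if v >= 0 else -1.0


latent_w = 0.05        # real-valued latent weight, kept only during training
learning_rate = 0.02   # hypothetical step size
gradients = [1.0, 1.0, 1.0, 1.0]  # hypothetical gradients, all pushing the weight down

history = []
for g in gradients:
    latent_w -= learning_rate * g      # the update is applied to the latent weight
    history.append(sign(latent_w))     # the forward pass only sees the binarized weight

# After training, only the binarized weight needs to be kept.
binary_w = sign(latent_w)

print(history, binary_w)
```

The binarized weight stays at +1.0 for the first two steps and flips to -1.0 once the accumulated updates push the latent weight below zero.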

### Pseudo-Gradient

The true gradient of a quantizer is in general zero almost everywhere and therefore cannot be used for gradient descent. Instead, optimization of BNNs relies on what we call pseudo-gradients, which are used during back-propagation. In the documentation for each quantizer you will find the definition and a graph of the pseudo-gradient.
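As an illustration, the sketch below implements a sign quantizer with a clipped straight-through pseudo-gradient in plain Python. The clipped form (pass the gradient through where the input magnitude is at most a clip value of 1, block it elsewhere) is assumed here as one common choice; the function names are hypothetical, and the exact definition used by a given quantizer should be taken from its own documentation.

```python
def ste_sign_forward(x):
    """Forward pass: binarize to -1.0 or +1.0 (zero is mapped to +1.0 by assumption)."""
    return 1.0 if x >= 0 else -1.0


def ste_sign_pseudo_grad(x, upstream, clip_value=1.0):
    """Backward pass: pass the upstream gradient through where |x| <= clip_value."""
    return upstream if abs(x) <= clip_value else 0.0


inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]

outputs = [ste_sign_forward(x) for x in inputs]
grads = [ste_sign_pseudo_grad(x, upstream=1.0) for x in inputs]

print(outputs)
print(grads)
```

The true derivative of the sign function is zero at every input except 0, where it is undefined, so gradient descent would never update anything. The pseudo-gradient replaces it with a signal that is non-zero near the origin.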

### Using Quantizers as Activations

Although quantizers are usually passed to specialized arguments (see Quantized Layers), they can be used just like `tf.keras` activations:

    # Use a quantizer as an activation
    y = larq.layers.QuantDense(512, activation="ste_sign")(x)

Just like activations, quantizers can also be used on their own:

    # The two lines below are equivalent
    x_binarized = larq.quantizers.ste_sign(x)
    x_binarized = tf.keras.layers.Activation("ste_sign")(x)

## Quantized Layers

Each quantized layer takes an `input_quantizer` and a `kernel_quantizer` that describe how to quantize the incoming activations and the weights of the layer, respectively. If both `input_quantizer` and `kernel_quantizer` are `None`, the layer is equivalent to a full-precision layer.

A quantized layer computes

\[ \sigma(f(q_{\, \mathrm{kernel}}(\boldsymbol{w}), q_{\, \mathrm{input}}(\boldsymbol{x})) + b) \]

with full-precision weights \(\boldsymbol{w}\), arbitrary-precision input \(\boldsymbol{x}\), layer operation \(f\) (e.g. \(f(\boldsymbol{w}, \boldsymbol{x}) = \boldsymbol{x}^T \boldsymbol{w}\) for a densely-connected layer), activation \(\sigma\) and bias \(b\). This will result in the following computational graph:
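The same computation can also be traced in code. The sketch below is a plain-Python illustration of the formula for a densely-connected layer, not Larq's implementation; the function `quant_dense` and its argument layout are assumptions made for this example.

```python
def sign(v):
    """Binarize a scalar to -1.0 or +1.0 (zero is mapped to +1.0 by assumption)."""
    return 1.0 if v >= 0 else -1.0


def quant_dense(x, w, b, input_quantizer=sign, kernel_quantizer=sign, activation=None):
    """Compute sigma(f(q_kernel(w), q_input(x)) + b) with f(w, x) = x^T w.

    x: list of n inputs, w: n x m weights as a list of rows, b: list of m biases.
    Passing None for a quantizer skips that quantization step.
    """
    qx = [input_quantizer(v) for v in x] if input_quantizer else x
    qw = [[kernel_quantizer(v) for v in row] for row in w] if kernel_quantizer else w
    out = [
        sum(qx[i] * qw[i][j] for i in range(len(qx))) + b[j]
        for j in range(len(b))
    ]
    return [activation(v) for v in out] if activation else out


x = [0.3, -1.2, 2.0]
w = [[0.1, -0.4], [-0.7, 0.2], [0.5, 0.9]]
b = [0.5, 0.0]

# Both quantizers active: inputs and weights are binarized before the product.
binary_out = quant_dense(x, w, b)

# Both quantizers set to None: the layer behaves like a full-precision layer.
full_out = quant_dense(x, w, b, input_quantizer=None, kernel_quantizer=None)

print(binary_out, full_out)
```

Note that the bias is added after the quantized product, so it stays in full precision, as in the formula above.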

Larq layers are fully compatible with the Keras API, so you can use them interchangeably with Keras layers. The two models below are equivalent:

    import larq
    import tensorflow as tf

    # Larq layers without quantizers behave like full-precision layers
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        larq.layers.QuantDense(512, activation="relu"),
        larq.layers.QuantDense(10, activation="softmax"),
    ])

    # The same model built from plain Keras layers
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

A simple fully-connected Binarized Neural Network (BNN) using the Straight-Through Estimator can be defined in just a few lines of code using the Keras sequential, functional, or model subclassing API:

    # Sequential API
    model = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(),
        larq.layers.QuantDense(
            512,
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
        ),
        larq.layers.QuantDense(
            10,
            input_quantizer="ste_sign",
            kernel_quantizer="ste_sign",
            kernel_constraint="weight_clip",
            activation="softmax",
        ),
    ])

    # Functional API
    x = tf.keras.Input(shape=(28, 28, 1))
    y = tf.keras.layers.Flatten()(x)
    y = larq.layers.QuantDense(
        512,
        kernel_quantizer="ste_sign",
        kernel_constraint="weight_clip",
    )(y)
    y = larq.layers.QuantDense(
        10,
        input_quantizer="ste_sign",
        kernel_quantizer="ste_sign",
        kernel_constraint="weight_clip",
        activation="softmax",
    )(y)
    model = tf.keras.Model(inputs=x, outputs=y)

    # Model subclassing API
    class MyModel(tf.keras.Model):
        def __init__(self):
            super().__init__()
            self.flatten = tf.keras.layers.Flatten()
            self.dense1 = larq.layers.QuantDense(
                512,
                kernel_quantizer="ste_sign",
                kernel_constraint="weight_clip",
            )
            self.dense2 = larq.layers.QuantDense(
                10,
                input_quantizer="ste_sign",
                kernel_quantizer="ste_sign",
                kernel_constraint="weight_clip",
                activation="softmax",
            )

        def call(self, inputs):
            x = self.flatten(inputs)
            x = self.dense1(x)
            return self.dense2(x)

    model = MyModel()

## Using Custom Quantizers

Quantizers are functions that transform a full-precision input into a quantized output. Since this transformation is usually non-differentiable, the gradient has to be modified in order to train the resulting quantized neural network (QNN). This can be done with the `tf.custom_gradient` decorator.

In this example we will define a binarization function with an identity gradient:

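    @tf.custom_gradient
    def identity_sign(x):
        def grad(dy):
            # Identity pseudo-gradient: pass the upstream gradient through unchanged
            return dy

        # Note: tf.sign maps an input of exactly 0 to 0, not to -1 or +1
        return tf.sign(x), grad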

This function can now be used as an `input_quantizer` or a `kernel_quantizer`:

    larq.layers.QuantDense(
        10,
        input_quantizer=identity_sign,
        kernel_quantizer=identity_sign,
    )