This code has exactly the same effect as the code above, but by avoiding a conditional, we ensure it will compile with XLA without problems!
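The core of that trick is to turn the boolean condition into a `tf.float32` indicator and fold it into the arithmetic. Here is a minimal sketch, with a made-up condition and function name rather than the exact code above:

```python
import tensorflow as tf

def scale_if_large(tensor):
    # Made-up example: instead of `if tf.reduce_sum(tensor) > 10: tensor = tensor / 2.0`,
    # turn the condition into a 0.0/1.0 indicator and use it arithmetically.
    is_large = tf.cast(tf.reduce_sum(tensor) > 10, tf.float32)
    return tensor / (1.0 + is_large)  # divides by 2.0 only when the sum exceeds 10
```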
#### XLA Rule #2: Your code cannot have “data-dependent shapes”
What this means is that the shape of all of the `tf.Tensor` objects in your code cannot depend on their values. For example, the function `tf.unique` cannot be compiled with XLA, because it returns a tensor containing one instance of each unique value in the input. The shape of this output will obviously be different depending on how repetitive the input was, and so XLA refuses to handle it!
Here, we avoid data-dependent shapes by computing the loss for every position, but zeroing out the masked positions in both the numerator and denominator when we calculate the mean, which yields exactly the same result as the first block while maintaining XLA compatibility. Note that we use the same trick as in rule #1 - converting a `tf.bool` to `tf.float32` and using it as an indicator variable. This is a really useful trick, so remember it if you need to convert your own code to XLA!
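Concretely, the masked mean can be written along these lines; `per_position_loss` and `attention_mask` are placeholder names rather than the exact variables from the blocks above:

```python
import tensorflow as tf

def masked_mean_loss(per_position_loss, attention_mask):
    # Cast the tf.bool (or integer) mask to tf.float32 so it acts as an indicator variable
    mask = tf.cast(attention_mask, tf.float32)
    # Zero out masked positions in the numerator and count only unmasked positions
    # in the denominator - no boolean indexing, so the output shape never depends on the data
    return tf.reduce_sum(per_position_loss * mask) / tf.reduce_sum(mask)
```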
#### XLA Rule #3: XLA will need to recompile your model for every different input shape it sees
This is the big one. What this means is that if your input shapes are very variable, XLA will have to recompile your model over and over, which will create huge performance problems. This commonly arises in NLP models, where input texts have variable lengths after tokenization. In other modalities, static shapes are more common and this rule is much less of a problem.
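The usual workaround in NLP is to pad (and truncate) every batch to one fixed length, or to a small set of lengths, so that XLA only has to compile a few variants. A sketch, assuming a Hugging Face `tokenizer` and a list of `raw_texts`, with `max_length=128` as an arbitrary illustrative choice:

```python
# Padding every batch to the same length means XLA only ever sees one input shape
batch = tokenizer(
    raw_texts,
    padding="max_length",  # pad to max_length rather than to the longest sample in the batch
    max_length=128,        # illustrative value - pick one that fits your data
    truncation=True,
    return_tensors="tf",
)
```

If padding everything to one length wastes too much compute, `pad_to_multiple_of` is a common compromise: it keeps padding modest while limiting the number of distinct shapes XLA sees.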
There was a lot in here, so let’s summarize with a quick checklist you can follow:
- Make sure your code follows the three rules of XLA
- Compile your model with `jit_compile=True` on CPU/GPU and confirm that you can train it with XLA
- Either load your dataset into memory or use a TPU-compatible dataset loading approach (see [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/tpu_training-tf.ipynb))
- Migrate your code either to Colab (with accelerator set to “TPU”) or a TPU VM on Google Cloud
- Add TPU initializer code (see [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/tpu_training-tf.ipynb))
- Create your `TPUStrategy` and make sure dataset loading and model creation are inside the `strategy.scope()` (see [notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/examples/tpu_training-tf.ipynb)); a minimal sketch of this setup follows the checklist
- Don’t forget to take `jit_compile=True` out again when you move to TPU!
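For the initializer and `TPUStrategy` items above, the usual pattern looks roughly like this; `make_dataset()` and `create_model()` are placeholders for your own dataset loading and model creation, not functions from the notebook:

```python
import tensorflow as tf

# Connect to and initialize the TPU system
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# Create the strategy; dataset loading and model creation go inside its scope
strategy = tf.distribute.TPUStrategy(resolver)
with strategy.scope():
    train_dataset = make_dataset()   # placeholder: your dataset loading
    model = create_model()           # placeholder: your model creation
    model.compile(optimizer="adam")  # note: no jit_compile=True here on TPU

model.fit(train_dataset)
```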