Questions tagged [tensorflow-transform]
TensorFlow Transform (tf.transform) is a library for data preprocessing with TensorFlow. It enables you to define and execute distributed pre-processing or feature engineering functions on large data sets, and then export the same functions as a TensorFlow graph for re-use during training or serving. It also comes with pre-implemented functions for common tasks like normalization, vocabulary generation and bucketization.
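As a rough illustration of the two-phase idea described above (a full pass over the data to compute statistics, then a per-row function that reuses them), here is a minimal plain-Python sketch. It does not call the tf.Transform API. The function names `analyze` and `transform`, the field names `x` and `s`, and the choice of -1 for unseen tokens are all assumptions made for illustration.

```python
import math
from collections import Counter


def analyze(rows):
    """Full pass over the dataset: compute the statistics the transform needs."""
    values = [r["x"] for r in rows]
    mean = sum(values) / len(values)
    # Population variance (divide by N); an illustrative choice, not a library claim.
    var = sum((v - mean) ** 2 for v in values) / len(values)
    counts = Counter(r["s"] for r in rows)
    # Most frequent tokens first; ties broken alphabetically for determinism.
    vocab = [tok for tok, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    return {"mean": mean, "std": math.sqrt(var), "vocab": vocab}


def transform(row, stats):
    """Per-row function that reuses the precomputed statistics."""
    index = {tok: i for i, tok in enumerate(stats["vocab"])}
    return {
        "x_scaled": (row["x"] - stats["mean"]) / stats["std"],
        "s_id": index.get(row["s"], -1),  # hypothetical: -1 marks an unseen token
    }


# Hypothetical toy dataset: one numeric feature and one string feature.
rows = [{"x": 1.0, "s": "a"}, {"x": 2.0, "s": "b"}, {"x": 3.0, "s": "a"}]
stats = analyze(rows)
transformed = [transform(r, stats) for r in rows]
```

Because `transform` depends only on the row and the stored statistics, the same function can be applied unchanged to rows seen later, which is the property that lets preprocessing be reused at training and serving time.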
78 questions
0 votes, 0 answers, 16 views
Using tft.scale_to_gaussian for preprocessing a dataset without using other tensorflow operations
I'm working on a project where I have a set of long-tail data that I want to transform into a Gaussian distribution. I'm looking to achieve something similar to scikit-learn's PowerTransformer, but ...
0 votes, 1 answer, 37 views
Dataflow Tensorflow Transform write transformed data to BigQuery
In a GCP Dataflow pipeline, I am trying to write the transformed data from the Transform component into BigQuery and I get the error below. First, I would appreciate it if someone could let me know if there ...
1 vote, 0 answers, 35 views
Creating Tensors from features that are linked together
I have a set of multi-valued features which are linked together. As an example,
ItemCodes | Scores
AK, NA, UY | 0.6, 0.2, 0.2
KG, AK | 0.5, 0.5
Each Item has a corresponding score associated with it. ...
0 votes, 0 answers, 25 views
TensorFlow Transform unexpected behavior while using tf.strings.unicode_split
I am trying to use TensorFlow Transform (1.13.0) and TensorFlow (2.12.1) as part of my pipeline and noticed that it doesn't return the correct answer.
This is what I am running:
with beam.Pipeline() ...
0 votes, 0 answers, 29 views
universal sentence encoder batch pipeline failing
I have a batch job on the Dataflow runner that calculates embeddings from the input text. Throughout the pipeline, I am using tft.impl.context and impl.AnalyzeAndTransformDataset for this.
Here ...
2 votes, 1 answer, 834 views
tensorflow_transform installation failure on Mac M2
According to "Can't install due to dependency on numpy #289", TensorFlow Transform (tft) supports Python 3.9, and the TensorFlow Transform GitHub repository states no limitation for macOS on Apple silicon.
...
0 votes, 1 answer, 676 views
Dealing with missing values in tensorflow
I need some guidance on the approach to imputation in tensorflow/deep learning. I am familiar with how scikit-learn handles imputation, and when I map it to the tensorflow ecosystem, I would expect ...
0 votes, 1 answer, 221 views
Transforming tensorflow datasets to beam datasets
There are a variety of ways to get a dataset you can train on in tensorflow. One of the things tensorflow transform does is provide the ability to do preprocessing via AnalyzeAndTransformDataset and ...
1 vote, 1 answer, 228 views
Add reserved tokens to `tft.vocabulary`
I would like to append words to the vocabulary created by tft.vocabulary that are not a part of the training samples (i.e. <mask> and <pad> tokens).
I see in the docs that the tft....
1 vote, 1 answer, 531 views
Apache Beam rows to TFRecord in order to GenerateStatistics
I have built a pipeline that reads some data, does some manipulations and creates some Apache Beam Row objects (Steps 1 and 2 in the code below). I would then like to generate statistics and write them ...
0 votes, 1 answer, 141 views
join datasets with tfx tensorflow transform
I am trying to replicate some data preprocessing that I have done in pandas into tensorflow transform.
I have a few CSV files, which I joined and aggregated with pandas to produce a training dataset. ...
0 votes, 1 answer, 815 views
How to get vocabulary size in tensorflow_transform before apply_vocabulary?
Also posted the question at https://github.com/tensorflow/transform/issues/261
I am using tft in TFX and need to transform string list class labels into multi-hot indicators inside preprocessing_fn. ...
1 vote, 0 answers, 190 views
How can I use BigQuery in a standalone tensorflow transform (TFT) pipeline?
I'm interested in interactive development of a preprocessing_fn for tft.AnalyzeAndTransformDataset. By interactive development, I mean running a standalone Beam pipeline in a Jupyter Notebook and ...
1 vote, 1 answer, 214 views
TensorFlow Extended (TFX): Is there an easy way to debug functions from the Transform component?
I am supposed to modify a function which is a part of the Transform component. It is a long series of TensorFlow operations and I am not sure (a) how particular steps affect processed variables or (b) what does ...
1 vote, 1 answer, 378 views
How do I pass a TensorFlow Dataset through a TensorFlow Transform pipeline?
I have implemented a custom TensorFlow Dataset for my raw data. I can download, prepare, and load the data as a tensorflow.data.Dataset as follows:
import tensorflow_datasets
builder = ...