Context
Recently I worked on a project that required modelling graph data (not image graphs, but network graphs). A big problem I encountered was how to slice out subgraphs for model training. An intuitive idea is to use a sparse matrix to slice out the neighbors of any given node. But because tf.SparseTensor does not currently support frequent, efficient slicing, I chose scipy's csr_matrix to take care of this part.
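A minimal sketch of the scipy side, assuming an undirected graph given as an edge list. The helpers build_adjacency, neighbors and neighbor_rows are hypothetical names for illustration, not the author's code. The point is that in CSR format a node's neighbors are just a contiguous slice of the indices array, and a batch of rows can be pulled out with ordinary row indexing.

```python
import numpy as np
import scipy.sparse as sp

def build_adjacency(edges, num_nodes):
    """Build a symmetric (undirected) CSR adjacency matrix from an edge list."""
    edges = np.asarray(edges)
    rows = np.concatenate([edges[:, 0], edges[:, 1]])
    cols = np.concatenate([edges[:, 1], edges[:, 0]])
    data = np.ones(len(rows), dtype=np.float32)
    return sp.csr_matrix((data, (rows, cols)), shape=(num_nodes, num_nodes))

def neighbors(adj, node):
    """Column indices of the non-zeros in row `node`, i.e. its neighbors."""
    # CSR stores row i's entries in indices[indptr[i]:indptr[i + 1]],
    # so this is a cheap slice proportional to the node's degree.
    start, end = adj.indptr[node], adj.indptr[node + 1]
    return adj.indices[start:end]

def neighbor_rows(adj, nodes):
    """Sub-matrix holding the adjacency rows of a batch of nodes."""
    # Row indexing a csr_matrix returns another csr_matrix.
    return adj[nodes]

adj = build_adjacency([[0, 1], [0, 2], [1, 3], [3, 4]], num_nodes=5)
print(sorted(neighbors(adj, 3).tolist()))  # [1, 4]
print(neighbor_rows(adj, [0, 3]).shape)    # (2, 5)
```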
However, anyone who knows TensorFlow well will notice that there is a gap to bridge between this csr_matrix and the tf.Tensor objects flowing through our model. How can we feed the data into the model efficiently? This turned out to be the trickiest part of the whole implementation. We hit a wall at the beginning but, thankfully, we got it right just before our deadline!
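One way to cross that bridge is to split a sliced csr_matrix into the three pieces a TensorFlow sparse tensor is made of: an [nnz, 2] int64 array of (row, col) indices, the values, and the dense shape. The sketch below assumes that approach; csr_to_sparse_parts is a hypothetical helper. In TF 1.x the result could then be fed to a tf.sparse_placeholder, e.g. feed_dict={sp_input: tf.SparseTensorValue(indices, values, dense_shape)}, where sp_input is a hypothetical placeholder name.

```python
import numpy as np
import scipy.sparse as sp

def csr_to_sparse_parts(m):
    """Split a scipy sparse matrix into (indices, values, dense_shape),
    the three pieces a TensorFlow sparse tensor is built from."""
    # Copy so that sorting below never mutates the caller's matrix.
    csr = sp.csr_matrix(m, copy=True)
    # Sorted column indices within each row give row-major (canonical) order.
    csr.sort_indices()
    coo = csr.tocoo()
    indices = np.stack([coo.row, coo.col], axis=1).astype(np.int64)
    values = coo.data.astype(np.float32)
    dense_shape = np.array(coo.shape, dtype=np.int64)
    return indices, values, dense_shape

batch = sp.csr_matrix(np.array([[0, 2, 0], [1, 0, 3]], dtype=np.float32))
indices, values, dense_shape = csr_to_sparse_parts(batch)
print(indices.tolist())      # [[0, 1], [1, 0], [1, 2]]
print(values.tolist())       # [2.0, 1.0, 3.0]
print(dense_shape.tolist())  # [2, 3]
```

Emitting indices in row-major order matters because many TensorFlow sparse ops assume it; if the order is ever in doubt, tf.sparse.reorder can restore it on the TensorFlow side.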
Estimator or old-fashioned tf.placeholder?
Well, at the beginning I didn't really think this through and went for Estimator directly. In hindsight, I set a big, big trap for myself.
An important thing I'd like to mention here is that TensorFlow (1.x) builds a static computational graph. So if you are using the tf.estimator API, calling input_fn and model_fn only constructs the graph. The input function does not do what you might traditionally expect, i.e. hand you one batch of data each time you call it; it is called to build the input ops, and the batches then flow through those ops.
Thus, if you can't fit your data feeding into the form of a tf.data.Dataset or your own generator, you probably need to give up on this approach. Whatever you are doing, a quick check of whether you got things right is that the Tensor, from the moment it comes out of the Dataset to the return of input_fn, should follow a complete tensor data flow (surprise, that's why we call it TensorFlow!). You should not use Tensor.eval() to interact with outside data, which in my case was the culprit of the failure.
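A minimal sketch of the "your own generator" route, assuming the scipy work is moved entirely into a plain Python generator that yields only NumPy arrays. batch_generator is a hypothetical helper; it yields each batch's node ids plus the (indices, values, dense_shape) parts of its adjacency rows, so nothing on the TensorFlow side ever needs Tensor.eval().

```python
import numpy as np
import scipy.sparse as sp

def batch_generator(adj, batch_size, shuffle=True, seed=0):
    """Yield (node_ids, indices, values, dense_shape) for each mini-batch.

    Everything yielded is a plain NumPy array; all scipy slicing happens
    here, outside the TensorFlow graph.
    """
    rng = np.random.RandomState(seed)
    order = np.arange(adj.shape[0])
    if shuffle:
        rng.shuffle(order)
    for start in range(0, len(order), batch_size):
        nodes = order[start:start + batch_size]
        # Slice the batch's adjacency rows and convert them to COO parts.
        coo = adj[nodes].tocoo()
        indices = np.stack([coo.row, coo.col], axis=1).astype(np.int64)
        yield (nodes.astype(np.int64),
               indices,
               coo.data.astype(np.float32),
               np.array(coo.shape, dtype=np.int64))

adj = sp.csr_matrix(np.eye(5, dtype=np.float32))
for nodes, indices, values, dense_shape in batch_generator(adj, 2, shuffle=False):
    print(nodes.tolist(), dense_shape.tolist())
# [0, 1] [2, 5]
# [2, 3] [2, 5]
# [4] [1, 5]
```

In TF 1.x such a generator could be wrapped with tf.data.Dataset.from_generator(lambda: batch_generator(adj, 256), output_types=(tf.int64, tf.int64, tf.float32, tf.int64), output_shapes=([None], [None, 2], [None], [2])), and inside input_fn the three sparse parts rebuilt with tf.SparseTensor(indices, values, dense_shape), so the data stays a tensor all the way to the return of input_fn. The batch size of 256 is an arbitrary illustrative value.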
Eventually, I decided to rebuild the whole training process around tf.placeholder instead. The pros are evident: it's much more flexible, and we can easily work with data in csr_matrix format and produce the corresponding tensors afterwards. The con is that we need to babysit the parts of training that tf.estimator used to take care of: you have to set up the global step yourself, write your own functions to save and load the model, add extra code to resume training, etc.
Takeaway
Before taking on this project, I had never used the old-fashioned modelling API, tf.placeholder. Partly because I picked up TensorFlow pretty late, around June 2018, when people had already been promoting the advantages and power of tf.estimator for quite a while. Plus, I didn't feel any need to. But now I understand why this approach is still available even in TensorFlow 1.13: it fits smoothly into any form of data pipeline. That makes it a solid fallback for the more sophisticated APIs, which try to, but currently cannot, cover every circumstance.
Hopefully, if you read my post, you will at least know this is one route you can take. Before you get into your own project, think the whole process through twice, and maybe draw the pipeline on paper, to make sure you can handle everything between you and success. Otherwise, you may end up spending much more time on it than you expected.
Oh, BTW, I'd like to share some useful training tips as well. Thank you for reading~
Training tips
Usually the hidden layer should be only slightly larger than the input. Do not enlarge it drastically, because that will drag down your whole training speed.
If you set a large dropout rate, like 0.7, do not use a learning rate that is too low, like 0.001 or so. Otherwise the model will have a hard time learning and its generalization will be worse.
It's possible for the eval loss to be much larger in scale than the training loss, like 10x or 100x, even at the beginning of training. This may come from the dropout layer behaving differently during training and evaluation.
A batchnorm layer helps, as always.
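To make the dropout tips concrete, here is a minimal NumPy sketch of inverted dropout, the variant tf.nn.dropout implements (kept units are scaled up so the expected activation stays the same). With a rate of 0.7, most activations are zeroed during training and the survivors are multiplied by roughly 3.3, while at evaluation time the layer is the identity, so the network sees quite different activations in the two modes. This only illustrates the train/eval difference the tip refers to; it does not by itself explain any particular ratio between the two losses.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout.

    Training: zero each unit with probability `rate` and scale survivors
    by 1 / (1 - rate), so the expected activation is unchanged.
    Evaluation: return the input untouched.
    """
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random_sample(x.shape) >= rate
    return x * keep_mask / (1.0 - rate)

rng = np.random.RandomState(0)
x = np.ones((1000, 100))
train_out = dropout(x, 0.7, training=True, rng=rng)
eval_out = dropout(x, 0.7, training=False, rng=rng)
# Training: about 70% zeros, mean close to 1, large spread.
print((train_out == 0).mean(), train_out.mean(), train_out.std())
# Evaluation: identical to the input, so mean 1 and zero spread.
print(eval_out.mean(), eval_out.std())
```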