Tales of Data Science

Google can opensource: TensorFlow


Recently, while attending the AINL-ISMW FRUCT 2015 conference, I found out that Google had open-sourced TensorFlow. That is cool, but it may raise some questions. Here, I try to answer some of them.

You may read this post in Russian if you like.

What is TensorFlow?

It is definitely not a cloud-based machine-learning service (as, for example, Azure ML is).

It is a machine-learning library using data flow graphs to build models. The main purpose of the library is to create models to solve various NLP and image recognition tasks.

TensorFlow was created for Deep Learning, to let users design a neural network architecture themselves. Still, the library also lets the user work with statistical machine learning algorithms. It does not provide them out of the box, however: the user has to implement them, and TensorFlow provides only the tools to do so.

TensorFlow is also a second-generation machine learning system, created to replace its predecessor, DistBelief.

What is a data flow graph?

In TensorFlow, all computations are represented as directed graphs, where the computations themselves (and the input/output data as well) are nodes. The edges of the graph are the paths along which data flows from node to node.
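To make the idea concrete, here is a minimal sketch of a data flow graph in plain Python. It is a toy, not the TensorFlow API: the `Node` class and the `run` function are hypothetical names invented for this illustration. The point it shows is that building the graph and running it are two separate steps.

```python
# Toy data flow graph in plain Python (illustrative only, not the TensorFlow API).

class Node:
    """A graph node: either an input value or an operation on other nodes."""

    def __init__(self, name, op=None, inputs=()):
        self.name = name
        self.op = op                 # callable for operation nodes, None for inputs
        self.inputs = tuple(inputs)  # incoming edges: nodes whose output we consume


def run(node, feed, cache=None):
    """Evaluate `node` by pulling data along the edges from its inputs."""
    if cache is None:
        cache = {}
    if node.name in cache:
        return cache[node.name]
    if node.op is None:
        value = feed[node.name]      # input node: value supplied by the caller
    else:
        args = [run(parent, feed, cache) for parent in node.inputs]
        value = node.op(*args)
    cache[node.name] = value
    return value


# Build the graph for y = (a + b) * b. Nothing is computed at this point.
a = Node("a")
b = Node("b")
total = Node("total", op=lambda x, y: x + y, inputs=(a, b))
y = Node("y", op=lambda x, y: x * y, inputs=(total, b))

# Only now does data flow through the graph.
result = run(y, {"a": 2.0, "b": 3.0})
print(result)  # (2 + 3) * 3 = 15.0
```

The same graph can be run again with different input values without being rebuilt, which is roughly why this representation is convenient for training models.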


By the way, TensorFlow has its own visualization module called TensorBoard, which can visualize the created model so that a user can trace the data flow in it (Captain Obvious to the rescue!).

And what about tensors?

Data in TensorFlow are represented as tensors (multidimensional, dynamically sized data arrays). These tensors flow through the graph from node to node, which is where the library's name comes from. Simply speaking, you can picture a tensor as a 3D matrix (this is not a strict mathematical definition, of course!). The figure below shows a dissected tensor. Compared to a matrix, it offers more degrees of freedom for selecting and slicing data.
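The "more degrees of freedom" for slicing can be sketched with nested Python lists. This is only an illustration of the concept under the "3D matrix" simplification used above. Real libraries store tensors as packed arrays, and the `shape` helper here is a hypothetical function written for this example.

```python
# A rank-3 "tensor" as nested lists (illustrative; real libraries use packed arrays).

def shape(t):
    """Return the size along each axis of a regular, non-empty nested list."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0]
    return tuple(dims)


# 2 blocks x 3 rows x 4 columns, filled with the numbers 0..23.
tensor = [[[b * 12 + r * 4 + c for c in range(4)] for r in range(3)] for b in range(2)]

matrix = tensor[1]                         # fix axis 0: a 3 x 4 matrix
row = tensor[1][2]                         # fix axes 0 and 1: a vector of 4 values
fibre = [block[0][3] for block in tensor]  # vary axis 0 only: 2 values

print(shape(tensor))  # (2, 3, 4)
print(shape(matrix))  # (3, 4)
print(row)            # [20, 21, 22, 23]
print(fibre)          # [3, 15]
```

A matrix can only be cut into rows and columns. With a third axis you can also cut out whole matrices, or vectors that run "through" the blocks, as `fibre` does.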


What is cool about TensorFlow?

  1. Flexibility of representation: the user can create almost any type of data flow graph, visualize it and admire it. If you can express your algorithm as a data flow graph, you can implement it with TensorFlow, no exceptions.
  2. TensorFlow performs calculations on both CPU and GPU (if you have CUDA installed, of course), and it supports parallel and asynchronous computations. Also, you can easily move your TensorFlow model to other hardware, for example, from a server to a PC and from a PC to a laptop, with no code changes. Perfect, isn’t it?
  3. You may use TensorFlow for both research and production purposes. Therefore, you can create some model for research and then push this very model into a product (after some code rewriting, of course, as researchers usually forget about code optimization) using the same TensorFlow library.
  4. Auto-differentiation. TensorFlow can automatically compute derivatives for you. It is very convenient if you love gradient-based machine learning algorithms like Stochastic Gradient Descent.
  5. Python interface! There is also a poorly documented C++ interface, but it is poorly documented (Captain Obvious to the rescue!). Go, Java, Lua, JavaScript, and R interfaces are coming soon.
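Point 4 (auto-differentiation) is easier to appreciate with a small example. The sketch below is a toy reverse-mode differentiator in plain Python, followed by a few steps of gradient descent. It does not show how TensorFlow implements this internally. The `Var` class, the `backward` function, and the numbers used (input 2.0, target 6.0, learning rate 0.1) are all assumptions made up for this illustration.

```python
# Toy reverse-mode auto-differentiation (illustrative only, not TensorFlow code).

class Var:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __sub__(self, other):
        return Var(self.value - other.value, ((self, 1.0), (other, -1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Apply the chain rule from `output` back to every Var it depends on."""
    order, seen = [], set()

    def visit(v):
        # Depth-first walk that lists each Var after all of its parents.
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)

    visit(output)
    output.grad = 1.0
    for v in reversed(order):  # consumers are processed before their inputs
        for parent, local in v.parents:
            parent.grad += v.grad * local


# Gradient descent on loss(w) = (w * 2 - 6)^2, whose minimum is at w = 3.
w = 0.0
for step in range(50):
    w_var, x, target = Var(w), Var(2.0), Var(6.0)
    error = w_var * x - target
    loss = error * error
    backward(loss)             # fills in w_var.grad without a hand-written derivative
    w -= 0.1 * w_var.grad

print(round(w, 6))  # 3.0
```

Nobody wrote the derivative of the loss by hand here. It is assembled from the local derivatives recorded while the expression was being built, which is the convenience the list item refers to.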

What about the license?

It is Apache 2.0 – you may freely use TensorFlow for both research and commercial purposes. Truly nice.

Are there any tutorials and examples?

Sure there are!

For example, try this tutorial from the official website or explore these examples on GitHub.

Besides, you may try a simplified TensorFlow interface for scikit-learn fans. It makes the transition from scikit-learn to TensorFlow much more comfortable.

Remember: to work with TensorFlow from Python you need numpy installed, and matplotlib if you want to plot results as some of the examples do. And, of course, you will need CUDA to perform calculations on a GPU.

How does TensorFlow differ from other Deep Learning Python libraries such as Theano or Caffe?

Some of you may wonder whether there is any difference between TensorFlow and libraries like Theano, which can also do Deep Learning with multi-dimensional arrays on a GPU.

Yes, TensorFlow may look like Theano, but:

  • TensorFlow supports parallel computations, while Theano does not (and that is the main difference!)
  • TensorFlow models are more flexible in terms of portability
  • Some people (including me) may consider the TensorFlow code structure more human-readable and easier to maintain
  • TensorFlow is a C++ library with a Python interface, while Theano is a Python library that can generate internal C or CUDA modules.

I hope this short overview will help you to understand or at least guesstimate what TensorFlow is, why it is cool and why you do or do not need it. Comments are welcome!

