What I have learned from competing in Kaggle competitions


Out of interest, I joined Kaggle a year ago and have participated in several competitions, but I didn’t reach the top 15% until two months ago. Here I want to share some tips I have learned from those competitions. I am just a programmer with no academic background, so if there is a mistake, please contact me. Any advice is greatly appreciated.

PyTorch or TensorFlow

TensorFlow is now widely used by companies and startups to automate workflows and develop new systems. It draws its reputation from its distributed training support, scalable production and deployment options, and support for various devices like Android. PyTorch is gaining popularity for its simplicity, ease of use, dynamic computational graph, and efficient memory usage.

What can we build with TensorFlow and PyTorch?

Initially, neural networks were used to solve simple classification problems like handwritten digit recognition or identifying a car’s registration number from camera images. But thanks to the latest frameworks and NVIDIA’s high-performance graphics processing units (GPUs), we can train neural networks on terabytes of data and solve far more complex problems. A few notable achievements include reaching state-of-the-art performance on the ImageNet dataset using convolutional neural networks implemented in both TensorFlow and PyTorch. The trained model can be used in different applications, such as object detection, semantic image segmentation, and more.

Although the architecture of a neural network can be implemented in any of these frameworks, the result will not be the same. The training process has many framework-dependent parameters. For example, if you are training a model in PyTorch, you can accelerate training with GPUs through CUDA and PyTorch’s C++ backend. In TensorFlow you can also access GPUs, but it uses its own built-in GPU acceleration, so the time to train these models will always vary based on the framework you choose.

Comparing PyTorch and TensorFlow

The key difference between PyTorch and TensorFlow is the way they execute code. Both frameworks work on the fundamental datatype tensor. You can imagine a tensor as a multi-dimensional array.
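As a minimal illustration, here is a tensor in PyTorch (TensorFlow’s tf.constant behaves analogously):

```python
import torch

# A tensor is a multi-dimensional array; this one has 2 rows and 3 columns.
t = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])
print(t.shape)   # torch.Size([2, 3])
print(t.sum())   # tensor(21.)
```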

  1. Mechanism: Dynamic vs Static graph definition

TensorFlow is a framework composed of two core building blocks:

  • A library for defining computational graphs and a runtime for executing such graphs on a variety of different hardware.
  • A computational graph, which has many advantages (more on that in just a moment).

A computational graph is an abstract way of describing computations as a directed graph: a data structure consisting of nodes (vertices) connected pairwise by directed edges. When you run code in TensorFlow, the computation graph is defined statically. All communication with the outside world is performed via the tf.Session object and tf.placeholder, tensors that will be substituted with external data at runtime. This is how a computational graph is generated statically before the code is run in TensorFlow. The core advantage of a computational graph is that it allows parallelism and dependency-driven scheduling, which makes training faster and more efficient.
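As a small illustration of this static style, here is a sketch using the TF 1.x API (available as tf.compat.v1 in TensorFlow 2): the graph is defined first and nothing is computed until a Session runs it with real data fed into the placeholders.

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

# Define the graph; no computation happens yet.
x = tf.placeholder(tf.float32, shape=())
y = tf.placeholder(tf.float32, shape=())
z = x * y + 2.0

# Execute the graph, substituting the placeholders at runtime.
with tf.Session() as sess:
    result = sess.run(z, feed_dict={x: 3.0, y: 4.0})
print(result)  # 14.0
```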

Similar to TensorFlow, PyTorch has two core building blocks:

  • Imperative and dynamic building of computational graphs.
  • Autograd: performs automatic differentiation of the dynamic graphs.

The graphs change and execute nodes as you go, with no special session interfaces or placeholders. Overall, the framework is more tightly integrated with the Python language and feels native most of the time. Hence, PyTorch is more of a Pythonic framework, while TensorFlow can feel like a completely new language; this difference shapes the day-to-day development experience in each. TensorFlow provides a way of implementing dynamic graphs through a library called TensorFlow Fold, but PyTorch has this built in.

  2. Visualization

When it comes to visualization of the training process, TensorFlow takes the lead. Visualization helps the developer track training and debug more conveniently. TensorFlow’s visualization library is called TensorBoard. PyTorch developers use Visdom; however, the features Visdom provides are minimalistic and limited, so TensorBoard scores a point for visualizing the training process.

  3. Production deployment

When it comes to deploying trained models to production, TensorFlow is the clear winner. We can directly deploy models in TensorFlow using TensorFlow Serving, a framework that exposes a REST client API.

In PyTorch, production deployment became easier to handle with its 1.0 stable release, but it doesn’t provide any framework to deploy models directly onto the web. You’ll have to use a backend server such as Flask or Django. So, TensorFlow Serving may be a better option if performance is a concern.
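For completeness, here is roughly what such a Flask route might look like. The endpoint name, payload shape, and the stubbed-out model call are all illustrative assumptions, not a real deployment recipe:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    # Stand-in for a real PyTorch call such as model(torch.tensor(features));
    # it just averages the inputs so the sketch stays self-contained.
    return sum(features) / len(features)

@app.route("/predict", methods=["POST"])
def predict_route():
    # Expects JSON like {"features": [1.0, 2.0, 3.0]}.
    features = request.get_json()["features"]
    return jsonify({"score": predict(features)})
```

In production you would run this behind a WSGI server (e.g. gunicorn) rather than Flask’s development server.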

PROS AND CONS OF PYTORCH AND TENSORFLOW

TENSORFLOW PROS:

  • Simple built-in high-level API.
  • Visualizing training with TensorBoard.
  • Production-ready thanks to TensorFlow Serving.
  • Easy mobile support.
  • Open source.
  • Good documentation and community support.

TENSORFLOW CONS:

  • Static graph.
  • Harder to debug.
  • Hard to make quick changes.

PYTORCH PROS:

  • Python-like coding.
  • Dynamic graph.
  • Easy & quick editing.
  • Good documentation and community support.
  • Open source.
  • Plenty of projects out there using PyTorch.

PYTORCH CONS:

  • Third-party needed for visualization.
  • API server needed for production.

My recommendation: if you want to move fast and build AI-related products, TensorFlow is a good choice. PyTorch is mostly recommended for research-oriented developers, as it supports fast and dynamic training.
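That “dynamic” quality is easy to see in a few lines of Autograd (a minimal sketch):

```python
import torch

# The graph is built as ordinary Python executes ("define by run"),
# and Autograd differentiates it on demand.
x = torch.tensor(3.0, requires_grad=True)
y = x * x + 2.0      # graph nodes are recorded as this line runs
y.backward()         # automatic differentiation
print(x.grad)        # dy/dx = 2x, so tensor(6.)
```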

First of all, get an overview of the dataset
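On that first pass, a few pandas calls reveal most of what you need to know: shape, missing values, class balance, and basic statistics. The toy frame below is a hypothetical stand-in for a real competition CSV:

```python
import pandas as pd

# Stand-in for pd.read_csv("train.csv")
df = pd.DataFrame({"feature": [1.0, 2.0, None, 4.0],
                   "target":  [0, 1, 0, 1]})

print(df.shape)                      # rows x columns
print(df.isnull().sum())             # missing values per column
print(df["target"].value_counts())   # class balance
print(df.describe())                 # basic statistics
```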

Decide on the backbone network structure

Optimize one network
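Optimizing one network mostly means tuning its training loop before reaching for ensembles. A single training step with an optimizer and a learning-rate schedule might look like this (the model, optimizer choice, and all hyperparameters are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)            # stand-in for the real network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(8, 4)              # dummy batch of 8 samples
y = torch.randint(0, 2, (8,))      # dummy labels

optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
scheduler.step()                   # decay the learning rate over epochs
print(loss.item())
```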

Try model ensembling
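The simplest ensemble is an average of several models’ predicted probabilities (the arrays below are made-up outputs from hypothetical models):

```python
import numpy as np

# Hypothetical per-sample probabilities from three trained models.
preds_a = np.array([0.20, 0.80, 0.60])
preds_b = np.array([0.40, 0.60, 0.80])
preds_c = np.array([0.30, 0.70, 0.70])

ensemble = np.mean([preds_a, preds_b, preds_c], axis=0)
print(ensemble)   # approximately [0.3, 0.7, 0.7]
```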

Try TTA (test-time augmentation)
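TTA predicts on several augmented copies of each test sample and averages the results. A toy sketch (the "model" here is a placeholder that just returns the mean pixel value):

```python
import numpy as np

def model(image):
    # Placeholder "model": mean pixel intensity as the prediction.
    return image.mean()

image = np.array([[0.1, 0.9],
                  [0.3, 0.7]])

# Predict on the original plus simple flips, then average.
augmented = [image, np.fliplr(image), np.flipud(image)]
tta_score = float(np.mean([model(a) for a in augmented]))
print(tta_score)   # approximately 0.5
```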

Some other tricks