Remembering 2017: A Year of Data Science (for me)

Abe Vallerian
Dec 31, 2017

Hi guys! This is my first post, and I think it is a good idea to tell my story in my very first post. I don't know about you, but for me, 2017 was a very important year to remember. Besides many memorable personal achievements (e.g. getting a girlfriend ;p), my journey in Data Science began in 2017.


The Beginning

I graduated from the Applied Mathematics department in 2015. Then I started working at a bank as a Management Trainee. At that time, I found working in the banking industry boring. It may be interesting and challenging for some people, but I felt the job made little use of technical (i.e. mathematical) skills, at least in my department (I spent most of my time in the Operations and Technology department).

Fortunately, during that job I heard a lot about terms such as Data Science and Machine Learning. I think Data Science has been a hot topic in Indonesia since the end of 2016. Then, at the beginning of 2017 (I forget which month), I started to learn about Data Science and related topics. I started by reading A Course in Machine Learning by Hal Daumé III. It gives a very good, easy-to-understand introduction to Machine Learning methods, such as supervised vs. unsupervised learning, k-Nearest Neighbors, and even Neural Networks. My background in Mathematics also helped me understand the methods. I also learned Python, since it is one of the most popular programming languages for Data Science.

After a few months of studying, I applied to Tokopedia, one of the biggest marketplaces in Indonesia, for a Data Scientist position. At that time, not every company had a Data team; generally only technology companies and start-ups had one. I got the job and started my journey as a Data Scientist in June 2017. Bon voyage!

Meeting with Networks

I have never regretted my decision to enter the Data Science field. Almost every day I discover something cool! In my work I have learned to use many models, but my favorite is Neural Networks (NN).

At first, I found it hard to understand how they work, since I didn't understand what the hidden variables represent. I guess this comes from my habit of learning mathematics: I need to fully understand how a model works and the mathematics behind it. Then I realized that I don't actually need to understand that. Only the machine "knows" what the hidden variables represent; I think of them as hidden patterns that the machine finds.

A nice way to understand NNs is as an extension of Linear Regression (LR). In fact, Linear Regression is an NN with 2 layers (1 input layer + 1 output layer) and no activation function. Image 1 shows an illustration of LR: the inputs are multiplied by their respective weights and summed together (usually with a bias term added).
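To make the "inputs times weights, summed" picture concrete, here is a minimal sketch in plain Python of the output of a 2-layer network with no activation, which is exactly the LR prediction. The function name `linear_regression_forward` and all the numbers are hypothetical, chosen only for illustration.

```python
def linear_regression_forward(x, w, b):
    """Prediction of a 2-layer 'network' (input layer -> one output unit,
    no activation): each input times its weight, summed, plus a bias."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    return sum(xi * wi for xi, wi in zip(x, w)) + b


# Hypothetical example: 3 input features with made-up weights and bias.
x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 2.0]
b = 0.1
y_hat = linear_regression_forward(x, w, b)
print(y_hat)
```

Here y_hat = 0.5*1 + (-1)*2 + 2*3 + 0.1 = 4.6, up to floating-point rounding.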

Image 1: An Example of Linear Regression (http://dungba.org/content/images/2016/03/nnet1.png)

In an NN, you can stack hidden layers between the input layer and the output layer (see Image 2). In the transition between layers there is an activation function, which creates non-linearity. Without it, the whole network collapses into a simple LR, because stacking linear (weighted-sum) layers still gives a linear function of the inputs. I won't explain it in detail, since it is not my focus.
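The claim that a network without activations becomes a simple LR can be checked numerically. The sketch below, in plain Python with hypothetical weights (biases omitted to keep it short), shows that two stacked layers with no activation give the same output as a single layer whose weight matrix is the product W2·W1, while inserting a ReLU activation between them changes the result.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix product: (A @ B)[i][j] = sum_k A[i][k] * B[k][j]."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def relu(v):
    """ReLU activation applied element-wise."""
    return [max(0.0, vi) for vi in v]

# Hypothetical weights: 2 inputs -> 2 hidden units -> 1 output.
W1 = [[1.0, -2.0], [3.0, 0.5]]
W2 = [[2.0, 1.0]]
x = [1.0, 1.0]

# No activation: W2 @ (W1 @ x) equals (W2 @ W1) @ x, i.e. a single linear layer.
no_act = matvec(W2, matvec(W1, x))
collapsed = matvec(matmul(W2, W1), x)

# ReLU between the layers: the network is no longer a single linear map.
with_relu = matvec(W2, relu(matvec(W1, x)))
print(no_act, collapsed, with_relu)
```

With these weights the hidden pre-activations are [-1, 3.5]; ReLU zeroes the negative one, so the outputs are 1.5, 1.5 and 3.5 respectively.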

Image 2: An Example of Neural Networks (https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Colored_neural_network.svg/300px-Colored_neural_network.svg.png)

Neural Networks are my favorite model because of their effectiveness in solving problems. There are many extensions of Neural Networks, for example Recurrent Neural Networks (RNN). Think of an RNN as a "time-dependent Linear Regression": the hidden state computed at the current step is fed into the next step, so earlier inputs influence later outputs. I used RNNs a lot, since my work at Tokopedia is mostly related to text. I used RNNs to build Text Classification and Part-of-Speech Tagging models. RNNs are very powerful; you have to try them! Oh, and I use the TensorFlow library to build RNNs. I think it has good documentation and a lot of users, and it comes with TensorBoard to visualize learning!
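As a rough sketch of the "time-dependent" idea, here is a vanilla (Elman-style) RNN cell with a one-dimensional hidden state, h_t = tanh(w_x*x_t + w_h*h_{t-1} + b), unrolled over a short sequence in plain Python. The function names and weights are hypothetical, and this is not how TensorFlow implements its RNN layers internally; it only shows how the previous hidden state feeds into the next step.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One step of a vanilla RNN with a 1-D hidden state:
    h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(xs, w_x=0.8, w_h=0.5, b=0.0, h0=0.0):
    """Unroll the RNN over a sequence and return the hidden state at each step."""
    h = h0
    states = []
    for x_t in xs:
        h = rnn_step(x_t, h, w_x, w_h, b)  # previous state feeds the next step
        states.append(h)
    return states

# Hypothetical input sequence: a single non-zero value followed by zeros.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

Even though the last two inputs are 0, their hidden states are non-zero, because the first input is carried forward through h; tanh keeps every state between -1 and 1.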

The Takeaways

I wrote this post for a few reasons:

  1. If you are interested in Data Science or Machine Learning, I want to encourage you to take the first step into this field. It is very promising, challenging, and still evolving rapidly. The applications are numerous, e.g. Speech Recognition, Image Classification, and even Image Caption Generation!
  2. I want to make this year (2017) memorable. I'm really grateful for 2017. So many things happened this year, e.g. Trump, bitcoin, North Korea. Take a moment to reflect on the great things, experiences, events, and memories that happened to you this year. Make it memorable, as we step into 2018!
  3. This post is the beginning of my writing. I have been thinking about starting to write for a long time, and I expect to write a lot next year. Just wait!

So, this is my Data Science journey. How about yours? Share it with the world!

Merry Christmas 2017 & Happy New Year 2018,
Abe Vallerian

Being Human, Data Scientist, Language Learner, Ex-mathematician