
The out-of-vocabulary (OOV) problem is a very common problem in NLP. It shows up in text classification, named entity recognition, information retrieval, and more.

It has several causes, and there are several solutions for it.
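To make the problem concrete, here is a minimal sketch that measures how many test tokens are missing from a vocabulary built on training data. It also applies one common mitigation: mapping every unseen word to a shared placeholder token. It assumes plain whitespace tokenization. The `<unk>` token and the helper names are hypothetical choices for illustration.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder token for unknown words


def build_vocab(sentences, min_count=1):
    """Collect tokens that appear at least `min_count` times in the training data."""
    counts = Counter(tok for sent in sentences for tok in sent.split())
    return {tok for tok, c in counts.items() if c >= min_count}


def oov_rate(sentences, vocab):
    """Fraction of tokens in `sentences` that are not in `vocab`."""
    tokens = [tok for sent in sentences for tok in sent.split()]
    if not tokens:
        return 0.0
    return sum(tok not in vocab for tok in tokens) / len(tokens)


def replace_oov(sentence, vocab):
    """Map every out-of-vocabulary token to the UNK placeholder."""
    return " ".join(tok if tok in vocab else UNK for tok in sentence.split())


train = ["the cat sat on the mat", "the dog sat on the log"]
test = ["the cat chased the mouse"]

vocab = build_vocab(train)
print(oov_rate(test, vocab))        # 0.4 ("chased" and "mouse" are unseen)
print(replace_oov(test[0], vocab))  # the cat <unk> the <unk>
```

Collapsing unknown words into one token keeps the model's input space closed, but it throws away what those words were. Other approaches, such as subword tokenization, try to keep that information instead.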

Read more »

Overview

This blog notes down something I learned recently about communicating with managers. Nobody can read your mind, and nobody understands you better than you do. With better communication, managers understand your strengths, needs and abilities better, and your career grows faster and happier.

First of all, the definition of management is concise, and it differentiates a manager from a leader.

Definition of management:

Management is not a promotion; it is a lateral move for people who want to get work done through, and grow and develop, others.

Read more »

JupyterLab loads too slowly

Description

I set up my Jupyter server on an EC2 instance and open it in my laptop's browser, but it gets slower and slower each time I reopen the window.

Looking at Chrome's DevTools, it turns out there is a 36 MB JavaScript file, which means every time you open a JupyterLab page you need to download 36 MB of JavaScript.
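The same diagnosis can be checked on the server side with a small sketch that lists the biggest JavaScript files under a directory. The helper name `largest_js_files` is hypothetical. The sketch assumes the built bundles live in the `static` folder of the JupyterLab application directory, which `jupyter lab path` prints. Verify that location on your own install.

```python
import os


def largest_js_files(root, top=5):
    """Return (size_in_MB, path) pairs for the biggest .js files under `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".js"):
                path = os.path.join(dirpath, name)
                # Sizes in megabytes (10**6 bytes), matching the "36MB" seen in DevTools.
                found.append((os.path.getsize(path) / 1e6, path))
    # Largest first, so an oversized bundle shows up at the top.
    found.sort(reverse=True)
    return found[:top]
```

Running `largest_js_files("<application directory>/static")` before and after rebuilding should show whether the oversized bundle shrank.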

Solution

Build JupyterLab with the following command:

jupyter lab build --minimize=False --dev-build=False
Read more »

I attended a tech talk at Spotify tonight and learned a lot of good stuff. Many thanks to Spotify for organizing such a great event, and to Ramino Yon for a great presentation.

Main takeaways: six challenges in building machine learning infrastructure, and some pointers to existing papers/blogs about ML infrastructure.

Challenges

1. Rely on data standards

Read more »

One big difference between Python 2 and Python 3 is how they handle strings. I have been bitten by this several times. The motivation of this blog is to clarify some basic questions.
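As a quick illustration of that difference, here is a minimal Python 3 sketch. In Python 3, text (`str`) and raw bytes (`bytes`) are separate types, and you convert between them explicitly with `encode`/`decode`. In Python 2, by contrast, `str` was a byte string, and mixing it with `unicode` triggered an implicit ASCII decode. UTF-8 is assumed as the encoding here.

```python
# Python 3: text (str) and binary data (bytes) are separate types.
text = "caf\u00e9"           # str "café": a sequence of 4 Unicode code points
data = text.encode("utf-8")  # bytes: the UTF-8 encoding of those code points

print(type(text).__name__, len(text))  # str 4
print(type(data).__name__, len(data))  # bytes 5 ("é" takes two bytes in UTF-8)
print(data.decode("utf-8") == text)    # True

# Mixing the two types raises TypeError instead of converting implicitly.
try:
    text + data
except TypeError as exc:
    print("TypeError:", exc)
```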

Code Point

We all know computers only understand 0s and 1s, so to represent a character or a string we need some kind of binary representation. A well-known encoding is ASCII, for example:

a = 01100001
b = 01100010
c = 01100011
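These mappings can be reproduced with Python's built-in `ord()`, `chr()` and `format()`. Each character maps to an integer code point, and the binary strings in the listing are that integer written out in 8 bits. The helper names `to_binary` and `from_binary` are just for illustration.

```python
def to_binary(ch):
    """Return the code point of `ch` as a binary string, zero-padded to at least 8 bits."""
    return format(ord(ch), "08b")


def from_binary(bits):
    """Turn a binary string back into the character with that code point."""
    return chr(int(bits, 2))


for ch in "abc":
    print(ch, ord(ch), to_binary(ch))
# a 97 01100001
# b 98 01100010
# c 99 01100011

print(from_binary("01100001"))  # a

# Code points beyond ASCII (0-127) exist too; "é" is 233 (0xe9).
print(ord("\u00e9"), hex(ord("\u00e9")))  # 233 0xe9
```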
Read more »