I let a neural network read long texts one letter at a time. Its task was to predict the next letter based on those it had seen so far. Over time, it recognized patterns between letters. Find out what it learned by feeding it some letters below. When you click the send button on the right, it will read your text and auto-complete it.

You can choose between networks that read a lot of Wikipedia, US Congress transcripts, and other corpora.

The Gini coefficient is a popular metric on Kaggle, especially for imbalanced class distributions. But googling "Gini coefficient" mostly turns up economic explanations. Here is an intuitive explanation of the coefficient as an evaluation metric for classification. The Jupyter Notebook for this post is here.

Spoiler Alert: The Gini coefficients are the orange areas. The normalized Gini coefficient is the left one divided by the right one.
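If you prefer code to areas under curves, here is a minimal pure-Python sketch of the (normalized) Gini coefficient as it is commonly computed on Kaggle; the function names are my own, not from the notebook:

```python
def gini(actual, pred):
    """Gini coefficient of a ranking: sort by predicted score (descending,
    ties broken by original order) and accumulate how early the positives appear."""
    order = sorted(range(len(actual)), key=lambda i: (-pred[i], i))
    total_pos = sum(actual)
    running, gini_sum = 0.0, 0.0
    for i in order:
        running += actual[i]
        gini_sum += running
    gini_sum -= (len(actual) + 1) / 2.0 * total_pos
    return gini_sum / (len(actual) * total_pos)

def gini_normalized(actual, pred):
    """1.0 for a perfect ranking, around 0 for a random one, -1.0 for a reversed one."""
    return gini(actual, pred) / gini(actual, actual)
```

The normalization divides by the Gini of a perfect model, mirroring the "left area divided by right area" picture above.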

The iCloud key-value storage works like UserDefaults but is synced across devices. It also survives uninstalls of the app. I used it today to add iCloud backups to Emoji Diary. It required 4 lines of code.

Activate the iCloud capability

Select your project in Xcode and then select the target under "Project and Targets". Activate iCloud and check "Key-Value storage".

GitHub Pages hosts your Jekyll sites for free under *.github.io.

Gulp lets you automate your build process (minifying .css files, concatenating all .js files etc.).

Babel lets you write ES6 (cool new JavaScript) even though not all browsers support it by transpiling it into ES5 (lame old JavaScript).

I switched to using Jekyll for this blog yesterday and I love it. There are blog posts describing how to combine a subset of the above, but I could not find one for combining all four.

In this post, we will continue with digit recognition and try to come closer to the benchmark of 99.8% accuracy. In the last post, we did a test run with a network consisting of 300 hidden neurons and 10 output neurons. After only 15 epochs, it already reached an accuracy of 95% on the test set. In the following epochs, however, the net overfitted, which I will show in detail below. Furthermore, optimizing the roughly 240,000 parameters took a significant amount of time per epoch, which slows down the search for good hyperparameters such as the learning rate or the number of hidden neurons. This post covers validation, which we can use to reduce overfitting, and batch learning, which speeds up the training phase.
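The two ingredients can be sketched in a few lines of Python (a minimal illustration with my own helper names, not the network code from this post):

```python
import random

def train_validation_split(data, validation_fraction=0.2, seed=0):
    """Shuffle the dataset and hold out a fraction for validation.
    Accuracy on the held-out part reveals overfitting during training."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def mini_batches(data, batch_size):
    """Yield successive mini-batches for batch learning."""
    for start in range(0, len(data), batch_size):
        yield data[start:start + batch_size]
```

Updating the weights once per mini-batch instead of once per sample is what buys the speed-up per epoch.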

In this and the next few posts, we will do digit recognition using the MNIST dataset, which consists of 70,000 images of handwritten digits (0 to 9). Our current net outputs a single value, so in this post we will modify it to tackle classification tasks with more than two classes.
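The standard way to move beyond two classes is one output neuron per class, with the targets encoded as one-hot vectors; a minimal sketch (helper names are my own):

```python
def one_hot(digit, num_classes=10):
    """Encode a digit as a target vector for 10 output neurons."""
    vec = [0.0] * num_classes
    vec[digit] = 1.0
    return vec

def predicted_digit(outputs):
    """The predicted class is the output neuron with the highest activation."""
    return max(range(len(outputs)), key=lambda i: outputs[i])
```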

In this post, we will experiment with our neural network. We will test out values for hyperparameters such as the learning rate and the number of hidden neurons.

How do neural networks learn? So far, we have implemented a single neuron and derived the update rules for its weights, i.e., let it learn. In the last post, we created a neural network consisting of three neurons and implemented the generation of its outputs. In this post, we will find out how to update the weights in the neural network, enabling it to learn by adding just 8 lines to the code.
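For a single sigmoid neuron, the kind of gradient-descent update we derived earlier can be sketched like this (my own minimal version, not the post's exact 8 lines):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def update_weights(weights, x, target, learning_rate):
    """One gradient-descent step on the squared error of a sigmoid neuron."""
    y = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
    # dE/dw_i = (y - t) * y * (1 - y) * x_i  for E = 0.5 * (y - t)^2
    delta = (y - target) * y * (1.0 - y)
    return [w - learning_rate * delta * xi for w, xi in zip(weights, x)]
```

In a network, the same idea is applied layer by layer: each neuron's delta is propagated backwards through the weights.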

In the last post, we talked about linear separability. We observed that our neuron fails to learn datasets that are not linearly separable, like the XOR dataset. In this post, we will expand to a net of neurons that can learn more complex functions: a neural network.
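To see why XOR defeats a single neuron, we can brute-force a grid of candidate separating lines and observe that none of them splits the four points correctly (a sketch over a finite grid, not a proof for all real-valued lines):

```python
import itertools

# the XOR dataset: label 1 iff exactly one input is 1
xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def linearly_separates(w1, w2, b, data):
    """True if the line w1*x1 + w2*x2 + b = 0 puts all label-1 points
    strictly on its positive side and all label-0 points elsewhere."""
    return all((w1 * x1 + w2 * x2 + b > 0) == (label == 1)
               for (x1, x2), label in data)

# brute-force a coarse grid of lines; none of them separates XOR
grid = [i / 4 for i in range(-8, 9)]
found = any(linearly_separates(w1, w2, b, xor_data)
            for w1, w2, b in itertools.product(grid, repeat=3))
```

The same search immediately finds a line for AND or OR, which is exactly the difference linear separability makes.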

We successfully extended our neuron so that it can handle all datasets generated by a continuous linear function. Now we will move on to classification tasks, meaning that the target values in the dataset are discrete rather than continuous.

Regression vs. Classification

So far, all problems were regression tasks: our neuron was given two input variables, which it used to predict a continuous target variable. At the end of the last post, we introduced a classification problem, where the target for an input vector was either 0 or 1. Our neuron failed to solve it because it can only fit a linear function, producing a least-squares fit. For linear regression tasks this suffices, but for classification problems the output should not change linearly, but abruptly.
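A quick illustration of the problem: fitting a least-squares line to 0/1 targets in one dimension yields predictions that undershoot one class and overshoot the other (the toy data below is my own, not from the post):

```python
def least_squares_fit(xs, ys):
    """Ordinary least squares for y = a*x + b in one dimension."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

# a 1-D classification dataset: targets jump abruptly from 0 to 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
a, b = least_squares_fit(xs, ys)
```

The fitted line ramps smoothly through the jump instead of switching abruptly, which is why a step-like activation is needed for classification.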

In the last post, we created a neuron that was able to learn a dataset generated by a simple linear function \(y = 0.58 x_1 + 0.67 x_2\). Now, we will modify the function behind our dataset just a bit, and suddenly our neuron will fail to predict the target variable accurately. We will identify the problem and modify the neuron accordingly, making it more flexible.
$$\vec{\hat{y}} = \vec{x} \cdot \vec{w} \color{orange}{+ \vec{b}}$$
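In code, the bias is just one more term in the neuron's output (a minimal sketch; the weights 0.58 and 0.67 come from the dataset above, while the bias of 0.3 is an invented example value):

```python
def predict(x, w, b):
    """A linear neuron with a bias term: y_hat = x . w + b."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b
```

Without the bias, the neuron can only represent functions that pass through the origin.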

This is the first post of a series about understanding Deep Neural Networks. We will start with the core component of artificial neural networks: the neuron. We will use a single artificial neuron to learn a simple dataset.

Task

Let's say we want to predict how much money we will probably earn in 2016. We take a look at our financial data and see that there seems to be some kind of correlation between our gross salary, our inheritance, and how much money we actually earn in that year. But instead of googling tax rates, we decide to throw Machine Learning at the problem and hope that it does all the thinking for us.
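A single linear neuron trained by stochastic gradient descent can handle this kind of task; here is a sketch on made-up numbers (every figure below is invented for illustration, generated from the rule net = 0.58 * salary + 0.30 * inheritance):

```python
# made-up toy data: [gross salary, inheritance] -> money actually earned
data = [([50.0, 0.0], 29.0),
        ([80.0, 10.0], 49.4),
        ([60.0, 5.0], 36.3),
        ([100.0, 20.0], 64.0)]

def train(data, learning_rate=1e-4, epochs=5000):
    """Fit a linear neuron y = w1*x1 + w2*x2 by stochastic gradient descent."""
    w = [0.0, 0.0]
    for _ in range(epochs):
        for x, target in data:
            y = sum(wi * xi for wi, xi in zip(w, x))
            error = y - target
            w = [wi - learning_rate * error * xi for wi, xi in zip(w, x)]
    return w
```

Given enough passes over the data, the weights recover the rule that generated it, which is exactly the "let the machine do the thinking" idea.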

This week, I will have a technical interview for a Software Engineering internship. To solve coding problems, it is essential to remember the most frequently used data structures; it is even better to know how they are implemented. So, to structure my thoughts, I will cover the major data structures and their implementations in Java 8.

In a geography exam, the correct answer would be the left / upper one: it displays the actual positions of four cities in the US. But that does not make the other map incorrect; it just displays different data. Specifically, it approximates the travel times between the four cities. This means that the closer two cities are on the right map, the faster you can travel between them by public transport. We can compute such maps using Multidimensional Scaling. What is Multidimensional Scaling? How can it help us approximate travel times? And what is the relationship between the left map with the actual positions and the right map? We are about to find out.
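As a preview, classical (Torgerson) MDS can be sketched with NumPy. Assuming we already have a symmetric matrix of pairwise travel times, it produces 2-D coordinates whose Euclidean distances approximate those times:

```python
import numpy as np

def classical_mds(distances, dimensions=2):
    """Classical (Torgerson) Multidimensional Scaling: turn a symmetric
    matrix of pairwise distances into low-dimensional coordinates whose
    Euclidean distances approximate the input distances."""
    D = np.asarray(distances, dtype=float)
    n = D.shape[0]
    # double-center the squared distance matrix
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # coordinates come from the top eigenpairs of B
    eigvals, eigvecs = np.linalg.eigh(B)
    idx = np.argsort(eigvals)[::-1][:dimensions]
    scale = np.sqrt(np.maximum(eigvals[idx], 0))
    return eigvecs[:, idx] * scale
```

The result is only determined up to rotation and reflection, which is why an MDS "travel-time map" never has a canonical north.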

I found out some helpful and some surprising properties of Java 6. Did you know that, to this day, goto is a reserved keyword in Java? Here are eight facts about the language I was not aware of. Would you have known all of them?