Multi-SOM: an Algorithm for High-Dimensional, Small Size Datasets
July 13, 2012
Posted by on
Shen Lu, Richard S. Segall
Since it takes time to do experiments in bioinformatics, biological datasets are sometimes small but with high dimensionality. From probability theory, in order to discover knowledge from a set of data, we have to have a sufficient number of samples. Otherwise, the error bounds can become too large to be useful. For the SOM (Self-Organizing Map) algorithm, the initial map is based on the training data. In order to avoid the bias caused by the insufficient training data, in this paper we present an algorithm, called Multi-SOM. Multi-SOM builds a number of small self-organizing maps, instead of just one big map. Bayesian decision theory is used to make the final decision among similar neurons on different maps. In this way, we can better ensure that we can get a real random initial weight vector set, the map size is less of consideration and errors tend to average out. In our experiments as applied to microarray datasets which are highly intense data composed of genetic related information, the precision of Multi-SOMs is 10.58% greater than SOMs, and its recall is 11.07% greater than SOMs. Thus, the Multi-SOMs algorithm is practical.
Full Paper in PDF Document