I was clustering around 40000 points using kmean algorithm. In the first version of the program I wrote the euclidean distance function like this
var euclideanDistance = function( p1, p2 ) { // p1.length === p2.length == 3
var sum = 0;
for( var i in p1 ){
sum += Math.pow( p1[i] - p2[i], 2 );
}
return Math.sqrt( sum );
};
The overall program was quite slow taking on average 7sec to execute. After some profiling I rewrote the above function like this
var euclideanDistance = function( p1, p2 ) { // p1.length === p2.length == 3
var sum = 0;
for( var i = 0; i < p1.length; i++ ) {
sum += Math.pow( p1[i] - p2[i], 2 );
}
return Math.sqrt( sum );
};
Now the programs on average take around 400ms. That's a huge time difference just because of the way I wrote the for loop. I normally don't use for..in
loop for arrays but for some reason I used it while writing this function.
Can someone explain why there is this huge difference in performance between these 2 styles?
Best Answer
Look at what's happening differently in each iteration:
i < p1.length
i
by oneVery simple and fast.
Now look at what's happening in each iteration for this:
for( var i in p1 )
It has to find next property in the object that is enumerable. With your array you know that this can be achieved by a simple integer increment, where as the algorithm to find next enumerable is most likely not that simple because it has to work on arbitrary object and its prototype chain keys.