Jump to content
  • 0
AnnieRuru

help me improve this merge sort algorithm

Question

ok I just recently cracked @KeyWorld's merge sort algorithm in this topic
http://www.eathena.ws/board/index.php?s=&showtopic=180080&view=findpost&p=1293102

here is the script
http://rathena.org/board/pastebin/jdsshuef1ic/

I have 2 questions

1.
why is that comb sort runs faster than merge sort ??

in theory, merge sort should be better than comb sort, right ?
O(n log n) is better than O(n2)

however, in practical test on rathena emulator
when .@total = 100,
comb sort run 16~32 mili-seconds
merge sort run 31~47 mili-seconds

something is wrong here

my 1st assumption is my merge sort is not efficient, maybe it can be improve
... however @Euphy told me maybe its just script engine problem ...

can someone test here with hercules emulator and confirm comb sort is faster than merge sort ?

==============================

2.
it seems I'm not able to write this kind of algorithm
I'm trying to replicate the way that I have used in my comb_sort function

dispbottom .@array[ .@r[.@i] ] +"";

so that I can list out the index of the array
I believe display with the index of the sorted array is better than getting output from sorted array
because like I said in this topic
http://rathena.org/board/topic/92070-suggestion-new-scriptcommand-search-array/?p=242896
I want the index of the sorted array to display the highest score, or the lowest number

I wish someone can point me some light on how to get the sorted index done
because comb sort use index swap, and I can easily do it, but merge sort use array merge <.<

Edited by AnnieRuru

Share this post


Link to post
Share on other sites

14 answers to this question

Recommended Posts

  • 0

Hi Annie

 

I'll check later more deeply your script to see if there are rooms for improvements, not much time currently.

But from what I remember (maybe not the case anymore) the script engine do not like recursion (call functions, return to the previous state) and that's what slowing down the script.

Share this post


Link to post
Share on other sites
  • 0

the script engine do not like recursion (call functions, return to the previous state) and that's what slowing down the script.

YES !

now I remember this topic

http://www.eathena.ws/board/index.php?showtopic=237976

we have already discussed before that *goto is not evil ... LOL !

so using multiple callfunc/function/callsub in the script is the one that slow down the script execution time

since these commands parse very ugly

 

I guess there is no way to improve this script further, since this is a script engine problem

1st question solved

 

2nd part remains

Share this post


Link to post
Share on other sites
  • 0

Seems like you can improve the script :

 

I don't really understand why you create a temporary array, and at least, your (2) problem is related to the fact that you don't bind your output array with your temporary array content.

Seems like you don't use argument 4 of mergesort and this functions can be improve to avoid executing the if condition at each recursion.

And at least in merge function the three last while() can be optimized using our favorite copyarray command.

 

If it's still slow, some solutions exists to implement the merge sort without relying on recursions :P

Share this post


Link to post
Share on other sites
  • 0

Here's some optimizations to those functions, as well as an iterative, bottom-up implementation of mergesort.

 

http://upaste.me/20dc8e

 

This is how they perform on my Hercules copy, using 1000-element arrays:

 

Mergesort (recursive):

[Debug]: script debug : 0 110000001 : time used -> 131 milli-seconds
Mergesort (bottom-up):
[Debug]: script debug : 0 110000002 : time used -> 89 milli-seconds
Combsort:
[Debug]: script debug : 0 110000003 : time used -> 98 milli-seconds
For question 2: what about a parallel array of indices, that you swap when you swap the main one?

Share this post


Link to post
Share on other sites
  • 0

You'll have even better performance if you replace the for loop with a while loop and the two condition with a ternary :)

I can't test my iterative merge sort, I'm currently stuck with a script engine bug, will have to do some work around soon.

Share this post


Link to post
Share on other sites
  • 0

I don't really understand why you create a temporary array, and at least, your (2) problem is related to the fact that you don't bind your output array with your temporary array content.

the reason that I want to create a temporary array $@tmp_array is to not mess up with the original input array

in the original version merge sort you've wrote

the .@array[] will be also sorted along with the .@output[]

when I display the .@array[] after the callfunc "mergesort", this array is sorted !

.

.

And at least in merge function the three last while() can be optimized using our favorite copyarray command.

yeah ... hahaha .. just noticed it too :D

.

.

If it's still slow, some solutions exists to implement the merge sort without relying on recursions :P

yes its still slow,

now I'm using Hercules emulator to test, the array index can go up to 2^32 (4b...)

just like Haru did, he confirm your iterative merge sort still slower than comb sort

.

.

and wait, I just knew merge sort has 2 versions !

http://en.wikipedia.org/wiki/Merge_sort

wait, just how many versions merge sort has ?

nonono .. my head gonna explode 1st hahaha !

.

.

Here's some optimizations to those functions, as well as an iterative, bottom-up implementation of mergesort.

yeah I immediately noticed some modification on the comb sort

but you just bug the script lol

http://upaste.me/c50310475cfb4a976

screenshot -> http://imageshack.com/a/img33/8201/81nz.jpg

obviously you only see the last few index when having over 1000 index

but the 1st few index are not sorted properly

so, I have to use back my own comb sort

 

 

honestly, how fast is your computer o.o

same test as yours

.@total = 1000; generate rand(10000);

 

comb sort -> 468-530 mili-seconds

iterative merge sort -> 484-499 mili-seconds

bottom-up merge sort -> 327-343 mili-seconds (*own*)

.

.

2nd,

keyworld is right

while-loop is faster than for-loop when parse in script engine

the is even a significant difference with this script

same test as before

your original for-loop -> 343-359 mili-seconds

I change into while-loop -> 327-343 mili-seconds

 

keyworld has proposed an improvements to the looping with rathena script engine

http://rathena.org/board/topic/81457-hardcoded-script-commands-improvement/

any chance this might get implement in hercules ?

.

.

Seems like you don't use argument 4 of mergesort and this functions can be improve to avoid executing the if condition at each recursion.

For question 2: what about a parallel array of indices, that you swap when you swap the main one?

that's because I want to output the sorted array index

not the sorted array result

Haru also copyarray .@arr from getarg(0) in the comb sort function

which is not I want

.

.

http://rathena.org/board/topic/92070-suggestion-new-scriptcommand-search-array/?p=242896

when search through an index, the script usually already has 2 or more array being called

example like

getinventorylist

@inventorylist_id[] ...

@inventorylist_amount[]

 

if you want to sort @inventorylist_id array, the @inventorylist_amount array needs to sort as well

here is a real example

http://upaste.me/1d2110474e08867d3

screenshot -> http://imageshack.com/a/img33/778/oyn5.jpg

EDIT: -> forgot to tell getitemname2 function come from here

if you try to do dispbottom .@output[.@i], .@output is actually just the index of the array

I want to sort the ID, I also want to sort the parallel array such as @inventorylist_amount and so on

so I only output the sorted index and put them in as @inventorylist_id[ .@output[.@i] ]; and so on along with the parallel arrays

 

and I have no idea how to do like this on iterative merge sort,

it seems the original array is being sorted as well

but @Haru bottom-up merge sort doesn't seem to be the case

maybe haru can write one xD

 

======== EDIT ===================

 

summary ... I think this post too long read

 

1. throw away the iterative merge sort

2. I want Hercules core developer to take a look at keyworld's script command optimizations and get it implement

3. I like haru's bottom-up merge sort, any way to get that function to output sorted index ?

Edited by AnnieRuru

Share this post


Link to post
Share on other sites
  • 0

You'll have even better performance if you replace the for loop with a while loop and the two condition with a ternary :)

Yup, I think so too (decreasing readability but oh well, if a script is performance-critical) 

 

yeah I immediately noticed some modification on the comb sort

but you just bug the script lol

http://upaste.me/c50310475cfb4a976

screenshot -> http://imageshack.com/a/img33/8201/81nz.jpg

obviously you only see the last few index when having over 1000 index

but the 1st few index are not sorted properly

so, I have to use back my own comb sort

Ah I just noticed an error in the script. In the do/while condition, there's a .@gap > 10 that should be .@gap10 > 10 (I had to introduce that extra gap10 variable because of some checks that were getting messed up because of the division by 1.3, and forgot to update it through the script.

honestly, how fast is your computer o.o

same test as yours

.@total = 1000; generate rand(10000);

 

comb sort -> 468-530 mili-seconds

iterative merge sort -> 484-499 mili-seconds

bottom-up merge sort -> 327-343 mili-seconds (*own*)

Not so much compared to today's standards, it's a late-2009 iMac with an i7-870 cpu... But it still works reasonably well :P

2nd,

keyworld is right

while-loop is faster than for-loop when parse in script engine

the is even a significant difference with this script

same test as before

your original for-loop -> 343-359 mili-seconds

I change into while-loop -> 327-343 mili-seconds

Yes, this is certainly true. As keyworld said, they compile to different assembly, and the for one calls more jump commands.

keyworld has proposed an improvements to the looping with rathena script engine

http://rathena.org/board/topic/81457-hardcoded-script-commands-improvement/

any chance this might get implement in hercules ?

With some luck, we might be able to get an even faster implementation. Just theoretical right now, but there's something in the works :)

Haru also copyarray .@arr from getarg(0) in the comb sort function

which is not I want

That's to avoid affecting the original array, and let the function work on a local copy when it does the swaps.

http://rathena.org/board/topic/92070-suggestion-new-scriptcommand-search-array/?p=242896

when search through an index, the script usually already has 2 or more array being called

example like

getinventorylist

@inventorylist_id[] ...

@inventorylist_amount[]

 

if you want to sort @inventorylist_id array, the @inventorylist_amount array needs to sort as well

here is a real example

http://upaste.me/1d2110474e08867d3

screenshot -> http://imageshack.com/a/img33/778/oyn5.jpg

EDIT: -> forgot to tell getitemname2 function come from here

if you try to do dispbottom .@output[.@i], .@output is actually just the index of the array

I want to sort the ID, I also want to sort the parallel array such as @inventorylist_amount and so on

so I only output the sorted index and put them in as @inventorylist_id[ .@output[.@i] ]; and so on along with the parallel arrays

 

 

Hmm, I see. Then how about, instead of using the .@output array as index, you sort @inventorylist_id, and whenever you do one swap on it, you do the very same swap on @inventorylist_amount as well?

 

Here's two examples (untested, sorry if there's any error). First one sorts in place *both* the original arrays, so you just need to print @inventorylist_id[.@i] and @inventorylist_amount[.@i]. Second one should be what you wanted, it doesn't sort the original arrays at all, and only returns the array of sorted indices, so that you need to print @inventorylist_id[ .@output[.@i] ] and @inventorylist_amount[ .@output[.@i] ]:

 

http://upaste.me/5c1c8f

(changing the for() to while() is left as an exercise to the reader :P)

 

//Edit: did I ever say I hate IPB's broken bbcode parser?

Share this post


Link to post
Share on other sites
  • 0

Ah I just noticed an error in the script. In the do/while condition, there's a .@gap > 10 that should be .@gap10 > 10 (I had to introduce that extra gap10 variable because of some checks that were getting messed up because of the division by 1.3, and forgot to update it through the script.

wow ! that's really fix it

http://upaste.me/a63a10487f4315bee

my comb sort -> 468-514 miliseconds

your comb sort -> 483-500 miliseconds

although it doesn't seem to improve the execution time

but the range/varies seems to be more stable

... maybe I can learn a thing or two from here XD

.

.

.... Second one should be what you wanted, it doesn't sort the original arrays at all, and only returns the array of sorted indices, so that you need to print @inventorylist_id[ .@output[.@i] ] and @inventorylist_amount[ .@output[.@i] ]:

 

http://upaste.me/5c1c8f

YES exactly what I want

http://upaste.me/78f11048686c8fbc4

everything works perfect

this merge sort will be the one I'll be using from now on

 

its solved now

 

you know ... for some reason the answer, merge-sort that I want runs 450 mili-seconds

but your original one on post#5 only runs 350 mili-seconds

however its still 50 mili-seconds faster than comb-sort :P

Share this post


Link to post
Share on other sites
  • 0

Yep, it's slower because it does more operations, since it needs to swap values from two arrays instead of one~

Share this post


Link to post
Share on other sites
  • 0

Optimized version of the merge sort index :

function	script	mergeSortIndex	{	.@size = getarg(2);	copyarray .@arr, getarg(0), .@size;	while (.@i < .@size) {		set .@idx[.@i], .@i;		.@i++;	}	.@width = 1;	while (.@width < .@size) {		.@left = 0;		while (.@left < .@size) {			.@middle = .@size < .@left + .@width     ? .@size : .@left + .@width;			.@right  = .@size < .@left + .@width * 2 ? .@size : .@left + .@width * 2;			.@l      = .@left;			.@m      = .@middle;			while (.@l < .@middle && .@m < .@right) {				if (.@arr[.@l] < .@arr[.@m]) {					.@tmp_arr[.@left] = .@arr[.@l];					.@tmp_idx[.@left] = .@idx[.@l];					.@l++;				} else {					.@tmp_arr[.@left] = .@arr[.@m];					.@tmp_idx[.@left] = .@idx[.@m];					.@m++;				}				.@left++;			}			if (.@middle - .@l) {				copyarray .@tmp_arr[.@left], .@arr[.@l], .@middle - .@l;				copyarray .@tmp_idx[.@left], .@idx[.@l], .@middle - .@l;				.@left += .@middle - .@l;			}			if (.@right - .@m) {				copyarray .@tmp_arr[.@left], .@arr[.@m], .@right - .@m;				copyarray .@tmp_idx[.@left], .@idx[.@m], .@right - .@m;				.@left += .@right - .@m;			}		}		.@width *= 2;		copyarray .@arr, .@tmp_arr, .@size;		copyarray .@idx, .@tmp_idx, .@size;	}	copyarray getarg(1), .@tmp_idx, .@size;	return;}

Share this post


Link to post
Share on other sites
  • 0

that's cheating !
just because you understand more about *eathena scripting engine works
this function is only optimized with *eathena/hercules scripting engine only LOL !

 

but yeah

http://upaste.me/27f51050340b8af47

haru's merge sort -> 452-468 mili-seconds

keyworld's merge sort -> 405-422 mili-seconds

 

actually I'm not that amazed though

I just learn that set getelementofaray( getarg(x) ) is not a good practice <.<

.

.

now I have a quick question ...

is bottom-up merge sort and top-down merge sort has any advantages in certain scenario ?

example like cocktail sort has advantages if the array has already in inverted order ready ...

Share this post


Link to post
Share on other sites
  • 0

It's not cheating, it's just reducing commands calls :)

The best for now is counting sort in Hercules. Can't write it now in my phone but it will execute faaaar faster.
 
Edit: So I checked today in a slow server, with some stupids greats optimizations in all functions. Here my results using 1000 elements :

[Debug]: script debug : 0 110144879 : ==== [SORT] Testing for 1000 elements.[Debug]: script debug : 0 110144879 : [merge_sort] Time used -> 405 ms [PASSED][Debug]: script debug : 0 110144879 : [comb_sort] Time used -> 394 ms [PASSED][Debug]: script debug : 0 110144879 : [counting_sort] Time used -> 57 ms [PASSED] [Debug]: script debug : 0 110144878 : ==== [SORT INDEX] Testing for 1000 elements.[Debug]: script debug : 0 110144878 : [merge_sort_index] Time used -> 608 ms [PASSED][Debug]: script debug : 0 110144878 : [comb_sort_index] Time used -> 483 ms [PASSED]

 

So as you can see, a hacky Comb Sort is faster than a hacky merge sort in all cases.

If you want to cheat even more, you can use my hackyyyy counting sort (but it does not sort indexes).

 

Have fun: http://rathena.org/board/pastebin/2kto2abymrp/

Edited by KeyWorld

Share this post


Link to post
Share on other sites
  • 0

rathena support .@i++

but doesn't support ++.@i

so no, it doesn't support on rathena

.

.

The best for now is counting sort in Hercules.

If you want to cheat even more, you can use my hackyyyy counting sort (but it does not sort indexes).

that's the reason I overlook counting sort as an option

because in practical use there is always a parallel array

originally I also thought counting sort cannot output as the array index

but after looking at your sample ... now I have a 2nd opinion ...

 

your so-called hacking method is not new to me because you have already demonstrated 3 years ago in this shuffle algorithm topic

and I also able to reproduced it until this version

it was made with rathena script engine limitation in mind,

so now I have to update it again to make it compatible with hercules new scripting engine

.

.

in your sample, counting sort doesn't include sorting array index so I think I have to do it

I think it might be possible by faking a 2-dimensional array like

.@tmp[.@_arr[.@i] + .@neg]++;
into
setd ".@tmp_"+( .@arr[.@i] + .@neg )+"["+ getarraysize( getd( ".@tmp_"+( .@arr[.@i] + .@neg ) ) ) +"]", .@idx[.@i];
and this will still retain the complexity of O(n) Edited by AnnieRuru

Share this post


Link to post
Share on other sites
  • 0

So rathena sux :(

Only implement a buggy version of .@var++ sux a lot...

So let say I forget about rathena for this script now.

 

I know about getd() and setd(), the only problem with this command is the memory leak it create (specially in this case), so I'm not fan of it anymore :(

So here a new version, with the *counting_sort_index*, and also a big bonus : asm functions (for fun).

 

http://upaste.me/2d0f10561576aed88

 

[Debug]: script debug : 0 110198899 : ==== [SORT INDEX] Testing for 1000 elements.[Debug]: script debug : 0 110198899 : [merge_sort_index] Time used -> 610 ms [PASSED][Debug]: script debug : 0 110198899 : [merge_sort_index_asm] Time used -> 578 ms [PASSED][Debug]: script debug : 0 110198899 : [comb_sort_index] Time used -> 508 ms [PASSED][Debug]: script debug : 0 110198899 : [comb_sort_index_asm] Time used -> 476 ms [PASSED][Debug]: script debug : 0 110198899 : [counting_sort_index] Time used -> 94 ms [PASSED][Debug]: script debug : 0 110198899 : [counting_sort_index_asm] Time used -> 91 ms [PASSED][Debug]: script debug : 0 110198900 : ==== [SORT] Testing for 1000 elements.[Debug]: script debug : 0 110198900 : [merge_sort] Time used -> 407 ms [PASSED][Debug]: script debug : 0 110198900 : [merge_sort_asm] Time used -> 378 ms [PASSED][Debug]: script debug : 0 110198900 : [comb_sort] Time used -> 385 ms [PASSED][Debug]: script debug : 0 110198900 : [comb_sort_asm] Time used -> 343 ms [PASSED][Debug]: script debug : 0 110198900 : [counting_sort] Time used -> 57 ms [PASSED][Debug]: script debug : 0 110198900 : [counting_sort_asm] Time used -> 53 ms [PASSED]

Share this post


Link to post
Share on other sites
  • 0

since keyworld counting sort doesn't really work

-> it display all same index in the output array

I write my own counting sort

 

use upaste then

http://upaste.me/9aa421992effb80b1

 

now test how fast is it compare to merge sort, comb sort and counting sort

 

http://upaste.me/42e2219938d15d0ff

Haru's merge sort -> 499~531 mili-seconds

Keyworld's merge sort -> 453~468 mili-seconds

Haru's comb sort -> 436~453 mili-seconds

Keyworld's comb sort -> 390~437 mili-seconds

counting sort -> 109~125 mili-seconds

 

I'll stick to my counting sort from now on

function	script	counting_sort_index	{	.@total = .@size = getarg( 2, getarraysize( getarg(0) ) );	copyarray .@arr, getarg(0), .@size;	while ( .@i < .@size ) {		setd ".@index_"+ .@arr[.@i] +"["+( .@tmp[.@arr[.@i]] )+"]", .@i;		.@tmp[.@arr[.@i]]++;		.@i++;	}	do {		.@index = getarraysize(.@tmp) -1;		.@tmp[.@index]--;		.@out[.@size-1] = getd( ".@index_"+ .@index +"["+( .@tmp[.@index] )+"]" );		.@size--;	} while( .@size );	copyarray getarg(1), .@out, .@total;	return;}
addtionally, asm method doesn't run faster anymore, seems haru already upgrade our script engine Edited by AnnieRuru

Share this post


Link to post
Share on other sites
  • 0

wow ... after I learn about the prefix increment ... it really do run faster than post-fix increment

 

http://upaste.me/a70722381339369d1

 

postfix increment -> 109~125 mili-seconds ( same as above )

prefix increment -> 93~109 mili-seconds

 

now stick to this type

//	callfunc "counting_sort_index", <input array>, <sorted array> {, no. of elements };function	script	counting_sort_index	{	.@total = .@size = getarg( 2, getarraysize( getarg(0) ) );	copyarray .@arr, getarg(0), .@size;	while ( .@i < .@size )		setd ".@index_"+ .@arr[.@i] +"["+( .@tmp[.@arr[.@i]]++ )+"]", .@i++;	do {		.@index = getarraysize(.@tmp) -1;		.@out[--.@size] = getd( ".@index_"+ .@index +"["+( --.@tmp[.@index] )+"]" );	} while( .@size );	copyarray getarg(1), .@out, .@total;	return;}

Share this post


Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Guest
Answer this question...

×   Pasted as rich text.   Restore formatting

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

Loading...

×
×
  • Create New...

Important Information

By using this site, you agree to our Terms of Use.