bWolfie, posted August 16, 2018

Hello,
Recently my map server crashed. I'm hoping somebody can help me make sense of this gdb debug information.
First it reported an overflow in script, in script_reg_destroy at if( p->value )
Then it crashed with this log on 'bt full':
https://pastebin.com/q50NEiDD

4144, posted August 16, 2018

Are you using any plugins or mods? Which Hercules commit are you using? Judging by the stack, your source looks different from stock, or maybe a plugin is intercepting some variable changes.
The crash happens because sd is malformed. It looks like there was a null pointer somewhere else earlier, but the server did not crash at that point.

bWolfie, posted August 16, 2018

I'm using quite a few plugins. I've tried disabling them one by one, but it is difficult to find the cause.

4144, posted August 16, 2018

Also, if possible, try building the server with the sanitize flags enabled. I'm not sure how to install the missing packages on CentOS. You also need gcc 5 or newer.

bWolfie, posted August 16, 2018 (edited August 17, 2018 by Myriad)

3 hours ago, 4144 said: Also, if possible, try building the server with the sanitize flags enabled. I'm not sure how to install the missing packages on CentOS. You also need gcc 5 or newer.

Thanks for your responses. I am unable to use the enable-sanitize=full option. It tells me 'configure: error: zlib library not found or incompatible...', despite the fact that I have installed that dependency (version 1.2.7).
The crashes reference variables that aren't being used by any script (but were used in the past). And I never edited any src in pc or script, other than adding some script commands via plugin.
This part here is where pc_setglobalreg(sd, num, val); occurs:
name=0x7fffffffe0d0 "newbquest", value=0x1, ref=0x0) at script.c:3573
Are there any immediate quick-fix options, like clearing my char_reg_num/str_db? Or would that not make a difference?

4144, posted August 17, 2018

This configure error means some packages are not installed. Try installing these packages: libasan liblsan libubsan
From the crash stack, you have a non-latest or modded Hercules. I already asked about the commit: what Hercules commit id are you using? Without it, it is impossible to check what went wrong here.
This frame points at an empty line, and in a different function, which means the stack is totally wrong:
#5 0x00000000004580cf in chrif_parse (fd=14340) at chrif.c:1645

bWolfie, posted August 18, 2018

I wasn't able to enable sanitize after installing those packages. I cleared my char_reg_num_db and things have been okay for 36 hours now. I will post again if things become an issue.

bWolfie, posted August 19, 2018

I don't know what to do. It seems all sorts of things can cause a crash. Now pc_setregistry did it.
https://pastebin.com/Nc5e03gH
For reference, I am using src mods (Gepard Shield) and some plugins of my own (various edits).
The #BG_TIE variable is being set using:
pc_setglobalreg(sd, script->add_str("#BG_TIE"), pc_readglobalreg(sd, script->add_str("#BG_TIE")) + 1);
pc_setglobalreg(sd, reference_uid(script->add_str("#BG_TIE"), month), pc_readglobalreg(sd, reference_uid(script->add_str("#BG_TIE"), month)) + 1);

#1 pc.c/9816
p = ers_alloc(pc->num_reg_ers, struct script_reg_num);

#2 script.c/3573
[code:c]
case '\'':
	set_reg_instance_num(st, num, name, val);
	return 1;
default:
	if (ref) {
		script->set_reg_pc_ref_num(st, ref, num, name, val);
	} else {
		pc_setglobalreg(sd, num, val); //<<<< Here
	}
	return 1;
[/code]

#3 intif.c/1349
script->set_reg(NULL,sd,reference_uid(script->add_str(key), index), key, (const void *)h64BPTRSIZE(ival), NULL);

#4 intif.c/2892
case 0x3804: intif->pRegisters(fd); break;

#5 chrif.c/1645
if (cmd < 0x2af8 || cmd >= 0x2af8 + ARRAYLENGTH(chrif->packet_len_table) || chrif->packet_len_table[cmd-0x2af8] == 0) {
	int result = intif->parse(fd); // Passed on to the intif // <<<here

#6 socket.c/1418
sockt->session[i]->func_parse(i);

#7 core.c/557
sockt->perform(next);

4144, posted August 19, 2018

The issue is not in this call stack. Somewhere you had a null pointer issue that did not crash the server, and the server used a wrong sd pointer after that.
Try removing the plugins and Gepard and then try to crash the server, or use the sanitize flags to see the real issue.

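Editor's note: a minimal sketch of one way a "wrong sd pointer" can arise long before the crash that reveals it. Everything here is hypothetical and simplified (struct session, the fixed-size table, and all function names are invented for illustration and are not Hercules code). It assumes a deferred callback kept a raw pointer to a player session, the player logged out, and the same memory was then reused for someone else. Keeping the id and resolving it again when the callback fires avoids the problem.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SESSIONS 4

/* Hypothetical, simplified stand-in for a player session (not the Hercules struct). */
struct session {
	int id;        /* 0 means the slot is free */
	char name[16];
	int zeny;
};

static struct session sessions[MAX_SESSIONS];

/* Put a new player into the first free slot. Returns NULL if the table is full. */
struct session *session_login(int id, const char *name)
{
	for (int i = 0; i < MAX_SESSIONS; i++) {
		if (sessions[i].id == 0) {
			sessions[i].id = id;
			strncpy(sessions[i].name, name, sizeof sessions[i].name - 1);
			sessions[i].name[sizeof sessions[i].name - 1] = '\0';
			sessions[i].zeny = 0;
			return &sessions[i];
		}
	}
	return NULL;
}

/* Free the slot so that a later login can reuse it. */
void session_logout(struct session *s)
{
	assert(s != NULL);
	memset(s, 0, sizeof *s);
}

/* Resolve an id to a live session, or NULL if that player is not online. */
struct session *session_by_id(int id)
{
	if (id == 0)
		return NULL;
	for (int i = 0; i < MAX_SESSIONS; i++) {
		if (sessions[i].id == id)
			return &sessions[i];
	}
	return NULL;
}

/* Buggy pattern: the callback trusts a pointer captured when it was scheduled. */
void reward_by_pointer(struct session *cached)
{
	cached->zeny += 100;
}

/* Safer pattern: the callback keeps only the id and looks it up again.
 * Returns 1 if the reward was applied, 0 if the player is gone. */
int reward_by_id(int id)
{
	struct session *s = session_by_id(id);
	if (s == NULL)
		return 0;
	s->zeny += 100;
	return 1;
}
```

If a player logs in, a pointer to the session is cached, the player logs out, and a second player logs in, the cached pointer now refers to the second player's slot. Calling reward_by_pointer() with it silently modifies the wrong player, and nothing fails at that moment. In a real server the session is heap memory, so the stale access is undefined behaviour and may only surface much later in an unrelated stack, which is the kind of error the sanitize flags are meant to report at the point where it happens.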
bWolfie, posted August 21, 2018 (edited August 21, 2018 by Myriad)

I managed to use --enable-sanitize=full after installing the packages miniz and zopfli (not sure which one did it). Hopefully I will be able to debug this now.
Edit 1: Despite installing the same packages, I could only get sanitize to work on my production server for some reason, which makes testing hard.
I assume this has something to do with edits I made to pc.c, am I correct? Or could it be something calling pc->function?
=================================================================
==2032== ERROR: AddressSanitizer: heap-buffer-overflow on address 0x7f23a8ea442c at pc 0x730c95 bp 0x7fffb665a940 sp 0x7fffb665a930
READ of size 4 at 0x7f23a8ea442c thread T0
#0 0x730c94 (/home/user/Hercules/map-server+0x730c94)
#1 0x823e0b (/home/user/Hercules/map-server+0x823e0b)
#2 0x8cbb5a (/home/user/Hercules/map-server+0x8cbb5a)
#3 0x8d25e2 (/home/user/Hercules/map-server+0x8d25e2)
#4 0x715134 (/home/user/Hercules/map-server+0x715134)
#5 0xa63330 (/home/user/Hercules/map-server+0xa63330)
#6 0xa63458 (/home/user/Hercules/map-server+0xa63458)
#7 0x7281a7 (/home/user/Hercules/map-server+0x7281a7)
#8 0x729b78 (/home/user/Hercules/map-server+0x729b78)
#9 0x6d1fea (/home/user/Hercules/map-server+0x6d1fea)
#10 0x409ef1 (/home/user/Hercules/map-server+0x409ef1)
#11 0x7f23b015b444 (/usr/lib64/libc-2.17.so+0x22444)
#12 0x40a622 (/home/user/Hercules/map-server+0x40a622)
0x7f23a8ea442c is located 139791134507871 bytes to the right of global variable '<null>' (0x4d) of size 128
ASAN:SIGSEGV
==2032== AddressSanitizer: while reporting a bug found another one.Ignoring.

4144, posted August 21, 2018

gcc should also be at least version 5.0. Version 4.9 may partially work.

bWolfie, posted August 22, 2018

Yep, I updated to GCC 7.3 and still can't enable it. It works on one of my servers, but I can't use that one because it's live. On the other servers I tried, I just keep getting:
./configure --enable-sanitize=full
.
..
...
checking for library containing inflateEnd... no
configure: error: zlib library not found or incompatible... stopping

Asheraf, posted August 22, 2018

29 minutes ago, Myriad said: Yep I updated to GCC 7.3 and can't enable it. It is working in one of my servers but I can't use it cause it's live. The other servers I tried I just keep getting
./configure --enable-sanitize=full
.
..
...
checking for library containing inflateEnd... no
configure: error: zlib library not found or incompatible... stopping

Then install the zlib1g-dev library, since it clearly states it's missing.

bWolfie, posted August 22, 2018 (edited August 22, 2018 by Myriad)

1 hour ago, Asheraf said: Then install zlib1g-dev library since it clearly states it's missing

yum list installed
....
zlib.x86_64             1.2.7-17.el7    installed
zlib-debuginfo.x86_64   1.2.7-17.el7    @base-debuginfo
zlib-devel.x86_64       1.2.7-17.el7    @base
zopfli.x86_64           1.0.1-1.el7     @epel

I'm using CentOS 7, where that package is not available. I have installed zlib-devel.
Edit: I am now trying Debian, and that flag has worked.

4144, posted August 22, 2018

@Myriad can you show config.log after a failed configure run? That file contains the actual reason why it can't find zlib. It could be wrong flags, missing files, or anything else.

bWolfie, posted August 22, 2018

Here is my ./configure --enable-sanitize=full log. I think maybe it's a CentOS issue?
https://pastebin.com/5kUksZ7R

4144, posted August 22, 2018

This is the console output, but I need the config.log file.

bWolfie, posted August 23, 2018

I managed to compile on my live server. I got this crash info. Would you mind taking a look at it and seeing if you can decipher anything?
https://pastebin.com/M3YCHpkk

4144, posted August 23, 2018

It looks like it was compiled without debug info? Debug info is needed: use the configure flag --enable-debug. Or it is probably because you ran it under gdb at the same time.
Anyway, what is the code at pc.c:9909? The error looks to be on that line.

bWolfie, posted August 23, 2018

When compiling my server, I did:
make clean
./configure --enable-debug=gdb --disable-lto --enable-sanitize=full
make sql plugins

pc.c:9909 is in pc_eventtimer: npc->event(sd,p,0);

4144, posted August 23, 2018

Do you have a very old Hercules, or a heavily modified one? I have already asked several times which Hercules commit you are using. If you are not using git, at least give the date of your Hercules sources.
Or better, show the whole pc_eventtimer function.

bWolfie, posted August 23, 2018

Yup, it's quite modified. My Hercules is up to date, as in I have merged current master with my source.
Src: 5118dceb5c1c5cad7d0c06137a9b1eee2acbe4e8
Scripts: 20f045c8de9f5fb6dde5bb8c8da6306facf2517c

static int pc_eventtimer(int tid, int64 tick, int id, intptr_t data)
{
	struct map_session_data *sd=map->id2sd(id);
	char *p = (char *)data;
	int i;
	if(sd==NULL)
		return 0;

	ARR_FIND( 0, MAX_EVENTTIMER, i, sd->eventtimer[i] == tid );
	if( i < MAX_EVENTTIMER ) {
		sd->eventtimer[i] = INVALID_TIMER;
		sd->eventcount--;
		npc->event(sd,p,0); // pc.c: 9909 here
	} else
		ShowError("pc_eventtimer: no such event timer\n");

	if (p) aFree(p);
	return 0;
}

4144, posted August 23, 2018

Probably you showed the wrong function, or you are running a different server binary, or maybe there is some corruption. According to the trace, npc->event was called from pc_eventtimer while sd in pc_eventtimer was NULL, but because there is a NULL check here, it can't reach npc->event in that case.
Another thing: try disabling the memory manager, because it hides memory errors.
make clean
./configure --enable-debug=gdb --disable-lto --enable-manager=no --enable-sanitize=full
make sql plugins

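Editor's note: a minimal sketch of why a pooling allocator can hide memory errors from a sanitizer. It assumes the manager carves small allocations out of larger blocks it obtained up front, which is how pooled allocators generally work and is not a description of the actual Hercules memory manager. The names pool_alloc, pool_reset and overflow_corrupts_neighbour are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 64
#define CHUNK_SIZE 16

/* One large block owned by the hypothetical manager. The system allocator,
 * and therefore a sanitizer, only knows about this block as a whole. */
static unsigned char pool[POOL_SIZE];
static size_t pool_used;

/* Hand out the next CHUNK_SIZE-byte piece of the pool, or NULL when it is full. */
void *pool_alloc(void)
{
	if (pool_used + CHUNK_SIZE > POOL_SIZE)
		return NULL;
	void *p = &pool[pool_used];
	pool_used += CHUNK_SIZE;
	return p;
}

/* Return the pool to its initial, empty state. */
void pool_reset(void)
{
	pool_used = 0;
	memset(pool, 0, sizeof pool);
}

/* Write 4 bytes more than chunk a can hold.
 * Returns 1 if the neighbouring chunk b was modified by that write. */
int overflow_corrupts_neighbour(void)
{
	pool_reset();
	unsigned char *a = pool_alloc();
	unsigned char *b = pool_alloc();
	assert(a != NULL && b != NULL);

	/* A logical overflow of a. The write still lands inside pool[], so from
	 * the point of view of the language and the sanitizer it is in bounds. */
	memset(a, 0xFF, CHUNK_SIZE + 4);

	return b[0] == 0xFF;
}
```

Here the overflow of a silently changes b and no tool complains, because both chunks live inside one valid object. Had a and b each come from their own malloc(CHUNK_SIZE) call, AddressSanitizer would place redzones around them and report the same write as a heap-buffer-overflow at the faulting line. That is the motivation for combining --enable-manager=no with --enable-sanitize=full.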
bWolfie, posted August 23, 2018

Thank you. It seems there were errors in two buildins used in my scripts:
1. npcshopdelitem(), which was breaking every time it was used.
2. setmapflag(), which sometimes broke when setting 'mf_zone'.
I will update again if further issues persist.

bWolfie, posted August 26, 2018 (edited August 26, 2018 by Myriad)

I received this crash log. I don't know what caused it; some plugins were referenced:
https://pastebin.com/W2wXWDrg
I tried checking all of those plugins for null pointers. I disabled afk, unit, status, trade and itembonus, and then it happened again later, this time referencing the ones I didn't disable.

4144, posted August 26, 2018

It looks like you have disabled the sanitize flags and enabled the memory manager. Enable sanitize and disable the memory manager, and you will probably get a better crash report.
And please fix gcc: the last sanitize crash report was almost useless because of missing libs or packages. There was no correct stack and no additional info.