Versions Tested: 0.5.19c, 0.5.18d, 0.5.15d, 0.5.13b, ...
The game crashes unexpectedly with a SIGSEGV (segmentation fault) right before the end of battle. I first noticed this in the first game version that introduced battle animations (I don't remember which one). Because the crash is fairly rare, it took me a while to narrow down what causes it, and the shipped binary has no debugging symbols (understandably), which made it harder. I've managed to reverse engineer the executable enough to guess what the issue is, though, in the hope that this bug report is useful rather than an annoying "it doesn't work".
First, some facts:
- Battle animations option can be off or on -- doesn't matter.
- It essentially never happens in early parts of the game, because...
- I think I've only ever seen this with 3 or 4 party members, almost always 4.
- It is unrelated to movement of the mouse cursor.
- The duration of the battle has no effect.
- The speed at which you click the buttons during battle also has no effect.
- Finally, I tried slowing the game's execution down by 5-10x by having the kernel delay any syscall handling, and it ran without issue for an hour and a half before I gave up (happens after ~5 minutes while running normally).
Every single crash happens due to a memory-related issue while, or in relation to, processing the combat.gd script file. This will either be from a protection fault (due to a bad pointer value), or occasionally an instruction fault (trying to interpret garbage in memory as actual CPU code). Following are some excerpts from the debugger, heavily trimmed & massaged to fit this post.
DUMP 1: (from 0.5.19c)
Thread 1 "Strive.x86_64" received signal SIGSEGV, Segmentation fault.
0x0000000200000000 in ?? ()
=> 0x0000000200000000: Cannot access memory at address 0x200000000
(gdb) disas
Dump of assembler code for function _ZN7ClassDB12get_propertyEP6ObjectRK10StringNameR7Variant:
0x000000000155bad0 <+0>: push %r12
0x000000000155bad2 <+2>: mov %rsi,%r12
0x000000000155bad5 <+5>: push %rbp
0x000000000155bad6 <+6>: mov %rdi,%rbp
0x000000000155bad9 <+9>: push %rbx
0x000000000155bada <+10>: mov %rdx,%rbx
0x000000000155badd <+13>: sub $0x60,%rsp
0x000000000155bae1 <+17>: mov %fs:0x28,%rax
0x000000000155baea <+26>: mov %rax,0x58(%rsp)
0x000000000155baef <+31>: xor %eax,%eax
0x000000000155baf1 <+33>: mov 0x70(%rdi),%rax
0x000000000155baf5 <+37>: test %rax,%rax
0x000000000155baf8 <+40>: jne 0x155bb00 <ClassDB::get_property(Object*, StringName const&, Variant&)+48>
0x000000000155bafa <+42>: mov (%rdi),%rax
0x000000000155bafd <+45>: callq *0x30(%rax)
=> 0x000000000155bb00 <+48>: mov 0x85fb69(%rip),%rdi # 0x1dbb670 <ClassDB::classes>
(gdb) info registers
rax 0xb97d580 0xb97d580
rbx 0x7fffffffc310 0x7fffffffc310
rcx 0x7fffffffc590 0x7fffffffc590
rdx 0x7fffffffc310 0x7fffffffc310
rsi 0x9d0bf48 0x9d0bf48
rdi 0xad29480 0xad29480
rbp 0xad29480 0xad29480
rsp 0x7fffffffc270 0x7fffffffc270
r8 0x99d8510 0x99d8510
r9 0x9d0bf40 0x9d0bf40
r10 0x1dbb300 0x1dbb300
r11 0x11 0x11
r12 0x9d0bf48 0x9d0bf48
r13 0x9d0bf48 0x9d0bf48
r14 0x7fffffffc590 0x7fffffffc590
r15 0x9b0e090 0x9b0e090
rip 0x155bb00 0x155bb00 <ClassDB::get_property(Object*, StringName const&, Variant&)+48>
(gdb) x/8g $rsi
0x9d0bf48: 0x0000000002245410 0x0000000000000030
0x9d0bf58: 0x0000000000040021 0x0000000000040000
0x9d0bf68: 0x0000a1b300000001 0x000000650000000a
0x9d0bf78: 0x0000007400000078 0x0000006e00000065
(gdb) x/8g 0x2245410
0x2245410: 0x00007fff0000000b 0x0000000000000000
0x2245420: 0x0000000002243ce0 0x9aa1805700000057
0x2245430: 0x0000000009b11290 0x0000000001fe5060
0x2245440: 0x0000000000000000 0x0000000000000051
(gdb) x/8g 0x2243ce0
0x2243ce0: 0x0000006500000072 0x0000007400000063
0x2243cf0: 0x000000670000005f 0x0000006f0000006c
0x2243d00: 0x0000006100000062 0x0000005f0000006c
0x2243d10: 0x0000006f00000070 0x0000006900000073
(gdb) x/s 0x2243ce0
"rect_global_position"
This happens while processing the chunk of combat.gd that starts 'extends Node\n\nvar period = '. In this case, it's in 'ClassDB::get_property(Object*, StringName const&, Variant&) ()' trying to call the getter method for the child class, but it blows up during the call because the vtable has bad data. This suggests the object was very recently destroyed and that at least the child's destructor had already run.
I've also seen this during execution of 'Variant::evaluate(Variant::Operator const&, Variant const&, Variant const&, Variant&, bool&)+40325', implying a crash occurred while performing some operation on 'rect_global_position'. I think it was either an addition or a floor(), but I wasn't sure.
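As a cross-check on the property name in DUMP 1, here is a small Python sketch that decodes the eight 64-bit words gdb printed at 0x2243ce0. It assumes the string is stored as little-endian 32-bit code units (two characters per 64-bit word), which is what the dump looks like; the function name decode_words is just mine for illustration.

```python
import struct

def decode_words(words):
    """Decode 64-bit little-endian words as a UTF-32 string, up to the first NUL."""
    # Re-pack each word into the 8 bytes gdb read from memory.
    raw = b"".join(struct.pack("<Q", w) for w in words)
    # Assumption: each character occupies 4 bytes (UTF-32, little-endian).
    text = raw.decode("utf-32-le")
    return text.split("\x00", 1)[0]

# The eight words shown by "x/8g 0x2243ce0" in DUMP 1.
words = [
    0x0000006500000072, 0x0000007400000063,
    0x000000670000005F, 0x0000006F0000006C,
    0x0000006100000062, 0x0000005F0000006C,
    0x0000006F00000070, 0x0000006900000073,
]

print(decode_words(words))  # rect_global_posi
```

Eight words only cover the first 16 characters, so the output stops at "rect_global_posi"; the rest of "rect_global_position" sits in the following words, which aren't shown in the dump.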
Finally, I decided to run the game but have the heap's memory reset to known values after alloc & free:
DUMP 2: (from 0.5.19c)
$ valgrind --show-mismatched-frees=no --undef-value-errors=no --malloc-fill=0xbb --free-fill=0xee ./Strive.x86_64
OpenGL ES 3.0 Renderer: NVE7
Boot splash path: res://files/buttons/loading.png
==20622== Invalid read of size 8
==20622== at 0x15B5A47: Object::get(StringName const&, bool*) const [clone .constprop.10155] (in Strive.x86_64)
==20622== Address 0x3171c620 is 80 bytes inside a block of size 1,128 free'd
==20622== at 0x48369EB: free (in vgpreload_memcheck-amd64-linux.so)
==20622== by 0xAD4966: SceneTree::_flush_delete_queue() (in Strive.x86_64)
==20622== by 0x1FFEFFF3DF: ???
==20622== by 0xF84941F: ???
==20622== by 0x1FFEFFF3DF: ???
==20622== by 0xB162A7: SceneTree::idle(float) (in Strive.x86_64)
==20622== by 0x41EFFFFF00000007: ???
==20622== by 0x205105CF: ???
==20622== by 0x3E4AAAC100FFFFFF: ???
==20622== by 0x1FFEFFF307: ???
==20622== Block was alloc'd at
==20622== at 0x48357BF: malloc (in vgpreload_memcheck-amd64-linux.so)
==20622== by 0x158DBD8: Memory::alloc_static(unsigned long, bool) [clone .constprop.10209] (in Strive.x86_64)
==20622== by 0x1FFEFFE4AF: ???
==20622== by 0x1FFEFFE4A7: ???
==20622==
==20622== Invalid read of size 8
==20622== at 0x15B5A6D: Object::get(StringName const&, bool*) const [clone .constprop.10155] (in Strive.x86_64)
==20622== Address 0xeeeeeeeeeeeeeeee is not stack'd, malloc'd or (recently) free'd
==20622==
==20622== Process terminating with default action of signal 11 (SIGSEGV)
==20622== General Protection Fault
==20622== at 0x15B5A6D: Object::get(StringName const&, bool*) const [clone .constprop.10155] (in Strive.x86_64)
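The faulting address in DUMP 2 consists entirely of the 0xee byte I passed to --free-fill. A minimal Python sketch for checking whether any address from a crash log is just a fill pattern (the helper name is_fill_pattern is hypothetical, and it assumes 8-byte pointers):

```python
def is_fill_pattern(address: int, fill: int = 0xEE, width: int = 8) -> bool:
    """Return True if every byte of the pointer-sized value equals the fill byte."""
    # Byte order is irrelevant here because all bytes are compared to one value.
    return address.to_bytes(width, "little") == bytes([fill]) * width

# Address from the second "Invalid read" in DUMP 2.
print(is_fill_pattern(0xEEEEEEEEEEEEEEEE))             # True
# Address from the first "Invalid read": a real heap address, not a fill value.
print(is_fill_pattern(0x3171C620))                     # False
# The same check works for the --malloc-fill byte.
print(is_fill_pattern(0xBBBBBBBBBBBBBBBB, fill=0xBB))  # True
```

A pointer made of the free-fill byte was read out of a block that had already been freed, while one made of the malloc-fill byte would have been read out of a block that was allocated but never initialized.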
The only way it could be trying to read 0xeeeeeeeeeeeeeeee is by reading memory after it was passed to free(), i.e. a use-after-free. Together with the slowdown experiment above, that points to a race condition. "rect_global_position" seems to be a Godot property relating to where something is drawn on screen? Anyway, I don't really have any experience with how Godot works, but one single time I was extremely lucky and actually got a printed error on 0.5.18d:
DUMP 3: (from 0.5.18d)
$ ./index.x86_64
OpenGL ES 3.0 Renderer: NVE7
Boot splash path: res://files/buttons/loading.png
ERROR: emit_signal: Error calling method from signal 'meta_hover_ended': 'Panel(statspanel.gd)::_on_traittext_meta_hover_ended': Method expected 1 arguments, but called with 0.
At: core/object.cpp:1202.
Segmentation fault
I don't know if this means there's an API mismatch, or if it's a misleading error just saying 0 arguments because it found a NULL pointer or something...
Anyway, this is about all I can get without debug symbols (given I don't wish to spend a couple of solid weeks decompiling Godot's parsing engine). Sorry for not finding the actual needle in this haystack, but I'm hoping I got close enough that you might have an idea of why this is happening.