I should clarify the intent of my post is mostly just a heads-up that addressing combat isn't going to be the easiest thing, nothing more. That said, two thumbs up for putting together fixes for other various annoyances in the game. I quite like this game and feel it's definitely worth attempting to maintain!
ProofByContradiction
Recent community posts
The autocombat glitch may be partially my fault, depending on what exactly you're talking about. I spent a long time stepping through memory dumps and reverse engineering how the autocombat stuff works to finally find a race condition involving the animation handling -- ref: https://itch.io/t/300782/bug-linux-x86-64-multiple-versions-race-condition-cause...
The animation actually gets run with autocombat, but it's set to a small value. Autocombat can apply all the moves and end the battle, but since the coroutines are interdependent, on a fast machine one thread can terminate before the other is serviced; here, this meant the battle was ended and characters were removed from the arena prior to the animation completing.
My fix does allow the last move to always complete instantaneously without animation, but I wanted to change as little code as possible -- I didn't exactly have a lot to work with. I would advise extreme caution, because 'here be dragons'. Any complete fix that were to work 100% consistently is very likely going to be quite a bit more work than a check here or there: the combat loop code is kind of a mess (though it actually does approximately work) and needs to be fully decoupled from rendering. If it were chunked into an event dispatch model marshalled by the animation handler, any battle glitches would be solved.
I'm on Linux and the past 10-15 releases have all been perfectly fine for me. Currently I'm on 0.5.22b; are you?
When you say the game 'gets stuck on 41%', what does that mean, exactly? My game loads more or less instantly and I don't see any progress bars. Or perhaps my hard disk is faster than yours...
Here are my SHA1 hashes of files in question -- do these match for you?
88ef5ebfefa1444cb53f25b44a9fa4d7be15a8ef Strive.x86_64 5076e06a82402d44b45a50e93fc0aba3fbde4dcc files/interactionpanel.tscn bb7b00ce4f6fa3463bb469d45c215ddc2804c273 files/scripts/LoadScreen.gd
Hey, I just (hopefully) fixed a very similar issue that I was experiencing. Sounds like it to me, anyway. If you're still having issues, maybe give my fix a try and see if it works for you? Here's the relevant thread: https://itch.io/t/300782/bug-linux-x86-64-multiple-versions-race-condition-cause...
Here's a full .patch file to auto-apply the above changes to 0.5.19c code. Probably works fine for 0.5.19d as well if there aren't major changes around those parts of the file. Just run the usual
patch -p0 < whatever_you_saved_it_as.patch
in the game's root and you're good to go.
.patch file (sorry, don't have any better means of sharing this):
--- files/scripts/combat.gd 2018-09-11 13:29:25.000000000 -0700
+++ files/scripts/combat.gd 2018-09-26 00:22:53.290045306 -0700
@@ -905,15 +905,16 @@
if skill.target == 'all':
target = targetarray
self.combatlog += '\n' + combatantdictionary(caster, target, text)
- emit_signal("skillplayed")
endcombatcheck()
+
if period == 'win':
playerwin() if period == 'skilluse':
period = 'base'
+ emit_signal("skillplayed")
func scripteffect(caster,target,script):
globals.abilities.call(script, caster, target)
@@ -1218,16 +1219,16 @@
signal defeatfinished
var ongoinganimation = false
func damagein():
- emit_signal("damagetrigger")
- ongoinganimation = false
yield(get_tree(), 'idle_frame')
+ ongoinganimation = false
+ emit_signal("damagetrigger")
func tweenfinished():
yield(get_tree(), 'idle_frame')
- emit_signal("tweenfinished")
ongoinganimation = false
+ emit_signal("tweenfinished")
func defeatfinished():
emit_signal("defeatfinished")
@@ -1244,18 +1245,21 @@
for i in timings:
timings[i] = 0.05
timings.delay2 = 0
+ timings.delaydamage = 0.0
+
globals.main.sound('attack')
ongoinganimation = true
if combatant.group == 'enemy':
change = -change
tween.interpolate_property(node, "rect_position", pos, Vector2(pos.x, pos.y-change), timings.speed1, Tween.TRANS_ELASTIC, Tween.EASE_IN_OUT)
- tween.interpolate_callback(self, timings.delaydamage, 'damagein')
+ tween.interpolate_deferred_callback(self, timings.delaydamage, 'damagein')
+
tween.interpolate_property(node, "rect_position", Vector2(pos.x, pos.y-change), pos, timings.speed2, Tween.TRANS_LINEAR, Tween.EASE_IN_OUT, timings.delay2)
- tween.interpolate_callback(self, timings.delayfinish, 'tweenfinished')
+ tween.interpolate_deferred_callback(self, timings.delayfinish, 'tweenfinished')
tween.start()
func slamanimation(combatant):
var tween = $Tween
Okay, I think there are actually two distinct but related problems -- both race conditions (!) -- that are compounding and resulting in "random" crashes during battle. The good news is I *think* I've been able to solve the problem. After my changes to combat.gd, I haven't seen a single crash. That said, I haven't had nearly as much time to test as I had before.
I'll apologize in advance for any inaccuracies in what follows I had no prior experience with godot script (I work in assembly or C/C++ usually), and the combat script is somewhat spaghettified, but I think I understand it to a moderate degree now.
Rather than processing each side fully before switching to process the other, the relevant handling is broken into a couple coroutines, which can yield until execution jumps back after some engine event occurs. The player's updates happen in the useskills() function, and the enemy's happen in enemyturn(). But there's a dependency here, because enemyturn() itself calls useskills(), possibly multiple times. This could be okay, but for this to be reliable both functions would have to be waiting on the same event. Unfortunately, there are 3-4 events involved within these functions. Since the stats logic is so deeply nested, the same pattern of yields isn't guaranteed to be consistent. Finally, on top of that, one or two of these events gets fired off in response to input or UI updates from their respective threads (full, proper threads, not coroutines). The event notification will signal the coroutine and execute other remaining handler code *while the other coroutine segment is running*. This is the race condition.
This would particularly be an issue for containers that happen to be resized ever, though all global variables would be fine. This would explain basically all the crashes I had earlier. Godot docs suggest explicit mutexes, but I instead chose to ensure better guarding around the problem areas and not let anything run until fully ready.
Fixes 1 and 2 below fix my crash.
I found a second race condition occurring at the beginning of the turn -- first time I've ever seen it not happen at the end of battle. This one's actually a race condition in the true sense -- explicit timers for damage updates are involved. Fix 3 below resolves this problem too.
This code is from 0.5.19c. I've lightly massaged this file so cited line numbers aren't exact. I'll add a .patch file with exact numbers.
combat.gd Fix 1. Guard checks & period update before jumping coroutine locations. Around line 908, turn this
self.combatlog += '\n' + combatantdictionary(caster, target, text)
emit_signal("skillplayed")
endcombatcheck()
if period == 'win':
playerwin()
if period == 'skilluse':
period = 'base'
into:
self.combatlog += '\n' + combatantdictionary(caster, target, text)
endcombatcheck()
if period == 'win':
playerwin()
if period == 'skilluse':
period = 'base'
emit_signal("skillplayed")
combat.gd Fix 2. Same reasoning/fix applied but for more-problematic thread notifications Around line 1230, change
func damagein():
emit_signal("damagetrigger")
ongoinganimation = false
yield(get_tree(), 'idle_frame') func tweenfinished():
yield(get_tree(), 'idle_frame')
emit_signal("tweenfinished")
ongoinganimation = false
to:
func damagein():
yield(get_tree(), 'idle_frame')
ongoinganimation = false
emit_signal("damagetrigger") func tweenfinished():
yield(get_tree(), 'idle_frame')
ongoinganimation = false
emit_signal("tweenfinished")
combat.gd Fix 3. Ensure delaydamage always happens immediately, still drawing slower if requested. Perform a full yield to the top modelview handler before actually performing the callback, completing the current frame first. Around line 1264.
if instantanimation == true:
for i in timings:
timings[i] = 0.05
timings.delay2 = 0 globals.main.sound('attack') ongoinganimation = true
if combatant.group == 'enemy':
change = -change
tween.interpolate_property(node, "rect_position", pos, Vector2(pos.x, pos.y-change), timings.speed1, Tween.TRANS_ELASTIC, Tween.EASE_IN_OUT)
tween.interpolate_callback(self, timings.delaydamage, 'damagein')
tween.interpolate_property(node, "rect_position", Vector2(pos.x, pos.y-change), pos, timings.speed2, Tween.TRANS_LINEAR, Tween.EASE_IN_OUT, timings.delay2)
tween.interpolate_callback(self, timings.delayfinish, 'tweenfinished')
tween.start()
becomes:
if instantanimation == true:
for i in timings:
timings[i] = 0.05
timings.delay2 = 0 timings.delaydamage = 0.0
globals.main.sound('attack') ongoinganimation = true
if combatant.group == 'enemy':
change = -change
tween.interpolate_property(node, "rect_position", pos, Vector2(pos.x, pos.y-change), timings.speed1, Tween.TRANS_ELASTIC, Tween.EASE_IN_OUT)
tween.interpolate_deferred_callback(self, timings.delaydamage, 'damagein') tween.interpolate_property(node, "rect_position", Vector2(pos.x, pos.y-change), pos, timings.speed2, Tween.TRANS_LINEAR, Tween.EASE_IN_OUT, timings.delay2)
tween.interpolate_deferred_callback(self, timings.delayfinish, 'tweenfinished')
tween.start()
I didn't go through and check I got each possible case. I haven't seen any problems after doing this, though. I definitely ask those interested to help test. I also welcome any comments from the author, whom I'd expect would know this code better than anyone.
Cheers.
The issue happens in the script processing thread while drawing the panel UI, so this problem isn't Linux-specific and would be present in the Windows build too. It's very similar to this issue: https://itch.io/t/299710/bug-game-crashing-ayneris-sword-double-atk since the crash always happens at the very end of a battle. However, it happens regardless of weapon, even if nobody has a weapon that has a chance for a second attack.
I would guess the issue from the other thread is probably the same issue, and it's actually a UI drawing issue (i.e., trying to reference parts of the panel after it's been destroyed).
Versions Tested: 0.5.19c, 0.5.18d, 0.5.15d, 0.5.13b, ...
The game crashes unexpectedly with a SIGSEGV segmentation fault right before the end of battle. I first noticed this issue in the first game version where battle animations were introduced (don't remember which version). Unfortunately, as this crash seems to happen pretty rarely, it's taken me some time to try to determine what things cause the issue, which was quite difficult as the shipped binary doesn't have any debugging symbols (understandably). I've managed to reverse engineer the executable enough to guess what the issue is though, solely in hopes of this bug report being useful and not an annoying "it doesn't work".
First, some facts:
- Battle animations option can be off or on -- doesn't matter.
- It essentially never happens in early parts of the game, because...
- I think I've only ever seen this with 3 or 4 party members, almost always 4.
- It is unrelated to movement of the mouse cursor.
- The duration of the battle plays no effect.
- The speed at which you click the buttons during battle also has no effect.
- Finally, I tried slowing the game's execution down by 5-10x by having the kernel delay any syscall handling, and it ran without issue for an hour and a half before I gave up (happens after ~5 minutes while running normally).
Every single crash happens due to a memory-related issue while, or in relation to, processing the combat.gd script file. This will either be from a protection fault (due to a bad pointer value), or occasionally an instruction fault (trying to interpret garbage in memory as actual CPU code). Following are some excerpts from the debugger, heavily trimmed & massaged to fit this post.
DUMP 1: (from 0.5.19c)
Thread 1 "Strive.x86_64" received signal SIGSEGV, Segmentation fault.
0x0000000200000000 in ?? ()
=> 0x0000000200000000: Cannot access memory at address 0x200000000
(gdb) disas
Dump of assembler code for function _ZN7ClassDB12get_propertyEP6ObjectRK10StringNameR7Variant:
0x000000000155bad0 <+0>: push %r12
0x000000000155bad2 <+2>: mov %rsi,%r12
0x000000000155bad5 <+5>: push %rbp
0x000000000155bad6 <+6>: mov %rdi,%rbp
0x000000000155bad9 <+9>: push %rbx
0x000000000155bada <+10>: mov %rdx,%rbx
0x000000000155badd <+13>: sub $0x60,%rsp
0x000000000155bae1 <+17>: mov %fs:0x28,%rax
0x000000000155baea <+26>: mov %rax,0x58(%rsp)
0x000000000155baef <+31>: xor %eax,%eax
0x000000000155baf1 <+33>: mov 0x70(%rdi),%rax
0x000000000155baf5 <+37>: test %rax,%rax
0x000000000155baf8 <+40>: jne 0x155bb00 <ClassDB::get_property(Object*, StringName const&, Variant&)+48>
0x000000000155bafa <+42>: mov (%rdi),%rax
0x000000000155bafd <+45>: callq *0x30(%rax)
=> 0x000000000155bb00 <+48>: mov 0x85fb69(%rip),%rdi # 0x1dbb670 <ClassDB::classes>
(gdb) info registers
rax 0xb97d580 0xb97d580
rbx 0x7fffffffc310 0x7fffffffc310
rcx 0x7fffffffc590 0x7fffffffc590
rdx 0x7fffffffc310 0x7fffffffc310
rsi 0x9d0bf48 0x9d0bf48
rdi 0xad29480 0xad29480
rbp 0xad29480 0xad29480
rsp 0x7fffffffc270 0x7fffffffc270
r8 0x99d8510 0x99d8510
r9 0x9d0bf40 0x9d0bf40
r10 0x1dbb300 0x1dbb300
r11 0x11 0x11
r12 0x9d0bf48 0x9d0bf48
r13 0x9d0bf48 0x9d0bf48
r14 0x7fffffffc590 0x7fffffffc590
r15 0x9b0e090 0x9b0e090
rip 0x155bb00 0x155bb00 <ClassDB::get_property(Object*, StringName const&, Variant&)+48>
(gdb) x/8g $rsi
0x9d0bf48: 0x0000000002245410 0x0000000000000030
0x9d0bf58: 0x0000000000040021 0x0000000000040000
0x9d0bf68: 0x0000a1b300000001 0x000000650000000a
0x9d0bf78: 0x0000007400000078 0x0000006e00000065
(gdb) x/8g 0x2245410
0x2245410: 0x00007fff0000000b 0x0000000000000000
0x2245420: 0x0000000002243ce0 0x9aa1805700000057
0x2245430: 0x0000000009b11290 0x0000000001fe5060
0x2245440: 0x0000000000000000 0x0000000000000051
(gdb) x/8g 0x2243ce0
0x2243ce0: 0x0000006500000072 0x0000007400000063
0x2243cf0: 0x000000670000005f 0x0000006f0000006c
0x2243d00: 0x0000006100000062 0x0000005f0000006c
0x2243d10: 0x0000006f00000070 0x0000006900000073
(gdb) x/s 0x2243ce0
"rect_global_position"
This happens while processing the chunk of combat.gd that starts 'extends Node\n\nvar period = '. In this case, it's in 'ClassDB::get_property(Object*, StringName const&, Variant&) ()' trying to call the getter method for the child class, but it blows up during the call because the vtable has bad data. This suggests the object was very recently destroyed and that at least the child's destructor had already run.
I've also seen this during execution of 'Variant::evaluate(Variant::Operator const&, Variant const&, Variant const&, Variant&, bool&)+40325', implying a crash occurred while performing some operation on 'rect_global_position'. I think it was either an addition or a floor(), but I wasn't sure.
Finally, I decided to run the game but have the heap's memory reset to known values after alloc & free:
DUMP 2: (from 0.5.19c)
$ valgrind --show-mismatched-frees=no --undef-value-errors=no --malloc-fill=0xbb --free-fill=0xee ./Strive.x86_64
OpenGL ES 3.0 Renderer: NVE7
Boot splash path: res://files/buttons/loading.png
==20622== Invalid read of size 8
==20622== at 0x15B5A47: Object::get(StringName const&, bool*) const [clone .constprop.10155] (in Strive.x86_64)
==20622== Address 0x3171c620 is 80 bytes inside a block of size 1,128 free'd
==20622== at 0x48369EB: free (in vgpreload_memcheck-amd64-linux.so)
==20622== by 0xAD4966: SceneTree::_flush_delete_queue() (in Strive.x86_64)
==20622== by 0x1FFEFFF3DF: ???
==20622== by 0xF84941F: ???
==20622== by 0x1FFEFFF3DF: ???
==20622== by 0xB162A7: SceneTree::idle(float) (in Strive.x86_64)
==20622== by 0x41EFFFFF00000007: ???
==20622== by 0x205105CF: ???
==20622== by 0x3E4AAAC100FFFFFF: ???
==20622== by 0x1FFEFFF307: ???
==20622== Block was alloc'd at
==20622== at 0x48357BF: malloc (in vgpreload_memcheck-amd64-linux.so)
==20622== by 0x158DBD8: Memory::alloc_static(unsigned long, bool) [clone .constprop.10209] (in Strive.x86_64)
==20622== by 0x1FFEFFE4AF: ???
==20622== by 0x1FFEFFE4A7: ???
==20622==
==20622== Invalid read of size 8
==20622== at 0x15B5A6D: Object::get(StringName const&, bool*) const [clone .constprop.10155] (in Strive.x86_64)
==20622== Address 0xeeeeeeeeeeeeeeee is not stack'd, malloc'd or (recently) free'd
==20622==
==20622== Process terminating with default action of signal 11 (SIGSEGV)
==20622== General Protection Fault
==20622== at 0x15B5A6D: Object::get(StringName const&, bool*) const [clone .constprop.10155] (in Strive.x86_64)
The only way that it could be trying to read 0xeeeeeeeeeeeeeeee is by reading after a free(), indicating a race condition. The "rect_global_position" seemed to be godot magic relating to positioning of how something is being drawn? Anyway, I don't really have any experience with how godot works, but one single time I was extremely lucky and actually got a printed error on 0.5.18d:
DUMP3: (from 0.5.18d) $ ./index.x86_64
OpenGL ES 3.0 Renderer: NVE7
Boot splash path: res://files/buttons/loading.png
ERROR: emit_signal: Error calling method from signal 'meta_hover_ended': 'Panel(statspanel.gd)::_on_traittext_meta_hover_ended': Method expected 1 arguments, but called with 0.
At: core/object.cpp:1202.
Segmentation fault
I don't know if this means there's an API mismatch, or if it's a misleading error just saying 0 arguments because it found a NULL pointer or something...
Anyway, this is about all I can get without debug symbols (given I don't wish to spend a couple solid weeks decompiling godot's parsing engine. Sorry for not finding the actual needle in this haystack, but I'm hoping I got close enough that you might have an idea of why this is happening.