Thank you, and interesting suggestion. I suppose I could provide program access to speech recognition functions to openly support alternative engines. I'll give this some consideration for the future.
foxster
All commands are working, but you may not be meeting the criteria. Try "Tower request visual approach" for example. You are free to make adjustments as desired to change the phrases as well. It's designed to be very easy to customize commands to the way you like them, but be sure to watch the basic tutorial first. Play around with an empty library file from scratch to quickly get a feel for it, then make alterations once it's comfortable.
Sorry, but no. The problem is that the speech recognition is a major part of this app and is only freely provided by the Microsoft Windows platform. There are other speech platforms but they typically require online processing and charge fees, which I won't support. It is built on the .NET Core platform for portability, but its reliance on these MS components limits it. There are some open source SR engine initiatives out there so perhaps in the future, but don't hold your breath - it might be a while.
edit: There is the Kaldi SR engine which I can possibly work with in the future found here: https://github.com/kaldi-asr/kaldi
Sorry for the delay - I've been away for a bit.
I suppose then every Voice key command string is just a single phrase you say all at once.
Yes - this is correct. This is intrinsic to how speech recognition works. It isn't a word by word operation as phonetics are quite different than written speech.
I was hoping that the software, just by the design interface, would be able to single out any one of the commands and just execute that.
If it worked this way as you suggest, there would be no way to add flexibility to a single voice command with alternate words or phrases, and no way to distinguish which outputs to produce.
To do what you want, if I understand it, you do indeed need to have them as independent commands since both the input (spoken phrase) and the output for each is different for each one. This is exactly what they were designed for though, so it should be fine.
Are the outputs really supposed to be the same for Scan 1, 2 & 3? If not, then they have to be on separate commands. If so, then I think maybe this is what you're after? This allows for any of these phrases to work:
"Element Scan Peek"
"Element Scan Slide"
"Element Scan Pie"
I'm sorry if I'm misunderstanding you...
Yes, it does, but you need to understand the rules. First, key order doesn't matter - it's just for your convenience. Second, each key must be fulfilled by a word or phrase on the command in order for it to execute. BUT...you can put the same word on multiple keys and that doesn't mean you need to say it twice.
Example:
If you want to be able to say three things, "Element Scan Slide", "Element Scan Pie", "Element Scan Peek":
Create Voice group "Element"
Create Voice Command with two keys
Key #1: "Scan"
Key #2: "Slide", "Pie", "Peek"
This will require saying "Element" to get on the voice group, "Scan" to fulfill Key #1, and either "Slide", "Pie", or "Peek" to fulfill Key #2. If you add "Scan" to Key #2 also, then you can say "Element Scan ...(whatever else)" and it will be fulfilled as the word "Scan" fulfills both Key #1 and Key #2.
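The key rules above can be sketched in a few lines of Python. This is only an illustration of the behavior described, not FoxVox's actual code: the function name `command_matches` and the data layout (a list of keys, each holding a list of alternative single words) are assumptions made for the example, and the voice group word "Element" is assumed to have been handled already.

```python
def command_matches(spoken_phrase, keys):
    """Return True when every key is fulfilled by at least one of its words.

    keys is a list of keys; each key is a list of alternative words.
    Key order is irrelevant, and one spoken word may fulfill several keys.
    """
    spoken = spoken_phrase.lower().split()
    return all(
        any(word.lower() in spoken for word in key)
        for key in keys
    )


# Hypothetical command inside the "Element" voice group.
keys = [["Scan"], ["Slide", "Pie", "Peek"]]
print(command_matches("Element Scan Pie", keys))  # True
print(command_matches("Element Scan", keys))      # False: Key #2 is unfulfilled

# Adding "Scan" to Key #2 lets the single word "Scan" fulfill both keys.
keys_relaxed = [["Scan"], ["Slide", "Pie", "Peek", "Scan"]]
print(command_matches("Element Scan", keys_relaxed))  # True
```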
If this still doesn't help you, let me know the exact phrases you want to say to execute the command and I'll try to provide a solution.
Yes, it does. You must group it correctly on the command depending on how you want it to work - as an alternative or additional phrase. This is explained best in the basic tutorial video: https://youtu.be/0Aqbf5IWd1w
I'll look into it to see if there is a timing issue or something going on.
One thing to consider is it may be sending the keypresses too quickly causing one or two not to register. You can test that by setting a value for Key Delay under the library settings (cog wheel icon to the right of the library name in the upper left). Set it to a value of something like 10 or 20 ms and see if that helps.
It can/does handle long phrases but there are some things to watch out for, particularly with a phrase as long as the one you show. First, watch out for pauses that trigger a stop in the speech processing (particularly at 'Good day'). Speech processing handles one phrase at a time, so if it is split into 2, then each one processes independently. This leads into the second issue - FoxVox works by a command word followed by special key words. In the default library the phrase must start with "Ground" to activate the Ground command group, then include the word "Taxi" (and not include "back" or "ramp"). From your example, you would have to add "Gunsan" to the ground command group to be successful or make "Ground" a wildcard by adding an asterisk "*" behind it (meaning it doesn't have to be the first word spoken and can come anywhere in the phrase). Making it a wildcard would have negative repercussions however, as "Ground" is used in other key phrases for wingman, e.g. "Two, datalink ground target" or "Two, weapons free air to ground". These commands wouldn't be executed properly because the ATC Ground command group would be activated instead of the Wingman group due to the wildcard and its earlier position in the list. I may do a revision to prioritize keywords found at the beginning of the phrase over wildcards, but currently that's not how it works.
So from this, you need to be sure that the processed phrase is found. I experimented with your wording a bit and found that adding both "Taxi*" and "Departure*" as wildcards to the ATC Ground command group seemed to work best. I also added "Departure" as an optional word on the Request Taxi command and as a blocked word to the Request Taxi Back to Ramp command to help prevent an accidental misfire. Doing this seemed to give the best results regardless of pauses being present or not. I just had to focus on making sure "Taxi" and/or "Departure" was understood properly. Also, remember when testing and troubleshooting, be sure to enable the enhanced log to see what is being tracked as you speak. I also think I can possibly make some enhancements with really long phrases that might help in the underlying engine...I'll experiment more for the next update.
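To illustrate why a wildcard on "Ground" steals wingman phrases, here is a rough Python sketch of the group selection described above. It is a simplified model, not the real engine: `select_group`, the ordered list of (name, keywords) pairs, and the "Two" keyword for the Wingman group are assumptions for the example.

```python
def select_group(spoken_phrase, groups):
    """Return the name of the first group activated by the phrase, or None.

    groups is an ordered list of (name, keywords). A plain keyword must be
    the first spoken word; a keyword ending in "*" may appear anywhere.
    """
    words = spoken_phrase.lower().replace(",", " ").split()
    if not words:
        return None
    for name, keywords in groups:
        for keyword in keywords:
            if keyword.endswith("*"):
                if keyword[:-1].lower() in words:
                    return name
            elif keyword.lower() == words[0]:
                return name
    return None


groups = [("ATC Ground", ["Ground"]), ("Wingman", ["Two"])]
print(select_group("Two, weapons free air to ground", groups))  # Wingman

# With "Ground*" as a wildcard, the earlier group captures the wingman phrase.
groups_wild = [("ATC Ground", ["Ground*"]), ("Wingman", ["Two"])]
print(select_group("Two, weapons free air to ground", groups_wild))  # ATC Ground
```

In this model, wildcards such as "Taxi*" or "Departure*" are safer choices because those words are less likely to appear in phrases meant for other groups.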
I need to see about opening a channel on discord - good suggestion. There aren't many people using FoxVox currently, but who knows, there might be more in the future.
Just to confirm, you can switch between voice and key configurations at the upper left (just below the library name) and you must enable key recognition as shown here:
Just so you are aware, voice commands can only be triggered by voice, whereas key commands trigger not only on key input but also on conditions, which is why the VCC commands reside there. If you have enabled key commands and everything is triggering but you still don't hear anything, check whether Windows Media Player is installed and not disabled. The sounds will also play if an alternative player has been set up in Windows, but you may experience issues there if they can't open through the Windows shell command.
Edit: Also make sure the VCC files which contain the audio files are present in the subfolder with the library...sometimes it's the little things that can get overlooked.
Hi there, glad to see your desire to be creative with the app. I'm not sure how well it will function for how you intend to use it, but certainly give it a try. As for the error you report, this is a strange one. The app is basically saying it is failing to find any qualified speech recognition modules on your system. What language is your Windows installed with? Can you check if there are any sub folders found in the following location:
Windows install path (default C:)\Windows\Speech\Engines\SR
For default English US for example there will be a folder called 'en-US'.
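If you prefer to check from a script, a small Python sketch like the following lists the sub folders in that location. The helper name `installed_sr_languages` is made up for this example, and it assumes the Windows folder is given by the `SystemRoot` environment variable (falling back to C:\Windows).

```python
import os
from pathlib import Path


def installed_sr_languages(windows_dir=None):
    """List sub folder names under the Windows Speech/Engines/SR directory."""
    root = Path(windows_dir or os.environ.get("SystemRoot", r"C:\Windows"))
    sr_dir = root / "Speech" / "Engines" / "SR"
    if not sr_dir.is_dir():
        return []  # location missing: no recognition engines found there
    return sorted(p.name for p in sr_dir.iterdir() if p.is_dir())


# An empty list means no language folders were found in that location.
print(installed_sr_languages())
```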
If you are finding at least one folder there, the issue may be due to a permissions error in the MS System.Speech module accessing the registry. In that case, installing the app with the installer and possibly running it as an administrator could potentially solve the issue. If there aren't any, then you don't have any recognition engines installed and you'll need to install one here:
Select the Start button, then select Settings > Time & Language > Speech
Please be aware:
- Speech Recognition is available only for the following languages: English (United States, United Kingdom, Canada, India, and Australia), French, German, Japanese, Mandarin (Chinese Simplified and Chinese Traditional), and Spanish.
There are new languages supported using the MS Speech Platform 11 but FoxVox does not have those implemented currently.
I will add in some additional error handling to detect this issue more gracefully, but see if you can investigate what I suggested to try and solve it.
This sounds like an issue with output overload which happens when every tiny movement of the analog stick causes the output to fire, which can quickly overwhelm the system. If you remove/disable the slider binding does everything else work OK? It would be good to narrow down the culprit. Also, just to check, does the system bog down or have any problem when you move the axis while the Key Recognition is disabled (keyboard button second to last in the toolbar)?
Welcome back to BMS...glad you like the app! One thing that comes to mind is make sure that in the settings (little cog wheel icon by library name) you allow any process (by not having any assigned) or include it in the allowed process list. I'm not sure this is the issue though, as you say the built in ones are working. Make sure the command window shows that the voice command is being activated/recognized, and make sure the outputs are defined properly. If you can detail a specific question, I'll be glad to help either here or in the BMS forum.
Regards, Foxster
Thanks Mistral. I must confess, these have been resolved already and are part of the next release, but it's been slow in coming due to the intricate and complex addition of supporting variables on inputs. This is part of the next phase to prepare for full dynamic integration with BMS (and potentially other apps). In the future, hopefully by BMS v4.38, FoxVox will automatically utilize callsigns, detect which seats are human vs AI, automatically set PTT to match UHF and VHF assigned buttons, and use direct callbacks rather than menu assignments for all comms. It will also auto-bind brakes, flight controls, and buttons used by the Virtual Crew Chief which are set in BMS. There's a lot of work going into this and all of it will make FoxVox more versatile in the end. In the meantime, thank you for pointing out these issues - please continue to do so, and thanks for your patience. Look for the next update releasing soon!
I'm looking into this further, but for now make sure that Windows speech recognition is set to the same language selected in FoxVox, and that the Windows UI is set to that language as well.
I will be fixing/improving this area soon as others have reported the same issue and it really shouldn't be as quirky as it is.
Well, I think the cross section of users is pretty small, so you alone probably make up a fair portion. It's not hard to add and won't compromise existing functionality, so I suppose I'll do it. I'm sure you won't be the only one with the issue. Glad you found and fixed it. If you like the app, be sure to pick up the next version coming out soon.
Good to know - thanks for sharing your issue here. When you play a sound from an output command, you have the option of disabling the internal (wmp) player and having it play through the associated shell extension, which in your case would launch VLC and play it that way. The downside is it needs to launch the app through the Windows shell (meaning the file extension must be associated in Windows with VLC). Unfortunately this is currently only available on sounds played through an output. Based on your feedback I'm considering adding a global setting that will disable wmp and play all sounds through the Windows shell. I could also have it detect and set it automatically if wmp is not installed on the system. This might fix your situation...any thoughts?
Yes wav files are fine, as are other formats. What you're doing should work fine. If you're using headphones be sure other computer sounds aren't muted and are coming through...just to be sure it's not something simple overlooked. I've never had anyone else report this so far and it works fine for me. Very unusual.
Ok, I've posted an update to attempt to fix the issue. Unfortunately I don't have any hardware to test, so I'm relying on you to let me know if it resolves the issue. Also, I've added in a feature with the Push-to-Talk settings that let you specify 'Any Key' to allow any of the bound keys to activate PTT rather than all. Let me know how it goes!
Make sure that the button isn't registered in the block list (found in settings). Otherwise, FoxVox won't pick it up when it is pressed. Some other settings in the PTT could also affect it such as the 'Isolate' option which limits PTT to not work if any additional keys are pressed besides the bound PTT Keys.
Besides this, what you describe doesn't seem to be a FoxVox issue. As you said, it picked it up when you assigned it to the PTT. Make sure the hardware/drivers are passing through the button accurately and consistently on press each time - FoxVox doesn't do anything with the input besides monitoring the button states passed in by the OS. Verify it's not being affected by any controller macro logic or third party apps.
Hmmm. Are you binding both buttons to the same command in the PTT? If so, then both buttons would be required to be pressed together - at least how the program works currently. If you try using just one (either UHF or VHF) as a test, does it work consistently? If so I may need to add support on the PTT to allow either an AND or an OR condition for multiple buttons. In its current state ALL defined PTT buttons must be engaged together at the same time - it's not an either/or.
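The difference between the two conditions can be shown with a tiny Python sketch. It is only a model of the logic described, with a hypothetical function `ptt_engaged` and made-up button names; it is not how FoxVox reads the hardware.

```python
def ptt_engaged(bound_buttons, pressed_buttons, any_key=False):
    """Decide whether push-to-talk is active.

    any_key=False: every bound button must be held (the AND behavior).
    any_key=True: one held bound button is enough (an OR behavior).
    """
    if not bound_buttons:
        return False
    held = [button in pressed_buttons for button in bound_buttons]
    return any(held) if any_key else all(held)


bound = ["UHF", "VHF"]  # hypothetical button names
print(ptt_engaged(bound, {"UHF"}))                # False: both are required
print(ptt_engaged(bound, {"UHF", "VHF"}))         # True
print(ptt_engaged(bound, {"UHF"}, any_key=True))  # True: either one works
```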
Aside from this, you could test if there's a joystick detection problem by testing with a keyboard key instead and seeing if it works consistently. You should see the PTT engage and disengage by the icon turning green whenever the button is pressed. Try testing with some simple scenarios and we'll go from there.
Let me know what you find.
Hi @Wreckluse,
First, posting questions and comments here is perfectly fine, and I'll do what I can to answer.
With regards to the delay, this probably has to do with the microphone/sound detection changing between VR and non-VR/normal operation. Are you perhaps using a different microphone setup when you play VR vs standard? The delay usually occurs because the active microphone is still picking up external sounds even after you have finished speaking. The speech engine remains listening without letting FoxVox process the speech until it finally times out. You can visually see this if the listening indicator icon remains green for some time after speaking. To fix this, try reducing the sensitivity of the microphone, and keep it away from computer fan noise and any other sources of sound that it might pick up as background noise. You can also play with the Timeout setting in FoxVox by entering a low value (like 1) to try and force a timeout earlier.
For your question about numbered commands, I recommend you always use spelled out numbers for the voice command keys (e.g. "One", "Two", etc.). This will prevent "One" from conflicting with "Ten". If you run into a conflict such as "One" vs. "One Hundred", you will need to put a blocking word of "Hundred" onto the "One" key so that it can tell the difference. Alternatively you could place the "One Hundred" command above the "One" command so it will be evaluated first and will be skipped if "Hundred" is not contained in the spoken phrase.
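Both approaches can be sketched in Python as follows. This is a simplified model for illustration only: the function `first_matching_command`, the command names, and the (name, required words, blocked words) layout are assumptions, and it treats "One Hundred" as two separate required words.

```python
def first_matching_command(spoken_phrase, commands):
    """Return the first command whose required words are all spoken and
    whose blocked words are all absent. Commands are checked in list order."""
    spoken = spoken_phrase.lower().split()
    for name, required, blocked in commands:
        has_required = all(w.lower() in spoken for w in required)
        has_blocked = any(w.lower() in spoken for w in blocked)
        if has_required and not has_blocked:
            return name
    return None


# Option 1: put the blocking word "Hundred" on the "One" command.
with_block = [
    ("Select 1", ["One"], ["Hundred"]),
    ("Select 100", ["One", "Hundred"], []),
]
print(first_matching_command("Select One Hundred", with_block))  # Select 100

# Option 2: no blocking word, but the more specific command is listed first.
by_order = [
    ("Select 100", ["One", "Hundred"], []),
    ("Select 1", ["One"], []),
]
print(first_matching_command("Select One Hundred", by_order))  # Select 100
print(first_matching_command("Select One", by_order))          # Select 1
```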
Thanks for your positive comments about the app and I'm glad you like it! My hope is that it can be a useful tool for anyone needing free and customizable voice control :)
Depending on the game, you may need to add pauses (between 10-50 ms typical range) on or between key events to get them to register. Otherwise they are so fast that the game might not register them. You may want to experiment with this. I show how to do this with mouse outputs in the last chapter of my basic tutorial video with drag-drop (it works the same for keyboard outputs).
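As a rough picture of what such a pause does, here is a minimal Python sketch that spaces out key events. It does not use FoxVox or any real input library: `send_keys` and the `send` callback are hypothetical stand-ins, and the event strings are made up for the example.

```python
import time


def send_keys(key_events, send, delay_ms=20):
    """Send each key event through `send`, pausing delay_ms between events."""
    for index, event in enumerate(key_events):
        if index > 0 and delay_ms > 0:
            time.sleep(delay_ms / 1000.0)  # give the game time to register
        send(event)


# Collect the events in a list instead of sending them to a real game.
sent = []
send_keys(["down:A", "up:A", "down:B", "up:B"], sent.append, delay_ms=10)
print(sent)
```

Starting at 10 or 20 ms and increasing only if events are still dropped keeps the command responsive.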