
ASCIILang - Pseudo game engine and programming language.

A topic by Aim studios, created 3 days ago.

Hi all.

Last year, I built a simple ASCII art editor. It was a great experience for me. I learned a lot of programming concepts, how to handle arrays in Godot, and much more about Godot's architecture in general. I have always been interested in ASCII art and animations; I think they are amazing, and I even have a thread up about ASCII art. I have also wanted to build an ASCII-based game. There are quite a few of them, but I couldn't find a proper ASCII 'game engine' designed specifically for creating ASCII games. So I've decided to build my own.

Before I start, there are a few things I want to say. First, I am not an experienced programmer. I am in high school right now, so I am only about 20 percent sure that I will be able to complete this project. Most likely, I will run into a problem I can't solve and abandon the project. Second, I am not consistent. If you have read my previous devlogs, you know that I have a bad habit of taking weeks- or even months-long breaks while working on a project. I just can't help it. Sometimes I have many more things to worry about, mostly schoolwork and exams. So please don't expect daily posts on this thread. Third, I intend this thread to focus on the more technical, programming-related side of things. And obviously, if you are a fairly experienced programmer, you are going to be frustrated seeing my code. So, fair warning :)

In the next post, I will discuss the concept of the project in more detail and how I plan to build it.

Visual concept.

For the game engine, I am planning a two-screen system: one screen for the debugger, which runs and tests the game being made, and another for the editor, where the programming and everything else is done. I might also add a third screen for editing the game's ASCII art, but only later, after the other screens are implemented properly. For now, I am thinking only about the visual side of the engine. I might add audio and sound effects later, but that will most likely have to wait until most of the engine's core features are done.

Technical concepts.

Game loop.

A game loop is certainly among the most important parts of a game or game engine. Since this is an ASCII game engine, where the screen doesn't need to update 60 times per second, I plan on implementing a very simple game loop using a 'forever()' command. The idea is that this command runs forever, and the user passes in a delay value: the time, in seconds, the engine waits before executing the next iteration of the loop. Pretty simple concept. I might also let users change the wait time while the loop is running, which could be helpful.
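Here is a minimal sketch of how such a delay-based forever() loop could behave, written in Python (whose syntax is close to GDScript). The `GameLoop` class, its `stop()` method and the `body(loop)` callback are hypothetical names for illustration, not engine code. The loop re-reads `delay` on every iteration, which is what makes changing the wait time mid-loop possible.

```python
import time


class GameLoop:
    """Hypothetical sketch of an ASCIILang-style forever() loop (not engine code)."""

    def __init__(self, delay):
        self.delay = delay    # seconds to wait between iterations; may change at runtime
        self.running = True
        self.ticks = 0        # number of completed iterations

    def stop(self):
        self.running = False

    def forever(self, body):
        # Call body(loop) once per iteration, then wait for the *current* delay,
        # so a body that changes loop.delay affects the very next wait.
        while self.running:
            body(self)
            self.ticks += 1
            if self.running:
                time.sleep(self.delay)
```

For example, `GameLoop(delay=0.2).forever(update)` would call a hypothetical `update(loop)` roughly five times a second until it calls `loop.stop()`. The `stop()` method exists only so the sketch can end; a real forever() might simply run until the game is closed.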

The screen.

The screen is where the game actually takes place. I plan for it to be a two-dimensional array of characters that the user can write into using the engine. There are two ways to write into the screen array. In the first, the user writes directly into the screen, manually changing the values of specific cells in the array. The second would be handled mostly by the engine itself and would use something called 'Blocks'.
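As a rough illustration of the first (direct) method, the screen can be modeled as a list of rows, each row a list of single characters. This Python sketch assumes hypothetical names (`Screen`, `set_char`, `render`) and assumes that writes outside the screen are silently ignored; the real engine may choose differently.

```python
class Screen:
    """Hypothetical sketch: the screen as a 2D array (list of rows) of characters."""

    def __init__(self, width, height, fill=" "):
        self.width = width
        self.height = height
        # Build each row separately; [[fill] * width] * height would make
        # every row the same list object, so one write would change all rows.
        self.cells = [[fill] * width for _ in range(height)]

    def set_char(self, x, y, ch):
        # Direct write: change a single cell, ignoring writes outside the screen.
        if 0 <= x < self.width and 0 <= y < self.height:
            self.cells[y][x] = ch

    def render(self):
        # Join the rows into one printable string.
        return "\n".join("".join(row) for row in self.cells)
```

Indexing as `cells[y][x]` (row first) keeps each row contiguous, which makes turning the array into printable text a simple join per row.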

Blocks.

Blocks are essentially just another two-dimensional array, like an image made of pixels, except that instead of pixels a Block uses ASCII/Unicode characters to form an ASCII image. When the user calls a 'draw()' function, the engine pastes the block on top of the screen at the position the user wants. The block system has two main advantages. First, it lets the user group a two-dimensional array of characters together, operate on it as a whole, and treat it as a single entity, which would be immensely helpful when making games. Second, since each block has its own position, it can exist outside the screen array. This means the user can draw a block at, say, position (250, 250) on a screen array of size (100, 100). The object then exists outside the bounds of the screen array, and the engine renders only the parts of the object that overlap with the screen.
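The "render only the overlapping part" rule comes down to clipping the block's row and column ranges against the screen before copying. Here is a Python sketch; `draw_block` is a hypothetical name, blocks are assumed to be lists of strings, and the screen a list of character lists.

```python
def draw_block(screen, block, bx, by):
    """Hypothetical sketch: paste `block` (a list of strings) onto `screen`
    (a list of rows, each a list of single characters) with the block's
    top-left corner at (bx, by). Cells that fall outside the screen are skipped,
    so a block may sit partly or entirely off-screen."""
    height = len(screen)
    width = len(screen[0]) if height else 0
    # Clip the block's rows to the part that lands inside the screen.
    for r in range(max(0, -by), min(len(block), height - by)):
        row = block[r]
        # Clip this row's columns the same way.
        for c in range(max(0, -bx), min(len(row), width - bx)):
            screen[by + r][bx + c] = row[c]
```

A block drawn at (250, 250) on a 100 x 100 screen produces empty ranges, so nothing is drawn and no time is spent looping over its cells. A block at (-1, -1) draws only its lower-right part.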

The programming language.

Let's talk about the programming language itself, or at least, the pseudo programming language. ASCIILang (name subject to change) is meant to be an interpreted language that reads statements line by line and executes them, so I will have to build a pseudo-interpreter for it. The language will support basic arithmetic and logical operations, simple control statements like if, else, elif and maybe switch, and basic loops. But perhaps the most important of these will be jump statements. ASCIILang will rely heavily on 'Labels', which are kind of like functions: segments of code that are executed when called. Jump statements like goto, break, etc. will be crucial for handling labels. Maybe labels could return a value too, but at that point it would be better to just call them functions. Anyway, I'd like to keep the scope of ASCIILang simple.
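One common way to implement labels and jumps is to record each label's position in a first pass and then move a program counter around. This Python sketch uses made-up tuple instructions (`set`, `add`, `print`, `goto`, `goto_if_less`, `call`, `return`, `end`) that are assumptions for illustration, not ASCIILang syntax. `call`/`return` use a stack of return positions, which is what makes a label behave like a function.

```python
def run(program):
    """Hypothetical sketch of label/goto execution over a list of tuple statements."""
    # First pass: remember where each label lives so jumps can find it.
    labels = {stmt[1]: index for index, stmt in enumerate(program) if stmt[0] == "label"}
    variables = {}
    output = []
    call_stack = []  # return positions for call/return
    pc = 0           # program counter: index of the next statement to execute
    while pc < len(program):
        op, *args = program[pc]
        pc += 1
        if op == "label":
            continue                      # labels only mark positions
        elif op == "set":
            name, value = args
            variables[name] = value
        elif op == "add":
            name, amount = args
            variables[name] += amount
        elif op == "print":
            output.append(variables[args[0]])
        elif op == "goto":
            pc = labels[args[0]]          # unconditional jump
        elif op == "goto_if_less":
            name, limit, target = args
            if variables[name] < limit:   # conditional jump
                pc = labels[target]
        elif op == "call":
            call_stack.append(pc)         # come back here on "return"
            pc = labels[args[0]]
        elif op == "return":
            pc = call_stack.pop()
        elif op == "end":
            break
        else:
            raise ValueError(f"unknown statement: {op}")
    return output
```

For example, `run([("set", "i", 0), ("label", "loop"), ("print", "i"), ("add", "i", 1), ("goto_if_less", "i", 3, "loop")])` counts from 0 to 2.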

How I plan to implement these.

Godot. I'm not sure how stupid that sounds. Anyone with programming experience would probably use a language like C, C++ or Rust for this kind of project, and they would be right: these languages are GREAT for systems-level programming. They are fast and efficient, and they give far more access to the system hardware. My reasons for using Godot might sound silly: Godot has great UI design tools, some people use it to build fairly low-level tools, I have built a tool in it before, and I know GDScript better than any other language in the world except Java. Those two are also my favorite languages. Hope I won't get crucified for saying Java is a favorite.

Update 1 - Implemented the basics.

So we have now implemented a really, really basic pseudo-interpreter. It doesn't do much right now, but it can recognize basic program commands. This is a great start.

So here you can see how this works. On the right side is the editor window. There I have three very basic commands.

int a = 1 + 2;
int b = 2 + 3;
int c = a + b;

Right now, our program gets the text from the editor and does some basic preprocessing: it trims extra spaces by replacing double spaces with single spaces, and it adds a delimiter at the end of the text.
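One subtlety in "replace double spaces with a single space": a single replace pass turns three spaces into two, so the replacement has to repeat until no double spaces remain. This Python sketch assumes the end-of-text delimiter is one trailing space (so the tokenizer flushes its last word); the actual delimiter used in the project may differ.

```python
def normalize(source):
    """Hypothetical sketch of the clean-up pass run before tokenizing."""
    # Collapse runs of spaces. One replace("  ", " ") pass turns three spaces
    # into two, so repeat until no double spaces remain.
    while "  " in source:
        source = source.replace("  ", " ")
    # Assumption: the end-of-text delimiter is a single space, which makes
    # the tokenizer finish whatever word it is still building.
    return source + " "
```

Note that this also collapses spaces inside quoted text, which is one reason string handling eventually needs its own pass.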

Then it loops through each character, checking whether the character belongs to a predefined set of word characters. If it does (for example, the letter 'i'), it is appended to a string variable called word. When the program comes across whitespace (in the example above, after appending 'i', 'n' and 't' to word), it knows the word is complete and appends the word to an array called tokens. The program also treats a word as complete when it comes across a special character or operator such as ; (semicolon), <, >, +, -, etc. This time, however, the special character itself is also appended to the tokens array as a separate element. For example, in the first line, when the program reaches the semicolon, the value in word (2 in this case) is appended to the tokens array, and then, unlike with whitespace, the ';' character is appended as well.

This tokens array is the first line printed as the output on the right side. In this case it was:

[int,a,=,1,+,2,;,int,b,=,2,+,3,;,int,c,=,a,+,b,;]
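The character loop described above can be sketched in Python as follows. It is a simplification under stated assumptions: the `SPECIAL` set is illustrative, and any character that is neither whitespace nor special counts as part of a word (instead of checking against an explicit set of word characters).

```python
# Assumption: the exact set of special (separator) characters is illustrative.
SPECIAL = set(";<>+-*/=%.")


def tokenize(source):
    tokens = []
    word = ""
    for ch in source:
        if ch.isspace() or ch in SPECIAL:
            if word:                  # a separator ends the current word
                tokens.append(word)
                word = ""
            if ch in SPECIAL:         # special characters are tokens themselves
                tokens.append(ch)
        else:
            word += ch                # any other character extends the word
    if word:                          # flush a word left at the end of the text
        tokens.append(word)
    return tokens
```

Feeding it the three example lines yields the same token sequence as the array shown above, and `tokenize("22.5;")` gives `["22", ".", "5", ";"]`.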

Once the entire code is tokenized, our program sends the tokens to another function called interpreter (the name is arbitrary). Right now, this function splits the tokens array into smaller chunks called lines, using semicolons as the split points, and prints one line at a time. The output was:

[int,a,=,1,+,2]
[int,b,=,2,+,3]
[int,c,=,a,+,b]

Note that these line arrays do not contain the semicolon. This is because the semicolon is used only as a delimiter here and is omitted when appending to the array.
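Splitting on semicolons while dropping them can be sketched in Python like this. `split_lines` is a hypothetical name, and keeping any tokens after the last `;` as a final line is an assumption; the original may treat that case as an error instead.

```python
def split_lines(tokens, delimiter=";"):
    """Hypothetical sketch: split a flat token list into lines at each delimiter."""
    lines = []
    current = []
    for token in tokens:
        if token == delimiter:
            lines.append(current)   # the delimiter itself is not kept
            current = []
        else:
            current.append(token)
    if current:                     # assumption: leftover tokens form a last line
        lines.append(current)
    return lines
```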

This is probably less than 1 percent of the actual work, but it's a promising start.

Update 2 - Our program can now identify numbers.

Previously, our program treated every word we typed as a separate token. This is useful for tokenizing, but in certain cases it is a problem. One of them is when the interpreter has to recognize numbers. For example, if I typed 22.5; into our previous program, the output line array would look like this:

[22,.,5]

We obviously don't want that. We want the program to recognize that the dot in the middle is a binding character, and that these are not three separate tokens but a single number. At first glance, the fix seems easy: check whether the next token is a dot and the one after that is a number, and if so, combine them into a single token. But this doesn't actually solve the problem, because with that fix, the program's output for the line '22.3.4;' would be:

[22.3.4]

And this is not what we want either. In this scenario, we want the program to return an error. The fix for this isn't complicated either, and there are a couple of ways to work around the problem. I'll describe the method I used. Suppose I typed 22.3.4; into the editor. The interpreter function discussed earlier would output the array:

[22,.,3,.,4]

First, we create a new array called N in a new function called makeNumber, which identifies numbers in an array. N is the array that makeNumber returns. The function loops through the entire input array, and in each iteration it does the following:

If the element is a dot and the next element is a number, it appends 0 + dot + the next element to the new array as a single element.

[.,3] -> [0.3]

If the element is a dot but the next element is not a number, it returns an error.

[.,a] -> [error]

If the element is a number and the next element is not a dot, it appends the element to the new array.

[3,a] -> [3,a]

If the element is a number, the next element is a dot, and the one after that is a number, it combines the three and appends them to the new array as one element.

[3,.,1] -> [3.1]

If the element is a number and the next element is a dot, but the one after that is not a number, it returns an error.

[3,.,a] -> [error]

This is not the exact flow of logic, but it is something similar. In the end, the result looks something like this:

In the editor on the right, there are three commands:

22.2 + 3.1;
23.1;
24.3.2;

In the output, we can see that the first line is the tokens array:

[22,.,2,+,3,.,1,;,23,.,1,;,24,.,3,.,2,;]

The next three lines are the results the makeNumber function returned when it was passed this array:

[22.2,+,3.1]
[23.1]
Error

Just as we needed, it successfully recognized numbers split by dots, combined each one into a single floating-point value, and returned an error for the invalid one. This took about an hour and a half. Also, I just realized I am creating a game engine and its interpreted language in another game engine, using that engine's interpreted language.
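The makeNumber rules listed above can be sketched in Python roughly like this. As with the original description, it is not necessarily the exact flow of logic: the extra "a finished number followed by another dot is an error" check is an assumption added so that 22.3.4 fails instead of becoming [22.3, 0.4], and `is_number` assumes number tokens are plain digit runs.

```python
def is_number(token):
    # Assumption: at this stage number tokens are plain digit runs like "22".
    return token.isdigit()


def make_number(line):
    """Hypothetical sketch of the makeNumber rules: merge [22, ., 5] into
    [22.5] and raise ValueError for malformed numbers like 22.3.4."""
    result = []
    i = 0
    while i < len(line):
        token = line[i]
        nxt = line[i + 1] if i + 1 < len(line) else None
        after = line[i + 2] if i + 2 < len(line) else None
        if token == ".":
            # [., 3] -> [0.3]; [., a] -> error
            if nxt is None or not is_number(nxt):
                raise ValueError("a dot must be followed by a number")
            result.append("0." + nxt)
            i += 2
        elif is_number(token) and nxt == ".":
            # [3, ., 1] -> [3.1]; [3, ., a] -> error
            if after is None or not is_number(after):
                raise ValueError("a dot must be followed by a number")
            result.append(token + "." + after)
            i += 3
        else:
            # [3, a] -> [3, a]: anything else is copied unchanged
            result.append(token)
            i += 1
            continue
        # A finished number followed by another dot (22.3.4) is an error.
        if i < len(line) and line[i] == ".":
            raise ValueError("a number can only have one decimal point")
    return result
```

With this, `make_number(["22", ".", "2", "+", "3", ".", "1"])` returns `["22.2", "+", "3.1"]`, while `["24", ".", "3", ".", "2"]` raises an error, matching the outputs above.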


Update 3 - The interpreter can now recognize strings and signed numbers.

In the previous update, we made the program return an error when we input something like 22.3.5. But what about the string "122.33.89.64"? It turns out the program would consider this an error too, and we don't want that. We want the program to recognize the characters between double quotes as a string, so that the rules for numbers are not applied to it. Also, previously, inputting something like -24.5 would result in the output:

[-,24.5]

I've now implemented a signNumber function, which takes in the array and automatically signs the number. So the above input now becomes:

[-24.5]

The signNumber function operates on a similar logic as the makeNumber function, only this time it uses '+' and '-' instead of dot, and with some slight changes.

So right now, when a line is input, it first goes to the makeString function, which groups words inside double and single quotes, then trims extra whitespace outside them. Next, it is passed to the makeNumber function, which combines the separate decimal parts of a number into a single token, and finally to the signNumber function, which identifies + and - tokens and signs the number to their right. Here are a few inputs and outputs:
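A string-grouping pass like makeString can be sketched by scanning characters and switching into "inside a string" mode at a quote until the matching quote appears. This Python sketch works on raw text rather than on the author's token array, and the function name `split_strings`, the ("code"/"string") chunk labels and the unterminated-string error are all assumptions. Code chunks could then be cleaned up and tokenized while string chunks are left untouched.

```python
def split_strings(source):
    """Hypothetical sketch: split source text into ("code", ...) and
    ("string", ...) chunks so number rules are never applied inside quotes."""
    parts = []
    buf = ""
    quote = None                      # the quote character we are inside, if any
    for ch in source:
        if quote is None:
            if ch in "\"'":           # a string starts here
                if buf:
                    parts.append(("code", buf))
                quote = ch
                buf = ch
            else:
                buf += ch
        else:
            buf += ch
            if ch == quote:           # only the *same* quote closes the string
                parts.append(("string", buf))
                buf = ""
                quote = None
    if quote is not None:
        raise ValueError("unterminated string literal")
    if buf:
        parts.append(("code", buf))
    return parts
```

Because only the matching quote closes a string, `'say "hi"'` stays a single string chunk.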


Inputs:

"Hello world";
"How are you?";
'H O W ARE Y O U ?????';
'22.34.12.67.23+67';
"";
24 - -89 + -67 + ----34.8;
"24 - -89 + -67 + ----34.8";

Output:

[", Hello,  , world, ", ;,  , ", How,  , are,  , you, ", ;,  , ', H,  , O,  , W,
  , ARE,  , Y,  , O,  , U,  , ?, ?, ?, ?, ?, ', ;,  , ', 22, ., 34, ., 12, ., 67,
 ., 23, +, 67, ', ;,  , ", ", ;,  , 24,  , -,  , -, 89,  , +,  , -, 67,  , +,  ,
 -, -, -, -, 34, ., 8, ;,  , ", 24,  , -,  , -, 89,  , +,  , -, 67,  , +,  , -, -,
 -, -, 34, ., 8, ", ;,  ]
["Hello  world"]
["How  are  you"]
['H  O  W  ARE  Y  O  U  ?????']
['22.34.12.67.23+67']
[""]
[24, 89, -67, 34.8]
["24  -  -89  +  -67  +  ----34.8"]

The progress so far, all in one run.

Input:

"Hello world"; 
22.4; .2; 
2.7.4; 
"2.47.4"; 
1 - -3; 
1 + -4; 
A += 1; 
B -= 2.3; 
C *= .2; 
D /= 5; 
E %= 7; 
2.;
                     .2; 
"                    .2";

Output:

["Hello  world"] 
[22.4] 
[0.2] 
Error 
[] 
["2.47.4"] 
[1, 3] 
[1, -4] 
[A, +=, 1] 
[B, -=, 2.3] 
[C, *=, 0.2] 
[D, /=, 5] 
[E, %=, 7] 
Error 
[] 
[0.2] 
"                    .2"

So far it's working exactly as intended.

Today I spent most of my time optimizing the program: I removed unused variables, shortened many lines of code, documented parts of the code, etc. I also implemented operator grouping, so the program can identify separate back-to-back operators like + and = and combine them into a single operator, +=. You can see the result of this in the output above. We are just a few steps away from letting the program actually perform these operations and return the result. Before that, we need to implement identifier (variable) recognition; order identification, so the program knows the order in which operations must be executed (our language uses BODMAS/PEMDAS order); and command type recognition, so the program can tell declarations, assignments, etc. apart.
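Operator grouping can be done as a small pass that looks at each token together with the next one and merges the pair when it forms a known compound operator. This Python sketch limits the set to the five operators shown in the output above; other pairs, such as comparison operators, could be added the same way. `group_operators` is a hypothetical name.

```python
# Assumption: only the compound assignment operators shown in the output above.
COMPOUND_OPERATORS = {"+=", "-=", "*=", "/=", "%="}


def group_operators(tokens):
    """Hypothetical sketch: merge back-to-back tokens like "+", "=" into "+="."""
    grouped = []
    i = 0
    while i < len(tokens):
        pair = tokens[i] + tokens[i + 1] if i + 1 < len(tokens) else None
        if pair in COMPOUND_OPERATORS:
            grouped.append(pair)      # consume both tokens as one operator
            i += 2
        else:
            grouped.append(tokens[i])
            i += 1
    return grouped
```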

Also, I have removed the ++ and -- operators from the language vocabulary. They were a pain to implement and, in my opinion, served no real purpose.

Update 4 - Tokenizer complete.

At last, the very first module of our language is up and running: the tokenizer. Also known as a lexer, this is the module an interpreter uses to convert a human-readable source file, which is just plain text, into a stream of tokens. Tokens are much easier for an interpreter to process, because it is easier to define the relationships between the individual components of the code when they are tokens rather than plain-text words.

For our interpreter, I have written a separate class (in a separate .gd script) named Token. The Token class holds two variables: type and value. The type variable, as the name suggests, holds what kind of token it is. For example, the token generated for the element "Hello world" would be a literal token, more specifically a string literal token. I created an enum called TokenType inside the Lexer class to hold the constants for each token type. This means the token type is actually an integer representing which group the token belongs to, which gives our interpreter some context on how to approach and process that token. Here is how the TokenType enum is defined. For now, we have only nine token types, with values 0 to 8.


enum TokenType{
     KEYWORD,
     IDENTIFIER,
     NUMBER,
     STRING,
     TRUE,
     FALSE,
     EQUALS,
     UNARY_OPERATOR,
     BINARY_OPERATOR
}

The second variable in our Token class, value, holds the actual value of the element. For example, for the token generated for the element "Hello world", the type would be TokenType.STRING, which equals 3, as discussed earlier, and the value would be the actual string "Hello world". The value variable is of type String, which means that even number literals are stored as strings. For example, the value of the token generated for the element 12 would be "12", a String. Some tokens, like true, false and the equals sign, don't need a value, since everything about them is already known from their type. These tokens have a value of "" (an empty string).

Some interpreters use even more token properties, but for our case, these two are enough. The only other thing in the Token class is a simple print_token() function, which prints the token in {type : value} format for debugging.
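For readers who want to mirror the Token class outside Godot, here is a Python sketch of the same idea. `IntEnum` reproduces the 0 to 8 integer values of the GDScript enum. The `to_text()` helper and printing the type by name are assumptions, since the post doesn't say whether print_token() shows the name or the integer.

```python
from enum import IntEnum


class TokenType(IntEnum):
    # Same order as the GDScript enum, so the values are 0 to 8.
    KEYWORD = 0
    IDENTIFIER = 1
    NUMBER = 2
    STRING = 3
    TRUE = 4
    FALSE = 5
    EQUALS = 6
    UNARY_OPERATOR = 7
    BINARY_OPERATOR = 8


class Token:
    def __init__(self, type_, value=""):
        self.type = type_    # a TokenType member (an integer from 0 to 8)
        self.value = value   # always a string; "" when the type says everything

    def to_text(self):
        # Assumption: the type is shown by name rather than by its integer.
        return "{%s : %s}" % (self.type.name, self.value)

    def print_token(self):
        print(self.to_text())
```

For example, `Token(TokenType.NUMBER, "12").print_token()` prints `{NUMBER : 12}`, with the number still stored as the string "12".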

What I plan to do next is add more token types, create a proper error handling system, and double-check everything to make sure it all works properly. After that, I plan to implement the parser, the second module of our interpreter, which takes in these tokens and outputs something called an Abstract Syntax Tree (AST). But before I get there, I need a clear syntax definition for our language, and that design has to keep in mind what purpose our language serves. The next post will most likely discuss that. Only once I can successfully generate an AST would I be confident enough to say the project is 10 percent complete.
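To give a feel for what a parser produces, here is a small recursive-descent sketch in Python. It is one common technique, not a design the post has committed to. It turns a token list into a nested-tuple AST while respecting BODMAS/PEMDAS precedence (brackets, then * and /, then + and -). `parse_expression` and the tuple node format are assumptions for illustration.

```python
def parse_expression(tokens):
    """Hypothetical recursive-descent parser for + - * / and parentheses.
    Returns a nested tuple AST such as ('+', 'a', ('*', 'b', 'c'))."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def expression():            # lowest precedence: + and -
        node = term()
        while peek() in ("+", "-"):
            op = advance()
            node = (op, node, term())    # left-associative: (1 - 2) - 3
        return node

    def term():                  # higher precedence: * and /
        node = factor()
        while peek() in ("*", "/"):
            op = advance()
            node = (op, node, factor())
        return node

    def factor():                # numbers, identifiers and bracketed groups
        token = peek()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        if token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected operator {token!r}")
        if token == "(":
            advance()
            node = expression()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return node
        return advance()

    tree = expression()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

Each precedence level gets its own function, so multiplication ends up deeper in the tree than addition and is evaluated first. For example, `["a", "+", "b", "*", "c"]` parses to `("+", "a", ("*", "b", "c"))`.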

PS: I removed the signNumber function because it became messy when trying to create binary operator tokens.
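A common alternative to folding signs into numbers is to keep + and - as separate tokens and decide whether each one is unary or binary from the token before it. A sign is unary at the start of an expression, or right after another operator or an opening bracket; otherwise it is binary. This fits the UNARY_OPERATOR and BINARY_OPERATOR types already in the enum. The Python sketch below is a hypothetical illustration (the `OTHER` label stands in for NUMBER, IDENTIFIER, etc.), not the project's code.

```python
# Tokens after which a + or - must be a sign rather than a binary operator.
# Assumption: includes the grouped compound operators and the opening bracket.
OPERATORS = {"+", "-", "*", "/", "%", "=", "+=", "-=", "*=", "/=", "%=", "("}


def classify_signs(tokens):
    """Hypothetical sketch: label each + or - as unary or binary by context."""
    result = []
    previous = None
    for token in tokens:
        if token in ("+", "-"):
            if previous is None or previous in OPERATORS:
                result.append(("UNARY_OPERATOR", token))
            else:
                result.append(("BINARY_OPERATOR", token))
        else:
            result.append(("OTHER", token))   # would be NUMBER, IDENTIFIER, ...
        previous = token
    return result
```

With this, `1 - -3` keeps its binary minus: the first `-` follows a number, so it is binary, and the second follows an operator, so it is unary. The parser can then apply the unary sign to the number instead of the tokenizer merging them.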