Chaga Programming Language


About

  • The Chaga programming language began as a derivative of the C programming language. I developed the Chaga programming language to serve as a teaching tool for my programming languages class at Sonoma State University in the Department of Computer Science.
  • I've forked the Chaga programming language into two languages: Chaga (detailed below) and ChagaLite. ChagaLite is a subset of the Chaga programming language. ChagaLite is an interpreted language while Chaga is a compiled language.

Language design philosophy

  • Datatypes should NOT be constrained by hardware limitations. For example, an integer in the C programming language is typically a 32-bit (four-byte) number. The technical solution in the Chaga and ChagaLite programming languages is to store all numbers (floats and integers) internally as strings of unsigned bytes. An added benefit of the string representation is the ability to manipulate extremely large (or small) numbers.
  • I'm designing and writing a numerical string library to represent numbers as strings. It is not yet complete, but it already stores digits in binary-coded decimal (BCD) format for efficiency. My BCD encoding stores two digits in one byte. Eventually, I will fold the numerical string library into my compiler work for the Chaga programming language.
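
As a rough illustration of storing two decimal digits per byte, here is a minimal Python sketch of packed BCD. It is not the numerical string library itself: the function names pack_bcd and unpack_bcd are hypothetical, and padding odd-length numbers with a leading zero digit is an assumption made for this example.

```python
DIGITS = "0123456789"


def pack_bcd(digits: str) -> bytes:
    """Pack a string of decimal digits into bytes, two digits per byte."""
    if not digits or any(ch not in DIGITS for ch in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    if len(digits) % 2:
        digits = "0" + digits  # assumption: pad to an even number of digits
    return bytes(
        (int(digits[i]) << 4) | int(digits[i + 1])  # high nibble, low nibble
        for i in range(0, len(digits), 2)
    )


def unpack_bcd(packed: bytes) -> str:
    """Recover the digit string; leading zeros (including padding) are dropped."""
    out = []
    for byte in packed:
        high, low = byte >> 4, byte & 0x0F
        if high > 9 or low > 9:
            raise ValueError("byte is not valid packed BCD")
        out.append(f"{high}{low}")
    return "".join(out).lstrip("0") or "0"
```

A 40-digit integer occupies 20 bytes in this layout, half of what one character per digit would need.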

Open-source

  • The Chaga and ChagaLite programming languages are open-source programming languages.
  • The Chaga and ChagaLite programming languages are not to be monetized.
  • Feedback and collaboration are welcome.

Language specifications

Backus-Naur Form (BNF) for Chaga programming language

Chaga programming language datatypes

  • bool: This datatype stores a TRUE or FALSE Boolean value. The bool datatype can be utilized in an array.
  • char: This datatype stores one unsigned byte.
  • string: This datatype stores an arbitrarily long (or short) sequence of unsigned bytes as a character string.
  • date: This datatype stores dates in the format YYYY-MM-DD, where YYYY = four digits representing the year, MM = two digits representing the month, and DD = two digits representing the day.
  • time: This datatype stores time in the format HH:MM:SS, where HH = two digits representing the hour, MM = two digits representing the minute, and SS = two digits representing the second.
  • datetime: This datatype stores both date and time in the format YYYY-MM-DD HH:MM:SS.
  • int: This datatype stores arbitrarily large or small integer values.
  • float: This datatype stores arbitrarily large or small real numbers.
  • complex: This datatype stores arbitrarily large or small complex numbers which contain both a real component and an imaginary (square-root of negative one) component.
  • imaginary: This datatype stores arbitrarily large or small imaginary numbers.
  • file: This datatype represents a file handle.
  • pointer: This datatype designates a variable as a pointer to a memory location. A pointer datatype can examine the memory (contents) of any other datatype on a byte-for-byte basis.
  • attribute: This datatype stores attribute data for other variables. It is useful when one wants to store and retrieve the metadata for other variables.
  • void: This datatype designates an empty datatype which holds nothing. No values may be stored in this datatype. It is primarily used as a placeholder when a datatype field is required.
  • enum: This is an enumerated datatype. The user specifies the set of values the enumerated datatype may contain.
  • typedef: This is a custom, user-defined datatype. Example: the user creates a new datatype called "listnode" which is used to create linked list nodes.
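
The date, time, and datetime formats above are strict about field widths. The following Python sketch shows one way such literals could be validated. The function names are hypothetical, and it assumes that datetime uses the same four-digit year as date, with a single space between the two parts.

```python
import re
from datetime import date, datetime, time

DATE_RE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")
TIME_RE = re.compile(r"([0-9]{2}):([0-9]{2}):([0-9]{2})")


def parse_date(text: str) -> date:
    """Accept exactly YYYY-MM-DD; reject short fields and impossible dates."""
    match = DATE_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not in YYYY-MM-DD format: {text!r}")
    year, month, day = map(int, match.groups())
    return date(year, month, day)  # raises ValueError for e.g. month 13


def parse_time(text: str) -> time:
    """Accept exactly HH:MM:SS; reject short fields and out-of-range values."""
    match = TIME_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not in HH:MM:SS format: {text!r}")
    hour, minute, second = map(int, match.groups())
    return time(hour, minute, second)  # raises ValueError for e.g. hour 24


def parse_datetime(text: str) -> datetime:
    """Accept a date and a time separated by a single space."""
    date_part, sep, time_part = text.partition(" ")
    if not sep:
        raise ValueError(f"missing space between date and time: {text!r}")
    return datetime.combine(parse_date(date_part), parse_time(time_part))
```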

Chaga programming language metadata attributes

  • Every identifier in the Chaga programming language contains metadata attributes. These attributes describe the identifier as either the name of a function, the name of a procedure, or the name of a variable. This is particularly useful when an unknown datatype is passed in to a function or procedure by pointer (pass-by-reference). In such cases, the pointer can then access the metadata attributes of the datatype being referred to.
Metadata attributes

Note: the .self and .content attributes are always read-only by the user.

  • .self: this attribute refers to a variable's own metadata values.
  • .content: this attribute is only applicable to pointer datatypes. It refers to the metadata attributes of the memory location referred to by a pointer. Possible values for content are: {bool, char, string, date, time, datetime, int, float, complex, imaginary, file, pointer, void, or usertype}. A usertype datatype indicates a customized, user-defined datatype.
  • .protect: this attribute has the following values: {R, W, RW}. The "R" value indicates the contents of this datatype are read-only and cannot be modified by a pointer. The "W" value indicates the contents of this datatype are write-only and cannot be read by a pointer. The "RW" value indicates the contents of this datatype are both readable and writable by a pointer. Note: a pointer may not modify the protect attribute to gain access to a variable's value. This is a privilege violation. A variable may ONLY change its protect attribute within the same scope in which the variable was defined. Global variables may only change their protect attribute in global space.
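
The protection rules can be modelled in a few lines of Python. This is only a sketch of the intended behaviour, not compiler code: the class names Variable, Pointer, and PrivilegeViolation are hypothetical, and "W" is assumed to mean that a pointer may write the value but not read it.

```python
VALID_PROTECT = ("R", "W", "RW")


class PrivilegeViolation(Exception):
    """Raised when an access breaks the rules of the protect attribute."""


class Variable:
    def __init__(self, value, scope, protect="RW"):
        if protect not in VALID_PROTECT:
            raise ValueError(f"unknown protect value: {protect!r}")
        self.value = value
        self.scope = scope      # 0 = global scope
        self.protect = protect

    def set_protect(self, new_protect, current_scope):
        """Only code running in the defining scope may change protection."""
        if new_protect not in VALID_PROTECT:
            raise ValueError(f"unknown protect value: {new_protect!r}")
        if current_scope != self.scope:
            raise PrivilegeViolation("protect can only change in the defining scope")
        self.protect = new_protect


class Pointer:
    """Accesses a variable indirectly; it has no way to alter protection."""

    def __init__(self, target):
        self.target = target

    def read(self):
        if "R" not in self.target.protect:
            raise PrivilegeViolation("target is not readable through a pointer")
        return self.target.value

    def write(self, value):
        if "W" not in self.target.protect:
            raise PrivilegeViolation("target is not writable through a pointer")
        self.target.value = value
```

In this model a pointer to an "R" variable can call read() but not write(), and a call to set_protect() from any scope other than the defining one fails.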

The .self and .content attributes expose the following additional attributes:

  • .type: this attribute describes what an identifier is defined as. Possible values are: {function, procedure, datatype}.
  • .datatype: this attribute only applies to identifiers of type datatype. Possible values are: {bool, char, string, date, time, datetime, int, float, complex, imaginary, file, pointer}.
  • .first_indice_size: this attribute defines the size of the first index of an array. Note: if the datatype is not an array, this value will be zero.
  • .second_indice_size: this attribute defines the size of the second index of an array. Note: if the datatype is not an array, this value will be zero.
  • .third_indice_size: this attribute defines the size of the third index of an array. Note: if the datatype is not an array, this value will be zero.
  • .length: this attribute only applies to identifiers defined as one of the following: {bool, char, string, date, time, datetime, int, float, complex, imaginary, file, or pointer}. The length attribute defines the maximum storage length (in bytes) of the datatype. The unary plus, unary minus, and decimal point ARE included in the length calculation for float. The unary plus and unary minus ARE included in the length calculation for integer.
  • .address: the address in memory where the identifier exists. This is useful for parameter passing.
  • .scope: this attribute defines the scope of the identifier (i.e. a variable name, function name, or procedure name). A zero indicates global scope. Positive integers denote the scope of identifiers associated with a procedure or function.

Example: usage of the metadata datatype attributes

unsigned char my_string[256];

pointer my_pointer;

my_pointer = my_string.self.address;

my_string is located at address 0x1231230

my_pointer is located at address 0x441420

Notes:

  • my_string.self.datatype is: unsigned char
  • my_string.self.length is: 256
  • my_string.self.address is: 0x1231230
  • my_pointer.self.datatype is: pointer
  • my_pointer.self.address is: 0x441420
  • my_pointer.content.datatype is: unsigned char
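
The lookups in the notes above can be mimicked with a small Python model. It is a sketch under assumptions, not a description of how the compiler stores metadata: the Metadata and SymbolTable classes are hypothetical, and only the attributes whose values appear in the example are included.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Metadata:
    """A few of the attributes reachable through .self (hypothetical model)."""
    datatype: str
    address: int
    length: Optional[int] = None  # bytes; unset when the example gives no value


class SymbolTable:
    """Toy lookup tables: name -> metadata and address -> metadata."""

    def __init__(self) -> None:
        self._by_name: Dict[str, Metadata] = {}
        self._by_address: Dict[int, Metadata] = {}
        self._pointer_values: Dict[str, int] = {}

    def declare(self, name: str, meta: Metadata) -> None:
        self._by_name[name] = meta
        self._by_address[meta.address] = meta

    def self_of(self, name: str) -> Metadata:
        """Metadata of the identifier itself."""
        return self._by_name[name]

    def assign_pointer(self, pointer_name: str, address: int) -> None:
        if self._by_name[pointer_name].datatype != "pointer":
            raise TypeError(f"{pointer_name} is not a pointer")
        self._pointer_values[pointer_name] = address

    def content_of(self, pointer_name: str) -> Metadata:
        """Metadata of whatever the pointer currently refers to."""
        return self._by_address[self._pointer_values[pointer_name]]


table = SymbolTable()
table.declare("my_string", Metadata("unsigned char", 0x1231230, 256))
table.declare("my_pointer", Metadata("pointer", 0x441420))
table.assign_pointer("my_pointer", table.self_of("my_string").address)
```

Here table.content_of("my_pointer") returns the record declared for my_string, which is how a function that receives only a pointer could discover the datatype it refers to.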

Chaga statements (examples)

Note: the following statements serve as examples. The BNF language definition will be much more complete (once posted).

Declaration (dec statement)
  • dec datatype identifier;
  • dec datatype[positive integer];
  • dec datatype[positive integer, positive integer];
  • dec datatype[positive integer, positive integer, positive integer];
  • dec function identifier (parameter list) return (parameter list) {statement block}
  • dec procedure identifier (parameter list) {statement block}
Assignment (set statement)
  • set identifier = expression;
  • set identifier.protect = "R";
  • set identifier.protect = "W";
  • set identifier.protect = "RW";
  • set identifier = function name (parameter list);
  • set (identifier list) = function name (parameter list);
Break

The break statement is used to prematurely leave an iterative loop (e.g. do, for, while) or a switch statement.

  • break;
Call
  • call identifier (parameter list);
Return
  • return identifier;
  • return (identifier list);
Selection
  • if (expression) then {statement block}
  • if (expression) then {statement block} else {statement block}
  • switch (expression) {case constant: statement block break; case constant: statement block break; case constant: statement block break; ... default: statement block break;}
  • switch (expression) {case constant: statement block case constant: statement block case constant: statement block break; ... default: statement block break;}
Iteration
  • for (identifier = integer or real constant to integer or real constant) do {statement block}
  • for (identifier = integer or real constant to integer or real constant step integer or real constant) do {statement block}
  • while (expression) do {statement block}
  • do {statement block} while (expression)
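
The for forms above allow real-valued bounds and steps. The Python sketch below shows one plausible reading of that loop. It rests on assumptions the examples do not settle: the upper bound is treated as inclusive, the step defaults to 1, and a negative step counts downward. The name chaga_for is hypothetical. Decimal values stand in for Chaga's string-based numbers so that a step such as 0.1 accumulates without binary rounding error.

```python
from decimal import Decimal


def chaga_for(start, stop, step="1"):
    """Yield the loop variable's values for: for (i = start to stop step step)."""
    current, stop, step = Decimal(start), Decimal(stop), Decimal(step)
    if step == 0:
        raise ValueError("step must not be zero")
    # Assumption: the bound is inclusive in both counting directions.
    while (current <= stop) if step > 0 else (current >= stop):
        yield current
        current += step
```

With this reading, chaga_for("0", "0.3", "0.1") visits 0, 0.1, 0.2, and 0.3 exactly.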

Chaga built-in functions and procedures

Input
  • set (identifier1, identifier2) = getchar (void);

Note: If no error occurred, getchar returns an unsigned character to identifier1. If an error occurred, getchar returns an integer value -1 to identifier2.
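
The two-value return convention can be sketched in Python as a function that returns a (character, status) pair. Several details are assumptions made for illustration: the name chaga_getchar is hypothetical, end of input is treated as an error, the status is 0 on success, and the character slot is None on failure.

```python
import io


def chaga_getchar(stream):
    """Read one byte and return (character, status) in the style of getchar."""
    data = stream.read(1)
    if len(data) != 1:
        return None, -1  # error: nothing could be read
    return data[0], 0    # one unsigned byte (0..255) and a success status


source = io.BytesIO(b"A")
first = chaga_getchar(source)   # reads the only byte in the stream
second = chaga_getchar(source)  # stream is exhausted, so this reports an error
```

Returning the status separately keeps the full 0..255 byte range available for data, which a single return value with an in-band -1 sentinel cannot do for an unsigned character.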

Output
  • putchar ("\n");

Tentative features to add

Datatypes
  • Add support for new datatype: pipes.
  • Add support for new datatype: socket.
  • Add support for new datatype: shared memory.

Miscellaneous

  • Allow the main procedure to accept environment variables, argv, and argc from the operating system.
  • Add built-in support for coroutines.
  • Add built-in support for blocking and non-blocking threads.