Scheme Scanner

Scheme Scanner (written for LEX)


%{

#include <stdio.h>

#include <stdlib.h>

The file y.tab.h contains all the token definitions generated by yacc when it compiled the parser code.


#include "y.tab.h"

%}

The following definitions are used to make the later definitions tokens easier. A digit is defined as any integer from zero through nine.


digit [0-9]

A nondigit is any character other than the integer zero through nine.


nondigit [^0-9]

An alpha is any lowercase or uppercase letter A through Z or a through z.


alpha [a-zA-Z]

An alphanum is defined as either an alpha or a digit, as defined above.



alphanum ({alpha}|{digit})

The extra characters are characters recognized in Scheme besides those defined above.


extra [\!\$\%\*\+\-\/\=\?]



%%

Here I began defining all the possible way to represent all the tokens that Scheme uses. First, I defined an INT to be an optional + or - symbol followed by any number of digits, as defined above.


[\+\-]?{digit}+ {return INT;}

A REAL is defined as an optional sign symbol followed by any number of digits, then followed by a . (decimal point) and any number of digits, as defined above.


[\+\-]?{digit}+\.{digit}+ {return REAL;}

The keyword let simply returns LET.


let {return LET;}

The keyword define returns DEFINE, used in function and procedure definitions.


define {return DEFINE;}

The keyword lambda is used in function definitions. When the scanner finds lambda, it return the token LAMBDA.


lambda {return LAMBDA;}

An identifier in Scheme is required to begin with a letter (alpha), followed by any number of alphanumeric characters (alphanum) or special characters (extra). This returns ID.


{alpha}({alphanum}|{extra})* {return ID;}

All of the following return a code for the individual characters recognized by Scheme. The construct backslash-character is used to make clear that I mean literally that character. The scanner then will return the appropriate token name, such as LPAREN for a left parenthesis or COLON for a colon.


\( {return LPAREN;}

\) {return RPAREN;}

\/ {return FORESLASH;}

\! {return EXCLAMATION;}

\$ {return DOLLAR;}

\% {return PERCENT;}

\* {return ASTERISK;}

\+ {return PLUS;}

\? {return QUESTION;}

\. {return PERIOD;}

\> {return GREATER;}

\< {return LESS;}

\: {return COLON;}

\^ {return CARET;}

\& {return AMPERSAND;}

\_ {return UNDERSCORE;}

\~ {return TILDE;}

The scanner returns STRING when it finds a string of any character other than a newline enclosed by quotes.


\"[^\n"]*\" {return STRING;}

When the scanner finds a space, a tab, or a newline, it just skips it.


[ \t\n] {}

Otherwise, it returns unknown.


. {return UNKNOWN;}



%%

#ifdef __ALONE

int yywrap(){ return 1; }

void yyerror(char* msg){ fprintf(stderr,"Error: %s\n",msg); }



int main(){

	int knd;

	knd = yylex();

	while (!feof(stdin)){

The following code is used for error checking purposes only and simply prints out the token name for each token's knd.


		switch(knd){

		case INT: printf("INT\n"); break;

		case REAL: printf("REAL\n"); break;

		case LPAREN: printf("LPAREN\n"); break;

		case RPAREN: printf("RPAREN\n"); break;

		case FORESLASH: printf("FORESLASH\n"); break;

		case ATOM: printf("ATOM\n"); break;

		case PLUS: printf("PLUS\n"); break;

		case EQUAL: printf("EQUAL\n"); break;

		case HYPHEN: printf("HYPHEN\n"); break;

		case ASTERISK: printf("ASTERISK\n"); break;

		case EXCLAMATION: printf("EXCLAMATION\n"); break;

		case DOLLAR: printf("DOLLAR\n"); break;

		case PERCENT: printf("PERCENT\n"); break;

		case PERIOD: printf("PERIOD\n"); break;

		case GREATER: printf("GREATER\n"); break;

		case LESS: printf("LESS\n"); break;

		case COLON: printf("COLON\n"); break;

		case CARET: printf("CARET\n"); break;

		case AMPERSAND: printf("AMPERSAND\n"); break;

		case UNDERSCORE: printf("UNDERSCORE\n"); break;

		case TILDE: printf("TILDE\n"); break;

		case STRING: printf("STRING\n"); break;

		case ID: printf("ID\n"); break;

		case LET: printf("LET\n"); break;

		case DEFINE: printf("DEFINE\n"); break;

		case LAMBDA: printf("LAMBDA\n"); break;

      case UNKNOWN:

      default: printf("UNKNOWN\n"); break;

      }

      knd = yylex();

	}

}

#endif

This paper is divided into six sections:

Scheme Introduction
Scheme Lexer (Scanner) with explanation
Scheme Yacc Parser with explanation
Scheme Semantic Structure with explanation
Special Feature: Continuations and their use in Scheme
An Example of a Scheme program