Relay-Version: version B 2.10.3 alpha 4/15/85; site seismo.UUCP Posting-Version: version B 2.10.2 9/3/84; site genrad.UUCP Path: seismo!cbosgd!clyde!watmath!utzoo!decvax!genrad!sources-request From: sources-request@genrad.UUCP Newsgroups: mod.sources Subject: C FORTH (Part 1 of 3) Message-ID: <867@genrad.UUCP> Date: 23 May 85 22:45:46 GMT Sender: john@genrad.UUCP Lines: 661 Approved: john@genrad.UUCP This is posting one of three of a portable FORTH interpreter, written entirely in C. It has been successfully ported to a VAX 11/60 running BSD 2.9, to EUNICE version 3 (I think), and the original machine, a VAX 11/780 running BSD 4.2. When I mentioned in net.lang.forth (and elsewhere) that I was writing this portable FORTH, several people asked that I post the results of my labors. Well, here they are. -- Allan Pratt (after May 7:) APRATT.PA@XEROX.ARPA [moderator's note: I have had no luck at all getting through to this address. There was a missing file in the original distribution "forth.lex.h" which I have reconstructed (hopefully correctly). - John P. Nelson] ------------- cut here ---------------- : Run this shell script with "sh" not "csh" PATH=:/bin:/usr/bin:/usr/ucb export PATH echo 'x - forth.doc' sed 's/^X//' <<'//go.sysin dd *' >forth.doc C-FORTH: a portable, C-coded figFORTH interpreter. Written by Allan Pratt; completed April 1985. This is a FORTH interpreter written entirely in portable C and FORTH. It requires nothing more than a decent C compiler to use. It is not exactly fast or efficient, but it is a true FORTH interpreter. The features include: Bootstrapping threaded definitions from a near-FORTH dictionary file. Block file I/O. Execution tracing and single-stepping. Breakpoint detection, dumping the stack at the breakpoint. Saving and automatic restoration of the FORTH environment. Ability to convert the block file to a line-editor-compatible file, and back. Included with the interpreter is a block file containing: An UNTHREAD utility. A screen editor with key-binding and cursor-addressing. BRINGING UP THE INTERPRETER: THIS FORTH MODEL REQUIRES "int"s TO BE TWICE THE SIZE OF "short"s, and "short"s to be 16 bits. I realize this is a barrier to portability, but you can change occurrances of "int" to "long" and "short" to "int" if "long"s are twice the size of "int"s. Note also that model sizes greater than 32K (with 16-bit cells) are likely to fail because of the sign bit. This has not been adequately tested. The first four sections of the file "common.h" contain implementation-dependent constants. These are TRACE, BREAKPOINT, several default file names, INITMEM, MAXMEM, and NSCR. As distributed, the FORTH system will work on most systems, but especially virtual-memory systems. If you do not have virtual memory, you will want to change MAXMEM -- see common.h for instructions. Once you've configured common.h to your taste, compile the files with touch lex.yy.c make all Note that lex.yy.c is lex output from forth.lex, slightly modified (using sed). lex.yy.c is included in the distribution because not everybody has lex and sed. You touch lex.yy.c before make-ing so make doesn't try to remake it. This make will create several files. Notable among them are "nf", the bootstrapper, "forth.core", the core-image output of nf, and "forth", the interpreter itself. Finally, there are two utility filters, b2l and l2b. These convert files from block format to line format and back (b2l --> block to line). A line-format file is one suitable for editing with vi or emacs: it consists of a header line for each screen, followed by 16 newline-separated lines of text for that screen, followed by the next screen. THERE MUST ALWAYS BE SIXTEEN LINES BETWEEN HEADERS in the line file, or l2b won't work properly. You must use l2b to create the block file "forth.block" from "forth.line". Use: l2b < forth.line > forth.block to do this. If you don't have I/O redirection, I'm afraid you'll have to patch these programs (and lex.yy.c and nf.c, for that matter) to take arguments or use default files. Note that "forth.block" is the default block file used by FORTH. You can change this default in "common.h", and you can change it on the FORTH command line with -f. Now that you have the interpreter ("forth"), the block file ("forth.block"), and the core file ("forth.core"), you are ready to roll. Invoke FORTH with the following line to the C-shell: stty -echo cbreak ; forth ; stty echo -cbreak Some notes while running FORTH: The backspace character is ^H, no matter what your terminal is set for. When you backspace, ASCII 8 gets sent to the screen, so the cursor should back up but not erase the letter you're backing up over. There is no kill character to wipe out an entire line. If you backspace beyond the beginning of the line, you will get a beep. Don't use tabs; FORTH doesn't recognize them as whitespace. Everything in the FORTH world is upper-case. Use caps-lock if you have one. The VLIST command lists the FORTH vocabulary. Some commands, like LIST and VLIST, use the FORTH word ?TERMINAL to see if the user wants to quit. Use your interrupt character (usually ^C) to stop these commands. If you hit ^C twice before ?TERMINAL is checked, then you'll get an ABORT back to the FORTH top level. If the FORTH program is waiting for keyboard input, you'll have to hit ^C twice and then hit a normal key to see this effect. Your Quit character (often ^\) still works to get you back to your shell. When you start FORTH up, it will tell you the number of blocks in the block file (either the default or the one you specified with -f). You can see a block with the LIST command: "3 LIST" will list block number 3. Block zero is special: because of a bug in the FORTH standard model, you can't see block zero until you've accessed many other blocks (where "many" is the number NSCR from "common.h"). In any case, you can load the UNTHREAD utility (see below) with "1 LOAD" (because it starts on block 1), and the editor with "10 LOAD". XFORTH uses "setbuf(stdout,0)" in "forth.c" to force standard output to be unbuffered. If this call doesn't work for you, just remove it, and standard output will be line buffered. If that is the case, change EXPECT (in "forth.dict"): replace the EMIT call with DROP. Then you can call forth directly, without the stty stuff. You use your own erase and kill characters, but the editor won't work. SAVING THE FORTH ENVIRONMENT: When you have been working in FORTH for a while, you will have developed words which you'd like to save, without having to reload them from the block file all the time. The word SAVE will save the current core image on a file, normally "forth.newcore", and exit. The -s flag changes the save file name. When you start FORTH, the core file (either "forth.core" or the one specified with -c) is checked to see if it is a saved image or a bootstrapped image. If it's a saved image, execution begins from the spot it left off from. If it's bootstrapped (fresh out of nf), execution begins at COLD. SUMMARY OF COMMAND-LINE OPTIONS: -t[n] trace; n is a digit from 0 to 9, default 0. Each time through the inner interpreter, a line will be printed out showing the current stack pointer, the top n stack elements (topmost at the left), the current interpretive pointer (ip), an indent to reflect the current nesting depth (actually the return-stack depth), and the name of the word about to be executed. N is the "trace depth"; see DOTRACE below. -d[n] debug; n is a digit from 0 to 9, default 0. Like -t, -d prints out the trace line each time through the inner interpreter. Unlike -t, it will then wait for input from the terminal. If you hit newline (Return, CR, etc.) it will proceed. If you type any key followed by newline, it will dump the current memory image to the dump file, usually "forth.dump", and then continue. -n no setbuf. If -n is present, the setbuf(stdout,NULL) call which makes stdout unbuffered instead of line-buffered will not be executed. This is useful if you intend to do debugging with -t, -d, or the TRON command (see below). -p xxxx breakPoint. Breakpoints are enabled, and one is set at address xxxx (in hex). Each time through the inner interpreter, the ip address is checked against this breakpoint address. If they match, "Breakpoint" is printed, along with the current stack pointer and the entire contents of the stack, with the topmost element at the left. -c corename set the core file name The memory image will be read from this file instead of the default, which is usually "forth.core". -b blockname set the block file name This file will be used as the block file for disk reads and writes, instead of the default block file, which is usually "forth.block". -s savename set the core-save file name This file will be created or overwritten upon execution of the (SAVE) primitive (which is called by the SAVE command). It will contain a core image (just like forth.core or the -c name) which reflects the current status at the time of the save. If this file is used as input (renamed to "forth.core" or used in -c later), the FORTH system will restart right where it left off. Note that -c and -s MAY use the same name. DEPARTURES FROM THE figFORTH-79 STANDARD: ---------- ---- --- -------- -- -------- There are two features of the FORTH-79 standard which are unimplemented in this system: the construct and VOCABULARIES. Both of these are unimplemented simply because I couldn't understand the documentation sufficiently to implement them. In any case, they do not affect the operation of FORTH, except inasmuch as the dictionary is simply a flat stack structure, and defining-words using DOES simply don't work. EXTENSIONS FROM THE figFORTH-79 STANDARD: ---------- ---- --- -------- -- -------- I grew a FORTH implementation from the Z80 figFORTH standard, and found the following extensions to be vital to my sanity as a programmer and developer. CASE c1 OF action1 ENDOF c2 OF action2 ENDOF default-action ENDCASE This construct comes from Doctor Dobb's Journal, Number 59, September, 1984, in an article by Ray Duncan. I used to find that nesting IFs to emulate a CASE structure was the most difficult part of programming in FORTH, but now I don't have to mess with it any more. \ (backslash) This word begins a comment which extends to the end of the current line. This is only meaningful while loading, and causes an error if you are not loading. It amounts to an open-paren where the end of the line is the closing paren. This, too, comes from Dr. Dobb's Journal, Number 59, in a different article, by Henry Laxen. TRON, TROFF, DOTRACE These provide tracing facilities for C-FORTH. TRON takes one parameter, which is the number of stack elements to show. TROFF disables tracing. DOTRACE traces once, using the most recent depth value (from TRON or -tn on the command line). TRON will have no effect (but will still consume its argument) if the FORTH system was compiled without the TRACE flag set. DOTRACE, however, will still trace one line as advertised. REFORTH This word is useful when loading screens, to get the user's input. It reads one line from the terminal (with QUERY) and interprets it (with INTERPRET), then returns to the caller. This is used in the editor (see below) to read in the user's terminal type. This is yet another construct from Ray Duncan's article in Dr. Dobb's Journal, Number 59. ALIAS usage: ALIAS NEW OLD This word is used to change the meaning of an existing word, after it has been used in defining other words. Say you want to change EMIT so it forces a CR when the current column (user var OUT) equals the number of characters per line (constant C/L). The current definition of EMIT is: : EMIT (EMIT) 1 OUT !+ ; You could define a new word, NEWEMIT, as follows: : NEWEMIT OUT @ C/L = IF CR THEN (EMIT) 1 OUT +! ; and use ALIAS NEWEMIT EMIT to make all previous references to EMIT execute NEWEMIT instead. This is accomplished by making the definition of EMIT read: : EMIT NEWEMIT ; (Note that my CR sets OUT to zero, so this really would work.) NOTES ON THE BOOTSTRAP DICTIONARY: ----- -- --- --------- ---------- The contents of forth.dict describe the initial FORTH dictionary. There are several types of things in this file; they will be described by example. PRIM (EMIT) 12 (primitive) This declares a primitive called "(EMIT)", with an execute-code of 12. This execute-code is an index into the great switch statement in the inner interpreter, next(). In a real FORTH system, this would be an address where the actual code for the (EMIT) primitive resided. To add primitives, you must do the following: 1. Add the primitive to forth.dict, using an unused execute-code. 2. Come up with a name for that primitive which is a legal C identifier. Convention for parenthesis, like (EMIT), is PEMIT. For brackets, BEMIT. For slash (like "C/L") is CSLL, where SL stands for slash. 3. Add the C identifier name to "forth.h", with the execute-code as its value, like this: #define PEMIT 12 4. In "prims.c", add the code for the primitive, as a function with the same name as the #define above, except in lower case: pemit() { putchar(pop()); } 5. In "forth.c", tack another case onto the switch statement in next(): case PEMIT: pemit(); break; 6. Recompile (preferably with make). You will need to recompile everything but nf (which uses nf.c, lex.yy.c, and common.h, none of which you changed). That is, you need to regenerate "forth" from forth.c, prims.c, forth.h, and prims.h, and "forth.core" by throwing "forth.dict" at nf. There is one special primitive: EXECUTE must be primitive number zero, since it violates structure and jumps into the middle of next(). It is handled as a special case in the code for NEXT, and it must be primitive number zero. So much for primitives; there are other things in the forth.dict file: CONST C/L 64 (constant) Define constants with the word CONST followed by the name of the constant and its value. USER OUT 36 (user variable) Define user variables with the word USER followed by the name of the variable followed by the offset in the user-variable list of that variable. The cold-start values for user variables are defined in common.h; they get copied into the initial forth.dict by nf, into the "cold-start defaults" area of memory. They are copied from there into the first few user-variable locations by COLD. VAR USE 0 (variable) Define variables with the word VAR followed by the name of the variable followed by its initial value. : EMIT (colon-definition) Begin colon-definitions with a colon (of course), followed by the name of the word. Follow that with the words which make up the definition, ending with ; for normal words or ;* for immediate words. There are two special colon-definitions: {NULL} and COLD. {NULL} is a special symbol which, when it is the name for a colon-definition, gets translated into a single null byte; the special name '\0'. This is necessary since you can't type a null byte into a file, and C would mess up anyway, thinking the null was a terminator and not part of the string. COLD is a special word in that it MUST BE DEFINED. The bootstrapper, nf, compiles the CFA of COLD into a special place in low memory, where the interpreter knows it can start executing threaded code. LABEL Labels are used as the targets of branches. They are handled in much the same way as colon definitions and other words in the dictionary, in terms of forward referencing and so on. For this reason, LABELS MUST BE UNIQUE WORDS. You can't have a label which is the same as any other word in the dictionary, or you'll get a "word redefined" error (or worse!). Since the "nf" preprocessor does not execute FORTH code, it does not have the higher-level FORTH constructs like begin .. until or do .. loop. These must be implemented by the user, using the primitives BRANCH (unconditional branch), 0BRANCH (branch on top-of-stack == 0), (DO) (start a DO construct), and (LOOP) and (+LOOP) (the primitives which end loop constructs). Each of BRANCH, 0BRANCH, (LOOP), and (+LOOP) should be followed by one of two things: a signed offset, where a zero offset branches to the address of the BRANCH (etc.) instruction; or a label tag, which will be compiled as the appropriate offset from the current location. A FORTH word with a loop might look like this: : ONE-TO-TEN 10 0 DO I . LOOP ; This would be entered into the "forth.dict" file as follows: : ONE-TO-TEN LIT 10 0 (DO) LABEL LOOP1 I . (LOOP) LOOP1 ; Note the placement of the LABEL directive AFTER the (DO). Note also the use of LIT before 10, but not before 0. See below under LIT for more details. The branches are used to implement IF ... ELSE ... ENDIF constructs, as well as conditional looping like BEGIN ... WHILE ... REPEAT and BEGIN ... UNTIL. Ordinary programming practices are called for here, with such standbys as branching over the "TRUE" branch of the IF when the predicate is false, and branching over the "FALSE" part after the TRUE part has executed (if the predicate was true). If you need more guidance, study forth.dict, or don't bother. " (quotation mark) The quotation mark begins and ends a "STRING LITERAL", which is compiled into the dictionary as a count byte followed by the bytes in the string, from left to right. This is useful for the primitive (."), which expects a string of this form to follow it in the dictionary. Within quotation marks, a backslash (\) will escape any character, including a quotation mark or another backslash, so: "He said, \"Backslash (\\) is best.\"" produces [33] He said, "Backslash (\) is best." where [33] is the length count. Note that backslash escape conventions like \n, \t, etc. DO NOT WORK IN STRING LITERALS. This is a bug, not a feature. 'x' (character literal surrounded by apostrophes) Character-literal translation is done between single quotes, with the following effects: Text produces 'A' the value of the character A. '\b' system-dependent backspace character '\f' form-feed '\n' newline '\r' carriage return '\t' tab '\\' the value of the character backslash ''' the value of the character apostrophe. These values are only meaningfully used after LIT commands; see below. LIT LIT must be used before all literal values in the dictionary, just as it is used (though the user doesn't see it) when words are defined. The forth.dict file, after all, represents a de-compiled FORTH memory image. It has all the BRANCHes and (LOOP)s showing, and it also has LITs showing. In the above example for looping, LIT was used before 10, but not before 0. This is because "0" is defined a a CONSTant earlier in the forth.dict file. To keep this straight, you should keep the following in mind: When you are in the middle of a definition, each word you type will have its CFA compiled into the dictionary. If the word you type is not defined yet, nf looks at its "hint" about the string of characters: do they form a number, like a one and a zero form ten? Do they form a hexadecimal number, like 0x10? Is it an octal literal, like 077? Is it a character literal, like 'A'? If none of these is the case, then a zero is compiled as a place holder, and the word is considered a FORWARD REFERENCE to something yet to be defined. Yes, forward references are handled, and the distinction of defined words and labels is preserved: labels, instead of compiling their CFA into the dictionary, compile the offset of their CFA from the current dictionary pointer (minus one). If, at the end of the input file, there are still unresolved forward references, they will be listed. If the words containing the forward references are never executed, there will be no problem. If those words ARE executed, FORTH will bomb with a message like "Attempt to execute illegal primitive." One snag is that you have to remember what numbers are defined as CONSTants, and what numbers aren't. If nf sees "LIT 1" it will compile the CFA of LIT, then the CFA of 1, because 1 has already been defined as a constant. LIT must precede decimal, hex, octal, and character literals, but not string literals. If one of these literals is not preceded by LIT, a warning is issued, since this is unusual behavior. No warning is issued if LIT is succeeded by something other than a literal-class item, because that is not so unusual. INSTALLER'S GUIDE FOR THE SCREEN EDITOR: --------- - ----- --- --- ------ ------ This is an editor which I wrote on my Osborne 1, which has a memory-mapped display. I hacked out that part and hacked in the screen updating using .LINE, but at anything lower than 9600 BAUD I expect this to be intolerable. When you get FORTH at your site, you will have to edit the line-file to include functions which clear the screen, locate the cursor, and enter and exit standout mode, named CLS, LOCATE, STANDOUT, and STANDEND, respectively. The trick for installing this is to put those functions in an unused screen (screens 8 and 9 are perfect for this), and place a CONSTANT in screen 10 which is the name of the terminal (like "H19" or "ACT5") and has a value which is the number of the screen you put its definition in. VT100 and ADM5 are included as models, in screens 6 and 7. ADM5 actually covers a broad range of terminals, from the old ADM3A to the TVI920C, though standout might not work the same any more. The two words STANDOUT and STANDEND can be null words, anyway. USER'S GUIDE FOR THE EDITOR: ---- - ----- --- --- ------ This is a screen editor for FORTH screens. It will alter the file "forth.block" in the current directory, or whatever the installer set the default blockfile to be, or whatever you specify on the command line with the "-b file" switch. You call a screen up with the EDIT command: 3 EDIT will begin editing screen 3. The file starts with screen 0, but, due to a bug in figFORTH which I have carefully preserved (:-), you can't see screen 0 until you've edited several others -- as many as fit in memory, in fact. This number is 5 at the outset, but can be as low as 2 or as many as your installer's whim dictated. In any case, stick with screens numbered from 1 to the size of the block file as displayed when you started FORTH. When you start editing, the screen will clear, and the screen you asked for will appear. FORTH "screens" are, by convention and convenience, sixteen lines by sixty-four characters, for a total of 1K per screen. At the bottom of the screen being edited will be the word SCREEN and this screen's number. The keys (as I've distributed the editor) match WordStar, to some extent. The cursor movement keys do, at any rate: ^E, ^D, ^X, ^S form a movement triangle at the left-hand side of the keyboard, moving one character or line at a time. The outer reaches of this triangle move even farther: ^A and ^F move one word backward and forward, and ^R and ^C move one SCREEN backward and forward. If ^C is already your interrupt character, don't panic: use ESC-C (escape is a meta-prefix like in Emacs). If your network won't let you send ^S (like Sytek), ^H will do for backspace. To see all the key bindings, use the FORTH command DESCRIBE-BINDINGS. To make a binding, do this: 1. Type "BIND-TO-KEY function " where function is the name of the command you want bound to a key, and means press return/newline. 2. The computer will print "KEY: " at which time you should strike the key you want to have that function bound to. If you want something bound to ESC plus something, just press escape followed by the key, exactly as you intend to use the key in practice. To see the binding for one key, use the command DESCRIBE-KEY. Again, you will be asked for the key you want described. Strike the key just as you normally do, with ESC before it if necessary. The initial bindings are on screens 27 and 28 of the block file. The editor is always in replace mode. That is, when you strike a key, it overwrites whatever was under the cursor. There is no insert mode; I haven't gotten around to that. As you go around changing things, you are only changing the image of the screen which FORTH keeps in memory; you are not changing the block file. You can mark the current screen for updating to the blockfile with ^U. This only marks the screen; it won't be written to the blockfile until the space is needed again, or you explicitly FLUSH the memory buffers. Use ESC-F (press escape, then press capital F) to mark the current screen for updating, and then FLUSH all marked screens (including this one, now) to the block file. Once you've done that, the change is permanent. If you really mean to make a given change to a screen, mark it as updated (^U) right away. Countless hours have been wasted changing screens without marking them for writing, and then having that memory reused -- the changes are lost. You can leave the editor with ESC-Q (for Quit). Be sure to do an ESC-F before you do, if you want to save what you have. You can never be sure if your changes will get written out if you don't. (Yes, I know this is an oversimplification.) You can leave FORTH by typing BYE. XFINAL NOTES FROM THE AUTHOR: ----- ----- ---- --- ------ XFORTH runs slowly. If anybody can make this turkey fly (without, of course, rewriting it in assembly language), more power to you. It was done as a Junior project at Indiana University, but in fact I learned more about project management and programmer/systems staff interaction than I did about implementing FORTH (which I had done previously in 8080 assembly code). In that sense it was profitable, and, since it works, I see no reason not to distribute it to the network. If you have comments of praise, I can be reached through the summer of 1985 at the address below; after that, I MAY be at iuvax!apratt. Comments of another nature are not enthusiastically solicited, and I do NOT expect to upgrade this implementation AT ALL, EVER. Sorry, but there will be no "Version 2" for this baby. If I have left crucial points out of this documentation, and my (hopefully-well-commented) code doesn't provide the answer, I will reply and possibly improve this documentation. But for the most part, you're on your own. As a parting shot, the inner interpreter's "switch" statement could be replaced by an array of functions, using the variable p as an index. I was not that ambitious, and I don't know if it would be faster than the table-lookup which my C compiler generates. -- Allan Pratt APRATT.PA@XEROX.ARPA QUICK SUMMARY OF FILES (THERE IS A MESS OF THEM!) Makefile supposed to bring them all together b2l.c and b2l filter to convert block-files into line-files for editing l2b.c and l2b filter to convert line-files into block-files for FORTH common.h This is a header file with configuration and common information used by all C source files except lex.yy.c forth.h Header file with primitive numbers in it, among other things forth.c source code for the guts/support functions for the interpreter prims.h Header file with macro definitions for primitives prims.c source code for primitives too complex for macros The above four files, plus common.h, contribute to the executable "forth" nf.c source for the bootstrapper, which interprets the dictionary and generates an initial memory image for FORTH forth.lex lex input for lexical analyzer used by nf.c forth.lex.h header file used by lex.yy.c and nf.c lex.yy.c lex output, modified (look at the Makefile) The above four files, plus common.h, contribute to the executable "nf", the preprocessor. forth.block This is the (default) block-file used by FORTH for its editing- and load-screens forth.line This file usually resembles forth.block, but is in a format suitable for editing with emacs or vi: a header line, followed by sixteen lines of trailing-blank- truncated, newline-terminated text for each screen. If one of forth.line and forth.block is out of date with respect to the other, you can bring it back up to date with b2l or l2b, above. forth.dict This is a human-readable, pseudo-FORTH dictionary which nf uses to generate the initial environment. It contains forward references and no higher structures like DO..LOOP forth.core This is one output of nf: it contains the core image for the FORTH environment, as dictated by common.h and forth.dict forth.newcore This is the file for holding core images saved with the (SAVE) primitive. If FORTH is started with "-c forth.newcore", the image is restarted right where it left off. forth.map This is another output of nf: it contains a human-readable dump of the forth environment which nf generated. This can be compared with the post-mortem dump which FORTH generates in forth.dump in certain cases. //go.sysin dd *