Relay-Version: version B 2.10.3 4.3bsd-beta 6/6/85; site seismo.CSS.GOV Posting-Version: version B 2.10.2 9/3/84; site panda.UUCP Path: seismo!harvard!talcott!panda!sources-request From: sources-request@panda.UUCP Newsgroups: mod.sources Subject: GaTech Sendmail (Part 1 of 3) Message-ID: <1000@panda.UUCP> Date: 14 Oct 85 13:20:44 GMT Date-Received: 14 Oct 85 17:04:49 GMT Sender: jpn@panda.UUCP Lines: 748 Approved: jpn@panda.UUCP Mod.sources: Volume 3, Issue 23 Submitted by: Gene Spafford #! /bin/sh # Make a new directory for these sources, cd to it, and run kits 1 thru 3 # through sh. When all 3 kits have been run, read README. echo "This is GaTech Sendmail kit 1 (of 3). If kit 1 is complete, the line" echo '"'"End of kit 1 (of 3)"'" will echo at the end.' echo "" export PATH || (echo "You didn't use sh, you clunch." ; kill $$) echo Extracting PATCHES cat >PATCHES <<'!STUFFY!FUNK!' From: topaz!hedrick 17 Sept 1985 Here are some patches to SENDMAIL. 1) They allow us to handle an Arpanet host table containing names like topaz.rutgers.edu. The original version of sendmail, as well as sendmail.cf, assumed that all host names end in .arpa. Changes may also be needed in UUCP and net news if your site name does not end in .ARPA. 2) They allow you to translate all host names to their canonical forms, i.e. replace nicknames with the primary name. This is now considered the correct thing to do on the Arpanet. This adds an operator, $%, for doing the mapping. WARNING: $% uses a single fixed location for the translated address. Thus only one translated address may be active at a time. This is fine for foo@bar, but you could not normalize all of the host names in an address like @foo,@bar:bar@gorp. In practice I don't believe this is a problem. 3) They allow you to make processing depend upon whether mail arrived via UUCP or TCP. In addition to the patches here, you will need to modify rmail, or whatever program calls sendmail to delivery UUCP mail. It should call sendmail with an extra argument, -pUUCP. This adds an operator, $&, for evaluating macros at runtime instead of when sendmail.cf is loaded. It adds an option to sendmail, -p, to specify the protocol name. It implements $r, which is documented in the source but never actually implemented. It is the name of the protocol via which mail arrived. $r is set from the -p option, or by the SMTP daemon code, which automatically sets it to the value TCP. The code for reading and writing queue files saves the protocol in a line beginning with O. In case a message gets queued, this allows us to remember which way it arrived. Originally, SENDMAIL would tack .ARPA onto your host name. This is a bad idea, since not all host names end in .ARPA. The assumption was that your rc file said hostname foo and this would turn it into foo.arpa. We now do hostname foo.rutgers.edu and do not want .ARPA tacked on the end. *** daemon.c.ORIG Fri Feb 10 06:59:21 1984 --- daemon.c Sat Aug 3 07:49:46 1985 *************** *** 183,189 /* determine host name */ hp = gethostbyaddr(&otherend.sin_addr, sizeof otherend.sin_addr, AF_INET); if (hp != NULL) ! (void) sprintf(buf, "%s.ARPA", hp->h_name); else /* this should produce a dotted quad */ (void) sprintf(buf, "%lx", otherend.sin_addr.s_addr); --- 185,191 ----- /* determine host name */ hp = gethostbyaddr(&otherend.sin_addr, sizeof otherend.sin_addr, AF_INET); if (hp != NULL) ! (void) sprintf(buf, "%s", hp->h_name); else /* this should produce a dotted quad */ (void) sprintf(buf, "%lx", otherend.sin_addr.s_addr); *************** The following patches add two operators: $% and $&. $% is used to map host names into their canonical forms. Suppose a user sends mail to RED. This should be turned into the official name, RED.RUTGERS.EDU, in the headers. The fact that Unix does not do that causes problems to a number of mailers. It is a violation of the standards. $%n behaves like $n, i.e. it is replaced with the nth thing on the left hand side. However before doing the replacement, the thing is looked up in /etc/hosts. If it is a host name or nickname, the official form of the host name is used instead. $& is part of a change to allow us to differentiate between UUCP and TCP mail. What does foo!bar@baz mean? If the mail arrived via UUCP, it is probably foo!. If it arrived via TCP, it is by definition @baz. As we are a UUCP/TCP gateway, it is important for us to get these things right. In order to do so, we have done two things: - implemented $r. This is documented as being the name of the protocol via which the message arrived. However this was not implemented. I implement it as follows: - rmail calls sendmail with an option -pUUCP - the sendmail daemon code sets TCP automatically - everything else should be local mail, and does not set any value in this macro. - when a queue entry is written, the protocol name must be written out - implemented a new operator $& to allow us to use the value of $r in productions. You can't just use $r on the right side of a production. It will be evaluated when the freeze file is made. So $&r is equivalent to $r, but is evaluated when the rule is executed. This allows us to write rules that differentiate. Here is the part of sendmail.cf that uses it. It checks for foo!bar, but only if the message arrived via UUCP. Note the use of $&r to put the protocol name into the address, so we can match on it. R$-!$* $:<$&r>$1!$2 check arriving protocol R$-^$* $:<$&r>$1^$2 both syntaxes R$-!$* $@$>7$2<@$1.UUCP> if via UUCP, resolve R$-^$* $@$>7$2<@$1.UUCP> if via UUCP, resolve R<$*>$* $2 undo kludge *** main.c.ORIG Fri Feb 10 11:17:52 1984 --- main.c Mon Aug 26 04:10:51 1985 *************** *** 320,325 OpMode = MD_INITALIAS; break; # endif DBM } } --- 320,328 ----- OpMode = MD_INITALIAS; break; # endif DBM + case 'p': /* set protocol used to receive */ + define('r', &p[2], CurEnv); + break; } } *************** *** 538,543 /* at this point we are in a child: reset state */ OpMode = MD_SMTP; (void) newenvelope(CurEnv); openxscript(CurEnv); #endif DAEMON } --- 541,547 ----- /* at this point we are in a child: reset state */ OpMode = MD_SMTP; (void) newenvelope(CurEnv); + define('r',"TCP",CurEnv); openxscript(CurEnv); #endif DAEMON } *************** *** 701,706 /* and finally the conditional operations */ '?', CONDIF, '|', CONDELSE, '.', CONDFI, '\0' }; --- 705,716 ----- /* and finally the conditional operations */ '?', CONDIF, '|', CONDELSE, '.', CONDFI, + + /* now the normalization operator */ + '%', NORMREPL, + + /* and run-time macro expansion */ + '&', MACVALUE, '\0' }; *** parseaddr.c.ORIG Fri Feb 10 06:59:12 1984 --- parseaddr.c Mon Aug 26 04:44:15 1985 *************** *** 394,400 expand("$o", buf, &buf[sizeof buf - 1], CurEnv); (void) strcat(buf, DELIMCHARS); } ! if (c == MATCHCLASS || c == MATCHREPL || c == MATCHNCLASS) return (ONE); if (c == '"') return (QST); --- 394,401 ----- expand("$o", buf, &buf[sizeof buf - 1], CurEnv); (void) strcat(buf, DELIMCHARS); } ! if (c == MATCHCLASS || c == MATCHREPL || c == MATCHNCLASS || ! c == MACVALUE || c == NORMREPL ) return (ONE); if (c == '"') return (QST); *************** *** 446,451 # define MAXMATCH 9 /* max params per rewrite */ rewrite(pvp, ruleset) char **pvp; --- 447,453 ----- # define MAXMATCH 9 /* max params per rewrite */ + char hostbuf[512]; rewrite(pvp, ruleset) *************** *** 447,452 # define MAXMATCH 9 /* max params per rewrite */ rewrite(pvp, ruleset) char **pvp; int ruleset; --- 449,455 ----- char hostbuf[512]; + rewrite(pvp, ruleset) char **pvp; int ruleset; *************** *** 626,631 /* substitute */ for (avp = npvp; *rvp != NULL; rvp++) { register struct match *m; register char **pp; --- 629,635 ----- /* substitute */ for (avp = npvp; *rvp != NULL; rvp++) { + #include register struct match *m; register char **pp; char **oldavp,**tavp; *************** *** 628,633 { register struct match *m; register char **pp; rp = *rvp; if (*rp != MATCHREPL) --- 632,640 ----- #include register struct match *m; register char **pp; + char **oldavp,**tavp; + struct hostent *hostpt; + extern char *macvalue(); rp = *rvp; if ((*rp != MATCHREPL) && (*rp != NORMREPL)) *************** *** 630,636 register char **pp; rp = *rvp; ! if (*rp != MATCHREPL) { if (avp >= &npvp[MAXATOM]) { --- 637,643 ----- extern char *macvalue(); rp = *rvp; ! if ((*rp != MATCHREPL) && (*rp != NORMREPL)) { if (avp >= &npvp[MAXATOM]) { *************** *** 637,643 syserr("rewrite: expansion too long"); return; } ! *avp++ = rp; continue; } --- 644,655 ----- syserr("rewrite: expansion too long"); return; } ! if (*rp == MACVALUE) { ! if (macvalue(rp[1],CurEnv)) ! *avp++ = macvalue(rp[1],CurEnv); ! } ! else ! *avp++ = rp; continue; } *************** *** 642,647 } /* substitute from LHS */ m = &mlist[rp[1] - '1']; # ifdef DEBUG if (tTd(21, 15)) --- 654,660 ----- } /* substitute from LHS */ + m = &mlist[rp[1] - '1']; # ifdef DEBUG if (tTd(21, 15)) *************** *** 658,663 } # endif DEBUG pp = m->first; while (pp <= m->last) { if (avp >= &npvp[MAXATOM]) --- 671,677 ----- } # endif DEBUG pp = m->first; + oldavp = avp; while (pp <= m->last) { if (avp >= &npvp[MAXATOM]) *************** *** 666,671 return; } *avp++ = *pp++; } } *avp++ = NULL; --- 680,695 ----- return; } *avp++ = *pp++; + } + if (*rp == NORMREPL) { + hostbuf[0] = '\0'; + for (tavp = oldavp; tavp < avp; tavp++) + strcat(hostbuf,*tavp); + hostpt = gethostbyname(hostbuf); + if (hostpt) { + *oldavp = hostpt -> h_name; + avp = oldavp + 1; + } } } *avp++ = NULL; *** queue.c.ORIG Fri Feb 10 06:59:20 1984 --- queue.c Mon Aug 26 04:45:19 1985 *************** *** 105,110 /* output creation time */ fprintf(tfp, "T%ld\n", e->e_ctime); /* output name of data file */ fprintf(tfp, "D%s\n", e->e_df); --- 105,115 ----- /* output creation time */ fprintf(tfp, "T%ld\n", e->e_ctime); + /* output protocol */ + if (macvalue('r',e)) { + fprintf(tfp, "O%s\n", macvalue('r',e)); + } + /* output name of data file */ fprintf(tfp, "D%s\n", e->e_df); *************** *** 565,571 /* ** Open the file created by queueup. */ ! p = queuename(e, 'q'); f = fopen(p, "r"); if (f == NULL) --- 570,576 ----- /* ** Open the file created by queueup. */ ! p = queuename(e, 'q'); f = fopen(p, "r"); if (f == NULL) *************** *** 575,580 } FileName = p; LineNumber = 0; /* ** Read and process the file. --- 580,586 ----- } FileName = p; LineNumber = 0; + define('r',NULL,e); /* ** Read and process the file. *************** *** 595,600 (void) chompheader(&buf[1], FALSE); break; case 'M': /* message */ e->e_message = newstr(&buf[1]); break; --- 601,610 ----- (void) chompheader(&buf[1], FALSE); break; + case 'O': + define('r',newstr(&buf[1]),e); + break; + case 'M': /* message */ e->e_message = newstr(&buf[1]); break; *************** *** 628,634 break; } } - FileName = NULL; } /* --- 638,643 ----- break; } } FileName = NULL; } /* *** sendmail.h.ORIG Fri Feb 10 06:59:10 1984 --- sendmail.h Sat Aug 3 05:06:53 1985 *************** *** 290,295 # define CONDIF '\031' /* conditional if-then */ # define CONDELSE '\032' /* conditional else */ # define CONDFI '\033' /* conditional fi */ /* ** Symbol table definitions */ --- 290,300 ----- # define CONDIF '\031' /* conditional if-then */ # define CONDELSE '\032' /* conditional else */ # define CONDFI '\033' /* conditional fi */ + + /* normalize Internet address operator */ + # define NORMREPL '\034' /* normalized host replacement */ + # define MACVALUE '\035' /* run-time macro value */ + /* ** Symbol table definitions */ !STUFFY!FUNK! echo Extracting README cat >README <<'!STUFFY!FUNK!' The files in this package build the sendmail.cf files for machines at Georgia Tech. They are derived from the standard BSD 4.2 sendmail files, and form a set of sendmail files we received along with PMDF from the folks at CSNet. The CSNet set of files were put together by Ray Essick (essick@uiucdcs) and were a great help in putting this package together. Many of the individual rules were derived from various sources posted to the Usenet net.mail and net.sources newsgroups. Credit is also due Rick Adams at seismo.css.gov for his continued comments and help in debugging some of headers, and to Stuart Stirling at emory.CSNET for help with some of the initial debugging. Contained in this package are the following: 1) MANIFEST which lists each file in the package, along with a one line description of what it does; 2) KEY which describes macros used in the sendmail files; 3) source and Makefiles for building our various sendmail.cf files; 4) overview.ms, a paper describing how mail gets routed when mailed to or through gatech (nroff -ms overview.ms | more); 5) uumail.c, the source to our rerouting mailer ("pathalias", which is used to build the mailer database, has been posted multiple times to the net and is not included). See the comments at the beginning of the program before your try to install it; 6) PATCHES, which is a set of changes to the sendmail code, developed at Rutgers, and needed to be implemented to make these sendmail rules work optimally. Make sure to read about the corresponding change to "rmail" described in the comments. 7) Files, which is a brief list of the data files which are present to drive the sendmail on "gatech". The remainder of this file is an overview of the environment in which these files were developed and are used. The machines using "sendmail" at Georgia Tech fall into 3 basic categories: gateway ("gatech"), department machines on a common ethernet ("stratus", "nimbus", et.al.), and campus machines not on the same Ethernet as "gatech" (only "gt-cmmsr" so far). We have at least one Ethernet loop on campus which is separate from the ICS loop ("gtss", "gtqo", et. al.). "gatech" is intended to be the campus gateway machine. It is on the ICS common ethernet, has over 50 major uucp contacts known to the outside world, has a CSNet connection, a number of direct asynchronous links, and a set of rotored phone lines. Sometime in the not-too-distant future, it is possible that "gatech" will also be on the Arpanet and/or Bitnet. It is also the "traditional" mail address known to most outsiders. Thus, the machine is on 3 distinct networks, and has to be configured with the possibility of connecting to at least 1 other major international network in the near future. The department machines currently are comprised of the Clouds research machines "gt-stratus", "gt-cirrus", and "gt-nimbus", and the ICS/OCS Pyramid "gitpyr". They are connected via a common ethernet link, and they all can speak TCP at each other. Other machines are expected to be added to this group before long. Almost all of these machines have a single phone line and/or direct links for uucp to machines that can't speak TCP. (We are trying to keep a consistant naming scheme in use, and thus all campus machines will henceforth be named with the prefix "gt-" in the name. There are a few machines around which had established UUCP networks connections with different names before the decision to use this standard came into being, and their names will probably not change (e.g., "gitpyr").) The third class of machine on campus runs sendmail but has no TCP connection to the others because our Net/One bridge won't pass TCP packets across the backbone. These sites use a phone line or Net/One virtual circuit to connect to "gatech" and some of the other systems. Some of these machines may talk to each other via Ethernet, but there is no common connection amongst all of them. The basic idea in our configuration is for users to be able to use addresses of the forms: site!user, site!site2!user, user@site.UUCP user@site.CSNET, user@site.ARPA, user@site.MAILNET, user@site.BITNET user@site.DEC and the local case: user@site.GTNET, site:user, user%site We'd also like to be able to use just "user@site" and let the mailer figure it out. Here's how my sendmail files accomplish that: All of the internal machines are simple: they merely canonicalize the address according to standard rule, look to see if it is a GTNET host that they know and send the letter straight to that host. Local letters are handled appropriately. Any other address which looks like a network address is sent to the relay site, "gatech", except that each machine can have a small number of direct UUCP connections to outside machines. Ruleset zero for these systems check for these UUCP connections. Note that we use a file (/usr/lib/mail/uucp.local) to hold the UUCP connection list so that we don't have to play around with the actual sendmail configuration if we change contacts. The only thing one has to do to update the list of UUCP connections available on that host is update the file. If you run with a frozen sendmail.cf, you also have to type "/usr/lib/sendmail -bz". The "gatech" machine is the complex one. Any address that the internal machines are unable to handle gets bounced to this machine. The "gatech" machine speaks to a plethora of people. "gatech" should be able to recognize and route any (valid) address. The "gatech" machine compares UUCP addresses against a file similarl to the way the other machines handle them. Mail to the CSNET domain is sent to the PMDF mailer, which queues the letter for phone transmission to the CSnet-relay host. Mail to the ARPA domain, since we have no direct ARPA connection (yet), is handed to the PMDF mailer for transmission to the CSnet-relay, which is an ARPA host. Mail to the BITNET (IBM derivative) and MAILNET (through MIT-multics) machines are routed to the host defined by the $B and $M macros. Mail to the DEC E-net is routed to the site listed in the $E macro, currently "decwrl.arpa". Mail to the OZ network (Australia) is routed to munnari.uucp ($Z). Since we do not have connections to any of those networks, we instead append the address of a known gateway to the address forming something like: user@host.mailnet@mit-multics.arpa and then re-iterate through ruleset 0 to get from our machine to the gateway. Any address without a domain gets converted into an address of the form "user@site", and it makes an attempt to intuit the domain. This is done by checking (in order) the list of local sites, local uucp contacts (1 hop), CSNET, ARPA, BITNET, UUCP, and DEC E-net sites. In the event of a match, the proper domain name is appended to the address and we re-iterate through ruleset zero. This catches a fair number of missing domain problems and hasn't caused too much confusion about names in use in several domains. Finally, the "gatech" machine takes any left-over non-local names and returns them to the sender with a message about the fact that there is an unknown host/domain name in his letter. The UUCP mailer on "gatech" is a re-routing mailer. Any path or address handed to "uumail" gets an "optimal" path supplied to it. That is, the program steps through the address from ultimate destination to beginning, and if it knows a path to that site it will substitute that path for the remainder of the user-supplied path. For example, if the address "a!b!c!d!e!f" is provided to the mailer, and it knows how to get to site "d" via "z!y" (but no idea how to get to "e"), it will rewrite the path to be "z!y!d!e!f". The path database is built using "pathalias" on the uucp map data obtained from the Usenix machine ("gatech" is a regional repository of UUCP map information and gets near-synchronous copies of map updates). The ruleset along with "uumail" rewrites the "To:" field to look like "f@e.UUCP" since the user-supplied address-path is probably not the path that the mailer is going to use. Note that this means that "uumail.m4" and "uucpm.m4" are NOT identical in function -- beware if you decide to use one of them as a base in building your own files. "uucpm.m4" does not muck about with the "To:" field, nor does it reroute mail. This uucp mechanism allows any of our users to simply address mail to "foo@site.UUCP" and not worry about a path. It also optimizes message paths provided when answering news articles, and it allows our neighbors without mail routing software to address mail to "gatech!somesite!person" and expect the mail to get through, if possible. So far, no one has complained about not being able to force a particular path through our mailer. In the 6+ months this mechanism has been working, I've only discovered about 6 sites not registered with the map project and thus ccausing mail to them to fail. That's about it. If you find these useful in some way, great. If you should find bugs or possible enhancements to these files, I would greatly appreciate hearing about it. ---- Gene Spafford The Clouds Project, School of ICS, Georgia Tech, Atlanta GA 30332 CSNet: Spaf @ GATech ARPA: Spaf%GATech.CSNet @ CSNet-Relay.ARPA uucp: ...!{akgua,allegra,hplabs,ihnp4,linus,seismo,ulysses}!gatech!spaf !STUFFY!FUNK! echo Extracting Files cat >Files <<'!STUFFY!FUNK!' The following are the files that this particular sendmail configuration uses. These are in addition to the files /usr/lib/sendmail* and /usr/lib/aliases*. These files are normally set up in the directory /usr/lib/mail on each machine. The file uucp.local needs to be present on each machine, or else the declaration in gtbase.m4 changed so that the uucp neighbors are defined as a class macro in the individual .mc files. I chose the easy way out for here at GaTech. Each list is assumed to be one-per-line, unless otherwise stated. arpa.hosts -- a list of Arpa (EDU, DDN, etc.) hosts. This is derived from one of the hosts.txt file from the Arpa CIC. These also sometimes get posted to net.mail and/or sent out to CSNet sites from the CSNet NIC. bitnet.hosts -- a list of Bitnet hosts. This is obtained from some friendly Bitnet neighbor, or else taken from net.mail the next time it is posted there. csnet.hosts -- a list of CSNet hosts and acceptable nicknames. This can be obtained from the CSNet NIC. decnet.hosts -- a list of DEC network hostnames. This list is really a kludge and I am seriously considering removing it. DEC considers the actual list to be a company secret and so the list is by no means complete. I derived my copy from checking news paths and trading lists with other such watchers. Each entry is in the list twice -- once as "site" and once as "dec-site". This leads to some horrible name conflicts. mailnet.hosts -- a list of Mailnet hosts. I have no idea where you can get an up-to-date copy. Someone mailed me a copy a long time ago, and I haven't seen much mention since. I know next to nothing about it. uucp.hosts -- the output of pathalias when run on the uucp maps. This list is of all sites reachable via uucp, one per line, with a "sprintf" format string specifying the path on the same line, separated from the hostname by a tab. I sort mine, but I don't think it makes a difference. uucp.hosts.dir, uucp.hosts.pag -- the uucp.hosts file in dbm format after running makedb on them (makedb was part of the last pathalias release). uucp.local -- a list of all sites reachable directly via uucp from this site. That is, sites for which we have L.sys info. uumail -- the uumail program executable file. uumail.log -- a log of uucp mail passing through our site. Each line consists of the sender's address, followed by a tab, the destination address as it was presented to uumail from sendmail, a tab, and the actual path that uumail dispatched the mail along. This data can be analysed for traffic patterns, finding sites not listed in the uucp maps, and checking to see how well uumail is working. If the input to uumail generates an error, the third field is the error message generated by uumail and sent back to the "sender". !STUFFY!FUNK! echo "" echo "End of kit 1 (of 3)" cat /dev/null >kit1isdone config=true for iskit in 1 2 3; do if test -f kit${iskit}isdone; then echo "You have run kit ${iskit}." else echo "You still need to run kit ${iskit}." config=false fi done case $config in true) echo "You have run all your kits. Please read README." ;; esac : I do not append .signature, but someone might mail this. exit