Received: from watserv1.uwaterloo.ca (watserv1.waterloo.edu) by karazm.math.UH.EDU with SMTP id AA00339
  (5.65c/IDA-1.4.4 for <glove-list@karazm.math.uh.edu>); Fri, 18 Oct 1991 17:06:10 -0500
Received: by watserv1.uwaterloo.ca
	id <AA10961>; Fri, 18 Oct 91 18:02:00 -0400
Date: Fri, 18 Oct 91 18:02:00 -0400
From: Dave Stampe-Psy+Eng <dstamp@watserv1.uwaterloo.ca>
Message-Id: <9110182202.AA10961@watserv1.uwaterloo.ca>
To: glove-list@karazm.math.uh.edu
|> One reason that Jez San et al. write such fast code is that they count
|> almost every instruction cycle in their assembler routines, aiming to end
|> up with as few as possible. Writing at such a low level of course means
|> that they have to know a lot about how the computer and its processor
|> work. High-level languages and computer operating systems tend to try to
|> mask off the computer; they also try to be flexible, which is not always
|> necessary.

>This debate went through the Amiga newsgroups recently, and I will quickly
>point out the conclusion reached there: many good C compilers, with the
>optimizer turned on, come close enough to a good assembler programmer as
>to be indistinguishable. This came out after assembly programmers posted
>an assembly algorithm as they would do it, and the C programmers ran the
>same thing through their C compilers, and the two were nearly identical!

Was this graphics code, or something else?  Was it programming the Amiga
hardware, or was it the tight loops we know and "love"?

>There was a small cycle-count difference (something like 2 cycles in
>a 100-cycle loop) in that particular loop, so you are right -- assembly
>programmers can get every last cycle, but they're not winning by much.
>And as has already been pointed out here, the high-level programmer can 
>spend more time refining the algorithm, and thus maybe execute the
>loop fewer times. Conclusion: Assembly programmers make each iteration
>of the loop slightly faster, but high-level programmers iterate it
>less.
>
>-- Greg Stelmack (stelmack@eggo.csee.usf.edu)

I think I see the problem here.  The difference lies in the IBM and Amiga
designs.  Any real graphics work on the IBM PC requires many I/O-space
accesses through inportb()- and outportb()-type routines.  Since most of the
available C compilers do not replace these calls with inline IN/OUT
instructions, the result is much slower than equivalent assembly code.  Also,
many of the most useful 80x86 instructions (such as LOOP or REP STOSB) are not
used by compilers.  Again, assembly code is the only solution.
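To make the I/O-space point concrete, here is a rough sketch in C, assuming one of the 16-colour planar VGA modes, where the Sequencer's Map Mask register (index 0x02, reached through ports 0x3C4/0x3C5) selects which bit planes a memory write lands in. fake_outportb() is a hypothetical stand-in for the real port routine that only updates a simulated register set, and the planes[] arrays simulate video memory, so the sketch runs on any machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SEQ_INDEX 0x3C4  /* VGA Sequencer index port */
#define SEQ_DATA  0x3C5  /* VGA Sequencer data port */
#define MAP_MASK  0x02   /* Sequencer register index: Map Mask */
#define SPAN      80     /* bytes per scan line in a 640-pixel planar mode */

/* Simulated hardware state: four bit planes plus the sequencer registers. */
static uint8_t planes[4][SPAN];
static uint8_t seq_index_reg = 0;
static uint8_t map_mask_reg = 0x0F;
static unsigned port_writes = 0;

/* Hypothetical stand-in for outportb(): on real hardware this would be one
   OUT instruction; here it just updates the simulated sequencer. */
static void fake_outportb(uint16_t port, uint8_t value)
{
    port_writes++;
    if (port == SEQ_INDEX)
        seq_index_reg = value;
    else if (port == SEQ_DATA && seq_index_reg == MAP_MASK)
        map_mask_reg = value;
}

/* A CPU write to video memory reaches every plane enabled in Map Mask. */
static void vram_write(size_t offset, uint8_t value)
{
    for (int p = 0; p < 4; p++)
        if (map_mask_reg & (1u << p))
            planes[p][offset] = value;
}

/* Fill one scan line with a 4-bit colour, one plane at a time.
   Each plane costs two port writes before any pixels are drawn. */
static void planar_fill_line(uint8_t color)
{
    assert(color < 16);
    for (int p = 0; p < 4; p++) {
        fake_outportb(SEQ_INDEX, MAP_MASK);
        fake_outportb(SEQ_DATA, (uint8_t)(1u << p));
        uint8_t fill = (color & (1u << p)) ? 0xFF : 0x00;
        for (size_t i = 0; i < SPAN; i++)
            vram_write(i, fill);
    }
}
```

Calling planar_fill_line(10) leaves planes 1 and 3 filled with 0xFF and costs eight port writes for a single line. If the compiler turns each of those into a function call instead of an OUT instruction, that overhead adds up quickly across a whole polygon.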

Conversely, the Amiga's 68000 processor accesses its graphics hardware
registers as memory-mapped locations, so C tends to work well there.  The 68K
has very few special instructions that are good for graphics.  Also, on the
Amiga, the graphics hardware reduces the need for tight, fast graphics loops.
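For contrast, a memory-mapped register is just a pointer dereference in C. The sketch below assumes the Amiga custom chip layout (registers starting at 0xDFF000, COLOR00 at offset 0x180, 12-bit $0RGB colour values); the fake_custom[] array is a hypothetical stand-in for that register block so the code runs anywhere, and set_color()/custom_reg() are illustrative names, not an OS API.

```c
#include <assert.h>
#include <stdint.h>

/* On a real Amiga, C code could simply write
       *(volatile uint16_t *)0xDFF180 = 0x0F00;
   Here a plain array stands in for the custom chip register block. */
#define CUSTOM_SIZE 0x200   /* bytes of register space simulated */
#define COLOR00_OFS 0x180   /* byte offset of the first palette register */

static uint16_t fake_custom[CUSTOM_SIZE / 2];

/* Pointer to the 16-bit register at byte offset `ofs`.
   `volatile` makes the compiler perform every access exactly as written. */
static volatile uint16_t *custom_reg(volatile uint16_t *base, unsigned ofs)
{
    assert(ofs % 2 == 0 && ofs < CUSTOM_SIZE);
    return base + ofs / 2;
}

/* Set one of the 32 palette registers to a 12-bit RGB value.
   This is an ordinary store: no special I/O instruction or library call. */
static void set_color(volatile uint16_t *base, unsigned index, uint16_t rgb)
{
    assert(index < 32);
    *custom_reg(base, COLOR00_OFS + 2 * index) = rgb & 0x0FFF;
}
```

Because the store compiles to a plain MOVE, a C compiler has little to lose against hand-written assembly here, which is the situation described above.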

When it comes to higher-level stuff like handling data structures and
calculating stuff, I agree that C is better than assembler.  My code
usually consists of 90% C and 10% embedded assembly code for the graphics
kernels.  I usually gain 300-1000% in the performance of the graphics
sections of my code by converting selected sections to assembler.
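As a rough picture of that split, here is a sketch assuming 320x200 256-colour mode (mode 13h, one byte per pixel). The names hline() and fill_rect() are hypothetical; fill_rect() is the kind of code that stays in C, while the contiguous byte run inside hline() is the kind of kernel that gets hand-converted, for example to REP STOSB/STOSW. Any 64000-byte buffer can stand in for video memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCREEN_W 320
#define SCREEN_H 200

/* Fill pixels x0..x1 (inclusive) on scan line y of a 320x200,
   one-byte-per-pixel framebuffer, clipping to the screen edges. */
static void hline(uint8_t *fb, int y, int x0, int x1, uint8_t color)
{
    if (x0 > x1) { int t = x0; x0 = x1; x1 = t; }
    if (y < 0 || y >= SCREEN_H || x1 < 0 || x0 >= SCREEN_W)
        return;                         /* entirely off screen */
    if (x0 < 0) x0 = 0;
    if (x1 >= SCREEN_W) x1 = SCREEN_W - 1;
    /* The inner kernel: one contiguous run of bytes. In assembly this is
       typically a single REP STOSB (or STOSW for even-length runs). */
    memset(fb + (size_t)y * SCREEN_W + x0, color, (size_t)(x1 - x0 + 1));
}

/* Higher-level code stays in C: fill a rectangle one span at a time. */
static void fill_rect(uint8_t *fb, int x0, int y0, int x1, int y1, uint8_t color)
{
    assert(y0 <= y1);
    for (int y = y0; y <= y1; y++)
        hline(fb, y, x0, x1, color);
}
```

Only hline() would need rewriting to get the speed-up; the callers do not change.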

The low-level VR thread, which is also running on the powerglove mailing
list, seems to show that there is at LEAST a 50-50 split between graphics and
database manipulation, sorting, etc. in a VR system.  This indicates that C
code is going to be VERY useful.  Conversely, the need for machine-specific
assembly code in the actual polygon-rendering drivers is especially great on
IBM PC machines (see above).
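As an illustration of the sorting side of that split, here is a minimal sketch of back-to-front depth sorting (the painter's algorithm, chosen here only as an example, not necessarily what anyone on the thread uses). The Poly record and its avg_z field are hypothetical; the point is that this bookkeeping is a few lines of portable C around the standard qsort(), while only the per-polygon fill would go to the machine-specific drivers.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical polygon record: an id and the average depth of its
   vertices (larger z = farther from the viewer). */
typedef struct {
    int  id;
    long avg_z;
} Poly;

/* qsort comparator: farthest polygon first (descending avg_z). */
static int farther_first(const void *a, const void *b)
{
    long za = ((const Poly *)a)->avg_z;
    long zb = ((const Poly *)b)->avg_z;
    return (za < zb) - (za > zb);
}

/* Order polygons so the farthest are drawn first and nearer ones
   overwrite them when rendered in array order. */
static void depth_sort(Poly *polys, size_t n)
{
    assert(polys != NULL || n == 0);
    qsort(polys, n, sizeof *polys, farther_first);
}
```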

- Dave Stampe