DOCS added a ton of comments for Lessons 2 and 3.

2018-04-25 20:42:39 -07:00 · 2018-04-25 20:42:39 -07:00 · 323e46ade1
parent aaed91113c
commit 323e46ade1
5 changed files with 139 additions and 9 deletions
--- a/.gitignore
+++ b/.gitignore
@ -2,3 +2,7 @@
 *~
 \#*
 .\#*
 hello32
 hello64
 counted-hello32
 counted-hello64
--- a/21
+++ b/21
@ -1,24 +1,27 @@
 .PHONY: help default
 default: help
 help:                 ## Print this help message
 	@M=$$(perl -ne 'm/^((\w|-)*):.*##/ && print length($$1)."\n"' Makefile | \
 		sort -nr | head -1) && \
 		perl -ne "m/^((\w|-)*):.*##\s*(.*)/ && print(sprintf(\"%s: %s\t%s\n\", \$$1, \" \"x($$M-length(\$$1)), \$$3))" Makefile
-hello32: hello32.s    ## Build the 32 bit version of Project 1
+hello32: hello32.s    ## Build the 32 bit version of Project 2
 	nasm -f elf hello32.s
 	ld -m elf_i386 -o hello32 hello32.o
-hello64: hello64.s    ## Build the 32 bit version of Project 1
+hello64: hello64.s    ## Build the 32 bit version of Project 2
 	nasm -f elf64 hello64.s
 	ld -o hello64 hello64.o
-hello-strlen32: hello-strlen32.s    ## Build the 32 bit version of Project 1
+counted-hello32: counted-hello32.s    ## Build the 32 bit version of Project 3
-	nasm -f elf hello-strlen32.s
+	nasm -f elf counted-hello32.s
-	ld -m elf_i386 -o hello-strlen32 hello-strlen32.o
+	ld -m elf_i386 -o counted-hello32 counted-hello32.o
-hello-strlen64: hello-strlen64.s    ## Build the 32 bit version of Project 1
+counted-hello64: counted-hello64.s    ## Build the 32 bit version of Project 3
-	nasm -f elf64 hello-strlen64.s
+	nasm -f elf64 counted-hello64.s
-	ld -o hello-strlen64 hello-strlen64.o
+	ld -o counted-hello64 counted-hello64.o
 clean:                ## Delete all built and intermediate features
-	rm -f hello32 hello64 hello-strlen32 hello-strlen64 *.o
+	rm -f hello32 hello64 counted-hello32 counted-hello64 *.o
--- a/README.md
+++ b/README.md
@ -0,0 +1,123 @@
 # Doing Things with Assembly Language and NASM
 This is just a list of short assembly language programs that I used to
 reboot my assembly language skills, this time in X86 and X86_64.  ("This
 time" because the last time I wrote assembly language I was writing for
 the Motorola 68000 line.)
 The tutorial I based this off of is at http://asmtutor.com/
 ## Getting Started
 There's a Makefile.  It has a nice help¹.
 ## Lesson 2
 There is no Lesson 1.  Okay, there *is*, but I didn't do it.  While I
 was looking around for tutorials I found a couple that taught different
 things, and one of the things they all agreed on was a proper exit
 command.  Since all Lesson 2 does is add that command, that's what I
 did.
 I also used a few NASM features not in the ASM Tutorial.  The `%define`
 Nasm preprocessor allows you to provide named constants, and I've used
 them here.
 The syntax `equ $-msg` basically means "The address from HERE, the first
 byte of this named data segment, minus the address named," which puts
 into `len` the length of the string.  It only works because `len` is the
 immediate next data segment.
 ### Differences between the 32 and 64 bit versions.
 The biggest difference that I see is that the Syscalls have all be
 redefined.  "Write" and "exit" were 4 & 1 in 32-bit Linux, but 1 & 60 int
 64-bit, respectively.  The ASM Tutorial was 32-bit only, and used the
 first four registers.  When I ported it to the 64-bit version, the
 syscall for `write()` uses different registers.
 The 32 bit version uses `int 80h` to interrupt the kernel.  The 64 bit
 uses `syscall`.  The
 [Linux System Call Table](http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/)
 is handy here.
 ### Lessons
 So far, the assembly language programs have two `sections`: one for
 constant data, the other for the actual program.  Before either section
 there are macros and directives.  Right now the only macros I'm using
 define constants.
 We aren't allocating any memory that's not in a `.data` segment.  And
 that's okay.  Everything is happening inside registers.  The CPU has 16
 of them.  Some of them have side-effects and optimizations, and others
 are *required* for some operations.  The AX register, for example, used
 to be the destination for mathematical operations.  The X86_64 CPU
 architecture is built around stack-based operations, and the command
 `push reg` will push a value (either a register or memory contents) onto
 the stack pointed to by the SP and BP registers, *and then increment
 those registers*.  So, you know, there are quirks to memorize.
 The Makefile contains compiling and linking instructions.  They're
 different for 32 and 64 bit programs, and learning those differences
 would be useful if you intend to write a lot of assembly language.
 ## Lesson 3
 Lesson 3 is a lot like lesson 2, only instead of knowing the length of
 the string, we're going to calculate it, using the NULL value as our
 end-of-string marker.  This also introduces comparison and jump
 commands!
 The question embedded in my comment in the source file is legitimate.
 At the time, I didn't know if `sub` sets things like the "is zero" flag
 when two values are the same value, the way `cmp` does.  The
 [Intel X86 Manual](https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf)
 (**Warning**: PDF, and very big!) doesn't say they do, and the contents
 of those flags should probably not be regarded as robust or reliable
 after a `sub` operation.
 With the 64-bit version, rather than blindly copy the ax/bx/cx/dx
 sequence of registers, I deliberately chose to use `RSI` (the Source
 Index Register) for my data source.  While the first eight registers are
 considered "general purpose," RSI is (somewhat) optimized to read data
 out of memory and its use is a signal to the CPU's predictive cache.  I
 don't know if that's any use to me yet, but it's something I'm aware of
 and I might someday have a use for it.
 ### Memory addressing syntax
 Lesson three also introduces the `cmp byte [rax], 0` syntax, which does
 a few things.  First, there are a *crazy* number of opcodes for the X86
 architecture, and `cmp` is only one-half.  An opcode is the numeric
 representation of an instruction to the chip; it's bit sequence
 literally instructs which nanoscopic wires in the chip to light up to
 perform an operation.  Not including the wild stuff, an Intel chip has
 something like 1,900 opcodes.  But you'll only need to know about 20 of
 them.
 The `[rax]` syntax tells nasm to generate the `cmp` opcode for which the
 first operand is an address in memory; `cmp` will fetch the thing at
 that address first before doing the comparison.  (I'm not sure if this
 occupies another register or what.  The manual doesn't say!)  The `byte`
 command says that the comparison is on a byte-by-byte basis, so that's a
 *different* opcode, but I suspect nasm makes it easy to remember which
 is which with mnemonics.  You don't need to know different ASM commands
 for "compare two registers," "compare a memory location with a
 register," and "compare a memory location with a constant," because
 nasm's syntax makes it easy to understand those operations.
 What I do know is that the one thing you *can't* do is compare two
 memory locations directly.  `cmp` works with two registers, or a
 register and a memory location, or a register and a constant, but no
 other combination.
 More to come... I hope...
 ---
 Footnotes!
 ¹ I firmly believe that no command, typed blindy, should modify the
 contents of your hard drive.  Make takes target arguments, and you
 should specify the targets you want built.  So `make` by itself only
 issues help.
--- a/counted-hello32.s
+++ b/counted-hello32.s
--- a/counted-hello64.s
+++ b/counted-hello64.s