DOCS added a ton of comments for Lessons 2 and 3.
This commit is contained in:
parent
aaed91113c
commit
323e46ade1
|
@ -2,3 +2,7 @@
|
|||
*~
|
||||
\#*
|
||||
.\#*
|
||||
hello32
|
||||
hello64
|
||||
counted-hello32
|
||||
counted-hello64
|
||||
|
|
21
Makefile
|
@ -1,24 +1,27 @@
|
|||
.PHONY: help default
|
||||
|
||||
default: help
|
||||
|
||||
help: ## Print this help message
|
||||
@M=$$(perl -ne 'm/^((\w|-)*):.*##/ && print length($$1)."\n"' Makefile | \
|
||||
sort -nr | head -1) && \
|
||||
perl -ne "m/^((\w|-)*):.*##\s*(.*)/ && print(sprintf(\"%s: %s\t%s\n\", \$$1, \" \"x($$M-length(\$$1)), \$$3))" Makefile
|
||||
|
||||
hello32: hello32.s ## Build the 32 bit version of Project 1
|
||||
hello32: hello32.s ## Build the 32 bit version of Project 2
|
||||
nasm -f elf hello32.s
|
||||
ld -m elf_i386 -o hello32 hello32.o
|
||||
|
||||
hello64: hello64.s ## Build the 32 bit version of Project 1
|
||||
hello64: hello64.s ## Build the 64 bit version of Project 2
|
||||
nasm -f elf64 hello64.s
|
||||
ld -o hello64 hello64.o
|
||||
|
||||
hello-strlen32: hello-strlen32.s ## Build the 32 bit version of Project 1
|
||||
nasm -f elf hello-strlen32.s
|
||||
ld -m elf_i386 -o hello-strlen32 hello-strlen32.o
|
||||
counted-hello32: counted-hello32.s ## Build the 32 bit version of Project 3
|
||||
nasm -f elf counted-hello32.s
|
||||
ld -m elf_i386 -o counted-hello32 counted-hello32.o
|
||||
|
||||
hello-strlen64: hello-strlen64.s ## Build the 32 bit version of Project 1
|
||||
nasm -f elf64 hello-strlen64.s
|
||||
ld -o hello-strlen64 hello-strlen64.o
|
||||
counted-hello64: counted-hello64.s ## Build the 64 bit version of Project 3
|
||||
nasm -f elf64 counted-hello64.s
|
||||
ld -o counted-hello64 counted-hello64.o
|
||||
|
||||
clean: ## Delete all built and intermediate files
|
||||
rm -f hello32 hello64 hello-strlen32 hello-strlen64 *.o
|
||||
rm -f hello32 hello64 counted-hello32 counted-hello64 *.o
|
||||
|
|
|
@ -0,0 +1,123 @@
|
|||
# Doing Things with Assembly Language and NASM
|
||||
|
||||
This is just a list of short assembly language programs that I used to
|
||||
reboot my assembly language skills, this time in X86 and X86_64. ("This
|
||||
time" because the last time I wrote assembly language I was writing for
|
||||
the Motorola 68000 line.)
|
||||
|
||||
The tutorial I based this off of is at http://asmtutor.com/
|
||||
|
||||
## Getting Started
|
||||
|
||||
There's a Makefile. It has a nice help¹.
|
||||
|
||||
## Lesson 2
|
||||
|
||||
There is no Lesson 1. Okay, there *is*, but I didn't do it. While I
|
||||
was looking around for tutorials I found a couple that taught different
|
||||
things, and one of the things they all agreed on was a proper exit
|
||||
command. Since all Lesson 2 does is add that command, that's what I
|
||||
did.
|
||||
|
||||
I also used a few NASM features not in the ASM Tutorial. The `%define`
NASM preprocessor directive lets you give constants names, and I've used
it here.
|
||||
|
||||
The syntax `equ $-msg` basically means "the address of HERE (`$` is the
current position in the section) minus the address of `msg`," which puts
into `len` the length of the string. It only works because `len` is
defined immediately after `msg`, before any other data is emitted.
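
A minimal sketch of both features together, for the 64-bit case. This is not the repository's `hello64.s`; the constant names (`SYS_WRITE`, `SYS_EXIT`, `STDOUT`) and the message text are my own assumptions, chosen only for illustration.

```nasm
; Sketch only: named constants via %define, string length via `equ $ - msg`.
%define SYS_WRITE 1             ; Linux x86_64 syscall number for write
%define SYS_EXIT  60            ; Linux x86_64 syscall number for exit
%define STDOUT    1             ; file descriptor 1

section .data
msg     db  "Hello, world!", 10 ; 13 characters plus a newline = 14 bytes
len     equ $ - msg             ; current address minus address of msg = 14

section .text
global _start
_start:
        mov rax, SYS_WRITE      ; which syscall
        mov rdi, STDOUT         ; arg 1: file descriptor
        mov rsi, msg            ; arg 2: address of the bytes to write
        mov rdx, len            ; arg 3: how many bytes
        syscall

        mov rax, SYS_EXIT
        xor rdi, rdi            ; exit status 0
        syscall
```

It should build with the same commands the Makefile uses for the 64-bit targets (`nasm -f elf64`, then `ld`). If another `db` line were placed between `msg` and `len`, `len` would silently include those bytes too.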
|
||||
|
||||
### Differences between the 32 and 64 bit versions.
|
||||
|
||||
The biggest difference that I see is that the syscalls have all been
renumbered. `write` and `exit` are 4 and 1 in 32-bit Linux, but 1 and 60
in 64-bit. The ASM Tutorial is 32-bit only, and passes the syscall
number and its arguments in the first four registers: EAX, EBX, ECX, and
EDX. When I ported it to the 64-bit version, the syscall for `write()`
uses different registers: RAX for the syscall number, then RDI, RSI, and
RDX for the arguments.
|
||||
|
||||
The 32 bit version uses `int 80h` to trap into the kernel. The 64 bit
version uses the `syscall` instruction. The
|
||||
[Linux System Call Table](http://blog.rchapman.org/posts/Linux_System_Call_Table_for_x86_64/)
|
||||
is handy here.
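
For comparison, here is a sketch of the 32-bit calling convention, with the 64-bit equivalents noted in the comments. It is an illustration written for this README, not a copy of `hello32.s`, and the message text is assumed.

```nasm
; Sketch only: the 32-bit `int 80h` convention.
section .data
msg     db  "Hello, world!", 10
len     equ $ - msg

section .text
global _start
_start:
        mov eax, 4              ; write is syscall 4 here (1 in 64-bit, in rax)
        mov ebx, 1              ; arg 1: stdout           (64-bit: rdi)
        mov ecx, msg            ; arg 2: buffer address   (64-bit: rsi)
        mov edx, len            ; arg 3: byte count       (64-bit: rdx)
        int 80h                 ; 64-bit uses `syscall` instead

        mov eax, 1              ; exit is syscall 1 here (60 in 64-bit)
        xor ebx, ebx            ; exit status 0
        int 80h
```

This one needs the 32-bit build commands from the Makefile: `nasm -f elf`, then `ld -m elf_i386`.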
|
||||
|
||||
### Lessons
|
||||
|
||||
So far, the assembly language programs have two `sections`: one for
|
||||
constant data, the other for the actual program. Before either section
|
||||
there are macros and directives. Right now the only macros I'm using
|
||||
define constants.
|
||||
|
||||
We aren't allocating any memory that's not in a `.data` section. And
that's okay. Everything is happening inside registers. The X86_64 CPU
has 16 general-purpose registers (32-bit X86 has 8). Some of them have
side-effects and optimizations, and others are *required* for some
operations. The AX register, for example, is the accumulator, and
instructions like `mul` and `div` still use it implicitly. The X86_64
CPU architecture also leans heavily on the stack, and the command
`push` will push a value (either a register or memory contents) onto
the stack pointed to by the SP register. It *decrements* SP first and
then stores the value, because the stack grows downward; BP is left
alone. So, you know, there are quirks to memorize.
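
One way to see what `push` does to the stack pointer is to measure it. The sketch below is a hypothetical test program, not one of the lessons: it records RSP, pushes a value, and exits with the difference as its status.

```nasm
; Sketch only: measure how far `push` moves the stack pointer.
section .text
global _start
_start:
        mov rbx, rsp            ; remember where the stack pointer started
        mov rax, 42             ; any value will do
        push rax                ; rsp = rsp - 8, then the value is stored at [rsp]
        sub rbx, rsp            ; old rsp minus new rsp
        pop rax                 ; value is loaded from [rsp], then rsp = rsp + 8

        mov rdi, rbx            ; exit status = number of bytes push moved rsp
        mov rax, 60             ; exit
        syscall
```

Built as a 64-bit program and run, `echo $?` afterwards should print 8: the stack pointer moved down by one 8-byte slot.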
|
||||
|
||||
The Makefile contains compiling and linking instructions. They're
|
||||
different for 32 and 64 bit programs, and learning those differences
|
||||
would be useful if you intend to write a lot of assembly language.
|
||||
|
||||
## Lesson 3
|
||||
|
||||
Lesson 3 is a lot like lesson 2, only instead of knowing the length of
|
||||
the string, we're going to calculate it, using a NUL (zero) byte as our
|
||||
end-of-string marker. This also introduces comparison and jump
|
||||
commands!
|
||||
|
||||
The question embedded in my comment in the source file is legitimate.
At the time, I didn't know if `sub` sets things like the "is zero" flag
when two values are the same value, the way `cmp` does. It does. The
[Intel X86 Manual](https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf)
(**Warning**: PDF, and very big!) says `sub` sets the OF, SF, ZF, AF,
PF, and CF flags according to the result, and it describes `cmp` as the
same subtraction with the result thrown away. So the flags are reliable
after a `sub` operation; the difference is that `sub` also overwrites
its first operand.
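
A small sketch to check the zero flag after `sub`. It is a hypothetical test program with made-up values, not part of Lesson 3.

```nasm
; Sketch only: does `sub` set the zero flag when the operands are equal?
section .text
global _start
_start:
        mov rax, 5
        mov rdi, 1              ; assume failure: exit status 1
        sub rax, 5              ; result is 0, so ZF is set; rax now holds 0
        jnz done                ; skipped when ZF is set
        xor rdi, rdi            ; ZF was set: exit status 0
done:
        mov rax, 60             ; exit
        syscall
```

The `mov` between the setup and the `sub` does not touch the flags, so the `jnz` is judging the `sub` alone. Note that `rax` has been changed to 0 afterwards, which is exactly what `cmp` would have avoided.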
|
||||
|
||||
With the 64-bit version, rather than blindly copy the ax/bx/cx/dx
sequence of registers, I deliberately chose to use `RSI` (the Source
Index Register) for my data source. While the first eight registers are
considered "general purpose," RSI is the implicit source pointer for
the string instructions (`lodsb`, `movsb`, `cmpsb`), so reading data
through it matches convention. I don't know if that's any use to me
yet, but it's something I'm aware of and I might someday have a use for
it.
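
Here is a sketch of the whole idea in 64-bit form: scan for the NUL byte, subtract to get the length, then print. The register choices, labels, and message are my own assumptions and may not match `counted-hello64.s`.

```nasm
; Sketch only: compute a string's length by scanning for the NUL byte.
section .data
msg     db  "Hello, brave new world!", 10, 0    ; NUL-terminated

section .text
global _start
_start:
        mov rsi, msg            ; rsi = start of the string
        mov rdx, rsi            ; rdx = scanning pointer
next:
        cmp byte [rdx], 0       ; is this byte the NUL terminator?
        jz  finished            ; yes: stop scanning
        inc rdx                 ; no: move to the next byte
        jmp next
finished:
        sub rdx, rsi            ; end minus start = length, NUL not counted

        mov rax, 1              ; write
        mov rdi, 1              ; stdout
        syscall                 ; rsi and rdx already hold buffer and length

        mov rax, 60             ; exit
        xor rdi, rdi
        syscall
```

A side benefit of scanning from RSI in the 64-bit version: `write` wants the buffer address in RSI and the length in RDX, so nothing has to be shuffled around before the `syscall`.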
|
||||
|
||||
### Memory addressing syntax
|
||||
|
||||
Lesson three also introduces the `cmp byte [rax], 0` syntax, which does
a few things. First, there are a *crazy* number of opcodes for the X86
architecture, and the mnemonic `cmp` is only half the story: it covers
several different opcodes. An opcode is the numeric representation of
an instruction to the chip; its bit sequence literally instructs which
nanoscopic wires in the chip to light up to perform an operation. Not
including the wild stuff, an Intel chip has something like 1,900
opcodes. But you'll only need to know about 20 of them.
|
||||
|
||||
The `[rax]` syntax tells nasm to generate the `cmp` opcode for which the
first operand is an address in memory; `cmp` will fetch the thing at
that address first before doing the comparison. (The fetched value
doesn't occupy any register you can name; the CPU holds it internally.)
The `byte` keyword is a size specifier, not a command. Neither `[rax]`
nor `0` says how wide the comparison should be, so `byte` tells nasm to
compare a single byte, and that's a *different* opcode from the wider
versions. nasm picks the right one from the operands. You don't need to
know different ASM commands for "compare two registers," "compare a
memory location with a register," and "compare a memory location with a
constant," because nasm's syntax makes it easy to understand those
operations.
|
||||
|
||||
What I do know is that the one thing you *can't* do is compare two
memory locations directly. `cmp` works with two registers, or a
register and a memory location, or a register or memory location and a
constant (as in `cmp byte [rax], 0`), but never with two memory
operands, and the constant can't come first.
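
The operand combinations can be sketched like this. The labels `first` and `second` and their values are hypothetical, and the program exists only to show which forms nasm accepts.

```nasm
; Sketch only: which operand combinations `cmp` accepts.
section .data
first   dq  7                   ; two 64-bit values to compare
second  dq  7

section .text
global _start
_start:
        mov rsi, first          ; rsi = address of first
        mov rdi, second         ; rdi = address of second
        mov rax, [rsi]          ; rax = value stored at first
        mov rbx, 7

        cmp rax, rbx            ; register, register
        cmp rax, [rdi]          ; register, memory
        cmp [rdi], rax          ; memory, register
        cmp rax, 7              ; register, constant
        cmp qword [rdi], 7      ; memory, constant: the size must be spelled out
        ; cmp [rsi], [rdi]      ; memory, memory: nasm refuses to assemble this

        ; To compare two memory locations, load one into a register first.
        cmp rax, [rdi]          ; rax already holds the value at first
        mov rdi, 1              ; assume "different" (mov leaves the flags alone)
        jne done
        xor rdi, rdi            ; equal: exit status 0
done:
        mov rax, 60             ; exit
        syscall
```

Uncommenting the memory-to-memory line should make nasm stop with an error about the operand combination, which is a quick way to confirm the rule for yourself.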
|
||||
|
||||
More to come... I hope...
|
||||
|
||||
---
|
||||
Footnotes!
|
||||
|
||||
¹ I firmly believe that no command, typed blindly, should modify the
|
||||
contents of your hard drive. Make takes target arguments, and you
|
||||
should specify the targets you want built. So `make` by itself only
|
||||
issues help.
|