Show HN: RISC-V core written in 600 lines of C89

190 points
a year ago
by mnurzia

Comments


aportnoy

How about a RISC-V disassembler in 200 lines of C99?

https://github.com/andportnoy/riscv-disassembler/blob/master...

a year ago

mnurzia

This is really cool, thanks for sharing! Something like this would be a great tool to distribute with my emulator.

a year ago

garganzol

It would be nice if you could put a link to that project in your README file. Both projects are very impressive, especially when seen in conjunction with each other.

a year ago

aportnoy

I mean, his simulator already has a disassembler contained within it; you would just need to replace comments with print statements.

a year ago

bjourne

Why stick with C89? Can't think of any compilers that don't support C99 nowadays. The major benefit is that you can use uint8_t and friends directly and don't need to define your own wrapper types.

a year ago

mnurzia

It's more of a fun exercise, I guess. But I do have experience with at least one compiler that doesn't support C99: Zilog's ez80 C compiler. Back in the day I used to program my TI-84+ CE for fun[0], and the only C solution was a pretty bespoke C89-only compiler[1] distributed with a community toolchain[2], which has since switched to clang. It's somewhat irrational, but in the back of my mind it bugs me if the software I write can't run on platforms like that.

[0] https://github.com/mnurzia/chip8-ce

[1] http://www.zilog.com/docs/appnotes/pb0098.pdf

[2] https://ce-programming.github.io/toolchain/

a year ago

flohofwoe

One "advantage" (if one wants to call that) is that the code would also compile as C++, while C99 has diverged enough from the common C/C++ subset that one cannot use all C99 features in C++ mode.

a year ago

mnurzia

I totally missed this, good point.

Slightly unrelated, but just thought I should mention: the sokol libraries are awesome!

a year ago

bjourne

There never was a "common C/C++ subset". See https://softwareengineering.stackexchange.com/a/298667/18260

a year ago

flohofwoe

Ah that old thing again ;)

C isn't a subset of C++ (and never was). But there's still a common subset of both the C and C++ languages which compiles both in a C compiler and a C++ compiler (and behaves the same at runtime despite slightly different C vs C++ semantics), and that common subset is what I call C/C++: the pidgin dialect that's neither quite C nor quite C++ but compiles as both.

a year ago

dezgeg

I've met several people that seriously think that C89 is the peak of programming languages and that C99 just brings misfeatures (like allowing variable declarations in the middle of basic blocks, according to them).

a year ago

foobarbaz33

Forcing declarations at the top makes it easy to estimate at a glance (or exactly calculate) how much space the stack frame will use.

a year ago

dezgeg

Maybe in the early days of C, but with modern compilers doing stuff like keeping variables only in registers, inlining functions, adding stack cookies, merging non-overlapping variables, etc., that doesn't seem really worth it. If you want to avoid accidentally huge stack usage you can pass a flag to gcc/clang to trigger a warning when a function's stack usage goes over the specified limit.
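
For illustration, a tiny sketch of what such a flag catches (the exact flag spelling and limit are per GCC's documentation, so treat them as assumptions; Clang has a similar -Wframe-larger-than= option):

    /* compiled with something like: gcc -Wstack-usage=1024 -c big.c */
    void big(void)
    {
        char scratch[8192]; /* well past a 1024-byte budget -> warning */
        scratch[0] = 0;
    }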

a year ago

boricj

Funnily enough, the file rv.h does use stdint.h if available and contains the following comment:

> All I want for Christmas is C89 with stdint.h
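
The usual shape of such a guard looks something like this (a sketch of the common pattern, not rv.h verbatim; the typedef name here is made up):

    #if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
    #include <stdint.h>
    typedef uint32_t u32;       /* C99: exact-width type is available */
    #else
    typedef unsigned long u32;  /* C89 fallback: guaranteed >= 32 bits */
    #endif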

a year ago

contrarian1234

Did Visual Studio finally make the jump?! (you could always just compile it as C++ code though)

a year ago

bjourne

Nope, stdint.h has been in MSVC for over 10 years. Other C99 features may not be supported though.

a year ago

flohofwoe

Except for VLAs (which are optional post-C99 anyway), MSVC actually has pretty good support for recent C versions, and since 2020 they're basically back on the "modern C" train: https://devblogs.microsoft.com/cppblog/c11-and-c17-standard-...

a year ago

mort96

Hasn't the main issue with MS been VLAs? I seem to recall that VLAs are the main reason MSVC won't ever support C99, and that MSVC is one of the main reasons why VLAs were made optional. It seems like MSVC supports C11 and C17 now, thanks to the removal of mandatory VLAs.

a year ago

pjmlp

The whole security industry has vetoed VLAs.

Google even went the extra mile and paid for the effort to remove them from the Linux kernel.

a year ago

mort96

Yeah, that's my point. The situation isn't, "MS is garbage, they only support C89"; the situation is "MS supports modern C pretty well, their lack of official C99 support is just a technicality caused by VLAs, which you shouldn't use anyway".

10 months ago

zabzonk

Vehement opposition from MS may be one of the reasons for them being optional (and thus worthless), but the main one is that they are impossible to use correctly. What happens if you make one too big?

a year ago

mort96

I think they could potentially have some very limited valid use cases, but I agree that a fixed length array and/or heap allocation is usually much better than VLAs.

I was mainly just pointing out that MS's lack of C99 support isn't really a part of keeping C89 alive, especially now that they officially support C11.

a year ago

aragilar

I'm not sure how robust that support is though: https://discuss.python.org/t/requiring-compilers-c11-standar...

a year ago

mort96

If the reply is correct, the only issue there is that they're compiling in C89 mode. If you pass `-pedantic -std=c89` to GCC, it too will warn on C99+ features. If you pass `/std:c11` or `/std:c17` to MSVC, C99+ features should work.

a year ago

Dylan16807

> what happens if you make one too big?

Doesn't that apply to fixed length arrays too?

a year ago

mort96

It does, but if you create a fixed length array that's too big, you'll just deterministically blow the stack regardless of user input. With VLAs (or alloca), your array length is determined by some runtime property. Whether you blow the stack doesn't just depend on the code path any more, but on the data you're operating on too.

As a bonus, that data is often user input...
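
A minimal sketch of the difference (not code from the thread; assume n > 0 and that it comes from runtime data):

    #include <stddef.h>

    void fixed_buf(void)
    {
        char buf[4096];   /* stack cost is the same on every call */
        buf[0] = 0;
    }

    void vla_buf(size_t n)
    {
        char buf[n];      /* C99 VLA: stack cost depends on n at runtime */
        buf[0] = 0;
    }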

10 months ago

jpfr

MSVC did a big rewrite of the C frontend around MSVC2013. I haven’t encountered C99 idioms that don’t work nowadays. Granted, I might not use every feature in my typical coding style…

a year ago

arp242

It's been "fully" C99 (and C11, C17) compliant for about 2 or 3 years. The only missing C99 featured before that were relatively rarely used ones like _Pragma.

a year ago

flohofwoe

It's not C99 compliant because that would require VLA support (which has been made optional in C11, which in turn enabled MSVC to be a C11 and C17 compiler, but not C99). Not that it matters much in practice though :)

a year ago

garganzol

Seeing the RISC-V instructions implemented in the emulator like that, it strikes me that RISC-V really is a reduced instruction set CPU.

When compared to the AVR 16-bit RISC instruction set, RISC-V looks so much simpler. (You may be indirectly familiar with the AVR architecture through the household name "Arduino".)

The intriguing part is that AVR is just a microcontroller, while RISC-V is intended to be a full-blown CPU.

a year ago

opencl

The base instruction set is tiny but there are quite a few extensions and pretty much every practical implementation includes at least a few of them.

E.g. the GD32V microcontrollers implement RV32IMAC, while the Allwinner D1, a "full-blown" CPU meant to run Linux, implements RV64IMAFDCVU.

RV32I/RV64I are the base 32/64 bit integer instruction sets and every letter after that is a different extension. Most of the extensions are relatively small and simple, but the C (compressed instructions) extension introduces some decoder complexity and the V (vector) extension adds several hundred instructions.

Though even with all the extensions it is still a very small/simple ISA by modern standards.

a year ago

staunton

Pretty much all architectures have "simple" instruction sets under the hood, that is, the microcode that executes the mess we put in binaries. RISC-V is based on the idea that you can skip most of this step. The difficulty is getting fusing and other optimizations to work so the throughput remains high, which seems to work so far.

a year ago

bitwize

I feel myself descending into old-fartitude more and more with every year. My wife and I were recently involved in a car accident (no one was hurt). While I was being checked out I overheard a 20-year-old firefighter exchange Facebook information with an 18-year-old EMT. I was like, "wait a minute, you guys seem really young and you still use Facebook? I thought Facebook was for your grandparents and all the kids now use Snapchat or TikTok?"

I get that same feeling now. This kid is 20 and still using C89? Shouldn't people his age have been reared entirely in the crystal-spires-and-togas utopia of Rust, with raw pointers and buffer overruns being mere legends of their people's benighted past told to them by their elders?

It's kind of comforting to see young programmers embracing the old ways, even if it's for hack value only.

a year ago

mnurzia

Admittedly, C89 has very little utility, especially among people my age. For example, my university progresses from Racket to Java to C++, and has a systems course that partially teaches C11. Although good for teaching, I don't think those languages artificially constrained me in the ways that C89 does. I felt that my programming skills improved the most when I forced myself to work in such an under-powered language.

I also like the idea of being able to run my code anywhere, kind of like Doom.

a year ago

sitkack

I think there's a risk of kids seeing old people romantically reenacting their eight-bit micro days and thinking that it's something besides nostalgia.

I was kind of the opposite as a kid: if it wasn't crazy futuristic I didn't want it. So even in the 80s I wanted FPGA accelerators in every machine.

a year ago

bitwize

It's not just nostalgia. Those old computers really are fun to operate -- like an MGB is fun to drive -- in ways modern systems aren't, even if they are far less useful than a modern system. In fact it's now possible to take advantage of modern software tools on modern systems and push those old beasts to new heights they couldn't have possibly reached during their heyday.

a year ago

LoganDark

> even in the 80s I wanted FPGA accelerators in every machine

Mostly unrelated, but I recently discovered that you can buy TPUs, right now, as a consumer product, from https://coral.ai.

The stock firmware already allows you to run these things so hard they overheat, which is amazing.

But yes, I also want FPGA accelerators.

10 months ago

nevi-me

Question: do the implementations of single instructions compile to single instructions if targeting RISC-V with optimisations enabled? It would be really awesome if compilers realised what your code is doing and replaced the implementations of instructions with those instructions.

a year ago

mnurzia

Not really, my implementation isn't smart enough to guide compilers to the right solution. Trivial instructions, like xor, are of course recognized, but for example the 32x32 mul implementation isn't. Maybe compilers will be smart enough one day...

https://godbolt.org/z/WEcTzKf7M
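
For context, a portable C89 32x32 -> 64 multiply generally has to be built out of 16-bit halves, since there's no guaranteed 64-bit type; a sketch of that general technique (not the emulator's exact code) is the kind of thing compilers currently won't fold back into a single mulhu:

    /* high half of an unsigned 32x32 -> 64 multiply,
       assuming a and b each hold 32-bit values */
    static unsigned long mulhu32(unsigned long a, unsigned long b)
    {
        unsigned long al = a & 0xFFFFul, ah = a >> 16;
        unsigned long bl = b & 0xFFFFul, bh = b >> 16;
        unsigned long lo = al * bl;
        unsigned long t  = ah * bl + (lo >> 16);
        unsigned long u  = al * bh + (t & 0xFFFFul);
        return ah * bh + (t >> 16) + (u >> 16);
    }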

a year ago

dbcurtis

Yeah, well, the rock that breaks your pick in that scenario is copying all the processor state back and forth to/from the emulation model, including flag register bits, and also correctly handling exceptions and faults. Emulating the instruction’s happy path is just scratching the surface.

a year ago

sitkack

In the guest, you trap on reading emulation state, so that the source of truth is the hardware. Rather than use something like KVM, I wonder if you could run another child process and use ptrace?

a year ago

duskwuff

> including flag register bits

RISC-V doesn't have those. Compare+branch is a single instruction.
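
A tiny sketch of what that looks like in an emulator (illustrative only, not this project's code): the comparison and the branch are one operation, no flags register involved.

    void exec_bne(unsigned long regs[32], unsigned long *pc,
                  unsigned rs1, unsigned rs2, long offset)
    {
        if (regs[rs1] != regs[rs2])
            *pc += offset;   /* taken: branch target is pc-relative */
        else
            *pc += 4;        /* not taken: fall through to next instruction */
    }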

a year ago

peterfirefly

'switch' is a really, really nice language construct that was fully implemented long before C89. Using lots of nested 'if's instead is not a good idea.

a year ago

hgs3

'switch' is good, but for VMs, computed goto is better.
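
A minimal sketch of the idea, using GCC/Clang's labels-as-values extension (non-standard, so not an option in strict C89; the opcodes here are made up):

    /* opcodes: 0 = halt, 1 = inc, 2 = dec;
       each handler dispatches directly to the next opcode's label */
    int run(const unsigned char *code)
    {
        static void *dispatch[] = { &&op_halt, &&op_inc, &&op_dec };
        int acc = 0;
        goto *dispatch[*code];
    op_inc:  acc++; goto *dispatch[*++code];
    op_dec:  acc--; goto *dispatch[*++code];
    op_halt: return acc;
    }

So run((const unsigned char *)"\1\1\2") would return 1: two increments, one decrement, then the string's terminating zero byte halts the loop.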

a year ago

KerrAvon

depends on the compiler implementation. modern compilers may be able to treat equivalent switch statements, gotos, and if/else statements pretty much the same

a year ago

nsajko

Only in trivial cases.

a year ago

sylware

nested "ifs" are optimized out by compilers. Moreover in the latest horrible gcc extensions you have the case statement using a _not_compiler constant expression (you can find the usage of such horrible gcc extension in linux net code).

a year ago

mnurzia

This was one of my main justifications for making this design choice, in addition to the (in my opinion) overwhelming number of break statements that would result from using switches. But more importantly, many of the "if" statements have non-constant or more complex expressions in them that aren't supported in switch statements in ANSI C.
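
For example, a decode condition that tests several masked fields at once doesn't map onto a single case label; a rough sketch of the pattern (not the emulator's exact code):

    static void decode(unsigned long insn)
    {
        /* OP-IMM (opcode 0x13) with funct3 == 0 is ADDI */
        if ((insn & 0x7F) == 0x13 && ((insn >> 12) & 0x7) == 0) {
            /* ... handle addi ... */
        /* OP (opcode 0x33) with funct7 == 1 selects the M extension */
        } else if ((insn & 0x7F) == 0x33 && ((insn >> 25) & 0x7F) == 1) {
            /* ... handle mul/div ... */
        }
    }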

a year ago

sylware

Yep.

And as you stated, it is important to stay as close as possible to C89, because ISO is literally doing planned obsolescence, but on a long time cycle (5-10 years).

Hopefully RISC-V will be a success, and all system components and interpreters of very-high-level languages will be rewritten in RISC-V assembly, and it will become actually very hard to do planned obsolescence.

a year ago

freecodyx

This proves that, at the core, the things we rely on to achieve great software and life-impacting technologies are extremely simple. The complexity is in how to make them.

a year ago

arcticbull

The core concepts are generally very straightforward; however, it's always the optimization that adds complexity. That's how you get the orders-of-magnitude improvement. This C89 core definitely doesn't do macro-op fusion, for instance.

a year ago

freecodyx

By complexity, I meant the hardware itself, not even the architecture or the instruction set. For example, you can design a virtual machine in days, but making a real one is at the core of the geopolitical issues we have today.

a year ago

numpad0

The complexity is in how to distribute the dev workload and how to make it financially viable. No one pays for beautiful works of art unless they're somehow anchored, tangled, and aligned with their interests.

a year ago

charcircuit

This isn't a RISC-V core. It is a RISC-V emulator library.

a year ago

sylware

A bigger implementation, but it has 64-bit support:

https://bellard.org/tinyemu/

a year ago

RobotToaster

Is this designed to be used with some kind of C to VHDL/verilog transpiler?

a year ago

RealityVoid

Not really, think of it like a... CPU emulator? Ish? You have registers as variables in the program. If you have register a1 and you are at an instruction adding 1 to it, it will add 1 to the variable representing a1. So on and so forth.

This works because, well, memory operations are mostly (all?) of what a CPU does, so this "core" takes the program and does the same kind of memory operations the silicon would do, only in SW.
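
In other words, something like this toy fragment (purely illustrative, not the project's code):

    unsigned long regs[32];        /* the 32 integer registers, x0..x31 */

    void step_addi_a1_1(void)
    {
        regs[11] = regs[11] + 1;   /* the effect of "addi a1, a1, 1"; a1 is x11 */
    }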

a year ago

userbinator

Besides not using a switch() for the main instruction decode, there's nothing surprising here. Anyone who has worked with emulators before will find this code straightforward to read. RISC-V really is the new MIPS.

a year ago

rowanG077

The README doesn't answer it, but I struggle to see why you would want a C implementation of an ISA.

a year ago

detrites

Not sure if this was intended, but coming to this as someone vaguely aware of RISC-V, it's looking like a fantastic form of documentation for the ISA, that both describes and gives a way to play with it, but in an intuitive, even fun manner.

Obviously this works best for someone who already knows C - but the fact that it's C89 mitigates this somewhat.

a year ago

rowanG077

A reference implementation would be in Verilog or VHDL.

a year ago

nly

So you can compile and run it on any platform with a C compiler

a year ago

rowanG077

That is just something you can do with C code. That is not a goal in itself. Why would you want to run an ISA implemented in C instead of just using a standard simulator? Why not use Verilator + any of the open source RISC-V cores?

a year ago

LoganDark

Because those are slower, more complex, and more difficult to understand?

a year ago

rowanG077

I doubt Verilator is much slower; its speed is insane. They are indeed more complex and difficult to understand, but I fail to see how that is a criterion. I would very much rather include an industry-standard library than something homegrown.

a year ago

LoganDark

> They are indeed more complex and difficult to understand, but I fail to see how that is a criterion.

It's... not? Like, if you want to merely use an ISA, you don't need it to be simple or easy to understand, in fact tons of people pride themselves on making extremely high-performance RISC-V cores with OoOE and so on.

But the reason why someone might want a C implementation of an ISA is different from the reason people might want to go implement an ISA in a real project: maybe they want a software simulator that is easy to understand for one reason or another, perhaps for learning or demonstration purposes, or just as a fun hobby project.

These people wouldn't benefit from just pulling down Verilator or using one of the existing BATTLE-TESTED INDUSTRY-STANDARD PROFESSIONALLY-AUDITED HIGH-PERFORMANCE implementations because they literally don't care about any of those things.

In any case, it's a fallacy to assume that every programming project out there has to address a need in order to have a place. https://justforfunnoreally.dev

10 months ago

rowanG077

Yes, hobby/learning is a great reason; why not just start with that? I was simply wondering what the reason was. Maybe the author had an interesting need for a C implementation to do something. Me asking for the reason is not me saying it doesn't deserve to exist. I don't think you are arguing in good faith here.

10 months ago

LoganDark

> why not just start with that?

I'm not the one who started the thread.

> I don't think you are arguing in good faith here.

With that, this discussion is over.

10 months ago

Farmadupe

Considering it's allocation-free, maybe it's an ultralight simulator for checking large quantities of compiler output? (i.e. no VM to create and destroy for every test case)

Or the same but for testing some Verilog/VHDL CPU implementation in a simulator?

Or since it's only 500 SLOC, maybe it's just for fun!

a year ago

mnurzia

This is an excellent idea. One limitation of a testing library of mine, `mptest`, is its inability to sandbox tests. I may take this idea and develop a more robust (and potentially parallel) testing framework around it.

a year ago

rowanG077

Then I would expect a comparison with Verilator.

a year ago

srgpqt

Perhaps this could be used to run sandboxed code. Game engines could safely run mods using something like this, ala QuakeC.

a year ago

mnurzia

Definitely. My motivation for writing this was to have a simple CPU for a virtual game console-like project. I decided to release it on its own, though.

a year ago

mcraiha

For a modern game engine you most likely want WebAssembly support, e.g. Flight Simulator does that: https://flightsimulator.zendesk.com/hc/en-us/articles/766290...

a year ago

srgpqt

Sure, I’d love to see your 600 line webassembly interpreter.

a year ago

sitkack

Run wasm on this core.

a year ago

[deleted]
a year ago

sutterbutter

As a total newb to programming, what am I looking at here?

a year ago

detrites

There are several different types of CPUs, in two main classes, CISC and RISC. The difference is summarised by the first letter - "Complex" vs. "Reduced" - Instruction Set Computer. Or, what size of "vocabulary" a CPU decodes.

RISC-V is a type of CPU architecture (a set of plans for how to build one, not an actual CPU itself) that also happens to be open source. Anyone can build a RISC-V CPU without having to buy the rights to do so. (Many are.)

This project is an emulation of a RISC-V CPU. A kind of virtual "reference" CPU in software. It can be used to run code compiled for a RISC-V type CPU, and to help understand what's happening inside the CPU when it runs.

It's written in C, which is and was a very fundamental programming language that's influenced the design of many other languages. It is a language that is very close to the fundamental language CPUs natively decode and process.

CPUs natively use a language referred to as "assembly", which actually has many varieties particular to each CPU design. Regardless of the variety of CPU, assembly is usually about as "close to the metal" as it gets.

It's literally communicating with the CPU directly in its own language. This makes it extremely fast to run, but laborious to code, and also somewhat "dangerous" in that with such low-level control, it's easy to mess things up.

This project takes as input a list of RISC-V instructions (a "program") and pretends to be a RISC-V CPU with those instructions loaded into it and running on it. Useful for understanding, prototyping and building a RISC-V program.

CPUs are designed, rather, to run code that already "works", having been created programmatically (compiled or interpreted) from a higher-level language that isn't going to give it things that make no sense (hopefully).

So there is usually not a lot of provision in the design of the CPU for watching it and its state carefully at a low level and examining how your assembly program is working, or not working. Emulation eases this.

a year ago

dragonwriter

> CPUs natively use a language referred to as "assembly"

Strictly, CPUs use machine code. Assembly targeting a particular CPU is a very thin more-human-readable abstraction around the underlying machine code, but it is not, itself, what the CPU executes. That’s why “assemblers” exist – they are compilers from assembly language to machine code (though, because assembly is a very thin abstraction, they are much simpler than most other compilers.)
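
To make the distinction concrete, here is one RISC-V instruction written both ways (the encoding follows the published RV32I I-type format; it's shown as C just for readability):

    /* assembly:     addi a0, a0, 1
       machine code: the 32-bit word 0x00150513, built from I-type fields */
    unsigned long insn = (1ul  << 20)   /* imm    = 1         */
                       | (10ul << 15)   /* rs1    = x10 (a0)  */
                       | (0ul  << 12)   /* funct3 = 0 (ADDI)  */
                       | (10ul << 7)    /* rd     = x10 (a0)  */
                       |  0x13ul;       /* opcode = OP-IMM    */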

a year ago

detrites

Agree. And deeper than that may be microcode, which we rarely see or reason about, and which, while it may very much be there, is rarely of practical use. (I.e., when learning, the distinctions may be somewhat of an impediment without payoff.)

a year ago

tester756

Would calling "assembly" a CPU's frontend language be correct?

The same way as it is in compilers?

a year ago

touggourt

This is a well-written explanation!

I wrote a short news item about the emulator on the French collaborative website linuxfr.org (see https://linuxfr.org/news/un-emulateur-et-un-desassembleur-ri...)

I would like to translate your comment and add it. Can I?

10 months ago

sutterbutter

Wow this is beyond helpful. Thank you so much for taking the time to explain this so thoroughly.

a year ago