Add getting started

2022-01-17 22:10:57 +00:00 · 2022-01-17 22:10:57 +00:00 · f134eb08a7
parent 34b4cfc27b
commit f134eb08a7
17 changed files with 230 additions and 56 deletions
--- a/images/flags/card.png
+++ b/images/flags/card.png
--- a/images/flags/xeai.png
+++ b/images/flags/xeai.png
--- a/images/ghidra.png
+++ b/images/ghidra.png
--- a/images/petools.png
+++ b/images/petools.png
--- a/images/vs.png
+++ b/images/vs.png
--- a/images/vs2.png
+++ b/images/vs2.png
--- a/images/vs3.png
+++ b/images/vs3.png
--- a/images/wireshark.png
+++ b/images/wireshark.png
--- a/images/wireshark2.png
+++ b/images/wireshark2.png
--- a/styles.css
+++ b/styles.css
@ -47,6 +47,10 @@ p {
    word-break: break-word;
 }

+img {
+    max-width: 100%;
+}
+
 code {
    vertical-align: middle;
    letter-spacing: .02em;
@ -58,6 +62,11 @@ code {
    word-break: break-word;
 }

+dfn {
+    border-bottom: 1px dashed currentColor;
+    cursor: help;
+}
+
 td>code {
    word-break: normal;
 }
--- a/templates/base.html
+++ b/templates/base.html
@ -7,7 +7,7 @@
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{% block title %}{% endblock %}{% if self.title() %} | {% endif %}e-Amusement API</title>

-    <link rel="stylesheet" href="{{ROOT}}/styles.css?ver=4">
+    <link rel="stylesheet" href="{{ROOT}}/styles.css?ver=5">
    <link rel="stylesheet" href="{{ROOT}}/tango.css">

    <script async src="https://www.googletagmanager.com/gtag/js?id=G-LG6C6HT317"></script>
--- a/templates/pages/flags.html
+++ b/templates/pages/flags.html
@ -0,0 +1,7 @@
+{% extends "base.html" %}
+{% block title %}Curious flags{% endblock %}
+{% block body %}
+<br>
+<img src="{{ROOT}}/images/flags/xeai.png" >
+<img src="{{ROOT}}/images/flags/card.png" >
+{% endblock %}
--- a/templates/pages/getting_started.html
+++ b/templates/pages/getting_started.html
@ -0,0 +1,155 @@
+{% extends "base.html" %}
+{% block title %}Following along{% endblock %}
+{% block body %}
+<h1>Following along</h1>
+<p>I'd highly recommend following along with the details on these pages yourself. While my aim is to document as much as
+    I can to the best of my ability, there will be things I miss, get wrong, or that are outright newer than these
+    pages! Knowing where the information here came from is key to being able to reproduce the findings yourself. It's
+    also just generally quite fun, and a useful skill.</p>
+<p>With that out of the way, you might then ask <i>how</i> to follow along. We're going to be getting nitty and gritty
+    with some games, Bemani specifically, so the very first step is to get your hands on one of those. Because we're
+    going to want to poke around, we need a version of the game running on our PC (or in a VM), rather than on a
+    cabinet. If you feel like starting with a real cabinet, <a href="https://mon.im/2017/12/konami-arcade-drm.html">mon
+        has a great blog post</a> about that.</p>
+<p>The majority of direct references to code are based on Sound Voltex 4. The specific build I'm using in most snippets
+    is KFC-2019020600; no need to be on private websites to be able to make use of that information.</p>
+<hr>
+<p>Depending on what you have, you may be staring at a working game at this point, or a big network error. Either way,
+    you're sorted.</p>
+
+<h2>Static vs dynamic analysis</h2>
+<p>Quick detour here. In reverse engineering (what we're doing!) you'll often hear these two terms used.</p>
+<p>Static analysis is when we have a copy of the content, be that custom file formats, executable files, you name it,
+    and we aim to identify how they work without running them. This can be very powerful, as it allows us to reverse
+    engineer things we either can't or don't want to run. For example, we can perform static analysis of <i>any</i>
+    program on a modern desktop PC, even a program written for an old games console. If you're sat staring at a network
+    error right now, that's also a great example of the sorts of problems static analysis allows us to work around.</p>
+<p>Dynamic analysis, as you may now have guessed, is when we start the program in question, and poke around while it's
+    running. This poking can vary wildly; you might be curious about the values in memory during the execution of a
+    function identified during static analysis, maybe you want to look at network traffic being created while the
+    program runs, or maybe you just want to use the program normally to understand how it's intended to function.</p>
+<p>We're going to be doing a lot of both, so strap in!</p>
+
+<h2>Setting up our workspace</h2>
+<p>There are a few essential tools every reverse engineer should have in their toolbox:</p>
+<ul>
+    <li>A <dfn title="Interprets machine code, converting it to human-readable assembly">disassembler</dfn></li>
+    <li>A <dfn title="Allows reading of binary files">hex editor</dfn></li>
+    <li>A <dfn title="Able to connect to a running program and show us internal information about it">debugger</dfn>
+    </li>
+    <li>A <dfn
+            title="Takes assembly from a disassembler, and attempts to guess what the original source code may have looked like">decompiler</dfn>
+        (incredibly useful but not essential)</li>
+</ul>
+<p>I'm going to be using:</p>
+<ul>
+    <li><a href="https://ghidra-sre.org/">Ghidra</a>: Disassembler and decompiler</li>
+    <li><a href="https://hex-rays.com/ida-pro/">IDA</a>: Decompiler (for a second opinion; the decompiler isn't in the
+        free version)</li>
+    <li><a href="http://www.flexhex.com/">FlexHex</a>: Hex editor (there are <i>so</i> many free options here, so shop
+        around)</li>
+    <li><a href="https://visualstudio.microsoft.com/vs/community/">Visual Studio</a>: Debugger</li>
+    <li><a href="https://www.wireshark.org/">Wireshark</a>: Network captures</li>
+</ul>
+<p><small><i>(Ghidra has a debugger now, but I'm yet to play around with it enough to ditch VS)</i></small></p>
+
+<h2>Setting up Ghidra</h2>
+
+<p>When you start Ghidra for the very first time, you will be presented with an empty screen. You'll need to create a
+    new project; the name and location aren't especially important, but try and keep them sensible. After that, you can
+    drag a file (libavs-win32.dll from your game is a good choice here) into the window. It will ask a series of
+    questions; just accept the defaults for everything. Once it's loaded, double click on the file to open the code
+    browser. You will be asked if you would like Ghidra to automatically analyse the file for you. Yes!</p>
+<p>The interface can be pretty intimidating to start with, but there are a few important parts to note. Your window
+    likely looks different to mine here, but the general layout will be roughly the same.</p>
+
+<img src="{{ROOT}}/images/ghidra.png">
+
+<p>Everything in the interface is a draggable window, and can be popped out of the main window, so don't be afraid to
+    move things around if that helps. For example, I added the bookmarks window below my disassembler and decompiler,
+    because I use it quite frequently.</p>
+
+<h3>Key things to know in Ghidra:</h3>
+<ul>
+    <li>Double click on any label, function, or address to jump to that item. Alt+left and alt+right navigate through
+        your location history.</li>
+    <li>Middle click on any item to highlight all occurrences of it (can be rebound to left click if you prefer it as
+        default)</li>
+    <li><code>L</code> will rename the item the cursor is over, and <code>Ctrl+L</code> will change the type of the item
+        (in the decompiler).</li>
+    <li><code>G</code> will open the jump popup. You can type an address, function name, label, etc. here</li>
+    <li><code>S</code> to open search. If at first you aren't seeing results, you may need to switch to searching
+        <code>All Blocks</code>.
+    </li>
+    <li>There are a bunch of useful tools in the <code>Window</code> dropdown at the top! Have a play around; you can't
+        break anything.</li>
+    <li><code>;</code> allows you to add a comment to any line</li>
+    <li>In the disassembly: <code>T</code> to change the type of the data at the cursor, <code>D</code> to disassemble
+        at the cursor, <code>F</code> to create a function at the cursor, <code>Del</code> to delete a function, and
+        <code>C</code> to clear the selected data, returning it back to unknown bytes.
+    </li>
+</ul>
+
+<h2>Setting up Wireshark</h2>
+<p>While less conventional as a dynamic analysis tool, Wireshark is invaluable when working with network-related
+    tasks.</p>
+<p>Either by editing <code>prop/ea3-config.xml</code>, or using spicecfg, pick a totally bogus service URL, with a
+    distinct port. I'm going to use <code>http://127.0.0.1:54321</code>. Now start Wireshark, click once on the "adapter
+    for loopback traffic capture", then in the capture filter enter <code>port 54321</code> (edit as required). Hit
+    enter, and you'll start capturing. When you now start the game, some things will pop up, but because we don't have
+    anything listening on that port (hopefully!) every attempt at communication will be an error.</p>
+<p>To remedy this, let's run something on that port! It can be quite literally anything. <code>nc -lvp 54321</code>
+    will do, if you have netcat. With Wireshark still running, restart the game. This time something interesting should
+    appear! If all went to plan, a green <code>HTTP</code> packet should show up.</p>
+<img src="{{ROOT}}/images/wireshark.png">
+<p>Clicking on it, we can see additional details. If we expand the blue HTTP section, and then the <code>Data</code>
+    section at the bottom of that, we can view the raw data that was included in this HTTP POST request.</p>
+<p>Wireshark is surprisingly flexible. Notice how in my screenshot the packet was identified as <code>XRPC</code>? I
+    wrote a relatively simple protocol dissector, which allows me to view the contents of XRPC packets directly within
+    Wireshark. While I might share it if I clean it up, it only took an hour or so in an evening to write; my aim is
+    that these documents provide everything you could ever need to be able to quickly write your own too.</p>
+<img src="{{ROOT}}/images/wireshark2.png">
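+<p>If you don't have netcat to hand, the "quite literally anything" listener from earlier can be sketched in a few
+    lines of Python. This is a minimal sketch, not what I actually ran: the function names are made up for
+    illustration, the port simply matches the bogus service URL above, and it only grabs the first chunk of data from
+    one connection without ever replying (much like <code>nc</code>), so the game will still report a network
+    error.</p>

```python
import socket


def open_listener(port: int = 54321, host: str = "127.0.0.1") -> socket.socket:
    """Bind a TCP socket so that *something* is listening on the bogus service port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Allow quick restarts without waiting for the old socket to time out.
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def receive_one(server: socket.socket, max_bytes: int = 65536) -> bytes:
    """Accept a single connection and return the first chunk of bytes it sends.

    Nothing is sent back; the point is only to let the client get far enough
    to transmit its request so that Wireshark has something to show.
    """
    conn, _addr = server.accept()
    with conn:
        return conn.recv(max_bytes)


if __name__ == "__main__":
    with open_listener() as listener:
        request = receive_one(listener)
        # Print only the start; the body is unlikely to be readable text.
        print(request[:200])
```

+<p>Run it before starting the game, with Wireshark capturing on the same port as before.</p>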
+
+<h2>Setting up Visual Studio</h2>
+<p>Saved the worst for last, I'm afraid. Once Visual Studio starts, drag the exe you use to start the game into it. Odds
+    are this is <code>spice.exe</code>. Visual Studio, in stark contrast to Ghidra, is totally barren.</p>
+<img src="{{ROOT}}/images/vs.png">
+<p>When you press the start button, VS will likely ask you to restart it in elevated mode; go ahead and do that.</p>
+<img src="{{ROOT}}/images/vs2.png">
+<p>Wow. That's a lot more stuff, but it all seems a bit empty? As a debugger, VS only allows you to poke around while
+    the program is paused. We can manually pause using the pause icon at the top, which would normally be sufficient.
+    Unfortunately, in our case, we're looking at a far bigger project. Odds are when you pause the program you will get a
+    message that it's running "external" code, or you end up somewhere totally random.</p>
+<p>To solve this, we can set up VS to automatically pause for us.
+    <code>Debug -&gt; New Breakpoint -&gt; Function Breakpoint</code> is the option we use to do this. VS will then
+    allow us to enter a... function name? Aah. The expectation being made here is that we are debugging our own program,
+    and have the full source code. Thankfully, we can instead enter an address here, by prefixing it with
+    <code>0x</code>. This is where both static and dynamic analysis work together.
+</p>
+<p>If you run the program again now (stop it if it's still running) Visual Studio will know to automatically pau- not so
+    fast. The addresses we can see listed in Ghidra are the addresses we would expect, if the program was being loaded
+    into memory at its "normal" location. Unfortunately for us, predictable addresses also make life easier for
+    genuinely malicious code, so a system called <a href="https://en.wikipedia.org/wiki/Address_space_layout_randomization">ASLR</a> is used to
+    randomise the addresses the program will use. This really sucks for dynamic analysis.</p>
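+<p>To make the mismatch concrete, here is the arithmetic involved, as a small sketch. It assumes Ghidra loaded the DLL
+    at an image base of <code>0x10000000</code> (a common default for 32-bit DLLs; check the memory map in your own
+    project), and the runtime base in the example is entirely made up. The function name is mine, not anything from a
+    tool.</p>

```python
# Assumption: the image base Ghidra used when importing the DLL.
GHIDRA_IMAGE_BASE = 0x10000000


def rebase(static_address: int, runtime_base: int,
           static_base: int = GHIDRA_IMAGE_BASE) -> int:
    """Translate an address seen in Ghidra into the address used at runtime.

    The offset of an item from the start of the module does not change when
    the module is relocated, so we keep the offset and swap the base.
    """
    offset = static_address - static_base
    return runtime_base + offset


if __name__ == "__main__":
    # Hypothetical: pretend the debugger reports the module loaded at 0x6B340000.
    print(hex(rebase(0x1000A920, 0x6B340000)))
```

+<p>Doing this by hand on every launch gets old quickly, as the base changes.</p>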
+<p>Thankfully, we don't need to turn it off for our whole computer. We're going to use a tool called <a
+        href="https://petoolse.github.io/petools/">PE Tools</a>. After starting the program, drag the DLL we're curious
+    about onto it, <code>libavs-win32.dll</code>, for example. We need to lie to Windows that this DLL is not
+    actually able to handle having its addresses randomised, which involves turning off <code>DLL can move</code>. This
+    is going to directly edit the DLL file, so if you happen to be seeding it, consider this your warning to copy
+    everything over to a different folder before continuing.</p>
+<img src="{{ROOT}}/images/petools.png">
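+<p>For the curious, the checkbox is (as far as I can tell) just one bit in the PE header, so the same edit can be
+    sketched in Python. This is an illustration under assumptions rather than a replacement for PE Tools: I'm assuming
+    <code>DLL can move</code> corresponds to the <code>IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE</code> flag
+    (<code>0x0040</code>), the function name is made up, and the PE checksum is left untouched. Work on a copy of the
+    file.</p>

```python
import struct

IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE = 0x0040
COFF_HEADER_SIZE = 20
# DllCharacteristics sits 70 bytes into the optional header for both PE32 and PE32+.
DLL_CHARACTERISTICS_OFFSET = 70


def clear_dynamic_base(image: bytes) -> bytes:
    """Return a copy of a PE image with the dynamic base (ASLR) flag cleared."""
    data = bytearray(image)
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header; not a PE file")
    # e_lfanew: offset of the "PE\0\0" signature, stored at 0x3C in the DOS header.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    field = pe_offset + 4 + COFF_HEADER_SIZE + DLL_CHARACTERISTICS_OFFSET
    (flags,) = struct.unpack_from("<H", data, field)
    struct.pack_into("<H", data, field, flags & ~IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE)
    return bytes(data)


if __name__ == "__main__":
    import sys

    # Usage: python clear_aslr.py input.dll output.dll
    source_path, target_path = sys.argv[1], sys.argv[2]
    with open(source_path, "rb") as source:
        patched = clear_dynamic_base(source.read())
    with open(target_path, "wb") as target:
        target.write(patched)
```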
+<p>At this point, we can return to Visual Studio and add our breakpoint as previously. If you've been following along,
+    <code>0x1000A920</code> is a good breakpoint to test. It's quite likely however that the breakpoint won't be hit.
+    This is, to the best of my knowledge, an issue in VS. Delete the breakpoint, and this time start the program then
+    hit the pause button immediately. Only once paused, re-add the breakpoint, then continue execution.
+</p>
+<img src="{{ROOT}}/images/vs3.png">
+<p>The breakpoint should be hit almost right away. This is because that address is one of the logging functions :). In
+    the bottom left, a list of registers is shown. This particular function takes its values via the stack, so paste
+    the ESP register's value into the address box of the memory viewer. Right clicking, we can switch to
+    <code>4-byte</code> mode, and can now see the stack clearly. The second number you see (ESP+0x04) is, in this case,
+    the first argument to the function. Jumping to that value, we can see what it was about to log. In my case it was
+    simply <code>ea3-boot</code>, but expect it to be different for you.
+</p>
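+<p>What the memory viewer's <code>4-byte</code> mode is doing can be sketched as follows. On 32-bit x86, at the moment
+    a function is entered, the word at ESP is the return address and the words after it are the stack arguments, all
+    stored little-endian. The bytes in the example are invented, and the helper names are mine.</p>

```python
import struct


def stack_words(raw: bytes, count: int) -> list:
    """Interpret the first `count` 4-byte little-endian words of a stack dump."""
    return list(struct.unpack_from("<%dI" % count, raw, 0))


def first_argument(raw: bytes) -> int:
    """Return the word at ESP+0x04, i.e. the first stack argument at function entry."""
    return_address, argument = stack_words(raw, 2)
    return argument


if __name__ == "__main__":
    # Hypothetical dump copied from the address in ESP: return address, then one argument.
    dump = bytes.fromhex("a3 12 00 10  40 f3 19 00")
    print([hex(word) for word in stack_words(dump, 2)])
    print(hex(first_argument(dump)))
```

+<p>The value it returns is the pointer you would then jump to in the memory viewer to see the string being logged.</p>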
+
+{% endblock %}
--- a/templates/pages/index.html
+++ b/templates/pages/index.html
@ -1,46 +1,24 @@
 {% extends "base.html" %}
 {% block body %}
 <h1>Bemani/Konami e-Amusement API</h1>
-<p>Why?</p>
-<p>I was curious how these APIs work, yet could find little to nothing on Google. There are a number of
-    closed-source projects, with presumably similarly closed-source internal documentation, and a scattering of
-    implementations of things, yet I couldn't find a site that actually just documents how the API works. If I'm
-    going to have to reverse engineer an open source project (or a closed source one, for that matter), I might as
-    well just go reverse engineer an actual game (or it's stdlib, as most of my time has been spent currently).</p>
-<p>For the sake of being lazy, I'll probably end up calling it eAmuse more than anything else throughout these
-    pages. Other names you may come across include <code>httpac</code> and <code>xrpc</code>*. The former is the
-    suite of HTTP functions used in the Bemani stdlib, and the latter then name of their communication protocol they
-    implement at the application layer, but whenever someone refers to any of them in the context of a rhythm game,
-    they will be referring to the things documented here.<br />
-    <small style="margin-left: 8px">*I believe <code>xrpc</code> is the officialy used name for the protocol.</small>
-</p>
-<p>These pages are very much a work in progress, and are being written <i>as</i> I reverse engineer parts of the
-    protocol. I've been asserting all my assumptions by writing my own implementation as I go, however it currently
-    isn't sharable quality code and, more importantly, the purpose of these pages is to make implementation of one's
-    own code hopefully trivial (teach a man to fish, and all that).</p>
-<p>Sharing annotated sources for all of the games' stdlibs would be both impractical and unwise. Where relevant
-    however I try to include snippets to illustrate concepts, and have included their locations in the source for if
-    you feel like taking a dive too.</p>
-<p>If you're here because you work on one of those aforementioned closed source projects, hello! Feel free to share
-    knowledge with the rest of the world, or point out corrections. Or don't; you do you.</p>
+<i><a href="./motiviation.html">I moved the big block about why these pages exist, because it was getting painfully
+        long.</a></i>

-<h3>Code snippets</h3>
-<p>Across these pages there are a number of code snippets. They roughly break down into three categories:</p>
-<ul>
-    <li>Assembly: Directly disassembled code from game binaries</li>
-    <li>C: Either raw decompilation, or slightly cleaned up decompilation</li>
-    <li>Python: Snippets from my local testing implementations</li>
-    <li>Pseudocode: Used to illustrate some points. Note that it probably started life as Python before being
-        pseudo'd</li>
-</ul>
-<p>If you yoink chunks of Python code, attribution is always appreciated, but consider it under CC0 (just don't be
-    that person who tries to take credit for it, yeah?).</p>
-<p>Assembly and C snippets often come with an accompanying filename and address. If you're interested in learning how
-    things work in more detail, I'd strongly recommend checking them out. Not all games come with the same version of
-    files; the provided addresses are for build SDVX build KFC-2019020600, using the default base offset.</p>
+<p>The pages across this mini-site aim to totally document, to the best of my ability, the API used for E-Amusement
+    (XRPC), some of its inner workings, how to interface with it both as a client and a server, and how to perform this
+    sort of analysis yourself.</p>
+
+<p>If you just want a plug-and-play library, this is not it. If you're here for knowledge, my aim is that this is
+    <i>the</i> most comprehensive public documentation, so you're hopefully in the right place.
+</p>

 <h2>Contents</h2>
-<ol>
+<ol start="0">
+    <li><a href="./getting_started.html">Getting started and following along</a></li>
+    <ul>
+        <li>A quick one-stop shop for getting set up with the tools you'll want on hand if you want to investigate things
+            for yourself.</li>
+    </ul>
    <li><a href="./transport.html">Transport layer</a></li>
    <ol>
        <li><a href="./transport.html#packet">Packet structure</a></li>
@ -69,24 +47,21 @@
    </ol>
 </ol>

-<h2>Getting started</h2>
-<p>My aim with these pages is to cover as much as possible, so you don't need to try and figure them out yourself.
-    That said, being able to follow along yourself will almost certainly help get more out of this. For following
-    along with source code, you're really going to want to grab yourself a dumped copy of a game (it's going to be a
-    lot easier, and cheeper, than dumping one yourself). I trust you can figure out where to find that.</p>
-<p>For network related things, your options are a little broader. The ideal would be physical ownership of a
-    cabinet, and a subscription to genuine e-amusement. Odds are you don't have both of those :P. A connection to an
-    alternative network works just as well. In the more likely case that you don't have a physical cabinet, it's
-    time to crack out that dumped copy of a game and just run it on your own PC (or a VM, if you're not on Windows)
-    (odds are whatever you downloaded came with the program you'll need to start it pre-packaged. If not, it rhymes
-    with rice.).</p>
-<p>You will also need a local e-amusement-emulating server. By the time I'm done with these pages, there will
-    hopefully be everything you need to be able to write your own. Unfortunately I'm not finished writing them;
-    depending on where you acquired your game, it may have shipped with one of said servers. If it didn't, Asphyxia
-    CORE will do the trick (yes, it's closed source).</p>
-<p>If this all sounds like way too much work, and/or you're just here because of curiosity, I plan to prepare some
-    pcaps of network traffic to play around with without needing a running copy of a game or a network tap on a cab.
-</p>
+<h2>Code snippets</h2>
+<p>Across these pages there are a number of code snippets. They roughly break down into four categories:</p>
+<ul>
+    <li>Assembly: Directly disassembled code from game binaries</li>
+    <li>C: Either raw decompilation, or slightly cleaned up decompilation</li>
+    <li>Python: Snippets from my local testing implementations</li>
+    <li>Pseudocode: Used to illustrate some points. Note that it probably started life as Python before being
+        pseudo'd</li>
+</ul>
+<p>If you yoink chunks of Python code, attribution is always appreciated, but consider it under CC0 (just don't be
+    that person who tries to take credit for it, yeah?).</p>
+<p>Assembly and C snippets often come with an accompanying filename and address. If you're interested in learning how
+    things work in more detail, I'd strongly recommend checking them out. Not all games come with the same version of
+    files; the provided addresses are for SDVX build KFC-2019020600, using the default base offset.</p>
+

 <a href="./transport.html">Next page</a>
 {% endblock %}
--- a/templates/pages/motiviation.html
+++ b/templates/pages/motiviation.html
@ -0,0 +1,26 @@
+{% extends "base.html" %}
+{% block body %}
+<p>Why?</p>
+<p>I was curious how these APIs work, yet could find little to nothing on Google. There are a number of
+    closed-source projects, with presumably similarly closed-source internal documentation, and a scattering of
+    implementations of things, yet I couldn't find a site that actually just documents how the API works. If I'm
+    going to have to reverse engineer an open source project (or a closed source one, for that matter), I might as
+    well just go reverse engineer an actual game (or its stdlib, as most of my time has been spent currently).</p>
+<p>For the sake of being lazy, I'll probably end up calling it eAmuse more than anything else throughout these
+    pages. Other names you may come across include <code>httpac</code> and <code>xrpc</code>*. The former is the
+    suite of HTTP functions used in the Bemani stdlib, and the latter the name of the communication protocol they
+    implement at the application layer, but whenever someone refers to any of them in the context of a rhythm game,
+    they will be referring to the things documented here.<br />
+    <small style="margin-left: 8px">*I believe <code>xrpc</code> is the officially used name for the protocol.</small>
+</p>
+<p>These pages are very much a work in progress, and are being written <i>as</i> I reverse engineer parts of the
+    protocol. I've been asserting all my assumptions by writing my own implementation as I go, however it currently
+    isn't sharable quality code and, more importantly, the purpose of these pages is to make implementation of one's
+    own code hopefully trivial (teach a man to fish, and all that).</p>
+<p>Sharing annotated sources for all of the games' stdlibs would be both impractical and unwise. Where relevant
+    however I try to include snippets to illustrate concepts, and have included their locations in the source for if
+    you feel like taking a dive too.</p>
+<p>If you're here because you work on one of those aforementioned closed source projects, hello! Feel free to share
+    knowledge with the rest of the world, or point out corrections. Or don't; you do you.</p>
+
+{% endblock %}
--- a/templates/pages/server.html
+++ b/templates/pages/server.html
@ -1,4 +1,5 @@
 {% extends "base.html" %}
+{% block title %}Write a server{% endblock %}
 {% block body %}
 <h1>Let's write an e-Amusement server!</h1>
 <p>No, seriously. It's quite easy.</p>
--- a/templates/pages/smartea.html
+++ b/templates/pages/smartea.html
@ -1,4 +1,5 @@
 {% extends "base.html" %}
+{% block title %}Smart E-Amusement{% endblock %}
 {% block body %}
 <h1>Smart E-Amusement</h1>
 <p>So maybe you've turned on that checkbox before, and you're wondering what magic it used? Thankfully, source code for