An Analysis of the Source 2 Engine: The Schema System
praydog / June 2015 (2569 Words, 15 Minutes)
With the release of the Dota 2 Reborn Beta, users are able to try out the new Source 2 Engine. Just recently, Valve released the Mac and Linux clients for this beta.
What does this mean?
Well, as someone who is familiar with the internals of the Source Engine, getting my hands on the Mac files for one of its games is a dream come true. This is because all the debug symbols are present in these binaries, meaning all function names, class names, and variable names are visible. The tradition has not changed with Valve’s newest engine.
In this series, I am going to be analyzing the new parts of the Source 2 Engine. As a hacker, I will also be talking about the process of transitioning from Valve’s old engine to the new one, as Dota will inevitably move entirely to Source 2 (other games may as well).
At the time of writing, the Source 2 SDK has not yet been released, so most of what follows comes from reverse engineering.
A Brief History on Source Networking and Classes
In Source 1, classes on the client that needed to be in sync with the server utilized networked variables (and client prediction, but that’s not important right now). This means that when a networked class instance was modified on the server’s end, it would dispatch the modified variables to the proper clients.
An example would be an entity’s health, or its uninterpolated position. Once those are modified on the server, it dispatches the information to clients it deems valid (like ones that have vision of the entity), and that data would be replicated accordingly on the client. For more information, click here.
One of the most important parts about networked variables (for a hacker or modder) is that for a variable to be networked, the game must expose the offset to this variable, relative to its object base. That way the client will know what to modify once it receives an update from the server.
Retrieving these offsets involves calling CHLClient::GetAllClasses, which returns the head of a ClientClass linked list. One would then traverse each ClientClass::m_pRecvTable by iterating over each of its RecvProps and retrieving the prop’s offset.
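The traversal described above can be sketched roughly as follows. The structs are trimmed stand-ins for the Source 1 SDK types (only the members the walk needs; the real RecvProp, RecvTable, and ClientClass carry more fields in a different layout), and CollectOffsets and CollectAllOffsets are hypothetical helper names. Nested data tables are followed recursively, adding the parent prop's offset along the way.

```cpp
#include <map>
#include <string>

// Trimmed stand-ins for the Source 1 SDK types; the real ones have more members.
struct RecvTable;

struct RecvProp {
    const char* m_pVarName;   // variable name, e.g. "m_iHealth"
    RecvTable*  m_pDataTable; // non-null when the prop is a nested table
    int         m_Offset;     // offset relative to the owning object
};

struct RecvTable {
    RecvProp*   m_pProps;
    int         m_nProps;
    const char* m_pNetTableName;
};

struct ClientClass {
    const char*  m_pNetworkName;
    RecvTable*   m_pRecvTable;
    ClientClass* m_pNext;
};

using OffsetMap = std::map<std::string, int>;

// Walks one table, recursing into nested tables and accumulating base offsets.
void CollectOffsets(const RecvTable* table, int base, const std::string& prefix, OffsetMap& out) {
    if (!table) return;
    for (int i = 0; i < table->m_nProps; ++i) {
        const RecvProp& prop = table->m_pProps[i];
        const int offset = base + prop.m_Offset;
        if (prop.m_pDataTable) {
            CollectOffsets(prop.m_pDataTable, offset, prefix, out);
        } else {
            out[prefix + "::" + prop.m_pVarName] = offset;
        }
    }
}

// head is whatever CHLClient::GetAllClasses returned.
OffsetMap CollectAllOffsets(const ClientClass* head) {
    OffsetMap out;
    for (const ClientClass* c = head; c; c = c->m_pNext) {
        CollectOffsets(c->m_pRecvTable, 0, c->m_pNetworkName, out);
    }
    return out;
}
```

The map is keyed by class name plus variable name, which is a choice made for this sketch; keying by table name would work just as well.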
While networked variables are certainly advantageous for getting an idea of how a class works and mapping them out, there are still many variables that are not networked within these classes. Such variables do not have their offsets exposed like networked variables. This means there is extra work needed on the reverse engineer’s end to find non-networked member variables residing within a class.
Source 2, however, changes the game completely.
Source 2’s Schema System
With the release of Source 2, there is now a Schema system used by its many modules. This Schema system is used to create very detailed descriptions of Classes, Enumerators, and Types. Many classes have a Schema binding with detailed descriptions of their members and inheritance, even if they don’t contain networked variables. There are even full listings of many Enumerators that were in Source 1, but were not exposed.
CHLClient::GetAllClasses is still in Source 2 (now called CSource2Client::GetAllClasses). While it still contains listings and the Schema layout of many of client.dll’s classes, it does not contain all of them.
// Reversed layout; offsets are from the 32-bit build
class ClientClass
{
public:
    const char* m_networkName; //0x0000
    const char* m_name; //0x0004
    ClientClass* m_next; //0x0008
    void* m_createFn; //0x000C
    void* m_destroyFn; //0x0010
    char _0x0014[8];
    // not its type name
    Schema* m_schema; //0x001C
    char _0x0020[4];
    unsigned int m_classId; //0x0024
    void* m_stringThing; //0x0028
    char m_byte; //0x002C
    char _0x002D[3];
};
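Because the offsets in those comments come from a 32-bit build, one way to sanity-check a reversed layout like this on any host is to mirror it with fixed-width fields and let the compiler verify each offset. This is only a sketch: ClientClass32 and Ptr32 are hypothetical names, and the 0x0004 offset for m_name is inferred from its neighbours rather than taken from the listing.

```cpp
#include <cstddef>
#include <cstdint>

using Ptr32 = std::uint32_t; // stands in for a 32-bit pointer

// Same member order as the reversed ClientClass, with pointers narrowed to 32 bits.
struct ClientClass32 {
    Ptr32         m_networkName;  // 0x0000
    Ptr32         m_name;         // 0x0004 (inferred)
    Ptr32         m_next;         // 0x0008
    Ptr32         m_createFn;     // 0x000C
    Ptr32         m_destroyFn;    // 0x0010
    char          _0x0014[8];
    Ptr32         m_schema;       // 0x001C
    char          _0x0020[4];
    std::uint32_t m_classId;      // 0x0024
    Ptr32         m_stringThing;  // 0x0028
    char          m_byte;         // 0x002C
    char          _0x002D[3];
};

// Compilation fails if any commented offset disagrees with the declared layout.
static_assert(offsetof(ClientClass32, m_next) == 0x08, "m_next offset");
static_assert(offsetof(ClientClass32, m_createFn) == 0x0C, "m_createFn offset");
static_assert(offsetof(ClientClass32, m_destroyFn) == 0x10, "m_destroyFn offset");
static_assert(offsetof(ClientClass32, m_schema) == 0x1C, "m_schema offset");
static_assert(offsetof(ClientClass32, m_classId) == 0x24, "m_classId offset");
static_assert(offsetof(ClientClass32, m_stringThing) == 0x28, "m_stringThing offset");
static_assert(offsetof(ClientClass32, m_byte) == 0x2C, "m_byte offset");
static_assert(sizeof(ClientClass32) == 0x30, "total size");
```

A check like this catches a miscounted padding array immediately, instead of at runtime when a read lands on the wrong member.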
Digging deeper, ClientClass is a smaller part of a much more ambitious solution. There is a whole module called schemasystem.dll, which is the core of this new Schema system and is responsible for mapping out and describing many of Source 2’s Classes, Enumerators, and Types.
schemasystem.dll’s main interface, ISchemaSystem, can be retrieved the same way an interface is retrieved in Source 1, by calling the schemasystem.dll export CreateInterface.
// Pseudocode: CreateInterface returns a void*, so the result needs a cast
ISchemaSystem* schemaSystem = static_cast<ISchemaSystem*>(CreateInterface("SchemaSystem_001", nullptr));
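A slightly fuller sketch of that lookup, assuming the Source 1 style factory signature (the same void* (*)(char const*, int*) shape that appears as the parameter of CSchemaSystem::Connect). Resolving the CreateInterface export itself is platform specific (GetProcAddress on Windows, dlsym elsewhere) and is left out. GetInterface, FakeSchemaSystem, and FakeCreateInterface are hypothetical names used only so the example runs without the game; the version string is the one from the pseudocode above.

```cpp
#include <cstring>

// Factory signature exported by Source modules as "CreateInterface".
using CreateInterfaceFn = void* (*)(const char* name, int* returnCode);

// Hypothetical helper: asks a module's factory for an interface and casts it.
template <typename T>
T* GetInterface(CreateInterfaceFn factory, const char* versionedName) {
    if (!factory) return nullptr;
    int returnCode = 0;
    return static_cast<T*>(factory(versionedName, &returnCode));
}

// Stand-in for the real interface object; a real one lives inside the module.
struct FakeSchemaSystem { int marker = 42; };

// Stand-in factory that only knows one interface name.
void* FakeCreateInterface(const char* name, int* returnCode) {
    static FakeSchemaSystem instance;
    const bool found = std::strcmp(name, "SchemaSystem_001") == 0;
    if (returnCode) *returnCode = found ? 0 : 1; // 0 = ok, 1 = failed
    return found ? &instance : nullptr;
}
```

With the real export in hand, the call would look like GetInterface<ISchemaSystem>(createInterface, "SchemaSystem_001"), and a null result means the module does not expose that version string.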
CSchemaSystem’s virtual method table is laid out in the Mac x64 libschemasystem.dylib like so:
CSchemaSystem::Connect(void * (*)(char const*,int *))
CTier2AppSystem<ISchemaSystem,0>::Disconnect(void)
CBaseAppSystem<ISchemaSystem>::QueryInterface(char const*)
CSchemaSystem::Init(void)
CSchemaSystem::Shutdown(void)
CBaseAppSystem<ISchemaSystem>::PreShutdown(void)
CSchemaSystem::GetDependencies(void)
CTier2AppSystem<ISchemaSystem,0>::GetTier(void)
CTier2AppSystem<ISchemaSystem,0>::Reconnect(void * (*)(char const*,int *),char const*)
CBaseAppSystem<ISchemaSystem>::IsSingleton(void)
CBaseAppSystem<ISchemaSystem>::GetBuildType(void)
CSchemaSystem::GlobalTypeScope(void)
CSchemaSystem::FindOrCreateTypeScopeForModule(char const*)
CSchemaSystem::FindTypeScopeForModule(char const*)
CSchemaSystem::GetTypeScopeForBinding(SchemaTypeScope_t,char const*)
CSchemaSystem::FindClassByScopedName(char const*)
CSchemaSystem::ScopedNameForClass(CSchemaClassInfo const*,char *,int)
CSchemaSystem::FindEnumByScopedName(char const*)
CSchemaSystem::ScopedNameForEnum(CSchemaEnumInfo const*,char *,int)
CSchemaSystem::LoadSchemaDataForModules(CUtlVector<CUtlString,CUtlMemory<CUtlString,int>> const&)
CSchemaSystem::SchemaSystemIsReady(void)
CSchemaSystem::VerifySchemaBindingConsistency(bool)
CSchemaSystem::SynthesizeOldBindingInformation(CSchemaClassInfo const*)
CSchemaSystem::SynthesizeOldEnumBindingInformation(CSchemaEnumInfo const*)
CSchemaSystem::CompleteModuleRegistration(char const*)
CSchemaSystem::DoMetaModify_Add(CSchemaClassInfo *,char const*,SchemaMetadataEntryData_t *,int)
CSchemaSystem::DoMetaModify_Override(CSchemaClassInfo *,char const*,SchemaMetadataEntryData_t *,int)
CSchemaSystem::DoMetaModify_Remove(CSchemaClassInfo *,char const*,SchemaMetadataEntryData_t *,int)
CSchemaSystem::DoMetaModify_RemoveAll(CSchemaClassInfo *,char const*)
CSchemaSystem::GetNoSchemaClassInfo(void)
CSchemaSystem::RegisterAtomicType(SchemaAtomicTypeInfo_t const*)
CSchemaSystem::ReadFieldToKeyValues(SchemaClassField_t const*,void const*,char const*,KeyValues *)
CSchemaSystem::Utils(void)
CSchemaSystem::~CSchemaSystem()
CSchemaSystem::~CSchemaSystem()
The important functions to focus on are:
CSchemaSystem::GlobalTypeScope
CSchemaSystem::FindTypeScopeForModule
Both of these functions return a CSchemaSystemTypeScope* for performing operations under their scope.
GlobalTypeScope returns a CSchemaSystemTypeScope* used for operating on Schema that were created under the global scope, but can still be assigned a module.
FindTypeScopeForModule returns a CSchemaSystemTypeScope* used for operating on Schema that were created under the scope of the module name that was passed (like “client.dll”).
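One way to call these two methods without the SDK is to declare an abstract class whose virtual functions mirror the order of the CSchemaSystem table above, so the compiler emits the right slot for each call. The sketch below assumes the Mac x64 ordering holds for the target binary (slot 11 for GlobalTypeScope, slot 13 for FindTypeScopeForModule), which should be verified per platform. ISchemaSystemSketch, the SlotNN placeholder names, and ScopeFor are hypothetical.

```cpp
struct CSchemaSystemTypeScope; // opaque here; only pointers are passed around

// Mirrors the first 14 vtable slots from the listing. The placeholder
// signatures are irrelevant because they are never called; only their
// position in the declaration order matters.
class ISchemaSystemSketch {
public:
    virtual void Slot00_Connect() = 0;
    virtual void Slot01_Disconnect() = 0;
    virtual void Slot02_QueryInterface() = 0;
    virtual void Slot03_Init() = 0;
    virtual void Slot04_Shutdown() = 0;
    virtual void Slot05_PreShutdown() = 0;
    virtual void Slot06_GetDependencies() = 0;
    virtual void Slot07_GetTier() = 0;
    virtual void Slot08_Reconnect() = 0;
    virtual void Slot09_IsSingleton() = 0;
    virtual void Slot10_GetBuildType() = 0;
    virtual CSchemaSystemTypeScope* GlobalTypeScope() = 0;                           // slot 11
    virtual CSchemaSystemTypeScope* FindOrCreateTypeScopeForModule(const char*) = 0; // slot 12
    virtual CSchemaSystemTypeScope* FindTypeScopeForModule(const char*) = 0;         // slot 13
};

// Returns the module's scope when a module name is given, else the global scope.
inline CSchemaSystemTypeScope* ScopeFor(ISchemaSystemSketch* schemaSystem, const char* moduleName) {
    if (!schemaSystem) return nullptr;
    return moduleName ? schemaSystem->FindTypeScopeForModule(moduleName)
                      : schemaSystem->GlobalTypeScope();
}
```

The pointer returned by CreateInterface would be cast to ISchemaSystemSketch* before use. If the slot order differs on another platform, only the declaration order needs to change.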
That being said, these two functions allow iteration over the many Schema definitions that were defined within their scope. While there are no direct functions to iterate over the Schema definitions, they are still stored within the CSchemaSystemTypeScope at offset 0x450 for classes and 0x1C90 for enumerators (32-bit). The Schema held by each scope differ completely from each other, so it is imperative to use the proper scope for the Class, Enumerator, or Type being operated on.
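Since those containers are only reachable by offset, access comes down to pointer arithmetic on the scope pointer. A minimal sketch, assuming the 32-bit offsets quoted above; the element type of each container is not covered here, so the helper only produces a typed address and leaves interpretation to the caller. AtOffset and the two constant names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Offsets quoted in the text for the 32-bit build; other builds will differ.
constexpr std::size_t kClassBindingsOffset = 0x450;
constexpr std::size_t kEnumBindingsOffset  = 0x1C90;

// Returns the address `offset` bytes past `base`, typed as T*.
template <typename T>
T* AtOffset(void* base, std::size_t offset) {
    return reinterpret_cast<T*>(static_cast<std::uint8_t*>(base) + offset);
}
```

Keeping the offsets in named constants makes it easier to update them when a game patch shifts the layout.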
CSchemaSystemTypeScope’s virtual method table is laid out like so:
CSchemaSystemTypeScope::InstallSchemaClassBinding(CSchemaClassBindingBase *,CSchemaClassInfo *,char const*)
CSchemaSystemTypeScope::InstallSchemaEnumBinding(CSchemaEnumBindingBase *,char const*)
CSchemaSystemTypeScope::FindDeclaredClass(char const*)
CSchemaSystemTypeScope::FindDeclaredEnum(char const*)
CSchemaSystemTypeScope::FindSchemaTypeByName(char const*)
CSchemaSystemTypeScope::Type_Builtin(SchemaBuiltinType_t)
CSchemaSystemTypeScope::Type_Ptr(CSchemaType *)
CSchemaSystemTypeScope::Type_Atomic(char const*,int,IAtomicManipulator *,ISchemaManipulator *)
CSchemaSystemTypeScope::Type_Atomic_T(char const*,int,IAtomicManipulator *,ISchemaManipulator *,CSchemaType *)
CSchemaSystemTypeScope::Type_Atomic_TT(char const*,int,IAtomicManipulator *,ISchemaManipulator *,CSchemaType *,CSchemaType *)
CSchemaSystemTypeScope::Type_Atomic_I(char const*,int,IAtomicManipulator *,ISchemaManipulator *,int)
CSchemaSystemTypeScope::Type_DeclaredClass(char const*)
CSchemaSystemTypeScope::Type_DeclaredEnum(char const*)
CSchemaSystemTypeScope::Type_FixedArray(CSchemaType *,int)
CSchemaSystemTypeScope::Type_Bitfield(int)
CSchemaSystemTypeScope::Type_NoschemaType(void)
CSchemaSystemTypeScope::FindSchemaTypeByOldEnum(int)
CSchemaSystemTypeScope::FindType_Atomic(int)
CSchemaSystemTypeScope::FindType_Atomic_T(int,CSchemaType *)
CSchemaSystemTypeScope::FindType_Atomic_TT(int,CSchemaType *,CSchemaType *)
CSchemaSystemTypeScope::FindType_Atomic_I(int,int)
CSchemaSystemTypeScope::FindType_DeclaredClass(char const*)
CSchemaSystemTypeScope::FindType_DeclaredEnum(char const*)
CSchemaSystemTypeScope::FindRawClassBinding(char const*)
CSchemaSystemTypeScope::FindRawClassBinding(uint)
CSchemaSystemTypeScope::FindRawEnumBinding(char const*)
CSchemaSystemTypeScope::FindRawEnumBinding(uint)
CSchemaSystemTypeScope::FindDescendentsOfClass(CSchemaClassInfo const*,SchemaSubclassTraversalDepth_t,CUtlVector<CSchemaClassInfo const*,CUtlMemory<CSchemaClassInfo const*,int>> *)
CSchemaSystemTypeScope::FindBindingsMatching(CUtlVector<CSchemaClassInfo const*,CUtlMemory<CSchemaClassInfo const*,int>> *,ISchemaSearchTester<CSchemaClassInfo const*> *)
CSchemaSystemTypeScope::FindBindingsMatching(CUtlVector<CSchemaEnumInfo const*,CUtlMemory<CSchemaEnumInfo const*,int>> *,ISchemaSearchTester<CSchemaEnumInfo const*> *)
CSchemaSystemTypeScope::FindEnumeratorValue(long long *,char const*,char const*,long long)
CSchemaSystemTypeScope::FindEnumeratorName(char const*,long long,char const*)
CSchemaSystemTypeScope::GetScopeName(void)
CSchemaSystemTypeScope::IsGlobalScope(void)
CSchemaSystemTypeScope::MarkClassAsRequiringGlobalPromotion(CSchemaClassInfo const*)
CSchemaSystemTypeScope::MarkEnumAsRequiringGlobalPromotion(CSchemaEnumInfo const*)
CSchemaSystemTypeScope::ResolveAtomicInfoThreadsafe(SchemaAtomicTypeInfo_t const**,char const*,int)
CSchemaSystemTypeScope::ResolveEnumInfoThreadsafe(CSchemaEnumInfo const**,char const*)
CSchemaSystemTypeScope::ResolveClassInfoThreadsafe(CSchemaClassInfo const**,char const*)
CSchemaSystemTypeScope::FindDescendentsOfClass_R(CSchemaClassBindingBase **,int,CSchemaClassInfo const*,bool,bool,CUtlVector<CSchemaClassInfo const*,CUtlMemory<CSchemaClassInfo const*,int>> *)
FindDeclaredClass can be used to find a class declared within the CSchemaSystemTypeScope that it’s called from. It will return a CSchemaClassInfo*, which is basically an empty class that inherits from SchemaClassInfoData_t.
SchemaClassInfoData_t is the class that describes, in astonishing detail, many classes in Source 2.
// Classes
struct SchemaClassInfoData_t
{
    SchemaString_t m_Name; // 0x00
    // Not in the Schema description.
    const char* m_Description; // 0x08
    unsigned int m_nSizeOf; // 0xC
    unsigned int m_nAlignOf; // 0x10
    SchemaArray_t < SchemaClassFieldData_t > m_Fields; // 0x14
    // Not in the Schema description.
    SchemaArray_t < SchemaStaticFieldData_t > m_staticMembers; // 0x1C
    SchemaArray_t < SchemaBaseClassInfoData_t > m_BaseClasses; // 0x24
    SchemaArray_t < SchemaFieldMetadataOverrideData_t > m_FieldMetadataOverrides; // 0x2C
    SchemaArray_t < CSchemaClassInfo* > m_NestedClasses; // 0x34
    SchemaArray_t < CSchemaEnumInfo* > m_NestedEnums; // 0x3C
    SchemaMetadataSetData_t m_Metadata; // 0x44
    CSchemaSystemTypeScope* m_TypeScope;
private:
    char padding[16];
};
class CSchemaClassInfo : public SchemaClassInfoData_t
{
public:
    bool GetMetaStrings(const char* metaName, std::vector<const char**>& strings);
private:
};
That is exactly how Source 2 has SchemaClassInfoData_t set up, from the variable names to the variable types. That is because not only does this new Schema system describe classes, it actually also describes the very classes that it uses to describe other classes.
As proof, there are three hidden console commands that Valve ships in Dota 2 Reborn:
schema_list_bindings
schema_dump_binding
schema_detailed_class_layout
These console commands use the CSchemaSystem::GlobalTypeScope shown earlier, so they will not show every single class in the game. To do that, there needs to be iteration over every registered module’s CSchemaSystemTypeScope, which these console commands don’t do.
The “Unaccounted” portions of the dump were named by looking over the class info area in the Mac binaries, to determine what those areas were for.
The second “Unaccounted” piece of data is a SchemaArray_t <SchemaStaticFieldData_t> used for describing static members of a class. This is something that Source 1 did not do. Here’s an example dump.
24 Jun 2015 06:30:41 - C_DOTAGameManagerProxy STATIC MEMBERS
24 Jun 2015 06:30:41 - C_DOTAGameManagerProxy::s_pGameManagerProxy: 0x44AFD364
Source2Gen
Using the previous information, an SDK generator can be created. The goal is to create header files that can be used in a cheat or mod to streamline the process of creating interoperability with the game engine.
Source2Gen does exactly this. It creates headers for many of the exposed classes and enumerators residing within the target Source 2 game.
An example class that it will generate:
class CEntityInstance : public IHandleEntity
{
    // CEntityInstance additional information
    // client.dll, project entity2
    // Alignment: -1
    // SCHEMA_CLASS_HAS_VIRTUAL_MEMBERS
    // SCHEMA_CLASS_TEMP_HACK_HAS_NOSCHEMA_MEMBERS
    // SCHEMA_CLASS_TEMP_HACK_HAS_CONSTRUCTOR_LIKE_METHODS
    // SCHEMA_CLASS_TEMP_HACK_HAS_DESTRUCTOR_LIKE_METHODS
    // Abstract Class
public:
    class CEntityInstanceEntityClass : public CEntityClass
    {
        // CEntityInstance::CEntityInstanceEntityClass additional information
        // Alignment: -1
        // SCHEMA_CLASS_HAS_VIRTUAL_MEMBERS
        // Abstract Class
    public:
    }; // size: 388 (0x184)
public:
    __declspec(align(4)) CEntityIdentity *m_pEntity;// 0x4, size 4 (0x4)
    // m_pEntity metadata
    // MNetworkEnable
    // MNetworkPriority
    char CEntityInstance_0C[0x4];
    __declspec(align(4)) UnknownType <0x4, class CUtlSymbolLarge> m_iszPrivateVScripts;// 0xc, size 4 (0x4)
    // m_iszPrivateVScripts metadata
    // MKeyfieldname
    // MNetworkDisable
    __declspec(align(4)) UnknownType <0x4, class CUtlStringToken> m_worldGroupId;// 0x10, size 4 (0x4)
    // m_worldGroupId metadata
    // MKeyfieldname
    // MNetworkDisable
    __declspec(align(4)) CScriptComponent *m_CScriptComponent;// 0x14, size 4 (0x4)
    // m_CScriptComponent metadata
    // MNetworkEnable
    // MNetworkDisable
}; // size: 24 (0x18)
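The padding member in the output above (CEntityInstance_0C) hints at the core job of such a generator: members the Schema does not describe still occupy space, so the gaps between described fields have to be filled for the offsets to line up. The following is a rough sketch of that gap-filling step only, not Source2Gen’s actual code. FieldDesc, EmitMembers, and the _pad_ naming are hypothetical, and the field list is assumed to be sorted by offset.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical, simplified view of one schema field.
struct FieldDesc {
    std::string type;
    std::string name;
    unsigned    offset;
    unsigned    size;
};

// Emits member declarations, inserting char padding wherever there is a gap
// between the end of one field and the start of the next. `cursor` starts at
// the size of the base class (4 for a lone vtable pointer on 32-bit).
std::string EmitMembers(const std::string& className, const std::vector<FieldDesc>& fields,
                        unsigned cursor, unsigned classSize) {
    std::string out;
    char buf[256];
    auto pad = [&](unsigned from, unsigned to) {
        std::snprintf(buf, sizeof(buf), "char %s_pad_%X[0x%X];\n",
                      className.c_str(), from, to - from);
        out += buf;
    };
    for (const FieldDesc& f : fields) { // assumed sorted by offset
        if (f.offset > cursor) pad(cursor, f.offset);
        std::snprintf(buf, sizeof(buf), "%s %s; // 0x%x, size %u\n",
                      f.type.c_str(), f.name.c_str(), f.offset, f.size);
        out += buf;
        cursor = f.offset + f.size;
    }
    if (classSize > cursor) pad(cursor, classSize); // trailing gap up to sizeof
    return out;
}
```

Fed the four CEntityInstance fields from the example with a starting cursor of 4 and a class size of 0x18, this emits a single 4-byte padding array between m_pEntity and m_iszPrivateVScripts, matching the shape of the generated header.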
The repository can be found here.
Addendum: As of early 2023, this repository has been archived. The code is still available, but it is no longer maintained.