Clang
It's a bit of a mess. Always will be. Kind of comes from C++ being a mess. Go look at this. The internals manual is probably a decent place to start.
libclang
This is the C api for Clang. It exists, but is limited in what parts of the AST are exposed. Despite that, it is sufficient for reading enough of the AST to do a lot. Tools like rust-bindgen are built on it.
At its core libclang is really simple, but it can be hard to grasp right away because (as far as I know) there isn't a good explanation of how the different parts of the api fit together, and I wasn't experienced enough to get it immediately. This is hopefully a "big ideas" text that I would have found helpful two weeks before writing it, and that I will find helpful later once I've forgotten everything.
Prereqs
libclang is for accessing the clang ast (well, the clang abstract syntax digraph, because it can be cyclic), but it's a bad way to learn how the ast works: it conflates some of the clang ast's node types and leaves others out entirely. Read the section on the ast library in the internals manual and take a look at this page/talk for an okay introduction. Internally, libclang uses the C++ clang api to traverse the nodes in the ast and to get information on the nodes that currently have a representation in C. Just because something is accessible in the clang api doesn't mean it is accessible in libclang.
The Parts
The official tutorial gives a good overview of the important types for working with the clang ast and is worth a read. The centerpiece type is CXCursor. A cursor points to a node in the ast. There are different kinds of nodes, described by the CXCursorKind enum. Some kinds have documentation and some don't, but searching the clang api docs for a class with a similar name can be very insightful. An important group is the unexposed kinds (CXCursor_UnexposedDecl, CXCursor_UnexposedExpr, CXCursor_UnexposedStmt and CXCursor_UnexposedAttr). These mean the node exists in the clang ast, but libclang doesn't expose its specific kind. Such nodes still appear under their real class names when running clang -Xclang -ast-dump src-file, which can be confusing. To see what is exposed and what isn't, it can be useful to write a traversal using libclang instead of relying purely on the clang ast dump. A particularly thorny thing to realise is that libclang (I think) doesn't give good, direct access to declaration contexts.
Many CXCursors (declarations and expressions, for example) also have a CXType, obtained with clang_getCursorType. It contains information about the C++ type of the piece of code the cursor represents (C++ is complicated, and types have to be resolved to build the ast). CXTypes don't correspond directly to C++ types and often capture more information than the C++ type alone; for example, one CXTypeKind is CXType_ConstantArray, a C array with a constant size. Depending on the CXTypeKind, different functions are meaningful (clang_getArraySize, for instance, only makes sense for constant arrays). In some ways CXTypes are more descriptive than types in the clang API, combining, for example, Types and QualTypes.
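For inspecting ast nodes, that's basically it. To look at the children of a given node, a visitor function is passed to clang_visitChildren. This starts a traversal of the graph that is controlled by the visitor's return value (CXChildVisit_Break, CXChildVisit_Continue or CXChildVisit_Recurse), possibly modifying some data (the CXClientData pointer) that is passed around. It's simple, but although the visitor is handed the child's direct parent cursor, it isn't told what role the child plays in that parent (for example, whether an expression is the condition of an if statement or part of its body), so it can be hard to know how a child node is related to its parent. Anything beyond the direct parent, such as the grandparent or the depth, has to be stored manually in the client data during the traversal. I don't know a good solution to this. The C++ api has tools for matching the ast (the AST matchers), but libclang does not. Maybe homebrewing a similar matching library is the solution?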
There are other things libclang can do, but these types form the base. With an understanding of them, and maybe some bookmarked doc pages, it should not be too hard to read and understand the apis for the rest of libclang's functionality.