How does it work?
MagNumDB is a database that contains about 380,000 items. These items are constants, names, and values, all extracted from more than 6,000 header files (.h, .hxx, .hpp, .idl, etc.) provided by the standard Windows and Visual Studio SDKs and WDKs.
Some values have been extracted from the very special uuid.lib file, which contains the values of thousands of GUIDs and property keys that are not present anywhere in the header files.
MagNumDB also contains around 30,0000 undocumented GUIDs that I found in an envelope someone put in front of my door on a cold, dark night in April 2018.
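The page does not say how the GUID values are read out of uuid.lib. As a minimal sketch, assuming the 16 raw bytes of a GUID have already been located in the library's data (finding them inside the .lib archive is out of scope here), this Python snippet shows how the standard Windows GUID memory layout maps to the familiar string form. The function name guid_from_bytes is hypothetical; the IMalloc IID is used as sample data.

```python
import uuid

def guid_from_bytes(raw: bytes) -> str:
    """Format 16 raw bytes laid out as a Windows GUID struct (hypothetical helper).

    A GUID is stored as Data1 (32-bit), Data2 (16-bit) and Data3 (16-bit),
    all little-endian on x86/x64, followed by the 8 bytes of Data4 as-is.
    uuid.UUID(bytes_le=...) applies exactly that byte-order swap.
    """
    if len(raw) != 16:
        raise ValueError("a GUID is exactly 16 bytes")
    return str(uuid.UUID(bytes_le=raw)).upper()

# IID_IMalloc as it would sit in memory: Data1 = 0x00000002 stored little-endian,
# Data2 = Data3 = 0, Data4 = C0 00 00 00 00 00 00 46.
raw_imalloc = bytes([0x02, 0x00, 0x00, 0x00,
                     0x00, 0x00,
                     0x00, 0x00,
                     0xC0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46])

print(guid_from_bytes(raw_imalloc))  # 00000002-0000-0000-C000-000000000046
```

Note that only the first three fields are byte-swapped, which is why a naive hex dump of the 16 bytes does not match the string that appears in headers.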
To build this database, we tried many existing parsers, such as Clang and other fine tools, but they just didn't suit our needs.
They can't handle tens of thousands of files (except maybe with 128 GB of RAM?), they can't handle some specific (or just very old) Microsoft constructs, they don't remember the stack of #define directives that led to a definition, and they only give you a final AST, not a partial one.
So, in the end, we wrote our own C/C++ parser, named C2P5 (for C/CPP/PreProcessor/Parser), tailored specifically for computing constants.
C2P5 can preprocess, parse, and partially evaluate all the header files as if they were included in one big virtual project (which, of course, does not compile), and it does so on a machine with 32 GB of RAM.
It currently supports the following preprocessor and C/C++ constructs:
The parser remembers the dynamic preprocessor conditions (#if, #ifdef, etc.) under which constants are defined and expressions are computed.
All parsed items are saved in the database, as well as the associated conditions.
A given name may therefore correspond to more than one item if their associated condition stacks differ.
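The page does not detail how C2P5 tracks these conditions. Here is a minimal Python sketch of the idea, assuming simple single-line directives (no line continuations, no comments, and no evaluation of the conditions themselves): keep a stack of the active #if/#ifdef/#ifndef conditions, rewrite the top entry on #elif/#else, and store every object-like #define together with a snapshot of that stack, so the same name can produce several items. collect_defines and the MY_PTR_SIZE header are hypothetical.

```python
import re
from collections import defaultdict

# "#  keyword  rest" with surrounding whitespace trimmed.
_DIRECTIVE = re.compile(r"^\s*#\s*(\w+)\s*(.*?)\s*$")

def collect_defines(lines):
    """Return {name: [(body, conditions), ...]} for each object-like #define.

    `conditions` is a tuple snapshot of the preprocessor conditions active at
    the #define, outermost first. #elif/#else branches are recorded as the
    negation of the earlier branches at the same level.
    """
    items = defaultdict(list)
    stack = []      # active condition per nesting level, innermost last
    branches = []   # per nesting level: raw conditions of branches seen so far
    for line in lines:
        m = _DIRECTIVE.match(line)
        if not m:
            continue
        kw, rest = m.group(1), m.group(2)
        if kw in ("if", "ifdef", "ifndef"):
            cond = {"if": rest,
                    "ifdef": f"defined({rest})",
                    "ifndef": f"!defined({rest})"}[kw]
            stack.append(cond)
            branches.append([cond])
        elif kw == "elif":
            prev = branches[-1]
            stack[-1] = " && ".join([f"!({c})" for c in prev] + [rest])
            prev.append(rest)
        elif kw == "else":
            stack[-1] = " && ".join(f"!({c})" for c in branches[-1])
        elif kw == "endif":
            stack.pop()
            branches.pop()
        elif kw == "define":
            parts = rest.split(None, 1)
            if parts and "(" not in parts[0]:  # skip function-like macros
                body = parts[1] if len(parts) > 1 else ""
                items[parts[0]].append((body, tuple(stack)))
    return dict(items)

header = [
    "#ifdef _WIN64",
    "#define MY_PTR_SIZE 8",
    "#else",
    "#define MY_PTR_SIZE 4",
    "#endif",
]
print(collect_defines(header))
# {'MY_PTR_SIZE': [('8', ('defined(_WIN64)',)), ('4', ('!(defined(_WIN64))',))]}
```

Storing the whole stack, rather than only the innermost condition, is what lets two definitions of the same name stay distinct when they differ only in an outer #if.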
C2P5 supports the following types of constants, regardless of the way they are defined in source files:
C2P5 and the MagNumDB website are written in C# and use a Lucene database as their full-text search engine.
C2P5 uses a custom C grammar for ANTLR4cs to parse expressions; that grammar is not used for preprocessor parsing.
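C2P5's own evaluator is built on its ANTLR grammar and is not shown here. As a rough illustration of what partially evaluating a constant involves, here is a minimal Python sketch that swaps in Python's ast module as a stand-in expression parser. It assumes only object-like macros whose bodies are integer expressions: it substitutes macro names recursively (recording the chain of macros it went through), strips C integer suffixes such as UL, and evaluates the result with a whitelist of operators. The MACROS table, MY_FIRST_MSG and MY_FLAGS are hypothetical; WM_USER and WM_APP carry their winuser.h values.

```python
import ast
import re

# Hypothetical macro table standing in for #define lines collected from headers.
MACROS = {
    "WM_USER": "0x0400",
    "WM_APP": "0x8000",
    "MY_FIRST_MSG": "(WM_USER + 1)",   # hypothetical
    "MY_FLAGS": "(1UL << 3) | 0x01",   # hypothetical
}

# C integer literal followed by U/L suffixes, e.g. 1UL or 0x10L.
_SUFFIX = re.compile(r"\b(0[xX][0-9A-Fa-f]+|\d+)[uUlL]+\b")
# Macro names; hex digits after "0x" never match (no word boundary before them).
_IDENT = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")

_BINOPS = {
    ast.Add: lambda a, b: a + b, ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b, ast.BitOr: lambda a, b: a | b,
    ast.BitAnd: lambda a, b: a & b, ast.BitXor: lambda a, b: a ^ b,
    ast.LShift: lambda a, b: a << b, ast.RShift: lambda a, b: a >> b,
}
_UNOPS = {ast.USub: lambda a: -a, ast.UAdd: lambda a: a, ast.Invert: lambda a: ~a}

def _eval(node):
    """Evaluate a whitelisted integer expression tree."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        return _BINOPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNOPS:
        return _UNOPS[type(node.op)](_eval(node.operand))
    raise ValueError(f"unsupported construct: {ast.dump(node)}")

def expand(name, macros, trail, seen=()):
    """Recursively substitute macro names, appending each expanded name to `trail`."""
    if name not in macros:
        raise KeyError(f"undefined macro: {name}")
    if name in seen:
        raise ValueError("recursive macro: " + " -> ".join(seen + (name,)))
    trail.append(name)
    body = _SUFFIX.sub(r"\1", macros[name])
    return _IDENT.sub(
        lambda m: "(" + expand(m.group(0), macros, trail, seen + (name,)) + ")", body)

def evaluate(name, macros=MACROS):
    """Return (value, trail): the integer value and the macros expanded, outermost first."""
    trail = []
    tree = ast.parse(expand(name, macros, trail), mode="eval")
    return _eval(tree), trail

print(evaluate("MY_FIRST_MSG"))  # (1025, ['MY_FIRST_MSG', 'WM_USER'])
print(evaluate("MY_FLAGS"))      # (9, ['MY_FLAGS'])
```

Python integers are unbounded, so this sketch ignores C's fixed-width and unsigned semantics; a real evaluator would also need casts, sizeof, character literals, and function-like macros.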
Frequently Asked Questions
Some important points to note:
Here are some example queries, which you can adapt to build your own:
title:wm_user returns the WM_USER Windows message item, not all items that reference the WM_USER token.
title:wm_u* returns all items (probably Windows messages) whose name starts with WM_U.
value:1024 AND title:wm_* returns all items (probably Windows messages) whose name starts with WM_ and whose value is 1024. Note that AND must be uppercase for Lucene to interpret it as an operator.
value:"00000002-0000-0000-C000-000000000046" returns the IID (GUID value) of the IMalloc interface.
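To make the behavior of these example queries concrete, here is a toy in-memory matcher in Python. It is not Lucene and not how MagNumDB is implemented; it only assumes case-insensitive fields, a trailing * meaning a prefix match, optional double quotes around a term, and clauses joined by an uppercase AND. ITEMS, _clause_matches and search are hypothetical; the sample values are WM_USER (0x0400), WM_UNDO (0x0304) and WM_APP (0x8000) from winuser.h, plus the IMalloc IID.

```python
# A few sample items standing in for the real Lucene index (hypothetical).
ITEMS = [
    {"title": "WM_USER", "value": "1024"},
    {"title": "WM_UNDO", "value": "772"},
    {"title": "WM_APP", "value": "32768"},
    {"title": "IID_IMalloc", "value": "00000002-0000-0000-C000-000000000046"},
]

def _clause_matches(item, clause):
    """Match one field:term clause; a trailing * means prefix match."""
    field, _, term = clause.strip().partition(":")
    term = term.strip('"')
    actual = item.get(field, "").lower()
    if term.endswith("*"):
        return actual.startswith(term[:-1].lower())
    return actual == term.lower()

def search(query):
    """Return titles of items matching every clause; only uppercase AND splits clauses."""
    clauses = query.split(" AND ")
    return [it["title"] for it in ITEMS if all(_clause_matches(it, c) for c in clauses)]

print(search("title:wm_user"))               # ['WM_USER']
print(search("title:wm_u*"))                 # ['WM_USER', 'WM_UNDO']
print(search("value:1024 AND title:wm_*"))   # ['WM_USER']
print(search('value:"00000002-0000-0000-C000-000000000046"'))  # ['IID_IMalloc']
```

The real Lucene query parser also handles OR, NOT, grouping, and analyzers that tokenize field contents, none of which this toy attempts.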
We welcome feedback.
Seen anything missing? A bug? A wrong value? Do you have any suggestion for improvements?
Do you have an idea for a cool new feature?
Please contact us here
MagNumDB 2017-2019 Simon Mourier V1.3.0. All rights reserved.
All product names, logos, and brands are property of their respective owners. All company, product and service names used in this website are for identification purposes only.
All values, names, source code fragments displayed here have been extracted from files that are property of their respective owners.
THIS SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND OTHER THAN AS SPECIFICALLY SET FORTH IN THE LICENSE AGREEMENT, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.