'4CCO' (four character codes) versus Symbols

by Mateu Batle

Hi guys,

I'd like to share some details about an extension we have made to the
Nebula 2 engine kernel, to start a discussion and get some comments.
Basically it is the replacement of four character codes (fourcc or
4cc) with symbols. I'll introduce the concepts with some examples and
details about the implementation. BTW, this is a copy of a post from
my blog, which I started recently, http://sharedming.blogspot.com,
although there is not much to see there yet.

A fourcc is basically a very efficient way to represent a string. It
is small (just the size of an integer variable), and comparing fourccs
is much faster than comparing strings. An additional advantage is
that it can be used as a hash key for fast lookups. Those are the
advantages, but what are the drawbacks? Just one: it makes
programming more cumbersome. Some examples of fourccs are 'SCPN' for
"SetCompanyName" and 'GCRS' for "GetCurrentState".

I agree that fourccs are efficient and I'd like to keep that somehow,
but they make the programmer's life more difficult: they force us to
write more code than needed, which makes the resulting code harder
to read and maintain. "Less code, better code". Let's analyze what a
programmer needs to do in order to use fourccs:

* Create a fourcc from the string it represents (and remember it).
* Register somehow the relation between the fourcc and its string.
* Write two versions of each method, one accessing by string (slower)
and another accessing by fourcc (faster but harder to use and read).
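
As a rough illustration of the three chores listed above, here is a
minimal sketch. The names FourCC, MakeFourCC and CommandTable are
hypothetical and chosen only for this example; they are not taken from
the Nebula code base, and the byte order used for packing is an
assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

typedef std::uint32_t FourCC;  // hypothetical alias for this sketch

// Step 1: build the code from four characters, first character in the
// most significant byte (an assumed packing order).
constexpr FourCC MakeFourCC(char a, char b, char c, char d)
{
    return (static_cast<FourCC>(static_cast<unsigned char>(a)) << 24) |
           (static_cast<FourCC>(static_cast<unsigned char>(b)) << 16) |
           (static_cast<FourCC>(static_cast<unsigned char>(c)) << 8) |
            static_cast<FourCC>(static_cast<unsigned char>(d));
}

// Step 2: a table relating each code to the string it abbreviates.
class CommandTable
{
public:
    void Register(FourCC code, const std::string& name)
    {
        // Two different names abbreviated to the same code would clash.
        assert(nameByCode_.count(code) == 0 || nameByCode_[code] == name);
        nameByCode_[code] = name;
        codeByName_[name] = code;
    }

    // Step 3a: fast lookup, keyed directly by the integer code.
    bool HasCode(FourCC code) const { return nameByCode_.count(code) != 0; }

    // Step 3b: slower lookup, keyed by the full name (string comparisons).
    bool HasName(const std::string& name) const
    {
        return codeByName_.count(name) != 0;
    }

private:
    std::map<FourCC, std::string> nameByCode_;
    std::map<std::string, FourCC> codeByName_;
};
```

A caller would register MakeFourCC('S', 'C', 'P', 'N') together with
"SetCompanyName" once, and then has to choose between HasCode and
HasName at every call site, which is the duplication the list
describes.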


There are several examples of this in the Nebula 2 code base, like the
command names and the signal names. And there are even more examples
in Nebula 3, like the class names and the attributes.

Let's get to the point. A symbol is basically a constant string, a
string that does not change during the runtime of the application.
Using symbols, the programmer just has to remember one string and
code one version of the function. That's all: less work and, more
importantly, code that is easier to read and therefore to maintain.

Let's see some examples of usage to clarify it:


void IncIntAttribute(nSymbol attributeName)
{
  int val = this->GetIntAttribute(attributeName);
  val++;
  this->SetIntAttribute(attributeName, val);
}

obj->RegisterIntAttribute( NS(LoopCount) );
obj->SetIntAttribute( NS(LoopCount), 0 );
obj->IncIntAttribute( NS(LoopCount) );

Note: the macro NS(XXX) is a preprocessor macro that does some magic
to convert the parameter XXX into an actual value (NS is a shortcut
for NEBULASYMBOL). Actually this is the hard part of the system, but
it can be done since symbols are known at compile time.

Implementation details:

* There is a preprocessor macro NS(XXX) which basically maps into a
C++ preprocessor define. These defines are generated automatically in
a process explained later on. For example NS(LoopCount) translates
into the preprocessor define NSYMBOLID_LoopCount (which maps to an
integer).


#define NS(XXX) NSYMBOLID_ ## XXX

* There is an nSymbolId type, which is basically a typedef of an int.
This is the same size as a fourcc.

* There is an nSymbol C++ object, which wraps an nSymbolId and provides
some handy functions for converting to and from strings and for fast
symbol comparison. Passing nSymbol and nSymbolId as function arguments
is as efficient as passing fourccs.

* How to calculate the symbol id? Any mapping from a string to an
integer can be used, but one property must be enforced: it always has
to give the same value for the same symbol, in any source file. That's
why we use the CRC (Cyclic Redundancy Check) algorithm, which could
produce collisions in theory (two different symbols given the same
id), but this has never happened in practice. If it does happen, we
detect it and warn the programmer.

* When to calculate the symbol id? It can be done in several ways,
but basically it has to be done after the source code is written and
before it is compiled. It could be a pre-compile build step, or part
of the Nebula build system. This process basically generates an
include file, common to the whole target, which contains all the
symbols used in the target, for example:

#define NSYMBOLID_CLASS 2819245958

#define NSYMBOLID_nroot 4018013252

* Additionally, we have a symbol table, which basically maps from
nSymbolId to a C string. So when the string has to be recovered from
the symbol there is a small penalty, although this operation is not
normally needed (and in some systems it is kept just as a debug
feature). There is also an autogenerated C++ file (generated in the
same build process) which automatically registers the symbol ids and
symbol strings.
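
The pieces described in the list above could fit together roughly as
in the following sketch. It is not the code of the extension. The post
only says "CRC", so the standard CRC-32 used here is an assumption, as
is the unsigned 32-bit id type (chosen so that the example values fit).
SymbolTable, ComputeSymbolId, SymbolToString and
RegisterGeneratedSymbols are hypothetical names. The two id values are
copied from the generated-header example and have not been recomputed
with this CRC.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Assumption: an unsigned 32-bit id, so that values such as 4018013252 fit.
typedef std::uint32_t nSymbolId;

// Standard bitwise CRC-32 (reflected polynomial 0xEDB88320), as a generator
// tool might use it. The exact CRC variant of the extension is not stated.
inline nSymbolId ComputeSymbolId(const char* name)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (; *name != '\0'; ++name) {
        crc ^= static_cast<unsigned char>(*name);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? ((crc >> 1) ^ 0xEDB88320u) : (crc >> 1);
        }
    }
    return ~crc;
}

// Lines of this shape would live in the generated header. The values are
// copied from the post (with a 'u' suffix added), not recomputed here.
#define NSYMBOLID_CLASS 2819245958u
#define NSYMBOLID_nroot 4018013252u
#define NS(XXX) NSYMBOLID_ ## XXX

// Thin wrapper around the id: cheap to copy and to compare.
class nSymbol
{
public:
    nSymbol(nSymbolId id) : id_(id) {}  // implicit, so NS(...) can be passed
    nSymbolId Id() const { return id_; }
    bool operator==(const nSymbol& rhs) const { return id_ == rhs.id_; }
    bool operator!=(const nSymbol& rhs) const { return id_ != rhs.id_; }
private:
    nSymbolId id_;
};

// Hypothetical symbol table: maps an id back to its string.
class SymbolTable
{
public:
    // Returns false if a different string already owns this id (a collision).
    bool Register(nSymbolId id, const std::string& name)
    {
        auto result = table_.emplace(id, name);
        return result.second || result.first->second == name;
    }
    // Returns the registered string, or nullptr if the id is unknown.
    const std::string* Find(nSymbolId id) const
    {
        auto it = table_.find(id);
        return it == table_.end() ? nullptr : &it->second;
    }
private:
    std::map<nSymbolId, std::string> table_;
};

// Recovers the string of a symbol; asserts that it was registered.
inline const std::string& SymbolToString(const SymbolTable& table, nSymbol symbol)
{
    const std::string* name = table.Find(symbol.Id());
    assert(name != nullptr && "symbol was never registered");
    return *name;
}

// Stand-in for the autogenerated registration file.
inline bool RegisterGeneratedSymbols(SymbolTable& table)
{
    bool ok = table.Register(NSYMBOLID_CLASS, "CLASS");
    ok = table.Register(NSYMBOLID_nroot, "nroot") && ok;
    return ok;
}
```

With this in place, a call such as SymbolToString(table, NS(nroot))
pays the table lookup only when the string is really needed, while
comparing two nSymbol values stays a plain integer comparison.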

cheers
  Mateu

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

*** NOTE: To reply to the list use "reply to all",  ***
***       to reply direct to the sender use "reply" ***
_______________________________________________
Nebuladevice-discuss mailing list
Nebuladevice-discuss@...
https://lists.sourceforge.net/lists/listinfo/nebuladevice-discuss

Re: '4CCO' (four character codes) versus Symbols

by Vadim Macagon

Mateu Batle wrote:

> Hi guys,
>
> I'd like to share some details about an extension we have done to the
> Nebula 2 engine kernel to begin some discussion and getting some
> comments. Basically it is the substitution of four character codes
> (fourcc or 4cc) by symbols. I'll introduce now to the concepts with
> some examples and details about the implementation. BTW, this is a
> copy from a post in my blog, which I've inaugurated recently,
> http://sharedming.blogspot.com, although there is no much to see there
> yet.

Hi Mateu,

That should be http://sharedmind.blogspot.com :)

You mention that the calculation of symbol ids can be done as a
pre-compile build step or by the Nebula build system, which way did you
implement it?

This sounds interesting, but since it's a change to the core we're going
to have to figure out how to proceed. I'm going to write up some options
this evening.


-+ enlight +-


Fwd: '4CCO' (four character codes) versus Symbols

by Mateu Batle

2007/4/3, Vadim Macagon <vadim@...>:

[snipped]

> That should be http://sharedmind.blogspot.com :)

Ooops ! Thanks enlight. I didn't want to share my Ming at all :D

> You mention that the calculation of symbol ids can be done as a
> pre-compile build step or by the Nebula build system, which way did you
> implement it?

We implemented it in the Nebula build system as a new generator type,
which should be more portable, but it has the annoyance that the
process has to be redone whenever you add or change a symbol in your
code (although we could easily make a pre-build step call this
generator before compiling). We alleviate this problem by rerunning
the generator only for release builds, and doing a more dynamic
runtime symbol registration in debug. By the way, I forgot to say
that we allow insertion of new symbols at runtime.
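
One way such a debug/release split could look is sketched below. This
is a guess at the general shape, not the actual implementation: the
flag N_SYMBOLS_PREGENERATED, the functions RuntimeSymbols and
RegisterSymbol, the unsigned 32-bit id type and the use of standard
CRC-32 are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

typedef std::uint32_t nSymbolId;  // assumption: unsigned 32-bit id

// Standard bitwise CRC-32; the CRC variant of the extension is not stated.
inline nSymbolId ComputeSymbolId(const char* name)
{
    std::uint32_t crc = 0xFFFFFFFFu;
    for (; *name != '\0'; ++name) {
        crc ^= static_cast<unsigned char>(*name);
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? ((crc >> 1) ^ 0xEDB88320u) : (crc >> 1);
        }
    }
    return ~crc;
}

// Hypothetical process-wide table filled while the program runs.
inline std::map<nSymbolId, std::string>& RuntimeSymbols()
{
    static std::map<nSymbolId, std::string> table;
    return table;
}

// Hashes the name, registers it on first use and returns its id.
inline nSymbolId RegisterSymbol(const char* name)
{
    const nSymbolId id = ComputeSymbolId(name);
    auto result = RuntimeSymbols().emplace(id, name);
    // A different string that already owns this id would be a CRC collision.
    assert(result.second || result.first->second == name);
    (void)result;  // avoids an unused-variable warning when asserts are off
    return id;
}

#ifdef N_SYMBOLS_PREGENERATED          // hypothetical flag for release builds
#define NS(XXX) NSYMBOLID_ ## XXX      // constant from the generated header
#else
#define NS(XXX) RegisterSymbol(#XXX)   // debug: hash and register at runtime
#endif
```

In the debug branch a new NS(PosX) in the source needs no generator
run, at the cost of a hash and a table lookup on each use. The release
branch keeps the compile-time constant.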

> This sounds interesting, but since it's a change to the core we're going
> to have to figure out how to proceed. I'm going to write up some options
> this evening.

Yeah, I know. In my opinion I would not change the core at all; it is
not worth it, even less so if we think that Nebula 3 is near. I have
had only bad experiences so far with modifying the core, for example
the change to nObject (which IIRC Bruce and Enlight did) and the
signal system (done by Bruce and me), even though both changes have
been very useful and worthwhile for us. What I mean is that changing
the core apart from Radon Labs is like signing a death sentence for
those features, or worse, creating a branch off the main project.

I like this feature and I was hoping that this or something similar
would be considered for the Nebula 3 core, which seems to be not so
far away (so maybe it is too late for that). I just wanted to catch
the attention of floh, so I posted about this on his blog. And he
gave an interesting reply:

Floh said...:

"Hmm, this seems to be similar to our "Attribute Id's" we have built
for Mangalore and which will be integrated in Nebula3 (and heavily
used). Attribute Id's are unified string/fourcc/C++-symbol
identifiers. When using them in C++ code, you use the C++ symbol (and
get a compile error if the attribute id doesn't exist), but you can
also get the associated string or fourcc code, which is useful for
persistency or communication (since the symbol form is basically a
pointer to a static C++ object which doesn't make sence across
processes). Attribute Id's are also associated with a datatype (like
Int, Float, etc...), and using a mismatching datatype with the
attribute id also results in a compile error. I think one of my next
blog posts will be about attributes ;)"

I've taken a look at this, and I've some comments:

In my opinion, the attribute ids share only some of their functionality
with the symbols, and attribute ids are at a much higher level than
symbols. The attribute ids could be implemented on top of the symbol
system. Attribute ids come with extra functionality, which makes them
less generally applicable than symbols. The symbol system provides just
one thing, a constant string, that can be used in many places.

Attribute ids can be checked for existence at compile time. This
seems good a priori, and it is probably the proper thing for
attributes. But it means that attribute ids must be declared and
registered explicitly. Symbols are meant to be very simple to use,
because we thought that if they were more complicated than necessary
programmers would not use them. That's why the programmer just has to
write NS(symbol) to use a symbol, that's all, and that's why we have
written a build system generator to do the whole "register" process
behind the scenes (we wanted a compile-time solution, not a runtime
one). It would be much simpler if C++ already had symbols meant to be
used by the programmer, as many other languages do. Back to the main
theme: comparing symbols and attributes (just in the symbol
functionality), I think attributes are not that easy to use. You have
to define them and register them, and the fourccs are still there,
so all the problems described before are still there as well.

BTW, I forgot one more example of the applicability of symbols: those
enums that have to be converted to and from strings (there are many
of those in Nebula). Symbols already provide that functionality, so
those conversion functions are not needed anymore.

cheers
  Mateu
