Algorithm to exploit 32 bit time functions to do time zone calculations


by Michael G Schwern

Hi,
I'm writing to you in reference to your libtai library whose TODO file states
that "support time zones" is still todo.  I had originally considered using
libtai in Perl to avoid the Unix 2038 bug, but Perl requires time zone support.

Instead, I am rewriting the time.h library functions to be 2038-clean.  The
effort is located here.
http://code.google.com/p/y2038/

The piece which is of interest to libtai is this:
http://code.google.com/p/y2038/wiki/HowItWorks

I have figured out a way to make use of 32 bit system functions to do 64 bit
time zone and daylight savings calculations.  I thought you might be able to
apply this to libtai.
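
The message does not spell the method out. One way such a scheme can work, offered here as an assumption and not as the y2038 code itself, is to map a far-future year onto a year inside the 32 bit range that has the same calendar (same leap-year status, same weekday on 1 January), let the system localtime() handle that year, and then put the real year back. The helper names below are hypothetical, and the 2010..2037 window is an arbitrary choice of 28 consecutive safe years.

```c
#include <stdint.h>

/* Gregorian leap-year rule. */
static int is_leap(int64_t year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Weekday of 1 January (0 = Sunday) in the proleptic Gregorian calendar. */
static int jan1_weekday(int64_t year)
{
    /* The calendar repeats every 400 years, so fold any year,
       including negative ones, into 2000..2399 first. */
    int64_t folded = year % 400;
    if (folded < 0)
        folded += 400;
    int64_t prev = folded + 2000 - 1;   /* the year before the folded year */
    return (int)((1 + 5 * (prev % 4) + 4 * (prev % 100) + 6 * (prev % 400)) % 7);
}

/* Hypothetical helper: find a year in 2010..2037 whose calendar matches
   the given year, so a 32 bit localtime() can stand in for it. */
static int safe_year(int64_t year)
{
    for (int candidate = 2010; candidate <= 2037; candidate++) {
        if (is_leap(candidate) == is_leap(year) &&
            jan1_weekday(candidate) == jan1_weekday(year))
            return candidate;
    }
    return -1;  /* not reached: these 28 years cover all 14 calendar layouts */
}
```

For example, safe_year(2038) yields 2010, since both years are common years that start on a Friday. The sketch assumes that the time zone rules of the stand-in year are an acceptable guess for the real year.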

Thanks,
Schwern


PS  Any potential license issues I'm happy to work out.


--
Stabbing you in the face so you don't have to.


Re: Algorithm to exploit 32 bit time functions to do time zone calculations

by Michael G Schwern

Thorsten Glaser wrote:
> Michael G Schwern dixit:
>
>> Instead, I am rewriting the time.h library functions to be 2038-clean.  The
>
> *yawn*

Me:  "Hey, I have this great idea that might help out!"
You: "Your shit is boring."

This is the sort of reply one gets when offering help?  What a jackass.


> MirBSD uses a 64-bit time_t type on i386 (ILP32), with the aid
> of tm2mjd and mjd2tm functions from DJB libtai code. The rest of the
> functions from the time library work just fine…

If I understand correctly, that's an entire operating system.

The target for y2038 is people writing portable applications which don't have
the luxury of waiting for every OS to upgrade to a 64 bit time_t.  It works
with a 32 bit time_t and it handles time zones.

This is something, as I understand it, that libtai does not do (this is the
impression I get, please correct me if I'm wrong) and I'm offering a way that
it could.

Also you might want to have a look at the tests in y2038.  libtai's INSTALL
says it's not very well tested and what it has appears to be manual.  y2038
has automated tests with extensive testing data files for expected gmtime()
and localtime() results.  You might be able to adapt that data and also the
tap.c test library to make writing tests easy.

Or is that too boring?


--
If at first you don't succeed--you fail.
        -- "Portal" demo

Re: Algorithm to exploit 32 bit time functions to do time zone calculations

by Thorsten Glaser-3

Michael G Schwern dixit:

>You: "Your shit is boring."

More like "has been done already". Sorry.

>If I understand correctly, that's an entire operating system.

Yes, but the gist is, that there is code which uses a certain 64-bit type,
called time_t, but you could of course just use int64_t instead, which does
the job quite fine. (Not even a binary change in the time zone data format.)

>Also you might want to have a look at the tests in y2038.

This is actually a good idea. Compiling (I think it was) CVS with the
changes was a good test too, as the configure script “checks whether
mktime() works”. To get that right (over all of the 64 bit) was hard.
You have to think about very many border cases… in the end, I just
ensured the round-trip via tai64_t was right, not necessarily the
tai64_t representation itself.

bye,
//mirabilos
--
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font. -- Rob Pike in "Notes on Programming in C"

Re: Algorithm to exploit 32 bit time functions to do time zone calculations

by Michael G Schwern

Thorsten Glaser wrote:
> Michael G Schwern dixit:
>
>> You: "Your shit is boring."
>
> More like "has been done already". Sorry.

Apology accepted.  Sorry things got off to a bad start.


>> If I understand correctly, that's an entire operating system.
>
> Yes, but the gist is, that there is code which uses a certain 64-bit type,
> called time_t, but you could of course just use int64_t instead, which does
> the job quite fine. (Not even a binary change in the time zone data format.)

Is this approach portable outside BSD?

My main targets are Perl, Python and Ruby which have to be absurdly portable.
As I understand it, almost nothing about the time zone database is portable.
For example, I could not expect tzload() to work on all operating systems.  I
would have to ship my own, which I do not want to do, or write time zone code
for each operating system, which I also do not want to do.

I could be wrong, I'm really a Perl programmer who plays a C programmer on TV.

That said, I do recognize that a lot of my work will boil down to just doing a
search and replace for "time_t" with "Time64_T" and "int" with "Int64_T".  For
example, asctime().  It's really the trick to get localtime() working that's
important.  I intend to loot BSD code for everything I can; right now I'm
using it mostly as a reference.  In fact, I'm considering changing from
MIT to BSD license to make that even easier.

Speaking of asctime(), I think this is a y2**31 bug in asctime3:

        char year[INT_STRLEN_MAXIMUM(int) + 2];

Since 64 bit time can go well above the year 2 billion, years must be stored
as 64 bit ints.  If int is 32 bits, the code above is only allocating enough
room for 2**31 years.
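
A minimal sketch of the fix being described, assuming the year is carried as an int64_t: size the buffer from the 64 bit type, not from int. The macro and function names are hypothetical and are not taken from the BSD asctime code.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* The longest decimal int64_t is "-9223372036854775808": 20 characters. */
#define YEAR64_STRLEN 20

/* Hypothetical helper: write a 64 bit year as decimal text.
   Returns the number of characters written, or -1 if buf is too small. */
static int format_year64(int64_t year, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%" PRId64, year);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

A caller would declare `char year[YEAR64_STRLEN + 1];` so that even the most negative year fits together with its terminating NUL.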

I've been testing a lot of 64 bit systems' time handling lately and that's a
common mistake, 32 bit years.  Though not as bad as HP/UX's Y10k bug.

Oh, don't forget to make tm.tm_year 64 bit!


>> Also you might want to have a look at the tests in y2038.
>
> This is actually a good idea. Compiling (I think it was) CVS with the
> changes was a good test too, as the configure script “checks whether
> mktime() works”. To get that right (over all of the 64 bit) was hard.
> You have to think about very many border cases… in the end, I just
> ensured the round-trip via tai64_t was right, not necessarily the
> tai64_t representation itself.

To test timegm() I just round tripped it through gmtime() at various
interesting times.

    time = 60*60*16;
    gmtime64_r(&time, &date);
    is_Int64( timegm64(&date), time, "timegm64(60*60*16)" );

An mktime() test should work the same way, round trip through localtime(), no?
You can see the test here (and I really should put all that repeated code into
its own test function).
http://code.google.com/p/y2038/source/browse/trunk/t/timegm.c
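
The mktime() variant suggested above could look roughly like this. It is a sketch using only the standard C localtime() and mktime(), not the 64 bit versions from y2038, and the function name is hypothetical.

```c
#include <time.h>

/* Returns 1 if t survives localtime() followed by mktime(), else 0. */
static int roundtrips_localtime(time_t t)
{
    struct tm *local = localtime(&t);
    if (local == NULL)
        return 0;
    /* Copy the result: mktime() normalizes its argument in place, and
       localtime() may reuse its static buffer.  tm_isdst is kept as set
       by localtime() so that mktime() can resolve ambiguous local times. */
    struct tm copy = *local;
    return mktime(&copy) == t;
}
```

Note that a time of -1 cannot be checked this way, because mktime() also returns -1 to signal failure.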

Anyhow, if you're interested in the Test Anything Protocol, a really simple
implementation is here:
http://code.google.com/p/y2038/source/browse/trunk/tap.c

And the code to run them is here in the "tap_tests" target.
http://code.google.com/p/y2038/source/browse/trunk/Makefile

More info about TAP can be found here:
http://testanything.org/wiki/index.php/Main_Page
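
To give an idea of how little TAP needs, here is a minimal emitter. It is only a sketch of the protocol's basic lines (a plan, then "ok" or "not ok" per test); the function names are hypothetical and this is not the tap.c linked above.

```c
#include <stdio.h>

static int tap_test_number = 0;   /* number of tests reported so far */

/* Print the plan line, for example "1..3". */
static void tap_plan(FILE *out, int count)
{
    fprintf(out, "1..%d\n", count);
}

/* Report one result as "ok N - description" or "not ok N - description".
   Returns the value of passed so calls can be chained or counted. */
static int tap_ok(FILE *out, int passed, const char *description)
{
    fprintf(out, "%s %d - %s\n", passed ? "ok" : "not ok",
            ++tap_test_number, description);
    return passed;
}
```

Calling `tap_plan(stdout, 1);` followed by `tap_ok(stdout, 1 + 1 == 2, "addition works");` prints "1..1" and then "ok 1 - addition works".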

The MyTAP library used by MySQL might be relevant (and cleaner than mine)
http://www.kindahl.net/mytap/doc/

And MySQL's documentation on that.
http://dev.mysql.com/doc/mysqltest/en/unit-test.html


--
The mind is a terrible thing,
and it must be stopped.

Re: Algorithm to exploit 32 bit time functions to do time zone calculations

by Thorsten Glaser-3

Michael G Schwern dixit:

>>> If I understand correctly, that's an entire operating system.
>>
>> Yes, but the gist is, that there is code which uses a certain 64-bit type,
>> called time_t, but you could of course just use int64_t instead, which does
>> the job quite fine. (Not even a binary change in the time zone data format.)
>
>Is this approach portable outside BSD?
>
>My main targets are Perl, Python and Ruby which have to be absurdly portable.
> As I understand it, almost nothing about the time zone database is portable.

Oh, okay. This is a step further away from Unix.

> For example, I could not expect tzload() to work on all operating systems.  I
>would have to ship my own, which I do not want to do, or write time zone code
>for each operating system, which I also do not want to do.

You probably could just take the entire Olson time library in its original,
portable state, and change that. (This would have the beneficial side effect
to replace vendors’ probably buggier time libraries and time zone databases
with something known to work.) But I like your “wrapping” approach.

>In fact, I'm considering changing from
>MIT to BSD license to make that even easier.

There is not a single BSD licence. MIT is just fine… OpenBSD uses ISC for
new code, and, being European, MirBSD has to have its own only slightly
different one too ;)

>Speaking of asctime(), I think this is a y2**31 bug in asctime3:
>
> char year[INT_STRLEN_MAXIMUM(int) + 2];

Yup, probably.

>Oh, don't forget to make tm.tm_year 64 bit!

Yeah, that bites us great time. For example, look here:
http://cvs.mirbsd.de/ports/lang/python/2.5/patches/patch-Modules_datetimemodule_c
When you have only one of the two chunks, it dumps core.
Takes a while to find it, since -Wformat did not, obviously,
catch this case. It usually spots all occurrences though.

>Anyhow, if you're interested in the Test Anything Protocol, a really simple

I implemented it in Python for the day-job project I'm currently
working on, since my colleague is a Perl fan… but thanks ;)

bye,
//mirabilos
--
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font. -- Rob Pike in "Notes on Programming in C"

Re: Algorithm to exploit 32 bit time functions to do time zone calculations

by Thorsten Glaser-3

Michael G Schwern dixit:

>Speaking of asctime(), I think this is a y2**31 bug in asctime3:

Two of them even, please look at asctime.c in cvsweb, I fixed it ☺
(The allbsd.org mirror will not update before 04:10 UTC though you
can use http://www.mirbsd.org/cvs.cgi/)

//mirabilos
--
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font. -- Rob Pike in "Notes on Programming in C"

Re: Algorithm to exploit 32 bit time functions to do time zone calculations

by Thorsten Glaser-3

Michael G Schwern dixit:

>That said, I do recognize that a lot of my work will boil down to just doing a
>search and replace for "time_t" with "Time64_T" and "int" with "Int64_T".

Ah. The trick with my implementation is, to change “int” and “long” to
“time_t” instead *ONLY* where it is necessary, and leave the remaining
narrow integer types alone. In some places, I use int64_t or uint64_t
for casts, for either clarity, simplicity or portability, but mostly,
I stuck with time_t, as it’s 32-bit on the sparc platform, 64-bit on
the i386 platform with MirBSD.

I wanted to avoid switching EVERY integer type to 64 bit even where not
needed, as that can be much slower and is much bigger.
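
As an illustration of widening only what has to be wide, consider splitting a timestamp into days and seconds of the day: the day count needs 64 bits, while the second of the day always fits in an int. This is a sketch with hypothetical names, not code from MirBSD.

```c
#include <stdint.h>

#define SECS_PER_DAY 86400

/* Hypothetical helper: split a 64 bit timestamp into a wide day count and
   a narrow seconds-of-day value in 0..86399, flooring for negative input. */
static void split_time(int64_t t, int64_t *days, int *sec_of_day)
{
    int64_t d = t / SECS_PER_DAY;
    int64_t s = t % SECS_PER_DAY;
    if (s < 0) {                /* C division truncates toward zero */
        s += SECS_PER_DAY;
        d -= 1;
    }
    *days = d;
    *sec_of_day = (int)s;       /* safe: 0 <= s < 86400 */
}
```

Hours, minutes and seconds can then be derived from sec_of_day with plain int arithmetic.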

bye,
//mirabilos
--
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font. -- Rob Pike in "Notes on Programming in C"

Re: Algorithm to exploit 32 bit time functions to do time zone calculations

by Michael G Schwern

Thorsten Glaser wrote:
>> My main targets are Perl, Python and Ruby which have to be absurdly portable.
>> As I understand it, almost nothing about the time zone database is portable.
>
> Oh, okay. This is a step further away from Unix.

It's about nine. :)  If you're morbidly curious, take a look at the supported
platform list for Perl.
http://perldoc.perl.org/perlport.html#Supported-Platforms


>> For example, I could not expect tzload() to work on all operating systems.  I
>> would have to ship my own, which I do not want to do, or write time zone code
>> for each operating system, which I also do not want to do.
>
> You probably could just take the entire Olson time library in its original,
> portable state, and change that. (This would have the beneficial side effect
> to replace vendors’ probably buggier time libraries and time zone databases
> with something known to work.)

I've considered that, yes system libraries are often suspect, but I don't want
to now have each application with its own independent time zone library that
has to be updated by the user independent of the system's own.  Odds are, it
won't get updated.


> But I like your “wrapping” approach.

Thanks!

Maybe you can shed some light on this problem, what to do about year 0?  Right
now the only limit I have on dates is the limit of what Time64_T can store.
Does it make sense to stop at 0?  I've seen a number of implementations that
do.  My thinking for gmtime() is that a negative year is just BC, so let it go!

localtime()... well, localtime() gets absurd real fast.  Gregorian/Julian
calendar shifts.  The time zone simply not having existed in the past.  It's
really hard to say what the locals would have thought the datetime was X
seconds ago.

Any insights?


--
You are wicked and wrong to have broken inside and peeked at the
implementation and then relied upon it.
        -- tchrist in <31832.969261130@chthon>

Re: Algorithm to exploit 32 bit time functions to do time zone calculations

by Thorsten Glaser-3

Michael G Schwern dixit:

>Odds are, it
>won't get updated.

True…

>Maybe you can shed some light on this problem, what to do about year 0?

I actually haven’t thought about year 0, since localtime/mktime do that,
and mjd2tm and tm2mjd from DJB code. I think it’s illegal, isn’t it?

>Right now the only limit I have on dates is the limit of what Time64_T
>can store. Does it make sense to stop at 0?

No, it makes sense to not stop. The thing is, mktime() and gmtime()
*must* have full round-trip capabilities (GNU autoconf checks for that
before it uses it), which is why, for example, my tai64_t data type does
not exactly store what DJB calls a TAI timestamp. There is a small wrap-
around at about 0x8000000000000000 (time_t) / 0xC000000000000000 (tai64_t)
of 10 (I think, due to the leap seconds) seconds. So, for whatever value
you have in whatever representation (time_t, tai64_t, struct tm), all of
these must have full round-trip capabilities (with the possible exception
that struct tm with a 64-bit year can go beyond the 64-bit time_t value
scale, but then, a 32-bit year does the same for a 32-bit time_t, so this
is no change).

>localtime()... well, localtime() gets absurd real fast.  Gregorian/Julian
>calendar shifts.

Mh. Maybe a struct tm.tm_year is always Gregorian? Have a look, while at
it, at the "%J" strftime modifier (and, especially, my implementation of
it, using the tm2mjd function). This will get you Julian days, but nothing
in Unix has them split off into a calendar time kind of structure.

So I think you don’t need to worry about THAT. That’s application layer
to do, similar to hebrew, muslim, asian etc. calendars. And at that, it
REALLY gets absurd (cf. http://blogs.msdn.com/michkap/default.aspx), but
that’s not the (our) OS’ job to worry about.

However, for “absurd” years, the OS’ own functions might go crazy. Too
bad we can’t access the OS’ own time zone table (I need to get the info
for leap seconds out of it, for example).

bye,
//mirabilos
--
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font. -- Rob Pike in "Notes on Programming in C"

Re: Algorithm to exploit 32 bit time functions to do time zone calculations

by Michael G Schwern

It's very nice to have someone else to bounce this all off of.


Thorsten Glaser wrote:
>> Maybe you can shed some light on this problem, what to do about year 0?
>
> I actually haven’t thought about year 0, since localtime/mktime do that,
> and mjd2tm and tm2mjd from DJB code. I think it’s illegal, isn’t it?

Illegal according to who?

C99 and POSIX 1003.1 define tm.tm_year as a signed int with a range of "years
since 1900" which makes a negative year perfectly "legal".


>> Right now the only limit I have on dates is the limit of what Time64_T
>> can store. Does it make sense to stop at 0?
>
> No, it makes sense to not stop. The thing is, mktime() and gmtime()
> *must* have full round-trip capabilities (GNU autoconf checks for that
> before it uses it), which is why, for example, my tai64_t data type does
> not exactly store what DJB calls a TAI timestamp. There is a small wrap-
> around at about 0x8000000000000000 (time_t) / 0xC000000000000000 (tai64_t)
> of 10 (I think, due to the leap seconds) seconds. So, for whatever value
> you have in whatever representation (time_t, tai64_t, struct tm), all of
> these must have full round-trip capabilities (with the possible exception
> that struct tm with a 64-bit year can go beyond the 64-bit time_t value
> scale, but then, a 32-bit year does the same for a 32-bit time_t, so this
> is no change).

I'm a little lost.  Could you give an example?


>> localtime()... well, localtime() gets absurd real fast.  Gregorian/Julian
>> calendar shifts.
>
> Mh. Maybe a struct tm.tm_year is always Gregorian? Have a look, while at
> it, at the "%J" strftime modifier (and, especially, my implementation of
> it, using the tm2mjd function). This will get you Julian days, but nothing
> in Unix has them split off into a calendar time kind of structure.

C99 says "Many functions [in time.h] deal with a calendar time that represents
the current date (according to the Gregorian calendar) and time."

But then goes on to say:

"Some functions deal with local time, which is the calendar time expressed for
some specific time zone, and with Daylight Saving Time, which is a temporary
change in the algorithm for determining local time. The local time zone and
Daylight Saving Time are implementation-defined."

POSIX 1003.1 does not appear to discuss the matter.

It seems clear to me that gmt is always Gregorian but not so clear what
happens with localtime().

I wonder how a Chinese locale deals with this.  They switched from Julian to
Gregorian after 1901, so it should show up in any Chinese localtime()
implementation.  Also Russia and much of Eastern Europe.


> So I think you don’t need to worry about THAT. That’s application layer
> to do, similar to hebrew, muslim, asian etc. calendars. And at that, it
> REALLY gets absurd (cf. http://blogs.msdn.com/michkap/default.aspx), but
> that’s not the (our) OS’ job to worry about.
>
> However, for “absurd” years, the OS’ own functions might go crazy. Too
> bad we can’t access the OS’ own time zone table (I need to get the info
> for leap seconds out of it, for example).

I'm just glad that ctime() isn't locale sensitive.  Oi, what a mess that would be.


PS  I just found this gem in the mktime() standard:

    the original values [in the tm struct] of the other components are not
    restricted to the ranges described in <time.h>.

which the BSD time.h man page expands out to:

     The original values of the tm_wday and tm_yday components of the
     structure are ignored, and the original values of the other components
     are not restricted to their normal ranges, and will be normalized if
     needed.  For example, October 40 is changed into November 9, a tm_hour
     of -1 means 1 hour before midnight, tm_mday of 0 means the day preceding
     the current month, and tm_mon of -2 means 2 months before January of
     tm_year.  (A positive or zero value for tm_isdst causes mktime() to
     presume initially that summer time (for example, Daylight Saving Time)
     is or is not in effect for the specified time, respectively.  A negative
     value for tm_isdst causes the mktime() function to attempt to divine
     whether summer time is in effect for the specified time.  The tm_isdst
     and tm_gmtoff members are forced to zero by timegm().)

     On successful completion, the values of the tm_wday and tm_yday
     components of the structure are set appropriately, and the other
     components are set to represent the specified calendar time, but with
     their values forced to their normal ranges; the final value of tm_mday
     is not set until tm_mon and tm_year are determined.  The mktime()
     function returns the specified calendar time; if the calendar time
     cannot be represented, it returns -1;

This is something I haven't tested for yet.
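
A test for that normalization could start from the man page's own example. This sketch uses the standard mktime() only; the wrapper name is hypothetical.

```c
#include <time.h>

/* Normalize out-of-range fields in place via mktime().
   Returns 0 on success, -1 if the time cannot be represented. */
static int normalize_tm(struct tm *date)
{
    date->tm_isdst = -1;    /* let the library work out whether DST applies */
    return mktime(date) == (time_t)-1 ? -1 : 0;
}
```

Feeding it a struct tm with tm_mon = 9 (October) and tm_mday = 40 should leave tm_mon = 10 and tm_mday = 9, with tm_wday and tm_yday filled in to match.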


PPS  How does one get the damn qsecretary program to stop making you confirm
every email to this list?  I'm already subscribed.


--
The interface should be as clean as newly fallen snow and its behavior
as explicit as Japanese eel porn.

Re: Algorithm to exploit 32 bit time functions to do time zone calculations

by Thorsten Glaser-3

Michael G Schwern dixit:

>>> Maybe you can shed some light on this problem, what to do about year 0?
>>
>> I actually haven’t thought about year 0, since localtime/mktime do that,
>> and mjd2tm and tm2mjd from DJB code. I think it’s illegal, isn’t it?
>
>Illegal according to who?
>
>C99 and POSIX 1003.1 define tm.tm_year as a signed int with a range of "years
>since 1900" which makes a negative year perfectly "legal".

0 is not negative, and there has been no year 0, only 1 ante christo (-1)
followed directly by 1 post christo. (Ironically, he was probably not born
by then.)

>I'm a little lost.  Could you give an example?

Yup. Convert -2⁶³ from time_t to TAI (while honouring leap seconds), and
it will wrap, because the result would be -2⁶³-10 (plus the BIAS), which
is actually positive. But that doesn’t matter, as it wraps back on the
way back. So don’t introduce any arbitrary limits.
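
The wrap can be made concrete with unsigned arithmetic. This is a sketch only: the bias of 2^62 follows from the 0x8000000000000000 / 0xC000000000000000 pair mentioned earlier in the thread, while the fixed 10 second offset and its sign are assumptions taken from this message, and the function names are hypothetical.

```c
#include <stdint.h>

#define TAI_BIAS    UINT64_C(0x4000000000000000)   /* 2^62 */
#define LEAP_OFFSET UINT64_C(10)                   /* assumed constant */

/* time_t-like value to a biased TAI label; unsigned math wraps modulo 2^64. */
static uint64_t time_to_tai(int64_t t)
{
    return (uint64_t)t - LEAP_OFFSET + TAI_BIAS;
}

/* Inverse conversion.  Converting an out-of-range uint64_t to int64_t is
   implementation-defined before C23; common compilers wrap as two's
   complement, which is what the round trip relies on. */
static int64_t tai_to_time(uint64_t tai)
{
    return (int64_t)(tai - TAI_BIAS + LEAP_OFFSET);
}
```

For INT64_MIN the intermediate value t - 10 wraps to a large positive number, yet tai_to_time(time_to_tai(INT64_MIN)) still returns INT64_MIN, because the same wrap is undone on the way back.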

>I wonder how a Chinese locale deals with this.  They switched from Julian to
>Gregorian after 1901, so it should show up in any Chinese localtime()
>implementation.  Also Russia and much of Eastern Europe.

I think a “struct tm” is just always gregorian, since other calendars
do not necessarily have the same day/month/year concept (example:
Japanese).

>PPS  How does one get the damn qsecretary program to stop making you confirm
>every email to this list?  I'm already subscribed.

Same problem here. I think it uses the envelope address, not the header
address, to check, which is, IMO, a bug in DJB’s mailing list software ☺

bye,
//mirabilos
--
Sometimes they [people] care too much: pretty printers [and syntax highligh-
ting, d.A.] mechanically produce pretty output that accentuates irrelevant
detail in the program, which is as sensible as putting all the prepositions
in English text in bold font. -- Rob Pike in "Notes on Programming in C"

Re: Algorithm to exploit 32 bit time functions to do time zone calculations

by Michael G Schwern

Thorsten Glaser wrote:

> Michael G Schwern dixit:
>
>>>> Maybe you can shed some light on this problem, what to do about year 0?
>>> I actually haven’t thought about year 0, since localtime/mktime do that,
>>> and mjd2tm and tm2mjd from DJB code. I think it’s illegal, isn’t it?
>> Illegal according to who?
>>
>> C99 and POSIX 1003.1 define tm.tm_year as a signed int with a range of "years
>> since 1900" which makes a negative year perfectly "legal".
>
> 0 is not negative, and there has been no year 0, only 1 ante christo (-1)
> followed directly by 1 post christo. (Ironically, he was probably not born
> by then.)

Oh, wasn't thinking about that.  I was thinking about negative years.  But
you're right, year 0 is sticky.  Let's look at what ISO 8601 does...

Wikipedia claims ISO 8601:2004 treats year 0 as 1 BC, but I can't find an
explicit reference.  They also claim that ISO 8601:2004 uses "Astronomical
year numbering" but again, I can't find that in the standard.  It just says a
"calendar year" is "in the Gregorian calendar" (2.2.13).  However, it seems
perfectly sensible and I think I'll go with that.
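
Under astronomical year numbering, year 0 is 1 BC, year -1 is 2 BC, and so on, so the BC number is 1 minus the year. A small sketch of that mapping, with a hypothetical function name and an arbitrary "AD n" / "n BC" output style:

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: render an astronomical year with an era label.
   Returns what snprintf() returns. */
static int format_era_year(int64_t astro_year, char *buf, size_t size)
{
    if (astro_year > 0)
        return snprintf(buf, size, "AD %" PRId64, astro_year);

    /* 1 - year, done in unsigned math so even INT64_MIN cannot overflow. */
    uint64_t bc = UINT64_C(1) - (uint64_t)astro_year;
    return snprintf(buf, size, "%" PRIu64 " BC", bc);
}
```

With this, year 0 formats as "1 BC" and year -1 as "2 BC", while positive years pass through unchanged.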

Here's the language in 3.2.1 "The Gregorian calendar" which uses their usual
"mutual agreement of the partners in information interchange" cop-out:

  The use of this calendar for dates preceding the introduction of the
  Gregorian calendar [1582] (also called the proleptic Gregorian calendar)
  should only be by agreement of the partners in information interchange.

And in 4.1.2.1...

  calendar year is, unless specified otherwise, represented by four digits.
  Calendar years are numbered in ascending order according to the Gregorian
  calendar by values in the range [0000] to [9999]. Values in the range [0000]
  through [1582] shall only be used by mutual agreement of the partners in
  information interchange.

They also define an expanded year format which, "by mutual agreement" allows
for negative years.  (4.4.3.3)


>> I wonder how a Chinese locale deals with this.  They switched from Julian to
>> Gregorian after 1901, so it should show up in any Chinese localtime()
>> implementation.  Also Russia and much of Eastern Europe.
>
> I think a “struct tm” is just always gregorian, since other calendars
> do not necessarily have the same day/month/year concept (example:
> Japanese).

I've got a call out for someone to test what a properly localized Unix dist
does, just to get a data point.


--
"I went to college, which is a lot like being in the Army, except when
 stupid people yell at me for stupid things, I can hit them."
    -- Jonathan Schwarz