FYI, ran "git gc" on all git repositories

by Jim Meyering

I did the same thing a few months ago.
For some it made a big difference: emacs.git went from 1.1GB to 155MB.
Active repositories were shrunk to ~20% or even 5% of their original size.

This is the script I ran:

#!/bin/bash
log=$(mktemp /tmp/log-repo-gc-XXXXXX)
printf "Run this to see more detail:\ntail -f $log\n"
exec >$log

cd /vservers/vcs-noshell/srv/git

for dir in *.git; do
  echo $dir... 1>&2
  start_kb=$(du -sk $dir|cut -f1)
  printf '%-20s %u KiB->' $dir $start_kb
  start_sec=$(date +%s)
  git --git-dir=$dir gc
  end_sec=$(date +%s)
  elapsed=$((end_sec - start_sec))
  end_kb=$(du -sk $dir|cut -f1)
  percent_saved=$(echo "scale=2; 100 * ($start_kb - $end_kb) / $start_kb"|bc)
  printf '%s (saved %s%% in %ss)\n' $end_kb $percent_saved $elapsed
done
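
A slightly more defensive variant of the loop body above might look like the following. It is only a sketch, not the script that was committed: the helper names percent_saved and gc_one_repo are made up for illustration, awk stands in for bc, and the zero-size guard is an added assumption.

```bash
#!/bin/bash
# percent_saved START_KB END_KB
# Hypothetical helper: prints the percentage saved with two decimals.
# Guards against division by zero for an empty (0 KiB) directory.
percent_saved() {
  local start_kb=$1 end_kb=$2
  if [ "$start_kb" -eq 0 ]; then
    printf '0.00'
    return
  fi
  awk -v s="$start_kb" -v e="$end_kb" 'BEGIN { printf "%.2f", 100 * (s - e) / s }'
}

# gc_one_repo DIR
# Hypothetical helper: runs "git gc" on one repository and reports the size
# before and after, quoting every expansion so odd directory names survive.
gc_one_repo() {
  local dir=$1 start_kb end_kb start_sec elapsed
  start_kb=$(du -sk -- "$dir" | cut -f1)
  start_sec=$(date +%s)
  git --git-dir="$dir" gc || return
  elapsed=$(( $(date +%s) - start_sec ))
  end_kb=$(du -sk -- "$dir" | cut -f1)
  printf '%-20s %u KiB->%s (saved %s%% in %ss)\n' \
    "$dir" "$start_kb" "$end_kb" "$(percent_saved "$start_kb" "$end_kb")" "$elapsed"
}
```

For example, percent_saved 1000 200 prints 80.00, and "for dir in *.git; do gc_one_repo "$dir"; done" would replace the original loop.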



Re: FYI, ran "git gc" on all git repositories

by Sylvain Beucler

On Fri, Oct 09, 2009 at 04:27:32PM +0200, Jim Meyering wrote:

> I did the same thing a few months ago.
> For some it made a big difference: emacs.git went from 1.1GB to 155MB.
> Active repositories were shrunk to ~20% or even 5% of their original size.
>
> This is the script I ran:
>
> #!/bin/bash
> log=$(mktemp /tmp/log-repo-gc-XXXXXX)
> printf "Run this to see more detail:\ntail -f $log\n"
> exec >$log
>
> cd /vservers/vcs-noshell/srv/git
>
> for dir in *.git; do
>   echo $dir... 1>&2
>   start_kb=$(du -sk $dir|cut -f1)
>   printf '%-20s %u KiB->' $dir $start_kb
>   start_sec=$(date +%s)
>   git --git-dir=$dir gc
>   end_sec=$(date +%s)
>   elapsed=$((end_sec - start_sec))
>   end_kb=$(du -sk $dir|cut -f1)
>   percent_saved=$(echo "scale=2; 100 * ($start_kb - $end_kb) / $start_kb"|bc)
>   printf '%s (saved %s%% in %ss)\n' $end_kb $percent_saved $elapsed
> done

Cool.  Nice optimization.

I wonder what kind of effects this has, though.

Possibly HTTP users will have to download a big file (but then they
shouldn't use HTTP ;))

We should ask Petr Baudis from repo.or.cz; I think there was a
discussion a year or two ago, and they weren't sure. Or we could fix git
to do it automatically, if that's feasible.

--
Sylvain



Re: FYI, ran "git gc" on all git repositories

by Jim Meyering

Sylvain Beucler wrote:

> On Fri, Oct 09, 2009 at 04:27:32PM +0200, Jim Meyering wrote:
>> I did the same thing a few months ago.
>> For some it made a big difference: emacs.git went from 1.1GB to 155MB.
>> Active repositories were shrunk to ~20% or even 5% of their original size.
>>
>> This is the script I ran:
>>
>> #!/bin/bash
>> log=$(mktemp /tmp/log-repo-gc-XXXXXX)
>> printf "Run this to see more detail:\ntail -f $log\n"
>> exec >$log
>>
>> cd /vservers/vcs-noshell/srv/git
>>
>> for dir in *.git; do
>>   echo $dir... 1>&2
>>   start_kb=$(du -sk $dir|cut -f1)
>>   printf '%-20s %u KiB->' $dir $start_kb
>>   start_sec=$(date +%s)
>>   git --git-dir=$dir gc
>>   end_sec=$(date +%s)
>>   elapsed=$((end_sec - start_sec))
>>   end_kb=$(du -sk $dir|cut -f1)
>>   percent_saved=$(echo "scale=2; 100 * ($start_kb - $end_kb) / $start_kb"|bc)
>>   printf '%s (saved %s%% in %ss)\n' $end_kb $percent_saved $elapsed
>> done
>
> Cool.  Nice optimization.
>
> I wonder what kind of effects this has, though.

It's certainly safe.
I've done this numerous times on savannah, and on other systems,
with no ill effects, other than for those unlucky
enough to have to use git-over-http.

> Possibly HTTP users will have to download a big file (but then they
> shouldn't use HTTP ;))

Right on both counts.

> We should ask Petr Baudis from repo.or.cz; I think there was a
> discussion a year or two ago, and they weren't sure. Or we could fix git
> to do it automatically, if that's feasible.

One way or another, it's worth automating.
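
One possible way to automate this, sketched under assumptions: a small function that a cron job could call, running "git gc --auto" so that Git itself decides whether a repository needs repacking. The function name gc_auto_all and the GIT override are hypothetical; like the original loop, it only looks at *.git directories directly under the given root.

```bash
#!/bin/bash
# gc_auto_all ROOT
# Hypothetical helper for a periodic job. "git gc --auto" only does work
# when Git's own thresholds (e.g. gc.auto) say the repository needs it.
# Set GIT=echo for a dry run that prints the commands instead of running them.
gc_auto_all() {
  local root=$1 dir
  for dir in "$root"/*.git; do
    [ -d "$dir" ] || continue   # skip the literal pattern when nothing matches
    "${GIT:-git}" --git-dir="$dir" gc --auto --quiet
  done
}
```

A dry run such as "GIT=echo gc_auto_all /vservers/vcs-noshell/srv/git" lists what would be executed without touching any repository.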



Re: FYI, ran "git gc" on all git repositories

by Sylvain Beucler

On Fri, Oct 09, 2009 at 07:57:52PM +0200, Jim Meyering wrote:

> Sylvain Beucler wrote:
> > On Fri, Oct 09, 2009 at 04:27:32PM +0200, Jim Meyering wrote:
> >> I did the same thing a few months ago.
> >> For some it made a big difference: emacs.git went from 1.1GB to 155MB.
> >> Active repositories were shrunk to ~20% or even 5% of their original size.
> >>
> >> This is the script I ran:
> >>
> >> #!/bin/bash
> >> log=$(mktemp /tmp/log-repo-gc-XXXXXX)
> >> printf "Run this to see more detail:\ntail -f $log\n"
> >> exec >$log
> >>
> >> cd /vservers/vcs-noshell/srv/git
> >>
> >> for dir in *.git; do
> >>   echo $dir... 1>&2
> >>   start_kb=$(du -sk $dir|cut -f1)
> >>   printf '%-20s %u KiB->' $dir $start_kb
> >>   start_sec=$(date +%s)
> >>   git --git-dir=$dir gc
> >>   end_sec=$(date +%s)
> >>   elapsed=$((end_sec - start_sec))
> >>   end_kb=$(du -sk $dir|cut -f1)
> >>   percent_saved=$(echo "scale=2; 100 * ($start_kb - $end_kb) / $start_kb"|bc)
> >>   printf '%s (saved %s%% in %ss)\n' $end_kb $percent_saved $elapsed
> >> done
> >
> > Cool.  Nice optimization.
> >
> > I wonder what kind of effects this has, though.
>
> It's certainly safe.
> I've done this numerous times on savannah, and on other systems,
> with no ill effects, other than for those unlucky
> enough to have to use git-over-http.
>
> > Possibly HTTP users will have to download a big file (but then they
> > shouldn't use HTTP ;))
>
> Right on both counts.
>
> > We should ask Petr Baudis from repo.or.cz; I think there was a
> > discussion a year or two ago, and they weren't sure. Or we could fix git
> > to do it automatically, if that's feasible.
>
> One way or another, it's worth automating.

I think the issue was weird branching, maybe branching from remotes
that were deleted _and_ in forks using repo.or.cz's space-efficient
local forks, but I may be wrong.

The last time you ran a massive gc was 2009-04-12. BTW, it's good to
mention this in the ChangeLog too :)

--
Sylvain



Re: FYI, ran "git gc" on all git repositories

by Jim Meyering

Sylvain Beucler wrote:
> The last time you ran a massive gc was 2009-04-12. BTW, it's good to
> mention this in the ChangeLog too :)

Done, and committed the script under infra/.



Re: FYI, ran "git gc" on all git repositories

by Thomas Schwinge

Hello!

On Fri, Oct 09, 2009 at 04:27:32PM +0200, Jim Meyering wrote:
> cd /vservers/vcs-noshell/srv/git
>
> for dir in *.git; do

I guess this doesn't catch Git repositories that are located in
subfolders: things like hurd/gnumach.git.  Or is the on-disk layout a
single-hierarchy one?

Also, might there be race conditions -- is there some kind of locking
being employed, so that this GC doesn't collect objects that are
currently being uploaded but for which no anchoring ref has been created
yet?  (It may well be that this is inherently safe, thanks to the Git
protocol.)


Regards,
 Thomas



Re: FYI, ran "git gc" on all git repositories

by Jim Meyering

Thomas Schwinge wrote:
> On Fri, Oct 09, 2009 at 04:27:32PM +0200, Jim Meyering wrote:
>> cd /vservers/vcs-noshell/srv/git
>>
>> for dir in *.git; do
>
> I guess this doesn't catch Git repositories that are located in
> subfolders: things like hurd/gnumach.git.  Or is the on-disk layout a
> single-hierarchy one?

Hi Thomas,

Good point.
There were more than 40 like that.

I've rerun the script with this (temporarily):

  for dir in $(find . -mindepth 2 -maxdepth 2 -name '*.git'); do

and this will get all of them next time:

  for dir in $(find . -maxdepth 2 -name '*.git'); do
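
Iterating over $(find ...) splits on whitespace, so a directory name containing a space would be broken into several words. A sketch of a safer form, assuming bash and a find that supports -print0 (the helper name for_each_repo is hypothetical):

```bash
#!/bin/bash
# for_each_repo ROOT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND once per *.git directory found at most
# two levels below ROOT, passing the directory as the last argument.
# NUL-delimited names keep paths with spaces or newlines intact.
for_each_repo() {
  local root=$1; shift
  local dir
  while IFS= read -r -d '' dir; do
    "$@" "$dir"
  done < <(find "$root" -maxdepth 2 -type d -name '*.git' -print0)
}
```

For instance, "for_each_repo . echo" prints each repository path on its own line, including ones such as ./hurd/gnumach.git.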

> Also, might there be race conditions -- is there some kind of locking
> being employed, so that this GC doesn't collect objects that are
> currently being uploaded but for which no anchoring ref has been created
> yet?  (It may well be that this is inherently safe, thanks to the Git
> protocol.)

AFAIK, it's safe.
If you know/learn of any risk, please let us know.

Thanks for the feedback.
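
On the race-condition question above: according to the git-gc documentation, gc only prunes loose unreachable objects older than a grace period (gc.pruneExpire, two weeks by default), which is the usual reason a gc that overlaps with a push is considered low-risk. Whether the Git version on the server at the time behaved this way is an assumption. A minimal sketch that makes the grace period explicit (gc_with_grace and the GIT override are hypothetical names):

```bash
#!/bin/bash
# gc_with_grace DIR [GRACE]
# Hypothetical wrapper: runs gc but keeps unreachable objects newer than
# GRACE, defaulting to Git's documented default of two weeks.
# Set GIT=echo for a dry run that only prints the command.
gc_with_grace() {
  local dir=$1 grace=${2:-2.weeks.ago}
  "${GIT:-git}" --git-dir="$dir" gc --prune="$grace"
}
```

By contrast, the documentation warns that --prune=now removes such objects regardless of age and raises the risk of corruption when another process is writing to the repository.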